R command sheet
The homepage www.statmethods.net contains a short and clear presentation of many of the commands below.
This page is dynamic and will be updated during the course as more commands are needed.
Loading and viewing data
First set your working directory to the folder containing your data file. You can either browse from RStudio using the menu Tools -> Set Working Directory -> Choose Directory. Alternatively you can copy the location into the command setwd(), e.g.:
setwd( 'C:/Users/Susanne/Rworks/' )
Load data using the command read.dbf() from the foreign package:
library(foreign)
d <- read.dbf( 'datafile.dbf' )
head( d ) |
Prints the first 6 lines of the data set named d. |
head( d, n=10 ) |
The additional argument n sets the number of lines to be printed (here 10). |
View( d ) |
Shows the data set as an Excel look-alike in a new tab. |
Data manipulations
d2 <- subset(d, Grp==1) |
Defines a subset of d named d2 containing only those rows of d for which the logical condition Grp==1 is fulfilled, i.e. only observations in group 1. |
d$var1 |
Use $-notation to access a variable named var1 in data set d. |
d$new.var <- d$var1 / d$var2 |
Define a new variable in data set d named new.var containing var1 divided by var2. |
d$grp.var <- cut( d$var1, breaks=c(0,100,200,1000), labels=c('label1','label2','label3') ) |
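Creates a factor (group variable) in d named grp.var by chopping d$var1 into pieces. Cut points are specified by the breaks argument. Make sure that the lowest and highest values in breaks cover the whole range of d$var1: the intervals are open on the left, so a value equal to the lowest break (or outside the breaks) becomes NA. Specify the labels argument to control the naming of the groups. |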
d$heavy <- 1*( d$weight > 100) |
An example of defining a binary (0/1) variable based on a quantitative variable. |
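The data manipulations above can be tried out on a small made-up data set. This is only a sketch: the data frame d and all its values are invented for illustration; the variable names follow the table above.

```r
# Hypothetical data set d; names follow the table above, values are invented.
d <- data.frame(
  Grp    = c(1, 1, 2, 2, 1),
  var1   = c(50, 150, 250, 900, 100),
  var2   = c(10, 30, 50, 90, 20),
  weight = c(80, 120, 95, 101, 100)
)

d2 <- subset(d, Grp == 1)        # keep only observations in group 1
d$new.var <- d$var1 / d$var2     # ratio of two existing variables

# cut() builds the intervals (0,100], (100,200], (200,1000] (closed on the right),
# so var1 = 100 ends up in 'label1'.
d$grp.var <- cut(d$var1, breaks = c(0, 100, 200, 1000),
                 labels = c('label1', 'label2', 'label3'))

d$heavy <- 1 * (d$weight > 100)  # TRUE/FALSE turned into 1/0

table(d$grp.var)                 # number of observations per group
```

Note that weight = 100 gives heavy = 0 because the condition is a strict inequality.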
Exporting data
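The data set named d can be exported in dbf-format by specifying
library(foreign)
write.dbf( d, 'newFile.dbf')
which will generate a dBase file with name newFile.dbf in your working directory.
You can also export a csv-file (Comma Separated Values). Excel can read such files. Simply write:
write.csv2( d, 'newFile.csv')
to have R create a csv-file in your working directory. Note that write.csv2() separates values with semicolons and uses a comma as decimal mark, which is what Excel expects in many European locales; write.csv() uses commas and decimal points instead.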
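To see what the csv export looks like, a small round trip can be run. This is a sketch with an invented data frame; it writes to a temporary folder (via tempdir()) rather than to the working directory, so nothing is left behind.

```r
# Hypothetical data set; values are invented for illustration.
d <- data.frame(id = 1:3, weight = c(70.5, 82.25, 64))

# Write to a temporary folder instead of the working directory.
f <- file.path(tempdir(), 'newFile.csv')
write.csv2(d, f, row.names = FALSE)

readLines(f)             # raw file: ';' between values, ',' as decimal mark

d.back <- read.csv2(f)   # read.csv2() is the matching import function
d.back
```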
Working with vectors
seq( from=x, to=y) |
Generates a sequence of numbers from x to y with steps of size 1. Additional arguments: by=b specifies the steps to be of size b; length=l specifies the total length to equal l. See item F Day 1 for more details. |
rep(x, times=t) |
Repeats element x t times. See item G Day 1 for more details. |
x[ 1:4 ] |
Use [] to pick out specific elements of a vector x. See Item H Day 1. |
sort( x ) |
Sorts elements in x. |
rank( x ) |
Determines the ranks of x. |
length( x ) |
Gives the length of the vector, i.e. the number of elements. |
which( x ) |
Tells which elements of x are TRUE, i.e. x has to be a TRUE/FALSE vector (e.g. a condition like y>600). |
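A short sketch of the vector commands above, using an invented vector x:

```r
x <- c(640, 580, 710, 600, 590)    # hypothetical values

seq(from = 1, to = 5)              # 1 2 3 4 5
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(from = 0, to = 1, length = 3)  # 0.0 0.5 1.0
rep(7, times = 3)                  # 7 7 7

x[1:4]                             # first four elements
sort(x)                            # 580 590 600 640 710
rank(x)                            # 4 1 5 3 2
length(x)                          # 5
which(x > 600)                     # 1 3 (positions, not values)
x[which(x > 600)]                  # 640 710 (the values themselves)
```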
Working with data
dim( d ) |
Gives the dimension of the data set d (number of rows and number of columns). |
summary(d) |
Gives a summary of each of the variables in d. |
median( d$var1 ) |
Median of the variable named var1 in data set d. You might need argument na.rm=T if there are any missing values. |
quantile( d$var1 ) |
Quantiles of a variable named var1 in data set d. You need to add argument na.rm=T if there are any missing values. Extra argument probs can be used to control which quantiles to compute, e.g. probs=c(.025,.975) to determine the lower and upper 2.5% quantiles. |
max( d$var1 ) |
Max of variable var1. |
min( d$var1 ) |
Min of variable var1. |
range( d$var1 ) |
Range gives min and max of var1. All three functions require na.rm=T if there are missing values. |
mean( d$var1) |
Mean of the variable named var1 in data set d. You might need argument na.rm=T if there are any missing values. |
sd( d$var1 ) |
Standard deviation of the variable named var1 in d. You might need argument na.rm=T if there are any missing values. |
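The effect of na.rm=T is easiest to see on a variable with a missing value. A minimal sketch, with an invented data set:

```r
# Hypothetical data set with one missing value (NA).
d <- data.frame(var1 = c(4, 8, NA, 15, 23))

dim(d)                         # 5 rows, 1 column
mean(d$var1)                   # NA, because of the missing value
mean(d$var1, na.rm = TRUE)     # 12.5
median(d$var1, na.rm = TRUE)   # 11.5
sd(d$var1, na.rm = TRUE)
range(d$var1, na.rm = TRUE)    # 4 23

# quantile() stops with an error if NA is present and na.rm is not set.
quantile(d$var1, probs = c(.025, .975), na.rm = TRUE)
```

T and TRUE can be used interchangeably here; TRUE is the safer spelling because T can be overwritten by accident.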
Generating data
sample( x, no, replace=T) |
Draws no numbers from a vector x with replacement. Replace T with F to draw without replacement. |
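A small sketch of sampling with and without replacement. The seed is an arbitrary choice, added only to make the draws reproducible.

```r
set.seed(1)   # arbitrary seed, only to make the example reproducible

x <- 1:10
with.repl    <- sample(x, 5, replace = TRUE)    # the same number may be drawn twice
without.repl <- sample(x, 5, replace = FALSE)   # all drawn numbers are distinct

with.repl
without.repl
anyDuplicated(without.repl)   # 0 means no duplicates
```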
Calculations
sqrt( X ) |
Calculates the square root of a number X. |
log( X ) |
Natural log of X. |
log2( X ) or log( X, base=2) |
Log base 2 of X. |
log10( X ) or log( X, base=10) |
Log base 10 of X. |
exp( X ) |
Anti-log of X (natural log), exp(1)=e~2.718 |
X^y |
The y-th power of X. |
round( X, 2 ) |
Rounds the number X to 2 decimals. |
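The calculations above, tried on a few numbers chosen so that the answers are easy to check by hand:

```r
X <- 8

sqrt(16)             # 4
log(X)               # natural log of 8
log2(X)              # 3, since 2^3 = 8
log(X, base = 2)     # same as log2(X)
log10(1000)          # 3
log(1000, base = 10) # same as log10(1000)
exp(log(X))          # back to 8: exp() is the anti-log of log()
X^2                  # 64
round(exp(1), 2)     # 2.72
```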
Tables
mytable <- matrix( c(1,2,3,4), nrow=2) |
Makes a 2x2 table with elements 1-4. The table is filled column by column. |
table( d$x ) |
One-way table of x. Add argument useNA='ifany' to also count the missing values. |
table( d$x , d$y ) |
Tabulate x vs y. |
table( d$x , d$y, d$z ) |
Tabulate x vs y stratified on z. |
mytable <- table(d$x, d$y ) |
Stores the two-way table of x vs y under the name mytable. |
prop.table(mytable) |
Determines cell proportions (multiply by 100 to get percentages). |
prop.table(mytable, 1) |
Determines row proportions. |
prop.table(mytable, 2) |
Determines column proportions. |
chisq.test( mytable, correct=F ) |
Pearson chi-square test of independence between x and y. If argument correct=F is omitted for 2 by 2 tables, Yates' continuity correction will be applied. |
prop.test(mytable) |
Test and CI of the difference between two proportions. The two columns of mytable must hold the numbers of successes and failures, with one row per group. |
binom.test(x,n) |
Exact binomial test when observing x successes in n trials. Tests the hypothesis p=0.5 by default (change with additional argument p=). Provides exact CI. |
fisher.test( mytable ) |
Fisher's exact test of independence between x and y. |
oddsratio( mytable, method='wald' ) |
Calculates OR in a 2 by 2 table. The oddsratio() command is found in the package 'epitools' (i.e. use library(epitools) before using oddsratio()). Use method='wald' to have the CI based on the Wald method. Use rev='rows' to reverse the rows (i.e. flip OR). |
riskratio( mytable, method='wald' ) |
Calculates relative risk RR in a 2 by 2 table. The command is found in the package 'epitools' (i.e. use library(epitools) before using riskratio()). With method='wald' the CI is based on the Wald method. Use rev='rows' to reverse the rows (i.e. flip RR). |
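The base R table commands above can be tried on a made-up 2 by 2 table. This is a sketch only: the counts and the row/column names (exposed, outcome) are invented, and the epitools commands are left out so that the example runs without extra packages.

```r
# Hypothetical counts. matrix() fills column by column, so the table reads:
#              outcome yes   outcome no
# exposed yes           20           30
# exposed no            10           40
mytable <- matrix(c(20, 10, 30, 40), nrow = 2,
                  dimnames = list(exposed = c('yes', 'no'),
                                  outcome = c('yes', 'no')))

prop.table(mytable)       # cell proportions, sum to 1 over the whole table
prop.table(mytable, 1)    # row proportions, each row sums to 1
prop.table(mytable, 2)    # column proportions, each column sums to 1

p.pearson <- chisq.test(mytable, correct = FALSE)$p.value  # no continuity correction
p.yates   <- chisq.test(mytable)$p.value                   # Yates' correction (default)
c(p.pearson, p.yates)     # the corrected p-value is the larger of the two here

prop.test(mytable)        # columns taken as successes / failures, rows as groups
fisher.test(mytable)
```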
T-test
t.test( d$var1) |
One-sample t-test on the variable named var1 (by default testing the hypothesis that the mean = 0). |
t.test(d$var1, mu=7) |
One-sample t-test, testing the hypothesis that the mean = 7. |
t.test( d$var1~d$group, var.equal=T) |
Two-sample t-test comparing the means of var1 in the two groups specified by a variable named group. If var.equal=T is omitted, R performs Welch's t-test, which does not assume equal variances. |
t.test( d1$var1, d2$var2, var.equal=T) |
Alternative way of requesting the two-sample t-test. Here we compare the mean of var1 in data set d1 with the mean of var2 in data set d2. |
t.test(d1$var1, d2$var2, paired=T ) |
A paired t-test comparing the mean of var1 to the mean of var2 (supposed to be measured on the same individuals / animals / items etc.). |
var.test( d$var1~d$group ) |
A formal test of whether the variances of var1 in the two groups specified by group are the same. |
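A sketch of the t-test commands on simulated data. Everything here is invented for illustration: the seed, the group labels 'a' and 'b', and the before/after measurements. The last lines illustrate that a paired t-test is the same as a one-sample t-test on the differences.

```r
set.seed(2)   # arbitrary seed for reproducibility

# Hypothetical data: two groups of 10 observations each.
d <- data.frame(group = rep(c('a', 'b'), each = 10),
                var1  = c(rnorm(10, mean = 7), rnorm(10, mean = 8)))

t.test(d$var1, mu = 7)                        # one-sample test of mean = 7
t.test(d$var1 ~ d$group, var.equal = TRUE)    # two-sample test, equal variances
var.test(d$var1 ~ d$group)                    # are the two variances equal?

# Paired data: invented measurements on the same 5 individuals.
before <- c(10.1, 9.8, 11.2, 10.5, 9.9)
after  <- c(10.6, 10.1, 11.0, 11.2, 10.4)

p.paired <- t.test(before, after, paired = TRUE)$p.value
p.diff   <- t.test(before - after)$p.value    # one-sample test on the differences
all.equal(p.paired, p.diff)                   # TRUE
```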
Non-parametric comparisons
wilcox.test( d$var1, mu=7 ) |
One-sample Wilcoxon test investigating whether the median equals 7. |
wilcox.test(d$var1~d$group ) |
Two-sample Wilcoxon test investigating whether the medians in the two groups specified by group are equal. Only works for two groups. |
wilcox.test(d1$var1, d2$var2) |
Alternative use. Here we compare the median of var1 in data set d1 with the median of var2 in data set d2. |
kruskal.test(d$var1~d$group) |
Kruskal-Wallis test. Can be used for comparing 2 or more groups. |
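A sketch of the non-parametric tests on simulated skewed data (seed, group labels and distributions are invented). With exactly two groups, the Kruskal-Wallis test should give the same p-value as the two-sample Wilcoxon test when the latter is run with the normal approximation and no continuity correction (exact=FALSE, correct=FALSE); the last line checks this.

```r
set.seed(3)   # arbitrary seed for reproducibility

# Hypothetical skewed data in two groups of 12.
d <- data.frame(group = rep(c('a', 'b'), each = 12),
                var1  = c(rexp(12, rate = 1), rexp(12, rate = 0.5)))

wilcox.test(d$var1, mu = 1)        # one-sample test of median = 1
wilcox.test(d$var1 ~ d$group)      # two-sample test (exact p-value by default here)

# Normal approximation without continuity correction, for comparison with Kruskal-Wallis.
w <- wilcox.test(d$var1 ~ d$group, exact = FALSE, correct = FALSE)
k <- kruskal.test(d$var1 ~ d$group)

all.equal(w$p.value, k$p.value)
```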
Correlation
cor.test( d$x, d$y ) |
Determines the Pearson correlation between x and y. Add option method='spearman' or method='kendall' to determine Spearman's correlation or Kendall's tau. NB: when called with two vectors like this, cor.test takes no data-option, so $-notation is needed. |
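A sketch of cor.test() on simulated data (seed and data are invented). The result can be stored and its parts picked out with $-notation.

```r
set.seed(4)   # arbitrary seed for reproducibility

# Hypothetical data: y depends linearly on x plus noise.
d <- data.frame(x = rnorm(30))
d$y <- 2 * d$x + rnorm(30)

ct <- cor.test(d$x, d$y)                    # Pearson correlation
ct$estimate                                 # the correlation itself
ct$conf.int                                 # 95% confidence interval
ct$p.value

cor.test(d$x, d$y, method = 'spearman')     # rank-based alternative
cor(d$x, d$y)                               # only the estimate, no test
```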
Linear regression models
lm1 <- lm(y ~ x, data=d) |
lm=Linear Model. Performs linear regression analysis of y on x. |
summary(lm1) |
Gives a summary of the results from the fitted regression model lm1. |
coef(lm1) |
Estimated coefficients from the model (intercept and slope) |
confint(lm1) |
Confidence intervals of estimated parameters (intercept and slope). |
plot(lm1) |
Gives various plots used for model assessment. Add option which=1 for residuals vs predicted values, which=2 for quantile-quantile plot, which=4 for Cook's distance. |
predict(lm1, newdata=newD) |
Prediction of mean values for individuals with values of x specified in a data frame newD (e.g. newD = data.frame(x=1:5); newD MUST contain the same variable names as used in the model). Add option interval='confidence' or interval='prediction' to determine confidence intervals for the means or prediction intervals, respectively. |
cooks.distance(lm1) |
Determines Cook's distance for each individual. Compare with 4/n. |
dfbetas(lm1) |
DFBETAS for each individual (estimate of how much an observation has affected the estimated coefficients). Compare with 2/sqrt(n). |
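The regression commands above fit together as in the following sketch. The data are simulated (seed, intercept 3 and slope 0.5 are invented), and the names ci.mean and pi.new are chosen here for illustration only.

```r
set.seed(5)   # arbitrary seed for reproducibility

# Hypothetical data: straight line plus noise.
d <- data.frame(x = 1:20)
d$y <- 3 + 0.5 * d$x + rnorm(20)

lm1 <- lm(y ~ x, data = d)
summary(lm1)
coef(lm1)        # intercept and slope
confint(lm1)     # 95% confidence intervals

# newD must use the same variable name (x) as the model.
newD <- data.frame(x = 1:5)
ci.mean <- predict(lm1, newdata = newD, interval = 'confidence')
pi.new  <- predict(lm1, newdata = newD, interval = 'prediction')
ci.mean          # columns fit, lwr, upr
pi.new           # prediction intervals are wider than confidence intervals

# Influence measures compared with the rules of thumb from the table above.
n <- nrow(d)
which(cooks.distance(lm1) > 4 / n)
which(apply(abs(dfbetas(lm1)), 1, max) > 2 / sqrt(n))
```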
Logistic regression models
glm1 <- glm( y01 ~ factor(group), data=d, family=binomial ) |
glm=Generalized Linear Model. Performs a logistic regression analysis. The response y01 has to have values 0 or 1. |
summary( glm1 ) |
Gives a summary of the results from the fitted regression model glm1. |
relevel( factor( group ), ref=r ) |
Is used to change the reference group of a factor variable used in regression analysis. The argument r to ref specifies which group should be the reference (1=1st level, 2=2nd level etc.). |
coef( glm1 ) |
Requests the estimated coefficients (differences in log-odds, i.e. log OR, for a logistic regression). |
confint.default( glm1 ) |
Calculates confidence intervals from a model fit (e.g. glm1) based on the Wald method. |
drop1( glm1, test='Chisq') |
Performs an overall test for each variable in the model, asking whether each term may be deleted assuming the remaining terms are kept in the model. |
anova( glm1, glm2) |
Compares two nested models, the one being a coarser version of the other. |
predict( glm1, newdata=newD, type='response') |
Predicts probabilities from the model glm1 using values from a data frame named newD. See also predict() under Linear regression models above. |
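A sketch of a logistic regression on simulated data. The seed, the three groups and their true probabilities (0.2, 0.4, 0.6) are invented, and glm0 is a hypothetical intercept-only model used for the comparison with anova().

```r
set.seed(6)   # arbitrary seed for reproducibility

# Hypothetical data: 0/1 response in three groups of 40.
d <- data.frame(group = rep(1:3, each = 40))
d$y01 <- rbinom(nrow(d), size = 1, prob = c(0.2, 0.4, 0.6)[d$group])

glm1 <- glm(y01 ~ factor(group), data = d, family = binomial)
glm0 <- glm(y01 ~ 1, data = d, family = binomial)   # coarser model: no group effect

summary(glm1)
exp(coef(glm1))              # exp of log OR gives OR relative to group 1
exp(confint.default(glm1))   # Wald confidence intervals on the OR scale

drop1(glm1, test = 'Chisq')          # can the group term be removed?
anova(glm0, glm1, test = 'Chisq')    # same question, posed as a model comparison

# Predicted probabilities; newD uses the same variable name (group) as the model.
newD <- data.frame(group = 1:3)
p.hat <- predict(glm1, newdata = newD, type = 'response')
p.hat
tapply(d$y01, d$group, mean)         # observed proportions, for comparison
```

Because group is the only explanatory variable, the predicted probabilities coincide with the observed proportions in each group.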
Calculating p-values by hand
Two-sided p-values
2* (1 - pnorm( x ) ) |
From a normal distribution with observed test statistic. x is the absolute value of the test statistic. Wald-test or Z-test. |
2* (1 - pt( x, df=f ) ) |
From a t-distribution with f degrees of freedom. x is the absolute value of the test statistic. |
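The hand calculation can be checked against what t.test() reports. A sketch with invented numbers; z, x and the names t.obs, f and p.hand are chosen here for illustration.

```r
# Normal distribution: hypothetical observed Z statistic.
z <- -2.1
2 * (1 - pnorm(abs(z)))      # abs() makes the formula work for negative statistics too
2 * pnorm(-abs(z))           # equivalent way of writing the same p-value

# t-distribution: compare the hand calculation with t.test().
x <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.4)     # invented measurements
tt <- t.test(x, mu = 5)

t.obs <- unname(tt$statistic)            # observed t statistic
f     <- unname(tt$parameter)            # degrees of freedom (here n - 1 = 5)

p.hand <- 2 * (1 - pt(abs(t.obs), df = f))
all.equal(p.hand, tt$p.value)            # TRUE
```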
Plots
hist( x ) |
Make histogram of x |
boxplot( y~x ) |
Boxplots of y for each value of x. |
stripchart( y~ x) |
Stripchart of y on groups defined by x. Use additional arguments: vertical=T for a vertical rather than horizontal plot, method='jitter' to add noise to the data points (noise is not added in y, only around x). |
par(las=1) |
Run this command before running plot to write the numbers on the y-axis horizontally. |
plot(x, y) |
Plot y as a function of x. |
plot(x,y,type='b') |
Plot y as a function of x, plot type 'b' (see my video https://www.youtube.com/watch?v=FQZhsEXCAUM). Other arguments you will need: col='blue', see other color choices from link on Podio, item I. pch=2 to control plot symbols, see again item I. xlim=c(a,b). ylim=c(a,b). xlab='X label'. ylab='Y label'. main='Your title' to add a title to the plot. axes=F to customize the axes using the axis()-function next. Note that R is case sensitive, so the argument names must be written in lower case. |
scatter.smooth(x,y) |
Plots y as a function of x and adds a smoother to the plot (a line being a kind of moving average). The line can be controlled with parameters given in the additional argument lpars, e.g. lpars=list( col='red', lwd=2 ). |
axis(1) |
Adds an x-axis to the plot. With additional argument at=c(0,6) ticks are drawn at 0 and 6, otherwise R will choose where to put the ticks. |
axis(2) |
Adds a y-axis. You may also use the at-argument. |
lines(x,y) |
Adds lines to a plot, plotting y as a function of x. Same additional arguments as to the plot function. |
points(x,y) |
Adds points to a plot, plotting y as a function of x. Same additional arguments as to the plot function. |
arrows( 3, L, 3, U, code=3, length=0 ) |
Draws an arrow from (3, L) to (3, U). code=3 gives the arrow a head at both ends. Use length=0 to suppress the arrow heads, so that only the line segment is drawn. |
abline(a,b) |
Draws a straight line with intercept a, slope b. E.g. abline(0,1). |
abline( h=0 ) |
Draws a horizontal line at y=0. |
abline( v=0 ) |
Draws a vertical line at x=0. |
abline( lm1 ) |
Adds the estimated regression line from a fitted linear model (named lm1). |
legend('topright', c('x','y') ) |
Adds a legend to a plot in the top right corner. The legend will contain two lines with text 'x' and 'y' respectively. Alternative placements are 'bottomright', 'topleft', 'bottomleft'. Add extra arguments: pch=c(1,2) to have plot symbols of type 1 and 2 in the two lines. lty=c(1,2) to have two line types. Use inset=0.05 to move the legend towards the center of the plot by 5% of the plot region. |
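The plot commands are typically combined step by step. A sketch on simulated data: the seed, the data, the tick positions and the legend texts are all invented, and the interval from L to U around the third point is just an arbitrary example of a vertical bar.

```r
set.seed(7)   # arbitrary seed for reproducibility

# Hypothetical data and fitted line.
x <- 1:10
y <- 2 + 0.8 * x + rnorm(10)
lm1 <- lm(y ~ x)

par(las = 1)                                   # horizontal numbers on the y-axis
plot(x, y, type = 'b', col = 'blue', pch = 2,
     xlim = c(0, 11), ylim = c(0, 14),
     xlab = 'X label', ylab = 'Y label', main = 'Your title',
     axes = FALSE)                             # axes are added by hand below
axis(1, at = c(0, 5, 10))                      # x-axis with ticks at 0, 5 and 10
axis(2)                                        # y-axis, ticks chosen by R

abline(lm1, lty = 2)                           # fitted regression line, dashed
abline(h = 0)                                  # horizontal line at y = 0

# Vertical bar around the third point; L and U are arbitrary limits here.
L <- y[3] - 1
U <- y[3] + 1
arrows(3, L, 3, U, code = 3, length = 0)

legend('topleft', c('data', 'fit'),
       pch = c(2, NA), lty = c(1, 2), inset = 0.05)
```

When run as a script outside RStudio, R sends the plot to a file (by default Rplots.pdf) instead of a plot window.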