## R : How to Choose the Correct Statistical Method (t-test, chi-square, ANOVA, Fisher, McNemar, Wilcoxon,Binomial) for Hypothesis Testing

- 08/05/2016

**Published In**

- Big Data
- Analytics
- Business Intelligence

**R : Statistical Analysis (Hypothesis Testing)**

**- Ankit Agarwal**

With so many different types of statistical tests for hypothesis testing, it becomes a nightmare for a non-statistician like me to identify the right test for a specific scenario. So I decided to take a stab at it and write this detailed blog post on performing a number of statistical tests using R. Each section briefly covers the -

- aim of the statistical test,

- when it is used,

- an example showing the R commands and

- R (often abbreviated) output with a brief interpretation of the output

It is easier to understand how one test differs from another if we run them on the same input data. I will therefore try to perform most of the tests on the same data set.

**Input Data Set**

**Here is a brief overview of the Dataset -**

This data file (hsb2) contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math), science (science) and social studies (socst).

**Snapshot Summary of the Dataset**

**Load and Attach the Input Data-set for performing tests**

hsb2 <- within(read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv"), {

race <- as.factor(race)

schtyp <- as.factor(schtyp)

prog <- as.factor(prog)

})

attach(hsb2)

#

#

head(hsb2,20)

## id female race ses schtyp prog read write math science socst

## 1 70 0 4 1 1 1 57 52 41 47 57

## 2 121 1 4 2 1 3 68 59 53 63 61

## 3 86 0 4 3 1 1 44 33 54 58 31

## 4 141 0 4 3 1 3 63 44 47 53 56

## 5 172 0 4 2 1 2 47 52 57 53 61

## 6 113 0 4 2 1 2 44 52 51 63 61

## 7 50 0 3 2 1 1 50 59 42 53 61

## 8 11 0 1 2 1 2 34 46 45 39 36

## 9 84 0 4 2 1 1 63 57 54 58 51

## 10 48 0 3 2 1 2 57 55 52 50 51

## 11 75 0 4 2 1 3 60 46 51 53 61

## 12 60 0 4 2 1 2 57 65 51 63 61

## 13 95 0 4 3 1 2 73 60 71 61 71

## 14 104 0 4 3 1 2 54 63 57 55 46

## 15 38 0 3 1 1 2 45 57 50 31 56

## 16 115 0 4 1 1 1 42 49 43 50 56

## 17 76 0 4 3 1 2 47 52 51 50 56

## 18 195 0 4 2 2 1 57 57 60 58 56

## 19 114 0 4 3 1 2 68 65 62 55 61

## 20 85 0 4 2 1 1 55 39 57 53 46

#

#

str(hsb2)

## 'data.frame': 200 obs. of 11 variables:

## $ id : int 70 121 86 141 172 113 50 11 84 48 ...

## $ female : int 0 1 0 0 0 0 0 0 0 0 ...

## $ race : Factor w/ 4 levels "1","2","3","4": 4 4 4 4 4 4 3 1 4 3 ...

## $ ses : int 1 2 3 3 2 2 2 2 2 2 ...

## $ schtyp : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

## $ prog : Factor w/ 3 levels "1","2","3": 1 3 1 3 2 2 1 2 1 2 ...

## $ read : int 57 68 44 63 47 44 50 34 63 57 ...

## $ write : int 52 59 33 44 52 52 59 46 57 55 ...

## $ math : int 41 53 54 47 57 51 42 45 54 52 ...

## $ science: int 47 63 58 53 53 63 53 39 58 50 ...

## $ socst : int 57 61 31 56 61 61 61 36 51 51 ...

#

#

summary(hsb2)

## id female race ses schtyp prog

## Min. : 1.00 Min. :0.000 1: 24 Min. :1.000 1:168 1: 45

## 1st Qu.: 50.75 1st Qu.:0.000 2: 11 1st Qu.:2.000 2: 32 2:105

## Median :100.50 Median :1.000 3: 20 Median :2.000 3: 50

## Mean :100.50 Mean :0.545 4:145 Mean :2.055

## 3rd Qu.:150.25 3rd Qu.:1.000 3rd Qu.:3.000

## Max. :200.00 Max. :1.000 Max. :3.000

## read write math science

## Min. :28.00 Min. :31.00 Min. :33.00 Min. :26.00

## 1st Qu.:44.00 1st Qu.:45.75 1st Qu.:45.00 1st Qu.:44.00

## Median :50.00 Median :54.00 Median :52.00 Median :53.00

## Mean :52.23 Mean :52.77 Mean :52.65 Mean :51.85

## 3rd Qu.:60.00 3rd Qu.:60.00 3rd Qu.:59.00 3rd Qu.:58.00

## Max. :76.00 Max. :67.00 Max. :75.00 Max. :74.00

## socst

## Min. :26.00

## 1st Qu.:46.00

## Median :52.00

## Mean :52.41

## 3rd Qu.:61.00

## Max. :71.00

#

**One Sample t-test**

A one sample t-test allows us to test whether a sample mean (of a **normally** distributed interval variable) significantly differs from a hypothesized value. For example, using the hsb data file, say we wish to test whether the average writing score (write) differs significantly from 50. Test variable writing score (write), Test value 50. We can do this as shown below.

#

# Our given mean is 50 (mu=50)

#

t.test(write, mu = 50)

##

## One Sample t-test

##

## data: write

## t = 4.1403, df = 199, p-value = 5.121e-05

## alternative hypothesis: true mean is not equal to 50

## 95 percent confidence interval:

## 51.45332 54.09668

## sample estimates:

## mean of x

## 52.775

The p-value (5.121e-05) is far below .05, so we reject the null hypothesis that the mean writing score equals 50: the average writing score (write) differs significantly from 50.

The mean of the variable write for this particular sample of students is 52.775, which is statistically significantly (p < .001) different from the test value of 50. We would conclude that this group of students has a significantly higher mean on the writing test than 50. This is consistent with the reported 95% confidence interval (51.45, 54.10), which excludes 50 and is centred on the sample mean.
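
To see what t.test computes, the statistic can be reproduced by hand. The sketch below uses only the first ten write scores shown in the str() output above (so the numbers differ from the full-sample result) and the same test value of 50. The variable names are made up for illustration.

```r
# First ten write scores from the str(hsb2) output above (not the full sample)
scores <- c(52, 59, 33, 44, 52, 52, 59, 46, 57, 55)
mu0    <- 50                                  # hypothesized mean

n      <- length(scores)
t_stat <- (mean(scores) - mu0) / (sd(scores) / sqrt(n))  # t = (xbar - mu0) / (s / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)               # two-sided p-value

res <- t.test(scores, mu = mu0)
all.equal(unname(res$statistic), t_stat)      # same t statistic as t.test
all.equal(res$p.value, p_val)                 # same p-value as t.test
```

The hand calculation and t.test should agree, which shows that the test is just the distance of the sample mean from the test value, measured in standard errors.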

**Cohen’s d Effect Size Statistics**

It is often helpful to quantify the size of a difference, as this can suggest the practical significance of an observed effect. We have confirmed above that the mean writing score of the students is significantly higher than the test value of 50. We can also calculate Cohen’s d to measure how large this difference is.

There are two equivalent formulae for Cohen’s d in the one-sample case -

#

# Cohen's d = t-value / squareroot(sample size) -- Sample size is 200 in our case

#

ttest <- t.test(write, mu = 50)

#

(cohd<- ttest$statistic / sqrt(length(write)))

## t

## 0.2927652

#

# Another Formula for Cohen's d is : d = Mean Difference / SD

#

(cohd<- (ttest$estimate-50) / sd(write))

## mean of x

## 0.2927652

#

We get Cohen’s d = 0.2927652 (about 0.29), which is a small effect by the conventional cut-offs described next.

**Cohen’s d value interpretation -**

d evaluates the degree that the mean on the test variable differs from the test value in standard deviation units. Potentially, d can range in value from negative infinity to positive infinity. If d equals 0, the mean of the scores is equal to the test value. As d deviates from 0, we interpret the effect size to be stronger. What is a small versus a large d is dependent on the area of investigation. However, d values of .2, .5 and .8, regardless of sign, are by convention interpreted as small, medium, and large effect sizes, respectively.
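
As a minimal sketch of these conventions, the two helper functions below (hypothetical names, not part of base R) compute a one-sample Cohen’s d and attach the conventional label. The "negligible" label for values below .2 is my own assumption, and the data are again the first ten write scores from the str() output.

```r
# Hypothetical helper: one-sample Cohen's d = (mean - test value) / SD
cohens_d_one_sample <- function(x, mu) {
  (mean(x) - mu) / sd(x)
}

# Hypothetical helper: conventional labels with cut-offs at .2, .5 and .8
label_effect <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("negligible", "small", "medium", "large"),
      right  = FALSE)                        # intervals are [lower, upper)
}

x <- c(52, 59, 33, 44, 52, 52, 59, 46, 57, 55)
d <- cohens_d_one_sample(x, mu = 50)
d
label_effect(d)
label_effect(0.29)                           # the d reported for the full sample

# Both formulae give the same value: d = t / sqrt(n)
all.equal(d, unname(t.test(x, mu = 50)$statistic) / sqrt(length(x)))
```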

**Summary of t-test and Cohen’s d estimate -**

A one-sample t test was conducted to evaluate whether the mean of the writing scores was significantly different from 50, the accepted mean. The sample mean of 52.78 (SD = 9.48) was significantly different from 50, t(199) = 4.14, p < .001. The 95% confidence interval for the writing scores mean ranged from 51.45 to 54.10. The effect size d of .29 indicates a small effect.

**One Sample median test (Wilcoxon Test)**

A one sample median test allows us to test whether a sample median differs significantly from a hypothesized value. We will use the same variable, write, as we did in the one sample t-test example above. But we do not need to assume that it is interval and normally distributed (we only need to assume that write is a **numerical** variable whose values can be ranked). R's wilcox.test performs the Wilcoxon signed rank test, which treats the hypothesized value as the centre of a symmetric distribution.

#

wilcox.test(write, mu = 50)

##

## Wilcoxon signed rank test with continuity correction

##

## data: write

## V = 13177, p-value = 3.702e-05

## alternative hypothesis: true location is not equal to 50

#

Notice that instead of a calculated t value, this test computes V, the sum of the ranks of the positive differences from 50.

With a p-value of 3.702e-05, we reject the null hypothesis: the write scores differ significantly from the hypothesized median of 50.

**Binomial test**

A one sample binomial test allows us to test whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value. For example, using the hsb2 data file, say we wish to test whether the proportion of females (female) differs significantly from 50%, i.e., from .5. The call below uses prop.test, R's large-sample test of a proportion; binom.test is the exact binomial counterpart.

#

prop.test(sum(female), length(female), p = 0.5)

##

## 1-sample proportions test with continuity correction

##

## data: sum(female) out of length(female), null probability 0.5

## X-squared = 1.445, df = 1, p-value = 0.2293

## alternative hypothesis: true p is not equal to 0.5

## 95 percent confidence interval:

## 0.4733037 0.6149394

## sample estimates:

## p

## 0.545

#

**Output:** The results indicate that there is no statistically significant difference (p-value = 0.229). In other words, the proportion of females in this sample does not significantly differ from the hypothesized value of 50%.

**Summary of Binomial-Test -**

We hypothesized that the proportion of females is 50%. A two-tailed one-sample test of proportions (the large-sample approximation to the binomial test) was conducted to assess this research hypothesis. The observed proportion of .545 did not differ significantly from the hypothesized value of .50, two-tailed p = .23. Our results suggest that the proportion of females does not differ significantly from the proportion of males.

Note : Do not confuse **prop.test** with **prop.table**. The former tests a proportion against a hypothesized value, while the latter calculates the percentage distribution of a factor/categorical variable. This is what we would get by using prop.table (percentage of males and females in the sample)

#

# 0 = Male , 1 = Female

#

prop.table(table(female))*100

## female

## 0 1

## 45.5 54.5
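
For comparison, here is a minimal sketch of the exact binomial test next to the approximate one. It uses only the counts reported above (109 females out of 200, i.e. 0.545 × 200); the variable names are made up for illustration.

```r
# 109 of the 200 students are female (0.545 * 200), as reported above
n_female <- 109
n_total  <- 200

exact  <- binom.test(n_female, n_total, p = 0.5)  # exact binomial test
approx <- prop.test(n_female, n_total, p = 0.5)   # chi-square approximation

exact$estimate   # observed proportion of females
exact$p.value    # exact two-sided p-value
approx$p.value   # approximate p-value, as in the prop.test output above
```

With a sample of this size the two tests should lead to the same conclusion; the exact test matters more when the sample is small.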

**Chi-square goodness of fit**

A chi-square goodness of fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. For example, let’s suppose that we believe that the general population consists of 10% Hispanic, 10% Asian, 10% African American and 70% White people. We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions. Note that this example uses the hypothesized proportions (10, 10, 10, 70), which we supply ourselves, in addition to the data from the hsb2 file.

#

# Let us look at the race statistics

table(race)

## race

## 1 2 3 4

## 24 11 20 145

#

# 1 = Hispanic , 2= Asian, 3= African American, 4= White

#

chisq.test(table(race), p = c(10, 10, 10, 70)/100)

##

## Chi-squared test for given probabilities

##

## data: table(race)

## X-squared = 5.0286, df = 3, p-value = 0.1697

**Summary of output -**

These results show that racial composition in our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.029, p-value = 0.170).
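
The statistic can also be worked out by hand from the counts in table(race) above, which makes the logic of the test visible. This is a minimal sketch; the category names and variable names are only labels chosen for illustration.

```r
# Observed counts from table(race) above and the hypothesized proportions
observed <- c(Hispanic = 24, Asian = 11, AfricanAmerican = 20, White = 145)
p_null   <- c(0.10, 0.10, 0.10, 0.70)

expected <- sum(observed) * p_null                   # 20, 20, 20, 140
x2       <- sum((observed - expected)^2 / expected)  # Pearson chi-square statistic
p_val    <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

round(c(X2 = x2, p = p_val), 4)   # matches the chisq.test output above
min(expected) >= 5                # rule-of-thumb check on the expected counts
```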

**Two independent samples t-test**

An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, using the hsb2 data file, say we wish to test whether the **mean for write is the same for males and females.**

#

# Let us look at the write data vis-a-vis the female data

#

(checkdata<-table(write,female))

## female

## write 0 1

## 31 4 0

## 33 4 0

## 35 0 2

## 36 1 1

## 37 2 1

## 38 1 0

## 39 4 1

## 40 3 0

## 41 6 4

## 42 1 1

## 43 1 0

## 44 6 6

## 45 1 0

## 46 3 6

## 47 1 1

## 49 7 4

## 50 0 2

## 52 6 9

## 53 0 1

## 54 5 12

## 55 1 2

## 57 5 7

## 59 11 14

## 60 1 3

## 61 2 2

## 62 5 13

## 63 2 2

## 65 6 10

## 67 2 5

#

# How about plotting this to get a better understanding (0=Male, 1=Female)

#

plot(checkdata, col=c("lightgray", "darkgray"), main="Writing Score by Gender (0=Male, 1=Female)")

#

# Let us now perform the test

# Note that the formula write ~ female is used here because female is the grouping (categorical) variable

#

t.test(write ~ female)

##

## Welch Two Sample t-test

##

## data: write by female

## t = -3.6564, df = 169.71, p-value = 0.0003409

## alternative hypothesis: true difference in means is not equal to 0

## 95 percent confidence interval:

## -7.499159 -2.240734

## sample estimates:

## mean in group 0 mean in group 1

## 50.12088 54.99083

**Summary of output -**

The results indicate that there is a statistically significant difference between the mean writing score for males and females (t = -3.656, p-value = 0.0003409). In other words, females have a statistically significantly higher mean score on writing (54.99) than males (50.12). Note that t.test runs the Welch version of the test by default, which does not assume equal variances in the two groups.

This is also supported by the confidence interval for the difference in means (-7.499159, -2.240734), calculated as (male - female), which lies entirely below 0.
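
The difference between the Welch test and the classic pooled-variance test can be sketched as follows. The data are simulated (not hsb2), and the group sizes, means and standard deviations are arbitrary assumptions.

```r
set.seed(1)                                  # simulated data, not hsb2
g0 <- rnorm(40, mean = 50, sd = 10)
g1 <- rnorm(60, mean = 55, sd = 8)

# Welch t statistic and degrees of freedom by hand
v0 <- var(g0) / length(g0)
v1 <- var(g1) / length(g1)
t_welch  <- (mean(g0) - mean(g1)) / sqrt(v0 + v1)
df_welch <- (v0 + v1)^2 / (v0^2 / (length(g0) - 1) + v1^2 / (length(g1) - 1))

welch  <- t.test(g0, g1)                     # default: var.equal = FALSE
pooled <- t.test(g0, g1, var.equal = TRUE)   # classic Student's t-test

all.equal(unname(welch$statistic), t_welch)  # hand calculation matches t.test
all.equal(unname(welch$parameter), df_welch) # fractional Welch degrees of freedom
unname(pooled$parameter)                     # pooled test uses n0 + n1 - 2
```

The fractional degrees of freedom (169.71 in the output above) are a sign that the Welch version was used.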

**Wilcoxon-Mann-Whitney test**

The Wilcoxon-Mann-Whitney test is a non-parametric analogue of the independent samples t-test and can be used when you **do not assume that the dependent variable is a normally distributed interval variable (you only assume that the variable is at least ordinal, so that its values can be ranked).**

We will use the same data file (the hsb2 data file) and the same variables in this example as we did in the independent t-test example above. We will not assume that write, our dependent variable, is normally distributed, and our purpose is to test whether the **distribution of write is the same for males and females.**

#

# Let us perform the Wilcoxon-Mann-Whitney test

#

wilcox.test(write ~ female)

##

## Wilcoxon rank sum test with continuity correction

##

## data: write by female

## W = 3606, p-value = 0.0008749

## alternative hypothesis: true location shift is not equal to 0

The results suggest that there is a statistically significant difference between the underlying distributions of the write scores of males and the write scores of females (p-value=0.0008749).

**Summary of output -**

A Wilcoxon rank sum test was conducted to evaluate whether writing scores differed by gender. The results indicated a significant difference (W = 3606, p-value = 0.0008749).

**Chi-square test (Contingency Table)**

A chi-square test is used when you want to see if there is a relationship between two categorical variables. It plays the role for nominal (categorical) variables that correlation plays for continuous variables.

**Note: chi-square test assumes that each cell has an expected frequency of five or more**

A chi-square test is a common test for nominal (categorical) data. One application of a chi-square test is a test for independence. In this case, the null hypothesis is that the two variables are independent, i.e., that the distribution of outcomes is the same in each group. If your data for two groups came from the same participants (i.e. the data were paired), you should use **McNemar’s test**, while for k related groups you should use **Cochran’s Q test**.

Using the hsb2 data file, let’s see if there is a relationship between the type of school attended (schtyp) and students’ gender (female).

#

# Let us look at the school-type data vis-a-vis the female data

#

(checkdata<-table(schtyp,female))

## female

## schtyp 0 1

## 1 77 91

## 2 14 18

#

# Proportion of students in each school type, by gender, is given by the following (2 = column-wise proportions)

#

prop.table(table(schtyp,female),2)

## female

## schtyp 0 1

## 1 0.8461538 0.8348624

## 2 0.1538462 0.1651376

#

# How about plotting this to get a better understanding: female (0=Male, 1=Female), schtyp (1=Public, 2=Private)

#

barplot(checkdata, col=c("lightgray", "darkgray"),

legend.text=rownames(checkdata), beside=TRUE)

#

# Let us now perform the test

#

chisq.test(table(female, schtyp))

##

## Pearson's Chi-squared test with Yates' continuity correction

##

## data: table(female, schtyp)

## X-squared = 0.00054009, df = 1, p-value = 0.9815

These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom=0.00054009, p=0.9815).

**Summary of output -**

A two-way contingency table analysis was conducted to evaluate whether type of school exhibited a gender bias. School and gender were found to not be significantly related, Pearson X2 (1, N = 200) = 0.00054009, p = 0.9815.

Let’s look at another example of the Chi-Square Test, this time looking at the relationship between gender (female) and socio-economic status (ses). The point of this example is that one (or both) variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high).

#

# Let us look at the socio-economic status data vis-a-vis the female data

#

(checkdata<-table(ses,female))

## female

## ses 0 1

## 1 15 32

## 2 47 48

## 3 29 29

#

# Proportion of students of each socio-economic status, by gender, is given by the following. Here 2 is used to calculate proportions column-wise

#

prop.table(table(ses,female),2)

## female

## ses 0 1

## 1 0.1648352 0.2935780

## 2 0.5164835 0.4403670

## 3 0.3186813 0.2660550

#

# How about plotting this to get a better understanding: female (0=Male, 1=Female), ses (1=Low, 2=Medium, 3=High)

#

barplot(checkdata, col=c("red", "blue","green"),

legend.text=rownames(checkdata), beside=TRUE)

#

# Let us now perform the test

#

chisq.test(table(female, ses))

##

## Pearson's Chi-squared test

##

## data: table(female, ses)

## X-squared = 4.5765, df = 2, p-value = 0.1014

Again we find that there is no statistically significant relationship between the variables (chi-square with two degrees of freedom=4.577, p-value=0.101).

A two-way contingency table analysis was conducted to evaluate whether gender was related to socio-economic status (SES). Gender and SES were not found to be significantly related, Pearson X2 (2, N = 200) = 4.58, p = .101.
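
The assumption about expected frequencies can be checked directly, because chisq.test returns the expected counts. The sketch below rebuilds the ses-by-female table from the counts shown above; the object names are made up for illustration.

```r
# Counts copied from the ses-by-female table above (filled column by column)
tab <- matrix(c(15, 47, 29,     # males   (female = 0) for ses = 1, 2, 3
                32, 48, 29),    # females (female = 1) for ses = 1, 2, 3
              nrow = 3,
              dimnames = list(ses = c("1", "2", "3"), female = c("0", "1")))

res <- chisq.test(tab)
round(res$expected, 2)      # expected counts under independence
all(res$expected >= 5)      # checks the "five or more" rule of thumb
res$statistic               # same X-squared as in the output above
```

If any expected count were below five, Fisher’s exact test (next section) would be the safer choice.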

**Fisher’s exact test**

Fisher’s exact test is used when you want to conduct a chi-square test but one or more of your cells has an expected frequency of less than five. Remember that the chi-square test assumes that each cell has an expected frequency of five or more. Fisher’s exact test has no such assumption and can be used regardless of how small the expected frequency is.

Fisher’s exact test is most commonly applied **when the table you create is 2 × 2**, that is, when both variables have two categories, and your expected cell sizes are small (< 5). For larger tables, for example when one variable has two categories and another has three, the chi-square test is the usual choice (as we have seen above), although R's fisher.test also accepts larger tables.

Let us use the same example that we used above for the Chi-Square test.

Using the hsb2 data file, let’s see if there is a relationship between the type of school attended (schtyp) and students’ gender (female).

#

# Let us look at the school-type data vis-a-vis the female data

#

(checkdata<-table(schtyp,female))

## female

## schtyp 0 1

## 1 77 91

## 2 14 18

#

# Proportion of students in each school type, by gender, is given by the following (2 = column-wise proportions)

#

prop.table(table(schtyp,female),2)

## female

## schtyp 0 1

## 1 0.8461538 0.8348624

## 2 0.1538462 0.1651376

#

# How about plotting this to get a better understanding: female (0=Male, 1=Female), schtyp (1=Public, 2=Private)

#

barplot(checkdata, col=c("yellow", "orange"),

legend.text=rownames(checkdata), beside=TRUE)

#

# Let us now perform the test

#

fisher.test(table(schtyp,female))

##

## Fisher's Exact Test for Count Data

##

## data: table(schtyp, female)

## p-value = 0.8492

## alternative hypothesis: true odds ratio is not equal to 1

## 95 percent confidence interval:

## 0.4755259 2.5307479

## sample estimates:

## odds ratio

## 1.087428

These results indicate that there is no statistically significant relationship between the type of school attended and gender (p=0.8492).

**Summary of output -**

A two-way contingency table analysis was conducted to evaluate whether type of school exhibited a gender bias. School and gender were found to be NOT significantly related (p = 0.8492) .
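
The same test can be run without the data file by typing in the four counts from the schtyp-by-female table above. This sketch also compares the simple cross-product odds ratio with the estimate that fisher.test reports; the object names are made up for illustration.

```r
# Counts copied from the schtyp-by-female table above (filled column by column)
tab <- matrix(c(77, 14, 91, 18), nrow = 2,
              dimnames = list(schtyp = c("1", "2"), female = c("0", "1")))

res <- fisher.test(tab)
res$p.value                          # same p-value as in the output above

# Sample (cross-product) odds ratio. fisher.test reports a conditional
# maximum likelihood estimate instead, so the two values differ slightly.
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
sample_or
unname(res$estimate)
```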

**One-way ANOVA**

A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable. You wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable.

Note: I would highly recommend this video for a good understanding of ANOVA

For example, using the hsb2 data file, say we wish **to test whether the mean of write differs between the three program types (prog).**

# Let us look at the write data vis-a-vis the prog type data

# Prog : (1=General, 2= Academic, 3=Vocation)

#

(checkdata<-table(write,prog))

## prog

## write 1 2 3

## 31 1 0 3

## 33 2 1 1

## 35 0 0 2

## 36 1 0 1

## 37 0 1 2

## 38 0 1 0

## 39 2 0 3

## 40 0 2 1

## 41 2 4 4

## 42 0 1 1

## 43 0 1 0

## 44 6 1 5

## 45 0 0 1

## 46 1 4 4

## 47 0 1 1

## 49 4 3 4

## 50 0 2 0

## 52 2 9 4

## 53 1 0 0

## 54 6 9 2

## 55 0 2 1

## 57 4 5 3

## 59 6 18 1

## 60 0 3 1

## 61 0 4 0

## 62 3 12 3

## 63 0 3 1

## 65 3 13 0

## 67 1 5 1

#

# How about plotting this to get a better understanding

#

plot(checkdata, col=c("green2", "blue1","yellow2"), main="Writing Score by Program Type")

#

# Let us now perform the test

#

aov(write ~ prog)

## Call:

## aov(formula = write ~ prog)

##

## Terms:

## prog Residuals

## Sum of Squares 3175.698 14703.177

## Deg. of Freedom 2 197

##

## Residual standard error: 8.639179

## Estimated effects may be unbalanced

#

# Summarizing the results

#

summary(aov(write ~ prog))

## Df Sum Sq Mean Sq F value Pr(>F)

## prog 2 3176 1587.8 21.27 4.31e-09 ***

## Residuals 197 14703 74.6

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Summary of output -**

A one-way analysis of variance was conducted to evaluate the relationship between writing score and the type of program. The independent variable, the type of program, included three levels: general, academic and vocation. The dependent variable was the writing score. The ANOVA was significant at the .05 level, F(2, 197) = 21.27, p-value = 4.31e-09.

We can say that the mean of write differs significantly between the three program types (prog). Note that the ANOVA itself does not tell us which program types differ from each other.
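
The F value in the summary table follows directly from the sums of squares printed by aov above. Here is a minimal sketch of that arithmetic; the variable names are made up for illustration.

```r
# Sums of squares and degrees of freedom from the aov output above
ss_between <- 3175.698;  df_between <- 2     # prog
ss_within  <- 14703.177; df_within  <- 197   # Residuals

ms_between <- ss_between / df_between        # mean square for prog
ms_within  <- ss_within / df_within          # residual mean square
f_value    <- ms_between / ms_within         # F = MS between / MS within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(F = f_value, RSE = sqrt(ms_within)), 3)  # F and residual standard error
p_value
```

A large F means that the variation between program types is large relative to the variation within them.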

**Kruskal Wallis test**

The Kruskal Wallis test is used when you have **one independent variable with two or more levels and an ordinal or continuous dependent variable**. In other words, **it is the non-parametric version of ANOVA and a generalized form of the Mann-Whitney test method**, since it permits two or more groups. We will use the same data file as the one way ANOVA example above (the hsb2 data file) and the same variables (writing scores and program types) as in the example above, but we will not assume that write is a normally distributed interval variable.

Note: One-way ANOVA requires a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable. The Mann-Whitney test requires one independent variable with exactly two levels and an ordinal or continuous dependent variable.

Again, same as above, using the hsb2 data file, say we wish **to test whether the distribution of write differs between the three program types (prog).**

#

# Let us look at the write data vis-a-vis the prog type data

# Prog : (1=General, 2= Academic, 3=Vocation)

#

(checkdata<-table(write,prog))

## prog

## write 1 2 3

## 31 1 0 3

## 33 2 1 1

## 35 0 0 2

## 36 1 0 1

## 37 0 1 2

## 38 0 1 0

## 39 2 0 3

## 40 0 2 1

## 41 2 4 4

## 42 0 1 1

## 43 0 1 0

## 44 6 1 5

## 45 0 0 1

## 46 1 4 4

## 47 0 1 1

## 49 4 3 4

## 50 0 2 0

## 52 2 9 4

## 53 1 0 0

## 54 6 9 2

## 55 0 2 1

## 57 4 5 3

## 59 6 18 1

## 60 0 3 1

## 61 0 4 0

## 62 3 12 3

## 63 0 3 1

## 65 3 13 0

## 67 1 5 1

#

# Let us now perform the test

#

kruskal.test(write, prog)

##

## Kruskal-Wallis rank sum test

##

## data: write and prog

## Kruskal-Wallis chi-squared = 34.045, df = 2, p-value = 4.047e-08

The results indicate that there is a statistically significant difference among the three types of program (p-value = 4.047e-08, i.e., p < .001).

**Summary of output -**

A Kruskal-Wallis test was conducted to evaluate differences among the three types of program (general, academic and vocation) in the writing score. The test, which was corrected for tied ranks, was significant, X2 (2, n = 200) = 34.045, p-value = 4.047e-08 (p < .001). The very low p-value is strong evidence that writing scores differ by type of program; it does not by itself measure how large the differences are.
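
The claim that Kruskal-Wallis generalizes the Mann-Whitney test can be checked on a two-group example: with two groups, its p-value should equal that of the normal-approximation Wilcoxon rank sum test without continuity correction. The data below are simulated (not hsb2), and the group labels are arbitrary.

```r
set.seed(42)                                 # simulated scores, not hsb2
score <- c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 56, sd = 10))
group <- factor(rep(c("a", "b"), each = 30))

kw <- kruskal.test(score ~ group)            # formula interface
mw <- wilcox.test(score ~ group, exact = FALSE, correct = FALSE)

kw$p.value
mw$p.value
all.equal(kw$p.value, mw$p.value)            # the two tests coincide for two groups
```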

**Paired t-test**

A paired (samples) t-test is used when you have **two related observations** (i.e., two observations per subject) and you want to see if the means on these two normally distributed interval variables differ from one another.

For example, using the hsb2 data file we will test **whether the mean of read scores is equal to the mean of write scores**

#

# Let us look at the Read Scores vs Write Scores on a plot (write = blue points, read = red line)

#

plot(write, col="blue", ylab="Scores", main="Read and Write Scores")

lines(read,col="red")

#

# It is visible that read and write scores are somewhat correlated

#

# Let us now perform the paired t-test

#

(ttest<-t.test(write, read, paired = TRUE))

##

## Paired t-test

##

## data: write and read

## t = 0.86731, df = 199, p-value = 0.3868

## alternative hypothesis: true difference in means is not equal to 0

## 95 percent confidence interval:

## -0.6941424 1.7841424

## sample estimates:

## mean of the differences

## 0.545

#

# Cohen's d = t-value / squareroot(sample size) -- Sample size is 200 in our case

#

(cohd<- ttest$statistic / sqrt(length(write)))

## t

## 0.06132783

#

These results indicate that the mean of read is not statistically significantly different from the mean of write (t = 0.867, p-value = 0.3868).

**Summary of output -**

A paired-samples t test was conducted to evaluate whether the mean reading and writing scores differed. The results indicated that the mean score for writing was not significantly different from the mean score for reading, t(199) = 0.867, p = 0.3868. The standardized effect size index, d, was .06, which is very small. The 95% confidence interval for the mean difference between the two scores was -0.69 to 1.78.
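
A paired t-test is the same thing as a one-sample t-test on the per-subject differences, which also explains the Cohen’s d formula used above. The sketch below shows this on simulated paired scores (not hsb2); the variable names and parameters are arbitrary assumptions.

```r
set.seed(7)                                  # simulated paired scores, not hsb2
before <- round(rnorm(25, mean = 52, sd = 10))
after  <- before + round(rnorm(25, mean = 1, sd = 5))

paired   <- t.test(after, before, paired = TRUE)
one_samp <- t.test(after - before, mu = 0)   # one-sample test on the differences

all.equal(unname(paired$statistic), unname(one_samp$statistic))
all.equal(paired$p.value, one_samp$p.value)

# Effect size for the paired design: mean difference / SD of the differences
d_z <- mean(after - before) / sd(after - before)
all.equal(d_z, unname(paired$statistic) / sqrt(length(before)))
```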

**Wilcoxon signed rank sum test**

The Wilcoxon signed rank sum test is the **non-parametric version of a paired samples t-test**. You use the Wilcoxon signed rank sum test when you do not wish to assume that the difference between the two variables is interval and normally distributed (but you do assume the difference is ordinal/continuous).

We will use the same example as above, **but we will not assume that the difference between read and write is interval and normally distributed.**

Again, using the hsb2 data file we will test **whether read scores and write scores differ in location**

# Let us now perform the Wilcoxon signed rank sum test

#

wilcox.test(write, read, paired = TRUE)

##

## Wilcoxon signed rank test with continuity correction

##

## data: write and read

## V = 9261, p-value = 0.3666

## alternative hypothesis: true location shift is not equal to 0

**Summary of output -**

The results suggest that there is not a statistically significant difference (p-value = 0.3666) between read and write scores.

**Sign test**

If you believe the differences between read and write were not ordinal but could merely be classified as positive and negative, then you may want to consider a sign test in lieu of the signed rank test. The Sign test answers the question “**How Often**?”, whereas other tests answer the question “How Much?”.

We will use the same example as above, **and assume that this difference is not ordinal**

Again, using the hsb2 data file we will test **how often the write score is higher than the read score**, i.e., whether this happens in about half of the pairs.

#

#

test<-as.data.frame(cbind(write,read))

#

# Capture number of times when Write Score is greater than Read Score

#

length(which(test$write > test$read))

## [1] 97

#

# Let us now perform the Sign Test

#

binom.test(length(which(test$write > test$read)),n=200)

##

## Exact binomial test

##

## data: length(which(test$write > test$read)) and 200

## number of successes = 97, number of trials = 200, p-value = 0.7238

## alternative hypothesis: true probability of success is not equal to 0.5

## 95 percent confidence interval:

## 0.4139148 0.5565364

## sample estimates:

## probability of success

## 0.485

**Summary of output -**

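We conclude that no statistically significant difference between the scores was found (p-value = 0.7238). Note that the call above uses all 200 pairs as trials, so pairs with equal read and write scores are counted as failures; the sign test conventionally drops such ties and uses only the non-tied pairs as the number of trials.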
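
A minimal sketch of the sign test with ties dropped is shown below. The twelve paired scores are hypothetical values made up for illustration (not the hsb2 data), as are the variable names.

```r
# Hypothetical paired scores, made up for illustration (not hsb2)
write_s <- c(52, 59, 33, 44, 52, 60, 59, 46, 57, 55, 61, 48)
read_s  <- c(57, 59, 44, 40, 47, 60, 50, 34, 63, 50, 55, 48)

diffs  <- write_s - read_s
n_pos  <- sum(diffs > 0)     # pairs where write exceeds read
n_neg  <- sum(diffs < 0)     # pairs where read exceeds write
n_ties <- sum(diffs == 0)    # ties carry no sign information

# Sign test: exact binomial test on the non-tied pairs only
res <- binom.test(n_pos, n_pos + n_neg, p = 0.5)
c(positive = n_pos, negative = n_neg, ties = n_ties)
res$p.value
```

Only the direction of each difference enters the test, which is why it answers "how often" rather than "how much".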

**McNemar test**

There is often a need to test change in a dichotomous variable (yes/no) before and after an intervention. A standard chi-square test cannot be used because it assumes that the groups are independent. This is not the case when you are testing the same set of students on their scores in different subjects (or, for example, in medical science when you are testing clients’ pre- and post-intervention scores).

**Here is an example use case for the McNemar Test** - An outpatient clinic treating patients diagnosed with lupus develops an intervention to help increase medication compliance. Twenty clients are selected to receive daily texts from the clinic reminding them to take their medication. Prior to the intervention, the patients are asked a simple yes/no question, “Are you taking your medication on a daily basis?” They are asked the same question 6 weeks later. The question to test is as follows: Is the patient’s rate of daily compliance (answering yes to the question) higher after the intervention?

For this blog, to keep things easy to follow, we will continue with the hsb2 dataset used in several of the examples above. Let us create two binary outcomes in our dataset: himath (students with math scores greater than 60) and hiread (students with read scores greater than 60). These outcomes can be considered in a two-way contingency table.

The null hypothesis is that the proportion of students in the himath group is the same as the proportion of students in the hiread group (i.e., that the contingency table is symmetric).

#

# Prepare a 2X2 contingency table of counts between himath and hiread

#

(tab<-table(himath=math>60,hiread=read>60))

## hiread

## himath FALSE TRUE

## FALSE 135 21

## TRUE 18 26

#

# Let us now perform the McNemar Test (mcnemar.test expects counts, not percentages)

#

mcnemar.test(tab)

##

## McNemar's Chi-squared test with continuity correction

##

## data: tab

## McNemar's chi-squared = 0.10256, df = 1, p-value = 0.7488

**Summary of output -**

McNemar’s chi-square statistic suggests that there is not a statistically significant difference in the proportion of students in the himath group and the proportion of students in the hiread group (p-value = 0.7488).
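
McNemar’s test depends only on the two discordant cells, i.e. the students who are high on one score but not on the other. The sketch below reproduces the continuity-corrected statistic by hand from the cell counts of the 200 students (135, 21, 18 and 26, which correspond to 67.5%, 10.5%, 9% and 13% of 200); the object names are made up for illustration.

```r
# Cell counts for the 200 students (filled column by column)
counts <- matrix(c(135, 18, 21, 26), nrow = 2,
                 dimnames = list(himath = c("FALSE", "TRUE"),
                                 hiread = c("FALSE", "TRUE")))

n_read_only <- counts["FALSE", "TRUE"]   # high read but not high math (21)
n_math_only <- counts["TRUE", "FALSE"]   # high math but not high read (18)

# Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c)
stat <- (abs(n_read_only - n_math_only) - 1)^2 / (n_read_only + n_math_only)
p    <- pchisq(stat, df = 1, lower.tail = FALSE)

res <- mcnemar.test(counts)              # works on counts, not percentages
all.equal(unname(res$statistic), stat)   # hand calculation matches mcnemar.test
all.equal(res$p.value, p)
```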

Hopefully this gives a good basic understanding of the statistical techniques used for hypothesis testing, which lay the foundation for the statistical analysis of data.


#### Ankit Agarwal

Analytics Manager - Deloitte Advisory at Deloitte

Opinions expressed by Gladwin Analytics members are their own.

#### Top Authors
