in R
http://rpubs.com/newajay/chisquaretest
#http://www.r-tutor.com/elementary-statistics/goodness-fit/chi-squared-test-independence
library(MASS)
tbl = table(survey$Smoke, survey$Exer)
tbl
##
## Freq None Some
## Heavy 7 1 3
## Never 87 18 84
## Occas 12 3 4
## Regul 9 1 7
table(survey$Smoke)
##
## Heavy Never Occas Regul
## 11 189 19 17
dim(survey)
## [1] 237 12
#Test whether the students' smoking habit
#is independent of their exercise level
#at the .05 significance level.
chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 5.4885, df = 6, p-value = 0.4828
#As the p-value 0.4828 is greater than the .05 significance level, we do not reject the null hypothesis that the smoking habit is
#independent of the exercise level of the students.
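As a cross-check on the R output above, the Pearson statistic can be recomputed by hand in Python. This is a minimal sketch, assuming the counts are copied correctly from the printed table; the variable names are illustrative and not part of the original post.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts copied from the R table above.
# Rows: Heavy, Never, Occas, Regul; columns: Freq, None, Some.
observed = np.array([
    [7, 1, 3],
    [87, 18, 84],
    [12, 3, 4],
    [9, 1, 7],
], dtype=float)

# Expected counts under independence: (row total * column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)   # shape (4, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
n = observed.sum()
expected = row_totals @ col_totals / n             # shape (4, 3)

# Pearson statistic: sum over all cells of (O - E)^2 / E.
statistic = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-squared distribution.
p_value = chi2.sf(statistic, dof)

print(statistic, dof, p_value)
```

The printed values should agree with the X-squared, df and p-value that chisq.test(tbl) reports.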
#Combine the None and Some columns into a single column and repeat the test.
ctbl = cbind(tbl[,"Freq"], tbl[,"None"] + tbl[,"Some"])
ctbl
## [,1] [,2]
## Heavy 7 4
## Never 87 102
## Occas 12 7
## Regul 9 8
chisq.test(ctbl)
##
## Pearson's Chi-squared test
##
## data: ctbl
## X-squared = 3.2328, df = 3, p-value = 0.3571
#As the p-value 0.3571 is greater than the .05 significance level, we do not reject the null hypothesis that the smoking habit is
#independent of the exercise level of the students.
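#The warning message in the first test arises because
#some expected cell counts in the 4x3 contingency table are small.
#Combining the None and Some columns enlarges the cells,
#and the second test runs without a warning.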
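To see why only the first table triggers the warning, the expected counts of both tables can be compared against a threshold. The sketch below is in Python and assumes the common rule of thumb that expected counts below 5 make the chi-squared approximation unreliable; `expected_counts` is a hypothetical helper written for this illustration.

```python
import numpy as np

def expected_counts(table):
    """Expected counts under independence for a two-way table (illustrative helper)."""
    table = np.asarray(table, dtype=float)
    return np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

# Full 4x3 table (rows: Heavy, Never, Occas, Regul; columns: Freq, None, Some).
full = np.array([
    [7, 1, 3],
    [87, 18, 84],
    [12, 3, 4],
    [9, 1, 7],
])

# Same construction as ctbl in R: keep Freq, add None and Some together.
combined = np.column_stack([full[:, 0], full[:, 1] + full[:, 2]])

# Assumed threshold: 5 is a rule of thumb, not a value stated in the post.
threshold = 5
small_full = int((expected_counts(full) < threshold).sum())
small_combined = int((expected_counts(combined) < threshold).sum())

print("cells with small expected count, full table:", small_full)
print("cells with small expected count, combined table:", small_combined)
```

Several cells of the full table fall below the threshold, mostly in the sparse None column, while the combined table has none.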
in Python
https://github.com/decisionstats/pythonfordatascience/blob/master/chi%2Bsquare%2Btest.ipynb
with code from http://rpubs.com/newajay/chisquaretest
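from scipy.stats import chi2_contingency
import numpy as np

# Observed counts: rows are the exercise groups (Freq, None + Some),
# columns are the smoking levels (Heavy, Never, Occas, Regul).
obs = np.array([[7, 87, 12, 9], [4, 102, 7, 8]])

# Returns the test statistic, the p-value, the degrees of freedom
# and the table of expected counts.
chi2, p, dof, expected = chi2_contingency(obs)

print(p)
print(chi2)
print(dof)
print(expected)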
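The array passed to chi2_contingency is laid out as 2 rows by 4 columns, whereas ctbl in the R section is 4 rows by 2 columns. The short check below suggests that the orientation does not affect the test; the names with `_rows` and `_cols` suffixes are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x4 layout used in the notebook: exercise groups as rows.
obs = np.array([[7, 87, 12, 9], [4, 102, 7, 8]])

# Run the test on the notebook layout and on the transposed (R-style) layout.
chi2_rows, p_rows, dof_rows, expected_rows = chi2_contingency(obs)
chi2_cols, p_cols, dof_cols, expected_cols = chi2_contingency(obs.T)

# The statistic, p-value and degrees of freedom are the same either way;
# only the expected-count table is transposed.
print(chi2_rows, p_rows, dof_rows)
print(chi2_cols, p_cols, dof_cols)
print(np.allclose(expected_rows, expected_cols.T))
```

Both calls should reproduce the result of chisq.test(ctbl) in the R section.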