
Assumption Checking - Part I


Often when working, we are under deadlines to produce results in a reasonable timeframe, and an analyst under a tight deadline may skip checking their assumptions. A simple example is the one-sample t-test, which you might use to test whether the mean of a sample differs from a specific number. Two assumptions of the t-test are often overlooked: the sample must be drawn randomly from the population, and the population is supposed to follow a Gaussian (normal) distribution. When is the last time in the workplace that you heard of someone performing a normality test before running a t-test? It is considered an extra step that is not usually taken. It should not be a burden, though, because it is easy to automate with a wrapper function in R. The wrapper tests for normality first, then runs a t-test if the data look normal and a Wilcoxon signed-rank test if they do not.

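mytest <- function(x, value = 0) {
  xx <- as.character(substitute(x))
  if (!is.numeric(x)) stop(sprintf('%s is not numeric', xx))
  # Shapiro-Wilk normality test (0.05 cutoff assumed): t-test if normal, else Wilcoxon
  if (shapiro.test(x)$p.value > 0.05) {
    print(t.test(x, mu = value))
  } else {
    print(wilcox.test(x, mu = value))
  }
}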
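mytest() only prints its results. That works at the console but is awkward inside a larger script. The sketch below is a variant that returns both the normality test and the chosen test as a list. The name choose_test() and the alpha = 0.05 default are assumptions and are not part of the original post. Exact normal and exponential quantiles (qnorm(ppoints(200)) and qexp(ppoints(200))) serve as deterministic inputs, so the branch taken is predictable.

```r
# Sketch: like mytest(), but returns the results instead of only printing them.
# The name choose_test() and the default alpha = 0.05 are assumptions.
choose_test <- function(x, value = 0, alpha = 0.05) {
  if (!is.numeric(x)) stop("x is not numeric")
  normality <- shapiro.test(x)        # H0: x comes from a normal distribution
  test <- if (normality$p.value > alpha) {
    t.test(x, mu = value)             # normality not rejected: one-sample t-test
  } else {
    wilcox.test(x, mu = value)        # normality rejected: signed-rank test
  }
  list(normality = normality, test = test)
}

# Deterministic inputs: exact normal and exponential quantiles
res_norm <- choose_test(qnorm(ppoints(200)))
res_exp  <- choose_test(qexp(ppoints(200)), value = 1)
res_norm$test$method  # should name the one-sample t-test
res_exp$test$method   # should name the Wilcoxon signed rank test
```

Because the objects are returned, you can log res$normality$p.value or pull res$test$p.value into a report instead of scraping printed output.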

We can pair that with a second function that draws a density plot of the data.

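myplot <- function(x, color = "blue") {
  xx <- as.character(substitute(x))
  if (!is.numeric(x)) stop(sprintf('%s is not numeric', xx))
  title <- paste("Density Plot", "\n", "Dataset = ", deparse(substitute(x)))
  mydens <- density(x)
  # Draw the kernel density estimate and shade the area under it
  plot(mydens, main = title)
  polygon(mydens, col = color, border = "black")
}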
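A density plot is easier to judge against a reference. One option overlays a normal curve that has the sample's mean and standard deviation, drawn as a dashed line, so that skewness like that of a chi-square sample stands out. The sketch below implements this as myplot_vs_normal(), a hypothetical helper that is not the post's function. It returns the fitted mean and standard deviation invisibly.

```r
# Sketch: kernel density of x with a fitted normal curve overlaid (dashed).
# myplot_vs_normal() is a hypothetical helper, not the post's myplot().
myplot_vs_normal <- function(x, color = "blue") {
  if (!is.numeric(x)) stop("x is not numeric")
  dens <- density(x)
  mu <- mean(x)
  s <- sd(x)
  ref <- dnorm(dens$x, mean = mu, sd = s)  # normal density on the same grid
  plot(dens, main = "Kernel density vs. fitted normal",
       ylim = range(0, dens$y, ref))       # make room for both curves
  polygon(dens, col = color, border = "black")
  lines(dens$x, ref, lty = 2, lwd = 2)     # dashed reference curve
  invisible(c(mean = mu, sd = s))
}
```

For example, myplot_vs_normal(chisq, color = "orange") would shade the chi-square sample and draw the fitted normal over it.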

Now, let's see how our functions work. If we generate random values from a Gaussian distribution, we would expect them to "normally" pass the normality test, so a t-test is performed. However, if the data are generated from a distribution that is not normal, then we would typically expect to see the results of the Wilcoxon signed-rank test instead.
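The word "normally" is apt. At a 0.05 cutoff, the Shapiro-Wilk test rejects about 5% of samples that really are Gaussian, so the wrapper will occasionally fall back to the Wilcoxon test even for normal data. The small simulation below estimates how often each kind of sample passes. The helper pass_rate() is hypothetical, and the seed, the 200 replications and alpha = 0.05 are arbitrary choices.

```r
# Sketch: how often does Shapiro-Wilk "pass" (p > alpha) for each distribution?
# pass_rate() is hypothetical; seed, reps and alpha are arbitrary choices.
set.seed(2024)
pass_rate <- function(generate, reps = 200, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(generate())$p.value > alpha))
}
rate_normal <- pass_rate(function() rnorm(1000, 0, 1))
rate_chisq  <- pass_rate(function() rchisq(1000, df = 5))
c(normal = rate_normal, chisq = rate_chisq)
```

With n = 1000, the normal pass rate should land near 0.95 and the chi-square pass rate near 0.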

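n <- 1000
normal <- rnorm(n, 0, 1)    # Gaussian sample, mean 0, sd 1
chisq <- rchisq(n, df = 5)  # right-skewed chi-square sample, mean 5
# No seed is set, so your numbers will differ from the output below

# Test for difference from 0 for normal data
mytest(normal)
myplot(normal)

# Test for difference from 5 for chi-square data
mytest(chisq, value = 5)
myplot(chisq, color = "orange")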

[Figure: Density Plots]

Results from 'mytest(normal)':

One Sample t-test
data: x
t = 0.5143, df = 999, p-value = 0.6072
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.04541145 0.07766719
sample estimates:
mean of x

Results from 'mytest(chisq,value=5)':

Wilcoxon signed rank test with continuity correction

data: x
V = 214385, p-value = 8.644e-05
alternative hypothesis: true location is not equal to 5
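When reading the chi-square result, keep in mind what the signed-rank test actually tests. A chi-square distribution with 5 degrees of freedom has a mean of exactly 5. The one-sample Wilcoxon signed-rank test, however, targets the pseudo-median (the median of the pairwise averages (X1 + X2)/2), not the mean. X1 + X2 follows a chi-square distribution with 10 degrees of freedom, so the pseudo-median is qchisq(0.5, 10)/2, about 4.67. A small p-value against 5 therefore reflects the skewness, not a mean that differs from 5. The sketch below computes these quantities along with the Hodges-Lehmann estimate that wilcox.test(..., conf.int = TRUE) reports. The seed and the sample size are arbitrary.

```r
# Mean, median and pseudo-median of a chi-square distribution with 5 df.
# The pseudo-median is the median of (X1 + X2) / 2, and X1 + X2 ~ chi-square(10).
c(mean = 5,
  median = qchisq(0.5, df = 5),
  pseudo_median = qchisq(0.5, df = 10) / 2)

# Empirical check: the Hodges-Lehmann (pseudo)median estimate from wilcox.test()
set.seed(1)  # arbitrary seed
x <- rchisq(2000, df = 5)
est <- wilcox.test(x, mu = 5, conf.int = TRUE)$estimate
est  # should sit near 4.67, below the true mean of 5
```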

The benefit of working ahead is clear. Once these functions are written, you can add them to a personal R package hosted on GitHub. You can then use them whenever you have an internet connection, and the whole R community has the chance to benefit. It is also easy to combine the two functions into one.


PlotAndTest <- function(x) {  # combine the two functions
  myplot(x)  # density plot
  mytest(x)  # normality check, then t-test or Wilcoxon test
}
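There is one catch when combining the functions. Inside myplot() and mytest(), substitute(x) sees whatever expression the caller passed. When PlotAndTest() calls them with x, the plot title therefore reads "Dataset = x". The same effect explains why the printed results above show "data: x", since mytest() passes its own argument x to t.test() and wilcox.test(). The sketch below is a single-function alternative. It captures the caller's expression once and reuses it in the title and in the test's data.name field. The function plot_and_test(), its label argument and the 0.05 cutoff are assumptions, not the author's code.

```r
# Sketch: capture the caller's expression once and reuse it as a label.
# plot_and_test() and its `label` argument are hypothetical; the 0.05
# Shapiro-Wilk cutoff is an assumption.
plot_and_test <- function(x, value = 0, color = "blue",
                          label = paste(deparse(substitute(x)), collapse = "")) {
  force(label)  # resolve the caller's expression before anything else
  if (!is.numeric(x)) stop(sprintf("%s is not numeric", label))
  dens <- density(x)
  plot(dens, main = paste0("Density Plot\nDataset = ", label))
  polygon(dens, col = color, border = "black")
  # Choose the test from the normality check, then run it
  test <- if (shapiro.test(x)$p.value > 0.05) t.test else wilcox.test
  res <- test(x, mu = value)
  res$data.name <- label  # show the caller's name instead of "x" when printed
  print(res)
  invisible(res)
}
```

Called as plot_and_test(chisq, value = 5, color = "orange"), both the plot title and the "data:" line would name chisq.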



Filed under Assumption Checking, R