Author Archives: Michael Floren

About Michael Floren

Michael Floren is a doctoral student in the Applied Statistics and Research Methods program at the University of Northern Colorado. He received a B.A. in Mathematics from Bethel University in 2011 and was awarded a Master of Science in Methodology from the University of Northern Colorado in 2014. He currently serves as a Graduate Assistant on the College of Education and Behavioral Science’s Assessment Team and has served as a Senior Consultant in the Research Consulting Lab. His work has led to presentations locally and nationally, including presenting at the 28th annual conference of the American Evaluation Association.

Correctly Reporting P-Values in Summary Tables Reported with xtable

Often when writing a manuscript using knitr and xtable I am flustered by my p-values. In simple summary tables, R conveniently rounds small p-values to exactly 0, which is mathematically inappropriate: a p-value can be very small, but it is never 0. A colleague recently commented on the poor reporting in my table (shown below using print.xtable with the type="html" argument), inspiring a much-needed change.

              Estimate   Std.err           Wald  Pr(>|W|)
(Intercept)   0.001704  0.000005  100409.770956  0.000000
sizemedium    0.000046  0.000005      90.534705  0.000000
sizesmall     0.000003  0.000005       0.294331  0.587458
time         -0.000004  0.000001      11.614917  0.000654

The fix is actually fairly straightforward, and can be summarized in a simple function, "fixp", with the code shown below:

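fixp <- function(x, dig=3){
  # coerce matrix-style summary tables (e.g. from lm) to a data frame
  x <- as.data.frame(x)
  
  # the p-values are assumed to be in the last column
  if(substr(names(x)[ncol(x)],1,2) != "Pr")
    warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last column, and thus, that this function is inappropriate.")
  x[,ncol(x)] <- round(x[,ncol(x)], dig)
  # replace values that rounded to 0 with a "< .001"-style label
  # (the first replacement turns the whole column into character)
  for(i in seq_len(nrow(x))){
    if(x[i,ncol(x)] == 0)
      x[i,ncol(x)] <- paste0("< .", paste0(rep(0,dig-1), collapse=""), "1")
  }
  
  x
}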
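The least obvious line in fixp is the one that builds the replacement label. As a small sketch, the same expression can be pulled out into a helper (floor_label is a hypothetical name used only for illustration, not part of the package) to see what it produces for a few values of dig:

```r
# Hypothetical helper: the label expression from fixp, isolated for illustration
floor_label <- function(dig){
  # "< ." followed by (dig - 1) zeros and a final 1
  paste0("< .", paste0(rep(0, dig-1), collapse=""), "1")
}

labels <- vapply(c(1, 2, 3, 6), floor_label, character(1))
print(labels)
# gives the labels "< .1", "< .01", "< .001" and "< .000001"
```

Note that dig=1 still works, since pasting zero zeros together simply yields an empty string.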

Here's all that's going on. The function takes in the summary table (usually extracted with $coef) and converts it to a data frame (some tables already are, though others, such as those from lm, are numeric matrices). It throws a warning if the last column name doesn't begin with "Pr" (as it may not be the column that contains p-values), rounds that column to the user-specified number of digits, and replaces any value that rounded to 0 with "<" followed by the smallest value that can be shown at that precision (e.g. < .01). Then we output the edited table, all ready for reporting! To mimic the table above, we set dig = 6 (six decimal places for the p-value) and re-run:

              Estimate   Std.err           Wald   Pr(>|W|)
(Intercept)   0.001704  0.000005  100409.770956  < .000001
sizemedium    0.000046  0.000005      90.534705  < .000001
sizesmall     0.000003  0.000005       0.294331   0.587458
time         -0.000004  0.000001      11.614917   0.000654

Much better! Also, to report a two-digit p-value (as some writing styles require), we simply set dig = 2:

              Estimate   Std.err           Wald  Pr(>|W|)
(Intercept)   0.001704  0.000005  100409.770956     < .01
sizemedium    0.000046  0.000005      90.534705     < .01
sizesmall     0.000003  0.000005       0.294331      0.59
time         -0.000004  0.000001      11.614917     < .01

By design, the p-values can be manipulated independently of the estimates. This allows reporting the estimated coefficients in meaningful units (in the above example, very small units) while reporting the p-values at the precision that many writing styles request.

Want to try this yourself? Here's an example that you can try with just a built-in dataset in R:

# This gives a summary table with a small p-value. Reporting it with xtable as-is would result in an R rounding issue!
(mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data=CO2))))

# This fixes the p-values to 2 digits, correctly reporting those that would have been rounded to 0
fixp(mod, dig=2)
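To send the fixed table on to xtable, something along the following lines should work. This is a sketch under a few assumptions: the xtable package is installed (the printing step is skipped otherwise), fixp is repeated from above so the block runs on its own, and the object names fixed and tab are just illustrative.

```r
# fixp as defined earlier in the post, repeated so this block is self-contained
fixp <- function(x, dig=3){
  x <- as.data.frame(x)
  if(substr(names(x)[ncol(x)],1,2) != "Pr")
    warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last column, and thus, that this function is inappropriate.")
  x[,ncol(x)] <- round(x[,ncol(x)], dig)
  for(i in seq_len(nrow(x))){
    if(x[i,ncol(x)] == 0)
      x[i,ncol(x)] <- paste0("< .", paste0(rep(0,dig-1), collapse=""), "1")
  }
  x
}

mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data=CO2)))
fixed <- fixp(mod, dig=2)

# Assumption: xtable is installed; if not, only the fixed table is created
if(requireNamespace("xtable", quietly=TRUE)){
  # digits=2 rounds the numeric columns; the character p-value column is left as-is
  tab <- xtable::xtable(fixed, digits=2)
  print(tab, type="html")
}
```

Because fixp has already turned the p-value column into text, the digits argument to xtable only affects the estimates, standard errors, and test statistics.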

Here's the final output via print.xtable (dig=2 for fixp and digits=2 for xtable):

                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          37.42        4.67     8.00     < .01
conc                  0.02        0.00     7.96     < .01
Treatmentchilled    -12.50        5.10    -2.45      0.02
TypeMississippi     -23.33        6.01    -3.88     < .01
Plant.L              21.58       11.14     1.94      0.06
Plant.Q              -4.62        2.27    -2.03      0.05
Plant.C               1.46        5.10     0.29      0.78
Plant^4               2.34        2.27     1.03      0.31
Plant^5              -0.48        5.77    -0.08      0.93
Plant^6              -0.04        2.27    -0.02      0.99
Plant^7              -1.91        3.64    -0.53       0.6
Plant^8              -3.28        2.27    -1.44      0.15
Plant^10              0.55        2.27     0.24      0.81

Limitations (ish):

  1. Again, this assumes that the last column is the one to be transformed. This is by design, though it may be inconvenient in some situations. If needed, the change is easily made in the definition of the function.
  2. When any entry in the last column is replaced, the whole column becomes a character column in the data frame. When the column is rounded but no entry rounds to 0, it stays numeric.
  3. This assumes a data-frame-style format for your table. Thus, this method will NOT be effective at correcting reported p-values for an individual test: say a t-test, where only the statistic is reported (and not a table). Personally this is not a concern, as I deal with these situations in other ways, but for some users seeking an overall "p-value fixing" method this may not be the answer.
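For readers who do want something for a single test, one possible approach is a small vectorized formatter. The sketch below is only an illustration: format_p is a hypothetical name, it is not part of myStuff, and it is not the approach alluded to in the third point. It applies the same "rounds to 0" rule as fixp, but always returns text and keeps trailing zeros.

```r
# Hypothetical helper (not from myStuff): format one or more p-values as text
format_p <- function(p, dig=3){
  rounded <- round(p, dig)
  # same label fixp uses for values that round to 0
  floor_label <- paste0("< .", strrep("0", dig-1), "1")
  # fixed notation with exactly `dig` decimals, so 0.6 becomes "0.60"
  shown <- formatC(rounded, format="f", digits=dig)
  ifelse(rounded == 0, floor_label, shown)
}

# Works on a plain vector of p-values
example_p <- format_p(c(0.00004, 0.6, 0.0491), dig=2)
print(example_p)
# gives "< .01", "0.60" and "0.05"

# ...or on the p-value from a single test (built-in sleep data)
format_p(t.test(extra ~ group, data=sleep)$p.value, dig=2)
```

Since the result is always character, this also sidesteps the mixed numeric/character behavior described in the second point, at the cost of no longer being able to do arithmetic on the column.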

As with other functions I write posts on, this function is available in my package (creatively named "myStuff") via GitHub. If you'd like to play with the most current version of the function, I'd encourage you to check it out here. Alternatively, to have access to other fun functions, install the package directly from GitHub with the code below (requires devtools):

devtools::install_github("flor3652/myStuff")

Filed under R

The Keep Function

Occasionally when I am jotting down some code I find myself creating several temporary variables with the intention of getting rid of them later. These variables have quick names that make sense only in their immediate context and get quite confusing outside of it. If the project expands, however, I find myself with three options for moving on to the next level of the project: remember what I've already used and try to avoid it, rename all of the temporary variables (i.e. rewrite the code of the base level of the project), or wipe the variables so the names can be reused. This decision is usually made based on how the project is going. If the project is going well, I'll go back and dutifully rewrite the initial code to give the variables more distinctive names. If the project is still a bit shaky, I will clear the variable names that I tend to use as temporary variables and keep exploring. Remembering variable names never turns out well for me; I inevitably forget that a variable was defined in a previous section, use it thinking I had redefined it (when I hadn't), and wonder at the strange results I get.

Clearing all variable names can be a chore, though. The problem is that, in order to move on to the next stage, I actually wish to keep a few of the variables and get rid of all of the rest. After playing with a few ideas (removing the unwanted variables one at a time, writing out the variables that I wanted to keep, etc.), I decided to write a keep function. The keep function does just what it says: given a list of variables as arguments, it keeps those variables and removes the rest. As an example, consider the vectors "x" and "y", which are combined to give a matrix "initMod". Using keep(initMod) keeps the matrix and eliminates all other objects in the global environment (including "x" and "y"), allowing me to reuse the variable names "x" and "y" as temporary variables again (say, for "modifiedMod").
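To make the idea concrete, here is a rough sketch of how such a function could be written. It is not the packaged implementation: keep_sketch and its envir argument are hypothetical names chosen for illustration, and the demonstration runs in a scratch environment so that nothing in your own workspace is removed.

```r
# Hypothetical sketch of a keep-style function (not the myStuff code)
keep_sketch <- function(..., envir=globalenv()){
  # capture the unquoted names passed in, e.g. keep_sketch(initMod)
  keep_names <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  # always retain the function itself in case it lives in the same environment
  keep_names <- c(keep_names, "keep_sketch")
  drop <- setdiff(ls(envir=envir), keep_names)
  rm(list=drop, envir=envir)
  invisible(drop)
}

# Demonstration in a scratch environment rather than the global one
scratch <- new.env()
assign("x", 1:3, envir=scratch)
assign("y", 4:6, envir=scratch)
assign("initMod", cbind(scratch$x, scratch$y), envir=scratch)

keep_sketch(initMod, envir=scratch)
remaining <- ls(scratch)
print(remaining)
# only "initMod" is left; "x" and "y" are gone
```

Passing a different envir is just a convenience for experimenting safely; with the default, the sketch clears the global environment in the way described above.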

Code for keep can be found on GitHub, here. Note that the function will delete itself if it is defined in the global environment, so add a line that always retains keep. You could also simply grab the containing personal package (myStuff) from GitHub using the code below.

if(!require(devtools)) install.packages("devtools", dependencies=TRUE)
devtools::install_github("flor3652/myStuff")
library(myStuff)

Note that when using the function from the package you don't have to worry about it being deleted as it isn't in the global environment.

Filed under R