# Category Archives: Statistics

## Vectorize a Function

I was recently working and decided to write a function to assist in the process. It assigns a label to a number based upon the value. My first attempt worked, but only for one value at a time.

```KMO_adequacy2 <- function(x){
if ( x > .90 ) {result <- "Marvelous"}
if (x <=.90 & x >.80 ) { result <-"Meritorious" }
if (x <= .8 & x >.70 ) { result <- "Middling" }
if (x <=.7 & x>.60)    { result <-"Mediocre" }
if (x <=.6 & x>.50)    {result <-"Miserable" }
if (x <=.5)            { result <- "Unacceptable" }
return(result)
}
```

This works.

``` KMO_adequacy2(.95)
# "Marvelous"```

This does not work.

``` KMO_adequacy2(c(.95,.50))
# "Marvelous"
# Warning messages:
#   1: In if (x > 0.9) { :
#       the condition has length > 1 and only the first element will be used
#     2: In if (x <= 0.9 & x > 0.8) { :
#         the condition has length > 1 and only the first element will be used
#       3: In if (x <= 0.8 & x > 0.7) { :
#           the condition has length > 1 and only the first element will be used
#         4: In if (x <= 0.7 & x > 0.6) { :
#             the condition has length > 1 and only the first element will be used
#           5: In if (x <= 0.6 & x > 0.5) { :
#               the condition has length > 1 and only the first element will be used
#             6: In if (x <= 0.5) { :
#                 the condition has length > 1 and only the first element will be used```

I should have thought more about the end goal of the function before I started coding, but I didn't. I started searching for good ways to vectorize a function in R. I found there is a function in base R called 'Vectorize'. All I needed was to create a new function using 'Vectorize' and I was done.

```KMO_adequacy <- Vectorize(KMO_adequacy2)
# "Marvelous"    "Unacceptable"```

This allowed me to easily add a column to my data frame containing individual KMO measures and associate a label. I reached out to my fellow blogger Jeremy and he gave me a quick re-write of my original function. Here is his approach.

```# Jeremy's re-write
KMO_J <- function(x){
names(x) <- x
names(x) <- ifelse(x > .90, "Marvelous", names(x))
names(x) <- ifelse(x <=.9 & x >.80, "Meritorious", names(x))
names(x) <- ifelse(x <=.8 & x >.70, "Middling", names(x))
names(x) <- ifelse(x <=.7 & x>.60, "Mediocre", names(x))
names(x) <- ifelse(x <=.6 & x>.50, "Miserable", names(x))
names(x) <- ifelse(x <=.5, "Unacceptable", names(x))
return(names(x))
}

KMO_J(c(.95,.50))
# "Marvelous"    "Unacceptable"

```

We can do a quick check to make sure that we are getting the same output.

```test <- seq(0,1,.1)
all.equal ( KMO_adequacy(test) , KMO_J(test) )
# TRUE```

My next question, is there a major performance difference between the two?  I ran a short simulation which is summarized in the plot below and shows that there is not a large difference in performance for the samples tested. Have a better way to solve this problem? Post it in the comments below. If you are wondering what this KMO thing is all about, it is a measure of sampling accuracy (MSA) for conducting exploratory factor analysis (EFA). The cutoffs and names were taken from:

1. Barbara A. Cerny , Henry F. Kaiser
Multivariate Behavioral Research
Vol. 12, Iss. 1, 1977