Using the apastats package to write reproducible reports

Overview

Writing a research report in RMarkdown allows you to start your analysis even before you have collected the data. This is especially helpful for reports and theses: while you, a web crawler, or a panel are still collecting data, you can already start writing. However, writing up results before you have the data is somewhat challenging. Keep in mind, though, that writing the discussion before you have the data is a lot harder.

We are going to need two things:

  • a package that formats statistical output the way we need it; in our case, apastats.
  • string manipulation: stringr and glue.

Installing apastats

You can install the apastats package using the following command:

devtools::install_github('achetverikov/apastats',subdir='apastats')

Now, we can start. Let’s assume we have already collected the following data about iris flowers 😉.

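library(dplyr)  # %>%, sample_n(), filter(), pull()
library(knitr)  # kable()
df <- iris %>% sample_n(5)  # 5 random rows; the sample changes on every run
df %>% kable()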
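| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species    |
|-------------:|------------:|-------------:|------------:|:-----------|
|          5.1 |         3.4 |          1.5 |         0.2 | setosa     |
|          5.7 |         2.8 |          4.5 |         1.3 | versicolor |
|          5.4 |         3.0 |          4.5 |         1.5 | versicolor |
|          6.3 |         2.8 |          5.1 |         1.5 | virginica  |
|          4.7 |         3.2 |          1.6 |         0.2 | setosa     |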

The analysis

Deciding on our hypothesis

We have found in the literature that the setosa species has rather short petals, only about 3-4 cm long. Since we are sampling random flowers, we want to test whether the petals in our sample are also shorter than, let us say, 5 cm. Thus, our null hypothesis is \(H_0: \mu_{PL} \geq 5\) against the alternative \(H_1: \mu_{PL} < 5\), where \(\mu_{PL}\) is the mean petal length in cm.
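The hypothesis above is directional, whereas t.test() tests a two-sided alternative ("not equal") by default, which is what the pilot output in this post reports. Base R's t.test() has an alternative argument for the one-sided case. The following is a minimal sketch on the full iris data, using base R subsetting instead of the dplyr pipeline; the variable names are illustrative only.

```r
# Petal lengths of all 50 setosa flowers in the built-in iris data set
setosa_pl <- iris$Petal.Length[iris$Species == "setosa"]

# One-sided test: H1 is "true mean < 5", i.e. H0 is "true mean >= 5"
res_less <- t.test(setosa_pl, mu = 5, alternative = "less")

# Two-sided default, as used in the rest of this post
res_two <- t.test(setosa_pl, mu = 5)

# For a negative t statistic, the one-sided p-value is half the two-sided one
c(one_sided = res_less$p.value, two_sided = res_two$p.value)
```

Whichever direction you choose, decide it before the final data arrive, so the pre-written report matches the planned test.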

Selecting a test

We test this using a one-sample t-test on the setosa subset of the data.

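res <- df %>% 
  filter(Species == "setosa") %>%  # keep only the setosa flowers
  pull(Petal.Length) %>%           # extract the petal lengths as a vector
  t.test(mu = 5)                   # one-sample t-test against 5 cm
res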
## 
##  One Sample t-test
## 
## data:  .
## t = -69, df = 1, p-value = 0.009226
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  0.9146898 2.1853102
## sample estimates:
## mean of x 
##      1.55
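To see where t = -69 and df = 1 come from, the statistic t = (mean - mu) / (sd / sqrt(n)) can be computed by hand from the two setosa petal lengths in the sampled table (1.5 and 1.6 cm). This base R sketch assumes exactly that sample; with a different random draw, the numbers change.

```r
# Petal lengths (cm) of the two setosa flowers in the sampled table above
x  <- c(1.5, 1.6)
mu <- 5  # hypothesised mean

n     <- length(x)
se    <- sd(x) / sqrt(n)       # standard error: 0.0707 / sqrt(2) = 0.05
t_val <- (mean(x) - mu) / se   # (1.55 - 5) / 0.05 = -69
dof   <- n - 1                 # degrees of freedom: 1

# Two-sided p-value, matching the default of t.test()
p_val <- 2 * pt(-abs(t_val), dof)

c(t = t_val, df = dof, p = p_val)
```

With only one degree of freedom, even this extreme t value yields a p-value of roughly .009, which is one more reason not to trust the pilot result.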

Well… This is not our final data yet, so we should not be too happy about this result. Also, this looks nothing like a report. How can we write the text and its interpretation so that they incorporate the results automatically?

Using apastats to describe

The describe.ttest function returns Markdown text formatted for reporting a t-test in APA style. Because it reads the values from the test object, the following code always produces output that matches the current data.

tresult <- apastats::describe.ttest(res)
tresult
## [1] "_t_(1.0) = -69.00, _p_ = .009"
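To make clear what such a formatter does, here is a minimal hand-rolled sketch. The helper format_ttest() is hypothetical and is not apastats' actual implementation; it only reads the statistic, parameter, and p.value fields that every t.test() result carries and mimics the output format shown above.

```r
# Hypothetical helper, NOT apastats' implementation: a minimal sketch that
# assembles an APA-style string from the fields of a t.test() ("htest") result.
format_ttest <- function(res) {
  t_val <- unname(res$statistic)  # t statistic
  dof   <- unname(res$parameter)  # degrees of freedom
  p     <- res$p.value

  # APA style drops the leading zero of p and reports tiny values as "< .001"
  p_txt <- if (p < .001) {
    "< .001"
  } else {
    paste("=", sub("^0", "", sprintf("%.3f", p)))
  }

  # Underscores become italics (_t_, _p_) once the string is rendered as Markdown
  sprintf("_t_(%.1f) = %.2f, _p_ %s", dof, t_val, p_txt)
}

format_ttest(t.test(c(1.5, 1.6), mu = 5))
# "_t_(1.0) = -69.00, _p_ = .009"
```

In practice, a maintained package such as apastats is preferable, because it also handles other tests and edge cases that this sketch ignores.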

Meh. This is not really helpful either, right? We need to include the text result inline by writing `r tresult` in the RMarkdown text. Then we can say:

Setosa irises have petals whose length is statistically significantly different from 5 cm (t(1.0) = -69.00, p = .009).

Preparing for both outcomes

This is nice, but we don’t know whether our sample of \(n=2\) setosa flowers actually represents the population. Therefore, we must prepare for both outcomes: a significant and a non-significant result. Here the glue package helps by filling in the blanks in pre-written text.

library(glue)


# significance level used to judge the p-value
alpha_level <- .05

# default is not significant
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
} 

phrase <- glue("> Setosa irises {have_txt} petals whose length is ", 
               "statistically significantly different from 5 cm ({tresult}).")

We can now output this phrase inline using `r phrase`:

Setosa irises have petals whose length is statistically significantly different from 5 cm (t(1.0) = -69.00, p = .009).

Collecting the final data and running our analysis

Now we can run the same script again, this time on the full data set. In the following code chunk, I have omitted unnecessary output and construct the output phrase directly.

# run the test
res <- iris %>% 
  filter(Species == "setosa") %>% 
  pull(Petal.Length) %>% 
  t.test(mu = 5)

# describe results
tresult <- apastats::describe.ttest(res)

# generate phrase
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
} 

# you can use Markdown here as well (the leading > renders as a block quote)
phrase <- glue("> Setosa irises {have_txt} petals whose length is ",  
               "statistically significantly different from 5 cm ({tresult}).")

The final outcome of our analysis now is:

Setosa irises have petals whose length is statistically significantly different from 5 cm (t(49.0) = -144.06, p < .001).
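Because the pilot run and the final run repeat the same decision logic, one option is to wrap it in a function so both runs share a single code path. The sketch below is hypothetical (significance_phrase() is not part of apastats or glue) and uses base R's sprintf() instead of glue() so it runs without extra packages; in the report, stats_txt would be apastats::describe.ttest(res).

```r
# Hypothetical helper: picks the wording from the p-value and fills in the
# formatted statistics, so the pilot and final runs share one code path.
significance_phrase <- function(res, stats_txt, alpha = 0.05) {
  have_txt <- if (res$p.value < alpha) "have" else "do not have"
  sprintf(paste0("Setosa irises %s petals whose length is statistically ",
                 "significantly different from 5 cm (%s)."),
          have_txt, stats_txt)
}

# Usage on the full data; the sprintf() call below is a simplified stand-in
# for apastats::describe.ttest(res_full)
setosa_pl <- iris$Petal.Length[iris$Species == "setosa"]
res_full  <- t.test(setosa_pl, mu = 5)
significance_phrase(res_full,
                    sprintf("t(%.0f) = %.2f", res_full$parameter, res_full$statistic))
```

Keeping alpha as an argument with a default makes the significance level visible in the report code instead of relying on a global variable.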

André Calero Valdez
Junior Research Group Leader

I am the leader of the research group “Digitale Mündigkeit”, which studies the effects of human-algorithm interaction.
