Using the apastats package to write reproducible reports

Last updated on Jan 4, 2020

Overview

Writing a research report in RMarkdown allows you to start writing up your analysis even before you have collected all the data. This is especially helpful for reports and theses: while you, a web crawler, or a panel are still collecting data, you can already start writing. However, writing results before having the data is somewhat challenging. Keep in mind that writing a discussion before having collected the data is a lot harder.

We’re going to need two things:

a package that formats output the way we need it: in our case, apastats
string manipulation: stringr and glue
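
To give a feel for what the two string packages contribute, here is a minimal sketch. The input string is typed by hand in the markdown style that appears later in this post, and the variable names are made up for illustration.

```r
library(stringr)
library(glue)

# Hand-typed example string (markdown emphasis around t and p)
stat_md <- "_t_(1.0) = -38.00, _p_ = .017"

# stringr: strip the markdown emphasis markers for plain-text contexts
stat_plain <- str_replace_all(stat_md, fixed("_"), "")

# glue: fill a variable into a pre-written sentence
sentence <- glue("The test result is {stat_plain}.")
sentence
```

glue evaluates whatever stands between the curly braces as R code, so any variable in scope can be dropped into the sentence.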

Installing apastats

You can install the apastats package using the following command:

devtools::install_github('achetverikov/apastats',subdir='apastats')

Now, we can start. Let’s assume we have already collected the following data about iris flowers 😉.

library(dplyr)  # %>%, sample_n(), filter(), pull()
library(knitr)  # kable()
df <- iris %>% sample_n(5)
df %>% kable()

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.8	2.7	4.1	1.0	versicolor
6.4	2.8	5.6	2.1	virginica
4.4	3.2	1.3	0.2	setosa
4.3	3.0	1.1	0.1	setosa
7.0	3.2	4.7	1.4	versicolor
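
Because the five rows are drawn at random, the table above will differ every time the report is knitted. If a stable preview is wanted, fixing the random seed first is one option. The sketch below uses base R's sample() instead of sample_n so that it runs without extra packages; the seed value 42 is arbitrary.

```r
# Fix the seed so the same rows are drawn on every run (42 is an arbitrary choice)
set.seed(42)
rows_first <- sample(nrow(iris), 5)

# Resetting the seed reproduces exactly the same draw
set.seed(42)
rows_second <- sample(nrow(iris), 5)

identical(rows_first, rows_second)  # TRUE

# The reproducible preview sample
df_preview <- iris[rows_first, ]
df_preview
```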

The analysis

Deciding on our hypothesis

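In the literature we have found that the setosa species has a rather small petal length of only about 3-4 cm. Since we are sampling random flowers, we want to test whether the mean petal length in our sample is also smaller than, let us say, 5 cm. For that directional question the null hypothesis would be \(H_0: PL \geq 5\). Note that the code below calls t.test with its default two-sided alternative, so it actually tests \(H_0: PL = 5\), which is why the report sentences speak of a length that is "different from 5".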
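
Whether the test is one-sided or two-sided is controlled by the alternative argument of t.test. As a small sketch, the two setosa petal lengths from the sample table above are typed in by hand (the variable names are illustrative):

```r
# Petal lengths of the two setosa flowers in the preview sample
petal_length <- c(1.3, 1.1)

# Two-sided (the default): H0 is "mean equals 5"
two_sided <- t.test(petal_length, mu = 5)

# One-sided: H0 is "mean is at least 5", alternative is "mean is less than 5"
one_sided <- t.test(petal_length, mu = 5, alternative = "less")

two_sided$p.value
one_sided$p.value
```

Because the sample mean lies below 5, the one-sided p-value is half the two-sided one.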

Selecting a test

We test this using a one-sample t-test on the setosa subset of the data.

res <- df %>% 
  filter(Species == "setosa") %>% 
  pull(Petal.Length) %>% 
  t.test(mu = 5)
res

## 
##  One Sample t-test
## 
## data:  .
## t = -38, df = 1, p-value = 0.01675
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  -0.07062047  2.47062047
## sample estimates:
## mean of x 
##       1.2

Well… This is not our final data yet, so we should not be too happy about this result already. And this looks nothing like a report. How can we write up the test and its interpretation so that the text incorporates the results?

Using apastats to describe

The describe.ttest function returns a markdown string formatted for reporting a t-test, so the following produces correctly formatted output whatever the test result turns out to be.

tresult <- apastats::describe.ttest(res)
tresult

## [1] "_t_(1.0) = -38.00, _p_ = .017"

Meh. This is not really helpful either, right? We must include the text as an inline result using `r tresult`. Then we can say:

Setosa iris have petal leaves whose length is statistically significantly different from 5 (t(1.0) = -38.00, p = .017).
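
To see roughly how such a string can be put together, here is a base R sketch. The helper format_ttest is hypothetical and is not how apastats implements describe.ttest; it only mimics the output shown above for a one-sample t-test.

```r
# Hypothetical helper, NOT part of apastats: formats the result of t.test()
# as an APA-style markdown string
format_ttest <- function(res) {
  # APA style drops the leading zero of p-values and reports tiny ones as "< .001"
  p_txt <- if (res$p.value < .001) {
    "_p_ < .001"
  } else {
    paste0("_p_ = ", sub("^0", "", sprintf("%.3f", res$p.value)))
  }
  # res$parameter holds the degrees of freedom, res$statistic the t value
  sprintf("_t_(%.1f) = %.2f, %s", res$parameter, res$statistic, p_txt)
}

# The two setosa petal lengths from the preview sample, typed in by hand
res_demo <- t.test(c(1.3, 1.1), mu = 5)
format_ttest(res_demo)
# "_t_(1.0) = -38.00, _p_ = .017"
```

The underscores are markdown emphasis, so the rendered report shows t and p in italics.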

Preparing for both outcomes

This is nice, but we don’t know whether our sample of \(n=2\) actually represents the population. Therefore, we must prepare for both cases. Here the glue package is helpful in filling in the blanks in pre-written output.

library(glue)


# test the p-value
alpha_level <- .05

# default is not significant
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
} 

phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")

We can now output this using `r phrase`:

Setosa iris have petal leaves whose length is statistically significantly different from 5 (t(1.0) = -38.00, p = .017).
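
Since only one of the two branches shows up with any given data set, it can help to check both wordings before the final data arrive. A minimal sketch: the function significance_phrase is a hypothetical wrapper around the if statement above, and the p-values and statistic strings passed to it are placeholder values, not results.

```r
library(glue)

# Hypothetical helper: wraps the branching from the text so both outcomes can be previewed
significance_phrase <- function(p_value, stat_txt, alpha_level = .05) {
  have_txt <- if (p_value < alpha_level) "have" else "do not have"
  glue("Setosa iris {have_txt} petal leaves whose length is ",
       "statistically significantly different from 5 ({stat_txt}).")
}

# Placeholder inputs, chosen only to trigger each branch
significance_phrase(.017, "_t_(1.0) = -38.00, _p_ = .017")
significance_phrase(.200, "_t_(1.0) = -3.08, _p_ = .200")
```

Reading both sentences once makes it easy to spot wording that only works for the significant case.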

Collecting the final data and running our analysis

Now, we can run the full script again, but with the full data set. I have omitted unnecessary output in the following code chunk and directly construct the output phrase.

# run the test
res <- iris %>% 
  filter(Species == "setosa") %>% 
  pull(Petal.Length) %>% 
  t.test(mu = 5)

# describe results
tresult <- apastats::describe.ttest(res)

# generate phrase
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
} 

# you can use markdown here as well
phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")

The final outcome of our analysis now is:

Setosa iris have petal leaves whose length is statistically significantly different from 5 (t(49.0) = -144.06, p < .001).
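
The final chunk repeats the pilot code almost verbatim. One way to keep the two runs from drifting apart is to wrap the steps in a function and call it once per data set. The sketch below uses base R only; analyse_setosa and the hand-picked pilot rows are hypothetical and stand in for the dplyr pipeline above.

```r
# Hypothetical wrapper so that pilot and final data run through identical code
analyse_setosa <- function(data, mu = 5, alpha_level = .05) {
  # Keep only the setosa petal lengths
  petal_length <- data$Petal.Length[data$Species == "setosa"]
  res <- t.test(petal_length, mu = mu)
  list(
    res = res,
    have_txt = if (res$p.value < alpha_level) "have" else "do not have"
  )
}

# A small hand-picked "pilot" subset (three setosa rows plus two other flowers)
pilot <- analyse_setosa(iris[c(1:3, 51, 101), ])

# The full data set
final <- analyse_setosa(iris)

final$res$parameter  # degrees of freedom: 49
final$have_txt
```

The returned test object can then be handed to apastats::describe.ttest and glue exactly as before.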


André Calero Valdez

Professor of Human-Computer Interaction and Usable Safety Engineering

I am interested in studying the effects of human-algorithm interaction and their impact on safety.