# Using the apastats package to write reproducible reports

## Overview

Writing a research report in RMarkdown allows you to start writing up your analysis before you have collected all the data. This is especially helpful for reports and theses: while you, a web crawler, or a panel are still collecting data, you can already start writing. Writing the results section before the data are in is somewhat challenging, though, because you do not yet know how the tests will turn out. Keep in mind that writing a discussion before the data are collected is a lot harder still.

We’re going to need two things:

• a package that formats output the way we need it, in our case apastats
• packages for string manipulation: stringr and glue

## Installing apastats

You can install the apastats package using the following command:

devtools::install_github('achetverikov/apastats', subdir = 'apastats')
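Before installing anything, it can help to check which of the packages used in this post are already available. The following is a minimal sketch: the vector name `required` and the package list are assumptions assembled from the code in this post, and nothing is installed automatically.

```r
# Packages used in this post (list assembled from the code shown here).
required <- c("dplyr", "knitr", "stringr", "glue", "apastats")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that could not be loaded.
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Anything listed as missing can then be installed by hand, apastats via the command above and the rest from CRAN.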

Now, we can start. Let’s assume we have already collected the following data about iris flowers 😉.

library(dplyr)  # provides %>%, sample_n(), filter(), pull()
library(knitr)  # provides kable()

df <- iris %>% sample_n(5)
df %>% kable()
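
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species    |
|-------------:|------------:|-------------:|------------:|:-----------|
|          5.1 |         3.4 |          1.5 |         0.2 | setosa     |
|          5.7 |         2.8 |          4.5 |         1.3 | versicolor |
|          5.4 |         3.0 |          4.5 |         1.5 | versicolor |
|          6.3 |         2.8 |          5.1 |         1.5 | virginica  |
|          4.7 |         3.2 |          1.6 |         0.2 | setosa     |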
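Because sample_n() draws rows at random, the five rows change every time the report is knitted unless the random seed is fixed. The following is a minimal sketch in base R, so that it runs without extra packages. The helper name `draw_pilot_sample` and the seed value 42 are arbitrary choices for illustration, and the rows it draws will differ from the table above.

```r
# Draw a reproducible pilot sample: fixing the seed makes the draw repeatable.
draw_pilot_sample <- function(data, n, seed = 42) {
  set.seed(seed)  # arbitrary seed, chosen only for illustration
  data[sample(nrow(data), n), , drop = FALSE]
}

df_a <- draw_pilot_sample(iris, 5)
df_b <- draw_pilot_sample(iris, 5)

identical(df_a, df_b)  # same seed, same rows
```

Fixing the seed keeps the numbers in the draft text stable between knits, which makes it easier to spot real changes once the final data replace the pilot sample.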

## The analysis

### Deciding on our hypothesis

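We have found in the literature that the setosa species has a rather small petal length of only about 3-4 cm. Since we are sampling random flowers, we want to test whether the mean petal length (PL) in our sample is also smaller than, let us say, 5 cm. Thus, our null hypothesis would be $$H_0: PL \geq 5$$, with the alternative $$H_1: PL < 5$$. Note that t.test defaults to a two-sided alternative, which is what the code and the reported sentences below use; strictly speaking, they test $$H_0: PL = 5$$.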

### Selecting a test

We would test this using a one-sample t-test on the subset of data.

res <- df %>%
  filter(Species == "setosa") %>%
  pull(Petal.Length) %>%
  t.test(mu = 5)
res
##
##  One Sample t-test
##
## data:  .
## t = -69, df = 1, p-value = 0.009226
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  0.9146898 2.1853102
## sample estimates:
## mean of x
##      1.55
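The output can be checked by hand, and the test can be made one-sided to match a directional hypothesis. The sketch below types in the two setosa petal lengths from the sample table (1.5 and 1.6) directly, which is an assumption made to keep the example self-contained. The `alternative = "less"` argument is a standard option of t.test.

```r
# The two setosa petal lengths from the five-row sample, typed in by hand.
petal_length <- c(1.5, 1.6)
mu0 <- 5

# One-sample t statistic: (sample mean - mu0) / (sd / sqrt(n))
n <- length(petal_length)
t_by_hand <- (mean(petal_length) - mu0) / (sd(petal_length) / sqrt(n))

two_sided <- t.test(petal_length, mu = mu0)                        # H1: mean != 5
one_sided <- t.test(petal_length, mu = mu0, alternative = "less")  # H1: mean < 5

t_by_hand          # agrees with two_sided$statistic
two_sided$p.value
one_sided$p.value  # half the two-sided p-value here, because t is negative
```

With only one degree of freedom the test says very little about the population, which is one more reason to treat the pilot result as a placeholder.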

Well… This is not our final data yet, so we should not be too happy about this result yet. It also looks nothing like a report. How can we write up the test and its interpretation so that the text incorporates the results?

### Using apastats to describe

The describe.ttest function returns a t-test result as markdown formatted for APA-style reporting, so the following generates correctly formatted output whatever the result turns out to be.

tresult <- apastats::describe.ttest(res)
tresult
## [1] "_t_(1.0) = -69.00, _p_ = .009"
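To make clearer what kind of string this is, here is a rough base R sketch that produces the same format for the two tests in this post. It is not the apastats implementation: the helper names `format_p_apa` and `format_ttest_apa` are made up for illustration, and the pilot values 1.5 and 1.6 are typed in from the sample table.

```r
# Format a p-value without the leading zero; very small values become "< .001".
format_p_apa <- function(p) {
  if (p < .001) {
    return("_p_ < .001")
  }
  paste0("_p_ = ", sub("^0", "", sprintf("%.3f", p)))
}

# Build the markdown string from the object returned by t.test().
format_ttest_apa <- function(res) {
  sprintf("_t_(%.1f) = %.2f, %s",
          res$parameter,   # degrees of freedom
          res$statistic,   # t value
          format_p_apa(res$p.value))
}

res_pilot <- t.test(c(1.5, 1.6), mu = 5)
format_ttest_apa(res_pilot)

res_full <- t.test(iris$Petal.Length[iris$Species == "setosa"], mu = 5)
format_ttest_apa(res_full)
```

The underscores are markdown emphasis, so the statistic symbols come out in italics once the string is rendered in the report.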

Meh. This is not really helpful on its own either, right? We must include the formatted result in the running text as an inline R expression, `r tresult`. Then we can say:

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(1.0) = -69.00, _p_ = .009).

### Preparing for both outcomes

This is nice, but we don’t know whether our sample of $$n=2$$ setosa flowers actually represents the population, so the final result may or may not be significant. Therefore, we must prepare for both cases. Here the glue package is helpful for filling in the blanks in pre-written text.

library(glue)

# test the p-value
alpha_level <- .05

# default is not significant
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
}

phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")

This we can now output using: `r phrase`.

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(1.0) = -69.00, _p_ = .009).

## Collecting the final data and running our analysis

Now, we can run the full script again, but with the full data set. I have omitted unnecessary output in the following code chunk and directly construct the output phrase.

# run the test
res <- iris %>%
  filter(Species == "setosa") %>%
  pull(Petal.Length) %>%
  t.test(mu = 5)

# describe results
tresult <- apastats::describe.ttest(res)

# generate phrase
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
}

# you can use markdown here as well
phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")

The final outcome of our analysis now is:

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(49.0) = -144.06, _p_ < .001).
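Both branches of the pre-written sentence can be exercised before the final data arrive by feeding the phrase builder a significant and a non-significant p-value. The following is a minimal sketch, written with base R's sprintf so that it runs without extra packages (glue would work the same way). The function name `significance_phrase` and its arguments are assumptions for illustration.

```r
# Build the report sentence for a given p-value and formatted test result.
significance_phrase <- function(p_value, stat_txt, alpha_level = .05) {
  # pick the wording that matches the outcome of the test
  have_txt <- if (p_value < alpha_level) "have" else "do not have"
  sprintf(
    "> Setosa iris %s petal leaves whose length is statistically significantly different from 5 (%s).",
    have_txt, stat_txt
  )
}

# Try both outcomes with made-up p-values and a placeholder result string.
significance_phrase(0.009, "placeholder result")
significance_phrase(0.200, "placeholder result")
```

Reading both sentences before the data are in is a cheap way to catch wording that only works for the outcome you were hoping for.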

##### André Calero Valdez
###### Junior Research Group Leader

I am the leader of the research group “Digitale Mündigkeit”, which studies the effects of human-algorithm interaction.