# Using the apastats package to write reproducible reports

## Overview

Writing a research report in RMarkdown allows you to start your analysis even before you have collected the data. This is especially helpful for reports and theses: while you, a web crawler, or a panel are still collecting data, you can already start writing. Writing the results before having the data is somewhat challenging, though, and writing a discussion before the data are in is a lot harder still.

We are going to need two things:

- a package that formats output the way we need it, in our case `apastats`
- string manipulation: `stringr` and `glue`

## Installing apastats

You can install the `apastats` package using the following command:

`devtools::install_github('achetverikov/apastats',subdir='apastats')`

Now, we can start. Let’s assume we have already collected the following data about iris flowers 😉.

```
library(dplyr)  # provides %>% and sample_n()
library(knitr)  # provides kable()
df <- iris %>% sample_n(5)
df %>% kable()
```

Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.4 | 1.5 | 0.2 | setosa |
5.7 | 2.8 | 4.5 | 1.3 | versicolor |
5.4 | 3.0 | 4.5 | 1.5 | versicolor |
6.3 | 2.8 | 5.1 | 1.5 | virginica |
4.7 | 3.2 | 1.6 | 0.2 | setosa |
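Because the rows are drawn at random, the table changes every time the document is knit unless the random seed is fixed. The following is a minimal base R sketch of a reproducible draw; the seed value `2024` is an arbitrary assumption, and `sample()` stands in for `sample_n()` here so that the snippet needs no extra packages.

```r
# Fix the seed so the same five rows are drawn on every run.
# The seed value is arbitrary and only chosen for illustration.
set.seed(2024)
df <- iris[sample(nrow(iris), 5), ]

# Drawing again with the same seed gives exactly the same rows.
set.seed(2024)
df_again <- iris[sample(nrow(iris), 5), ]
identical(df, df_again)  # TRUE
```

The same `set.seed()` call placed before `sample_n(5)` should have the same effect, since `dplyr` draws from R's random number generator.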

## The analysis

### Deciding on our hypothesis

We have found in the literature that the *setosa* species has a rather small petal length of only about 3-4 cm. Since we are sampling random flowers, we want to test whether the mean petal length in our sample is also smaller than, let us say, 5 cm. Thus, our null hypothesis would be \(H_0: PL \geq 5\).

### Selecting a test

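We test this using a one-sample t-test on the *setosa* subset of the data. Note that `t.test` is two-sided by default, so the call below tests whether the mean differs from 5; passing `alternative = "less"` would give the directional version.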

```
res <- df %>%
  filter(Species == "setosa") %>%
  pull(Petal.Length) %>%
  t.test(mu = 5)
res
```

```
##
## One Sample t-test
##
## data: .
## t = -69, df = 1, p-value = 0.009226
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
## 0.9146898 2.1853102
## sample estimates:
## mean of x
## 1.55
```
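The output above reports a two-sided alternative, while the question we asked is directional (is the petal length smaller than 5?). As a hedged sketch, the directional variant can be run in base R on the two *setosa* values from the sample table; the variable names `sample_lengths`, `res_two` and `res_less` are chosen here for illustration only.

```r
# The two setosa petal lengths from the sampled table above
sample_lengths <- c(1.5, 1.6)

# Two-sided test (the default), as used in this post
res_two <- t.test(sample_lengths, mu = 5)

# One-sided test: the alternative is that the true mean is below 5
res_less <- t.test(sample_lengths, mu = 5, alternative = "less")

# With a negative t statistic, the one-sided p-value is half the two-sided one
c(two_sided = res_two$p.value, less = res_less$p.value)
```

The rest of this post keeps the two-sided default, which matches the wording "different from 5" used in the report phrases.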

Well… This is not our final data yet, so we should not be too happy about this result already. And this looks nothing like a report. How can we write up the test and its interpretation so that the text incorporates the results?

### Using apastats to describe

The `describe.ttest` function returns correctly formatted markdown for reporting a t-test, so the following always generates output in the right format.

```
tresult <- apastats::describe.ttest(res)
tresult
```

`## [1] "_t_(1.0) = -69.00, _p_ = .009"`

Meh. This is not really helpful either, right? We must include the text result as an inline result using `` `r tresult` ``. Then we can say:

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(1.0) = -69.00, _p_ = .009).
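To get a feeling for what such a formatter has to do, here is a minimal base R sketch. The function `format_ttest` is a hypothetical helper written for this illustration; it is not how `apastats` is implemented and it covers only the one-sample case shown here.

```r
# Hypothetical helper (not part of apastats): turn an htest object
# into an APA-style markdown string.
format_ttest <- function(res) {
  p <- res$p.value
  # APA style drops the leading zero of p and reports tiny values as "< .001"
  p_txt <- if (p < .001) {
    "< .001"
  } else {
    paste("=", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("_t_(%.1f) = %.2f, _p_ %s", res$parameter, res$statistic, p_txt)
}

# The two setosa petal lengths from the sampled table above
res_demo <- t.test(c(1.5, 1.6), mu = 5)
format_ttest(res_demo)
```

In practice `describe.ttest` is the better choice; the sketch only shows that the string is assembled from `res$parameter`, `res$statistic` and `res$p.value`.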

### Preparing for both outcomes

This is nice, but we don’t know whether our sample of \(n=2\) actually represents the population. Therefore, we must prepare for both cases. Here the `glue` package is helpful in filling in the blanks in pre-written output.

```
library(glue)
# significance threshold for the test
alpha_level <- .05
# default wording: not significant
have_txt <- "do not have"
# switch the wording if the p-value is below the threshold
if (res$p.value < alpha_level) {
  have_txt <- "have"
}
phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")
```

This we can now output using `` `r phrase` ``.

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(1.0) = -69.00, _p_ = .009).
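Since only one of the two wordings shows up with the pilot data, it can help to check both branches before the final data arrive. The following is a sketch in base R; `build_phrase` is a hypothetical helper, it uses `paste0` instead of `glue` to stay dependency-free, and the non-significant inputs are placeholders rather than real results.

```r
# Hypothetical helper for illustration; base R only, no glue required.
build_phrase <- function(p_value, stat_txt, alpha_level = .05) {
  have_txt <- if (p_value < alpha_level) "have" else "do not have"
  paste0("> Setosa iris ", have_txt, " petal leaves whose length is ",
         "statistically significantly different from 5 (", stat_txt, ").")
}

# Significant case: the values from the pilot sample above
sig <- build_phrase(.009, "_t_(1.0) = -69.00, _p_ = .009")

# Non-significant case: p-value and text are placeholders, not real results
nonsig <- build_phrase(.40, "placeholder statistics")

cat(sig, nonsig, sep = "\n")
```

Reading both sentences once makes it easy to spot wording that only works for one of the outcomes.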

## Collecting the final data and running our analysis

Now we can run the full script again, this time with the full data set. I have omitted unnecessary output in the following code chunk and construct the output phrase directly.

```
# run the test
res <- iris %>%
  filter(Species == "setosa") %>%
  pull(Petal.Length) %>%
  t.test(mu = 5)
# describe results
tresult <- apastats::describe.ttest(res)
# generate phrase
have_txt <- "do not have"
if (res$p.value < alpha_level) {
  have_txt <- "have"
}
# you can use markdown here as well
phrase <- glue("> Setosa iris {have_txt} petal leaves whose length is ",
               "statistically significantly different from 5 ({tresult}).")
```

The final outcome of our analysis now is:

> Setosa iris have petal leaves whose length is statistically significantly different from 5 (_t_(49.0) = -144.06, _p_ < .001).