---
title: "“Test for Excess Significance” Challenge"
author: "Richard D. Morey"
date: "29 March 2015"
output: html_document
---
First we load needed libraries.
```{r}
library(knitr) ## for kable, nice HTML tables
```
### Settings
The following R code sets up the simulation, as agreed.
```{r}
M = 1000000 # Number of simulations
## Settings as specified by Greg Francis
random.seed = 19374013 # Random seed
power = .5
set.seed(random.seed)
```
### Utility stuff
The following R functions are utility functions that will perform different tasks in the simulation. They are broken up into separate functions for clarity.
```{r}
## This function counts the number of sig., non-sig, and total studies in any set.
## studies is a vector of TRUEs and FALSEs (indicating significance or not)
count.studies = function(studies)
  c(significant = as.integer(sum(studies)),
    nonsignificant = as.integer(sum(!studies)),
    total = length(studies))
## Perform an experiment with given power
## return TRUE if significant, FALSE otherwise
do.experiment = function(power)
  as.logical(rbinom(1, 1, power))
## Set up a matrix in which to store the results
whole.literature = matrix(as.integer(-1), M, 3)
colnames(whole.literature) = c("Significant","Nonsignificant","Total")
```
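As a quick sanity check (not part of the original setup), we can run `count.studies` on a hand-made vector; the function is restated here so the chunk runs on its own:
```{r}
## Restated from above so this chunk is self-contained
count.studies = function(studies)
  c(significant = as.integer(sum(studies)),
    nonsignificant = as.integer(sum(!studies)),
    total = length(studies))

count.studies(c(TRUE, FALSE, TRUE)) # significant = 2, nonsignificant = 1, total = 3
```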
## Relevant functions for the simulation
This function is the critical function in the simulation. It represents the experimenter's decision to publish. For our first run, the decision is based on the total number of nonsignificant studies (in this case, a single nonsignificant study spurs a decision to publish). We will change this later to show how the results change.
```{r}
## This function returns TRUE when "researchers"
## decide to publish all studies, and FALSE otherwise
decide.to.publish = function(studies){
  any(!studies)
}
```
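To make the stopping rule concrete, here is a small illustration (not part of the simulation proper); the rule is restated so the chunk is self-contained:
```{r}
## Restated from above so this chunk is self-contained
decide.to.publish = function(studies){
  any(!studies)
}

decide.to.publish(logical(0))      # FALSE: no studies run yet, keep going
decide.to.publish(c(TRUE, TRUE))   # FALSE: all significant so far, keep going
decide.to.publish(c(TRUE, FALSE))  # TRUE: a nonsignificant study triggers publication
```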
### Do the simulation
This code actually performs the simulation.
```{r}
## This is very inefficient, but it is written for clarity.
for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study) ## Add new experiment to already-done studies
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}
```
## Simulation results
We can now examine the simulation results. First, we look at all studies. How many were significant, and how many were not?
```{r}
## How many studies are significant vs nonsignificant?
colSums(whole.literature)
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
```
Notice that these are very close to what we'd expect given the power of the experiments. No studies are missing, so about half of the studies --- marginalized over the length of the set --- show significant results. Also note that an effect size estimate based on these experiments will be unbiased (so long as the statistic is itself unbiased), since all studies are published.
However, when we look at a specific $n$ (that is, total number of studies in a set, as considered by I&T's TES), we see a very different story.
```{r}
## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)
n.table = data.frame("Total studies" = rownames(n.table),
                     "Mean nonsig. studies" = n.table,
                     "Expected by TES" = as.numeric(rownames(n.table)) * power,
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL
```
The table should be self-explanatory.
```{r}
kable(n.table)
```
For every set, the expected number of nonsignificant studies is precisely 1 under this decision rule. Under this particular decision rule, in order to know whether there is publication bias, *one must examine the distribution of $n$ itself*, which the TES does not do.
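Under this stop-at-the-first-nonsignificant-study rule, the set length $n$ is geometric: a set has length $k$ exactly when the first $k-1$ studies are significant and the $k$th is not, so $P(n = k) = \mathrm{power}^{k-1}(1 - \mathrm{power})$. The following sketch (independent of the simulation above) computes this distribution and its mean; under this rule, that distribution is where any publication bias would show up:
```{r}
power = .5
k = 1:50                           # truncate the infinite support; the tail is negligible
p.n = power^(k - 1) * (1 - power)  # P(n = k): k - 1 significant studies, then one nonsignificant
sum(p.n)                           # ~1: a set is always eventually published
sum(k * p.n)                       # E[n] = 1 / (1 - power) = 2
## Every published set contains exactly one nonsignificant study, whatever its
## length, so E[nonsignificant | n] = 1 for all n.
```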
## Redo with binomial assumption
With a simple change of the decision rule, we can make the results look as I&T would expect. We need only change the publishing decision rule...
```{r}
decide.to.publish = function(studies){
  length(studies) == 5
}
```
...and redo the simulation:
```{r}
## Reset the seed
set.seed(random.seed)
for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study)
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}
```
## Simulation results
```{r}
## How many studies are significant vs nonsignificant?
colSums(whole.literature)
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
```
Notice that these are very close to what we'd expect by the power of the experiments. No studies are missing, so about half of the studies show significant results.
```{r}
## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)
n.table = data.frame("Total studies" = rownames(n.table),
                     "Mean nonsig. studies" = n.table,
                     "Expected by TES" = as.numeric(rownames(n.table)) * power,
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL
```
Now we see what we'd expect to see from I&T:
```{r}
kable(n.table)
```
Note that there are no studies with length of anything *other* than $n=5$, because the decision rule stipulated that $n$ must be 5 to publish.
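Under this fixed-$n$ rule, the nonsignificant count in each set is Binomial($5$, $1 - \mathrm{power}$), which is exactly the sampling model the TES assumes. A quick sketch of the implied mean and standard deviation, for comparison with the table above:
```{r}
power = .5
n = 5
n * (1 - power)               # expected nonsignificant studies per set: 2.5
sqrt(n * power * (1 - power)) # SD of the nonsignificant count: ~1.118
```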
## Conclusion
The expected number of nonsignificant studies in a set of $n$ differs according to how authors decide to publish, even when all studies are published and no questionable research practices are present. How one chooses to look for publication bias must necessarily depend on this decision rule; under the first decision rule discussed here, one must look at the distribution of $n$ itself; under I&T's implicit decision rule, one can look at the number of non-significant studies.
Given that we don't know any authors' decision rule, and that decision rules are likely to be extremely heterogeneous, there doesn't seem to be a way to correct the TES for these unknown decision rules (until, perhaps, pre-registration becomes popular...).
## Details
For completeness, here is information about the R session.
```{r}
sessionInfo()
```