First we load needed libraries.

`library(knitr) ## for kable, nice HTML tables`

The following R code sets up the simulation, as agreed.

```
M = 1000000 # Number of simulations
## Settings as specified by Greg Francis
random.seed = 19374013 # Random seed
power = .5
set.seed(random.seed)
```

The following R functions are utility functions that will perform different tasks in the simulation. They are broken up into separate functions for clarity.

```
## This function counts the number of sig., non-sig, and total studies in any set.
## studies is a vector of TRUEs and FALSEs (indicating significance or not)
count.studies = function(studies)
  c(significant = as.integer(sum(studies)),
    nonsignificant = as.integer(sum(!studies)),
    total = length(studies))

## Perform an experiment with given power
## return TRUE if significant, FALSE otherwise
do.experiment = function(power)
  as.logical(rbinom(1, 1, power))
## Set up a matrix in which to store the results
whole.literature = matrix(as.integer(-1), M, 3)
colnames(whole.literature) = c("Significant","Nonsignificant","Total")
```
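As a quick illustration of these helpers (a small check added here, not part of the original script), here is what `count.studies` returns on a toy set:

```r
## count.studies tallies significant, nonsignificant, and total studies
count.studies = function(studies)
  c(significant = as.integer(sum(studies)),
    nonsignificant = as.integer(sum(!studies)),
    total = length(studies))

## A set with two significant studies and one nonsignificant one
count.studies(c(TRUE, FALSE, TRUE))
## significant = 2, nonsignificant = 1, total = 3
```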

This function is the critical one in the simulation: it represents the experimenter’s decision to publish. For our first run, the decision is based on the total number of nonsignificant studies (in this case, a single nonsignificant study spurs the decision to publish). We will change this later to show how the results change.

```
## This function returns TRUE when "researchers"
## decide to publish all studies, and FALSE otherwise
decide.to.publish = function(studies){
  any(!studies)
}
```
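To see the rule in action (a small demonstration added here, not part of the original script): the set keeps growing until it contains at least one nonsignificant result.

```r
## Publish as soon as the set contains a nonsignificant study
decide.to.publish = function(studies){
  any(!studies)
}

decide.to.publish(logical(0))      # FALSE: no studies yet, keep running them
decide.to.publish(c(TRUE, TRUE))   # FALSE: all significant so far, keep going
decide.to.publish(c(TRUE, FALSE))  # TRUE: one nonsignificant study, so publish
```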

This code actually performs the simulation.

```
## This is very inefficient, but it is written for clarity.
for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study)  ## Add new experiment to already-done studies
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}
```

We can now examine the simulation results. First, we look at all studies. How many were significant, and how many were not?

```
## How many studies are significant vs nonsignificant?
colSums(whole.literature)
```

```
## Significant Nonsignificant Total
## 1000661 1000000 2000661
```

```
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
```

```
## Significant Nonsignificant Total
## 0.5001652 0.4998348 1.0000000
```

Notice that these are very close to what we’d expect given the power of the experiments. No studies are missing, so about half of the studies — marginalized over the length of the set — show significant results. Also note that an effect size estimate based on these experiments will be unbiased (so long as the statistic is itself unbiased), since all studies are published.

However, when we look at a specific \(n\) (that is, total number of studies in a set, as considered by I&T’s TES), we see a very different story.

```
## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)
n.table = data.frame("Total studies" = rownames(n.table),
                     "Mean nonsig. studies" = n.table,
                     "Expected by TES" = as.numeric(rownames(n.table)) * (1 - power),
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL
```

The table should be self-explanatory.

`kable(n.table)`

Total.studies | Mean.nonsig..studies | Expected.by.TES | SD.nonsig..studies | Count |
---|---|---|---|---|
1 | 1 | 0.5 | 0 | 499917 |
2 | 1 | 1.0 | 0 | 249690 |
3 | 1 | 1.5 | 0 | 125269 |
4 | 1 | 2.0 | 0 | 62570 |
5 | 1 | 2.5 | 0 | 31309 |
6 | 1 | 3.0 | 0 | 15640 |
7 | 1 | 3.5 | 0 | 7718 |
8 | 1 | 4.0 | 0 | 3958 |
9 | 1 | 4.5 | 0 | 1986 |
10 | 1 | 5.0 | 0 | 975 |
11 | 1 | 5.5 | 0 | 498 |
12 | 1 | 6.0 | 0 | 237 |
13 | 1 | 6.5 | 0 | 127 |
14 | 1 | 7.0 | 0 | 50 |
15 | 1 | 7.5 | 0 | 24 |
16 | 1 | 8.0 | 0 | 17 |
17 | 1 | 8.5 | 0 | 5 |
18 | 1 | 9.0 | 0 | 6 |
19 | 1 | 9.5 | 0 | 2 |
20 | 1 | 10.0 | NA | 1 |
22 | 1 | 11.0 | NA | 1 |

For every set, the expected number of nonsignificant studies is precisely 1 under this decision rule. Under this particular decision rule, in order to know whether there is publication bias, *one must examine the distribution of \(n\) itself*, which the TES does not do.
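One way to see what the distribution of \(n\) carries (a sanity check added here, not part of the original analysis): under this stopping rule, each set runs until its first nonsignificant study, so the set length \(n\) is geometrically distributed, with \(P(n = k) = \mathrm{power}^{k-1}(1 - \mathrm{power})\). The expected counts line up with the Count column in the table above:

```r
M = 1000000
power = .5
k = 1:6
## P(n = k): k-1 significant studies followed by the first nonsignificant one
expected.counts = M * power^(k - 1) * (1 - power)
observed.counts = c(499917, 249690, 125269, 62570, 31309, 15640) # from the table
round(expected.counts) # 500000 250000 125000 62500 31250 15625
```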

With a simple change of the decision rule, we can make the results look as I&T would expect. We need only change the publishing decision rule…

```
decide.to.publish = function(studies){
  length(studies) == 5
}
```

…and redo the simulation:

```
## Reset the seed
set.seed(random.seed)
for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study)  ## Add new experiment to already-done studies
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}
```

```
## How many studies are significant vs nonsignificant?
colSums(whole.literature)
```

```
## Significant Nonsignificant Total
## 2501903 2498097 5000000
```

```
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
```

```
## Significant Nonsignificant Total
## 0.5003806 0.4996194 1.0000000
```

Notice that these are very close to what we’d expect by the power of the experiments. No studies are missing, so about half of the studies show significant results.

```
## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)
n.table = data.frame("Total studies" = rownames(n.table),
                     "Mean nonsig. studies" = n.table,
                     "Expected by TES" = as.numeric(rownames(n.table)) * (1 - power),
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL
```

Now we see what we’d expect to see from I&T:

`kable(n.table)`

Total.studies | Mean.nonsig..studies | Expected.by.TES | SD.nonsig..studies | Count |
---|---|---|---|---|
5 | 2.498097 | 2.5 | 1.118572 | 1e+06 |

Note that there are no studies with length of anything *other* than \(n=5\), because the decision rule stipulated that \(n\) must be 5 to publish.
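Under this fixed-\(n\) rule, the number of nonsignificant studies in a set is simply Binomial(5, \(1 - \mathrm{power}\)), which reproduces the mean and SD in the table (a sanity check added here, not part of the original analysis):

```r
power = .5
n = 5
## Binomial mean and SD for the count of nonsignificant studies per set
mean.nonsig = n * (1 - power)                 # 2.5
sd.nonsig = sqrt(n * power * (1 - power))     # about 1.118
c(mean = mean.nonsig, sd = sd.nonsig)
```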

The expected number of nonsignificant studies in a set of \(n\) differs according to how authors decide to publish, even when all studies are published and no questionable research practices are present. How one looks for publication bias must therefore depend on this decision rule: under the first decision rule discussed here, one must look at the distribution of \(n\) itself; under I&T’s implicit decision rule, one can look at the number of nonsignificant studies.

Given that we don’t know any authors’ decision rule, and that decision rules are likely to be extremely heterogeneous, there doesn’t seem to be a way to correct the TES for these unknown decision rules (until, perhaps, pre-registration becomes popular…).

For completeness, here is information about the R session.

`sessionInfo()`

```
## R version 3.1.3 (2015-03-09)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.2 (Yosemite)
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.9
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8 evaluate_0.5.5 formatR_1.0 htmltools_0.2.6
## [5] rmarkdown_0.4.2 stringr_0.6.2 tools_3.1.3 yaml_2.1.13
```