First, we load the needed libraries.

library(knitr) ## for kable, nice HTML tables

Settings

The following R code sets up the simulation, as agreed.

M = 1000000    # Number of simulations

## Settings as specified by Greg Francis
random.seed =  19374013 # Random seed
power = .5

set.seed(random.seed)

Utility stuff

The following R utility functions perform the various tasks in the simulation. They are broken into separate functions for clarity.

## This function counts the number of significant, nonsignificant, and total studies in any set.
## studies is a vector of TRUEs and FALSEs (indicating significance or not)
count.studies = function(studies)
  c(significant = as.integer(sum(studies)), 
    nonsignificant = as.integer(sum(!studies)), 
    total = length(studies))

## Perform an experiment with given power
## return TRUE if significant, FALSE otherwise
do.experiment = function(power)
  as.logical(rbinom(1,1,power))

## Set up a matrix in which to store the results
whole.literature = matrix(as.integer(-1), M, 3)
colnames(whole.literature) = c("Significant","Nonsignificant","Total")
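
As a quick illustration, these utilities can be exercised directly:

count.studies(c(TRUE, FALSE, TRUE)) ## 2 significant, 1 nonsignificant, 3 total
do.experiment(power)                ## TRUE or FALSE, each with probability .5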

Relevant functions for the simulation

This function is the critical function in the simulation. It represents the experimenter’s decision to publish. For our first run, the decision is based on the total number of nonsignificant studies (in this case, a single nonsignificant study spurs a decision to publish). We will change this later to show how the results change.

## This function returns TRUE when "researchers"
## decide to publish all studies, and FALSE otherwise
decide.to.publish = function(studies){
  any(!studies)
}
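
For example, a set of all-significant studies is not yet published under this rule, while a set containing any nonsignificant study is:

decide.to.publish(c(TRUE, TRUE))        ## FALSE: all significant so far, keep running studies
decide.to.publish(c(TRUE, TRUE, FALSE)) ## TRUE: a nonsignificant result has appeared, so publish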

Do the simulation

This code actually performs the simulation.

## This is very inefficient, but it is written for clarity.

for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet 
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study) ## Add new experiment to already-done studies
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}

Simulation results

We can now examine the simulation results. First, we look at all studies. How many were significant, and how many were not?

## How many studies are significant vs nonsignificant?
colSums(whole.literature)
##    Significant Nonsignificant          Total 
##        1000661        1000000        2000661
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
##    Significant Nonsignificant          Total 
##      0.5001652      0.4998348      1.0000000

Notice that these are very close to what we’d expect given the power of the experiments. No studies are missing, so about half of the studies, marginalized over the length of the set, show significant results. Also note that an effect size estimate based on these experiments will be unbiased (so long as the statistic is itself unbiased), since all studies are published.
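
To illustrate the unbiasedness point, here is a minimal sketch, separate from the main simulation; the true effect and per-study sample size below are arbitrary, hypothetical values chosen only for illustration.

## Minimal sketch of the unbiasedness point (hypothetical values, illustration only)
true.d = 0.3       ## hypothetical true standardized effect
n.per.study = 20   ## hypothetical sample size per study
estimates = replicate(10000, mean(rnorm(n.per.study, mean = true.d, sd = 1)))
mean(estimates)    ## close to 0.3, because no study is withheld from publication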

However, when we look at a specific \(n\) (that is, the total number of studies in a set, as considered by I&T’s TES), we see a very different story.

## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)

n.table = data.frame("Total studies" = rownames(n.table), 
                     "Mean nonsig. studies" = n.table, 
                     "Expected by TES" = as.numeric(rownames(n.table)) * power,
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL

The table shows, for each set size that occurred in the simulation, the mean number of nonsignificant studies, the number the TES would expect under its binomial assumption, the standard deviation of the nonsignificant counts, and how many simulated sets had that size.

kable(n.table)
Total.studies  Mean.nonsig..studies  Expected.by.TES  SD.nonsig..studies   Count
            1                     1              0.5                   0  499917
            2                     1              1.0                   0  249690
            3                     1              1.5                   0  125269
            4                     1              2.0                   0   62570
            5                     1              2.5                   0   31309
            6                     1              3.0                   0   15640
            7                     1              3.5                   0    7718
            8                     1              4.0                   0    3958
            9                     1              4.5                   0    1986
           10                     1              5.0                   0     975
           11                     1              5.5                   0     498
           12                     1              6.0                   0     237
           13                     1              6.5                   0     127
           14                     1              7.0                   0      50
           15                     1              7.5                   0      24
           16                     1              8.0                   0      17
           17                     1              8.5                   0       5
           18                     1              9.0                   0       6
           19                     1              9.5                   0       2
           20                     1             10.0                  NA       1
           22                     1             11.0                  NA       1

For every set, the number of nonsignificant studies is exactly 1 under this decision rule, whatever the set size. Under this particular decision rule, in order to know whether there is publication bias, one must examine the distribution of \(n\) itself, which the TES does not do.
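
As a quick check of this point, the set size \(n\) under the stop-at-the-first-nonsignificant-result rule should follow a geometric distribution, \(P(n = k) = \mathrm{power}^{\,k-1}(1 - \mathrm{power})\), and the simulated counts can be compared against it directly (a small sketch using the objects defined above):

## Compare the simulated distribution of set sizes with the geometric
## distribution implied by the stop-at-first-nonsignificant decision rule
observed = table(whole.literature[, "Total"]) / M
k = as.numeric(names(observed))
expected = power^(k - 1) * (1 - power)
round(cbind(n = k, observed = as.vector(observed), expected = expected), 4)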

Redo with binomial assumption

With a simple change to the decision rule, we can make the results look as I&T would expect. We need only change the publishing rule…

decide.to.publish = function(studies){
  length(studies) == 5
}

…and redo the simulation

## Reset the seed
set.seed(random.seed)

for( m in 1:M ){
  studies = logical(0) # We haven't done any studies yet 
  while(!decide.to.publish(studies)){ # As long as we haven't decided to publish...
    new.study = do.experiment(power) ## Perform new experiment
    studies = c(studies, new.study)
  }
  ## We will only get here when we decide to publish
  ## Publish all studies in row m
  whole.literature[m, ] = count.studies(studies)
}

Simulation results

## How many studies are significant vs nonsignificant?
colSums(whole.literature)
##    Significant Nonsignificant          Total 
##        2501903        2498097        5000000
## Proportions
colSums(whole.literature) / sum(whole.literature[,"Total"])
##    Significant Nonsignificant          Total 
##      0.5003806      0.4996194      1.0000000

Notice that these are very close to what we’d expect given the power of the experiments. No studies are missing, so about half of the studies show significant results.

## What is the expected number of nonsignificant studies for each n?
n.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], mean)
sd.table = tapply(whole.literature[,"Nonsignificant"], whole.literature[,"Total"], sd)

n.table = data.frame("Total studies" = rownames(n.table), 
                     "Mean nonsig. studies" = n.table, 
                     "Expected by TES" = as.numeric(rownames(n.table)) * power,
                     "SD nonsig. studies" = sd.table,
                     "Count" = as.vector(table(whole.literature[,"Total"])))
rownames(n.table) = NULL

Now we see what I&T would expect:

kable(n.table)
Total.studies  Mean.nonsig..studies  Expected.by.TES  SD.nonsig..studies  Count
            5              2.498097              2.5            1.118572  1e+06

Note that there are no sets of any size other than \(n=5\), because the decision rule stipulated that \(n\) must be 5 to publish.
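
As a further check, the number of nonsignificant studies per set under this fixed-\(n\) rule should follow a Binomial(5, 1 - power) distribution, with mean \(5 \times .5 = 2.5\) and standard deviation \(\sqrt{5 \times .5 \times .5} \approx 1.118\), matching the table above (again a small sketch using the objects defined above):

## Compare the simulated distribution of nonsignificant counts per set with
## the Binomial(5, 1 - power) distribution implied by the fixed-n rule
observed = table(whole.literature[, "Nonsignificant"]) / M
counts = as.numeric(names(observed))
expected = dbinom(counts, size = 5, prob = 1 - power)
round(cbind(nonsignificant = counts, observed = as.vector(observed), expected = expected), 4)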

Conclusion

The expected number of nonsignificant studies in a set of \(n\) differs according to how authors decide to publish, even when all studies are published and no questionable research practices are present. How one chooses to look for publication bias necessarily depends on this decision rule: under the first decision rule discussed here, one must look at the distribution of \(n\) itself; under I&T’s implicit decision rule, one can look at the number of nonsignificant studies.

Given that we don’t know any authors’ decision rule, and that decision rules are likely to be extremely heterogeneous, there doesn’t seem to be a way to correct the TES for these unknown decision rules (until, perhaps, pre-registration becomes popular…).

Details

For completeness, here is information about the R session.

sessionInfo()
## R version 3.1.3 (2015-03-09)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.2 (Yosemite)
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.9
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.8    evaluate_0.5.5  formatR_1.0     htmltools_0.2.6
## [5] rmarkdown_0.4.2 stringr_0.6.2   tools_3.1.3     yaml_2.1.13