A comprehensive analysis synthesizing effects across diverse organisms and responses:
Effects across taxonomic groups:
Major groups (animals, plants, microorganisms)
Subgroups (e.g., vertebrates vs invertebrates)
Different response variables (growth, reproduction, behavior, biomarkers)
Additional controls (subsets of data or moderators/components added to the model)
Phylogenetic corrections (built in as a random effect; some of the species are not available in the Open Tree of Life)
Random effects for study and pesticide identity
Different correlation structures
Field vs laboratory conditions
Realistic exposure scenarios
Taxonomic groups (i.e., animals, plants and microorganisms) were treated as fixed effects along with the moderator variables (i.e., pesticide class, study type, climatic zone, exposure medium, pesticide type, and number of added pesticide application rates)
Model comparison/selection is done by comparing base models against models with added moderators, testing interaction effects, and using likelihood-ratio tests (LRT) between models.
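The LRT logic can be sketched in base R (a minimal illustration on simulated data, not the authors' metafor code; the variable names and numbers are hypothetical):

```r
## Minimal base-R sketch of base-model vs. moderator-model comparison via LRT.
## The paper fits rma.mv models in metafor; here plain lm() stands in to show
## the mechanics only.
set.seed(1)
n <- 200
moderator <- factor(sample(c("lab", "field"), n, replace = TRUE))
lnRR <- rnorm(n, mean = ifelse(moderator == "lab", -0.4, -0.1), sd = 0.5)

base_fit <- lm(lnRR ~ 1)          # base model: overall mean effect only
mod_fit  <- lm(lnRR ~ moderator)  # model with the moderator added

## LRT statistic: twice the log-likelihood difference,
## compared to a chi-squared with df = number of extra parameters (here 1)
lrt  <- as.numeric(2 * (logLik(mod_fit) - logLik(base_fit)))
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)
```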
The authors also try to validate the common pattern from the huge dataset across different subsets and conditions.
Most importantly for pesticides:
The responses of different taxonomic groups, and of their growth, reproduction, behaviour and biomarkers, to recommended field application rates are presented in Supplementary Tables 10–14, and the responses to maximum environmentally relevant concentrations are presented in Supplementary Tables 17–22.
In other cases, data generated for dose-response curves, where strong effects are expected by design, can really skew the full picture.
While this meta-analysis provides valuable broad patterns, the effect sizes should be interpreted cautiously, especially for risk assessment purposes. The most meaningful insights likely come from the subset of studies using environmentally relevant concentrations under realistic exposure conditions.
The paper defines a modified lnRR, but we do not know the minimum biologically relevant effect size, and one cannot expect a pesticide to have no effect across all non-target organisms: an insecticide applied at a sufficient rate should kill insects. Are there any other confounding factors for realistic-exposure field studies?
Critical Points for Me
I would like to look at distribution of effects not just means.
Sample size is very big: any mean value could be significant just due to sheer size, and any moderator category could likewise be significant because the large N shrinks the standard error.
No association with application rate is very suspicious.
Original lnRR vs. the modified version.
Heterogeneity. I would also like to see how the assumptions behind meta-analysis are validated with such huge, heterogeneous data.
I would also expect pesticides to have negative impacts after all; that is why they are needed. Insecticides must have some effect on insects at a certain concentration. In particular, lab tests designed to estimate a full dose-response curve are included in the analysis, which will strongly skew the picture.
The effect size cannot be translated into something whose biological relevance one can understand.
I am not convinced by the way they addressed publication bias.
Additionally, some minor issues:
- much terminology is unclear, especially in the additional information (e.g., pesticide species as a random effect?)
- func.R, containing the functions used to calculate lnRR in the analysis, is missing
- why the arcsine-square-root transformation of percentage data?
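On the sample-size point above, a quick simulation (my own, not from the paper) shows that with a very large N even a biologically trivial mean effect comes out statistically "significant":

```r
## With very large N, the standard error of the mean shrinks so far that a
## negligible true effect (-0.01 on the lnRR scale) is highly significant.
set.seed(42)
effects <- rnorm(2e5, mean = -0.01, sd = 0.5)  # tiny true effect, lots of noise
fit <- t.test(effects)
fit$p.value    # tiny p-value despite the negligible effect
mean(effects)  # the estimate itself stays close to -0.01
```

This is why looking at the distribution of effects (and their magnitude), not just whether means differ from zero, matters here.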
In details, I have questions in these aspects:
Statistical and Methodological Issues:
Use of (modified) log response ratios (lnRR) across vastly different types of responses
Significant publication bias was detected but not fully addressed; the number given comes from an unfair calculation. It does not mean the true effect is not 0; it means that, to nullify this effect, many more small-effect studies would be needed to balance the larger negative observations/publications.
Missing variance data required imputation; did the authors use the same imputation across all lnRRs, or have I understood wrongly?
Combining different exposure metrics and routes
Potential pseudoreplication in the data; the authors addressed the repeated use of controls with a variance structure that seems to be designed arbitrarily?
The exploration of effect-size heterogeneity could be more thorough; right now I do not really understand how it is addressed, apart from the CV calculation.
Complex interactions between variables are not really explored; it seems moderators are added and removed one by one. Apart from the categorical summaries, no other confounding variables, perhaps recorded in the original studies, seem to be included in this meta-analysis.
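On the shared-control point: the standard remedy is an explicit covariance between effect sizes computed against the same control group, rather than an ad hoc variance structure. A minimal sketch, assuming the commonly used shared-control covariance formula for lnRR (cov = sd_c^2 / (n_c * mean_c^2)); all numbers are hypothetical:

```r
## Variance of a lnRR from treatment (t) and control (c) summary statistics
lnRR_var <- function(m_t, sd_t, n_t, m_c, sd_c, n_c) {
  sd_t^2 / (n_t * m_t^2) + sd_c^2 / (n_c * m_c^2)
}

## Two treatments compared against one shared control group
m_c <- 10; sd_c <- 2; n_c <- 8
v1 <- lnRR_var(7, 2.1, 8, m_c, sd_c, n_c)
v2 <- lnRR_var(5, 1.8, 8, m_c, sd_c, n_c)

## Off-diagonal covariance induced by reusing the same control
cov12 <- sd_c^2 / (n_c * m_c^2)
V <- matrix(c(v1, cov12, cov12, v2), 2, 2)  # block of the full V matrix
```

Such a V matrix can then be passed to a multilevel model (e.g. via the V argument of metafor's rma.mv) instead of treating the rows as independent.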
Data Integration Challenges:
Combining very different types of biological responses
Mixing lab and field studies
Taking absolute values for biomarker data loses directional information
Arbitrary standardization of doses using relative scaling
Variable quality of included studies
Ecological Context Limitations:
Limited consideration of population/community level effects
Complex species interactions not fully explored (only phylogenetic trees)
Temporal aspects poorly addressed (as independent data points)
Real-world relevance of many included studies unclear
Causality and Mechanistic Understanding:
Cannot establish direct causation (the definition of the “true effect” is unclear to me, and a relevant effect size is also hard to define across such heterogeneous responses from different scenario combinations).
Complex interactions between factors not fully explored
Exposure Assessment:
Different exposure routes and durations combined
May not reflect real-world exposure patterns
Arbitrary standardization of doses
These limitations should be considered when interpreting the conclusions about pesticide effects on non-target organisms.
Specifically on Publication Bias
The authors address publication bias by:
Regression test on full dataset (20,212 observations from 1,705 articles)
a regression coefficient between the effect size and the sample variance was evaluated
(regression test value = -24.879, P < 0.001)
Trim-and-fill method to assess result robustness (the paper attributes it to Egger et al., but it is usually credited to Duval & Tweedie)
(Supplementary Table 2), used together with regression tests and Rosenthal’s fail-safe N
Creates a funnel plot of study effect sizes vs their standard errors
Looks for asymmetry in the funnel plot that suggests missing studies
“Trims” off asymmetric studies to get a new estimate
“Fills” in missing studies by adding mirror images of the trimmed studies
Recalculates the overall effect size including these imputed studies
Residual analysis from mixed-effects models (Egger’s Test)
regression coefficients between residuals and the sampling variances.
Rosenthal fail-safe numbers calculation to the full dataset
Found a fail-safe N score of 11,459,423 (much larger than threshold of 101,070)
This suggests results are highly stable even accounting for potential publication bias
The high fail-safe N means an extremely large number of null studies would be needed to overturn their findings
Trim and fill method
The trim and fill method provides an intuitive way to detect and correct for potential publication bias in meta-analyses. Here’s why:
The Basic Principle:
In an unbiased literature, study results should be symmetrically distributed around the true effect size
Larger studies (more precise) cluster near the true effect
Smaller studies (less precise) spread out more evenly on both sides
This creates a symmetric funnel shape when plotted
Why Symmetry Indicates No Publication Bias:
Symmetry suggests all results had equal chance of being published
Both positive and negative findings are represented
No systematic suppression of certain types of results
More likely to represent the true distribution of effects
Provides more reliable estimate of the true effect size
How Asymmetry Indicates Publication Bias:
Usually shows up as “missing” studies on one side
Typically negative/null results from smaller studies are missing
Creates a skewed funnel plot
Suggests systematic exclusion of certain findings
Often biases results toward positive/significant effects
The Correction Process:
Identifies asymmetry in the funnel plot
“Trims” off asymmetric studies
Re-estimates the center
“Fills” in missing studies with mirror images
Provides adjusted effect size estimate
So symmetry is good because it suggests we’re seeing the complete picture of research results, not just selected positive findings. The trim and fill method helps identify and correct for this potential bias.
```r
library(meta)
data(Fleiss93)
m1 <- metabin(event.e, n.e, event.c, n.c, data = Fleiss93, sm = "OR")
tf1 <- trimfill(m1)
summary(tf1)
#>                 OR           95%-CI %W(random)
#> 1           0.7197 [0.4890; 1.0593]        6.8
#> 2           0.6808 [0.4574; 1.0132]        6.5
#> 3           0.8029 [0.6065; 1.0629]       10.8
#> 4           0.8007 [0.4863; 1.3186]        4.5
#> 5           0.7981 [0.5526; 1.1529]        7.4
#> 6           1.1327 [0.9347; 1.3728]       16.4
#> 7           0.8950 [0.8294; 0.9657]       26.8
#> Filled: 5   1.0466 [0.7246; 1.5118]        7.4
#> Filled: 1   1.1607 [0.7886; 1.7084]        6.8
#> Filled: 2   1.2271 [0.8244; 1.8263]        6.5
#>
#> Number of studies: k = 10 (with 3 added studies)
#> Number of observations: o = 31987 (o.e = 16369, o.c = 15618)
#> Number of events: e = 4775
#>
#>                          OR           95%-CI     z p-value
#> Random effects model 0.9228 [0.8228; 1.0350] -1.37  0.1699
#>
#> Quantifying heterogeneity (with 95%-CIs):
#> tau^2 = 0.0113 [0.0000; 0.1145]; tau = 0.1061 [0.0000; 0.3384]
#> I^2 = 37.4% [0.0%; 70.1%]; H = 1.26 [1.00; 1.83]
#>
#> Test of heterogeneity:
#>     Q d.f. p-value
#> 14.37    9  0.1099
#>
#> Details of meta-analysis methods:
#> - Inverse variance method
#> - Restricted maximum-likelihood estimator for tau^2
#> - Q-Profile method for confidence interval of tau^2 and tau
#> - Calculation of I^2 based on Q
#> - Trim-and-fill method to adjust for funnel plot asymmetry (L-estimator)
funnel(tf1)
```
```r
trimfill(m1$TE, m1$seTE, sm = m1$sm)
#> Number of studies: k = 10 (with 3 added studies)
#>
#>                          OR           95%-CI     z p-value
#> Random effects model 0.9228 [0.8228; 1.0350] -1.37  0.1699
#>
#> Quantifying heterogeneity (with 95%-CIs):
#> tau^2 = 0.0113 [0.0000; 0.1145]; tau = 0.1061 [0.0000; 0.3384]
#> I^2 = 37.4% [0.0%; 70.1%]; H = 1.26 [1.00; 1.83]
#>
#> Test of heterogeneity:
#>     Q d.f. p-value
#> 14.37    9  0.1099
#>
#> Details of meta-analysis methods:
#> - Inverse variance method
#> - Restricted maximum-likelihood estimator for tau^2
#> - Q-Profile method for confidence interval of tau^2 and tau
#> - Calculation of I^2 based on Q
#> - Trim-and-fill method to adjust for funnel plot asymmetry (L-estimator)
```
Note that the Y-axis (Standard Error) is inverted, meaning:
0 at the top (most precise studies)
Increasing values going down (less precise studies)
Egger’s test
Tests the significance of the regression coefficient of the effect sizes (or residuals) on their standard errors.
A significant slope means there is publication bias.
The Egger’s Test Implementation in the paper:
Takes residuals from meta-regression model:
```r
egger_dt <- data.frame(
  r = residuals(model.null),
  s = sign(residuals(model.null)) * sqrt(diag(vcov(model.null, type = "resid")))
)
```
Regresses residuals against their standard errors:
```r
egg_test <- lm(r ~ s, egger_dt)
```
The Egger’s test results in this paper (first one being: value = -24.879, p < 0.001) indicate significant publication bias in the meta-analysis. Here’s what this means:
Test Interpretation:
- The significant negative coefficient suggests small studies tend to report larger effects than expected
- This could indicate selective reporting where null or negative results are less likely to be published
- May reflect the “file drawer” problem where non-significant results remain unpublished
Rosenthal’s fail-safe N concept and calculation:
Rosenthal’s fail-safe N is a statistical method used to assess the robustness of meta-analysis results against publication bias. It calculates the number of unpublished “null” studies (studies showing no effect) that would need to exist to nullify the significant findings of a meta-analysis. In fact, the enormous difference between the threshold and actual fail-safe N itself suggests the formula may need modification for large sample sizes.
Key aspects of Rosenthal’s fail-safe N:
Purpose: It estimates how many “null” or “negative” studies would need to exist to nullify the significant findings of a meta-analysis
Core concept: If this number is very large, it suggests the findings are robust against publication bias
In this paper: They found a fail-safe N of 11,459,423, compared to a threshold of 101,070
The calculation involves:
Converting effect sizes to Z-scores
Summing the Z-scores
Using Rosenthal’s formula: N = (∑Z/1.645)² - k Where:
∑Z is sum of Z-scores
1.645 is critical value for one-tailed p=0.05
k is number of studies
N is number of null studies needed
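As code, the formula is a one-liner (my sketch; the z-scores below are hypothetical, not the paper's data):

```r
## Rosenthal's fail-safe N: how many Z = 0 studies would push the combined
## result back to one-tailed p = 0.05.
failsafe_n <- function(z, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)        # 1.645 for one-tailed p = 0.05
  (sum(z) / z_crit)^2 - length(z)   # number of null studies needed
}

failsafe_n(rep(2.326, 25))  # 25 studies, each with one-tailed p ~ 0.01
```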
The large number (11,459,423) indicates:
Would need over 11 million null studies to overturn findings
Provides very strong evidence against publication bias
Results are extremely robust and reliable
Far exceeds the conservative threshold of 101,070
The large discrepancy between the calculated fail-safe N (11,459,423) and the threshold (101,070) suggests that the findings about pesticide effects on non-target organisms are highly stable and reliable, even accounting for potential unpublished negative results. This helps validate that the observed negative effects of pesticides on non-target organisms are real and not an artifact of publication bias.
In more detail
The Z-score Calculation:
Takes p-values from individual studies
Converts them to Z-scores (standardized normal deviates)
Uses one-tailed p=0.05 (Z=1.645) as critical threshold
Sums Z-scores to get overall effect
The Formula N = (∑Z/1.645)² - k:
∑Z = sum of Z-scores from observed studies
1.645 = critical value for one-tailed p=0.05
k = number of observed studies
N = number of null studies needed
Why it Works:
Adding null studies (Z=0) dilutes the overall Z-score
Formula calculates how many nulls needed to reach p=0.05
More extreme results require more null studies to nullify
Large N indicates robust finding
Small N indicates potentially fragile finding
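The dilution step can be made explicit: under Stouffer's method the combined Z is ∑Z/√k, so adding N studies with Z = 0 leaves the numerator unchanged while inflating the denominator, and solving ∑Z/√(k+N) = 1.645 for N recovers the formula. A quick check with hypothetical z-scores:

```r
## Adding the formula's N null studies drags Stouffer's combined Z down to
## exactly the one-tailed 0.05 cutoff.
z <- rep(2.326, 25)             # hypothetical observed z-scores
k <- length(z)
z_crit <- qnorm(0.95)           # 1.645
N <- (sum(z) / z_crit)^2 - k    # fail-safe N
z_comb <- sum(z) / sqrt(k + N)  # combined Z after adding N nulls (Z = 0)
```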
I don’t think it is appropriate to use Rosenthal’s fail-safe N threshold formula (5n + 10). Here’s why this threshold may be problematic for very large meta-analyses:
Formula Limitations:
The numbers 5 and 10 are somewhat arbitrary, based on reasonable but not rigorous assumptions
Linear scaling with n may not be appropriate for very large meta-analyses
The “+10” buffer becomes negligible with large sample sizes
Does not account for effect size magnitude
Issues with Large N:
In this study, n = 20,212 observations
Threshold = 5(20,212) + 10 = 101,070
Actual fail-safe N = 11,459,423
The huge difference (11,358,353) suggests the threshold may be too conservative
Why It May Not Work Well Here:
Formula was likely designed for smaller meta-analyses (hundreds rather than thousands of studies)
Linear scaling may not be appropriate for very large n
Additionally, does not consider the strength of the observed effects
Potential Improvements:
Use non-linear scaling for large n ?
Incorporate effect size magnitude ?
Set different thresholds based on study size ranges ?
Use more sophisticated publication bias methods; there are other options in the metafor package.
I don’t think it is appropriate to use Rosenthal’s N calculation either. Suppose we have p-values of 0.01 for 10 studies and p-values of 0.2 for another 10 studies. I would think that with half the studies null and half significant the evidence is inconclusive, but according to the formula, ((qnorm(0.8)*10 + qnorm(0.99)*10)/1.645)^2 - 20 ≈ 351 null studies would be needed to nullify the meta-regression results.
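The arithmetic of this worked example can be reproduced directly, converting one-tailed p-values to z-scores with qnorm(1 - p):

```r
## 10 "significant" studies at one-tailed p = 0.01 and 10 "null" ones at p = 0.2
z_sig  <- rep(qnorm(1 - 0.01), 10)  # z = 2.326 each
z_null <- rep(qnorm(1 - 0.20), 10)  # z = 0.842 each: positive, not zero!
N <- (sum(z_sig, z_null) / 1.645)^2 - 20
round(N)
#> [1] 351
```

Note that the p = 0.2 studies still push the sum upward, which is exactly why the formula treats this mixed evidence as robust.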
The seemingly counterintuitive result comes from how Rosenthal’s fail-safe N works:
The formula gives more weight to significant results than null results:
The 10 significant studies (p=0.01) contribute large positive Z-scores (2.326 each)
The 10 null studies (p=0.2) still contribute positive but smaller Z-scores (0.842 each)
The Z-scores add rather than cancel each other out
Key implications:
Even though half the studies show null effects, their smaller positive Z-scores still contribute to the overall effect
The strong significant results from half the studies drive the high fail-safe N
The method doesn’t treat null results as evidence against an effect
It asks how many studies with Z=0 would be needed to nullify the significant findings
Limitations of this approach:
May overstate robustness when evidence is mixed
Doesn’t fully capture conflicting results
Better to look at effect sizes and heterogeneity
Should consider other methods for assessing publication bias
I believe mixed results suggest inconclusiveness; the fail-safe N formula focuses specifically on how many null studies would be needed to overturn the significant findings, leading to what seems like an overly optimistic assessment of robustness. Because the formula gives more weight to significant results than to null results, fail-safe N should be interpreted cautiously and in conjunction with other methods.
I checked the functions in the metafor package they use; the help files clearly say that “the method is primarily of interest for historical reasons, but the other methods described below are more closely aligned with the way meta-analyses are typically conducted in practice.” Trying the different methods, the fail-safe N ranges from 598 (Rosenthal) down to 26, depending on the approach.
```r
> fsn(yi, vi, data=dat)
Fail-safe N Calculation Using the Rosenthal Approach
Observed Significance Level: <.0001
Target Significance Level:   0.05
Fail-safe N: 598

> fsn(yi, vi, data=dat, type="Rosenberg")
Fail-safe N Calculation Using the Rosenberg Approach
Average Effect Size:         -0.4303
Observed Significance Level: <.0001
Target Significance Level:   0.05
Fail-safe N: 370

> fsn(yi, vi, data=dat, type="General")  # based on a random-effects model
Fail-safe N Calculation Using the General Approach
Average Effect Size:         -0.7145 (with file drawer: -0.2172)
Amount of Heterogeneity:     0.3132 (with file drawer: 0.4228)
Observed Significance Level: <.0001 (with file drawer: 0.0513)
Target Significance Level:   0.05
Fail-safe N: 26
```
```r
qnorm(0.05)
#> [1] -1.644854
pnorm(1.645)
#> [1] 0.9500151
## Therefore Z = 1.645 is the one-sided cutoff for the 95% confidence level;
## a Z greater than 1.645 is significant at one-tailed p = 0.05.
## Fail-safe N for 10 studies each with p = 0.01:
(abs(qnorm(0.01)) * 10 / 1.645)^2 - 10
#> [1] 189.9943
## versus the 5k + 10 threshold for k = 10 and k = 100:
5 * 10 + 10
#> [1] 60
5 * 100 + 10
#> [1] 510
(qnorm(0.8) * 5 + qnorm(0.90) * 5) / 1.645
#> [1] 6.453413
(qnorm(0.8) * 10 + qnorm(0.90) * 10) / 1.645
#> [1] 12.90683
```