In this article, I analyze percentage data as proportions, using both quasi-binomial models and nonlinear fits to the log-logistic cumulative distribution function (CDF).
Proportion data where the denominator (e.g. the maximum possible number of presences for a given response type) is unknown can be modeled with a Beta distribution. A binomial-logit-Normal model with observation-level random effects can also be applied to proportions to handle over-dispersion, as can quasi-binomial models.
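A minimal sketch of the Beta-distribution route, assuming a hypothetical data frame dat with a dose column x and a proportion column y (betareg requires responses strictly inside (0, 1), so exact 0 and 1 values are nudged inward first):

library(betareg)
# squeeze exact 0/1 proportions into the open interval (0, 1)
dat$y_adj <- pmin(pmax(dat$y, 1e-4), 1 - 1e-4)
fit_beta <- betareg(y_adj ~ log(x), data = dat)
summary(fit_beta)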
For the quasi-binomial approach, an additional option is to add a subject-, individual-, or treatment-group-level random effect; the extra noise it introduces can account for the spread of the data beyond the quasi-binomial variance. However, there is no ready-made R function that returns all the associated uncertainties, so the post-processing must be implemented manually. That makes it unlikely to become a standard approach.
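As an illustration, a rough sketch of that manual post-processing for a MASS::glmmPQL fit (the object fit and the prediction grid are hypothetical; the fit is assumed to have the Method B1 form percentages/100 ~ conc shown below):

# `fit` is assumed to be a MASS::glmmPQL object of the Method B1 form
newdat <- data.frame(conc = exp(seq(log(0.1), log(100), length.out = 100)))
X <- model.matrix(~ conc, data = newdat)               # fixed-effects design matrix
eta <- as.vector(X %*% nlme::fixef(fit))               # linear predictor on the logit scale
se_eta <- sqrt(diag(X %*% vcov(fit) %*% t(X)))         # delta-method SE on the link scale
newdat$fit <- plogis(eta)
newdat$lwr <- plogis(eta - 1.96 * se_eta)              # approximate pointwise 95% CI
newdat$upr <- plogis(eta + 1.96 * se_eta)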
The data are generated in two ways: one based on the logistic CDF with added noise, the other using binomial sampling. For the latter, the sample size information is dropped from the final dataset to mimic the characteristics of percentage data. We could also generate over-dispersed binomial data following the steps described here or here
We compare the following methods under the different data-generation schemes:
Method A: simple quasi-binomial GLM
Method B: drc with a normal spread around the logistic CDF (see the sketch after this list)
Method B1: quasi-binomial with a random effect (MASS::glmmPQL(percentages/100 ~ conc, random = ~1|conc, family = "quasibinomial", data = data))
Method C:
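Method B relies on the drc package. A minimal sketch of such a fit, assuming a hypothetical data frame dat with a dose column x and a proportion column y, and fixing the lower and upper asymptotes at 0 and 1:

library(drc)
# log-logistic curve with lower asymptote fixed at 0 and upper asymptote at 1
fit_drc <- drm(y ~ x, data = dat, fct = LL.4(fixed = c(NA, 0, 1, NA)))
summary(fit_drc)
# EC50 with a delta-method confidence interval
ED(fit_drc, 50, interval = "delta")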
The conclusions are:
For the data generated from the logistic CDF, the standard drc approach fits the data much better, both in terms of EC50 estimation and in terms of the fitted curve.
For the data generated by binomial sampling, the examined approaches perform similarly, for both the EC50 estimate and the fitted curve.
Data generated using the logistic CDF.
The data are generated using the cumulative distribution function of the logistic distribution. Note that the logistic function is the inverse of the logit function \(\log(p/(1-p))\) (plogis in R). The code is modified from this blog post. \[F(x;\mu,s) = \frac{1}{2} + \frac{1}{2} \tanh\!\left(\frac{x-\mu}{2s}\right) = \frac{1}{1+e^{-(x-\mu)/s}}\]
where \(\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}}\) is the hyperbolic tangent function, which maps the real numbers to the interval \((-1, 1)\).
The quantile function (inverse CDF) is \(\mu+s\log(\frac{p}{1-p})\); at \(p=0.5\) this equals \(\mu\), so the EC50 is \(\mu\) (or \(\exp(\mu)\) if \(x\) is the log dose).
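A quick numeric check of these identities in R (mu0, s0 and x0 are arbitrary values used only for the check):

mu0 <- 5; s0 <- 2; x0 <- 3
# the tanh form equals the built-in logistic CDF
all.equal(0.5 + 0.5 * tanh((x0 - mu0) / (2 * s0)), plogis(x0, location = mu0, scale = s0))
# the p = 0.5 quantile is mu0, i.e. the EC50
qlogis(0.5, location = mu0, scale = s0)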
Random noise is then added to the logistic CDF values.
We simulate responses over a concentration grid \(x = (x_{1}, x_{2}, \ldots, x_{n})\) for \(k\) studies \(X_{1}, X_{2}, \ldots, X_{k}\), where each study has its own \(\mu\) and \(s\).
Let’s say there are \(k=6\) study groups with the parameter sets \(\mu = \{1,2,3,5,7,8\}\) and \(s=\{2,2,4,3,4,2\}\).
generate_logit_cdf <- function(mu, s, sigma_y = 0.1, x = seq(-5, 20, 0.1)) {
  # note: tanh((x - mu)/s) corresponds to a logistic CDF with scale s/2
  x_ms <- (x - mu) / s
  y <- 0.5 + 0.5 * tanh(x_ms)
  # add normal noise and reflect any negative values back to positive
  y <- abs(y + rnorm(length(x), 0, sigma_y))
  # cap values at 1
  ix <- which(y >= 1.0)
  if (length(ix) >= 1) {
    y[ix] <- 1.0
  }
  return(y)
}
tanh(0)
[1] 0
set.seed(424242)
x <- seq(-5, 15, 0.025)
mu_vec <- c(1, 2, 3, 5, 7, 8)  # 6 studies
idx <- sapply(mu_vec, function(mu) which(x == mu))
s_vec <- c(2, 2, 4, 3, 4, 2)
# Synthetic study ID
studies_df <- mapply(generate_logit_cdf, mu_vec, s_vec, MoreArgs = list(x = x))
# Give them names
colnames(studies_df) <- c("Study1", "Study2", "Study3", "Study4", "Study5", "Study6")
dim(studies_df)
In summary, the dose-response model with normal error predicts the true EC50 better and has a smaller CI when plenty of data are available. We will further check the results when fewer data are available.
Note that the betar family in the mgcv package (used with gam()) offers the same estimation as betareg, but it can also handle random effects and provides standard errors. For betareg fits, bootstrap confidence intervals need to be implemented, or we need to use betareg(…, method = "BR").
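A sketch of such a bootstrap for betareg predictions, reusing the hypothetical dat and y_adj from the earlier betareg sketch (the dose value 10 is arbitrary):

set.seed(1)
boot_pred <- replicate(500, {
  idx <- sample(nrow(dat), replace = TRUE)
  b <- betareg::betareg(y_adj ~ log(x), data = dat[idx, ])
  predict(b, newdata = data.frame(x = 10), type = "response")  # predicted proportion at dose 10
})
quantile(boot_pred, c(0.025, 0.975))  # percentile 95% confidence interval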
library(gam)
library(mgcv)
## df_all <- df_all %>% mutate(logx = log(x))
mygam <- gam(y ~ log(x), family = betar(link = "logit"),
             data = df_all %>% filter(study == "Study5"))
min <- min(df_all$x)
max <- max(df_all$x)
new.x <- expand.grid(x = seq(min, max, length.out = 1000))
# new.x$logx <- log(new.x$x)
new.y <- predict(mygam, newdata = new.x, se.fit = TRUE, type = "response")
new.y <- data.frame(new.y)
addThese <- data.frame(new.x, new.y)
addThese <- rename(addThese, y = fit, SE = se.fit)
addThese <- mutate(addThese, lwr = y - 1.96 * SE, upr = y + 1.96 * SE)  # 95% confidence interval
predbeta <- predictCIglm(mygam, dat = df_all %>% filter(study == "Study5"))
ggplot(df_all %>% filter(study == "Study5"), aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(data = addThese, aes(ymin = lwr, ymax = upr), stat = "identity") +
  geom_ribbon(data = predbeta, aes(x = x, ymin = Lower, ymax = Upper, y = Prediction),
              alpha = 0.2, col = "red")
simbinomdata <- purrr::map_dfr(lapply(1:length(x), function(i, n = 10, size = 100) {
  pi <- px[i]
  x1 <- rep(x[i], n)
  yi <- rbinom(n = n, size = size, prob = pi) / size
  return(data.frame(x = x1, y = yi))
}), as.list)
ggplot(simbinomdata, aes(x = x, y = y)) +
  geom_point() +
  scale_x_log10() +
  geom_line(data = data.frame(x = x, y = px), aes(x = x, y = px))
Call:
glm(formula = y ~ log(x), family = "quasibinomial", data = simbinomdata)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03479 0.03518 -0.989 0.326
log(x) 0.98320 0.02022 48.618 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 0.008989206)
Null deviance: 63.0622 on 79 degrees of freedom
Residual deviance: 0.7379 on 78 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 6
The quasi-binomial isn’t actually a particular distribution. The only assumption is that, for a generalized linear model with the binomial mean structure (mean \(np\)), the variance is \(\phi\) times the binomial variance \(np(1-p)\). When the sample size is missing, this is simply \(\phi p(1-p)\).
Quasi-binomial gives exactly the same coefficient estimates as binomial, but not the same standard errors. In fact, quasi-likelihood models apply a post-fitting adjustment to the standard errors of the parameters and the associated statistics; they don’t change anything about how the model is fitted. Some analyses use observation-level random effects (OLRE) rather than quasi-likelihood to handle overdispersion at the level of individual observations. How to adjust the coefficient standard errors and how to model OLRE is described in this link
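A minimal OLRE sketch with lme4, assuming (hypothetically) that the raw counts successes out of size are still available in a data frame dat, which is exactly what percentage data usually lack:

library(lme4)
dat$obs <- factor(seq_len(nrow(dat)))  # one random-effect level per observation
fit_olre <- glmer(cbind(successes, size - successes) ~ log(x) + (1 | obs),
                  family = binomial, data = dat)
summary(fit_olre)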
In a GLM, the dispersion for a non-over-dispersed model such as binomial or Poisson is 1. For quasi-families, it is estimated as the residual chi-squared statistic divided by the residual degrees of freedom. The quasi-binomial model then inflates the standard errors by \(\sqrt{\phi}\). For binomial data, the dispersion parameter can be calculated as \(\phi = \chi^2/(N-k)\), where \(df = N-k\).
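The same dispersion estimate can be reproduced by hand; a small sketch, assuming mod0 is the quasi-binomial glm shown above:

phi <- sum(residuals(mod0, type = "pearson")^2) / df.residual(mod0)  # Pearson chi-squared / residual df
phi
sqrt(phi)  # factor by which the binomial standard errors are inflated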
For estimation of EC50 or other ECx values, besides inverse-regression methods such as here and here, there is a dose.p function in the MASS package that calculates LCx or ECx together with standard errors, which can be used for response confidence intervals. The vcov function can be used to extract the covariance matrix and estimate the CI of LCx or ECx. An alternative approach is to use bootstrap confidence intervals.
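A sketch of the dose.p route, assuming mod0 is the quasi-binomial glm of y on log(x) fitted above (so the returned value is on the log-dose scale):

ec50 <- MASS::dose.p(mod0, p = 0.5)            # log(dose) at which the predicted proportion is 0.5
se <- as.numeric(attr(ec50, "SE"))
exp(as.numeric(ec50))                          # EC50 on the original dose scale
exp(as.numeric(ec50) + c(-1.96, 1.96) * se)    # approximate 95% CI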
Since the goal of the data analysis is to measure the association between a set of regression parameters and the outcome, quasi-binomial models are useful because the variance assumption is relaxed; what still needs to be estimated is the uncertainty associated with that relationship.
In fact, a limitation of these models is that they cannot yield accurate prediction intervals, the Pearson residuals tell you little about how accurate the mean model is, and information criteria such as AIC or BIC cannot be used to compare these models with other types of models.
AIC(mod0)
[1] 19.40066
# [1] 19.40066
try(AIC(mod1))
[1] NA
# > AIC(mod1)
# Error in h(simpleError(msg, call)) :
#   error in evaluating the argument 'object' in selecting a method for function 'AIC':
#   object 'mod1' not found
Generate data
logit_scale <- seq(-6, 6, by = 0.1)
plot(logit_scale, plogis(logit_scale), ylab = 'probabilities', type = 'n',
     main = "Logistic Function")
lines(logit_scale, plogis(logit_scale))
plot(1/(1 + exp(-logit_scale)), plogis(logit_scale), ylab = 'probabilities', type = 'l',
     main = "Linear Transformation of above")
## fit ordered logit model and store results in 'm'
dattab_new$y0 <- factor(dattab_new$y0,
                        levels = c("0", "A", "B", "C", "D", "E", "F"),
                        ordered = TRUE)
m <- MASS::polr(y0 ~ log(dose), data = dattab_new, Hess = TRUE)
summary(m)
Call:
MASS::polr(formula = y0 ~ log(dose), data = dattab_new, Hess = TRUE)
Coefficients:
Value Std. Error t value
log(dose) 3.465 0.4876 7.106
Intercepts:
Value Std. Error t value
0|A 4.0061 0.8091 4.9513
A|B 6.3415 1.0283 6.1669
B|C 8.1921 1.2395 6.6092
C|D 9.1287 1.3489 6.7677
D|E 10.9709 1.5691 6.9917
E|F 12.9482 1.8234 7.1011
Residual Deviance: 128.7906
AIC: 142.7906
ctable <- coef(summary(m))
## At ER50, the cumulative probability of the response being in a higher category is close to 1.
plogis(ctable[, 1] + ctable[, 2] * log(12.18))
Linear mixed-effects model fit by maximum likelihood
Data: dattab
AIC BIC logLik
NA NA NA
Random effects:
Formula: ~1 | Obs
(Intercept) Residual
StdDev: 0.8883142 2.02464e-05
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: yt ~ log(dose)
Value Std.Error DF t-value p-value
(Intercept) -4.326393 0.26598326 58 -16.26566 0
log(dose) 1.714092 0.09853347 58 17.39603 0
Correlation:
(Intr)
log(dose) -0.899
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.754784e-04 -4.189350e-05 8.955260e-06 4.092003e-05 1.439488e-04
Number of Observations: 60
Number of Groups: 60