GEE and categorical responses

code
analysis
Try using GEE for categorical responses
Author

Zhenglei Gao

Published

January 14, 2025

Use the data from previous posts

Multinomial or ordinal GEE

Data: Rheumatoid self-assessment scores for 302 patients, measured on a five-level ordinal response scale at three follow-up times.

library(multgee)
Warning: package 'multgee' was built under R version 4.4.2
Loading required package: gnm
Warning: package 'gnm' was built under R version 4.4.2
data(arthritis)
intrinsic.pars(y, arthritis, id, time) ## assess the underlying association pattern.
[1] 0.6517843 0.9097341 0.9022272
fitmod <- ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline),
  data = arthritis, id = id, repeated = time, LORstr = "uniform")
summary(fitmod)
GEE FOR ORDINAL MULTINOMIAL RESPONSES 
version 1.6.0 modified 2017-07-10 

Link : Cumulative logit 

Local Odds Ratios:
Structure:         uniform
Model:             3way

call:
ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline), 
    data = arthritis, id = id, repeated = time, LORstr = "uniform")

Summary of residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.5161111 -0.2399227 -0.0749670  0.0000219 -0.0066995  0.9932912 

Number of Iterations: 5 

Coefficients:
                  Estimate   san.se   san.z Pr(>|san.z|)    
beta10            -1.84315  0.38929 -4.7346      < 2e-16 ***
beta20             0.26692  0.35013  0.7624      0.44585    
beta30             2.23132  0.36625  6.0924      < 2e-16 ***
beta40             4.52542  0.42123 10.7434      < 2e-16 ***
factor(time)3      0.00140  0.12183  0.0115      0.99080    
factor(time)5     -0.36172  0.11395 -3.1743      0.00150 ** 
factor(trt)2      -0.51212  0.16799 -3.0486      0.00230 ** 
factor(baseline)2 -0.66963  0.38036 -1.7605      0.07832 .  
factor(baseline)3 -1.26070  0.35252 -3.5763      0.00035 ***
factor(baseline)4 -2.64373  0.41282 -6.4041      < 2e-16 ***
factor(baseline)5 -3.96613  0.53164 -7.4602      < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Local Odds Ratios Estimates:
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
 [1,] 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257
 [2,] 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257
 [3,] 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257
 [4,] 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257
 [5,] 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257
 [6,] 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257
 [7,] 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257
 [8,] 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000 2.257 2.257 2.257 2.257
 [9,] 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000
[10,] 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000
[11,] 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000
[12,] 2.257 2.257 2.257 2.257 2.257 2.257 2.257 2.257 0.000 0.000 0.000 0.000

p-value of Null model: < 0.0001 
confint(fitmod)
                       2.5 %      97.5 %
beta10            -2.6061497 -1.08015721
beta20            -0.4193163  0.95314883
beta30             1.5134867  2.94915281
beta40             3.6998267  5.35101277
factor(time)3     -0.2373805  0.24018865
factor(time)5     -0.5850535 -0.13837735
factor(trt)2      -0.8413778 -0.18287050
factor(baseline)2 -1.4151270  0.07587068
factor(baseline)3 -1.9516257 -0.56978096
factor(baseline)4 -3.4528305 -1.83462340
factor(baseline)5 -5.0081187 -2.92413445
library(ggstats)
ggcoef_multinom(fitmod)
ℹ multgee model detected.
✔ `tidy_multgee()` used instead.
ℹ Add `tidy_fun = broom.helpers::tidy_multgee` to quiet these messages.

What is the difference between Ordinal GEE and ordinal logistic regression?

GEE Basics

  • It is a quasilikelihood approach.
  • Could include cluster/group correlation structure(e.g., repeated measurements from one subject)
  • No distributional assumption.
  • Could handle binary or categorical responses (multinomial).
  • Computationally simpler for categorical compared to maximum likelihood approach.
  • No likelihood based post hoc testing, comparison or inference.
URL <- "http://static.lib.virginia.edu/statlab/materials/data/depression.csv"
dat <- read.csv(URL, stringsAsFactors = TRUE)
dat$id <- factor(dat$id)
dat$drug <- relevel(dat$drug, ref = "standard")
head(dat, n = 3)
  diagnose     drug id time depression
1     mild standard  1    0          1
2     mild standard  1    1          1
3     mild standard  1    2          1
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
with(dat, tapply(depression, list(diagnose, drug, time), mean)) %>% 
  ftable() %>% 
  round(2)
                    0    1    2
                               
mild   standard  0.51 0.59 0.68
       new       0.53 0.79 0.97
severe standard  0.21 0.28 0.46
       new       0.18 0.50 0.83
##install.packages("gee")
library(gee) # version 4.13-25
Warning: package 'gee' was built under R version 4.4.2
dep_gee <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "independence")
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
   (Intercept) diagnosesevere        drugnew           time   drugnew:time 
   -0.02798843    -1.31391092    -0.05960381     0.48241209     1.01744498 
summary(dep_gee)

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Logit 
 Variance to Mean Relation: Binomial 
 Correlation Structure:     Independent 

Call:
gee(formula = depression ~ diagnose + drug * time, id = id, data = dat, 
    family = binomial, corstr = "independence")

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-0.94844242 -0.40683252  0.05155758  0.38830952  0.80242231 


Coefficients:
                  Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept)    -0.02798843  0.1627083 -0.1720160   0.1741865 -0.1606808
diagnosesevere -1.31391092  0.1453432 -9.0400569   0.1459845 -9.0003423
drugnew        -0.05960381  0.2205812 -0.2702126   0.2285385 -0.2608042
time            0.48241209  0.1139224  4.2345663   0.1199350  4.0222784
drugnew:time    1.01744498  0.1874132  5.4288855   0.1876938  5.4207709

Estimated Scale Parameter:  0.9854113
Number of Iterations:  1

Working Correlation
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
res <- glm(depression ~ diagnose + drug*time,
               data = dat,
               family = quasibinomial)
summary(res)

Call:
glm(formula = depression ~ diagnose + drug * time, family = quasibinomial, 
    data = dat)

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.02799    0.16271  -0.172    0.863    
diagnosesevere -1.31391    0.14534  -9.040  < 2e-16 ***
drugnew        -0.05960    0.22058  -0.270    0.787    
time            0.48241    0.11392   4.235 2.50e-05 ***
drugnew:time    1.01744    0.18741   5.429 7.09e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.9854187)

    Null deviance: 1411.9  on 1019  degrees of freedom
Residual deviance: 1161.9  on 1015  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

The “Estimated Scale Parameter” is reported as 0.9854113 = the dispersion parameter = similar to what is reported (Dispersion parameter for quasibinomial family taken to be 0.9854187) when running glm() with family = quasibinomial in order to model over-dispersion.

dep_gee2 <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "exchangeable")
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
   (Intercept) diagnosesevere        drugnew           time   drugnew:time 
   -0.02798843    -1.31391092    -0.05960381     0.48241209     1.01744498 
summary(dep_gee2)

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Logit 
 Variance to Mean Relation: Binomial 
 Correlation Structure:     Exchangeable 

Call:
gee(formula = depression ~ diagnose + drug * time, id = id, data = dat, 
    family = binomial, corstr = "exchangeable")

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-0.94843397 -0.40683122  0.05156603  0.38832332  0.80238627 


Coefficients:
                  Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept)    -0.02809866  0.1625499 -0.1728617   0.1741791 -0.1613205
diagnosesevere -1.31391033  0.1448627 -9.0700418   0.1459630 -9.0016667
drugnew        -0.05926689  0.2205340 -0.2687427   0.2285569 -0.2593091
time            0.48246420  0.1141154  4.2278625   0.1199383  4.0226037
drugnew:time    1.01719312  0.1877051  5.4191018   0.1877014  5.4192084

Estimated Scale Parameter:  0.985392
Number of Iterations:  2

Working Correlation
             [,1]         [,2]         [,3]
[1,]  1.000000000 -0.003432732 -0.003432732
[2,] -0.003432732  1.000000000 -0.003432732
[3,] -0.003432732 -0.003432732  1.000000000
dep_gee3 <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "AR-M", Mv = 1)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
   (Intercept) diagnosesevere        drugnew           time   drugnew:time 
   -0.02798843    -1.31391092    -0.05960381     0.48241209     1.01744498 
dep_gee3$working.correlation
             [,1]        [,2]         [,3]
[1,] 1.000000e+00 0.008477443 7.186704e-05
[2,] 8.477443e-03 1.000000000 8.477443e-03
[3,] 7.186704e-05 0.008477443 1.000000e+00
library(lme4) # version 1.1-31
Loading required package: Matrix

Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':

    expand, pack, unpack
dep_glmer <- glmer(depression ~ diagnose + drug*time + (1|id), 
               data = dat, family = binomial)
summary(dep_glmer, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: depression ~ diagnose + drug * time + (1 | id)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
  1173.9   1203.5   -581.0   1161.9     1014 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2849 -0.8268  0.2326  0.7964  2.0181 

Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.003229 0.05682 
Number of obs: 1020, groups:  id, 340

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -0.02796    0.16407  -0.170    0.865    
diagnosesevere -1.31488    0.15261  -8.616  < 2e-16 ***
drugnew        -0.05968    0.22240  -0.268    0.788    
time            0.48273    0.11566   4.174 3.00e-05 ***
drugnew:time    1.01817    0.19150   5.317 1.06e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(ggeffects) # version 1.2.0
Warning: package 'ggeffects' was built under R version 4.4.2
plot(ggemmeans(dep_glmer, terms = c("time", "drug"), 
               condition = c(diagnose = "severe"))) + 
  ggplot2::ggtitle("GLMER Effect plot")
You are calculating adjusted predictions on the population-level (i.e.
  `type = "fixed"`) for a *generalized* linear mixed model.
  This may produce biased estimates due to Jensen's inequality. Consider
  setting `bias_correction = TRUE` to correct for this bias.
  See also the documentation of the `bias_correction` argument.

library(emmeans) # version 1.8.4-1
Warning: package 'emmeans' was built under R version 4.4.2
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
emm_out <- emmeans(dep_gee2, specs = c("time", "drug"), 
        at = list(diagnose = "severe"), 
        cov.keep = "time", 
        regrid = "response") %>% 
  as.data.frame()
library(ggplot2) # version 3.4.1
ggplot(emm_out) +
  aes(x = time, y = prob, color = drug, fill = drug) +
  geom_line() +
  geom_ribbon(aes(ymin = asymp.LCL, ymax = asymp.UCL, 
                  color = NULL), alpha = 0.15) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1", guide = NULL) +
  scale_y_continuous(labels = scales::percent) +
  theme_ggeffects() +
  labs(title = "GEE Effect plot", y = "depression")

library(geepack) # version 1.3.9
Warning: package 'geepack' was built under R version 4.4.2
dep_gee4 <- geeglm(depression ~ diagnose + drug*time,
                   data = dat,
                   id = id,
                   family = binomial,
                   corstr = "exchangeable")
# create effect plot
plot(ggemmeans(dep_gee4, terms = c("time", "drug"),
               condition = c(diagnose = "severe"))) + 
  ggplot2::ggtitle("GEE Effect plot")

References

Additional references on survey data