Frailty models for clustered data

library(survival)
library(parfm)
library(tibble)
library(dplyr)

The Diabetic Retinopathy Study was conducted in the US by the National Eye Institute to assess the effect of laser photocoagulation in delaying onset of severe visual loss (“blindness”) in patients with diabetic retinopathy. One eye of each patient was randomly selected for photocoagulation and the other was observed without treatment. The patients were followed over several years for the occurrence of blindness (event = blindness) in their left and right eyes. Censoring is caused by death, dropout, or end of the study. We consider only a subset of the original data set containing \(197\) high risk patients. See Huster et al, 1989, Biometrics, for a discussion and further references.

There are two lines for each patient in the data set: one line per eye. The variables are:

id: patient number
trteye: treated eye (1=right, 2=left)
ageonset: age at diagnosis of diabetes
typediab: type of diabetes (1= juvenile age at onset < 20 years , 2=adult)
time: follow-up time in months
status: status for eye (0=censored, 1=blindness)
trt: treatment of eye (1=control eye, 2=treated eye)

Before you start, you will have to read the data. Use the R-command:

eye=read.table("https://www.med.uio.no/imb/english/research/centres/ocbe/courses/imb9335/retinopathy.txt", header=T) %>% as_tibble()
eye

# A tibble: 394 × 7
      id trteye ageonset typediab status  time   trt
   <int>  <int>    <int>    <int>  <int> <dbl> <int>
 1     5      2       28        2      0 46.2      1
 2    14      1       12        1      1 31.3      1
 3    16      1        9        1      0 42.3      1
 4    25      2        9        1      0 20.6      1
 5    29      2       13        1      1  0.3      1
 6    46      1       12        1      1 54.3      1
 7    49      1        8        1      1 10.8      1
 8    56      1       12        1      0 23.2      1
 9    61      1       16        1      0  1.47     1
10    71      1       21        2      1 13.8      1
# ℹ 384 more rows

We start out by only using treatment as a covariate.

We first fit a model with Weibull baseline and no frailty by the command:

nofrail.model = parfm(Surv(time,status)~factor(trt),data=eye)
nofrail.model


Frailty distribution: none 
Baseline hazard distribution: Weibull 
Loglikelihood: -836.379 

             ESTIMATE SE    p-val    
rho           0.810   0.058          
lambda        0.032   0.008          
factor(trt)2 -0.790   0.169 <.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ci.parfm(nofrail.model)

               low    up
factor(trt)2 0.326 0.632

Perform the command, and make sure that you understand what the output tells you.

We then fit a model with Weibull baseline and gamma frailty:

frail.model = parfm(Surv(time,status)~factor(trt),cluster="id", frailty="gamma", data=eye)
frail.model


Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.744 

             ESTIMATE SE    p-val    
theta         1.040   0.329          
rho           0.948   0.075          
lambda        0.028   0.007          
factor(trt)2 -0.960   0.182 <.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.342

Perform the command, and compare with the result for the model without frailty. Is there a significant frailty effect?

LR=2*(attributes(frail.model)$loglik - attributes(nofrail.model)$loglik)
LR

[1] 17.26943

(1-pchisq(LR, 1))/2 # YES!

[1] 1.621814e-05

Significant risk variation between the clusters (individuals here)!

Fit a model with gamma frailty and treatment and age at onset as covariates. Is there a significant effect of age at onset? What about type of diabetes?

frail.model.ageonset = parfm(Surv(time,status)~factor(trt) + ageonset, cluster="id", frailty="gamma", data=eye)
frail.model.ageonset # NO! (coef close to 0, p-value large)


Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.469 

             ESTIMATE SE    p-val    
theta         1.034   0.328          
rho           0.948   0.075          
lambda        0.025   0.008          
factor(trt)2 -0.966   0.182 <.001 ***
ageonset      0.006   0.008 0.456    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.341

frail.model.typediab = parfm(Surv(time,status)~factor(trt) + factor(typediab), cluster="id", frailty="gamma", data=eye)
frail.model.typediab # NO! (coef close to 0, p-value large)


Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.73 

                  ESTIMATE SE    p-val    
theta              1.037   0.329          
rho                0.947   0.075          
lambda             0.028   0.008          
factor(trt)2      -0.961   0.182 <.001 ***
factor(typediab)2  0.040   0.232 0.864    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.341

Fit a Cox frailty model (with gamma frailty) using treatment as the only covariate (see lectures for R commands). Compare the results with those you obtained in question b.

frail.cox = coxph(Surv(time,status) ~ factor(trt) + frailty(id), data=eye)
frail.cox

Call:
coxph(formula = Surv(time, status) ~ factor(trt) + frailty(id), 
    data = eye)

                coef se(coef)     se2   Chisq   DF       p
factor(trt)2  -0.910    0.174   0.171  27.295  1.0 1.7e-07
frailty(id)                           114.448 84.6   0.017

Iterations: 6 outer, 30 Newton-Raphson
     Variance of random effect= 0.854   I-likelihood = -850.9 
Degrees of freedom for terms=  1.0 84.6 
Likelihood ratio test=201  on 85.6 df, p=3e-11
n= 394, number of events= 155

# cox with no frailty
cox = coxph(Surv(time, status) ~ factor(trt), data = eye)

LRcox = 2*(frail.cox$history$frailty$c.loglik - cox$loglik[2])
LRcox # log-likelihood ratio

[1] 11.88032

1 - pchisq(LRcox,df=1)

[1] 0.0005673029

Finally we want to illustrate how stratified Cox regression can be used to analyse paired survival data like the ones we have in the present situation. We may fit a stratified Cox model with a separate baseline for each patient (per eye pair pretty much) by the commands:

eyefit.strat=coxph(Surv(time,status)~trt+strata(id),data=eye)
summary(eyefit.strat)

Call:
coxph(formula = Surv(time, status) ~ trt + strata(id), data = eye)

  n= 394, number of events= 155 

       coef exp(coef) se(coef)      z Pr(>|z|)    
trt -0.9623    0.3820   0.2016 -4.773 1.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
trt     0.382      2.618    0.2573    0.5672

Concordance= 0.748  (se = 0.058 )
Likelihood ratio test= 25.49  on 1 df,   p=4e-07
Wald test            = 22.78  on 1 df,   p=2e-06
Score (logrank) test = 24.59  on 1 df,   p=7e-07

Explain the idea behind such an approach, and discuss how it compares to the approach in question d).

We got same approximately results! Why? See equation (slide no. 24): \(a_{ij}(t|Z_i) = Z_i a_0(t) e^{\beta x_{ij}}\)

Cox model uses different baseline hazards per stratum (\(Z_i=1\))
Parametric Frailty models are better in most practical applications (\(a_0(t)\) same for all individuals, \(Z_i\) varies and is estimated)