Frailty models for clustered data

library(survival)
library(parfm)
library(tibble)
library(dplyr)

The Diabetic Retinopathy Study was conducted in the US by the National Eye Institute to assess the effect of laser photocoagulation in delaying onset of severe visual loss (“blindness”) in patients with diabetic retinopathy. One eye of each patient was randomly selected for photocoagulation and the other was observed without treatment. The patients were followed over several years for the occurrence of blindness (event = blindness) in their left and right eyes. Censoring is caused by death, dropout, or end of the study. We consider only a subset of the original data set containing \(197\) high risk patients. See Huster et al, 1989, Biometrics, for a discussion and further references.

There are two lines for each patient in the data set: one line per eye. The variables are:

Before you start, you will have to read the data. Use the R-command:

eye=read.table("https://www.med.uio.no/imb/english/research/centres/ocbe/courses/imb9335/retinopathy.txt", header=T) %>% as_tibble()
eye
# A tibble: 394 × 7
      id trteye ageonset typediab status  time   trt
   <int>  <int>    <int>    <int>  <int> <dbl> <int>
 1     5      2       28        2      0 46.2      1
 2    14      1       12        1      1 31.3      1
 3    16      1        9        1      0 42.3      1
 4    25      2        9        1      0 20.6      1
 5    29      2       13        1      1  0.3      1
 6    46      1       12        1      1 54.3      1
 7    49      1        8        1      1 10.8      1
 8    56      1       12        1      0 23.2      1
 9    61      1       16        1      0  1.47     1
10    71      1       21        2      1 13.8      1
# ℹ 384 more rows

We start out by only using treatment as a covariate.

  1. We first fit a model with Weibull baseline and no frailty by the command:
nofrail.model = parfm(Surv(time,status)~factor(trt),data=eye)
nofrail.model

Frailty distribution: none 
Baseline hazard distribution: Weibull 
Loglikelihood: -836.379 

             ESTIMATE SE    p-val    
rho           0.810   0.058          
lambda        0.032   0.008          
factor(trt)2 -0.790   0.169 <.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ci.parfm(nofrail.model)
               low    up
factor(trt)2 0.326 0.632

Perform the command, and make sure that you understand what the output tells you.

  1. We then fit a model with Weibull baseline and gamma frailty:
frail.model = parfm(Surv(time,status)~factor(trt),cluster="id", frailty="gamma", data=eye)
frail.model

Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.744 

             ESTIMATE SE    p-val    
theta         1.040   0.329          
rho           0.948   0.075          
lambda        0.028   0.007          
factor(trt)2 -0.960   0.182 <.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.342 

Perform the command, and compare with the result for the model without frailty. Is there a significant frailty effect?

LR=2*(attributes(frail.model)$loglik - attributes(nofrail.model)$loglik)
LR
[1] 17.26943
(1-pchisq(LR, 1))/2 # YES!
[1] 1.621814e-05
  1. Fit a model with gamma frailty and treatment and age at onset as covariates. Is there a significant effect of age at onset? What about type of diabetes?
frail.model.ageonset = parfm(Surv(time,status)~factor(trt) + ageonset, cluster="id", frailty="gamma", data=eye)
frail.model.ageonset # NO! (coef close to 0, p-value large)

Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.469 

             ESTIMATE SE    p-val    
theta         1.034   0.328          
rho           0.948   0.075          
lambda        0.025   0.008          
factor(trt)2 -0.966   0.182 <.001 ***
ageonset      0.006   0.008 0.456    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.341 
frail.model.typediab = parfm(Surv(time,status)~factor(trt) + factor(typediab), cluster="id", frailty="gamma", data=eye)
frail.model.typediab # NO! (coef close to 0, p-value large)

Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -827.73 

                  ESTIMATE SE    p-val    
theta              1.037   0.329          
rho                0.947   0.075          
lambda             0.028   0.008          
factor(trt)2      -0.961   0.182 <.001 ***
factor(typediab)2  0.040   0.232 0.864    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.341 
  1. Fit a Cox frailty model (with gamma frailty) using treatment as the only covariate (see lectures for R commands). Compare the results with those you obtained in question b.
frail.cox = coxph(Surv(time,status) ~ factor(trt) + frailty(id), data=eye)
frail.cox
Call:
coxph(formula = Surv(time, status) ~ factor(trt) + frailty(id), 
    data = eye)

                coef se(coef)     se2   Chisq   DF       p
factor(trt)2  -0.910    0.174   0.171  27.295  1.0 1.7e-07
frailty(id)                           114.448 84.6   0.017

Iterations: 6 outer, 30 Newton-Raphson
     Variance of random effect= 0.854   I-likelihood = -850.9 
Degrees of freedom for terms=  1.0 84.6 
Likelihood ratio test=201  on 85.6 df, p=3e-11
n= 394, number of events= 155 
# cox with no frailty
cox = coxph(Surv(time, status) ~ factor(trt), data = eye)

LRcox = 2*(frail.cox$history$frailty$c.loglik - cox$loglik[2])
LRcox # log-likelihood ratio
[1] 11.88032
1 - pchisq(LRcox,df=1)
[1] 0.0005673029
  1. Finally we want to illustrate how stratified Cox regression can be used to analyse paired survival data like the ones we have in the present situation. We may fit a stratified Cox model with a separate baseline for each patient (per eye pair pretty much) by the commands:
eyefit.strat=coxph(Surv(time,status)~trt+strata(id),data=eye)
summary(eyefit.strat)
Call:
coxph(formula = Surv(time, status) ~ trt + strata(id), data = eye)

  n= 394, number of events= 155 

       coef exp(coef) se(coef)      z Pr(>|z|)    
trt -0.9623    0.3820   0.2016 -4.773 1.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
trt     0.382      2.618    0.2573    0.5672

Concordance= 0.748  (se = 0.058 )
Likelihood ratio test= 25.49  on 1 df,   p=4e-07
Wald test            = 22.78  on 1 df,   p=2e-06
Score (logrank) test = 24.59  on 1 df,   p=7e-07

Explain the idea behind such an approach, and discuss how it compares to the approach in question d).

We got same approximately results! Why? See equation (slide no. 24): \(a_{ij}(t|Z_i) = Z_i a_0(t) e^{\beta x_{ij}}\)