Additive regression

library(survival)
library(tibble)
library(dplyr)

At the university hospital of the University of Massachusetts one has for a number of years studied the survival of patients admitted with an acute myocardial infarction (AMI- aka HEART ATTACK!). One aim of the study has been to investigate whether the survival of AMI patients has improved over time. A number of covariates were measured at hospitalization. In addition to information on time of hospitalization (here given in five years periods), we will in this problem restrict attention to the two demographic covariates age and sex, one covariate that is related to the seriousness of the AMI (amount of “heart enzyme”), and one covariate that gives information on whether the patient has had an AMI earlier or not. Further we will restrict attention to the first three years after AMI, so patients who live longer than \(1095 = 3 \times 365\) days have been censored at that time.

You may read the AMI-data into R by the command:

ami=read.table("https://www.med.uio.no/imb/english/research/centres/ocbe/courses/imb9335/ami.txt",header=T) %>% as_tibble()
ami
# A tibble: 481 × 8
      id  days status   per   age   sex enzym  prev
   <int> <int>  <int> <int> <int> <int> <int> <int>
 1     1     1      1     1    62     1   485     0
 2     2     1      1     1    78     1   910     1
 3     3     1      1     1    81     1   320     0
 4     4     1      1     1    79     1  3290     1
 5     5     2      1     1    60     1  2500     1
 6     6     2      1     1    72     0    99     0
 7     7     2      1     1    60     1  1200     0
 8     8     3      1     1    83     1   160     0
 9     9     3      1     1    78     0    66     1
10    10  1095      0     1    72     1    99     0
# ℹ 471 more rows

The data are organized with one line for each of the \(481\) patients, and with the following variables in the seven columns:

We will study the effect of the five covariates (per, age, sex, enzym and prev) using the additive regression model. Our main interest lies in a simultaneous analysis of all five covariates. But nevertheless we will start out by fitting additive regression models with one covariate at a time.

  1. First we fit an additive model with sex as the only covariate and plot the estimated cumulative regression function. This is achieved by the commands:
fit.sex = survival::aareg(Surv(days,status) ~ factor(sex), data=ami)
fit.sex
Call:
survival::aareg(formula = Surv(days, status) ~ factor(sex), data = ami)

  n= 481 
    115 out of 115 unique event times used

                slope    coef se(coef)    z        p
Intercept    0.000734 0.00219 0.000222 9.85 6.93e-23
factor(sex)1 0.000405 0.00102 0.000407 2.51 1.20e-02

Chisq=6.31 on 1 df, p=0.012; test weights=aalen
print(fit.sex, maxtime = 100)
Call:
survival::aareg(formula = Surv(days, status) ~ factor(sex), data = ami)

  n= 481 
    44 out of 115 unique event times used

               slope    coef se(coef)    z        p
Intercept    0.00380 0.00186 0.000251 7.42 1.21e-13
factor(sex)1 0.00285 0.00124 0.000478 2.60 9.27e-03

Chisq=6.77 on 1 df, p=0.00927; test weights=aalen

Also test the null hypothesis that there is no effect of sex (marginally, i.e. when not corrected for the other covariates). You obtain the standardized test and its p-value from the last two columns of the output from the command print(fit.sex). (The maxtime option also works for the print command.)

res = summary(fit.sex) # , maxtime = 100
res$table[2,'p'] # borderline
[1] 0.01200272

Intercept figure always shows the baseline hazard (reference category when modeling with one covariate or all reference values when modeling with more covariates):

par(mfrow=c(1,2))
plot(fit.sex)

You may want to focus on the first 100 days after hospitalizations.

par(mfrow=c(1,2))
plot(fit.sex, maxtime = 100)

Effect of sex variable is stronger the first 100 days (p-value):

res = summary(fit.sex, maxtime = 100)
res$table[2,'p']
[1] 0.009270794
  1. Fit an additive regression model for each of the other four covariates per, age, enzym and prev (one at a time). For each of the covariates you should interpret the estimates you obtain for the cumulative baseline and the cumulative regression function and decide whether the covariate has a significant effect (when not corrected for the effects of the other covariates). In order to ease interpretation of the estimated cumulative baseline function, you should center the numeric covariates age and enzym by subtracting their means.

Scale continuous variables:

ami2 = ami %>% 
  mutate(age = (age - mean(age))/sd(age), enzym = (enzym - mean(enzym))/sd(enzym))
fit.per = aareg(formula = Surv(days, status) ~ factor(per), data = ami)
fit.per
Call:
aareg(formula = Surv(days, status) ~ factor(per), data = ami)

  n= 481 
    115 out of 115 unique event times used

                 slope      coef se(coef)      z        p
Intercept     8.66e-04  0.002540 0.000328  7.750 9.49e-15
factor(per)2  1.52e-05  0.000196 0.000458  0.428 6.69e-01
factor(per)3 -6.25e-05 -0.000104 0.000471 -0.222 8.25e-01

Chisq=0.43 on 2 df, p=0.805; test weights=aalen
par(mfrow = c(1,3))
plot(fit.per)

fit.age = aareg(formula = Surv(days, status) ~ age, data = ami)
fit.age
Call:
aareg(formula = Surv(days, status) ~ age, data = ami)

  n= 481 
    115 out of 115 unique event times used

              slope      coef se(coef)     z        p
Intercept -1.74e-03 -0.005130 1.00e-03 -5.13 2.90e-07
age        4.22e-05  0.000116 1.64e-05  7.10 1.27e-12

Chisq=50.38 on 1 df, p=1.27e-12; test weights=aalen
par(mfrow = c(1,2))
plot(fit.age)

fit.age2 = aareg(formula = Surv(days, status) ~ age, data = ami2)
fit.age2
Call:
aareg(formula = Surv(days, status) ~ age, data = ami2)

  n= 481 
    115 out of 115 unique event times used

             slope    coef se(coef)    z        p
Intercept 0.001010 0.00275 0.000204 13.4 3.46e-41
age       0.000535 0.00147 0.000208  7.1 1.27e-12

Chisq=50.38 on 1 df, p=1.27e-12; test weights=aalen
par(mfrow = c(1,2))
plot(fit.age2)

fit.enzym = aareg(formula = Surv(days, status) ~ enzym, data = ami2)
fit.enzym
Call:
aareg(formula = Surv(days, status) ~ enzym, data = ami2)

  n= 481 
    115 out of 115 unique event times used

             slope     coef se(coef)     z        p
Intercept 0.000909 0.002580 0.000190 13.50 1.10e-41
enzym     0.000223 0.000444 0.000269  1.65 9.92e-02

Chisq=2.72 on 1 df, p=0.0992; test weights=aalen
par(mfrow = c(1,2))
plot(fit.enzym)

fit.prev = aareg(formula = Surv(days, status) ~ factor(prev), data = ami)
fit.prev
Call:
aareg(formula = Surv(days, status) ~ factor(prev), data = ami)

  n= 481 
    115 out of 115 unique event times used

                 slope    coef se(coef)    z        p
Intercept     0.000654 0.00203 0.000207 9.85 6.93e-23
factor(prev)1 0.000674 0.00166 0.000444 3.75 1.77e-04

Chisq=14.06 on 1 df, p=0.000177; test weights=aalen
par(mfrow = c(1,2))
plot(fit.prev)

  1. Fit an additive regression model with all the five covariates. Determine which of covariates have a significant effect on the mortality (using “backwards elimination”), and thereby obtain a “final model” where all covariates have a significant effect. Interpret the estimates of the cumulative baseline and the cumulative regression functions for your “final model”.
fit.all = aareg(formula = Surv(days, status) ~ factor(per) + age + factor(sex) + enzym + factor(prev), data = ami2)
fit.all
Call:
aareg(formula = Surv(days, status) ~ factor(per) + age + factor(sex) + 
    enzym + factor(prev), data = ami2)

  n= 481 
    115 out of 115 unique event times used

                  slope      coef se(coef)      z        p
Intercept      0.000842  0.002390 0.000425  5.630 1.83e-08
factor(per)2  -0.000140 -0.000176 0.000477 -0.368 7.13e-01
factor(per)3  -0.000297 -0.000681 0.000474 -1.440 1.50e-01
age            0.000552  0.001480 0.000222  6.680 2.44e-11
factor(sex)1   0.000208  0.000389 0.000426  0.914 3.61e-01
enzym          0.000387  0.000873 0.000283  3.090 2.01e-03
factor(prev)1  0.000647  0.001560 0.000454  3.430 5.98e-04

Chisq=67.78 on 6 df, p=1.16e-12; test weights=aalen
par(mfrow = c(2,4))
plot(fit.all)

print(fit.all, maxtime = 100)
Call:
aareg(formula = Surv(days, status) ~ factor(per) + age + factor(sex) + 
    enzym + factor(prev), data = ami2)

  n= 481 
    44 out of 115 unique event times used

                 slope      coef se(coef)     z        p
Intercept      0.00519  0.002180 0.000481  4.53 5.91e-06
factor(per)2  -0.00273 -0.000834 0.000552 -1.51 1.31e-01
factor(per)3  -0.00206 -0.000728 0.000568 -1.28 1.99e-01
age            0.00293  0.001260 0.000254  4.98 6.50e-07
factor(sex)1   0.00217  0.000874 0.000497  1.76 7.87e-02
enzym          0.00388  0.001350 0.000395  3.42 6.27e-04
factor(prev)1  0.00277  0.001390 0.000514  2.71 6.77e-03

Chisq=46.16 on 6 df, p=2.75e-08; test weights=aalen
par(mfrow = c(2,4))
plot(fit.all, maxtime = 100)

Let’s keep only the variables in the model that had a significant effect:

fit.final = aareg(formula = Surv(days, status) ~ age + enzym + factor(prev), data = ami2)
fit.final
Call:
aareg(formula = Surv(days, status) ~ age + enzym + factor(prev), 
    data = ami2)

  n= 481 
    115 out of 115 unique event times used

                 slope     coef se(coef)     z        p
Intercept     0.000752 0.002270 0.000224 10.10 3.81e-24
age           0.000554 0.001510 0.000212  7.10 1.23e-12
enzym         0.000358 0.000812 0.000279  2.90 3.67e-03
factor(prev)1 0.000653 0.001570 0.000450  3.49 4.85e-04

Chisq=62.99 on 3 df, p=1.35e-13; test weights=aalen
par(mfrow = c(2,2))
plot(fit.final)

  1. When analyzing censored survival data, Cox regression is commonly used. Discuss benefits and drawbacks of using the additive model (rather than Cox’s model) for analyzing the AMI data.