LOG TO SHOW HOW TO CALCULATE (non-time-dependent) COX-MODEL L0G-LIKELIHOODS AND SCORE STATISTICS ============================================ 11/3/05 We illustrate the methods for the Mayo lung cancer data. > library(survival); data(lung) ## We already have a fitted model object, and we want to ## reproduce its results without coxph ## Recall that we subtracted 1 from lung$status so that ## 1=dead and 0=censored > lung$status <- lung$status-1 > lung1$call coxph(formula = Surv(time, status) ~ sex + ph.ecog + ph.karno + wt.loss, data = lung) ## We delete all observations for which any of the ### covariates was missing. > lung <- lung[!is.na(data.matrix(lung[,c(5:7,10)]) %*% lung1$coef),] ### 15 rows were deleted ## To begin with, we must code a logPL function > logPL <- function(beta, colinds, data=lung) { ## beta represents coefficients for the covariate columns ## numbered by colinds in data-frame "data" timvec <- as.numeric(names(table(data$time[data$status==1]))) ordvec <- rev(order(data$time)) ## reverse ordering of event times riskscor <- c(data.matrix(data[,colinds]) %*% beta) dthinds <- (data$status==1)[ordvec] sum(riskscor*data$status) - sum(log(cumsum( exp(riskscor)[ordvec]))[dthinds]) } > logPL(lung1$coef, c(5:7,10)) [1] -659.5066 > lung1$loglik [1] -675.0244 -659.4920 ### The difference in log-lik's is ### trivial, due to the different handling of ties ### Now we contrast the optimized coefficients using this function ### with the ones previously found in "lung1" > nlm(function(x) -logPL(x,c(5:7,10)), rep(0,4)) $minimum [1] 659.5066 ## exactly the negative of logLik found above! $estimate [1] -0.622552534 0.752966080 0.013803830 -0.009007201 $gradient [1] -4.547474e-05 4.888534e-05 2.310117e-04 1.680291e-04 $code [1] 1 ### means proper convergence $iterations [1] 23 ### Estimated coeff's essentially the same as coxph-fitted coef's > lung1$coef sex ph.ecog ph.karno wt.loss -0.623109513 0.753441139 0.013815564 -0.009007685 --------------------------------------------------------------- # Resume by coding a function to calculate score-statistic # (before normalization) with respect to any single beta # coefficient with the others fixed. ---------------------------------------------------------------- > ScorStat <- function(beta, colnum, colinds, data=lung) { timvec <- as.numeric(names(table(data$time[data$status==1]))) ordvec <- rev(order(data$time)) ## reverse ordering of event times riskscor <- c(data.matrix(data[,colinds]) %*% beta) dthinds <- (data$status==1)[ordvec] fac <- exp(riskscor[ordvec]) denom <- cumsum(fac) num1 <- cumsum(fac*data[ordvec,colnum]) num2 <- cumsum(fac*(data[ordvec,colnum]^2)) c(scor=sum(data[,colnum]*data$status)-sum((num1/denom)[dthinds]), SE = sqrt(sum((num2/denom - (num1/denom)^2)[dthinds]))) } > ScorStat(lung1$coef,5,c(5:7,10)) scor SE 0.01665988 5.73199634 ### Compare the first compoent with the difference quotient for ### the log-likelihood > 1.e6*(logPL(lung1$coef+c(1.e-6,0,0,0),c(5:7,10))- logPL(lung1$coef,c(5:7,10))) [1] 0.01664364 ### Compare the second component with the (1,1) entry of the lung1 ### information matrix obtained by inverting the variance matrix > sqrt(solve(lung1$var)[1,1]) [1] 5.732542 ## But note that although this is the variance that the ScorStat ## function calculates, it is not necessarily the right one ## to use in assessing the importance of the sex-coefficient ## in the model because it ignores the variability of the other ## fitted model-coefficients. Thus, .0166/5.732 is quite ## different from the normal (Wald-test) z deviate in summary ## for lung1: coef exp(coef) se(coef) z p sex -0.6231 0.536 0.17639 -3.53 0.00041 ------------------------------------------------------------ AS WE SAW PREVIOUSLY, THERE IS CONSIDERABLE CODING NEEDED TO GET WALD-STATISTICS IN THE TIME-DEPENDENT COVARIATE CONTEXT. HOWEVER, FOR SCORE-STATISTICS ONLY, WE CAN USE THE SAME KIND OF R FUNCTION AS ABOVE, SLIGHTLY MODIFIED: > ScorTim <- function(beta, colnum, colinds, Tfcn = function(t) log(t), data=lung) { timvec <- as.numeric(names(table(data$time[data$status==1]))) ordvec <- rev(order(data$time)) ## reverse ordering of event times riskscor <- c(data.matrix(data[,colinds]) %*% beta) dthinds <- (data$status==1)[ordvec] fac <- exp(riskscor[ordvec]) denom <- cumsum(fac) num1 <- cumsum(fac*data[ordvec,colnum]*Tfcn(data$time[ordvec])) num2 <- cumsum(fac*((data[ordvec,colnum]*Tfcn(data$time[ordvec]))^2)) c(scor=sum(data[,colnum]*data$status*Tfcn(data$time))- sum((num1/denom)[dthinds]), SE = sqrt(sum((num2/denom - (num1/denom)^2)[dthinds]))) } NOTE: THIS FUNCTION SHOULD WORK LIKE THE OTHER, BUT HAS NOT BEEN FULLY TESTED ! Let's try it on the same dataset, after first re-coding the sex variable to be 0, 1 instead of 1,2 : > lung$sex <- lung$sex-1 > ScorTim(lung1$coef,5,c(5:7,10)) scor SE -27.69990 35.9414 ### Indicates non-significant interaction ### effect for time-dependent sex by log(t) .