SASLR_Prob_9.8_9.10, p. 3.17-3.18, 5th ed.
==============================================

***Problem 9-8;
PROC FORMAT;
   VALUE YESNO   1='Yes'  0='No';
   VALUE OUTCOME 1='Case' 0='Control';
RUN;

DATA SMOKING;
   DO SUBJECT = 1 TO 1000;
      DO OUTCOME = 0,1;
         IF RANUNI(567) LT .1 OR RANUNI(0)*OUTCOME GT .5 THEN SMOKING = 1;
         ELSE SMOKING = 0;
         IF RANUNI(0) LT .05 OR (RANUNI(0)*OUTCOME + .1*SMOKING) GT .6 THEN ASBESTOS = 1;
         ELSE ASBESTOS = 0;
         ***The first assignment sets the length of SES, so 1-Low is padded to 8 characters;
         IF RANUNI(0) LT .3 OR OUTCOME*RANUNI(0) GT .9 THEN SES = '1-Low   ';
         ELSE IF RANUNI(0) LT .3 OR OUTCOME*RANUNI(0) GT .8 THEN SES = '2-Medium';
         ELSE SES = '3-High';
         OUTPUT;
      END;
   END;
   FORMAT SMOKING ASBESTOS YESNO. OUTCOME OUTCOME.;
RUN;

PROC PRINT DATA=SMOKING;
   VAR SUBJECT OUTCOME SMOKING ASBESTOS SES;
RUN;

 Obs    SUBJECT    OUTCOME    SMOKING    ASBESTOS    SES

   1        1      Control      No         No        3-High
   2        1      Case         Yes        No        3-High
   3        2      Control      No         No        1-Low
   4        2      Case         Yes        No        3-High
   5        3      Control      No         No        3-High
   6        3      Case         No         Yes       1-Low
   7        4      Control      No         No        3-High
   8        4      Case         Yes        No        2-Medium
   9        5      Control      No         No        3-High
  10        5      Case         No         Yes       2-Medium
  11        6      Control      No         No        2-Medium
  12        6      Case         No         No        3-High
  13        7      Control      No         No        3-High
  14        7      Case         No         No        3-High
  15        8      Control      No         No        3-High
.................................................................
.................................................................
.................................................................
1989      995      Control      Yes        No        3-High
1990      995      Case         No         Yes       2-Medium
1991      996      Control      Yes        Yes       2-Medium
1992      996      Case         No         Yes       3-High
1993      997      Control      No         No        1-Low
1994      997      Case         No         No        2-Medium
1995      998      Control      No         Yes       3-High
1996      998      Case         Yes        Yes       2-Medium
1997      999      Control      No         No        3-High
1998      999      Case         No         No        1-Low
1999     1000      Control      No         No        2-Medium
2000     1000      Case         Yes        Yes       3-High

PROC FREQ DATA=SMOKING;
   TITLE "Relationship between Smoking and Outcome";
   TABLES SMOKING*OUTCOME / CHISQ CMH;
RUN;

The FREQ Procedure

Table of SMOKING by OUTCOME

SMOKING     OUTCOME: No Disease, Disease

Frequency|
Percent  |
Row Pct  |
Col Pct  |Control |Case    |  Total
---------+--------+--------+
No       |    899 |    459 |   1358
         |  44.95 |  22.95 |  67.90
         |  66.20 |  33.80 |
         |  89.90 |  45.90 |
---------+--------+--------+
Yes      |    101 |    541 |    642
         |   5.05 |  27.05 |  32.10
         |  15.73 |  84.27 |
         |  10.10 |  54.10 |
---------+--------+--------+
Total        1000     1000     2000
            50.00    50.00   100.00

Odds Ratio = 899*541/(101*459)         = 10.49115
RR(Col 1)  = (899/1358)/(101/642)      = 4.207979
RR(Col 2)  = (459/1358)/(541/642)      = 0.4010982

Statistics for Table of SMOKING by OUTCOME

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1    444.1202    <.0001
Likelihood Ratio Chi-Square    1    476.3716    <.0001
Continuity Adj. Chi-Square     1    442.1038    <.0001
Mantel-Haenszel Chi-Square     1    443.8982    <.0001
Phi Coefficient                       0.4712
Contingency Coefficient               0.4263
Cramer's V                            0.4712

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)       899
Left-sided Pr <= F          1.0000
Right-sided Pr >= F     2.209E-105

Table Probability (P)   2.072E-104
Two-sided Pr <= P       4.418E-105

Sample Size = 2000

Relationship between Smoking and Outcome

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method                  Value
-----------------------------------------------
Case-Control      Mantel-Haenszel       10.4911
  (Odds Ratio)    Logit                 10.4911

Cohort            Mantel-Haenszel        4.2080
  (Col1 Risk)     Logit                  4.2080

Cohort            Mantel-Haenszel        0.4011
  (Col2 Risk)     Logit                  0.4011

Type of Study     Method                  95% Confidence Limits
-------------------------------------------------------------
Case-Control      Mantel-Haenszel          8.2496       13.3418
  (Odds Ratio)    Logit                    8.2496       13.3418

Cohort            Mantel-Haenszel          3.5042        5.0531
  (Col1 Risk)     Logit                    3.5042        5.0531

Cohort            Mantel-Haenszel          0.3697        0.4352
  (Col2 Risk)     Logit                    0.3697        0.4352

Total Sample Size = 2000

PROC LOGISTIC DATA=SMOKING ORDER=FORMATTED;
   TITLE "SMOKING Only";
   MODEL OUTCOME = SMOKING;
RUN;

Because ORDER=FORMATTED sorts the response levels by their formatted
values, "Case" comes before "Control". SAS therefore treats "Case" as the
"success" level, and we do not have to use DESCENDING. See the Response
Profile below.
Note: ORDER=FORMATTED is the default; it is included here just to remind
ourselves.

The LOGISTIC Procedure

          Model Information

Data Set                      WORK.SMOKING
Response Variable             OUTCOME
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read        2000
Number of Observations Used        2000

          Response Profile

 Ordered                      Total
   Value     OUTCOME      Frequency
       1     Case              1000
       2     Control           1000

Probability modeled is OUTCOME='Case'.   <---!!!

          Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

          Model Fit Statistics

                                Intercept
               Intercept           and
Criterion        Only           Covariates
AIC             2774.589         2300.217
SC              2780.190         2311.419
-2 Log L        2772.589         2296.217

     Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       476.3716     1        <.0001
Score                  444.1202     1        <.0001
Wald                   367.3132     1        <.0001

        Analysis of Maximum Likelihood Estimates

                             Standard        Wald
Parameter   DF   Estimate      Error     Chi-Square   Pr > ChiSq
Intercept    1    -0.6722     0.0574       137.3131       <.0001
SMOKING      1     2.3503     0.1226       367.3132       <.0001

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
SMOKING     10.488       8.248     13.338

If you want more options, use:

PROC LOGISTIC DATA=SMOKING ORDER=FORMATTED;
   TITLE "SMOKING Only";
   MODEL OUTCOME = SMOKING / CTABLE PPROB=0 TO 1 BY .1
                             OUTROC=ROCDATA;
RUN;

***Problem 9-10;

Here the CLASS statement codes the categorical variable SES with 2 dummy
variables (1-Low and 3-High), using 2-Medium as the reference level:

   If 1-Low=1 and 3-High=0 then SES=1-Low
   If 1-Low=0 and 3-High=1 then SES=3-High
   If 1-Low=0 and 3-High=0 then SES=2-Medium

PROC LOGISTIC DATA=SMOKING ORDER=FORMATTED;
   ***Note: ORDER=FORMATTED is the default, included just to remind ourselves;
   TITLE "SMOKING, ASBESTOS, AND SES IN THE MODEL ";
   CLASS SES (PARAM=REF REF='2-Medium');
   MODEL OUTCOME = SMOKING ASBESTOS SES / CTABLE PPROB=0 TO 1 BY .1
                                          OUTROC=ROCDATA;
RUN;

The LOGISTIC Procedure

          Model Information

Data Set                      WORK.SMOKING
Response Variable             OUTCOME
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read        2000
Number of Observations Used        2000

          Response Profile

 Ordered                      Total
   Value     OUTCOME      Frequency
       1     Case              1000
       2     Control           1000

Probability modeled is OUTCOME='Case'.

     Class Level Information

                         Design
Class     Value         Variables
SES       1-Low          1      0
          2-Medium       0      0   <--Reference
          3-High         0      1

Note: The model contains only the 1-Low and 3-High dummy variables.
If 1-Low=0 and 3-High=0, then the value of SES is 2-Medium.

          Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

          Model Fit Statistics

                                Intercept
               Intercept           and
Criterion        Only           Covariates
AIC             2774.589         1946.815
SC              2780.190         1974.819
-2 Log L        2772.589         1936.815

     Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       835.7741     4        <.0001
Score                  696.0566     4        <.0001
Wald                   490.5215     4        <.0001

        Type 3 Analysis of Effects

                       Wald
Effect      DF    Chi-Square    Pr > ChiSq
SMOKING      1      277.2260        <.0001
ASBESTOS     1      232.7040        <.0001
SES          2       27.5915        <.0001

           Analysis of Maximum Likelihood Estimates

                                  Standard        Wald
Parameter        DF   Estimate      Error     Chi-Square   Pr > ChiSq
Intercept         1    -0.9494     0.1179        64.7902       <.0001
SMOKING           1     2.2050     0.1324       277.2260       <.0001
ASBESTOS          1     2.5606     0.1679       232.7040       <.0001
SES     1-Low     1     0.1275     0.1484         0.7381       0.3903
SES     3-High    1    -0.5203     0.1431        13.2191       0.0003

Model is:
log(p/(1-p)) = -0.9494 + 2.2050*Smoking + 2.5606*Asbestos + 0.1275*SesLow - 0.5203*SesHigh

                 Odds Ratio Estimates

                             Point          95% Wald
Effect                    Estimate      Confidence Limits
SMOKING                      9.070       6.997     11.758
ASBESTOS                    12.944       9.315     17.986
SES 1-Low  vs 2-Medium       1.136       0.849      1.519
SES 3-High vs 2-Medium       0.594       0.449      0.787

Check: Let's compute the odds ratio for Smoking.
   First, Smoking=1, all others 0:  exp(-0.9494+2.2050) = 3.509944
   Then,  Smoking=0, all others 0:  exp(-0.9494)        = 0.3869731
   Odds Ratio: 3.509944/0.3869731 = 9.070253   OK!!

log(p/(1-p)) = -0.9494 + 2.2050*Smoking + 2.5606*Asbestos + 0.1275*SesLow - 0.5203*SesHigh

Similarly, let's compute the odds ratio for 1-Low vs 2-Medium:
   exp(-0.9494 + 0.1275)/exp(-0.9494) = 1.135985   OK!!

Now get the ROC curve: Sensitivity vs. 1 - Specificity.

SYMBOL1 V=DOT I=SM60 COLOR=BLACK WIDTH=2;
PROC GPLOT DATA=ROCDATA;
   TITLE "ROC Curve";
   PLOT _SENSIT_ * _1MSPEC_ ;
   LABEL _SENSIT_ = 'Sensitivity'
         _1MSPEC_ = '1 - Specificity';
RUN;

SYMBOL1 V=DOT I=SM60 COLOR=BLACK WIDTH=2;
PROC PLOT DATA=ROCDATA;
   TITLE "ROC Curve";
   PLOT _SENSIT_ * _1MSPEC_ ;
   LABEL _SENSIT_ = 'Sensitivity'
         _1MSPEC_ = '1 - Specificity';
RUN;

Plot of _SENSIT_*_1MSPEC_.  Legend: A = 1 obs, B = 2 obs, etc.

[Plot: ROC curve, Sensitivity (vertical axis, 0.0 to 1.0) versus 1 - Specificity (horizontal axis, 0.0 to 1.0)]
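
As a cross-check on the Problem 9-8 results, the odds ratio and its 95% limits can also be recomputed directly from the four cell counts of the SMOKING by OUTCOME table. The DATA step below is only a sketch: the variable names (A, B, C, D, ODDS_RATIO, SE_LOG_OR, and so on) are illustrative and do not come from the original program, and it assumes the usual large-sample standard error of the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d).

```sas
***Hand check of the SMOKING odds ratio and its 95 percent Wald limits;
DATA _NULL_;
   ***Cell counts copied from the SMOKING by OUTCOME table;
   A = 899;   ***Smoking=No and Control;
   B = 459;   ***Smoking=No and Case;
   C = 101;   ***Smoking=Yes and Control;
   D = 541;   ***Smoking=Yes and Case;

   ODDS_RATIO = (A*D)/(B*C);
   LOG_OR     = LOG(ODDS_RATIO);
   ***Assumed large-sample standard error of the log odds ratio;
   SE_LOG_OR  = SQRT(1/A + 1/B + 1/C + 1/D);
   ***Two-sided 95 percent normal quantile, about 1.96;
   Z          = PROBIT(.975);
   LOWER      = EXP(LOG_OR - Z*SE_LOG_OR);
   UPPER      = EXP(LOG_OR + Z*SE_LOG_OR);

   ***Write the results to the SAS log;
   PUT ODDS_RATIO= 8.4 LOG_OR= 8.4 SE_LOG_OR= 8.4 LOWER= 8.4 UPPER= 8.4;
RUN;
```

If the sketch is right, the values written to the log should be close to the Logit odds-ratio limits from PROC FREQ (8.2496 and 13.3418), and LOG_OR and SE_LOG_OR should be close to the SMOKING estimate (2.3503) and standard error (0.1226) from the single-predictor PROC LOGISTIC run, up to rounding and convergence tolerance.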