R Language: Logistic Regression Using R (Part 3)
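The documentation includes another regression example with several numeric predictors. Here is a brief record of the analysis steps and the results.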
There are three predictors, W, C, and CW (W+C), and the response is seen (0 or 1).
STEP 1. Check the correlations
> str(gorilla)
'data.frame': 49 obs. of 4 variables:
$ seen: int 0 0 0 0 0 0 0 0 0 0 ...
$ W : int 126 118 61 69 57 78 114 81 73 93 ...
$ C : int 86 76 66 48 59 64 61 85 57 50 ...
$ CW : int 64 54 44 32 42 53 41 47 33 45 ...
> cor(gorilla)
seen W C CW
seen 1.00000000 -0.03922667 0.05437115 0.06300865
W -0.03922667 1.00000000 0.43044418 0.35943580
C 0.05437115 0.43044418 1.00000000 0.64463361
CW 0.06300865 0.35943580 0.64463361 1.00000000
>
The output above shows that seen is only weakly correlated with each predictor (all |r| < 0.07).
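One way to check whether correlations this small differ from zero is the usual t-test for a Pearson correlation, which is the test behind cor.test(). The sketch below plugs in the r values printed above with n = 49. The helper name cor_p_value is hypothetical.

```r
# Two-sided p-value for H0: rho = 0, given a sample correlation r and sample size n.
# Under H0, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 df.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Correlations of seen with each predictor, copied from cor(gorilla)
r_seen <- c(W = -0.03922667, C = 0.05437115, CW = 0.06300865)
p_seen <- cor_p_value(r_seen, n = 49)
round(p_seen, 3)
```

With the real data frame, cor.test(gorilla$seen, gorilla$CW) should report the same p-value, up to the rounding of r. All three p-values are far above 0.05.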
> glm.out = glm(seen ~ W*C*CW, family=binomial(logit), data=gorilla)
> summary(glm.out)
Call:
glm(formula = seen ~ W * C * CW, family = binomial(logit), data = gorilla)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8073 -0.9897 -0.5740 1.2368 1.7362
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.323e+02 8.037e+01 -1.646 0.0998 .
W 1.316e+00 7.514e-01 1.751 0.0799 .
C 2.129e+00 1.215e+00 1.753 0.0797 .
CW 2.206e+00 1.659e+00 1.329 0.1837
W:C -2.128e-02 1.140e-02 -1.866 0.0621 .
W:CW -2.201e-02 1.530e-02 -1.439 0.1502
C:CW -3.582e-02 2.413e-02 -1.485 0.1376
W:C:CW 3.579e-04 2.225e-04 1.608 0.1078
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 65.438 on 48 degrees of freedom
Residual deviance: 57.281 on 41 degrees of freedom
AIC: 73.281
Number of Fisher Scoring iterations: 5
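The columns of the Coefficients table are linked by simple arithmetic. The z value is Estimate / Std. Error, and Pr(>|z|) is the two-sided standard-normal tail probability. For a 0/1 response like this one, the residual deviance equals -2 times the log-likelihood. AIC is therefore the residual deviance plus twice the number of coefficients. Here there are 8 coefficients: the intercept, three main effects, and four interactions. The sketch below recomputes a few of these values from the printed, rounded numbers, so the last digit may differ slightly.

```r
# Estimates and standard errors for W and W:C, copied from summary(glm.out)
est <- c(W = 1.316e+00, `W:C` = -2.128e-02)
se  <- c(W = 7.514e-01, `W:C` = 1.140e-02)

z <- est / se                 # Wald z statistic
p <- 2 * pnorm(-abs(z))       # two-sided p-value, Pr(>|z|)
round(cbind(z, p), 4)

# AIC = residual deviance + 2 * (number of estimated coefficients)
k   <- 8                      # intercept + 3 main effects + 4 interaction terms
aic <- 57.281 + 2 * k
aic
```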
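The correlations are low, but suppose we still want to look at the regression results. The deviance drops by 8.157 points (65.438 - 57.281) on 7 degrees of freedom (48 - 41). The p-value is 1 - pchisq(8.157, df=7) = 0.3189, so there is no significant reduction in deviance: the model is not significantly different from the null model. In other words, the model performs very poorly.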
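This is a likelihood-ratio (deviance) test against the null model. The sketch below reproduces the arithmetic from the deviances printed by summary(). The helper lr_test_glm is a hypothetical name. It assumes only the null.deviance, deviance, df.null and df.residual components that a fitted glm object carries.

```r
# Deviances and degrees of freedom copied from summary(glm.out)
null_dev  <- 65.438; null_df  <- 48
resid_dev <- 57.281; resid_df <- 41

dev_drop <- null_dev - resid_dev   # 8.157
df_drop  <- null_df - resid_df     # 7
# Upper-tail chi-square probability; same as 1 - pchisq(dev_drop, df_drop)
p_value  <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
round(p_value, 4)

# Hypothetical helper: the same test for any fitted glm object
lr_test_glm <- function(fit) {
  pchisq(fit$null.deviance - fit$deviance,
         df = fit$df.null - fit$df.residual,
         lower.tail = FALSE)
}
```

With the model from this post, lr_test_glm(glm.out) should return roughly the same 0.319. Using lower.tail = FALSE rather than 1 - pchisq(...) avoids losing precision when the p-value is very small.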
> anova(glm.out, test='Chisq')
Analysis of Deviance Table
Model: binomial, link: logit
Response: seen
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 48 65.438
W 1 0.0755 47 65.362 0.78351
C 1 0.3099 46 65.052 0.57775
CW 1 0.1061 45 64.946 0.74467
W:C 1 2.3632 44 62.583 0.12423
W:CW 1 0.5681 43 62.015 0.45103
C:CW 1 1.4290 42 60.586 0.23193
W:C:CW 1 3.3053 41 57.281 0.06906 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
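The ANOVA table shows the residual deviance falling from 65.438 to 57.281 over 7 degrees of freedom. Each interaction term reduces the deviance more than any of the main effects. However, even the largest reduction, from W:C:CW, is only 3.3053 on 1 degree of freedom (p = 0.069), which is not significant at the 0.05 level.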
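Here anova() adds terms sequentially. Each row's Deviance is the drop from adding that term after all the rows above it, and each p-value is an upper-tail chi-square probability on that row's Df. As a result, the row deviances add up to the overall 8.157 drop, and the values depend on the order in which the terms enter the formula. The sketch below checks this with the printed numbers.

```r
# Sequential deviance reductions (1 df each), copied from anova(glm.out, test='Chisq')
dev_terms <- c(W = 0.0755, C = 0.3099, CW = 0.1061, `W:C` = 2.3632,
               `W:CW` = 0.5681, `C:CW` = 1.4290, `W:C:CW` = 3.3053)

# Per-term p-values: upper tail of a chi-square distribution with 1 df
p_terms <- pchisq(dev_terms, df = 1, lower.tail = FALSE)
round(p_terms, 4)

# The sequential reductions add up to the total drop reported by summary()
total_drop <- sum(dev_terms)   # about 8.157 = 65.438 - 57.281
total_drop
```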
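> plot(fitted(glm.out))      # assumed: the post omits the plot call; fitted probabilities by observation index
> abline(v=30.5,col='red')   # separates seen=0 (left) from seen=1 (right), as labelled below
> abline(h=.3,col='green')
> abline(h=.5,col='green')   # the usual 0.5 classification cut-off
> text(15,.9, 'seen=0')
> text(40,.9, 'seen=1')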
The plot above shows that most fitted probabilities fall below 0.5, and about 70 to 80 percent of them are unseen cases (seen=0).
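The green line at 0.5 corresponds to the common rule of predicting seen = 1 when the fitted probability exceeds 0.5. The sketch below turns fitted probabilities into a confusion table. The gorilla data are not reproduced in this post, so it uses simulated data; the variable x and its coefficients are made up for illustration. With the real model, you would use fitted(glm.out) and gorilla$seen instead.

```r
set.seed(1)  # simulated data, NOT the gorilla data
n <- 49
x <- rnorm(n)
# Weak effect, loosely mimicking the low correlations seen above
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.3 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

prob <- fitted(fit)                  # fitted probabilities P(y = 1)
pred <- as.integer(prob > 0.5)       # classify with a 0.5 cut-off

# Keep both levels so the table is always 2 x 2
conf <- table(observed  = factor(y,    levels = c(0, 1)),
              predicted = factor(pred, levels = c(0, 1)))
conf
accuracy <- sum(diag(conf)) / n
accuracy
```

With a weak model like this one, most or even all cases may fall below 0.5 and be predicted as 0. Accuracy should therefore be compared with the share of the majority class (here, mean(y == 0)) rather than read on its own.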