R: Logistic Regression Using the R Language (Part 3)

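The documentation has another regression example, this time with several numeric predictors. This post briefly records the analysis steps and the results.
There are three independent variables {W, C, CW}, where CW combines W and C, and the dependent variable is seen, coded {0, 1}.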

STEP 1. Check the correlations
 > str(gorilla)  
 'data.frame':     49 obs. of 4 variables:  
  $ seen: int 0 0 0 0 0 0 0 0 0 0 ...  
  $ W  : int 126 118 61 69 57 78 114 81 73 93 ...  
  $ C  : int 86 76 66 48 59 64 61 85 57 50 ...  
  $ CW : int 64 54 44 32 42 53 41 47 33 45 ...  
 > cor(gorilla)
             seen           W          C         CW
 seen  1.00000000 -0.03922667 0.05437115 0.06300865
 W    -0.03922667  1.00000000 0.43044418 0.35943580
 C     0.05437115  0.43044418 1.00000000 0.64463361
 CW    0.06300865  0.35943580 0.64463361 1.00000000

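The results above show that none of the three predictors is more than very weakly correlated with seen: every correlation is below 0.07 in absolute value.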

 > glm.out = glm(seen ~ W*C*CW, family=binomial(logit), data=gorilla)  
 > summary(glm.out)  
 Call:  
 glm(formula = seen ~ W * C * CW, family = binomial(logit), data = gorilla)  
 Deviance Residuals:   
   Min    1Q  Median    3Q   Max   
 -1.8073 -0.9897 -0.5740  1.2368  1.7362   
 Coefficients:
               Estimate Std. Error z value Pr(>|z|)
 (Intercept) -1.323e+02  8.037e+01  -1.646   0.0998 .
 W            1.316e+00  7.514e-01   1.751   0.0799 .
 C            2.129e+00  1.215e+00   1.753   0.0797 .
 CW           2.206e+00  1.659e+00   1.329   0.1837
 W:C         -2.128e-02  1.140e-02  -1.866   0.0621 .
 W:CW        -2.201e-02  1.530e-02  -1.439   0.1502
 C:CW        -3.582e-02  2.413e-02  -1.485   0.1376
 W:C:CW       3.579e-04  2.225e-04   1.608   0.1078
 ---  
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
 (Dispersion parameter for binomial family taken to be 1)  
   Null deviance: 65.438 on 48 degrees of freedom  
 Residual deviance: 57.281 on 41 degrees of freedom  
 AIC: 73.281  
 Number of Fisher Scoring iterations: 5  
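
A side note on the formula: in R's formula syntax, `W * C * CW` is shorthand for all three main effects plus every interaction among them, which is why the summary lists seven terms besides the intercept. A minimal sketch that expands the formula without needing any data (the names `f`, `term_labels` and `n_coef` are only for illustration):

```r
# Expand the model formula into its individual terms (no data required).
f <- seen ~ W * C * CW
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# All predictors are numeric, so each term gets one coefficient.
# Adding the intercept gives the 8 rows of the coefficient table.
n_coef <- length(term_labels) + 1
print(n_coef)
```

The seven non-intercept terms are also the 7 degrees of freedom by which the residual deviance differs from the null deviance.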

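Even with correlations this low, we can still look at the regression results. The deviance was reduced by 8.157 points (65.438 - 57.281) on 7 degrees of freedom (48 - 41), which gives a p-value of 1 - pchisq(8.157, df=7) = 0.3189. That is no significant reduction in deviance, i.e. no significant difference from the null model, so the model is a very poor one.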
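
The arithmetic behind that test can be reproduced directly from the numbers printed by `summary(glm.out)`. This is a small sketch using values copied from the output above; the variable names are only for illustration:

```r
# Values copied from the summary(glm.out) output above.
null_dev  <- 65.438; null_df  <- 48
resid_dev <- 57.281; resid_df <- 41

dev_drop <- null_dev - resid_dev   # reduction in deviance
df_drop  <- null_df - resid_df     # degrees of freedom used by the model terms
p_value  <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(dev_drop = dev_drop, df = df_drop, p = p_value))

# With the fitted model at hand, the same inputs are stored in the object as
# glm.out$null.deviance, glm.out$deviance, glm.out$df.null, glm.out$df.residual.
```

Using `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` but avoids the subtraction.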

 > anova(glm.out, test='Chisq')
 Analysis of Deviance Table
 Model: binomial, link: logit
 Response: seen
 Terms added sequentially (first to last)
        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
 NULL                      48     65.438
 W       1   0.0755        47     65.362  0.78351
 C       1   0.3099        46     65.052  0.57775
 CW      1   0.1061        45     64.946  0.74467
 W:C     1   2.3632        44     62.583  0.12423
 W:CW    1   0.5681        43     62.015  0.45103
 C:CW    1   1.4290        42     60.586  0.23193
 W:C:CW  1   3.3053        41     57.281  0.06906 .
 ---  
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

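The analysis of deviance table shows the residual deviance falling from 65.438 to 57.281 on 7 degrees of freedom as the terms are added one at a time, from top to bottom. No single term reduces the deviance significantly. The interaction terms contribute more than the main effects, but even the largest contribution, W:C:CW, lowers the deviance by only 3.3053 on 1 degree of freedom, with p = 0.069.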
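
The per-term p-values in the table can be checked by hand, since each term uses one degree of freedom. This sketch uses the deviance reductions copied from the table above; the vector names are only for illustration:

```r
# Deviance reductions copied from the anova() table, in the order added.
dev <- c(W = 0.0755, C = 0.3099, CW = 0.1061, "W:C" = 2.3632,
         "W:CW" = 0.5681, "C:CW" = 1.4290, "W:C:CW" = 3.3053)

# Each term uses 1 degree of freedom, so each p-value is an upper-tail
# chi-square probability on 1 df.
p <- pchisq(dev, df = 1, lower.tail = FALSE)
print(round(p, 5))

# The sequential reductions add up to the overall drop in deviance.
total <- sum(dev)
print(total)
```

Because the terms are tested sequentially, each reduction is measured relative to the model containing only the terms listed above it, so the individual values would generally change if the terms were entered in a different order.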

 > plot(glm.out$fitted.values)  # assumed: the plot call is missing from the listing
 > abline(v=30.5,col='red')
 > abline(h=.3,col='green')
 > abline(h=.5,col='green')
 > text(15,.9, 'seen=0')
 > text(40,.9, 'seen=1')
The plot shows that most of the fitted probabilities lie below 0.5, with unseen (seen = 0) making up roughly 70 to 80 percent.
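
The same reading can be done numerically by classifying each observation at the 0.5 cutoff. The gorilla data are not reproduced in this post, so the sketch below uses a simulated, hypothetical stand-in (`sim`) with the same column names. The 30/19 split between seen = 0 and seen = 1 is an assumption based on the vertical line drawn at 30.5, and the predictor values are random, so the printed numbers will not match the real analysis:

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for the gorilla data: 49 rows, the first 30 with
# seen = 0 and the last 19 with seen = 1 (assumed split, see above).
sim <- data.frame(
  seen = rep(c(0L, 1L), times = c(30, 19)),
  W    = round(rnorm(49, mean = 85, sd = 20)),
  C    = round(rnorm(49, mean = 65, sd = 12)),
  CW   = round(rnorm(49, mean = 45, sd = 10))
)

fit  <- glm(seen ~ W * C * CW, family = binomial(logit), data = sim)
prob <- fitted(fit)                 # fitted P(seen = 1) for each observation

# Classify at the 0.5 cutoff and cross-tabulate against the actual outcome.
pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
confusion <- table(actual = sim$seen, predicted = pred)
print(confusion)
print(mean(prob < 0.5))             # share of fitted values below 0.5
```

With the real data, replacing `sim` by `gorilla` would give the counts that correspond to the plot.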








