R Language: Logistic Regression Using R (Part 1)

June 08, 2015

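Hive has been giving me a headache lately. The features I selected carry a lot of uncertainty and the files are large, so merely dropping and recreating tables caused plenty of trouble. It is much simpler and more efficient to pull a small subset and analyze it locally in R (although the loss of reliability is a separate problem). I have mostly used linear regression before, so here I note how logistic regression is done in R.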

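    library(ggplot2)
    # Logistic (sigmoid) curve with slope 0.1 on x in [-100, 100]
    x <- -100:100
    y <- 1 / (1 + exp(-0.1 * x))
    df <- data.frame(x, y)
    g <- ggplot(df, aes(x, y, color = "Logistic curve")) + geom_line()
    png('out.png')
    print(g)  # a ggplot object has to be printed to draw on the open device
    dev.off()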

To start, here is the regression equation:

$\begin{equation} y = \frac{1}{1+\exp(-\beta^{T}x)}\end{equation}$
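With both the S-shaped (sigmoid) curve of logistic regression and its equation in hand, the meaning is easier to grasp. Why? Because logistic regression does not simply look for a mapping between the independent variable {x} and the dependent variable {y}. Here {y} can be read in two ways: (I) as a probability, and (II) as binary data (a dichotomous variable).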
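For (I), think of it as how much the probability changes when x increases by one unit.
For (II), it becomes a classification, such as 0/1 or win/lose. And yes, that leads to the odds of winning.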
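To make point (I) concrete, here is a small sketch that measures how much the probability moves when x goes up by one unit. The helper `sigmoid` and the starting points `x0` are hypothetical names and values chosen for illustration; the slope 0.1 is the one used in the plotting code above, and the intercept 0 is an assumption.

```r
# Logistic function with an assumed intercept b0 and slope b1 (illustrative values)
sigmoid <- function(x, b0 = 0, b1 = 0.1) 1 / (1 + exp(-(b0 + b1 * x)))

# Change in probability for a one-unit increase in x, at several starting points
x0 <- c(-50, -10, 0, 10, 50)
delta_p <- sigmoid(x0 + 1) - sigmoid(x0)

print(data.frame(x0, p = sigmoid(x0), delta_p))
```

The change is always positive here but not constant: it is largest near the middle of the curve and shrinks toward the tails, which is the main difference from a linear regression slope.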
Suppose

$p(succeed) = \frac{\exp(\beta^Tx)}{1+\exp(\beta^Tx)}$ ,

so that $p(failure) = 1 - p(succeed) = \frac{1}{1+\exp(\beta^Tx)}$, and

$\begin{equation} odds = \frac{p(succeed)}{p(failure)} = \exp(\beta^Tx) \end{equation}$
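The odds identity can be checked numerically. This is only a sketch: it assumes the logistic form $p(succeed) = \exp(\beta^Tx)/(1+\exp(\beta^Tx))$, and the vector `eta` stands in for a few made-up values of $\beta^Tx$.

```r
# Hypothetical values of the linear predictor eta = beta^T x
eta <- c(-2, -1, 0, 1, 2)

p_succeed <- exp(eta) / (1 + exp(eta))
p_failure <- 1 - p_succeed
odds <- p_succeed / p_failure

# The last two columns should agree: odds equals exp(eta)
print(cbind(eta, p_succeed, odds, exp_eta = exp(eta)))
```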
Taking the log,

$\begin{equation}logit = \log(odds) = \beta^Tx \end{equation}$
Quite remarkably, taking the logarithm brings us back to a linear relationship, which is why we say that the logit transform in logistic regression is linear in x.
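As a rough illustration of that linearity, the sketch below simulates binary data from assumed coefficients and fits them with base R's `glm(..., family = binomial)`. The values of `b0`, `b1`, the sample size, and the seed are all hypothetical, and this is not necessarily the approach taken in the follow-up post.

```r
set.seed(42)  # assumed seed, only for reproducibility

# Simulate binary outcomes from assumed "true" coefficients (hypothetical values)
b0 <- -0.5
b1 <- 0.8
x <- rnorm(2000)
p <- plogis(b0 + b1 * x)   # plogis(z) = 1 / (1 + exp(-z))
y <- rbinom(length(x), size = 1, prob = p)

# The logit of the true probabilities is linear in x
stopifnot(isTRUE(all.equal(qlogis(p), b0 + b1 * x)))

# Fit a logistic regression; coefficients are on the logit (log-odds) scale
fit <- glm(y ~ x, family = binomial)
print(coef(fit))       # estimates of b0 and b1
print(exp(coef(fit)))  # exponentiated coefficients (odds scale)
```

Because the model is linear on the logit scale, `exp` of the slope can be read as the factor by which the odds are multiplied for each one-unit increase in x, that is, an odds ratio.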

References
http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html

http://ww2.coastal.edu/kingw/statistics/R-tutorials/formulae.html
