About object identification algorithms (1): HOG

April 28, 2018

HOG (histogram of oriented gradients) is one of the most common terms in traditional image processing for object identification and object classification. It was introduced by Dalal and Triggs in their CVPR 2005 paper Histograms of Oriented Gradients for Human Detection, which showed that local object appearance can be effectively described by the distribution of edge directions, i.e. oriented gradients. There are already plenty of articles online explaining the algorithm; this post just organizes my own understanding of it.


skimage.feature.hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3), block_norm=None, visualize=False, visualise=None, transform_sqrt=False, feature_vector=True, multichannel=None)

Above is the signature of scikit-image's HOG descriptor. Reading the parameters:
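image: the input image.
orientations: the number of orientation bins in each histogram. The default 9 splits 0 to 180 degrees into nine 20-degree bins: [0, 20), [20, 40), ..., [160, 180).
pixels_per_cell: the size of a cell, the smallest unit over which a gradient histogram is built; default (8, 8) = 64 pixels.
cells_per_block: the number of cells in a block, the unit over which histograms are normalized together; default (3, 3) = 9 cells.
block_norm: the block normalization method, e.g. L1 or L2 (scikit-image also offers L1-sqrt and L2-Hys).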
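A small sketch of how the orientations parameter turns an angle into a bin index. It assumes hard binning (each pixel votes into exactly one bin); the Dalal and Triggs paper instead spreads each vote between neighbouring bins, so treat this as an illustration of the bin layout rather than scikit-image's exact voting rule. `orientation_bin` is a hypothetical helper name.

```python
import numpy as np

def orientation_bin(theta_deg, orientations=9):
    """Map angles in degrees to histogram bin indices (hard binning, an assumption).

    With orientations=9 each bin spans 180 / 9 = 20 degrees:
    bin 0 = [0, 20), bin 1 = [20, 40), ..., bin 8 = [160, 180).
    """
    theta = np.mod(theta_deg, 180.0)          # fold signed angles into the unsigned range [0, 180)
    bin_width = 180.0 / orientations
    # the final % guards against a value that rounds up to exactly 180
    return (theta // bin_width).astype(int) % orientations

angles = np.array([0.0, 19.9, 20.0, 95.0, 179.9, -30.0, 200.0])
print(orientation_bin(angles))   # [0 0 1 4 8 7 1]
```

Note how -30 degrees lands in the same bin as 150 degrees, and 200 degrees in the same bin as 20 degrees: with unsigned gradients a direction and its opposite are treated as the same orientation.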

Next come the usual image processing steps:

1. preprocessing
Common operations are resize, crop, auto_canny, find_contour and so on, used to isolate the object to train on. This step is very time-consuming...

2. feature extraction
Common ways to extract features from an image include HOG, SIFT and SURF; here we only follow the HOG workflow:

2.1 convolution
Filter the image with a kernel (a sliding dot product between the kernel and each image patch). Common kernels are:
$$g_x=\begin{bmatrix}-1 & 0 & 1\end{bmatrix},\quad g_y=\begin{bmatrix}-1\\0\\1\end{bmatrix}$$
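Applying g_x and g_y amounts to a central difference: each output pixel is its right (or lower) neighbour minus its left (or upper) neighbour. The sketch below does this with NumPy slicing; `gradients` is a hypothetical helper, and leaving border pixels at 0 is a simplifying assumption (libraries usually pad the border instead). It also shows that a true convolution, which flips the kernel, only negates the result; that sign flip reverses the direction by 180 degrees, which HOG's unsigned orientation discards anyway.

```python
import numpy as np

def gradients(img):
    """Apply g_x = [-1, 0, 1] and g_y = [-1, 0, 1]^T as a sliding dot product.

    Border pixels are left at 0 for simplicity (an assumption, not library behaviour).
    """
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # -1*left + 0*center + 1*right
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # -1*above + 0*center + 1*below
    return gx, gy

# a vertical edge: dark on the left, bright on the right
img = np.array([[0, 0, 10, 10],
                [0, 0, 10, 10],
                [0, 0, 10, 10]])
gx, gy = gradients(img)
print(gx)   # each row is [0, 10, 10, 0]; gy is all zeros

# correlation (no flip) vs convolution (kernel flipped) on one row
row = np.array([0, 0, 10, 10])
print(np.correlate(row, [-1, 0, 1], mode="valid"))   # [10 10]
print(np.convolve(row, [-1, 0, 1], mode="valid"))    # [-10 -10]
```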

cv2.Sobel(src, ddepth, dx, dy[, dst[, ksize[, scale[, delta[, borderType]]]]])

sobelx = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=3)

sobely = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=3)

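OpenCV can compute these gradients with its Sobel function: ksize=3 uses the smoothed 3x3 Sobel kernels, while ksize=1 uses the plain [-1, 0, 1] kernels above. The gradient magnitude and orientation are then: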

$$\lvert g\rvert = \sqrt{g_x^2 + g_y^2}$$
$$\theta = \arctan\frac{g_y}{g_x}$$

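This gives the gradient magnitude and orientation at every pixel; these per-pixel values are then accumulated into each cell's histogram. HOG uses unsigned gradients (a direction and its opposite count as the same orientation), so orientations fall between 0 and 180 degrees.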
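The two formulas above, sketched with NumPy. `np.arctan2` is used instead of a plain arctan(g_y / g_x) so that g_x = 0 causes no division by zero, and `% 180` folds the signed angle into the unsigned range. `magnitude_orientation` is a hypothetical helper.

```python
import numpy as np

def magnitude_orientation(gx, gy):
    """Per-pixel gradient magnitude and unsigned orientation in degrees, in [0, 180)."""
    mag = np.hypot(gx, gy)                           # sqrt(gx**2 + gy**2)
    theta = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # arctan2 handles gx == 0; % 180 makes it unsigned
    return mag, theta

gx = np.array([10.0, -10.0, 0.0, 3.0])
gy = np.array([0.0, 0.0, 5.0, 4.0])
mag, theta = magnitude_orientation(gx, gy)
# mag is 10, 10, 5, 5; the first two gradients point in opposite directions
# but both get orientation 0 (up to floating-point rounding), and (0, 5) gets 90
print(mag, theta)
```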

2.2 block normalization
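The cell histograms are grouped into blocks of cells, and each block's histograms are concatenated and normalized together, which makes the descriptor less sensitive to local changes in lighting and contrast. Assuming the default hog parameters (8x8-pixel cells, 3x3-cell blocks, 9 orientations) and a 240x120 image as an example: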


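import numpy as np

def hog_blocks(mag, theta, cell=8, block=3, orientations=9, eps=1e-5):
    # mag, theta: per-pixel gradient magnitude and unsigned orientation in degrees [0, 180)
    n_cells_y, n_cells_x = mag.shape[0] // cell, mag.shape[1] // cell
    bin_width = 180 / orientations                    # 20 degrees per bin
    vectors = []                                      # one 81-value vector per block
    for by in range(n_cells_y - block + 1):           # blocks slide one cell at a time (they overlap)
        for bx in range(n_cells_x - block + 1):
            vectors_per_block = []                    # 3x3 cells * 9 bins = 81 values
            for cy in range(by, by + block):
                for cx in range(bx, bx + block):
                    rows = slice(cy * cell, (cy + 1) * cell)
                    cols = slice(cx * cell, (cx + 1) * cell)
                    vector = np.zeros(orientations)   # bins [0,20), [20,40), ..., [160,180)
                    idx = (theta[rows, cols] // bin_width).astype(int) % orientations
                    # hard binning: each pixel adds its magnitude to one bin
                    np.add.at(vector, idx.ravel(), mag[rows, cols].ravel())
                    vectors_per_block.append(vector)
            v = np.concatenate(vectors_per_block)
            vectors.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))   # L2 block normalization
    return np.concatenate(vectors)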
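Besides plain L2, scikit-image's block_norm offers several normalization schemes. The sketch below follows my reading of the scikit-image documentation (L1-sqrt: L1 followed by a square root; L2-Hys: L2, then clip values at 0.2, then L2 again); `normalize_block` is a hypothetical helper and eps = 1e-5 is an assumed small constant that avoids division by zero.

```python
import numpy as np

def normalize_block(v, method="L2-Hys", eps=1e-5, clip=0.2):
    """Normalize one block vector (e.g. 3x3 cells * 9 bins = 81 values)."""
    v = np.asarray(v, dtype=float)
    if method == "L1":
        return v / (np.sum(np.abs(v)) + eps)
    if method == "L1-sqrt":
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))
    if method == "L2":
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if method == "L2-Hys":
        out = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
        out = np.minimum(out, clip)                        # limit dominant components
        return out / np.sqrt(np.sum(out ** 2) + eps ** 2)  # renormalize
    raise ValueError(f"unknown method: {method}")

block = np.array([3.0, 4.0, 0.0])
for m in ("L1", "L1-sqrt", "L2", "L2-Hys"):
    print(m, np.round(normalize_block(block, m), 3))

# a brighter, higher-contrast version of the same patch normalizes to (almost) the same vector
print(np.allclose(normalize_block(10 * block, "L2"), normalize_block(block, "L2")))
```

The last line shows why normalization helps: scaling all gradients in a block, as a contrast change would, leaves the normalized vector essentially unchanged.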

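Counting the feature length (blocks slide by one cell, so neighbouring blocks overlap):
The 240x120 image has 240 / 8 = 30 cells vertically and 120 / 8 = 15 cells horizontally.
Vertical block positions: 30 - 3 + 1 = 28
Horizontal block positions: 15 - 3 + 1 = 13
Each block contains 3*3 cells * 9 orientations = 81 values,
so the whole image yields a feature vector of 28*13*81 = 29484 values.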
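The block count can be wrapped in a small function, assuming (as scikit-image does) that blocks move one cell at a time and that border pixels not filling a whole cell are dropped. `hog_feature_length` is a hypothetical helper; the second call reproduces the well-known 3780-dimensional descriptor for a 64x128 detection window with 2x2-cell blocks.

```python
def hog_feature_length(height, width, pixels_per_cell=(8, 8),
                       cells_per_block=(3, 3), orientations=9):
    """Length of a flattened HOG vector when blocks slide one cell at a time."""
    cells_y = height // pixels_per_cell[0]          # partial cells at the border are dropped
    cells_x = width // pixels_per_cell[1]
    blocks_y = cells_y - cells_per_block[0] + 1     # vertical block positions
    blocks_x = cells_x - cells_per_block[1] + 1     # horizontal block positions
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * per_block

print(hog_feature_length(240, 120))                          # 28 * 13 * 81 = 29484
print(hog_feature_length(128, 64, cells_per_block=(2, 2)))   # 15 * 7 * 36 = 3780
```

Because the length grows with the number of block positions, every image fed to the same classifier must be resized to the same size beforehand.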

2.3 feature vector
Concatenating the normalized block vectors gives the feature vector, which can then be used to train an SVM.

2.4 training and prediction


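from skimage.feature import hog
from sklearn.svm import SVC

# imageset, testset and preprocessing() are placeholders for your own data and
# preprocessing; every image must come out grayscale and the same size so that
# all HOG vectors have the same length
data, labels = [], []
for image_ori in imageset:
    img = preprocessing(image_ori)
    h = hog(img)
    data.append(h)
    labels.append(image_ori.label)

model = SVC()
model.fit(data, labels)

for image_ori in testset:
    img = preprocessing(image_ori)
    h = hog(img)                                   # predict on HOG features, not raw pixels
    pred = model.predict(h.reshape(1, -1))[0]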

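Finally, HOG's drawbacks:
1. The sheer length of the feature vector is the main drag on efficiency.
2. orientations, pixels_per_cell and cells_per_block need to be tuned.
3. It does not work well when the object has substantial structural variation (for example, deformable or articulated objects whose parts change shape or position).
4. It is not the fastest approach.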


JOGG's
