Introduction to Recommendation

Introduction to Recommendation - About TFIDF

October 05, 2013

This lecture mainly introduces TFIDF.

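[Motivation]
　In a primitive search engine, a search for some terms returns every document that contains those terms. Can the results instead be ranked by how often the terms appear in each document?
　Problem 1: when a user searches for "civil" and "war", "civil" may be more meaningful than "war", yet raw counts treat both terms the same.
　Problem 2: a document that contains "civil" 10,000 times and "war" 15,000 times is not necessarily much more relevant; very large raw counts may say little.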

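[Definition]
　TFIDF = Term Frequency * Inverse Document Frequency
　TF: the number of times a term appears in a document
　IDF: how rare the term is across the document collection, commonly computed as log(N / df), where N is the total number of documents and df is the number of documents that contain the term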
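A minimal sketch of the definition in Python, assuming whitespace tokenization, raw counts for TF, and the common IDF form log(N / df). The function name `tfidf` and the toy documents are hypothetical. With this IDF, a term that appears in every document ("war" and "the" below) gets weight 0, which is exactly the issue raised in Problem 1.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions: TF = raw count of the term in the document,
    IDF = log(N / df), N = number of documents, df = documents containing the term.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: each term counted at most once per document
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    profiles = []
    for tokens in tokenized:
        tf = Counter(tokens)
        profiles.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return profiles

# Hypothetical toy collection
docs = [
    "the civil war",
    "the war of the worlds",
    "the war room",
]
for profile in tfidf(docs):
    print(profile)
```

In the first document, "civil" gets log(3) ≈ 1.10 while "war" gets 0.0, because "war" occurs in all three documents and therefore does not help tell them apart.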

[Uses]
　1. Create a profile of a document/object.
　2. These TFIDF profiles can be combined with ratings to create user profiles, which are then matched against the profiles of future (not yet rated) items.
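One possible way to carry out use 2, sketched under stated assumptions: the user profile is the sum of the rated documents' TFIDF vectors, each weighted by (rating minus the user's mean rating), and unrated documents are ranked by cosine similarity to that profile. The weighting scheme, the function names `user_profile` and `cosine`, and all data are hypothetical; other combinations (for example, using only positively rated items) are also possible.

```python
import math
from collections import defaultdict

def user_profile(ratings, doc_profiles):
    """Combine document TFIDF profiles into one user profile.

    Assumption: each rated document contributes its profile weighted by
    (rating - user's mean rating), so disliked items push their terms down.
    """
    mean = sum(ratings.values()) / len(ratings)
    profile = defaultdict(float)
    for doc_id, rating in ratings.items():
        for term, weight in doc_profiles[doc_id].items():
            profile[term] += (rating - mean) * weight
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical TFIDF profiles and 1-5 star ratings
doc_profiles = {
    "d1": {"civil": 1.1, "war": 0.4},
    "d2": {"space": 1.1, "war": 0.4},
    "d3": {"civil": 0.9, "rights": 1.1},   # not yet rated
    "d4": {"space": 0.8, "station": 1.1},  # not yet rated
}
ratings = {"d1": 5, "d2": 1}

profile = user_profile(ratings, doc_profiles)
candidates = [d for d in doc_profiles if d not in ratings]
ranked = sorted(candidates, key=lambda d: cosine(profile, doc_profiles[d]), reverse=True)
print(ranked)  # ['d3', 'd4']
```

Here the liked d1 pushes "civil" up and the disliked d2 pushes "space" down, while "war" (in both) cancels out, so d3 ranks above d4.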

[Variants and alternatives]
　1. 0/1 boolean frequency (1 if the term appears, 0 otherwise)
　2. log(TF + 1)
　3. Frequency normalized by document length
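A small sketch comparing these variants on one tokenized document, assuming the natural log and normalization by the total number of tokens; `tf_variants` is a hypothetical helper. Using the counts from Problem 2, the log variant shrinks the 10,000 vs 15,000 gap to roughly 9.21 vs 9.62.

```python
import math
from collections import Counter

def tf_variants(tokens):
    """Compute alternative TF weightings for every term in one tokenized document.

    Terms that do not occur are simply absent (their weight would be 0 in every variant).
    """
    counts = Counter(tokens)
    length = len(tokens)
    return {
        term: {
            "raw": c,
            "boolean": 1,              # 0/1: the term appears at all
            "log": math.log(c + 1),    # dampens very large counts
            "normalized": c / length,  # divides by document length
        }
        for term, c in counts.items()
    }

variants = tf_variants(["civil"] * 10000 + ["war"] * 15000)
for term, weights in variants.items():
    print(term, weights)
```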

[Possible problems]
　1. The core term or concept is never actually used in the document
　2. Poor search results

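[Limitations of TFIDF]
　1. Phrases and n-grams: e.g. "computer science" is not the same as "computer" plus "science"
　2. Significance within a document: terms in the title, tags, or headings are more meaningful than other terms
　3. General document authority: e.g. Google's PageRank, or the ratings a movie has received
　4. Implied content: e.g. the links a document contains, how it is used, etc.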
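Limitation 1 can be partly addressed by also counting adjacent-word pairs as terms. This is a minimal sketch assuming whitespace tokenization; `terms_with_bigrams` is a hypothetical helper, and a real system would usually also filter out rare or stopword-heavy bigrams before computing TFIDF on them.

```python
def terms_with_bigrams(text):
    """Return unigrams plus adjacent-word bigrams, so a phrase such as
    "computer science" becomes a term with its own TF and IDF."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams

print(terms_with_bigrams("Computer science is fun"))
# ['computer', 'science', 'is', 'fun', 'computer science', 'science is', 'is fun']
```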

JOGG's
