[python] pymongo

Some notes from reading the PyMongo manual.

What is MongoDB:
   1. a single service
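        => Really the same as MySQL: connect, work with the data, close.
   2. collections-based, not table-based
        => Data is in JSON format; you can insert arbitrary data without sorting it into tables.
   3. schema-less
        => I don't fully understand this one, but I think it means that if you want to add or remove a field on any document, you just do it.
   4. No need to learn another language
        => I've long forgotten the RDB syntax I used to write, but with MongoDB I only need to give it part of a document and it returns the documents in the collection that contain that data.
   5. well-supported in PHP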
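
To make points 2 to 4 a bit more concrete, here is a minimal pure-Python sketch of the "give it part of a document, get back the documents that contain it" idea. It is not how MongoDB is implemented: `matches` and `find` are hypothetical helpers, the sample documents are made up, and it only handles exact equality on top-level fields.

```python
# Documents in one "collection" do not have to share the same fields.
posts = [
    {"name": "alan", "age": 18},
    {"name": "bob", "age": 18, "companies": ["KMP", "DDP"]},
    {"title": "hello", "tags": ["mongodb"]},
]


def matches(document, query):
    """Return True if every key/value pair in query also appears in document."""
    return all(key in document and document[key] == value
               for key, value in query.items())


def find(collection, query):
    """Hypothetical stand-in for a find() call: keep only matching documents."""
    return [doc for doc in collection if matches(doc, query)]


print(find(posts, {"age": 18}))      # both documents that have age 18
print(find(posts, {"name": "bob"}))  # only bob's document
print(find(posts, {}))               # an empty query matches everything
```

The query is itself just a dictionary, which is why there is no separate query language to remember.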
  Steps:
   (1) Connect to mongod
       # conn = pymongo.Connection()
       * default port = 27017
       * set the data folder with mongod's --dbpath option
   (2) Get a database
       # db = conn['GccDB']   // or  db = conn.GccDB (attribute style access)
   (3) Get a collection (similar to a TABLE in an RDB)
       # collection = db['GccTB']  // or  db.GccTB
   (4) Write data into / read data from the collection
       ex.  Insert a document
          # collection.posts.insert(transaction)
       ex.  Query
          # collection.posts.find({'Key':'Value'})

 

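# Written for Python 2 and pymongo 2.x; later pymongo releases replace
# Connection with MongoClient.
from pymongo import Connection

# Connect to mongod on the default host and port (localhost:27017).
conn = Connection()
print "Database Server info:"
print " Host:", conn.host,
print ", port:", conn.port,
print ", Max pool size:", conn.max_pool_size
print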

def queryDB():
    # Dump every document of every collection in every database.
    print "###########################"
    print "#    Query Database       #"
    print "###########################"
    for dbn in conn.database_names():
        print ">Database:", dbn
        db = conn[dbn]
        for collectname in db.collection_names():
            print "->Collection:", collectname
            collect = db[collectname]
            for items in collect.find():
                print "-->Document:"
                for k, v in items.iteritems():
                    print "--->", k, ":", v

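def createDB(name=None):
    # Returns None on error. conn[name] only returns a handle; MongoDB
    # creates the database itself on the first write.
    if name in [None, '']:
        print "[Error] No database name assigned."
        return
    if name in conn.database_names():
        print "[Error] duplicated database name."
        return
    return conn[name]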

def createTBL(db, name=None):
    if name in [None, '']:
        print "[Error] No collection name assigned."
        return
    if name in db.collection_names():
        print "[Info] Collection", name, "created already."
    else:
        # MongoDB creates the collection itself on the first insert.
        print "[Info] Collection %s will be created on first insert." % name
    return db[name]

def addTransaction(tbl, document):
    if document in [None, '']:
        print "[Error] No document assigned."
        return False
    # Insert only when no document matching these fields exists yet.
    items = tbl.find(document)
    print "(Before)Identical: #%d." % items.count()
    if items.count() == 0:
        tbl.insert(document)
        print "Insert Okay."
    else:
        for item in items:
            print item['_id']
    return True

def delTransaction(tbl, document):
    if document in [None, '']:
        print "[Error] No document assigned."
        return False
    items = tbl.find(document)
    print "(Before)Identical: #%d." % items.count()
    if items.count():
        # safe=True waits for the server to acknowledge the removal.
        tbl.remove(document, safe=True)
        items = tbl.find(document)
        print "(After)Identical: #%d." % items.count()
    return True

def main():
    queryDB()
    db = createDB(name="test_database")
    if db is None:
        # createDB() rejected the name (empty or already in use).
        conn.disconnect()
        return
    tbl = createTBL(db, name="test_collect.posts")
    print addTransaction(tbl, {"date":"Nonono", "age":18, "companies":["KMP","DDP"], "name":"alan"})
    conn.disconnect()

if __name__ == '__main__':
    main()


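While reading the manual I also came across a couple of other MongoDB features: GridFS and Map/Reduce. Noting GridFS down here as well.
GridFS is a storage convention that keeps a file's metadata and raw data separate. It defines two collections: fs.files stores the file information, and fs.chunks stores the file content, split into chunks that are each stored in a document.
Using it is simple: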

from pymongo import Connection
from gridfs import GridFS

conn = Connection()
db = conn["TEST_DB"]
gfs = GridFS(db)

# Write data into GridFS; arbitrary extra attributes can be attached
with open("MyData.MP4", "rb") as fd:
    file_id = gfs.put(fd.read(), filename="MyData.MP4", uploader="glob", label=13)
    print gfs.list()

# Collection "fs.files" holds each file's metadata
print list(db.fs.files.find())

# Collection "fs.chunks" holds each file's raw data (print only the chunk count)
print db.fs.chunks.count()

# Read the file's raw data back out of GridFS
with open("MyData2.MP4", "wb") as fd:
    out = gfs.get_last_version("MyData.MP4")
    fd.write(out.read())
print out.name, out.length
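
To picture the two-collection layout described above, here is a small pure-Python sketch of the splitting idea. It is not GridFS code: `split_into_chunks` and `join_chunks` are hypothetical helpers, and the 4-byte chunk size is only for illustration (real GridFS chunks are far larger). The field names `filename`, `length`, `chunkSize`, `files_id`, `n` and `data` follow the GridFS convention.

```python
def split_into_chunks(file_id, filename, data, chunk_size=4):
    """Return (files_doc, chunk_docs), mimicking the fs.files / fs.chunks split."""
    # One metadata document per file (what fs.files would hold).
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    # One document per chunk of raw data (what fs.chunks would hold).
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def join_chunks(chunk_docs):
    """Rebuild the raw data by concatenating the chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


files_doc, chunk_docs = split_into_chunks(1, "MyData.MP4", b"0123456789")
print(files_doc["length"])                       # size of the whole file
print(len(chunk_docs))                           # number of chunk documents
print(join_chunks(chunk_docs) == b"0123456789")  # round trip check
```

Each chunk points back to its file through `files_id`, so the metadata can be queried without touching the raw data.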

