[python] pymongo

December 09, 2011

Notes from reading the PyMongo manual.

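What is MongoDB:
1. a single service
=> Much like MySQL: connect, work with the data, close.
2. collections-based, not table-based
=> Documents are JSON-like, so you can insert arbitrary data without fitting it into a table definition.
3. less schema
=> I don't fully understand this one, but I think it means that if you want to add or remove a field on any document, you just do it.
4. No need to learn other language
=> I have long forgotten the RDB syntax I used to write, but with MongoDB I only need to give it part of a document and it returns the documents in the collection that contain those fields.
5. well-supported in PHP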
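Points 3 and 4 can be illustrated without a server. The following is only a sketch in plain Python: the helpers matches and find and the sample posts list are hypothetical names made up for this note, and the matching rule is a simplification (real MongoDB queries also support operators, nested paths and array matching).

```python
def matches(document, query):
    """Return True if every key/value pair in `query` appears in `document`.

    Simplified on purpose: operators ($gt, $in, ...), nested paths and
    array matching are not modelled here.
    """
    return all(key in document and document[key] == value
               for key, value in query.items())


def find(collection, query):
    """Mimic a query-by-example lookup over a plain list of dicts."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection do not have to share the same fields.
posts = [
    {"name": "alan", "age": 18},
    {"name": "bob", "companies": ["KMP", "DDP"]},
    {"name": "alan", "date": "2011-12-09", "age": 30},
]

# Adding a field to a single document needs no schema change.
posts[1]["age"] = 18

print(find(posts, {"name": "alan"}))  # both "alan" documents
print(find(posts, {"age": 18}))       # the first "alan" and "bob"
```

An empty query matches every document, which mirrors what an empty filter does in a real find().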
Steps:
(1) Connect to mongod
# client = pymongo.MongoClient()
* default port = 27017
* the data directory is set with mongod's --dbpath option
(2) Get a database
# db = client['GccDB']    (or db = client.GccDB, attribute-style access)
(3) Get a collection (similar to a TABLE in an RDB)
# collection = db['GccTB']    (or db.GccTB)
(4) Write data into / read data from the collection
ex. Insert a document
# collection.insert_one(transaction)
ex. Query
# collection.find({'Key': 'Value'})

from pymongo import MongoClient

# Written against the PyMongo 4 API on Python 3.
# With no arguments MongoClient targets localhost:27017.
client = MongoClient()
host, port = client.address  # waits until a server has been found
print("Database Server info:")
print(" Host: %s, port: %s, Max pool size: %s" % (
    host, port, client.options.pool_options.max_pool_size))
print()


def queryDB():
    """Print every document of every collection in every database."""
    print("###########################")
    print("#    Query Database       #")
    print("###########################")
    for dbn in client.list_database_names():
        print(">Database:", dbn)
        db = client[dbn]
        for collectname in db.list_collection_names():
            print("->Collection:", collectname)
            for item in db[collectname].find():
                print("-->Document:")
                for k, v in item.items():
                    print("--->", k, ":", v)


def createDB(name=None):
    """Return a handle to database `name`, or None if no name is given."""
    if not name:
        print("[Error] No database name assigned.")
        return None
    if name in client.list_database_names():
        print("[Info] Database", name, "exists already.")
    # MongoDB creates the database lazily, on the first write.
    return client[name]


def createTBL(db, name=None):
    """Return a handle to collection `name`, or None if no name is given."""
    if not name:
        print("[Error] No collection name assigned.")
        return None
    if name in db.list_collection_names():
        print("[Info] Collection", name, "created already.")
    # Like databases, collections are created on the first insert.
    return db[name]


def addTransaction(tbl, document):
    """Insert `document` unless an identical one is already stored."""
    if not document:
        print("[Error] No document assigned.")
        return False
    count = tbl.count_documents(document)
    print("(Before)Identical: #%d." % count)
    if count == 0:
        tbl.insert_one(document)
        print("Insert Okay.")
    else:
        # Show the _id of each document that already matches.
        for item in tbl.find(document):
            print(item['_id'])
    return True


def delTransaction(tbl, document):
    """Delete every stored document that matches `document`."""
    if not document:
        print("[Error] No document assigned.")
        return False
    print("(Before)Identical: #%d." % tbl.count_documents(document))
    if tbl.count_documents(document):
        tbl.delete_many(document)
        print("(After)Identical: #%d." % tbl.count_documents(document))
    return True


def main():
    queryDB()
    db = createDB(name="test_database")
    tbl = createTBL(db, name="test_collect.posts")
    print(addTransaction(tbl, {"date": "Nonono", "age": 18,
                               "companies": ["KMP", "DDP"], "name": "alan"}))
    client.close()


if __name__ == '__main__':
    main()

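The manual also covers a couple of other MongoDB features, GridFS and Map/Reduce, so here are some notes on GridFS as well.
GridFS is a storage specification that keeps a file's metadata and raw data separately. It defines two collections: fs.files holds the file information, and fs.chunks holds the file content, split into chunks that are each stored in a document.
Using it is simple: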

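from pymongo import MongoClient
from gridfs import GridFS

client = MongoClient()
db = client["TEST_DB"]
gfs = GridFS(db)

# Write a file into GridFS; any extra attributes can be attached as keyword arguments.
with open("MyData.MP4", "rb") as fd:
    file_id = gfs.put(fd.read(), filename="MyData.MP4", uploader="glob", label=13)
print(gfs.list())

# Collection "fs.files" records each file's metadata.
for doc in db.fs.files.find():
    print(doc)

# Collection "fs.chunks" records each file's raw data
# (the binary "data" field is left out of the printout).
for chunk in db.fs.chunks.find({}, {"data": 0}):
    print(chunk)

# Read the file's raw data back from GridFS.
out = gfs.get_last_version("MyData.MP4")
with open("MyData2.MP4", "wb") as fd:
    fd.write(out.read())
print(out.name, out.length)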
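
To make the two-collection layout more concrete, here is a rough sketch of the chunking idea in plain Python. It is not the gridfs implementation: the functions split_into_chunks and reassemble are hypothetical, and the tiny chunk size is chosen only so the output is readable. The field names (length, chunkSize, files_id, n, data) follow the GridFS layout; other fields such as filename and uploadDate are left out.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Build one "fs.files"-style document and its "fs.chunks"-style documents."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(files_doc, chunk_docs):
    """Concatenate a file's chunks in order of their sequence number `n`."""
    own = [c for c in chunk_docs if c["files_id"] == files_doc["_id"]]
    data = b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))
    # The recorded length lets a reader detect missing chunks.
    assert len(data) == files_doc["length"]
    return data


files_doc, chunk_docs = split_into_chunks(1, b"0123456789", chunk_size=4)
print(files_doc)
print([(c["n"], c["data"]) for c in chunk_docs])
print(reassemble(files_doc, chunk_docs))
```

Because each chunk carries its own sequence number, the chunks can be stored and fetched in any order and still be put back together.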

