Original URL: http://stackoverflow.com/questions/33979983/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Insert rows from a pandas DataFrame into a MongoDB collection as individual documents using Python
Asked by BigUglyDataScientist
I have been attempting to insert the rows of a pandas DataFrame into a MongoDB collection as individual documents. I am pulling the data from MongoDB using pymongo, performing some transformations, running a scoring algorithm, and adding the score as an additional column to the DataFrame. The last step will be to insert the rows into a special collection in the MongoDB database as individual documents, but I am completely stuck. My example DataFrame df looks like this.
    memberID                                       dxCodes  dxCount  score
0  856589080          [4280, 4293, 4241, 4240, 4242, 4243]        6    1.8
1  906903383                                       [V7612]        1    2.6
2  837210554                           [4550, 4553, V1582]        3    3.1
3  935634391       [78791, 28860, V1582, 496, 25000, 4019]        6    1.1
4  929185103  [30500, 42731, 4280, 496, 59972, 4019, 3051]        7    2.8
memberID is a string, dxCodes is an array (in MongoDB terminology), dxCount is an int, and score is a float. I have been experimenting with a piece of code I found posted in response to a vaguely similar question.
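import json
import datetime
import pandas

# `db` is an existing pymongo database handle.
df = pandas.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
# to_json() serializes datetimes as epoch milliseconds by default.
records = list(json.loads(df.T.to_json()).values())
db.temp.insert_many(records)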
This is what I was able to get in my collection:
{
    "_id" : ObjectId("565a8f206d8bc51a08745de0"),
    "A" : NumberLong(1448753856695)
}
It's not much, but it's as close as I have gotten. I have spent a lot of time googling and taking shots in the dark but haven't cracked it yet. Any guidance is greatly appreciated. Thanks in advance for your assistance!
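As a side note on the stored document above, the NumberLong value appears to come from the JSON round trip rather than from MongoDB itself: by default, to_json() writes datetimes as epoch milliseconds, so the date reaches insert_many as a plain integer. The sketch below reproduces this without a database; the fixed timestamp and the variable names are assumptions chosen for illustration.

```python
import datetime
import json

import pandas as pd

# A fixed timestamp (assumed for this sketch) instead of datetime.now().
when = datetime.datetime(2015, 11, 28, 23, 37, 36, 695000)
df = pd.DataFrame.from_dict({'A': {1: when}})

# Same route as in the question: serialize to JSON, then parse it back.
records = list(json.loads(df.T.to_json()).values())

# The datetime is now a plain integer, which MongoDB stores as NumberLong.
stored = records[0]['A']
print(type(stored), stored)
```

Because the type information is lost before the insert, fixing this on the MongoDB side is not possible; the records have to be built without going through JSON.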
Answered by styvane
You need to convert your DataFrame to a list of dictionaries using the .to_dict() method.
>>> from pprint import pprint # to pretty print the cursor result.
>>> import pandas as pd
>>> import pymongo
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> memberID = ['856589080', '906903383', '837210554', '935634391', '929185103']
>>> dxCodes = [[4280, 4293, 4241, 4240, 4242, 4243], [7612], [4550, 4553, 1582],[78791, 28860, 1582, 496, 25000, 4019], [30500, 42731, 4280, 496, 59972, 4019, 3051]]
>>> dxCount = [6, 1, 3, 6, 7]
>>> score = [1.8, 2.6, 3.1, 1.1, 2.8]
>>> df = pd.DataFrame({'memberID': memberID, 'dxCodes': dxCodes, 'score': score})
>>> df
                                        dxCodes   memberID  score
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080    1.8
1                                        [7612]  906903383    2.6
2                            [4550, 4553, 1582]  837210554    3.1
3        [78791, 28860, 1582, 496, 25000, 4019]  935634391    1.1
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103    2.8
>>> collection.insert_many(df.to_dict('records'))  # pass 'records' to get a list of dicts, one per row
<pymongo.results.InsertManyResult object at 0x7fcd7035d990>
>>> pprint(list(collection.find()))
[{'_id': ObjectId('565b189f0acf45181c69d464'),
  'dxCodes': [4280, 4293, 4241, 4240, 4242, 4243],
  'memberID': '856589080',
  'score': 1.8},
 {'_id': ObjectId('565b189f0acf45181c69d465'),
  'dxCodes': [7612],
  'memberID': '906903383',
  'score': 2.6},
 {'_id': ObjectId('565b189f0acf45181c69d466'),
  'dxCodes': [4550, 4553, 1582],
  'memberID': '837210554',
  'score': 3.1},
 {'_id': ObjectId('565b189f0acf45181c69d467'),
  'dxCodes': [78791, 28860, 1582, 496, 25000, 4019],
  'memberID': '935634391',
  'score': 1.1},
 {'_id': ObjectId('565b189f0acf45181c69d468'),
  'dxCodes': [30500, 42731, 4280, 496, 59972, 4019, 3051],
  'memberID': '929185103',
  'score': 2.8}]
>>>
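The session above leaves out the dxCount column and the letter-prefixed codes from the question. A minimal sketch that keeps all four columns might look like the following. The helper names frame_to_documents and insert_frame are hypothetical, the codes are assumed to be strings because some carry a "V" prefix, and the .item() conversion is a precaution: depending on the pandas version, to_dict('records') may return NumPy scalars such as numpy.int64, which pymongo cannot encode.

```python
import pandas as pd


def frame_to_documents(df):
    """Return one plain dict per DataFrame row (hypothetical helper)."""
    documents = []
    for record in df.to_dict('records'):
        # NumPy scalars expose .item(), which returns the built-in Python
        # equivalent; strings, lists, and built-in numbers pass through.
        documents.append({
            key: value.item() if hasattr(value, 'item') else value
            for key, value in record.items()
        })
    return documents


def insert_frame(collection, df):
    """Insert every row of df into a pymongo collection as its own document."""
    return collection.insert_many(frame_to_documents(df)).inserted_ids


# Sample data from the question; codes are assumed to be strings.
df = pd.DataFrame({
    'memberID': ['856589080', '906903383', '837210554',
                 '935634391', '929185103'],
    'dxCodes': [['4280', '4293', '4241', '4240', '4242', '4243'],
                ['V7612'],
                ['4550', '4553', 'V1582'],
                ['78791', '28860', 'V1582', '496', '25000', '4019'],
                ['30500', '42731', '4280', '496', '59972', '4019', '3051']],
    'dxCount': [6, 1, 3, 6, 7],
    'score': [1.8, 2.6, 3.1, 1.1, 2.8],
})

documents = frame_to_documents(df)
print(documents[1])
```

With a running MongoDB server, calling insert_frame(pymongo.MongoClient().test.collection, df) should store the five rows as five documents. Building the documents separately from the insert also makes them easy to inspect before anything is written.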