Insert a Pandas DataFrame into MongoDB using PyMongo

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/20167194/

Tags: python, mongodb, python-2.7, pandas, pymongo

Asked by Nyxynyx

What is the quickest way to insert a pandas DataFrame into mongodb using PyMongo?

Attempts

db.myCollection.insert(df.to_dict())

gave an error

InvalidDocument: documents must have only string keys, the key was Timestamp('2013-11-23 13:31:00', tz=None)


db.myCollection.insert(df.to_json())

gave an error

TypeError: 'str' object does not support item assignment


db.myCollection.insert({id: df.to_json()})

gave an error

InvalidDocument: documents must have only string keys, the key was <built-in function id>


df

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 150 entries, 2013-11-23 13:31:26 to 2013-11-23 13:24:07
Data columns (total 3 columns):
amount    150  non-null values
price     150  non-null values
tid       150  non-null values
dtypes: float64(2), int64(1)
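
For context: the first attempt fails because df.to_dict() with the default orient uses the DataFrame's index, here Timestamps, as dictionary keys, and BSON only allows string keys. A minimal sketch reproducing this, with illustrative column names and values:

import datetime
import pandas as pd

df = pd.DataFrame({'price': [817.2]},
                  index=pd.DatetimeIndex([datetime.datetime(2013, 11, 23, 13, 31)]))

# The default orient nests values under the index keys, which are Timestamps here:
print(df.to_dict())
# {'price': {Timestamp('2013-11-23 13:31:00'): 817.2}}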

Accepted answer by alko

I doubt there is a method that is both quickest and simple. If you don't worry about data conversion, you can do

>>> import datetime
>>> import json
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
>>> df
                           A
1 2013-11-23 21:14:34.118531

>>> records = json.loads(df.T.to_json()).values()
>>> db.myCollection.insert(records)

But in case you try to load data back, you'll get:

>>> # read_mongo is a helper (not shown here) that reads a collection back into a DataFrame
>>> df = read_mongo(db, 'myCollection')
>>> df
                     A
0  1385241274118531000
>>> df.dtypes
A    int64
dtype: object

so you'll have to convert the 'A' column back to datetimes, as well as any fields in your DataFrame that are not int, float or str. For this example:

>>> df['A'] = pd.to_datetime(df['A'])
>>> df
                           A
0 2013-11-23 21:14:34.118531
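
As a side note, a variant that skips the JSON round trip entirely: pandas Timestamps are datetime.datetime subclasses, so PyMongo can store them as native BSON dates (at millisecond precision). A sketch, assuming PyMongo 3+ for insert_many:

records = df.reset_index().to_dict('records')  # move the index into a regular field
db.myCollection.insert_many(records)           # Timestamp values become BSON dates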

Answer by PasteBT

how about this:

id = 'my_df_20131123'  # a unique string for this df (illustrative); without this, id is Python's built-in function
db.myCollection.insert({id: df.to_json()})

id will be a unique string for that df
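
A sketch of reading such a document back (assuming the id string from the snippet above; newer pandas versions prefer a file-like object for read_json, hence the StringIO wrapper):

import io
import pandas as pd

doc = db.myCollection.find_one({id: {'$exists': True}})
df = pd.read_json(io.StringIO(doc[id]))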

Answer by dieguico

Here you have the very quickest way: use the insert_many method from PyMongo 3 and the 'records' parameter of the to_dict method.

db.collection.insert_many(df.to_dict('records'))
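
A self-contained sketch of this approach (the connection details and the database, collection, and column names are illustrative assumptions):

import pandas as pd
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['mydb']

df = pd.DataFrame({'amount': [0.5, 1.2], 'price': [700.0, 701.5], 'tid': [1, 2]})
db.myCollection.insert_many(df.to_dict('records'))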

Answer by Femto Trader

odo can do it using

from odo import odo

odo(df, db.myCollection)

Answer by Radical Edward

If your dataframe has missing data (i.e. None, nan) and you don't want null key values in your documents:

db.insert_many(df.to_dict("records")) will insert keys with null values. If you don't want the empty key values in your documents, you can use a modified version of the pandas .to_dict("records") code below:

# _maybe_box_datetimelike converts numpy datetime64/timedelta64 scalars back into
# Timestamp/Timedelta objects, as .to_dict("records") would
from pandas.core.common import _maybe_box_datetimelike

# keep (key, value) pairs only when the value is neither None nor NaN (NaN != NaN)
my_list = [dict((k, _maybe_box_datetimelike(v)) for k, v in zip(df.columns, row) if v != None and v == v) for row in df.values]
db.insert_many(my_list)

where the if v != None and v == v part adds checks to make sure the value is not None or nan before putting it in the row's dictionary (nan is the only value that does not equal itself). Now your .insert_many will only include keys with values in the documents (and no null data types).
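
Note that _maybe_box_datetimelike is a private pandas helper and has been moved or removed in newer pandas releases. A sketch of an equivalent filter using only public API, assuming the same goal of dropping None/NaN values per row:

import pandas as pd

my_list = [{k: v for k, v in row.items() if pd.notnull(v)}
           for row in df.to_dict('records')]
db.insert_many(my_list)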

Answer by Rafael Valero

I think there are cool ideas in this question. In my case I have been spending more time taking care of the movement of large dataframes. In those cases pandas tends to give you the option of a chunksize (see for example pandas.DataFrame.to_sql). So I think I can contribute here by adding the function I am using in this direction.

from pymongo import MongoClient


def write_df_to_mongoDB(my_df,
                        database_name='mydatabasename',
                        collection_name='mycollectionname',
                        server='localhost',
                        mongodb_port=27017,
                        chunk_size=100):
    """
    Write a DataFrame to a MongoDB collection in chunks.

    Parameters / Input
        my_df: the DataFrame to send to MongoDB
        database_name: database name
        collection_name: collection name (to create)
        server: the server where the MongoDB database is hosted
            Example: server = '132.434.63.86'
        mongodb_port: the port where the database is operating
            Example: mongodb_port = 27017
        chunk_size: the number of records sent to the database at a
            time. Default is 100.

    Output
        Prints "Done" when finished.

    FUTURE modifications:
    1. Write to SQL
    2. Write to csv

    30/11/2017: Rafael Valero-Fernandez. Documentation
    """
    client = MongoClient(server, int(mongodb_port))
    db = client[database_name]
    collection = db[collection_name]

    # To write
    collection.delete_many({})  # destroy the collection
    # my_df = my_df.drop_duplicates(subset=None, keep='last')  # to avoid repetitions
    my_list = my_df.to_dict('records')
    l = len(my_list)
    # Chunk boundaries: chunk_size, 2*chunk_size, ..., and finally l itself
    steps = list(range(chunk_size, l, chunk_size))
    steps.append(l)

    # Insert chunks of the dataframe
    i = 0
    for j in steps:
        print(j)
        collection.insert_many(my_list[i:j])  # fill the collection
        i = j

    print('Done')
    return
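
A hypothetical call, with illustrative names and data:

import pandas as pd

df = pd.DataFrame({'amount': [0.5, 1.2], 'price': [700.0, 701.5]})
write_df_to_mongoDB(df,
                    database_name='mydatabasename',
                    collection_name='mycollectionname',
                    server='localhost',
                    mongodb_port=27017,
                    chunk_size=100)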

Answer by Jon Pan

Just make string keys!

import json
import pandas as pd

dfData = json.dumps(df.to_dict('records'))
saveData = {'_id': 'a8e42ed79f9dae1cefe8781760231ec0', 'df': dfData}
res = client.insert_one(saveData)  # client here is a pymongo collection, not a MongoClient

##### load dfData
data = client.find_one({'_id': 'a8e42ed79f9dae1cefe8781760231ec0'}).get('df')
dfData = json.loads(data)
df = pd.DataFrame.from_dict(dfData)