How to load data from a MongoDB collection into a pandas DataFrame?

Question

Asked by user2161725

I am new to pandas (well, to all things "programming"...), but have been encouraged to give it a try. I have a mongodb database - "test" - with a collection called "tweets". I access the database in ipython:

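import sys
import pymongo
from pymongo import MongoClient  # Connection was removed in PyMongo 3.0; MongoClient replaces it
connection = MongoClient()  # connects to localhost:27017 by default
db = connection.test
tweets = db.tweets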

The structure of the documents in tweets is as follows:

{u'entities': {u'hashtags': [],
  u'symbols': [],
  u'urls': [],
  u'user_mentions': []},
 u'favorite_count': 0,
 u'favorited': False,
 u'filter_level': u'medium',
 u'geo': {u'coordinates': [placeholder coordinate, -placeholder coordinate], u'type': u'Point'},
 u'id': 349223842700472320L,
 u'id_str': u'349223842700472320',
 u'in_reply_to_screen_name': None,
 u'in_reply_to_status_id': None,
 u'in_reply_to_status_id_str': None,
 u'in_reply_to_user_id': None,
 u'in_reply_to_user_id_str': None,
 u'lang': u'en',
 u'place': {u'attributes': {},
  u'bounding_box': {u'coordinates': [[[placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate],
     [-placeholder coordinate, placeholder coordinate]]],
   u'type': u'Polygon'},
  u'country': u'placeholder country',
  u'country_code': u'example',
  u'full_name': u'name, xx',
  u'id': u'user id',
  u'name': u'name',
  u'place_type': u'city',
  u'url': u'http://api.twitter.com/1/geo/id/1820d77fb3f65055.json'},
 u'retweet_count': 0,
 u'retweeted': False,
 u'source': u'<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 u'text': u'example text',
 u'truncated': False,
 u'user': {u'contributors_enabled': False,
  u'created_at': u'Sat Jan 22 13:42:59 +0000 2011',
  u'default_profile': False,
  u'default_profile_image': False,
  u'description': u'example description',
  u'favourites_count': 100,
  u'follow_request_sent': None,
  u'followers_count': 100,
  u'following': None,
  u'friends_count': 100,
  u'geo_enabled': True,
  u'id': placeholder_id,
  u'id_str': u'placeholder_id',
  u'is_translator': False,
  u'lang': u'en',
  u'listed_count': 0,
  u'location': u'example place',
  u'name': u'example name',
  u'notifications': None,
  u'profile_background_color': u'000000',
  u'profile_background_image_url': u'http://a0.twimg.com/images/themes/theme19/bg.gif',
  u'profile_background_image_url_https': u'https://si0.twimg.com/images/themes/theme19/bg.gif',
  u'profile_background_tile': False,
  u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/241527685/1363314054',
  u'profile_image_url': u'http://a0.twimg.com/profile_images/378800000038841219/8a71d0776da0c48dcc4ef6fee9f78880_normal.jpeg',
  u'profile_image_url_https': u'https://si0.twimg.com/profile_images/378800000038841219/8a71d0776da0c48dcc4ef6fee9f78880_normal.jpeg',
  u'profile_link_color': u'000000',
  u'profile_sidebar_border_color': u'FFFFFF',
  u'profile_sidebar_fill_color': u'000000',
  u'profile_text_color': u'000000',
  u'profile_use_background_image': False,
  u'protected': False,
  u'screen_name': u'placeholder screen_name',
  u'statuses_count': xxxx,
  u'time_zone': u'placeholder time_zone',
  u'url': None,
  u'utc_offset': -21600,
  u'verified': False}}

Now, as far as I understand, pandas' main data structure - a spreadsheet-like table - is called DataFrame. How can I load the data from my "tweets" collection into pandas' DataFrame? And how can I query for a subdocument within the database?

Answer 1

Answered by waitingkuo

Convert the cursor returned by MongoDB into a list before passing it to DataFrame:

import pandas as pd
df = pd.DataFrame(list(tweets.find()))
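
The reason for the list() call is that find() returns a lazy cursor, not the documents themselves. Below is a minimal sketch of that step under stated assumptions: FakeCollection is a hypothetical stand-in for a pymongo collection so the code runs without a MongoDB server, and fetch_documents and to_dataframe are illustrative helper names, not part of pymongo or pandas.

```python
def fetch_documents(collection, query=None, projection=None):
    """Materialize a find() cursor into a list of plain dicts."""
    cursor = collection.find(query or {}, projection)
    return list(cursor)


def to_dataframe(documents):
    """Build a DataFrame with one row per document."""
    import pandas as pd  # imported here so fetch_documents works without pandas
    return pd.DataFrame(documents)


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection (assumption, for illustration)."""

    def __init__(self, documents):
        self._documents = documents

    def find(self, query=None, projection=None):
        # Ignores query and projection; returns a one-shot iterator, like a cursor.
        return iter(self._documents)


# Stand-in for db.tweets from the question
tweets = FakeCollection([
    {"_id": 1, "text": "first", "lang": "en"},
    {"_id": 2, "text": "second", "lang": "de"},
])
documents = fetch_documents(tweets)
print(len(documents))
```

With a real collection, to_dataframe(fetch_documents(db.tweets)) should produce the same frame as the one-liner above. Note that the whole result set is held in memory, so a filter or projection is worth passing for large collections.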

Answer 2

Answered by Mark Unsworth

If you have data in MongoDB like this:

[
    {
        "name": "Adam",
        "age": 27,
        "address": {
            "number": 4,
            "street": "Main Road",
            "city": "Oxford"
        }
    },
    {
        "name": "Steve",
        "age": 32,
        "address": {
            "number": 78,
            "street": "High Street",
            "city": "Cambridge"
        }
    }
]

You can put the data straight into a dataframe like this:

from pandas import DataFrame

df = DataFrame(list(db.collection_name.find({})))

And you will get this output:

df.head()

|    | name    | age  | address                                                      |
|----|---------|------|--------------------------------------------------------------|
| 0  | "Adam"  | 27   | {"number": 4, "street": "Main Road", "city": "Oxford"}       |
| 1  | "Steve" | 32   | {"number": 78, "street": "High Street", "city": "Cambridge"} |

However, each subdocument appears as a single nested object (a Python dict) in its cell. To flatten the documents so that subdocument properties become individual columns, use json_normalize without any extra parameters.

from pandas import json_normalize  # top-level import since pandas 1.0; the pandas.io.json path is deprecated

datapoints = list(db.collection_name.find({}))

df = json_normalize(datapoints)

df.head()

This will give the dataframe in this format:

|    | name  | age  | address.number | address.street | address.city |
|----|-------|------|----------------|----------------|--------------|
| 0  | Adam  | 27   |     4          | "Main Road"    | "Oxford"     |
| 1  | Steve | 32   |     78         | "High Street"  | "Cambridge"  |
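
To make the flattening concrete, here is a dependency-free sketch of roughly what happens to nested dicts. The flatten helper is hypothetical and is not pandas API; it only handles nested dicts and ignores the options json_normalize offers, so treat it as an illustration rather than a replacement.

```python
def flatten(document, parent_key="", sep="."):
    """Flatten nested dicts into one level with dot-separated keys."""
    flat = {}
    for key, value in document.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty subdocuments, prefixing their keys
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars, lists and empty dicts are kept as-is
            flat[full_key] = value
    return flat


people = [
    {"name": "Adam", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Steve", "age": 32,
     "address": {"number": 78, "street": "High Street", "city": "Cambridge"}},
]
rows = [flatten(person) for person in people]
print(rows[0])
```

Each flattened dict has one key per future column (name, age, address.number, address.street, address.city), so passing rows to DataFrame gives one column per subdocument property.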

Answer 3

Answered by saimadhu.polamuri

You can load your MongoDB data into a pandas DataFrame using this code. It works for me, and I hope it works for you too.

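import pymongo
import pandas as pd
from pymongo import MongoClient  # Connection was removed in PyMongo 3.0; MongoClient replaces it
connection = MongoClient()  # connects to localhost:27017 by default
db = connection.database_name
input_data = db.collection_name
# find() returns a cursor; list() reads all documents before building the DataFrame
data = pd.DataFrame(list(input_data.find()))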
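
The find() call above also accepts a filter, which is one way to address the question's second part about querying a subdocument. The following is a sketch, assuming MongoDB's dot notation for nested fields; build_subdocument_query and load_matching are hypothetical helper names, and the field names come from the tweet structure in the question.

```python
def build_subdocument_query(path, value, fields=None):
    """Return (filter, projection) for a find() call that matches on a nested field.

    `path` uses dot notation, e.g. "user.lang" for the `lang` key
    inside the `user` subdocument.
    """
    query = {path: value}
    projection = None
    if fields:
        # 1 means "include this field"; _id is excluded explicitly
        projection = {name: 1 for name in fields}
        projection["_id"] = 0
    return query, projection


def load_matching(collection, path, value, fields=None):
    """Run the query against a pymongo collection and return a DataFrame."""
    import pandas as pd  # imported here so the helper above has no dependencies
    query, projection = build_subdocument_query(path, value, fields)
    return pd.DataFrame(list(collection.find(query, projection)))


query, projection = build_subdocument_query(
    "user.lang", "en", ["text", "user.screen_name"]
)
print(query)
print(projection)
```

With a live connection, load_matching(db.tweets, "user.lang", "en", ["text", "user.screen_name"]) would return only the matching tweets, restricted to the requested fields.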

Answer 4

Answered by user3575499

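Use: df = pd.DataFrame.from_dict(collection)

Here collection presumably refers to the documents already read from the cursor, for example list(tweets.find()), rather than the pymongo collection object itself.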
