Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original address, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/31695108/

Time: 2020-08-19 10:26:29  Source: igfitidea

How to append a dictionary to a pandas dataframe?

Tags: python, json, for-loop, dictionary, pandas

Asked by Blue Moon

I have a set of URLs that point to JSON files, and an empty pandas DataFrame whose columns represent the attributes of those JSON files. Not every JSON file has all of the attributes in the DataFrame. I need to create a dictionary from each JSON file and append it to the DataFrame as a new row; where a JSON file lacks an attribute that matches a column, that cell should be left blank.

I managed to create a dictionary like this:

import urllib2
import json  

url = "https://cws01.worldstores.co.uk/api/product.php?product_sku=ULST:7BIS01CF"
data = urllib2.urlopen(url).read()
data = json.loads(data)

and then I tried to create a for loop as follows:

row = -1
for i in links:
    row = row + 1
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    for key in data.keys():
        for column in df.columns:
            if str(column) == str(key):
                df.loc[[str(column)],row] = data[str(key)]
            else:
                df.loc[[str(column)],row] = None

where df is the DataFrame and links is the set of URLs.

However, I get the following error:

raise KeyError('%s not in index' % objarr[mask])

KeyError: "['2_seater_depth_mm'] not in index"

where ['2_seater_depth_mm'] is the first column of the pandas dataframe

Accepted answer by zuku

The code below works for me:

row = -1
for i in links:
    row = row + 1
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    for key in data.keys():
        df.loc[row,key] = data[key]

You mixed up the order of the arguments to .loc (the row label comes first, then the column label) and you have one pair of [] too many.
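
To make the argument order concrete, here is a minimal sketch that runs without any network access. The column names and the records list are invented stand-ins for the decoded JSON data, not values from the question. It fills an empty frame with row-first indexing, then reproduces the swapped call to show where the KeyError comes from.

```python
import pandas as pd

# Hypothetical column names and records, standing in for the real JSON data.
df = pd.DataFrame(columns=["sku", "width_mm", "depth_mm"])
records = [
    {"sku": "A1", "width_mm": 900},
    {"sku": "B2", "depth_mm": 450},
]

for row, data in enumerate(records):
    for key, value in data.items():
        # Row label first, column label second; a new row label enlarges the frame.
        df.loc[row, key] = value

print(df)

# With the arguments swapped, pandas looks for "sku" among the row labels.
raised = False
try:
    df.loc[["sku"], 0] = "C3"
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

Attributes that a record does not provide are simply never written, so those cells stay as missing values.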

Answer by dermen

Assuming that df is empty and has the same columns as the keys of the URL dictionaries, i.e.

list(df)
#[u'alternate_product_code',
# u'availability',
# u'boz',
# ...

len(df)
#0

then you can use DataFrame.append, which returns a new DataFrame rather than modifying df in place:

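for url in links:
    url_data = urllib2.urlopen(str(url)).read()
    url_dict = json.loads(url_data)
    # Wrap each value in a one-element Series so the dictionary forms a single row
    a_dict = {k: pandas.Series([str(v)], index=[0]) for k, v in url_dict.iteritems()}
    new_df = pandas.DataFrame.from_dict(a_dict)
    # append returns a new DataFrame, so assign the result back to df
    df = df.append(new_df, ignore_index=True)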
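
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. As a hedged alternative (a sketch, not the answerer's method), the dictionaries can be collected in a list and turned into a frame in one step, with pandas.concat for adding rows later. The column names and dictionaries below are made up for illustration.

```python
import pandas as pd

# Hypothetical target columns and already-decoded JSON dictionaries.
columns = ["sku", "width_mm", "depth_mm"]
url_dicts = [
    {"sku": "A1", "width_mm": 900, "colour": "grey"},
    {"sku": "B2", "depth_mm": 450},
]

# Build the frame in one step: keys missing from a dictionary become NaN,
# and keys that are not in `columns` (here "colour") are dropped.
df = pd.DataFrame(url_dicts, columns=columns)

# To add rows to an existing frame, concatenate and keep the result.
extra = pd.DataFrame([{"sku": "C3", "width_mm": 1200}], columns=columns)
df = pd.concat([df, extra], ignore_index=True)
print(df)
```

Building the frame once from a list is usually cheaper than growing it row by row, because each append or concat copies the existing data.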

I'm not too sure why your code won't work, but if you still want to use it, consider the following few edits, which should clean things up:

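for row, url in enumerate(links):
    data = urllib2.urlopen(str(url)).read()
    data_dict = json.loads(data)
    for key, val in data_dict.items():
        # Only write keys that are existing columns of df
        if key in df.columns:
            # .loc replaces .ix, which was removed in pandas 1.0
            df.loc[row, key] = val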

I used enumerate to iterate over both the index and the value of the links array, so you don't need an index counter (row in your code), and I used the dictionary's .items method to iterate over keys and values at once. I believe pandas will automatically handle the empty dataframe entries.
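
For readers on Python 3, where urllib2 no longer exists, the same enumerate and .items loop might look like the sketch below. The helper names fetch_json and fill_frame, the fake_pages mapping, and the column names are all hypothetical; the fetcher is passed in as a parameter so the example can run without network access.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(url):
    """Download a URL and decode its body as JSON (Python 3)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fill_frame(df, links, fetch=fetch_json):
    """Write one row per URL, keeping only keys that are existing columns."""
    for row, url in enumerate(links):
        for key, val in fetch(url).items():
            if key in df.columns:
                df.loc[row, key] = val
    return df


# Usage with a stand-in fetcher, so the sketch runs without network access.
fake_pages = {
    "u1": {"sku": "A1", "width_mm": 900, "colour": "grey"},
    "u2": {"sku": "B2", "depth_mm": 450},
}
df = fill_frame(
    pd.DataFrame(columns=["sku", "width_mm", "depth_mm"]),
    links=list(fake_pages),
    fetch=fake_pages.__getitem__,
)
print(df)
```

One caveat of this approach: a URL whose dictionary shares no keys with the columns produces no row at all, since nothing is ever written for that row label.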
