pandas: convert a list of dictionaries to a DataFrame
Original source: http://stackoverflow.com/questions/40973211/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert list of Dictionaries to a Dataframe
Asked by Arshad Islam
I am facing a basic problem: converting a list of dictionaries, obtained by parsing a column that contains JSON-formatted text, into a DataFrame. Below is a brief snapshot of the data:
[{u'PAGE TYPE': u'used-serp.model.brand.city'},
{u'BODY TYPE': u'MPV Cars',
u'ENGINE CAPACITY': u'1461',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Renault Lodgy',
u'OEM NAME': u'Renault',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'PAGE TYPE': u'used-serp.brand.city'},
{u'BODY TYPE': u'SUV Cars',
u'ENGINE CAPACITY': u'2477',
u'FUEL TYPE': u' Diesel',
u'MODEL NAME': u'Mitsubishi Pajero',
u'OEM NAME': u'Mitsubishi',
u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
{u'BODY TYPE': u'Hatchback Cars',
u'ENGINE CAPACITY': u'1198',
u'FUEL TYPE': u' Petrol , Diesel',
u'MODEL NAME': u'Volkswagen Polo',
u'OEM NAME': u'Volkswagen',
u'PAGE TYPE': u'New-ModelPage.GalleryTab'},
The code I am using to parse the column is below:
import json

stdf_noncookie = []      # rows whose 'attributes' text parsed as JSON
stdf_noncookiejson = []  # rows whose 'attributes' text could not be parsed
for index, row in df_noncookie.iterrows():
    try:
        loop_data = json.loads(row['attributes'])
        stdf_noncookie.append(loop_data)
    except ValueError:
        loop_nondata = row['attributes']
        stdf_noncookiejson.append(loop_nondata)
stdf_noncookie is the list of dictionaries I am trying to convert into a pandas DataFrame. 'attributes' is the column containing JSON-formatted text. I tried to learn from this link, but it did not solve my problem. Any suggestions or tips for converting a list of dictionaries to a pandas DataFrame would be helpful.
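The parsing loop in the question can be sketched as a small helper that separates parseable rows from the rest. This is only an illustration under assumptions: `split_json_column` is a hypothetical name, and the three-row frame below is a stand-in for `df_noncookie`, which the question does not show.

```python
import json

import pandas as pd


def split_json_column(column):
    """Parse each cell as JSON; return (parsed_values, unparseable_cells)."""
    parsed, rejected = [], []
    for cell in column:
        try:
            parsed.append(json.loads(cell))
        except (ValueError, TypeError):
            # ValueError: malformed JSON text; TypeError: non-string cell such as NaN
            rejected.append(cell)
    return parsed, rejected


# Stand-in for the asker's df_noncookie (assumed: one text column named 'attributes').
df_noncookie = pd.DataFrame({"attributes": [
    '{"PAGE TYPE": "used-serp.brand.city"}',
    'not json',
    '{"PAGE TYPE": "New-ModelPage.OverviewTab", "OEM NAME": "Renault"}',
]})

stdf_noncookie, not_json = split_json_column(df_noncookie["attributes"])
frame = pd.DataFrame(stdf_noncookie)  # keys missing from a row become NaN
```

Iterating over the column directly avoids `iterrows()`, which builds a full row object even though only one field is needed.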
Answered by CraicerHyman
To convert your list of dicts to a pandas DataFrame, use the following:
stdf_noncookiejson = pd.DataFrame.from_records(data)
DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
You can set the index, name the columns, and so on as you read it in.
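As a rough sketch of setting the index while reading, the example below passes `index` to `from_records`. The two records are adapted from the question's snapshot; using `MODEL NAME` as the index is an assumption made for illustration, not something the answer prescribes.

```python
import pandas as pd

# Two records adapted from the question's data; every record has 'MODEL NAME'.
records = [
    {"MODEL NAME": "Renault Lodgy", "OEM NAME": "Renault", "ENGINE CAPACITY": "1461"},
    {"MODEL NAME": "Mitsubishi Pajero", "OEM NAME": "Mitsubishi", "ENGINE CAPACITY": "2477"},
]

# index= names the field to use as the row index; it is removed from the columns.
by_model = pd.DataFrame.from_records(records, index="MODEL NAME")
```

After this call, a row can be looked up by model name, for example `by_model.loc["Renault Lodgy"]`.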
If you're working with JSON, you can also use the read_json method:
stdf_noncookiejson = pd.read_json(data)
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False)
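One caveat worth illustrating: `read_json` reads JSON text (a path or file-like object), not a list of dicts that has already been parsed. The sketch below assumes the data is available as a single JSON array string and wraps it in `io.StringIO`; the sample values are adapted from the question.

```python
import io
import json

import pandas as pd

# Assumed input: the whole dataset as one JSON array of objects.
json_text = json.dumps([
    {"PAGE TYPE": "used-serp.brand.city"},
    {"PAGE TYPE": "New-ModelPage.OverviewTab", "OEM NAME": "Renault"},
])

# orient="records" states explicitly that the text is a list of row objects.
from_json = pd.read_json(io.StringIO(json_text), orient="records")
```

Because `read_json` tries to infer dtypes by default, numeric-looking strings may come back as numbers; pass `dtype=False` if the original strings must be preserved.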
Answered by amin
You can simply use the pandas DataFrame constructor.
import pandas as pd
print(pd.DataFrame(data))
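A minimal sketch of the constructor on records with differing keys, as in the question's snapshot. The `columns` argument is used here to pick and order the columns; which columns to keep is an assumption made for the example.

```python
import pandas as pd

# Records adapted from the question; the first and third lack most keys.
data = [
    {"PAGE TYPE": "used-serp.model.brand.city"},
    {"PAGE TYPE": "New-ModelPage.OverviewTab", "MODEL NAME": "Renault Lodgy",
     "FUEL TYPE": " Diesel"},
    {"PAGE TYPE": "used-serp.brand.city"},
]

# Keys absent from a record are filled with NaN; columns= fixes the column order.
df = pd.DataFrame(data, columns=["PAGE TYPE", "MODEL NAME", "FUEL TYPE"])
```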
Answered by Arshad Islam
I finally found a way to convert a list of dicts to a pandas DataFrame. Below is the code:
Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)
Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())
Method A is much quicker than Method B. I will create another post asking about the difference between the two methods. Also, Method B does not work on some datasets.
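The two methods can be tried side by side on a tiny stand-in frame to check that they yield the same table. This sketch makes no claim about their relative speed; the sample `df_noncookie` below is an assumption, since the original data is not shown.

```python
import json

import pandas as pd

# Stand-in for the asker's frame (assumed: every 'attributes' cell is valid JSON).
df_noncookie = pd.DataFrame({"attributes": [
    '{"PAGE TYPE": "used-serp.brand.city"}',
    '{"PAGE TYPE": "New-ModelPage.OverviewTab", "OEM NAME": "Renault"}',
]})

parsed = df_noncookie["attributes"].apply(json.loads)  # Series of dicts

# Method A: expand each dict into a row by wrapping it in a Series.
frame_a = parsed.apply(pd.Series)

# Method B: hand the list of dicts to the DataFrame constructor.
frame_b = pd.DataFrame(parsed.tolist())
```

Both approaches raise if any cell is not valid JSON, because `json.loads` is applied without a `try`/`except`.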
Answered by Warren
I was able to do it with a list comprehension. My problem was that I had left my dicts JSON-encoded, so they looked like strings.
d = r.zrangebyscore('live-ticks', '-inf', time.time())
dform = [json.loads(i) for i in d]
df = pd.DataFrame(dform)
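To show why the decoding step matters, here is a small sketch with the data store left out: `raw` is a hypothetical stand-in for the JSON-encoded strings returned by the query above, with made-up field names.

```python
import json

import pandas as pd

# Hypothetical stand-in for the JSON-encoded strings fetched from the store.
raw = ['{"symbol": "ABC", "price": 10.5}', '{"symbol": "XYZ", "price": 7.25}']

# Without decoding, each string becomes a single cell in one unnamed column.
undecoded = pd.DataFrame(raw)

# Decoding first gives one column per key.
decoded = pd.DataFrame([json.loads(item) for item in raw])
```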