Python: Convert a list of dictionaries to a pandas DataFrame
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/20638006/
Convert list of dictionaries to a pandas DataFrame
Asked by appleLover
I have a list of dictionaries like this:
[{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points':90, 'time': '9:00', 'month': 'january'},
{'points_h1':20, 'month': 'june'}]
And I want to turn this into a pandas DataFrame like this:
      month  points  points_h1  time  year
0       NaN      50        NaN  5:00  2010
1  february      25        NaN  6:00   NaN
2   january      90        NaN  9:00   NaN
3      june     NaN         20   NaN   NaN
Note: Order of the columns does not matter.
How can I turn the list of dictionaries into a pandas DataFrame as shown above?
Accepted answer by joris
Supposing d is your list of dicts, simply:
pd.DataFrame(d)
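As a minimal, self-contained sketch of the one-liner above, here it is applied to the list from the question. The name d follows the answer; the alias pd is just the usual import convention.

```python
import pandas as pd

# The list of dictionaries from the question; keys differ between records.
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

# Each dictionary becomes one row; keys missing from a record become NaN.
df = pd.DataFrame(d)
print(df.shape)            # (4, 5)
print(sorted(df.columns))  # ['month', 'points', 'points_h1', 'time', 'year']
```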
Answer by szeitlin
In pandas 0.16.2, I had to use pd.DataFrame.from_records(d) to get this to work.
Answer by shivsn
You can also use pd.DataFrame.from_dict(d):
In [8]: d = [{'points': 50, 'time': '5:00', 'year': 2010},
   ...:      {'points': 25, 'time': '6:00', 'month': "february"},
   ...:      {'points': 90, 'time': '9:00', 'month': 'january'},
   ...:      {'points_h1': 20, 'month': 'june'}]
In [12]: pd.DataFrame.from_dict(d)
Out[12]:
      month  points  points_h1  time    year
0       NaN    50.0        NaN  5:00  2010.0
1  february    25.0        NaN  6:00     NaN
2   january    90.0        NaN  9:00     NaN
3      june     NaN       20.0   NaN     NaN
Answer by cs95
How do I convert a list of dictionaries to a pandas DataFrame?
The other answers are correct, but not much has been explained in terms of advantages and limitations of these methods. The aim of this post will be to show examples of these methods under different situations, discuss when to use (and when not to use), and suggest alternatives.
DataFrame(), DataFrame.from_records(), and .from_dict()
Depending on the structure and format of your data, there are situations where either all three methods work, or some work better than others, or some don't work at all.
Consider a very contrived example.
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')
print(data)
[{'A': 5, 'B': 0, 'C': 3, 'D': 3},
{'A': 7, 'B': 9, 'C': 3, 'D': 5},
{'A': 2, 'B': 4, 'C': 7, 'D': 6}]
This list consists of "records" with every key present. This is the simplest case you could encounter.
# The following methods all produce the same output.
pd.DataFrame(data)
pd.DataFrame.from_dict(data)
pd.DataFrame.from_records(data)
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
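One way to check the claim that all three constructors agree is to compare the results programmatically. This is a small sketch that hard-codes the same records as data above; the names df_ctor, df_dict and df_recs are only illustrative.

```python
import pandas as pd

data = [{'A': 5, 'B': 0, 'C': 3, 'D': 3},
        {'A': 7, 'B': 9, 'C': 3, 'D': 5},
        {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

df_ctor = pd.DataFrame(data)
df_dict = pd.DataFrame.from_dict(data)
df_recs = pd.DataFrame.from_records(data)

# DataFrame.equals compares values, labels and dtypes in one call.
print(df_ctor.equals(df_dict))
print(df_ctor.equals(df_recs))
```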
A Word on Dictionary Orientations: orient='index'/'columns'
Before continuing, it is important to distinguish between the different types of dictionary orientations and how pandas supports them. There are two primary types: "columns" and "index".
orient='columns'
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.
For example, data above is in the "columns" orientation.
data_c = [
    {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    {'A': 2, 'B': 4, 'C': 7, 'D': 6}]
pd.DataFrame.from_dict(data_c, orient='columns')
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
Note: If you are using pd.DataFrame.from_records, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.
orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.
data_i = {
    0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}
pd.DataFrame.from_dict(data_i, orient='index')
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
This case is not considered in the OP, but is still useful to know.
Setting Custom Index
If you need a custom index on the resulting DataFrame, you can set it using the index=... argument.
pd.DataFrame(data, index=['a', 'b', 'c'])
# pd.DataFrame.from_records(data, index=['a', 'b', 'c'])
   A  B  C  D
a  5  0  3  3
b  7  9  3  5
c  2  4  7  6
This is not supported by pd.DataFrame.from_dict.
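If you still want to go through from_dict, one possible workaround is to build the frame first and assign the labels afterwards. This is only a sketch; it hard-codes the same records as data above.

```python
import pandas as pd

data = [{'A': 5, 'B': 0, 'C': 3, 'D': 3},
        {'A': 7, 'B': 9, 'C': 3, 'D': 5},
        {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

df = pd.DataFrame.from_dict(data)
# Assign the labels afterwards; the list length must match the number of rows.
df.index = ['a', 'b', 'c']
print(df.loc['b', 'B'])  # 9
```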
Dealing with Missing Keys/Columns
All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,
data2 = [
{'A': 5, 'C': 3, 'D': 3},
{'A': 7, 'B': 9, 'F': 5},
{'B': 4, 'C': 7, 'E': 6}]
# The methods below all produce the same output.
pd.DataFrame(data2)
pd.DataFrame.from_dict(data2)
pd.DataFrame.from_records(data2)
     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN
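The output above shows 5.0 rather than 5 because NaN is a floating-point value, so integer columns with gaps are stored as float64. The sketch below inspects this on data2 and, as an optional extra that is not part of the original answer, casts to pandas' nullable Int64 dtype; the name df_int is illustrative.

```python
import pandas as pd

data2 = [{'A': 5, 'C': 3, 'D': 3},
         {'A': 7, 'B': 9, 'F': 5},
         {'B': 4, 'C': 7, 'E': 6}]

df = pd.DataFrame(data2)
# Every column has at least one gap here, so every column is float64.
print(df.dtypes)
# Number of missing cells per column.
print(df.isna().sum())

# Optional: the nullable integer dtype keeps whole numbers and marks gaps as <NA>.
df_int = df.astype('Int64')
print(df_int)
```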
Reading Subset of Columns
"What if I don't want to read in every single column?" You can easily specify this using the columns=... parameter.
For example, from data2 above, if you wanted to read only columns 'A', 'D', and 'F', you can do so by passing a list:
pd.DataFrame(data2, columns=['A', 'D', 'F'])
# pd.DataFrame.from_records(data2, columns=['A', 'D', 'F'])
     A    D    F
0  5.0  3.0  NaN
1  7.0  NaN  5.0
2  NaN  NaN  NaN
This is not supported by pd.DataFrame.from_dict with the default orient "columns".
pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])
ValueError: cannot use columns parameter with orient='columns'
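A possible way around this limitation, sketched below, is to build the full frame with from_dict and select the columns afterwards. The try/except only demonstrates the error shown above; the names message and subset are illustrative.

```python
import pandas as pd

data2 = [{'A': 5, 'C': 3, 'D': 3},
         {'A': 7, 'B': 9, 'F': 5},
         {'B': 4, 'C': 7, 'E': 6}]

message = None
try:
    pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])
except ValueError as exc:
    # from_dict rejects columns= when orient='columns'.
    message = str(exc)
    print(message)

# Workaround: build the frame first, then select the columns you need.
subset = pd.DataFrame.from_dict(data2)[['A', 'B']]
print(subset.shape)  # (3, 2)
```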
Reading Subset of Rows
Not supported by any of these methods directly. You will have to iterate over your data and perform a reverse delete in-place as you iterate. For example, to extract only the 0th and 2nd rows from data2 above, you can use:
rows_to_select = {0, 2}
# iterate backwards so deleting an item does not shift the indices still to be visited
for i in reversed(range(len(data2))):
    if i not in rows_to_select:
        del data2[i]
pd.DataFrame(data2)
# pd.DataFrame.from_dict(data2)
# pd.DataFrame.from_records(data2)
     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
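If mutating data2 is undesirable, two non-destructive variants are sketched here: picking the records with a list comprehension before building the frame, or loading everything and selecting rows by position with iloc. Neither is from the original answer; selected, df and df_alt are illustrative names.

```python
import pandas as pd

data2 = [{'A': 5, 'C': 3, 'D': 3},
         {'A': 7, 'B': 9, 'F': 5},
         {'B': 4, 'C': 7, 'E': 6}]
rows_to_select = [0, 2]

# Build a new list instead of deleting from data2 in place.
selected = [data2[i] for i in rows_to_select]
df = pd.DataFrame(selected)

# Alternative: load everything, then pick rows by position.
# This keeps the original row labels and the all-NaN column 'F'.
df_alt = pd.DataFrame(data2).iloc[rows_to_select]

print(len(data2))    # 3, the original list is untouched
print(df.shape)      # (2, 5)
print(df_alt.shape)  # (2, 6)
```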
The Panacea: json_normalize for Nested Data
A strong, robust alternative to the methods outlined above is the json_normalize function, which works with lists of dictionaries (records) and can also handle nested dictionaries.
# pd.json_normalize is the top-level name since pandas 1.0; older releases expose it as pd.io.json.json_normalize
pd.json_normalize(data)
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
pd.json_normalize(data2)
     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
Again, keep in mind that the data passed to json_normalize needs to be in the list-of-dictionaries (records) format.
As mentioned, json_normalize can also handle nested dictionaries. Here's an example taken from the documentation.
data_nested = [
    {'counties': [{'name': 'Dade', 'population': 12345},
                  {'name': 'Broward', 'population': 40000},
                  {'name': 'Palm Beach', 'population': 60000}],
     'info': {'governor': 'Rick Scott'},
     'shortname': 'FL',
     'state': 'Florida'},
    {'counties': [{'name': 'Summit', 'population': 1234},
                  {'name': 'Cuyahoga', 'population': 1337}],
     'info': {'governor': 'John Kasich'},
     'shortname': 'OH',
     'state': 'Ohio'}
]
pd.json_normalize(data_nested,
                  record_path='counties',
                  meta=['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
For more information on the meta and record_path arguments, check out the documentation.
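Even without record_path and meta, json_normalize flattens nested dictionaries into dotted column names, whereas the plain constructor keeps them as objects. The sketch below uses a trimmed version of the example above and assumes pandas 1.0 or later, where the function is available as pd.json_normalize; records, flat and plain are illustrative names.

```python
import pandas as pd

# Trimmed version of the example above: one nested dictionary per record.
records = [
    {'state': 'Florida', 'info': {'governor': 'Rick Scott'}},
    {'state': 'Ohio', 'info': {'governor': 'John Kasich'}},
]

# Nested keys are flattened into dotted column names.
flat = pd.json_normalize(records)
print(sorted(flat.columns))  # ['info.governor', 'state']

# The plain constructor keeps the inner dictionaries as Python objects instead.
plain = pd.DataFrame(records)
print(type(plain.loc[0, 'info']))  # <class 'dict'>
```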
Summarising
Here's a table of all the methods discussed above, along with supported features/functionality.
* Use orient='columns' and then transpose to get the same effect as orient='index'.
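The transposition note can be illustrated with the data_i dictionary from earlier. This is a small sketch with illustrative names by_index and by_columns_t; it compares only the cell values and labels.

```python
import pandas as pd

data_i = {
    0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

# orient='index': the outer keys become row labels.
by_index = pd.DataFrame.from_dict(data_i, orient='index')

# orient='columns': the outer keys become column labels, so transpose afterwards.
by_columns_t = pd.DataFrame.from_dict(data_i, orient='columns').T

print(by_index.values.tolist() == by_columns_t.values.tolist())
print(list(by_index.index) == list(by_columns_t.index))
```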
Answer by scottapotamus
I know a few people will come across this and find nothing here helps. The easiest way I have found to do it is like this:
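dict_count = len(dict_list)
df = pd.DataFrame(dict_list[0], index=[0])
# range(1, dict_count) so that the last dictionary is not skipped
for i in range(1, dict_count):
    # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
    df = pd.concat([df, pd.DataFrame(dict_list[i], index=[0])], ignore_index=True)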
Hope this helps someone!
Answer by Soum
Python 3: Most of the solutions listed previously work. However, there are cases where the row number (index) of the DataFrame is not required and each row (record) has to be written individually.
The following method is useful in that case.
import csv

# raw string so the backslashes in the Windows path are not read as escapes
myfile = r'C:\Users\John\Desktop\export_dataframe.csv'
records_to_save = data2  # used as in the thread.
# colnames is a list of all keys, in first-seen order. All values are written
# corresponding to the keys and "None" is specified in case of missing value
colnames = list(dict.fromkeys(k for d in records_to_save for k in d))
with open(myfile, 'w', newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(colnames)
    for d in records_to_save:
        writer.writerow([d.get(r, "None") for r in colnames])
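The standard library's csv.DictWriter can do the same job with less manual work: its restval argument supplies the filler for missing keys. The sketch below is an alternative, not the answer's own method, and writes to an in-memory buffer (an assumption made here so the example runs anywhere) instead of a file on disk.

```python
import csv
import io

records_to_save = [{'A': 5, 'C': 3, 'D': 3},
                   {'A': 7, 'B': 9, 'F': 5},
                   {'B': 4, 'C': 7, 'E': 6}]

# Union of all keys, in first-seen order.
colnames = list(dict.fromkeys(key for rec in records_to_save for key in rec))

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=colnames, restval="None")
writer.writeheader()
writer.writerows(records_to_save)

lines = buffer.getvalue().splitlines()
print(lines[0])  # A,C,D,B,F,E
print(lines[1])  # 5,3,3,None,None,None
```

To write to disk instead, replace the buffer with open(path, 'w', newline="", encoding="utf-8"), as in the answer above.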
Answer by Günel
from pandas import DataFrame

records = [{'points': 50, 'time': '5:00', 'year': 2010},
           {'points': 25, 'time': '6:00', 'month': "february"},
           {'points': 90, 'time': '9:00', 'month': 'january'},
           {'points_h1': 20, 'month': 'june'}]
and a simple call:
# named records and df so that the built-in list and the usual pd alias are not shadowed
df = DataFrame.from_dict(records, orient='columns', dtype=None)
print(df)
Answer by Armin Ahmadi Nasab
For converting a list of dictionaries to a pandas DataFrame, you can use "append":
We have a dictionary called dic, and dic has 30 list items (list1, list2, …, list30).
- step 1: define a variable to hold your result (e.g. total_df)
- step 2: initialize total_df with list1
- step 3: use a for loop to append all lists to total_df
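import pandas as pd

# assumption: each dic['listN'] is a list of dictionaries (records)
total_df = pd.DataFrame(dic['list1'])
for num in range(2, 31):
    # DataFrame.append was removed in pandas 2.0; pd.concat appends the rows instead
    total_df = pd.concat([total_df, pd.DataFrame(dic['list' + str(num)])], ignore_index=True)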


