Python: Convert a list of dictionaries to a pandas DataFrame

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20638006/

Time: 2020-08-18 20:54:13  Source: igfitidea

Convert list of dictionaries to a pandas DataFrame

Tags: python, dictionary, pandas, dataframe

Asked by appleLover

I have a list of dictionaries like this:

[{'points': 50, 'time': '5:00', 'year': 2010}, 
{'points': 25, 'time': '6:00', 'month': "february"}, 
{'points':90, 'time': '9:00', 'month': 'january'}, 
{'points_h1':20, 'month': 'june'}]

And I want to turn this into a pandas DataFrame like this:

      month  points  points_h1  time  year
0       NaN      50        NaN  5:00  2010
1  february      25        NaN  6:00   NaN
2   january      90        NaN  9:00   NaN
3      june     NaN         20   NaN   NaN

Note: Order of the columns does not matter.

How can I turn the list of dictionaries into a pandas DataFrame as shown above?

Accepted answer by joris

Supposing d is your list of dicts, simply:

pd.DataFrame(d)
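
As a minimal, self-contained sketch of the one-liner above, here it is applied to the data from the question. The variable name df is only illustrative.

```python
import pandas as pd

# The list of dictionaries from the question
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

# Each dictionary becomes one row; keys missing from a dictionary become NaN
df = pd.DataFrame(d)

print(df.shape)            # (4, 5)
print(sorted(df.columns))  # ['month', 'points', 'points_h1', 'time', 'year']
```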

Answer by szeitlin

In pandas 0.16.2, I had to do pd.DataFrame.from_records(d) to get this to work.

Answer by shivsn

You can also use pd.DataFrame.from_dict(d), as follows:

In [8]: d = [{'points': 50, 'time': '5:00', 'year': 2010}, 
   ...: {'points': 25, 'time': '6:00', 'month': "february"}, 
   ...: {'points':90, 'time': '9:00', 'month': 'january'}, 
   ...: {'points_h1':20, 'month': 'june'}]

In [12]: pd.DataFrame.from_dict(d)
Out[12]: 
      month  points  points_h1  time    year
0       NaN    50.0        NaN  5:00  2010.0
1  february    25.0        NaN  6:00     NaN
2   january    90.0        NaN  9:00     NaN
3      june     NaN       20.0   NaN     NaN

Answer by cs95

How do I convert a list of dictionaries to a pandas DataFrame?

The other answers are correct, but not much has been explained in terms of advantages and limitations of these methods. The aim of this post will be to show examples of these methods under different situations, discuss when to use (and when not to use), and suggest alternatives.

DataFrame(), DataFrame.from_records(), and .from_dict()

Depending on the structure and format of your data, there are situations where either all three methods work, or some work better than others, or some don't work at all.

Consider a very contrived example.

import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')

print(data)
[{'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

This list consists of "records" with every key present. This is the simplest case you could encounter.

# The following methods all produce the same output.
pd.DataFrame(data)
pd.DataFrame.from_dict(data)
pd.DataFrame.from_records(data)

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

A Word on Dictionary Orientations: orient='index'/'columns'

Before continuing, it is important to distinguish between the different types of dictionary orientation and how pandas supports them. There are two primary types: "columns" and "index".

orient='columns'
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.

For example, data above is in the "columns" orient.

data_c = [
 {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

pd.DataFrame.from_dict(data_c, orient='columns')

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

Note: If you are using pd.DataFrame.from_records, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.

orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.

data_i = {
 0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

pd.DataFrame.from_dict(data_i, orient='index')

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

This case is not considered in the OP, but is still useful to know.

Setting Custom Index

If you need a custom index on the resultant DataFrame, you can set it using the index=... argument.

pd.DataFrame(data, index=['a', 'b', 'c'])
# pd.DataFrame.from_records(data, index=['a', 'b', 'c'])

   A  B  C  D
a  5  0  3  3
b  7  9  3  5
c  2  4  7  6

This is not supported by pd.DataFrame.from_dict.
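
If you still want to go through from_dict, one possible workaround (a sketch, not something the method offers itself) is to assign the index after construction. The labels below are the same illustrative ones used above.

```python
import pandas as pd

data = [{'A': 5, 'B': 0, 'C': 3, 'D': 3},
        {'A': 7, 'B': 9, 'C': 3, 'D': 5},
        {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

# from_dict has no index= parameter, so build the frame first...
df = pd.DataFrame.from_dict(data)
# ...and then replace the default integer index with custom labels
df.index = ['a', 'b', 'c']

print(df.loc['b', 'B'])  # 9
```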

Dealing with Missing Keys/Columns

All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,

data2 = [
     {'A': 5, 'C': 3, 'D': 3},
     {'A': 7, 'B': 9, 'F': 5},
     {'B': 4, 'C': 7, 'E': 6}]

# The methods below all produce the same output.
pd.DataFrame(data2)
pd.DataFrame.from_dict(data2)
pd.DataFrame.from_records(data2)

     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN

Reading Subset of Columns

"What if I don't want to read in every single column?" You can easily specify this using the columns=... parameter.

For example, from the example data2 above, if you wanted to read only columns 'A', 'D', and 'F', you can do so by passing a list:

pd.DataFrame(data2, columns=['A', 'D', 'F'])
# pd.DataFrame.from_records(data2, columns=['A', 'D', 'F'])

     A    D    F
0  5.0  3.0  NaN
1  7.0  NaN  5.0
2  NaN  NaN  NaN

This is not supported by pd.DataFrame.from_dict with the default orient "columns".

pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])

ValueError: cannot use columns parameter with orient='columns'
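
One way around this limitation, sketched here under the assumption that loading every column first is acceptable, is to select the columns after construction. reindex is used rather than plain indexing so that a requested column absent from the data would come back as NaN instead of raising.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]

# Load everything, then keep only the columns of interest
df = pd.DataFrame.from_dict(data2).reindex(columns=['A', 'D', 'F'])

print(list(df.columns))  # ['A', 'D', 'F']
```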

Reading Subset of Rows

This is not supported by any of these methods directly. You will have to iterate over your data and delete the unwanted entries in place, going in reverse so that earlier deletions do not shift the positions still to be visited. For example, to extract only the 0th and 2nd rows from data2 above, you can use:

rows_to_select = {0, 2}
for i in reversed(range(len(data2))):
    if i not in rows_to_select:
        del data2[i]

pd.DataFrame(data2)
# pd.DataFrame.from_dict(data2)
# pd.DataFrame.from_records(data2)

     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
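
If mutating data2 is undesirable, a non-destructive variant is to build a filtered list first. This is only a sketch of one alternative; the name selected is illustrative.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]

rows_to_select = {0, 2}

# Build a new list instead of deleting from data2 in place
selected = [row for i, row in enumerate(data2) if i in rows_to_select]
df = pd.DataFrame(selected)

print(len(data2), len(df))  # 3 2
```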


The Panacea: json_normalize for Nested Data

A strong, robust alternative to the methods outlined above is the json_normalize function, which works with lists of dictionaries (records) and can also handle nested dictionaries.

pd.json_normalize(data)  # top-level name since pandas 1.0; pd.io.json.json_normalize is deprecated

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

pd.json_normalize(data2)

     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0

Again, keep in mind that the data passed to json_normalize needs to be in the list-of-dictionaries (records) format.

As mentioned, json_normalize can also handle nested dictionaries. Here's an example taken from the documentation.

data_nested = [
  {'counties': [{'name': 'Dade', 'population': 12345},
                {'name': 'Broward', 'population': 40000},
                {'name': 'Palm Beach', 'population': 60000}],
   'info': {'governor': 'Rick Scott'},
   'shortname': 'FL',
   'state': 'Florida'},
  {'counties': [{'name': 'Summit', 'population': 1234},
                {'name': 'Cuyahoga', 'population': 1337}],
   'info': {'governor': 'John Kasich'},
   'shortname': 'OH',
   'state': 'Ohio'}
]

pd.json_normalize(data_nested,
                  record_path='counties',
                  meta=['state', 'shortname', ['info', 'governor']])

         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
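
For a smaller illustration of the flattening behaviour, the sketch below passes nested dictionaries without record_path. The records are made up for illustration, and pd.json_normalize is the top-level name available from pandas 1.0 onwards.

```python
import pandas as pd

# Hypothetical records with one level of nesting
nested = [
    {'id': 1, 'info': {'governor': 'Rick Scott', 'state': 'Florida'}},
    {'id': 2, 'info': {'governor': 'John Kasich'}}]

# Nested keys are flattened into dotted column names; missing keys become NaN
flat = pd.json_normalize(nested)
print(sorted(flat.columns))  # ['id', 'info.governor', 'info.state']

# sep= changes the separator used when joining nested keys
flat_us = pd.json_normalize(nested, sep='_')
print(sorted(flat_us.columns))  # ['id', 'info_governor', 'info_state']
```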

For more information on the meta and record_path arguments, check out the documentation.



Summarising

Here's a table of all the methods discussed above, along with supported features/functionality.

[Image not reproduced: summary table of the methods above and their supported features]

* Use orient='columns' and then transpose to get the same effect as orient='index'.
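
The transpose equivalence in the footnote can be checked with a short sketch. It reuses the data_i dictionary from earlier; the names by_index and transposed are illustrative.

```python
import pandas as pd

data_i = {
    0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

# Outer keys become the index directly
by_index = pd.DataFrame.from_dict(data_i, orient='index')

# Outer keys become columns first; .T then swaps rows and columns
transposed = pd.DataFrame.from_dict(data_i, orient='columns').T

print(list(by_index.index) == list(transposed.index))      # True
print(list(by_index.columns) == list(transposed.columns))  # True
print((by_index.values == transposed.values).all())        # True
```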

Answer by scottapotamus

I know a few people will come across this and find nothing here helps. The easiest way I have found to do it is like this:

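import pandas as pd

dict_count = len(dict_list)
df = pd.DataFrame(dict_list[0], index=[0])
# range(1, dict_count) visits every remaining dict, including the last one
for i in range(1, dict_count):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, pd.DataFrame(dict_list[i], index=[0])], ignore_index=True)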

Hope this helps someone!

Answer by Soum

Python 3: Most of the solutions listed previously work. However, there are cases where the row number (index) of the DataFrame is not required and each row (record) has to be written individually.

The following method is useful in that case.

import csv

# Raw string so the backslashes in the Windows path are not treated as escapes
myfile = r'C:\Users\John\Desktop\export_dataframe.csv'

records_to_save = data2  # used as in the thread

# colnames is a list of all keys, in first-seen order across every record.
# Values are written under their keys, and "None" is written for a missing value.
colnames = list(dict.fromkeys(key for record in records_to_save for key in record))

with open(myfile, 'w', newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(colnames)
    for d in records_to_save:
        writer.writerow([d.get(r, "None") for r in colnames])
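
A possible alternative, sketched here with the standard library's csv.DictWriter, lets the writer fill in missing keys through its restval argument. An in-memory io.StringIO buffer stands in for a real file so the sketch has no side effects; swap in an open file handle to write to disk.

```python
import csv
import io

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]

# Column names in first-seen order across all records
colnames = list(dict.fromkeys(key for record in data2 for key in record))

buffer = io.StringIO()
# restval is written for any key a record does not contain
writer = csv.DictWriter(buffer, fieldnames=colnames, restval="None")
writer.writeheader()
writer.writerows(data2)

print(buffer.getvalue())
```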

Answer by Günel

import pandas as pd

records = [{'points': 50, 'time': '5:00', 'year': 2010},
           {'points': 25, 'time': '6:00', 'month': "february"},
           {'points': 90, 'time': '9:00', 'month': 'january'},
           {'points_h1': 20, 'month': 'june'}]

and then simply call:

# Avoid the names list and pd here: they would shadow the built-in and the pandas alias
df = pd.DataFrame.from_dict(records, orient='columns', dtype=None)

print(df)

Answer by Armin Ahmadi Nasab

For converting a list of dictionaries to a pandas DataFrame, you can use "append":

We have a dictionary called dic, and dic has 30 list items (list1, list2, …, list30).

  1. Step 1: define a variable to hold your result (e.g. total_df)
  2. Step 2: initialize total_df with list1
  3. Step 3: use a for loop to append all the lists to total_df
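import pandas as pd

# Assumes each dic['listN'] is a list of dictionaries (records)
total_df = pd.DataFrame(dic['list1'])
for num in range(2, 31):
    # DataFrame.append was removed in pandas 2.0, so use pd.concat
    total_df = pd.concat([total_df, pd.DataFrame(dic['list' + str(num)])])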
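
The same three steps can also be sketched with a single pd.concat call, which avoids growing the frame one piece at a time. The dic below is a hypothetical stand-in with three lists instead of thirty, and it assumes each list holds dictionaries (records).

```python
import pandas as pd

# Hypothetical stand-in for dic: three lists of records instead of thirty
dic = {
    'list1': [{'points': 50, 'time': '5:00'}],
    'list2': [{'points': 25, 'month': 'february'}],
    'list3': [{'points_h1': 20, 'month': 'june'}, {'points': 90}]}

# Build one frame per list, then concatenate them in a single call
frames = [pd.DataFrame(dic['list' + str(num)]) for num in range(1, 4)]
total_df = pd.concat(frames, ignore_index=True)

print(total_df.shape)  # (4, 4)
```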