Python: creating a DataFrame from a dictionary where entries have different lengths

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19736080/

Time: 2020-08-19 14:28:36  Source: igfitidea

Creating dataframe from a dictionary where entries have different lengths

python pandas

Asked by Josh

Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.

How can I create a dataframe where each column holds a different entry?

When I try:

pd.DataFrame(my_dict)

I get:

ValueError: arrays must all be the same length

Is there any way to overcome this? I am happy to have pandas use NaN to pad the columns for the shorter entries.

Accepted answer by Jeff

In Python 3.x:

In [6]: d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )

In [7]: pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
Out[7]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4

In Python 2.x:

Replace d.items() with d.iteritems().
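
The same idea can be wrapped in a small helper. This is only a sketch for Python 3: the function name `dict_to_padded_frame` is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd


def dict_to_padded_frame(data):
    """Build a DataFrame from a dict of unequal-length sequences.

    Hypothetical helper: each value is wrapped in a pd.Series so that
    pandas aligns on the integer index and fills missing cells with NaN.
    """
    return pd.DataFrame({key: pd.Series(values) for key, values in data.items()})


d = {"A": np.array([1, 2]), "B": np.array([1, 2, 3, 4])}
padded = dict_to_padded_frame(d)
print(padded)
```

Note that a column padded with NaN is typically stored as float, so the values in A may print as 1.0 and 2.0 on recent pandas versions.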

Answer by dezzan

Here's a simple way to do that:

In[20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
In[21]: df = pd.DataFrame.from_dict(my_dict, orient='index')
In[22]: df
Out[22]: 
   0  1   2   3
A  1  2 NaN NaN
B  1  2   3   4
In[23]: df.transpose()
Out[23]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4

Answer by user2015487

While this does not directly answer the OP's question, I found it to be an excellent solution for my case, where I had unequal arrays, and I'd like to share it:

From the pandas documentation:

In [31]: d = {'one' : Series([1., 2., 3.], index=['a', 'b', 'c']),
   ....:      'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
   ....: 

In [32]: df = DataFrame(d)

In [33]: df
Out[33]: 
   one  two
a    1    1
b    2    2
c    3    3
d  NaN    4
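
The point of this example is that a dict of Series is aligned on index labels rather than on position. A minimal sketch of that behaviour follows; the shuffled index on `one` is my own variation for illustration and is not taken from the pandas documentation.

```python
import pandas as pd

# 'one' deliberately lists its labels out of order to show that
# alignment is done by label, not by position.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["c", "a", "b"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

df = pd.DataFrame(d)
print(df)

# The label 'd' only exists in 'two', so 'one' gets NaN in that row.
print(df.loc["d"])
```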

Answer by OrangeSherbet

Below is a way of tidying up the syntax while still doing essentially the same thing as the other answers:

>>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}

>>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })

>>> dict_df

   one  2    3
0  1.0  4  8.0
1  2.0  5  NaN
2  3.0  6  NaN
3  NaN  7  NaN

A similar syntax exists for lists, too:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])

>>> list_df

     0    1    2
0  1.0  2.0  3.0
1  4.0  5.0  NaN
2  6.0  NaN  NaN

Another syntax for lists is:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })

>>> list_df

   0    1    2
0  1  4.0  6.0
1  2  5.0  NaN
2  3  NaN  NaN

You may additionally have to transpose the result and/or change the column data types (float, integer, etc).
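
As a rough illustration of the data type step, the sketch below converts the NaN-padded float columns to pandas' nullable integer type `Int64`. It assumes a pandas version that supports nullable integers (0.24 or later), and the dictionary is a simplified variant of the one above.

```python
import pandas as pd

mydict = {"one": [1, 2, 3], "two": [4, 5, 6, 7]}

dict_df = pd.DataFrame({key: pd.Series(value) for key, value in mydict.items()})

# Padding with NaN forces 'one' to float; the nullable 'Int64' dtype
# (capital I) keeps whole numbers while still allowing missing values.
as_int = dict_df.astype("Int64")
print(as_int.dtypes)

# Transposing swaps rows and columns if the other orientation is needed.
print(as_int.transpose())
```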

Answer by jpp

You can also use pd.concat along axis=1 with a list of pd.Series objects:

import pandas as pd, numpy as np

d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}

res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)

print(res)

     A  B
0  1.0  1
1  2.0  2
2  NaN  3
3  NaN  4

Answer by Ismail Hachimi

Both of the following lines work perfectly:

pd.DataFrame.from_dict(df, orient='index').transpose() #A

pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in df.items() ])) #B (Better)

However, timing both with %timeit in Jupyter, I found B to be about 4x faster than A, which is quite a difference, especially when working with a huge data set (mainly one with a large number of columns/features).
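
To check the comparison on your own machine outside Jupyter, something like the sketch below could be used. The synthetic data, the function names, and the repeat count are all assumptions chosen for illustration; the measured ratio will vary with the pandas version and the shape of the data.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic input: 200 columns with lengths between 1 and 50.
data = {f"col{i}": list(range(i % 50 + 1)) for i in range(200)}


def via_from_dict():
    # Approach A: build row-wise, then transpose.
    return pd.DataFrame.from_dict(data, orient="index").transpose()


def via_series():
    # Approach B: wrap every value in a Series.
    return pd.DataFrame({k: pd.Series(v) for k, v in data.items()})


time_a = timeit.timeit(via_from_dict, number=5)
time_b = timeit.timeit(via_series, number=5)
print(f"A: {time_a:.4f}s  B: {time_b:.4f}s")
```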

Answer by Rohan Chandratre

If you don't want it to show NaN and you have two particular lengths, adding a space in each remaining cell would also work.

import pandas as pd

long = [6, 4, 7, 3]
short = [5, 6]

# Pad the shorter list with spaces until both lists have the same length
for n in range(len(long) - len(short)):
    short.append(' ')

df = pd.DataFrame({'A': long, 'B': short})
# Write the frame to an Excel file (requires the xlsxwriter package)
datatoexcel = pd.ExcelWriter('example1.xlsx', engine='xlsxwriter')
df.to_excel(datatoexcel, sheet_name='Sheet1')
datatoexcel.close()  # close() saves the file; save() was removed in pandas 2.0

   A  B
0  6  5
1  4  6
2  7   
3  3   

If you have more than 2 lengths of entries, it is advisable to make a function which uses a similar method.
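
One possible shape for such a function is sketched below. The name `pad_columns` and its `fill` parameter are hypothetical, and the Excel export step is left out to keep the example short.

```python
import pandas as pd


def pad_columns(columns, fill=" "):
    """Pad every list in `columns` to the length of the longest one.

    Hypothetical helper: `fill` is appended to the shorter lists, so the
    padded columns hold mixed values and end up with object dtype.
    """
    longest = max(len(values) for values in columns.values())
    return pd.DataFrame(
        {
            name: list(values) + [fill] * (longest - len(values))
            for name, values in columns.items()
        }
    )


df = pad_columns({"A": [6, 4, 7, 3], "B": [5, 6], "C": [1]})
print(df)
```

Because the filler is a string, numeric operations on the padded columns will no longer work as expected; padding with NaN is usually the safer choice when the data is processed further.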

Answer by john joy

pd.DataFrame([my_dict]) will do!
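
For comparison with the other answers, here is a small sketch of what this call produces, using the example dictionary from the accepted answer. As far as I can tell, wrapping the dict in a list makes pandas treat it as a single record, so the result is one row whose cells each hold a whole array rather than NaN-padded columns.

```python
import numpy as np
import pandas as pd

my_dict = {"A": np.array([1, 2]), "B": np.array([1, 2, 3, 4])}

# A list containing one dict is read as one record (one row).
one_row = pd.DataFrame([my_dict])
print(one_row.shape)

# Each cell stores the full array as a Python object.
print(type(one_row.loc[0, "B"]))
```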
