Python: creating a DataFrame from a dictionary whose entries have different lengths

Question

Asked by Josh

Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.

How can I create a dataframe where each column holds a different entry?

When I try:

pd.DataFrame(my_dict)

I get:

ValueError: arrays must all be the same length

Is there any way to overcome this? I am happy to have Pandas use NaN to pad those columns for the shorter entries.

Answer 1

Accepted answer by Jeff

In Python 3.x:

In [6]: d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )

In [7]: pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
Out[7]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4

In Python 2.x, replace d.items() with d.iteritems().
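
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the function name frame_from_ragged_dict is made up for illustration and is not part of pandas. It also prints the dtypes, since a column that receives NaN padding is stored as float.

```python
import numpy as np
import pandas as pd


def frame_from_ragged_dict(data):
    """Build a DataFrame from a dict whose values have different lengths.

    Each value is wrapped in a pd.Series, so pandas aligns the columns on
    the default integer index and fills the missing positions with NaN.
    (Hypothetical helper name, used here for illustration only.)
    """
    return pd.DataFrame({key: pd.Series(value) for key, value in data.items()})


d = {"A": np.array([1, 2]), "B": np.array([1, 2, 3, 4])}
df = frame_from_ragged_dict(d)

print(df)
# The padded column A is float because NaN is a float value;
# column B has no missing entries and keeps its integer dtype.
print(df.dtypes)
```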

Answer 2

Answered by dezzan

Here's a simple way to do that:

In[20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
In[21]: df = pd.DataFrame.from_dict(my_dict, orient='index')
In[22]: df
Out[22]: 
   0  1   2   3
A  1  2 NaN NaN
B  1  2   3   4
In[23]: df.transpose()
Out[23]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4

Answer 3

Answered by user2015487

While this does not directly answer the OP's question, I found it to be an excellent solution for my case, where I had arrays of unequal length, and I'd like to share it. From the pandas documentation:

In [31]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
   ....:      'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
   ....: 

In [32]: df = pd.DataFrame(d)

In [33]: df
Out[33]: 
   one  two
a    1    1
b    2    2
c    3    3
d  NaN    4
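
One thing worth keeping in mind with this approach: Series are aligned on their index labels, not on position. A small sketch with made-up labels (the variable names s1 and s2 are mine) shows that the NaN lands wherever a label is missing, which need not be at the bottom.

```python
import pandas as pd

# The shorter Series is missing label 'a', not the last label.
s1 = pd.Series([1.0, 2.0, 3.0], index=["b", "c", "d"])
s2 = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# The resulting index is the union of both indexes; rows are matched by label.
df = pd.DataFrame({"one": s1, "two": s2})
print(df)
```

Here the NaN in column one appears in row a, because that is the label s1 does not have.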

Answer 4

Answered by OrangeSherbet

A way of tidying up your syntax, while still doing essentially the same thing as the other answers, is below:

>>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}

>>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })

>>> dict_df

   one  2    3
0  1.0  4  8.0
1  2.0  5  NaN
2  3.0  6  NaN
3  NaN  7  NaN

A similar syntax exists for lists, too:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])

>>> list_df

     0    1    2
0  1.0  2.0  3.0
1  4.0  5.0  NaN
2  6.0  NaN  NaN

Another syntax for lists is:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })

>>> list_df

   0    1    2
0  1  4.0  6.0
1  2  5.0  NaN
2  3  NaN  NaN

You may additionally have to transpose the result and/or change the column data types (float, integer, etc).
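
As a rough sketch of that last step, assuming a reasonably recent pandas with the nullable "Int64" extension dtype, the padded float columns can be converted back to integers and the frame transposed. The example data is made up for illustration.

```python
import pandas as pd

mydict = {"one": [1, 2, 3], "two": [4, 5, 6, 7]}
df = pd.DataFrame({key: pd.Series(value) for key, value in mydict.items()})

# Column "one" was upcast to float64 to hold NaN. The nullable "Int64"
# extension dtype keeps integers and represents missing entries as <NA>.
df_int = df.astype("Int64")

# Transposing turns each original key into a row instead of a column.
df_t = df_int.T

print(df_int.dtypes)
print(df_t)
```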

Answer 5

Answered by jpp

You can also use pd.concat along axis=1 with a list of pd.Series objects:

import pandas as pd, numpy as np

d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}

res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)

print(res)

     A  B
0  1.0  1
1  2.0  2
2  NaN  3
3  NaN  4

Answer 6

Answered by Ismail Hachimi

Both of the following lines work perfectly:

# my_dict is the dictionary of unequal-length arrays from the question
pd.DataFrame.from_dict(my_dict, orient='index').transpose() #A

pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in my_dict.items() ])) #B (Better)

But with %timeit in Jupyter, B ran about 4x faster than A for me, which is quite impressive, especially when working with a huge data set (mainly one with a large number of columns/features).
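
Outside Jupyter, a comparison like this can be approximated with the standard timeit module. The sketch below uses made-up test data (200 columns of random lengths), and the function names approach_a and approach_b are mine. The numbers it prints depend heavily on the pandas version and the shape of the data, so treat them as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 200 columns with random lengths between 1 and 100.
rng = np.random.default_rng(0)
my_dict = {f"col{i}": rng.random(rng.integers(1, 101)) for i in range(200)}


def approach_a():
    # A: build row-wise, then transpose.
    return pd.DataFrame.from_dict(my_dict, orient="index").transpose()


def approach_b():
    # B: wrap each value in a Series and build column-wise.
    return pd.DataFrame({k: pd.Series(v) for k, v in my_dict.items()})


# Total seconds for 20 runs of each approach.
print("A:", timeit.timeit(approach_a, number=20))
print("B:", timeit.timeit(approach_b, number=20))
```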

Answer 7

Answered by Rohan Chandratre

If you don't want it to show NaN and you have two particular lengths, adding a 'space' in each remaining cell would also work.

import pandas as pd

long = [6, 4, 7, 3]
short = [5, 6]

# Pad the shorter list with a single space until both lists have the same length
for n in range(len(long) - len(short)):
    short.append(' ')

df = pd.DataFrame({'A': long, 'B': short})
# Write the frame to an Excel file in the working directory
# (requires the xlsxwriter package; the file is saved when the block exits)
with pd.ExcelWriter('example1.xlsx', engine='xlsxwriter') as datatoexcel:
    df.to_excel(datatoexcel, sheet_name='Sheet1')

   A  B
0  6  5
1  4  6
2  7   
3  3

If you have more than 2 lengths of entries, it is advisable to make a function which uses a similar method.
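
A minimal sketch of such a function, assuming the data is a dict of plain lists. The name pad_columns and its fill parameter are made up for illustration.

```python
import pandas as pd


def pad_columns(columns, fill=" "):
    """Return a copy of `columns` with every list padded to the longest length.

    Hypothetical helper: `columns` is a dict mapping column names to lists.
    """
    longest = max(len(values) for values in columns.values())
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


data = {"A": [6, 4, 7, 3], "B": [5, 6], "C": [1, 2, 3]}
df = pd.DataFrame(pad_columns(data))
print(df)
```

Note that a padded column mixes numbers and strings, so pandas stores it with the object dtype. Another option is to build the NaN-padded frame first and then call df.fillna(' ') on it.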

Answer 8

Answered by john joy

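pd.DataFrame([my_dict]) will do! Note that this builds a single row in which each cell holds an entire array, rather than NaN-padded columns.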
