pandas: adding a column of empty lists to a DataFrame

Question

Asked by vk1011

Similar to the question "How to add an empty column to a dataframe?", I would like to know the best way to add a column of empty lists to a DataFrame.

What I am trying to do is initialize a column and then, as I iterate over the rows and process some of them, replace the initial value in this new column with a filled list.

For example, if below is my initial DataFrame:

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})  # Sample DataFrame

>>> df
   a  b
0  1  5
1  2  6
2  3  7

Then I want to ultimately end up with something like this, where each row has been processed separately (sample results shown):

>>> df
   a  b          c
0  1  5     [5, 6]
1  2  6     [9, 0]
2  3  7  [1, 2, 3]

Of course, if I try to initialize with df['e'] = [], as I would with any other constant, pandas treats it as a sequence of length 0 and the assignment fails.

If I try initializing the new column as None or NaN, I run into the following issues when trying to assign a list to a location.

df['d'] = None

>>> df
   a  b     d
0  1  5  None
1  2  6  None
2  3  7  None

Issue 1 (it would be perfect if I can get this approach to work! Maybe something trivial I am missing):

>>> df.loc[0,'d'] = [1,3]

...
ValueError: Must have equal len keys and value when setting with an iterable
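One possible workaround for Issue 1, not taken from the original post, is the scalar accessor DataFrame.at, which addresses exactly one cell and so does not try to match the list's length against the selection. The sketch below assumes the column has object dtype (which df['d'] = None produces) and a reasonably recent pandas; behaviour may differ on older versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['d'] = None  # object-dtype column initialised with None

# .at targets a single cell, so the list is stored as one value
df.at[0, 'd'] = [1, 3]

cell = df.at[0, 'd']       # the list that was just stored
untouched = df.at[1, 'd']  # rows that were not assigned keep None
```

This only stores the list reliably when the column can hold arbitrary Python objects; a float column initialised with NaN may need to be cast to object first.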

Issue 2 (this one works, but not without a warning because it is not guaranteed to work as intended):

>>> df['d'][0] = [1,3]

C:\Python27\Scripts\ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

Hence I resort to initializing with empty lists and extending them as needed. There are a couple of methods I can think of to initialize this way, but is there a more straightforward way?

Method 1:

df['empty_lists1'] = [list() for x in range(len(df.index))]

>>> df
   a  b   empty_lists1
0  1  5             []
1  2  6             []
2  3  7             []

Method 2:

df['empty_lists2'] = df.apply(lambda x: [], axis=1)

>>> df
   a  b   empty_lists1   empty_lists2
0  1  5             []             []
1  2  6             []             []
2  3  7             []             []

Summary of questions:

Is there a minor syntax change for Issue 1 that would allow a list to be assigned to a field initialized with None/NaN?

If not, then what is the best way to initialize a new column with empty lists?

Answer 1

Answered by ComputerFellow

One more way is to use np.empty:

df['empty_list'] = np.empty((len(df), 0)).tolist()

You could also drop .index from your "Method 1" when taking the len of df.

df['empty_list'] = [[] for _ in range(len(df))]
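Putting the two initializations side by side, a minimal sketch (the column names via_numpy and via_comprehension are illustrative, not from the answer) suggests that each one creates a separate list per row, so filling one row in place leaves the others empty:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# A (len(df), 0) array converts to len(df) separate empty lists
df['via_numpy'] = np.empty((len(df), 0)).tolist()
# The comprehension evaluates [] once per row, again giving separate lists
df['via_comprehension'] = [[] for _ in range(len(df))]

# Fill only the first row of each column in place
df.at[0, 'via_numpy'].extend([5, 6])
df.at[0, 'via_comprehension'].extend([9, 0])

numpy_rows = df['via_numpy'].tolist()
comprehension_rows = df['via_comprehension'].tolist()
```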

It turns out that np.empty is faster:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.random.rand(1000000, 5))

In [4]: timeit df['empty1'] = np.empty((len(df), 0)).tolist()
10 loops, best of 3: 127 ms per loop

In [5]: timeit df['empty2'] = [[] for _ in range(len(df))]
10 loops, best of 3: 193 ms per loop

In [6]: timeit df['empty3'] = df.apply(lambda x: [], axis=1)
1 loops, best of 3: 5.89 s per loop

Answer 2

Answered by tozCSS

I timed all three methods in the accepted answer; the fastest one took 216 ms on my machine. However, this took only 28 ms:

df['empty4'] = [[]] * len(df)

Note: Similarly, df['e5'] = [set()] * len(df) also took 28 ms.
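One caveat worth checking before adopting this: in Python, [[]] * n repeats a reference to a single list rather than creating n lists, so every row would share one object, and the same holds for [set()] * n. If the lists are later filled in place, as the question intends, the change would show up in every row. A small sketch (the column names shared and separate are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

df['shared'] = [[]] * len(df)                  # one list, referenced once per row
df['separate'] = [[] for _ in range(len(df))]  # a new list for every row

# Mutate only the first row of each column in place
df.at[0, 'shared'].append(1)
df.at[0, 'separate'].append(1)

shared_rows = df['shared'].tolist()
separate_rows = df['separate'].tolist()
same_object = df.at[0, 'shared'] is df.at[1, 'shared']
```

Replacing a cell with a brand-new list, rather than mutating the existing one, is not affected by the sharing, so the faster form may still be acceptable when cells are only ever overwritten.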
