pandas: How to set the value of a pandas column as a list

Question

Asked by Unni

I want to set the value of a pandas column to a list of strings. However, my attempts didn't succeed because pandas treats the list as an iterable of separate values, and I get: ValueError: Must have equal len keys and value when setting with an iterable.

Here is a minimal working example (MWE):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>>> df
   col1  col2
0     1     4
1     2     5
2     3     6

>>> df['new_col'] = None
>>> df.loc[df.col1 == 1, 'new_col'] = ['a', 'b']
ValueError: Must have equal len keys and value when setting with an iterable
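Why the error appears: with .loc, a list on the right-hand side is read as one value per selected row. The mask df.col1 == 1 selects a single row, but the list has two elements, hence the length mismatch. The sketch below (assuming a reasonably recent pandas version) shows that when the mask happens to select exactly two rows, the same list is unpacked element-wise instead of being stored in one cell:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None

# The mask selects two rows (col1 == 1 and col1 == 2), matching the
# two-element list, so pandas writes 'a' into row 0 and 'b' into row 1
# rather than storing the whole list as a single cell value.
df.loc[df.col1 <= 2, 'new_col'] = ['a', 'b']
print(df)
```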

I tried setting the dtype to list with df.new_col = df.new_col.astype(list), and that didn't work either.

I am wondering what would be the correct approach here.

EDIT

The answer provided in Python pandas insert list into a cell (using at) didn't work for me either.

Answer 1

Accepted answer by jezrael

It's not easy; one possible solution is to create a helper Series:

df.loc[df.col1 == 1, 'new_col'] = pd.Series([['a', 'b']] * len(df))
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5     NaN
2     3     6     NaN
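A self-contained sketch of the same helper-Series idea. It assumes new_col is created first with None (as in the question), so the column already has object dtype and can hold list objects; the helper is built on df.index so that .loc aligns it row by row with the rows the mask selects:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None  # object-dtype column, able to hold arbitrary Python objects

# One list per row, indexed like df; .loc aligns it to the rows selected
# by the mask and stores each list as a single cell value.
helper = pd.Series([['a', 'b']] * len(df), index=df.index)
df.loc[df.col1 == 1, 'new_col'] = helper
print(df)
```

Rows that the mask does not select keep their previous value (None here).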

Another solution, if you also need to set the missing values to an empty list, is to use a list comprehension:

# To get NaN instead of [] for non-matching rows:
#df['new_col'] = [['a', 'b'] if x == 1 else np.nan for x in df['col1']]

df['new_col'] = [['a', 'b'] if x == 1 else [] for x in df['col1']]
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5      []
2     3     6      []
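One side effect worth checking, shown as a small sketch: the list comprehension builds a fresh list for every row, whereas [['a', 'b']] * n (as in the helper Series above) repeats a single list object n times. An object column stores references to Python objects, so an in-place change to a shared list would show up in every row holding it:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 3], 'col2': [4, 5, 6]})

# The comprehension evaluates ['a', 'b'] anew for every matching row,
# so each cell holds its own list object.
df['new_col'] = [['a', 'b'] if x == 1 else [] for x in df['col1']]
df.loc[0, 'new_col'].append('c')   # mutates only row 0's list in place
print(df['new_col'].tolist())      # [['a', 'b', 'c'], ['a', 'b'], []]

# By contrast, list multiplication repeats ONE list object,
# so an in-place change is visible through every reference.
shared = [['a', 'b']] * 3
shared[0].append('c')
print(shared)                      # [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
```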

But then you lose the vectorised functionality that comes with using NumPy arrays held in contiguous memory blocks.

Answer 2

Answer by jpp

Don't do this.

Pandas was never designed to hold lists in Series/columns. You can concoct expensive workarounds, but they are not recommended.

The main reason holding lists in a Series is not recommended is that you lose the vectorised functionality that comes with using NumPy arrays held in contiguous memory blocks. Your Series will be of object dtype, which represents a sequence of pointers, much like list. You will lose the benefits in memory and performance, as well as access to optimized Pandas methods.
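A quick illustrative sketch of what this means in practice: the numeric columns keep an integer NumPy dtype, while a column of lists has object dtype, and per-row work on it (here Series.map(len), chosen only as an example) calls a Python function once per element rather than running a vectorised NumPy kernel:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = [['a', 'b'], [], []]

# col1 and col2 are backed by contiguous integer NumPy arrays ...
print(df['col1'].dtype)
# ... while new_col is an object array: one pointer per row to a Python list.
print(df['new_col'].dtype)          # object

# Element-wise work on the lists runs through Python-level calls:
# map(len) invokes the built-in len() on each list in turn.
lengths = df['new_col'].map(len)
print(lengths.tolist())             # [2, 0, 0]
```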

See also: What are the advantages of NumPy over regular Python lists? The arguments in favour of Pandas are the same as for NumPy.

That said, since you are going against the purpose and design of Pandas, there are many who face the same problem and have asked similar questions:

Answer 3

Answer by Karn Kumar

The answer is simple: select the column you want to convert to a list and call tolist():

my_list = df["col1"].tolist()

>>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>>> df
   col1  col2
0     1     4
1     2     5
2     3     6
>>> my_list = df["col1"].tolist()
>>> my_list
[1, 2, 3]

Answer 4

Answer by Pranay

You can try the code below:

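import numpy as np
import pandas as pd

list1 = [1, 2, 3]
list2 = [4, 5, 6]
# Join each list into one string ("1,2,3" and "4,5,6") to use as column labels
col = [",".join(map(str, list1)), ",".join(map(str, list2))]
# 5x2 frame filled with zeros as placeholder data
df = pd.DataFrame(np.zeros((5, 2), dtype=int), columns=col)
print(df)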

Hope this is the expected output:

Answer 5

Answer by Loochie

Alternatively, using np.where:

df['new_col'] = np.where(df.col1 == 1,  pd.Series([['a', 'b']]) , np.nan)
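A self-contained sketch of the np.where approach: pd.Series([['a', 'b']]) is a length-1 object Series whose only element is the list, so NumPy broadcasts it against the length-3 mask. Rows where the condition holds get the list, the rest get NaN, and the resulting array has object dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# The condition has shape (3,), the list-holding Series shape (1,), and
# np.nan is a scalar; broadcasting expands them all to shape (3,).
df['new_col'] = np.where(df.col1 == 1, pd.Series([['a', 'b']]), np.nan)
print(df)
```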
