How to set the value of a pandas column as list
Original URL: http://stackoverflow.com/questions/52552198/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Unni
I want to set the value of a pandas column to a list of strings. However, my attempts failed because pandas treats the list as an iterable, and I get: ValueError: Must have equal len keys and value when setting with an iterable.
Here is an MWE:
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>>> df
   col1  col2
0     1     4
1     2     5
2     3     6
>>> df['new_col'] = None
>>> df.loc[df.col1 == 1, 'new_col'] = ['a', 'b']
ValueError: Must have equal len keys and value when setting with an iterable
I tried to set the dtype to list using df.new_col = df.new_col.astype(list), and that didn't work either.
I am wondering what would be the correct approach here.
EDIT
The answer provided in Python pandas insert list into a cell (using .at) didn't work for me either.
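A short sketch of why the MWE above raises this error, assuming a recent pandas version (the second mask is made up for illustration): when the value assigned through .loc is a list, pandas spreads it element-wise over the selected rows, so its length must equal the number of rows the mask selects.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None

# This mask selects two rows and the list has two elements, so pandas
# assigns one string per row instead of storing the whole list in one cell.
df.loc[df.col1 <= 2, 'new_col'] = ['a', 'b']
print(df)

# This mask selects one row, but the list still has two elements, so the
# lengths do not match and pandas raises the ValueError from the question.
error_message = None
try:
    df.loc[df.col1 == 1, 'new_col'] = ['a', 'b']
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

In other words, .loc never sees the list as "one value for one cell"; it sees it as "several values for several cells".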
Accepted answer by jezrael
It's not easy. One possible solution is to create a helper Series:
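import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
# Helper Series with one list per row; .loc aligns it on the index,
# so only the row selected by the mask receives a list
df.loc[df.col1 == 1, 'new_col'] = pd.Series([['a', 'b']] * len(df))
print(df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5     NaN
2     3     6     NaN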
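A variant of the same idea, sketched here as an alternative rather than as part of the answer: build the helper Series only for the selected rows and assign it as a whole column, relying on index alignment to fill the other rows with NaN. The names mask and helper are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
mask = df.col1 == 1

# One list per selected row, indexed by exactly those rows.
helper = pd.Series([['a', 'b']] * int(mask.sum()), index=df.index[mask])

# Column assignment aligns on the index; rows missing from helper get NaN.
df['new_col'] = helper
print(df)
```

Because whole-column assignment goes through index alignment instead of .loc's element-wise setting, the list is never spread across rows.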
Another solution, if you also need to set the missing values to empty lists, is to use a list comprehension:
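import numpy as np  # only needed for the commented-out NaN variant below

# Keep NaN in the non-matching rows:
#df['new_col'] = [['a', 'b'] if x == 1 else np.nan for x in df['col1']]
# Or use empty lists there instead:
df['new_col'] = [['a', 'b'] if x == 1 else [] for x in df['col1']]
print(df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5      []
2     3     6      []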
But then you lose the vectorised functionality which goes with using NumPy arrays held in contiguous memory blocks.
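If the helper-Series approach has already left NaN in the non-matching rows, one way to turn them into empty lists afterwards is an element-wise apply. This is a minimal sketch that assumes every non-missing cell already holds a list; Series.fillna does not accept a list as the fill value, so it cannot be used for this.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
# Only row 0 gets a list; index alignment leaves rows 1 and 2 as NaN.
df['new_col'] = pd.Series([['a', 'b']], index=[0])

# Keep existing lists and replace everything else with a fresh empty list.
df['new_col'] = df['new_col'].apply(lambda v: v if isinstance(v, list) else [])
print(df)
```

The lambda evaluates the [] literal on every call, so each row gets its own empty list rather than a shared one.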
Answer by jpp
Don't do this.
Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.
The main reason holding lists in a series is not recommended is that you lose the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
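To make the dtype argument concrete, here is a small sketch with made-up data: a Series of lists has object dtype, while the same values stored one per row (here via Series.explode, available since pandas 0.25) can be cast to a native int64 column that supports fast vectorised operations.

```python
import pandas as pd

s_lists = pd.Series([[1, 2], [3, 4]])
# Each cell is a reference to a separate Python list object.
print(s_lists.dtype)  # object

# explode() puts one element per row (repeating the index); its result is
# still object dtype, so cast it to a native integer dtype.
s_flat = s_lists.explode().astype('int64')
print(s_flat.dtype)  # int64

# Reductions on the int64 column run on the underlying NumPy array.
print(s_flat.sum())  # 10
```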
See also: What are the advantages of NumPy over regular Python lists? The arguments in favour of Pandas are the same as for NumPy.
That said, since you are going against the purpose and design of Pandas, there are many who face the same problem and have asked similar questions:
Answer by Karn Kumar
Your answer is simple: select the column to convert to a list here:
my_list = df["col1"].tolist()
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>>> df
   col1  col2
0     1     4
1     2     5
2     3     6
>>> my_list = df["col1"].tolist()
>>> my_list
[1, 2, 3]
Answer by Pranay
You can try below code:
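import numpy as np
import pandas as pd

list1 = [1, 2, 3]
list2 = [4, 5, 6]
# Join each list into a comma-separated string and use it as a column label
col = [",".join(map(str, list1)), ",".join(map(str, list2))]
# randint draws from [low, high), so high must be greater than low
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 2)), columns=col)
print(df)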
Hope this is the expected output:
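Building on the joined-string idea in this answer, here is a minimal sketch of a different workaround (not part of the original answer): store the list as a comma-joined string, which .loc treats as a single scalar, and split it back into a list with .str.split when needed. It assumes the strings themselves contain no commas.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None

# A joined string is a scalar, so .loc assigns it to the selected row without
# trying to spread it across rows.
df.loc[df.col1 == 1, 'new_col'] = ','.join(['a', 'b'])

# Split the strings back into lists when needed; missing cells stay missing.
as_lists = df['new_col'].str.split(',')
print(as_lists)
```

The trade-off is that the column holds plain strings, so every read that needs the list pays for a split.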