How do I set a tuple value in a pandas DataFrame?
Source: http://stackoverflow.com/questions/46985607/
License: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by user40780
What I want to do should be very simple. Essentially, I have a DataFrame and I need to assign a tuple value to one of its columns.
For example:
import numpy as np
import pandas as pd
pd_tmp = pd.DataFrame(np.random.rand(3,3))
pd_tmp["new_column"] = ("a",2)
The code above raises this error:
ValueError: Length of values does not match length of index
I just need a new column in which every row holds the tuple. What should I do?
Answer by Psidom
You can wrap the tuple in a list, repeated once per row:
import numpy as np
import pandas as pd
pd_tmp = pd.DataFrame(np.random.rand(3,3))
# one tuple per row, so the list length matches the index length
pd_tmp["new_column"] = [("a",2)] * len(pd_tmp)
pd_tmp
#           0         1         2 new_column
# 0  0.835350  0.338516  0.914184     (a, 2)
# 1  0.007327  0.418952  0.741958     (a, 2)
# 2  0.758607  0.464525  0.400847     (a, 2)
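As a minimal sketch of why the original assignment fails and the wrapped version works: pandas treats a bare tuple as a sequence of column values, so a 2-element tuple cannot fill 3 rows. The variable `raised` and the list comprehension are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

pd_tmp = pd.DataFrame(np.random.rand(3, 3))

# A bare tuple is read as 2 values for 3 rows, so pandas raises ValueError.
try:
    pd_tmp["new_column"] = ("a", 2)
    raised = False
except ValueError:
    raised = True

# A list with one tuple per row has the right length; each tuple becomes one cell.
pd_tmp["new_column"] = [("a", 2) for _ in range(len(pd_tmp))]

print(raised)
print(pd_tmp["new_column"].tolist())
```

The list comprehension is equivalent here to `[("a", 2)] * len(pd_tmp)`; since tuples are immutable, sharing one tuple object across rows is harmless.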
Answer by Shihe Zhang
From the documentation of Series:
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value (like 5)
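The three kinds of input listed above can be sketched as follows. The variable names and the index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2})

# From an ndarray: an explicit index must match the array length.
from_array = pd.Series(np.array([0.1, 0.2, 0.3]), index=["x", "y", "z"])

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_dict)
print(from_array)
print(from_scalar)
```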
So Series won't take a tuple directly as a single value; a tuple is treated as a sequence of values instead.
@Psidom's answer works because it makes the tuple an element of an array-like (a list), much like an element of an ndarray.
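A small sketch of that difference, using hypothetical variable names: a bare tuple passed as data is unpacked into separate elements, while a list of tuples keeps each tuple whole.

```python
import pandas as pd

# The tuple is read as a sequence: two elements, "a" and 2.
unpacked = pd.Series(("a", 2))

# Each tuple in the list becomes a single element of the Series.
kept_whole = pd.Series([("a", 2)] * 3)

print(len(unpacked), unpacked.tolist())
print(len(kept_whole), kept_whole.tolist())
```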
If you are asking how to set a single cell of a Series/DataFrame, that question has already been asked.
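For the single-cell case, one possible approach (not taken from this thread) is the scalar accessor `.at` on a column of object dtype. The frame and the column name `cell` below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# An object-dtype column can hold arbitrary Python objects, including tuples.
df["cell"] = pd.Series([None] * len(df), dtype=object, index=df.index)

# .at addresses exactly one cell, so the tuple is stored as-is rather than unpacked.
df.at[1, "cell"] = ("a", 2)

print(df)
```

This sketch assumes the column already has object dtype; other rows keep their previous value.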
Answer by piRSquared
You can use apply with a lambda that returns the tuple:
pd_tmp.assign(new_column=pd_tmp.apply(lambda x: ('a', 2), axis=1))
          0         1         2 new_column
0  0.373564  0.806956  0.106911     (a, 2)
1  0.332508  0.711735  0.230347     (a, 2)
2  0.516232  0.343266  0.813759     (a, 2)
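One detail worth showing for this approach is that `assign` returns a new DataFrame and leaves the original untouched, so the result has to be kept. A self-contained sketch, where the name `result` is an illustrative choice:

```python
import numpy as np
import pandas as pd

pd_tmp = pd.DataFrame(np.random.rand(3, 3))

# apply with axis=1 calls the lambda once per row; each call returns the tuple.
tuples = pd_tmp.apply(lambda row: ("a", 2), axis=1)

# assign builds a copy with the extra column; pd_tmp itself is not modified.
result = pd_tmp.assign(new_column=tuples)

print("new_column" in pd_tmp.columns)
print(result["new_column"].tolist())
```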
Answer by Stefano Paoli
I was looking for something similar, but in my case I wanted the tuple to be a combination of existing columns, not just a fixed value. I found the solution below, which I share in the hope that it will be useful to others like me.
In [24]: df
Out[24]:
A B
0 1 2
1 11 22
2 111 222
3 1111 2222
In [25]: df['D'] = df[['A','B']].apply(tuple, axis=1)
In [26]: df
Out[26]:
A B D
0 1 2 (1, 2)
1 11 22 (11, 22)
2 111 222 (111, 222)
3 1111 2222 (1111, 2222)
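A common alternative to the row-wise `apply`, sketched here as an assumption rather than as part of the answer, is to pair the columns with Python's built-in `zip`. The data mirrors the session above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 11, 111, 1111], "B": [2, 22, 222, 2222]})

# zip pairs the two columns element by element; list() yields one tuple per row.
df["D"] = list(zip(df["A"], df["B"]))

# The same result via the row-wise apply from the answer, for comparison.
via_apply = df[["A", "B"]].apply(tuple, axis=1)

print(df)
```

Both forms give one tuple per row; `zip` avoids calling a Python function for every row through `apply`, which tends to matter on larger frames.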