Original question: http://stackoverflow.com/questions/38419286/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Subtracting multiple columns and appending results in pandas DataFrame
Asked by Omegaman
I have a table of sensor data, for which some columns are measurements and some columns are sensor bias. For example, something like this:
df=pd.DataFrame({'x':[1.0,2.0,3.0],'y':[4.0,5.0,6.0],
'dx':[0.25,0.25,0.25],'dy':[0.5,0.5,0.5]})
     dx   dy    x    y
0  0.25  0.5  1.0  4.0
1  0.25  0.5  2.0  5.0
2  0.25  0.5  3.0  6.0
I can add a column to the table by subtracting the bias from the measurement like this:
df['newX'] = df['x'] - df['dx']
     dx   dy    x    y  newX
0  0.25  0.5  1.0  4.0  0.75
1  0.25  0.5  2.0  5.0  1.75
2  0.25  0.5  3.0  6.0  2.75
But I'd like to do that for many columns at once. This doesn't work:
df[['newX','newY']] = df[['x','y']] - df[['dx','dy']]
for two reasons, it seems.
- When subtracting DataFrames, the column labels are used to align the subtraction, so I wind up with a 4-column result ['x', 'y', 'dx', 'dy'].
- It seems I can insert a single column into the DataFrame using indexing, but not more than one.
Obviously I can iterate over the columns and do each one individually, but is there a more compact way to accomplish what I'm trying to do that is more analogous to the one column solution?
Answered by unutbu
DataFrames generally align operations such as arithmetic on column and row indices. Since df[['x','y']] and df[['dx','dy']] have different column names, the dx column is not subtracted from the x column, and similarly for the y and dy columns.
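To see the alignment concretely, here is a minimal sketch using the question's data. The variable name `diff` is only illustrative. Because none of the labels match, every cell of the result is NaN and the result carries all four column labels:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# Both operands are DataFrames, so pandas aligns on column labels first.
# 'x'/'y' never meet 'dx'/'dy', so no pair of values is actually subtracted.
diff = df[['x', 'y']] - df[['dx', 'dy']]

print(sorted(diff.columns))   # the union of both sets of labels
print(diff.isna().all().all())  # True when every cell is NaN
```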
In contrast, if you subtract a NumPy array from a DataFrame, the operation is done elementwise, since the NumPy array has no pandas-style indices to align on.
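As a small illustration of that point, the sketch below subtracts a hand-written NumPy array from a two-column DataFrame. The names `measurements`, `bias` and `corrected` are made up for this example. The array is matched to the DataFrame purely by position, and the result keeps the DataFrame's own labels:

```python
import numpy as np
import pandas as pd

measurements = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})

# A plain array of the same shape: first column is the bias for x,
# second column is the bias for y. It has no labels to align on.
bias = np.array([[0.25, 0.5],
                 [0.25, 0.5],
                 [0.25, 0.5]])

# Elementwise subtraction by position; columns stay named 'x' and 'y'.
corrected = measurements - bias
print(corrected)
```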
Hence, if you use df[['dx','dy']].values to extract a NumPy array consisting of the values in df[['dx','dy']], then your assignment can be done as desired:
import pandas as pd
df = pd.DataFrame({'x':[1.0,2.0,3.0],'y':[4.0,5.0,6.0],
'dx':[0.25,0.25,0.25],'dy':[0.5,0.5,0.5]})
df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values
print(df)
yields
dx dy x y newx newy
0 0.25 0.5 1.0 4.0 0.75 3.5
1 0.25 0.5 2.0 5.0 1.75 4.5
2 0.25 0.5 3.0 6.0 2.75 5.5
Beware that if you were to try assigning a NumPy array (on the right-hand side) to a DataFrame (on the left-hand side), the column names specified on the left must already exist.
In contrast, when assigning a DataFrame on the right-hand side to a DataFrame on the left, new column names can be used, since in this case pandas zips the keys (new column names) on the left with the columns on the right and assigns values in column order instead of by aligning columns:
for k1, k2 in zip(key, value.columns):
    self[k1] = value[k2]
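The same pairing can be reproduced by hand. This is only a sketch of the idea, not pandas' actual implementation: `keys` and `value` are illustrative names standing in for the left-hand column list and the right-hand DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

keys = ['newx', 'newy']                           # new column names (left side)
value = df[['x', 'y']] - df[['dx', 'dy']].values  # DataFrame labelled 'x', 'y'

# Pair the new names with the right-hand columns by position, then assign
# one column at a time. Single-column assignment can create new columns.
for k1, k2 in zip(keys, value.columns):
    df[k1] = value[k2]

print(df[['newx', 'newy']])
```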
Thus, using a DataFrame on the right
df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values
works, but using a NumPy array on the right
df[['newx','newy']] = df[['x','y']].values - df[['dx','dy']].values
does not.
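If both operands have already been reduced to NumPy arrays, one possible workaround (an assumption on my part, not something from the original answer) is to wrap the result in a DataFrame that carries the new column names and the original row index, then attach it with pd.concat. The names `arr`, `new_cols` and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# Pure NumPy arithmetic: the result has no labels at all.
arr = df[['x', 'y']].values - df[['dx', 'dy']].values

# Give the array the new column names and the original row index,
# then append the columns side by side.
new_cols = pd.DataFrame(arr, index=df.index, columns=['newx', 'newy'])
result = pd.concat([df, new_cols], axis=1)

print(result)
```

This returns a new DataFrame rather than modifying df in place, which may or may not be what you want.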