pandas: Subtract multiple columns in a pandas DataFrame and append the result

Question

Asked by Omegaman

I have a table of sensor data, for which some columns are measurements and some columns are sensor bias. For example, something like this:

import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

    dx   dy    x    y
0  0.25  0.5  1.0  4.0
1  0.25  0.5  2.0  5.0
2  0.25  0.5  3.0  6.0

I can add a column to the table by subtracting the bias from the measurement like this:

df['newX'] = df['x'] - df['dx']

    dx   dy    x    y  newX
0  0.25  0.5  1.0  4.0  0.75
1  0.25  0.5  2.0  5.0  1.75
2  0.25  0.5  3.0  6.0  2.75

But I'd like to do that for many columns at once. This doesn't work:

df[['newX','newY']] = df[['x','y']] - df[['dx','dy']]

for two reasons, it seems.

When subtracting DataFrames the column labels are used to align the subtraction, so I wind up with a 4 column result ['x', 'y', 'dx', 'dy'].
It seems I can insert a single column into the DataFrame using indexing, but not more than one.
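The first point can be reproduced directly. This is a minimal sketch using the example table from above; the variable name diff is only for illustration. Because none of the labels in ['x', 'y'] match any in ['dx', 'dy'], the result carries the union of the labels and contains no usable values:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# Subtraction aligns on column labels. 'x' and 'y' have no partner
# in the right-hand frame, and 'dx' and 'dy' have none in the left.
diff = df[['x', 'y']] - df[['dx', 'dy']]

print(sorted(diff.columns))      # the four labels from both operands
print(diff.isna().all().all())   # every cell is NaN
```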

Obviously I can iterate over the columns and do each one individually, but is there a more compact way to accomplish what I'm trying to do that is more analogous to the one column solution?

Answer 1

Answered by unutbu

DataFrames generally align operations such as arithmetic on both column and row indices. Since df[['x','y']] and df[['dx','dy']] have different column names, the dx column is not subtracted from the x column, and similarly for the y and dy columns.

In contrast, if you subtract a NumPy array from a DataFrame, the operation is done elementwise, since the NumPy array has no pandas-style indices to align on.

Hence, if you use df[['dx','dy']].values to extract a NumPy array consisting of the values in df[['dx','dy']], then your assignment can be done as desired:

import pandas as pd

df = pd.DataFrame({'x':[1.0,2.0,3.0],'y':[4.0,5.0,6.0],
                 'dx':[0.25,0.25,0.25],'dy':[0.5,0.5,0.5]})
df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values
print(df)

yields

     dx   dy    x    y  newx  newy
0  0.25  0.5  1.0  4.0  0.75   3.5
1  0.25  0.5  2.0  5.0  1.75   4.5
2  0.25  0.5  3.0  6.0  2.75   5.5
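With many measurement/bias pairs, the same idea can be driven from a list of column names. The following is only a sketch: the naming scheme (bias columns prefixed with 'd', result columns prefixed with 'new') and the list names meas, bias and new are assumptions made for illustration. It uses to_numpy(), which the pandas documentation recommends over .values for getting the underlying array:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

meas = ['x', 'y']                    # measurement columns
bias = ['d' + c for c in meas]       # assumed naming: 'dx', 'dy'
new = ['new' + c for c in meas]      # assumed naming: 'newx', 'newy'

# The left operand stays a DataFrame; the right operand is a plain
# array, so the subtraction is positional rather than label-aligned.
df[new] = df[meas] - df[bias].to_numpy()
print(df)
```

The order of meas determines which bias column is paired with which measurement, so the two lists have to be kept in step.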

Beware that, in the pandas version this answer was written against, if you try assigning a NumPy array (on the right-hand side) to a DataFrame (on the left-hand side), the column names specified on the left must already exist. Later pandas releases may behave differently.

In contrast, when assigning a DataFrame on the right-hand side to a DataFrame on the left, new column names can be used. In this case pandas zips the keys (the new column names) on the left with the columns on the right and assigns values in column order instead of aligning on column labels:

            for k1, k2 in zip(key, value.columns):
                self[k1] = value[k2]

Thus, using a DataFrame on the right

df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values

works, but using a NumPy array on the right

df[['newx','newy']] = df[['x','y']].values - df[['dx','dy']].values

does not.
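If the right-hand side is already a NumPy array, one way to sidestep the restriction described above is to wrap it in a DataFrame before assigning. This is a sketch rather than the only option; the names arr and wrapped are illustrative. Reusing df.index keeps the rows lined up during assignment:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# Pure NumPy arithmetic: no labels are involved at all.
arr = df[['x', 'y']].to_numpy() - df[['dx', 'dy']].to_numpy()

# Wrap the array so the right-hand side is a DataFrame again,
# sharing the original row index.
wrapped = pd.DataFrame(arr, index=df.index, columns=['newx', 'newy'])
df[['newx', 'newy']] = wrapped
print(df)
```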
