pandas: How to compute differences across columns rather than rows

Question

Asked by John Smizz

I am playing around with data and need to look at differences across columns (as well as rows) in a fairly large dataframe. The easiest way for rows is clearly the diff() method, but I cannot find the equivalent for columns?

My current solution is to obtain a dataframe with the columns differenced via:

df.transpose().diff().transpose()

Is there a more efficient alternative? Or is this such an odd usage of pandas that it was just never requested/considered useful? :)

Thanks,

Answer 1

Answered by unutbu

Pandas DataFrames are excellent for manipulating table-like data whose columns have different dtypes.

If subtracting across both columns and rows makes sense, then all the values are the same kind of quantity. That might be an indication that you should be using a NumPy array instead of a Pandas DataFrame.

In any case, you can use arr = df.values to extract a NumPy array of the underlying data from the DataFrame. If all the columns share the same dtype, then the NumPy array will have the same dtype. (When the columns have incompatible dtypes, such as numbers and strings, df.values has object dtype.)
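As a minimal sketch of the dtype behavior described here (the frames `same`, `mixed`, and `nums` are made up for illustration), the following checks what `df.values` returns in each case. Note that purely numeric columns of different dtypes are upcast to a common numeric dtype rather than falling back to object.

```python
import numpy as np
import pandas as pd

# All columns share one dtype: the extracted array keeps it.
same = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(same.values.dtype)

# Numeric and string columns together: the array falls back to object dtype.
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(mixed.values.dtype)

# Integer and float columns together: upcast to a common float dtype.
nums = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
print(nums.values.dtype)
```

An object-dtype array is a sign that np.diff is probably not the right tool for that frame, since the subtraction would be attempted on arbitrary Python objects.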

Then you can compute the differences along rows or columns using np.diff(arr, axis=...):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3,4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

np.diff(df.values, axis=0)    # difference of the rows
# array([[4, 4, 4, 4],
#        [4, 4, 4, 4]])

np.diff(df.values, axis=1)    # difference of the columns
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])
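Since np.diff returns a plain ndarray, the row and column labels are lost. If they are needed afterwards, one possible approach is to wrap the result back into a DataFrame. This is only a sketch: the name `col_diff` and the choice to label each difference by its right-hand column are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# Differences between adjacent columns; the result has one column fewer.
col_diff = pd.DataFrame(
    np.diff(df.values, axis=1),
    index=df.index,
    columns=df.columns[1:],  # label each difference by its right-hand column
)
print(col_diff)
```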

Answer 2

Answered by Alexander

Just difference the columns, e.g.

df['new_col'] = df['a'] - df['b']

For multiple columns, I believe unutbu's answer is the best (although it returns an np.ndarray object instead of a dataframe, it is still faster even after converting the result back to a dataframe).

# Create a large dataframe.
df = pd.DataFrame(np.random.randn(10**6, 100))  # dimensions must be ints, not the float 1e6

%%timeit
np.diff(df.values, axis=1)

1 loops, best of 3: 450 ms per loop

%%timeit
df - df.shift(axis=1)

1 loops, best of 3: 727 ms per loop


%%timeit
df.T.diff().T

1 loops, best of 3: 1.52 s per loop
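The three timed expressions are not quite interchangeable in shape: the pandas variants keep a leading all-NaN column, while np.diff drops it. A small sanity check, assuming a made-up 5x4 frame named `small`, suggests that they agree once that first column is set aside.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
small = pd.DataFrame(rng.standard_normal((5, 4)))

via_numpy = np.diff(small.values, axis=1)   # shape (5, 3)
via_shift = small - small.shift(axis=1)     # shape (5, 4), first column NaN
via_transpose = small.T.diff().T            # shape (5, 4), first column NaN

# Drop the leading NaN column of the pandas results before comparing.
print(np.allclose(via_shift.iloc[:, 1:].values, via_numpy))
print(np.allclose(via_transpose.iloc[:, 1:].values, via_numpy))
```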

Answer 3

Answered by Adrian Martin

Use the axis parameter in diff:

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

df.diff(axis=1)            # differences between adjacent columns
#     A    B    C    D
# 0 NaN  1.0  1.0  1.0
# 1 NaN  1.0  1.0  1.0
# 2 NaN  1.0  1.0  1.0

df.diff()                  # differences between adjacent rows (axis=0 is the default)
#      A    B    C    D
# 0  NaN  NaN  NaN  NaN
# 1  4.0  4.0  4.0  4.0
# 2  4.0  4.0  4.0  4.0
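If the difference is wanted between columns that are more than one position apart, diff also accepts a periods argument alongside axis. The sketch below, with the hypothetical name `two_apart`, subtracts the column two positions to the left.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# C - A and D - B; columns A and B have no column two places to their left,
# so they are filled with NaN.
two_apart = df.diff(periods=2, axis=1)
print(two_apart)
```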
