pandas: comparing values in two columns of a data frame

Question

Asked by user3282777

I have the following two columns in pandas data frame

     256   Z
0     2    2
1     2    3
2     4    4
3     4    9

There are around 1594 rows. '256' and 'Z' are column headers, whereas 0, 1, 2, 3, ... are row numbers (the first column above). I want to print the row numbers where the value in column '256' is not equal to the value in column 'Z'. Thus the output in the above case would be 1, 3. How can this comparison be made in pandas? I will be very grateful for help. Thanks.

Answer 1

Answered by cel

Create the data frame:

import pandas as pd
df = pd.DataFrame({"256":[2,2,4,4], "Z": [2,3,4,9]})

Output:

   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

After subsetting your data frame, use the index to get the id of rows in the subset:

row_ids = df[df["256"] != df.Z].index

This gives (on pandas 2.0 and later, the same result prints as Index([1, 3], dtype='int64')):

Int64Index([1, 3], dtype='int64')
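
The one-liner above combines three steps. As a minimal sketch, using the same four-row frame, they can be written out separately; the intermediate names mask and mismatched are only illustrative and do not appear in the original answer:

```python
import pandas as pd

df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

# Step 1: the element-wise comparison yields a boolean Series, one entry per row.
mask = df["256"] != df["Z"]
print(mask.tolist())  # [False, True, False, True]

# Step 2: indexing the frame with the mask keeps only the rows marked True.
mismatched = df[mask]

# Step 3: the index of that subset holds the labels of the mismatched rows.
row_ids = mismatched.index
print(row_ids.tolist())  # [1, 3]
```

Keeping the mask in its own variable also makes it easy to count the mismatches with mask.sum().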

Answer 2

Answered by aus_lacy

Another way is to use the .loc indexer of pandas.DataFrame with a boolean mask. It selects the rows for which the mask is True, and .index then returns the labels of those rows:

df.loc[(df['256'] != df['Z'])].index

with an output of:

Int64Index([1, 3], dtype='int64')

In the run below this was, by a small margin, the quickest of the listed implementations, as can be seen by timing them in an IPython notebook:

import pandas as pd
import numpy as np

df = pd.DataFrame({"256":np.random.randint(0,10,1594), "Z": np.random.randint(0,10,1594)})

%timeit df.loc[(df['256'] != df['Z'])].index
%timeit row_ids = df[df["256"] != df.Z].index
%timeit rows = list(df[df['256'] != df.Z].index)
%timeit df[df['256'] != df['Z']].index

with an output of:

1000 loops, best of 3: 352 μs per loop
1000 loops, best of 3: 358 μs per loop
1000 loops, best of 3: 611 μs per loop
1000 loops, best of 3: 355 μs per loop

However, a difference of 5-10 microseconds is not significant; only the version that converts the index to a list is noticeably slower (about 250 μs more in this run). If in the future you have a very large data set, timing and efficiency may become a much more important issue. For your relatively small data set of 1594 rows, I would go with the solution that looks the most elegant and is the most readable.
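
%timeit only works inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The seed, the candidate labels, and the repetition counts below are arbitrary assumptions, and the absolute numbers will differ between machines and pandas versions:

```python
import timeit

import numpy as np
import pandas as pd

# Random data of the same size as in the question (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({"256": rng.integers(0, 10, 1594),
                   "Z": rng.integers(0, 10, 1594)})

# The implementations from the answers, wrapped as zero-argument callables.
candidates = {
    "loc + index": lambda: df.loc[df["256"] != df["Z"]].index,
    "[] + index": lambda: df[df["256"] != df["Z"]].index,
    "list(index)": lambda: list(df[df["256"] != df["Z"]].index),
}

results = {}
for name, func in candidates.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    results[name] = best
    print(f"{name:12s} {best * 1e6:8.1f} µs per call")
```

Taking the minimum over several runs mirrors the "best of 3" figure that %timeit reports above.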

Answer 3

Answered by rchang

You can try this:

# Assuming your DataFrame is named "frame"
rows = list(frame[frame['256'] != frame.Z].index)

rows will now be a list containing the row numbers for which those two column values are not equal. So with your data:

>>> frame
   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

[4 rows x 2 columns]
>>> rows = list(frame[frame['256'] != frame.Z].index)
>>> print(rows)
[1, 3]
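
Note that .index returns row labels, which coincide with row numbers only while the frame keeps its default 0, 1, 2, ... index. The following sketch assumes a hypothetical frame with string labels (not part of the question) to show how labels and positions can be obtained separately:

```python
import numpy as np
import pandas as pd

# Same values as in the question, but with made-up string labels as the index.
frame = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]},
                     index=["a", "b", "c", "d"])

mask = frame["256"] != frame["Z"]

# Labels of the mismatched rows, taken from the index without building a subset frame.
labels = frame.index[mask].tolist()
print(labels)  # ['b', 'd']

# Zero-based positions of the mismatched rows, independent of the index labels.
positions = np.flatnonzero(mask.to_numpy()).tolist()
print(positions)  # [1, 3]
```

With the default index used in the question, both lists would be [1, 3].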

Answer 4

Answered by Primer

Assuming df is your dataframe, this should do it:

df[df['256'] != df['Z']].index

yielding:

Int64Index([1, 3], dtype='int64')
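
One caveat if the columns can contain missing values: NaN never compares equal to NaN, so != reports a row as different even when both cells are missing. The sketch below uses made-up float data with NaNs (the question's data has none) and shows one possible way to exclude such rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing in both columns.
df = pd.DataFrame({"256": [2.0, np.nan, 4.0], "Z": [2.0, np.nan, 9.0]})

# Plain != flags row 1 as well, because NaN != NaN evaluates to True.
naive = df[df["256"] != df["Z"]].index.tolist()
print(naive)  # [1, 2]

# Treat "missing in both columns" as equal by masking those rows out.
both_missing = df["256"].isna() & df["Z"].isna()
strict = df[(df["256"] != df["Z"]) & ~both_missing].index.tolist()
print(strict)  # [2]
```

Whether two missing values should count as equal depends on the data, so this is a choice to make deliberately rather than a default.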
