Get the column and row index of the maximum value in a Pandas DataFrame

Question

Asked by christfan868

I'd like to know if there's a way to find the location (column and row index) of the highest value in a dataframe. So if for example my dataframe looks like this:

   A         B         C         D         E
0  100       9         1         12        6
1  80        10        67        15        91
2  20        67        1         56        23
3  12        51        5         10        58
4  73        28        72        25        1

How do I get a result that looks like this: [0, 'A'] using Pandas?

Answer 1

Answered by Mike Müller

Use `np.argmax`

NumPy's argmax can be helpful:

>>> df.stack().index[np.argmax(df.values)]
(0, 'A')

In steps

df.values is a two-dimensional NumPy array:

>>> df.values
array([[100,   9,   1,  12,   6],
       [ 80,  10,  67,  15,  91],
       [ 20,  67,   1,  56,  23],
       [ 12,  51,   5,  10,  58],
       [ 73,  28,  72,  25,   1]])

argmax gives you the index of the maximum value in the "flattened" array:

>>> np.argmax(df.values)
0

Now, you can use this index to find the row-column location on the stacked dataframe:

>>> df.stack().index[0]
(0, 'A')

Fast Alternative

If you need it fast, do as few steps as possible. Working only on the NumPy array and finding the indices with np.argmax seems best:

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

Result:

[0, 'A']
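
For reference, here is a self-contained sketch of the same idea that rebuilds the question's frame from scratch. It passes the scalar from np.argmax straight to np.unravel_index instead of wrapping it in a list; the helper name max_location is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# The frame from the question
df = pd.DataFrame(
    {"A": [100, 80, 20, 12, 73],
     "B": [9, 10, 67, 51, 28],
     "C": [1, 67, 1, 5, 72],
     "D": [12, 15, 56, 10, 25],
     "E": [6, 91, 23, 58, 1]}
)

def max_location(frame):
    """Return [row_label, column_label] of the first maximum in frame (hypothetical helper)."""
    v = frame.to_numpy()
    # argmax returns a flat (row-major) position; unravel_index converts it to (row, col)
    i, j = np.unravel_index(np.argmax(v), v.shape)
    return [frame.index[i], frame.columns[j]]

print(max_location(df))
```

If the maximum occurs more than once, np.argmax reports only the first occurrence in row-major order, so this sketch returns a single location.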

Timings

Timing works best for large data frames:

df = pd.DataFrame(data=np.arange(int(1e6)).reshape(-1,5), columns=list('ABCDE'))

Sorted slowest to fastest:

Mask:

%timeit df.mask(~(df==df.max().max())).stack().index.tolist()
33.4 ms ± 982 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Stack-idmax

%timeit list(df.stack().idxmax())
17.1 ms ± 139 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Stack-argmax

%timeit df.stack().index[np.argmax(df.values)]
14.8 ms ± 392 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Where

%%timeit
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))

4.45 ms ± 84.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Argmax-unravel_index

%%timeit

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

499 μs ± 12 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Compare

d = {'name': ['Mask', 'Stack-idmax', 'Stack-argmax', 'Where', 'Argmax-unravel_index'],
     'time': [33.4, 17.1, 14.8, 4.45, 499],
     'unit': ['ms', 'ms', 'ms', 'ms', 'μs']}


timings = pd.DataFrame(d)
timings['seconds'] = timings.time * timings.unit.map({'ms': 1e-3, 'μs': 1e-6})
timings['factor slower'] = timings.seconds / timings.seconds.min()
timings.sort_values('factor slower')

Output:

                   name    time unit   seconds  factor slower
4  Argmax-unravel_index  499.00   μs  0.000499       1.000000
3                 Where    4.45   ms  0.004450       8.917836
2          Stack-argmax   14.80   ms  0.014800      29.659319
1           Stack-idmax   17.10   ms  0.017100      34.268537
0                  Mask   33.40   ms  0.033400      66.933868

So the "Argmax-unravel_index" version seems to be one to nearly two orders of magnitude faster for large data frames, which is where speed often matters most.
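
Outside IPython, where the %timeit magic is not available, a comparison along these lines could be reproduced with the standard-library timeit module. This is a minimal sketch, not the answer's benchmark: the function names stack_idxmax and argmax_unravel are made up for illustration, only two of the five variants are shown, and the numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same large frame as in the timings above
df = pd.DataFrame(data=np.arange(int(1e6)).reshape(-1, 5), columns=list("ABCDE"))

def stack_idxmax():
    return list(df.stack().idxmax())

def argmax_unravel():
    v = df.values
    i, j = np.unravel_index(np.argmax(v), v.shape)
    return [df.index[i], df.columns[j]]

# Both approaches must agree before their timings are worth comparing
assert stack_idxmax() == argmax_unravel()

for fn in (stack_idxmax, argmax_unravel):
    # Best of 3 repeats, 10 calls each, reported per call
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```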

Answer 2

Answered by jezrael

Use stack to get a Series with a MultiIndex, then idxmax for the index of the max value:

print (df.stack().idxmax())
(0, 'A')

print (list(df.stack().idxmax()))
[0, 'A']

Detail:

print (df.stack())
0  A    100
   B      9
   C      1
   D     12
   E      6
1  A     80
   B     10
   C     67
   D     15
   E     91
2  A     20
   B     67
   C      1
   D     56
   E     23
3  A     12
   B     51
   C      5
   D     10
   E     58
4  A     73
   B     28
   C     72
   D     25
   E      1
dtype: int64
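
Since idxmax on the stacked Series returns a (row, column) tuple, it can be unpacked and used directly as a label lookup. A small sketch, assuming a two-column frame invented for illustration:

```python
import pandas as pd

# Hypothetical two-column frame, smaller than the one in the question
df = pd.DataFrame({"A": [100, 80], "B": [9, 10]})

# idxmax on the stacked Series yields a (row_label, column_label) tuple
row, col = df.stack().idxmax()

# The pair can be fed back into a label-based lookup
assert df.at[row, col] == df.to_numpy().max()
print(row, col)
```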

Answer 3

Answered by YOBEN_S

mask + max

df.mask(~(df==df.max().max())).stack().index.tolist()
Out[17]: [(0, 'A')]

Answer 4

Answered by Scott Boston

In my opinion, stack() becomes inefficient for larger datasets. Let's use np.where to return the index positions instead:

i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))

Output:

[0, 'A']
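
The snippet above keeps only the first match (the [0] at the end of each tolist() call). Because np.where returns every matching position, the same approach can report all locations when the maximum is tied. A sketch with a made-up frame in which the maximum appears twice:

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the maximum (100) occurs twice
df = pd.DataFrame({"A": [100, 80], "B": [9, 100]}, index=["x", "y"])

v = df.to_numpy()
rows, cols = np.where(v == v.max())

# Pair each row position with its column position and map both back to labels
locations = [(df.index[i], df.columns[j]) for i, j in zip(rows, cols)]
print(locations)
```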

Timings for larger dataframes:

df = pd.DataFrame(data=np.arange(10000).reshape(-1,5), columns=list('ABCDE'))

np.where method

%%timeit
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))

1000 loops, best of 3: 364 μs per loop

Other stack methods

%timeit df.mask(~(df==df.max().max())).stack().index.tolist()

100 loops, best of 3: 7.68 ms per loop

%timeit df.stack().index[np.argmax(df.values)]

10 loops, best of 3: 50.5 ms per loop

%timeit list(df.stack().idxmax())

1000 loops, best of 3: 1.58 ms per loop

Even larger dataframe:

df = pd.DataFrame(data=np.arange(100000).reshape(-1,5), columns=list('ABCDE'))

Respectively:

1000 loops, best of 3: 1.62 ms per loop
10 loops, best of 3: 18.2 ms per loop
100 loops, best of 3: 5.69 ms per loop
100 loops, best of 3: 6.64 ms per loop

Answer 5

Answered by Alex Deineha

print('Max value:', df.stack().max())
print('Parameters :', df.stack().idxmax())

This is the best way imho.

Answer 6

Answered by rassar

This should work:

def max_df(df):
    m = None
    p = None
    # df.idxmax() maps each column label to the row label of that column's maximum
    for c, idx in df.idxmax().items():
        val = df.at[idx, c]
        if m is None or val > m:
            m = val
            p = idx, c
    return p

This uses the idxmax function to find the row of each column's maximum, then compares those per-column maxima.

Example usage:

>>> df
     A  B
0  100  9
1   90  8
>>> max_df(df)
(0, 'A')

Here's a one-liner (for fun):

def max_df2(df):
    # Build (value, row, column) tuples for each column's maximum and keep the largest
    return max((df.at[idx, c], idx, c) for c, idx in df.idxmax().items())[1:]
