Python Pandas: get the count of a single value in a DataFrame column

Question

Asked by Randhawa

Using pandas, I would like to get the count of a specific value in a column. I know that df.somecolumn.ravel() will give me all the unique values and their counts, but how do I get the count of one specific value?

Desired:

  To get count of 1.

  In[6]:df.somecalulation(1)
  Out[6]: 5

  To get count of 2.

  In[6]:df.somecalulation(2)
  Out[6]: 3

Answer 1

Answered by jezrael

You can try value_counts:

# keep the result in a new name so the original df stays available
counts = df['col'].value_counts().reset_index()
counts.columns = ['col', 'count']
print(counts)
   col  count
0    1      5
1    2      3

EDIT:

print((df['col'] == 1).sum())
5

Or:

def somecalulation(x):
    return (df['col'] == x).sum()

print(somecalulation(1))
5
print(somecalulation(2))
3

Or:

ser = df['col'].value_counts()

def somecalulation(s, x):
    return s[x]

print(somecalulation(ser, 1))
5
print(somecalulation(ser, 2))
3
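
One caveat with the lookup above: indexing the value_counts result with a value that never occurs in the column raises a KeyError. A minimal sketch of a lookup that returns 0 instead, assuming a hypothetical sample column (five 1s and three 2s, to match the counts in the question) and a hypothetical helper name count_value:

```python
import pandas as pd

# Hypothetical sample data: five 1s and three 2s, matching the question's counts
df = pd.DataFrame({'col': [1, 1, 1, 1, 1, 2, 2, 2]})

ser = df['col'].value_counts()

def count_value(s, x):
    # Series.get returns the default (0 here) when x is not in the index
    return int(s.get(x, 0))

print(count_value(ser, 1))  # 5
print(count_value(ser, 2))  # 3
print(count_value(ser, 7))  # 0, because 7 never appears in the column
```

This keeps the single value_counts pass, which is useful when many different values are looked up afterwards.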

EDIT2:

If you need something faster, use numpy.in1d (deprecated since NumPy 2.0 in favor of numpy.isin):

import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, 1, 2, 2])

# for testing: repeat the Series 1000 times so that len(a) = 6000
a = pd.concat([a]*1000).reset_index(drop=True)

print(np.in1d(a, 1).sum())
4000
print((a == 1).sum())
4000
print(np.sum(a == 1))
4000

Timings:

len(a)=6:

In [131]: %timeit np.in1d(a,1).sum()
The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 29.9 μs per loop

In [132]: %timeit np.sum(a == 1)
10000 loops, best of 3: 196 μs per loop

In [133]: %timeit (a == 1).sum()
1000 loops, best of 3: 180 μs per loop

len(a)=6000:

In [135]: %timeit np.in1d(a,1).sum()
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 48.5 μs per loop

In [136]: %timeit np.sum(a == 1)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 273 μs per loop

In [137]: %timeit (a == 1).sum()
1000 loops, best of 3: 271 μs per loop
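
The numbers above depend on the machine and on the pandas and NumPy versions in use, so they are worth re-measuring locally. A rough sketch using the standard timeit module outside IPython; note that it swaps np.in1d for np.isin (an assumption made here because in1d is deprecated in recent NumPy), and the candidates and results names are only illustrative:

```python
import timeit

import numpy as np
import pandas as pd

# Same test data as above: 4000 ones and 2000 twos
a = pd.concat([pd.Series([1, 1, 1, 1, 2, 2])] * 1000).reset_index(drop=True)

candidates = {
    'np.isin(a, 1).sum()': lambda: np.isin(a, 1).sum(),
    'np.sum(a == 1)': lambda: np.sum(a == 1),
    '(a == 1).sum()': lambda: (a == 1).sum(),
}

results = {}
for label, func in candidates.items():
    results[label] = int(func())
    # Best of 3 repeats of 100 calls each, reported per call in microseconds
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f'{label}: count={results[label]}, {best * 1e6:.1f} us per call')
```

All three expressions should report the same count; only the timing column is expected to differ between runs and environments.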

Answer 2

Answered by Ami Tavory

If you keep the Series returned by value_counts, you can query it for multiple values:

import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()
print(counts[1], counts[2])
4 2

However, to count only a single item, it would be faster to use:

import numpy as np
np.sum(a == 1)
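
Returning to the multi-value lookup on the value_counts result: when some of the requested values might not occur at all, plain indexing raises a KeyError. A small sketch, assuming the same sample Series and a hypothetical variable name wanted, that uses reindex with a fill value instead:

```python
import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()

# Look up several values at once; 3 never occurs, so it gets the fill value 0
wanted = counts.reindex([1, 2, 3], fill_value=0)
print(wanted)  # counts for 1, 2 and 3 are 4, 2 and 0
```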

Answer 3

Answered by Kalpana

Get the total count:

column = df['specific_column']

column.count()

Get the count of the values that match a specific condition:

column.loc[column > 0].count()

The condition refers to the Series variable itself, so no quotes ('') are needed around its name.
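
Note that Series.count() returns the number of non-missing entries, so it can be smaller than the number of rows. A short sketch, assuming a hypothetical specific_column that contains one NaN, showing the difference and a count for one exact value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry
df = pd.DataFrame({'specific_column': [1, 1, 2, np.nan, 0]})
column = df['specific_column']

print(len(column))                      # 5 rows in total
print(column.count())                   # 4 non-missing values
print(column.loc[column > 0].count())   # 3 values greater than 0
print((column == 1).sum())              # 2 occurrences of the value 1
```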
