Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35277075/

Time: 2020-08-19 16:13:29  Source: igfitidea

Python Pandas Counting the Occurrences of a Specific value

python pandas

Asked by JJSmith

I am trying to find the number of times a certain value appears in one column.

I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv')

and now I want to find the number of times something appears in a column. How is this done?

I thought it was the below, where I am looking in the education column and counting the number of times ? occurs.

The code below shows that I am trying to find the number of times 9th appears, and the error is what I get when I run the code.

Code

missing2 = df.education.value_counts()['9th']
print(missing2)

Error

KeyError: '9th'

Accepted answer by jezrael

You can create a subset of the data with your condition and then use shape or len:

print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th

print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool

print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th

print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2
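
For readers who want to run the approach above end to end, here is a minimal sketch. The sample frame mirrors the one in this answer. The `.get('9th', 0)` fallback is an addition that is not part of the original answer; it assumes the `KeyError` in the question arises because '9th' is not among the column's values as pandas sees them (for example, because of stray whitespace in the CSV), which the question does not confirm.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

# Boolean mask: True where the column equals the target value.
mask = df['education'] == '9th'

# Two equivalent ways to count the rows of the filtered subset.
count_shape = df[mask].shape[0]
count_len = len(df[mask])

# value_counts() returns a Series indexed by the distinct values.
# Indexing it with a label that is absent raises KeyError, while
# Series.get lets us fall back to a default of 0 instead.
counts = df['education'].value_counts()
count_get = counts.get('9th', 0)
missing = counts.get('10th', 0)  # '10th' never occurs in this sample

print(count_shape, count_len, count_get, missing)
```

If the real data does contain padded values, calling `df['education'].str.strip()` before comparing may help, though that is a guess about the asker's file.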

Performance is interesting: the fastest solution is to compare the underlying numpy array with the value and sum the result:

graph

Code:

import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)


def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False, 
    xlabel='len(df)')
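
If perfplot is not installed, a rough comparison can be made with the standard-library `timeit` module. The sketch below is not the benchmark from this answer: the frame size (100,000 rows), the ten-letter alphabet, and the repeat count are arbitrary assumptions, and timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 100,000 random single-letter strings.
rng = np.random.default_rng(123)
letters = list('abcdefghij')
df = pd.DataFrame({'education': rng.choice(letters, size=100_000)})

# The same counting strategies as in the answer, wrapped as callables.
candidates = {
    'shape': lambda: df[df.education == 'a'].shape[0],
    'len_df': lambda: len(df[df['education'] == 'a']),
    'sum_mask': lambda: (df.education == 'a').sum(),
    'sum_mask_numpy': lambda: (df.education.values == 'a').sum(),
}

# Sanity check: every strategy should report the same count.
results = {name: int(fn()) for name, fn in candidates.items()}

# Time each strategy and report the mean per-call duration.
repeats = 20
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats)
    print(f'{name:>15}: {seconds / repeats * 1e3:.3f} ms per call')
```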

Answer by Zero

A couple of ways, using count or sum:

In [338]: df
Out[338]:
  col1 education
0    a       9th
1    b       9th
2    c       8th

In [335]: df.loc[df.education == '9th', 'education'].count()
Out[335]: 2

In [336]: (df.education == '9th').sum()
Out[336]: 2

In [337]: df.query('education == "9th"').education.count()
Out[337]: 2
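
The three expressions above can also be written against a variable instead of a hard-coded string. This is a small sketch under that assumption; the name `target` is hypothetical and not from the original answer. Inside `query`, a local Python variable is referenced with the `@` prefix.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

target = '9th'  # hypothetical variable holding the value to count

# Select the matching rows of one column, then count them.
count_loc = df.loc[df['education'] == target, 'education'].count()

# Sum the boolean mask directly (True counts as 1).
count_sum = (df['education'] == target).sum()

# query() can reference a local variable with the @ prefix.
count_query = df.query('education == @target')['education'].count()

print(count_loc, count_sum, count_query)
```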

Answer by Emmanuel Steiner

Try this:

(df['education'] == '9th').sum()

Answer by Minh Vu

An elegant way to count the occurrences of '?' or any other symbol in any column is to use the built-in isin method of a dataframe object.

Suppose that we have loaded the 'Automobile' dataset into a df object. We do not know which columns contain the missing-value marker (the '?' symbol), so let's do:

df.isin(['?']).sum(axis=0)

The official documentation for DataFrame.isin(values) says:

it returns boolean DataFrame showing whether each element in the DataFrame is contained in values

Note that isin accepts an iterable as input, thus we need to pass a list containing the target symbol to this function. df.isin(['?']) will return a boolean dataframe as follows.

    symboling   normalized-losses   make    fuel-type   aspiration-ratio ...
0   False       True                False   False       False
1   False       True                False   False       False
2   False       True                False   False       False
3   False       False               False   False       False
4   False       False               False   False       False
5   False       True                False   False       False
...

To count the number of occurrences of the target symbol in each column, let's take the sum over all the rows of the above dataframe by indicating axis=0. The final (truncated) result shows what we expect:

symboling             0
normalized-losses    41
...
bore                  4
stroke                4
compression-ratio     0
horsepower            2
peak-rpm              2
city-mpg              0
highway-mpg           0
price                 4
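
The same idea can be tried without the Automobile dataset. The sketch below uses a tiny made-up frame as a stand-in; its column names echo the output above, but the values are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical stand-in for the Automobile data; values are invented.
df = pd.DataFrame({
    'make': ['audi', 'bmw', 'audi', 'volvo'],
    'normalized-losses': ['?', '164', '?', '95'],
    'price': ['13950', '?', '17450', '16845'],
})

# isin() gives a boolean frame; summing along axis=0 counts the
# True values in each column, i.e. the '?' markers per column.
per_column = df.isin(['?']).sum(axis=0)

# Summing the per-column counts gives the total across the frame.
total = int(per_column.sum())

print(per_column)
print('total:', total)
```

Because `isin` takes a list, several markers can be counted at once, for example `df.isin(['?', 'N/A'])`.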

Answer by keramat

Easy but not efficient:

list(df.education).count('9th')