Python Pandas Counting the Occurrences of a Specific Value
Warning: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/35277075/
Asked by JJSmith
I am trying to find the number of times a certain value appears in one column.
I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv') and now I want to find the number of times something appears in a column. How is this done?
I thought it was the below, where I am looking in the education column and counting the number of times ? occurs.
The code below shows that I am trying to find the number of times 9th appears, and the error is what I get when I run the code.
Code
missing2 = df.education.value_counts()['9th']
print(missing2)
Error
KeyError: '9th'
Accepted answer by jezrael
You can create a subset of the data with your condition and then use shape or len:
print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th
print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool
print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th
print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2
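The steps above can be reproduced as a self-contained sketch. The three-row frame mirrors the example output in this answer; the names mask, subset, count_shape and count_len are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Same three-row frame as in the example output above.
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

mask = df['education'] == '9th'   # boolean Series: True, True, False
subset = df[mask]                 # keeps only the rows where mask is True

count_shape = subset.shape[0]     # row count taken from the shape tuple
count_len = len(subset)           # row count taken with len()

print(count_shape, count_len)     # 2 2
```

Both counts come from the same filtered frame, so they always agree; which one to use is a matter of taste.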
Performance is interesting: the fastest solution is to compare the underlying NumPy array with the value and sum the result:
Code:
import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)

# Each kernel counts how often 'a' occurs in the education column.
def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

# Build a one-column frame of n random ASCII letters.
def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
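If perfplot is not available, a rough comparison can be sketched with the standard-library timeit module instead. This is a simplified stand-in for the benchmark above, not a reproduction of it: the frame size (100,000 rows), the lowercase-only alphabet and the repeat count are arbitrary assumptions, and the timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random single-letter strings.
rng = np.random.default_rng(123)
letters = list('abcdefghijklmnopqrstuvwxyz')
df = pd.DataFrame({'education': rng.choice(letters, size=100_000)})

candidates = {
    'shape': lambda: df[df.education == 'a'].shape[0],
    'len_df': lambda: len(df[df['education'] == 'a']),
    'sum_mask': lambda: (df.education == 'a').sum(),
    'sum_mask_numpy': lambda: (df.education.values == 'a').sum(),
}

# All approaches should agree on the count before they are timed.
results = {name: int(func()) for name, func in candidates.items()}
assert len(set(results.values())) == 1

repeats = 20
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=repeats)
    print(f'{name:>15}: {seconds / repeats * 1e3:.3f} ms per call')
```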
Answer by Zero
A couple of ways, using count or sum:
In [338]: df
Out[338]:
  col1 education
0    a       9th
1    b       9th
2    c       8th
In [335]: df.loc[df.education == '9th', 'education'].count()
Out[335]: 2
In [336]: (df.education == '9th').sum()
Out[336]: 2
In [337]: df.query('education == "9th"').education.count()
Out[337]: 2
Answer by Emmanuel Steiner
Try this:
(df['education'] == '9th').sum()
Answer by Minh Vu
An elegant way to count the occurrences of '?', or of any symbol, in any column is to use the built-in isin method of a DataFrame object.
Suppose that we have loaded the 'Automobile' dataset into a df object. We do not know which columns contain the missing-value marker (the '?' symbol), so let's do:
df.isin(['?']).sum(axis=0)
The official documentation for DataFrame.isin(values) says:
it returns boolean DataFrame showing whether each element in the DataFrame is contained in values
Note that isin accepts an iterable as input, so we need to pass a list containing the target symbol to this function. df.isin(['?']) will return a boolean DataFrame as follows.
   symboling  normalized-losses   make  fuel-type  aspiration-ratio  ...
0      False               True  False      False             False
1      False               True  False      False             False
2      False               True  False      False             False
3      False              False  False      False             False
4      False              False  False      False             False
5      False               True  False      False             False
...
To count the number of occurrences of the target symbol in each column, let's take the sum over all the rows of the above DataFrame by specifying axis=0. The final (truncated) result shows what we expect:
symboling 0
normalized-losses 41
...
bore 4
stroke 4
compression-ratio 0
horsepower 2
peak-rpm 2
city-mpg 0
highway-mpg 0
price 4
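The same idea can be tried on a small frame without downloading anything. The data below is a made-up stand-in for the 'Automobile' dataset (the column names are borrowed from the output above, the values are invented), and the names is_missing and per_column are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the 'Automobile' data; the values are made up.
df = pd.DataFrame({
    'make': ['audi', 'bmw', 'audi', 'volvo'],
    'normalized-losses': ['?', '164', '?', '95'],
    'price': ['13950', '?', '17450', '16845'],
})

is_missing = df.isin(['?'])           # boolean DataFrame, same shape as df
per_column = is_missing.sum(axis=0)   # number of True values in each column

print(per_column)
```

Summing works because each True counts as 1 and each False as 0, so the column totals are the number of '?' entries per column.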
Answer by keramat
Easy but not efficient:
list(df.education).count('9th')
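As a minimal sketch, the list-based count can be checked against a vectorized one on a tiny frame. The example also touches on the KeyError from the question: value_counts() only has entries for values that actually occur, so one possible explanation (an assumption, not confirmed by the thread) is that the exact label '9th' was not present in the column. Series.get with a default avoids the exception in that case. The label '10th' is a hypothetical missing value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'education': ['9th', '9th', '8th']})

via_list = list(df.education).count('9th')      # plain Python scan of a list
via_mask = int((df.education == '9th').sum())   # vectorized comparison

# Indexing value_counts() with a label that never occurs raises KeyError;
# .get() returns the supplied default instead.
counts = df.education.value_counts()
present = counts.get('9th', 0)
absent = counts.get('10th', 0)

print(via_list, via_mask, present, absent)      # 2 2 2 0
```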