pandas Python: Removing Rows on Count Condition
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/49735683/
Asked by Devin Lee
I have a problem filtering a pandas DataFrame.
city
NYC
NYC
NYC
NYC
SYD
SYD
SEL
SEL
...
df.city.value_counts()
I would like to remove the rows for cities that appear fewer than 4 times, which would be SYD and SEL in this example.
What would be the way to do so without manually dropping them city by city?
Answered by YOBEN_S
Here you go, using filter:
df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]:
city
0 NYC
1 NYC
2 NYC
3 NYC
Solution two, using transform:
sub_df = df[df.groupby('city').city.transform('count')>3].copy()
# .copy() avoids a SettingWithCopyWarning if you later modify sub_df
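For reference, here is a self-contained sketch that runs both approaches on the sample data and checks that they agree. The DataFrame is reconstructed from the listing in the question, and the rows hidden behind the "..." are left out, so the exact counts are an assumption.

```python
import pandas as pd

# Sample data reconstructed from the question (rows elided by "..." are omitted)
df = pd.DataFrame({'city': ['NYC'] * 4 + ['SYD'] * 2 + ['SEL'] * 2})

# Approach 1: keep only the groups that contain more than 3 rows
by_filter = df.groupby('city').filter(lambda g: len(g) > 3)

# Approach 2: broadcast each group's row count back onto its rows, then mask
group_sizes = df.groupby('city')['city'].transform('count')
by_transform = df[group_sizes > 3].copy()

print(by_filter['city'].tolist())      # cities that survive the filter
print(by_filter.equals(by_transform))  # whether both approaches agree
```

Both variants keep the original row order and index. On large frames the transform version is usually the faster of the two, because it avoids calling a Python lambda once per group.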
Answered by jpp
This is one way using pd.Series.value_counts.
counts = df['city'].value_counts()
# Cities with fewer than 4 rows are dropped, matching the threshold in the question
res = df[~df['city'].isin(counts[counts < 4].index)]
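A runnable sketch of this approach, wrapped in a small helper so the threshold is explicit. The function name drop_rare and its min_count parameter are illustrative only and are not part of pandas. The sample data is reconstructed from the question.

```python
import pandas as pd

def drop_rare(df, column, min_count):
    """Return the rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper for illustration; not a pandas API.
    """
    counts = df[column].value_counts()
    rare = counts[counts < min_count].index  # values that occur too rarely
    return df[~df[column].isin(rare)]

# Sample data reconstructed from the question (rows elided by "..." are omitted)
df = pd.DataFrame({'city': ['NYC'] * 4 + ['SYD'] * 2 + ['SEL'] * 2})

res = drop_rare(df, 'city', min_count=4)
print(res)
```

One difference from the groupby-based answers: value_counts() skips missing values by default, so rows where the column is NaN are kept by this sketch rather than dropped.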
Answered by Aaron N. Brock
I think you're looking for value_counts():
# Import the great and powerful pandas
import pandas as pd
# Create some example data
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})
# Get the count of each value
value_counts = df['city'].value_counts()
# Select the values whose count is 3 or less, i.e. fewer than 4 (adjust the threshold as needed)
to_remove = value_counts[value_counts <= 3].index
# Keep rows where the city column is not in to_remove
df = df[~df.city.isin(to_remove)]
Answered by Sruthi V
Another solution:
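threshold = 4  # minimum number of rows a city needs in order to be kept
# Attach each city's row count to every row of that city
df['Count'] = df.groupby('City')['City'].transform('count')
df = df[df['Count'] >= threshold]
# Drop the helper column again (reassigning avoids modifying a slice in place)
df = df.drop(columns=['Count'])
print(df)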
City
0 NYC
1 NYC
2 NYC
3 NYC