pandas Python: Remove rows based on a count condition

Question

Asked by Devin Lee

I have a problem filtering a pandas DataFrame.

city 
NYC 
NYC 
NYC 
NYC 
SYD 
SYD 
SEL 
SEL
...

df.city.value_counts()

I would like to remove the rows for cities that appear fewer than 4 times, which would be SYD and SEL in this example.

How can I do this without manually dropping them city by city?

Answer 1

Answered by YOBEN_S

Solution one: use groupby with filter

df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]: 
  city
0  NYC
1  NYC
2  NYC
3  NYC
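
For reference, here is a self-contained sketch of the filter approach. The DataFrame construction is an assumption based on the sample in the question (NYC four times, SYD and SEL twice each), and the name `kept` is only illustrative.

```python
import pandas as pd

# Sample data assumed from the question: NYC x4, SYD x2, SEL x2
df = pd.DataFrame({"city": ["NYC"] * 4 + ["SYD"] * 2 + ["SEL"] * 2})

# filter() calls the function once per group and keeps all rows
# of every group for which it returns True
kept = df.groupby("city").filter(lambda group: len(group) >= 4)

print(kept)
```

The original row order and index labels are preserved, so only the SYD and SEL rows disappear.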

Solution two: use groupby with transform

sub_df = df[df.groupby('city').city.transform('count') > 3].copy()
# .copy() avoids a SettingWithCopyWarning if you later modify sub_df
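
To see why the transform version works, it may help to look at the intermediate Series. This is a minimal sketch on sample data assumed from the question; the names `group_sizes` and `mask` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the question: NYC x4, SYD x2, SEL x2
df = pd.DataFrame({"city": ["NYC"] * 4 + ["SYD"] * 2 + ["SEL"] * 2})

# transform('count') returns one value per row: the size of that row's group
group_sizes = df.groupby("city")["city"].transform("count")

# Boolean mask aligned with df's index, True for rows in large enough groups
mask = group_sizes >= 4

sub_df = df[mask].copy()
print(sub_df)
```

Because the mask has the same index as `df`, it can be combined with other row-level conditions using `&` or `|`, which the filter approach does not allow directly.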

Answer 2

Answered by jpp

One way is to use pd.Series.value_counts:

counts = df['city'].value_counts()

# Drop rows whose city appears fewer than 4 times
res = df[~df['city'].isin(counts[counts < 4].index)]
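
A closely related variant, shown here only as a sketch, maps each row to its city's count and compares directly instead of building a list of cities to exclude. The sample data and the threshold of 4 are assumptions taken from the question; `res_isin` and `res_map` are illustrative names.

```python
import pandas as pd

# Sample data assumed from the question: NYC x4, SYD x2, SEL x2
df = pd.DataFrame({"city": ["NYC"] * 4 + ["SYD"] * 2 + ["SEL"] * 2})

counts = df["city"].value_counts()

# Variant 1: exclude cities whose count is below the threshold
res_isin = df[~df["city"].isin(counts[counts < 4].index)]

# Variant 2: look up each row's count and keep rows at or above the threshold
res_map = df[df["city"].map(counts) >= 4]

print(res_map)
```

Both variants select the same rows here; the map form keeps the per-row count available if you also want to inspect it.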

Answer 3

Answered by Aaron N. Brock

I think you're looking for value_counts():

# Import the great and powerful pandas
import pandas as pd

# Create some example data
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})

# Get the count of each value
value_counts = df['city'].value_counts()

# Select the values whose count is 3 or fewer (i.e. less than 4)
to_remove = value_counts[value_counts <= 3].index

# Keep rows where the city column is not in to_remove
df = df[~df.city.isin(to_remove)]

Answer 4

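Answered by Sruthi V

Another solution, using a temporary count column (the column is named 'City' here):

threshold = 4  # keep cities that appear at least 4 times
df['Count'] = df.groupby('City')['City'].transform('count')
df = df[df['Count'] >= threshold]
df = df.drop(columns='Count')  # remove the helper column
print(df)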

  City
0  NYC
1  NYC
2  NYC
3  NYC
