Python: How to filter a Pandas DataFrame by value counts?

Question

Asked by uchuujin

I'm working in Python with a pandas DataFrame of video games, each with a genre. I'm trying to remove any video game whose genre appears fewer than some number of times in the DataFrame, but I have no clue how to go about this. I did find a StackOverflow question that seems to be related, but I can't decipher the solution at all (possibly because I've never heard of R and my memory of functional programming is rusty at best).

Help?

Answer 1

Accepted answer by Andy Hayden

Use groupby filter:

In [10]: import pandas as pd

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  5  6

In [13]: df.groupby("A").filter(lambda x: len(x) > 1)
Out[13]:
   A  B
0  1  2
1  1  4
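
Applied to the situation in the question, the same call might look like the sketch below. The DataFrame contents, the column names `title` and `genre`, and the `min_count` threshold are all assumptions made up for illustration; they do not come from the original post.

```python
import pandas as pd

# Hypothetical data: the column names "title" and "genre" are assumptions.
games = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["RPG", "RPG", "RPG", "Puzzle", "Racing", "Racing"],
})

# Hypothetical threshold: keep genres that occur at least this many times.
min_count = 2

# filter() evaluates the lambda once per genre group and keeps only the
# rows belonging to groups for which it returns True.
frequent = games.groupby("genre").filter(lambda g: len(g) >= min_count)
print(frequent)
```

Here the single "Puzzle" row is dropped, while the "RPG" and "Racing" rows are kept.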

I recommend reading the split-apply-combine section of the docs.

Answer 2

Answered by jezrael

A solution with better performance should be GroupBy.transform with 'size', which returns each group's count as a Series of the same length as the original df, so you can then filter with boolean indexing:

df1 = df[df.groupby("A")['A'].transform('size') > 1]
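
To make the intermediate steps visible, here is a minimal sketch that splits the one-liner into parts, reusing the small `df` from the accepted answer. The names `sizes` and `mask` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

# For every row, the size of the group that row belongs to.
# The result is aligned with df's index: 2, 2, 1
sizes = df.groupby("A")["A"].transform("size")

# Comparing against the threshold gives a boolean mask: True, True, False
mask = sizes > 1

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
print(df1)
```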

Or use Series.map with Series.value_counts:

df1 = df[df['A'].map(df['A'].value_counts()) > 1]
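
The sketch below breaks this one-liner into steps on the same small `df`. It also shows a closely related variant based on `Series.isin`, which is not part of the answer above and is included only for comparison. The names `counts`, `per_row`, and `frequent_values` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

# Count how often each value occurs in column A: value 1 -> 2, value 5 -> 1
counts = df["A"].value_counts()

# Look up each row's value in the counts, giving one count per row: 2, 2, 1
per_row = df["A"].map(counts)

# Keep the rows whose value occurs more than once.
df1 = df[per_row > 1]

# Variant (not from the answer above): collect the frequent values first,
# then keep the rows whose value is among them.
frequent_values = counts[counts > 1].index
df2 = df[df["A"].isin(frequent_values)]

print(df1)
```

Both `df1` and `df2` select the same rows here.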
