Is there a query method or similar for pandas Series (pandas.Series.query())?

Question

Asked by dmeu

The pandas.DataFrame.query() method is very useful for (pre/post-)filtering data when loading or plotting. It comes in particularly handy for method chaining.

I often find myself wanting to apply the same logic to a pandas.Series, e.g. after calling a method such as df.value_counts, which returns a pandas.Series.

Example

Let's assume there is a huge table with the columns Player, Game and Points, and I want to plot a histogram of the players who scored 3 points more than 14 times. I first have to sum the points of each player (groupby -> agg), which returns a Series of ~1000 players and their overall points. Applying the .query logic, it would look something like this:

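import random
import pandas as pd

df = pd.DataFrame({
    'Points': [random.choice([1,3]) for x in range(100)],
    'Player': [random.choice(["A","B","C"]) for x in range(100)]})

# Wished-for syntax: Series.query() does not exist, so this does not run
(df
     .query("Points == 3")
     .Player.value_counts()
     .query("> 14")
     .hist())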

The only solutions I find force me to do an unnecessary assignment and break the method chaining:

points_series = (df
     .query("Points == 3")
     .groupby("Player").size())
points_series[points_series > 100].hist()

Method chaining and the query method both help to keep the code legible, whereas subsetting and filtering with boolean masks can get messy quite quickly.

# just to make my point :)
series_bestplayers_under_100[series_prefiltered_under_100 > 0].shape

Please help me out of my dilemma! Thanks

Answer 1

Accepted answer by jezrael

IIUC you can add query("Points > 100"):

import numpy as np
import pandas as pd
df = pd.DataFrame({'Points':[50,20,38,90,0, np.inf],
                   'Player':['a','a','a','s','s','s']})

print (df)
  Player     Points
0      a  50.000000
1      a  20.000000
2      a  38.000000
3      s  90.000000
4      s   0.000000
5      s        inf

points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
print (points_series)     
a = points_series[points_series > 100]
print (a)     
Player
a    108.0
Name: Points, dtype: float64


points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})
                   .query("Points > 100"))

print (points_series)     
        Points
Player        
a        108.0

Another solution is Selection By Callable:

points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})['Points']
                   .loc[lambda x: x > 100])

print (points_series)     
Player
a    108.0
Name: Points, dtype: float64

Edit, following the edited question:

np.random.seed(1234)
df = pd.DataFrame({
    'Points': [np.random.choice([1,3]) for x in range(100)], 
    'Player': [np.random.choice(["A","B","C"]) for x in range(100)]})

print (df.query("Points == 3").Player.value_counts().loc[lambda x: x > 15])
C    19
B    16
Name: Player, dtype: int64

print (df.query("Points == 3").groupby("Player").size().loc[lambda x: x > 15])
Player
B    16
C    19
dtype: int64
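
The callable passed to .loc can hold any boolean expression, so compound conditions stay inside the chain as well. A minimal sketch on a small hand-made table (the data and the thresholds are made up for illustration and are not from the question):

```python
import pandas as pd

# Small deterministic table; values are illustrative only
df = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B", "C"],
    "Points": [3, 3, 1, 3, 1, 3],
})

counts = (
    df.query("Points == 3")      # keep only the 3-point rows
      .groupby("Player")
      .size()                    # Series: number of 3-pointers per player
      .loc[lambda s: (s > 0) & (s < 2)]  # keep players with exactly one
)
print(counts)
```

The lambda receives the intermediate Series, so no temporary variable is needed.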

Answer 2

Answer by Martin

Why not convert from Series to DataFrame, do the querying, and then convert back?

df["Points"] = df["Points"].to_frame().query('Points > 100')["Points"]

Here, .to_frame() converts to a DataFrame, while the trailing ["Points"] converts back to a Series.

The .query() method can then be used consistently, whether the pandas object has one column or several.
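
The round trip can be wrapped in a small helper so it fits into a chain via .pipe. This is only a sketch: query_series and the temporary column name "value" are hypothetical names chosen here, not part of pandas.

```python
import pandas as pd

def query_series(s, expr, name="value"):
    """Hypothetical helper: filter a Series with a DataFrame.query expression.

    The Series is exposed to the expression under the column name `name`.
    """
    out = s.to_frame(name=name).query(expr)[name]
    out.name = s.name  # restore the original Series name
    return out

counts = pd.Series([19, 16, 12], index=["C", "B", "A"])
filtered = counts.pipe(query_series, "value > 15")
print(filtered)
```

Naming the temporary column explicitly matters for unnamed Series, whose default column label after to_frame() is 0 and is awkward to reference in a query string.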

Answer 3

Answer by Ilya Prokin

Instead of query you can use pipe:

s.pipe(lambda x: x[x>0]).pipe(lambda x: x[x<10])
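
Since pipe forwards extra arguments to the function, the filter can also be a named, reusable function instead of a lambda. A rough sketch, where between_exclusive is a made-up helper and the sample values are illustrative:

```python
import pandas as pd

def between_exclusive(s, low, high):
    """Hypothetical helper: keep values strictly between low and high."""
    return s[(s > low) & (s < high)]

s = pd.Series([-5, 0, 3, 7, 10, 42])

# Named function with keyword arguments forwarded by pipe
result = s.pipe(between_exclusive, low=0, high=10)

# Equivalent to the two chained lambdas above
chained = s.pipe(lambda x: x[x > 0]).pipe(lambda x: x[x < 10])
print(result)
```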
