Interval datatype in pandas: finding the midpoint, left, center, etc.

Question

Asked by penguin

In pandas 0.20.1, with the interval type, is it possible to find the midpoint, left, or center values of the intervals in a Series?

For example:

Create an interval datatype column, and perform some aggregation calculations over these intervals:
```
df_Stats = df.groupby(['month',pd.cut(df['Distances'], np.arange(0, 135,1))]).agg(aggregations)
```

This returns df_Stats with a column of interval dtype: df['Distances'].

Now I want to associate the left end of the interval to the result of these aggregations using a series function:
```
df['LeftEnd'] = df['Distances'].left
```

That Series-level assignment does not work. However, I can run the same thing element-wise:

    df.loc[0]['LeftEnd'] = df.loc[0]['Distances'].left

This works. Thoughts?

Answer 1

Accepted answer by Jeff

pd.cut() actually creates a CategoricalIndex, with an IntervalIndex as the categories.

```
In [13]: df = pd.DataFrame({'month': [1, 1, 2, 2], 'distances': range(4), 'value': range(4)})

In [14]: df
Out[14]: 
   distances  month  value
0          0      1      0
1          1      1      1
2          2      2      2
3          3      2      3

In [15]: result = df.groupby(['month', pd.cut(df.distances, 2)]).value.mean()

In [16]: result
Out[16]: 
month  distances    
1      (-0.003, 1.5]    0.5
2      (1.5, 3.0]       2.5
Name: value, dtype: float64
```

You can simply coerce them to an IntervalIndex (this also works if they are a column), then access the left, right, and mid attributes.

```
In [17]: pd.IntervalIndex(result.index.get_level_values('distances')).left
Out[17]: Float64Index([-0.003, 1.5], dtype='float64')

In [18]: pd.IntervalIndex(result.index.get_level_values('distances')).right
Out[18]: Float64Index([1.5, 3.0], dtype='float64')

In [19]: pd.IntervalIndex(result.index.get_level_values('distances')).mid
Out[19]: Float64Index([0.7485, 2.25], dtype='float64')
```
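To tie the interval bounds back to the aggregated values, as the question asks, something like the following may work. It is only a sketch: the explicit bin edges `[-1, 1, 3]`, the `observed=True` argument (which needs a reasonably recent pandas), and the names `stats` and `intervals` are assumptions chosen for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 1, 2, 2],
                   "distances": [0, 1, 2, 3],
                   "value": [0, 1, 2, 3]})

# Explicit bin edges (an assumption) keep the interval bounds easy to read.
bins = pd.cut(df["distances"], bins=[-1, 1, 3])

# observed=True keeps only the (month, interval) pairs present in the data.
result = df.groupby(["month", bins], observed=True)["value"].mean()

# Coerce the categorical index level to an IntervalIndex, as in the answer.
intervals = pd.IntervalIndex(result.index.get_level_values("distances"))

# Flatten the grouped result and attach the bounds by position.
stats = result.reset_index()
stats["left"] = intervals.left.to_numpy()
stats["right"] = intervals.right.to_numpy()
stats["mid"] = intervals.mid.to_numpy()
```

Converting with `to_numpy()` before assigning makes it explicit that the values are matched to rows by position rather than by index label.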

Answer 2

Answered by Mahesh Babu J

Say 'cut' is the name of the column produced by pd.cut.

Instead of:

 df['LeftEnd'] = df['Distances'].left

use one of the following:

 df['LeftEnd'] = df['cut'].apply(lambda x: x.left)

 df['LeftEnd'] = df['cut'].apply(lambda x: x.left).astype(str)
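For reference, here is a self-contained sketch of the column case. It puts the element-wise `apply` from this answer next to the `IntervalIndex` coercion from the accepted answer. The sample data, the bin edges, the `Mid` column, and the `.astype(float)` step are assumptions added for illustration; depending on the pandas version, `apply` on a categorical column may return a categorical result, so the cast is there to get plain numbers.

```python
import pandas as pd

df = pd.DataFrame({"Distances": [0, 1, 2, 3]})

# 'cut' is a categorical column whose categories are Interval objects.
df["cut"] = pd.cut(df["Distances"], bins=[-1, 1, 3])

# Element-wise: read .left from each Interval, then cast to float.
df["LeftEnd"] = df["cut"].apply(lambda iv: iv.left).astype(float)

# Vectorized alternative: coerce the column to an IntervalIndex once,
# then read the attribute for all rows at the same time.
intervals = pd.IntervalIndex(df["cut"])
df["Mid"] = intervals.mid.to_numpy()
```

Neither version handles missing values specially; values that fall outside the bins become NaN in `cut` and would need separate treatment.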
