pandas DataFrame: How to cut a dataframe using custom ways?
Original URL: http://stackoverflow.com/questions/40318380/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Peng He
I want to cut a DataFrame into several DataFrames using my own rules.
>>> import numpy as np
>>> import pandas as pd
>>> data = pd.DataFrame({'distance': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'values': np.arange(0, 1, 0.1)})
>>> data
   distance  values
0         1     0.0
1         2     0.1
2         3     0.2
3         4     0.3
4         5     0.4
5         6     0.5
6         7     0.6
7         8     0.7
8         9     0.8
9        10     0.9
I'll cut data according to the values of the distance column. For example, given the bins [1,3), [3,8), [8,10), [10,10+), rows whose distance falls in the same bin go into the same group, and for each group I compute the average or the sum of the values column. That is:
>>> data1 = data[lambda df: (df.distance >= 1) & (df.distance < 3)]
>>> data1
   distance  values
0         1     0.0
1         2     0.1
>>> np.mean(data1['values'])
0.05
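The single-bin filter above can be generalized by hand before reaching for a library helper. A minimal sketch, assuming the bin edges [1, 3, 8, 10, np.inf] (np.inf stands in for the open-ended [10,10+) bin); the names edges and pieces are hypothetical. It loops over consecutive edge pairs and builds one sub-DataFrame per half-open bin [lo, hi) with the same kind of mask used for data1.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'distance': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'values': np.arange(0, 1, 0.1)})

# Hypothetical bin edges; np.inf stands in for the open-ended "[10, 10+)" bin.
edges = [1, 3, 8, 10, np.inf]

# One sub-DataFrame per half-open bin [lo, hi), keyed by its (lo, hi) pair.
pieces = {}
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (data['distance'] >= lo) & (data['distance'] < hi)
    pieces[(lo, hi)] = data[mask]

# Mean and sum of the 'values' column for each bin.
for (lo, hi), piece in pieces.items():
    print(f"[{lo}, {hi}): mean={piece['values'].mean():.2f}, sum={piece['values'].sum():.2f}")
```

Each bin costs a full scan of the column, so the work grows with the number of bins; pd.cut followed by groupby assigns every row to its bin in a single vectorized call.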
How can I efficiently cut the original DataFrame into several groups (and then save them, process them, ...)?
Answer by Bob Baxley
The pandas cut function is useful for this:
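# right=False makes each bin left-closed, i.e. [a, b)
data['categories'] = pd.cut(data['distance'], [-np.inf, 1, 3, 8, 10, np.inf], right=False)
# observed=False keeps empty bins such as [-inf, 1) in the result
data.groupby('categories', observed=False).mean()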
Output:
            distance  values
categories
[-inf, 1)        NaN     NaN
[1, 3)           1.5    0.05
[3, 8)           5.0    0.40
[8, 10)          8.5    0.75
[10, inf)       10.0    0.90
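The answer computes only the mean, while the question also asks for sums and for keeping each group as a separate DataFrame to save or process. A minimal sketch, assuming a reasonably recent pandas (very old releases lack the observed= argument); the names stats and groups are illustrative. agg computes both statistics in one call, and iterating over the groupby yields each bin's label together with its rows.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'distance': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'values': np.arange(0, 1, 0.1)})
data['categories'] = pd.cut(data['distance'], [-np.inf, 1, 3, 8, 10, np.inf], right=False)

# Mean and sum of 'values' per bin; observed=True drops the empty [-inf, 1) bin.
stats = data.groupby('categories', observed=True)['values'].agg(['mean', 'sum'])
print(stats)

# Keep each non-empty bin as its own DataFrame, keyed by its Interval label.
groups = {label: group.drop(columns='categories')
          for label, group in data.groupby('categories', observed=True)}

for label, group in groups.items():
    print(label, len(group))
    # Each group is an ordinary DataFrame, so it could be saved here,
    # e.g. with group.to_csv(...) (file naming left to the reader).
```

Using the groupby iterator avoids rebuilding a boolean mask per bin, because pd.cut has already assigned every row to a category.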