Bin values based on ranges with pandas

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/31736671/

Time: 2020-08-19 10:29:26  Source: igfitidea

python csv numpy pandas

Asked by pam

I have multiple CSV files with values like this in a folder:

GroupID.csv is the filename. There are multiple files like this, but the value ranges are all defined in the same XML file. I'm trying to group the values by those ranges. How can I do that?

UPDATE1: Based on BobHaffner's comments, I've done this:

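import glob
import os

import pandas as pd

path = r'path/to/files'
allFiles = glob.glob(path + "/*.csv")
list_ = []
for file_ in allFiles:
    # header=None: the files have no header row, so columns are labeled 0, 1, ...
    df = pd.read_csv(file_, index_col=None, header=None)
    # Record which file each row came from
    df['file'] = os.path.basename(file_)
    list_.append(df)
# ignore_index=True gives one continuous index across all files
frame = pd.concat(list_, ignore_index=True)
print(frame)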

to get something like this:

I need to group the values based on the bins from the XML file. I'd truly appreciate any help.

Accepted answer by firelynx

To bucket your series, use the pd.cut() function, like this:

# If the frame was read with header=None, the column label is the integer 1, so use df[1] instead
df['bin'] = pd.cut(df['1'], [0, 50, 100, 200])

         0    1        file         bin
0  person1   24     age.csv     (0, 50]
1  person2   17     age.csv     (0, 50]
2  person3   98     age.csv   (50, 100]
3  person4    6     age.csv     (0, 50]
4  person2  166  Height.csv  (100, 200]
5  person3  125  Height.csv  (100, 200]
6  person5  172  Height.csv  (100, 200]
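
A small sketch of how pd.cut() treats the bin edges may help here. By default the intervals are closed on the right, so a value equal to an edge falls into the lower bin, and values outside every bin become NaN. The sample values below are made up for illustration and are not from the question.

```python
import pandas as pd

# Made-up values chosen to sit on and outside the bin edges
values = pd.Series([0, 24, 50, 51, 200, 250])
edges = [0, 50, 100, 200]

# Default: intervals are (0, 50], (50, 100], (100, 200]
binned = pd.cut(values, edges)

# include_lowest=True also captures the left-most edge (0 here)
binned_low = pd.cut(values, edges, include_lowest=True)

print(pd.DataFrame({'value': values, 'bin': binned, 'bin_low': binned_low}))
```

If a value exactly on the lowest edge should be binned rather than dropped to NaN, include_lowest=True is the option to reach for.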

If you want to name the bins yourself, you can use the labels= argument, like this:

# If the frame was read with header=None, the column label is the integer 1, so use df[1] instead
df['bin'] = pd.cut(df['1'], [0, 50, 100, 200], labels=['0-50', '50-100', '100-200'])

         0    1        file      bin
0  person1   24     age.csv     0-50
1  person2   17     age.csv     0-50
2  person3   98     age.csv   50-100
3  person4    6     age.csv     0-50
4  person2  166  Height.csv  100-200
5  person3  125  Height.csv  100-200
6  person5  172  Height.csv  100-200
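
Once each row has a bin, the grouping itself is an ordinary groupby. The following is a minimal sketch, not part of the original answer. It rebuilds the example frame by hand (the column labels 0 and 1 are integers here, as header=None would produce) and reads the bin edges from an XML snippet whose layout is purely hypothetical, since the question doesn't show the actual XML file.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML layout; the real file from the question is not shown
xml_text = """
<bins>
  <edge>0</edge>
  <edge>50</edge>
  <edge>100</edge>
  <edge>200</edge>
</bins>
"""
edges = [float(e.text) for e in ET.fromstring(xml_text.strip()).iter('edge')]

# Rebuild the example frame from the answer (integer column labels 0 and 1)
frame = pd.DataFrame({
    0: ['person1', 'person2', 'person3', 'person4', 'person2', 'person3', 'person5'],
    1: [24, 17, 98, 6, 166, 125, 172],
    'file': ['age.csv'] * 4 + ['Height.csv'] * 3,
})

# Label each adjacent pair of edges, e.g. '0-50'
labels = [f'{int(lo)}-{int(hi)}' for lo, hi in zip(edges[:-1], edges[1:])]
frame['bin'] = pd.cut(frame[1], edges, labels=labels)

# Count rows per file and bin; observed=True skips combinations with no rows
counts = frame.groupby(['file', 'bin'], observed=True).size()
print(counts)
```

Swapping .size() for another aggregation, such as selecting column 1 and calling .mean(), would summarize the values in each bin instead of counting them.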