Original question: http://stackoverflow.com/questions/31736671/
Warning: these posts are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Bin values based on ranges with pandas
Question by pam
I have multiple CSV files with values like this in a folder:
GroupID.csv is the filename. There are multiple files like this, but all of their value ranges are defined in the same XML file. I'm trying to group the values by those ranges. How can I do that?
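The XML file isn't shown in the question, so the following is only a sketch under an assumed layout: each group element (with an id attribute) holds bin children carrying min and max attributes. It uses the standard-library xml.etree.ElementTree module to turn those ranges into sorted edge lists that pd.cut() can consume. The element names, attribute names and sample ranges are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real file's structure is not shown in the question.
XML_TEXT = """
<ranges>
  <group id="age">
    <bin min="0" max="50"/>
    <bin min="50" max="100"/>
  </group>
  <group id="Height">
    <bin min="100" max="150"/>
    <bin min="150" max="200"/>
  </group>
</ranges>
"""


def load_bin_edges(xml_text):
    """Return {group_id: sorted list of bin edges} parsed from the XML text.

    Assumes each group's ranges are contiguous and non-overlapping, so the
    union of all min/max values is exactly the list of edges pd.cut() needs.
    """
    root = ET.fromstring(xml_text.strip())
    edges = {}
    for group in root.iter("group"):
        points = set()
        for b in group.iter("bin"):
            points.add(float(b.get("min")))
            points.add(float(b.get("max")))
        edges[group.get("id")] = sorted(points)
    return edges


edges = load_bin_edges(XML_TEXT)
print(edges)  # {'age': [0.0, 50.0, 100.0], 'Height': [100.0, 150.0, 200.0]}
```

If the real ranges have gaps between them, the union of endpoints would turn each gap into an extra bin, so a different mapping would be needed.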
UPDATE 1: Based on BobHaffner's comments, I've done this:
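```python
import glob
import os

import pandas as pd

path = r'path/to/files'
allFiles = glob.glob(os.path.join(path, "*.csv"))

list_ = []
for file_ in allFiles:
    # header=None: the files have no header row, so columns are labelled 0 and 1
    df = pd.read_csv(file_, index_col=None, header=None)
    # Remember which file each row came from (e.g. "age.csv")
    df['file'] = os.path.basename(file_)
    list_.append(df)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index
frame = pd.concat(list_, ignore_index=True)
print(frame)
```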
to get something like this:
I need to group the values based on the bins from the XML file. I'd truly appreciate any help.
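Because each file (age.csv, Height.csv) has its own ranges, one approach is to cut each file's rows separately and write the results into one shared column. This is a minimal sketch, assuming the per-file edges have already been loaded into a dictionary. The edges_by_file mapping and the bin_per_file helper are hypothetical names, and the Height ranges are made up for illustration.

```python
import pandas as pd

# Sample data shaped like the combined frame (columns 0 and 1 come from header=None).
frame = pd.DataFrame({
    0: ["person1", "person2", "person3", "person4", "person2", "person3", "person5"],
    1: [24, 17, 98, 6, 166, 125, 172],
    "file": ["age.csv"] * 4 + ["Height.csv"] * 3,
})

# Hypothetical per-file bin edges, e.g. as read from the XML file.
edges_by_file = {
    "age.csv": [0, 50, 100],
    "Height.csv": [100, 150, 200],
}


def bin_per_file(frame, edges_by_file):
    """Cut column 1 using the edges that belong to each row's source file."""
    binned = pd.Series(index=frame.index, dtype=object)
    for name, edges in edges_by_file.items():
        mask = frame["file"] == name
        # String labels such as "0-50" let bins from different files share one column.
        labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
        binned.loc[mask] = pd.cut(frame.loc[mask, 1], edges, labels=labels).to_numpy()
    return binned


frame["bin"] = bin_per_file(frame, edges_by_file)
print(frame)
```

Rows whose file has no entry in edges_by_file are left as NaN in the bin column.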
Accepted answer by firelynx
To bucket your series, use the pd.cut() function, like this:
```python
# header=None gives integer column labels, so the values are in column 1 (not '1')
df['bin'] = pd.cut(df[1], [0, 50, 100, 200])
```

```
         0    1        file         bin
0  person1   24     age.csv     (0, 50]
1  person2   17     age.csv     (0, 50]
2  person3   98     age.csv   (50, 100]
3  person4    6     age.csv     (0, 50]
4  person2  166  Height.csv  (100, 200]
5  person3  125  Height.csv  (100, 200]
6  person5  172  Height.csv  (100, 200]
```
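pd.cut() uses right-closed intervals by default, which matters at the boundaries: a value equal to the lowest edge, or outside the overall range, becomes NaN. The short sketch below (with made-up values) shows this, along with the two pd.cut() parameters that change it, include_lowest and right.

```python
import pandas as pd

values = pd.Series([0, 24, 50, 98, 200, 250])
edges = [0, 50, 100, 200]

# Default: intervals are (0, 50], (50, 100], (100, 200], so 0 and 250 fall outside (NaN).
default = pd.cut(values, edges)

# include_lowest=True makes the first interval also include its left edge (0).
with_lowest = pd.cut(values, edges, include_lowest=True)

# right=False switches to left-closed intervals [0, 50), [50, 100), [100, 200).
left_closed = pd.cut(values, edges, right=False)

print(default.isna().tolist())      # [True, False, False, False, False, True]
print(with_lowest.isna().tolist())  # [False, False, False, False, False, True]
print(left_closed.isna().tolist())  # [False, False, False, False, True, True]
```

Which option fits depends on whether the XML ranges are meant to include their upper or lower bound.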
If you want to name the bins yourself, you can use the labels= argument, like this:
```python
df['bin'] = pd.cut(df[1], [0, 50, 100, 200], labels=['0-50', '50-100', '100-200'])
```

```
         0    1        file      bin
0  person1   24     age.csv     0-50
1  person2   17     age.csv     0-50
2  person3   98     age.csv   50-100
3  person4    6     age.csv     0-50
4  person2  166  Height.csv  100-200
5  person3  125  Height.csv  100-200
6  person5  172  Height.csv  100-200
```
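When the edges come from a config file such as the XML, the labels can be generated from the edges instead of typed by hand, and counting rows per bin is then a single value_counts() call. This is a sketch; labels_from_edges is a hypothetical helper, not part of pandas, and the values are the ones from the output above.

```python
import pandas as pd


def labels_from_edges(edges):
    """Build 'low-high' labels such as '0-50' from consecutive bin edges."""
    return [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]


edges = [0, 50, 100, 200]
values = pd.Series([24, 17, 98, 6, 166, 125, 172])

binned = pd.cut(values, edges, labels=labels_from_edges(edges))

# Count how many values fall in each named bin; sort=False keeps the bin order.
counts = binned.value_counts(sort=False)
print(counts.to_dict())  # {'0-50': 3, '50-100': 1, '100-200': 3}
```

Because the result of pd.cut() is categorical, bins with no members still appear in the counts with a value of 0.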