Python: bin values based on ranges with pandas

Question

Asked by pam

I have multiple CSV files with values like this in a folder:

GroupID.csv is the filename. There are multiple files like this, but the value ranges for all of them are defined in the same XML file. I'm trying to group the values by those ranges. How can I do that?

UPDATE 1: Based on BobHaffner's comments, I've done this:

import glob
import os

import pandas as pd

path = r'path/to/files'
allFiles = glob.glob(os.path.join(path, "*.csv"))

list_ = []
for file_ in allFiles:
    # The files have no header row, so the columns are labelled 0, 1, ...
    df = pd.read_csv(file_, index_col=None, header=None)
    # Record which file each row came from.
    df['file'] = os.path.basename(file_)
    list_.append(df)

frame = pd.concat(list_)
print(frame)

to get something like this:

I need to group the values based on the bins from the XML file. I'd truly appreciate any help.

Answer 1

Accepted answer by firelynx

To bucket your series, use the pd.cut() function, like this:

df['bin'] = pd.cut(df['1'], [0, 50, 100, 200])

         0    1        file         bin
0  person1   24     age.csv     (0, 50]
1  person2   17     age.csv     (0, 50]
2  person3   98     age.csv   (50, 100]
3  person4    6     age.csv     (0, 50]
4  person2  166  Height.csv  (100, 200]
5  person3  125  Height.csv  (100, 200]
6  person5  172  Height.csv  (100, 200]
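
For reference, here is a minimal self-contained sketch of the same call. It rebuilds the sample rows above by hand instead of reading the CSV files, which is an assumption made only so the snippet runs on its own. Note that when the files are read with header=None, the column labels are the integers 0 and 1, so the value column is selected here as df[1] rather than df['1'].

```python
import pandas as pd

# Sample rows copied from the output above, built by hand rather than read from CSV.
df = pd.DataFrame({
    0: ["person1", "person2", "person3", "person4", "person2", "person3", "person5"],
    1: [24, 17, 98, 6, 166, 125, 172],
    "file": ["age.csv"] * 4 + ["Height.csv"] * 3,
})

# Edges 0, 50, 100, 200 give the intervals (0, 50], (50, 100], (100, 200].
# By default each interval is closed on the right, and any value
# outside the outermost edges becomes NaN.
df["bin"] = pd.cut(df[1], [0, 50, 100, 200])

# Count how many rows fall into each bin, per file.
counts = df.groupby(["file", "bin"], observed=True).size()

print(df)
print(counts)
```

Because the intervals are right-closed by default, a value of exactly 50 would land in (0, 50], and a value of 0 would not be binned at all unless include_lowest=True is passed.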

If you want to name the bins yourself, you can use the labels= argument, like this:

df['bin'] = pd.cut(df['1'], [0, 50, 100, 200], labels=['0-50', '50-100', '100-200'])

         0    1        file      bin
0  person1   24     age.csv     0-50
1  person2   17     age.csv     0-50
2  person3   98     age.csv   50-100
3  person4    6     age.csv     0-50
4  person2  166  Height.csv  100-200
5  person3  125  Height.csv  100-200
6  person5  172  Height.csv  100-200
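
The question mentions that the ranges come from an XML file and may differ per CSV file. The sketch below shows one way that could be wired up, under stated assumptions: the XML layout (a group element with file and edges attributes), the edge values, and the helper name load_edges are all hypothetical, since the real XML file is not shown. It applies pd.cut() separately to each file's rows with that file's own edges.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML layout and edge values; the real file is not shown in the question.
XML_TEXT = """
<groups>
  <group file="age.csv" edges="0,50,100"/>
  <group file="Height.csv" edges="100,150,200"/>
</groups>
"""


def load_edges(xml_text):
    """Return a dict mapping each file name to its list of bin edges."""
    root = ET.fromstring(xml_text)
    return {
        group.get("file"): [int(edge) for edge in group.get("edges").split(",")]
        for group in root.iter("group")
    }


# Sample rows from the answer above, built by hand with a unique default index.
frame = pd.DataFrame({
    0: ["person1", "person2", "person3", "person4", "person2", "person3", "person5"],
    1: [24, 17, 98, 6, 166, 125, 172],
    "file": ["age.csv"] * 4 + ["Height.csv"] * 3,
})

edges_by_file = load_edges(XML_TEXT)

parts = []
for name, group in frame.groupby("file"):
    edges = edges_by_file[name]
    # Build labels such as "0-50" from consecutive pairs of edges.
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    binned = pd.cut(group[1], edges, labels=labels)
    # Convert to plain objects so pieces with different label sets combine cleanly.
    parts.append(binned.astype(object))

# Assignment aligns on the index, so the order of the pieces does not matter.
frame["bin"] = pd.concat(parts)
print(frame)
```

This relies on the combined frame having a unique index; if the per-file frames were concatenated without ignore_index=True, calling reset_index(drop=True) first would be needed.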
