Original: http://stackoverflow.com/questions/18071222/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Working with set_index in Pandas DataFrame
Asked by TravisVOX
Using an imported CSV file, I indexed the DataFrame like this...
rdata.set_index(['race_date', 'track_code', 'race_number', 'horse_name'])
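To reproduce this without the original CSV file, here is a minimal sketch that rebuilds a few of the question's rows from an inline CSV string. It assumes the file has only the six columns shown in the excerpt; any other columns in the real file are unknown and omitted. Note that set_index returns a new DataFrame, so the result has to be reassigned (or inplace=True passed) to keep the four-level index on rdata.

```python
import io

import pandas as pd

# A few rows from the question, inlined as CSV so the sketch runs without
# the original file (assumption: only these six columns exist).
CSV = """race_date,track_code,race_number,horse_name,work_date,work_track
2007-08-24,BM,8,Count Me Twice,2007-05-31,PLN
2007-08-24,BM,8,Count Me Twice,2007-06-09,PLN
2007-08-24,BM,8,Judge's Choice,2007-06-07,BM
2007-08-24,BM,8,Judge's Choice,2007-06-14,BM
"""

rdata = pd.read_csv(io.StringIO(CSV), parse_dates=["race_date", "work_date"])

# set_index returns a new DataFrame; reassign it to keep the MultiIndex.
rdata = rdata.set_index(["race_date", "track_code", "race_number", "horse_name"])
print(rdata)
```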
This is what a section of the DataFrame looks like...
race_date   track_code  race_number  horse_name        work_date   work_track
2007-08-24  BM          8            Count Me Twice    2007-05-31  PLN
                                     Count Me Twice    2007-06-09  PLN
                                     Count Me Twice    2007-06-16  PLN
                                     Count Me Twice    2007-06-23  PLN
                                     Count Me Twice    2007-08-05  PLN
                                     Judge's Choice    2007-06-07  BM
                                     Judge's Choice    2007-06-14  BM
                                     Judge's Choice    2007-07-08  BM
                                     Judge's Choice    2007-08-18  BM
Why isn't the 'horse_name' column grouped like the date, track and race? Perhaps it's by design; if so, how can I slice this larger DataFrame by race to get a new DataFrame with 'horse_name' as its index?
Accepted answer by Viktor Kerkez
It's not a bug. This is exactly how it's intended to work.
A DataFrame has to show every single item in its data. So if the index has one level, that level is fully expanded. If it has two levels, the first level is grouped (a repeated label is printed only once) and the second is fully expanded; if it has three levels, the first two are grouped and the third is expanded, and so on.
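The grouping described above is how pandas prints a MultiIndex: repeated labels in the outer levels are blanked out ("sparsified"), controlled by the display.multi_sparse option, which is True by default. The sketch below uses a hypothetical miniature of the question's data to show that the innermost level is still printed on every row, and that turning the option off prints every label on every row.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two horses, two workouts each.
index = pd.MultiIndex.from_tuples(
    [
        ("2007-08-24", "BM", 8, "Count Me Twice"),
        ("2007-08-24", "BM", 8, "Count Me Twice"),
        ("2007-08-24", "BM", 8, "Judge's Choice"),
        ("2007-08-24", "BM", 8, "Judge's Choice"),
    ],
    names=["race_date", "track_code", "race_number", "horse_name"],
)
df = pd.DataFrame(
    {
        "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
        "work_track": ["PLN", "PLN", "BM", "BM"],
    },
    index=index,
)

# Default display: outer labels appear only on the first row of each run,
# but the innermost level (horse_name) is printed on every row.
print(df)

# With sparsification off, every label is printed on every row.
with pd.option_context("display.multi_sparse", False):
    print(df)
```

Only the display changes; the index itself still stores a full label tuple for every row either way.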
So this is why the horse name is not grouped: if you also grouped by horse name, how would you be able to see all the items in the DataFrame? :)
Try doing:
rdata.set_index(['race_date', 'track_code', 'race_number'])
or:
rdata.set_index(['race_date', 'track_code'])
You'll see that the last level of the index is always fully expanded, to enable you to see all the items in the DataFrame.
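The second part of the question (slicing out one race so that horse_name becomes the index) can be sketched with DataFrame.xs, which selects rows by labels on named index levels and, with its default drop_level=True, drops those levels from the result. The sample data is hypothetical: the second race and the horse "Other Horse" are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with two races; column names follow the question.
df = pd.DataFrame(
    {
        "race_date": ["2007-08-24", "2007-08-24", "2007-08-24", "2007-08-25"],
        "track_code": ["BM", "BM", "BM", "BM"],
        "race_number": [8, 8, 8, 3],
        "horse_name": ["Count Me Twice", "Count Me Twice", "Judge's Choice", "Other Horse"],
        "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-07-01"],
        "work_track": ["PLN", "PLN", "BM", "BM"],
    }
).set_index(["race_date", "track_code", "race_number", "horse_name"])

# Select one race by its three outer labels; xs drops those levels,
# leaving a plain index named horse_name.
race = df.xs(
    ("2007-08-24", "BM", 8),
    level=["race_date", "track_code", "race_number"],
)
print(race)
```

Because the three race levels are the outermost ones here, df.loc[("2007-08-24", "BM", 8)] should select the same rows; xs with explicit level names just makes the intent clearer.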