Original question: http://stackoverflow.com/questions/33194016/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Python Pandas slice multiindex by second level index (or any other level)
Asked by raummensch
There are many posts on slicing level[0] of a MultiIndex by a range of level[1]. However, I cannot find a solution for my problem: I need a range of the level[1] index for each level[0] index value.
Dataframe: First runs from A to Z and Rank from 1 to 400. I need the first 2 and the last 2 rows for each level[0] value (First), but not in the same step.
           Title  Score
First Rank
A     1      foo    100
      2      bar     90
      3     lime     80
      4     lame     70
B     1      foo    400
      2     lime    300
      3     lame    200
      4     dime    100
I am trying to get the last 2 rows of the level[1] index (Rank) for each level[0] value with the code below, but it slices correctly only for the first level[0] value.
[IN] df.ix[x.index.levels[1][-2]:]
[OUT]
           Title  Score
First Rank
A     3     lime     80
      4     lame     70
B     1      foo    400
      2     lime    300
      3     lame    200
      4     dime    100
I get the first 2 rows by swapping the index levels, but I cannot make the same approach work for the last 2 rows.
df.index = df.index.swaplevel("Rank", "First")
df = df.sort_index()  # sort by Rank (sortlevel() in older pandas versions)
df.loc[1:2]  # label-based slice (.ix in older pandas): the first 2 ranks, each with 2 level[1] (First) values
           Title  Score
Rank First
1    A       foo    100
     B       foo    400
2    A       bar     90
     B      lime    300
Of course, I can swap the levels back to get this:
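df2 = df.loc[1:2].copy()  # label-based slice on Rank (.ix in older pandas versions)
df2.index = df2.index.swaplevel("First", "Rank")  # change the order of the index levels back
df2.sort_index()  # sortlevel() in older pandas versions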
           Title  Score
First Rank
A     1      foo    100
      2      bar     90
B     1      foo    400
      2     lime    300
Any help with the following is appreciated:
- Getting the last 2 rows of index level 1 (Rank) with the same procedure
- A better way to get the first 2 rows
Edit following feedback by @ako:
Using pd.IndexSlice truly makes it easy to slice any index level. Here is a more generic solution, followed by my step-wise approach to get the first and last two rows. More information here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers
import pandas as pd

idx = pd.IndexSlice  # define the slicer before its first use

"""
Slicing a dataframe at the level[2] index of the
major axis (rows) for specific labels and at the level[1] index for columns.
"""
df.loc[idx[:, :, ['some label', 'another label']], idx[:, 'yet another label']]

"""
Thanks to @ako, below is my solution, including how I
get the top and last 2 rows.
"""
# Top 2
df.loc[idx[:, [1, 2]], :]  # [1, 2] is NOT a row position, it is the Rank label.
# Last 2
ranks = df.index.levels[df.index.names.index("Rank")]  # unique Rank labels, sorted
last2 = list(ranks[-2:])  # the two highest Rank labels, not positions
df.loc[idx[:, last2], :]  # last 2, assuming every level[0] group has the same Rank labels.
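The label-based approach above relies on every level[0] group sharing the same Rank labels. When that assumption does not hold, one possible alternative (not part of the original post) is to group by the first level and take rows by position with `head` and `tail`. The sketch below uses made-up data in which group B has one row fewer than group A.

```python
import pandas as pd

# Hypothetical data for illustration: group B has only 3 ranks.
df = pd.DataFrame(
    {
        "Title": ["foo", "bar", "lime", "lame", "foo", "lime", "lame"],
        "Score": [100, 90, 80, 70, 400, 300, 200],
    },
    index=pd.MultiIndex.from_tuples(
        [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 1), ("B", 2), ("B", 3)],
        names=["First", "Rank"],
    ),
)

# head/tail work by position within each group, so groups of
# different lengths are handled without knowing the Rank labels.
first2 = df.groupby(level="First").head(2)
last2 = df.groupby(level="First").tail(2)

print(first2)
print(last2)
```

Because `head` and `tail` select by position, the rows should already be sorted by Rank within each group for "first" and "last" to mean lowest and highest rank.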
Answered by ako
Use an indexer to slice arbitrary values in arbitrary dimensions: just pass a list with whatever the desired levels / values are for that dimension.
idx = pd.IndexSlice
df.loc[idx[:,[3,4]],:]
           Title  Score
First Rank
A     3     lime     80
      4     lame     70
B     3     lame    200
      4     dime    100
To reproduce the data:
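from io import StringIO  # Python 3; in Python 2 this was `from StringIO import StringIO`
import pandas as pd

s = """
First Rank Title Score
A 1 foo 100
A 2 bar 90
A 3 lime 80
A 4 lame 70
B 1 foo 400
B 2 lime 300
B 3 lame 200
B 4 dime 100
"""
# Whitespace-separated columns; First and Rank become the two index levels.
df = pd.read_csv(StringIO(s),
                 sep=r'\s+',
                 index_col=["First", "Rank"])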
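Putting the pieces of this answer together, here is a minimal self-contained sketch that rebuilds the sample frame and slices the second level with `pd.IndexSlice`. The variable names `first2` and `last2` are only illustrative, and the label slice `1:2` assumes the index is sorted, as it is for this data.

```python
from io import StringIO

import pandas as pd

s = """First Rank Title Score
A 1 foo 100
A 2 bar 90
A 3 lime 80
A 4 lame 70
B 1 foo 400
B 2 lime 300
B 3 lame 200
B 4 dime 100
"""
df = pd.read_csv(StringIO(s), sep=r"\s+", index_col=["First", "Rank"])

idx = pd.IndexSlice

# A list selects specific Rank labels for every value of First.
last2 = df.loc[idx[:, [3, 4]], :]

# A label slice is inclusive on both ends: Rank 1 through 2.
first2 = df.loc[idx[:, 1:2], :]

print(last2)
print(first2)
```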
Answered by Ash
Another way to slice by the 2nd (sub) level in a multi-level index is to use slice(None) with .loc[]. Using slice(None) for a level indicates that the level is not being sliced; then pass a single item or a list for the level that is being sliced. Hope it helps future readers.
df.loc[ ( slice(None), [3, 4] ), : ]
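As a small sketch of how this relates to the indexer approach: `pd.IndexSlice[:, [3, 4]]` simply builds the tuple `(slice(None), [3, 4])`, so both spellings should select the same rows. The frame below is a reduced, made-up version of the sample data with only the Score column.

```python
import pandas as pd

# Reduced illustrative frame: two groups, ranks 1 to 4.
df = pd.DataFrame(
    {"Score": [100, 90, 80, 70, 400, 300, 200, 100]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], [1, 2, 3, 4]], names=["First", "Rank"]
    ),
)

# slice(None) leaves the First level unsliced; the list picks Rank labels.
by_slice = df.loc[(slice(None), [3, 4]), :]

# The same selection written with pd.IndexSlice.
by_idx = df.loc[pd.IndexSlice[:, [3, 4]], :]

# A single item instead of a list selects one Rank label in every group.
single = df.loc[(slice(None), 3), :]

print(by_slice.equals(by_idx))
print(single)
```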