
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/38064971/

Time: 2020-09-14 01:28:25  Source: igfitidea

In pandas, how to set_index using column index instead of referring to column names?

pandas

Asked by T.Yun

For example:


We have a pandas DataFrame foo with 2 columns ['A', 'B'].

I want to do something like foo.set_index([0,1]) instead of foo.set_index(['A', 'B']).

I have tried foo.set_index([[0,1]]) as well, but it failed with this error:

Length mismatch: Expected axis has 9 elements, new values have 2 elements


Answered by unutbu

If the column index is unique, you could use:

df.set_index(list(df.columns[cols]))

where cols is a list of ordinal indices.



For example,


In [77]: np.random.seed(2016)

In [79]: df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('ABCD'))

In [80]: df
Out[80]: 
   A  B  C  D
0  3  7  2  3
1  8  4  8  7
2  9  2  6  3
3  4  1  9  1
4  2  2  8  9

In [81]: df.set_index(list(df.columns[[0,2]]))
Out[81]: 
     B  D
A C      
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
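
The same pattern can be wrapped in a small helper that refuses to run when the column labels are not unique. This is only a minimal sketch: the name set_index_by_position and the choice to raise ValueError are assumptions made for illustration, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

def set_index_by_position(df, positions):
    """Hypothetical helper: set the index from integer column positions."""
    # Translating positions to labels is only safe when the labels are unique.
    if not df.columns.is_unique:
        raise ValueError("column labels must be unique to set the index by label")
    labels = list(df.columns[positions])
    return df.set_index(labels)

df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("ABCD"))
result = set_index_by_position(df, [0, 2])
print(list(result.index.names))  # index levels taken from positions 0 and 2
print(list(result.columns))      # the columns that remain
```

Because set_index returns a new DataFrame by default, df itself is left unchanged.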


If the DataFrame's column index is not unique, then setting the index by label is impossible, and setting it by ordinals is more complicated:

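import numpy as np
import pandas as pd
np.random.seed(2016)

def set_ordinal_index(df, cols):
    # Work on a copy so the caller's column labels are left untouched.
    df = df.copy()
    # Temporarily relabel the columns 0..n-1 so they can be selected by position.
    columns, df.columns = df.columns, np.arange(len(df.columns))
    mask = df.columns.isin(cols)
    df = df.set_index(cols)
    # Restore the original labels on the remaining columns.
    df.columns = columns[~mask]
    # Name the index levels in the order given by cols.
    df.index.names = columns[cols]
    return df

df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('AAAA'))
print(set_ordinal_index(df, [0,2]))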

yields


     A  A
A A      
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
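
As an alternative to relabeling the columns, the index can also be built directly from positional slices. The following is a rough sketch and not the original answer's method: the function name set_ordinal_index_iloc is hypothetical, and it assumes that always producing a MultiIndex (even for a single position) is acceptable.

```python
import numpy as np
import pandas as pd

def set_ordinal_index_iloc(df, cols):
    """Hypothetical variant: build the index from positions using iloc."""
    chosen = set(cols)
    # Positions of the columns that stay in the frame.
    keep = [i for i in range(df.shape[1]) if i not in chosen]
    # One index level per requested position, named after the original label.
    index = pd.MultiIndex.from_arrays(
        [df.iloc[:, i].to_numpy() for i in cols],
        names=[df.columns[i] for i in cols],
    )
    out = df.iloc[:, keep].copy()
    out.index = index
    return out

# Column labels repeat, so selecting by label would be ambiguous here.
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("ABAB"))
out = set_ordinal_index_iloc(df, [0, 3])
print(out)
```

Since everything is selected with iloc, the duplicated labels are never used for lookup; they are only copied onto the result.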