Original question: http://stackoverflow.com/questions/38064971/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
In pandas, how can I call set_index using column positions instead of column names?
Asked by T.Yun
For example:
We have a pandas DataFrame foo with two columns, ['A', 'B'].
I want to do something like
foo.set_index([0,1])
instead of
foo.set_index(['A', 'B'])
I have also tried foo.set_index([[0,.1]]), but got this error:
Length mismatch: Expected axis has 9 elements, new values have 2 elements
Answered by unutbu
If the column index is unique, you could use:
df.set_index(list(df.columns[cols]))
where cols is a list of ordinal indices.
For example,
In [77]: np.random.seed(2016)

In [79]: df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('ABCD'))

In [80]: df
Out[80]:
   A  B  C  D
0  3  7  2  3
1  8  4  8  7
2  9  2  6  3
3  4  1  9  1
4  2  2  8  9

In [81]: df.set_index(list(df.columns[[0,2]]))
Out[81]:
     B  D
A C
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
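The one-liner above can be wrapped in a small helper that first checks that the column labels are unique. This is only a sketch: the function name set_index_by_position and the sample data are made up for illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd


def set_index_by_position(df, positions):
    """Set the index from columns given by ordinal position (hypothetical helper)."""
    # Translating positions to labels is only unambiguous when labels are unique.
    if not df.columns.is_unique:
        raise ValueError("column labels must be unique to set the index by label")
    # Look up the labels at the requested positions, then defer to set_index.
    labels = list(df.columns[list(positions)])
    return df.set_index(labels)


# Deterministic sample data (an assumption; the answer uses random integers).
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('ABCD'))
result = set_index_by_position(df, [0, 2])
print(result)
```

Raising early on duplicate labels keeps the failure close to its cause, rather than letting set_index fail later with a less obvious message.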
If the DataFrame's column index is not unique, then setting the index by label is impossible, and setting it by ordinals is more complicated:
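import numpy as np
import pandas as pd

np.random.seed(2016)

def set_ordinal_index(df, cols):
    # Work on a copy so the caller's column labels are left untouched.
    df = df.copy()
    # Temporarily relabel the columns 0..n-1 so that every label is unique.
    columns, df.columns = df.columns, np.arange(len(df.columns))
    mask = df.columns.isin(cols)
    df = df.set_index(cols)
    # Restore the original labels on the remaining columns and the index levels.
    df.columns = columns[~mask]
    df.index.names = columns[mask]
    return df

df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('AAAA'))
print(set_ordinal_index(df, [0,2]))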
which yields
     A  A
A A
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
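As a possible alternative for the non-unique case, the index can be built directly from positional selections with iloc, which avoids relabeling the columns. This is a sketch rather than part of the original answer: the name set_index_by_iloc and the sample data are assumptions, and unlike set_index it always produces a MultiIndex, even when only one position is given.

```python
import numpy as np
import pandas as pd


def set_index_by_iloc(df, positions):
    """Build the index from columns selected by position (hypothetical helper)."""
    positions = list(positions)
    # Positions of the columns that stay in the frame, in their original order.
    keep = [i for i in range(df.shape[1]) if i not in positions]
    # iloc selects by position, so duplicate labels are not a problem here.
    index = pd.MultiIndex.from_arrays(
        [df.iloc[:, i].to_numpy() for i in positions],
        names=[df.columns[i] for i in positions],
    )
    out = df.iloc[:, keep].copy()
    out.index = index
    return out


# Deterministic sample data with duplicate labels (an assumption for illustration).
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('AAAA'))
result = set_index_by_iloc(df, [0, 2])
print(result)
```

Because the function only reads from df and copies the remaining columns, the input frame keeps its original labels.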