pandas key error and MultiIndex lexsort depth
Source: http://stackoverflow.com/questions/24922867/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by Rad
I have a set of tab-delimited files that I need to read into pandas DataFrames, run a number of operations on, and then merge back into one Excel file. The code is too long to post in full, so I will only walk through the problematic part.
The tab-delimited files I am parsing all contain the same number of rows: 2177.
When I read these files, I index them by the first two columns, which are of type (string, int):
df = df.set_index(['id', 'coord'])
data = OrderedDict()
#data will contain all the information I am writing to excel
data[filename_id] = df
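The loading step described above can be sketched roughly as follows. This is only an illustration under assumptions: the sample text, the 'value' column and the "sample_1" key are hypothetical stand-ins, since the real files and the full script are not shown in the question.

```python
import io
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-in for one tab-delimited file (the real files have 2177 rows).
raw = "id\tcoord\tvalue\nchr2\t200\t0.5\nchr1\t300\t0.1\nchr1\t100\t0.9\n"

# Read the tab-delimited text, then index by the first two columns.
df = pd.read_csv(io.StringIO(raw), sep="\t")
df = df.set_index(["id", "coord"])

# Collect one DataFrame per file, keyed by a (hypothetical) sample name.
data = OrderedDict()
data["sample_1"] = df

# set_index keeps the rows in file order, so the resulting MultiIndex
# is only sorted if the file itself happened to be sorted.
print(df.index.is_monotonic_increasing)
```

Note that set_index does not sort anything, so whether the resulting MultiIndex is lexsorted depends entirely on the row order in each file.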
One of the procedures I run needs access to each row of data[sample_id], which holds a DataFrame of mixed types indexed by the columns 'id' and 'coord', like this:
sample_row = data[sample].ix[index]
with my index being ('id', 'coord').
If I process only a subset of a file, everything works fine, but if I read the entire files with all 2177 lines, I end up with this error message:
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'
I searched SO and elsewhere, and it seems this is an issue with the index not being sorted, but I don't understand why using an unsorted subset does not cause the problem.
Any idea how I can get this sorted out?
Thanks
Accepted answer by Jeff
The docs are quite good. If you work with multi-indexes, it pays to read them through (several times!); see the MultiIndex section of the pandas documentation.
In [9]: df = DataFrame(np.arange(9).reshape(-1,1),columns=['value'],index=pd.MultiIndex.from_product([[1,2,3],['a','b','c']],names=['one','two']))

In [10]: df
Out[10]:
         value
one two
1   a        0
    b        1
    c        2
2   a        3
    b        4
    c        5
3   a        6
    b        7
    c        8

In [11]: df.index.lexsort_depth
Out[11]: 2

In [12]: df.sortlevel(level=1)
Out[12]:
         value
one two
1   a        0
2   a        3
3   a        6
1   b        1
2   b        4
3   b        7
1   c        2
2   c        5
3   c        8

In [13]: df.sortlevel(level=1).index.lexsort_depth
Out[13]: 0
Update:
sortlevel is deprecated in newer pandas versions, so use sort_index instead, i.e.
df.sort_index(level=1)
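
The session above shows how sorting on a single level leaves the index unsorted on the full key. As a rough sketch of the fix this points to (not code from the original answer), calling sort_index() with no level argument sorts on all levels, after which lookups by the full (one, two) key work. The sketch checks sortedness with is_monotonic_increasing rather than lexsort_depth, on the assumption that the former is the safer choice across pandas versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer: 3 x 3 MultiIndex with levels 'one' and 'two'.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["a", "b", "c"]], names=["one", "two"]
)
df = pd.DataFrame({"value": np.arange(9)}, index=index)

# Sorting with level=1 orders the rows by 'two' first,
# so the index is no longer sorted on the full (one, two) key.
unsorted = df.sort_index(level=1)
print(unsorted.index.is_monotonic_increasing)

# Sorting on all levels restores a fully sorted index.
fixed = unsorted.sort_index()
print(fixed.index.is_monotonic_increasing)

# Lookup by the full key on the sorted frame.
row = fixed.loc[(2, "b")]
print(row["value"])
```

Applied to the question, this would mean calling sort_index() on each DataFrame right after set_index(['id', 'coord']) and before any row lookups.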

