pandas KeyError and MultiIndex lexsort depth

Question

Asked by Rad

I have a set of tab-delimited files that I read into pandas DataFrames, run a whole bunch of operations on, and then merge back into one Excel file. The code is too long to post, so I will go through only the problematic part of it.

The tab-delimited files I am parsing all contain the same number of rows: 2177.

When I read these files, I index them by the first 2 columns, which are of type (string, int):

df = df.set_index(['id', 'coord'])
data = OrderedDict()
#data will contain all the information I am writing to excel
data[filename_id] = df

One of the procedures I am running needs access to each row of data[sample_id], which holds a DataFrame of mixed types indexed by the columns 'id' and 'coord', like this:

sample_row = data[sample].ix[index]

where my index is an ('id', 'coord') tuple.

If I process only a subset of a file, everything works great, but if I read the entire files with 2177 lines, I end up with this error message:

KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'

I searched SO and elsewhere, and it seems that this is an issue with sorting the index, but I don't understand why using an unsorted subset does not cause the problem.

Any idea how I can get this sorted out?

Thanks

Answer 1

Accepted answer by Jeff

The docs are quite good. If you work with multi-indexes, it pays to read them through (several times!); see here.

In [9]: df = pd.DataFrame(np.arange(9).reshape(-1,1),columns=['value'],index=pd.MultiIndex.from_product([[1,2,3],['a','b','c']],names=['one','two']))

In [10]: df
Out[10]: 
         value
one two       
1   a        0
    b        1
    c        2
2   a        3
    b        4
    c        5
3   a        6
    b        7
    c        8

In [11]: df.index.lexsort_depth
Out[11]: 2

In [12]: df.sortlevel(level=1)
Out[12]: 
         value
one two       
1   a        0
2   a        3
3   a        6
1   b        1
2   b        4
3   b        7
1   c        2
2   c        5
3   c        8

In [13]: df.sortlevel(level=1).index.lexsort_depth
Out[13]: 0
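For readers on a recent pandas release, here is a rough sketch of the same check. It assumes that `sortlevel` and the public `lexsort_depth` attribute are no longer available there (as far as I know both were deprecated and later removed), so it swaps in `sort_index` and `MultiIndex.is_monotonic_increasing` instead. The names `by_two` and `resorted` are just illustrative.

```python
import numpy as np
import pandas as pd

# Same 3x3 frame as in the session above.
df = pd.DataFrame(
    np.arange(9).reshape(-1, 1),
    columns=["value"],
    index=pd.MultiIndex.from_product(
        [[1, 2, 3], ["a", "b", "c"]], names=["one", "two"]
    ),
)

# from_product yields an index that is sorted on every level.
print(df.index.is_monotonic_increasing)

# Sorting on level 1 first breaks the ordering across level 0.
by_two = df.sort_index(level=1)
print(by_two.index.is_monotonic_increasing)

# A plain sort_index() sorts on all levels again.
resorted = by_two.sort_index()
print(resorted.index.is_monotonic_increasing)
```

The monotonic flag plays roughly the role that a full `lexsort_depth` played in the older API: it tells you whether the index is sorted on all of its levels.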

Update:

sortlevel will be deprecated, so use sort_index, i.e.

df.sort_index(level=1)
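Applied to the setup in the question, a minimal sketch might look like the following. The sample values are made up to stand in for one tab-delimited file, and `.loc` is used in place of `.ix`, since `.ix` was removed in pandas 1.0. Treat it as one possible way to do it, not the asker's actual code.

```python
import pandas as pd

# Hypothetical data standing in for one tab-delimited file;
# the rows are deliberately out of order.
df = pd.DataFrame(
    {
        "id": ["b", "a", "b", "a"],
        "coord": [2, 1, 1, 2],
        "value": [10, 20, 30, 40],
    }
).set_index(["id", "coord"])

# The index built from unordered rows is not sorted.
print(df.index.is_monotonic_increasing)

# Sort on all index levels once, right after set_index.
df_sorted = df.sort_index()
print(df_sorted.index.is_monotonic_increasing)

# Look up a single row by its full (id, coord) key.
sample_row = df_sorted.loc[("a", 2)]
print(sample_row["value"])
```

Sorting once right after `set_index`, before storing the frame in `data`, means every later lookup sees a fully sorted index.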
