Pandas: iterate over unique values of a column that is already in sorted order
Note: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20664980/
Asked by Setjmp
I have constructed a pandas data frame in sorted order and would like to iterate over groups of rows that share the same value in a particular column. It seems to me that the groupby functionality is useful for this, but as far as I can tell, performing a groupby does not give any guarantee about the order of the keys. How can I extract the unique column values in sorted order?
Here is an example data frame:
Foo,1
Foo,2
Bar,2
Bar,1
I would like a list ["Foo","Bar"] where the order is guaranteed by the order of the original data frame. I can then use this list to extract the appropriate rows. In my case the sort is actually defined by columns that are also present in the data frame (not included in the example above), so a solution that re-sorts is acceptable if the information cannot be pulled out directly.
Answered by Andy Hayden
As mentioned in the comments, you can call unique on the column, which preserves the order of first appearance (unlike numpy's unique, it doesn't sort):
In [11]: df
Out[11]:
     0  1
0  Foo  1
1  Foo  2
2  Bar  2
3  Bar  1
In [12]: df[0].unique()
Out[12]: array(['Foo', 'Bar'], dtype=object)
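To make the contrast with numpy concrete, here is a minimal sketch that rebuilds the example frame and compares the two calls. The variable names (pandas_order, numpy_order) are only illustrative, and the frame is assumed to use the default integer column labels 0 and 1 as in the session above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (default integer column labels 0 and 1).
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# Series.unique() keeps values in order of first appearance.
pandas_order = list(df[0].unique())

# np.unique() returns the unique values sorted instead.
numpy_order = list(np.unique(df[0].to_numpy()))

print(pandas_order)  # ['Foo', 'Bar']
print(numpy_order)   # ['Bar', 'Foo']
```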
Then you can access the relevant rows using groupby's get_group:
In [13]: g = df.groupby(0)
In [14]: g.get_group('Foo')
Out[14]:
     0  1
0  Foo  1
1  Foo  2
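Putting the two pieces together, the following sketch iterates over the groups in the frame's original order. It is not part of the original answer: option 1 combines unique with get_group as described above, and option 2 relies on the sort=False argument of groupby, which yields groups in order of first appearance. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (default integer column labels 0 and 1).
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# Option 1: walk the keys returned by unique() and fetch each group by key.
grouped = df.groupby(0)
order_via_unique = []
for key in df[0].unique():
    rows = grouped.get_group(key)
    order_via_unique.append((key, rows[1].tolist()))

# Option 2: with sort=False, groupby yields groups in order of first appearance.
order_via_groupby = [
    (key, rows[1].tolist()) for key, rows in df.groupby(0, sort=False)
]

print(order_via_unique)   # [('Foo', [1, 2]), ('Bar', [2, 1])]
print(order_via_groupby)  # [('Foo', [1, 2]), ('Bar', [2, 1])]
```

Option 2 avoids the separate lookup per key, which may matter when the frame has many distinct groups.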

