Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/38387529/

Time: 2020-08-19 20:42:49  Source: igfitidea

How to iterate over Pandas Series generated from groupby().size()

python pandas

Asked by Reily Bourne

How do you iterate over a Pandas Series generated from a .groupby('...').size() command and get both the group name and the count?

For example, if I have:

foo
-1     7
 0    85
 1    14
 2     5

How can I loop over them so that on each iteration I have -1 and 7, 0 and 85, 1 and 14, and 2 and 5 in variables?

I tried enumerate, but it doesn't quite work. Example:

for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)

It doesn't return -1, 0, 1, and 2 for i, but rather 0, 1, 2, 3.

Answered by Psidom

Update:

Given a pandas Series:

import pandas as pd

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

s
#a    1
#b    2
#c    3
#d    4
#dtype: int64

You can loop over it directly, which yields one value from the Series on each iteration:

for i in s:
    print(i)
#1
#2
#3
#4

If you want to access the index at the same time, you can use either the items or the iteritems method; each returns an iterator of (index, value) pairs (iteritems was deprecated in pandas 1.5 and removed in 2.0, so the second loop below only runs on older versions):

for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4

for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4
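
Applied to the question itself, the same idea might look like the sketch below. The DataFrame is a hypothetical one built only to reproduce the counts shown in the question; the variable names are assumptions, not part of the original post. It uses items() rather than iteritems() so that it also runs on pandas 2.x.

```python
import pandas as pd

# Hypothetical data, constructed only to match the counts in the question.
df = pd.DataFrame({"foo": [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

counts = df.groupby("foo").size()

# items() yields (index label, value) pairs, so the group name is preserved,
# unlike enumerate(), which only numbers the positions 0, 1, 2, ...
pairs = []
for name, count in counts.items():
    pairs.append((int(name), int(count)))
    print(name, count)
# -1 7
# 0 85
# 1 14
# 2 5
```

Here the loop variable name carries the group key (-1, 0, 1, 2) and count carries the group size, which is what the question asks for.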


Old Answer:

You can call the iteritems() method on the Series:

for i, row in df.groupby('a').size().iteritems():
    print(i, row)

# 12 4
# 14 2

According to the documentation:

Series.iteritems()

Lazily iterate over (index, value) tuples

Note: This is not the same data as in the question, just a demo.
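
Since iteritems() no longer exists in pandas 2.x, a rough modern equivalent of the loop above could be written with items(). The demo DataFrame is not shown in the answer, so the data below is an assumption chosen to give the same output (12 4 and 14 2); the to_dict() line is an extra alternative that is not part of the original answer.

```python
import pandas as pd

# Assumed demo data: the answer does not show its DataFrame.
df = pd.DataFrame({"a": [12, 12, 12, 12, 14, 14]})

sizes = df.groupby("a").size()

# items() gives (group key, group size) pairs on current pandas versions.
result = [(int(key), int(n)) for key, n in sizes.items()]
print(result)  # [(12, 4), (14, 2)]

# Alternative: collect everything into a plain dict keyed by group name.
as_dict = sizes.to_dict()
print(as_dict)  # {12: 4, 14: 2}
```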

Answered by dbouz

To expand on Psidom's answer, there are three useful ways to unpack data from a pd.Series. Using the same Series as Psidom:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

  • A direct loop over s yields the value of each row.
  • A loop over s.iteritems() or s.items() yields a tuple with the (index, value) pair of each row.
  • Using enumerate() on s.iteritems() yields a nested tuple of the form (rownum, (index, value)).

The last way is useful when your index holds information other than the row number itself (e.g. a time series where the index is time).

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

for rownum,(indx,val) in enumerate(s.iteritems()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)

This will output:

row number:  0 index:  a value:  1
row number:  1 index:  b value:  2
row number:  2 index:  c value:  3
row number:  3 index:  d value:  4
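
For the time series case mentioned above, a minimal sketch could look like this. The dates and values are made up for illustration, and items() is used in place of iteritems() so that the example runs on pandas 2.x as well.

```python
import pandas as pd

# Hypothetical time-indexed Series; dates and values are illustrative only.
idx = pd.date_range("2020-01-01", periods=3, freq="D")
ts = pd.Series([10, 20, 30], index=idx)

rows = []
# enumerate() supplies the row number; items() supplies (timestamp, value).
for rownum, (stamp, val) in enumerate(ts.items()):
    rows.append((rownum, stamp.strftime("%Y-%m-%d"), int(val)))
    print(rownum, stamp.date(), val)
# 0 2020-01-01 10
# 1 2020-01-02 20
# 2 2020-01-03 30
```

The nested tuple is unpacked directly in the for statement, so both the positional row number and the timestamp label are available in the loop body.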

You can read more on unpacking nested tuples here.
