Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23151246/

Date: 2020-08-19 02:20:47  Source: igfitidea

iterrows pandas: get the next row's value

Tags: python, pandas, next

Question by Ayrat

I have a DataFrame df in pandas:

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

I want to iterate over the rows in df. For each row I want the row's value and the next row's value. Something like this (it does not work):

for i, row in df.iterrows():
    print(row['value'])
    i1, row1 = next(df.iterrows())
    print(row1['value'])

As a result, I want:

'AA'
'BB'
'BB'
'CC'
'CC'
*Wrong index error here  
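
A likely reason the attempt above does not work: every call to df.iterrows() builds a brand-new generator, so next(df.iterrows()) always returns the first row rather than the row after the current one. This short sketch, reusing the question's df, demonstrates that and contrasts it with advancing one shared iterator.

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Each df.iterrows() call creates a fresh generator, so next() on it
# always yields the first row, wherever the outer loop currently is.
first_calls = [next(df.iterrows())[1]['value'] for _ in range(3)]
print(first_calls)  # ['AA', 'AA', 'AA']

# Keeping ONE iterator and calling next() on it repeatedly does move forward.
it = df.iterrows()
_, row0 = next(it)
_, row1 = next(it)
print(row0['value'], row1['value'])  # AA BB
```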

At the moment I have a messy way to solve this:

for i in range(0, df.shape[0]):
   print(df.iloc[i]['value'])
   print(df.iloc[i + 1]['value'])  # raises IndexError on the last row

Is there a more efficient way to solve this?

Accepted answer by alisdt

Firstly, your "messy way" is OK. There's nothing wrong with indexing into the DataFrame by position, and it will not be much slower than the alternative, since iterrows() itself isn't terribly fast.

A working version of your first idea would be:

row_iterator = df.iterrows()
_, last = next(row_iterator)  # take the first item from row_iterator
for i, row in row_iterator:
    print(last['value'])  # current row
    print(row['value'])   # next row
    last = row

Your second method could do something similar, saving one lookup into the DataFrame per iteration:

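last = df.iloc[0]  # df.irow() no longer exists in pandas; iloc is its replacement
for i in range(1, df.shape[0]):
    row = df.iloc[i]  # look up each row only once
    print(last['value'])
    print(row['value'])
    last = row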

When speed is critical you can always try both and time the code.
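
A minimal timing sketch for comparing the two approaches above with the standard timeit module. The benchmark frame (the question's values repeated 100 times) and the helper names with_iterrows and with_iloc are hypothetical; actual timings depend on your data and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the question's three values repeated 100 times.
df = pd.DataFrame({'value': ['AA', 'BB', 'CC'] * 100})


def with_iterrows(frame):
    """(current, next) pairs via the iterrows() + 'last' pattern."""
    pairs = []
    row_iterator = frame.iterrows()
    _, last = next(row_iterator)
    for _, row in row_iterator:
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


def with_iloc(frame):
    """(current, next) pairs via positional lookups with iloc."""
    pairs = []
    last = frame.iloc[0]
    for i in range(1, frame.shape[0]):
        row = frame.iloc[i]
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


# Make sure both versions agree before comparing their speed.
assert with_iterrows(df) == with_iloc(df)

for func in (with_iterrows, with_iloc):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f'{func.__name__}: {seconds:.4f} s for 3 runs')
```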

Answer by Acorbe

This can also be solved by zipping the DataFrame's row iterator with an offset version of itself (izip in Python 2, the built-in zip in Python 3).

Of course, the indexing error cannot be reproduced this way, because zipping stops at the end of the shorter iterator.

Check this out

import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# zip() replaces Python 2's izip; iloc replaces the removed df.ix
for id1, id2 in zip(df.iterrows(), df.iloc[1:].iterrows()):
    print(id1[1]['value'])
    print(id2[1]['value'])

which gives

AA
BB
BB
CC
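
To make the "no index error" point concrete: zip() stops as soon as its shortest input runs out, and the offset iterator starting at row 1 is one element shorter. A small sketch collecting the pairs instead of printing them:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# The shifted iterator (rows 1..end) is exhausted one step early, so the
# last row is never paired with a non-existent successor.
pairs = [(r1['value'], r2['value'])
         for (_, r1), (_, r2) in zip(df.iterrows(), df.iloc[1:].iterrows())]
print(pairs)  # [('AA', 'BB'), ('BB', 'CC')]
```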

Answer by HYRY

There is a pairwise() function example in the itertools documentation:

from itertools import tee

def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3: built-in zip replaces izip

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    print(i1, i2, row1['value'], row2['value'])

Here is the output:

0 1 AA BB
1 2 BB CC
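
Since Python 3.10, itertools ships pairwise() itself, so the recipe is only needed on older versions. A sketch that uses the built-in when available and falls back to the documented recipe otherwise:

```python
import pandas as pd

try:
    from itertools import pairwise  # part of itertools since Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback: the recipe from the itertools documentation."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

lines = []
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    lines.append(f"{i1} {i2} {row1['value']} {row2['value']}")
    print(lines[-1])
```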

However, iterating over the rows of a DataFrame is slow. If you can explain what problem you want to solve, maybe I can suggest a better method.
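
One common way to avoid iterrows() entirely, assuming (as in the question) only a single column is needed: pull that column out once as a plain list and zip it with itself offset by one. This skips building a Series object for every row.

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Extract the column once, then pair each element with its successor.
values = df['value'].tolist()
pairs = list(zip(values, values[1:]))
for current, nxt in pairs:
    print(current)
    print(nxt)
```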

Answer by Anna K.

I would use the shift() function as follows:

df['value_1'] = df.value.shift(-1)
for x in df.T.unstack().dropna(how='any').values:
    print(x)

which produces

AA
BB
BB
CC
CC

This is how the code above works:

Step 1) Use the shift function:

df['value_1'] = df.value.shift(-1)
print(df)

produces

  value value_1
0    AA      BB
1    BB      CC
2    CC     NaN
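
The direction of the shift matters: shift(-1) pulls the next row's value up, while shift(1) pushes the previous row's value down. A quick sketch (the column names next_value and prev_value are illustrative):

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# shift(-1): each row sees the NEXT row's value; the last row gets NaN.
df['next_value'] = df['value'].shift(-1)
# shift(1): each row sees the PREVIOUS row's value; the first row gets NaN.
df['prev_value'] = df['value'].shift(1)
print(df)
```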

Step 2) Transpose:

df = df.T
print(df)

produces:

          0   1    2
value    AA  BB   CC
value_1  BB  CC  NaN

Step 3) Unstack:

df = df.unstack()
print(df)

produces:

0  value       AA
   value_1     BB
1  value       BB
   value_1     CC
2  value       CC
   value_1    NaN
dtype: object

Step 4) Drop NaN values:

df = df.dropna(how = 'any')
print(df)

produces:

0  value      AA
   value_1    BB
1  value      BB
   value_1    CC
2  value      CC
dtype: object

Step 5) Take the NumPy representation of the resulting Series and print it value by value:

df = df.values
for x in df:
    print(x)

produces:

AA
BB
BB
CC
CC
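
Steps 2 to 5 interleave the two columns row by row. A different technique that should give the same sequence for this frame, sketched here as an alternative rather than as the answer's method, is to flatten the two columns in row-major order with NumPy and drop the trailing NaN:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])
df['value_1'] = df['value'].shift(-1)

# ravel() on the (rows x 2) array reads row by row: value, value_1, value, ...
flat = df[['value', 'value_1']].to_numpy().ravel()
# Drop the NaN that shift(-1) left on the last row.
result = [x for x in flat if pd.notna(x)]
for x in result:
    print(x)
```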

Answer by R.V

A combination of answers gave me a very fast running time: I used the shift method to create a new column of shifted row values, then used a row iterator as @alisdt did, but switched it from iterrows to itertuples, which was about 100 times faster.

My script iterates over a DataFrame containing runs of duplicate values of different lengths, and adds one second to each duplicate so that they all become unique.

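# create a new column holding the PREVIOUS row's value of the departure time column
# (shift(1) moves values down one row; shift(-1) would give the next row's value)
df['prev_column_value'] = df['column_value'].shift(1)
# create a row iterator so the loop can remember where the current run started
row_iterator = df.itertuples()
# take the first row from the iterator before the loop starts
last = next(row_iterator)
# itertuples() yields read-only namedtuples, so keep the running value in a variable
# (your_column_num is a placeholder: the tuple position of the time column; position 0 is Index)
t = last[your_column_num]
# add one more second to each consecutive duplicate
# (add_sec and df_result are placeholders from the original answer: add_sec is presumably
#  a one-second step such as pd.Timedelta(seconds=1), df_result the output DataFrame)
for row in row_iterator:
    if row.column_value == row.prev_column_value:
        t = t + add_sec
        df_result.at[row.Index, 'column_name'] = t
    else:
        # a new value starts here: reset 'last' and 't'
        last = row
        t = last[your_column_num]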

Hope it will help.
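
For comparison, the same "add one second per consecutive duplicate" idea can be done without a Python loop. This is a swapped-in vectorized technique (run ids via shift(1) and cumsum(), then groupby().cumcount()), not the answer's own code; the column name departure and the sample timestamps are hypothetical. Like the loop, it only treats consecutive equal values as duplicates.

```python
import pandas as pd

# Hypothetical data: departure times where consecutive rows repeat.
df = pd.DataFrame({'departure': pd.to_datetime([
    '2020-01-01 08:00:00', '2020-01-01 08:00:00', '2020-01-01 08:00:00',
    '2020-01-01 09:00:00', '2020-01-01 09:00:00',
])})

# A new run starts wherever a value differs from the previous row.
run_id = (df['departure'] != df['departure'].shift(1)).cumsum()
# Position of each row inside its run: 0 for the first, 1 for the second, ...
offset = df.groupby(run_id).cumcount()
# Add that many seconds; the first row of each run stays unchanged.
df['departure_unique'] = df['departure'] + pd.to_timedelta(offset, unit='s')
print(df)
```

Note that, as in the loop version, adding seconds can in principle collide with a later distinct timestamp if runs are long and values are close together.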
