Python pandas iterrows: get the next row's value

Question

Asked by Ayrat

I have a DataFrame in pandas:

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

I want to iterate over the rows in df. For each row I want that row's value and the next row's value. Something like this (it does not work):

for i, row in df.iterrows():
    print(row['value'])
    i1, row1 = next(df.iterrows())
    print(row1['value'])

As a result I want:

'AA'
'BB'
'BB'
'CC'
'CC'
*Wrong index error here

At this point I have a messy way to solve this:

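for i in range(0, df.shape[0]):
    print(df.iloc[i]['value'])      # irow() was removed from pandas; iloc is its positional replacement
    print(df.iloc[i + 1]['value'])  # raises IndexError on the last row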

Is there a more efficient way to solve this?

Answer 1

Accepted answer by alisdt

First, your "messy way" is fine: there is nothing wrong with indexing into the DataFrame, and it will not be too slow. iterrows() itself isn't terribly fast.

A working version of your first idea would be:

row_iterator = df.iterrows()
_, last = next(row_iterator)  # take the first item from row_iterator
for i, row in row_iterator:
    print(last['value'])  # current row
    print(row['value'])   # next row
    last = row
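
For reference, the loop in the question keeps returning the first row because each call to df.iterrows() builds a fresh generator. A small sketch of that behaviour, reusing the DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Each call to df.iterrows() creates a new generator, so next() on it
# always yields the first row.
first_a = next(df.iterrows())[1]['value']
first_b = next(df.iterrows())[1]['value']
print(first_a, first_b)

# Keeping a single iterator and calling next() on it advances through the rows.
row_iterator = df.iterrows()
step_1 = next(row_iterator)[1]['value']
step_2 = next(row_iterator)[1]['value']
print(step_1, step_2)
```

This is why the version above stores the iterator in row_iterator once and reuses it.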

The second method can do something similar, which saves one index lookup into the DataFrame per iteration:

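last = df.iloc[0]  # irow() was removed from pandas; iloc is its positional replacement
for i in range(1, df.shape[0]):
    row = df.iloc[i]  # look the row up once and reuse it
    print(last)
    print(row)
    last = row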

When speed is critical, you can always try both and time the code.
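
One way to time the two variants is the standard-library timeit module. This is only a sketch: the 1000-row DataFrame and the helper names with_iterrows and with_iloc are made up for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data, larger than the three-row example.
df = pd.DataFrame({'value': [str(i) for i in range(1000)]})


def with_iterrows():
    """Collect (current, next) pairs using a single iterrows() iterator."""
    row_iterator = df.iterrows()
    _, last = next(row_iterator)
    pairs = []
    for _, row in row_iterator:
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


def with_iloc():
    """Collect (current, next) pairs using positional indexing."""
    pairs = []
    last = df.iloc[0]
    for i in range(1, df.shape[0]):
        row = df.iloc[i]
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


t_iterrows = timeit.timeit(with_iterrows, number=5)
t_iloc = timeit.timeit(with_iloc, number=5)
print(f"iterrows: {t_iterrows:.4f}s  iloc: {t_iloc:.4f}s")
```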

Answer 2

Answer by Acorbe

This can also be solved by zipping the DataFrame's row iterator with an offset version of itself (itertools.izip in Python 2, the built-in zip in Python 3).

Of course, the index error cannot be reproduced this way, because zipping stops as soon as the shorter iterator is exhausted.

For example:

import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# zip is lazy in Python 3 (itertools.izip no longer exists); iloc replaces the removed .ix
for id1, id2 in zip(df.iterrows(), df.iloc[1:].iterrows()):
    print(id1[1]['value'])
    print(id2[1]['value'])

which gives

AA
BB
BB
CC
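
If the last row should still appear, with a placeholder where the next value is missing, itertools.zip_longest can be swapped in for zip. This is a variation on the approach above rather than part of the original answer; it pairs the column values directly instead of whole rows.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

values = df['value']
# zip_longest pads the shorter iterable with None, so the final row
# is paired with None instead of being dropped.
pairs = list(zip_longest(values, values.iloc[1:]))

for current, following in pairs:
    print(current, following)
```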

Answer 3

Answer by HYRY

There is a pairwise() function example in the itertools documentation:

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # izip in Python 2

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    print(i1, i2, row1["value"], row2["value"])

Here is the output:

0 1 AA BB
1 2 BB CC
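
On Python 3.10 and later, pairwise is available directly from itertools, so the recipe does not need to be copied. A sketch that falls back to the recipe on older versions and uses itertuples() instead of iterrows() (that swap is my choice, not part of the answer):

```python
import pandas as pd

try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback recipe: s -> (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# itertuples() yields namedtuples with an Index field plus one field per column.
result = [
    (row1.Index, row2.Index, row1.value, row2.value)
    for row1, row2 in pairwise(df.itertuples())
]

for item in result:
    print(*item)
```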

However, I think iterating over the rows of a DataFrame is slow. If you can explain what problem you want to solve, maybe I can suggest a better method.
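
As one possible alternative when only a single column is needed, the column's values can be paired directly, which avoids building a Series object for every row. This is a sketch under that assumption, not something the answer itself proposes:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Convert the column to a plain list once, then pair each value with its successor.
values = df['value'].tolist()
pairs = list(zip(values, values[1:]))

for current, following in pairs:
    print(current, following)
```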

Answer 4

Answer by Anna K.

I would use the shift() function as follows:

df['value_1'] = df.value.shift(-1)
for x in df.T.unstack().dropna(how='any').values:
    print(x)

which produces

AA
BB
BB
CC
CC

This is how the code above works:

Step 1) Use the shift function:

df['value_1'] = df.value.shift(-1)
print(df)

produces

  value value_1
0    AA      BB
1    BB      CC
2    CC     NaN

Step 2) Transpose:

df = df.T
print(df)

produces:

          0   1    2
value    AA  BB   CC
value_1  BB  CC  NaN

Step 3) Unstack:

df = df.unstack()
print(df)

produces:

0  value       AA
   value_1     BB
1  value       BB
   value_1     CC
2  value       CC
   value_1    NaN
dtype: object

Step 4) Drop NaN values:

df = df.dropna(how = 'any')
print(df)

produces:

0  value      AA
   value_1    BB
1  value      BB
   value_1    CC
2  value      CC
dtype: object

Step 5) Get the NumPy representation of the result (a Series at this point) and print it value by value:

df = df.values
for x in df:
    print(x)

produces:

AA
BB
BB
CC
CC
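
The same interleaved output can also be built without transposing and unstacking, by zipping the column with its shifted copy. This is a shorter variation on the steps above, not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# shift(-1) moves every value up one row, so each row lines up with the
# next row's value; the last row gets NaN.
next_values = df['value'].shift(-1)

# Flatten the (current, next) pairs and skip the trailing NaN.
flat = [
    v
    for pair in zip(df['value'], next_values)
    for v in pair
    if pd.notna(v)
]

for v in flat:
    print(v)
```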

Answer 5

Answer by R.V

A combination of the answers gave me a very fast running time: use the shift method to create a new column of shifted values, then use a row iterator as @alisdt did, but switch it from iterrows to itertuples, which was about 100 times faster in my case.

My script iterates over a DataFrame containing runs of duplicates of varying length and adds one second to each duplicate so that all values become unique.

# create a new column with shifted values from the departure time column
# (shift(1) aligns each row with the previous row's value)
df['next_column_value'] = df['column_value'].shift(1)
# create a row iterator so rows can be consumed outside the for loop
row_iterator = df.itertuples()
# take the first row from the iterator
last = next(row_iterator)
# tuples from itertuples() are immutable, so keep the running value in a separate variable
t = last[your_column_num]
# add one more second to each successive duplicate
for row in row_iterator:
    if row.column_value == row.next_column_value:
        t = t + add_sec
        df_result.at[row.Index, 'column_name'] = t
    else:
        # reset 'last' and 't' at the start of a new run
        last = row
        t = last[your_column_num]

Hope this helps.
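
A self-contained sketch of the same idea on made-up data may make the script easier to follow. The column names departure and prev_departure, the sample timestamps, and the result Series are all assumptions for illustration; the original answer leaves these as placeholders.

```python
import pandas as pd

# Hypothetical data: runs of duplicated departure times.
df = pd.DataFrame({'departure': pd.to_datetime([
    '2020-01-01 10:00:00',
    '2020-01-01 10:00:00',
    '2020-01-01 10:00:00',
    '2020-01-01 11:00:00',
    '2020-01-01 11:00:00',
])})

# shift(1) places the previous row's value next to the current one.
df['prev_departure'] = df['departure'].shift(1)

add_sec = pd.Timedelta(seconds=1)
result = df['departure'].copy()

running = None
for row in df.itertuples():
    if row.departure == row.prev_departure:
        # Duplicate of the previous row: push it one second further.
        running = running + add_sec
        result.at[row.Index] = running
    else:
        # Start of a new run (the first row compares against NaT, which is never equal).
        running = row.departure

print(result)
```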
