Python iterrows pandas: get the next row's value
Original URL: http://stackoverflow.com/questions/23151246/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Ayrat
I have a DataFrame df in pandas:
import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])
I want to iterate over the rows in df. For each row I want the row's value and the next row's value.
Something like this (it does not work):
for i, row in df.iterrows():
    print(row['value'])
    i1, row1 = next(df.iterrows())
    print(row1['value'])
As a result, I want:
'AA'
'BB'
'BB'
'CC'
'CC'
*(the last row has no next row, so this is where an index error occurs)
At this point I have a messy way to solve this:
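# df.irow() has been removed from pandas; df.iloc[i] is the positional equivalent
for i in range(0, df.shape[0]):
    print(df.iloc[i]['value'])
    print(df.iloc[i + 1]['value'])  # raises IndexError on the last row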
Is there a more efficient way to solve this?
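Why the first attempt misbehaves: every call to df.iterrows() builds a brand-new iterator, so next(df.iterrows()) always returns the first row, not the row after the current one. A minimal check, assuming the same three-row df:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

seen = []
for i, row in df.iterrows():
    # A fresh iterator is created on every call, so this is always row 0.
    _, first = next(df.iterrows())
    seen.append((row['value'], first['value']))

print(seen)  # [('AA', 'AA'), ('BB', 'AA'), ('CC', 'AA')]
```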
Accepted answer by alisdt
First, your "messy way" is OK: there's nothing wrong with indexing into the DataFrame by position, and it will not be too slow. iterrows() itself isn't terribly fast.
A working version of your first idea would be:
row_iterator = df.iterrows()
_, last = next(row_iterator)  # take the first item from row_iterator
for i, row in row_iterator:
    print(last['value'])  # current row
    print(row['value'])   # next row
    last = row
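An iterator-based loop like this stops pairing at the second-to-last row, so the trailing 'CC' from the desired output never appears. A minimal sketch of a generator that also yields the last row paired with None; the name with_next is hypothetical, not a pandas API:

```python
import pandas as pd


def with_next(df):
    """Yield (row, next_row) pairs from df.iterrows(); next_row is None for the last row."""
    rows = df.iterrows()
    try:
        _, last = next(rows)  # take the first row up front
    except StopIteration:
        return  # empty DataFrame: nothing to yield
    for _, row in rows:
        yield last, row
        last = row
    yield last, None  # the final row has no successor


df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])
out = []
for row, nxt in with_next(df):
    out.append(row['value'])
    if nxt is not None:
        out.append(nxt['value'])
print(out)  # ['AA', 'BB', 'BB', 'CC', 'CC']
```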
The second method could do something similar, saving one indexing operation into the DataFrame per iteration:
last = df.iloc[0]  # iloc replaces the removed irow
for i in range(1, df.shape[0]):
    row = df.iloc[i]  # one positional lookup per iteration
    print(last['value'])
    print(row['value'])
    last = row
When speed is critical, you can always try both and time them.
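A rough way to time both approaches with the standard-library timeit module. The function names and the 300-row test frame are hypothetical, and the timings depend on your machine and pandas version, so no numbers are shown here:

```python
import timeit

import pandas as pd

# Hypothetical test data: 300 rows built from the question's three values.
df = pd.DataFrame({'value': ['AA', 'BB', 'CC'] * 100})


def pairs_iterrows(frame):
    """Collect (current, next) value pairs with a single iterrows() iterator."""
    rows = frame.iterrows()
    _, last = next(rows)  # take the first row up front
    out = []
    for _, row in rows:
        out.append((last['value'], row['value']))
        last = row
    return out


def pairs_iloc(frame):
    """Collect (current, next) value pairs by positional lookups with iloc."""
    return [(frame.iloc[i]['value'], frame.iloc[i + 1]['value'])
            for i in range(len(frame) - 1)]


# Both approaches must agree before their timings are worth comparing.
assert pairs_iterrows(df) == pairs_iloc(df)
for fn in (pairs_iterrows, pairs_iloc):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f'{fn.__name__}: {seconds:.4f} s for 3 runs')
```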
Answer by Acorbe
This can also be solved by zipping the DataFrame's row iterator with an offset version of itself.
Of course, the index error cannot occur this way.
Check this out
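import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])
# Python 3: the built-in zip replaces itertools.izip, and iloc replaces the removed .ix.
# zip stops at the shorter iterator, so the last row is never paired with a missing "next" row.
for id1, id2 in zip(df.iterrows(), df.iloc[1:].iterrows()):
    print(id1[1]['value'])
    print(id2[1]['value'])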
which gives
AA
BB
BB
CC
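zip stops at the shorter of its inputs, which is why the final 'CC' is missing from that output. If it is wanted, itertools.zip_longest pads the shorter input with a fill value instead. A minimal sketch, which (unlike the answer above) iterates over the column's values rather than over full rows:

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

out = []
# Pair each value with the value one position later; the last value is paired with None.
for cur, nxt in zip_longest(df['value'], df['value'].iloc[1:], fillvalue=None):
    out.append(cur)
    if nxt is not None:
        out.append(nxt)
print(out)  # ['AA', 'BB', 'BB', 'CC', 'CC']
```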
Answer by HYRY
There is a pairwise() function example in the itertools documentation:
from itertools import tee
import pandas as pd

def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3: the built-in zip replaces itertools.izip

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    print(i1, i2, row1["value"], row2["value"])
Here is the output:
0 1 AA BB
1 2 BB CC
But I think iterating over the rows of a DataFrame is slow. If you can explain the problem you want to solve, maybe I can suggest a better method.
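What a "better method" looks like depends on the actual problem, so the following is only an illustrative assumption: pairing each value with its successor through shift(-1) avoids Python-level row iteration entirely, for example to flag rows whose next value is identical. The sample data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'BB', 'CC'], columns=['value'])

# shift(-1) moves every value up one row, so each row sees its successor;
# the last row gets NaN because it has no successor.
df['next_value'] = df['value'].shift(-1)
# Element-wise comparison; comparing against NaN yields False.
df['same_as_next'] = df['value'].eq(df['next_value'])
print(df)
```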
Answer by Anna K.
I would use the shift() function as follows:
df['value_1'] = df.value.shift(-1)
# a plain loop instead of a list comprehension used only for its print() side effect
for x in df.T.unstack().dropna(how='any').values:
    print(x)
which produces
AA
BB
BB
CC
CC
This is how the code above works:
Step 1) Use the shift function:
df['value_1'] = df.value.shift(-1)
print(df)
produces
  value value_1
0    AA      BB
1    BB      CC
2    CC     NaN
Step 2) Transpose:
df = df.T
print(df)
produces:
          0   1    2
value    AA  BB   CC
value_1  BB  CC  NaN
Step 3) Unstack:
df = df.unstack()
print(df)
produces:
0  value       AA
   value_1     BB
1  value       BB
   value_1     CC
2  value       CC
   value_1    NaN
dtype: object
Step 4) Drop NaN values:
df = df.dropna(how = 'any')
print(df)
produces:
0  value      AA
   value_1    BB
1  value      BB
   value_1    CC
2  value      CC
dtype: object
Step 5) Take the NumPy representation of the resulting Series and print it value by value:
df = df.values
for x in df:
    print(x)
produces:
AA
BB
BB
CC
CC
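One caveat: dropna(how='any') in step 4 would also remove genuine missing values from the value column, not only the padding introduced by shift. A minimal sketch of a swapped-in alternative (not part of the answer above) that interleaves the two columns with NumPy and drops only the final padded entry:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

current = df['value'].to_numpy()
following = df['value'].shift(-1).to_numpy()
# column_stack builds rows [cur_i, next_i]; ravel() flattens them row by row into
# [cur0, next0, cur1, next1, ...]. The last element is the NaN padding for the
# final row's missing successor, so it is sliced off.
interleaved = np.column_stack([current, following]).ravel()[:-1]
for x in interleaved:
    print(x)
```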
Answer by R.V
A combination of answers gave me a very fast running time. I used the shift method to create a new column holding the neighboring row's value, then used the row_iterator approach as @alisdt did, but changed it from iterrows to itertuples, which was about 100 times faster in my case.
My script iterates over a DataFrame containing runs of duplicate values of different lengths and adds one second to each duplicate so that they all become unique.
# Create a new column with shifted values from the departure-time column.
# shift(1) moves values down one row, so each row sees the value of the row above it.
df['next_column_value'] = df['column_value'].shift(1)
# Create a row iterator so the loop can continue after the first row
row_iterator = df.itertuples()
# Take the first row from the row iterator
last = next(row_iterator)
# itertuples() yields immutable namedtuples, so keep the running time in a separate variable.
# your_column_num is a placeholder; field 0 of each tuple is the Index, so positions are offset by 1.
t = last[your_column_num]
# Add one more second to each duplicate in a run.
# add_sec (one second), df_result and 'column_name' are placeholders from the original answer.
for row in row_iterator:
    if row.column_value == row.next_column_value:
        t = t + add_sec
        df_result.at[row.Index, 'column_name'] = t
    else:
        # Start of a new run: reset 'last' and 't'
        last = row
        t = last[your_column_num]
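For comparison, a vectorized sketch of the same idea, assuming the goal is: within each run of consecutive equal timestamps, keep the first one and add 1, 2, ... seconds to the following ones. The column name departure and the sample data are hypothetical. Like the loop above, it does not check whether a shifted time collides with the next run:

```python
import pandas as pd

df = pd.DataFrame({'departure': pd.to_datetime([
    '2020-01-01 08:00:00', '2020-01-01 08:00:00', '2020-01-01 08:00:00',
    '2020-01-01 09:00:00', '2020-01-01 09:00:00',
])})

# A new run starts wherever the value differs from the previous row's value
# (the first row compares against NaT, which counts as different).
run_id = df['departure'].ne(df['departure'].shift(1)).cumsum()
# Position within each run: 0 for the first row, 1 for the second, ...
offset = df.groupby(run_id).cumcount()
df['unique_departure'] = df['departure'] + pd.to_timedelta(offset, unit='s')
print(df)
```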
Hope it helps.