More elegant way to loop through a dataframe in Python
Note: this page reproduces a popular StackOverFlow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/54697342/
Asked by jingz
For one iterable, we can loop through using
for item in items:
But what if I have two iterables side by side, for example a pandas dataframe with 2 columns? I can use the above approach to loop through one column, but is there a more elegant way to loop through both columns at the same time?
import pandas as pd

df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})

# Loop over one column and track the row position by hand to read the other
i = 0
for j in df['col 1']:
    print(j)
    print(df['col 2'][i])
    i += 1
Thanks!
Answered by Konstantinos Charalampous
You can iterate through entire rows, which is more elegant:
for index, row in df.iterrows():
    print(row['col 1'], row['col 2'])
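As a small sketch of what iterrows() yields, assuming the df from the question: each iteration gives the row label and the row itself as a pandas Series, so both values can be read by column name. The list name pairs is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5], 'col 2': [6, 7, 8, 9, 10]})

pairs = []
for index, row in df.iterrows():
    # row is a pandas Series holding one row, keyed by column name
    pairs.append((int(row['col 1']), int(row['col 2'])))

print(pairs)
```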
Answered by LetEpsilonBeLessThanZero
You've already gotten some great answers to your question. However, I would also like to provide you with a different approach altogether which could be even more elegant (depending on what your end goal is).
As a general rule of thumb, you want to avoid looping through the rows of a dataframe. That tends to be slow, and there's usually a better way. Try to shift your thinking toward applying a function to an entire "vector" (a fancy word for a dataframe column).
Check this out:
import pandas as pd
import numpy as np

df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})

def sum_2_cols(col1, col2):
    return col1 + col2

# np.vectorize wraps the scalar function so it can be called on whole columns
df['new_col'] = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])
np.vectorize is powerful and flexible: it lets you apply your own scalar function across whole dataframe columns without writing the loop yourself. Note, however, that the NumPy documentation describes it as provided primarily for convenience, not for performance (the implementation is essentially a for loop), so for simple arithmetic like this sum a direct column expression is typically faster. Try it out, you might get inspired to go about solving your problem in a different way.
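For the simple sum in this answer, the same result can also be produced with plain column arithmetic, which pandas evaluates element-wise. A minimal sketch, assuming the same df as above; new_col_direct is a hypothetical column name added only so the two results can be compared:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5], 'col 2': [6, 7, 8, 9, 10]})

def sum_2_cols(col1, col2):
    return col1 + col2

# Approach from the answer: wrap the scalar function with np.vectorize
df['new_col'] = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])

# Hypothetical comparison column: element-wise addition of the two Series
df['new_col_direct'] = df['col 1'] + df['col 2']

# Both approaches should give the same values
same = df['new_col'].tolist() == df['new_col_direct'].tolist()
print(df['new_col_direct'].tolist(), same)
```

np.vectorize remains useful when the function contains logic that has no direct column-level equivalent.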
Answered by Daniel Labbe
Use the DataFrame.itertuples() method to loop through both columns at the same time:
for i, j in df[['col 1', 'col 2']].itertuples(index=False):
    print(i)
    print(j)
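If only the values are needed, itertuples() also accepts name=None, which yields plain tuples rather than namedtuples. A minimal sketch, assuming the df from the question; the variable name rows is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5], 'col 2': [6, 7, 8, 9, 10]})

# index=False drops the row label; name=None yields plain tuples
rows = list(df[['col 1', 'col 2']].itertuples(index=False, name=None))

for i, j in rows:
    print(i, j)
```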
Answered by Rodolfo Don? Hosp
The zip built-in function creates an iterator that aggregates elements from the iterables you pass as arguments, so this is an alternative:
import pandas as pd

df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})

# zip pairs up the two columns value by value
for i, j in zip(df['col 1'], df['col 2']):
    print(i)
    print(j)
Output:
1
6
2
7
3
8
4
9
5
10