pandas: a more elegant way to loop through a DataFrame in Python

Question

Asked by jingz

For a single iterable, we can loop through it using:

for item in items:

But what if I have two iterables side by side? Think of a pandas DataFrame with two columns, for example. I can use the approach above to loop through one column, but is there a more elegant way to loop through both columns at the same time?

import pandas as pd
df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})
i = 0
for j in df['col 1']:
    print(j)
    print(df['col 2'][i])
    i += 1

Thanks!


Answer 1

Answered by Konstantinos Charalampous

You can iterate over entire rows, which is more elegant:

for index, row in df.iterrows():
    print(row['col 1'], row['col 2'])
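One caveat worth knowing about this approach: iterrows() hands back each row as a Series, so values from columns of different types are converted to a single common type. The sketch below uses a hypothetical frame named mixed (not from the question) with one integer and one float column to illustrate this.

```python
import pandas as pd

# Hypothetical frame (an assumption for illustration): one int column, one float column.
mixed = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

rows = []
for index, row in mixed.iterrows():
    # Each row is a Series, so both values share one dtype (float64 here).
    rows.append((index, row["count"], row["ratio"]))

first_count = rows[0][1]
# The column holds integers, but the value read through iterrows() is a float.
print(mixed["count"].dtype.kind, type(first_count).__name__)
```

In the question's frame both columns are integers, so nothing changes there; the conversion only shows up when column types differ.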

Answer 2

Answered by LetEpsilonBeLessThanZero

You've already gotten some great answers to your question. However, I would also like to provide you with a different approach altogether which could be even more elegant (depending on what your end goal is).


As a general rule of thumb, you want to avoid looping through the rows of a DataFrame. That tends to be slow, and there is usually a better way. Try to shift your thinking toward applying a function to an entire "vector" (a fancy word for a DataFrame column).

Check this out:


import pandas as pd
import numpy as np

df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})

def sum_2_cols(col1,col2):
    return col1 + col2

df['new_col'] = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])

The np.vectorize function is flexible: it lets you apply your own function element by element to DataFrame columns. Be aware that NumPy documents it as a convenience wrapper that is essentially a loop, not a performance tool, so it will not match true array operations for speed. Try it out; you might get inspired to go about solving your problem in a different way.
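For a function as simple as adding two columns, plain column arithmetic gives the same result without a wrapper. This is a minimal sketch reusing the frame and sum_2_cols from the answer; the names via_vectorize and via_columns are introduced here only for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5], 'col 2': [6, 7, 8, 9, 10]})

def sum_2_cols(col1, col2):
    return col1 + col2

# np.vectorize calls sum_2_cols once per pair of elements.
via_vectorize = np.vectorize(sum_2_cols)(df['col 1'], df['col 2'])

# Column arithmetic operates on the whole arrays at once.
via_columns = df['col 1'] + df['col 2']

print(via_columns.tolist())  # [7, 9, 11, 13, 15]
```

np.vectorize remains handy when the function contains logic that cannot be written as array operations, such as branching on each value.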

Answer 3

Answered by Daniel Labbe

Use the DataFrame.itertuples() method to loop through both columns at the same time:

for i, j in df[['col 1', 'col 2']].itertuples(index=False):
    print(i)
    print(j)
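itertuples() yields named tuples, so fields can also be read by attribute instead of unpacked by position. That only works under the column's own name when the name is a valid Python identifier, which 'col 1' is not. The sketch below therefore assumes a hypothetical frame with underscored column names.

```python
import pandas as pd

# Hypothetical frame (an assumption): underscores make the names valid attributes.
frame = pd.DataFrame({'col_1': [1, 2, 3], 'col_2': [6, 7, 8]})

totals = []
for row in frame.itertuples(index=False):
    # Each row is a named tuple with one field per column.
    totals.append(int(row.col_1 + row.col_2))

print(totals)  # [7, 9, 11]
```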

Answer 4

Answered by Rodolfo Don? Hosp

The built-in zip function creates an iterator that aggregates elements from each of the iterables you pass in, so this is another alternative:

import pandas as pd
df = pd.DataFrame({'col 1': [1,2,3,4,5], 'col 2': [6,7,8,9,10]})
for i,j in zip(df['col 1'], df['col 2']):
    print(i)
    print(j)

Output:

1
6
2
7
3
8
4
9
5
10
