How to iterate over rows in a DataFrame in Pandas?

Question

Asked by Roman

I have a DataFrame from pandas:

import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print(df)

Output:

   c1   c2
0  10  100
1  11  110
2  12  120

Now I want to iterate over the rows of this frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. For example:

for row in df.rows:  # desired syntax; DataFrame has no .rows attribute
    print(row['c1'], row['c2'])

Is it possible to do that in pandas?

I found this similar question. But it does not give me the answer I need. For example, it is suggested there to use:

for date, row in df.T.iteritems():

or

for row in df.iterrows():

But I do not understand what the row object is and how I can work with it.

Answer 1

Accepted answer by waitingkuo

DataFrame.iterrows is a generator which yields both the index and the row (as a Series):

import pandas as pd
import numpy as np

df = pd.DataFrame([{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}])

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

Output: 
   10 100
   11 110
   12 120
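
To address the question of what the row object is, here is a small sketch that reuses the question's data and inspects a single item from the generator. Each item is an (index, Series) pair, and the Series is labelled by the column names.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Take only the first item from the generator to inspect it.
index, row = next(df.iterrows())

print(type(row))            # a pandas Series
print(row.index.tolist())   # the column labels of the frame
print(row['c1'], row.c2)    # label access and attribute access both work
```

Because the row is an ordinary Series, anything that works on a Series (label lookup, .to_dict(), arithmetic) works on it.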

Answer 2

Answered by Wes McKinney

You should use df.iterrows(), though iterating row by row is not especially efficient, since a Series object has to be created for each row.
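
As a rough illustration of that cost, the sketch below computes the same per-row sum twice: once through iterrows(), and once with a column-wise expression that never builds a Series per row. The sum itself is only an assumed example task, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Row-by-row version: one Series is constructed for every row.
looped = [row['c1'] + row['c2'] for _, row in df.iterrows()]

# Column-wise version: a single vectorized operation over whole columns.
vectorized = (df['c1'] + df['c2']).tolist()

print(looped)
print(vectorized)
```

When the work per row can be written as an expression over columns, the second form is usually the better choice; iterrows() remains useful when the per-row logic cannot be expressed that way.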

Answer 3

Answered by cheekybastard

You can also use df.apply() to iterate over rows and pass multiple columns to a function.

docs: DataFrame.apply()

def valuation_formula(x, y):
    return x * y * 0.5

df['price'] = df.apply(lambda row: valuation_formula(row['x'], row['y']), axis=1)
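
The snippet above assumes a frame with columns 'x' and 'y', which the answer does not show. A self-contained sketch with made-up values for those columns might look like this:

```python
import pandas as pd

def valuation_formula(x, y):
    return x * y * 0.5

# Hypothetical data; the answer does not show the frame it used.
df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# axis=1 passes each row to the lambda as a Series.
df['price'] = df.apply(lambda row: valuation_formula(row['x'], row['y']), axis=1)

# For plain arithmetic like this, passing whole columns gives the same values.
df['price_vectorized'] = valuation_formula(df['x'], df['y'])

print(df)
```

The apply() form is the more general one, since the function can contain arbitrary Python logic per row.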

Answer 4

Answered by e9t

While iterrows() is a good option, itertuples() can sometimes be much faster:

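import pandas as pd
from numpy.random import randn, randint

# %timeit below is an IPython magic, so run this in IPython or Jupyter.
df = pd.DataFrame({'a': randn(1000), 'b': randn(1000),'N': randint(100, 1000, (1000)), 'x': 'x'})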

%timeit [row.a * 2 for idx, row in df.iterrows()]
# => 10 loops, best of 3: 50.3 ms per loop

%timeit [row[1] * 2 for row in df.itertuples()]
# => 1000 loops, best of 3: 541 μs per loop
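
Outside IPython, a similar comparison can be sketched with the standard timeit module. The helper names with_iterrows and with_itertuples are made up for this example, and the timings will differ by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'a': rng.standard_normal(1000),
    'b': rng.standard_normal(1000),
    'N': rng.integers(100, 1000, 1000),
    'x': 'x',
})

def with_iterrows():
    return [row.a * 2 for idx, row in df.iterrows()]

def with_itertuples():
    # row[0] is the index, so row[1] is column 'a'.
    return [row[1] * 2 for row in df.itertuples()]

t_rows = timeit.timeit(with_iterrows, number=5)
t_tuples = timeit.timeit(with_itertuples, number=5)

print(f"iterrows:   {t_rows:.4f} s for 5 runs")
print(f"itertuples: {t_tuples:.4f} s for 5 runs")
```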

Answer 5

Answered by PJay

You can use the df.iloc indexer as follows:

for i in range(len(df)):
    print(df.iloc[i]['c1'], df.iloc[i]['c2'])

Answer 6

Answered by viddik13

First consider if you really need to iterate over rows in a DataFrame. See this answer for alternatives.

If you still need to iterate over rows, you can use the methods below. Note some important caveats which are not mentioned in any of the other answers.

DataFrame.iterrows()

for index, row in df.iterrows():
    print(row["c1"], row["c2"])

DataFrame.itertuples()

for row in df.itertuples(index=True, name='Pandas'):
    print(row.c1, row.c2)

itertuples() is supposed to be faster than iterrows().

But be aware, according to the docs (pandas 0.24.2 at the moment):

iterrows: dtype might not match from row to row
Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). To preserve dtypes while iterating over the rows, it is better to use itertuples(), which returns namedtuples of the values and which is generally much faster than iterrows().

iterrows: Do not modify rows
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
Use DataFrame.apply() instead:
```
new_df = df.apply(lambda x: x * 2)
```

itertuples:
The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.
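
Two of these caveats can be seen in a small sketch. The frame and its column names (int_col, float_col, _private) are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5], '_private': [7, 8]})

# iterrows: the whole row becomes one Series, so the integer is upcast to float.
_, first_row = next(df.iterrows())
print(first_row['int_col'])   # 1.0, no longer an integer

# itertuples: values keep their per-column types, but '_private' is
# renamed to a positional name because it starts with an underscore.
first_tuple = next(df.itertuples())
print(first_tuple._fields)
print(first_tuple.int_col)
```

If the original column names matter, checking `_fields` first is a cheap way to see whether any were renamed.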

See the pandas docs on iteration for more details.

Answer 7

Answered by CONvid19

To loop over all rows in a dataframe you can use:

for x in range(len(date_example.index)):
    print(date_example['Date'].iloc[x])

Answer 8

Answered by Grag2015

for ind in df.index:
    print(df['c1'][ind], df['c2'][ind])
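
The loop above looks each value up in two steps (column first, then label). As a variation, not part of the original answer, the same loop can be sketched with the df.at scalar accessor, which takes the row label and column name together:

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

pairs = []
for ind in df.index:
    # df.at[row_label, column_label] fetches a single scalar.
    pairs.append((df.at[ind, 'c1'], df.at[ind, 'c2']))

print(pairs)
```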

Answer 9

Answered by piRSquared

You can write your own iterator that yields namedtuples:

from collections import namedtuple

def myiter(d, cols=None):
    if cols is None:
        v = d.values.tolist()
        cols = d.columns.values.tolist()
    else:
        j = [d.columns.get_loc(c) for c in cols]
        v = d.values[:, j].tolist()

    n = namedtuple('MyTuple', cols)

    for line in iter(v):
        yield n(*line)

This is directly comparable to pd.DataFrame.itertuples. I'm aiming at performing the same task with more efficiency.

For the given dataframe with my function:

list(myiter(df))

[MyTuple(c1=10, c2=100), MyTuple(c1=11, c2=110), MyTuple(c1=12, c2=120)]

Or with pd.DataFrame.itertuples:

list(df.itertuples(index=False))

[Pandas(c1=10, c2=100), Pandas(c1=11, c2=110), Pandas(c1=12, c2=120)]
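
The cols argument is not exercised in the calls above, so here is a short sketch of it, with myiter repeated so the block runs on its own. It also shows one difference from itertuples worth keeping in mind: d.values produces a single array, so columns of mixed numeric types are coerced to a common dtype. The second frame (columns i and f) is invented for illustration.

```python
from collections import namedtuple

import pandas as pd

def myiter(d, cols=None):
    if cols is None:
        v = d.values.tolist()
        cols = d.columns.values.tolist()
    else:
        j = [d.columns.get_loc(c) for c in cols]
        v = d.values[:, j].tolist()

    n = namedtuple('MyTuple', cols)

    for line in iter(v):
        yield n(*line)

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Restrict the tuples to a subset of columns.
only_c2 = list(myiter(df, ['c2']))
print(only_c2)

# Mixed int/float columns share one array, so the int comes back as a float.
mixed = list(myiter(pd.DataFrame({'i': [1], 'f': [1.5]})))
print(mixed)
```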

A comprehensive test
We test making all columns available and subsetting the columns.

from timeit import timeit

import numpy as np
import pandas as pd

def iterfullA(d):
    return list(myiter(d))

def iterfullB(d):
    return list(d.itertuples(index=False))

def itersubA(d):
    return list(myiter(d, ['col3', 'col4', 'col5', 'col6', 'col7']))

def itersubB(d):
    return list(d[['col3', 'col4', 'col5', 'col6', 'col7']].itertuples(index=False))

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns='iterfullA iterfullB itersubA itersubB'.split(),
    dtype=float
)

for i in res.index:
    d = pd.DataFrame(np.random.randint(10, size=(i, 10))).add_prefix('col')
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=100)

res.groupby(res.columns.str[4:-1], axis=1).plot(loglog=True);

Answer 10

Answered by James L.

You can also use numpy indexing for even greater speedups. It's not really iterating, but it works much better than iteration for certain applications.

subset = row['c1'][0:5]
all = row['c1'][:]

You may also want to cast it to an array. These indexes/selections are supposed to act like NumPy arrays already, but I ran into issues and needed to cast:

np.asarray(all)
imgs[:] = cv2.resize(imgs[:], (224,224) ) #resize every image in an hdf5 file
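
The snippets in this answer are fragments, so here is a self-contained sketch of the same idea on the question's frame. It assumes the column is selected from the DataFrame itself, and it uses .iloc for the positional slice to keep the intent explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Positional slice of one column; asking for more rows than exist is fine.
subset = df['c1'].iloc[0:5]

# Explicit cast of the column to a NumPy array.
arr = np.asarray(df['c1'])

# Operates on every element at once, with no Python-level loop.
doubled = arr * 2

print(subset.tolist())
print(doubled)
```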
