pandas: looping through a list of pandas DataFrames

Question

Asked by Iwan Thomas

Two quick pandas questions for you.

I have a list of dataframes I would like to apply a filter to.
```
countries = [us, uk, france]
for df in countries:
    df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')] 
```
When I run this, the DataFrames don't change afterwards. Why is that? If I loop through the DataFrames to create a new column, as below, this works fine and changes each DataFrame in the list.
```
for df in countries:
    df["Continent"] = "Europe"
```
As a follow-up question, I noticed something strange when I created a list of DataFrames for different countries. I defined the list, then applied transformations to each DataFrame in the list. After I transformed these DataFrames, I called the list again. I was surprised to see that the list still pointed to the unchanged DataFrames, and I had to redefine the list to update the results. Could anybody shed any light on why that is?

Answer 1

Answered by miradulo

Taking a look at this answer, you can see that `for df in countries:` is equivalent to something like

```
for idx in range(len(countries)):
    df = countries[idx]
    # do something with df
```

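which won't actually modify anything in your list: assigning to `df` only rebinds the loop variable to a new object, while the list keeps holding the original DataFrames. It is generally bad practice to modify a list while iterating over it in a loop like this.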
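
A minimal sketch of that difference, assuming small made-up frames named after the question's `us`, `uk` and `france` (the dates and the helper `in_november` are hypothetical). Assigning back through the list index with `enumerate` is one alternative to the list comprehension below; it replaces the list's elements rather than just rebinding `df`.

```python
import pandas as pd

# Hypothetical stand-ins for the question's us/uk/france DataFrames.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10", "2016-12-05"])})
uk = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-02", "2016-11-29", "2017-01-01"])})
france = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-15", "2016-09-30"])})

countries = [us, uk, france]


def in_november(df):
    """Hypothetical helper: keep rows strictly between the question's two dates."""
    mask = (df["Send Date"] > "2016-11-01") & (df["Send Date"] < "2016-11-30")
    return df[mask]


# Rebinding the loop variable: each filtered frame is built, then discarded.
for df in countries:
    df = in_november(df)
print([len(df) for df in countries])  # [3, 3, 2]: the list is unchanged

# Assigning through the index replaces the list's elements themselves.
for i, df in enumerate(countries):
    countries[i] = in_november(df)
print([len(df) for df in countries])  # [1, 2, 1]
```

Note that the name `us` still refers to the unfiltered three-row frame afterwards; only the list slots were replaced.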

A better approach would be a list comprehension; you can try something like

```
countries = [us, uk, france]
countries = [df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
             for df in countries]
```

Notice that with a list comprehension like this, we aren't actually modifying the original list; instead, we are creating a new list and assigning it to the variable that held our original list.
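
A minimal check of that point, using hypothetical two-row frames. It also bears on the follow-up question: `us` and `uk` are separate names, so they keep pointing at the unfiltered DataFrames until you rebind them yourself (the unpacking at the end is one way to do that).

```python
import pandas as pd

# Hypothetical example frames standing in for the question's data.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])})
uk = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-02", "2017-01-01"])})

countries = [us, uk]
original_list = countries  # a second name for the same list object

countries = [
    df[(df["Send Date"] > "2016-11-01") & (df["Send Date"] < "2016-11-30")]
    for df in countries
]

print(countries is original_list)         # False: a brand-new list
print([len(df) for df in original_list])  # [2, 2]: the old list is untouched
print(len(us))                            # 2: us still refers to the unfiltered frame

# If you still want the individual names, rebind them from the new list.
us, uk = countries
print(len(us), len(uk))                   # 1 1
```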

Also, you might consider placing all of your data in a single DataFrame with an additional country column, or something along those lines. Python-level loops are generally slower, and a list of DataFrames is often much less convenient to work with than a single DataFrame, which can fully leverage pandas' vectorized methods.
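
A sketch of that suggestion, assuming hypothetical frames and a column name `Country` chosen for illustration; `pd.concat`, `DataFrame.assign` and `groupby(...).size()` are standard pandas calls. The frames are stacked once, filtered with one vectorized mask, and per-country results are recovered with a groupby.

```python
import pandas as pd

# Hypothetical per-country frames; in practice these would be the question's data.
frames = {
    "us": pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])}),
    "uk": pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-02", "2016-11-29"])}),
    "france": pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-15", "2016-09-30"])}),
}

# Stack everything into one DataFrame, recording the origin in a "Country" column.
combined = pd.concat(
    [df.assign(Country=name) for name, df in frames.items()],
    ignore_index=True,
)

# A single vectorized filter replaces one filter per DataFrame.
mask = (combined["Send Date"] > "2016-11-01") & (combined["Send Date"] < "2016-11-30")
november = combined[mask]

# Per-country counts are still one groupby away (keys come back sorted).
counts = november.groupby("Country").size()
print(counts.to_dict())  # {'france': 1, 'uk': 2, 'us': 1}
```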

Answer 2

Answered by Janet Lu

For why

```
for df in countries:
    df["Continent"] = "Europe"
```

modifies countries, while

```
for df in countries:
    df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
```

does not, see "why should I make a copy of a data frame in pandas". `df` is a reference to the actual DataFrame in `countries`, not a copy of it, so modifications made through the reference affect the original DataFrame as well. Adding a new column is such a modification: it changes the DataFrame in place. Taking a subset, however, is not a modification. Filtering builds a new DataFrame, and `df = ...` only changes which object the name `df` refers to; the list still refers to the original, unfiltered DataFrame.
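
A minimal sketch of mutation versus rebinding, using one hypothetical frame. The same mechanism explains the follow-up question: reassigning a name such as `us` after building the list does not change what the list holds.

```python
import pandas as pd

# Hypothetical frame standing in for one of the question's DataFrames.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])})
countries = [us]

df = countries[0]
print(df is us)  # True: both names refer to one DataFrame object

# Mutation: adds a column to that shared object, so every reference sees it.
df["Continent"] = "Europe"
print("Continent" in us.columns)  # True

# Rebinding: filtering builds a NEW DataFrame and points the name df at it.
df = df[df["Send Date"] > "2016-11-01"]
print(df is countries[0])  # False
print(len(countries[0]))   # 2: the list still holds the original object

# Same effect as in the follow-up question: rebinding us does not update the list.
us = us[us["Send Date"] > "2016-11-01"]
print(countries[0] is us)  # False
```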
