Disclaimer: this page reproduces a popular StackOverflow question and its accepted answer under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38040825/

Date: 2020-09-14 01:27:56  Source: igfitidea

Appending Pandas dataframes in for loop results in ValueError

Tags: python, python-3.x, pandas, dataframe

Asked by user1718097

I want to generate a dataframe made up of separate dataframes generated in a for loop. Each individual dataframe consists of a name column, a range of integers and a column identifying the category to which each integer belongs (e.g. quintiles 1 to 5). If I generate each dataframe individually and then append one to the other to create a 'master' dataframe, there are no problems. However, when I use a loop to create each individual dataframe (as I will need to do in my real-life situation), trying to append a dataframe to the master dataframe results in:

ValueError: incompatible categories in categorical concat

I've written a simplified loop to illustrate:

import numpy as np
import pandas as pd

# Define column names
colNames = ('a','b','c')

# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns = colNames)

# A list of the group names
names = ['Group1','Group2','Group3']

# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns = colNames)
    tempDF['a'] = np.arange(1,11,1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1,11,1),
                        bins = np.linspace(0,10,6),
                        labels = [1,2,3,4,5])
    print(tempDF)
    print('\n')

    # Try to append temporary DF to master DF
    # (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0)
    masterDF = masterDF.append(tempDF,ignore_index=True)

print(masterDF)
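
The binning step used for column c can be looked at on its own. The sketch below repeats the same pd.cut call outside the loop (the variable names values, edges and c are only for illustration) to show how the ten integers map onto the five labels and that the result is a Categorical rather than a plain integer array:

```python
import numpy as np
import pandas as pd

# Same inputs as the loop above: integers 1..10 and six bin edges
values = np.arange(1, 11, 1)
edges = np.linspace(0, 10, 6)

# Right-closed intervals (0, 2], (2, 4], ..., (8, 10], labelled 1..5
c = pd.cut(values, bins=edges, labels=[1, 2, 3, 4, 5])

print(edges.tolist())        # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print([int(x) for x in c])   # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(c.dtype)               # category
```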

I would expect a dataframe that looked like:

     a       b  c
 0   1  Group1  1
 1   2  Group1  1
 2   3  Group1  2
 3   4  Group1  2
 4   5  Group1  3
 5   6  Group1  3
 6   7  Group1  4
 7   8  Group1  4
 8   9  Group1  5
 9  10  Group1  5
10  11  Group2  1
11  12  Group2  1
12  13  Group2  2
13  14  Group2  2
...
28  29  Group3  5
29  30  Group3  5

It seems that a partial solution can be obtained by typecasting the categories as they are added to the tempDF as follows:

tempDF['c'] = pd.cut(np.arange(1,11,1),
                     bins = np.linspace(0,10,6),
                     labels = [1,2,3,4,5]).astype('int')

However, in this case the categories (column 'c') are displayed as 1.0, 2.0, etc. rather than 1, 2, etc., so this is not ideal.
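
Another way to get plain integer quintile numbers, not taken from the original thread, is to ask pd.cut for 0-based bin indicators with labels=False and add 1. This is a minimal sketch on the same values and bins as the question; the result is an ordinary integer NumPy array rather than a Categorical, so whether it later displays as 1 or 1.0 still depends on how the per-group frames are combined.

```python
import numpy as np
import pandas as pd

values = np.arange(1, 11, 1)

# labels=False makes pd.cut return the 0-based bin index of each value
codes = pd.cut(values, bins=np.linspace(0, 10, 6), labels=False)

# Shift to 1-based quintile numbers
quintile = codes + 1
print(quintile)  # [1 1 2 2 3 3 4 4 5 5]
```

The trade-off is that column c then loses its categorical metadata (the fixed set of categories and their ordering).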

Can anyone explain why this happens and suggest a more satisfactory solution?

Accepted answer by jezrael

You can first append all the DataFrames to a list dfs and then call concat once:

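import numpy as np
import pandas as pd

# Column and group names, as defined in the question
colNames = ('a','b','c')
names = ['Group1','Group2','Group3']

dfs = []
# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns = colNames)
    tempDF['a'] = np.arange(1,11,1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1,11,1),
                        bins = np.linspace(0,10,6),
                        labels = [1,2,3,4,5])
    print(tempDF)
    print('\n')

    # Collect the temporary DF in the list instead of appending to a master DF
    dfs.append(tempDF)

# Concatenate all collected DFs in a single call
masterDF = pd.concat(dfs, ignore_index=True)
print(masterDF)
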
     a       b  c
0    1  Group1  1
1    2  Group1  1
2    3  Group1  2
3    4  Group1  2
4    5  Group1  3
5    6  Group1  3
6    7  Group1  4
7    8  Group1  4
8    9  Group1  5
9   10  Group1  5
10   1  Group2  1
11   2  Group2  1
12   3  Group2  2
13   4  Group2  2
14   5  Group2  3
15   6  Group2  3
16   7  Group2  4
17   8  Group2  4
18   9  Group2  5
19  10  Group2  5
20   1  Group3  1
21   2  Group3  1
22   3  Group3  2
23   4  Group3  2
24   5  Group3  3
25   6  Group3  3
26   7  Group3  4
27   8  Group3  4
28   9  Group3  5
29  10  Group3  5
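
The list-and-concat approach keeps column c categorical because of how pd.concat combines categorical columns: pieces whose categorical dtypes are identical, as every pd.cut call with the same bins and labels produces, concatenate to a categorical, while pieces with different categories fall back to a non-categorical dtype. The sketch below illustrates both cases. make_group is a hypothetical helper mirroring the loop body, the Series s1, s2 and s3 are made-up examples, and union_categoricals from pandas.api.types is one documented way to merge differing category sets explicitly.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals


def make_group(name):
    # Hypothetical helper mirroring the loop body: one frame per group
    values = np.arange(1, 11, 1)
    return pd.DataFrame({
        "a": values,
        "b": name,
        "c": pd.cut(values, bins=np.linspace(0, 10, 6), labels=[1, 2, 3, 4, 5]),
    })


# Build all pieces first, then concatenate once
master = pd.concat([make_group(n) for n in ["Group1", "Group2", "Group3"]],
                   ignore_index=True)
print(master.dtypes)  # column c stays categorical

# Identical categories ({"x", "y"} in both) -> categorical result
s1 = pd.Series(["x", "y"], dtype="category")
s2 = pd.Series(["y", "x", "x"], dtype="category")
print(pd.concat([s1, s2]).dtype)

# Different categories -> the result is no longer categorical
s3 = pd.Series(["y", "z"], dtype="category")
print(pd.concat([s1, s3]).dtype)

# union_categoricals merges the category sets explicitly
merged = union_categoricals([s1, s3])
print(list(merged.categories))
```

Building the list first and calling pd.concat once also avoids re-copying the growing master frame on every loop iteration.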