Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/50649853/

Time: 2020-08-19 19:34:24  Source: igfitidea

Trying to merge 2 dataframes but get ValueError

python pandas dataframe

Asked by PEREZje

These are my two dataframes saved in two variables:

    print(df.head())

          club_name  tr_jan  tr_dec  year
    0  ADO Den Haag    1368    1422  2010
    1  ADO Den Haag    1455    1477  2011
    2  ADO Den Haag    1461    1443  2012
    3  ADO Den Haag    1437    1383  2013
    4  ADO Den Haag    1386    1422  2014

    print(ranking_df.head())

          club_name  ranking  year
    0  ADO Den Haag       12  2010
    1  ADO Den Haag       13  2011
    2  ADO Den Haag       11  2012
    3  ADO Den Haag       14  2013
    4  ADO Den Haag       17  2014

I'm trying to merge these two using this code:

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I added how='left' because ranking_df has fewer data points than my main df.

The expected behaviour is as such:

    print(new_df.head())

          club_name  tr_jan  tr_dec  year  ranking
    0  ADO Den Haag    1368    1422  2010       12
    1  ADO Den Haag    1455    1477  2011       13
    2  ADO Den Haag    1461    1443  2012       11
    3  ADO Den Haag    1437    1383  2013       14
    4  ADO Den Haag    1386    1422  2014       17

But I get this error:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

But I do not wish to use concat, since I want to merge the two dataframes on their shared columns, not just append one to the other.

Another thing that seems weird to me is that my code works if I save the first df to .csv and then load that .csv back into a dataframe.

The code for that:

df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
df.to_csv('preliminary.csv')

df = pd.read_csv('preliminary.csv', index_col=0)

ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I think it has to do with the index_col=0 parameter, but I have no idea how to fix it without saving the file first. It doesn't matter much, but having to do that is kind of an annoyance.

Answered by Arnon Rotem-Gal-Oz

In one of your dataframes the year is a string and in the other it is an int64. You can convert it first and then join, e.g. df['year'] = df['year'].astype(int), or, as RafaelC suggested, df.year.astype(int).
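A minimal sketch of this fix, assuming frames shaped like the ones in the question. The sample values, and the choice to store year as strings in df, are made up for illustration; the real data_points and rankings are not shown in the question.

```python
import pandas as pd

# Made-up sample data: 'year' holds strings in df and integers in ranking_df.
df = pd.DataFrame({
    "club_name": ["ADO Den Haag", "ADO Den Haag"],
    "tr_jan": [1368, 1455],
    "tr_dec": [1422, 1477],
    "year": ["2010", "2011"],
})
ranking_df = pd.DataFrame({
    "club_name": ["ADO Den Haag", "ADO Den Haag"],
    "ranking": [12, 13],
    "year": [2010, 2011],
})

# Merging on keys of different types is what triggers the ValueError.
try:
    df.merge(ranking_df, on=["club_name", "year"], how="left")
except ValueError as err:
    print("merge failed:", err)

# Convert the key to a common type and assign the result back.
df["year"] = df["year"].astype(int)
new_df = df.merge(ranking_df, on=["club_name", "year"], how="left")
print(new_df)
```

Which side to convert is a judgment call; converting the string column to int keeps the key numeric, but converting both sides to str would work as well.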

Answered by Alex Moore-Niemi

I found that both of my dataframes had the key column as the same type (str), but switching from join to merge solved the issue.

Answered by Ashish Anand

It happens when the common column has a different data type in each of the two tables.

Example: in table1 you have date as a string, whereas in table2 you have date as a datetime. So before merging, we need to change date to a common data type.
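A small sketch of that date case. The table names follow the answer, while the column values and the extra sales and visits columns are hypothetical.

```python
import pandas as pd

# Hypothetical tables: 'date' is a string in table1 and a datetime in table2.
table1 = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "sales": [10, 20]})
table2 = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "visits": [100, 200],
})

# Bring the key to a common type (datetime) before merging.
table1["date"] = pd.to_datetime(table1["date"])
merged = table1.merge(table2, on="date", how="left")
print(merged)
```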

Answered by CathyQian

Additional: a .csv file does not store dtypes, so when you load it back pandas infers them again, and a year column that held strings (object) comes back as integers. That is why the merge works when both dataframes are loaded from csv files, while the above error shows up if one dataframe is loaded from a csv file and the other is an existing in-memory dataframe with a different dtype. This is somewhat annoying, but it has an easy solution once you keep it in mind: convert the key column to a common type before merging.
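The round trip can be observed without touching the disk. This is a sketch that assumes the year column starts out as strings; an in-memory io.StringIO buffer stands in for preliminary.csv.

```python
import io

import pandas as pd

# Made-up frame whose 'year' column holds strings.
df = pd.DataFrame({"club_name": ["ADO Den Haag"], "year": ["2010"]})

# Round-trip through CSV text; the buffer stands in for a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=0)

# read_csv infers dtypes from the text, so 'year' is now an integer column.
print(df["year"].dtype, "->", reloaded["year"].dtype)
```

If the round trip is only being used to change the type, assigning df['year'] = df['year'].astype(int) gives the same result for this column without writing a file.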

Answered by escha

@Arnon Rotem-Gal-Oz's answer is right for the most part, but I would like to point out the difference between df['year']=df['year'].astype(int) and df.year.astype(int). df.year.astype(int) returns a new Series and does not change the type of the column in the dataframe, at least in pandas 0.24.2. df['year']=df['year'].astype(int) does change the type, because it is an assignment. I would argue that this is the safest way to permanently change the dtype of a column.

Example:

df = pd.DataFrame({'Weed': ['green crack', 'northern lights', 'girl scout cookies'], 'Qty': [10, 15, 3]})
df.dtypes

Weed object, Qty int64

df['Qty'].astype(str)
df.dtypes

Weed object, Qty int64

Even passing inplace=True does not help. astype does not document an inplace parameter; pandas 0.24.2 accepts the keyword, but the call still returns a new Series and leaves the column unchanged. For methods that do support it, inplace=True is usually equivalent to an explicit assignment.

df['Qty'].astype(str, inplace=True)
df.dtypes

Weed object, Qty int64

Now the assignment,

df['Qty'] = df['Qty'].astype(str)
df.dtypes

Weed object, Qty object

Answered by Ega Dharmawan

This simple solution works for me:

    final = pd.concat([df, rankingdf], axis=1, sort=False)

But you may need to drop some duplicate columns first.
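For comparison, here is a sketch of how concat with axis=1 differs from merge. The two small frames are made up, with their rows deliberately in a different order: concat lines rows up by index label and keeps both copies of the shared columns, whereas merge matches rows on the key values.

```python
import pandas as pd

# Made-up frames whose rows are in a different order.
df = pd.DataFrame({"club_name": ["A", "B"], "year": [2010, 2010], "tr_jan": [1, 2]})
ranking_df = pd.DataFrame({"club_name": ["B", "A"], "year": [2010, 2010], "ranking": [5, 7]})

# concat with axis=1 pairs rows by index label and duplicates the key columns.
side_by_side = pd.concat([df, ranking_df], axis=1, sort=False)
print(list(side_by_side.columns))

# merge pairs rows by key value instead.
merged = df.merge(ranking_df, on=["club_name", "year"], how="left")
print(merged["ranking"].tolist())
```

So concat only gives the expected result when both frames already share the same row order and index.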

Answered by Erol Erdogan

First, check the types of the columns you want to merge on. You will see that one of them is a string while the other is an int. Then convert it to int with the following code:

df["something"] = df["something"].astype(int)

merged = df.merge(df1, on="something")
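The dtype check in the first step could look like the sketch below. The helper mismatched_key_dtypes is a hypothetical name, not part of pandas, and the two frames are made up.

```python
import pandas as pd


def mismatched_key_dtypes(left, right, keys):
    """Return {key: (left dtype, right dtype)} for keys whose dtypes differ."""
    return {
        key: (left[key].dtype, right[key].dtype)
        for key in keys
        if left[key].dtype != right[key].dtype
    }


# Made-up frames: 'something' is a string column in df and an int column in df1.
df = pd.DataFrame({"something": ["1", "2"], "a": [10, 20]})
df1 = pd.DataFrame({"something": [1, 2], "b": [30, 40]})

# Report which merge keys disagree before attempting the merge.
before = mismatched_key_dtypes(df, df1, ["something"])
print(before)

# Convert the string key to int, then merge.
df["something"] = df["something"].astype(int)
merged = df.merge(df1, on="something")
print(merged)
```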