python pandas column dtype=object causing merge to fail with: DtypeWarning: Columns have mixed types

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/44639772/

Time: 2020-09-14 03:50:05  Source: igfitidea

Tags: python, pandas, merge, type-conversion

Asked by jeangelj

I am trying to merge two dataframes, df1 and df2, on the Customer_ID column. It seems that Customer_ID has the same data type (object) in both.

df1:

Customer_ID |  Flag
12345           A

df2:

Customer_ID | Transaction_Value
12345           258478

When I merge the two tables:

new_df = df2.merge(df1, on='Customer_ID', how='left')

For some Customer_IDs it worked and for others it didn't. For this example, I would get this result:

Customer_ID | Transaction_Value | Flag
    12345           258478         NaN

I checked the data types and they are the same:

df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID    873353 non-null object
Flag      873353 non-null object
dtypes: object(2)
memory usage: 20.0+ MB

df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID    873353 non-null object
Transaction_Value      873353 non-null int64
dtypes: int64(1), object(1)
memory usage: 20.0+ MB

When I loaded df1, I did get this warning:

C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py:2717: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
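
The warning's hint to "Specify dtype option on import" can be sketched as follows. This is only an illustration: the CSV content is made up, and it assumes the affected column is the key column `Customer_ID`, which the original post does not confirm.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file behind df1.
csv_text = "Customer_ID,Flag\n12345,A\n00678,B\nX-991,C\n"

# Forcing the key column to str at load time keeps every element the same type,
# so numeric-looking and text-looking IDs are not parsed differently.
df1 = pd.read_csv(io.StringIO(csv_text), dtype={"Customer_ID": str})

# Collect the Python types actually stored in the column.
element_types = set(df1["Customer_ID"].map(type))
print(element_types)
```

Reading the key as `str` also preserves leading zeros, which an integer conversion would drop.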

When I wanted to check whether a customer ID exists, I realized that I had to specify it differently in the two dataframes:

df1.loc[df1['Customer_ID'] == 12345]

df2.loc[df2['Customer_ID'] == '12345']

Answered by piRSquared

Customer_ID is of dtype == object in both cases... But that doesn't mean that the individual elements are the same type. You need to make both str or int.
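
A quick way to see this is to count the Python types stored inside each object column. The frames below are made-up stand-ins for the question's data (integers in one key column, strings in the other), so treat this as a sketch rather than a reproduction of the actual files.

```python
import pandas as pd

# Made-up frames: both key columns have dtype object, but hold different element types.
df1 = pd.DataFrame({"Customer_ID": pd.Series([12345, 67890], dtype=object),
                    "Flag": ["A", "B"]})
df2 = pd.DataFrame({"Customer_ID": pd.Series(["12345", "67890"], dtype=object),
                    "Transaction_Value": [258478, 1000]})

# Count the Python types actually stored in each key column.
types_df1 = df1["Customer_ID"].map(type).value_counts()
types_df2 = df2["Customer_ID"].map(type).value_counts()
print(types_df1)
print(types_df2)

# The int 12345 never equals the str "12345", so the left merge finds no match
# and the Flag column comes back empty.
merged = df2.merge(df1, on="Customer_ID", how="left")
print(merged)
```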



Using int

dtype = dict(Customer_ID=int)

df1.astype(dtype).merge(df2.astype(dtype), 'left')

   Customer_ID Flag  Transaction_Value
0        12345    A             258478


Using str

dtype = dict(Customer_ID=str)

df1.astype(dtype).merge(df2.astype(dtype), 'left')

   Customer_ID Flag  Transaction_Value
0        12345    A             258478
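
To confirm that the conversion actually fixed the matching, one option is to pass `indicator=True` to `merge` and look for rows marked `left_only`. The sketch below uses made-up data, including a customer `99999` that is assumed to exist only in df2.

```python
import pandas as pd

# Made-up frames: integer keys in df1, string keys in df2, both stored as object.
df1 = pd.DataFrame({"Customer_ID": pd.Series([12345, 67890], dtype=object),
                    "Flag": ["A", "B"]})
df2 = pd.DataFrame({"Customer_ID": pd.Series(["12345", "99999"], dtype=object),
                    "Transaction_Value": [258478, 42]})

# Normalise both keys to str, then ask merge to report where each row came from.
left = df2.astype({"Customer_ID": str})
right = df1.astype({"Customer_ID": str})
checked = left.merge(right, on="Customer_ID", how="left", indicator=True)
print(checked)

# Rows tagged "left_only" had no partner in df1 even after the conversion.
unmatched = checked.loc[checked["_merge"] == "left_only", "Customer_ID"].tolist()
print(unmatched)
```

Any IDs left in `unmatched` are genuinely missing from the other frame rather than victims of a type mismatch.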

Answered by Ega Dharmawan

I think I have found the easiest way to combine two data frames without changing the dtypes.

    final = pd.concat([df1, df2], axis=1, sort=False)
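
One caveat worth checking before relying on this: `pd.concat(..., axis=1)` pairs rows by index label rather than by `Customer_ID`, so it only matches a key-based merge when both frames list the customers in the same order. A small sketch with made-up data, where the two frames are deliberately ordered differently:

```python
import pandas as pd

# Made-up frames whose rows list the customers in a different order.
df1 = pd.DataFrame({"Customer_ID": ["12345", "67890"], "Flag": ["A", "B"]})
df2 = pd.DataFrame({"Customer_ID": ["67890", "12345"],
                    "Transaction_Value": [1000, 258478]})

# concat with axis=1 places rows side by side according to their index label,
# so row 0 of df1 ends up next to row 0 of df2 even though the IDs differ.
side_by_side = pd.concat([df1, df2], axis=1, sort=False)
print(side_by_side)

# A key-based merge pairs rows by Customer_ID regardless of row order.
by_key = df2.merge(df1, on="Customer_ID", how="left")
print(by_key)
```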

Hope it helps :)
