python pandas column dtype=object causing merge to fail with: DtypeWarning: Columns have mixed types
Original question: http://stackoverflow.com/questions/44639772/
This content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by jeangelj
I am trying to merge two dataframes, df1 and df2, on the Customer_ID column. It seems that Customer_ID has the same data type (object) in both.
df1:
Customer_ID | Flag
12345 | A
df2:
Customer_ID | Transaction_Value
12345 | 258478
When I merge the two tables:
new_df = df2.merge(df1, on='Customer_ID', how='left')
For some Customer_IDs the merge worked and for others it didn't. For this example, I would get this result:
Customer_ID | Transaction_Value | Flag
12345 | 258478 | NaN
I checked the data types and they are the same:
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID 873353 non-null object
Flag 873353 non-null object
dtypes: object(2)
memory usage: 20.0+ MB
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID 873353 non-null object
Transaction_Value 873353 int64
dtypes: object(2)
memory usage: 20.0+ MB
When I loaded df1, I did get this warning:
C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py:2717: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
When I wanted to check whether a customer ID exists, I realized that I have to specify it differently in the two dataframes: as an integer in df1 and as a string in df2.
df1.loc[df1['Customer_ID'] == 12345]
df2.loc[df2['Customer_ID'] == '12345']
Answered by piRSquared
Customer_ID is of dtype==object in both cases... But that doesn't mean that the individual elements are the same type. You need to make both str or int.
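One way to see this is to count the Python types of the individual elements. The following is a minimal sketch with made-up frames (the values and the second customer ID are illustrative assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical data: df1 stores one ID as an int, df2 stores every ID as a str,
# yet both columns report dtype object.
df1 = pd.DataFrame({
    "Customer_ID": pd.Series([12345, "67890"], dtype=object),
    "Flag": ["A", "B"],
})
df2 = pd.DataFrame({
    "Customer_ID": pd.Series(["12345", "67890"], dtype=object),
    "Transaction_Value": [258478, 1000],
})

print(df1["Customer_ID"].dtype, df2["Customer_ID"].dtype)

# Count the element types found in each key column.
print(df1["Customer_ID"].map(type).value_counts())
print(df2["Customer_ID"].map(type).value_counts())

# The int 12345 is not equal to the str "12345", so that row finds no match
# and its Flag comes back as NaN; the str "67890" matches normally.
merged = df2.merge(df1, on="Customer_ID", how="left")
print(merged)
```

If the type counts differ between the two key columns, the merge will silently miss the rows whose key types disagree.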
Using int
dtype = dict(Customer_ID=int)
df1.astype(dtype).merge(df2.astype(dtype), how='left')
   Customer_ID Flag  Transaction_Value
0        12345    A             258478
Using str
dtype = dict(Customer_ID=str)
df1.astype(dtype).merge(df2.astype(dtype), how='left')
   Customer_ID Flag  Transaction_Value
0        12345    A             258478
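The DtypeWarning quoted in the question suggests specifying the dtype option on import. A minimal sketch of that approach, assuming the data comes from a CSV file (the in-memory CSV text below is a hypothetical stand-in for the asker's file):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "Customer_ID,Flag\n12345,A\n0067890,B\n"

# Read the key column as str so every element has the same type from the start.
df1 = pd.read_csv(io.StringIO(csv_text), dtype={"Customer_ID": str})

print(df1["Customer_ID"].tolist())
print(df1["Customer_ID"].map(type).value_counts())
```

Reading the key as str also keeps any leading zeros in the IDs, which a conversion to int would drop.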
Answered by Ega Dharmawan
I think I have found the easiest way to combine two dataframes without changing the dtypes.
final = pd.concat([df1, df2], axis=1, sort=False)
Hope it helps :)
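Note that pd.concat with axis=1 lines rows up by index label rather than by Customer_ID, so it gives the same pairing as a key-based merge only when both frames list the customers in the same order. A small sketch with made-up frames (the row order and the second customer are assumptions chosen to show the difference):

```python
import pandas as pd

# Hypothetical frames whose rows are in different orders.
df1 = pd.DataFrame({"Customer_ID": ["12345", "67890"], "Flag": ["A", "B"]})
df2 = pd.DataFrame({
    "Customer_ID": ["67890", "12345"],
    "Transaction_Value": [1000, 258478],
})

# concat with axis=1 places rows side by side by index label,
# so row 0 pairs customer 12345 from df1 with customer 67890 from df2.
side_by_side = pd.concat([df1, df2], axis=1, sort=False)
print(side_by_side)

# merge pairs rows by the key value instead.
by_key = df2.merge(df1, on="Customer_ID", how="left")
print(by_key)
```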