Pandas read_sql DataTypes

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original StackOverflow question: http://stackoverflow.com/questions/46594631/

Date: 2020-09-14 04:35:39  Source: igfitidea

python, pandas, dataframe, type-conversion

Asked by MattR

I have to compare two data sources to see whether each record is the same in both. One data source is an Excel file; the other is a SQL table. I tried using DataFrame.equals(), as I have in the past.

However, pesky datatype differences get in the way. Even though the data looks the same, the datatypes make excel_df.loc[excel_df['ID'] == 1].equals(sql_df.loc[sql_df['ID'] == 1]) return False. Here is an example of the datatypes from pd.read_excel():

COLUMN ID                         int64
ANOTHER Id                      float64
SOME Date                datetime64[ns]
Another Date             datetime64[ns] 

The same columns from pd.read_sql():

COLUMN ID                        float64
ANOTHER Id                       float64
SOME Date                         object
Another Date                      object

I could try using the converters argument of pd.read_excel() to match SQL, or do df['Column_Name'] = df['Column_Name'].astype(dtype_here), but I am dealing with a lot of columns. Is there an easier way to check for values across all columns?

Checking pd.read_sql(), there is nothing like converters, but I'm looking for something like:

df = pd.read_sql("Select * From Foo", con, dtypes = ({Column_name: str,
                                                      Column_name2:int}))

Answered by Igor Raush

How about:

excel_df = pd.read_excel(...)
sql_df = pd.read_sql(...)

# attempt to cast all columns of excel_df to the types of sql_df
excel_df.astype(sql_df.dtypes.to_dict()).equals(sql_df)
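
A minimal sketch of this idea on two small in-memory frames, which stand in for the real pd.read_excel() and pd.read_sql() results (the column names and values are made up for illustration). It only aligns dtypes; an object column holding date strings would probably still need parsing before it compares equal to a datetime column.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by pd.read_excel() and pd.read_sql().
excel_df = pd.DataFrame({"ID": [1, 2], "Amount": [10.0, 20.5]})    # ID is int64
sql_df = pd.DataFrame({"ID": [1.0, 2.0], "Amount": [10.0, 20.5]})  # ID is float64

# equals() requires matching dtypes, so the raw comparison fails
# even though the values look the same.
before = excel_df.equals(sql_df)

# Cast every Excel column to the dtype of the same-named SQL column, then compare.
aligned = excel_df.astype(sql_df.dtypes.to_dict())
after = aligned.equals(sql_df)

print(before, after)
```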

Answered by Mikhail Venkov

If you are seeing the object dtype, it means pandas couldn't interpret some of the rows as dates, so instead it casts the whole column as object (which is basically string).

Look at the documentation for the dtype, converters and parse_dates arguments: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
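
pd.read_sql() accepts a parse_dates argument as well. A small sketch, assuming an in-memory SQLite table named foo with a text date column (the table, column names and rows are hypothetical):

```python
import sqlite3

import pandas as pd

# Hypothetical table standing in for the real SQL source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (id INTEGER, some_date TEXT)")
con.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "2020-01-31"), (2, "2020-02-29")],
)

# Without parse_dates the text column is not read as a datetime column.
raw = pd.read_sql("SELECT * FROM foo", con)

# With parse_dates the listed columns are converted to datetimes on load.
parsed = pd.read_sql("SELECT * FROM foo", con, parse_dates=["some_date"])
con.close()

print(raw.dtypes)
print(parsed.dtypes)
```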

You can also check the dayfirst argument to parse the dates correctly.
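
For a column that has already been loaded as object, the same option is available through pd.to_datetime(). A short sketch, assuming the dates arrive as day/month/year strings (the column name and values are made up):

```python
import pandas as pd

# Hypothetical object column of day/month/year strings, as a SQL driver might return them.
sql_df = pd.DataFrame({"some_date": ["01/02/2020", "15/02/2020"]})

# dayfirst=True reads 01/02/2020 as 1 February rather than 2 January.
sql_df["some_date"] = pd.to_datetime(sql_df["some_date"], dayfirst=True)

print(sql_df["some_date"])
```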