Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37796916/

Time: 2020-09-14 01:22:59  Source: igfitidea

pandas read_sql: integer column becomes float

Tags: python, mysql, pandas, int

Asked by Z Xie

When I use pandas to read a MySQL table, some columns that used to be integers (see 'to_nlc') come back as floats, with .0 appended automatically. Can anyone explain why, or offer a guess? Thanks very much!

Accepted answer by jezrael

The problem is that your data contains NaN values, so int is automatically cast to float.

I think you can check NA type promotions:

When introducing NAs into an existing Series or DataFrame via reindex or some other means, boolean and integer types will be promoted to a different dtype in order to store the NAs. These are summarized by this table:

Typeclass   Promotion dtype for storing NAs
floating    no change
object      no change
integer     cast to float64
boolean     cast to object

While this may seem like a heavy trade-off, in practice I have found very few cases where this is an issue in practice. Some explanation for the motivation here in the next section.
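
As an illustration that is not part of the original answer, the promotion in the table can be reproduced without a database. The sketch below introduces a missing value through reindex and checks the resulting dtype. The last step uses the nullable "Int64" extension dtype, which assumes a reasonably recent pandas version and is offered only as one possible way to keep integers alongside missing values.

```python
import pandas as pd

# An integer Series with no missing values keeps its integer dtype.
s = pd.Series([1, 2, 3], dtype="int64")
print(s.dtype)

# Reindexing onto a label that does not exist introduces a missing value,
# so the integer column is promoted to float64 to be able to store NaN.
promoted = s.reindex([0, 1, 2, 3])
print(promoted.dtype)

# Assumption: the nullable extension dtype "Int64" (capital I) is available.
# It stores the missing value as <NA> and keeps the other values as integers.
nullable = promoted.astype("Int64")
print(nullable.dtype)
```

Whether the nullable dtype is appropriate depends on what the rest of the code expects, since it is an extension dtype rather than a plain NumPy integer.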

Answer by Nikolay

As already noted, the problem is that pandas' integer dtype cannot hold NULL/NA values.

You can replace read_sql_table with read_sql and convert NULL to some integer value in the query itself (for example 0 or -1, whichever value can stand for NULL in your setting):

df = pandas.read_sql("SELECT col1, col2, IFNULL(col3, 0) AS col3 FROM table", engine)

Here col3 can be NULL in MySQL; IFNULL returns 0 if col3 is NULL and the value of col3 otherwise.
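
To try the effect without a MySQL server, here is a minimal sketch that swaps in an in-memory SQLite database, which is an assumption made purely for illustration (SQLite also provides IFNULL). The table name demo and its contents are hypothetical. It reads the same column once as-is and once wrapped in IFNULL, then prints both dtypes for comparison.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for MySQL here; "demo" is a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, 10), (2, None), (3, 30)],  # the second row has NULL in col3
)
conn.commit()

# Plain read: the NULL arrives as a missing value, so col3 is promoted to float.
raw = pd.read_sql("SELECT col1, col3 FROM demo", conn)

# IFNULL substitutes 0 inside the database, so pandas only ever sees integers.
filled = pd.read_sql("SELECT col1, IFNULL(col3, 0) AS col3 FROM demo", conn)

print(raw["col3"].dtype)
print(filled["col3"].dtype)
conn.close()
```

The AS col3 alias keeps the original column name; without it the resulting column is named after the whole expression.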

Or do the same thing with a small helper function:

import pandas

def read_sql_table_with_nullcast(table_name, engine, null_cast=None):
    """
    Read a whole MySQL table, replacing NULL in selected columns.

    table_name - table name
    engine - SQL engine
    null_cast - dictionary of columns in which to replace NULL:
           column name as key, replacement integer as value.
           For example {'col3': 0} sets every NULL in col3 to 0.
    """
    null_cast = null_cast or {}  # avoid a shared mutable default argument
    # SHOW COLUMNS is MySQL-specific; its 'Field' column holds the column names
    cols = pandas.read_sql("SHOW COLUMNS FROM " + table_name, engine)
    # Wrap only the listed columns in IFNULL and keep their original names
    cols_call = [
        c if c not in null_cast else "IFNULL(%s, %d) AS %s" % (c, null_cast[c], c)
        for c in cols['Field']
    ]
    sel = ", ".join(cols_call)
    return pandas.read_sql("SELECT " + sel + " FROM " + table_name, engine)

df = read_sql_table_with_nullcast("table", engine, {'col3': 0})
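
To see what query such a helper ends up sending, the column-wrapping step can be pulled out into a pure function that needs no database. This is only a sketch: build_select, my_table and the column list are hypothetical names introduced for illustration, and the column names are passed in directly instead of being read from SHOW COLUMNS.

```python
def build_select(table_name, columns, null_cast):
    """Build a SELECT that wraps the columns listed in null_cast in IFNULL."""
    parts = [
        # Listed columns get IFNULL plus an alias that keeps the original name;
        # all other columns are selected unchanged.
        "IFNULL(%s, %d) AS %s" % (c, null_cast[c], c) if c in null_cast else c
        for c in columns
    ]
    return "SELECT " + ", ".join(parts) + " FROM " + table_name


query = build_select("my_table", ["col1", "col2", "col3"], {"col3": 0})
print(query)
```

Because the statement is assembled by string concatenation, the table and column names should come from a trusted source, never from user input.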