Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/41080999/

Time: 2020-09-14 02:35:52  Source: igfitidea

Python df.to_excel() storing numbers as text in excel. How to store as Value?

Tags: python, excel, pandas

Asked by gluc7

I am scraping table data from Google Finance through pd.read_html and then saving that data to Excel through df.to_excel(), as seen below:

    import pandas as pd

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.close()   # writes the file; ExcelWriter.save() was removed in pandas 2.0

However, the numbers that are saved to Excel are stored as text, with the little green triangle in the corner of each cell. When writing this data to Excel, how do I store the numbers as actual values rather than text?

Accepted answer by Parfait

Consider converting the numeric columns to floats, since pd.read_html reads web data as string types (i.e., object dtype). But before converting to floats, you need to replace the hyphens with NaNs:

    import pandas as pd
    import numpy as np

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                       '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')

    for i, df in enumerate(dfs):
        for col in df.columns[1:]:                  # skip the label column; update only numeric columns
            df.loc[df[col] == '-', col] = np.nan    # replace hyphens with NaN
            df[col] = df[col].astype(float)         # convert to float

        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

    xlWriter.close()
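
To try the hyphen-to-NaN conversion without a network call, here is a minimal sketch on a small hand-built frame standing in for one table returned by pd.read_html. The column names and values are made up for illustration; as in the answer, only the first column is assumed to hold labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one pd.read_html table:
# every cell arrives as a string, with '-' marking missing values.
df = pd.DataFrame({
    "Item": ["Revenue", "Net Income", "EPS"],
    "2015": ["74989.00", "16348.00", "-"],
    "2016": ["90272.00", "-", "27.88"],
})

for col in df.columns[1:]:                    # leave the label column alone
    df.loc[df[col] == "-", col] = np.nan      # hyphen placeholders -> NaN
    df[col] = df[col].astype(float)           # numeric strings -> float64

print(df.dtypes)
```

Once the columns are float64, df.to_excel() writes them as numeric cells instead of text.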

Answer by jmcnamara

In addition to the other solutions, where the string data is converted to numbers when creating or using the DataFrame, it is also possible to do the conversion through options of the xlsxwriter engine:

    writer = pd.ExcelWriter('output.xlsx',
                            engine='xlsxwriter',
                            engine_kwargs={'options': {'strings_to_numbers': True}})
    # pandas < 1.3 accepted this as options={'strings_to_numbers': True}

From the docs:

strings_to_numbers: Enable the worksheet.write() method to convert strings to numbers, where possible, using float() in order to avoid an Excel warning about "Numbers Stored as Text".
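
To make the documented rule concrete, here is a small pure-Python sketch of the decision it describes: try float() on a string and keep the text if that fails. The helper name to_cell_value is hypothetical and this is not xlsxwriter's own code; it only mirrors the behaviour quoted above.

```python
def to_cell_value(token):
    """Hypothetical helper mimicking the documented strings_to_numbers rule."""
    if isinstance(token, str):
        try:
            return float(token)   # convertible: would be written as a number
        except ValueError:
            return token          # not convertible: stays a text cell
    return token                  # non-strings are passed through unchanged

cells = ["1234", "56.7", "-", "N/A", 42]
print([to_cell_value(c) for c in cells])
# → [1234.0, 56.7, '-', 'N/A', 42]
```

One consequence worth noting for scraped financial tables: placeholders like '-' and numbers with thousands separators such as '1,234' are rejected by float(), so under this rule they would still end up as text.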

Answer by Bluu

Since pandas 0.19, you can supply the argument na_values to pd.read_html, which lets pandas correctly infer the float type for your price columns.

Here's what that would look like:

    import pandas as pd

    dfs = pd.read_html(
        'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
        flavor='html5lib',
        index_col='\nIn Millions of USD (except for per share items)\n',
        na_values='-'       # treat '-' cells as NaN so the columns parse as floats
    )

    xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.close()
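
Because pd.read_html needs network access and an HTML parser, the sketch below swaps in pd.read_csv on made-up inline text to show the same effect; read_csv accepts na_values and index_col too, and the assumption here is that they behave the same way for this purpose.

```python
import io
import pandas as pd

# Made-up table text; '-' marks a missing value, as on the scraped page.
raw = io.StringIO(
    "Item,2015,2016\n"
    "Revenue,74989.00,90272.00\n"
    "EPS,-,27.88\n"
)

plain = pd.read_csv(raw)              # '-' keeps the "2015" column as strings
raw.seek(0)                           # rewind to parse the same text again
cleaned = pd.read_csv(raw, index_col="Item", na_values="-")  # '-' -> NaN

print(plain.dtypes)
print(cleaned.dtypes)
```

With na_values in place, both year columns come out as float64 with no post-processing loop.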

Alternatively (if you don't have pandas 0.19 yet), I'd use a simpler version of @Parfait's solution:

    dfs = pd.read_html(
        'https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM',
        flavor='html5lib',
        index_col='\nIn Millions of USD (except for per share items)\n'
    )

    xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
    for i, df in enumerate(dfs):
        # mask() turns '-' cells into NaN, then every remaining cell is cast to float
        df.mask(df == '-').astype(float).to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.close()

This second solution only works if you correctly define the index column (in .read_html); it will fail with a ValueError if any of the data columns contains anything that is not convertible to a float.
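
A minimal sketch of both outcomes on a made-up frame: with the labels in the index, mask() plus astype(float) converts everything; if the label column is left as data, the cast raises ValueError. The data is hypothetical and stands in for a read_html table.

```python
import pandas as pd

# Hypothetical table whose label column is already the index.
df = pd.DataFrame(
    {"2015": ["74989.00", "-"], "2016": ["90272.00", "27.88"]},
    index=["Revenue", "EPS"],
)
numeric = df.mask(df == "-").astype(float)   # '-' -> NaN, then all cells -> float64
print(numeric)

# Same data with the labels moved back into an ordinary column.
with_labels = df.reset_index()               # column "index" holds 'Revenue', 'EPS'
raised = False
try:
    with_labels.mask(with_labels == "-").astype(float)
except ValueError as exc:                    # 'Revenue' cannot be parsed by float()
    raised = True
    print("ValueError:", exc)
```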

Answer by Felix

Did you verify that the columns that you're exporting are actually numbers in python (int or float)?
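
One way to check, sketched on a made-up frame: inspect df.dtypes, or list the columns that pandas does not consider numeric. By default those columns are written to Excel as text cells.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up frame: one column parsed as numbers, one still holding numeric text.
df = pd.DataFrame({"as_numbers": [1.5, 2.0], "as_text": ["1.5", "2.0"]})

print(df.dtypes)
text_like = [col for col in df.columns if not is_numeric_dtype(df[col])]
print("Non-numeric columns:", text_like)
# → Non-numeric columns: ['as_text']
```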

Alternatively, you can convert the text fields into numbers in Excel using the =VALUE() function.
