Note: this page is a translated mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/429853/

Time: 2020-08-04 03:03:01  Source: igfitidea

Scientific notation when importing from Excel in .Net

Tags: c#, .net, excel, oledb

Asked by ChrisDiRulli

I have a C#/.Net job that imports data from Excel and then processes it. Our client drops off the files and we process them. I don't have any control over the original file.

I use the OleDb library to fill up a dataset. The file contains some numbers like 30829300, 30071500, etc... The data type for those columns is "Text".

Those numbers are converted to scientific notation when I import the data. Is there any way to prevent this from happening?

Accepted answer by P Daddy

The OleDb library will, more often than not, mess up your data in an Excel spreadsheet. This is largely because it forces everything into a fixed-type column layout, guessing at the type of each column from the values in the first 8 cells in each column. If it guesses wrong, you end up with digit strings converted to scientific notation. Blech!

To avoid this, you're better off skipping OleDb and reading the sheet directly yourself. You can do this using the COM interface of Excel (also blech!), or a third-party .NET Excel-compatible reader. SpreadsheetGear is one such library that works reasonably well, and has an interface that's very similar to Excel's COM interface.

Answer by Fionnuala

I have found that the easiest way is to choose Zip format, rather than text format for columns with large 'numbers'.

Answer by palehorse

Have you tried casting the value of the field to (int) or perhaps (Int64) as you are reading it?
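
A plain (int) or (Int64) cast will not work if the value has already arrived as a string such as "3.08293E+07". A minimal sketch of parsing it instead, assuming the value comes back as text; the class and method names here are made up for illustration:

```csharp
using System;
using System.Globalization;

public static class ExcelNumberText
{
    // Tries to turn a cell value such as "3.08293E+07" back into a whole number.
    // Returns false for empty, non-numeric, fractional, or out-of-range input.
    public static bool TryParseWholeNumber(object cellValue, out long result)
    {
        result = 0;
        string text = Convert.ToString(cellValue, CultureInfo.InvariantCulture);

        // NumberStyles.Float accepts the exponent part ("E+07") that long.Parse rejects.
        if (!decimal.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out decimal parsed))
        {
            return false;
        }

        if (parsed != decimal.Truncate(parsed) || parsed < long.MinValue || parsed > long.MaxValue)
        {
            return false;
        }

        result = (long)parsed;
        return true;
    }
}
```

Note that this only undoes the notation. Any digits the provider already rounded away before handing over the string cannot be recovered this way.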

Answer by Andrew Rollings

Look up the IMEX=1 connection string option and the TypeGuessRows registry setting on Google. In truth, there is no easy way round this because the reader infers column data types by looking at the first few rows (8 by default). If those rows contain only numbers then you're out of luck.

An unfortunate workaround which I've used in the past is to use the HDR=NO connection string option and set the TypeGuessRows registry setting value to 1, which forces it to read the first row as valid data to make its datatype determination, rather than a header. It's a hack, but it works. The code reads the first row (containing the header) as text, and then sets the datatype accordingly.
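
A rough sketch of the code side of that workaround, assuming the ACE provider and an .xlsx file. The registry change is not shown, and the helper names are hypothetical. With HDR=NO the provider names the columns F1, F2, and so on, so the header row has to be promoted by hand after the fill:

```csharp
using System;
using System.Data;
using System.Globalization;

public static class HeaderlessImport
{
    // Connection string for the HDR=NO approach: every row, including the header, is data.
    public static string BuildConnectionString(string path)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\"",
            path);
    }

    // Renames the columns from the first row's values, then drops that row.
    public static void PromoteFirstRowToHeader(DataTable table)
    {
        if (table.Rows.Count == 0)
        {
            return;
        }

        DataRow header = table.Rows[0];
        for (int i = 0; i < table.Columns.Count; i++)
        {
            string name = Convert.ToString(header[i], CultureInfo.InvariantCulture).Trim();

            // Keep the generated name (F1, F2, ...) if the header cell is empty or a duplicate.
            if (name.Length > 0 && !table.Columns.Contains(name))
            {
                table.Columns[i].ColumnName = name;
            }
        }

        table.Rows.Remove(header);
        table.AcceptChanges();
    }
}
```

You would fill a DataTable through an OleDbDataAdapter opened with that connection string, then call PromoteFirstRowToHeader on it.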

Changing the registry is a pain (and not always possible) but I'd recommend restoring the original value afterwards.

If your import data doesn't have a header row, then an alternative option is to pre-process the file and insert a ' character before each of the numbers in the offending column. This causes the column data to be treated as text.

So all in all, there are a bunch of hacks to work around this, but nothing really foolproof.

Answer by Andrew Garrison

I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.

  1. First read in the data to get the column names
  2. Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
  3. Read the data in again into this new dataset. Voila - the scientific notation is now gone and everything is read in as a string.

Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!

// Requires: using System.Data; using System.Data.OleDb; using System.Globalization;
public void ImportSpreadsheet(string path)
{
    string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
    string connectionString = string.Format(
        CultureInfo.CurrentCulture,
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
        path,
        extendedProperties);

    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM [Worksheet1$]";
            connection.Open();

            using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
            using (DataSet columnDataSet = new DataSet())
            using (DataSet dataSet = new DataSet())
            {
                columnDataSet.Locale = CultureInfo.CurrentCulture;
                adapter.Fill(columnDataSet);

                if (columnDataSet.Tables.Count == 1)
                {
                    var worksheet = columnDataSet.Tables[0];

                    // Now that we have a valid worksheet read in, with column names, we can create a
                    // new DataSet with a table that has preset columns that are all of type string.
                    // This fixes a problem where the OLEDB provider is trying to guess the data types
                    // of the cells and strange data appears, such as scientific notation on some cells.
                    dataSet.Tables.Add("WorksheetData");
                    DataTable tempTable = dataSet.Tables[0];

                    foreach (DataColumn column in worksheet.Columns)
                    {
                        tempTable.Columns.Add(column.ColumnName, typeof(string));
                    }

                    adapter.Fill(dataSet, "WorksheetData");

                    if (dataSet.Tables.Count == 1)
                    {
                        worksheet = dataSet.Tables[0];

                        foreach (DataRow row in worksheet.Rows)
                        {
                            // TODO: Consume some data.
                        }
                    }
                }
            }
        }
    }
}

Answer by dankyy1

I googled around this issue. Here are my solution steps:

  • For the template Excel file

1- Format the Excel column as Text
2- Write a macro to disable the error warnings for the number -> text conversion

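Private Sub Workbook_BeforeClose(Cancel As Boolean)
    ' Turn background error checking back on when the workbook closes.
    Application.ErrorCheckingOptions.BackgroundChecking = True
End Sub

Private Sub Workbook_Open()
    ' Suppress the "number stored as text" warnings while the workbook is open.
    Application.ErrorCheckingOptions.BackgroundChecking = False
End Sub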

  • In the code-behind

3- While reading the data to import, try to parse the incoming values as Int64 or Int32.

Answer by BA TabNabber

One workaround for this issue is to change your SELECT statement. Instead of SELECT *, do this:

"SELECT Format([F1], 'General Number')  From [Sheet1$]"
 -or-
"SELECT Format([F1], \"#####\")  From [Sheet1$]"

However, if your cells contain more than 255 characters, doing so will blow up with the following error: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done."

Fortunately my customer didn't care about erroring out in this scenario.
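
If there are several columns, the statement can be generated rather than typed out. A small sketch, assuming the column names are already known and trusted (nothing is escaped here). The helper name and the "_text" alias suffix are my own choices, and the alias deliberately differs from the source column name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormattedSelect
{
    // Builds "SELECT Format([F1], 'General Number') AS [F1_text], ... FROM [Sheet1$]".
    public static string Build(string sheetName, IEnumerable<string> columnNames)
    {
        IEnumerable<string> projections = columnNames.Select(
            name => string.Format("Format([{0}], 'General Number') AS [{0}_text]", name));

        return string.Format("SELECT {0} FROM [{1}$]", string.Join(", ", projections), sheetName);
    }
}
```

The result can be assigned to OleDbCommand.CommandText in place of a SELECT * query. The 255-character limitation described above still applies.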

This page has a bunch of good things to try as well: http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/

Answer by Sameer Alibhai

If you look at the actual .XLSX file using the Open XML SDK 2.0 Productivity Tool (or simply unzip the file and view the XML in Notepad), you will see that Excel 2007 actually stores the raw data in scientific format.

For example 0.00001 is stored as 1.0000000000000001E-5

<x:c r="C18" s="11" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:v>1.0000000000000001E-5</x:v>
</x:c>

Looking at the cell in Excel, it's displayed as 0.00001 in both the cell and the formula bar. So it is not always true that OleDb is causing the issue.
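
To see that the stored text and the displayed text are the same number, you can parse both as doubles and compare. A minimal sketch; the class and method names are just for illustration:

```csharp
using System;
using System.Globalization;

public static class StoredValueDemo
{
    // Parses both spellings as IEEE doubles and reports whether they are the same value.
    public static bool IsSameNumber(string storedText, string displayedText)
    {
        double stored = double.Parse(storedText, NumberStyles.Float, CultureInfo.InvariantCulture);
        double displayed = double.Parse(displayedText, NumberStyles.Float, CultureInfo.InvariantCulture);
        return stored.Equals(displayed);
    }
}
```

For the cell above, IsSameNumber("1.0000000000000001E-5", "0.00001") should come back true, since the long form is just the 17-significant-digit spelling of the same double.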

Answer by johndsamuels

Using this connection string:

Provider=Microsoft.ACE.OLEDB.12.0; data source={0}; Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\"

with Excel 2010 I have noticed the following. If the Excel file is open when you run the OLEDB SELECT then you get the current version of the cells, not the saved file values. Furthermore the string values returned for a long number, decimal value and date look like this:

5.0130370071e+012
4.08
36808

If the file is not open then the returned values are:

5013037007084
£4.08
Monday, October 09, 2000
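
The two sets of values can be partly reconciled in code. A sketch, assuming the workbook uses the default 1900 date system so that the serial lines up with an OLE Automation date; the names below are made up for the example:

```csharp
using System;
using System.Globalization;

public static class OpenWorkbookValues
{
    // Converts an Excel date serial such as "36808" to a DateTime.
    public static DateTime SerialToDate(string serial)
    {
        double days = double.Parse(serial, NumberStyles.Float, CultureInfo.InvariantCulture);
        return DateTime.FromOADate(days);
    }

    // Expands "5.0130370071e+012" to a plain decimal. Digits beyond those in the
    // string were already dropped by the provider and do not come back.
    public static decimal ExpandExponent(string text)
    {
        return decimal.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

SerialToDate("36808") gives 9 October 2000, which matches the closed-file value. ExpandExponent("5.0130370071e+012") gives 5013037007100 rather than 5013037007084, so the long number is genuinely damaged when the file is open.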

Answer by Tipur Madan

I got one solution from somewhere else, and it worked perfectly for me. No code change is needed: just format the Excel column cells as "General" instead of any other format such as "Number" or "Text". Then even Select * from [Sheet1$] or Select Column_name from [Sheet1$] will read the values correctly, even large numeric values with more than 9 digits.