C# 使用 DataReader 和 OLEDB Jet 数据提供程序读取 CSV 文件时,如何控制列数据类型?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/115658/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-03 14:26:24  来源:igfitidea点击:

When reading a CSV file using a DataReader and the OLEDB Jet data provider, how can I control column data types?

提问by Rory MacLeod

In my C# application I am using the Microsoft Jet OLEDB data provider to read a CSV file. The connection string looks like this:

在我的 C# 应用程序中,我使用 Microsoft Jet OLEDB 数据提供程序来读取 CSV 文件。连接字符串如下所示:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Data;Extended Properties="text;HDR=Yes;FMT=Delimited

I open an ADO.NET OleDbConnection using that connection string and select all the rows from the CSV file with the command:

我使用该连接字符串打开一个 ADO.NET OleDbConnection 并使用以下命令从 CSV 文件中选择所有行:

select * from Data.csv

When I open an OleDbDataReader and examine the data types of the columns it returns, I find that something in the stack has tried to guess at the data types based on the first row of data in the file. For example, suppose the CSV file contains:

当我打开一个 OleDbDataReader 并检查它返回的列的数据类型时,我发现堆栈中的某些东西试图根据文件中的第一行数据猜测数据类型。例如,假设 CSV 文件包含:

House,Street,Town
123,Fake Street,Springfield
12a,Evergreen Terrace,Springfield

Calling the OleDbDataReader.GetDataTypeName method for the House column will reveal that the column has been given the data type "DBTYPE_I4", so all values read from it are interpreted as integers. My problem is that House should be a string - when I try to read the House value from the second row, the OleDbDataReader returns null.

为 House 列调用 OleDbDataReader.GetDataTypeName 方法将显示该列的数据类型为“DBTYPE_I4”,因此从它读取的所有值都被解释为整数。我的问题是 House 应该是一个字符串 - 当我尝试从第二行读取 House 值时,OleDbDataReader 返回 null。

How can I tell either the Jet database provider or the OleDbDataReader to interpret a column as strings instead of numbers?

如何告诉 Jet 数据库提供程序或 OleDbDataReader 将列解释为字符串而不是数字?

采纳答案by MarcE

There's a schema file you can create that would tell ADO.NET how to interpret the CSV - in effect giving it a structure.

您可以创建一个模式文件,它会告诉 ADO.NET 如何解释 CSV - 实际上给它一个结构。

Try this: http://www.aspdotnetcodes.com/Importing_CSV_Database_Schema.ini.aspx

试试这个:http: //www.aspdotnetcodes.com/Importing_CSV_Database_Schema.ini.aspx

Or the most recent MS Documentation

或最新的MS 文档

回答by Rory MacLeod

To expand on Marc's answer, I need to create a text file called Schema.ini and put it in the same directory as the CSV file. As well as column types, this file can specify the file format, date time format, regional settings, and the column names if they're not included in the file.

为了扩展 Marc 的答案,我需要创建一个名为 Schema.ini 的文本文件,并将其放在与 CSV 文件相同的目录中。除了列类型,此文件还可以指定文件格式、日期时间格式、区域设置和列名称(如果它们未包含在文件中)。

To make the example I gave in the question work, the Schema file should look like this:

为了使我在问题中给出的示例有效,架构文件应如下所示:

[Data.csv]
ColNameHeader=True
Col1=House Text
Col2=Street Text
Col3=Town Text

I could also try this to make the data provider examine all the rows in the file before it tries to guess the data types:

我也可以尝试这样做,让数据提供者在尝试猜测数据类型之前检查文件中的所有行:

[Data.csv]
ColNameHeader=true
MaxScanRows=0

In real life, my application imports data from files with dynamic names, so I have to create a Schema.ini file on the fly and write it to the same directory as the CSV file before I open my connection.

在现实生活中,我的应用程序从具有动态名称的文件中导入数据,因此我必须即时创建一个 Schema.ini 文件并将其写入与 CSV 文件相同的目录,然后再打开我的连接。

Further details can be found here - http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx- or by searching the MSDN Library for "Schema.ini file".

可以在此处找到更多详细信息 - http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx- 或通过在 MSDN 库中搜索“Schema.ini 文件”。

回答by Akhil

Please check

请检查

http://kbcsv.codeplex.com/

http://kbcsv.codeplex.com/

using (var reader = new CsvReader("data.csv"))
{
    reader.ReadHeaderRecord();
    foreach (var record in reader.DataRecords)
    {
        var name = record["Name"];
        var age = record["Age"];
    }
}

回答by Jon

You need to tell the driver to scan all rows to determine the schema. Otherwise if the first few rows are numeric and the rest are alphanumeric, the alphanumeric cells will be blank.

您需要告诉驱动程序扫描所有行以确定架构。否则,如果前几行是数字而其余行是字母数字,则字母数字单元格将为空白。

Like Rory, I found that I needed to create a schema.ini file dynamically because there is no way to programatically tell the driver to scan all rows. (this is not the case for excel files)

Rory一样,我发现我需要动态创建一个 schema.ini 文件,因为无法以编程方式告诉驱动程序扫描所有行。(excel文件不是这种情况)

You must have MaxScanRows=0in your schema.ini

你必须MaxScanRows=0在你的 schema.ini

Here's a code example:

这是一个代码示例:

    public static DataTable GetDataFromCsvFile(string filePath, bool isFirstRowHeader = true)
    {
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException("The path: " + filePath + " doesn't exist!");
        }

        if (!(Path.GetExtension(filePath) ?? string.Empty).ToUpper().Equals(".CSV"))
        {
            throw new ArgumentException("Only CSV files are supported");
        }
        var pathOnly = Path.GetDirectoryName(filePath);
        var filename = Path.GetFileName(filePath);
        var schemaIni =
            $"[{filename}]{Environment.NewLine}" +
            $"Format=CSVDelimited{Environment.NewLine}" +
            $"ColNameHeader={(isFirstRowHeader ? "True" : "False")}{Environment.NewLine}" +
            $"MaxScanRows=0{Environment.NewLine}" +
            $" ; scan all rows for data type{Environment.NewLine}" +
            $" ; This file was automatically generated";
        var schemaFile = pathOnly != null ? Path.Combine(pathOnly, "schema.ini") : "schema.ini";
        File.WriteAllText(schemaFile, schemaIni);

        try
        {
            var sqlCommand = $@"SELECT * FROM [{filename}]";

            var oleDbConnString =
                $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={pathOnly};Extended Properties=\"Text;HDR={(isFirstRowHeader ? "Yes" : "No")}\"";

            using (var oleDbConnection = new OleDbConnection(oleDbConnString))
            using (var adapter = new OleDbDataAdapter(sqlCommand, oleDbConnection))
            using (var dataTable = new DataTable())
            {
                adapter.FillSchema(dataTable, SchemaType.Source);
                adapter.Fill(dataTable);
                return dataTable;
            }
        }
        finally
        {
            if (File.Exists(schemaFile))
            {
                File.Delete(schemaFile);
            }
        }
    }

You'll need to do some modification if you are running this on the same directory in multiple threads at the same time.

如果您同时在多个线程中的同一目录上运行它,则需要进行一些修改。