Source: Stack Overflow, http://stackoverflow.com/questions/1051271/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

Reading Comma Delimited Text File to C# DataTable, columns get truncated to 255 characters

Tags: c#, .net, csv, oledb, jet

Asked by Greg Bailey

We are importing from CSV to SQL. To do so, we read the CSV file and write it to a temporary .txt file using a schema.ini. (I'm not sure yet exactly why we are writing to this temporary file, but that's how the code currently works.) From there, we load a DataTable via OleDB using the following connection string (for ASCII files).

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

The problem we are having is that fields with more than 255 characters get truncated. I've read online about this problem and it seems that by default, text fields get truncated thusly.

I set the registry values ImportMixedTypes=Majority Type and TypeGuessRows=0 in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel, hoping that my columns would no longer be interpreted as text. After doing that, the temporary txt file is written correctly from the CSV file, but when I call dataAdapter.Fill, the resulting DataTable still has a truncated value.

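Here is the column definition in question: TABLE_NAME=CommaDelimited#txt, COLUMN_NAME=Notes, ORDINAL_POSITION=2, COLUMN_HASDEFAULT=false, COLUMN_FLAGS=234, IS_NULLABLE=true, DATA_TYPE=130, CHARACTER_MAXIMUM_LENGTH=0, CHARACTER_OCTET_LENGTH=0.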

Any help would be appreciated. At this time, I'm not interested in using any third-party code to solve this problem; there must be a way to do it using built-in tools.

Here is the table definition:

<Columns> 
    <TABLE_NAME>CommaDelimited#txt</TABLE_NAME> 
    <COLUMN_NAME>Notes</COLUMN_NAME> 
    <ORDINAL_POSITION>2</ORDINAL_POSITION> 
    <COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT> 
    <COLUMN_FLAGS>234</COLUMN_FLAGS> 
    <IS_NULLABLE>true</IS_NULLABLE> 
    <DATA_TYPE>130</DATA_TYPE> 
    <CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH> 
    <CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH> 
</Columns>

Thanks,

Greg

I tried editing the schema.ini to specify Text with a width, and that did not help (the column was set to Memo before):

[CommaDelimited.txt]
Format=CSVDelimited
DecimalSymbol=.
Col1=Notes Text Width 5000

Accepted answer by Robert Harvey

The Jet database engine truncates memo fields if you ask it to process the data based on the memo: aggregating, de-duplicating, formatting, and so on.

http://allenbrowne.com/ser-63.html
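
If that is the cause here, the query should avoid anything that makes Jet process the long column (DISTINCT for de-duplicating, GROUP BY for aggregating, and so on). The question does not show the SELECT statement, so the following is only a sketch under that assumption: it reuses the connection string from the question and fills a DataTable with a plain SELECT *. The class name `JetTextLoader` and its parameters are hypothetical, and the Jet provider is only available to 32-bit processes on Windows.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

public static class JetTextLoader
{
    // Same connection string as in the question.
    // For the text driver, Data Source is the folder that holds the file.
    public static string BuildConnectionString(string folder)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
               ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";
    }

    // Loads every row with a plain SELECT: no DISTINCT, GROUP BY or
    // formatting that would ask Jet to process the memo column.
    public static DataTable LoadAll(string folder, string tableName)
    {
        var table = new DataTable();
        string sql = "SELECT * FROM [" + tableName + "]";
        using (var connection = new OleDbConnection(BuildConnectionString(folder)))
        using (var adapter = new OleDbDataAdapter(sql, connection))
        {
            adapter.Fill(table); // Fill opens and closes the connection itself
        }
        return table;
    }
}
```

A call would look like `DataTable dt = JetTextLoader.LoadAll(folder, "CommaDelimited#txt");`, using the table name reported in the question's schema row.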

Answer by Reed Copsey

You can correct this by correctly specifying your schema.ini file. I believe the two options are to either set the column to a Memo type or to set the Width to more than 255.
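
A minimal sketch of the Memo option, written as a small C# helper that generates the schema.ini section. It assumes the file is named CommaDelimited.txt as in the question and that Notes is the second column (its ORDINAL_POSITION is 2). The first column `Id`, its width, and the helper name `SchemaIniWriter` are hypothetical. Note that the asker reports the column was already set to Memo, so this alone may not be enough.

```csharp
using System;
using System.IO;
using System.Text;

public static class SchemaIniWriter
{
    // Builds the schema.ini section for one delimited text file.
    // "Id" is a hypothetical first column; "Notes" is the long column
    // from the question, declared as Memo so it is not limited to 255 chars.
    public static string BuildSection(string dataFileName)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + dataFileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=True");
        sb.AppendLine("Col1=Id Text Width 50");
        sb.AppendLine("Col2=Notes Memo");
        return sb.ToString();
    }

    // schema.ini has to live in the same folder as the data file it describes.
    public static void Write(string folder, string dataFileName)
    {
        string path = Path.Combine(folder, "schema.ini");
        File.WriteAllText(path, BuildSection(dataFileName), Encoding.ASCII);
    }
}
```

For the second option, the `Col2` line would instead read `Col2=Notes Text Width 5000`, which is the form the asker tried.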

Answer by Joel Mueller

My inclination would be to create the DataTable directly when reading the CSV file, rather than going through the extra step of writing the data out to a different text file, only to read it back into memory a second time.

For that matter, how are you ultimately getting the data from the DataTable into the SQL database? If you're just looping through the DataTable and doing a bunch of INSERT statements, why not skip two middlemen and call the same INSERT statements while you're initially reading the CSV file?
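
One way to build the DataTable directly with built-in tools is `TextFieldParser` from the `Microsoft.VisualBasic.FileIO` namespace, which ships with .NET and handles quoted fields. This is a sketch rather than the code from the question: the class name `CsvDirectLoader` is hypothetical, every column is loaded as a string, and the first record is assumed to be the header row.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO; // on .NET Framework, reference Microsoft.VisualBasic.dll

public static class CsvDirectLoader
{
    // Reads comma-delimited text straight into a DataTable of string columns.
    // No Jet/OleDB is involved, so long fields are not cut at 255 characters.
    public static DataTable Load(TextReader source)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            if (parser.EndOfData)
                return table; // empty input: no columns, no rows

            // First record supplies the column names
            foreach (string name in parser.ReadFields())
                table.Columns.Add(name, typeof(string));

            // Remaining records become rows
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                table.Rows.Add(fields);
            }
        }
        return table;
    }
}
```

With a file on disk, the call would be `CsvDirectLoader.Load(new StreamReader(path))`. The resulting table can then be handed to SqlBulkCopy or looped over for INSERT statements.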

Answer by Robert Harvey

Here's a simple class for reading a delimited file and returning a DataTable (all strings) that doesn't truncate strings. It has an overloaded method to specify column names if they're not in the file. Maybe you can use it?

Imported Namespaces

using System;
using System.Text;
using System.Data;
using System.IO;

Code

/// <summary>
/// Simple class for reading delimited text files
/// </summary>
public class DelimitedTextReader
{
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter)
    {
        return ReadFile(filename, delimiter, null);
    }
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <param name="columnNames">Array of column names</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter, string[] columnNames)
    {
        //  Create the new table
        DataTable data = new DataTable();
        data.Locale = System.Globalization.CultureInfo.CurrentCulture;

        //  Check file
        if (!File.Exists(filename))
            throw new FileNotFoundException("File not found", filename);

        //  Process the file line by line
        string line;
        using (TextReader tr = new StreamReader(filename, Encoding.Default))
        {
            //  If column names were not passed, we'll read them from the file
            if (columnNames == null)
            {
                //  Get the first line
                line = tr.ReadLine();
                if (string.IsNullOrEmpty(line))
                    throw new IOException("Could not read column names from file.");
                columnNames = line.Split(new string[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
            }

            //  Add the columns to the data table
            foreach (string colName in columnNames)
                data.Columns.Add(colName);

            //  Read the file
            string[] columns;
            while ((line = tr.ReadLine()) != null)
            {
                columns = line.Split(new string[] { delimiter }, StringSplitOptions.None);
                //  Ensure we have the same number of columns
                if (columns.Length != columnNames.Length)
                {
                    string message = "Data row has {0} columns and {1} are defined by column names.";
                    throw new DataException(string.Format(message, columns.Length, columnNames.Length));
                }
                data.Rows.Add(columns);
            }
        }
        return data;

    }
}
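
Note that the class above splits each line with string.Split, so a quoted field that contains the delimiter (for example a Notes value with a comma in it) would be split into extra columns. If the input can contain such fields, a quote-aware splitter could be swapped in. The following is a sketch under that assumption, not part of the original class: the name `QuotedLineSplitter` is hypothetical, it treats a doubled quote inside a quoted field as an escaped quote, and it does not handle fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedLineSplitter
{
    // Splits one line into fields, honouring double-quoted fields.
    public static string[] Split(string line, char delimiter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote is an escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // delimiters inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());  // last field has no trailing delimiter
        return fields.ToArray();
    }
}
```

For example, `QuotedLineSplitter.Split("1,\"a,b\"", ',')` yields two fields, `1` and `a,b`, where string.Split would yield three.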

Required Namespaces

using System;
using System.Data;
using System.Windows.Forms;
using System.Data.SqlClient;
using System.Diagnostics;

Here's an example of calling it and uploading to a SQL Database:

        Stopwatch sw = new Stopwatch();
        TimeSpan tsRead;
        TimeSpan tsTrunc;
        TimeSpan tsBcp;
        int rows;
        sw.Start();
        using (DataTable dt = DelimitedTextReader.ReadFile(textBox1.Text, "\t"))
        {
            tsRead = sw.Elapsed;
            sw.Reset();
            rows = dt.Rows.Count;
            string connect = @"Data Source=.;Initial Catalog=MyDB;Integrated Security=SSPI";
            using (SqlConnection cn = new SqlConnection(connect))
            using (SqlCommand cmd = new SqlCommand("TRUNCATE TABLE dbo.UploadTable", cn))
            using (SqlBulkCopy bcp = new SqlBulkCopy(cn))
            {
                cn.Open();
                sw.Start();
                cmd.ExecuteNonQuery();
                tsTrunc = sw.Elapsed;
                sw.Reset();

                sw.Start();
                bcp.DestinationTableName = "dbo.UploadTable";
                bcp.ColumnMappings.Add("Column A", "ColumnA");
                bcp.ColumnMappings.Add("Column D", "ColumnD");
                bcp.WriteToServer(dt);
                tsBcp = sw.Elapsed;
                sw.Reset();
            }
        }

        string message = "File read:\t{0}\r\nTruncate:\t{1}\r\nBcp:\t{2}\r\n\r\nTotal time:\t{3}\r\nTotal rows:\t{4}";
        MessageBox.Show(string.Format(message, tsRead, tsTrunc, tsBcp, tsRead + tsTrunc + tsBcp, rows));

Answer by Ayman

I think the best way to do it is by using CSVReader in the following blog: http://ronaldlemmen.blogspot.com/2008/03/stopping-and-continuing-save-event.html

Answer by Sunil R

I was facing a similar issue with my .NET code, and I was able to make it work just by changing the registry setting to Majority Type as mentioned in the original post. What I did differently in my case: since we are importing from a CSV file and not from Excel, I had to change the settings under the Text key in the registry (not the Excel key).

Try doing that; I think it should work.
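
To check what the Text engine is currently configured with, the values can be read from code. This is a read-only sketch with some assumptions: the key path mirrors the Excel path from the question with Text in place of Excel, the class name `JetTextSettings` is hypothetical, and the value names to query (I believe `ImportMixedTypes` and `MaxScanRows` for the Text engine, rather than `TypeGuessRows`) should be confirmed in regedit on the machine in question.

```csharp
using System;
using Microsoft.Win32;

public static class JetTextSettings
{
    // Assumed location of the Jet 4.0 text driver settings (under HKEY_LOCAL_MACHINE).
    public const string TextEngineKey = @"SOFTWARE\Microsoft\Jet\4.0\Engines\Text";

    // Returns the value as text, or null when the key or value is missing.
    public static string Read(string valueName)
    {
        // Jet 4.0 is 32-bit only, so look in the 32-bit registry view.
        using (RegistryKey hklm = RegistryKey.OpenBaseKey(RegistryHive.LocalMachine, RegistryView.Registry32))
        using (RegistryKey key = hklm.OpenSubKey(TextEngineKey))
        {
            if (key == null)
                return null;
            object value = key.GetValue(valueName);
            return value == null ? null : value.ToString();
        }
    }
}
```

Calling `JetTextSettings.Read("ImportMixedTypes")` on a Windows machine with Jet installed should return the current setting. Changing it requires administrator rights and affects every application that uses the Jet text driver.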
