C# 将 CSV 文件导入到 .Net 中的强类型数据结构

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1898/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-07-31 13:00:17  来源:igfitidea点击:

Import CSV file to strongly typed data structure in .Net

提问by MattH

What's the best way to import a CSV file into a strongly-typed data structure?

将 CSV 文件导入强类型数据结构的最佳方法是什么?

回答by helloandre

A good simple way to do it is to open the file, and read each line into an array, linked list, data-structure-of-your-choice. Be careful about handling the first line though.

一个很好的简单方法是打开文件,并将每一行读入一个数组、链表、您选择的数据结构。不过要小心处理第一行。

This may be over your head, but there seems to be a direct way to access them as well using a connection string.

这可能超出您的想象,但似乎也有一种直接的方法可以使用连接字符串来访问它们。

Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.

为什么不尝试使用 Python 而不是 C# 或 VB?它有一个很好的 CSV 模块可以导入,可以为您完成所有繁重的工作。

回答by Mike Stone

If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.split.

如果您可以保证数据中没有逗号,那么最简单的方法可能是使用String.split

For example:

例如:

String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);

There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure you can't have commas in the data, otherwise you will need to parse it better.

可能有一些库可以提供帮助,但这可能很简单。只要确保数据中不能有逗号,否则您将需要更好地解析它。

回答by Yaakov Ellis

There are two articles on CodeProject that provide code for a solution, one that uses StreamReaderand one that imports CSV datausing the Microsoft Text Driver.

CodeProject 上有两篇文章提供了解决方案的代码,一篇使用StreamReader,另一篇使用Microsoft Text Driver导入 CSV 数据

回答by Brian Leahy

I was bored so i modified some stuff i wrote. It try's to encapsulate the parsing in an OO manner whle cutting down on the amount of iterations through the file, it only iterates once at the top foreach.

我很无聊所以我修改了一些我写的东西。它尝试以面向对象的方式封装解析,同时减少通过文件的迭代量,它只在顶部 foreach 迭代一次。

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.IO;

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {

            // usage:

            // note this wont run as getting streams is not Implemented

            // but will get you started

            CSVFileParser fileParser = new CSVFileParser();

            // TO Do:  configure fileparser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // if the file is large and there is a good way to limit
            // without having to reparse the whole file you can use a 
            // linq query if you desire
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // now we have a list of Person objects
        }
    }

    public abstract  class CSVParser 
    {

        protected String[] deliniators = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {

            Stream stream = GetStream();
            StreamReader reader = new StreamReader(stream);

            String[] aRecord;
            while (!reader.EndOfStream)
            {
                  aRecord = reader.ReadLine().Split(deliniators,
                   StringSplitOptions.None);

                yield return aRecord;
            }

        }

        protected abstract Stream GetStream(); 

    }

    public class CSVFileParser : CSVParser
    {
        // to do: add logic to get a stream from a file

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        } 
    }

    public class CSVWebParser : CSVParser
    {
        // to do: add logic to get a stream from a web request

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser 
    {

        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public  IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}

回答by ICR

Brian gives a nice solution for converting it to a strongly typed collection.

Brian 提供了一个很好的解决方案,可以将其转换为强类型集合。

Most of the CSV parsing methods given don't take into account escaping fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.

给出的大多数 CSV 解析方法都没有考虑转义字段或 CSV 文件的其他一些细微之处(如修剪字段)。这是我个人使用的代码。它的边缘有点粗糙,几乎没有错误报告。

public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQoutedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQoutedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQoutedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQoutedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQoutedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQoutedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQoutedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQoutedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

Note that this doesn't handle the edge case of fields not being deliminated by double quotes, but meerley having a quoted string inside of it. See this postfor a bit of a better expanation as well as some links to some proper libraries.

请注意,这不会处理未被双引号分隔的字段的边缘情况,但 meerley 内部有一个带引号的字符串。请参阅这篇文章以获得更好的解释以及一些适当库的链接。

回答by pbh101

I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

今年夏天,我不得不在 .NET 中为一个项目使用 CSV 解析器,并选择了 Microsoft Jet Text Driver。您使用连接字符串指定文件夹,然后使用 SQL Select 语句查询文件。您可以使用 schema.ini 文件指定强类型。一开始我没有这样做,但后来我得到了糟糕的结果,其中数据的类型不是很明显,例如 IP 号码或像“XYQ 3.9 SP1”这样的条目。

One limitation I ran into is that it cannot handle column names above 64 characters; it truncates. This shouldn't be a problem, except I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

我遇到的一个限制是它不能处理超过 64 个字符的列名;它截断。这应该不是问题,除非我正在处理设计非常糟糕的输入数据。它返回一个 ADO.NET 数据集。

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the end cases, and I didn't find any other free CSV parsing packages for .NET out there.

这是我找到的最好的解决方案。我会谨慎使用自己的 CSV 解析器,因为我可能会错过一些最终情况,而且我没有找到任何其他免费的 .NET CSV 解析包。

EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly-type the columns specified, and infer for any unspecified field. I really appreciated this, as I was dealing with importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.

编辑:此外,每个目录只能有一个 schema.ini 文件,因此我动态附加到它以强类型所需的列。它只会强类型指定的列,并推断任何未指定的字段。我真的很感激这一点,因为我正在处理导入一个 70+ 列的 CSV 并且不想指定每一列,只指定行为不端的列。

回答by Jon Limjap

If you're expecting fairly complex scenarios for CSV parsing, don't even think up of rolling our own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.

如果您期待 CSV 解析相当复杂的场景,甚至不要考虑滚动我们自己的解析器。有很多优秀的工具,比如FileHelpers,甚至来自CodeProject 的工具

The point is this is a fairly common problem and you could bet that a lotof software developers have already thought about and solved this problem.

关键是这是一个相当普遍的问题,您可以打赌,很多软件开发人员已经考虑并解决了这个问题。

回答by Jon Galloway

I agree with @NotMyself. FileHelpersis well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse stuff like this:

我同意@NotMyselfFileHelpers已经过充分测试,可以处理如果您自己动手,最终将不得不处理的各种边缘情况。看看 FileHelpers 做了什么,并且只有在您绝对确定(1)您永远不需要处理 FileHelpers 所做的边缘情况,或者(2)您喜欢编写此类东西并且将要编写自己的文件时才编写自己的当你必须解析这样的东西时会欣喜若狂:

1,"Bill","Smith","Supervisor", "No Comment"

1、“比尔”、“史密斯”、“主管”、“无可奉告”

2 , 'Drake,' , 'O'Malley',"Janitor,

2、'德雷克'、'奥马利'、“看门人”

Oops, I'm not quoted and I'm on a new line!

哎呀,我没有被引用,我在一个新的行!

回答by Kevin

Use an OleDB connection.

使用 OleDB 连接。

String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\InputDirectory\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();
DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM file.csv", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();

回答by MarkJ

Microsoft's TextFieldParseris stable and follows RFC 4180for CSV files. Don't be put off by the Microsoft.VisualBasicnamespace; it's a standard component in the .NET Framework, just add a reference to the global Microsoft.VisualBasicassembly.

Microsoft 的TextFieldParser是稳定的并且遵循RFC 4180的 CSV 文件。不要被Microsoft.VisualBasic命名空间推迟;它是 .NET Framework 中的标准组件,只需添加对全局Microsoft.VisualBasic程序集的引用即可。

If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice, as it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.

如果您正在为 Windows 编译(而不是 Mono)并且不期望必须解析“损坏的”(不符合 RFC 标准的)CSV 文件,那么这将是显而易见的选择,因为它是免费的、不受限制的、稳定的,并积极支持,其中大部分不能说 FileHelpers。

See also: How to: Read From Comma-Delimited Text Files in Visual Basicfor a VB code example.

另请参阅:如何:在 Visual Basic 中从逗号分隔的文本文件中读取VB 代码示例。