Original source: http://stackoverflow.com/questions/30669967/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Reading a CSV file and extracting specific data
Asked by sparta93
So I have a .CSV file that may contain several million, maybe even billions, of lines of data. The data is in the format below:
1,,5,6,7,82,4,6
1,4,4,5,6,33,4,
2,6,3,,6,32,6,7
,,,2,5,45,,6
,4,5,6,,33,5,6
What I am trying to achieve is this: let's assume each line of data is an "event". Now let's say a user asks to see all events where the 6th value is 33. As you can see above, the 6th data element is a 2-digit number, so for that query the output would be:
1,4,4,5,6,33,4,
,4,5,6,,33,5,6
Also, as you can see, the data can have blanks or holes where data is missing. I don't need help reading a .CSV file. I just can't wrap my mind around how I would access the 6th data element. I would also prefer the output to be represented in a collection of some sort. I'm new to C#, so I don't have much knowledge of the built-in classes. Any help will be appreciated!
Accepted answer by Alexander Bell
Instead of the term "event", I suggest describing this data structure more conventionally as "rows and columns". Use the C# Split() function to create a 2D array (string[,] or int[,]) in which each element is conveniently accessible by its row/column index, then apply whatever business logic you need to those elements.
A possible implementation of the CSV file reader (line by line, with each line stored in the List<string> listRows) is shown below (re: Reading CSV file and storing values into an array):
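using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Read the file line by line; each line becomes one entry in listRows.
        var listRows = new List<string>();
        using (var reader = new StreamReader(File.OpenRead(@"C:\YouFile.csv")))
        {
            while (!reader.EndOfStream)
            {
                listRows.Add(reader.ReadLine());
            }
        }
    }
}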
Then apply Split(',') to each row stored in listRows to compose a 2D array string[,], and use the int.TryParse() method to convert the values to int (optional, if needed).
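As a rough illustration of that step, the sketch below builds a string[,] from the lines in listRows. Split(',') returns one string[] per row, so the rows are copied into the rectangular array by hand. The names CsvGrid, ToGrid and CellAsInt are hypothetical, and padding short rows with empty strings is an assumption, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;

static class CsvGrid
{
    // Builds a rectangular string[,] from raw CSV lines.
    // Rows shorter than the widest row are padded with empty strings (an assumption).
    public static string[,] ToGrid(IReadOnlyList<string> listRows)
    {
        var split = new string[listRows.Count][];
        int width = 0;
        for (int r = 0; r < listRows.Count; r++)
        {
            split[r] = listRows[r].Split(',');
            width = Math.Max(width, split[r].Length);
        }

        var grid = new string[listRows.Count, width];
        for (int r = 0; r < listRows.Count; r++)
        {
            for (int c = 0; c < width; c++)
            {
                grid[r, c] = c < split[r].Length ? split[r][c] : string.Empty;
            }
        }
        return grid;
    }

    // Returns the cell as an int, or null when it is blank or not a number.
    public static int? CellAsInt(string[,] grid, int row, int column)
    {
        return int.TryParse(grid[row, column], out int value) ? value : (int?)null;
    }
}
```

With this layout the 6th value of a row sits at column index 5, for example CsvGrid.CellAsInt(grid, 0, 5). Note that a string[,] holds the whole file in memory, which may not be practical for files with billions of lines.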
Alternatively, this could be implemented with LINQ, which I do not recommend here: it unnecessarily extends the technology surface area and may degrade performance (a LINQ solution is expected to be slower than the direct processing suggested above).
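For comparison, here is a minimal sketch of the direct (non-LINQ) processing, applied to the question's query. The names DirectFilter, RowsWhereColumnEquals and FromFile are hypothetical. It assumes a plain comma-separated format with no quoted fields, and it uses File.ReadLines so that lines are read one at a time rather than loaded all at once.

```csharp
using System.Collections.Generic;
using System.IO;

static class DirectFilter
{
    // Returns every row whose value at the given 1-based column equals `wanted`.
    public static List<string[]> RowsWhereColumnEquals(
        IEnumerable<string> lines, int column, string wanted)
    {
        var matches = new List<string[]>();
        foreach (string line in lines)
        {
            string[] cells = line.Split(',');
            // Skip rows that are too short to have the requested column.
            if (column >= 1 && cells.Length >= column && cells[column - 1].Trim() == wanted)
            {
                matches.Add(cells);
            }
        }
        return matches;
    }

    // File.ReadLines enumerates the file lazily, so only the matching rows are kept in memory.
    public static List<string[]> FromFile(string path, int column, string wanted)
    {
        return RowsWhereColumnEquals(File.ReadLines(path), column, wanted);
    }
}
```

For the sample data in the question, RowsWhereColumnEquals(lines, 6, "33") would keep the two rows whose 6th value is 33.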
Hope this may help.
Answer by Yuri
This is pretty easy to achieve with LINQ. I'm posting a sample from LINQPad. All you need to do is replace 33 with a parameter:
void Main()
{
    string csvFile = @"C:\Temp\TestData.csv";
    string[] lines = File.ReadAllLines(csvFile);
    var values = lines.Select(s => new { myRow = s.Split(',') });
    // and here is your collection representing the results
    List<string[]> results = new List<string[]>();
    foreach (var value in values)
    {
        // keep the row if any of its cells is exactly "33"
        if (value.myRow.Contains("33"))
        {
            results.Add(value.myRow);
        }
    }
    results.Dump(); // Dump() is a LINQPad extension method
}
Or, if you want, you can do it all in one shot:
string csvFile = @"C:\Temp\TestData.csv";
string[] lines = File.ReadAllLines(csvFile);
var values = lines.Select(s => new
{
    // 1-based position of the first cell containing "33"; 0 if there is none
    Position = Array.FindIndex(s.Split(','), a => a.Contains("33")) + 1,
    myRow = s.Split(',')
});
So the final result will have both: the position of your search value (33) and the complete string[] of items.
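The snippets above look for 33 anywhere in a row, while the question asks about one specific position. A minimal sketch of the parameterized version might look like the following; LinqFilter and Find are hypothetical names, and the exact-match comparison on a single 1-based position is an assumption about what the asker wants.

```csharp
using System.Collections.Generic;
using System.Linq;

static class LinqFilter
{
    // Returns the rows whose cell at the given 1-based position equals `wanted`.
    public static List<string[]> Find(IEnumerable<string> lines, int position, string wanted)
    {
        return lines
            .Select(line => line.Split(','))
            // ElementAtOrDefault yields null for rows that are too short, so they never match.
            .Where(row => row.ElementAtOrDefault(position - 1) == wanted)
            .ToList();
    }
}
```

Calling LinqFilter.Find(lines, 6, "33") would then return only the rows whose 6th value is exactly 33, as a List<string[]>.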
Answer by Andrew Grinder
Create a class EventEntity. In this class, create a List<int> property and a constructor that initializes the list. Here is an example class:
public class EventEntity
{
    public EventEntity()
    {
        EventList = new List<int>();
    }

    public List<int> EventList { get; set; }
}
From there loop through each row of data. Example:
public class EventEntityRepo
{
    public EventEntity GetEventEntityByCsvDataRow(String[] csvRow)
    {
        EventEntity events = new EventEntity();
        foreach (String csvCell in csvRow)
        {
            int eventId = -1;
            if (csvCell != null && csvCell != String.Empty)
            {
                try
                {
                    eventId = Convert.ToInt32(csvCell.Trim());
                }
                catch (Exception)
                {
                    // failed to parse int; eventId stays -1
                }
            }
            events.EventList.Add(eventId); // if an empty item, insert -1
        }
        return events;
    }
}
Then you can reference the items whenever you want.
var repo = new EventEntityRepo();
EventEntity eventEntity = repo.GetEventEntityByCsvDataRow(csvDataRow);
int eventEntitySixthElement = eventEntity.EventList[5]; // index 5 is the 6th value
Answer by Alex Sikilinda
So your question is how to access the 6th data element. It's not too hard if you have the right data structure representing your CSV.
Basically, this CSV document can be described in abstract terms as IEnumerable<IEnumerable<String>> or, maybe, IEnumerable<IEnumerable<int?>>. Once you have implemented the CSV parsing logic, you can access the 6th element by executing:
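var csvRepresentation = ParseCsv(@"D:/file.csv");
foreach (var row in csvRepresentation)
{
    var element = row.ElementAt(5); // index 5 is the 6th value of the row
    if (element == "33") // 33 is the value from the question
    {
        // do something
    }
}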
With this approach you will also be able to run LINQ queries on it.
Now the question is how to implement ParseCsv():
public IEnumerable<IEnumerable<String>> ParseCsv(string path)
{
    return File.ReadAllLines(path).Select(row => row.Split(','));
}
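The IEnumerable<IEnumerable<int?>> form mentioned above could be sketched as follows. This is only an illustration under a few assumptions: LazyCsv, Parse, ParseFile and ToNullableInt are hypothetical names, blank or non-numeric cells are mapped to null, and File.ReadLines is swapped in for File.ReadAllLines so that a very large file is enumerated lazily instead of being loaded in full.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyCsv
{
    // Lazily turns each line into a sequence of nullable ints.
    public static IEnumerable<IEnumerable<int?>> Parse(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split(',').Select(cell => ToNullableInt(cell)));
    }

    // File.ReadLines reads one line at a time as the result is enumerated.
    public static IEnumerable<IEnumerable<int?>> ParseFile(string path)
    {
        return Parse(File.ReadLines(path));
    }

    // Blank or non-numeric cells become null.
    private static int? ToNullableInt(string cell)
    {
        return int.TryParse(cell, out int value) ? value : (int?)null;
    }
}
```

A query for the question's case could then be written as LazyCsv.ParseFile(path).Where(row => row.ElementAtOrDefault(5) == 33), where a missing or blank 6th value compares as null and is skipped.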

