Original question: http://stackoverflow.com/questions/843997/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
CSV parsing in Java - working example..?
Asked by Andy Schmidt
I want to write a program for a school java project to parse some CSV I do not know. I do know the datatype of each column - although I do not know the delimiter.
The problem I do not even marginally know how to fix is to parse Date or even DateTime Columns. They can be in one of many formats.
I found many libraries but have no clue which is the best for my needs: http://opencsv.sourceforge.net/ http://www.csvreader.com/java_csv.php http://supercsv.sourceforge.net/ http://flatpack.sourceforge.net/
The problem is I am a total Java beginner. I am afraid none of those libraries can do what I need, or that I can't convince them to do it.
I bet there are a lot of people here who have code sample that could get me started in no time for what I need:
- automatically split in Columns (delimiter unknown, Columntypes are known)
- cast to Columntype (should cope with $, %, etc.)
- convert dates to Java Date or Calendar Objects
It would be nice to get as many code samples as possible by email.
Thanks a lot! AS
Answered by Richard West
At a minimum you are going to need to know the column delimiter.
Answered by Leonard Ehrenfried
Basically you will need to read the file line by line.
Then you will need to split each line by the delimiter, say a comma (CSV stands for comma-separated values), with
String[] strArr=line.split(",");
This will turn it into an array of strings which you can then manipulate, for example with
String name = strArr[0];
int yearOfBirth = Integer.parseInt(strArr[1].trim());
int monthOfBirth = Integer.parseInt(strArr[2].trim());
int dayOfBirth = Integer.parseInt(strArr[3].trim());
// GregorianCalendar months are zero-based (January is 0), so subtract 1
GregorianCalendar dob = new GregorianCalendar(yearOfBirth, monthOfBirth - 1, dayOfBirth);
Student student = new Student(name, dob); // let's pretend you are creating instances of Student
You will need to do this for every line so wrap this code into a while loop. (If you don't know the delimiter just open the file in a text editor.)
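To make the "wrap it in a while loop" step concrete, here is a minimal sketch of the read-and-split loop. The class name `NaiveCsvLoop`, the method `readRows` and the sample data are made up for illustration, and the sketch assumes no value contains the delimiter or a quote character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class NaiveCsvLoop {

    /** Reads every non-empty line from the reader and splits it on the delimiter. */
    static List<String[]> readRows(Reader source, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Pattern.quote stops delimiters such as "|" being read as regex syntax;
                // the limit -1 keeps trailing empty columns.
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "Alice,1990,5,17\nBob,1988,12,3\n"; // made-up sample data
        for (String[] row : readRows(new StringReader(sample), ",")) {
            System.out.println(row[0] + " was born in " + row[1]);
        }
    }
}
```

For a real file you would pass something like a `FileReader` instead of the `StringReader` used here.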
Answered by willcodejavaforfood
You might want to have a look at this specification for CSV. Bear in mind that there is no officially recognized specification.
If you do not know the delimiter it will not be possible to do this, so you have to find out somehow. If you can do a manual inspection of the file you should quickly be able to see what it is and hard code it in your program. If the delimiter can vary, your only hope is to be able to deduce it from the formatting of the known data. When Excel imports CSV files it lets the user choose the delimiter, and this is a solution you could use as well.
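One possible way to deduce the delimiter from known data: since the number of columns is known, try a few candidates and keep the first one that produces that many columns on every sample line. This is only a rough sketch. The class `DelimiterGuesser`, the candidate list and the sample lines are assumptions, and it does not account for delimiters inside quoted values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
class DelimiterGuesser {

    // Assumed set of plausible delimiters; extend as needed.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /**
     * Returns the first candidate that splits every sample line into exactly
     * expectedColumns fields. Throws if no candidate fits.
     */
    static char guess(List<String> sampleLines, int expectedColumns) {
        if (sampleLines.isEmpty()) {
            throw new IllegalArgumentException("Need at least one sample line");
        }
        for (char candidate : CANDIDATES) {
            boolean fitsAll = true;
            for (String line : sampleLines) {
                if (count(line, candidate) != expectedColumns - 1) {
                    fitsAll = false;
                    break;
                }
            }
            if (fitsAll) {
                return candidate;
            }
        }
        throw new IllegalArgumentException(
                "No candidate delimiter yields " + expectedColumns + " columns");
    }

    /** Counts how often c occurs in line. */
    private static int count(String line, char c) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up sample: semicolon-delimited, with decimal commas in the second column
        List<String> sample = Arrays.asList("Alice;1234,50;2009-05-17", "Bob;99,00;2009-05-18");
        System.out.println("Guessed delimiter: '" + guess(sample, 3) + "'");
    }
}
```

If more than one candidate fits, this sketch silently takes the first, so asking the user to confirm (as Excel does) is still the safer route.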
Answered by Ray Tayek
I had to use a CSV parser about 5 years ago. It seems there are at least two CSV standards: http://en.wikipedia.org/wiki/Comma-separated_values and what Microsoft does in Excel.
I found this library which eats both: http://ostermiller.org/utils/CSV.html, but AFAIK it has no way of inferring what data type the columns are.
Answered by Valentin Rocher
You also have the Apache Commons CSV library; maybe it does what you need. See the guide. Updated to Release 1.1 in 2014-11.
Also, for the foolproof edition, I think you'll need to code it yourself. Through SimpleDateFormat you can choose your formats and specify various types; if the Date isn't like any of your pre-thought types, it isn't a Date.
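A rough sketch of that idea: try a fixed list of SimpleDateFormat patterns in order and accept the first one that consumes the whole string. The class `MultiFormatDateParser` and the pattern list are assumptions for illustration; your data may need different patterns. Ambiguous inputs such as 03/04/2009 are resolved purely by pattern order.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of any library.
class MultiFormatDateParser {

    // Assumed "pre-thought" formats; most specific first.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd",
        "dd.MM.yyyy",
        "MM/dd/yyyy"
    };

    /** Returns the parsed Date, or null if no pattern matches the whole string. */
    static Date parse(String text) {
        String trimmed = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false); // reject impossible dates such as 2009-02-31
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(trimmed, position);
            // Accept only if the pattern consumed the entire input
            if (parsed != null && position.getIndex() == trimmed.length()) {
                return parsed;
            }
        }
        return null; // not like any known format, so not treated as a date
    }

    public static void main(String[] args) {
        System.out.println(parse("17.05.2009"));
        System.out.println(parse("not a date"));
    }
}
```

A fresh SimpleDateFormat is created per call because instances are not thread-safe; on Java 8 and later, java.time.format.DateTimeFormatter would be the more modern choice.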
Answered by Kevin Day
I would recommend that you start by pulling your task apart into its component parts.
- Read string data from a CSV
- Convert string data to appropriate format
Once you do that, it should be fairly trivial to use one of the libraries you link to (which most certainly will handle task #1). Then iterate through the returned values, and cast/convert each String value to the value you want.
If the question is how to convert strings to different objects, it's going to depend on what format you are starting with, and what format you want to wind up with.
DateFormat.parse(), for example, will parse dates from strings. See SimpleDateFormat for quickly constructing a DateFormat for a certain string representation. Integer.parseInt() will parse integers from strings.
Currency, you'll have to decide how you want to capture it. If you want to just capture as a float, then Float.parseFloat() will do the trick (just use String.replace() to remove all $ and commas before you parse it). Or you can parse into a BigDecimal (so you don't have rounding problems). There may be a better class for currency handling (I don't do much of that, so am not familiar with that area of the JDK).
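As a sketch of the BigDecimal route described above: strip the symbols first, then parse what is left. The class `AmountParser` is made up for illustration, and it assumes US-style formatting where the comma is a thousands separator and the dot is the decimal point.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of any library.
class AmountParser {

    /**
     * Removes "$", "%", thousands commas and surrounding whitespace,
     * then parses the remainder exactly (no binary floating-point rounding).
     * Throws NumberFormatException if the remainder is not a plain number.
     */
    static BigDecimal parseAmount(String raw) {
        String cleaned = raw.replace("$", "")
                            .replace("%", "")
                            .replace(",", "")
                            .trim();
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("$1,234.50"));
        System.out.println(parseAmount("12.5%"));
    }
}
```

Note that the percent sign is only stripped here; whether 12.5% should become 12.5 or 0.125 is a decision for your own program.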
Answered by Brian Clapper
My approach would not be to start by writing your own API. Life's too short, and there are more pressing problems to solve. In this situation, I typically:
- Find a library that appears to do what I want. If one doesn't exist, then implement it.
- If a library does exist, but I'm not sure it'll be suitable for my needs, write a thin adapter API around it, so I can control how it's called. The adapter API expresses the API I need, and it maps those calls to the underlying API.
- If the library doesn't turn out to be suitable, I can swap another one in underneath the adapter API (whether it's another open source one or something I write myself) with a minimum of effort, without affecting the callers.
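The adapter idea in the list above might look roughly like this. All names here (`CsvRowSource`, `SplitBasedRowSource`) are hypothetical, and the stand-in implementation uses a plain split rather than a real library; an OpenCSV- or SuperCSV-backed class would implement the same interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** The API the rest of the program codes against (hypothetical name). */
interface CsvRowSource {
    /** Returns all rows, each as a list of column strings. */
    List<List<String>> readAll(String csvText);
}

/** Stand-in implementation; a library-backed one could be swapped in later. */
class SplitBasedRowSource implements CsvRowSource {
    private final String delimiterRegex;

    SplitBasedRowSource(String delimiter) {
        // Quote the delimiter so characters like "|" are taken literally
        this.delimiterRegex = Pattern.quote(delimiter);
    }

    @Override
    public List<List<String>> readAll(String csvText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : csvText.split("\\r?\\n")) {
            if (!line.isEmpty()) {
                rows.add(Arrays.asList(line.split(delimiterRegex, -1)));
            }
        }
        return rows;
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // Callers only see CsvRowSource, so the implementation can change freely
        CsvRowSource source = new SplitBasedRowSource(",");
        for (List<String> row : source.readAll("a,b\nc,d\n")) {
            System.out.println(row);
        }
    }
}
```

The point of the design is that only the line constructing the implementation changes when you switch libraries.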
Start with something someone has already written. Odds are, it'll do what you want. You can always write your own later, if necessary. OpenCSV is as good a starting point as any.
Answered by Ichthyo
Writing your own parser is fun, but you should probably have a look at Open CSV. It provides numerous ways of accessing the CSV and also allows you to generate CSV. And it does handle escapes properly. As mentioned in another post, there is also a CSV-parsing lib in the Apache Commons, but that one isn't released yet.
Answered by Davidson
I agree with @Brian Clapper. I have used SuperCSV as a parser though I've had mixed results. I enjoy the versatility of it, but there are some situations within my own csv files for which I have not been able to reconcile "yet". I have faith in this product and would recommend it overall--I'm just missing something simple, no doubt, that I'm doing in my own implementation.
SuperCSV can parse the columns into various formats, do edits on the columns, etc. It's worth taking a look-see. It has examples as well, and easy to follow.
The one/only limitation I'm having is catching an 'empty' column and parsing it into an Integer or maybe a blank, etc. I'm getting null-pointer errors, but javadocs suggest each cellProcessor checks for nulls first. So, I'm blaming myself first, for now. :-)
Anyway, take a look at SuperCSV. http://supercsv.sourceforge.net/
Answered by AgilePro
There is a serious problem with using
String[] strArr=line.split(",");
in order to parse CSV files, and that is because there can be commas within the data values, and in that case you must quote them, and ignore commas between quotes.
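A tiny demonstration of the problem, using a made-up line with a quoted value that contains a comma (the class name `SplitPitfall` is just for illustration):

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class SplitPitfall {

    /** The naive approach: split on every comma, ignoring quotes. */
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Two logical columns: "Smith, John" and 42
        String line = "\"Smith, John\",42";
        String[] parts = naiveSplit(line);
        System.out.println(parts.length);           // prints 3, not 2
        System.out.println(Arrays.toString(parts)); // the name is cut in half
    }
}
```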
There is a very very simple way to parse this:
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/**
 * Returns a row of values as a list.
 * Returns null if you are past the end of the input stream.
 */
public static List<String> parseLine(Reader r) throws IOException {
    int ch = r.read();
    while (ch == '\r') {
        // ignore carriage returns wherever they appear, particularly just before end of file
        ch = r.read();
    }
    if (ch < 0) {
        return null;
    }
    List<String> store = new ArrayList<String>();
    StringBuilder curVal = new StringBuilder();
    boolean inquotes = false;
    boolean started = false;
    while (ch >= 0) {
        if (inquotes) {
            started = true;
            if (ch == '"') {
                inquotes = false;
            } else {
                curVal.append((char) ch);
            }
        } else {
            if (ch == '"') {
                inquotes = true;
                if (started) {
                    // if this is the second quote in a value, add a quote;
                    // this handles the doubled quote ("") in the middle of a value
                    curVal.append('"');
                }
            } else if (ch == ',') {
                store.add(curVal.toString());
                curVal = new StringBuilder();
                started = false;
            } else if (ch == '\r') {
                // ignore carriage returns
            } else if (ch == '\n') {
                // end of a line, break out
                break;
            } else {
                curVal.append((char) ch);
            }
        }
        ch = r.read();
    }
    store.add(curVal.toString());
    return store;
}
There are many advantages to this approach. Note that each character is touched EXACTLY once. There is no reading ahead, pushing back in the buffer, etc. No searching ahead to the end of the line, and then copying the line before parsing. This parser works purely from the stream, and creates each string value once. It works on header lines and data lines alike; you just handle the returned list as appropriate for each. You give it a Reader, so the underlying stream has already been converted to characters using any encoding you choose. The stream can come from any source: a file, an HTTP POST, an HTTP GET, and you parse the stream directly. This is a static method, so there is no object to create and configure, and when it returns, there is no memory being held.
You can find a full discussion of this code, and why this approach is preferred in my blog post on the subject: The Only Class You Need for CSV Files.