Using CsvBeanReader to read a CSV file with a variable number of columns

Note: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11678238/

Time: 2020-10-31 06:01:13  Source: igfitidea

java csv supercsv

Asked by Bryce Sandlund

So I'm working on parsing a .csv file. I took the advice of another thread somewhere on StackOverflow and downloaded SuperCSV. I finally got pretty much everything working, but now I've run into a bug that seems difficult to fix.

The problem occurs because the last two columns of data may or may not be populated. Here is an example of a .csv file with the first row missing the last column, and the second row entirely complete:

2012:07:25,11:48:20,922,"uLog.exe","",Key pressed,1246,341,-1.00,-1.00,1.00,Shift
2012:07:25,11:48:21,094,"uLog.exe","",Key pressed,1246,341,-1.00,-1.00,1.00,b,Shift

From my understanding of the Super CSV Javadoc, there is no way to populate a Java Bean with the CsvBeanReader if there is a variable number of columns. This seems really dumb, because I feel like these missing columns should be allowed to be null or some other default value when the Bean is initialized.

For reference, here is my complete code for the parser:

public class ULogParser {

String uLogFileLocation;
String screenRecorderFileLocation;

private static final CellProcessor[] cellProcessor = new CellProcessor[] {
    new ParseDate("yyyy:MM:dd"),
    new ParseDate("HH:mm:ss"),
    new ParseDate("SSS"),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new ParseInt(),
    new ParseInt(),
    new ParseDouble(),
    new ParseDouble(),
    new ParseDouble(),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
};

public String[] header = {"Date", "Time", "Msec", "Application", "Window", "Message", "X", "Y", "RelDist", "TotalDist", "Rate", "Extra1", "Extra2"}; 

public ULogParser(String uLogFileLocation, String screenRecorderFileLocation)
{
    this.uLogFileLocation = uLogFileLocation;
    this.screenRecorderFileLocation = screenRecorderFileLocation;
}

public void parse()
{
    try {
        ICsvBeanReader reader = new CsvBeanReader(new BufferedReader(new FileReader(uLogFileLocation)), CsvPreference.STANDARD_PREFERENCE);
        reader.getCSVHeader(false); //parse past the header
        Entry entry;
        entry = reader.read(Entry.class, header, cellProcessor);
        System.out.println(entry.getApplication()); // the field is private, so go through the getter
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void sendToDB()
{
    Query query = new Query();
}
}

And the code for the Entry class:

public class Entry
{
private Date Date;
private Date Time;
private Date Msec;
private String Application;
private String Window;
private String Message;
private int X;
private int Y;
private double RelDist;
private double TotalDist;
private double Rate;
private String Extra1;
private String Extra2;

public Date getDate() { return Date; }
public Date getTime() { return Time; }
public Date getMsec() { return Msec; }
public String getApplication() { return Application; }
public String getWindow() { return Window; }
public String getMessage() { return Message; }
public int getX() { return X; }
public int getY() { return Y; }
public double getRelDist() { return RelDist; }
public double getTotalDist() { return TotalDist; }
public double getRate() { return Rate; }
public String getExtra1() { return Extra1; }
public String getExtra2() { return Extra2; }

public void setDate(Date Date) { this.Date = Date; }
public void setTime(Date Time) { this.Time = Time; }
public void setMsec(Date Msec) { this.Msec = Msec; }
public void setApplication(String Application) { this.Application = Application; }
public void setWindow(String Window) { this.Window = Window; }
public void setMessage(String Message) { this.Message = Message; }
public void setX(int X) { this.X = X; }
public void setY(int Y) { this.Y = Y; }
public void setRelDist(double RelDist) { this.RelDist = RelDist; }
public void setTotalDist(double TotalDist) { this.TotalDist = TotalDist; }
public void setRate(double Rate) { this.Rate = Rate; }
public void setExtra1(String Extra1) { this.Extra1 = Extra1; }
public void setExtra2(String Extra2) { this.Extra2 = Extra2; }

public Entry(){}
}

And the exception I'm receiving (note this is a different line than my above example, missing both of the last two columns):

Exception in thread "main" The value array (size 12)  must match the processors array (size 13): You are probably reading a CSV line with a different number of columns than the number of cellprocessors specified context: Line: 2 Column: 0 Raw line:
[2012:07:25, 11:48:05, 740, uLog.exe,  , Logging started, -1, -1, -1.00, -1.00, -1.00, ]
 offending processor: null
    at org.supercsv.util.Util.processStringList(Unknown Source)
    at org.supercsv.io.CsvBeanReader.read(Unknown Source)
    at processing.ULogParser.parse(ULogParser.java:59)
    at ui.ParseImplicitData.main(ParseImplicitData.java:15)

Yes, writing all those getters and setters was a pain in the ass. Also, I apologize, I probably don't have perfect convention in my use of SuperCSV (like what CellProcessor to use if you just want the unmodified String), but you get the idea. Also, this code is obviously not complete. For now, I'm just trying to successfully retrieve a line of data.

At this point, I'm wondering if using the CsvBeanReader is possible for my purposes. If not, I'm a little disappointed, since the CsvListReader (I would post a hyperlink, but StackOverflow isn't allowing me to, also dumb) is just about as easy as not using the API at all, and just using Scanner.next().

Any help would be appreciated. Thanks in advance!

Accepted answer by James Bassett

Edit: Update for Super CSV 2.0.0-beta-1

Please note the API has changed in Super CSV 2.0.0-beta-1 (the code example is based on 1.52). The getCSVHeader() method on all readers is now getHeader() (to be in line with writeHeader on the writers).

Also, SuperCSVException has been renamed to SuperCsvException.

Edit: Update for Super CSV 2.1.0

Since version 2.1.0 it's possible to execute the cell processors after reading a line of CSV by using the new executeProcessors() method. For more information, see the example on the project website. Please note this is only relevant for CsvListReader, as it's the only reader that allows for variable column length.

You're correct - CsvBeanReader doesn't support CSV files with a variable number of columns. According to most CSV specifications (including RFC 4180), the number of columns should be the same on every row.

For this reason (as a Super CSV developer) I'm reluctant to add this functionality to Super CSV. If you can think of an elegant way to add it then feel free to make suggestions on the project's SourceForge site. It would probably mean a new reader that extends upon CsvBeanReader: it would have to split the reading and mapping/processing into two separate methods (you can't do any processing or mapping to fields of the bean unless you know how many columns there are).

Simple solution

The simple solution (if you have control of the CSV file you're working with) is to add a blank column when writing your CSV file: the first line in your example would have a comma at the end, to indicate that the last column is empty. That way, your CSV file will be valid (it will have the same number of columns on every row) and you can use CsvBeanReader as you're already doing.
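If the file already exists, the same idea can be applied as a pre-processing step. The following is only a minimal sketch, not part of Super CSV: RowPadder, countColumns and padRow are hypothetical names, and it assumes that only trailing columns are ever missing and that no quoted field contains a line break.

```java
/** Hypothetical helper (not part of Super CSV): pads short CSV rows with empty trailing columns. */
final class RowPadder {

    /** Counts the columns in one CSV line, ignoring commas inside double-quoted fields. */
    static int countColumns(String line) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // an escaped quote ("") toggles twice, so the state ends up unchanged
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    /** Appends one comma per missing column; rows that are already long enough are returned unchanged. */
    static String padRow(String line, int expectedColumns) {
        StringBuilder padded = new StringBuilder(line);
        for (int i = countColumns(line); i < expectedColumns; i++) {
            padded.append(',');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // the short row from the question, padded to the 13 columns the bean expects
        String shortRow =
            "2012:07:25,11:48:20,922,\"uLog.exe\",\"\",Key pressed,1246,341,-1.00,-1.00,1.00,Shift";
        System.out.println(padRow(shortRow, 13));
    }
}
```

Each padded line could then be written to a temporary file (or joined into a String) and handed to CsvBeanReader as before. Whether the cell processors accept the resulting empty columns depends on the processors chosen for those columns.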

If that's not possible, then all is not lost!

Fancy solution

As you probably realize, CsvBeanReader uses the name mapping to associate each column in the CSV file with a field in your bean, and the CellProcessor array to process each column. In other words, you have to know how many columns there are (and what they represent) if you want to use it.

CsvListReader, on the other hand, is very primitive and can read rows of varying length (because it doesn't need to process or map them).

So you can combine the features of CsvBeanReader with those of CsvListReader (as done in the following example) by reading the file with both readers in parallel: use CsvListReader to figure out how many columns there are, and CsvBeanReader to do the processing/mapping.

Note that this makes the assumption that it's only ever the birthDate column that may not be present (i.e. it wouldn't work if you can't tell which column is missing).

package example;

import java.io.StringReader;
import java.util.Date;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class VariableColumns {

    private static final String INPUT = "name,birthDate,city\n"
        + "John,New York\n" 
        + "Sally,22/03/1974,London\n" 
        + "Jim,Sydney";

    // cell processors
    private static final CellProcessor[] NORMAL_PROCESSORS = 
    new CellProcessor[] {null, new ParseDate("dd/MM/yyyy"), null };
    private static final CellProcessor[] NO_BIRTHDATE_PROCESSORS = 
    new CellProcessor[] {null, null };

    // name mappings
    private static final String[] NORMAL_HEADER = 
    new String[] { "name", "birthDate", "city" };
    private static final String[] NO_BIRTHDATE_HEADER = 
    new String[] { "name", "city" };

    public static void main(String[] args) {

        // using bean reader and list reader together (to read the same file)
        final ICsvBeanReader beanReader = new CsvBeanReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);
        final ICsvListReader listReader = new CsvListReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);

        try {
            // skip over header
            beanReader.getCSVHeader(true);
            listReader.getCSVHeader(true);

            while (listReader.read() != null) {

                final String[] nameMapping;
                final CellProcessor[] processors;

                if (listReader.length() == NORMAL_HEADER.length) {
                    // all columns present - use normal header/processors
                    nameMapping = NORMAL_HEADER;
                    processors = NORMAL_PROCESSORS;

                } else if (listReader.length() == NO_BIRTHDATE_HEADER.length) {
                    // one less column - birth date must be missing
                    nameMapping = NO_BIRTHDATE_HEADER;
                    processors = NO_BIRTHDATE_PROCESSORS;

                } else {
                    throw new SuperCSVException(
                            "unexpected number of columns: "
                                    + listReader.length());
                }

                // can now use CsvBeanReader safely 
                // (we know how many columns there are)
                Person person = beanReader.read(Person.class, nameMapping,
                        processors);

                System.out.println(String.format(
                        "Person: name=%s, birthDate=%s, city=%s",
                        person.getName(), person.getBirthDate(),
                        person.getCity()));

            }
        } catch (Exception e) {
            // handle exceptions here
            e.printStackTrace();
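        } finally {
            // close both readers (close() can itself throw)
            try {
                beanReader.close();
                listReader.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }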
    }

    public static class Person {

        private String name;
        private Date birthDate;
        private String city;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public Date getBirthDate() {
            return birthDate;
        }

        public void setBirthDate(Date birthDate) {
            this.birthDate = birthDate;
        }

        public String getCity() {
            return city;
        }

        public void setCity(String city) {
            this.city = city;
        }
    }

}

I hope this helps.

Oh, and is there any reason why the fields in your Entry class don't follow normal naming conventions (camelCase)? If you update your header array to use camelCase, then your fields can be camelCase as well.

Answer by Jeronimo Backes

Using uniVocity-parsers you can map CSV files with a varying number of columns to Java beans. Using annotations:

class TestBean {

// if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
@NullString(nulls = { "?", "-" })
// if a value resolves to null, it will be converted to the String "0".
@Parsed(defaultNullRead = "0")
private Integer quantity;   // The attribute type defines which conversion will be executed when processing the value.
// In this case, IntegerConversion will be used.
// The attribute name will be matched against the column header in the file automatically.

@Trim
@LowerCase
// the value for the comments attribute is in the column at index 4 (0 is the first column, so this means fifth column in the file)
@Parsed(index = 4)
private String comments;

// you can also explicitly give the name of a column in the file.
@Parsed(field = "amount")
private BigDecimal amount;

@Trim
@LowerCase
// values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;
...
}

To parse your CSV into a list of TestBean instances:

// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
//Uses the first valid row of the CSV to assign names to each column
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(yourFile));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Answer by Jim Garrison

Well, SuperCSV is Open Source. If you want to add functionality, such as handling input with a variable number of trailing fields, you have basically two options:

  1. Post a support request on the SourceForge site and hope the author agrees and has time to do it
  2. Download the source, change it to your liking, and contribute the changes to the project.

This is how Open Source works.
