Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/49660669/

Date: 2020-08-12 03:05:22  Source: igfitidea

Parsing .csv file using Java 8 Stream

Tags: java, csv, java-8, java-stream

Asked by Michael Heneghan

I have a .csv file full of data on over 500 companies. Each row in the file holds a particular company's dataset. I need to parse this file and extract data from each row to call 4 different web services.

The first line of the .csv file contains the column names. I am trying to write a method that takes a string parameter corresponding to a column title found in the .csv file.

Based on this parameter, I want the method to parse the file using Java 8's Stream functionality and return a list of the values found in that column, one per row/company.

I feel like I am making it more complicated than it needs to be but cannot think of a more efficient way to achieve my goal.

Any thoughts or ideas would be greatly appreciated.

Searching through Stack Overflow, I found the following post, which is similar but not quite the same: "Parsing a CSV file for a unique row using the new Java 8 Streams API".

public static List<String> getData(String titleToSearchFor) throws IOException {
    Path path = Paths.get("arbitoryPath");
    int titleIndex;
    String retrievedData = null;
    List<String> listOfData = null;

    if(Files.exists(path)){ 
        try(Stream<String> lines = Files.lines(path)){
            List<String> columns = lines
                    .findFirst()
                    .map((line) -> Arrays.asList(line.split(",")))
                    .get();

            titleIndex = columns.indexOf(titleToSearchFor);

            List<List<String>> values = lines
                    .skip(1)
                    .map(line -> Arrays.asList(line.split(",")))
                    .filter(list -> list.get(titleIndex) != null)
                    .collect(Collectors.toList());

            String[] line = (String[]) values.stream().flatMap(l -> l.stream()).collect(Collectors.collectingAndThen(
                    Collectors.toList(), 
                    list -> list.toArray()));
            String value = line[titleIndex];
            if(value != null && value.trim().length() > 0){
                retrievedData = value;
            }
            listOfData.add(retrievedData);
        }
    }
    return listOfTitles;
}

Thanks

Answered by ixeption

You should not reinvent the wheel; use an established CSV parser library instead. For example, you can just use Apache Commons CSV.

It will handle a lot of things for you and is much more readable. There is also OpenCSV, which is even more powerful and comes with annotation-based mapping to data classes.

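import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

try (Reader reader = Files.newBufferedReader(Paths.get("file.csv"));
     CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
             .withFirstRecordAsHeader())) {
    for (CSVRecord csvRecord : csvParser) {
        // Access a value by its column name
        String name = csvRecord.get("MyColumn");
        // (..)
    }
}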
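
One of the "things" a parser library handles is quoting. The following is a minimal sketch, using only the JDK, of how a plain split(",") misreads a quoted field that contains a comma. The class name, the helper naiveSplit and the sample row are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class NaiveSplitDemo {

    // Splits a CSV line the naive way, as line.split(",") does in the question.
    public static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        // Hypothetical row: a quoted company name that itself contains a comma.
        String row = "\"Acme, Inc.\",42";
        List<String> fields = naiveSplit(row);
        // The row has 2 logical fields, but the naive split yields 3 pieces.
        System.out.println(fields.size()); // prints 3
    }
}
```

If a column index is computed from the header and then applied to such a row, it would point at the wrong piece, which is one reason to prefer a real parser for data you do not control.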

Edit: Anyway, if you really want to do it on your own, take a look at this example.

Answered by davidxxx

1) You cannot invoke more than one terminal operation on a Stream.
But you invoke two of them: findFirst() to retrieve the column names and then collect() to collect the line values. The second terminal operation invoked on the Stream will throw an IllegalStateException.

2) Instead of Stream<String> lines = Files.lines(path), which reads all lines into a Stream, you should work in two steps by using Files.readAllLines(), which returns a List of String.
Use the first element to retrieve the column names, and use the whole list to retrieve the value of each line that matches the criteria.

3) You split the retrieval into several small steps that can be shortened into a single stream pipeline: iterate over all lines, keep only those that match the criteria, and collect them.

It would give something like:

public static List<String> getData(String titleToSearchFor) throws IOException {
    Path path = Paths.get("arbitoryPath");

    if (Files.exists(path)) {
        List<String> lines = Files.readAllLines(path);

        List<String> columns = Arrays.asList(lines.get(0)
                                                  .split(","));

        int titleIndex = columns.indexOf(titleToSearchFor);

        List<String> values = lines.stream()
                                   .skip(1)
                                   .map(line -> Arrays.asList(line.split(",")))
                                   .map(list -> list.get(titleIndex))
                                   .filter(Objects::nonNull)
                                   .filter(s -> s.trim()
                                                 .length() > 0)
                                   .collect(Collectors.toList());

        return values;
    }

    return new ArrayList<>();

}
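
Point 1 is easy to reproduce in isolation. This is a small sketch, not part of the original answer: it calls findFirst() on a stream and then tries to keep using the same stream, as the question's code does. The class and method names are invented for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamReuseDemo {

    // Returns true if reusing a stream after a terminal operation is rejected.
    public static boolean secondTerminalOperationFails(List<String> lines) {
        Stream<String> stream = lines.stream();
        stream.findFirst(); // first terminal operation: the stream is now consumed
        try {
            // Any further use of the same stream object is rejected.
            stream.skip(1).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("name,age", "IBM,140", "Burger King,76");
        System.out.println(secondTerminalOperationFails(lines)); // prints true
    }
}
```

Reading the lines into a List first, as in the method above, sidesteps the problem because every call to lines.stream() creates a fresh stream.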

Answered by Andrew Tobilko

I managed to shorten your snippet a bit.

If I understand you correctly, you need all values of a particular column. The name of that column is given.

The idea is the same, but I improved reading from the file (it is read once) and removed code duplication (like line.split(",")) and the unnecessary wrapping in a List (Collectors.toList()).

// read lines once (assumes static imports of Files.lines, Arrays.asList and Collectors.toList)
List<String[]> lines = lines(path).map(l -> l.split(","))
                                  .collect(toList());

// find the title index
int titleIndex = lines.stream()
                      .findFirst()
                      .map(header -> asList(header).indexOf(titleToSearchFor))
                      .orElse(-1);

// collect needed values
return lines.stream()
            .skip(1)
            .map(row -> row[titleIndex])
            .collect(toList());


I've got 2 tips not related to the issue:

1. You have hardcoded a URI; it's better to move the value to a constant or add a method parameter.
2. You could move the main part out of the if clause if you checked the opposite condition, !Files.exists(path), and threw an exception.
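
The two tips can be sketched roughly as follows. This is one possible shape, not the answer's own code: the class ColumnReader, the helper columnValues, the choice of NoSuchFileException and the guards for an unknown column or a short row are all assumptions added for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnReader {

    // Pure part: works on lines already in memory, so it needs no file to test.
    public static List<String> columnValues(List<String> lines, String title) {
        if (lines.isEmpty()) {
            return Collections.emptyList();
        }
        int titleIndex = Arrays.asList(lines.get(0).split(",")).indexOf(title);
        if (titleIndex < 0) {
            // Assumption: an unknown column is treated as a caller error.
            throw new IllegalArgumentException("No such column: " + title);
        }
        return lines.stream()
                .skip(1)                                   // skip the header row
                .map(line -> line.split(","))
                .filter(row -> titleIndex < row.length)    // ignore rows that are too short
                .map(row -> row[titleIndex])
                .filter(value -> !value.trim().isEmpty())  // drop blank values
                .collect(Collectors.toList());
    }

    // File part: the path is a parameter (tip 1) and a missing file fails fast (tip 2).
    public static List<String> getData(Path path, String title) throws IOException {
        if (!Files.exists(path)) {
            throw new NoSuchFileException(path.toString());
        }
        return columnValues(Files.readAllLines(path), title);
    }
}
```

A call such as ColumnReader.getData(Paths.get("companies.csv"), "age") would then return that column's values; the file name here is only a placeholder. Like the snippets above, this still splits on bare commas and does not understand quoted fields.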

Answered by Andbdrew

As usual, you should use Jackson! Check out the docs.

If you want Jackson to use the first line as header info:

public class CsvExample {
    public static void main(String[] args) throws IOException {
        String csv = "name,age\nIBM,140\nBurger King,76";
        CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
        ObjectMapper mapper = new CsvMapper();
        MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class).with(bootstrapSchema).readValues(csv);
        List<Map<String, String>> maps = it.readAll();
    }
}
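
Once the rows are available as maps keyed by column name, as in `maps` above, pulling out the single column the question asks about is a short stream pipeline. The sketch below uses hand-built maps as a stand-in for the parser output so that it runs without Jackson; the class ColumnFromMaps and the helper column are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class ColumnFromMaps {

    // Hypothetical helper: collects the non-null values stored under one column name.
    public static List<String> column(List<Map<String, String>> rows, String title) {
        return rows.stream()
                .map(row -> row.get(title))
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the List<Map<String, String>> a header-aware parser returns.
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> ibm = new LinkedHashMap<>();
        ibm.put("name", "IBM");
        ibm.put("age", "140");
        rows.add(ibm);
        Map<String, String> burgerKing = new LinkedHashMap<>();
        burgerKing.put("name", "Burger King");
        burgerKing.put("age", "76");
        rows.add(burgerKing);

        System.out.println(column(rows, "name")); // prints [IBM, Burger King]
    }
}
```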

Or you can define your schema as a Java object:

public class CsvExample {
    private static class Pojo {
        private final String name;
        private final int age;

        @JsonCreator
        public Pojo(@JsonProperty("name") String name, @JsonProperty("age") int age) {
            this.name = name;
            this.age = age;
        }

        @JsonProperty("name")
        public String getName() {
            return name;
        }

        @JsonProperty("age")
        public int getAge() {
            return age;
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,age\nIBM,140\nBurger King,76";
        CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
        ObjectMapper mapper = new CsvMapper();
        MappingIterator<Pojo> it = mapper.readerFor(Pojo.class).with(bootstrapSchema).readValues(csv);
        List<Pojo> pojos = it.readAll();
    }
}