Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1609807/

Time: 2020-08-12 16:42:23  Source: igfitidea

Parsing a fixed-width formatted file in Java

Tags: java, parsing, fixed-width

Asked by MattGrommes

I've got a file from a vendor that has 115 fixed-width fields per line. How can I parse that file into the 115 fields so I can use them in my code?

My first thought is just to make constants for each field, like NAME_START_POSITION and NAME_LENGTH, and use substring. That just seems ugly, so I'm curious about better ways of doing this. None of the couple of libraries a Google search turned up seemed any better, either.
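
For illustration, the constants-and-substring idea can be made less repetitive by keeping the whole layout in one table, so each field declares its offset and length exactly once. The following is only a minimal sketch under assumptions: the `Field` enum, its three fields (`NAME`, `CITY`, `ZIP`), and their widths are hypothetical and do not come from the vendor file.

```java
import java.util.EnumMap;
import java.util.Map;

public class FixedWidthLayout {

    // Hypothetical layout: each constant records where its field starts and how long it is.
    enum Field {
        NAME(0, 10),
        CITY(10, 8),
        ZIP(18, 5);

        final int start;
        final int length;

        Field(int start, int length) {
            this.start = start;
            this.length = length;
        }

        // Extracts this field from a line, tolerating lines shorter than the layout.
        String extract(String line) {
            if (start >= line.length()) {
                return "";
            }
            int end = Math.min(start + length, line.length());
            return line.substring(start, end).trim();
        }
    }

    // Splits one line into its fields, keyed by the layout constants.
    static Map<Field, String> parseLine(String line) {
        Map<Field, String> record = new EnumMap<>(Field.class);
        for (Field field : Field.values()) {
            record.put(field, field.extract(line));
        }
        return record;
    }

    public static void main(String[] args) {
        String line = "John Smith" + "Boston  " + "02101";
        System.out.println(parseLine(line));
    }
}
```

With 115 fields the enum gets long, but the substring arithmetic lives in a single method, and a changed width only touches one line of the layout.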

Accepted answer by Pascal Thivent

I would use a flat file parser like flatworm instead of reinventing the wheel: it has a clean API, is simple to use, has decent error handling and a simple file format descriptor. Another option is jFFP, but I prefer the first one.

Answer by p3t0r

I've played around with fixedformat4j and it is quite nice. It is easy to configure converters and the like.

Answer by Jherico

The Apache Commons CSV project can handle fixed-width files.

Update: it looks like the fixed-width functionality didn't survive promotion from the sandbox.

Answer by Sriramkishore Naraharisetti

Here is plain Java code to read a fixed-width file:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class FixedWidth {

    public static void main(String[] args) throws IOException {
        // Sample line: "NHJAMES TURNER M123-45-67890004224345"
        String fixedLengths = "2,15,15,1,11,10";

        // Parse the comma-separated field widths once, up front.
        // Note the doubled backslashes: "\s" alone is not a valid Java string escape here.
        String[] parts = fixedLengths.split("\\s*,\\s*");
        int[] widths = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            widths[i] = Integer.parseInt(parts[i]);
        }

        File file = new File("src/sample.txt");

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Build a comma-separated version of the line, one trimmed field at a time.
                StringBuilder out = new StringBuilder();
                int start = 0;
                for (int i = 0; i < widths.length; i++) {
                    // Clamp to the line length so short lines do not throw.
                    int from = Math.min(start, line.length());
                    int to = Math.min(start + widths[i], line.length());
                    if (i > 0) {
                        out.append(',');
                    }
                    out.append(line.substring(from, to).trim());
                    start += widths[i];
                }
                System.out.println(out);
            }
        }
    }
}

Answer by Jeronimo Backes

uniVocity-parsers comes with a FixedWidthParser and a FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.

// creates the sequence of field lengths in the file to be parsed
FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);

// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.

//sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');

// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));

Here are a few examples for parsing all sorts of fixed-width inputs.

And here are some other examples for writing in general, along with other examples specific to the fixed-width format.

Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 License).

Answer by Constantin

Here is a basic implementation I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class FlatFileParser {

  public static void main(String[] args) {
    File inputFile = new File("data.in");
    File outputFile = new File("data.out");
    int columnLengths[] = {7, 4, 10, 1};
    String charset = "ISO-8859-1";
    String delimiter = "~";

    System.out.println(
        convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
        + " lines written to " + outputFile.getAbsolutePath());
  }

  /**
   * Converts a fixed width file to a delimited file.
   * <p>
   * This method ignores (consumes) newline and carriage return
   * characters. Lines returned is based strictly on the aggregated
   * lengths of the columns.
   *
   * A RuntimeException is thrown if run-off characters are detected
   * at eof.
   *
   * @param inputFile the fixed width file
   * @param outputFile the generated delimited file
   * @param columnLengths the array of column lengths
   * @param delimiter the delimiter used to split the columns
   * @param charsetName the charset name of the supplied files
   * @return the number of completed lines
   */
  public static final long convertFixedWidthFile(
      File inputFile,
      File outputFile,
      int columnLengths[],
      String delimiter,
      String charsetName) {

    InputStream inputStream = null;
    Reader inputStreamReader = null;
    OutputStream outputStream = null;
    Writer outputStreamWriter = null;
    String newline = System.getProperty("line.separator");
    String separator;
    int data;
    int currentIndex = 0;
    int currentLength = columnLengths[currentIndex];
    int currentPosition = 0;
    long lines = 0;

    try {
      inputStream = new FileInputStream(inputFile);
      inputStreamReader = new InputStreamReader(inputStream, charsetName);
      outputStream = new FileOutputStream(outputFile);
      outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);

      while((data = inputStreamReader.read()) != -1) {
        if(data != 13 && data != 10) {
          outputStreamWriter.write(data);
          if(++currentPosition > (currentLength - 1)) {
            currentIndex++;
            separator = delimiter;
            if(currentIndex > columnLengths.length - 1) {
              currentIndex = 0;
              separator = newline;
              lines++;
            }
            outputStreamWriter.write(separator);
            currentLength = columnLengths[currentIndex];
            currentPosition = 0;
          }
        }
      }
      if(currentIndex > 0 || currentPosition > 0) {
        String line = "Line " + ((int)lines + 1);
        String column = ", Column " + ((int)currentIndex + 1);
        String position = ", Position " + ((int)currentPosition);
        throw new RuntimeException("Incomplete record detected. " + line + column + position);
      }
      return lines;
    }
    catch (Throwable e) {
      throw new RuntimeException(e);
    }
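    finally {
      // Either resource may still be null if opening it failed, so guard each close.
      try {
        if (inputStreamReader != null) {
          inputStreamReader.close();
        }
        if (outputStreamWriter != null) {
          outputStreamWriter.close();
        }
      }
      catch (Throwable e) {
        throw new RuntimeException(e);
      }
    }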
  }
}

Answer by Amit Prasad

/* The method takes three parameters: the input string of delimited values, the number of fields per record (which comes from the schema, say 10 columns), and the delimiter. */
public class Testing {

    public static void main(String as[]) throws InterruptedException {

        fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");

    }

    public static void fixedLengthRecordProcessor(String input, int recLength, String delimiter) {
        // Note: split() treats the delimiter as a regular expression.
        String[] values = input.split(delimiter);
        StringBuilder record = new StringBuilder();
        int recCounter = 0;
        for (String value : values) {
            if (recCounter == recLength) {
                // A full record has been collected: process it, then start a new one.
                System.out.println(record);
                record.setLength(0);
                recCounter = 0;
            }
            if (recCounter > 0) {
                record.append(",");
            }
            record.append(value);
            recCounter++;
        }
        // Process the last (possibly partial) record.
        System.out.println(record);
    }

}

Answer by Atais

Most suitable for Scala, but you could probably use it from Java.

I was so fed up with the lack of a proper library for the fixed-length format that I created my own. You can check it out here: https://github.com/atais/Fixed-Length

In basic usage, you create a case class and describe it as an HList (Shapeless):

case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {

    import com.github.atais.util.Read._
    import cats.implicits._
    import com.github.atais.util.Write._
    import Codec._

    implicit val employeeCodec: Codec[Employee] = {
      fixed[String](0, 10) <<:
        fixed[Option[Int]](10, 13, Alignment.Right) <<:
        fixed[Boolean](13, 18)
    }.as[Employee]
}

Now you can easily decode your lines or encode your objects:

import Employee._
Parser.decode[Employee](exampleString)
Parser.encode(exampleObject)

Answer by user300778

If your string is called inStr, convert it to a char array and use the String(char[], start, length) constructor:

char[] inStrChars = inStr.toCharArray();
// The third argument is a character count, not an end index.
String charFirst10 = new String(inStrChars, 0, 10);
String char10to20 = new String(inStrChars, 10, 10);
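
One detail worth spelling out: the third argument of the `String(char[], int, int)` constructor is a character count rather than an end index, whereas `String.substring` takes an end index. The sketch below contrasts the two; the `slice` helper and the sample string are hypothetical and only there for illustration.

```java
public class CharArraySlices {

    // Hypothetical helper: returns `count` characters starting at `offset`.
    static String slice(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        String inStr = "ABCDEFGHIJ0123456789";
        char[] chars = inStr.toCharArray();

        // Two consecutive 10-character fields: the offset moves, the count stays at 10.
        String first10 = slice(chars, 0, 10);
        String next10 = slice(chars, 10, 10);
        System.out.println(first10);
        System.out.println(next10);

        // substring takes an end index instead of a count, so the same field is (10, 20).
        System.out.println(inStr.substring(10, 20).equals(next10));
    }
}
```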

Answer by stenix

Another library that can be used to parse a fixed width text source: https://github.com/org-tigris-jsapar/jsapar

It allows you to define a schema in XML or in code and parse fixed-width text into Java beans or fetch values from an internal format.

Disclosure: I am the author of the jsapar library. If it does not fulfill your needs, on this page you can find a comprehensive list of other parsing libraries. Most of them are only for delimited files, but some can parse fixed width as well.
