Java - Read file and split into multiple files

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/19177994/

Date: 2020-08-12 14:47:26  Source: igfitidea

java

Asked by Ankit Rustagi

I have a file which I would like to read in Java and split into n (user input) output files. Here is how I read the file:

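int n = 4; // number of output files (user input)
BufferedReader br = new BufferedReader(new FileReader("file.csv"));
try {
    String line = br.readLine();

    while (line != null) {
        // process the current line here, before reading the next one
        line = br.readLine();
    }
} finally {
    br.close();
}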

How do I split the file file.csv into n files?

Note: since the file has on the order of 100k entries, I can't store its contents in an array and then split it and save it into multiple files.

Accepted answer by harsh

Since one file can be very large, each split file could be large as well.

Example:

Source file size: 5GB

Number of splits: 5

Destination file size: 1GB each (5 files)

Reading such a large split chunk in one go is not practical, even if we had that much memory. Instead, for each split we can read a fixed-size byte array that we know is feasible in terms of both performance and memory.

NumSplits: 10 MaxReadBytes: 8KB

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileSplitByBytes {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("test.csv", "r")) {
            long numSplits = 10; // from user input, extract it from args
            long sourceSize = raf.length();
            long bytesPerSplit = sourceSize / numSplits;
            long remainingBytes = sourceSize % numSplits;

            int maxReadBufferSize = 8 * 1024; // 8KB
            for (int destIx = 1; destIx <= numSplits; destIx++) {
                try (BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + destIx))) {
                    if (bytesPerSplit > maxReadBufferSize) {
                        long numReads = bytesPerSplit / maxReadBufferSize;
                        long numRemainingRead = bytesPerSplit % maxReadBufferSize;
                        for (int i = 0; i < numReads; i++) {
                            readWrite(raf, bw, maxReadBufferSize);
                        }
                        if (numRemainingRead > 0) {
                            readWrite(raf, bw, numRemainingRead);
                        }
                    } else {
                        readWrite(raf, bw, bytesPerSplit);
                    }
                }
            }
            // The leftover sourceSize % numSplits bytes go into one extra file.
            if (remainingBytes > 0) {
                try (BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + (numSplits + 1)))) {
                    readWrite(raf, bw, remainingBytes);
                }
            }
        }
    }

    // Copies exactly numBytes bytes. readFully() keeps reading until the buffer is full,
    // whereas a single read() may return fewer bytes than requested.
    static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
        byte[] buf = new byte[(int) numBytes];
        raf.readFully(buf);
        bw.write(buf);
    }
}
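
Because this answer splits on raw byte offsets, a CSV line can be cut in half at a split boundary. One possible adjustment, sketched here under the assumption that lines end in '\n', is to move each boundary forward to just past the next newline before copying. LineAlignedOffsets and offsets are hypothetical names, not part of the answer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class LineAlignedOffsets {
    // Returns n + 1 offsets; split k covers bytes [offsets.get(k), offsets.get(k + 1)).
    // Each interior boundary is the first line start at or after size * k / n.
    public static List<Long> offsets(RandomAccessFile raf, int n) throws IOException {
        long size = raf.length();
        List<Long> result = new ArrayList<>();
        result.add(0L);
        for (int k = 1; k < n; k++) {
            long prev = result.get(result.size() - 1);
            long aim = Math.max(size * k / n, prev); // ideal byte-based boundary
            long boundary = aim;
            if (aim > 0) {
                // Start one byte early so a boundary already sitting at a line start stays put.
                raf.seek(aim - 1);
                int b;
                do {
                    b = raf.read();
                } while (b != -1 && b != '\n');
                boundary = raf.getFilePointer(); // just past the '\n', or size at end of file
            }
            result.add(boundary);
        }
        result.add(size);
        return result;
    }
}
```

Each split k would then copy offsets.get(k + 1) - offsets.get(k) bytes with the same buffered read/write loop as above. A split can come out empty if a single long line spans its whole ideal range.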

Answer by Pranalee

Keep a counter of the number of entries. Assume one entry per line.

Step 1: Create a new subfile and set counter = 0.

Step 2: Increment the counter as you read each entry from the source file into the buffer.

Step 3: When the counter reaches the number of entries you want in each subfile, flush the buffer's contents to the subfile and close it.

Step 4: Go back to step 1 as long as the source file still has data to read.

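The four steps above could be sketched in Java as follows, assuming one entry per line and UTF-8 text. CounterSplitter, split and the partN.txt file names are hypothetical. The BufferedWriter plays the role of the buffer, and closing it flushes its contents to the subfile, so only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CounterSplitter {
    // Writes at most entriesPerFile lines to each part file; returns the number of parts created.
    public static int split(Path source, Path outDir, int entriesPerFile) throws IOException {
        if (entriesPerFile <= 0) {
            throw new IllegalArgumentException("entriesPerFile must be positive");
        }
        Files.createDirectories(outDir);
        int parts = 0;
        int counter = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String entry;
            while ((entry = in.readLine()) != null) {
                if (out == null) {                 // step 1: new subfile, counter = 0
                    parts++;
                    out = Files.newBufferedWriter(outDir.resolve("part" + parts + ".txt"), StandardCharsets.UTF_8);
                    counter = 0;
                }
                out.write(entry);                  // step 2: buffer the entry and count it
                out.write('\n');
                counter++;
                if (counter == entriesPerFile) {   // step 3: limit reached, flush and close
                    out.close();
                    out = null;
                }
            }                                      // step 4: repeat while input remains
        } finally {
            if (out != null) {
                out.close();                       // last, partially filled subfile
            }
        }
        return parts;
    }
}
```

Opening the subfile lazily (only when an entry arrives) avoids leaving an empty trailing file when the entry count is an exact multiple of entriesPerFile.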

Answer by Leff

There's no need to loop through the file twice. You can estimate the size of each chunk as the source file size divided by the number of chunks needed. Then stop filling each chunk with data once its size exceeds the estimate.

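A minimal single-pass sketch of this idea, assuming UTF-8 text that should only be split at line boundaries. SizeEstimateSplitter, split and the chunk_N.txt names are hypothetical. Because lines are never cut, chunk sizes only approximate the estimate, and the last chunk absorbs whatever remains.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeEstimateSplitter {
    // Splits a text file into at most n chunks in one pass; returns the number of chunks written.
    public static int split(Path input, int n, Path outDir) throws IOException {
        long target = Math.max(1, Files.size(input) / n); // estimated bytes per chunk
        Files.createDirectories(outDir);
        int chunk = 1;
        long written = 0;
        BufferedWriter writer = Files.newBufferedWriter(outDir.resolve("chunk_" + chunk + ".txt"), StandardCharsets.UTF_8);
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Start the next chunk once the estimate is reached, but never create more than n.
                if (written >= target && chunk < n) {
                    writer.close();
                    chunk++;
                    written = 0;
                    writer = Files.newBufferedWriter(outDir.resolve("chunk_" + chunk + ".txt"), StandardCharsets.UTF_8);
                }
                writer.write(line);
                writer.write('\n');
                written += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the '\n'
            }
        } finally {
            writer.close();
        }
        return chunk;
    }
}
```

For example, a 60-byte file of ten 6-byte lines split with n = 3 gives a 20-byte estimate, so the chunks hold 4, 4 and 2 lines.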

Answer by user3556411

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Split {
    public static void main(String[] args) {
        String inputfile = "C:/test.txt"; // Source file name.
        int nol = 2000; // No. of lines to be split and saved in each output file.
        try {
            // First pass: count the lines to find out how many files to generate.
            int count = 0;
            try (BufferedReader counter = new BufferedReader(new FileReader(inputfile))) {
                while (counter.readLine() != null) {
                    count++;
                }
            }
            System.out.println("Lines in the file: " + count);

            // Ceiling division: a partially filled last chunk still needs its own file.
            int nof = (count + nol - 1) / nol;
            System.out.println("No. of files to be generated: " + nof);

            // Second pass: actual splitting of the file into smaller files.
            try (BufferedReader br = new BufferedReader(new FileReader(inputfile))) {
                for (int j = 1; j <= nof; j++) {
                    // Destination file location
                    try (BufferedWriter out = new BufferedWriter(new FileWriter("C:/New Folder/File" + j + ".txt"))) {
                        for (int i = 1; i <= nol; i++) {
                            String strLine = br.readLine();
                            if (strLine == null) {
                                break;
                            }
                            out.write(strLine);
                            out.newLine();
                        }
                    }
                }
            }
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

Answer by user1472187

Though it's an old question, for reference I am listing the code I used to split large files into pieces of any size; it works with any Java version above 1.4.

The split and join methods look like this:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class FileSplit {

    // Rebuilds FilePath from the pieces FilePath1.sp, FilePath2.sp, ... written by split().
    public void join(String FilePath) {
        int count = 1, data = 0;
        try {
            File filename = new File(FilePath);
            OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename));
            while (true) {
                filename = new File(FilePath + count + ".sp");
                if (!filename.exists()) {
                    break; // no more pieces
                }
                InputStream infile = new BufferedInputStream(new FileInputStream(filename));
                // Copy one byte at a time; the buffered streams avoid a disk access per byte.
                data = infile.read();
                while (data != -1) {
                    outfile.write(data);
                    data = infile.read();
                }
                infile.close();
                count++;
            }
            outfile.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Splits FilePath into pieces of at most splitlen bytes named FilePath1.sp, FilePath2.sp, ...
    public void split(String FilePath, long splitlen) {
        long leng = 0;
        int count = 1, data;
        try {
            File filename = new File(FilePath);
            InputStream infile = new BufferedInputStream(new FileInputStream(filename));
            data = infile.read();
            while (data != -1) {
                filename = new File(FilePath + count + ".sp");
                OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename));
                // Fill the current piece until it holds splitlen bytes or the input is exhausted.
                while (data != -1 && leng < splitlen) {
                    outfile.write(data);
                    leng++;
                    data = infile.read();
                }
                leng = 0;
                outfile.close();
                count++;
            }
            infile.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The complete Java code is available at the "File Split in Java Program" link.

Answer by amralieg

Here is one that worked for me; I used it to split a 10GB file. It also lets you add a header and a footer, which is very useful when splitting document-based formats such as XML and JSON, because each new split file needs its own document wrapper.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileSpliter
{
    public static void main(String[] args) throws IOException
    {
        splitTextFiles("D:\\xref.csx", 750000, "", "", null);
    }

    public static void splitTextFiles(String fileName, int maxRows, String header, String footer, String targetDir) throws IOException
    {
        File bigFile = new File(fileName);
        int dot = bigFile.getName().lastIndexOf('.');
        String ext = dot >= 0 ? bigFile.getName().substring(dot) : "";
        String fileNoExt = dot >= 0 ? bigFile.getName().substring(0, dot) : bigFile.getName();
        File newDir = (targetDir != null)
                ? new File(targetDir)
                : new File(bigFile.getParent(), fileNoExt + "_split"); // getParent() may be null for a bare file name
        newDir.mkdirs();

        int fileCount = 0; // split files created so far
        int lineNum = 0;   // data lines written to the current split file
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                if (writer == null)
                {
                    // Open the next split file only when there is a line for it, so an input
                    // whose line count is an exact multiple of maxRows leaves no empty file.
                    fileCount++;
                    Path splitFile = newDir.toPath().resolve(fileNoExt + "_" + String.format("%02d", fileCount) + ext);
                    // Default options (CREATE, TRUNCATE_EXISTING, WRITE) overwrite stale content.
                    writer = Files.newBufferedWriter(splitFile);
                    System.out.print("new file created '" + splitFile + "'");
                    if (header != null && header.length() > 0)
                    {
                        writer.append(header);
                        writer.newLine();
                    }
                }
                writer.append(line);
                writer.newLine();
                lineNum++;

                if (lineNum >= maxRows)
                {
                    finishSplit(writer, footer, lineNum);
                    writer = null;
                    lineNum = 0;
                }
            }
            if (writer != null) // last, partially filled split file
            {
                finishSplit(writer, footer, lineNum);
                writer = null;
            }
        }
        finally
        {
            if (writer != null) // only reached if an exception interrupted the loop
            {
                writer.close();
            }
        }

        System.out.println("file '" + bigFile.getName() + "' split into " + fileCount + " files");
    }

    private static void finishSplit(BufferedWriter writer, String footer, int lineNum) throws IOException
    {
        if (footer != null && footer.length() > 0)
        {
            writer.append(footer);
        }
        writer.close();
        System.out.println(", " + lineNum + " lines written to file");
    }
}

Answer by Narendra Kumar Samal

The code below splits a big file into smaller files with fewer lines each.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class LineSplitter {
    public static void split(String inputFilePath, String outputFolderPath, long linesPerSplit) {
        long linesWritten = 0;
        int count = 1;

        File inputFile = new File(inputFilePath);
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)))) {
            String line = reader.readLine();
            String outfileName = new File(outputFolderPath, inputFile.getName()).getPath();

            while (line != null) {
                File outFile = new File(outfileName + "_" + count + ".split");
                try (Writer writer = new OutputStreamWriter(new FileOutputStream(outFile))) {
                    while (line != null && linesWritten < linesPerSplit) {
                        writer.write(line);
                        writer.write(System.lineSeparator()); // readLine() strips the terminator, so add it back
                        line = reader.readLine();
                        linesWritten++;
                    }
                }
                linesWritten = 0; // next file
                count++;          // next file count
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Answer by Aymen

A clean solution that you can edit to fit your needs.

Note that this solution loads the entire file into memory.

All lines of the file are read into List<String> rowsOfFile.

Edit maxSizeFile to choose the maximum size of each split file.

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public void splitFile(File fileToSplit) throws IOException {
  long maxSizeFile = 10000000; // 10 MB
  StringBuilder buffer = new StringBuilder();
  long sizeOfRows = 0;
  int recurrence = 0;
  String fileName;
  List<String> rowsOfFile;

  // Loads the whole file into memory; UTF-8 is used for reading, sizing and writing alike.
  rowsOfFile = Files.readAllLines(fileToSplit.toPath(), StandardCharsets.UTF_8);

  for (String row : rowsOfFile) {
      // readAllLines() strips line terminators, so put one back after each row
      buffer.append(row).append(System.lineSeparator());
      sizeOfRows += row.getBytes(StandardCharsets.UTF_8).length;
      if (sizeOfRows >= maxSizeFile) {
          fileName = generateFileName(recurrence);
          try (PrintWriter writer = new PrintWriter(new File(fileName), "UTF-8")) {
              writer.print(buffer);
          }
          recurrence++;
          sizeOfRows = 0;
          buffer = new StringBuilder();
      }
  }
  // last rows
  if (buffer.length() > 0) {
      fileName = generateFileName(recurrence);
      try (PrintWriter writer = new PrintWriter(new File(fileName), "UTF-8")) {
          writer.print(buffer);
      }
  }
  // Deletes the original file once it has been split.
  Files.delete(fileToSplit.toPath());
}

Method to generate the file name:

    public String generateFileName(int numFile) {
      String extension = ".txt";
      return "myFile" + numFile + extension;
    }