Fastest way to write huge data in text file Java
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must keep the same CC BY-SA license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/1062113/
Asked by Rakesh Juyal
I have to write huge data to a text [CSV] file. I used BufferedWriter to write the data and it took around 40 seconds to write 174 MB of data. Is this the fastest speed Java can offer?
bufferedWriter = new BufferedWriter ( new FileWriter ( "fileName.csv" ) );
Note: these 40 seconds also include the time spent iterating over and fetching the records from the ResultSet. :) The 174 MB corresponds to 400,000 rows in the ResultSet.
Accepted answer by David Moles
You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there's a good chance you're just writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175MB (4 million strings) -- this is on a dual-core 2.4GHz Dell running Windows XP with an 80GB, 7200-RPM Hitachi disk.

Can you isolate how much of the time is record retrieval and how much is file writing?
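One rough way to isolate the two phases is to time the ResultSet iteration and the file write separately. This is only a sketch: rs and the csv_line column are hypothetical stand-ins for the asker's actual query.

long fetchStart = System.currentTimeMillis();
List<String> rows = new ArrayList<String>();
while (rs.next()) {
    rows.add(rs.getString("csv_line"));    // fetch only, no file I/O yet
}
long fetchEnd = System.currentTimeMillis();

Writer out = new BufferedWriter(new FileWriter("fileName.csv"));
for (String row : rows) {
    out.write(row);                        // write only, data is already in memory
}
out.close();
long writeEnd = System.currentTimeMillis();

System.out.println("fetch: " + (fetchEnd - fetchStart) + " ms, write: " + (writeEnd - fetchEnd) + " ms");

The benchmark below measures only the writing side, comparing a raw FileWriter with BufferedWriters of various buffer sizes: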
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {

    private static final int ITERATIONS = 5;
    private static final double MEG = (Math.pow(1024, 2));
    private static final int RECORD_COUNT = 4000000;
    private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
    private static final int RECSIZE = RECORD.getBytes().length;

    public static void main(String[] args) throws Exception {
        List<String> records = new ArrayList<String>(RECORD_COUNT);
        int size = 0;
        for (int i = 0; i < RECORD_COUNT; i++) {
            records.add(RECORD);
            size += RECSIZE;
        }
        System.out.println(records.size() + " 'records'");
        System.out.println(size / MEG + " MB");
        for (int i = 0; i < ITERATIONS; i++) {
            System.out.println("\nIteration " + i);
            writeRaw(records);
            writeBuffered(records, 8192);
            writeBuffered(records, (int) MEG);
            writeBuffered(records, 4 * (int) MEG);
        }
    }

    private static void writeRaw(List<String> records) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            System.out.print("Writing raw... ");
            write(records, writer);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void writeBuffered(List<String> records, int bufSize) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);
            System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
            write(records, bufferedWriter);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void write(List<String> records, Writer writer) throws IOException {
        long start = System.currentTimeMillis();
        for (String record : records) {
            writer.write(record);
        }
        writer.flush();
        writer.close();
        long end = System.currentTimeMillis();
        System.out.println((end - start) / 1000f + " seconds");
    }
}
Answer by Brian Agnew
Your transfer speed is likely not to be limited by Java. Instead I would suspect (in no particular order):
- the speed of transfer from the database
- the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM has to allocate the memory and the DB read and disk write happen sequentially. Instead I would write to the buffered writer for every read that you make from the DB, so the operation is closer to a concurrent one (I don't know whether you're doing that or not).
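A minimal sketch of that interleaved read-and-write approach, assuming a ResultSet named rs whose rows are already formatted as CSV lines (the column name is hypothetical):

try (BufferedWriter out = new BufferedWriter(new FileWriter("fileName.csv"))) {
    while (rs.next()) {
        out.write(rs.getString("csv_line"));   // write each row as soon as it is fetched
        out.newLine();
    }
}

This keeps memory use flat and avoids the read-everything-then-write-everything pattern.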
Answer by gpeche
For these bulky reads from the DB you may want to tune your Statement's fetch size. It might save a lot of round trips to the DB.
http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
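A minimal sketch of that tuning (the connection, the query, and the fetch size of 1000 are placeholders; the actual effect depends heavily on the JDBC driver):

Statement stmt = connection.createStatement();
stmt.setFetchSize(1000);   // ask the driver to pull rows in batches of ~1000 instead of one at a time
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");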
Answer by Deepak Agarwal
Try memory-mapped files (it takes about 300 ms to write 174 MB on my machine: Core 2 Duo, 2.5GB RAM):
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int number_of_lines = 400000;

// Map a region of the file large enough for all the lines, then copy the bytes into it.
FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
MappedByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++) {
    wrBuf.put(buffer);
}
rwChannel.close();
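Two caveats with this approach, in case they matter for the use case: the total size has to be known when the region is mapped, and the data may only reach the disk when the OS flushes its page cache. If durability is needed before closing, the mapped buffer can be forced out explicitly (a sketch, assuming wrBuf is declared as a MappedByteBuffer as above):

wrBuf.force();        // flush the mapped pages to the storage device
rwChannel.close();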
Answer by Damian Leszczyński - Vash
Only for the sake of statistics:

The machine is an old Dell with a new SSD
CPU: Intel Pentium D 2.8 GHz
SSD: Patriot Inferno 120GB SSD
4000000 'records'
175.47607421875 MB
Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds
Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds
Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds
Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds
Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds
As we can see, the raw method is slower than the buffered one.
Answer by RAM
package all.is.well;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import junit.framework.TestCase;
/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra large files in Java.
 * This program has been tested with a 2GB input file.
 * There are some points where extra logic can be added in future.
 *
 * Please note: if we want to deal with a binary input file, then instead of reading lines
 * we need to read bytes from the file object.
 *
 * It uses RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework and its timings.
 *
 * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
 *
 * For 10 threads: total time required for reading and writing the text: 349.317 seconds
 * For 100 threads: total time required for reading and writing the text: 464.042 seconds
 * For 1000 threads: total time required for reading and writing the text: 466.538 seconds
 * For 10000 threads: total time required for reading and writing the text: 479.701 seconds
 */
public class DealWithHugeRecordsinFile extends TestCase {

    static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
    static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
    static volatile RandomAccessFile fileToWrite;
    static volatile RandomAccessFile file;
    static volatile String fileContentsIter;
    static volatile int position = 0;
    public static void main(String[] args) throws IOException, InterruptedException {
        long currentTimeMillis = System.currentTimeMillis();
        try {
            fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random writes, independent of thread obstacles
            file = new RandomAccessFile(FILEPATH, "r");               // for random reads, independent of thread obstacles
            seriouslyReadProcessAndWriteAsynch();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Thread currentThread = Thread.currentThread();
        System.out.println(currentThread.getName());
        long currentTimeMillis2 = System.currentTimeMillis();
        double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
        System.out.println("Total time required for reading the text in seconds " + time_seconds);
    }

    /**
     * @throws IOException
     * Something asynchronously serious
     */
    public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(10); // see the timings in the class comment above
        while (true) {
            final String readLine = file.readLine();
            if (readLine == null) {
                break;
            }
            Runnable genuineWorker = new Runnable() {
                @Override
                public void run() {
                    // do the hard processing here in this thread; writeToFile below deliberately
                    // burns some time and swallows an exception to simulate real work
                    writeToFile(FILEPATH_WRITE, readLine);
                    // System.out.println(" :" + Thread.currentThread().getName());
                }
            };
            executor.execute(genuineWorker);
        }
        executor.shutdown();
        while (!executor.isTerminated()) {
            // busy-wait until all submitted tasks have finished
        }
        System.out.println("Finished all threads");
        file.close();
        fileToWrite.close();
    }

    /**
     * @param filePath
     * @param data
     */
    private static void writeToFile(String filePath, String data) {
        try {
            // fileToWrite.seek(position);
            data = "\n" + data;
            if (!data.contains("Randomization")) {
                return;
            }
            System.out.println("Let us do something time consuming to make this thread busy" + (position++) + " :" + data);
            System.out.println("Lets consume through this loop");
            int i = 1000;
            while (i > 0) {
                i--;
            }
            fileToWrite.write(data.getBytes());
            throw new Exception();
        } catch (Exception exception) {
            System.out.println("exception was thrown but still we are able to proceed further"
                    + " \n This can be used for marking failure of the records");
            // exception.printStackTrace();
        }
    }
}
Answer by Rajendra Dangwal
For those who want to improve the time for retrieving records and dumping them into a file (i.e. no processing of the records): instead of putting them into an ArrayList, append the records to a StringBuffer. Apply the toString() function to get a single String and write it to the file at once.
For me, the retrieval time reduced from 22 seconds to 17 seconds.
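A minimal sketch of that approach (StringBuilder is the single-threaded equivalent of StringBuffer; rs and the column access are placeholders for the real retrieval code):

StringBuilder sb = new StringBuilder();
while (rs.next()) {
    sb.append(rs.getString("csv_line")).append('\n');   // accumulate rows instead of adding them to a list
}
try (Writer out = new FileWriter("fileName.csv")) {
    out.write(sb.toString());                            // one large write instead of many small ones
}

Note that, like the ArrayList version, this still holds the entire output in memory before anything is written.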