Source: Stack Overflow, http://stackoverflow.com/questions/313178/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
What's the best way of doing dos2unix on a 500k line file, in Windows?
Asked by ninesided
The question says it all: I've got a 500,000-line file that gets generated as part of an automated build process on a Windows box, and it's riddled with ^M's. When it goes out the door it needs to be *nix-friendly. What's the best approach here? Is there a handy snippet of code that could do this for me, or do I need to write a little C# or Java app?
Answered by Federico A. Ramponi
Here is a Perl one-liner, taken from http://www.technocage.com/~caskey/dos2unix/
#!/usr/bin/perl -pi
s/\r\n/\n/;
You can run it as follows:
perl dos2unix.pl < file.dos > file.unix
Or, you can run it also in this way (the conversion is done in-place):
perl -pi dos2unix.pl file.dos
And here is my (naive) C version:
#include <stdio.h>

int main(void)
{
    int c;

    /* Copy stdin to stdout, dropping every carriage return. */
    while ((c = fgetc(stdin)) != EOF)
        if (c != '\r')
            fputc(c, stdout);
    return 0;
}
You should run it with input and output redirection:
dos2unix.exe < file.dos > file.unix
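One caveat when building a filter like this with a native Windows toolchain: the Microsoft C runtime opens stdin and stdout in text mode, which translates CRLF to LF on input and LF back to CRLF on output, so the program can appear to do nothing. Below is a minimal sketch of a workaround, assuming MSVC or MinGW, where `_setmode`, `_fileno` and `_O_BINARY` are available. The function names `use_binary_stdio` and `strip_cr` are made up for this illustration and are not part of the original answer.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Put stdin/stdout into binary mode so the C runtime does not
   translate line endings behind our back. Does nothing outside Windows. */
void use_binary_stdio(void)
{
#ifdef _WIN32
    _setmode(_fileno(stdin), _O_BINARY);
    _setmode(_fileno(stdout), _O_BINARY);
#endif
}

/* Copy in to out, dropping every CR byte (same rule as the naive version). */
void strip_cr(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF)
        if (c != '\r')
            fputc(c, out);
}
```

To use it, call `use_binary_stdio()` first thing in `main`, then `strip_cr(stdin, stdout)`. Under Cygwin's gcc the streams are typically binary already, so the extra call should be harmless there.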
Answered by Ken Gentle
If installing a base cygwin is too heavy, there are a number of standalone, console-based dos2unix and unix2dos programs for Windows on the net, many with C/C++ source available. If I'm understanding the requirement correctly, either of these solutions would fit nicely into an automated build script.
Answered by strager
If you're on Windows and need something to run in a batch script, you can compile a simple C program to do the trick.
#include <stdio.h>

int main(void) {
    while(1) {
        int c = fgetc(stdin);
        if(c == EOF)
            break;
        if(c == '\r')
            continue;   /* drop every carriage return */
        fputc(c, stdout);
    }
    return 0;
}
Usage:
myprogram.exe < input > output
Editing in-place would be a bit more difficult. Besides, you may want to keep backups of the originals for some reason (in case you accidentally strip a binary file, for example).
That version removes all CR characters; if you only want to remove the ones that are part of a CR-LF pair, you can use the classic one-character-back method :-)
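#include <stdio.h>

int main(void) {
    int lastc = EOF;   /* previous character, held back one step */
    int c;
    while ((c = fgetc(stdin)) != EOF) {
        /* Emit the held-back character unless it is the CR of a CR-LF pair. */
        if (lastc != EOF && ((lastc != '\r') || (c != '\n'))) {
            fputc (lastc, stdout);
        }
        lastc = c;
    }
    if (lastc != EOF) {   /* nothing to flush for an empty input */
        fputc (lastc, stdout);
    }
    return 0;
}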
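The same pair-only rule can also be applied a block at a time with fread and fwrite, which some may prefer for a large input. This is a minimal sketch only: `crlf_to_lf`, `convert_stream`, the `pending_cr` flag and the 64 KiB buffer size are all choices made for this illustration, not part of the original answer.

```c
#include <stdio.h>
#include <stddef.h>

/* Copy in[0..n) to out, turning CR-LF into LF and keeping lone CRs.
   out must have room for n + 1 bytes. *pending_cr remembers a CR seen
   at the end of the previous block, so pairs split across two blocks
   are still recognised. Returns the number of bytes written to out. */
size_t crlf_to_lf(const char *in, size_t n, char *out, int *pending_cr)
{
    size_t w = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        char c = in[i];
        if (*pending_cr) {
            if (c != '\n')
                out[w++] = '\r';    /* lone CR: keep it */
            *pending_cr = 0;
        }
        if (c == '\r')
            *pending_cr = 1;        /* decide once the next byte is known */
        else
            out[w++] = c;
    }
    return w;
}

/* Convert everything from in to out. Returns 0 on success, -1 on error. */
int convert_stream(FILE *in, FILE *out)
{
    char ibuf[65536];
    char obuf[65536 + 1];
    int pending_cr = 0;
    size_t n;

    while ((n = fread(ibuf, 1, sizeof ibuf, in)) > 0) {
        size_t m = crlf_to_lf(ibuf, n, obuf, &pending_cr);
        if (fwrite(obuf, 1, m, out) != m)
            return -1;
    }
    if (pending_cr && fputc('\r', out) == EOF)  /* input ended with a lone CR */
        return -1;
    return ferror(in) ? -1 : 0;
}
```

Calling `convert_stream(stdin, stdout)` from `main` gives the same command-line usage as the versions above.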
You can edit the file in-place using mode "r+". Below is a general myd2u program, which accepts file names as arguments. NOTE: This program uses ftruncate to chop off extra characters at the end. If there's any better (standard) way to do this, please edit or comment. Thanks!
#include <stdio.h>
#include <unistd.h> /* ftruncate: POSIX, not standard C */

int main(int argc, char **argv) {
    FILE *file;
    if(argc < 2) {
        fprintf(stderr, "Usage: myd2u <files>\n");
        return 1;
    }
    file = fopen(argv[1], "rb+");
    if(!file) {
        perror(argv[1]);
        return 2;
    }
    long readPos = 0, writePos = 0;
    int lastC = EOF;
    while(1) {
        fseek(file, readPos, SEEK_SET);
        int c = fgetc(file);
        readPos = ftell(file); /* For good measure. */
        if(c == EOF)
            break;
        if(c == '\n' && lastC == '\r') {
            /* Move back so we override the \r with the \n. */
            --writePos;
        }
        fseek(file, writePos, SEEK_SET);
        fputc(c, file);
        writePos = ftell(file);
        lastC = c;
    }
    fflush(file); /* Make sure buffered writes are out before truncating. */
    ftruncate(fileno(file), writePos); /* Not in C89/C99/ANSI! */
    fclose(file);
    /* 'cus I'm too lazy to make a loop. */
    if(argc > 2)
        return main(argc - 1, argv + 1); /* argv + 1, so argv[1] is the next file */
    return 0;
}
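On the question of a more standard way than ftruncate: one option that stays within standard C is to write the converted data to a second file and then swap it in with remove and rename. The following is a sketch under stated assumptions, not a drop-in replacement. The function name `dos2unix_file` and the `.tmp` suffix are hypothetical, any existing file with that temporary name is overwritten, and the remove-then-rename step is not atomic (it is there because rename in the Windows C runtime fails when the destination already exists).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert CR-LF to LF in the file at path by writing a converted copy
   next to it and then swapping the copy in. Returns 0 on success. */
int dos2unix_file(const char *path)
{
    size_t len = strlen(path);
    char *tmp = malloc(len + 5);            /* path + ".tmp" + NUL */
    FILE *in = NULL, *out = NULL;
    int c, last = EOF, ok = 0, have_tmp = 0;

    if (tmp == NULL)
        return -1;
    memcpy(tmp, path, len);
    memcpy(tmp + len, ".tmp", 5);           /* hypothetical suffix */

    in = fopen(path, "rb");
    if (in != NULL)
        out = fopen(tmp, "wb");
    if (in != NULL && out != NULL) {
        while ((c = fgetc(in)) != EOF) {
            if (last == '\r' && c != '\n')
                fputc('\r', out);           /* lone CR: keep it */
            if (c != '\r')
                fputc(c, out);              /* CRs are held back one step */
            last = c;
        }
        if (last == '\r')
            fputc('\r', out);               /* input ended with a lone CR */
        ok = !ferror(in) && !ferror(out);
    }
    if (in != NULL)
        fclose(in);
    if (out != NULL) {
        have_tmp = 1;
        if (fclose(out) != 0)
            ok = 0;
    }

    if (!ok) {
        if (have_tmp)
            remove(tmp);                    /* discard the partial copy */
    } else if (remove(path) != 0) {
        remove(tmp);                        /* original untouched, give up */
        ok = 0;
    } else if (rename(tmp, path) != 0) {
        ok = 0;                             /* converted data is still in tmp */
    }
    free(tmp);
    return ok ? 0 : -1;
}
```

Because the original is only removed after the copy has been written and closed successfully, a failed conversion leaves the input file as it was.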
Answered by hayalci
tr -d '^M' < infile > outfile
Type ^M as Ctrl+V, then Enter.
Edit: You can use '\r' instead of manually entering a carriage return [thanks to @strager]:
tr -d '\r' < infile > outfile
Edit 2: 'tr' is a Unix utility. You can download a native Windows version from http://unxutils.sourceforge.net [thanks to @Rob Kennedy] or use cygwin's Unix emulation.
Answered by nickf
Some text editors, such as UltraEdit/UEStudio, have this functionality built in.
File > Conversions > DOS to UNIX
Answered by EvilTeach
FTP it from the DOS box to the Unix box as an ASCII file instead of a binary file. FTP will strip the CRLF and insert an LF. Transfer it back to the DOS box as a binary file, and the LF will be retained.
Answered by Paul
If it is just one file, I use Notepad++. Nice because it is free. I have cygwin installed and use a one-liner script I wrote for multiple files. If you're interested in the script, leave a comment. (I don't have it available to me at the moment.)