bash: How can I detect DOS line breaks in a file?
Original URL: http://stackoverflow.com/questions/2798627/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I detect DOS line breaks in a file?
Asked by chiggsy
I have a bunch of files. Some have Unix line endings, many have DOS line endings. I'd like to test each file to see if it is DOS formatted before I switch the line endings.
How would I do this? Is there a flag I can test for? Something similar?
Accepted answer by nc3b
Answer by Eric O Lebigot
Python can automatically detect what newline convention is used in a file, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects:
f = open('myfile.txt', 'U')
f.readline() # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)
This gives the newline ending of the first line (Unix, DOS, etc.), if any.
As John M. pointed out, if by any chance you have a pathological file that uses more than one newline convention, then after reading many lines f.newlines is a tuple with all the newline conventions found so far.
Reference: http://docs.python.org/2/library/functions.html#open
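The 'U' mode flag used above belongs to Python 2. In Python 3, universal newline handling is the default for text mode, and the newlines attribute is still available on text file objects. A minimal Python 3 sketch, assuming the file can be decoded as Latin-1 (chosen here only because it accepts any byte sequence) and using the hypothetical helper name detect_newlines:

```python
def detect_newlines(path):
    # Returns None (no newline seen), a single string such as "\r\n",
    # or a tuple of strings if the file mixes conventions.
    # newline=None enables universal newline translation (the Python 3 default).
    # encoding="latin-1" is an assumption: it maps every byte, so decoding
    # cannot fail on files of unknown encoding.
    with open(path, "r", newline=None, encoding="latin-1") as f:
        for _ in f:  # consume the whole file so every line ending is observed
            pass
        return f.newlines
```

Unlike the single readline() call above, this reads the whole file, so mixed line endings are reported as a tuple.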
If you just want to convert a file, you can simply do:
with open('myfile.txt', 'U') as infile:
    text = infile.read()  # Automatic ("Universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)  # Writes newlines for the platform running the program
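The snippet above writes whatever newline the current platform uses. If the goal is specifically DOS to Unix, one alternative is to work on raw bytes, which also sidesteps any question about the file's text encoding. A rough sketch under those assumptions (dos_to_unix is a hypothetical name, and the whole file is read into memory):

```python
def dos_to_unix(path):
    """Rewrite the file with CRLF replaced by LF; return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # no DOS line endings found, leave the file untouched
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

The return value doubles as the detection result, so files that are already in Unix format are never rewritten. Lone "\r" characters (old Mac style) are left alone by this sketch.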
Answer by johntellsall
(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:
print open('myfile.txt', 'U').read()
That is, Python's "universal" file reader automatically recognizes all the different end-of-line markers and translates them to "\n".
http://docs.python.org/library/functions.html#open
(Thanks handle!)
Answer by Jonik
As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:
# The bytes literal and print() call keep this working on Python 2.6+ and Python 3.
if b"\r\n" in open("/path/file.txt", "rb").read():
    print("DOS line endings found")
Edit: simplified as per John Machin's comment (no need to use regular expressions).
Answer by Femaref
DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.
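Searching for \r\n does not require loading the whole file. A sketch of a chunked search follows, assuming Python 3; the helper name has_dos_line_endings and the chunk_size default are illustrative choices, not part of any answer here. The only subtlety is a "\r\n" pair that straddles two chunks, which is handled by remembering whether the previous chunk ended in "\r".

```python
def has_dos_line_endings(path, chunk_size=64 * 1024):
    """Return True as soon as a CRLF pair is found, reading in fixed-size chunks."""
    prev_ended_with_cr = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached end of file without finding CRLF
            # CRLF split across the chunk boundary
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                return True
            if b"\r\n" in chunk:
                return True
            prev_ended_with_cr = chunk.endswith(b"\r")
```

Because it returns at the first match, a DOS file is usually identified after the first chunk.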
Answer by shallo
Using grep & bash:
grep -c -m 1 $'\r$' file
echo $'\r\n\r\n' | grep -c $'\r$' # test
echo $'\r\n\r\n' | grep -c -m 1 $'\r$'
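The grep commands above count lines that end in a carriage return. For comparison with the Python answers, a rough counterpart is sketched below, assuming Python 3; count_cr_lines is a hypothetical helper name. Iterating over a file opened in binary mode splits on b"\n" only, so a trailing "\r" stays visible at the end of each line.

```python
def count_cr_lines(path):
    """Count lines whose last character before the line feed is a carriage return."""
    count = 0
    with open(path, "rb") as f:
        for line in f:  # binary iteration splits on b"\n" and keeps it on the line
            if line.rstrip(b"\n").endswith(b"\r"):
                count += 1
    return count
```

A result of 0 suggests the file has no DOS line endings; this sketch always scans the whole file rather than stopping at the first match the way -m 1 does.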
Answer by Cito
You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and uses less memory when you have larger text files, but it does not detect mixed newline endings.
In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the content of a text file without changing its newline representation.
def get_newline(filename):
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'
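To illustrate the Python 3 usage described above, here is a sketch that edits a file while keeping its newline style. It repeats get_newline so the block is self-contained; replace_preserving_newlines is a hypothetical helper name, and UTF-8 is an assumed encoding.

```python
def get_newline(filename):
    # Same logic as the function above: inspect bytes up to the first newline.
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'


def replace_preserving_newlines(path, old, new, encoding="utf-8"):
    """Replace text in a file and write it back with its original newline style."""
    newline = get_newline(path)
    # Default text mode translates every line ending to "\n" on read.
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    # On write, the newline argument makes open() translate "\n" back.
    with open(path, "w", newline=newline, encoding=encoding) as f:
        f.write(text.replace(old, new))
```

Since get_newline only looks at the first line ending, a file with mixed endings comes out normalized to whatever style its first line used.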

