Original source: http://stackoverflow.com/questions/17658055/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How can I remove carriage return from a text file with Python?
Asked by mrcoulson
The things I've googled haven't worked, so I'm turning to experts!
I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use "show all characters", I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can't seem to figure it out. Here's a snippet of the text file showing a line with the carriage return:
firstcolumn secondcolumn third fourth fifth sixth seventh
moreoftheseventh 8th 9th 10th 11th 12th 13th
Here's the code I'm trying to use to replace it, but it's not finding the return:
with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")
My script just doesn't find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.
Further information: The goal here is select two columns from the text file, so I split on \t characters and refer to the values as parts of an array. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.
vals = line.split("\t")
print(vals[0] + " " + vals[9])
So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don't have the [CR][LF], it works as expected.
Accepted answer by mrcoulson
Technically, there is an answer!
with open(filetoread, "rb") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)
The b in open(filetoread, "rb") apparently opens the file in such a way that I can access those line breaks and remove them. This answer actually came from Stack Overflow user Kenneth Reitz off the site.
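For readers on Python 3, here is a minimal sketch of the same binary-mode idea. It assumes the goal is to drop only the carriage returns and keep the line feeds; the function name remove_carriage_returns is made up for illustration. In Python 3 a file opened with "rb" yields bytes, so the output file is opened with "wb" and the pattern is a bytes literal.

```python
def remove_carriage_returns(src_path, dst_path):
    """Copy src_path to dst_path with every carriage return byte removed.

    Hypothetical helper for illustration; it is not part of the original answer.
    """
    with open(src_path, "rb") as inf, open(dst_path, "wb") as fixed:
        for line in inf:
            # In binary mode no newline translation happens, so a Windows
            # line still ends in b"\r\n" and the b"\r" can be removed.
            fixed.write(line.replace(b"\r", b""))
```

Working on bytes avoids any guess about the file's text encoding, at the cost of having to write the patterns as bytes.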
Thanks everyone!
Answer by inspectorG4dget
Depending on the type of file (and the OS it comes from, etc.), your carriage return might be '\r', '\n', or '\r\n'. The best way to get rid of them, regardless of which one they are, is to use line.rstrip().
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip()  # strip out all trailing whitespace
If you want to get rid of ONLY the carriage returns and not any extra whitespace that might be at the end, you can supply the optional argument to rstrip:
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip('\r\n')  # strip out only trailing carriage returns and newlines
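The difference between the two calls matters for tab-delimited data, because a bare rstrip() also removes trailing tabs and with them any empty fields at the end of a record. A small sketch, using a made-up sample line:

```python
line = "a\tb\t\t\r\n"  # hypothetical record whose last two fields are empty

stripped_all = line.rstrip()        # removes the line ending and the trailing tabs
stripped_eol = line.rstrip("\r\n")  # removes only the line ending

fields_all = stripped_all.split("\t")
fields_eol = stripped_eol.split("\t")
print(len(fields_all), len(fields_eol))  # prints "2 4"
```

If later code indexes columns by position, the second form is the safer choice.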
Hope this helps
Answer by ovgolovin
Python opens files in so-called universal newline mode, so newlines are always \n.
Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program.
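In Python 3 this translation is controlled by the newline argument of open() rather than the 'U' flag. The sketch below reads the same file twice to make the effect visible; the helper name read_both_ways is an assumption for illustration, and the file is assumed to decode with the default encoding.

```python
def read_both_ways(path):
    """Return (translated, raw) text for the file at path.

    Hypothetical helper: the first read uses the default newline=None,
    the second uses newline="" to turn translation off.
    """
    with open(path, "r") as f:
        translated = f.read()  # '\r', '\r\n' and '\n' all arrive as '\n'
    with open(path, "r", newline="") as f:
        raw = f.read()         # line endings arrive exactly as stored
    return translated, raw
```

Comparing the two strings is a quick way to check which line endings a file really contains.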
You iterate through the file line by line and replace \n within each line. But the iterator has already split the text at every \n, so each line holds at most one \n, at its very end. Replacing it inside the loop cannot rejoin a record that was broken across two lines.
Instead, you can read the whole file with f.read() and then replace \n in the result.
with open(infile, "r") as f:
    content = f.read()
    content = content.replace('\n', ' ')
    # do something with content
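Replacing every \n with a space joins the whole file into a single line. If the aim is only to rejoin records that were broken in the middle, one possible approach is to glue physical lines together until a record has the expected number of tab-separated fields. This is a sketch under assumptions not stated in the question: every complete record has the same field count (13 in the sample shown there), and a broken field should be rejoined with a single space. The name merge_broken_records is hypothetical.

```python
def merge_broken_records(lines, expected_fields):
    """Join physical lines until each record has expected_fields tab-separated fields.

    Hypothetical helper; assumes a fixed field count per complete record.
    """
    records = []
    pending = ""
    for line in lines:
        piece = line.rstrip("\r\n")
        # A continuation line is appended to the unfinished record with a space.
        pending = piece if not pending else pending + " " + piece
        if len(pending.split("\t")) >= expected_fields:
            records.append(pending)
            pending = ""
    if pending:
        records.append(pending)  # keep a short final record rather than drop it
    return records
```

Because a file object is itself an iterable of lines, it can be passed directly, for example merge_broken_records(f, 13) inside the with block.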
Answer by Raphael
I wrote some code to do it, and it works:
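end1 = r'C:\...\file1.txt'  # raw strings, so that '\f' is not read as a form feed
end2 = r'C:\...\file2.txt'
with open(end1, "rb") as inf:
    with open(end2, "wb") as fixed:
        for line in inf:
            # in Python 3, binary mode yields bytes, so the patterns are bytes literals
            line = line.replace(b"\n", b"")
            line = line.replace(b"\r", b"")
            fixed.write(line)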
Answer by Michael Hays
Here's how to remove carriage returns without using a temporary file:
with open(file_name, 'r') as file:
    content = file.read()
with open(file_name, 'w', newline='\n') as file:
    file.write(content)