How to convert CRLF to LF on a Windows machine in Python

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow, original URL: http://stackoverflow.com/questions/36422107/

Time: 2020-08-19 17:51:36  Source: igfitidea

python python-2.7

Asked by Eildosa

So I have these templates. They all end in LF, and I can fill in some terms with format and still get LF files by opening the output with "wb".

Those templates are used in a deployment script on a Windows machine to deploy to a Unix server.

Problem is, a lot of people are going to mess with those templates, and I'm 100% sure that some of them will put some CRLF inside.

How could I, using Python, convert all the CRLF to LF?

Thanks.

EDIT

Well, my bad, I had a bug in my code. Opening in "wb" always puts LF at the end of the lines, even if the file was using CRLF before.

Here is the code I'm using, if you are wondering:

#!/usr/bin/env python
# --*-- encoding: iso-8859-1 --*--

import string

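def formatFile(templatePath, filledFilePath, params, target):
    # Python 2: text mode ('w') writes '\n' as '\r\n' on Windows,
    # binary mode ('wb') writes the bytes unchanged, so '\n' stays LF.
    openingMode = 'w'
    if target == 'linux':
        openingMode += 'b'

    # Reading in text mode ('r') on Windows already turns '\r\n' into '\n'.
    with open(templatePath, 'r') as infile, open(filledFilePath, openingMode) as outfile:
        for line in infile:
            template = string.Template(line.decode('UTF-8'))
            outfile.write(template.substitute(**params).encode('UTF-8'))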

So no problem, everything is working fine :x
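
For readers on Python 3, where str no longer has a decode method, a rough equivalent of the function above might look like the sketch below. It is not the asker's code: the snake_case name format_file, the demo file names and the sample parameters are made up for illustration, and it assumes the templates are UTF-8.

```python
import os
import shutil
import string
import tempfile


def format_file(template_path, filled_file_path, params, target):
    """Fill a template line by line; write LF endings when target is 'linux'."""
    # newline='\n' writes '\n' unchanged; newline=None uses the platform default.
    newline = '\n' if target == 'linux' else None
    with open(template_path, 'r', encoding='utf-8') as infile, \
         open(filled_file_path, 'w', encoding='utf-8', newline=newline) as outfile:
        for line in infile:  # default text mode reads '\r\n' as '\n'
            outfile.write(string.Template(line).substitute(**params))


# Demonstration with a throwaway template that contains CRLF.
tmp_dir = tempfile.mkdtemp()
tpl_path = os.path.join(tmp_dir, 'in.tpl')
out_path = os.path.join(tmp_dir, 'out.conf')
with open(tpl_path, 'wb') as f:
    f.write(b'host=$host\r\nport=$port\r\n')
format_file(tpl_path, out_path, {'host': 'example', 'port': '22'}, 'linux')
with open(out_path, 'rb') as f:
    filled = f.read()
shutil.rmtree(tmp_dir)
print(filled)
```

The input is read in default text mode, so any CRLF in a template is already seen as a plain newline before the substitution runs.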

Answered by winklerrr

Convert Line Endings in-place (with Python 3)

Windows to Linux/Unix

Here is a short script for directly converting Windows line endings (\r\n, also called CRLF) to Linux/Unix line endings (\n, also called LF) in-place (without creating an extra output file):

# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()

content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)

Linux/Unix to Windows

Just swap the constants for the line endings in the replace() call like so: content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING).
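
One caveat with the plain swap: if the file already contains some \r\n sequences, replacing every \n with \r\n turns them into \r\r\n. A minimal sketch of one way around this is to normalize to LF first and only then expand to CRLF. The function name to_crlf is made up for illustration.

```python
def to_crlf(content: bytes) -> bytes:
    """Convert LF or mixed line endings to CRLF without doubling the CR."""
    # First collapse any existing CRLF to LF, then expand every LF to CRLF.
    normalized = content.replace(b'\r\n', b'\n')
    return normalized.replace(b'\n', b'\r\n')


mixed = b'one\r\ntwo\nthree\n'
converted_mixed = to_crlf(mixed)
print(converted_mixed)  # b'one\r\ntwo\r\nthree\r\n'
```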

Code Explanation

  • Important: Binary Mode. We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

    When opening files in text mode (mode='r' or mode='w' without b), the platform's native line endings (\r\n on Windows and \r on old Mac OS versions) are automatically converted to Python's Unix-style line endings: \n. So the call to content.replace() couldn't find any \r\n line endings to replace.

    In binary mode, no such conversion is done. Therefore the call to content.replace() can do its work.

  • Binary Strings. In Python 3, string literals are Unicode text (str) unless declared otherwise. But we open our files in binary mode, so we need to add b in front of our replacement strings to tell Python to treat them as bytes, too.

  • Raw Strings. On Windows the path separator is a backslash \ which we would need to escape in a normal Python string with \\. By adding r in front of the string we create a so-called "raw string" which doesn't need any escaping. So you can directly copy/paste the path from Windows Explorer into your script.

    (Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)

  • Alternative. We open the file twice to avoid the need to reposition the file pointer. We also could have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).

    Simply opening the file again in write mode does that automatically for us.
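
The single-open alternative described in the last bullet could be sketched roughly as follows. This is an illustration rather than part of the original answer: the function name crlf_to_lf_in_place and the temporary demo file are assumptions.

```python
import os
import tempfile


def crlf_to_lf_in_place(file_path):
    """Rewrite file_path with LF line endings using a single open() call."""
    with open(file_path, 'rb+') as open_file:
        content = open_file.read()
        content = content.replace(b'\r\n', b'\n')
        open_file.seek(0)      # move the pointer back to the start
        open_file.truncate(0)  # drop the old content
        open_file.write(content)


# Small demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, 'wb') as f:
    f.write(b'a\r\nb\r\n')
crlf_to_lf_in_place(demo_path)
with open(demo_path, 'rb') as f:
    result = f.read()
os.remove(demo_path)
print(result)  # b'a\nb\n'
```

The converted content is shorter than the original, so skipping the truncate step would leave stale bytes at the end of the file.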

Cheers and happy programming,
winklerrr

Answered by Yann Vernier

Python's open function supports the 'rU' mode for universal newlines, in which case it doesn't mind which sort of newline each line has. In Python 3 you can also request a specific form of newline with the newline argument for open.

Translating from one form to the other is thus rather simple in Python:

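# Read with universal newlines: newline=None (the default) is the Python 3
# equivalent of the old 'rU' mode, which was removed in Python 3.11.
with open('filename.in', 'r', newline=None) as infile, \
     open('filename.out', 'w', newline='\n') as outfile:
    outfile.writelines(infile.readlines())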

(Because of the newline argument, the U mode flag is deprecated in Python 3 and was removed in Python 3.11; the equivalent form is newline=None.)
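
For Python 3, the same translation can be sketched with only the newline argument, streaming line by line instead of calling readlines(). The helper name convert_to_lf and the temporary demo files are placeholders, not part of the original answer.

```python
import os
import shutil
import tempfile


def convert_to_lf(src_path, dst_path):
    """Copy src_path to dst_path, writing every line ending as LF."""
    # newline=None (the default) reads '\r\n', '\r' and '\n' all as '\n'.
    # newline='\n' writes '\n' unchanged, even on Windows.
    with open(src_path, 'r', newline=None) as infile, \
         open(dst_path, 'w', newline='\n') as outfile:
        for line in infile:
            outfile.write(line)


# Demonstration with temporary files.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'filename.in')
dst = os.path.join(tmp_dir, 'filename.out')
with open(src, 'wb') as f:
    f.write(b'first\r\nsecond\nthird\r\n')
convert_to_lf(src, dst)
with open(dst, 'rb') as f:
    converted = f.read()
shutil.rmtree(tmp_dir)
print(converted)  # b'first\nsecond\nthird\n'
```

Note that universal-newline reading also turns a lone \r into \n, which may or may not be wanted for a given file.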

Answered by ichigoBambi

Why don't you try the following: str.replace('\r\n', '\n')

CRLF => \r\n LF => \n

It's a piece of typewriter history =)
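
As a quick illustration of that suggestion on an in-memory string (the variable names are made up here). When the text comes from a file, the file has to be read without newline translation for the \r\n to still be there.

```python
text = 'line one\r\nline two\r\n'

# str.replace returns a new string; the original is left unchanged.
unix_text = text.replace('\r\n', '\n')

print(repr(unix_text))  # 'line one\nline two\n'
```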

Answered by apr

It is possible to fix existing templates with messed-up line endings using this code:

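import io
# newline='' disables newline translation on read, so '\r\n' stays visible;
# newline='\n' stops Windows from turning '\n' back into '\r\n' on write.
with io.open('file.tpl', newline='') as template:
    lines = [line.replace('\r\n', '\n') for line in template]
with io.open('file.tpl', 'w', newline='\n') as template:
    template.writelines(lines)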
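
Since the question is about many templates that several people edit, the same fix could be run over a whole folder before deploying. A rough sketch, assuming Python 3 and templates with a .tpl extension like the file name used above. fix_templates is a hypothetical helper name, and the demo files are invented. It works on bytes, so no newline translation gets in the way.

```python
import tempfile
from pathlib import Path


def fix_templates(folder, pattern='*.tpl'):
    """Convert CRLF to LF in every matching file; return the changed paths."""
    changed = []
    for path in sorted(Path(folder).rglob(pattern)):
        data = path.read_bytes()
        fixed = data.replace(b'\r\n', b'\n')
        if fixed != data:  # only rewrite files that actually contain CRLF
            path.write_bytes(fixed)
            changed.append(path)
    return changed


# Demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.tpl').write_bytes(b'x\r\ny\r\n')
    Path(tmp, 'b.tpl').write_bytes(b'already\nfine\n')
    changed_names = [p.name for p in fix_templates(tmp)]
    a_after = Path(tmp, 'a.tpl').read_bytes()
print(changed_names)  # ['a.tpl']
```

Returning the list of changed paths makes it easy to log which templates had CRLF in them.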