Linux: How to find Windows end-of-line (EOL) characters

Question

Asked by Stephen Turner

I have several hundred GB of data that I need to paste together using the unix paste utility in Cygwin, but it won't work properly if there are windows EOL characters in the files. The data may or may not have windows EOL characters, and I don't want to spend the time running dos2unix if I don't have to.

So my question is, in Cygwin, how can I figure out whether these files have windows EOL CRLF characters?

I've tried creating some test data and running

sed -r 's/\r\n//' testdata.txt

But that appears to match regardless of whether dos2unix has been run or not.

Thanks.

Answer 1

Accepted answer by sarnold

The file(1) utility knows the difference:

$ file * | grep ASCII
2:                                       ASCII text
3:                                       ASCII English text
a:                                       ASCII C program text
blah:                                    ASCII Java program text
foo.js:                                  ASCII C++ program text
openssh_5.5p1-4ubuntu5.dsc:              ASCII text, with very long lines
windows:                                 ASCII text, with CRLF line terminators

file(1) has been optimized to read as little of a file as possible, so you may be lucky and drastically reduce the amount of disk I/O you need to perform when finding and fixing the CRLF terminators.

Note that some cases of CRLF should stay in place: captures of SMTP will use CRLF. But that's up to you. :)
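
To get only the names of the affected files rather than the full descriptions, a minimal sketch along these lines may help. The function name crlf_files is made up for illustration. It relies on file(1)'s brief mode (-b) and on the "CRLF" wording shown in the output above, which could differ between versions of file.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the given files that file(1)
# describes as having CRLF line terminators.
crlf_files() {
    for f in "$@"; do
        # -b prints only the description, so a file name that happens to
        # contain "CRLF" cannot cause a false match.
        if file -b "$f" | grep -q 'CRLF'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling crlf_files * prints one file name per line, which can then be handed to dos2unix once you are happy with the list.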

Answer 2

Answered by user unknown

You can find out using file:

file /mnt/c/BOOT.INI 
/mnt/c/BOOT.INI: ASCII text, with CRLF line terminators

CRLF is the significant value here.

Answer 3

Answered by Paused until further notice.

If you expect sed's exit code to differ depending on whether the pattern matched, it won't. sed performs the substitution or not depending on the match, but its exit code is true unless there's an error.

You can get a usable exit code from grep, however.

#!/bin/bash
for f in *
do
    if head -n 10 "$f" | grep -qs $'\r'
    then
        dos2unix "$f"
    fi
done
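
The loop above inspects only the first 10 lines of each file, so a file whose CRLF endings start later would be missed. A minimal sketch that scans the whole file instead is shown below. The function name has_crlf is hypothetical. It assumes a grep that supports -q, which stops at the first match, so files that do contain CRLF are usually not read to the end.

```shell
#!/bin/bash
# Hypothetical helper: exit status 0 if any line of the file ends with a
# carriage return (i.e. the file contains a CRLF line ending), non-zero
# otherwise.
has_crlf() {
    cr=$(printf '\r')      # literal CR; portable alternative to $'\r'
    # Match a CR immediately before the end of a line.
    grep -q "${cr}\$" "$1"
}
```

It can replace the head | grep test in the loop, for example: if has_crlf "$f"; then dos2unix "$f"; fi. Files without any CRLF are still read in full, so this trades some disk I/O for accuracy.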

Answer 4

Answered by Amaya Rodrigo

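#!/bin/bash
# Walk all regular files; NUL-delimited names survive spaces and newlines.
find . -type f -print0 | while IFS= read -r -d '' i; do
        # file(1) mentions "CRLF" for text files with Windows line endings.
        if file "$i" | grep -q CRLF ; then
                echo "$i"
                file "$i"
                #dos2unix "$i"
        fi
done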

Uncomment the #dos2unix "$i" line when you are ready to convert them.

Answer 5

Answered by Harm

As stated above, the file solution works. The following code snippet may help.

#!/bin/ksh
EOL_UNKNOWN="Unknown"       # Unknown EOL
EOL_MAC="Mac"               # File EOL Classic Apple Mac  (CR)
EOL_UNIX="Unix"             # File EOL UNIX               (LF)
EOL_WINDOWS="Windows"       # File EOL Windows            (CRLF)
SVN_PROPFILE="name-of-file" # Filename to check.
...

# Finds the EOL used in the requested file
# $1 Name of the file (requested filename)
# Result: EOL_FILE is set to one of the enumerated EOL values.
getEolFile() {
    EOL_FILE=$EOL_UNKNOWN

    # Check for Windows EOL
    EOL_CHECK=`file "$1" | grep "ASCII text, with CRLF line terminators"`
    if [[ -n $EOL_CHECK ]] ; then
       EOL_FILE=$EOL_WINDOWS
       return
    fi

    # Check for Classic Mac EOL
    EOL_CHECK=`file "$1" | grep "ASCII text, with CR line terminators"`
    if [[ -n $EOL_CHECK ]] ; then
       EOL_FILE=$EOL_MAC
       return
    fi

    # Check for UNIX EOL (file reports plain text with no terminator note)
    EOL_CHECK=`file "$1" | grep "ASCII text"`
    if [[ -n $EOL_CHECK ]] ; then
       EOL_FILE=$EOL_UNIX
       return
    fi

    return
} # getEolFile
...

# Using this snippet
getEolFile "$SVN_PROPFILE"
echo "Found EOL: $EOL_FILE"
exit -1

Answer 6

Answered by Alan Finlay

Thanks for the tip to use the file(1) command; however, it needs a bit more refinement. In my case, not only plain text files but also some ".sh" scripts had the wrong EOL, and file reports those as follows regardless of EOL:

xxx/y/z.sh: application/x-shellscript

So the "file -e soft" option was needed (at least on Linux):

bash$ find xxx -exec file -e soft {} \; | grep CRLF

This finds all files with DOS EOLs in directory xxx and its subdirectories.

Answer 7

Answered by Mykhaylo Adamovych

Recursive grep, with a file-name pattern filter:

grep -Pnr --include='*file.sh' '\r$' .

The output shows the file name, the line number, and the line itself:

./test/file.sh:2:here is windows line break
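
If only the names of the affected files are needed, or the grep in use lacks -P (Perl-style patterns are a GNU extension), a sketch like the following may work. It assumes a grep with -r and -l, as GNU and BSD grep provide. The function name list_crlf_files is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: recursively list files under a directory (default:
# the current one) that contain at least one line ending in CR.
list_crlf_files() {
    cr=$(printf '\r')      # literal carriage return
    # -r: recurse into subdirectories
    # -l: print each matching file name once instead of every matching line
    grep -rl "${cr}\$" "${1:-.}"
}
```

Running list_crlf_files ./test prints one path per line. Because -l stops reading a file at its first match, this is also cheaper than printing every matching line.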

Answer 8

Answered by Erwin Waterlander

You can use dos2unix's -i option to get information about a file without converting it: the number of DOS, Unix, and Mac line breaks (in that order), the BOM, and whether the file is text or binary.

$ dos2unix -i *.txt
    6       0       0  no_bom    text    dos.txt
    0       6       0  no_bom    text    unix.txt
    0       0       6  no_bom    text    mac.txt
    6       6       6  no_bom    text    mixed.txt
   50       0       0  UTF-16LE  text    utf16le.txt
    0      50       0  no_bom    text    utf8unix.txt
   50       0       0  UTF-8     text    utf8dos.txt

With the "c" flag, dos2unix reports only the files that would be converted, in other words the files that have DOS line breaks. To report all txt files with DOS line breaks, you could do this:

$ dos2unix -ic *.txt
dos.txt
mixed.txt
utf16le.txt
utf8dos.txt

To convert only these files you simply do:

dos2unix -ic *.txt | xargs dos2unix

If you need to recurse over directories, do:

find -name '*.txt' | xargs dos2unix -ic | xargs dos2unix
