Why is my Bash script adding <feff> to the beginning of files?

Question

Asked by SDGuero

I've written a script that cleans up .csv files using sed, removing some bad commas and bad quotes ("bad" meaning they break an in-house program we use to transform these files):

# remove all commas, and re-insert the good commas using clean.sed
sed -f clean.sed  > .1st

# remove all quotes
sed 's/\"//g' .1st > .tmp

# add the good quotes around good commas
sed 's/\,/\"\,\"/g' .tmp > .tmp1

# add leading quotes
sed 's/^/\"/' .tmp1 > .tmp2

# add trailing quotes
sed 's/$/\"/' .tmp2 > .tmp3

# remove utf characters
sed 's/<feff>//' .tmp3 > .tmp4

# replace original file with new stripped version and delete .tmp files
cp -rf .tmp4 quotes_

Here is clean.sed:

s/\",\"/XXX/g;
:a
s/,//g
ta
s/XXX/\",\"/g;

Then it removes the temp files and voilà, we have a new file whose name starts with "quotes" that we can use for our other processes.

My question is:
Why do I need a sed statement to remove the feff tag from that temp file? The original file doesn't have it, but it always appears in the replacement. At first I thought cp was causing this, but if I put the sed statement that removes it before the cp, it isn't there.

Maybe I'm just missing something...

Answer 1

Accepted answer by Mark Byers

U+FEFF is the code point for a byte order mark (BOM). Your files most likely contain data saved in UTF-16, and the BOM has been corrupted by your 'cleaning process', which most likely expects ASCII. Rather than removing the BOM, it is probably better to fix your scripts so they don't corrupt it in the first place.
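
To check what the original file really starts with, you could inspect its first bytes. This is only a rough sketch and not part of the original answer: `detect_bom` is a made-up helper name, and it assumes your `head` supports `-c` (GNU and BSD versions do).

```shell
#!/bin/sh
# Hypothetical helper: report which byte order mark, if any, starts a file.
detect_bom() {
    # First three bytes as lowercase hex, with spaces and newlines removed.
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$first" in
        efbbbf) echo "UTF-8" ;;      # EF BB BF
        feff*)  echo "UTF-16BE" ;;   # FE FF
        fffe*)  echo "UTF-16LE" ;;   # FF FE (a UTF-32LE file also starts this way)
        *)      echo "none" ;;
    esac
}
```

Running something like `detect_bom yourfile.csv` on the original file and on each temp file should show at which step, if any, the leading bytes change.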

Answer 2

Answer by stinkoid

To get rid of these in GNU Emacs:

Open Emacs
Do a find-file-literally to open the file
Edit off the leading three bytes
Save the file
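
If you would rather not open an editor, the same "drop the leading three bytes" step can be sketched in the shell. This assumes the three bytes are the UTF-8 encoding of U+FEFF (EF BB BF), which the steps above imply but do not state; `strip_utf8_bom` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: write a file to stdout without a leading UTF-8 BOM.
strip_utf8_bom() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"    # start output at byte 4, skipping EF BB BF
    else
        cat "$1"           # no BOM found: pass the file through unchanged
    fi
}
```

Checking for the BOM first means a file without one is not truncated by accident. Redirect the output to a new file rather than back onto the input.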

There is also a way to convert files with DOS line termination convention to Unix line termination convention.
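
The answer does not say which method it has in mind. One possibility, as a minimal sketch, is to strip the carriage return at the end of each line with awk; `dos_to_unix` is a hypothetical name used here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert CRLF line endings to LF, writing to stdout.
dos_to_unix() {
    # Remove a carriage return only when it is the last character of a line.
    awk '{ sub(/\r$/, ""); print }' "$1"
}
```

Note that awk's `print` always ends the last line with a newline, so a file whose final line had no terminator gains one.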
