bash: Setting shell script to utf8
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/28477852/
Asked by Vic23
I want to write the following command line into a shell script:
cat text.tsv |
grep -Pvi '.\t.\t.*\bHotels|Гостиница|Готель|Отель|Хотел|ホテル|????|????|????|??????|??|??|宾馆|旅店|旅馆|酒店|飯店\b' |
awk '{print $0,"\t","column1"} > Text2.tsv
However, when I put this into a .sh file, all non-ASCII characters get omitted:
cat text.tsv |
grep -Pvi '.\t.\t.*\bHotels|?????????|??????|?????|?????|???|????|????|????|??????|??|??|??|??|??|??|??\b' |
awk '{print $0,"\t","True"} > Text2.tsv
How do I set my .sh file to UTF-8? I tried:
iconv -c -f ASCII -t UTF-8 Test.sh > Test2.sh
But this doesn't seem to work.
Answered by jeremf
Bash takes care of your locale settings.
Check it with locale
If it is not UTF-8, set it like this:
export LANG=C.UTF-8
Answered by tripleee
The script itself should be in UTF-8. You need to make sure your locale and your Bash settings are set up properly (really old versions of Bash needed to be explicitly configured to pass through 8-bit data, etc.; but this should be a matter of ancient history on any reasonably modern platform). Basically, this should Just Work.
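As a rough sketch of the locale check described above, the snippet below prints the character encoding of the current locale and switches to a UTF-8 locale only when needed. The locale name C.UTF-8 is an assumption: it exists on many Linux systems but not everywhere, so the sketch looks it up in `locale -a` first.

```shell
#!/bin/sh
# Ask the locale utility which character encoding is in effect
# (for example UTF-8, or ANSI_X3.4-1968 for plain ASCII).
# Falls back to "unknown" if the locale utility is not installed.
charmap=$(locale charmap 2>/dev/null || echo unknown)
echo "current charmap: $charmap"

# If the session is not UTF-8, try to switch this script to a UTF-8 locale.
# C.UTF-8 is an assumption; `locale -a` lists what the system really offers.
if [ "$charmap" != "UTF-8" ]; then
    if locale -a 2>/dev/null | grep -qiE '^C\.utf-?8$'; then
        export LC_ALL=C.UTF-8
        echo "switched to C.UTF-8 for this script"
    else
        echo "C.UTF-8 not available; pick a UTF-8 locale from 'locale -a'" >&2
    fi
fi
```

An `export` inside a script only affects that script and the commands it starts; to change the interactive shell, the variable would have to be set in the shell's startup file instead.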
There are many things which could be wrong, though. Is the script file properly in UTF-8? The file Test2.sh almost certainly isn't, and you should have received warnings from iconv if the input in Test.sh was correctly formatted, so we vaguely speculate that you have used some other encoding in this file, which would explain why things don't work.
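One way to test whether a script file really is UTF-8 is to ask iconv to convert it from UTF-8 to UTF-8: iconv exits with a non-zero status when it meets a byte sequence that is invalid in the source encoding. The sketch below is only an illustration; the helper name `is_utf8` and the two demo files are made up for the example.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# A UTF-8 -> UTF-8 round trip acts as a validator, because iconv
# fails on input bytes that are not valid in the source encoding.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical demo files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'echo "\303\251"\n' > "$tmpdir/good.sh"   # bytes C3 A9: e-acute in UTF-8
printf 'echo "\351"\n'     > "$tmpdir/bad.sh"    # lone byte E9: e-acute in Latin-1, invalid UTF-8

for f in "$tmpdir/good.sh" "$tmpdir/bad.sh"; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: NOT valid UTF-8"
    fi
done

rm -rf "$tmpdir"
```

This also suggests why the attempt with `-c -f ASCII` cannot help: bytes outside the ASCII range are invalid ASCII input, and `-c` tells iconv to discard characters it cannot convert rather than report them.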
Also, your Awk script seems to be missing a closing single quote at the end.
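Finally, anything which looks like grep | awk can usually be refactored more elegantly into just an Awk script. Get rid of the Useless cat while you're at it.
awk 'tolower($0) !~ /.\t.\t.*\<(Hotels|Гостиница|Готель|Отель|Хотел|ホテル|????|????|????|??????|??|??|宾馆|旅店|旅馆|酒店|飯店)\>/{
    print $0,"\t","column1"}' test.tsv > Text2.tsv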
I assume your regex was missing a pair of parentheses around the hotel phrases. Awk doesn't recognize \b, but \< / \> (start and end of a word, a GNU Awk extension) serve the same purpose.
If the intent is to look for these phrases in the third column of a tab-separated text file, use -F '\t' and examine $3 directly.
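A minimal sketch of that third-column approach follows. The sample rows, the temporary file, and the shortened phrase list are all hypothetical. Two simplifications are worth naming: the pattern is written in lowercase because it is matched against tolower($3), and the \< / \> word boundaries are left out so that the example does not depend on GNU Awk.

```shell
#!/bin/sh
# Hypothetical sample data: three tab-separated columns per row.
tmp=$(mktemp)
printf 'a\tb\tGrand Hotels Berlin\n'  > "$tmp"
printf 'c\td\tCity Museum\n'         >> "$tmp"
printf 'e\tf\t东方酒店\n'             >> "$tmp"

# Split on tabs, keep rows whose third field does NOT mention a hotel,
# and append one more tab-separated column to each surviving row.
# The phrase list is abbreviated for illustration.
result=$(awk -F '\t' -v OFS='\t' '
    tolower($3) !~ /hotels|酒店|宾馆/ { print $0, "column1" }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Setting OFS to a tab makes `print $0, "column1"` emit a single tab between the original row and the new column, instead of the space-tab-space sequence that `print $0,"\t","column1"` produces with the default output separator.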