bash: Implications of LC_ALL=C for speeding up grep

Question

Asked by elhoim

I just discovered that prefixing my grep commands with LC_ALL=C does wonders for speeding grep up.

But I am wondering about the implications.

Would a pattern using UTF-8 not match? What happens if the grepped file is using UTF-8?

Answer 1

Answered by thiton

You don't necessarily need UTF-8 to run into trouble here. The locale is responsible for setting the character classes, i.e. determining which character is a space, a letter or a digit. Consider these two examples:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep '[[:alnum:]]' || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep '[[:alnum:]]' || echo false
false
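
The character-class check above can also be wrapped in a small helper, so the same test can be repeated for any locale listed by `locale -a`. This is only a minimal sketch: the function name `is_alnum_in` is made up for illustration, and the byte is written as an octal escape so the script itself stays pure ASCII.

```shell
# Hypothetical helper (not part of grep): succeeds if the text in $2
# contains at least one character that locale $1 classifies as [[:alnum:]].
is_alnum_in() {
    printf '%s\n' "$2" | LC_ALL="$1" grep -q '[[:alnum:]]'
}

# Byte 0xE4 (octal 344) is the letter "ä" in ISO-8859-1.
byte=$(printf '\344')

# In the C locale only ASCII letters and digits are alphanumeric.
is_alnum_in C a       && echo "C: 'a' is alnum"
is_alnum_in C "$byte" || echo "C: byte 0xE4 is not alnum"
```

If the locale named in the first argument is not installed, the C library typically falls back to C-locale behaviour without an error, so it is worth confirming the name with `locale -a` before trusting a negative result.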

When trying to match exact binary patterns against each other, the locale doesn't make a difference, however:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep "$(echo -e '\xe4')" || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep "$(echo -e '\xe4')" || echo false
ä

I'm not sure how much of Unicode grep implements, or how well different code points are matched against each other, but matching any subset of ASCII, and matching single characters that have no alternate binary representation, should work fine regardless of locale.
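
To make the UTF-8 case from the question concrete, here is a small sketch (assuming GNU grep, with the UTF-8 bytes for "é" and "É" written as octal escapes) that contrasts a literal match with a case-insensitive one under LC_ALL=C. The variable names are illustrative only.

```shell
# UTF-8 encodings written as octal escapes so the script stays ASCII.
lower=$(printf '\303\251')   # é = bytes C3 A9
upper=$(printf '\303\211')   # É = bytes C3 89

# Literal match: the same byte sequence appears in pattern and input,
# so it is found even though the C locale knows nothing about UTF-8.
literal=$(printf 'caf%s\n' "$lower" | LC_ALL=C grep -c "$lower") || true

# Case-insensitive match: the C locale only pairs up ASCII letters,
# so É and é are unrelated byte sequences and the line is not matched.
folded=$(printf 'CAF%s\n' "$upper" | LC_ALL=C grep -ci "caf$lower") || true

echo "literal=$literal folded=$folded"
```

With GNU grep this should report one matching line for the literal search and none for the case-insensitive one. In other words, UTF-8 text still matches byte for byte under LC_ALL=C, but anything that depends on knowing what a character is (case folding, character classes) is limited to ASCII.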
