Linux: extract email addresses from a text file using a regular expression with bash or the command line

Question

Asked by Arringar1

How can I grep out only the email addresses, using a regex, from a file with many lines similar to the one below? (It is an SQL dump, to be precise.)

Unfortunately I cannot just go back and dump the email column at this point.

Example data:

62372,35896,1,cgreen,Chad,Green,[email protected],123456789,0,,,,,,,,,3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0

I have tried this but it did not work:

grep -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' file.csv

Answer 1

Accepted answer by anubhava

If you know the field position then it is much easier with awk or cut:

awk -F ',' '{print $7}' file

OR

cut -d ',' -f7 file
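
For comparison, here is a minimal Python sketch of the same positional extraction. It assumes, as in the example row, that the email address is the 7th comma-separated field, and it uses the standard csv module so that a quoted field containing a comma is not split apart (something plain cut cannot handle). The function name nth_field and the sample address are made up for illustration.

```python
import csv
import io

def nth_field(lines, n):
    """Yield the n-th (1-based) field of each CSV record, like `cut -d ',' -f n`."""
    for row in csv.reader(lines):
        if len(row) >= n:  # skip blank or short records instead of raising IndexError
            yield row[n - 1]

# Hypothetical sample modeled on the question's data; the address is invented.
sample = "62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0\n"
for value in nth_field(io.StringIO(sample), 7):
    print(value)
```

With a real file, pass an open file object instead of the StringIO, opened with newline='' as the csv documentation recommends.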

Answer 2

Answer by Flimzy

The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.

It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.
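
A small Python sketch of the problem: a naive split on commas breaks a quoted field that contains a comma, so every later field shifts by one position, while the standard csv module keeps the quoted field intact. The record below is invented for illustration.

```python
import csv
import io

# Hypothetical record whose second field is a quoted value containing a comma.
record = '1,"Green, Chad",cgreen@example.com\n'

naive = record.strip().split(",")            # splits inside the quotes
parsed = next(csv.reader(io.StringIO(record)))  # honors the quoting

print(naive)   # ['1', '"Green', ' Chad"', 'cgreen@example.com']
print(parsed)  # ['1', 'Green, Chad', 'cgreen@example.com']
```

Under the naive split the address moves from the 3rd field to the 4th, which is exactly the kind of breakage a field-position approach cannot detect.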

Consider that the following are valid email addresses according to Internet standards:

foo,[email protected]
foo"[email protected]

If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).

Answer 3

Answer by Birei

You can solve it using Python with the help of the built-in csv module and the external validators module, like this:

import validators
import csv
import sys

with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            if validators.email(field):
                print(field)

Run it like:

python3 script.py infile

That yields:

[email protected]
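
If installing the third-party validators package is not an option, the same field-by-field approach can be sketched with only the standard library. This swaps validators.email for a deliberately simple regular expression, which is an assumption and far looser than the Internet standards mentioned above: it will reject some valid addresses (such as quoted local parts) and accept some invalid ones. The names EMAIL_RE and emails_in_csv are made up for this sketch.

```python
import csv
import re
import sys

# Simplified pattern (an assumption, not RFC 5322): local part, "@", dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_in_csv(lines):
    """Yield every CSV field that consists entirely of an email-like string."""
    for row in csv.reader(lines):
        for field in row:
            if EMAIL_RE.fullmatch(field):  # whole field must match, not a substring
                yield field

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as csvfile:
        for email in emails_in_csv(csvfile):
            print(email)
```

It runs the same way as the script above: python3 script.py infile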

Answer 4

Answer by Digital Trauma

If you still want to go the grep -o route, this one works for me:

$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
[email protected]
$

I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.

Your regular expression is close, but you're missing 2 things:

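Regular expressions are case sensitive, so you can either pass -i to grep or add a-z to your square-bracket expressions.
The + modifiers and {} curly braces need to be escaped, because grep uses basic regular expressions by default (alternatively, pass -E to grep to use extended syntax, where they work unescaped).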
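
For anyone doing this in Python instead of grep, the corrected pattern carries over to the re module, where + and {2,4} are operators without escaping and re.IGNORECASE plays the role of grep -i. The sketch below mimics grep -o by yielding every match on every line; the function name grep_o and the sample text are made up.

```python
import re

# The question's pattern; in Python's syntax + and {2,4} need no backslashes.
PATTERN = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)

def grep_o(lines):
    """Yield each match on each line, roughly like `grep -i -o PATTERN`."""
    for line in lines:
        for match in PATTERN.finditer(line):
            yield match.group(0)

# Hypothetical input; the address is invented.
sample = ["62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0\n",
          "no address on this line\n"]
for hit in grep_o(sample):
    print(hit)
```

Note that the {2,4} bound silently truncates longer top-level domains: a@b.museum would be reported as a@b.muse. Using {2,} avoids that.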
