Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19940935/

Time: 2020-08-07 01:19:19  Source: igfitidea

Extract email addresses from text file using regex with bash or command line

regex linux bash

Asked by Arringar1

How can I grep out only the email addresses, using a regex, from a file with multiple lines similar to this? (A SQL dump, to be precise.)

Unfortunately I cannot just go back and dump the email column at this point.

Example data:

62372,35896,1,cgreen,Chad,Green,[email protected],123456789,0,,,,,,,,,3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0

I have tried this but it did not work:

grep -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' file.csv

Accepted answer by anubhava

If you know the field position then it is much easier with awk or cut:

awk -F ',' '{print $7}' file

OR

cut -d ',' -f7 file
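
If the column position is known, the same extraction can also be sketched in Python with the built-in csv module. This is only an illustration: the function name email_column and the placeholder address are assumptions, not part of the original answer.

```python
import csv
import io

def email_column(lines, index=6):
    """Yield the field at `index` (0-based, so 6 is the 7th column) from each CSV row."""
    for row in csv.reader(lines):
        # Skip rows that are too short to contain the requested column.
        if len(row) > index:
            yield row[index]

# Hypothetical row shaped like the example data; the address is a placeholder.
sample = io.StringIO("62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0\n")
print(list(email_column(sample)))
```

Unlike cut, csv.reader keeps a quoted field that contains a comma together, so the column index stays stable for such rows.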

Answer by Flimzy

The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.

It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.
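
A minimal sketch of the problem, using a made-up row in which a quoted field contains a comma (the row and address are hypothetical, chosen only to show the effect):

```python
import csv
import io

# Hypothetical sample row: the quoted name field contains a comma.
sample = '1,"Green, Chad",user@example.com,0\n'

# Naive splitting treats the comma inside the quotes as a delimiter.
naive_fields = sample.strip().split(",")

# csv.reader honors the quoting and keeps the name as one field.
parsed_fields = next(csv.reader(io.StringIO(sample)))

print(len(naive_fields))   # 5: the name was split in two
print(len(parsed_fields))  # 4
print(parsed_fields[2])    # user@example.com
```

With naive splitting, the email lands in a different position than expected, which is the same failure a fixed field number in cut or awk would hit on such a row.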

Consider, the following are valid email addresses, according to Internet standards:

If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).

Answer by Birei

You can solve it using python with the help of the built-in csv module and the external validators module, like this:

import validators
import csv
import sys

with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            if validators.email(field):
                print(field)

Run it like:

python3 script.py infile

That yields:

[email protected]

Answer by Digital Trauma

If you still want to go the grep -o route, this one works for me:

$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
[email protected]
$ 

I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.

Your regular expression is close, but you're missing 2 things:

你的正则表达式很接近,但你错过了两件事:

  • Regular expressions are case sensitive, so you can either pass -i to grep or add a-z to your square bracket expressions.
  • The + modifiers and {} curly braces need to be escaped, because grep uses basic regular expressions by default (with grep -E they work unescaped).
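
The case-sensitivity point can be checked with a small Python sketch. It uses the same pattern as the grep command above, written in Python's regex syntax, where + and {2,4} need no backslashes. The sample line and its address are hypothetical placeholders.

```python
import re

# Same character classes as the grep pattern; only uppercase letters are listed.
PATTERN = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

# Hypothetical line in the shape of the question's example data.
line = "62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0"

# Without a case-insensitive flag, the lowercase address does not match.
case_sensitive = re.findall(PATTERN, line)

# re.IGNORECASE plays the role of grep's -i option here.
case_insensitive = re.findall(PATTERN, line, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['cgreen@example.com']
```

Note that the {2,4} limit on the top-level domain comes from the original pattern; longer domains would need a larger upper bound.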