Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12719128/

Time: 2020-09-18 03:27:48  Source: igfitidea

Remove/replace html tags in bash

Tags: regex, bash, unix, sed

Asked by thisiscrazy4

I have a file with lines that contain:

<li><b> Some Text:</b> More Text </li>

I want to remove the HTML tags and replace the </b> tag with a dash, so that it becomes this:

Some Text:- More Text

I'm trying to use sed, but I can't find the proper regex combination.

Answered by newfurniturey

If you strictly want to strip all HTML tags, but at the same time only replace the </b> tag with a -, you can chain two simple sed commands with a pipe:

cat your_file | sed 's|</b>|-|g' | sed 's|<[^>]*>||g' > stripped_file

This passes the file's contents to the first sed command, which replaces each </b> with a -. The output of that is then piped to a second sed, which replaces all remaining HTML tags with empty strings. The final output is saved to the new file stripped_file.
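The order of the two stages matters: the </b> replacement has to run before the generic tag stripper, otherwise </b> is already gone and no dash is inserted. A minimal sketch of this, assuming the sample line from the question (the variable names line, right and wrong are only for illustration):

```shell
# Sample input taken from the question.
line='<li><b> Some Text:</b> More Text </li>'

# Intended order: turn </b> into a dash first, then strip the remaining tags.
right=$(printf '%s\n' "$line" | sed 's|</b>|-|g' | sed 's|<[^>]*>||g')

# Reversed order: the generic pattern removes </b> before it can be replaced.
wrong=$(printf '%s\n' "$line" | sed 's|<[^>]*>||g' | sed 's|</b>|-|g')

# Brackets make the surrounding spaces visible.
printf 'right: [%s]\n' "$right"
printf 'wrong: [%s]\n' "$wrong"
```

With the reversed order the dash never appears, because the second sed no longer finds any </b> to replace.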

Using a method similar to the other answer from @Steve, you could also use sed's -e option to chain the expressions into a single (non-piped) command; by adding -i, you can also read in and replace the contents of your original file without the need for cat or a new file:

sed -i -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file

This does the same replacement as the chained command above, but this time it directly replaces the contents of the input file. To save to a new file instead, remove the -i and add > stripped_file to the end (or whatever file name you choose).
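Both variants can be tried safely on a throwaway copy. The sketch below is only an illustration: the temporary directory, the sample content and the .bak suffix are assumptions, not part of the original answer.

```shell
# Work in a throwaway directory so nothing real is overwritten (demo assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input, reusing the placeholder name your_file.
printf '%s\n' '<li><b> Some Text:</b> More Text </li>' > your_file

# Variant 1: leave your_file untouched and write the result to a new file.
sed -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file > stripped_file

# Variant 2: edit in place, keeping the original as your_file.bak.
sed -i.bak -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file

# Show both results; they should be identical.
cat stripped_file your_file
```

A bare -i, as in the answer, works with GNU sed. BSD/macOS sed expects a backup suffix after -i, so the attached form -i.bak used here is likely the more portable choice.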

Answered by Steve

One way using GNU sed:

sed -e 's/<\/b>/-/g' -e 's/<[^>]*>//g' file.txt

Example:

echo "<li><b> Some Text:</b> More Text </li>" | sed -e 's/<\/b>/-/g' -e 's/<[^>]*>//g'

Result:

 Some Text:- More Text
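Note that the result still carries the space that followed <b> and the one that preceded </li>. If the exact output from the question is wanted, two more expressions can trim the line. This is a sketch that goes beyond the original answer; the variable names line and result are only for illustration:

```shell
# Sample input taken from the question.
line='<li><b> Some Text:</b> More Text </li>'

# Same two expressions as above, plus two that trim leading and trailing whitespace.
result=$(printf '%s\n' "$line" | sed \
  -e 's/<\/b>/-/g' \
  -e 's/<[^>]*>//g' \
  -e 's/^[[:space:]]*//' \
  -e 's/[[:space:]]*$//')

# Brackets make it visible that no surrounding spaces remain.
printf '[%s]\n' "$result"   # prints [Some Text:- More Text]
```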