An app or a batch file script to remove special characters from text (Windows)

Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/5497092/

Time: 2020-09-15 16:31:48  Source: igfitidea

windows batch-file

Asked by techdaemon

I love this online tool, http://textmechanic.co/, but it lacks an important feature: deleting special characters such as %, [, ), *, ?, ', etc. (everything except _, -, and .) from a large quantity of text.

I am looking for an online tool, a small Windows utility, or a batch script that can do this.

Answered by Joey

I think sed is the easiest choice here; builds of it for Windows are available for download. Furthermore, nearly every text editor should allow this kind of replacement (though most won't cope well with files in the multi-GiB range).

With sed you'd probably want something like this:

sed "s/[^a-zA-Z0-9_.-]//g" file.txt
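To see what that negated character class actually deletes, here is a minimal Ruby sketch applying the same pattern to a sample string. The method names and the sample text are made up for illustration, and the second method is an assumed variant (not part of the answer) for the case where spaces and non-ASCII letters should survive.

```ruby
# Same character class as the sed command: delete everything that is not
# an ASCII letter, a digit, "_", "." or "-". Note that spaces are deleted too.
def strip_special(text)
  text.gsub(/[^a-zA-Z0-9_.-]/, "")
end

# Assumed variant (not from the answer): additionally keep spaces and
# non-ASCII letters and digits by using the Unicode-aware \p{Alnum} class.
def strip_special_keep_words(text)
  text.gsub(/[^\p{Alnum} _.-]/, "")
end

sample = "50% off [caf\u00e9 menu]_v1.2-final?"
puts strip_special(sample)            # => 50offcafmenu_v1.2-final
puts strip_special_keep_words(sample) # => 50 off café menu_v1.2-final
```

The strict pattern treats spaces and accented letters as "special", which may or may not be what you want for ordinary prose.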

Likewise, if you have a semi-recent Windows (i.e. Windows 7), then PowerShell comes preinstalled with it. The following one-liner will do that for you:

Get-Content file.txt | foreach { $_ -replace '[^\w\d_.-]' } | Out-File -Encoding UTF8 file.new.txt

This can easily be adapted to multiple files as well. You may also be able to write the output back into the original file, since I think Get-Content yields an array, not an enumerator (i.e. the pipeline would not be operating on the file while it is still being read). For the same reason, though, very large files pose a similar problem.
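As a rough sketch of the multi-file, write-back idea in Ruby rather than PowerShell: the helper below cleans a file line by line into a scratch file and then moves it over the original, so the file is never read and written at the same time and never held in memory as a whole. The name `clean_in_place` and the `.tmp` suffix are assumptions made for this example; the pattern is the one from the sed command.

```ruby
require "fileutils"

# Anything other than ASCII letters, digits, "_", "." and "-" gets deleted.
NOT_ALLOWED = /[^a-zA-Z0-9_.-]/

# Hypothetical helper: cleans one file via a temporary file, then replaces
# the original with the cleaned copy.
def clean_in_place(path)
  tmp = "#{path}.tmp" # assumed scratch name; must not clash with a real file
  File.open(tmp, "w") do |out|
    # chomp: true strips the line ending so the newline itself is not
    # matched by the pattern; puts adds it back.
    File.foreach(path, chomp: true) do |line|
      out.puts line.gsub(NOT_ALLOWED, "")
    end
  end
  FileUtils.mv(tmp, path)
end
```

Saved as, say, clean.rb with a final line `ARGV.each { |path| clean_in_place(path) }`, it could be run as `ruby clean.rb a.txt b.txt`.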

Answered by kurumi

You can do regex with any tool/language that supports it. Here's a Ruby for Windows command:

C:\work>ruby -ne 'print $_.gsub(/[%)?\[\]*]/,"")' file
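If the list of characters to delete grows, hand-escaping them inside a character class gets error-prone. One possible alternative, sketched here with made-up names (`UNWANTED`, `delete_unwanted`), is to keep the characters in a plain list and let `Regexp.union` do the escaping. Unlike the sed approach, this deletes only the listed characters and leaves everything else, spaces included, untouched.

```ruby
# The same six characters as in the one-liner above; extend as needed.
UNWANTED = ["%", ")", "?", "[", "]", "*"].freeze

# Regexp.union escapes each string, so no manual backslashes are required.
UNWANTED_PATTERN = Regexp.union(UNWANTED)

# Hypothetical helper: deletes every listed character from the text.
def delete_unwanted(text)
  text.gsub(UNWANTED_PATTERN, "")
end

puts delete_unwanted("100% [sure]? *fine*_ok-1.0") # => 100 sure fine_ok-1.0
```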