Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/9307187/

Date: 2020-09-09 01:22:50  Source: igfitidea

How to find the number of occurrences of a string in file using windows command line?

Tags: windows, string, file, command-line, find

Asked by Patryk

I have a huge file with e-mail addresses and I would like to count how many of them are in this file. How can I do that using the Windows command line?

I have tried this, but it just prints the matching lines (btw: all e-mails are contained in one line):

findstr /c:"@" mail.txt

Answered by Adam S

Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.

findstr /c:"@" mail.txt | find /c /v "GarbageStringDefNotInYourResults"

So you are counting the lines resulting from your findstr command that do not have the garbage string in them. Kind of a hack, but it could work for you. Alternatively, just use find /c on the string you do care about being there. Lastly, you mentioned one address per line, so in this case the above works, but with multiple addresses per line this breaks.

Answered by aschipfl

Why not simply use this? It determines the number of lines containing at least one @ character:

find /C "@" "mail.txt"

Example output:

---------- MAIL.TXT: 96

To avoid the file name in the output, change it to this:

find /C "@" < "mail.txt"

Example output:

96

To capture the resulting number and store it in a variable, use this (change %N to %%N in a batch file):

set "NUM=0"
for /F %N in ('find /C "@" ^< "mail.txt"') do set "NUM=%N"
echo %NUM%

Answered by DigiBat

Very simple solution:

grep -o "@" mail.txt | grep -c .

Remember the dot at the end of the line!

Here is a slightly more understandable way:

grep -o "@" mail.txt | grep -c "@"

The first grep selects only the "@" strings and puts each one on a new line.

The second grep counts the lines (or the lines with @).
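
To make the difference between counting lines and counting occurrences concrete, here is a minimal sketch. The sample addresses and the temporary file are assumptions made up for illustration, and it presumes a grep that supports -o (GNU grep does):

```shell
# Hypothetical sample: three addresses on a single line, as in the question.
tmp=$(mktemp)
printf 'a@example.com;b@example.com;c@example.com\n' > "$tmp"

# grep -c counts matching lines, so a one-line file yields 1.
lines=$(grep -c "@" "$tmp")

# grep -o prints each match on its own line; the second grep counts those lines.
occurrences=$(grep -o "@" "$tmp" | grep -c .)

echo "lines=$lines occurrences=$occurrences"
rm -f "$tmp"
```

With this sample, the line count is 1 while the occurrence count is 3, which is why the plain find /c and findstr approaches under-count when all addresses sit on one line.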

The grep utility can be installed from the GnuWin project or from the WinGrep site. It is a very small and safe text filter. grep is one of the most useful Unix/Linux commands, and I use it in both Linux and Windows daily. The Windows findstr is good, but it does not have the features grep has.

Installing grep on Windows will be one of your best decisions if you like the CLI or batch scripts.

Answered by paranoid

Maybe it's a little bit late, but the following script worked for me (the source file contained quote characters, which is why I used the 'usebackq' parameter). The caret sign (^) acts as the escape character in the Windows batch scripting language.

@setlocal enableextensions enabledelayedexpansion    
SET TOTAL=0
FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
    SET LN=%%I
    FOR %%J IN ("!LN!") do (
        FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
            @SET /A TOTAL=!TOTAL!+%%K
        )
    )
)
ECHO Number of occurrences is !TOTAL!

Answered by gentrobot

I found this on the net. See if it works:

findstr /R /N "^.*certainString.*$" file.txt | find /c "@"

Answered by TheEye

I would install the unix tools on your system (handy in any case :-), then it's really simple - look e.g. here:

Count the number of occurrences of a string using sed?

(Using awk:

awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt

).

You can get the Windows unix tools here:

http://unxutils.sourceforge.net/
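
The awk one-liner above counts lines whose first field matches a pattern. For the question's case, a variant that counts every "@" could look like the sketch below. It is not taken from the linked answer; the sample data and temporary file are assumptions for illustration, and it relies on gsub returning the number of replacements it made:

```shell
# Hypothetical sample: two addresses on the first line, one on the second.
tmp=$(mktemp)
printf 'a@example.com,b@example.com\nc@example.com\n' > "$tmp"

# gsub returns how many replacements it made on the current line;
# replacing "@" with "&" (the matched text) leaves the line unchanged.
total=$(awk '{ n += gsub(/@/, "&") } END { print n + 0 }' "$tmp")

echo "total=$total"
rm -f "$tmp"
```

The "n + 0" in the END block makes awk print 0 rather than an empty line when the file has no matches.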

Answered by Corb

OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.

Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:

find "@" datafile.txt | find "@" | sed "s/@/@\n/g" | find /n "@" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/" > CountChars.bat

Explanation (assuming the file with the data is named "Datafile.txt"):

1) The 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) FIND to strip off the unwanted header info.

2) Pipe the above results to SED, which will search for each "@" character and replace it with itself + "\n" (a "new line", aka a CRLF), which puts each "@" on its own line in the output stream...

3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that (increasingly with each line) sets the variable equal to that line's number.

4) Isolate each line's numeric part and preface the isolated number per the above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"

In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:

a) look for a "[" (which must be "escaped" by prefacing it with "\")

b) begin saving (or "tokenizing") what follows, up to the closing "]"

    --> in other words, it ignores the brackets but stores the number
    --> the ".*" after the closing bracket matches whatever follows the "]"

c) The stuff between the \( and the \) is "tokenized", which means it can be referred to later, in the "WhatToReplaceItWith" section. The first stuff that's tokenized is referred to via "\1", the second via "\2", etc.

So... we're ignoring the [ and the ], saving the number that lies between the brackets, and ignoring all the wild-carded remainder of each line. Thus we're replacing the line with the literal string "Set /a NumFound=" plus the saved, or "tokenized", number. That is, the first line will read "Set /a NumFound=1", the next line "Set /a NumFound=2", etc.

Thus, if you have 1,283 email addresses, your results will have 1,283 lines.

The last one executed = the one that matters.

If you use the ">" character to redirect all of the above output to a batch file, i.e.: > CountChars.bat

...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
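
The core of the pipeline above (put each "@" on its own line, then count lines) can also be sketched without generating a batch file. This is an illustration only, not the answer's exact method: it assumes GNU sed, which accepts "\n" in the replacement, and uses grep -c in place of FIND; the sample data and temporary file are made up:

```shell
# Hypothetical sample: four addresses on one line.
tmp=$(mktemp)
printf 'a@example.com b@example.com c@example.com d@example.com\n' > "$tmp"

# GNU sed: append a newline after every "@", so each occurrence ends its own line.
# grep -c then counts the lines that contain "@".
num_found=$(sed 's/@/@\n/g' "$tmp" | grep -c "@")

echo "NumFound=$num_found"
rm -f "$tmp"
```

Counting the split lines directly avoids the intermediate "Set /a" statements, at the cost of not leaving the result in an environment variable for later batch commands.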

Answered by djangofan

This is how I do it, using an AND condition with FINDSTR (to count number of errors in a log file):

SET COUNT=0
FOR /F "tokens=4*" %%a IN ('TYPE "soapui.log" ^| FINDSTR.exe /I /R^
 /C:"Assertion" ^| FINDSTR.exe /I /R /C:"has status VALID"') DO (
  REM counts number of lines containing both "Assertion" and "has status VALID"
  SET /A COUNT+=1
)
SET /A PASSNUM=%COUNT%

NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".

Answered by Quinlan Vos

Use this:

type file.txt | find /i "@" /c