Linux: counting the characters in a file through a shell script

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/5026214/

Date: 2020-08-04 00:21:44  Source: igfitidea

Counting number of characters in a file through shell script

linux shell

Asked by Shweta

I want to check the number of characters in a file, from the start to the EOF character. Can anyone tell me how to do this through a shell script?

Accepted answer by Paused until further notice.

This will do it:


wc -c filename

If you want only the count without the filename being repeated in the output:


wc -c < filename

Edit:

Use -m to count characters instead of bytes (as shown in Sébastien's answer).

Answer by Sébastien Le Callonnec

#!/bin/sh

wc -m "$1" | awk '{print $1}'

wc -m counts the number of characters; the awk command prints the number of characters only, omitting the filename.

wc -c would give you the number of bytes, which can differ from the number of characters: depending on the encoding, a single character may be encoded on several bytes.
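A minimal sketch of the difference, assuming a POSIX shell. The function names count_bytes and count_chars are made up for illustration and do not come from any answer here. Note that wc -m decodes the file according to the current locale, so in a C/POSIX locale it typically reports the same number as wc -c.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the answers).

# Number of bytes in file $1; $(( )) strips the padding some wc builds add.
count_bytes() {
    echo $(( $(wc -c < "$1") ))
}

# Number of characters in file $1, decoded according to the current locale.
count_chars() {
    echo $(( $(wc -m < "$1") ))
}

# Demo on a temporary ASCII-only file: 5 letters plus the trailing newline.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
echo "bytes: $(count_bytes "$tmp")"   # bytes: 6
echo "chars: $(count_chars "$tmp")"   # chars: 6
rm -f "$tmp"
```

For a UTF-8 file with non-ASCII text, read in a UTF-8 locale, count_chars would be expected to report a smaller number than count_bytes.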

Answer by kurumi

awk only


awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)c++}END{print "total chars:"c}' file

shell only


var=$(<file)
echo ${#var}

Ruby(1.9+)


ruby -0777 -ne 'print $_.size' file
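
These one-liners do not necessarily agree with wc -m. As a hedged illustration for the shell-only variant: command substitution strips trailing newlines, so ${#var} leaves them out of the count. The sketch uses $(cat file) instead of $(<file) so that it also runs in plain sh, and the function name count_without_trailing_newlines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: length of file $1 as seen through command substitution.
count_without_trailing_newlines() {
    var=$(cat "$1")        # trailing newlines are dropped here
    echo "${#var}"
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"                                  # 5 letters + 1 newline
echo "wc -m: $(( $(wc -m < "$tmp") ))"                     # wc -m: 6
echo "shell: $(count_without_trailing_newlines "$tmp")"    # shell: 5
rm -f "$tmp"
```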

Answer by Vijay

awk '{t+=length($0)}END{print t}' file3

Answer by Paresh

The following script is tested and gives exactly the expected results:

#!/bin/bash

echo "Enter the file name"
read file
echo "enter the word to be found"
read word
count=0
for i in `cat $file`
do
if [ "$i" == "$word" ]
then
count=`expr $count + 1`
fi
done
echo "The number of words are $count"

Answer by user.py

To get the exact character count of a string, use printf rather than echo, cat, or running wc -c directly on a file. echo, cat, etc. will count a newline character, so the result includes the newline. A file with the text 'hello' will print 6 if you use echo etc., but with printf it will return exactly 5, because there is no newline to count.

How to use printf for counting characters within strings:

$ printf '6chars' | wc -m
6

To turn this into a script you can run on a text file to count characters, save the following in a file called print-character-amount.sh:

#!/bin/bash
characters=$(cat "$1")
printf '%s' "$characters" | wc -m

Run chmod +x on print-character-amount.sh, place the file in your PATH (e.g. /usr/bin/ or any directory exported as PATH in your .bashrc file), then to run the script on a text file type:

print-character-amount.sh file-to-count-characters-of.txt

Answer by Mark Setchell

I would have thought that it would be better to use stat to find the size of a file, since the filesystem knows it already, rather than causing the whole file to have to be read with awk or wc, especially if it is a multi-GB file or one that may be non-resident in the file-system on an HSM.

stat -c%s file

Yes, I concede it doesn't account for multi-byte characters, but would add that the OP has never clarified whether that is/was an issue.

Answer by qräbnö

Credits to user.py et al.

echo "ää" > /tmp/your_file.txt
cat /tmp/your_file.txt | wc -m

results in 3.


In my example the result is expected to be 2 (twice the letter ä). However, echo (or vi) adds a line break \n to the end of the output (or file). So two ä and one Linux line break \n are counted. That's three together.
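
The newline effect can be checked directly. A minimal sketch, assuming a POSIX shell; ASCII letters are used so the numbers do not depend on the locale, and the helper name chars_in is invented here.

```shell
#!/bin/sh
# Hypothetical helper: count the characters arriving on standard input.
chars_in() {
    echo $(( $(wc -m) ))
}

echo 'ab'        | chars_in    # 3: two letters plus the newline echo appends
printf 'ab'      | chars_in    # 2: printf adds nothing by itself
printf '%s\n' ab | chars_in    # 3: the newline is written explicitly
```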

Working with pipes | is not the shortest variant, but this way I have to know fewer wc parameters by heart. In addition, cat is bullet-proof in my experience.

Tested on Ubuntu 18.04.1 LTS (Bionic Beaver).
