Linux command (like cat) to read a specified quantity of characters
This page reproduces a popular StackOverflow question and its answers. Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/218912/
Question by pbreault
Is there a command like cat in Linux which can return a specified quantity of characters from a file?
e.g., I have a text file like:
Hello world
this is the second line
this is the third line
And I want something that would return the first 5 characters, which would be "Hello".
thanks
Accepted answer by Dan
head works too:
head -c 100 file # returns the first 100 bytes in the file
...will extract the first 100 bytes and return them.
What's nice about using head for this is that the syntax for tail matches:
tail -c 100 file # returns the last 100 bytes in the file
You can combine these to get ranges of bytes. For example, to get the second 100 bytes from a file, read the first 200 with head and use tail to get the last 100:
head -c 200 file | tail -c 100
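The head/tail combination can be wrapped in a small shell function. This is only a sketch: the name byte_range and its 1-based, inclusive FIRST and LAST arguments are hypothetical choices, not part of head or tail, and like the commands it wraps it counts bytes, not characters.

```shell
# byte_range FIRST LAST [FILE]: hypothetical helper printing bytes FIRST..LAST
# (1-based, inclusive) of FILE, or of standard input when no FILE is given.
byte_range() {
    first=$1
    last=$2
    shift 2
    # Keep the first LAST bytes, then keep only the final LAST-FIRST+1 of those.
    head -c "$last" "$@" | tail -c "$((last - first + 1))"
}

printf 'Hello world\n' | byte_range 1 5; echo    # prints: Hello
printf 'Hello world\n' | byte_range 7 11; echo   # prints: world
```

byte_range 101 200 file is the same pipeline as head -c 200 file | tail -c 100, with the arithmetic done for you.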
Answer by gimel
head:
Name
head - output the first part of files
Synopsis
head [OPTION]... [FILE]...
Description
Print the first 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]N
    print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file
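The leading '-' form of -c from the manual excerpt above can be checked quickly. This assumes GNU coreutils head; BSD and macOS head may reject a negative count.

```shell
# "Hello world\n" is 12 bytes; dropping the last 7 leaves the first 5.
printf 'Hello world\n' | head -c -7; echo    # prints: Hello

# A common use is stripping a trailing newline: 12 bytes in, 11 bytes out.
printf 'Hello world\n' | head -c -1 | wc -c
```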
Answer by Zathrus
head or tail can do it as well:
head -c X
Prints the first X bytes of the file (not necessarily X characters if it's a UTF-16 file, or UTF-8 text containing non-ASCII characters). tail -c X does the same for the last X bytes.
cut -c and tail -c are specified by POSIX; head -c is not, although GNU and BSD head both support it.
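The bytes-versus-characters caveat applies to UTF-8 as well as UTF-16. A small demonstration; the word "héllo" is just an example, written with octal escapes so the snippet does not depend on how the script file itself is encoded.

```shell
# "héllo" in UTF-8: 'é' is the two bytes 0xC3 0xA9 (octal \303 \251).
printf 'h\303\251llo' | wc -c          # 6 bytes, although there are 5 characters

# head -c 2 keeps 'h' plus only the first byte of 'é', cutting the character in half.
printf 'h\303\251llo' | head -c 2 | od -An -tx1    # shows the bytes 68 c3
```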
Answer by nkr1pt
You could also grep for the line and then cut it, for instance:
grep 'text' filename | cut -c 1-5
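Applied to the example file from the question, the grep/cut approach might look like this sketch (the temporary file only makes the snippet self-contained). Note that cut trims every line grep lets through, so a pattern matching several lines gives several results.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

# grep picks the line containing "second"; cut keeps characters 13-18 of it.
grep 'second' "$sample" | cut -c 13-18    # prints: second

# "line" matches two lines here, so cut prints two trimmed results ("this" twice).
grep 'line' "$sample" | cut -c 1-4
```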
Answer by fcw
You can use dd to extract arbitrary chunks of bytes.
For example,
dd skip=1234 count=5 bs=1
would copy bytes 1235 to 1239 from its input to its output, and discard the rest.
To just get the first five bytes from standard input, do:
dd count=5 bs=1
Note that, if you want to specify the input file name, dd has old-fashioned argument parsing, so you would do:
dd count=5 bs=1 if=filename
Note also that dd prints transfer statistics to standard error, so to discard them, do:
dd count=5 bs=1 2>&-
or
dd count=5 bs=1 2>/dev/null
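Using the question's example text, here is a sketch of dd pulling out a byte range, first from a file and then from standard input. GNU dd also accepts status=none to silence the statistics instead of redirecting standard error, but that option may be missing from other dd implementations.

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Hello world\nthis is the second line\n' > "$sample"

# Skip the first 6 bytes ("Hello "), then copy five 1-byte blocks.
dd if="$sample" bs=1 skip=6 count=5 2>/dev/null; echo      # prints: world

# From standard input: just the first five bytes.
printf 'Hello world\n' | dd bs=1 count=5 2>/dev/null; echo  # prints: Hello
```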
Answer by fcw
head -n LINE_NUMBER FILE_NAME | tail -n 1 | cut -c CHAR_RANGE
This pipeline prints a given range of characters from a specific line, e.g.:
head -n 5 tst.txt | tail -n 1 | cut -c 5-8
prints characters 5 to 8 of line 5.
Note: tail -n 1 selects the last of the lines printed by head.
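The same pipeline as a small function, as a sketch; the names chars_of_line and chars_of_line_sed and their argument order are hypothetical. One quirk: if the input has fewer than LINE lines, tail -n 1 still returns the last line, whereas sed -n 'LINEp' prints nothing, so the sed variant may be the safer choice.

```shell
# chars_of_line LINE RANGE [FILE]: hypothetical helper; RANGE uses cut's syntax, e.g. 5-8.
chars_of_line() {
    line=$1
    range=$2
    shift 2
    # head keeps lines 1..LINE, tail keeps the last of those, cut trims it.
    head -n "$line" "$@" | tail -n 1 | cut -c "$range"
}

# Alternative: sed prints only line LINE (nothing if that line does not exist).
chars_of_line_sed() {
    line=$1
    range=$2
    shift 2
    sed -n "${line}p" "$@" | cut -c "$range"
}

printf 'Hello world\nthis is the second line\n' | chars_of_line 2 13-18      # prints: second
printf 'Hello world\nthis is the second line\n' | chars_of_line_sed 2 13-18  # prints: second
```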
Answer by bobbyus
I know the answer is in reply to a question asked 6 years ago ...
But I was looking for something similar for a few hours and then found out that cut -c does exactly that, with the added bonus that you can also specify an offset.
On the first line of the example, cut -c 1-5 will return Hello and cut -c 7-11 will return world. No need for any other command.
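One caveat when applying this to the whole example file: cut works line by line, so cut -c 1-5 file prints the first five characters of every line, not just the first. A sketch that selects the first line before cutting (the temporary file only makes the snippet self-contained):

```shell
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

cut -c 1-5 "$sample"                  # three lines: "Hello", "this ", "this "

head -n 1 "$sample" | cut -c 1-5      # prints: Hello
head -n 1 "$sample" | cut -c 7-11     # prints: world
```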
Answer by rowanthorpe
Even though this was answered/accepted years ago, the presently accepted answer is only correct for one-byte-per-character encodings like ISO-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multiple-byte slices instead would still only work for fixed-width multibyte encodings like UTF-32 (or UTF-16, as long as no surrogate pairs occur). Given that UTF-8 is now well on its way to being a universal standard, and looking at the lists of languages by number of native speakers and of the top 30 languages by native/secondary usage, it is important to point out a simple variable-byte-character-friendly (not byte-based) technique, using cut -c and tr/sed with character classes.
Compare the following, which doubly fails due to two common Latin-centric mistakes/presumptions regarding the bytes vs. characters issue (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):
$ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
> head -c 1 | \
> sed -e 's/[A-Z]/[a-z]/g'
[[unreadable binary mess, or nothing if the terminal filtered it]]
to this (note: this worked fine on FreeBSD, but both cut and tr on GNU/Linux still mangled Greek in UTF-8 for me):
$ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
> cut -c 1 | \
> tr '[:upper:]' '[:lower:]'
π
Another more recent answer had already proposed "cut", but only because of the side issue that it can be used to specify arbitrary offsets, not because of the directly relevant character vs. bytes issue.
If your cut doesn't handle -c with variable-byte encodings correctly, for "the first X characters" (replace X with your number) you could try:
- sed -E -e '1 s/^(.{X}).*$/\1/' -e q (limited to the first line, though)
- head -n 1 | grep -E -o '^.{X}' (limited to the first line, and chains two commands, though)
- dd (already suggested in other answers, but really cumbersome)
- a complicated sed script with a sliding-window buffer to handle characters spread over multiple lines, but that is probably more cumbersome/fragile than just using something like dd
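A quick check of the first two options with X set to 5. The demo input is plain ASCII so the result does not depend on the locale; whether '.' matches a whole multibyte character depends on your sed/grep and on a UTF-8 locale being active. grep -o is a GNU and BSD extension rather than POSIX.

```shell
# sed: on line 1, keep only the first 5 characters, then print and quit.
printf 'Hello world\nsecond line\n' | sed -E -e '1 s/^(.{5}).*$/\1/' -e q    # prints: Hello

# head + grep: take line 1, print only the part matching the first 5 characters.
printf 'Hello world\nsecond line\n' | head -n 1 | grep -E -o '^.{5}'         # prints: Hello
```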
If your tr doesn't handle character classes with variable-byte encodings correctly, you could try:
sed -E -e 's/[[:upper:]]/\L&/g'
(GNU-specific)
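A small comparison of the GNU sed form with tr, on ASCII input so the output does not depend on the locale. In GNU sed, \L in the replacement lowercases what follows and & stands for the matched text; other sed implementations may not support \L.

```shell
# GNU sed: lowercase every uppercase character.
printf 'Hello WORLD\n' | sed -E -e 's/[[:upper:]]/\L&/g'    # prints: hello world

# tr with character classes, the approach from the answer above.
printf 'Hello WORLD\n' | tr '[:upper:]' '[:lower:]'         # prints: hello world
```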
Answer by Brad Parks
Here's a simple script that wraps up the dd approach mentioned here:
extract_chars.sh
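#!/usr/bin/env bash
# extract_chars.sh: print characters X to Y (1-based, inclusive) from stdin or FILE.
# dd counts bytes, so "characters" here means bytes (fine for ASCII/single-byte text).

show_help() {
    cat <<'EOF'
extracts characters X to Y from stdin or FILE
usage: X Y {FILE}
e.g.
  2 10 /tmp/it => extract chars 2-10 from /tmp/it
EOF
    exit
}

if [ "$1" == "help" ]
then
    show_help
fi

if [ -z "$2" ]
then
    show_help
fi

FROM=$1
TO=$2
COUNT=$((TO - FROM + 1))

# skip= is the number of bytes to discard, so skip FROM-1 bytes to start at byte FROM.
if [ -z "$3" ]
then
    dd skip=$((FROM - 1)) count="$COUNT" bs=1 2>/dev/null
else
    dd skip=$((FROM - 1)) count="$COUNT" bs=1 if="$3" 2>/dev/null
fi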