Linux command (like cat) to read a specified number of characters

Question

Asked by pbreault

Is there a command like cat in Linux that can return a specified number of characters from a file?

For example, I have a text file like this:

Hello world
this is the second line
this is the third line

And I want something that would return the first 5 characters, which would be "Hello".

Thanks.

Answer 1

Accepted answer by Dan

head works too:

head -c 100 file  # returns the first 100 bytes in the file

This extracts the first 100 bytes and returns them.

What's nice about using head for this is that the syntax for tail matches:

tail -c 100 file  # returns the last 100 bytes in the file

You can combine these to get ranges of bytes. For example, to get the second 100 bytes from a file, read the first 200 with head and use tail to keep the last 100 of those:

head -c 200 file | tail -c 100
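The same head/tail pairing works for any byte range. Here is a minimal sketch on a throwaway file; the temporary file and the variable names are only for illustration and are not part of the original answer:

```shell
# Build a small sample file holding the letters a-z (26 bytes, no newline).
sample=$(mktemp)
printf 'abcdefghijklmnopqrstuvwxyz' > "$sample"

# Bytes 6-10: keep the first 10 bytes, then the last 5 of those.
range=$(head -c 10 "$sample" | tail -c 5)
echo "$range"   # fghij

rm -f "$sample"
```

In general, bytes M to N come from head -c N piped into tail -c (N - M + 1), as long as the file holds at least N bytes.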

Answer 2

Answer by gimel

head:

Name

head - output the first part of files

Synopsis

head [OPTION]... [FILE]...

Description

Print the first 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]N    print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file

Answer 3

Answer by Zathrus

head or tail can do it as well:

head -c X

Prints the first X bytes of the file (not necessarily X characters if the file uses a multibyte encoding such as UTF-16 or UTF-8). tail does the same for the last X bytes.

Both this and cut are portable.
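To make the bytes-versus-characters caveat concrete, here is a small sketch. It assumes UTF-8 text, where 'é' takes two bytes (0xC3 0xA9); the octal escapes keep the example independent of the editor's encoding, and the variable name is just for illustration:

```shell
# 'h' followed by 'é' (UTF-8: \303\251), then 'llo'.
# Ask head for 2 bytes and dump them as hex to see what actually came back.
two_bytes=$(printf 'h\303\251llo' | head -c 2 | od -An -tx1 | tr -d ' \n')
echo "$two_bytes"   # 68c3: 'h' plus only the first half of 'é'
```

The second byte is an incomplete character, which is why a terminal may show garbage when a byte count cuts through multibyte text.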

Answer 4

Answer by nkr1pt

You could also grep the line out and then cut it, for instance:

grep 'text' filename | cut -c 1-5
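As a rough illustration of the grep-then-cut idea, using the sample text from the question (the temporary file, the search word and the variable names are assumptions made for this sketch):

```shell
# Recreate the three-line sample file from the question.
sample=$(mktemp)
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

# Keep only lines containing "second", then the first 4 characters of each.
picked=$(grep 'second' "$sample" | cut -c 1-4)
echo "$picked"   # this

rm -f "$sample"
```

If the pattern matches several lines, cut prints a slice of every one of them.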

Answer 5

Answer by fcw

You can use dd to extract arbitrary chunks of bytes.

For example,

dd skip=1234 count=5 bs=1

would copy bytes 1235 to 1239 from its input to its output, and discard the rest.

To get just the first five bytes from standard input, do:

dd count=5 bs=1

Note that dd has old-fashioned argument parsing, so to specify the input file name you would do:

dd count=5 bs=1 if=filename

Note also that dd verbosely reports what it did on standard error, so to discard that, do:

dd count=5 bs=1 2>&-

or

dd count=5 bs=1 2>/dev/null
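Putting those dd options together, a small sketch on a throwaway file might look like this (the file and variable names are only for illustration):

```shell
# One-line sample file.
sample=$(mktemp)
printf 'Hello world\n' > "$sample"

# With bs=1 each block is one byte: skip 6 bytes, then copy 5.
# That is bytes 7-11 of the file; 2>/dev/null hides dd's summary.
chunk=$(dd if="$sample" bs=1 skip=6 count=5 2>/dev/null)
echo "$chunk"   # world

rm -f "$sample"
```

bs=1 keeps the arithmetic simple because skip and count are measured in blocks, at the cost of one read per byte on large ranges.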

Answer 6

Answer by fcw

head -Line_number file_name | tail -1 | cut -c Num_of_chars

This pipeline extracts the requested characters from a specific line, e.g.:

head -5 tst.txt | tail -1 |cut -c 5-8

gives characters 5 to 8 of line 5.

Note: tail -1 is used to select the last line output by head.
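Here is a hedged sketch of the same pipeline on the sample text from the question. It uses the -n spelling (head -n 2, tail -n 1) instead of the older -2 and -1 forms, and adds a sed variant for comparison; the file and variable names are assumptions for this example:

```shell
# Recreate the three-line sample file from the question.
sample=$(mktemp)
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

# Line 2 only (last line of the first two), then characters 13-18 of it.
piece=$(head -n 2 "$sample" | tail -n 1 | cut -c 13-18)
echo "$piece"        # second

# Same result, letting sed pick the line directly.
piece_sed=$(sed -n '2p' "$sample" | cut -c 13-18)
echo "$piece_sed"    # second

rm -f "$sample"
```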

Answer 7

Answer by bobbyus

I know this answer is in reply to a question asked 6 years ago...

But I was looking for something similar for a few hours and then found out that cut -c does exactly that, with the added bonus that you can also specify an offset.

cut -c 1-5 will return Hello and cut -c 7-11 will return world. No need for any other command.
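One thing worth keeping in mind is that cut -c works on every line of its input, so on the three-line sample from the question it prints a slice of each line. A minimal sketch, with file and variable names made up for the example:

```shell
# Recreate the three-line sample file from the question.
sample=$(mktemp)
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

# cut alone slices all three lines, so three lines come back.
sliced_lines=$(cut -c 1-5 "$sample" | wc -l | tr -d ' ')
echo "$sliced_lines"   # 3

# Restrict the input to the first line to get a single result.
first=$(head -n 1 "$sample" | cut -c 1-5)
second=$(head -n 1 "$sample" | cut -c 7-11)
echo "$first $second"  # Hello world

rm -f "$sample"
```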

Answer 8

Answer by rowanthorpe

Even though this was answered and accepted years ago, the presently accepted answer is only correct for one-byte-per-character encodings like ISO-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multiple-byte slices instead would still only work for fixed-width multibyte encodings such as UTF-32 (UTF-16 only qualifies while no surrogate pairs are involved). Given that UTF-8 is now well on its way to being a universal standard, and looking at this list of languages by number of native speakers and this list of top 30 languages by native/secondary usage, it is important to point out a simple technique that is friendly to variable-byte characters (not byte-based), using cut -c and tr/sed with character classes.

Compare the following, which doubly fails due to two common Latin-centric mistakes/presumptions regarding the bytes vs. characters issue (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):

$ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
      head -c 1 | \
      sed -e 's/[A-Z]/[a-z]/g'
[[unreadable binary mess, or nothing if the terminal filtered it]]

to this (note: this worked fine on FreeBSD, but both cut and tr on GNU/Linux still mangled Greek in UTF-8 for me):

$ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
      cut -c 1 | \
      tr '[:upper:]' '[:lower:]'
π

Another more recent answer had already proposed cut, but only because of the side issue that it can be used to specify arbitrary offsets, not because of the directly relevant characters vs. bytes issue.

If your cut doesn't handle -c with variable-byte encodings correctly, for "the first X characters" (replace X with your number) you could try:

sed -E -e '1 s/^(.{X}).*$/\1/' -e q    (limited to the first line, though)
head -n 1 | grep -E -o '^.{X}'    (limited to the first line, and chains two commands)
dd    (already suggested in other answers, but really cumbersome)
A complicated sed script with a sliding window buffer to handle characters spread over multiple lines, but that is probably more cumbersome/fragile than just using something like dd
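As a rough sketch of the first two alternatives, here they are with X set to 5 on plain ASCII input. Whether "." counts a multibyte character as one unit depends on your locale and on the sed/grep implementation, so treat this as an illustration rather than a guarantee; the variable names are made up for the example:

```shell
# sed: on line 1 keep only the first 5 characters, print it, then quit.
with_sed=$(printf 'Hello world\nsecond line\n' | sed -E -e '1 s/^(.{5}).*$/\1/' -e q)
echo "$with_sed"    # Hello

# head + grep: take line 1, then print only the leading 5-character match.
with_grep=$(printf 'Hello world\nsecond line\n' | head -n 1 | grep -E -o '^.{5}')
echo "$with_grep"   # Hello
```

Note that grep -o is a common extension (GNU and BSD grep both have it) rather than something every grep is required to support.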

If your tr doesn't handle character classes with variable-byte encodings correctly, you could try:

sed -E -e 's/[[:upper:]]/\L&/g' (GNU-specific)

Answer 9

Answer by Brad Parks

Here's a simple script that wraps up the dd approach mentioned here:

extract_chars.sh

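#!/usr/bin/env bash
# Extracts bytes X to Y from stdin or FILE using dd.
# dd skips X bytes first, so X and Y behave as 0-based offsets (inclusive).

show_help() {
  cat <<EOF
extracts characters X to Y from stdin or FILE
usage: X Y {FILE}

e.g.

2 10 /tmp/it     => extract chars 2-10 from /tmp/it
EOF
  exit
}

# Show help when asked for it or when no arguments are given.
if [ "$1" = "help" ] || [ -z "$1" ]; then
  show_help
fi

FROM=$1
TO=$2
COUNT=$((TO - FROM + 1))

# Read from stdin unless a file name was given; 2>/dev/null hides dd's summary.
if [ -z "$3" ]; then
  dd skip="$FROM" count="$COUNT" bs=1 2>/dev/null
else
  dd skip="$FROM" count="$COUNT" bs=1 if="$3" 2>/dev/null
fi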
