Linux: How to crop (cut) a text file based on starting and ending line numbers in cygwin?

Question

Question by bits

I have a few log files of around 100 MB each. Personally, I find it cumbersome to deal with such big files. I know that the log lines that interest me span only 200 to 400 lines or so.

What would be a good way to extract the relevant log lines from these files? I.e., I just want to pipe a range of line numbers to another file.

For example, the inputs are:

filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number:   39276

Is there a command I can run in cygwin to cat out only that range of the file? I know that if I can somehow display that range on stdout, I can also pipe it to an output file.

Note: I added the Linux tag for more visibility, but I need a solution that works in cygwin. (Linux commands usually do work in cygwin.)

Answer 1

Accepted answer by Johnsyweb

Sounds like a job for sed:

sed -n '8,12p' yourfile

...will send lines 8 through 12 of yourfile to standard out.

If you want to prepend the line number, you may wish to use cat -n first:

cat -n yourfile | sed -n '8,12p'
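On a file of around 100 MB, sed -n '8,12p' still reads every remaining line after line 12 before exiting. A minimal sketch, assuming a POSIX-compatible sed such as the GNU sed shipped with Cygwin, that adds a q command so sed quits right after the last wanted line; crop_lines is a hypothetical helper name, not an existing command:

```shell
# crop_lines FILE START END  (hypothetical helper)
# Print lines START..END (inclusive) of FILE to standard output.
# "${end}q" makes sed stop reading as soon as the last wanted line is printed,
# instead of scanning the rest of a large file.
crop_lines() {
    file=$1
    start=$2
    end=$3
    sed -n "${start},${end}p;${end}q" "$file"
}
```

With the question's inputs this would be crop_lines MyHugeLogFile.log 38438 39276 > excerpt.log, where excerpt.log is just an example output name.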

Answer 2

Answer by thkala

How about this:

$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009

It uses tail to output from the 10,000th line onwards and then head to keep only 10 lines.

The same (almost) result with sed:

$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010

This one has the advantage of allowing you to input the line range directly.
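The "(almost)" comes from head -n 10 keeping exactly 10 lines, while the sed range 10000,10010 is inclusive at both ends and prints 11. The tail/head pipeline can also take the start and end line numbers directly if the head count is computed as END - START + 1. A minimal sketch under that assumption; extract_range is a hypothetical name:

```shell
# extract_range FILE START END  (hypothetical helper)
# tail -n +START begins output at line START (1-based);
# head -n (END - START + 1) then keeps lines START..END inclusive.
extract_range() {
    tail -n +"$2" "$1" | head -n "$(( $3 - $2 + 1 ))"
}
```

For the question's file: extract_range MyHugeLogFile.log 38438 39276 > excerpt.log (excerpt.log is an example name).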

Answer 3

Answer by David

You can use wc -l to figure out the total number of lines.

You can then combine head and tail to get the range you want. Let's assume the log is 40,000 lines long and you want lines 38,438 to 39,276. Then you want the last 1,563 lines (40,000 - 38,438 + 1), and of those, the first 839 (39,276 - 38,438 + 1). So:

tail -n 1563 MyHugeLogFile.log | head -n 839 | ....
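The counting step can be scripted so the tail and head counts are derived from the start and end line numbers. A minimal sketch, assuming the file ends with a newline (wc -l counts newline characters, so an unterminated last line would not be counted); crop_from_end is a hypothetical name:

```shell
# crop_from_end FILE START END  (hypothetical helper)
# Count the lines with wc -l, take the last (TOTAL - START + 1) lines with tail,
# then keep the first (END - START + 1) of those with head.
crop_from_end() {
    file=$1 start=$2 end=$3
    total=$(wc -l < "$file")
    tail -n "$(( total - start + 1 ))" "$file" | head -n "$(( end - start + 1 ))"
}
```

For a 40,000-line file and the range 38438 to 39276, this runs tail -n 1563 followed by head -n 839.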

Or there's probably an easier way using sed or awk.
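One awk version, as a sketch: awk's built-in variable NR holds the current line number, so a pattern on NR selects the range, and an exit rule stops reading once the end of the range has been passed. awk_range is a hypothetical name:

```shell
# awk_range FILE START END  (hypothetical helper)
# Rule 1: once past END, stop reading the file.
# Rule 2: a bare pattern with no action prints lines from START onwards.
awk_range() {
    awk -v s="$2" -v e="$3" 'NR > e { exit } NR >= s' "$1"
}
```

The exit rule comes first so that the line after END is never printed.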

Answer 4

Answer by Dorian

I found this thread while trying to split a file into files of 100,000 lines each. For that, a better solution than sed is:

split -l 100000 database.sql database-

It will give files like:

database-aa
database-ab
database-ac
...
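The naming and line counts can be checked on a throwaway file before running split on real data. This sketch uses a 250,000-line test file generated with seq in a temporary directory (not a real database dump). split uses two-letter suffixes by default, and because the suffixes sort alphabetically, cat with a glob reassembles the pieces in the original order:

```shell
# Build a 250,000-line test file in a temporary directory.
tmp=$(mktemp -d)
seq 1 250000 > "$tmp/big.txt"

# Split into 100,000-line pieces named part-aa, part-ab, part-ac.
split -l 100000 "$tmp/big.txt" "$tmp/part-"

# Show the line count of each piece.
wc -l "$tmp"/part-*

# Reassemble the pieces; the glob expands in alphabetical (= original) order.
cat "$tmp"/part-* > "$tmp/rejoined.txt"
```

The wc -l step should report 100000, 100000 and 50000 lines for part-aa, part-ab and part-ac, and rejoined.txt should be identical to big.txt.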

Answer 5

Answer by Jose Antonio Escobar Garcia

If you are interested only in the last X lines, you can use the "tail" command like this.

$ tail -n XXXXX yourlogfile.log > mycroppedfile.txt

This will save the last XXXXX lines of your log file to a new file called "mycroppedfile.txt". (Use >> instead of > to append to an existing file.)

Answer 6

Answer by Marc Perrin-Pelletier

And if you simply want to cut part of a file (say, lines 26 to 142) and write it to a new file:

sed -n '26,142p' file-to-cut.txt > new-file.txt

Answer 7

Answer by hbolingbroke

This is an old thread, but I was surprised nobody mentioned grep. The -A option specifies a number of lines to print after a search match, and the -B option includes lines before a match. The following command outputs 10 lines before and 10 lines after each occurrence of "my search string" in the file "mylogfile.log":

grep -A 10 -B 10 "my search string" mylogfile.log

If there are multiple matches within a large file, the output can rapidly get unwieldy. Two helpful options are -n, which tells grep to include line numbers, and --color, which highlights the matched text in the output.

If there is more than one file to be searched, grep allows multiple files to be listed, separated by spaces. Wildcards can also be used. Putting it all together:

grep -A 10 -B 10 -n --color "my search string" *.log someOtherFile.txt
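The line numbers reported by -n can also feed the range-based commands from the earlier answers. A minimal sketch, assuming GNU grep (for -m 1, which stops after the first match) as shipped with Cygwin; context_around is a hypothetical name:

```shell
# context_around FILE PATTERN N  (hypothetical helper)
# Find the line number of the first match with grep -n -m 1,
# then print N lines either side of it with sed, clamping the start at line 1.
context_around() {
    file=$1 pattern=$2 n=$3
    line=$(grep -n -m 1 -- "$pattern" "$file" | cut -d: -f1)
    [ -n "$line" ] || return 1    # no match: report failure, print nothing
    start=$(( line > n ? line - n : 1 ))
    sed -n "${start},$(( line + n ))p" "$file"
}
```

For example, context_around mylogfile.log "my search string" 10 prints a window of up to 21 lines around the first match only, which keeps the output manageable in a large file.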
