How to crop (cut) text files based on starting and ending line numbers in Cygwin?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.
Original: http://stackoverflow.com/questions/5683367/
Asked by bits
I have a few log files of around 100 MB each. Personally, I find it cumbersome to deal with such big files. I know that the log lines that interest me span only 200 to 400 lines or so.
What would be a good way to extract the relevant log lines from these files? That is, I just want to send a given range of line numbers to another file.
For example, the inputs are:
filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number: 39276
Is there a command that I can run in Cygwin to cat out only that range of the file? I know that if I can somehow display that range on stdout, then I can also redirect it to an output file.
Note: Adding the Linux tag for more visibility, but I need a solution that will work in Cygwin. (Usually Linux commands do work in Cygwin.)
Accepted answer by Johnsyweb
Sounds like a job for sed:
sed -n '8,12p' yourfile
...will send lines 8 through 12 of yourfile to standard out.
If you want to prepend the line number to each line, you may wish to use cat -n first:
cat -n yourfile | sed -n '8,12p'
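Applied to the inputs from the question, the same command might look like the sketch below. The sample file built with seq and the output name excerpt.log are assumptions for illustration. The extra q command makes sed stop reading once the last wanted line has been printed, which can save time on a 100 MB file.

```shell
# Build a stand-in for MyHugeLogFile.log: 40000 numbered lines (assumed sample data).
workdir=$(mktemp -d)
seq 1 40000 > "$workdir/MyHugeLogFile.log"

start=38438
end=39276

# Print only lines start..end, then quit at the end line so the rest is never read.
sed -n "${start},${end}p;${end}q" "$workdir/MyHugeLogFile.log" > "$workdir/excerpt.log"

# Show how many lines were extracted, plus the first and last one.
wc -l < "$workdir/excerpt.log"
head -n 1 "$workdir/excerpt.log"
tail -n 1 "$workdir/excerpt.log"
```

Redirecting with > overwrites excerpt.log on each run; >> would append to it instead.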
Answer by thkala
How about this:
$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
It uses tail to output everything from the 10,000th line onwards and then head to keep only the first 10 of those lines.
The same (almost) result with sed:
$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
This one has the advantage of allowing you to input the line range directly, rather than a starting line and a line count.
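With tail and head, a start and end line have to be converted into a start and a count first. A minimal sketch, assuming the range from the question and using seq as a stand-in for the real log file (the variable names are only illustrative):

```shell
start=38438
end=39276
count=$(( end - start + 1 ))   # 839 lines when both endpoints are included

# seq stands in for the real log file here.
result=$(seq 1 100000 | tail -n +"$start" | head -n "$count")

printf '%s\n' "$result" | head -n 1   # first extracted line
printf '%s\n' "$result" | tail -n 1   # last extracted line
```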
Answer by David
You can use wc -l to figure out the total number of lines.
You can then combine head and tail to get at the range you want. Let's assume the log is 40,000 lines long and you want the last 1562 lines, and then, of those, the first 838. So:
tail -n 1562 MyHugeLogFile.log | head -n 838 | ....
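The two counts can be derived from wc -l rather than worked out by hand. If both endpoints are to be included, they are total - start + 1 and end - start + 1, which for a 40,000-line file and the range 38438 to 39276 gives 1563 and 839. The sketch below assumes a sample file generated with seq and illustrative variable names:

```shell
workdir=$(mktemp -d)
log="$workdir/MyHugeLogFile.log"
seq 1 40000 > "$log"                # assumed sample: line N contains the number N

start=38438
end=39276
total=$(wc -l < "$log")             # 40000 for this sample

from_end=$(( total - start + 1 ))   # lines from the start line to the end of the file
keep=$(( end - start + 1 ))         # lines in the wanted range

tail -n "$from_end" "$log" | head -n "$keep" > "$workdir/range.log"
```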
Or there's probably an easier way using sed or awk.
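For the awk route, one possible sketch is shown below. It assumes the range from the question and uses seq in place of the real log file; s and e are illustrative variable names passed in with -v.

```shell
start=38438
end=39276

# NR is the current line number. Exit as soon as the range has been passed,
# otherwise print every line whose number is at least s.
result=$(seq 1 100000 | awk -v s="$start" -v e="$end" 'NR > e { exit } NR >= s')

printf '%s\n' "$result" | wc -l
```

The early exit plays the same role as q in sed: the rest of a large file is never processed.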
Answer by Dorian
I saw this thread when I was trying to split a file into files of 100,000 lines each. A better solution than sed for that is:
split -l 100000 database.sql database-
It will give files like:
database-aa
database-ab
database-ac
...
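To see the naming scheme on a small scale, here is a sketch that splits an assumed 25-line sample file into pieces of at most 10 lines. The file name sample.txt and the prefix part- are made up for the example.

```shell
workdir=$(mktemp -d)

seq 1 25 > "$workdir/sample.txt"                    # 25 lines of assumed sample data
split -l 10 "$workdir/sample.txt" "$workdir/part-"  # at most 10 lines per output file

# part-aa and part-ab hold 10 lines each, part-ac holds the remaining 5.
wc -l "$workdir"/part-*
```

By default split uses two-letter suffixes; the -a option sets a different suffix length.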
Answer by Jose Antonio Escobar Garcia
If you are interested only in the last X lines, you can use the "tail" command like this:
$ tail -n XXXXX yourlogfile.log >> mycroppedfile.txt
This will append the last XXXXX lines of your log file to "mycroppedfile.txt", creating the file if it does not exist yet. Use > instead of >> if you want to overwrite an existing file.
Answer by Marc Perrin-Pelletier
And if you simply want to cut part of a file, say from line 26 to line 142, and write it to a new file:
cat file-to-cut.txt | sed -n '26,142p' >> new-file.txt
Answer by hbolingbroke
This is an old thread but I was surprised nobody mentioned grep. The -A option allows specifying a number of lines to print after a search match and the -B option includes lines before a match. The following command would output 10 lines before and 10 lines after occurrences of "my search string" in the file "mylogfile.log":
grep -A 10 -B 10 "my search string" mylogfile.log
If there are multiple matches within a large file, the output can rapidly get unwieldy. Two helpful options are -n, which tells grep to include line numbers, and --color, which highlights the matched text in the output.
If there is more than one file to be searched, grep allows multiple files to be listed, separated by spaces. Wildcards can also be used. Putting it all together:
grep -A 10 -B 10 -n --color "my search string" *.log someOtherFile.txt
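The line numbers reported by -n can also feed a range-based command such as sed. The sketch below is only one way to combine the two: it assumes a generated sample log and a made-up search string, and it relies on -m 1 (stop after the first match), which GNU grep supports.

```shell
workdir=$(mktemp -d)
log="$workdir/mylogfile.log"
seq 1 1000 | sed 's/^/entry /' > "$log"    # assumed sample log: "entry 1" .. "entry 1000"

# Line number of the first match: -n prefixes "NUMBER:", cut keeps only the number.
match=$(grep -n -m 1 "entry 500" "$log" | cut -d: -f1)

start=$(( match > 10 ? match - 10 : 1 ))   # clamp at the first line of the file
end=$(( match + 10 ))

# Extract 10 lines of context on either side of the match, then stop reading.
sed -n "${start},${end}p;${end}q" "$log" > "$workdir/context.log"
```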