Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/36507343/

Time: 2020-09-08 19:33:23  Source: igfitidea

Get last n lines or bytes of a huge file in Windows (like Unix's tail). Avoid time consuming options

Tags: windows, powershell, batch-file, tail

Asked by sancho.s ReinstateMonicaCellio

I need to retrieve the last n lines of huge files (1-4 GB) in Windows 7. Due to corporate restrictions, I cannot run any command that is not built-in. The problem is that all the solutions I found appear to read the whole file, so they are extremely slow.

Can this be accomplished, fast?

Notes:

  1. I managed to get the first n lines, fast.
  2. It is OK if I get the last n bytes. (I used https://stackoverflow.com/a/18936628/2707864 for the first n bytes.)

The solutions in "Unix tail equivalent command in Windows Powershell" did not work. Using -wait does not make it fast. I do not have -tail (and I do not know whether it would be fast).

PS: There are quite a few related questions for head and tail, but they do not focus on the issue of speed. Therefore, useful or accepted answers there may not be useful here. E.g.,

Windows equivalent of the 'tail' command

CMD.EXE batch script to display last 10 lines from a txt file

Extract N lines from file using single windows command

https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent

powershell to get the first x MB of a file

https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command

Accepted answer by Aziz Kabyshev

How about this (reads last 8 bytes for demo):

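$fpath = "C:GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
# Jump to 8 bytes before the end of the file; nothing before that point is read
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()   # emits each byte as an integer
}
$fs.Close()
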

UPDATE. To interpret the bytes as a string (be sure to select the correct encoding; UTF8 is used here):

$N = 8
$fpath = "C:GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
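
The snippet above can be wrapped into a small reusable function. The following is only a sketch, not part of the original answer: the name Get-TailText and its parameters are hypothetical. It adds two things the snippet lacks: it never asks for more bytes than the file holds (seeking before the start of a file throws an exception), and it closes the stream even if reading fails. Note that if the tail starts in the middle of a multi-byte character, the first decoded character may come out garbled.

```powershell
# Hypothetical helper: decode the last $ByteCount bytes of a file as text.
function Get-TailText {
    param(
        [string]$Path,
        [int]$ByteCount,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )
    $fs = [System.IO.File]::OpenRead($Path)
    try {
        # Clamp the request to the file size so Seek never goes before the start
        $n = [int][Math]::Min([long]$ByteCount, $fs.Length)
        $fs.Seek(-$n, [System.IO.SeekOrigin]::End) | Out-Null
        $buffer = New-Object Byte[] $n
        # Read may return fewer bytes than requested, so loop until done
        $read = 0
        while ($read -lt $n) {
            $got = $fs.Read($buffer, $read, $n - $read)
            if ($got -le 0) { break }
            $read += $got
        }
        $Encoding.GetString($buffer, 0, $read)
    }
    finally {
        $fs.Close()
    }
}
```

For a UTF-16 file one would pass, for example, -Encoding ([System.Text.Encoding]::Unicode).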


UPDATE 2. To read the last M lines, we'll read the file in chunks from the end until the result contains at least M newline sequences:

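$M = 3
$fpath = "C:GBfile.dat"

$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size
$bytes_read = 0   # bytes consumed so far, counted from the end of the file

$fs = [IO.File]::OpenRead($fpath)
# Prepend chunks until $M separators are found or the start of the file is reached
while ((([regex]::Matches($result, $seq)).Count -lt $M) -and ($bytes_read -lt $fs.Length))
{
    # Never seek before the start of the file (the last chunk may be shorter)
    $count = [int][Math]::Min([long]$buffer_size, $fs.Length - $bytes_read)
    $fs.Seek(-($bytes_read + $count), [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $count) | Out-Null
    $bytes_read += $count
    # Track the offset in bytes: $result.Length counts characters, not bytes
    $result = [System.Text.Encoding]::UTF8.GetString($buffer, 0, $count) + $result
}
$fs.Close()

($result -split $seq) | Select -Last $M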

Try a bigger $buffer_size; ideally it equals the expected average line length, so that fewer disk operations are needed. Also pay attention to $seq: it could be \r\n or just \n. This is very dirty code without any error handling or optimizations.
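
As an illustration of the points above, here is one way the loop could be restructured. This is a sketch under stated assumptions, not the answer's own code: the function name Get-LastLines and its parameters are hypothetical. It reads large chunks, counts LF bytes (value 10) directly so that both \r\n and \n files work, and decodes the tail only once, which avoids splitting a multi-byte character across two chunks. It assumes an encoding in which LF is the single byte 10 (for example UTF-8 or ASCII) and a tail that fits in memory.

```powershell
# Hypothetical helper: return the last $Count lines of a file without reading all of it.
function Get-LastLines {
    param(
        [string]$Path,
        [int]$Count = 10,
        [int]$BufferSize = 65536,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )
    $fs = [System.IO.File]::OpenRead($Path)
    try {
        if ($fs.Length -eq 0) { return }
        $buffer = New-Object Byte[] $BufferSize
        $pos = $fs.Length      # everything from $pos to the end has been scanned
        $start = [long]0       # byte offset where the wanted tail begins
        $newlines = 0
        $found = $false
        while (($pos -gt 0) -and (-not $found)) {
            $chunk = [int][Math]::Min([long]$BufferSize, $pos)
            $pos -= $chunk
            $fs.Seek($pos, [System.IO.SeekOrigin]::Begin) | Out-Null
            $read = 0
            while ($read -lt $chunk) {
                $n = $fs.Read($buffer, $read, $chunk - $read)
                if ($n -le 0) { break }
                $read += $n
            }
            # Scan the chunk backwards for LF bytes
            for ($i = $read - 1; $i -ge 0; $i--) {
                if ($buffer[$i] -ne 10) { continue }
                # An LF that is the very last byte only terminates the final line
                if (($pos + $i) -eq ($fs.Length - 1)) { continue }
                $newlines++
                if ($newlines -eq $Count) {
                    $start = $pos + $i + 1
                    $found = $true
                    break
                }
            }
        }
        # Read the tail in one go and decode it once
        $fs.Seek($start, [System.IO.SeekOrigin]::Begin) | Out-Null
        $size = [int]($fs.Length - $start)
        $tail = New-Object Byte[] $size
        $read = 0
        while ($read -lt $size) {
            $n = $fs.Read($tail, $read, $size - $read)
            if ($n -le 0) { break }
            $read += $n
        }
        $text = $Encoding.GetString($tail, 0, $read)
        # Drop one trailing line terminator, then split on \r\n or \n
        $text = $text -replace '\r?\n\z', ''
        $text -split '\r?\n'
    }
    finally {
        $fs.Close()
    }
}
```

A call such as Get-LastLines -Path $fpath -Count 3 would then play the role of the script above. If the file has fewer than $Count lines, the whole file is returned.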

Answer by alroc

If you have PowerShell 3 or higher, you can use the -Tail parameter of Get-Content to get the last n lines.

Get-content -tail 5 PATH_TO_FILE;

On a 34 MB text file on my local SSD, this returned in 1 millisecond, vs. 8.5 seconds for get-content | select -last 5.
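
A comparison like this can be reproduced with Measure-Command. The sketch below assumes PowerShell 3 or later (for -Tail); the file name tail-demo.txt and the line count are arbitrary choices for illustration, and the timings will differ from machine to machine.

```powershell
# Build a throwaway test file in the temp folder (hypothetical name and size)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'tail-demo.txt'
1..50000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Time both approaches; -Tail seeks from the end, the pipeline reads every line
$fast = Measure-Command { Get-Content -Path $path -Tail 5 }
$slow = Measure-Command { Get-Content -Path $path | Select-Object -Last 5 }

"{0:N0} ms with -Tail, {1:N0} ms with Select-Object -Last" -f $fast.TotalMilliseconds, $slow.TotalMilliseconds
```

Both commands return the same five lines; only the amount of the file that gets read differs.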

Answer by sancho.s ReinstateMonicaCellio

With the awesome answer by Aziz Kabyshev, which solves the issue of speed, and with some googling, I ended up using this script:

$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
    $mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr

which I call from a batch file containing:

@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"

(thanks to "How to run a PowerShell script from a batch file").

Answer by Aacini

This is not an answer, but a large comment as reply to sancho.s' answer.

When you want to use small PowerShell scripts from a Batch file, I suggest the method below, which is simpler and keeps all the code in the same Batch file:

@PowerShell  ^
   $fpath = %2;  ^
   $fs = [IO.File]::OpenRead($fpath);  ^
   $fs.Seek(-%1, 'End') ^| Out-Null;  ^
   $mystr = '';  ^
   for ($i = 0; $i -lt %1; $i++)  ^
   {  ^
      $mystr = ($mystr) + ([char[]]($fs.ReadByte()));  ^
   }  ^
   Write-Host $mystr
%End PowerShell%