How do I split a large text file in Windows?

Question

Asked by Albin

I have a 2.5 GB log file. Is there any way to split it into smaller files using the Windows command prompt?

Answer 1

Answered by Josh Withee

If you have Git for Windows installed, you should also have Git Bash, since it comes with Git.

Use the split command in Git Bash to split a file:

into files of size 500MB each: split myLargeFile.txt -b 500m
into files with 10000 lines each: split myLargeFile.txt -l 10000
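A minimal sketch of the line-based form, assuming GNU split as shipped with Git Bash. The 25,000-line myLargeFile.txt is generated only for the demo and stands in for the real log. It also checks that concatenating the pieces in name order restores the original:

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so the demo files don't clutter anything.
cd "$(mktemp -d)"

# Stand-in for the real log: 25,000 numbered lines.
seq 1 25000 > myLargeFile.txt

# Same form as above: 10,000 lines per output file (xaa, xab, xac).
split myLargeFile.txt -l 10000

for f in x*; do
  echo "$f: $(wc -l < "$f") lines"
done

# Glob order is alphabetical, so cat x* rebuilds the file byte for byte.
cat x* | cmp - myLargeFile.txt && echo "reassembled OK"
```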

Tips:

If you don't have Git/Git Bash, download it from https://git-scm.com/download
If you've lost the shortcut to Git Bash, you can run it from C:\Program Files\Git\git-bash.exe

That's it!

I always like examples though...

Example:

The files generated by split are named xaa, xab, xac, etc.

These names are made up of a prefix and a suffix, which you can specify. Since I didn't specify what I want the prefix or suffix to look like, the prefix defaulted to x, and the suffix defaulted to a two-character alphabetical enumeration.
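To see the default naming for yourself, here is a small sketch, assuming GNU split in Git Bash; sample.txt and the part_ prefix are made-up names. The optional last argument to split is the prefix:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
seq 1 30 > sample.txt

# No prefix given: the default "x" plus a two-letter suffix.
split -l 10 sample.txt
ls x*         # lists xaa, xab, xac

# A prefix passed as the last argument replaces the default "x".
split -l 10 sample.txt part_
ls part_*     # lists part_aa, part_ab, part_ac
```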

Another Example:

This example demonstrates

using a filename prefix of MySlice (instead of the default x),
the -d flag for using numerical suffixes (instead of aa, ab, ac, etc.),
and the option -a 5 to tell it I want the suffixes to be 5 digits long:
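The command for this example was only visible in a screenshot, so the following is a reconstruction rather than necessarily the exact command used. It assumes GNU split and an arbitrary chunk size of 10,000 lines:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
seq 1 25000 > myLargeFile.txt

# -d: numeric suffixes, -a 5: five-digit suffixes, MySlice: output prefix
split -l 10000 -d -a 5 myLargeFile.txt MySlice

ls MySlice*   # lists MySlice00000, MySlice00001, MySlice00002
```

Note that numeric suffixes start at 00000, not 00001.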

Answer 2

Answered by bill

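' filter.vbs "cut" filter: reads text from StdIn, writes the selected lines to StdOut.
' Usage: cscript /nologo filter.vbs cut {t|b} {i|x} NumOfLines < input.txt
Set Arg = WScript.Arguments
Set Inp = WScript.StdIn
Set Outp = WScript.StdOut

Const adInteger = 3        ' field type for the line number
Const adLongVarChar = 201  ' field type for the line text

Set rs = CreateObject("ADODB.Recordset")
With rs
    ' In-memory (fabricated) recordset: one record per input line.
    .Fields.Append "LineNumber", adInteger
    .Fields.Append "Txt", adLongVarChar, 5000   ' lines longer than 5000 chars will fail
    .Open

    ' Load every input line, numbering as we go (the whole file is held in memory).
    LineCount = 0
    Do Until Inp.AtEndOfStream
        LineCount = LineCount + 1
        .AddNew
        .Fields("LineNumber").Value = LineCount
        .Fields("Txt").Value = Inp.ReadLine
        .Update
    Loop

    .Sort = "LineNumber ASC"

    N = CLng(Arg(3))   ' number of lines to include or exclude
    If LCase(Arg(1)) = "t" Then            ' top of the file
        If LCase(Arg(2)) = "i" Then
            .Filter = "LineNumber <= " & N                 ' keep the first N lines
        ElseIf LCase(Arg(2)) = "x" Then
            .Filter = "LineNumber > " & N                  ' drop the first N lines
        End If
    ElseIf LCase(Arg(1)) = "b" Then        ' bottom of the file
        If LCase(Arg(2)) = "i" Then
            .Filter = "LineNumber > " & (LineCount - N)    ' keep the last N lines
        ElseIf LCase(Arg(2)) = "x" Then
            .Filter = "LineNumber <= " & (LineCount - N)   ' drop the last N lines
        End If
    End If

    ' Write the remaining (filtered) lines in order.
    Do While Not .EOF
        Outp.WriteLine .Fields("Txt").Value
        .MoveNext
    Loop
End With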

Cut

filter cut {t|b} {i|x} NumOfLines

Cuts the given number of lines from the top or bottom of the file.

t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines

Example

cscript /nologo filter.vbs cut t i 5 < "%systemroot%\win.ini"
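For comparison, the four cut modes map onto head and tail, which stream the file instead of loading it into memory. This is a sketch in Git Bash terms; cut_lines is a hypothetical helper name, and the b x case relies on GNU head accepting a negative -n count:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring "filter cut {t|b} {i|x} N", reading stdin.
cut_lines() {
  where=$1 mode=$2 n=$3
  case "$where$mode" in
    ti) head -n "$n" ;;               # first N lines
    tx) tail -n "+$((n + 1))" ;;      # everything after the first N
    bi) tail -n "$n" ;;               # last N lines
    bx) head -n "-$n" ;;              # everything except the last N (GNU head)
    *)  echo "usage: cut_lines {t|b} {i|x} N" >&2; return 1 ;;
  esac
}

seq 1 10 | cut_lines t i 3   # prints 1, 2, 3
seq 1 10 | cut_lines b x 3   # prints 1 through 7
```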

Another way: this outputs lines 5001 onward; adapt it for your use. It uses almost no memory.

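Set Inp = WScript.StdIn
Set Outp = WScript.StdOut
Count = 0
Do Until Inp.AtEndOfStream
    Count = Count + 1
    If Count > 5000 Then
        Outp.WriteLine Inp.ReadLine
    Else
        Inp.SkipLine   ' consume the skipped line, otherwise lines 1-5000 are printed too
    End If
Loop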
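If you have Git Bash anyway, tail and sed give the same streaming behaviour without a script. A sketch with big.log as a placeholder file name; the 5001 to 10000 range is only an example of extracting a bounded slice:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
seq 1 12000 > big.log

# Lines 5001 onward, streamed as in the loop above.
tail -n +5001 big.log > rest.log

# A bounded range, lines 5001-10000; "10000q" stops reading after line 10000.
sed -n '5001,10000p;10000q' big.log > slice.log

head -n 1 rest.log   # 5001
wc -l < slice.log    # 5000
```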

Answer 3

Answered by Zimba

Of course there is! Win CMD can do a lot more than just split text files :)

Split a text file into separate files of 'max' lines each:

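REM Split text file (max lines each).
REM Paste at an interactive prompt started with cmd /v:on, since delayed expansion is needed.
REM In a .bat file, use %%i instead of %i and start with setlocal EnableDelayedExpansion.
REM Initialize
set input=file.txt
set max=10000

set /a line=1 >nul
set /a file=1 >nul
set out=!file!_%input%
set /a max+=1 >nul

echo Number of lines in %input%:
find /c /v "" < %input%

REM Split file: find /n prefixes each line with [n], so blank lines are not skipped
for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (

if !line!==%max% (
set /a line=1 >nul
set /a file+=1 >nul
set out=!file!_%input%
echo Writing file: !out!
)

REM Strip the [n] prefix and append the line to the current output file
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>!out!
set /a line+=1 >nul
)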
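For comparison, the same chunking (at most max lines per file, named 1_file.txt, 2_file.txt, ...) can be done in a single streaming pass with awk from Git Bash. This is a sketch, not part of the original answer; the sample input is generated for the demo:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"

input=file.txt
max=10000
seq 1 25000 > "$input"

# Lines 1..max go to 1_file.txt, the next max lines to 2_file.txt, and so on.
awk -v max="$max" -v name="$input" '
  (NR - 1) % max == 0 {              # first line of a new chunk
    if (out != "") close(out)        # keep only one output file open
    out = (int((NR - 1) / max) + 1) "_" name
    print "Writing file: " out > "/dev/stderr"
  }
  { print > out }
' "$input"
```

Because awk reads line by line, blank lines and leading whitespace are preserved and memory use stays flat regardless of file size.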

If the code above hangs or crashes, this example splits files faster by writing data to intermediate files instead of keeping everything in memory:

For example, to split a file with 7,600 lines into smaller files of at most 3,000 lines:

Generate regexp pattern files with the set command, to be fed to the /g flag of findstr:

list1.txt

\[[0-9]\]
\[[0-9][0-9]\]
\[[0-9][0-9][0-9]\]
\[[0-2][0-9][0-9][0-9]\]

list2.txt

\[[3-5][0-9][0-9][0-9]\]

list3.txt

\[[6-9][0-9][0-9][0-9]\]

Split the file into smaller files:

type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt
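Under Git Bash the same three slices can be taken directly by line range, with no pattern files and no prefixes to strip afterwards. A sketch assuming the 7,600-line example above and the same boundaries as list1.txt to list3.txt; input.txt is a placeholder name:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
input=input.txt
seq 1 7600 > "$input"

# Same boundaries as the patterns: 1-2999, 3000-5999, 6000-end.
sed -n '1,2999p'    "$input" > file_1.txt
sed -n '3000,5999p' "$input" > file_2.txt
sed -n '6000,$p'    "$input" > file_3.txt
```

Each sed call rereads the input, just as each findstr pass does, so for many chunks the single-pass awk or split approaches are cheaper.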

Remove the prefixed line numbers from each split file, e.g. for the first file:

REM Needs delayed expansion (cmd /v:on); drops the leading [n] from each line of file1.txt
for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>file_1.txt)
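If Git Bash is available, the prefix removal is a single sed substitution. A sketch; the three sample lines in file1.txt only imitate what find /v /n produces:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"

# Stand-in for file1.txt: every line carries a [n] prefix from find /n.
printf '[1]first line\n[2]\n[3]   indented line\n' > file1.txt

# Remove only the leading [n]; blank and indented lines survive unchanged.
sed 's/^\[[0-9]*\]//' file1.txt > file_1.txt
cat file_1.txt
```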

Notes:
Works with leading whitespace, blank lines, and whitespace-only lines.

Tested in CMD on Windows 10 x64 with a 4.4 GB text file of 5,651,982 lines.

Answer 4

Answered by Shaina Raza

You can split the file with third-party software such as HJSplit (http://www.hjsplit.org/). For example, give it your input file, which can be up to 9 GB, and then split it; in my case I split it into 10 MB pieces.
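If you have Git Bash, GNU split can produce similar fixed-size pieces without extra software. A sketch assuming GNU coreutils split; the .001, .002, ... numbering only imitates HJSplit-style names, and the 25 MiB random big.log is a stand-in for a real file:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"

# 25 MiB of random data standing in for the real file.
head -c $((25 * 1024 * 1024)) /dev/urandom > big.log

# 10 MiB pieces named big.log.001, big.log.002, ... (numbering starts at 1).
split -b 10m -a 3 --numeric-suffixes=1 big.log big.log.
ls big.log.*

# Rejoin in name order and verify byte for byte.
cat big.log.0* > rejoined.log
cmp big.log rejoined.log && echo "identical"
```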

Answer 5

Answered by Wintermute

You can use the split command for this task. For example, this command entered at the command prompt

split YourLogFile.txt -b 500m

creates several files of 500 MB each; for a file of your size this will take several minutes. You can rename the output files (named "xaa", "xab", and so on by default) to *.txt to open them in the editor of your choice.

Make sure to check the help file for the command. You can also split the log file by number of lines or change the name of your output files.
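Instead of renaming the pieces afterwards, GNU split (for example the one in Git Bash) can append the extension itself with --additional-suffix. A sketch with a generated sample file and a hypothetical log_part_ prefix, splitting by lines this time; run split --help for the full option list:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
seq 1 250000 > YourLogFile.txt

# 100,000 lines per piece, custom prefix, ".txt" appended to every output name.
split -l 100000 --additional-suffix=.txt YourLogFile.txt log_part_

ls log_part_*   # lists log_part_aa.txt, log_part_ab.txt, log_part_ac.txt
```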

(tested on Windows 7 64 bit)
