How to split a large text file in Windows?
Original question: http://stackoverflow.com/questions/31786287/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow.
Asked by Albin
I have a 2.5 GB log file. Is there any way to split this file into smaller files using the Windows command prompt?
Answer by Josh Withee
If you have installed Git for Windows, you should already have Git Bash, since it comes with Git.
Use the split command in Git Bash to split a file:
into files of size 500MB each:
split myLargeFile.txt -b 500m
into files with 10000 lines each:
split myLargeFile.txt -l 10000
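Before running either command on a multi-gigabyte log, it may help to try it on a small generated file. The following is a minimal sketch, assuming the GNU split that ships with Git Bash; the 25,000-line sample created with seq and the scratch directory are illustrative assumptions, not part of the original answer.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 25,000 numbered lines.
seq 1 25000 > myLargeFile.txt

# Split into pieces of 10,000 lines each (default names xaa, xab, xac).
split -l 10000 myLargeFile.txt

# Show how many lines ended up in each piece.
wc -l x??

# Joining the pieces in name order should reproduce the original exactly.
cat x?? > rejoined.txt
cmp myLargeFile.txt rejoined.txt && echo "pieces match the original"
```

Because the pieces sort alphabetically, concatenating them in glob order is enough to check that nothing was lost.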
Tips:
If you don't have Git/Git Bash, download it at https://git-scm.com/download
If you have lost the shortcut to Git Bash, you can run it from
C:\Program Files\Git\git-bash.exe
That's it!
I always like examples though...
Example:
In this example (shown as an image in the original answer), the files generated by split are named xaa, xab, xac, etc.
These names are made up of a prefix and a suffix, which you can specify. Since I didn't specify what I want the prefix or suffix to look like, the prefix defaulted to x, and the suffix defaulted to a two-character alphabetical enumeration.
Another Example:
This example demonstrates
- using a filename prefix of MySlice (instead of the default x),
- the -d flag for using numerical suffixes (instead of aa, ab, ac, etc.),
- and the option -a 5 to tell it I want the suffixes to be 5 digits long:
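The options listed above could be combined as in the sketch below. The exact command from the original answer is not available in this text, so the -l 10000 chunk size and the generated sample file are assumptions chosen for illustration; the prefix MySlice and the -d and -a 5 options are the ones the answer describes.

```shell
# Scratch directory and a hypothetical 25,000-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25000 > myLargeFile.txt

# -l 10000 : 10,000 lines per piece (assumed; -b 500m would work the same way)
# -d       : numeric suffixes instead of aa, ab, ac
# -a 5     : suffixes are 5 characters long
# MySlice  : filename prefix, given after the input file name
split -l 10000 -d -a 5 myLargeFile.txt MySlice

# List the generated pieces.
ls MySlice*
```

With this sample input the pieces are named MySlice00000, MySlice00001 and MySlice00002.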
Answer by bill
Set Arg = WScript.Arguments
Set WshShell = CreateObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = WScript.Stdout
' Load every input line into an in-memory recordset, keyed by line number
Set rs = CreateObject("ADODB.Recordset")
With rs
    .Fields.Append "LineNumber", 4
    .Fields.Append "Txt", 201, 5000
    .Open
    LineCount = 0
    Do Until Inp.AtEndOfStream
        LineCount = LineCount + 1
        .AddNew
        .Fields("LineNumber").Value = LineCount
        .Fields("Txt").Value = Inp.ReadLine
        .Update
    Loop
    .Sort = "LineNumber ASC"
    ' Arg(1): t = top, b = bottom. Arg(2): i = include, x = exclude. Arg(3): number of lines
    If LCase(Arg(1)) = "t" Then
        If LCase(Arg(2)) = "i" Then
            .Filter = "LineNumber < " & LCase(Arg(3)) + 1
        ElseIf LCase(Arg(2)) = "x" Then
            .Filter = "LineNumber > " & LCase(Arg(3))
        End If
    ElseIf LCase(Arg(1)) = "b" Then
        If LCase(Arg(2)) = "i" Then
            .Filter = "LineNumber > " & LineCount - LCase(Arg(3))
        ElseIf LCase(Arg(2)) = "x" Then
            .Filter = "LineNumber < " & LineCount - LCase(Arg(3)) + 1
        End If
    End If
    ' Write out the lines that pass the filter
    Do While Not .EOF
        Outp.WriteLine .Fields("Txt").Value
        .MoveNext
    Loop
End With
Cut
filter cut {t|b} {i|x} NumOfLines
Cuts the given number of lines from the top or bottom of the file.
t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines
Example
cscript /nologo filter.vbs cut t i 5 < "%systemroot%\win.ini"
Another way: this outputs lines 5001 onward; adapt it for your use. It uses almost no memory.
Set Inp = WScript.Stdin
Set Outp = WScript.Stdout
Count = 0
Do Until Inp.AtEndOfStream
    ' Always read the line, otherwise the stream never advances
    Line = Inp.ReadLine
    Count = Count + 1
    If Count > 5000 Then
        Outp.WriteLine Line
    End If
Loop
Answer by Zimba
Of course there is! Win CMD can do a lot more than just split text files :)
Split a text file into separate files of 'max' lines each:
REM Requires delayed expansion: start the prompt with cmd /v:on
REM In a batch file, write %%i instead of %i
REM Initialize
set input=file.txt
set max=10000
set /a line=1 >nul
set /a file=1 >nul
set out=!file!_%input%
set /a max+=1 >nul
echo Number of lines in %input%:
find /c /v "" < %input%
REM Split file
for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (
    if !line!==%max% (
        set /a line=1 >nul
        set /a file+=1 >nul
        set out=!file!_%input%
        echo Writing file: !out!
    )
    REM Write next file
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>!out!
    set /a line+=1 >nul
)
If the above code hangs or crashes, this example code splits files faster (by writing data to intermediate files instead of keeping everything in memory):
E.g. to split a file with 7,600 lines into smaller files of at most 3,000 lines each:
- Generate regexp string/pattern files with the set command, to be fed to the /g flag of findstr:
list1.txt
\[[0-9]\]
\[[0-9][0-9]\]
\[[0-9][0-9][0-9]\]
\[[0-2][0-9][0-9][0-9]\]
list2.txt
\[[3-5][0-9][0-9][0-9]\]
list3.txt
\[[6-9][0-9][0-9][0-9]\]
- Split the file into smaller files:
type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt
- Remove the prefixed line numbers from each split file,
e.g. for the 1st file:
for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>file_1.txt
)
Notes:
Works with leading whitespace, blank lines & whitespace-only lines.
Tested on Win 10 x64 CMD, on a 4.4 GB text file with 5651982 lines.
Answer by Shaina Raza
You can split the file using third-party software such as HJSplit (http://www.hjsplit.org/). For example, give it your input file, which can be up to 9 GB, and then split it; in my case I split it into pieces of 10 MB each.
Answer by Wintermute
You can use the command split for this task. For example, this command entered into the command prompt
split YourLogFile.txt -b 500m
creates several files with a size of 500 MB each. This will take several minutes for a file of your size. You can rename the output files (by default called "xaa", "xab", and so on) to *.txt to open them in the editor of your choice.
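The renaming step mentioned above can be done in one loop rather than by hand. This is a minimal sketch for a Unix-style shell such as Git Bash, assuming the pieces still have split's default names (x followed by two letters); the small generated YourLogFile.txt is a hypothetical stand-in for the real log.

```shell
# Scratch directory and a hypothetical small stand-in for the real log file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25000 > YourLogFile.txt

# Produce pieces with the default names xaa, xab, xac.
split -l 10000 YourLogFile.txt

# Append .txt to every piece that uses the default naming.
for piece in x??; do
    mv -- "$piece" "$piece.txt"
done

# List the renamed pieces.
ls x??.txt
```

GNU split also offers an --additional-suffix option that adds the extension while splitting, if the installed version supports it.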
Make sure to check the help file for the command. You can also split the log file by number of lines or change the name of your output files.
(Tested on Windows 7 64-bit)