Windows batch file to split a .csv file
Original source: http://stackoverflow.com/questions/20602869/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Batch file to split .csv file
Asked by SeekingAlpha
I have a very large .csv file (>500 MB) and I wish to break it up into smaller .csv files from the command prompt. (Basically, I am trying to find an equivalent of the Linux "split" command on Windows.)
This has to be a batch script, as my machine only has Windows installed and requesting software is a pain. I came across some sample code (http://forums.techguy.org/software-development/1023949-split-100000-line-csv-into.html), but it does not work when I execute the batch file. All I get is a single output file of only 125 KB, even though I asked it to split every 20 000 lines.
Has anyone come across a similar problem, and if so, how did you resolve it?
Answered by Dale
Try this out:
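@echo off
setLocal EnableDelayedExpansion
rem Maximum number of lines per output file, and the file to split.
set limit=20000
set file=export.csv
set lineCounter=1
set filenameCounter=1
set name=
set extension=
rem Split the input file name into base name and extension.
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)
rem Note: for /f skips empty lines, and delayed expansion drops exclamation marks in the data.
for /f "tokens=*" %%a in (%file%) do (
    rem Move on to the next output file once the current one is full.
    if !lineCounter! gtr !limit! (
        echo Created !splitFile!.
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
    )
    rem Build the name after the counter update so this line goes to the new file.
    set splitFile=!name!-part!filenameCounter!!extension!
    echo %%a>> !splitFile!
    set /a lineCounter=!lineCounter! + 1
)
if defined splitFile echo Created !splitFile!.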
As shown in the code above, it will split the original csv file into multiple csv files of at most 20 000 lines each. All you have to do is change the file and limit variables accordingly. Hope it helps.
Answered by Gonki
A free Windows app that does this:
http://www.addictivetips.com/windows-tips/csv-splitter-for-windows/
Answered by hhh
Use the Cygwin split command. Examples:
To split a file every 500 lines:
split -l 500 [filename.ext]
By default, split ignores the input file name and writes output files named xaa, xab, xac, and so on.
To generate files with numeric suffixes that end in the correct extension, use the following:
split -l 1000 sourcefilename.ext destinationfilename -d --additional-suffix=.ext
The position of -d or -l does not matter.
- "-d" is the same as --numeric-suffixes
- "-l" is the same as --lines
For more: split --help
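The options above can be tried on a small file first. The following is a minimal sketch, assuming GNU coreutils split (as shipped with Cygwin) and run in an empty scratch directory; the names sample.csv and part- are made up for illustration.

```shell
#!/bin/sh
# Sketch: split a small sample CSV into numbered parts that keep the .csv extension.
# Assumes GNU coreutils split; sample.csv and the part- prefix are hypothetical names.
set -e

# Build a 10-line sample file: "1,row1" ... "10,row10".
i=1
while [ "$i" -le 10 ]; do
    echo "$i,row$i"
    i=$((i + 1))
done > sample.csv

# At most 4 lines per part, numeric suffixes, .csv appended to each output name.
split -l 4 -d --additional-suffix=.csv sample.csv part-

# List the parts and their line counts (three parts of 4, 4 and 2 lines).
wc -l part-*.csv
```

For a real file, only the line count and the file names need to change, for example -l 20000 with the actual .csv file.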
Answered by sancho.s ReinstateMonicaCellio
For splitting very large files, the solution I found is an adaptation of the approaches linked in the script comments below, with PowerShell "embedded" in a batch file. It runs fast, unlike many other things I tried (I can't speak for the other options posted here).
The way to use mysplit.bat below is:
mysplit.bat <mysize> 'myfile'
Note: The script was intended to use the first argument as the split size, but the size is currently hardcoded at 100 MB. It should not be difficult to fix this.
Note 2: The filename should be enclosed in single quotes. Other ways of quoting apparently do not work.
Note 3: It splits the file at a given number of bytes, not at a given number of lines. For me this was good enough. A few lines of code could probably be added to extend each chunk read up to the next CR/LF. The file would then be split into full lines (though not a constant number of them), with no sacrifice in processing time.
Script mysplit.bat:
@REM Using https://stackoverflow.com/questions/19335004/how-to-run-a-powershell-script-from-a-batch-file
@REM and https://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell
@REM Splits the file passed as the second argument into 100 MB chunks named myfile.0, myfile.1, and so on.
@PowerShell ^
    $upperBound = 100MB; ^
    $rootName = %2; ^
    $from = $rootName; ^
    $fromFile = [io.file]::OpenRead($from); ^
    $buff = new-object byte[] $upperBound; ^
    $count = $idx = 0; ^
    try { ^
        do { ^
            'Reading ' + $upperBound; ^
            $count = $fromFile.Read($buff, 0, $buff.Length); ^
            if ($count -gt 0) { ^
                $to = '{0}.{1}' -f ($rootName, $idx); ^
                $toFile = [io.file]::OpenWrite($to); ^
                try { ^
                    'Writing ' + $count + ' to ' + $to; ^
                    $tofile.Write($buff, 0, $count); ^
                } finally { ^
                    $tofile.Close(); ^
                } ^
            } ^
            $idx ++; ^
        } while ($count -gt 0); ^
    } ^
    finally { ^
        $fromFile.Close(); ^
    } ^
    %End PowerShell%
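The idea in Note 3 (size-limited chunks that still end on a line boundary) is not implemented in the PowerShell script above. Where GNU coreutils is available, for instance through Cygwin, a different tool offers it directly: split has a -C (--line-bytes) option that puts as many complete lines as fit into each output file. The following is a minimal sketch under that assumption, meant to be run in an empty scratch directory; big.csv and chunk- are made-up names.

```shell
#!/bin/sh
# Sketch: size-limited split that never cuts a line in half.
# Assumes GNU coreutils split; big.csv and the chunk- prefix are hypothetical names.
set -e

# Sample input: 6 lines of 10 bytes each (9 characters plus the newline).
for n in 1 2 3 4 5 6; do
    echo "row-${n}-abc"
done > big.csv

# At most 25 bytes of whole lines per chunk, so two 10-byte lines fit in each.
split -C 25 -d --additional-suffix=.csv big.csv chunk-

# Show the size and line count of every chunk.
wc -c -l chunk-*.csv
```

For the 100 MB chunks used in the script, the equivalent call would be split -C 100M with the real file name.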
Answered by foxidrive
This will give you lines 1 to 20000 in newfile1.csv and lines 20001 to the end in newfile2.csv.
It overcomes the 8K character limit per line too.
This uses a helper batch file called findrepl.bat from https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
It's more robust than a plain batch file, and quicker too.
findrepl /o:1:20000 <file.csv >newfile1.csv
findrepl /o:20001 <file.csv >newfile2.csv
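For readers who have GNU tools available (for example through Cygwin), the same two-way cut can be sketched with head and tail instead of findrepl.bat. This is an alternative under that assumption, not part of the helper script; the sample uses 10 lines and a cut after line 4 so the result is easy to check, and it should be run in an empty scratch directory.

```shell
#!/bin/sh
# Sketch: first N lines into one file, the rest into another, using head and tail.
# Assumes GNU coreutils; the small numbers stand in for 20000 and 20001.
set -e

# Sample input: the numbers 1 to 10, one per line.
seq 1 10 > file.csv

# Lines 1 to 4.
head -n 4 file.csv > newfile1.csv
# Line 5 to the end (tail -n +K starts output at line K).
tail -n +5 file.csv > newfile2.csv

wc -l newfile1.csv newfile2.csv
```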
Answered by SuperMykEl
I found this question while looking for a similar solution. I modified the answer that @Dale gave to suit my purposes. I wanted something a little more flexible, with some error trapping. I thought I would put it here for anyone looking for the same thing.
@echo off
setLocal EnableDelayedExpansion
GOTO checkvars

:checkvars
rem Expected call: script -f Filename -n RowsPerFile
IF "%1"=="" GOTO syntaxerror
IF NOT "%1"=="-f" GOTO syntaxerror
IF "%~2"=="" GOTO syntaxerror
IF NOT EXIST %2 GOTO nofile
IF "%3"=="" GOTO syntaxerror
IF NOT "%3"=="-n" GOTO syntaxerror
IF "%~4"=="" GOTO syntaxerror
set param=%~4
rem Accept only a positive whole number of rows.
echo %param%| findstr /xr "[1-9][0-9]*" >nul && (
    goto proceed
) || (
    echo %param% is NOT a valid number
    goto syntaxerror
)

:proceed
set limit=%~4
set file=%2
rem Start above the limit so that the first line opens the first output file.
set /a lineCounter=%limit% + 1
set filenameCounter=0
set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)
for /f "usebackq tokens=*" %%a in (%file%) do (
    if !lineCounter! gtr !limit! (
        set splitFile=!name!_part!filenameCounter!!extension!
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    rem Progress display. Clearing the screen for every line slows large files down considerably.
    cls
    echo Adding Line !splitFile! - !lineCounter!
    echo %%a>> "!splitFile!"
    set /a lineCounter=!lineCounter! + 1
)
echo Done!
goto end

:syntaxerror
Echo Syntax: %0 -f Filename -n "Number Of Rows Per File"
goto end

:nofile
echo %2 does not exist
goto end

:end