Windows batch file to split a .csv file

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow. Original question: http://stackoverflow.com/questions/20602869/

Time: 2020-09-09 11:11:26  Source: igfitidea


Tags: windows, batch-file, csv, command-prompt

Asked by SeekingAlpha

I have a very large .csv file (>500 MB) and I wish to break it up into smaller .csv files in the command prompt. (Basically, I am trying to find an equivalent of the Linux "split" command in Windows.)


This has to be a batch script, as my machine only has Windows installed and requesting software is a pain. I came across a number of code samples (http://forums.techguy.org/software-development/1023949-split-100000-line-csv-into.html), but they do not work when I execute the batch file. All I get is one output file of only 125 KB, when I asked it to split every 20,000 lines.


Has anyone come across a similar problem, and how did you resolve it?


Answer by Dale

Try this out:


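@echo off
setLocal EnableDelayedExpansion

rem Number of lines per output file, and the file to split
set limit=20000
set "file=export.csv"
set lineCounter=1
set filenameCounter=1

rem Split the input file name into base name and extension
set name=
set extension=
for %%a in ("%file%") do (
    set "name=%%~na"
    set "extension=%%~xa"
)

rem Note: FOR /F skips blank lines and lines starting with a semicolon, and with
rem delayed expansion enabled any exclamation marks in the data are lost.
for /f "usebackq delims=" %%a in ("%file%") do (
    if !lineCounter! gtr !limit! (
        echo Created !splitFile!.
        set /a filenameCounter+=1
        set lineCounter=1
    )
    rem Choose the target only after a possible rollover, so each part gets limit lines
    set "splitFile=!name!-part!filenameCounter!!extension!"
    rem The open parenthesis right after echo makes lines like ON or OFF print literally
    >>"!splitFile!" echo(%%a
    set /a lineCounter+=1
)
if defined splitFile echo Created !splitFile!.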

As shown in the code above, it splits the original CSV file into multiple CSV files of at most 20,000 lines each. All you have to do is change the file and limit variables accordingly. Hope it helps.
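For readers who can run Python (the asker cannot, but others may), the same chunking logic can be sketched as follows. It is an illustration under that assumption, not part of Dale's answer: split_by_lines is a hypothetical helper that starts a new part every limit lines and names the parts <name>-part<N><ext> like the batch script. Like the batch script, it does not repeat the CSV header row in each part.

```python
from pathlib import Path


def split_by_lines(path, limit):
    """Write consecutive chunks of at most `limit` lines to <name>-part<N><ext>.

    N starts at 1, mirroring the batch script's naming. The file is read in
    binary mode so any encoding and CRLF line endings pass through unchanged.
    Returns the list of part paths in order.
    """
    if limit < 1:
        raise ValueError("limit must be a positive number of lines")
    src = Path(path)
    parts = []
    out = None
    try:
        with src.open("rb") as f:
            for i, line in enumerate(f):
                if i % limit == 0:
                    # Start a new part every `limit` lines
                    if out is not None:
                        out.close()
                    part = src.with_name(f"{src.stem}-part{len(parts) + 1}{src.suffix}")
                    out = part.open("wb")
                    parts.append(part)
                out.write(line)
    finally:
        # Close the last part even if reading fails part-way through
        if out is not None:
            out.close()
    return parts
```

For example, split_by_lines("export.csv", 20000) would write export-part1.csv, export-part2.csv, and so on next to the input file. Binary mode is a deliberate choice here: it sidesteps encoding questions and leaves CRLF line endings untouched.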


Answer by Gonki

A free Windows app that does this:


http://www.addictivetips.com/windows-tips/csv-splitter-for-windows/


Answer by hhh

Use the Cygwin split command. Examples:


To split a file every 500 lines:


split -l 500 [filename.ext]

By default, the output files are named xaa, xab, xac, ... (the default prefix x followed by a two-letter suffix); the input file name is not used.


To get numbered output files that keep the correct extension, use the following:


split -l 1000 sourcefilename.ext destinationfilename -d --additional-suffix=.ext

The position of -d or -l on the command line does not matter.


  • "-d" is the same as --numeric-suffixes
  • "-l" is the same as --lines

For more: split --help
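To make the two naming schemes concrete, the Python sketch below lists the names split produces for a given prefix: two-letter alphabetic suffixes by default, two-digit numbers with -d, and an optional trailing suffix as with --additional-suffix. split_output_names is a hypothetical helper, not part of any tool. Newer GNU split versions switch to longer suffixes automatically when many files are needed; the sketch does not model that and instead refuses counts above a conservative cap.

```python
from itertools import product
from string import ascii_lowercase, digits


def split_output_names(prefix, n_chunks, suffix="", numeric=False):
    """List the first n_chunks output names in the style of split.

    Alphabetic two-letter suffixes (aa, ab, ..., az, ba, ...) by default;
    two-digit numbers (00, 01, ...) when numeric=True, as with -d.
    `suffix` is appended last, as with --additional-suffix.
    """
    alphabet = digits if numeric else ascii_lowercase
    # Conservative cap: never let the first character reach the last symbol
    # (9 or z), where newer GNU split versions may switch to longer suffixes.
    cap = (len(alphabet) - 1) * len(alphabet)
    if not 0 <= n_chunks <= cap:
        raise ValueError(f"this sketch only models up to {cap} two-character names")
    # product() yields the two-character codes in lexicographic order
    codes = ["".join(pair) for pair in product(alphabet, repeat=2)][:n_chunks]
    return [f"{prefix}{code}{suffix}" for code in codes]
```

For instance, split_output_names("destinationfilename", 3, ".ext", numeric=True) returns destinationfilename00.ext, destinationfilename01.ext and destinationfilename02.ext, matching the second command above.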


Answer by sancho.s ReinstateMonicaCellio

If you are splitting very large files, the solution I found is an adaptation of an existing Stack Overflow answer (linked in the @REM lines of the script below), with PowerShell "embedded" in a batch file. It is fast compared with many other things I tried (I have not tried the other options posted here).


Use mysplit.bat (listed below) as follows:


mysplit.bat <mysize> 'myfile'


Note: The script was intended to use the first argument as the split size, but the size is currently hardcoded at 100 MB. It should not be difficult to fix this.
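One way to honour the intended size argument is to accept values like 100MB and convert them to bytes. The Python sketch below does this with binary multipliers (1 KB = 1024 bytes), the same values PowerShell uses for its KB/MB/GB numeric suffixes; parse_size is a hypothetical helper, not part of the script. (Inside mysplit.bat itself, replacing the hardcoded 100MB with %1 would presumably let PowerShell parse such a value directly, though that is untested.)

```python
import re

# Binary multipliers, the same values PowerShell uses for its KB/MB/GB suffixes
UNITS = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '100MB', '512kb' or '4096' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]B)?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    number, unit = match.groups()
    # A missing unit means plain bytes
    return int(number) * UNITS[(unit or "").upper()]
```

With this, parse_size("100MB") gives 104857600, the same number of bytes as the hardcoded 100MB.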


Note 2: The filename should be enclosed in single quotes, because %2 is substituted verbatim into the PowerShell code, where it has to form a string literal. Other ways of quoting apparently do not work.


Note 3: It splits the file at a given number of bytes, not at a given number of lines. For me this was good enough. A few lines of code could probably be added to extend each chunk read up to the next CR/LF. That would split on whole lines (though not a constant number of them), with no real sacrifice in processing time.
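Note 3's suggestion can be sketched in Python as follows (an illustration, not the author's code; split_by_bytes is a hypothetical name). Each block of upper_bound bytes is read as in the PowerShell script, and if it does not already end with a line feed, the rest of the current line is read and appended. CRLF endings also finish with a line feed, so both styles are handled. Output names follow the script's <file>.0, <file>.1, ... pattern, and a part can exceed upper_bound by up to one line's length.

```python
from pathlib import Path


def split_by_bytes(path, upper_bound):
    """Split a file into chunks of about `upper_bound` bytes, ending on line breaks.

    Parts are named <file>.0, <file>.1, ... like the PowerShell script's output.
    Returns the list of part paths in order.
    """
    if upper_bound < 1:
        raise ValueError("upper_bound must be at least 1 byte")
    src = Path(path)
    parts = []
    with src.open("rb") as f:
        while True:
            chunk = f.read(upper_bound)
            if not chunk:
                break  # end of file
            if not chunk.endswith(b"\n"):
                # Finish the partially read line so no line is cut in two
                chunk += f.readline()
            part = src.with_name(f"{src.name}.{len(parts)}")
            part.write_bytes(chunk)
            parts.append(part)
    return parts
```

The check for a trailing line feed matters: without it, a block that happens to end exactly on a line break would swallow the whole next line as well.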


Script mysplit.bat:


@REM Using https://stackoverflow.com/questions/19335004/how-to-run-a-powershell-script-from-a-batch-file
@REM and https://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell
@PowerShell  ^
    $upperBound = 100MB;  ^
    $rootName = %2;  ^
    $from = $rootName;  ^
    $fromFile = [io.file]::OpenRead($from);  ^
    $buff = new-object byte[] $upperBound;  ^
    $count = $idx = 0;  ^
    try {  ^
        do {  ^
            'Reading ' + $upperBound;  ^
            $count = $fromFile.Read($buff, 0, $buff.Length);  ^
            if ($count -gt 0) {  ^
                $to = '{0}.{1}' -f ($rootName, $idx);  ^
                $toFile = [io.file]::Create($to);  ^
                try {  ^
                    'Writing ' + $count + ' to ' + $to;  ^
                    $tofile.Write($buff, 0, $count);  ^
                } finally {  ^
                    $tofile.Close();  ^
                }  ^
            }  ^
            $idx ++;  ^
        } while ($count -gt 0);  ^
    }  ^
    finally {  ^
        $fromFile.Close();  ^
    }  ^
%End PowerShell%

Answer by foxidrive

This will give you lines 1 to 20000 in newfile1.csv
and lines 20001 to the end in newfile2.csv.


It also overcomes the 8K-characters-per-line limit that pure batch solutions have.


This uses a helper batch file called findrepl.bat, available from https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat


Place findrepl.bat in the same folder as the batch file or on the path.


It's more robust than a plain batch file, and quicker too.


findrepl /o:1:20000 <file.csv >newfile1.csv
findrepl /o:20001   <file.csv >newfile2.csv
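For readers without findrepl.bat, the same two outputs (lines 1 to 20000, then 20001 to the end) can be produced by a short Python sketch. copy_line_range is a hypothetical helper that only reproduces the line ranges described above, not findrepl's other features or its exact option semantics.

```python
from itertools import islice


def copy_line_range(src, dst, first, last=None):
    """Copy lines first..last of src (1-based, inclusive) into dst.

    last=None copies through to the end of the file. The file is handled in
    binary mode, so line endings and encoding are preserved as-is.
    """
    if first < 1 or (last is not None and last < first):
        raise ValueError("invalid line range")
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # islice takes 0-based, end-exclusive positions: lines first..last
        # are positions first-1 .. last-1, so the stop value is simply last
        fout.writelines(islice(fin, first - 1, last))
```

Usage mirroring the two commands: copy_line_range("file.csv", "newfile1.csv", 1, 20000) followed by copy_line_range("file.csv", "newfile2.csv", 20001). As with the two findrepl calls, the input is read once per output file.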

Answer by SuperMykEl

I found this question while looking for a similar solution. I modified the answer that @Dale gave to suit my purposes: I wanted something a little more flexible, with some error checking. I thought I might post it here for anyone looking for the same thing.


@echo off
setLocal EnableDelayedExpansion

:checkvars
    IF "%~1"=="" GOTO syntaxerror
    IF NOT "%~1"=="-f" GOTO syntaxerror
    IF "%~2"=="" GOTO syntaxerror
    IF NOT EXIST "%~2" GOTO nofile
    IF "%~3"=="" GOTO syntaxerror
    IF NOT "%~3"=="-n" GOTO syntaxerror
    IF "%~4"=="" GOTO syntaxerror
    set "param=%~4"
    rem Accept only whole numbers of 1 or more
    echo %param%| findstr /xr "[1-9][0-9]*" >nul && (
        goto proceed
    ) || (
        echo %param% is NOT a valid number
        goto syntaxerror
    )

:proceed
    set limit=%param%
    set "file=%~2"
    rem Start above the limit so that the first line opens the first part file
    set /a lineCounter=limit+1
    set filenameCounter=0

    set name=
    set extension=

    for %%a in ("%file%") do (
        set "name=%%~na"
        set "extension=%%~xa"
    )

    rem Progress is shown once per part; clearing the screen for every line is very slow
    for /f "usebackq delims=" %%a in ("%file%") do (
        if !lineCounter! gtr !limit! (
            set "splitFile=!name!_part!filenameCounter!!extension!"
            set /a filenameCounter+=1
            set lineCounter=1
            echo Created !splitFile!.
        )
        >>"!splitFile!" echo(%%a
        set /a lineCounter+=1
    )
    echo Done.
    goto end
:syntaxerror
    echo Syntax: %0 -f Filename -n "Number Of Rows Per File"
    goto end
:nofile
    echo %~2 does not exist
    goto end
:end
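The same -f/-n contract and its checks can be expressed in Python with argparse, which generates the usage and error messages that the IF/GOTO chain builds by hand. This is a sketch for comparison only: parse_args, positive_int and existing_file are hypothetical names, the splitting step itself is left out, and unlike the batch script, argparse does not care about the order of -f and -n.

```python
import argparse
import os
import re


def positive_int(text):
    """argparse type: whole numbers of 1 or more, like the findstr check."""
    if not re.fullmatch(r"[1-9][0-9]*", text):
        raise argparse.ArgumentTypeError(f"{text} is NOT a valid number")
    return int(text)


def existing_file(text):
    """argparse type: the file to split must exist."""
    if not os.path.isfile(text):
        raise argparse.ArgumentTypeError(f"{text} does not exist")
    return text


def parse_args(argv=None):
    """Parse '-f Filename -n RowsPerFile'; argparse prints usage and exits on errors."""
    parser = argparse.ArgumentParser(description="Split a file into parts of N rows.")
    parser.add_argument("-f", dest="filename", required=True, type=existing_file,
                        help="file to split")
    parser.add_argument("-n", dest="rows", required=True, type=positive_int,
                        help="number of rows per file")
    return parser.parse_args(argv)
```

On invalid input, argparse prints the message from the type function and exits with status 2, playing the role of the :syntaxerror and :nofile labels.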