Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/20721120/

Time: 2020-08-07 01:41:33  Source: igfitidea

How to split CSV files as per number of rows specified?

Tags: linux, unix, csv, split

Asked by Pawan Mude

I have a CSV file (around 10,000 rows, each with 300 columns) stored on a Linux server. I want to break this CSV file into 500 CSV files of 20 records each, each having the same header as the original CSV.

Is there a Linux command that can help with this conversion?
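The 500-file figure is simply the number of data rows divided by the rows per file, rounded up. A minimal sketch for checking that in advance, assuming a single header line and one CSV record per physical line (quoted fields that contain newlines would throw off this and any other line-based count); expected_parts is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: number of output files a header-preserving split will create.
# expected_parts is a hypothetical helper, not a standard command.
# Assumes one header line and one CSV record per physical line.
expected_parts() {
    local file=$1 rows_per_file=$2 data_rows
    # Data rows = total lines minus the header line.
    data_rows=$(( $(wc -l < "$file") - 1 ))
    # Ceiling division: a final, partially filled chunk still gets its own file.
    echo $(( (data_rows + rows_per_file - 1) / rows_per_file ))
}
```

For a file with one header line and 10,000 data rows, expected_parts data.csv 20 prints 500.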

Accepted answer by Martin Dinov

I made it into a function. You can now call splitCsv <Filename> [chunkSize]; chunkSize defaults to 1000 lines.

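splitCsv() {
    # Usage: splitCsv <Filename> [chunkSize]; chunkSize defaults to 1000 data lines
    HEADER=$(head -n 1 "$1")
    if [ -n "$2" ]; then
        CHUNK=$2
    else
        CHUNK=1000
    fi
    # Split everything after the header into <Filename>_split_aa, <Filename>_split_ab, ...
    tail -n +2 "$1" | split -l "$CHUNK" - "${1}_split_"
    # Prepend the header to each chunk (printf, unlike echo -e, leaves backslashes in the data alone)
    for i in "${1}_split_"*; do
        { printf '%s\n' "$HEADER"; cat "$i"; } > "$i.tmp" && mv "$i.tmp" "$i"
    done
}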

Found on: http://edmondscommerce.github.io/linux/linux-split-file-eg-csv-and-keep-header-row.html
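For comparison, the same result can be produced in a single pass with awk instead of split plus a rewrite loop. This is a swapped-in technique, not part of the answer above; split_csv_awk and the PREFIX1.csv, PREFIX2.csv naming are hypothetical choices, and the sketch assumes one CSV record per line:

```shell
#!/bin/bash
# Sketch (not from the answer above): header-preserving split in one awk pass.
# split_csv_awk is a hypothetical name. Usage: split_csv_awk FILE ROWS PREFIX
# Output: PREFIX1.csv, PREFIX2.csv, ... each starting with FILE's header line.
split_csv_awk() {
    awk -v rows="$2" -v prefix="$3" '
        NR == 1 { header = $0; next }       # remember the header, do not copy it as data
        (NR - 2) % rows == 0 {              # first data row of a new chunk
            if (out != "") close(out)       # close the previous chunk to free its descriptor
            out = prefix (++n) ".csv"
            print header > out              # first write creates/truncates the file
        }
        { print > out }                     # later writes in this run append
    ' "$1"
}
```

For the question's file, split_csv_awk data.csv 20 Part would write Part1.csv through Part500.csv.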

Answer by James King

Use the Linux split command:

split -l 20 file.txt new    

This splits the file "file.txt" into files whose names begin with "new", each containing 20 lines of text.

Type man split at the Unix prompt for more information. However, you will first have to remove the header from file.txt (using the tail command, for example) and then add it back to each of the split files.
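With GNU coreutils split, the remove-the-header-and-add-it-back step can be folded into one pipeline using the --filter option, which pipes each chunk to a shell command and sets $FILE to that chunk's output name. A minimal sketch under that assumption (BSD/macOS split has no --filter); split_keep_header is a hypothetical name:

```shell
#!/bin/bash
# Sketch: split FILE into chunks of ROWS data lines named PREFIXaa, PREFIXab, ...,
# repeating the header line at the top of every chunk.
# Assumes GNU coreutils split (--filter); split_keep_header is a hypothetical name.
split_keep_header() {
    local file=$1 rows=$2 prefix=$3 header
    header=$(head -n 1 "$file")
    # tail drops the header; split hands each chunk to the filter on stdin,
    # and the filter writes the header plus the chunk to "$FILE".
    # HDR is passed through the environment so the filter shell can read it.
    tail -n +2 "$file" |
        HDR=$header split -l "$rows" \
            --filter='{ printf "%s\n" "$HDR"; cat; } > "$FILE"' - "$prefix"
}
```

Usage: split_keep_header file.txt 20 new creates newaa, newab, ..., each starting with the header of file.txt followed by up to 20 data lines.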

Answer by Mark Setchell

This should do it for you; all your files will end up named Part1 to Part500.

#!/bin/bash
FILENAME=10000.csv
HDR=$(head -n 1 "$FILENAME")                  # Pick up CSV header line to apply to each file
tail -n +2 "$FILENAME" | split -l 20 - xyz    # Split the data rows (header excluded) into chunks of 20 lines each
n=1
for f in xyz*                                 # Go through all newly created chunks
do
   echo "$HDR" > "Part${n}"    # Write out header to new file called "Part(n)"
   cat "$f" >> "Part${n}"      # Add in the 20 lines from the "split" command
   rm "$f"                     # Remove temporary file
   ((n++))                     # Increment name of output part
done

Answer by Coral

This should work!

file_name.csv = the file you want to split
10000 = number of lines each split file will contain
file_part_ = prefix of the split file names (-d gives numeric suffixes: file_part_00, file_part_01, file_part_02, and so on)

split -d -l 10000 file_name.csv file_part_
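If the pieces should keep a .csv extension and sort cleanly past 100 files, GNU split can also widen the numeric suffix with -a and append a fixed extension with --additional-suffix. A small demo assuming GNU coreutils; file_name.csv and its 45 sample rows are made up for illustration, and like the command above it copies the header only into the first piece:

```shell
#!/bin/bash
# Demo: numeric, zero-padded, .csv-suffixed pieces with GNU split.
# Assumes GNU coreutils; file_name.csv and its rows are made-up sample data.
cd "$(mktemp -d)" || exit 1
{ echo "id,name"; seq 1 45 | sed 's/$/,x/'; } > file_name.csv   # 46 lines in total
# -d: numeric suffixes, -a 3: three suffix digits, -l 20: 20 lines per piece,
# --additional-suffix=.csv: append ".csv" to every output name
split -d -a 3 -l 20 --additional-suffix=.csv file_name.csv file_part_
ls file_part_*
```

Here ls lists file_part_000.csv, file_part_001.csv and file_part_002.csv, the last one holding the remaining 6 lines.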

Answer by Tim Richardson

Here is a one-liner using GNU parallel (it gives you 999 lines of data plus one header row per file):

cat bigFile.csv | parallel --header : --pipe -N999 'cat >file_{#}.csv'

https://stackoverflow.com/a/53062251/401226
