Linux: how to split a CSV file by a specified number of rows?

Question

Asked by Pawan Mude

I have a CSV file (around 10,000 rows, each with 300 columns) stored on a Linux server. I want to break it into 500 CSV files of 20 records each, and each file should have the same CSV header as the original.

Is there a Linux command that can help with this conversion?

Answer 1

Accepted answer by Martin Dinov

Made it into a function. You can now call splitCsv <Filename> [chunkSize]

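splitCsv() {
    # Usage: splitCsv <Filename> [chunkSize]
    HEADER=$(head -n 1 "$1")            # first line of the file = CSV header
    if [ -n "$2" ]; then
        CHUNK=$2                        # rows per output file, if given
    else
        CHUNK=1000                      # default rows per output file
    fi
    # Drop the header, then split the data rows into <Filename>_split_aa, _ab, ...
    tail -n +2 "$1" | split -l "$CHUNK" - "${1}_split_"
    for i in "${1}"_split_*; do
        # Prepend the header; printf avoids echo -e mangling backslashes in the data
        { printf '%s\n' "$HEADER"; cat "$i"; } > "$i.tmp" && mv "$i.tmp" "$i"
    done
}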

Found on: http://edmondscommerce.github.io/linux/linux-split-file-eg-csv-and-keep-header-row.html

Answer 2

Answer by James King

Use the Linux split command:

split -l 20 file.txt new

This splits the file "file.txt" into files whose names begin with "new", each containing 20 lines of text.

Type man split at the Unix prompt for more information. However, you will first have to remove the header from file.txt (using the tail command, for example) and then add it back to each of the split files.
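
A minimal sketch of the remove-then-re-add steps described above, assuming the first line of file.txt is the header. The sample file.txt (a header plus 45 data rows) is hypothetical and generated only so the snippet runs on its own.

```shell
#!/bin/bash
# Hypothetical sample input: a header line plus 45 data rows
{ echo "id,value"; seq 1 45 | awk '{ print $1 "," $1 * 2 }'; } > file.txt

HDR=$(head -n 1 file.txt)                  # save the header line
tail -n +2 file.txt | split -l 20 - new    # split only the data rows, 20 per file

for f in new*; do                          # newaa, newab, newac
  # re-attach the header to each chunk via a temporary file
  { printf '%s\n' "$HDR"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done
```

With 45 data rows this leaves newaa and newab with the header plus 20 rows each, and newac with the header plus the last 5 rows.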

Answer 3

Answer by Mark Setchell

This should do it for you - all your files will end up called Part1-Part500.

#!/bin/bash
FILENAME=10000.csv
HDR=$(head -n 1 "$FILENAME")           # Pick up CSV header line to apply to each file
# Split only the data rows (header skipped) into chunks of 20 lines each,
# otherwise the first chunk would contain the header twice
tail -n +2 "$FILENAME" | split -l 20 - xyz
n=1
for f in xyz*                          # Go through all newly created chunks
do
   printf '%s\n' "$HDR" > "Part${n}"   # Write out header to new file called "Part(n)"
   cat "$f" >> "Part${n}"              # Add in the 20 lines from the "split" command
   rm "$f"                             # Remove temporary file
   ((n++))                             # Increment name of output part
done

Answer 4

Answer by Coral

This should work!

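file_name.csv = name of the file you want to split
10000 = number of rows each split file will contain
file_part_ = prefix of the split file names (file_part_00, file_part_01, file_part_02, and so on)
-d = use numeric suffixes instead of the default alphabetic ones
Note that this does not repeat the header in each part.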

split -d -l 10000 file_name.csv file_part_
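
If GNU coreutils split is available, its --filter option can re-attach the header while splitting, so no second loop is needed. This is a sketch under that assumption (BSD/macOS split has no --filter): split runs the filter command once per chunk, feeds the chunk on stdin and sets $FILE to the output name, and HDR is exported so the filter's shell can see it. The sample file_name.csv is hypothetical, and -l 20 matches the 20 rows per file from the question.

```shell
#!/bin/bash
# Hypothetical sample input: a header line plus 50 data rows
{ echo "id,value"; seq 1 50 | awk '{ print $1 "," $1 * 3 }'; } > file_name.csv

HDR=$(head -n 1 file_name.csv)   # header line
export HDR                       # make it visible to the --filter shell
# -d: numeric suffixes; --filter: each chunk goes through this shell command,
# with the generated file name in $FILE (GNU split only)
tail -n +2 file_name.csv |
  split -d -l 20 --filter='{ printf "%s\n" "$HDR"; cat; } > "$FILE.csv"' - file_part_
```

For 50 data rows this should produce file_part_00.csv and file_part_01.csv with 20 rows each and file_part_02.csv with 10, each starting with the header.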

Answer 5

Answer by Tim Richardson

I have a one-liner answer using GNU parallel (this gives you 999 lines of data and one header row per file):

cat bigFile.csv | parallel --header : --pipe -N999 'cat >file_{#}.csv'

https://stackoverflow.com/a/53062251/401226
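
Where GNU parallel is not installed, a similar result can be sketched with plain awk. This is a swapped-in technique, not part of the answer above. It assumes the header is exactly the first line and that no quoted field contains an embedded newline; the part_N.csv names and the sample bigFile.csv contents are hypothetical. Set n=999 to match the one-liner's chunk size, or n=20 for the question.

```shell
#!/bin/bash
# Hypothetical sample input: a header line plus 45 data rows
{ echo "id,value"; seq 1 45 | awk '{ print $1 "," $1 + 100 }'; } > bigFile.csv

# Keep line 1 as the header; start a new part_N.csv every n data rows
awk -v n=20 '
  NR == 1 { hdr = $0; next }          # remember the header, do not copy it as data
  (NR - 2) % n == 0 {                 # first data row of a new chunk
    if (out) close(out)               # close the previous part file
    out = sprintf("part_%d.csv", ++k)
    print hdr > out                   # each part starts with the header
  }
  { print > out }                     # append the current data row
' bigFile.csv
```

Closing each finished part keeps the number of open file descriptors at one, which matters with some awk implementations when producing hundreds of files.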
