bash: How to split a file into equal parts without breaking individual lines?

Question

Asked by Abdel

I was wondering if it was possible to split a file into equal parts (edit:= all equal except for the last), without breaking the line? Using the split command in Unix, lines may be broken in half. Is there a way to, say, split up a file in 5 equal parts, but have it still only consist of whole lines (it's no problem if one of the files is a little larger or smaller)? I know I could just calculate the number of lines, but I have to do this for a lot of files in a bash script. Many thanks!

Answer 1

Answered by paxdiablo

If you mean an equal number of lines, split has an option for this:

split --lines=75

If you need to know what that 75 should really be for N equal parts, it's:

lines_per_part = int((total_lines + N - 1) / N)

where total lines can be obtained with wc -l.
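
To illustrate why the formula adds N - 1 before dividing, here is a minimal sketch with made-up numbers (a 70-line file and 6 parts); the variable names are only illustrative. Plain integer division rounds down, which leaves leftover lines for an extra file, while rounding up keeps the file count at N or below.

```shell
#!/usr/bin/env bash
# Illustrative numbers only: a 70-line file split into 6 parts.
total_lines=70
parts=6

# Plain integer division rounds down, so the leftover lines need an extra file.
floor_size=$(( total_lines / parts ))
floor_files=$(( (total_lines + floor_size - 1) / floor_size ))

# Adding parts - 1 before dividing rounds up instead.
ceil_size=$(( (total_lines + parts - 1) / parts ))
ceil_files=$(( (total_lines + ceil_size - 1) / ceil_size ))

echo "round down: ${floor_size} lines per file -> ${floor_files} files"
echo "round up:   ${ceil_size} lines per file -> ${ceil_files} files"
```

Rounding up guarantees at most N files; for very small inputs it can produce fewer (10 lines into 6 parts gives 2 lines per file and therefore 5 files).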

See the following script for an example:

#!/usr/bin/bash

# Configuration stuff

fspec=qq.c
num_files=6

# Work out lines per file.

total_lines=$(wc -l < "${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file, maintaining lines.

split --lines="${lines_per_file}" "${fspec}" xyzzy.

# Debug information

echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"    
wc -l xyzzy.*

This outputs:

Total lines     = 70
Lines  per file = 12
  12 xyzzy.aa
  12 xyzzy.ab
  12 xyzzy.ac
  12 xyzzy.ad
  12 xyzzy.ae
  10 xyzzy.af
  70 total

More recent versions of split allow you to specify a number of CHUNKS with the -n/--number option. You can therefore use something like:

split --number=l/6 ${fspec} xyzzy.

(that's ell-slash-six, meaning lines, not one-slash-six).

That will give you roughly equal files in terms of size, with no mid-line splits.

I mention that last point because it doesn't give you roughly the same number of lines in each file, but rather roughly the same number of characters.

So, if you have one 20-character line and 19 1-character lines (twenty lines in total) and split to five files, you most likely won't get four lines in every file.
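
The difference is easy to try out. The following is a small sketch, assuming GNU split from coreutils 8.8 or later; the file names sample.txt, bytes. and lines. are made up for the demonstration. It builds the twenty-line file described above and splits it both ways so the line and byte counts can be compared.

```shell
#!/usr/bin/env bash
# Assumes GNU split (coreutils 8.8 or later); all file names are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One 20-character line followed by nineteen 1-character lines (20 lines total).
{
  printf '%020d\n' 0
  yes x | head -n 19
} > sample.txt

split -n l/5 sample.txt bytes.   # five pieces of roughly equal size, whole lines only
split -l 4 sample.txt lines.     # four lines per piece, whatever their size

# Compare line counts, then byte counts, of the two sets of pieces.
wc -l bytes.* lines.*
wc -c bytes.* lines.*
```

With the chunk-based split, the long line alone is already larger than a fifth of the file, so the first piece would typically hold just that one line while later pieces hold more than four.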

Answer 2

Answered by jbr

The script isn't even necessary; split(1) supports the wanted feature out of the box:

split -l 75 auth.log auth.log.

The above command splits the file into chunks of 75 lines apiece and writes output files of the form auth.log.aa, auth.log.ab, ...

Running wc -l on the original file and the output gives:

  321 auth.log
   75 auth.log.aa
   75 auth.log.ab
   75 auth.log.ac
   75 auth.log.ad
   21 auth.log.ae
  642 total

Answer 3

Answered by user3769065

split was updated in coreutils release 8.8 (announced 22 Dec 2010) with the --number option to generate a specific number of files. The option --number=l/n generates n files without splitting lines.

http://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html#split-invocation
http://savannah.gnu.org/forum/forum.php?forum_id=6662

Answer 4

Answered by Kuf

A simple solution for a simple question:

split -n l/5 your_file.txt

No need for scripting here.

From the man page, CHUNKS may be:

l/N     split into N files without splitting lines

Update

Not all Unix distributions include this flag. For example, it does not work with the split shipped in OS X. To use it there, you can consider replacing the Mac OS X utilities with the GNU core utilities.
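
For scripts that have to run on both kinds of systems, one possible approach is to try the chunk syntax and fall back to a computed line count. This is only a sketch: split_whole_lines is a hypothetical helper name, and it assumes that a split without l/N support rejects the option before writing any output.

```shell
#!/usr/bin/env bash
# split_whole_lines FILE PARTS PREFIX
# Hypothetical helper, not a standard tool.
split_whole_lines() {
  local file=$1 parts=$2 prefix=$3 total per_part

  # Try the GNU chunk syntax first. The assumption is that implementations
  # without it fail while parsing options, before creating any output files.
  if split -n "l/${parts}" "$file" "$prefix" 2>/dev/null; then
    return 0
  fi

  # Fallback: compute lines per part, rounding up, and split by line count.
  total=$(wc -l < "$file")
  per_part=$(( (total + parts - 1) / parts ))
  (( per_part > 0 )) || per_part=1   # an empty input would otherwise give 0
  split -l "$per_part" "$file" "$prefix"
}
```

It could be called as split_whole_lines your_file.txt 5 part. Note that the two branches balance differently: the first aims for similar sizes in bytes, the fallback for similar line counts.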

Answer 5

Answered by Jose Ricardo Bustos M.

I made a bash script that, given a number of parts as input, splits a file:

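#!/bin/sh
# Usage: sh mysplitXparts.sh input_file number_of_parts

input="$1";
parts_total="$2";

parts=$((parts_total))
for i in $(seq 0 $((parts_total-2))); do
  lines=$(wc -l < "$input")
  # n is lines/parts rounded up: 1.3 to 2, 1.6 to 2, 1 to 1
  n=$(awk -v lines="$lines" -v parts="$parts" 'BEGIN {
    n = lines/parts;
    # "+ 0" forces a numeric comparison below; sprintf returns a string
    rounded = sprintf("%.0f", n) + 0;
    if(n>rounded){
      print rounded + 1;
    }else{
      print rounded;
    }
  }');
  # The first n lines become this part; the rest is carried to the next round
  head -n "$n" "$input" > "split${i}"
  tail -n "$((lines-n))" "$input" > ".tmp${i}"
  input=".tmp${i}"
  parts=$((parts-1));
done
# Whatever remains after the loop is the last part
mv ".tmp$((parts_total-2))" "split$((parts_total-1))"
rm -f .tmp*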

I used the head and tail commands, storing intermediate results in temporary files, to split the file:

#10 means 10 parts
sh mysplitXparts.sh input_file 10

Or with awk, where perc=0.1 means 10% of the lines per part (10 parts) and perc=0.334 gives 3 parts:

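awk -v size="$(wc -l < input)" -v perc=0.1 '{
  # NR - 1 makes the line index zero-based, so the first part is not one line short
  nfile = int((NR - 1)/(size*perc));
  # Guard against rounding pushing the last lines into an extra file
  if(nfile >= 1/perc){
    nfile--;
  }
  # Parentheses keep the output file name unambiguous across awk implementations
  print > ("split_" nfile)
}' input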

Answer 6

Answered by Prabu

var dict = File.ReadLines("test.txt")
               .Where(line => !string.IsNullOrWhiteSpace(line))
               .Select(line => line.Split(new char[] { '=' }, 2, 0))
               .ToDictionary(parts => parts[0], parts => parts[1]);


or 

line="[email protected][email protected]";
string[] tokens = line.Split(new char[] { '=' }, 2, 0);

ans:
tokens[0]=to
token[1][email protected][email protected]"
