Linux: split one file into multiple files based on a delimiter

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/11313852/

Time: 2020-08-06 07:13:32  Source: igfitidea

Split one file into multiple files based on delimiter

linux unix awk split

Asked by user1499178

I have one file with -| as a delimiter after each section. I need to create a separate file for each section using Unix tools.

Example of input file:

wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Expected result in file 1:

wertretr
ewretrtret
1212132323
000232
-|

Expected result in file 2:

ereteertetet
232434234
erewesdfsfsfs
0234342343
-|

Expected result in file 3:

jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Answered by twalberg

Debian has csplit, but I don't know if that's common to all/most/other distributions. If not, though, it shouldn't be too hard to track down the source and compile it...

Answered by mbonnin

cat file| ( I=0; echo -n "">file0; while read line; do echo $line >> file$I; if [ "$line" == '-|' ]; then I=$[I+1]; echo -n "" > file$I; fi; done )

And the formatted version:

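#!/bin/bash
# Copy each line of FILE into file0, file1, ...;
# start a new output file after every "-|" line.
cat FILE | (
  I=0
  echo -n "" > file0
  while IFS= read -r line
  do
    echo "$line" >> "file$I"
    if [ "$line" == '-|' ]
    then
      I=$((I+1))
      echo -n "" > "file$I"
    fi
  done
)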

Answered by rkyser

You can also use awk. I'm not very familiar with awk, but the following did seem to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Note that the last partn.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it could be done with a little tweaking. Any suggestions, anyone?

awk_pattern file:

BEGIN { fn = "part1.txt"; n = 1 }
{
   print > fn
   # a line starting with "-|" closes the current part and names the next one
   if (substr($0,1,2) == "-|") {
       close(fn)
       n++
       fn = "part" n ".txt"
   }
}

bash command:

awk -f awk_pattern input.file
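
On the open question about the empty last file: one possible approach, sketched here in Python rather than awk, is to start a new part only when a line actually arrives for it. This is a minimal sketch under stated assumptions, not the answer author's method. The function names group_into_parts and write_parts are hypothetical; the part{}.txt naming follows the answer above.

```python
def group_into_parts(lines, delimiter="-|"):
    """Group lines into parts; a part only exists once a line lands in it."""
    parts = []
    start_new = True
    for line in lines:
        if start_new:
            # Created lazily, so a trailing delimiter never leaves an empty part.
            parts.append([])
            start_new = False
        parts[-1].append(line)
        # Mirrors the awk test substr($0,1,2) == "-|".
        if line.startswith(delimiter):
            start_new = True
    return parts


def write_parts(input_path, template="part{}.txt"):
    """Write part N of input_path to template.format(N); return the part count."""
    with open(input_path) as source:
        parts = group_into_parts(source)
    for number, part in enumerate(parts, start=1):
        with open(template.format(number), "w") as out:
            out.writelines(part)
    return len(parts)
```

Calling write_parts("input.file") on the sample input from the question would be expected to write part1.txt through part3.txt and nothing else, since no line follows the final delimiter.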

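Answered by amaksr

Here is some Perl code that does this:

#!/usr/bin/perl
# Copy file.txt into res.0.txt, res.1.txt, ...;
# switch to the next output file after each line starting with "-|".
open(FI,"file.txt") or die "Input file not found";
$cur=0;
open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
while(<FI>)
{
    print FO $_;
    if(/^-\|/)
    {
        close(FO);
        $cur++;
        open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
    }
}
close(FO);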

Answered by William Pursell

awk '{print $0 " -|"> "file" NR}' RS='-\\|' input-file

Explanation (edited):

RS is the record separator, and this solution uses a GNU awk extension that allows it to be more than one character. NR is the record number.

The print statement prints a record followed by " -|" into a file that contains the record number in its name.
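
For readers less familiar with awk, the record-separator idea can be sketched in Python. This is an illustration only, not a drop-in equivalent of the one-liner. The function names split_records and output_for are hypothetical, and dropping a final whitespace-only remainder is a choice made in this sketch, not a statement about what awk does with it.

```python
def split_records(text, separator="-|"):
    """Split text on a multi-character separator, in the spirit of awk's RS."""
    records = text.split(separator)
    # Assumption of this sketch: ignore a final remainder that is only whitespace.
    if records and records[-1].strip() == "":
        records.pop()
    return records


def output_for(record, number, prefix="file"):
    """Return (filename, contents): the record followed by " -|" and a newline,
    in a file whose name carries the record number (like "file" NR)."""
    return f"{prefix}{number}", record + " -|\n"


def write_records(input_path, prefix="file"):
    """Write every record of input_path to its numbered file."""
    with open(input_path) as source:
        records = split_records(source.read())
    for number, record in enumerate(records, start=1):
        name, contents = output_for(record, number, prefix)
        with open(name, "w") as out:
            out.write(contents)
    return len(records)
```

Because the separator is consumed by the split, each output re-appends it, which is why the print statement in the one-liner adds " -|" after the record.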

Answered by ctrl-alt-delor

A one-liner, no programming (except the regexp, etc.):

csplit --digits=2  --quiet --prefix=outfile infile "/-|/+1" "{*}"

Answered by user1277476

This is the sort of problem I wrote context-split for: http://stromberg.dnsalias.org/~strombrg/context-split.html

$ ./context-split -h
usage:
./context-split [-s separator] [-n name] [-z length]
        -s specifies what regex should separate output files
        -n specifies how output files are named (default: numeric
        -z specifies how long numbered filenames (if any) should be
        -i include line containing separator in output files
        operations are always performed on stdin

Answered by John David Smith

I solved a slightly different problem, where the file contains a line with the name of the file that the following text should go into. This Perl code does the trick for me:

#!/path/to/perl -w

#comment the line below for UNIX systems
use Win32::Clipboard;

# Get command line flags

#print ($#ARGV, "\n");
if($#ARGV == 0) {
    print STDERR "usage: ncsplit.pl --mff -- filename.txt [...] \n\nNote that no space is allowed between the '--' and the related parameter.\n\nThe mff is found on a line followed by a filename.  All of the contents of filename.txt are written to that file until another mff is found.\n";
    exit;
}

# this package sets the ARGV count variable to -1;

use Getopt::Long;
my $mff = "";
# --mff takes a string value; GetOptions needs a reference to store it
GetOptions('mff=s' => \$mff);

# set a default $mff variable
if ($mff eq "") {$mff = "-#-"};
print ("using file switch=", $mff, "\n\n");

while($_ = shift @ARGV) {
    if(-f "$_") {
        push @filelist, $_;
    }
}

# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.

$readfile = $filelist[0];

open SOURCEFILE, "<$readfile" or die "File not found...\n\n";
#print SOURCEFILE;

while (<SOURCEFILE>) {
    # a marker line has the form "<mff> <output filename>"
    /^$mff (.*$)/o;
    $outname = $1;
#   print $outname;
#   print "right is:  \n";

    if (/^$mff /) {
        open OUTFILE, ">$outname" ;
        print "opened $outname\n";
    }
    else {print OUTFILE "$_"};
}

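Answered by Thanh

The following command works for me. Hope it helps.

# Lines starting with -| switch to the next output file and are not copied.
awk 'BEGIN{file = 0; filename = "output_" file ".txt"}
    /^-\|/ {file++; filename = "output_" file ".txt"; next}
    {print $0 > filename}' input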

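Answered by ctrlc-root

Here's a Python 3 script that splits a file into multiple files based on a filename provided by the delimiters. Example input file:

# Ignored

######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END

# Ignored

######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END

Here's the script:

#!/usr/bin/env python3

import os
import argparse

# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'

# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")

args = parser.parse_args()

# read the input file
with open(args.input_file, 'r') as input_file:
    input_data = input_file.read()

# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
    # discard lines until the next start delimiter
    while input_lines and not input_lines[0].startswith(start_delimiter):
        input_lines.pop(0)

    # corner case: no delimiter found and no more lines left
    if not input_lines:
        break

    # extract the output filename from the start delimiter
    output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
    output_path = os.path.join(args.output_dir, output_filename)

    # open the output file
    print("extracting file: {0}".format(output_path))
    with open(output_path, 'w') as output_file:
        # while we have lines left and they don't match the end delimiter
        while input_lines and not input_lines[0].startswith(end_delimiter):
            output_file.write("{0}\n".format(input_lines.pop(0)))

        # remove end delimiter if present
        if input_lines:
            input_lines.pop(0)

Finally, here's how you run it:

$ python3 script.py -i input-file.txt -o ./output-folder/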