bash: Using regex to tell csplit where to split the file

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18364411/

Time: 2020-09-18 06:20:47  Source: igfitidea

Using regex to tell csplit where to split the file

regex bash bsd csplit

Asked by Philip Meissner

I have a large text file with content set up like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
---
title: Excelvier whatever 
---
Lorim ipsum content goes here.

I'm trying to split up this file into individual files using csplit.

The individual files would have content formatted like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content

I was hoping to match the ---, the newline, and title with a regex like ---\ntitle

But I'm not able to select it with…

csplit -k products.txt '/---[^\n]title/' {99}

I've tried lots of variations to no avail. I keep getting "no match".

Answered by inthenite

You could use a regular expression that matches until the end of the line ($)

What do you think about:

csplit -k products.txt '/^title:/' {99}

Answered by John Kugelman

csplit reads the input file one line at a time and applies the regex to each line. It is therefore not possible to match a regex across multiple lines.

One way around this is to massage the input file first, replacing ---\ntitle: with a single-line pattern that csplit can match. For example, using sed:

sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -k - '/===/' {*}
sed 'N;s/===\n/---\ntitle: /' -i xx*

This replaces ---\ntitle: with a single line ===, then has csplit split when it sees that pattern. Passing - as a file name tells csplit to read from stdin. The second sed command reverses the change.
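
A worked run on a small sample may make the steps easier to follow. This is only a sketch: it assumes GNU sed (for -i and for \n in the replacement) and GNU csplit (for {*}), and the two-record products.txt is made up for illustration. Note that N consumes lines two at a time, so the substitution only fires when the --- line is the first line of a pair. That holds for this sample, but not for records with an odd number of lines.

```shell
#!/bin/sh
# Work in a scratch directory so the xx* pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative two-record input in the same shape as the question's file.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# Step 1: turn each "---\ntitle: " pair into a "===" marker line,
# then let csplit cut at every marker (-s only silences the byte counts).
sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -s -k - '/===/' '{*}'

# Step 2: restore the original header inside every piece.
sed -i 'N;s/===\n/---\ntitle: /' xx*

# xx00 comes out empty because the very first line is already a marker.
for piece in xx*; do
  echo "== $piece"
  cat "$piece"
done
```

Adding -z to the csplit call should suppress the empty first piece, at the cost of shifting the numbering of the remaining files.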

Answered by Aleks-Daniel Jakimenko-A.

Try using {*} instead of {99} to fix the "match not found" problem.
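
The difference between the two repeat counts can be checked on a small sample. The sketch below assumes GNU csplit ({*} is a GNU extension; POSIX only specifies the {num} form) and uses an illustrative two-record products.txt. With {99}, csplit insists on 99 further matches and exits with an error once the pattern runs out; with {*}, it repeats until the input is exhausted.

```shell
#!/bin/sh
# Work in a scratch directory so the xx* pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative two-record input in the same shape as the question's file.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# {99}: the pattern must match 99 more times; it cannot, so csplit fails.
if csplit -s -k products.txt '/^title:/' '{99}' 2>/dev/null; then
  status_fixed=0
else
  status_fixed=$?
fi

rm -f xx*

# {*}: repeat as often as the pattern matches, then stop cleanly.
if csplit -s -k products.txt '/^title:/' '{*}'; then
  status_star=0
else
  status_star=$?
fi

echo "exit status with {99}: $status_fixed"
echo "exit status with {*}:  $status_star"
```

Quoting '{*}' is a precaution so the shell never treats it as a glob pattern.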

Answered by potong

This might work for you:

csplit -z products.txt '/^title/-1' '{*}'
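
How this one-liner behaves can be sketched on a small sample, assuming GNU csplit (for -z and {*}) and an illustrative two-record products.txt. The -1 offset moves each split point to the line before the one matching ^title, which is the opening ---, and -z drops the empty piece that would otherwise precede the first record.

```shell
#!/bin/sh
# Work in a scratch directory so the xx* pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative two-record input in the same shape as the question's file.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# Split one line before every line that starts with "title".
# -s only silences the byte counts csplit normally prints.
csplit -s -z products.txt '/^title/-1' '{*}'

# Print each piece under a header so the record boundaries are visible.
for piece in xx*; do
  echo "== $piece"
  cat "$piece"
done

# Count the pieces that were produced.
set -- xx*
piece_count=$#
echo "pieces: $piece_count"
```

Each piece should begin with the --- line that opens its record, so no post-processing step is needed.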

Answered by Luke Davis

For me, the answer was: don't use csplit, use awk.

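awk '
# Start a new output file whenever a title line is seen.
/^title:/ {++count; file="file"count".txt"; print file}
# Write the previous line once an output file exists.
file {print line > file}
# Remember the current line for the next iteration.
{line=$0}
# Flush the last remembered line, which would otherwise be dropped.
END {if (file) print line > file}
' products.txt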

The first command declares a new file when title: is encountered. The second command writes the preceding line to file if file has been declared. The third command assigns the current line to a variable.
