bash: Using a regex to tell csplit where to split a file

Question

Asked by Philip Meissner

I have a large text file with content set up like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
---
title: Excelvier whatever 
---
Lorim ipsum content goes here.

I'm trying to split up this file into individual files using csplit.

The individual files would have content formatted like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content

I was hoping to match the ---, the newline, and title with a regex like ---\ntitle

But I'm not able to select it with…

csplit -k products.txt '/---[^\n]title/' {99}

I've tried lots of variations to no avail. I keep getting "no match".

Answer 1

Answered by inthenite

You could use a regular expression that matches until the end of the line ($)

What do you think about:

csplit -k products.txt '/^title:/' {99}

Answer 2

Answered by John Kugelman

csplit reads the input file one line at a time and applies the regex to each line. It is therefore not possible to match a regex across multiple lines.

One way around this is to massage the input file first, replacing ---\ntitle: with a single-line pattern that csplit can match. For example, using sed:

sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -k - '/===/' {*}
sed 'N;s/===\n/---\ntitle: /' -i xx*

This replaces ---\ntitle: with a single line ===, then has csplit split when it sees that pattern. Passing - as a file name tells csplit to read from stdin. The second sed command reverses the change.
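
To see the round trip on the sample from the question, here is a minimal sketch, assuming GNU sed (for `\n` in the replacement and for `-i`) and GNU csplit. The scratch directory and the `-s` and `-z` flags are additions for the demo, not part of the answer. Note that `N` joins lines in pairs, so the substitution only fires when `---` lands on an odd-numbered line, as it does with the four-line records here.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so the xx* pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input copied from the question.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# Step 1: collapse "---\ntitle: " into a "===" marker line, then split on it.
# -s silences the byte counts; -z drops the empty piece before the first marker.
sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -s -z - '/===/' '{*}'

# Step 2: turn the marker back into the original two-line header in each piece.
sed -i 'N;s/===\n/---\ntitle: /' xx*

# Show the restored header of every piece.
head -n 2 xx*
```

If a record could start on an even-numbered line, the pairing breaks and a different approach is safer.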

Answer 3

Answered by Aleks-Daniel Jakimenko-A.

Try using {*} instead of {99} to fix the "match not found" problem.
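
The difference shows up in the exit status. A minimal sketch, assuming GNU csplit and the sample input from the question; the scratch directory, the `-s` flag, and the variable names are illustrative only. `{99}` asks csplit to repeat the pattern 99 more times, so it fails once the matches run out, whereas `{*}` repeats as often as the input allows.

```shell
#!/usr/bin/env bash
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the question (only two title lines).
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# {99}: csplit runs out of matches and exits non-zero.
# -k keeps the pieces written before the error; stderr is hidden for the demo.
fixed_status=0
csplit -s -k products.txt '/^title:/' '{99}' 2>/dev/null || fixed_status=$?
rm -f xx*

# {*}: repeat the pattern as many times as possible, then stop cleanly.
star_status=0
csplit -s -k products.txt '/^title:/' '{*}' || star_status=$?

echo "{99} exit status: $fixed_status"
echo "{*} exit status: $star_status"
ls xx*
```

Splitting on `/^title:/` alone still leaves each record's opening `---` at the end of the previous piece.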

Answer 4

Answered by potong

This might work for you:

csplit -z products.txt '/^title/-1' '{*}'
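
The `-1` offset moves each split point one line above the matching `title` line, so the `---` that opens a record stays with it, and `-z` discards the empty piece that would otherwise precede the first record. A minimal sketch on the question's sample, assuming GNU csplit; the scratch directory and `-s` are additions for the demo.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Sample input copied from the question.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

# Split one line before every line that starts with "title".
# -s silences the byte counts; -z removes zero-length output files.
csplit -s -z products.txt '/^title/-1' '{*}'

# xx00 holds the first record (input lines 1-4), xx01 the second.
cat xx00
```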

Answer 5

Answered by Luke Davis

For me, the answer was: don't use csplit, use awk.

awk '
/^title:/ {++count; file="file"count".txt"; print file}
file {print line > file}
{line=$0}
' products.txt

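The first command declares a new file when title: is encountered. The second command writes the preceding line to file if file has been declared. The third command assigns the current line to a variable. Because each line is written one step late, the final line of the input is stored but never written unless an END rule flushes it.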
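
A minimal sketch of the same one-line-delay idea, run on the question's sample, with an `END` rule so the last buffered line is not lost. The `END` rule, the `close()` call, and the scratch directory are additions made here, not part of the answer.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Sample input copied from the question.
printf '%s\n' \
  '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
  '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' \
  > products.txt

awk '
# A title line starts a new output file; close the previous one first.
/^title:/ { if (file) close(file); ++count; file = "file" count ".txt" }
# Write the previous line; on a title line that is the "---" above it.
file      { print line > file }
# Remember the current line for the next iteration.
          { line = $0 }
# Flush the final buffered line.
END       { if (file) print line > file }
' products.txt

cat file1.txt
```

Delaying each write by one line is what lets the `---` above a title travel with the record it opens.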
