bash: Using regex to tell csplit where to split the file
Source: http://stackoverflow.com/questions/18364411/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Philip Meissner
I have a large text file with content set up like this:
---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
---
title: Excelvier whatever
---
Lorim ipsum content goes here.
I'm trying to split this file into individual files using csplit.
The individual files would have content formatted like this:
---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
I was hoping to match the ---, the newline, and title with a regex like ---\ntitle
But I'm not able to select it with…
csplit -k products.txt '/---[^\n]title/' {99}
I've tried lots of variations to no avail. I keep getting "no match".
Answered by inthenite
You could use a regular expression that matches until the end of the line ($).
What about:
csplit -k products.txt '/^title:/' {99}
Answered by John Kugelman
csplit reads the input file one line at a time and applies the regex to each line. It is therefore not possible to match a regex across multiple lines.
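A small sketch of that behaviour, assuming GNU csplit. The file name sample.txt and the scratch directory are made up for illustration. The pattern from the question needs --- and title on the same line, so no single line can satisfy it, while a pattern that fits within one line is found:

```shell
# Work in a scratch directory so the xx* output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input in the same layout as the question.
printf '%s\n' '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
              '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' > sample.txt

# The pattern from the question wants "---" and "title" on one line; no line has both.
if csplit -s sample.txt '/---[^\n]title/' 2>/dev/null; then
  multi_line=matched
else
  multi_line=no-match
fi

# A pattern that fits inside a single line is found.
if csplit -s sample.txt '/^title:/' 2>/dev/null; then
  single_line=matched
else
  single_line=no-match
fi

echo "pattern spanning lines: $multi_line"
echo "pattern within a line:  $single_line"
```

The exit status is used here instead of the error text, since the exact wording of the message can differ between csplit versions.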
One way around this is to massage the input file first, replacing ---\ntitle: with a single-line pattern that csplit can match. For example, using sed:
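# Turn each "---" line that is followed by "title: " into a marker line "===" (GNU sed; $!N...P;D slides a two-line window).
sed '$!N;s/^---\ntitle: /===\n/;P;D' products.txt | csplit -k - '/^===$/' '{*}'
# Restore the original two-line header at the top of each output file.
sed -i '1{N;s/^===\n/---\ntitle: /}' xx*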
This replaces each ---\ntitle: pair with a single === line, then has csplit split wherever it sees that line. Passing - as the file name tells csplit to read from stdin. The second sed command reverses the change in the resulting xx* files.
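Answered by Aleks-Daniel Jakimenko-A.
Try using {*} instead of {99} to fix the "match not found" problem. {99} asks csplit to repeat the pattern 99 more times and fails if the input has fewer matches, while {*} repeats the pattern as many times as the input allows.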
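The difference can be checked with a short sketch, assuming GNU csplit. The file sample.txt and the fixed_ and star_ prefixes are hypothetical names chosen for this illustration:

```shell
# Work in a scratch directory so the output files are easy to inspect and discard.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample with only two "title:" lines.
printf '%s\n' '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
              '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' > sample.txt

# {99} demands 99 further matches; the input runs out long before that, so csplit fails.
csplit -s -k -f fixed_ sample.txt '/^title:/' '{99}' 2>/dev/null
fixed_status=$?

# {*} repeats for as long as matches are found, so running out of matches is not an error.
csplit -s -k -f star_ sample.txt '/^title:/' '{*}'
star_status=$?

echo "exit status with {99}: $fixed_status"
echo "exit status with {*}:  $star_status"
ls
```

With -k the pieces written before the failure are kept, which is why the {99} form can appear to work even though it reports an error.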
Answered by potong
This might work for you:
csplit -z products.txt '/^title/-1' '{*}'
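To see what the -1 offset does, here is a minimal sketch on a made-up sample.txt, assuming GNU csplit. The offset moves each split point one line above the matching title line, so the --- that opens a record stays with that record:

```shell
# Work in a scratch directory so the xx* output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input in the same layout as the question.
printf '%s\n' '---' 'title: Lorim Ipsum Dolar' '---' 'Lorim ipsum content' \
              '---' 'title: Excelvier whatever' '---' 'Lorim ipsum content goes here.' > sample.txt

# Split one line before each line that starts with "title".
# -z drops the empty piece that would otherwise precede the first record,
# -s silences the byte counts csplit normally prints.
csplit -s -z sample.txt '/^title/-1' '{*}'

# Show each piece with a header so the boundaries are visible.
for piece in xx*; do
  echo "== $piece =="
  cat "$piece"
done
```

Each xx file should then begin with the --- line that opens its record, which is the layout the question asks for.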
Answered by Luke Davis
For me, the answer was: don't use csplit, use awk.
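awk '
# A title line starts a new output file; print its name as progress output.
/^title:/ {if (file) close(file); ++count; file = "file" count ".txt"; print file}
# Once a file is set, write the previous line to it, so the "---" above each title lands in the new file.
file {print line > file}
# Remember the current line for the next iteration.
{line = $0}
# Write the final line, which no later iteration would output.
END {if (file) print line > file}
' products.txt
The first command declares a new file when title: is encountered, closing the previous one. The second command writes the preceding line to file if file has been declared. The third command assigns the current line to a variable. The END block writes the last line of the input, which would otherwise be dropped because every line is only written one step late.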