Shell/Bash parsing text file

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22546395/

Time: 2020-09-18 09:58:44  Source: igfitidea

Shell/Bash parsing text file

Tags: bash, shell, parsing, awk, text-processing

Asked by FanaticD

I have a text file that looks like this:

Item:
SubItem01
SubItem02
SubItem03
Item2:
SubItem0201
SubItem0202
Item3:
SubItem0301
...etc...

What I need is to get it to look like this:

Item=>SubItem01
Item=>SubItem02
Item=>SubItem03
Item2=>SubItem0201
Item2=>SubItem0202
Item3=>SubItem0301

I am aware that I need two for loops to get this. I did some tests, but... well, they didn't end well.

for(( c=1; c<=lineCount; c++ ))
do

   var=`sed -n "${c}p" TMPFILE`
   echo "$var"

   if [[ "$var" == *:* ]];
   then
   printf "%s->" $var
   else
   printf "%s\n"
   fi
done

Could anyone please put me back on track? I tried a bunch of different ways, but I am not getting anywhere. Thanks.

Answered by Digital Trauma

If you want to continue down the bash shell road, you can do something like this:

item_re="^(Item.*):$"
while read -r; do
    if [[ $REPLY =~ $item_re ]]; then
        item=${BASH_REMATCH[1]}
    else
        printf "%s=>%s\n" "$item" "$REPLY"
    fi
done < file.txt
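
The same idea can be sketched in a form that also runs under a plain POSIX sh, by swapping the bash-only regex match for a case pattern. This is an illustration under stated assumptions, not part of the original answer: the function name regroup and the variable line are made up here, and, unlike the regex above, any line ending in a colon is treated as a header. It reads standard input, so it can be tried without creating a file.

```shell
# Hypothetical helper: the names "regroup" and "line" are not from the answer.
# Assumption: every line that ends in ":" is a header; all others are sub-items.
regroup() {
    item=
    # IFS= keeps surrounding whitespace; the "|| [ -n ... ]" part also handles
    # a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *:) item=${line%:} ;;                       # header: drop the colon
            *)  printf '%s=>%s\n' "$item" "$line" ;;    # sub-item: prefix it
        esac
    done
}

# Feed a few sample lines on standard input.
printf 'Item:\nSubItem01\nSubItem02\nItem2:\nSubItem0201\n' | regroup
```

To process a file instead, redirect it into the function, for example regroup < file.txt.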

Answered by jaypal singh

Text parsing is best done with awk:

$ awk '/:$/{sub(/:$/,"");h=$0;next}{print h"=>"$0}' file
Item=>SubItem01
Item=>SubItem02
Item=>SubItem03
Item2=>SubItem0201
Item2=>SubItem0202
Item3=>SubItem0301

Answered by BMW

Using awk:

awk '/:/{s=$1;next}{print s OFS $1}' FS=: OFS="=>" file

Answered by Cole Tierney

Here's another awk alternative:

awk -F: '/^Item/{ITM=$1} !/^Item/{print ITM"=>"$1}'

If a line begins with 'Item', save the item name in ITM. If the line does not begin with 'Item', print the previously saved item name (ITM), '=>', and the sub-item. Splitting on ':' makes it easier to get the item name.

The assumption is that groups of subitems will always be preceded by an Item: entry, so the variable ITM should always have the name of the current group.
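
If that assumption might not hold for some input, the same awk logic can be extended to flag sub-items that appear before any Item: entry instead of printing them with an empty prefix. The following is only a sketch: the function name regroup_checked and the variables seen and bad are hypothetical, and reporting on standard error with a non-zero exit status is a design choice made here, not something from the answer.

```shell
# Hypothetical wrapper: "regroup_checked", "seen" and "bad" are illustrative names.
regroup_checked() {
    awk -F: '
        # Header line: remember its name and note that a group has started.
        /^Item/ { ITM = $1; seen = 1; next }
        # Sub-item with no preceding header: report it on stderr and skip it.
        !seen   { print "line " NR ": sub-item before any Item: entry" | "cat 1>&2"
                  bad = 1; next }
        # Normal sub-item: prefix it with the saved group name.
                { print ITM "=>" $1 }
        # Exit with status 1 if any orphan sub-item was seen, else 0.
        END     { exit bad + 0 }
    ' "$@"
}

# Well-formed sample input on standard input.
printf 'Item:\nSubItem01\nItem2:\nSubItem0201\n' | regroup_checked
```

With no arguments the function reads standard input; a file name can be passed instead, as in regroup_checked file.txt.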

Answered by Kaz

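TXR solution:

@(collect)
@left:
@  (collect)
@right
@  (until)
@(skip):
@  (end)
@(end)
@(output)
@  (repeat)
@    (repeat)
@left=>@right
@    (end)
@  (end)
@(end)

$ txr regroup.txr data.txt
Item=>SubItem01
Item=>SubItem02
Item=>SubItem03
Item2=>SubItem0201
Item2=>SubItem0202
Item3=>SubItem0301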