Replace a column in a file with a column from a different file while retaining the format
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/20552378/
Asked by rohit
I am stuck on an issue that might not seem too difficult to advanced shell users. Here is the problem.
I have 2 files:
File1 with a format like this:
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 0.00 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 0.00 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 0.00 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 0.00 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 0.00 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 0.00 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 0.00 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 0.00 H
-
-
File2 with a single column:
-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44
I want to replace the 11th column in File1 with the only column in File2. I do the following:
paste File1 File2 | awk '{$11=$13;$13=""}1' > output
Although it replaces the column just fine, it messes up the original format of File1, which I would like to retain. As you can see, there are different numbers of spaces between the fields of File1, and I would like to retain that even after replacing $11.
I have tried several approaches, including column and printf, but none seem to be working. Maybe I am doing something wrong.
Does anyone know how I can achieve the desired result, preferably with awk or sed?
Thanks!
Rohit
Accepted answer by glenn jackman
If you need to retain fixed-width columns, you could work with substrings:
cat file1
echo
awk '
    # First file (file2): remember each replacement value by line number.
    NR==FNR {v[FNR]=$1; next}
    # Second file (file1): keep columns 1-62, write the new value left-justified
    # in a 15-character slot, then keep everything from column 78 onward.
    {print substr($0,1,62) sprintf("%-15s", v[FNR]) substr($0,78)}
' file2 file1

Output:
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 0.00 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 0.00 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 0.00 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 0.00 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 0.00 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 0.00 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 0.00 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 0.00 H

ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H
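The offsets 62 and 78 in the substr() calls are specific to this file's layout. As a rough sketch (not part of the original answer), the following reports the first and last character column of a given field, which may help when picking such offsets for a different file. The sample line and the names n, pos, and rest are made up for illustration, and the sketch assumes the line has at least n fields.

```shell
# Made-up line with uneven spacing; we want the columns occupied by field 2.
line='AAA  0.00  ZZ'

cols=$(printf '%s\n' "$line" | awk -v n=2 '
{
    pos = 1                            # column where the unscanned part starts
    rest = $0
    for (i = 1; i <= n; i++) {
        match(rest, /[^ \t]+/)         # next run of non-blank characters
        start = pos + RSTART - 1       # 1-based column where field i starts
        end = start + RLENGTH - 1      # column where field i ends
        pos = end + 1
        rest = substr($0, pos)
    }
    print start, end
}')

printf '%s\n' "$cols"   # prints: 6 9
```

With those two numbers, everything before the start column and everything after the slot you want to overwrite can be kept with substr(), as in the answer above.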
Answer by Jotne
Using awk:
awk 'FNR==NR{a[NR]=$0;next} {sub($11, a[FNR])}1' file2 file1
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H
Edit: reverted to the original because of the error with sub.
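A general caveat with this approach, shown as a small sketch on made-up data: sub() treats its first argument as a regular expression and replaces the leftmost match anywhere in the record. Passing a field's value as the pattern can therefore change a different field that happens to contain the same text. This is only an illustration and is not necessarily the error mentioned above.

```shell
# Made-up line where field 1 and field 3 hold the same text.
line='1.00  2.00  1.00'

# Intent: replace field 3. The pattern "1.00" first matches at field 1, though.
out=$(printf '%s\n' "$line" | awk '{ sub($3, "9.99") } 1')

printf '%s\n' "$out"   # prints: 9.99  2.00  1.00
```

The spacing is preserved because no field is assigned, but the wrong occurrence is replaced. Note also that the "." in the field value acts as a regex wildcard.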
Answer by Ed Morton
When you assign a value to a field in awk, it recompiles the current record using the current value of OFS to separate the fields. To retain the original spacing, then, you cannot assign a new value to a field. Instead, you have to use a regular expression that describes how many fields (runs of non-space characters followed by spaces) to skip before your replacement. For example, using GNU awk, this replaces the letter "c" (the 3rd field, hence the number "2" below for the number of leading fields to skip) with the word "BOB":
$ echo "a b c d e" |
gawk '{print gensub(/(([^[:space:]]+[[:space:]]+){2})[^[:space:]]+/,"\\1BOB","")}'
a b BOB d e
This preserves spacing because you are working on the whole record, not just one field, and so awk won't recompile the record.
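The difference can be seen on a made-up line with uneven spacing. This is a minimal sketch using plain awk; sub() is used here instead of gensub() only to keep it portable, and the variable names are illustrative.

```shell
# Sample line with uneven spacing between fields (made-up data).
line='a   b  c    d e'

# Assigning to a field makes awk rebuild $0 with OFS (a single space by default).
rebuilt=$(printf '%s\n' "$line" | awk '{ $3 = "BOB" } 1')

# Editing $0 as a whole string leaves the original spacing untouched.
# sub() without a target operates on $0 and does not rebuild the record.
kept=$(printf '%s\n' "$line" | awk '{ sub(/c/, "BOB") } 1')

printf '%s\n' "$rebuilt"   # prints: a b BOB d e
printf '%s\n' "$kept"      # prints: a   b  BOB    d e
```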
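So for your case it'd be:
$ gawk 'NR==FNR{map[FNR]=$0; next} {print gensub(/(([^[:space:]]+[[:space:]]+){10})[^[:space:]]+/,"\\1" map[FNR],"")}' file2 file1
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H
If you don't have gawk (for gensub()), you can use match() to find where the field you care about starts, a second match() for where it ends, and judicious substr() calls to replace it with the new value.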
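The match()/substr() idea could be sketched roughly as follows in POSIX awk. This is one possible arrangement, not necessarily what the answer had in mind: the helper name replace_field, the variable n, and the sample files are assumptions made up for this example.

```shell
# Made-up sample inputs in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'a   b  c    d e' 'f  g   h  i   j' > "$tmp/file1"
printf '%s\n' 'BOB' 'X' > "$tmp/file2"

result=$(awk -v n=3 '
# Return rec with its nth blank-separated field replaced by val,
# leaving all other characters (including spacing) as they were.
function replace_field(rec, n, val,    i, head, rest) {
    head = ""
    rest = rec
    for (i = 1; i < n; i++) {
        # Move optional blanks plus one field from rest to head.
        if (!match(rest, /^[ \t]*[^ \t]+/))
            return rec                 # fewer than n fields: leave unchanged
        head = head substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
    }
    # Locate the nth field itself inside what remains.
    if (!match(rest, /[^ \t]+/))
        return rec
    return head substr(rest, 1, RSTART - 1) val substr(rest, RSTART + RLENGTH)
}
NR == FNR { v[FNR] = $1; next }        # first file: replacement values
{ print replace_field($0, n, v[FNR]) } # second file: patch field n
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If the new value is longer or shorter than the old one, the text after it shifts by the difference, so padding the value with sprintf() may still be needed for strictly fixed-width data.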
@glennjackman mentioned fixed-width fields in his solution. If that's what you have, you can use GNU awk's FIELDWIDTHS variable to specify the width of each field and just work with that. See the gawk manual for details.
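As a hedged illustration of the FIELDWIDTHS idea, here is a small sketch on a made-up three-column line. The widths "5 6 2" are assumptions that fit only this sample, and the example needs gawk since FIELDWIDTHS is a GNU extension. Setting OFS to the empty string means the rebuilt record is just the fixed-width pieces joined back together.

```shell
# Made-up fixed-width line: 5 chars, then 6 chars, then 2 chars.
line='AAA  0.00  ZZ'

if command -v gawk >/dev/null 2>&1; then
    out=$(printf '%s\n' "$line" | gawk '
        BEGIN { FIELDWIDTHS = "5 6 2"; OFS = "" }
        # Pad the new value to the same width as the field it replaces.
        { $2 = sprintf("%-6s", "-0.14") }
        1')
    printf '%s\n' "$out"   # prints: AAA  -0.14 ZZ
else
    echo "gawk not found; FIELDWIDTHS is a gawk extension" >&2
fi
```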
Answer by potong
This might work for you (GNU sed):
sed = file2 | sed -r '$!N;s|(.*)\n(.*)|\1s/\\S+/\2/11|' | sed -rf - file1
Answer by anubhava
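Pure awk solution:
awk 'FNR==NR {a[NR]=$0;next} {$11=a[FNR]}1' OFS="\t" a t
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H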