Replace a column in file with a column from a different file while retaining the format

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20552378/

Time: 2020-09-18 08:58:29  Source: igfitidea

bash sed awk

Asked by rohit

I am stuck on an issue that might not seem too difficult to advanced shell users. Here is the problem.

I have 2 files:

File1 with a format like this:

ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  0.00           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  0.00           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  0.00           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  0.00           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  0.00           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  0.00           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  0.00           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  0.00           H
-
-

File2 with a single column:

-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44

I want to replace the 11th column in File1 with the only column in File2. I do the following:

paste File1 File2 | awk '{$11=$13;$13=""}1' > output

Although it replaces the column just fine, it messes up the original format of File1, which I would like to retain. As you can see, there are different numbers of spaces between the fields of File1, and I would like to retain that even after replacing $11.

I have tried several approaches, including column and printf, but none seem to be working. Maybe I am doing something wrong.

Does anyone know how I can achieve the desired result, preferably with awk or sed?

Thanks!

Rohit
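
The spacing is lost because assigning to any field makes awk rebuild the record, joining the fields with OFS (a single space by default). A minimal sketch of that behaviour, using a made-up four-field line rather than File1 (the variable names are only illustrative):

```shell
# Hypothetical sample line with uneven spacing (not taken from File1).
line='a   b c    d'

# Assigning to $2 forces awk to rebuild $0 using OFS (default: one space).
rebuilt=$(printf '%s\n' "$line" | awk '{ $2 = "X" } 1')

# Without a field assignment, the record is printed untouched.
untouched=$(printf '%s\n' "$line" | awk '1')

printf '%s\n' "$rebuilt"     # a X c d
printf '%s\n' "$untouched"   # a   b c    d
```

Any approach that keeps the original layout therefore has to edit the record as a string instead of assigning to $11.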

Accepted answer by glenn jackman

If you need to retain fixed-width columns, you could work with substrings:

cat file1
echo
awk '
    NR==FNR {v[FNR]=$1; next}
    {print substr($0,1,62) sprintf("%-15s", v[FNR]) substr($0,78)}
' file2 file1

ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  0.00           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  0.00           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  0.00           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  0.00           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  0.00           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  0.00           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  0.00           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  0.00           H

ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  -0.14          A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  -0.47          B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  -0.58          C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  -0.69          D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  -0.25          E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  -0.69          F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  -0.12          G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  -0.44          H

Answer by Jotne

Using awk:

awk 'FNR==NR{a[NR]=$0;next} {sub($11, a[FNR])}1' file2 file1
ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  -0.14           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  -0.47           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  -0.58           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  -0.69           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  -0.25           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  -0.69           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  -0.12           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  -0.44           H

Edit reverted to original due to the error with sub

Answer by Ed Morton

When you assign a value to a field in awk, it recompiles the current record using the current value of OFS to separate fields. To retain the original spacing, then, you cannot assign a new value to a field. Instead you have to use an RE to describe how many non-space/space runs to skip before your replacement. For example, to replace the letter "c" (the 3rd field, hence the number "2" below for the number of leading fields to skip) with the word "BOB" using GNU awk:

$ echo "a   b c    d e" |
gawk '{print gensub(/(([^[:space:]]+[[:space:]]+){2})[^[:space:]]+/,"\\1BOB","")}'
a   b BOB    d e

This preserves spacing because you are working on the whole record, not just one field, and so awk won't recompile the record.

So for your case it'd be:

$ cat file1
ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  0.00           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  0.00           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  0.00           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  0.00           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  0.00           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  0.00           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  0.00           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  0.00           H
$
$ cat file2
-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44
$
$ gawk 'NR==FNR{map[FNR]=$0; next} {print gensub(/(([^[:space:]]+[[:space:]]+){10})[^[:space:]]+/,"\\1" map[FNR],"")}' file2 file1
ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  -0.14           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  -0.47           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  -0.58           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  -0.69           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  -0.25           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  -0.69           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  -0.12           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  -0.44           H

If you don't have gawk (for gensub()), you can use match() to find where the field you care about starts, a second match() for where it ends, and judicious substr()s to replace it with the new value.

@GlennJackman mentioned fixed-width fields in his solution. If that's what you have, you can use GNU awk's FIELDWIDTHS variable to specify the width of each field and just work with that. See the gawk manual for details.

Answer by potong

This might work for you (GNU sed):

sed = file2 | sed -r '$!N;s|(.*)\n(.*)|\1s/\\S+/\2/11|' | sed -rf - file1

Answer by anubhava

Pure awk solution:

awk 'FNR==NR {a[NR]=$0;next} {$11=a[FNR]}1' OFS="\t" a t
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H
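
Ed Morton's answer above notes that, without gawk, the same effect can be had with match() and substr(). One possible way to do that is sketched below. It is not code from any of the answers: the function name replace_field and the sample data are made up for illustration, and the sketch assumes blank- or tab-separated fields and a values file with one entry per data line.

```shell
# replace_field is a hypothetical helper name; it uses only POSIX awk features.
# $1 = field number, $2 = file with one new value per line, $3 = file to edit
replace_field() {
    awk -v n="$1" '
        NR == FNR { v[FNR] = $0; next }          # first file: store the new values
        {
            head = ""; rest = $0
            for (i = 1; i < n; i++) {            # step over fields 1..n-1
                if (!match(rest, /^[ \t]*[^ \t]+/)) break
                head = head substr(rest, 1, RLENGTH)
                rest = substr(rest, RLENGTH + 1)
            }
            if (match(rest, /^[ \t]+/)) {        # blanks in front of field n
                head = head substr(rest, 1, RLENGTH)
                rest = substr(rest, RLENGTH + 1)
            }
            if (i == n && match(rest, /^[^ \t]+/)) {   # field n itself
                print head v[FNR] substr(rest, RLENGTH + 1)
            } else {
                print                            # fewer than n fields: leave the line alone
            }
        }
    ' "$2" "$3"
}

# Made-up sample data, not the files from the question.
tmp=$(mktemp -d)
printf '%s\n' 'a   b c    d e' 'p  q   r s    t' > "$tmp/data"
printf '%s\n' 'BOB' 'X' > "$tmp/vals"
out=$(replace_field 3 "$tmp/vals" "$tmp/data")
printf '%s\n' "$out"
rm -rf "$tmp"
```

For the files in the question the call would be replace_field 11 file2 file1. Like the gensub() version, this keeps the separators exactly as they were, so a replacement that is longer than the old value shifts the following columns to the right. If the columns must stay at fixed positions, the substr() approach from the accepted answer is the better fit.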