bash: using the paste command in a loop

Question

Asked by Greg

I am using Fedora, and bash to do some text manipulation with the files I have. I am trying to combine a large number of files, each one with two columns of data. From these files, I want to extract the data on the 2nd column of the files, and put it in a single file. Previously, I used the following script:

paste 0_0.dat 0_6.dat 0_12.dat | awk '{print $1, $2, $4, $6}' >0.dat

But this becomes painfully hard as the number of files gets larger -- I am trying to do this with 100 files. So I looked through the web to see if there's a simple way to achieve this, but came up empty-handed.

I'd like to invoke a 'for' loop, if possible -- for example,

for i in $(seq 0 6 600)
do
  paste 0_0.dat | awk '{print $2}'>>0.dat
done

but this does not work, of course, with the paste command.

Please let me know if you have any recommendations on how to do what I'm trying to do ...

DATA FILE #1 looks like this (delimited by a space):

-180 0.00025432
-179 0.000309643
-178 0.000189226
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000
.
.
.
178 0.0023454268
179 0.002352534
180 0.001504992

DATA FILE #2

-180 0.0002352
-179 0.000423452
-178 0.00019304
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000
.
.
.
178 0.0023454268
179 0.002352534
180 0.001504992

The first column goes from -180 to 180 in increments of 1.

DESIRED output (n is the number of data columns, which equals the number of files):

-180 0.00025432 0.00025123 0.000235123 0.00023452 0.00023415 ... n
-179 0.000223432 0.0420504 0.2143450 0.002345123 0.00125235 ... n
.
.
.
-1 2E-5
0 1.4E-6
1 0.00000    
.
.
.
179 0.002352534 ... n
180 0.001504992 ... n

Thanks,

Answer 1

Accepted answer by Jonathan Leffler

How about this:

paste "$@" | awk '{ printf("%s", $1);
    for (i = 2; i <= NF; i += 2)    # even-numbered columns; <= so the last file is kept
        printf(" %s", $i);
    printf("\n");
}'

This assumes that you don't run into a limit with paste (check how many open files it can have). The "$@" notation means 'all the arguments given, exactly as given'. The awk script simply prints $1 from each line of pasted output, followed by the even-numbered columns, followed by a newline. It doesn't validate that the odd-numbered columns all match; it would perhaps be sensible to do so, and you could code a vaguely similar loop to do so in awk. It also doesn't check that the number of fields on this line is the same as the number on the previous line; that's another reasonable check. But this does do the whole job in one pass over all the files, for an essentially arbitrary list of files.
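
The two checks mentioned above could be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name merge_checked and the sample files are made up, and it assumes every input file has exactly two whitespace-separated columns.

```shell
#!/usr/bin/env bash
# merge_checked is a hypothetical helper name, not part of the answer above.
merge_checked() {
    paste "$@" | awk '
    NR == 1 { nf = NF }                 # width of the first pasted line
    NF != nf {                          # a short or long line means ragged input
        printf("line %d: %d fields, expected %d\n", NR, NF, nf) > "/dev/stderr"
        exit 1
    }
    {
        for (i = 3; i < NF; i += 2)     # odd-numbered columns repeat the key
            if ($i != $1) {
                printf("line %d: key %s does not match %s\n", NR, $i, $1) > "/dev/stderr"
                exit 1
            }
        printf("%s", $1)
        for (i = 2; i <= NF; i += 2)    # even-numbered columns hold the data
            printf(" %s", $i)
        printf("\n")
    }'
}

# Tiny demonstration with made-up data in a temporary directory.
demo_dir=$(mktemp -d)
printf '%s\n' '-180 0.1' '-179 0.2' > "$demo_dir/a.dat"
printf '%s\n' '-180 0.3' '-179 0.4' > "$demo_dir/b.dat"
printf '%s\n' '-180 0.5' '-178 0.6' > "$demo_dir/bad.dat"
merged=$(merge_checked "$demo_dir/a.dat" "$demo_dir/b.dat")
printf '%s\n' "$merged"
```

Running merge_checked on a.dat and bad.dat stops with a non-zero status at the second line, because the keys -179 and -178 differ.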

I have 100 input files -- how do I use this code to open up these files?

You put my original answer in a script 'filter-data'; you invoke the script with the 101 file names generated by seq. The paste command pastes all 101 files together; the awk command selects the columns you are interested in.

filter-data $(seq --format="0_%g.dat" 0 6 600)

The seq command with that format lists 101 file names; these are the 101 files that will be pasted.
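
Before pasting, it may be worth looking at what seq actually generates. The snippet below is a small sketch that assumes GNU seq (for --format) and that the data files live in the current directory; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# List the names the seq call produces and count them.
names=$(seq --format="0_%g.dat" 0 6 600)
count=$(( $(printf '%s\n' "$names" | wc -l) ))
first=$(printf '%s\n' "$names" | head -n 1)
last=$(printf '%s\n' "$names" | tail -n 1)
echo "$count files, from $first to $last"

# Report any expected file that is missing from the current directory.
missing=0
for f in $names; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=$((missing + 1)); }
done
echo "$missing of $count files are missing"
```

If any file is reported missing, paste would fail on it, so this is a cheap check to run first.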

You could even do without the filter-data script:

paste $(seq --format="0_%g.dat" 0 6 600) | awk '{ printf("%s", $1);
    for (i = 2; i <= NF; i += 2)    # even-numbered columns; <= so the last file is kept
        printf(" %s", $i);
    printf("\n");
}'

I'd probably go with the more general script as the main script, and if need be I'd create a 'one-liner' that invokes the main script with the specific set of arguments currently of interest.

The other key point which might be a stumbling block: paste is not limited to 2 files only; it can paste as many files as you can have open (give or take about 3).
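
One way to check that limit up front is to compare the number of input files with the shell's soft open-file limit. The sketch below is an assumption-laden illustration: fits_open_limit is a made-up helper name, and the allowance of 3 is taken to stand for stdin, stdout and stderr.

```shell
#!/usr/bin/env bash
# fits_open_limit is a hypothetical helper: succeed if N input files fit under
# the soft open-file limit, leaving 3 descriptors for stdin, stdout and stderr.
fits_open_limit() {
    local nfiles=$1 limit
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$nfiles" -le "$((limit - 3))" ]
}

if fits_open_limit 101; then
    echo "101 files fit under the limit of $(ulimit -n) open files"
else
    echo "101 files exceed the limit of $(ulimit -n); paste them in batches" >&2
fi
```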

Answer 2

Answer by jaypal singh

join can get you your desired result.

join <(sort -r file1) <(sort -r file2)

Test:

[jaypal:~/Temp] cat file1
-180 0.00025432
-179 0.000309643
-178 0.000189226
[jaypal:~/Temp] cat file2
-180 0.0005524243
-179 0.0002424433
-178 0.0001833333
[jaypal:~/Temp] join <(sort -r file1) <(sort -r file2)
-180 0.00025432 0.0005524243
-179 0.000309643 0.0002424433
-178 0.000189226 0.0001833333

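To do multiple files at once, you can use it with the find command. Note that join itself accepts exactly two input files, so this form only works when find matches two files: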

find . -type f -name "file*" -exec join '{}' +
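
Because join reads exactly two inputs per invocation, more than two files have to be folded in one at a time. The following is a rough sketch of that idea, not the answerer's method: join_all and the sample files are hypothetical, and the inputs are sorted in the C locale first because join expects both inputs ordered on the join field.

```shell
#!/usr/bin/env bash
# join_all is a hypothetical helper: join any number of two-column files on
# column 1 by repeatedly joining the accumulated result with the next file.
join_all() {
    local acc tmp f
    acc=$(mktemp) && tmp=$(mktemp) || return 1
    LC_ALL=C sort -k1,1 "$1" > "$acc"
    shift
    for f in "$@"; do
        LC_ALL=C sort -k1,1 "$f" | LC_ALL=C join "$acc" - > "$tmp"
        mv "$tmp" "$acc"
    done
    LC_ALL=C sort -n "$acc"          # restore numeric order of the key column
    rm -f "$acc" "$tmp"
}

# Made-up sample data.
join_dir=$(mktemp -d)
printf '%s\n' '-1 0.1' '0 0.2' '1 0.3' > "$join_dir/f1"
printf '%s\n' '-1 1.1' '0 1.2' '1 1.3' > "$join_dir/f2"
printf '%s\n' '-1 2.1' '0 2.2' '1 2.3' > "$join_dir/f3"
joined=$(join_all "$join_dir/f1" "$join_dir/f2" "$join_dir/f3")
printf '%s\n' "$joined"
```

Unlike paste, this keeps only keys present in every file, which may or may not be what is wanted.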

Answer 3

Answer by shellter

Based on the assumptions in my comments above, you don't need paste. Try this:

awk '{
  arr[$1] = arr[$1] "\t" $2      # append column 2 under the key in column 1
}
END {
  for (x = -180; x <= 180; x++) print x "\t" arr[x]
}' *.txt \
| sort -n

Note that we just take all of the values into an array keyed on the value in the first field, and append values based on the $1 key. After all data has been read in, the END section prints out the key and the value. I've added things like "x=", ":vals= " to help 'explain' what is happening. Remove those for completely clean tab-separated data. Change '\t' to ':' or '|', or ... shudder ',' if you need to. Change the *.txt to whatever your filespec is.
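
If the keys are not known in advance, the same array idea can remember the order in which keys first appear instead of looping from -180 to 180. This is a sketch under that assumption; merge_by_key and the sample files are hypothetical names, not part of the answer.

```shell
#!/usr/bin/env bash
# merge_by_key is a hypothetical helper: collect column 2 of every file under
# the key in column 1, printing keys in the order they were first seen.
merge_by_key() {
    awk '
    NF < 2 { next }                          # skip blank or incomplete lines
    !($1 in vals) { order[++n] = $1 }        # remember each new key once
    { vals[$1] = vals[$1] "\t" $2 }          # append this value to the key
    END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
    ' "$@"
}

# Made-up sample data.
key_dir=$(mktemp -d)
printf '%s\n' '-1 0.1' '0 0.2' > "$key_dir/a.txt"
printf '%s\n' '-1 1.1' '0 1.2' > "$key_dir/b.txt"
by_key=$(merge_by_key "$key_dir/a.txt" "$key_dir/b.txt")
printf '%s\n' "$by_key"
```

Since awk opens the files one after another, this variant does not depend on how many files can be open at once.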

Be aware that every Unix command line is limited in the number and total size of the arguments (here, the file names themselves, not the data inside the files) that can be passed in one invocation. Let us know if you get error messages about that.

The pipe to sort ensures that the data is sorted by column 1.

With my test data, the output was

-178            0.0001892261    0.0001892262    0.0001892263    0.000189226
-179            0.0003096431    0.0003096432    0.0003096433    0.000309643
-180            0.000254321     0.000254322     0.000254323     0.00025432
178             0.0001892261    0.0001892262    0.0001892263    0.000189226
179             0.0003096431    0.0003096432    0.0003096433    0.000309643
180             0.000254321     0.000254322     0.000254323     0.00025432

Based on 4 files of input.

I hope this helps.

P.S. Welcome to StackOverflow (S.O.). Please remember to read the FAQs, http://tinyurl.com/2vycnvr, vote for good Q/A by using the gray triangles, http://i.imgur.com/kygEP.png, and accept the answer that best solves your problem, if any, by pressing the checkmark sign, http://i.imgur.com/uqJeW.png

Answer 4

Answer by potong

This might work for you:

echo *.dat | sed 's/\S*/<(cut -d" " -f2 &)/2g;s/^/paste /' | bash >all.dat
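
The same idea (keep the first file whole, then append column 2 of each later file) can also be written as a plain loop, which is closer to what the question asked for and never holds more than a couple of files open at once. This is a sketch, not the answerer's code: paste_columns and the sample files are hypothetical, and it assumes space-delimited two-column input.

```shell
#!/usr/bin/env bash
# paste_columns is a hypothetical helper: start from the first file and append
# column 2 of each remaining file, one file per loop iteration.
paste_columns() {
    local out tmp f
    out=$(mktemp) && tmp=$(mktemp) || return 1
    cp "$1" "$out"
    shift
    for f in "$@"; do
        cut -d' ' -f2 "$f" | paste -d' ' "$out" - > "$tmp" && mv "$tmp" "$out"
    done
    cat "$out"
    rm -f "$out" "$tmp"
}

# Made-up sample data.
loop_dir=$(mktemp -d)
printf '%s\n' '-1 0.1' '0 0.2' > "$loop_dir/0_0.dat"
printf '%s\n' '-1 1.1' '0 1.2' > "$loop_dir/0_6.dat"
printf '%s\n' '-1 2.1' '0 2.2' > "$loop_dir/0_12.dat"
looped=$(paste_columns "$loop_dir/0_0.dat" "$loop_dir/0_6.dat" "$loop_dir/0_12.dat")
printf '%s\n' "$looped"
```

The loop rewrites the accumulated file once per input, so it is slower than a single paste over all files, but the order of columns follows the argument order exactly.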
