bash: paste columns into an existing file in a loop

Question

Asked by bennos

I am using the paste command in a bash loop to add new columns to a CSV-file. I would like to reuse the CSV-file. Currently I am using a temporary file to accomplish this:

while [ $i -le $max ]
do
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt.txt

    # paste to temporary file
    paste -d, existingfile.csv tmptxt.txt > tmpcsv.csv

    # overwrite old csv with new csv
    mv tmpcsv.csv existingfile.csv

    ((i++))
done

After adding some columns the copy gets slow, because the file keeps growing (every tmptxt.txt is about 2 MB, adding up to approximately 100 MB).

A tmptxt.txt is a plain text file with one column and one value per row:

1
2
3
.
.

The existingfile.csv would then be:

1,1,x
2,2,y
3,3,z
.,.,.
.,.,.

Is there any way to use the paste command to add a column to an existing file? Or is there any other way?

Thanks

Answer 1

Accepted answer by German Garcia

Would it be feasible to split the operation in two? One step generates all the intermediate files, and another generates the final output file. The idea is to avoid rereading and rewriting the final file over and over.
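As a rough back-of-the-envelope sketch of why this helps, the snippet below compares the data written by the per-iteration paste with a single paste. The figure of 2 MB per column comes from the question; the count of 50 columns is an assumption derived from the stated 100 MB total, and the original columns of existingfile.csv are ignored.

```shell
#!/usr/bin/env bash
# Rough estimate only; both numbers below are approximations.
per_column_mb=2   # approximate size of each tmptxt.txt, from the question
columns=50        # assumption: about 100 MB total / 2 MB per column

loop_mb=0
for ((k = 1; k <= columns; k++)); do
    # the k-th pass of the loop rewrites a file that already holds k columns
    loop_mb=$((loop_mb + k * per_column_mb))
done

# a single paste writes every column exactly once
single_mb=$((columns * per_column_mb))

echo "loop: ${loop_mb} MB written, single paste: ${single_mb} MB written"
```

Under these assumptions the loop writes about 25 times as much data as the single pass, and it rereads a similar amount.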

The changes to the script would be something like this:

while [ $i -le $max ]
do
    n=$(printf "%05d" $i)    # to preserve lexical order if $max > 9
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt$n.txt
    ((i++))
done

#make final file
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv  

#overwrite old csv with new csv
mv tmpcsv.csv existingfile.csv
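
The two-step flow above can be tried without wgrib2. The following is a minimal sketch in which printf stands in for the wgrib2 call; the sample values (a1, b1, ...) and the temporary working directory are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: generate all column files first, then paste once.
# printf replaces wgrib2 here; the data is invented for the demo.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# small stand-in for the existing CSV
printf '%s\n' 1,1,x 2,2,y 3,3,z > existingfile.csv

max=12
i=1
while [ "$i" -le "$max" ]; do
    n=$(printf '%05d' "$i")   # zero-pad so the glob sorts 00002 before 00010
    # stand-in for: wgrib2 ... -text "tmptxt$n.txt"
    printf '%s\n' "a$i" "b$i" "c$i" > "tmptxt$n.txt"
    i=$((i + 1))
done

# one pass over all column files instead of one pass per column
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
mv tmpcsv.csv existingfile.csv
rm -f tmptxt[0-9]*.txt        # clean up the intermediate files

result=$(head -n 1 existingfile.csv)
echo "$result"
```

Without the zero padding, the glob would place tmptxt10.txt before tmptxt2.txt and the columns would come out in the wrong order.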

Answer 2

Answer by doubleDown

This assumes the number of lines output by the program is constant and equal to the number of lines in existingfile.csv (which should be the case since you are using paste).

Disclaimer: I'm not exactly sure whether this would speed things up (it depends on whether the I/O redirection >> writes to the file exactly once or not). Anyway, give it a try and let me know.

So the basic idea is:

Append the output in one go after the loop is done (note the change: wgrib2 now prints to -, which is stdout).
Use awk to move every later block of linenum rows (linenum being the number of lines originally in existingfile.csv) to the ends of the first linenum rows.
Save to tempcsv.csv (because I can't find a way to save in the same file).
Rename to / overwrite existingfile.csv.

while [ $i -le $max ]; do
  # create text from grib2
  wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text -

  ((i++))
done >> existingfile.csv

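# linenum must be the line count of existingfile.csv before the append (4 is only an example)
awk -v linenum=4 '
  { k = FNR % linenum; array[k] = (FNR <= linenum) ? $0 : array[k] "," $0 }
  END { for (i = 1; i <= linenum; i++) print array[i % linenum] }
' existingfile.csv > tempcsv.csv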

mv tempcsv.csv existingfile.csv

If this works (internally) the way I imagine, you should have 2 writes to existingfile.csv instead of $max writes. So hopefully this would speed things up.
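
The awk folding step described above can be checked on a small sample. This is a minimal sketch, not the answer's exact script: printf stands in for wgrib2, the sample values are invented, and the row count is taken before the append so that awk knows where each appended block starts.

```shell
#!/usr/bin/env bash
# Sketch: append every new column as a block of rows, then fold the
# blocks back into columns with awk. Data is invented for the demo.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' 1,1,x 2,2,y 3,3,z > existingfile.csv

# count the rows BEFORE appending; each appended block has this many lines
linenum=$(awk 'END { print NR }' existingfile.csv)

# stand-in for the wgrib2 loop printing to stdout
for i in 1 2; do
    printf '%s\n' "a$i" "b$i" "c$i"
done >> existingfile.csv

awk -v linenum="$linenum" '
  {
    k = (FNR - 1) % linenum                  # row this line belongs to
    row[k] = (FNR <= linenum) ? $0 : row[k] "," $0
  }
  END { for (k = 0; k < linenum; k++) print row[k] }
' existingfile.csv > tempcsv.csv

mv tempcsv.csv existingfile.csv

folded=$(cat existingfile.csv)
echo "$folded"
```

Note that awk holds all folded rows in memory until the END block, so for a file of roughly 100 MB this trades disk traffic for memory use.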
