
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow CC BY-SA as well, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/12912727/

Time: 2020-09-18 03:33:12  Source: igfitidea

Paste column to existing file in a loop

Tags: bash, shell, unix, paste

Asked by bennos

I am using the paste command in a bash loop to add new columns to a CSV file. I would like to reuse the same CSV file. Currently I am using a temporary file to accomplish this:


while [ "$i" -le "$max" ]; do
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt.txt

    # paste to temporary file
    paste -d, existingfile.csv tmptxt.txt > tmpcsv.csv

    # overwrite old csv with new csv
    mv tmpcsv.csv existingfile.csv

    ((i++))
done

After adding some columns, each copy gets slower because the file keeps growing (every tmptxt.txt is about 2 MB, adding up to approx. 100 MB in total).
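A rough back-of-the-envelope estimate of why the loop slows down, assuming every column adds about 2 MB and the final file is about 100 MB (so roughly n = 50 iterations), and ignoring the columns already present at the start: iteration j has to reread and rewrite the whole CSV, which by then holds about 2j MB, so the total written over the loop is approximately

```latex
\sum_{j=1}^{n} 2j\,\text{MB} \;=\; n(n+1)\,\text{MB} \;\approx\; 50 \cdot 51\,\text{MB} \;=\; 2550\,\text{MB} \;\approx\; 2.5\,\text{GB}
```

with a similar amount read back. Under these assumptions the I/O grows quadratically with the number of columns, while the useful output stays around 100 MB.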


A tmptxt.txt is a plain text file with one column, i.e. one value per row:


1
2
3
.
.

existingfile.csv would then look like this:


1,1,x
2,2,y
3,3,z
.,.,.
.,.,.

Is there any way to use the paste command to add a column to an existing file? Or is there any other way?


Thanks

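Why the temporary file is needed in the first place: with `>`, the shell truncates the output file before paste even starts, so pasting straight back into existingfile.csv would lose its contents. A small self-contained demonstration (a.csv and col.txt are made-up file names, not from the question):

```shell
#!/usr/bin/env bash
# Demonstrates why "paste -d, file.csv col.txt > file.csv" cannot work in place.
# The file names are hypothetical examples.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '1,x\n2,y\n3,z\n' > a.csv   # existing CSV with two columns
printf '1\n2\n3\n' > col.txt       # new column to add

# The shell opens a.csv with truncation *before* running paste,
# so paste only ever sees an empty a.csv.
paste -d, a.csv col.txt > a.csv

cat a.csv   # the original columns (1,x / 2,y / 3,z) are gone
```

Only the new column survives, each line prefixed with the delimiter for the (now empty) first file.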

Accepted answer by German Garcia

Would it be feasible to split the operation in two? One step generates all the intermediate files; another generates the final output file from all of them at once. The idea is to avoid rereading and rewriting the final file over and over.


The changes to the script would be something like this:


while [ $i -le $max ]
do
    n=$(printf "%05d" $i)    # to preserve lexical order if $max > 9
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt$n.txt
    ((i++))
done

#make final file
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv  

#overwrite old csv with new csv
mv tmpcsv.csv existingfile.csv
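To try the two-phase idea without wgrib2 or a GRIB file, here is a self-contained sketch in which a hypothetical make_column function stands in for the wgrib2 call and prints three rows per column. It uses 12 columns so that the zero-padding of n actually matters: without it, the glob would list tmptxt10.txt before tmptxt2.txt and paste the columns out of order.

```shell
#!/usr/bin/env bash
# Sketch of the two-phase approach: write each column to its own
# zero-padded file first, then call paste exactly once.
# make_column is a hypothetical stand-in for the wgrib2 call.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical generator: prints 3 rows for column number $1.
make_column() {
  local c=$1
  printf 'c%s_r1\nc%s_r2\nc%s_r3\n' "$c" "$c" "$c"
}

printf '1\n2\n3\n' > existingfile.csv

# Phase 1: one intermediate file per column.
i=1
max=12
while [ "$i" -le "$max" ]; do
  n=$(printf '%05d' "$i")   # 00001, 00002, ... so the glob sorts numerically
  make_column "$i" > "tmptxt$n.txt"
  i=$((i + 1))
done

# Phase 2: a single paste over all intermediate files, a single rewrite of the CSV.
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
mv tmpcsv.csv existingfile.csv
rm -f tmptxt[0-9]*.txt

head -n 1 existingfile.csv
```

The CSV is read and written once in total, no matter how many columns are added; the price is keeping all intermediate files on disk until the final paste.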

Answer by doubleDown

Assuming the number of lines output by the program is constant and equal to the number of lines in existingfile.csv (which should be the case, since you are using paste):


Disclaimer: I'm not exactly sure whether this will speed things up (it depends on whether the I/O redirection >> writes to the file exactly once or not). Anyway, give it a try and let me know.


So the basic idea is:


  1. append all the output in one go after the loop is done (note the change: wgrib2 now writes to -, which is stdout)

  2. use awk to append every following block of linenum rows (linenum being the number of lines in the original existingfile.csv) to the end of the first linenum rows, so that row r collects lines r, r+linenum, r+2*linenum, ...

    Save to tempcsv.csv (because I can't find a way to save to the same file)

  3. rename to / overwrite existingfile.csv


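# number of rows in the original CSV; measure it before appending
linenum=$(( $(wc -l < existingfile.csv) ))

while [ $i -le $max ]; do
  # create text from grib2, writing to "-" (stdout)
  wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text -

  ((i++))
done >> existingfile.csv

# output row r joins input lines r, r+linenum, r+2*linenum, ... with commas
awk -v linenum="$linenum" '
  { k = FNR % linenum
    array[k] = (k in array) ? array[k] "," $0 : $0 }
  END { for (i = 1; i <= linenum; i++) print array[i % linenum] }
' existingfile.csv > tempcsv.csv

mv tempcsv.csv existingfile.csv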

If this works the way I imagine it does (internally), existingfile.csv is written only twice (once by the appending loop, once by the final awk/mv step) instead of $max times. So hopefully this will speed things up.
