Original question: http://stackoverflow.com/questions/12912727/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Paste column to existing file in a loop
Asked by bennos
I am using the paste command in a bash loop to add new columns to a CSV-file. I would like to reuse the CSV-file. Currently I am using a temporary file to accomplish this:
while [ $i -le $max ]
do
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt.txt
    # paste to temporary file
    paste -d, existingfile.csv tmptxt.txt > tmpcsv.csv
    # overwrite old csv with new csv
    mv tmpcsv.csv existingfile.csv
    ((i++))
done
After adding some columns the copy gets slow, because the file keeps growing (every tmptxt.txt is about 2 MB, adding up to approx. 100 MB).
A tmptxt.txt is a plain text file with one column and one value per row:
1
2
3
.
.
The existingfile.csv would then be:
1,1,x
2,2,y
3,3,z
.,.,.
.,.,.
Is there any way to use the paste command to add a column to an existing file? Or is there any other way?
Thanks
Accepted answer by German Garcia
Would it be feasible to split the operation in two? One step generates all the intermediate files, and another builds the final output file from all of them at once. The idea is to avoid rereading and rewriting the final file over and over.
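To put rough numbers on this, here is a small arithmetic sketch. It assumes 2 MB per column file (from the question) and about 50 columns (an assumption derived from the roughly 100 MB total), and it ignores the size of the initial CSV. The variable names are purely illustrative.

```shell
#!/usr/bin/env bash
col_mb=2      # size of one column file, as stated in the question
cols=50       # assumed: about 100 MB total / 2 MB per column

loop_mb=0
for ((k = 1; k <= cols; k++)); do
    # the k-th paste inside the loop rewrites a file holding k columns
    loop_mb=$((loop_mb + k * col_mb))
done
single_mb=$((cols * col_mb))   # one paste writes every column exactly once

echo "paste inside the loop: ${loop_mb} MB written"
echo "single paste afterwards: ${single_mb} MB written"
```

Under these assumptions the loop version writes about 2550 MB in total, against 100 MB for a single paste, which is why the slowdown grows with every added column.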
The changes to the script would be something like this:
while [ $i -le $max ]
do
    n=$(printf "%05d" $i) # to preserve lexical order if $max > 9
    # create text from grib2
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text tmptxt$n.txt
    ((i++))
done
# make final file
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
# overwrite old csv with new csv
mv tmpcsv.csv existingfile.csv
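The two-step approach can be tried without a GRIB file. The following is a minimal sketch, not the original script: make_column is a hypothetical stand-in for the wgrib2 call, and the three-row data and temporary working directory are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the wgrib2 call: writes one value per line.
make_column() {   # usage: make_column <record-number> <output-file>
    local rec=$1 out=$2
    printf '%s\n' "a$rec" "b$rec" "c$rec" > "$out"
}

# Existing CSV with a single column and three rows.
printf '%s\n' 1 2 3 > existingfile.csv

i=0
max=11
# Step 1: generate every column file; zero-padded names keep the glob in numeric order.
while [ "$i" -le "$max" ]; do
    n=$(printf '%05d' "$i")
    make_column "$((i + 1))" "tmptxt$n.txt"
    i=$((i + 1))
done

# Step 2: a single paste builds the final file in one pass.
paste -d, existingfile.csv tmptxt[0-9]*.txt > tmpcsv.csv
mv tmpcsv.csv existingfile.csv
rm -f tmptxt[0-9]*.txt

cat existingfile.csv
```

Note that paste reads all of its inputs in parallel, so with a very large number of column files the per-process open-file limit (ulimit -n) may become relevant.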
Answer by doubleDown
This assumes the number of lines output by the program is constant and equal to the number of lines in existingfile.csv (which should be the case, since you are using paste).
Disclaimer: I'm not exactly sure if this would speed things up (it depends on whether the I/O redirection >> writes to the file exactly once or not). Anyway, give it a try and let me know.
So the basic idea is:
1. Append the output in one go after the loop is done (note the change: wgrib2 now prints to -, which is stdout).
2. Use awk to move every linenum rows (linenum being the number of lines in existingfile.csv) to the end of the first linenum rows.
3. Save to tempcsv.csv (because I can't find a way to save in the same file).
4. Rename to / overwrite existingfile.csv.
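while [ $i -le $max ]; do
    # create text from grib2, written to stdout ("-")
    wgrib2 -d 1.$(($i+1)) -no_header myGribFile.grb2 -text -
    ((i++))
done >> existingfile.csv
# linenum must equal the number of rows in the original existingfile.csv
awk -v linenum=4 '
    # join lines k, k+linenum, k+2*linenum, ... into a single row
    { k = FNR % linenum; array[k] = (FNR <= linenum) ? $0 : (array[k] "," $0) }
    END { for (i = 1; i <= linenum; i++) print array[i % linenum] }
' existingfile.csv > tempcsv.csv
mv tempcsv.csv existingfile.csv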
If this works internally the way I imagine, you should end up with 2 writes to existingfile.csv instead of $max writes. So hopefully this would speed things up.
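The append-then-fold idea can be checked on toy data. This is a sketch under stated assumptions: the for loop stands in for the wgrib2 calls, the three-row CSV is invented, and linenum is taken from wc -l rather than hard-coded.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Existing CSV with three rows.
printf '%s\n' 1,x 2,y 3,z > existingfile.csv
linenum=$(wc -l < existingfile.csv)   # row count before anything is appended

# Stand-in for the wgrib2 loop: two new columns are appended as extra rows.
for col in a b; do
    printf '%s\n' "${col}1" "${col}2" "${col}3"
done >> existingfile.csv

# Fold: lines k, k+linenum, k+2*linenum, ... are joined into output row k.
awk -v linenum="$linenum" '
    { k = (FNR - 1) % linenum                                  # 0-based output row
      row[k] = (FNR <= linenum) ? $0 : (row[k] "," $0) }
    END { for (k = 0; k < linenum; k++) print row[k] }
' existingfile.csv > tempcsv.csv
mv tempcsv.csv existingfile.csv

cat existingfile.csv
```

One trade-off to keep in mind: awk holds every output row in memory until the END block, so this variant needs roughly as much memory as the size of the final file.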

