Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/11880654/

Time: 2020-09-18 02:58:08  Source: igfitidea

Does awk support dynamic user-defined variables?

Tags: bash, dynamic, awk

Asked by fanlix

awk supports this:

awk '{print $(NF-1);}'

but not for user-defined variables:

awk '{a=123; b="a"; print $($b);}'

by the way, shell supports this:

a=123;
b="a";
eval echo \$$b;

How can I achieve my purpose in awk?

Accepted answer by GreenFox

Not at the moment. However, if you provide a wrapper, it is possible, in a somewhat hacky and dirty way. The idea is to use the @ operator (indirect function calls), which was introduced in gawk 4.0.

The @ operator is normally used to call a function by name. So you can write:

function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
    var = "";
    if(today_i_feel_like_calling_foo){
        var = "foo";
    }else{
        var = "bar";
    }
    @var( "arg" ); # This calls function foo(), or function bar() with "arg"
}

Now, this is useful on its own. Assuming we know the var names beforehand, we can write a wrapper to set and get vars indirectly:

function get(varname, this, call){call="get_"varname;return @call();}
function set(varname, arg, this, call){call="set_"varname; @call(arg);}

So now, for each var you want to provide access to by name, you declare these two functions:

function get_my_var(){return my_var;}
function set_my_var(arg){my_var = arg;}

And perhaps, somewhere in your BEGIN{} block:

BEGIN{ my_var = ""; }

To declare it for global access. Then you can use

get("my_var");
set("my_var", "whatever");

This may appear useless at first, but there are perfectly good use cases, such as keeping a linked list of vars by storing each var's name in another var's array. It works for arrays too; to be honest, I use this for nesting and linking arrays within arrays, so I can walk through multiple arrays as if using pointers.

You can also write configuration scripts that refer to var names inside awk this way, in effect getting an interpreter-inside-an-interpreter kind of thing, too...

Not the best way to do things, but it gets the job done, and I do not have to worry about null pointer exceptions, GC, and such :-)

Answer by GreenFox

OK, since some of us like to eat spaghetti through their nose, here is some actual code that I wrote in the past :-)
First of all, getting self-modifying code in a language that does not support it is extremely non-trivial.

The idea behind allowing dynamic variables and function names in a language that does not support them is very simple. At some point in the program, you want a dynamic anything to self-modify your code and resume execution from where you left off. An eval(), that is.

This is all trivial if the language supports eval() or an equivalent. However, awk has no such function. Therefore you, the programmer, have to provide an interface for it.

To allow all this to happen, you have three main problems:

  1. How to get our self so we can modify it
  2. How to load the modified code, and resume from where we left off
  3. Finding a way for the interpreter to accept our modified code

How to get our self so we can modify it

Here is some example code, suitable for direct execution. This is the infrastructure that I inject for environments running gawk, as it requires PROCINFO.

echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
    print "foo";
    dbg_argv(A);
    dbg_printarray(A);
    print "bar";
}'

Result:

foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}]
bar

As you can see, as long as the OS does not play with our args and /proc/ is available, it is possible to read our own source. This may appear useless at first, but we need it for the push/pop of our stack, so that our execution state can be embedded within the code itself and we can save/resume and survive OS shutdowns/reboots.

I have left out the OS detection function and the bootloader (written in awk), because if I published them, kids could build platform-independent polymorphic code, and it is easy to cause havoc with that.

How to load the modified code, and resume from where we left off

Now, normally you have push() and pop() for registers, so you can save your state, modify yourself, and resume from where you left off. Making a call and reading your stack is a typical way to get the memory address.
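
As a minimal sketch of that save/restore idea, the push() and pop() helpers from the first listing can be exercised on their own. The values pushed here and the out shell variable are made up for illustration; it should run in any POSIX awk.

```shell
out=$(awk '
# Stack kept in a global array; stack[0] holds the current depth.
function push(d) { stack[++stack[0]] = d }
function pop()   { if (stack[0]) return stack[stack[0]--]; return "" }
BEGIN {
    push(RS)                 # save the current record separator
    RS = ";"                 # temporarily change it
    push("state-2")
    print pop()              # most recently pushed value comes back first
    RS = pop()               # restore the saved separator
    msg = (RS == "\n") ? "RS restored" : "RS changed"
    print msg
    print "[" pop() "]"      # popping an empty stack yields an empty string
}')
printf '%s\n' "$out"
```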

Unfortunately, in awk, under normal circumstances we cannot use pointers (without a lot of dirty work) or registers (unless you can inject other stuff along the way). However, you still need a way to suspend and resume your code.

The idea is simple. Instead of letting awk control your loops, while and if/else conditions, recursion depth, and the functions you are in, your code should do it. Keep a stack, a list of variable names, and a list of function names, and manage them yourself. Just make sure that your code calls self_modify( bool ) constantly, so that even after a sudden failure, as soon as the script is re-run we can enter self_modify( bool ) and resume our state. When you want to self-modify your code, you must provide custom-made write_stack() and read_stack() routines that write the state of the stack out as a string embedded in the code, read the values back from that embedded string, and resume the execution state.

Here is a small piece of code that demonstrates the whole flow

echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
    _(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
    dbg_argv(A);
    sub( \
    "BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
    "BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
     ,A[2]) 
    print "echo \x22\x22\x7c awk \x27"A[2]"";
    print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
    print "\x27"
}'

Result:

Exactly the same as our original code, except

_(BEGIN_MODIFY"|");print "65964";_("|"END_MODIFY)

and

function bar_56228_(s){print "doe"}

at the end of the code.

Now, this may seem useless, as we are only replacing the code print "foo"; with our pid. But it becomes useful when there are multiple _() calls with separate MAGIC strings to identify BLOCKS, and a custom-made multi-line string replacement routine instead of sub().
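
One possible shape for such a replacement routine, as a sketch only: the helper name replace_block(), the #<STATE> markers, and the sample code string are all hypothetical, not taken from the original script. It uses index() and substr() rather than sub(), so the markers need no regex escaping and the block may span several lines.

```shell
out=$(awk '
# Hypothetical helper: replace whatever sits between two marker strings.
# Returns src unchanged if either marker is missing.
function replace_block(src, b_mark, e_mark, body,    s, e, rest) {
    s = index(src, b_mark)
    if (!s) return src                        # begin marker not found
    rest = substr(src, s + length(b_mark))    # text after the begin marker
    e = index(rest, e_mark)
    if (!e) return src                        # end marker not found
    return substr(src, 1, s + length(b_mark) - 1) body substr(rest, e)
}
BEGIN {
    # Stand-in for the program text that would be read from /proc.
    code = "#<STATE>\nprint \"foo\"\n#</STATE>\nprint \"bar\""
    print replace_block(code, "#<STATE>\n", "#</STATE>", "print \"42\"\n")
}')
printf '%s\n' "$out"
```

Using one marker pair per block (stack, function list, execution point) would let the same helper rewrite each block independently.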

You must provide BLOCKS for the stack, the function list, and the execution point, as a bare minimum.

Also notice that the last line contains bar. By itself this is just a string, but when this code gets executed repeatedly, notice that you get

function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...

and it keeps growing. While the example is intentionally made to do nothing useful, if we provide a routine that calls bar_pid_(s) instead of that print "foo" code, suddenly it means we have eval() on our hands :-) Now, isn't eval() useful :-)

Don't forget to provide a custom-made remove_block() function so that the code maintains a reasonable size instead of growing every time you execute it.

Finding a way for the interpreter to accept our modified code

Normally, calling a binary is trivial. However, doing so from within awk is difficult. You may say system() is the way.

There are two problems with that.

  1. system() may not work in some environments
  2. it blocks while your code is executing, thus you cannot perform recursive calls and keep the user happy at the same time.

If you must use system(), ensure that it does not block. A normal call to system("sleep 20 && echo from-sh & ") will not work. The solution is simple:

echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; ";  E | getline v;close(E);print "bar";}'

Now you have an async system() call that does not block :-)

Answer by brandizzi

The $ notation is not a marker for variables, as it is in shell, PHP, Perl, etc. It is rather an operator, which receives an integer value n and returns the n-th column of the input. So what you did in the first example is not dynamically setting/getting a variable but rather calling an operator/function.
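
A quick illustration of $ acting as an operator on a numeric expression (the sample input is invented for the demo). The last line shows why a variable holding a name does not work: the string "n" converts to 0, so $name is $0, the whole record.

```shell
out=$(printf 'a b c d\n' | awk '{
    n = 2
    print $n          # field 2
    print $(NF - 1)   # next-to-last field
    print $(n * 2)    # field 4
    name = "n"
    print $name       # "n" is not numeric, so this is $0
}')
printf '%s\n' "$out"
```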

As stated by commenters, you can achieve the behavior you are looking for with arrays:

awk '{a=123; b="a"; v[b] = a; print v[b];}'
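
The same idea extended slightly, as a sketch: the names to look up arrive as input lines, and the array v plays the role of the variable table. The sample names and values are invented for illustration.

```shell
out=$(printf 'a\nb\nmissing\n' | awk '
BEGIN { v["a"] = 123; v["b"] = "hello" }   # "variables" stored under their names
{
    name = $1                              # the name is only known at run time
    if (name in v) {
        print name " = " v[name]
    } else {
        print name " is not set"
    }
}')
printf '%s\n' "$out"
```

The "in" test avoids creating an empty entry when an unknown name is looked up.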