run hadoop command in bash script
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19148745/
Asked by user468587
I need to run a hadoop command in a bash script which goes through a bunch of folders on Amazon S3, writes those folder names into a txt file, and then does further processing. The problem is that when I run the script, no folder names seem to be written to the txt file. I wonder whether the hadoop command takes too long to run and the bash script doesn't wait for it to finish before moving on to the further processing. If so, how can I make bash wait until the hadoop command has finished before running the other steps?
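For context on the premise: bash runs each foreground command to completion before moving to the next line, so a plain hadoop invocation already blocks the script. A tiny illustration with generic commands (not from the post):
date                 # prints the current time
sleep 5              # bash blocks here for the full 5 seconds
date                 # runs only after sleep has finished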
Here is my code. I tried both ways; neither works:
1.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME"
echo -e "listing... $listCmd\n"
eval $listCmd
...other process ...
2.
echo -e "list the folders we want to copy into a file"
hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME
... other process ....
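One way to see whether the hadoop command itself is failing, rather than the script moving on too early, is to trace it and print its exit status. A minimal debugging sketch (not part of the original post; the variable names follow the question, and quoting the s3n URI is an added precaution so the local shell does not expand the * globs, since hadoop fs -ls does its own globbing):
set -x                                    # trace each command as it runs
hadoop fs -ls "s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate" > "$FILE_NAME"
status=$?                                 # exit status of hadoop fs -ls
set +x
echo "hadoop fs -ls exited with status $status"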
Does anyone know what might be wrong? And is it better to use eval, or to just run the hadoop command directly as in the second way?
Thanks.
Answered by iamauser
I would prefer eval in this case; it is prettier to append the next command to this one. And I would rather break listCmd down into parts, so that you know there is nothing wrong at the grep, awk, or cut level.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate > $raw_File"
gcmd="cat $raw_File | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME"
echo "Running $listCmd and other commands after that"
otherCmd="cat $FILE_NAME"
eval "$listCmd";
echo $? # This will print the exit status of the $listCmd
eval "$gcmd" && echo "Finished Listing" && eval "$otherCmd"
otherCmd will only be executed if $gcmd succeeds. If you have too many commands that you need to execute, then this becomes a bit ugly. If you roughly know how long it will take, you can insert a sleep command.
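If the chain of && grows long, one alternative (a sketch using a hypothetical run helper, not from the original answer) is to wrap each step in a function that aborts on the first failure:
run() {
    # execute the given command; stop the whole script if it fails
    "$@" || { echo "step failed: $*" >&2; exit 1; }
}
run eval "$listCmd"
run eval "$gcmd"
run eval "$otherCmd"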
eval "$listCmd"
sleep 1800 # This will sleep 1800 seconds
eval "$otherCmd"