run hadoop command in bash script
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19148745/
Asked by user468587
I need to run a hadoop command in a bash script which goes through a bunch of folders on Amazon S3, writes those folder names into a txt file, and then does further processing. The problem is that when I run the script, no folder names seem to be written to the txt file. I wonder whether the hadoop command takes too long to run and the bash script doesn't wait for it to finish before moving on to the further processing. If so, how can I make bash wait until the hadoop command has finished before running the other steps?
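For context on the premise: bash runs each foreground command to completion before moving to the next line, so a plain hadoop invocation already blocks the script. A tiny illustration with generic commands (not from the post):
date                 # prints the current time
sleep 5              # bash blocks here for the full 5 seconds
date                 # runs only after sleep has finished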
Here is my code. I tried both ways; neither works:
1.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME"
echo -e "listing... $listCmd\n"
eval $listCmd
...other process ...
2.
echo -e "list the folders we want to copy into a file"
hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME
... other process ....
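One way to see whether the hadoop command itself is failing, rather than the script moving on too early, is to trace it and print its exit status. A minimal debugging sketch (not part of the original post; the variable names follow the question, and quoting the s3n URI is an added precaution so the local shell does not expand the * globs, since hadoop fs -ls does its own globbing):
set -x                                    # trace each command as it runs
hadoop fs -ls "s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate" > "$FILE_NAME"
status=$?                                 # exit status of hadoop fs -ls
set +x
echo "hadoop fs -ls exited with status $status"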
Does anyone know what might be wrong? And is it better to use eval, or to just run the hadoop command directly as in the second way?
Thanks.
Answered by iamauser
I would prefer eval in this case; it is prettier to append the next command to this one. And I would rather break listCmd down into parts, so that you know there is nothing wrong at the grep, awk, or cut level.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate > $raw_File"
gcmd="cat $raw_File | grep s3n | awk -F' ' '{print }' | cut -f 4- -d / > $FILE_NAME"
echo "Running $listCmd and other commands after that"
otherCmd="cat $FILE_NAME"
eval "$listCmd";
echo $? # This will print the exit status of the $listCmd
eval "$gcmd" && echo "Finished Listing" && eval "$otherCmd"
otherCmd will only be executed if $gcmd succeeds. If you have too many commands that you need to execute, then this becomes a bit ugly. If you roughly know how long it will take, you can insert a sleep command.
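If the chain of && grows long, one alternative (a sketch using a hypothetical run helper, not from the original answer) is to wrap each step in a function that aborts on the first failure:
run() {
    # execute the given command; stop the whole script if it fails
    "$@" || { echo "step failed: $*" >&2; exit 1; }
}
run eval "$listCmd"
run eval "$gcmd"
run eval "$otherCmd"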
eval "$listCmd"
sleep 1800 # This will sleep 1800 seconds
eval "$otherCmd"