bash: get the last updated file in HDFS
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/34688792/
Get the last updated file in HDFS
Asked by Neethu
I want the most recently updated file from one of my HDFS directories. The code should loop through the directory and its subdirectories and return the full path, including the file name, of the latest file. I was able to get the latest file on the local file system, but I am not sure how to do it for HDFS.
find /tmp/sdsa -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head
The above code works for the local file system. I am able to get the date, time, and file name from HDFS, but how do I get the latest file using these three fields?
This is the code I tried:
hadoop fs -ls -R /tmp/apps | awk -F" " '{print $6" "$7" "$8}'
Any help will be appreciated.
Thanks in advance.
Answered by Neethu
This one worked for me:
hadoop fs -ls -R /tmp/app | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head -1 | cut -d" " -f3
The output is the entire file path.
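The one-liner above also passes directory entries through the sort, and `$8` keeps only the first word of a path that contains spaces. A slightly more defensive variant might look like the sketch below. It assumes the usual eight-column `hadoop fs -ls -R` layout (permissions, replication, owner, group, size, date, time, path); the function name `latest_hdfs_file` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: reads a `hadoop fs -ls -R` style listing on stdin and
# prints the path of the most recently modified regular file.
# Assumes columns: perms, replication, owner, group, size, date, time, path.
latest_hdfs_file() {
  awk '$1 ~ /^-/ {                      # keep regular files; skip directories ("d...")
         path = $8
         # Re-join paths that contain spaces (runs of spaces collapse to one).
         for (i = 9; i <= NF; i++) path = path " " $i
         print $6 " " $7 " " path
       }' |
    LC_ALL=C sort -r |                  # timestamps order correctly as plain text
    head -n 1 |
    cut -d" " -f3-                      # drop date and time, keep the path
}

# Usage against a cluster (the path /tmp/app is taken from the answer above):
#   hadoop fs -ls -R /tmp/app | latest_hdfs_file
```

Because the function only reads standard input, it can be tried on a saved listing without access to a cluster.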
Answered by Durga Viswanath Gadiraju
Here is the command:
hadoop fs -ls -R /user | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head | cut -d" " -f3-
Your script itself is good enough. Hadoop returns the dates in YYYY-MM-DD HH24:MI:SS format, and hence you can just sort them alphabetically.
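To illustrate why alphabetical sorting works here: in a year-month-day timestamp with zero-padded fields, the most significant part comes first, so text order and chronological order coincide. The sketch below uses invented sample lines (with minute-resolution timestamps; the same argument holds with seconds) and a hypothetical helper named `newest_first`. Plain `sort -r` should be enough for this, without the `-n` flag.

```shell
# Hypothetical helper: order "date time path" lines newest first
# using plain byte-wise text comparison.
newest_first() {
  LC_ALL=C sort -r
}

# Invented sample lines in "date time path" form.
printf '%s\n' \
  '2016-01-08 10:15 /user/a' \
  '2015-12-31 23:59 /user/b' \
  '2016-01-08 09:30 /user/c' |
  newest_first
# Prints:
#   2016-01-08 10:15 /user/a
#   2016-01-08 09:30 /user/c
#   2015-12-31 23:59 /user/b
```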