Loop over files in an HDFS directory with bash
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/40010820/
Loop over files in HDFS directory
Asked by Sal
I need to loop over all csv files in a Hadoop file system. I can list all of the files in an HDFS directory with
> hadoop fs -ls /path/to/directory
Found 2 items
drwxr-xr-x - hadoop hadoop 2 2016-10-12 16:20 /path/to/directory/tmp
-rwxr-xr-x 3 hadoop hadoop 4691945927 2016-10-12 19:37 /path/to/directory/myfile.csv
and I can loop over all files in a standard directory with
for filename in /path/to/another/directory/*.csv; do echo $filename; done
but how can I combine the two? I've tried
for filename in `hadoop fs -ls /path/to/directory | grep csv`; do echo $filename; done
but that gives me some nonsense like
Found
2
items
drwxr-xr-x
hadoop
hadoop
2
2016-10-12
....
Accepted answer by matesc
This should work:
for filename in `hadoop fs -ls /path/to/directory | awk '{print $NF}' | grep '\.csv$' | tr '\n' ' '`
do
    echo "$filename"
done
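
For reference, here is a minimal alternative sketch (not from the original answer, and using the same placeholder path as the question): stream the listing through a while-read loop so each path is handled one line at a time, which makes the tr '\n' ' ' step unnecessary.

# List the directory, keep only the last column (the full path),
# filter to .csv paths, then read them one per line.
hadoop fs -ls /path/to/directory | awk '{print $NF}' | grep '\.csv$' |
while read -r filename; do
    echo "$filename"
done

Note that awk '{print $NF}' keeps only the last whitespace-separated column of the listing, so paths containing spaces would be truncated with either approach.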