Java: Copy files from S3 to HDFS using distcp or s3distcp

Warning: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original question: http://stackoverflow.com/questions/22678748/

Date: 2020-08-13 17:14:25 | Source: igfitidea

Copy files from S3 to HDFS using distcp or s3distcp

Tags: java, hadoop, amazon-web-services, amazon-s3

Asked by scalauser

I am trying to copy files from S3 to HDFS using the following command:

hadoop distcp s3n://bucketname/filename hdfs://namenodeip/directory

However, this does not work; I get the following error:

ERROR tools.DistCp: Exception encountered 
java.lang.IllegalArgumentException: Invalid hostname in URI

I have tried adding the S3 keys in the Hadoop conf.xml, but that does not work either. Please help me with the appropriate step-by-step procedure to copy files from S3 to HDFS.
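
For reference, the s3n keys are normally set as properties in core-site.xml. A minimal sketch, assuming the standard fs.s3n.* property names documented on the Hadoop AmazonS3 wiki (the key values below are placeholders):

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>  <!-- placeholder -->
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>  <!-- placeholder -->
</property>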

Thanks in advance.

Accepted answer by scalauser

The command should be like this:

hadoop distcp s3n://bucketname/directoryname/test.csv /user/myuser/mydirectory/

This will copy the test.csv file from S3 into the HDFS directory /user/myuser/mydirectory/. Note that the destination is given as a plain HDFS path rather than a full hdfs://namenodeip URI, so the cluster's default file system is used. Here the S3 file system is used in native mode (s3n). More details can be found at http://wiki.apache.org/hadoop/AmazonS3
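
If the keys are not set in the configuration, they can also be passed as Hadoop properties on the command line. A sketch using the same standard fs.s3n.* property names (the key values are placeholders):

hadoop distcp \
  -Dfs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY_ID \
  -Dfs.s3n.awsSecretAccessKey=YOUR_SECRET_ACCESS_KEY \
  s3n://bucketname/directoryname/test.csv /user/myuser/mydirectory/

The result can then be verified with hadoop fs -ls /user/myuser/mydirectory/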

Answer by Sathish

This copies log files stored in an Amazon S3 bucket into HDFS. Here the --srcPattern option is used to limit the copied data to the daemon logs.

Linux, UNIX, and Mac OS X users:

./elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,\
--dest,hdfs:///output,\
--srcPattern,.*daemons.*-hadoop-.*'

Windows users:

ruby elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,--dest,hdfs:///output,--srcPattern,.*daemons.*-hadoop-.*'

Please check this link for more:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
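
Note that the elastic-mapreduce Ruby client shown above has since been deprecated. On EMR release 4.x and later, an equivalent s3-dist-cp step can be submitted through the AWS CLI instead; a sketch reusing the cluster id, paths, and pattern from the example above:

aws emr add-steps --cluster-id j-3GY8JC4179IOJ \
  --steps 'Type=CUSTOM_JAR,Name=S3DistCp,Jar=command-runner.jar,Args=[s3-dist-cp,--src=s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,--dest=hdfs:///output,--srcPattern=.*daemons.*-hadoop-.*]'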

Hope this helps!