hive-drop-import-delims not removing newline while using HCatalog in Sqoop
Source: http://stackoverflow.com/questions/28076200/ (Stack Overflow, licensed under CC BY-SA 4.0; credit belongs to the original authors)
Asked by Suraj Nayak
When Apache Sqoop imports from Oracle into an HCatalog table, newline characters (\n) are not removed from the column data, even though the command includes the --hive-drop-import-delims option.
Sqoop command:
sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 --table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""
The data in Oracle column col4 looks like this (it contains control characters such as ^M):
<li>Details:^M
<ul>^M
<li>
Is the control character causing this problem?
Am I missing anything? Is there a workaround or solution for this problem?
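As a side note on the sample above: ^M is caret notation for the carriage return character (\r, byte 0x0D), so each line of the sample presumably ends in \r\n. The short Python sketch below shows how the notation maps to the actual character; caret_to_char is a hypothetical helper name used only for illustration and is not part of Sqoop or Hive.

```python
def caret_to_char(caret: str) -> str:
    """Convert caret notation such as '^M' to the control character it denotes."""
    if len(caret) != 2 or caret[0] != "^":
        raise ValueError(f"not caret notation: {caret!r}")
    # Caret notation flips bit 6 of the letter's ASCII code: 'M' (0x4D) -> 0x0D.
    return chr(ord(caret[1]) ^ 0x40)


print(repr(caret_to_char("^M")))  # '\r'
print(repr(caret_to_char("^A")))  # '\x01', Hive's default column delimiter
```

Both \r and \x01 are among the characters that --hive-drop-import-delims is documented to remove.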
Answered by Suraj Nayak
Use the --map-column-java option to explicitly state that the column is of type String. Then --hive-drop-import-delims works as expected (it removes \n from the data).
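The Sqoop documentation describes the delimiter options as acting on string fields. One plausible reading of this fix (an assumption, since the thread does not state col4's Oracle type) is that col4 was not being mapped to a Java String by default, so the cleanup was skipped for it. The Python sketch below only illustrates that idea; drop_hive_delims is a hypothetical helper, not Sqoop's actual code.

```python
HIVE_DEFAULT_DELIMS = ("\n", "\r", "\x01")  # delimiters named in the Sqoop docs


def drop_hive_delims(value):
    """Strip Hive's default delimiters from str values; pass anything else through."""
    if not isinstance(value, str):
        return value  # non-string field: the cleanup is not applied
    for delim in HIVE_DEFAULT_DELIMS:
        value = value.replace(delim, "")
    return value


as_string = "<li>Details:\r\n<ul>\r\n"
as_other = as_string.encode("ascii")  # stand-in for a column not mapped to String

print(repr(drop_hive_delims(as_string)))  # '<li>Details:<ul>'
print(repr(drop_hive_delims(as_other)))   # b'<li>Details:\r\n<ul>\r\n'
```

In this toy model, forcing the value to be a string is what lets the cleanup run, which mirrors the effect of --map-column-java col4=String reported in this answer.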
Changed Sqoop command:
sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 --table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
Answered by bunty
sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""
You can try --hive-delims-replacement "anything". It replaces every \n, \r, and \01 character in string fields with the string you provide (in this case, the literal string "anything").
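To make the effect concrete, here is a rough Python sketch of what such a replacement does to a value like the one in the question. It assumes the three characters listed in the Sqoop User Guide (\n, \r and \01); replace_hive_delims is a hypothetical name, not a Sqoop function.

```python
import re

# Character class matching newline, carriage return and \01 (Ctrl-A).
_HIVE_DELIMS = re.compile("[\n\r\x01]")


def replace_hive_delims(value: str, replacement: str) -> str:
    """Replace each Hive default delimiter character with the given string."""
    # A function is used as the replacement so backslashes in it are taken literally.
    return _HIVE_DELIMS.sub(lambda _match: replacement, value)


print(repr(replace_hive_delims("<li>Details:\r\n<ul>\r\n", " ")))
# '<li>Details:  <ul>  '
print(repr(replace_hive_delims("a\x01b", "anything")))
# 'aanythingb'
```

Note that in this sketch a \r\n pair yields two replacements, one per character, so a single space as the replacement string produces two spaces.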
Answered by Lynn Han
From the official Sqoop User Guide: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
Hive will have problems using Sqoop-imported data if your database's rows contain string fields that have Hive's default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive's default delimiters and should not be used if different delimiters are specified.
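Why this matters can be seen with a toy example: in a delimited text file, every \n is read as the end of a row, so one source row with an embedded newline turns into two broken rows. The Python snippet below is only an illustration with made-up values, assuming the default \01 column delimiter and \n row delimiter; it does not use Hive or Sqoop.

```python
FIELD_DELIM = "\x01"  # Hive's default column delimiter
ROW_DELIM = "\n"      # Hive's default row delimiter


def to_text_row(fields):
    """Serialize one record the way a delimited text file would store it."""
    return FIELD_DELIM.join(fields) + ROW_DELIM


def read_rows(data):
    """Split file content back into rows and columns, as a text reader would."""
    return [line.split(FIELD_DELIM) for line in data.split(ROW_DELIM) if line]


record = ["1", "a", "b", "<li>Details:\n<ul>"]  # made-up row; col4 holds a newline

print(len(read_rows(to_text_row(record))))  # 2 rows instead of 1

# Dropping the delimiter characters from each field restores a single row.
cleaned = [f.replace("\n", "").replace("\r", "").replace("\x01", "") for f in record]
print(len(read_rows(to_text_row(cleaned))))  # 1
```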