Oracle: --hive-drop-import-delims does not remove newlines when Sqoop is used with HCatalog

Question

Asked by Suraj Nayak

When I run an Apache Sqoop import from Oracle through HCatalog, newline characters (\n) are not removed from the column data, even though the command includes the --hive-drop-import-delims option.

Sqoop command:

    sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
    --username user123 --password passwd123 --table SCHEMA.TBL_2 \
    --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
    --split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
    --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
    --null-string ""

The data in Oracle column col4 looks like this (it contains control characters such as ^M):

<li>Details:^M
    <ul>^M
        <li>

Are the control characters causing this problem?

Am I missing something? Is there a workaround or solution for this problem?
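For reference, ^M in the sample above is caret notation for a carriage return (\r, byte 0x0D), so each affected line ends in \r\n. The sketch below shows one way to make such characters visible in a text sample. It assumes a `cat` that supports the `-v` flag (GNU and BSD versions do), and the sample string is made up for illustration.

```shell
# Build a sample line that ends in carriage return + newline (\r\n),
# similar to the col4 data shown in the question.
sample_line() {
  printf '<li>Details:\r\n'
}

# cat -v prints non-printing characters in caret notation,
# so the carriage return shows up as ^M.
sample_line | cat -v
# prints: <li>Details:^M
```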

Answer 1

Answered by Suraj Nayak

Use the --map-column-java option to explicitly declare the column as type String. With that mapping in place, --hive-drop-import-delims works as expected and removes \n from the data.

Changed Sqoop command:

    sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
    --username user123 --password passwd123 --table SCHEMA.TBL_2 \
    --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
    --split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
    --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
    --null-string "" --map-column-java col4=String
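To sanity-check what a value should look like after the import, the effect of dropping \n, \r, and \01 from a string can be mimicked in plain shell. This is only a minimal sketch for comparing against sample data, not Sqoop's own implementation; the function name drop_hive_delims is hypothetical.

```shell
# Illustration only: mimic the effect of --hive-drop-import-delims
# on a single string value. This is not Sqoop's code.
drop_hive_delims() {
  # Delete newline, carriage return, and \001 (Ctrl-A),
  # which are Hive's default row and column delimiters.
  printf '%s' "$1" | tr -d '\n\r\001'
}

# Sample value shaped like the col4 data from the question.
sample=$(printf '<li>Details:\r\n    <ul>\r\n        <li>')

# The result is a single line with no \r or \n left in it.
drop_hive_delims "$sample"
```

Comparing this output with the value that lands in the HCatalog table is a quick way to confirm that the delimiters were actually dropped.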

Answer 2

Answered by bunty

    sqoop import \
    --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
    --username 123 \
    --password 123 \
    --table SCHEMA.TBL_2 \
    --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
    --split-by SOME_ID --columns col1,col2,col3,col4 \
    --hive-delims-replacement "anything" \
    --outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
    --null-string ""

You can try --hive-delims-replacement "anything". It replaces every \n, \r, and \01 character in string fields with the string you provide (in this case, the string "anything").
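The replacement behavior can be illustrated the same way in plain shell. The sketch below is an approximation for checking sample values, not how Sqoop implements the option; the function name replace_hive_delims and the REP environment variable are hypothetical.

```shell
# Illustration only: mimic the effect of --hive-delims-replacement
# on a single string value. This is not Sqoop's code.
# $1 = input value, $2 = replacement string
replace_hive_delims() {
  # Step 1: map \n and \r onto \001 so only one delimiter remains.
  # Step 2: split on \001 in awk and join the pieces with the replacement.
  # The replacement is passed through the environment so that awk does
  # not interpret backslashes in it.
  printf '%s' "$1" | tr '\n\r' '\001\001' |
    REP="$2" awk 'BEGIN { FS = "\001" }
      {
        out = $1
        for (i = 2; i <= NF; i++) out = out ENVIRON["REP"] $i
        printf "%s", out
      }'
}

# Each delimiter is replaced individually, so \r\n yields two replacements.
replace_hive_delims "$(printf 'a\r\nb\001c')" ' '
```

Using a single space as the replacement keeps words from running together, which plain dropping of the delimiters can cause.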

Answer 3

Answered by Lynn Han

From the official Sqoop User Guide: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Hive will have problems using Sqoop-imported data if your database's rows contain string fields that have Hive's default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive's default delimiters and should not be used if different delimiters are specified.

