Sqoop HCatalog --hcatalog-storage-stanza Not Supporting ORC file format

I want to import data from RDBMS systems into HDFS in ORC format with Snappy compression. Following the instructions given in this forum, I am using the Sqoop HCatalog approach to do this. However, --hcatalog-storage-stanza does not seem to accept ORC; only the RCFile option works. Kindly share your inputs on how to move forward with Sqoop HCatalog. My observations and additional details are below.

Hadoop Distribution: HDP 2.6.5.165-3

Running Sqoop version: 1.4.6.2.6.5.165-3

The following Sqoop import command is configured in Oozie:

<command>import  -Dhadoop.security.credential.provider.path=jceks://hdfs/appz/xyz/credential/centralstore.jceks --connect jdbc:oracle:thin:@oxxx21:1621/oxxx21 --password-alias hdpedmingest@oram21 --username hdpedmingest --compression-codec org.apache.hadoop.io.compress.SnappyCodec --table CO0101.ADDRESS --validate --escaped-by \\ --null-string \\N --null-non-string \\N -m 1 --hcatalog-database srcpub_oracle_oxxx21_co0101 --hcatalog-table address_orc --drop-and-create-hcatalog-table --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' </command>

 

Error:

2020-05-17 15:27:43,825 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: AS
2020-05-17 15:27:43,825 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: ORC
2020-05-17 15:27:43,825 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: tblproperties
2020-05-17 15:27:43,825 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: ("orc.compress"="SNAPPY")'
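These stray tokens are consistent with how Oozie handles the <command> element: according to the Oozie Sqoop action documentation, the command string is split on spaces into separate arguments and shell quoting is not honored, so the single-quoted stanza never reaches Sqoop as one value. The Python sketch below is illustrative only: command_tail is a hypothetical name, and a plain split on spaces is an approximation of Oozie's behavior, contrasted here with shell-style tokenization via shlex.split.

```python
import shlex

# The storage-stanza portion of the <command> text from the question.
command_tail = (
    "--hcatalog-storage-stanza "
    "'stored as orc tblproperties (\"orc.compress\"=\"SNAPPY\")'"
)

# Approximation of Oozie's <command> handling: split on every space and
# keep the single quotes as ordinary characters.
oozie_style = command_tail.split(" ")

# What a POSIX shell would do with the same text: the single-quoted
# stanza stays together as one argument.
shell_style = shlex.split(command_tail)

print(oozie_style)
print(shell_style)
```

With the space split, Sqoop would receive 'stored as the stanza value and be left with as, orc, tblproperties and ("orc.compress"="SNAPPY")' as extra arguments, the same tokens (apart from letter case) that the log reports as unrecognized. Passing the stanza as a single <arg> element avoids the split.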

 

If I remove --hcatalog-storage-stanza, the generated DDL uses "stored as rcfile" and the job fails with the error below:

2020-05-17 14:26:35,731 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - HCatalog Create table statement:

drop table `srcpub_oracle_oram21_co0101`.`address_orc`;
create table `srcpub_oracle_oram21_co0101`.`address_orc` (
`cl_id` varchar(9),
`cl_ad_sqn` decimal(10),
`ad_lst_cg_dt` string,
`cty_ad` varchar(31),
`state_cd` varchar(2),
`zip` varchar(9),
`extnd_zip_cd` varchar(4),
`str_one_ad` varchar(31),
`str_two_ad` varchar(31),
`str_thre_ad` varchar(31),
`cntry_cd` varchar(3),
`us_cnty_cd` varchar(3),
`uclmd_mail_dt` string,
`ad_ovrd_ir` varchar(1),
`cntc_pt_tp_cd` varchar(3),
`crea_logn_id_cd` varchar(8),
`crea_ts` string,
`lst_mod_logn_id_cd` varchar(8),
`lst_mod_ts` string)
stored as rcfile

2020-05-17 15:36:35,731 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1589739995731
2020-05-17 15:36:35,735 [main] ERROR org.apache.sqoop.Sqoop - Got exception running Sqoop: java.lang.NullPointerException

Re: Sqoop HCatalog --hcatalog-storage-stanza Not Supporting ORC file format

Resolved the issue by passing the Sqoop options as individual <arg> elements in the Oozie Sqoop action instead of a single <command>, and by also setting --hcatalog-home, as shown below:

<action name="run-sqoop" cred="hcat_credentials">
  <sqoop xmlns="uri:oozie:sqoop-action:0.4">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>/apps/dif/config/hive-site.xml</job-xml>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
      <property>
        <name>oozie.launcher.mapred.job.queue.name</name>
        <value>${oozieQueueName}</value>
      </property>
      <property>
        <name>hive.execution.engine</name>
        <value>mr</value>
      </property>
    </configuration>
    <arg>import</arg>
    <arg>-Dhadoop.security.credential.provider.path=${jceks_locn}</arg>
    <arg>--connect</arg>
    <arg>${jdbc_url}</arg>
    <arg>--username</arg>
    <arg>${user_name}</arg>
    <arg>--password-alias</arg>
    <arg>${password_alias}</arg>
    <arg>--compress</arg>
    <arg>--compression-codec</arg>
    <arg>org.apache.hadoop.io.compress.SnappyCodec</arg>
    <arg>--table</arg>
    <arg>${table_name}</arg>
    <arg>--validate</arg>
    <arg>--escaped-by</arg>
    <arg>\\</arg>
    <arg>--null-string</arg>
    <arg>\\N</arg>
    <arg>--null-non-string</arg>
    <arg>\\N</arg>
    <arg>--map-column-hive</arg>
    <arg>${map_columns}</arg>
    <arg>--hive-delims-replacement</arg>
    <arg>" "</arg>
    <arg>--hcatalog-home</arg>
    <arg>/usr/hdp/current/hive-webhcat</arg>
    <arg>--hcatalog-database</arg>
    <arg>${target_db_name}</arg>
    <arg>--hcatalog-table</arg>
    <arg>${target_table_name}</arg>
    <arg>--drop-and-create-hcatalog-table</arg>
    <arg>--hcatalog-storage-stanza</arg>
    <arg>"STORED AS ORC tblproperties ('orc.compress'='SNAPPY')"</arg>
    <arg>-m</arg>
    <arg>1</arg>
    <arg>--skip-dist-cache</arg>
    <file>/apps/dif/config/hive-site.xml</file>
  </sqoop>
  <ok to="end"/>
  <error to="fail"/>
</action>
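Because each <arg> element is handed to Sqoop as exactly one argument, the workflow above keeps the storage stanza intact even though it contains spaces. If you maintain many of these actions, generating the <arg> lines can reduce hand-editing mistakes. The sketch below uses a hypothetical helper (sqoop_args_to_oozie_xml is not part of Oozie or Sqoop) built on the standard-library xml.etree.ElementTree. It covers only a subset of the arguments above, and it does not wrap the stanza in extra double quotes; that is an assumption to verify on your cluster, since the workflow above passes the stanza with surrounding double quotes.

```python
import xml.etree.ElementTree as ET


def sqoop_args_to_oozie_xml(args):
    """Render a Sqoop argument list as Oozie <arg> elements, one per line.

    Hypothetical helper for illustration: it emits only the <arg> lines,
    not the surrounding <sqoop> action. Every list item becomes exactly
    one <arg>, so values that contain spaces are never split.
    """
    lines = []
    for value in args:
        element = ET.Element("arg")
        element.text = value  # ElementTree escapes &, < and > in text
        lines.append(ET.tostring(element, encoding="unicode"))
    return "\n".join(lines)


# Subset of the arguments from the reply; ${...} placeholders are Oozie
# parameters resolved from the job properties at submission time.
sqoop_args = [
    "import",
    "--table", "${table_name}",
    "--hcatalog-database", "${target_db_name}",
    "--hcatalog-table", "${target_table_name}",
    "--drop-and-create-hcatalog-table",
    "--hcatalog-storage-stanza",
    "STORED AS ORC tblproperties ('orc.compress'='SNAPPY')",
    "-m", "1",
]

xml_fragment = sqoop_args_to_oozie_xml(sqoop_args)
print(xml_fragment)
```

The printed lines can be pasted between <sqoop> and </sqoop> after the <configuration> block. Keeping the arguments as a list, rather than one long string, is what guarantees the one-value-per-<arg> mapping.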