Support Questions
Find answers, ask questions, and share your expertise

sqoop import to hive with append mode

Solved

Super Collaborator

Hi:

Can I import into Hive with Sqoop in append mode? I am getting the error below. Also, the import of 2.8 GB took 2 hours and 35 minutes; is that normal?

16/06/08 21:23:40 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/06/08 21:23:40 DEBUG sqoop.Sqoop: Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode
Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode
	at org.apache.sqoop.tool.BaseSqoopTool.validateHiveOptions(BaseSqoopTool.java:1410)
	at org.apache.sqoop.tool.ImportTool.validateOptions(ImportTool.java:1130)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:138)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode

and my script is like this:

sqoop import -D oraoop.disabled=true --verbose \
--connect jdbc:oracle:thin:@hostname:2521/CIP_BATCH  \
--username=U029550 \
--password=Mayo2016 \
--hive-import \
--hive-table desercion_clientes_2 \
--hive-overwrite \
--query "select ID_INTERNO_PE,MI_FECHA_FIN_MES,COD_NRBE_EN,COD_LINEA,ID_GRP_PD,MI_NUM_TOT_AC_ACT,MI_NUM_AC_SUS,MI_SDO_AC_P,MI_NUM_AC_P,MI_DIA_AC_P,MI_INT_DEV_ACR_D,MI_INT_DEV_DEU_D,MI_COMIS_APL_D,MI_TOT_MOV_D,MI_TOT_MOV_H,MI_TOT_IMP_MOV_D,MI_TOT_IMP_MOV_H from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS AND COD_RL_PERS_AC = 01 AND COD_LINEA in ('01','03','04','05') AND COD_NRBE_EN = '3159' AND TRUNC(MI_FECHA_FIN_MES) >=TO_DATE('2010-01-01', 'YYYY-MM-DD')" \
--boundary-query "select min(ID_INTERNO_PE), max(ID_INTERNO_PE) from RDWC01.MI_CLTE_ECO_GEN" \
--incremental append \
--check-column MI_FECHA_FIN_MES \
--last-value $LAST_ROW \
--num-mappers 5 \
--split-by ID_INTERNO_PE \
--direct \
--fetch-size 10000 \
--target-dir /RSI/staging/tmp/desercion_clientes_2
1 ACCEPTED SOLUTION

Re: sqoop import to hive with append mode

As the error message indicates, you can't append to a Hive table yet. "--incremental append" can be used for HDFS-based output tied to a temporary Hive table; you can then insert into your main Hive table from that temporary table. Here is a blog article about the general approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
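
A minimal sketch of that staging approach. The staging table name desercion_clientes_stg, its target directory, and the elided column list are hypothetical examples; only the connect string, check column, split column, and main table name come from the script in the question:

```shell
# 1) Incremental append to plain HDFS files (no --hive-import, so
#    --incremental append is allowed):
sqoop import \
  --connect jdbc:oracle:thin:@hostname:2521/CIP_BATCH \
  --username U029550 -P \
  --table RDWC01.MI_CLTE_ECO_GEN \
  --incremental append \
  --check-column MI_FECHA_FIN_MES \
  --last-value "$LAST_ROW" \
  --split-by ID_INTERNO_PE \
  --target-dir /RSI/staging/tmp/desercion_clientes_stg

# 2) Expose the imported files through a temporary external table,
#    then insert from it into the main Hive table:
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS desercion_clientes_stg (...)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/RSI/staging/tmp/desercion_clientes_stg';
INSERT INTO TABLE desercion_clientes_2
  SELECT * FROM desercion_clientes_stg;
"
```

The column list in the CREATE EXTERNAL TABLE is left elided here; it has to match the columns selected in the Sqoop query. After the insert, the staging directory can be cleared so the next incremental run starts clean.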

As for the performance, I would suggest that you look at Oracle while your query is running to determine if there are any bottlenecks there. I've been able to use Sqoop to pull hundreds of millions of records from Oracle in under a couple of hours easily. I've always found the bottleneck is on the Oracle side. You can try tweaking the --num-mappers value and --fetch-size to see if that improves your speeds.


3 REPLIES

Re: sqoop import to hive with append mode

This approach definitely works and is much easier to follow.

