Support Questions

Find answers, ask questions, and share your expertise

sqoop import to hive with append mode

avatar
Master Collaborator

hi:

can i import with sqoop to hive with append?? i am receiving this error, also 2.8GB it took 2 hour and 35 minutes its normal??

16/06/08 21:23:40 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/06/08 21:23:40 DEBUG sqoop.Sqoop: Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode
Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode
	at org.apache.sqoop.tool.BaseSqoopTool.validateHiveOptions(BaseSqoopTool.java:1410)
	at org.apache.sqoop.tool.ImportTool.validateOptions(ImportTool.java:1130)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:138)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Append mode for hive imports is not  yet supported. Please remove the parameter --append-mode

and mi script is like this

sqoop import -D oraoop.disabled=true --verbose \
--connect jdbc:oracle:thin:@hostname:2521/CIP_BATCH  \
--username=U029550 \
--password=Mayo2016 \
--hive-import \
--hive-table desercion_clientes_2 \
--hive-overwrite \
--query "select ID_INTERNO_PE,MI_FECHA_FIN_MES,COD_NRBE_EN,COD_LINEA,ID_GRP_PD,MI_NUM_TOT_AC_ACT,MI_NUM_AC_SUS,MI_SDO_AC_P,MI_NUM_AC_P,MI_DIA_AC_P,MI_INT_DEV_ACR_D,MI_INT_DEV_DEU_D,MI_COMIS_APL_D,MI_TOT_MOV_D,MI_TOT_MOV_H,MI_TOT_IMP_MOV_D,MI_TOT_IMP_MOV_H from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS AND COD_RL_PERS_AC = 01 AND COD_LINEA in ('01','03','04','05') AND COD_NRBE_EN = '3159' AND TRUNC(MI_FECHA_FIN_MES) >=TO_DATE('2010-01-01', 'YYYY-MM-DD')" \
--boundary-query "select min(ID_INTERNO_PE), max(ID_INTERNO_PE) from RDWC01.MI_CLTE_ECO_GEN" \
--incremental append \
--check-column MI_FECHA_FIN_MES \
--last-value $LAST_ROW \
--num-mappers 5 \
--split-by ID_INTERNO_PE \
--direct \
--fetch-size 10000 \
--target-dir /RSI/staging/tmp/desercion_clientes_2
1 ACCEPTED SOLUTION

avatar
Super Guru

As the error message indicates, you can't append to a Hive table yet. The "--incremental append" can be used for HDFS based output tied to a temporary Hive table. You can then insert into your main Hive table from the temporary table. Here is a blog article about the general approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/

As for the performance, I would suggest that you look at Oracle while your query is running to determine if there are any bottlenecks there. I've been able to use Sqoop to pull hundreds of millions of records from Oracle in under a couple of hours easily. I've always found the bottleneck is on the Oracle side. You can try tweaking the --num-mappers value and --fetch-size to see if that improves your speeds.

View solution in original post

3 REPLIES 3

avatar

avatar
Super Guru

This approach definitely works and is much easier to follow.

avatar
Super Guru

As the error message indicates, you can't append to a Hive table yet. The "--incremental append" can be used for HDFS based output tied to a temporary Hive table. You can then insert into your main Hive table from the temporary table. Here is a blog article about the general approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/

As for the performance, I would suggest that you look at Oracle while your query is running to determine if there are any bottlenecks there. I've been able to use Sqoop to pull hundreds of millions of records from Oracle in under a couple of hours easily. I've always found the bottleneck is on the Oracle side. You can try tweaking the --num-mappers value and --fetch-size to see if that improves your speeds.