Created 01-12-2018 01:23 AM
My Sqoop import appears to be failing at the step where it copies the data from the HDFS INPATH to the Hive table it is being imported into. The last snippet of the output is pasted below. When I test the status code of the Sqoop command, it returns 1. I suspect the problem is in my Linux shell script, because I have other versions of the script that work fine. I have not been able to debug this even with the --verbose option turned on in Sqoop and after examining the log files from YARN. I am fairly sure the error has something to do with the data not being transferred correctly from the HDFS directory where the files are imported to the HDFS directory backing the managed table (name shown below), but I can't find any error messages that point me to a solution. Any ideas how to debug this?

18/01/12 00:54:53 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://surus-nameservice/data/groups/hdp_ground/sqoop/offload_scan_detail_staging_test' OVERWRITE INTO TABLE `hdp_ground.offload_scan_detail_staging_test`
18/01/12 00:54:53 INFO hive.HiveImport: Loading uploaded data into Hive
18/01/12 00:54:53 DEBUG hive.HiveImport: Using in-process Hive instance.
18/01/12 00:54:53 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:/usr/hdp/2.4.2.0-258/hive/lib/hive-common-1.2.1000.2.4.2.0-258.jar!/hive-log4j.properties
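One way to narrow this down is to check whether the data actually landed in the staging directory, and then run the LOAD DATA statement from the log by hand so any Hive-side error surfaces directly. A minimal sketch, using the paths from the log output above (the warehouse location used in the last command is an assumption; DESCRIBE FORMATTED shows the real one):

# Did the import write files to the staging directory?
# (If this is empty, the LOAD may already have moved them.)
hdfs dfs -ls hdfs://surus-nameservice/data/groups/hdp_ground/sqoop/offload_scan_detail_staging_test

# Re-run the load statement from the log by hand to surface any Hive-side error
hive -e "LOAD DATA INPATH 'hdfs://surus-nameservice/data/groups/hdp_ground/sqoop/offload_scan_detail_staging_test' OVERWRITE INTO TABLE hdp_ground.offload_scan_detail_staging_test"

# Find the table's backing directory and check whether the files arrived there
hive -e "DESCRIBE FORMATTED hdp_ground.offload_scan_detail_staging_test" | grep -i location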
Created 01-12-2018 04:46 AM
Can you attach the Sqoop log and the Hive service logs?
Created 01-12-2018 05:30 PM
Created 01-15-2018 09:21 AM
The Hive service logs to look at are hiveserver2.log and hivemetastore.log.
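On an HDP cluster these typically live under /var/log/hive (an assumption; the exact path depends on your hive-log4j configuration). A quick way to pull the relevant lines is to grep around the failure timestamp from your Sqoop output:

# Default HDP Hive log location assumed; adjust to your log4j settings
grep '2018-01-12 00:54' /var/log/hive/hiveserver2.log
grep '2018-01-12 00:54' /var/log/hive/hivemetastore.log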
I have a couple of things for you to check.
First, check whether the Hive table's directory still exists in HDFS.
As far as I know, you should not use the "--hive-overwrite" parameter together with "--delete-target-dir".
Remove "--hive-overwrite" and re-run your job, as in the sketch below.
After the existing table directory has been removed, the Sqoop job cannot reliably re-create the Hive table's directory in HDFS.
I think it is a bug.
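A minimal sketch of that check and re-run, assuming a default warehouse location and placeholder connection details (the JDBC URL, username, and source table name are hypothetical):

# Does the table's backing directory still exist?
# (Location is an assumption; DESCRIBE FORMATTED shows the real one.)
hdfs dfs -ls /apps/hive/warehouse/hdp_ground.db/offload_scan_detail_staging_test

# Re-run the import without --hive-overwrite
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SCAN_DETAIL \
  --target-dir /data/groups/hdp_ground/sqoop/offload_scan_detail_staging_test \
  --delete-target-dir \
  --hive-import \
  --hive-table hdp_ground.offload_scan_detail_staging_test \
  --verbose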
Second, check that the column types match between the Oracle table and the Hive table.
The warning "had to be cast to a less precise type in Hive" relates to how Sqoop maps datatypes from Oracle, MySQL, PostgreSQL, etc. into Hive types.
For example, an Oracle TIMESTAMP column may end up as a STRING column in Hive.
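To compare the declared types on both sides, and to pin down the Hive-side type for a mismatched column, Sqoop's --map-column-hive option can be used. A sketch assuming the same placeholder connection details as above and a hypothetical TIMESTAMP column named SCAN_TS:

# Compare the declared types on both sides
hive -e "DESCRIBE hdp_ground.offload_scan_detail_staging_test"
# On Oracle: SELECT column_name, data_type FROM user_tab_columns
#            WHERE table_name = 'SCAN_DETAIL';

# Map a mismatched column to an explicit Hive type
# (SCAN_TS is a hypothetical column name)
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SCAN_DETAIL \
  --hive-import \
  --hive-table hdp_ground.offload_scan_detail_staging_test \
  --map-column-hive SCAN_TS=STRING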