Created 07-18-2017 09:51 AM
Dear all, perhaps my understanding is incorrect, but I'm trying to use sqoop import to reload a table that already exists in Hive.
My expectation is that, besides overwriting the actual data, the table will also be dropped and recreated. A table's structure can of course change, and since Sqoop cannot detect changes unless you use the incremental option, it should assume this can happen and therefore drop and recreate the table with the new structure. A delimiter used to separate records or attributes can also change.
Currently, however, the table is not recreated. The DDL stays untouched, and the table comment recording when the data was loaded is not updated either. Is this a bug, or am I missing a command in my sqoop action?
My sqoop action looks like this:
<sqoop xmlns="uri:oozie:sqoop-action:0.4">
    <job-tracker>[job_tracker]</job-tracker>
    <name-node>[name_node]</name-node>
    <configuration>
        <property>
            <name>oozie.launcher.mapred.job.queue.name</name>
            <value>default</value>
        </property>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>default</value>
        </property>
    </configuration>
    <arg>import</arg>
    <arg>--connect</arg>
    <arg>jdbc:sap://[server]:30015/?currentschema=[schema]</arg>
    <arg>--username</arg>
    <arg>[username]</arg>
    <arg>--password-file</arg>
    <arg>[pwd_location]</arg>
    <arg>--driver</arg>
    <arg>com.sap.db.jdbc.Driver</arg>
    <arg>--query</arg>
    <arg>SELECT * FROM "ZDLTABLES" WHERE $CONDITIONS</arg>
    <arg>--hive-import</arg>
    <arg>--hive-database</arg>
    <arg>[hive_schema]</arg>
    <arg>--hive-table</arg>
    <arg>zdltables</arg>
    <arg>--hive-delims-replacement</arg>
    <arg>\040</arg>
    <arg>--fields-terminated-by</arg>
    <arg>\037</arg>
    <arg>--hive-overwrite</arg>
    <arg>--compress</arg>
    <arg>--num-mappers</arg>
    <arg>1</arg>
</sqoop>
Created 07-18-2017 04:16 PM
If you have multiple Hive installations or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory.
Refer to this guide as well:
https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive
Created 07-19-2017 08:35 AM
Thanks rbiswas, but I'm not sure how this helps. Sqoop creates the table fine the first time. It just doesn't recreate it with the new table definition and the updated load timestamp (stored as a table comment).
My question is whether Sqoop should recreate the table when using the --hive-overwrite option, or whether it is supposed to overwrite only the data.
Created 07-19-2017 04:37 PM
This is what will happen in the background, hence my question about the Hive home.
If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced. After your data is imported into HDFS (or this step is omitted), Sqoop will generate a Hive script containing a CREATE TABLE operation defining your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory.
The script will be executed by calling the installed copy of hive on the machine where Sqoop is run. If you have multiple Hive installations, or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory. Sqoop will use $HIVE_HOME/bin/hive from there.
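For what it's worth, the script Sqoop 1.x generates uses CREATE TABLE IF NOT EXISTS, so an existing table's definition is never replaced; only the LOAD DATA statement honors --hive-overwrite. A rough sketch of the generated script (column names, types, and paths here are illustrative, not taken from the thread):

```sql
-- Sketch of the Hive script Sqoop generates for --hive-import.
-- IF NOT EXISTS means an existing table's DDL and comment stay untouched;
-- only the load below is affected by --hive-overwrite.
CREATE TABLE IF NOT EXISTS `hive_schema`.`zdltables` (
  `tabname` STRING,
  `loaddate` STRING
)
COMMENT 'Imported by sqoop on 2017-07-18'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\037'
STORED AS TEXTFILE;

LOAD DATA INPATH 'hdfs://[name_node]/user/[user]/zdltables'
OVERWRITE INTO TABLE `hive_schema`.`zdltables`;
```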
Thanks
Created 07-21-2017 07:33 AM
So if I have multiple Hive installations (not sure how to check, but I can ask someone who should know), then by adding --hive-home to my sqoop script the table should be overwritten? That is, with the new delimiter/columns and a comment stating when the table was created?
I ask because the documentation only mentions that a CREATE TABLE operation is generated; I'm missing any statement that the table will be dropped first if it exists.
Created 07-12-2018 01:23 PM
This is definitely a bug. It seems the table definition doesn't change with the --hive-overwrite flag after all.
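A workaround not mentioned in this thread is to drop the Hive table in a separate step before the sqoop action, so that the CREATE TABLE Sqoop generates actually takes effect. A sketch as an Oozie hive action placed before the sqoop action (placeholders as in the workflow above; drop_zdltables.hql is a hypothetical one-line script):

```xml
<!-- Workaround sketch: drop the table first so sqoop's generated
     CREATE TABLE recreates it with the new definition.
     drop_zdltables.hql contains:
       DROP TABLE IF EXISTS [hive_schema].zdltables;  -->
<hive xmlns="uri:oozie:hive-action:0.5">
    <job-tracker>[job_tracker]</job-tracker>
    <name-node>[name_node]</name-node>
    <script>drop_zdltables.hql</script>
</hive>
```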