Sqoop --hive-overwrite doesn't recreate table definition

Contributor

Dear all, perhaps my understanding is incorrect, but I'm trying to use sqoop import to reload a table that already exists in Hive.

My expectation is that besides overwriting the actual data, the table itself would also be dropped and recreated. A table structure can of course change, and since Sqoop cannot detect such changes unless you use the incremental option, it should assume this can happen and drop and recreate the table with the new structure. A change in, for instance, the delimiter used to separate records or attributes can also happen.

Currently the table is not recreated, though. The DDL stays untouched, and the table comment recording when the data was loaded is not updated either. Is this a bug, or am I missing a command in my sqoop action?

My sqoop action looks like this:

<sqoop xmlns="uri:oozie:sqoop-action:0.4">
  <arg>import</arg>
  <arg>--connect</arg>
  <arg>jdbc:sap://[server]:30015/?currentschema=[schema]</arg>
  <arg>--username</arg>
  <arg>[username]</arg>
  <arg>--password-file</arg>
  <arg>[pwd_location]</arg>
  <arg>--driver</arg>
  <arg>com.sap.db.jdbc.Driver</arg>
  <arg>--query</arg>
  <arg>SELECT * FROM "ZDLTABLES" WHERE $CONDITIONS</arg>
  <arg>--hive-import</arg>
  <arg>--hive-database</arg>
  <arg>[hive_schema]</arg>
  <arg>--hive-table</arg>
  <arg>zdltables</arg>
  <arg>--hive-delims-replacement</arg>
  <arg>\040</arg>
  <arg>--fields-terminated-by</arg>
  <arg>\037</arg>
  <arg>--hive-overwrite</arg>
  <arg>--compress</arg>
  <arg>--num-mappers</arg>
  <arg>1</arg>
  <name-node>[name_node]</name-node>
  <job-tracker>[job_tracker]</job-tracker>
  <configuration>
    <property xmlns="">
      <name>oozie.launcher.mapred.job.queue.name</name>
      <value>default</value>
      <source>programatically</source>
    </property>
    <property xmlns="">
      <name>mapreduce.job.queuename</name>
      <value>default</value>
      <source>programatically</source>
    </property>
  </configuration>
</sqoop>

@Rene Sluiter

If you have multiple Hive installations, or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory.

Refer to this guide as well:

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive

Contributor

Thanks rbiswas, but I'm not sure how this helps. Sqoop creates the table fine the first time. It just doesn't recreate it with the new table definitions and an updated timestamp for when the data was loaded (as a table comment).

My question is whether sqoop should recreate the table when using the --hive-overwrite option, or whether it is supposed to overwrite only the data.

@Rene Sluiter

This is what will happen in the background, hence my question about hive home.

If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced. After your data is imported into HDFS or this step is omitted, Sqoop will generate a Hive script containing a CREATE TABLE operation defining your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory.

The script will be executed by calling the installed copy of hive on the machine where Sqoop is run. If you have multiple Hive installations, or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory. Sqoop will use $HIVE_HOME/bin/hive from here.
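As a sketch of what that generated script looks like (the column names and path below are placeholders, not taken from this thread), note that Sqoop emits CREATE TABLE IF NOT EXISTS, which is a no-op when the table already exists — this is why the existing DDL and table comment stay untouched even with --hive-overwrite:

```sql
-- Hedged sketch of the Hive script Sqoop generates (placeholder names).
-- The IF NOT EXISTS clause means an existing table's definition is kept.
CREATE TABLE IF NOT EXISTS `[hive_schema]`.`zdltables` (
  `col1` STRING,
  `col2` INT
)
COMMENT 'Imported by sqoop on ...'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\037'
STORED AS TEXTFILE;

-- With --hive-overwrite, the load itself does overwrite the data:
LOAD DATA INPATH 'hdfs://[name_node]/user/[username]/zdltables'
OVERWRITE INTO TABLE `[hive_schema]`.`zdltables`;
```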

Thanks

Contributor

So if I have multiple Hive installations (not sure how to check, but I can ask someone who should know), then by adding --hive-home to my sqoop script the table should be overwritten? So with the new delimiter/columns, and a comment stating when the table was created?

Because the documentation only mentions that a CREATE TABLE operation is generated; I'm missing any reference there saying that the table will be dropped first if it exists.
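One common workaround (not mentioned in the thread, sketched here with placeholder names) is to drop the table explicitly before the import, for example with a hive action placed ahead of the sqoop action in the Oozie workflow:

```xml
<!-- Hedged sketch: run a DROP TABLE before the sqoop action so that
     Sqoop's generated CREATE TABLE IF NOT EXISTS actually recreates
     the definition. Bracketed values are placeholders. -->
<hive xmlns="uri:oozie:hive-action:0.2">
  <job-tracker>[job_tracker]</job-tracker>
  <name-node>[name_node]</name-node>
  <script>drop_zdltables.q</script>
</hive>
```

where drop_zdltables.q would contain something like `DROP TABLE IF EXISTS [hive_schema].zdltables;`.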

New Contributor

This is definitely a bug. It seems the table definition doesn't change with the --hive-overwrite flag after all.
