<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Best practices to work Sqoop, HDFS and Hive in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133619#M96286</link>
    <description>&lt;P&gt;Here's a good article for that &lt;A href="https://community.hortonworks.com/articles/85165/scheduled-incremental-ingestion-of-ms-sql-data-to.html" target="_blank"&gt;https://community.hortonworks.com/articles/85165/scheduled-incremental-ingestion-of-ms-sql-data-to.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 02 Mar 2017 20:58:50 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2017-03-02T20:58:50Z</dc:date>
    <item>
      <title>Best practices to work Sqoop, HDFS and Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133618#M96285</link>
      <description>&lt;P&gt;I have to use &lt;CODE&gt;sqoop&lt;/CODE&gt; to import all tables from a MySQL database into &lt;CODE&gt;hdfs&lt;/CODE&gt; and into &lt;CODE&gt;external tables&lt;/CODE&gt; in &lt;CODE&gt;hive&lt;/CODE&gt; (no filters, with the same structure).&lt;/P&gt;&lt;P&gt;The import should bring in:&lt;/P&gt;&lt;BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;New data for existing tables&lt;/LI&gt;&lt;LI&gt;Updated data for existing tables (using only the id column)&lt;/LI&gt;&lt;LI&gt;New tables created in MySQL (and create the corresponding external table in Hive)&lt;/LI&gt;&lt;/UL&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Then I want to create a &lt;CODE&gt;sqoop job&lt;/CODE&gt; to do it all automatically.&lt;/P&gt;&lt;P&gt;(I have a &lt;CODE&gt;mysql&lt;/CODE&gt; database with approximately 60 tables, and with each new client going into production a new table is created, so I need &lt;CODE&gt;sqoop&lt;/CODE&gt; to work as automatically as possible.)&lt;/P&gt;&lt;P&gt;The first command I executed to import all the tables was:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;sqoop import-all-tables \
  --connect jdbc:mysql://IP/db_name \
  --username user \
  --password pass \
  --warehouse-dir /user/hdfs/db_name \
  -m 1&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here, &lt;A href="https://issues.apache.org/jira/browse/SQOOP-816"&gt;Sqoop and support for external Hive tables&lt;/A&gt; says that support was added for creating external tables in &lt;CODE&gt;hive&lt;/CODE&gt;, but I could not find documentation or examples for the commands mentioned there.&lt;/P&gt;&lt;BLOCKQUOTE&gt;
&lt;P&gt;What is the best practice for a &lt;CODE&gt;sqoop&lt;/CODE&gt; setup that picks up all the updates from a &lt;CODE&gt;mysql&lt;/CODE&gt; database and passes them on to &lt;CODE&gt;hdfs&lt;/CODE&gt; and &lt;CODE&gt;hive&lt;/CODE&gt;?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Any ideas would be welcome.&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 17:11:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133618#M96285</guid>
      <dc:creator>sola_carol</dc:creator>
      <dc:date>2017-03-02T17:11:28Z</dc:date>
    </item>
    <item>
      <title>Re: Best practices to work Sqoop, HDFS and Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133619#M96286</link>
      <description>&lt;P&gt;Here's a good article for that &lt;A href="https://community.hortonworks.com/articles/85165/scheduled-incremental-ingestion-of-ms-sql-data-to.html" target="_blank"&gt;https://community.hortonworks.com/articles/85165/scheduled-incremental-ingestion-of-ms-sql-data-to.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2017 20:58:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133619#M96286</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2017-03-02T20:58:50Z</dc:date>
    </item>
    <item>
      <title>Re: Best practices to work Sqoop, HDFS and Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133620#M96287</link>
      <description>&lt;P&gt;The patch to create Hive external tables from Sqoop is still unresolved:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SQOOP-816" target="_blank"&gt;https://issues.apache.org/jira/browse/SQOOP-816&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, you will not be able to pull updates from the source tables using only the id column. An id column alone only lets Sqoop append newly inserted rows (incremental mode &lt;CODE&gt;append&lt;/CODE&gt;); to pick up updates to existing rows, Sqoop needs a last-modified timestamp column (incremental mode &lt;CODE&gt;lastmodified&lt;/CODE&gt;). So the best practice is really on the database side: keep columns like 'modified' and 'modified by' in your tables.&lt;/P&gt;&lt;P&gt;&lt;A href="https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports" target="_blank"&gt;https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2017 01:27:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Best-practices-to-work-Sqoop-HDFS-and-Hive/m-p/133620#M96287</guid>
      <dc:creator>umair_khan</dc:creator>
      <dc:date>2017-03-03T01:27:11Z</dc:date>
    </item>
  </channel>
</rss>