Database migration from MySQL to Hive using Sqoop

New Contributor

Dear HWX community,

I am trying to migrate from a MySQL database to Hive using Sqoop, but I need your help to clarify a few ideas:

1st - Create external tables to match the MySQL database: the problem is the joins, constraints, and foreign keys. How should I deal with them? In Hive, should I just create external tables with the attributes, or include the constraints too?

2nd - What is the better way: bring my data from MySQL into HDFS and then build Hive views on top, or load it directly into Hive?

3rd - What are the best practices for doing this, and what are the principal steps?

4th - Is there a trade-off with data volume? (Sqoop ETL)

Thank you all!

2 Replies

Re: Database migration from MySQL to Hive using Sqoop

New Contributor

I searched for a related discussion and found the link below:

https://discuss.codecademy.com/t/data-base-migration-from-mysql-to-hive-using-sqoop/228062

Re: Database migration from MySQL to Hive using Sqoop

Super Guru

@Radhia BEZZINE

1st - Create external tables to match the MySQL database: the problem is the joins, constraints, and foreign keys. How should I deal with them? In Hive, should I just create external tables with the attributes, or include the constraints too?

If you use HDP 2.6.1, you can have "non-validated" primary key and foreign key constraints, as well as other constraints. Here is a link on how to add them. Since these constraints are not validated, you have to ensure the integrity yourself; Hive will not enforce them. You might wonder, then, what the purpose of these constraints is. Some upstream tools, when connected to Hive, generate better queries if they know about the constraints, so declaring them helps client tools produce better query plans.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
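
As a minimal sketch (assuming Hive 2.1 on HDP 2.6.1; the table names, column names, and HDFS locations are hypothetical), the declarations could look like this:

-- Hypothetical parent table; DISABLE NOVALIDATE records the constraint
-- in the metastore without enforcing it.
CREATE EXTERNAL TABLE customers (
  id INT,
  name STRING,
  PRIMARY KEY (id) DISABLE NOVALIDATE
)
STORED AS ORC
LOCATION '/data/shopdb/customers';

-- Hypothetical child table with a non-validated foreign key.
CREATE EXTERNAL TABLE orders (
  id INT,
  customer_id INT,
  PRIMARY KEY (id) DISABLE NOVALIDATE,
  CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
    REFERENCES customers (id) DISABLE NOVALIDATE
)
STORED AS ORC
LOCATION '/data/shopdb/orders';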

2nd - What is the better way: bring my data from MySQL into HDFS and then build Hive views on top, or load it directly into Hive?

If you have created the Hive table, then moving the data with Sqoop can do the job in one shot. To import directly in ORC format, which is efficient to read, use the following:

--hcatalog-storage-stanza "stored as orcfile"
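
Put together, a one-shot import might look like the sketch below; the connection details, database, and table names are hypothetical, and this assumes Sqoop's HCatalog integration (Sqoop 1.4.4+):

# Hypothetical one-shot import of a MySQL table into an ORC-backed Hive table.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/shopdb \
  --username sqoop_user -P \
  --table customers \
  --hcatalog-database default \
  --hcatalog-table customers \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile" \
  --num-mappers 4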

3rd - What are the best practices for doing this, and what are the principal steps?

Import using Sqoop, directly in ORC format, as described above. If you have many tables, you can script the per-table imports; see the sketch below.
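
One simple approach is a wrapper loop; everything below (table list, connection string, password file path) is hypothetical:

# Hypothetical wrapper: import each MySQL table into a same-named ORC-backed Hive table.
CONNECT="jdbc:mysql://mysql-host:3306/shopdb"
for t in customers orders products; do
  sqoop import \
    --connect "$CONNECT" \
    --username sqoop_user --password-file /user/sqoop/mysql.pwd \
    --table "$t" \
    --hcatalog-database default \
    --hcatalog-table "$t" \
    --create-hcatalog-table \
    --hcatalog-storage-stanza "stored as orcfile"
done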

4th - Is there a trade-off with data volume? (Sqoop ETL)

No, there shouldn't be, but of course you will want to verify, since things can go wrong outside of Sqoop's control (think network). Sqoop comes with a validation tool; it performs a simple validation that compares record counts. The validation framework is extensible, so you can create your own validation classes if you prefer.

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#validation
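
For example, the count-based check can be requested with the --validate flag. Note that, per the guide above, validation currently applies to a single-table copy into HDFS, so this hypothetical command imports to a plain HDFS directory:

# Hypothetical import with row-count validation between MySQL and HDFS.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/shopdb \
  --username sqoop_user -P \
  --table customers \
  --target-dir /staging/shopdb/customers \
  --validate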