I am working on a project where we need to put data into Hive using Sqoop in HDP 3.0. The data currently lives in a MySQL database (which I access via MySQL Workbench) running outside of HDP 3.0.
Could you please point me to the steps/tutorial/example so that I can achieve this?
Please note - I have not done any Sqoop setup in HDP 3.0, and I am not sure whether I need to install Sqoop in HDP 3.0.
You need to install Sqoop in your cluster. Sqoop's import tool lets you pull data from your MySQL database into HDFS. Please make sure you have the MySQL JDBC driver set up in the cluster so Sqoop can use it.
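A common way to make the driver available is to drop the Connector/J JAR into Sqoop's lib directory and then test the connection. This is a sketch; the JAR filename, lib path, host, and credentials below are assumptions and vary by HDP version and environment:

# Assumption: mysql-connector-java JAR already downloaded from dev.mysql.com,
# and /usr/hdp/current/sqoop-client/lib is the Sqoop client lib dir in HDP.
cp mysql-connector-java-<version>.jar /usr/hdp/current/sqoop-client/lib/

# Sanity-check connectivity before running an import (-P prompts for the password):
sqoop list-tables \
  --connect jdbc:mysql://mysql_host/schema_database \
  --username sqoopuser -P

If list-tables prints your MySQL tables, the driver is wired up correctly and the import commands below should work.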
sqoop import \
  --connect jdbc:mysql://mysql.database_name/schema_database \
  --username sqoopuser \
  --password sqooppwd \
  --table cities
sqoop import \
  --connect jdbc:mysql://mysql_localhost/schema_database \
  --username sqoopuser \
  --password-file /home/sqoop/sqoop.password \
  --table cities \
  --as-avrodatafile \
  --target-dir /etl/input/cities-avro
Thanks @kwabstian53 for the reply.
In my case the target database will be Hive. Could you please also tell me what the driver name will be, and point me to any documentation for setting up that driver in HDP? I've referred to "http://www.sql-workbench.net/manual/jdbc-setup.html" but I'm not sure which driver to use.
Since you have specified the target location in HDFS, you can create a Hive table that reads the data from that HDFS location. Alternatively, you can run a Hive import so that Sqoop infers the schema from the source DB and creates the Hive table for you.
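For the first approach, after the Avro import above lands files in /etl/input/cities-avro, you can define an external Hive table over that directory. A minimal sketch; the HiveServer2 host/port and the column names are assumptions, so adjust them to your actual schema:

# Run through beeline against HiveServer2 (host/port assumed).
# With STORED AS AVRO, Hive derives the Avro schema from the declared columns.
beeline -u "jdbc:hive2://hiveserver_host:10000" -e "
CREATE EXTERNAL TABLE cities (id INT, city STRING)
STORED AS AVRO
LOCATION '/etl/input/cities-avro';"

Because the table is EXTERNAL, dropping it later removes only the metadata, not the imported files in HDFS.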
sqoop import \
  --connect jdbc:mysql://mysql_localhost/schema_database \
  --username sqoop \
  --password '*******' \
  --table visits \
  --hive-import \
  --create-hive-table \
  --hive-table Hive_database.visits \
  -m 1
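Once the import finishes, you can sanity-check the new Hive table from beeline (the HiveServer2 host/port here are assumptions for your environment):

# Confirm the table exists and the row count matches the source MySQL table.
beeline -u "jdbc:hive2://hiveserver_host:10000" \
  -e "SELECT COUNT(*) FROM Hive_database.visits;"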