Created on 02-13-2017 05:08 AM - edited 09-16-2022 04:04 AM
Hi,
I have installed HDP into a RHEL 7 environment with 3 RHEL 7 servers.
I need to import data from a SQL table into a Hive table in the HDP.
Can I get some help as to what the best way is to do this?
thanks
Angelo
Created 02-13-2017 05:11 AM
Easiest: use Apache NiFi, which lets you move data from any RDBMS into a Hive table through its UI. You can also use the Hive streaming option with Apache NiFi. Full details on how to do this are here:
You can also use Sqoop. Full details on how to sqoop data from an RDBMS to Hive/Hadoop and back are here
Just as an example, to move an entire table from MySQL into a Hive table named EMPLOYEES (the --hive-import flag is what creates and loads the Hive table; without it the data only lands in HDFS):
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --hive-import
Or only the latest data:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
Or using a free-form query (note the single equals sign in the join condition and the required $CONDITIONS placeholder):
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
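For a fuller end-to-end Hive import, the options can be combined like this. A minimal sketch — the host, database, credentials, and table names below are placeholders, not values from this thread:

```shell
# Sketch: import an RDBMS table directly into Hive via Sqoop.
# All connection details are placeholders; adjust for your environment.
sqoop import \
  --connect jdbc:mysql://db.foo.com/bar \   # JDBC URL of the source database
  --username dbuser -P \                    # -P prompts for the password interactively
  --table EMPLOYEES \                       # source table to import
  --hive-import \                           # create/load the corresponding Hive table
  --hive-table employees \                  # target Hive table name
  --num-mappers 4                           # parallelism of the import job
```

Using -P (interactive prompt) instead of --password keeps credentials out of shell history.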
I would use NiFi by default, since it is the easiest way to get data into Hive/Hadoop.
Created 02-14-2017 11:13 AM
I just realized after running a sqoop list-databases command that I do not have the JDBC driver for MS SQL installed. How can I get that installed on my RHEL 7?
thanks
angelo
Created 02-15-2017 03:26 PM
@Angelo Alexander please refer to the following doc. Since your source is MS SQL, download the Microsoft JDBC Driver for SQL Server from the Microsoft website and place the jar in /usr/hdp/current/sqoop-client/lib http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/apa...
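The install amounts to dropping the driver jar into Sqoop's lib directory and re-running the command. A hedged sketch — the jar filename, download URL, hostname, and credentials below are placeholders (check Microsoft's download page for the current driver release):

```shell
# Sketch: install the SQL Server JDBC driver for Sqoop on RHEL 7.
# The URL and jar name are placeholders -- use Microsoft's current release.
curl -L -o /tmp/sqljdbc42.jar https://example.com/path/to/sqljdbc42.jar
cp /tmp/sqljdbc42.jar /usr/hdp/current/sqoop-client/lib/

# Verify Sqoop can now list databases on the SQL Server instance
# (host, port, and username are placeholders; -P prompts for the password).
sqoop list-databases \
  --connect "jdbc:sqlserver://sqlserver.example.com:1433" \
  --username sa -P
```

If the driver jar is picked up, list-databases should print the databases on the instance instead of a "Could not load db driver class" error.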