Created on 02-13-2017 05:08 AM - edited 09-16-2022 04:04 AM
Hi,
I have installed HDP in a RHEL 7 environment with three RHEL 7 servers.
I need to import data from a SQL table into a Hive table in HDP.
Can I get some help as to what the best way is to do this?
thanks
Angelo
Created 02-13-2017 05:11 AM
Easiest: use Apache NiFi and its UI to move data from any RDBMS into a Hive table. You can use the Hive streaming option with Apache NiFi as well. Full details on how to do this are here:
You can also use Sqoop. Full details on how to sqoop data from an RDBMS to Hive/Hadoop and back are here
Just as an example, to move an entire table named EMPLOYEES from MySQL into Hive/Hadoop:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES
Or only the latest data:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
Or using a free-form query:
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
By default I would use NiFi, since it is the easiest way to get data into Hive/Hadoop.
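To land the data directly in a Hive table rather than as plain files on HDFS, a minimal sketch along the same lines (the connect string and source table are the same hypothetical values as above; the Hive table name is an assumption):
sqoop import --connect jdbc:mysql://db.foo.com/bar \
  --table EMPLOYEES \
  --hive-import \
  --hive-table employees
With --hive-import, Sqoop first stages the rows on HDFS and then loads them into the named Hive table.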
Created 02-13-2017 05:20 AM
For SQL Server I have often used this article:
Syntax:
sqoop import --connect jdbc:sqlserver://sqlserver-name \
  --username <username> \
  --password <password> \
  --driver <driver-manager-class> \
  --table <table-name> \
  --target-dir <target-folder-name>
Sample:
sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop \
  --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table "dbo.customer" \
  --target-dir "/tmp/dbo-customer"
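Before running the full import, it can help to confirm that the connection string and driver actually work; a quick sanity check, reusing the same hypothetical connection details as the sample above:
sqoop list-tables \
  --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop \
  --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
If that prints the table list, the same --connect and --driver settings should work for sqoop import.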
Created 02-13-2017 05:15 AM
The source table is an MS SQL Server table.
Created 02-13-2017 11:40 PM
Trying that command, I am getting the following error:
NoViableAltException(26@[])
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1084)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' '<EOF>'
Created 02-13-2017 11:54 PM
I issued this command from within the Hive shell. It looks like I need to issue it from the Linux command line instead?
Created 02-14-2017 04:56 AM
If you are trying the "sqoop" command, you have to run it on the Linux CLI, on a node where the Sqoop client is installed. The proper JDBC driver also needs to be copied to the Sqoop lib location.
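For example (the jar file name and paths below are assumptions; adjust them to the driver version you downloaded and to your HDP layout), copying Microsoft's JDBC driver into the Sqoop client lib directory on the node you run Sqoop from would look roughly like:
# run on the node where the Sqoop client is installed
cp sqljdbc42.jar /usr/hdp/current/sqoop-client/lib/
# verify the Sqoop client is available on the PATH
sqoop version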
Created 02-14-2017 12:56 AM
Also, about the parameter --target-dir "/tmp/dbo-customer": is this a location in Hive? Or rather, how do I make this import end up in a Hive table of a particular Hive database?
Created 02-14-2017 05:03 AM
If you are using the "--target-dir" option with Sqoop, then it is just a location on HDFS where your output sits; you can create an external Hive table on top of that target-dir location. Or you can use Sqoop's --hive-import option (with --create-hive-table if you want Sqoop to create the table for you) to load the data into a Hive table directly.
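As a rough sketch of both options (the column names, types, Hive database, and delimiter below are assumptions and have to match your actual data), an external table over the existing target-dir could be created from the Linux shell with:
hive -e "CREATE EXTERNAL TABLE demo.customer (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/tmp/dbo-customer';"
Or the import can be pointed straight at Hive instead of a target-dir:
sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop \
  --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table "dbo.customer" \
  --hive-import \
  --hive-table demo.customer
Sqoop's text imports are comma-delimited by default, which is why the external table sketch uses ',' as the field terminator.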
Created 02-14-2017 01:26 AM
One more observation: when I issue "sqoop import \" it goes to the next line with a ">" prompt. Even after entering the next parameters it is still at ">". How do I complete the command and make it execute?
thanks
angelo
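For reference, the trailing backslash just tells the Linux shell that the command continues on the next line; the shell runs the command as soon as a line ends without one. A sketch using the sample values from earlier in the thread:
sqoop import \
  --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop \
  --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table "dbo.customer" \
  --target-dir "/tmp/dbo-customer"
The last line has no trailing backslash, so pressing Enter there executes the whole command.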