- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Load data from SQL table into Hive table
Created on ‎02-13-2017 05:08 AM - edited ‎09-16-2022 04:04 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have installed HDP into a RHEL 7 environment with 3 RHEL 7 servers.
I need to import data from a SQL table into a Hive table in the HDP.
Can I get some help as to what the best way is to do this?
thanks
Angelo
Created ‎02-13-2017 05:11 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Easist, Use Apache NiFi, using the UI to move data from any RDBMS to hive table. You can use hive streaming option as well with apache nifi. Full details on how to do this is here:
Also you can use Sqoop. Full details on how to sqoop data from RDMBS to hive/hadoop and back is here
Just as an example, to move entire table from mysql into a hive table named EMPLOYEES:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES
Or only latest data
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
or using a query
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
I would by default use NiFi since it is the easiest way to get data into hive/hadoop
Created ‎02-13-2017 05:11 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Easist, Use Apache NiFi, using the UI to move data from any RDBMS to hive table. You can use hive streaming option as well with apache nifi. Full details on how to do this is here:
Also you can use Sqoop. Full details on how to sqoop data from RDMBS to hive/hadoop and back is here
Just as an example, to move entire table from mysql into a hive table named EMPLOYEES:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES
Or only latest data
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
or using a query
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
I would by default use NiFi since it is the easiest way to get data into hive/hadoop
Created ‎02-13-2017 05:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
For sql server I have often used this article
Syntax: sqoop import --connect jdbc:sqlserver://sqlserver-name \ --username <username> \ --password <password> \ --driver <driver-manager-class> \ --table <table-name> \ --target-dir <target-folder-name> Sample: sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \ --username sqoop \ --password simp1e \ --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \ --table "dbo.customer" \ --target-dir "/tmp/dbo-customer"
Created ‎02-13-2017 05:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
the source table is MS SQL table
Created ‎02-13-2017 11:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
trying that command, getting the following error:
NoViableAltException(26@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java: 1084) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:20 2) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:16 6) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:2 16) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739 ) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' '<E OF>'
Created ‎02-13-2017 11:54 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I issued this command within hive. Looks like I need to issue this command from Linux?
Created ‎02-14-2017 04:56 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you are trying the "sqoop" command, you have to try on linux CLI, where sqoop client should have installed. And proper JDBC driver should be copied to sqoop lib location.
Created ‎02-14-2017 12:56 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
also the parameter --target-dir "/tmp/dbo-customer" is this the location in Hive, or rather how do I make this import end up in a Hive table of a particular Hive database
Created ‎02-14-2017 05:03 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you are using the "--target-dir" option with sqoop, then it is just a location on HDFS, where your output sits, you can create a external hive table on the target-dir location. Or you can use hive-create-table option with sqoop to directly create a table on hive.
Created ‎02-14-2017 01:26 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
one more observation; when I issue sqoop import \ it goes to the next line with ">" prompt. Even after entering the next parameters it's still in ">". How do I complete the command and make it executed?
thanks
angelo
