Support Questions


Load data from SQL table into Hive table

Explorer

Hi,

I have installed HDP into a RHEL 7 environment with 3 RHEL 7 servers.

I need to import data from a SQL table into a Hive table in the HDP.

Can I get some help as to what the best way is to do this?

thanks

Angelo

1 ACCEPTED SOLUTION

Master Guru

Easiest: use Apache NiFi. Its UI lets you move data from any RDBMS into a Hive table, and you can use the Hive streaming option with Apache NiFi as well. Full details on how to do this are here:

https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070....

You can also use Sqoop. Full details on how to sqoop data from an RDBMS to Hive/Hadoop and back are here:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_mo...

Just as an example, to move the entire EMPLOYEES table from MySQL into Hadoop:

sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES

Or only the latest data:

sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES  --where "start_date > '2010-01-01'"

Or using a free-form query (note the equality operator is a single `=` in SQL):

sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id)
WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults

By default I would use NiFi, since it is the easiest way to get data into Hive/Hadoop.
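If the goal is a Hive table rather than bare files on HDFS, Sqoop's Hive options can be added to the import above. A minimal sketch, reusing the placeholder connection string and table from the examples (the `default.employees` target name is illustrative, and this needs a running HDP cluster):

```shell
# Import the EMPLOYEES table from MySQL straight into a Hive table.
# --hive-import loads the data into Hive after the HDFS import;
# --create-hive-table makes the job fail if the table already exists;
# --hive-table chooses the target database.table in Hive.
sqoop import \
  --connect jdbc:mysql://db.foo.com/bar \
  --table EMPLOYEES \
  --hive-import \
  --create-hive-table \
  --hive-table default.employees
```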


11 REPLIES


Master Guru

For SQL Server, I have often used this article:

https://danieladeniji.wordpress.com/2013/05/06/hadoop-sqoop-importing-data-from-microsoft-sql-server...

Syntax:
      sqoop import --connect jdbc:sqlserver://sqlserver-name \
          --username <username> \
          --password <password> \
          --driver   <driver-manager-class> \
          --table    <table-name> \
          --target-dir  <target-folder-name>

Sample: 
      sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \
           --username sqoop  \
           --password simp1e \
           --driver   com.microsoft.sqlserver.jdbc.SQLServerDriver \
           --table    "dbo.customer" \
           --target-dir  "/tmp/dbo-customer"
Explorer

The source table is an MS SQL Server table.

Explorer

Trying that command, I get the following error:

NoViableAltException(26@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1084)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' '<EOF>'

Explorer

I issued this command within Hive. Looks like I need to issue it from Linux?

Expert Contributor
@Angelo Alexander

If you are trying the "sqoop" command, you have to run it on the Linux CLI, where the Sqoop client is installed. The proper JDBC driver should also be copied to the Sqoop lib location.
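As an illustration, assuming the Microsoft SQL Server JDBC jar has already been downloaded and Sqoop lives under /usr/hdp/current/sqoop-client (a typical HDP layout; the jar name and paths here are assumptions, so adjust for your install):

```shell
# Copy the SQL Server JDBC driver into Sqoop's lib directory so the
# class named by --driver can be found at runtime.
cp sqljdbc4.jar /usr/hdp/current/sqoop-client/lib/

# Sanity check from the Linux shell (NOT the hive> prompt): list the
# tables in the source database to confirm connectivity.
sqoop list-tables \
  --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
```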

Explorer

Also, about the parameter --target-dir "/tmp/dbo-customer": is this a location in Hive? Or rather, how do I make this import end up in a Hive table of a particular Hive database?

Expert Contributor

@Angelo Alexander

If you are using the "--target-dir" option with Sqoop, then it is just a location on HDFS where your output sits; you can create an external Hive table on the target-dir location. Or you can use the --hive-import option with Sqoop to create and load a table in Hive directly.
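Both routes can be sketched as follows, reusing the dbo.customer sample from above. The column list and the `demo_db.customer` target name are hypothetical; they must match your actual source schema, and the commands need a running cluster:

```shell
# Option 1: keep the imported files in --target-dir and overlay an
# external Hive table on that HDFS location (columns are illustrative;
# Sqoop writes comma-delimited text by default).
hive -e "CREATE EXTERNAL TABLE customer (
           id INT, name STRING, start_date STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/tmp/dbo-customer';"

# Option 2: let Sqoop create and load the Hive table in one job.
sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \
  --username sqoop --password simp1e \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table "dbo.customer" \
  --hive-import --hive-table demo_db.customer
```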

Explorer

One more observation: when I issue `sqoop import \` it goes to the next line with a ">" prompt. Even after entering the next parameters it's still at ">". How do I complete the command and make it execute?

thanks

angelo
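For what it's worth, the ">" prompt here is not Sqoop at all: it is the shell's secondary prompt (PS2), shown because the trailing backslash continues the command onto the next line. A quick demonstration with plain `echo` (no cluster needed):

```shell
# A trailing backslash tells the shell the command continues on the
# next line, so the shell shows its ">" (PS2) prompt while it waits.
# Press Enter after a line with NO trailing backslash to run the
# whole command; Ctrl+C abandons a half-typed command at the ">".
echo one \
     two \
     three
# prints: one two three
```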