Hi all, here are a few steps to get a quick example working of Sqooping some Oracle data into HDFS and a Hive table, using the Oracle Developer VM and the Hortonworks Sandbox. It is very simple, but may provide some help for people just starting out. I will be using VirtualBox for this walkthrough, so I am assuming you already have it installed. Also, the Oracle VM will require about 2 GB of memory and the Sandbox about 8 GB, so you will need a machine with a decent amount of memory to give this a try. I used a Mac with 16 GB and it ran fine.

Step 1: Download the Hortonworks Sandbox and import the OVA into VirtualBox.

http://hortonworks.com/products/sandbox/

Step 2: Download the Oracle Developer VM (this may require setting up a free Oracle account) and import the OVA into VirtualBox.

http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html
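
If you prefer the command line, both OVAs can be imported with VBoxManage; a minimal sketch, where the file names are hypothetical and should match whatever you actually downloaded:

VBoxManage import HDP_Sandbox.ova
VBoxManage import Oracle_Developer_VM.ova
VBoxManage list vms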

Step 3: Set up the two VMs so they can communicate with each other. There are many options here, but for this test I set up a NAT Network on the second network adapter of both VMs. A few diagrams below should help. Basically, create a new NAT Network in VirtualBox under the VirtualBox Preferences menu: select the Network icon and add a new NAT Network (shown below; I called it DensNetwork). Then go into the settings for both VMs, go to Network, click on the 2nd adapter, and follow the diagrams below. (The same setup can be scripted with VBoxManage; see the sketch after the diagrams.)

[Image: VirtualBox NAT Network setup (5015-vbnatnetwork.jpg)]

[Image: VirtualBox network settings for the Sandbox VM (5016-sandboxnetsettings.jpg)]

[Image: VirtualBox network settings for the Oracle VM (5017-oraclenetsettings.jpg)]
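
A minimal VBoxManage sketch of the same setup, assuming the VM names match what "VBoxManage list vms" shows on your machine (the names and the 10.11.12.0/24 subnet are assumptions; the subnet was chosen to contain the Oracle VM IP used in the Sqoop commands later):

# create the NAT network (I called mine DensNetwork)
VBoxManage natnetwork add --netname DensNetwork --network "10.11.12.0/24" --enable --dhcp on

# attach the second adapter of each VM to it (VM names are hypothetical)
VBoxManage modifyvm "Hortonworks Sandbox" --nic2 natnetwork --nat-network2 DensNetwork
VBoxManage modifyvm "Oracle Developer VM" --nic2 natnetwork --nat-network2 DensNetwork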

Step 4: Fire up the VMs, open a terminal session, and ssh into the Sandbox (the password is hadoop):

ssh -p 2222 root@sandbox.hortonworks.com
or
ssh -p 2222 root@127.0.0.1

Step 5: If needed, you can read up a little on Oracle CDB and PDB architecture; it will help with understanding the JDBC connection strings below. The Oracle VM database has a SID of orcl12c and a pluggable database (PDB) named orcl, and all passwords are oracle.

http://docs.oracle.com/database/121/CNCPT/cdbovrvw.htm#CNCPT89236
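
The practical upshot for the Sqoop commands below: the container database (CDB) is addressed by its SID, while the pluggable database is addressed by service name, and the two use slightly different JDBC URL syntax (the IP here is just whatever the Oracle VM received on the NAT network):

SID form (colon before the SID), for the container DB:
jdbc:oracle:thin:@10.11.12.5:1521:orcl12c

Service-name form (slash before the service), for the PDB:
jdbc:oracle:thin:@10.11.12.5:1521/orcl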

Step 6: Sqoop will need ojdbc6.jar (the Oracle JDBC driver) on its classpath in order to run correctly. I put mine in /usr/share/java/ojdbc6.jar.
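
A minimal sketch of copying the driver onto the Sandbox, assuming you already have ojdbc6.jar on your host machine (the Sqoop lib path below is an assumption for HDP sandboxes):

scp -P 2222 ojdbc6.jar root@127.0.0.1:/usr/share/java/ojdbc6.jar
# optionally link it into Sqoop's own lib directory as well (path is an assumption)
ssh -p 2222 root@127.0.0.1 "ln -s /usr/share/java/ojdbc6.jar /usr/hdp/current/sqoop-client/lib/ojdbc6.jar"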

Step 7: Sqoop list-tables, then Sqoop the employees table into HDFS.

List system tables (SID connection to the container DB):

sqoop list-tables --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521:orcl12c

List PDB tables (service-name connection, with an explicit driver):

sqoop list-tables --driver oracle.jdbc.driver.OracleDriver --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521/orcl --username system --password oracle

Import the employee table (clean up, Sqoop, then check out what you Sqooped):

hadoop fs -rm -R /user/hive/data/employees/
sqoop import --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521/orcl --username system --password oracle --table HR.EMPLOYEES --target-dir /user/hive/data/employees
hadoop fs -ls /user/hive/data/employees
hadoop fs -cat /user/hive/data/employees/part-m-00000
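
As an aside, Sqoop can also create and load the Hive table in one shot with --hive-import; a minimal sketch, assuming the target Hive database already exists (I used the manual Beeline steps in Step 8 instead):

sqoop import --connect jdbc:oracle:thin:@10.11.12.5:1521/orcl --username system --password oracle --table HR.EMPLOYEES --hive-import --hive-table HR.employees -m 1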

Step 8: Add and load a Hive table using Beeline:

Enter beeline in the Sandbox terminal window and connect, using the Sandbox username and password when prompted:

!connect jdbc:hive2://sandbox.hortonworks.com:10000
or
!connect jdbc:hive2://127.0.0.1:10000

Then run:

CREATE DATABASE HR;
USE HR;
CREATE TABLE employees (employee_id int, first_name varchar(20), last_name varchar(25), email varchar(25), phone_number varchar(20), hire_date date, job_id varchar(10), salary int, commission_pct int, manager_id int, department_id int) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
LOAD DATA INPATH '/user/hive/data/employees/' OVERWRITE INTO TABLE employees;

Go ahead and query the table.
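
For example, a quick sanity check (just a sketch; the columns come from the table definition above):

SELECT employee_id, first_name, last_name, salary FROM employees LIMIT 10;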