Code Repositories

Rising Star
Repo Description

This Zeppelin notebook demonstrates how to map all of the data types from HAWQ to Hive when exporting with Sqoop. It uses PostgreSQL syntax to create the HAWQ table and populates one column for every data type. The more significant piece shown here is how to map the data types that differ between HAWQ and Hive. For example, a boolean column in HAWQ exports as 't' or 'f', which is not compatible with Hive; using PostgreSQL and Sqoop, those values are converted to TRUE and FALSE, which Hive accepts.
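
As a rough sketch of that approach (not taken verbatim from the repo), a Sqoop free-form import can apply the boolean conversion on the HAWQ side with a CASE expression; the connection URL, credentials, target directory, table, and column names below are placeholders:

# Illustrative Sqoop import (assumes the PostgreSQL JDBC driver is on Sqoop's classpath;
# host, database, credentials, and columns are assumptions, not from the repo)
sqoop import \
  --connect jdbc:postgresql://hawq-master:5432/gpadmin \
  --username gpadmin -P \
  --query "SELECT id, name, CASE WHEN isactive THEN 'TRUE' ELSE 'FALSE' END AS isactive FROM kirk WHERE \$CONDITIONS" \
  --target-dir /user/gpadmin/kirk_sqoop \
  --fields-terminated-by '|' \
  --num-mappers 1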

Repo Info
Github Repo URL https://github.com/kirkhas/zeppelin-notebooks/tree/master/HAWQ-Sqoop
Github account name kirkhas
Repo name HAWQ-Sqoop
Comments
Contributor

I would not recommend using Sqoop to pull data out of HAWQ. It will work, but it won't scale, so it will be slow with any sizable amount of data. The much better way to handle this is to leverage PXF to write data from HAWQ directly to HDFS in parallel. You are also working in SQL, so it is very easy to transform the data into a format that other tools can consume. Below, I handled the boolean column.

--HAWQ

create table kirk (
    ID INT,
    NAME TEXT NOT NULL,
    AGE INT NOT NULL,
    ADDRESS CHAR(50),
    SALARY REAL,
    JOIN_DATE DATE,
    PRICE NUMERIC,
    a BIT(3),
    b BIT VARYING(5),
    i interval,
    ts timestamp,
    dec decimal(32,2),
    doub double precision,
    cv character varying,
    t time,
    c char(7),
    v varchar(20),
    isactive boolean
) distributed randomly;

INSERT INTO kirk (ID, NAME, AGE, ADDRESS, SALARY, JOIN_DATE, PRICE, a, b, i, ts, dec, doub, cv, t, c, v, isactive)
VALUES
    (4, 'Mark', 25, '12 Rich-Mond ', 65000.00, '2007-12-13', 12.44, B'101', B'100', '335', '2016-02-22 08:01:23.000', 33.3333, 0.009, 'sdfsadf', '08:01:23.000', 'we ', 'varchar', '0'),
    (5, 'David', 27, 'Texas', 85000.00, '2007-12-13', 45.44, B'101', B'100', '775', '2016-02-22 08:01:23.000', 33.5555, 0.008, 'white space ', '08:01:23.000', 'we', 'varchar ', 'true');

create writable external table ext_kirk (
    ID INT,
    NAME TEXT,
    AGE INT,
    ADDRESS CHAR(50),
    SALARY REAL,
    JOIN_DATE DATE,
    PRICE NUMERIC,
    a BIT(3),
    b BIT VARYING(5),
    i interval,
    ts timestamp,
    dec decimal(32,2),
    doub double precision,
    cv character varying,
    t time,
    c char(7),
    v varchar(20),
    isactive varchar(5)
)
location ('pxf://hdb:51200/kirk?Profile=HdfsTextSimple')
format 'text' (delimiter = '|')
distributed randomly;

insert into ext_kirk (ID, NAME, AGE, ADDRESS, SALARY, JOIN_DATE, PRICE, a, b, i, ts, dec, doub, cv, t, c, v, isactive)
select ID, NAME, AGE, ADDRESS, SALARY, JOIN_DATE, PRICE, a, b, i, ts, dec, doub, cv, t, c, v,
       case when isactive then 'TRUE' else 'FALSE' end
from kirk;
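
To sanity-check what PXF wrote out (not part of the original steps), you can list and sample the pipe-delimited files under the external table's location; the /kirk path matches the location URL above, though the actual file names will vary:

# Inspect the files PXF wrote to HDFS (illustrative sketch)
hdfs dfs -ls /kirk
hdfs dfs -cat /kirk/* | head -n 5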

Contributor

I also had to fix the permissions in HDFS as the pxf user (hdfs dfs -chmod -R 777 /kirk).
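
One way to run that as the pxf user is sketched below; this assumes sudo access on the host, and a tighter mode than 777 is preferable outside a sandbox:

# Relax permissions on the PXF output directory as the pxf user (sketch)
sudo -u pxf hdfs dfs -chmod -R 777 /kirk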

--hive

create table ext_kirk (
    ID INT,
    NAME VARCHAR(65355),
    AGE INT,
    ADDRESS CHAR(50),
    SALARY decimal,
    JOIN_DATE DATE,
    PRICE decimal,
    a char(3),
    b varchar(5),
    i varchar(100),
    ts timestamp,
    dec decimal(32,2),
    doub decimal,
    cv varchar(65355),
    t varchar(15),
    c char(7),
    v varchar(20),
    isactive boolean
)
row format delimited fields terminated by '|'
location '/kirk';

beeline -u jdbc:hive2://hdb:10000 -n gpadmin -d org.apache.hive.jdbc.HiveDriver -e "select * from ext_kirk"

WARNING: Use "yarn jar" to launch YARN applications.
Connecting to jdbc:hive2://hdb:10000
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+
| ext_kirk.id | ext_kirk.name | ext_kirk.age | ext_kirk.address | ext_kirk.salary | ext_kirk.join_date | ext_kirk.price | ext_kirk.a | ext_kirk.b | ext_kirk.i | ext_kirk.ts | ext_kirk.dec | ext_kirk.doub | ext_kirk.cv | ext_kirk.t | ext_kirk.c | ext_kirk.v | ext_kirk.isactive |
+--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+
| 4 | Mark | 25 | 12 Rich-Mond | 65000 | 2007-12-13 | 12 | 101 | 100 | 00:05:35 | 2016-02-22 08:01:23.0 | 33.33 | 0 | sdfsadf | 08:01:23 | we | varchar | false |
| 5 | David | 27 | Texas | 85000 | 2007-12-13 | 45 | 101 | 100 | 00:12:55 | 2016-02-22 08:01:23.0 | 33.56 | 0 | white space | 08:01:23 | we | varchar | true |
+--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+
2 rows selected (0.179 seconds)
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
Closing: 0: jdbc:hive2://hdb:10000