Created on 07-09-2016 01:29 AM
This zeppelin dashboard demonstrates how to map all the data types from HAWQ to HIVE using sqoop. It uses postgress sql to create the HAWQ table and fills in 1 col for every data type. The more significant piece shown here is how to map the data types that differ from HAWQ to HIVE. For exmaple a boolean column in HAWQ exports as t or f but that is not compatible with HIVE. Using postgress and sqoop this converts to TRUE and FALSE which is accepted by HIVE.
Repo InfoCreated on 07-11-2016 02:48 PM
I would not recommend using Sqoop to pull data out of HAWQ. It will work but it won't scale so it will be slow when working with any sizable amount of data. The much better way to handle this is to leverage PXF to write data from HAWQ directly to HDFS in parallel. You are also working with SQL so it is will be very easy to transform the data into a format that other tools can consume. Below, I handled the boolean column.
--HAWQ
create table kirk ( ID INT, NAME TEXT NOT NULL, AGE INT NOT NULL, ADDRESS CHAR(50), SALARY REAL, JOIN_DATE DATE, PRICE NUMERIC, a BIT(3), b BIT VARYING(5), i interval, ts timestamp, dec decimal(32,2), doub double precision, cv character varying, t time, c char(7), v varchar(20), isactive boolean ) distributed randomly;
INSERT INTO kirk (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE, PRICE, a,b, i, ts,dec,doub,cv,t,c,v, isactive) VALUES (4, 'Mark', 25, '12 Rich-Mond ', 65000.00, '2007-12-13', 12.44, B'101',B'100', '335','2016-02-22 08:01:23.000',33.3333,0.009,'sdfsadf', '08:01:23.000', 'we ', 'varchar', '0'), (5, 'David', 27, 'Texas', 85000.00 , '2007-12-13', 45.44,B'101',B'100','775', '2016-02-22 08:01:23.000',33.5555, 0.008, 'white space ','08:01:23.000', 'we', 'varchar ', 'true' );
create writable external table ext_kirk ( ID INT, NAME TEXT, AGE INT, ADDRESS CHAR(50), SALARY REAL, JOIN_DATE DATE, PRICE NUMERIC, a BIT(3), b BIT VARYING(5), i interval, ts timestamp, dec decimal(32,2), doub double precision, cv character varying, t time, c char(7), v varchar(20), isactive varchar(5) ) location ('pxf://hdb:51200/kirk?Profile=HdfsTextSimple') format 'text' (delimiter = '|') distributed randomly;
insert into ext_kirk (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE, PRICE, a,b, i, ts,dec,doub,cv,t,c,v, isactive) select ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE, PRICE, a,b, i, ts,dec,doub,cv,t,c,v, case when isactive then 'TRUE' else 'FALSE' end from kirk;
Created on 07-11-2016 02:56 PM
I also had to fix the permissions in HDFS as the user pxf. (hdfs dfs -chmod -R 777 /kirk)
--hive
create table ext_kirk ( ID INT, NAME VARCHAR(65355), AGE INT, ADDRESS CHAR(50), SALARY decimal, JOIN_DATE DATE, PRICE decimal, a char(3), b varchar(5), i varchar(100), ts timestamp, dec decimal(32,2), doub decimal, cv varchar(65355), t varchar(15), c char(7), v varchar(20), isactive boolean ) row format delimited fields terminated by '|' location '/kirk';
beeline -u jdbc:hive2://hdb:10000 -n gpadmin -d org.apache.hive.jdbc.HiveDriver -e "select * from ext_kirk"
WARNING: Use "yarn jar" to launch YARN applications. Connecting to jdbc:hive2://hdb:10000 Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258) Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258) Transaction isolation: TRANSACTION_REPEATABLE_READ +--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+ | ext_kirk.id | ext_kirk.name | ext_kirk.age | ext_kirk.address | ext_kirk.salary | ext_kirk.join_date | ext_kirk.price | ext_kirk.a | ext_kirk.b | ext_kirk.i | ext_kirk.ts | ext_kirk.dec | ext_kirk.doub | ext_kirk.cv | ext_kirk.t | ext_kirk.c | ext_kirk.v | ext_kirk.isactive | +--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+ | 4 | Mark | 25 | 12 Rich-Mond | 65000 | 2007-12-13 | 12 | 101 | 100 | 00:05:35 | 2016-02-22 08:01:23.0 | 33.33 | 0 | sdfsadf | 08:01:23 | we | varchar | false | | 5 | David | 27 | Texas | 85000 | 2007-12-13 | 45 | 101 | 100 | 00:12:55 | 2016-02-22 08:01:23.0 | 33.56 | 0 | white space | 08:01:23 | we | varchar | true | +--------------+----------------+---------------+-----------------------------------------------------+------------------+---------------------+-----------------+-------------+-------------+-------------+------------------------+---------------+----------------+---------------+-------------+-------------+-------------+--------------------+--+ 2 rows selected (0.179 seconds) Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive Closing: 0: jdbc:hive2://hdb:10000