
Data transfer method from RDBMS to Phoenix

Explorer

Hi all,

We have to ingest a very big table from an RDBMS into Phoenix.

According to the following post, it is not possible to go from an RDBMS to Phoenix by using Sqoop: https://community.hortonworks.com/questions/41848/sqoop-import-to-a-phoenix-table.html

The Sqoop JIRA SQOOP-2649 for Sqoop-Phoenix integration is addressed in Sqoop 1.4.7.

https://issues.apache.org/jira/browse/SQOOP-2649

The RDBMS table contains not only VARCHAR columns but also some numeric and date fields.

As far as I understand, HBase serialization is different from Phoenix's.

It is not possible to read some numeric data types from the underlying HBase table by using Phoenix over JDBC (see Experiment 1).

I found a JDBC driver somewhere on the internet and used it in the following experiments:

phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB

I tried two different methods to ingest data from the RDBMS into Phoenix, but did not succeed.

Experiment 1.

1.1. Create HBase table and Phoenix table/view

1.2. Transfer data from RDBMS to HBase

1.3. Read the HBase table from Phoenix

1.1. Create HBase table and Phoenix table/view

hbase> create 'ODS.HBASE_TABLE', 'defcolfam'

Phoenix DDL:

	CREATE TABLE ODS.HBASE_TABLE (
	    "defcolfam.COL1"  DATE        NOT NULL
	  , "defcolfam.COL2"  INTEGER     NOT NULL
	  , "defcolfam.COL3"  SMALLINT    NOT NULL
	  , "defcolfam.COL4"  SMALLINT    NOT NULL
	  , "defcolfam.COL5"  VARCHAR(8)
	  , "defcolfam.COL6"  VARCHAR(3)
	  , "defcolfam.COL7"  INTEGER
	  , CONSTRAINT pk PRIMARY KEY ("defcolfam.COL1", "defcolfam.COL2", "defcolfam.COL3", "defcolfam.COL4")
	);
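
One thing I am not sure about (it may or may not be related to the error below): as far as I know, Phoenix expects the column family and the column qualifier as two separately quoted identifiers, and the primary-key columns map to the HBase row key, so they take no column family at all. A sketch of how I understand the mapping DDL should look (not the exact statement I ran, and not verified):

	-- Sketch only: family/qualifier split and bare PK columns,
	-- as I understand the Phoenix docs; untested here.
	CREATE TABLE ODS.HBASE_TABLE (
	    COL1                 DATE        NOT NULL
	  , COL2                 INTEGER     NOT NULL
	  , COL3                 SMALLINT    NOT NULL
	  , COL4                 SMALLINT    NOT NULL
	  , "defcolfam"."COL5"   VARCHAR(8)
	  , "defcolfam"."COL6"   VARCHAR(3)
	  , "defcolfam"."COL7"   INTEGER
	  , CONSTRAINT pk PRIMARY KEY (COL1, COL2, COL3, COL4)
	);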

1.2. Transfer data from RDBMS to HBase

sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME --username XXX --password XXXXXX   \
--query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE WHERE COL7 ='99995' AND \$CONDITIONS" \
--hbase-table 'ODS.HBASE_TABLE' --column-family 'defcolfam' \
--split-by COL1

1.3. Read the HBase table from Phoenix

With the Phoenix JDBC driver or with ./sqlline.py:

SELECT * FROM ODS.HBASE_TABLE;


0: jdbc:phoenix:localhost> SELECT * FROM ODS.HBASE_TABLE

java.lang.RuntimeException: java.sql.SQLException: ERROR 201 (22000): Illegal data. Expected length of at least 48 bytes, but had 28
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
        at sqlline.SqlLine.print(SqlLine.java:1653)


The numeric fields look different when viewed from the hbase shell.

Reason: HBase and Phoenix encode floats, integers, and some other data types differently. https://community.hortonworks.com/questions/15381/created-phoenix-view-to-map-existing-hbase-table-b...
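
If I read that answer and the Sqoop docs correctly, Sqoop's --hbase-table import writes every cell as the UTF-8 string representation of the value, while Phoenix's numeric and date types expect Phoenix's own binary encoding (the UNSIGNED_* types are the ones that match HBase's Bytes.toBytes() encoding, and only for non-negative values). So a view over the already-loaded HBase table would need types chosen to match what is actually in the cells. A hypothetical sketch only, with a placeholder ROWKEY column, not something I have verified here:

	-- Assumption: cells written by Sqoop as strings -> VARCHAR;
	-- if the cells were written with Bytes.toBytes(), the UNSIGNED_* types
	-- would be the matching choice instead.
	CREATE VIEW "ODS.HBASE_TABLE" (
	    ROWKEY               VARCHAR     PRIMARY KEY
	  , "defcolfam"."COL5"   VARCHAR
	  , "defcolfam"."COL6"   VARCHAR
	  , "defcolfam"."COL7"   VARCHAR
	);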

Experiment 2.

2.1. Create Hive table

2.2. Transfer data from RDBMS to the Hive table

2.3. Transfer from the Hive table to Phoenix by using Sqoop, with the following Phoenix JDBC driver

(phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB)

2.1. Create Hive table

Hive table definition:
CREATE TABLE ODS.HIVE_TABLE_NAME_ORC (  
	COL1	DATE		   
	,COL2  INT   
	,COL3  SMALLINT	
	,COL4  DOUBLE	
	,COL5  VARCHAR(8)     
	,COL6  VARCHAR(3)     
	,COL7  INT 
) STORED as ORCFILE;

2.2. Transfer data from RDBMS to the Hive table

sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME --username XXX --password XXXXXX   \
--query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE_NAME WHERE COL1 = '9995' AND  \$CONDITIONS" \
--split-by COL1 \
--hcatalog-table HIVE_TABLE_NAME_ORC \
--hcatalog-database "ODS" 


2.3. Transfer from the Hive table to Phoenix by using the Phoenix JDBC driver

(phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB)

Phoenix table definition:
	CREATE TABLE ODS.PHOENIX_TABLE_NAME (
	    COL1  DATE        NOT NULL
	  , COL2  INTEGER     NOT NULL
	  , COL3  SMALLINT    NOT NULL
	  , COL4  SMALLINT    NOT NULL
	  , COL5  VARCHAR(8)
	  , COL6  VARCHAR(3)
	  , COL7  INTEGER
	  , CONSTRAINT pk PRIMARY KEY (COL1, COL2, COL3, COL4)
	);

sqoop export --connect jdbc:phoenix:lat01bigdatahwdn:2181:/hbase-unsecure --driver org.apache.phoenix.jdbc.PhoenixDriver \
--username none --password none --table PHOENIX_TABLE_NAME --hcatalog-table HIVE_TABLE_NAME_ORC --hcatalog-database "ODS"
	
Sqoop generates the following SQL:

17/03/13 18:41:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ODS.PHOENIX_TABLE_NAME AS t WHERE 1=0

and the following error occurs:
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:747)

	
On the ./sqlline.py command line I got the same error:
SELECT t.* FROM TB.HAREKET_FROM_HIVE AS t WHERE 1=0
0: jdbc:phoenix:localhost> SELECT t.* FROM TB.HAREKET_FROM_HIVE AS t WHERE 1=0;
Error: ERROR 1001 (42I01): Undefined column family. familyName=T.null (state=42I01,code=1001)
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
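
Judging from the message, it is the alias-qualified star (t.*) that Sqoop generates which Phoenix cannot resolve; it seems to treat the alias T as a column family. A plain select on the same table (assuming the table exists and is otherwise readable) should be a quick way to confirm the table itself is fine, for example:

	SELECT * FROM ODS.PHOENIX_TABLE_NAME WHERE 1=0;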



Solution 3. Am I doing something wrong in Experiment 1 or Experiment 2?

Solution 4. Should we use Kafka?

Some producers could read from the RDBMS in parallel, the way Sqoop does.

At the same time, consumer threads could read from Kafka and upsert into Phoenix (roughly as in the sketch below).

Does that make sense?
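
Roughly what each consumer thread would execute through the Phoenix JDBC driver, batching rows and committing explicitly (the values below are made-up examples; the table is the one defined in Experiment 2):

	-- made-up example values, just to show the statement shape
	UPSERT INTO ODS.PHOENIX_TABLE_NAME (COL1, COL2, COL3, COL4, COL5, COL6, COL7)
	VALUES (TO_DATE('2017-03-13 00:00:00'), 1, 10, 20, 'ABCDEFGH', 'TRY', 99995);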

Solution 5. From the RDBMS to CSV on HDFS by using Sqoop, and then import the CSV into Phoenix (see the sketch below)?
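
For the Phoenix side of Solution 5, the CsvBulkLoadTool that ships with Phoenix could do the load. A sketch only; the HDFS path, the client jar location and the ZooKeeper string are my assumptions, not something I have run on this cluster:

	# 1) RDBMS -> CSV on HDFS (plain text import); path is an assumption
	sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME \
	  --username XXX --password XXXXXX \
	  --table RDBMS_TABLE --split-by COL1 \
	  --target-dir /ods/rdbms_table_csv --fields-terminated-by ','

	# 2) CSV on HDFS -> Phoenix via the MapReduce bulk load tool;
	#    jar path and ZooKeeper quorum are assumptions
	hadoop jar /path/to/phoenix-client.jar \
	  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
	  --schema ODS --table PHOENIX_TABLE_NAME \
	  --input /ods/rdbms_table_csv \
	  --zookeeper lat01bigdatahwdn:2181:/hbase-unsecure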

Solution 6. Could NiFi and Kafka be used together in some way?

Solution 7. Any other solutions?

What would be the best practice and the fastest solution for such an ingestion?

2 REPLIES

Explorer

Hi, did you find any solution for this?

Contributor

Bump! I am also looking for a solution.
