Created 03-14-2017 03:22 PM
Hi all,
We have to ingest a very big table from an RDBMS into Phoenix.
According to the following thread, it is not possible to go from an RDBMS to Phoenix directly with Sqoop: https://community.hortonworks.com/questions/41848/sqoop-import-to-a-phoenix-table.html
The Sqoop JIRA for Sqoop-Phoenix integration, SQOOP-2649, is addressed in Sqoop 1.4.7:
https://issues.apache.org/jira/browse/SQOOP-2649
The RDBMS table contains not only VARCHAR columns but also some numeric and date fields.
As far as I understand, HBase serialization differs from Phoenix serialization.
It is not possible to read some numeric data types from the underlying HBase table through Phoenix JDBC (see Experiment 1).
I found a Phoenix JDBC driver somewhere on the internet and used it in the following experiments:
phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB
I tried two different methods to ingest data from the RDBMS into Phoenix, but neither succeeded.
Experiment 1.
1.1. Create the HBase table and the Phoenix table/view
1.2. Transfer data from the RDBMS to HBase
1.3. Read the HBase table from Phoenix
1.1. Create the HBase table and the Phoenix table/view
hbase> create 'ODS.HBASE_TABLE', 'defcolfam'

CREATE TABLE ODS.HBASE_TABLE (
  "defcolfam.COL1" DATE NOT NULL,
  "defcolfam.COL2" INTEGER NOT NULL,
  "defcolfam.COL3" SMALLINT NOT NULL,
  "defcolfam.COL4" SMALLINT NOT NULL,
  "defcolfam.COL5" VARCHAR(8),
  "defcolfam.COL6" VARCHAR(3),
  "defcolfam.COL7" INTEGER,
  CONSTRAINT pk PRIMARY KEY ("defcolfam.COL1","defcolfam.COL2","defcolfam.COL3","defcolfam.COL4")
);
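As an aside, the mapping DDL itself may be part of the problem: quoting "defcolfam.COL1" as a single identifier creates a Phoenix column literally named defcolfam.COL1 in the default column family. In Phoenix DDL, a column stored inside an HBase family is written as two quoted identifiers, "family"."qualifier", and primary-key columns map to the row key, so they carry no family at all. A sketch of what I believe was intended (unverified against this cluster):

CREATE TABLE ODS.HBASE_TABLE (
  COL1 DATE NOT NULL,
  COL2 INTEGER NOT NULL,
  COL3 SMALLINT NOT NULL,
  COL4 SMALLINT NOT NULL,
  "defcolfam"."COL5" VARCHAR(8),
  "defcolfam"."COL6" VARCHAR(3),
  "defcolfam"."COL7" INTEGER,
  CONSTRAINT pk PRIMARY KEY (COL1, COL2, COL3, COL4)
);

Even with the mapping fixed, though, the serialization issue described below still applies.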
1.2. Transfer data from the RDBMS to HBase
sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME --username XXX --password XXXXXX \
  --query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE WHERE COL7 = '99995' AND \$CONDITIONS" \
  --hbase-table 'ODS.HBASE_TABLE' --column-family 'defcolfam' \
  --split-by COL1
1.3. Read the HBase table from Phoenix, with the Phoenix JDBC driver or with ./sqlline.py:

0: jdbc:phoenix:localhost> SELECT * FROM ODS.HBASE_TABLE;
java.lang.RuntimeException: java.sql.SQLException: ERROR 201 (22000): Illegal data. Expected length of at least 48 bytes, but had 28
    at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
    at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
    at sqlline.SqlLine.print(SqlLine.java:1653)
The numeric fields also look odd when inspected from the HBase shell.
Reason: HBase (as written by Sqoop) and Phoenix encode floats, integers, and some other data types differently. https://community.hortonworks.com/questions/15381/created-phoenix-view-to-map-existing-hbase-table-b...
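The mismatch is easy to demonstrate in code. Below is a minimal sketch (my own illustration, assuming hbase-common and a Phoenix 4.x phoenix-core on the classpath, where the type classes live in org.apache.phoenix.schema.types) comparing the bytes Sqoop's HBase import writes, which as far as I can tell is the plain string form of each value, with the fixed-width encoding Phoenix expects for an INTEGER:

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.schema.types.PInteger;

public class EncodingMismatch {
    public static void main(String[] args) {
        // Sqoop's HBase import stores each value as its UTF-8 string form:
        byte[] sqoopBytes = Bytes.toBytes("99995");              // 5 bytes: '9' '9' '9' '9' '5'
        // Phoenix expects a 4-byte big-endian INTEGER with the sign bit flipped:
        byte[] phoenixBytes = PInteger.INSTANCE.toBytes(99995);  // 4 bytes: \x80\x01\x86\x9B
        System.out.println(Bytes.toStringBinary(sqoopBytes));    // prints: 99995
        System.out.println(Bytes.toStringBinary(phoenixBytes));  // prints: \x80\x01\x86\x9B
    }
}

So a Phoenix table defined over Sqoop-written cells cannot decode the numeric columns, which is exactly the ERROR 201 above.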
Experiment 2.
2.1. Create the Hive table
2.2. Transfer data from the RDBMS to the Hive table
2.3. Transfer from the Hive table to Phoenix using Sqoop, with the following Phoenix JDBC driver
(phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB)
2.1. Create the Hive table
hive table definition >
CREATE TABLE ODS.HIVE_TABLE_NAME_ORC (
  COL1 DATE,
  COL2 INT,
  COL3 SMALLINT,
  COL4 DOUBLE,
  COL5 VARCHAR(8),
  COL6 VARCHAR(3),
  COL7 INT
) STORED AS ORCFILE;
2.2. Transfer data from the RDBMS to the Hive table
sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME --username XXX --password XXXXXX \
  --query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE_NAME WHERE COL1 = '9995' AND \$CONDITIONS" \
  --split-by COL1 \
  --hcatalog-table HIVE_TABLE_NAME_ORC \
  --hcatalog-database "ODS"
2.3. Transfer from the Hive table to Phoenix using the Phoenix JDBC driver
(phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar ..... 50,118 KB)
phoenix table definition >
CREATE TABLE ODS.PHOENIX_TABLE_NAME (
  COL1 DATE NOT NULL,
  COL2 INTEGER NOT NULL,
  COL3 SMALLINT NOT NULL,
  COL4 SMALLINT NOT NULL,
  COL5 VARCHAR(8),
  COL6 VARCHAR(3),
  COL7 INTEGER,
  CONSTRAINT pk PRIMARY KEY (COL1,COL2,COL3,COL4)
);

sqoop export --connect jdbc:phoenix:lat01bigdatahwdn:2181:/hbase-unsecure --driver org.apache.phoenix.jdbc.PhoenixDriver \
  --username none --password none \
  --table PHOENIX_TABLE_NAME \
  --hcatalog-table HIVE_TABLE_NAME_ORC --hcatalog-database "ODS"

Sqoop generates the following SQL:

17/03/13 18:41:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ODS.PHOENIX_TABLE_NAME AS t WHERE 1=0

and the following error occurs:

org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
    at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:747)

On the sqlline command line I got the same error:

0: jdbc:phoenix:localhost> SELECT t.* FROM TB.HAREKET_FROM_HIVE AS t WHERE 1=0;
Error: ERROR 1001 (42I01): Undefined column family. familyName=T.null (state=42I01,code=1001)
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null

Judging from familyName=T.null, Phoenix seems to resolve the t in t.* as a column family name rather than as the table alias.
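To narrow this down, one can compare the aliased and unaliased forms of Sqoop's probe query in sqlline (my own diagnostic sketch; behavior may differ on other Phoenix versions):

0: jdbc:phoenix:localhost> SELECT * FROM ODS.PHOENIX_TABLE_NAME WHERE 1=0;
-- parses and returns an empty result
0: jdbc:phoenix:localhost> SELECT t.* FROM ODS.PHOENIX_TABLE_NAME AS t WHERE 1=0;
-- ERROR 1001 (42I01): Undefined column family. familyName=T.null

If the unaliased form works, the failure comes from the query Sqoop's generic JDBC manager generates, not from the table itself.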
Solution 3. Am I doing something wrong in Experiment 1 and Experiment 2?
Solution 4. Should we use Kafka?
Some producer could read from the RDBMS in parallel, like Sqoop does.
At the same time, consumer threads could read from Kafka and UPSERT into Phoenix; a sketch follows below.
Does that make sense?
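For what it's worth, here is a minimal sketch of what a Solution 4 consumer thread could look like, assuming kafka-clients 2.x and the Phoenix thick JDBC driver on the classpath; the topic name, the one-CSV-row-per-message format, the broker and ZooKeeper hosts, and the commit cadence are all illustrative assumptions, not something from this thread:

import java.sql.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PhoenixUpsertConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafkahost:9092");            // placeholder broker
        props.put("group.id", "phoenix-ingest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181:/hbase-unsecure")) {
            consumer.subscribe(Collections.singletonList("rdbms-rows")); // hypothetical topic
            conn.setAutoCommit(false);                                   // let Phoenix buffer the upserts
            PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO ODS.PHOENIX_TABLE_NAME (COL1,COL2,COL3,COL4,COL5,COL6,COL7) VALUES (?,?,?,?,?,?,?)");
            while (true) {
                int pending = 0;
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    String[] f = rec.value().split(",", -1);             // assumes one CSV row per message
                    ps.setDate(1, Date.valueOf(f[0]));                   // assumes yyyy-MM-dd dates
                    ps.setInt(2, Integer.parseInt(f[1]));
                    ps.setShort(3, Short.parseShort(f[2]));
                    ps.setShort(4, Short.parseShort(f[3]));
                    ps.setString(5, f[4]);
                    ps.setString(6, f[5]);
                    ps.setInt(7, Integer.parseInt(f[6]));
                    ps.executeUpdate();                                  // queued client-side until commit
                    pending++;
                }
                if (pending > 0) conn.commit();                          // flush the batch to HBase
            }
        }
    }
}

Because Phoenix writes through its own client, the values end up in its native encoding, so the Experiment 1 problem does not arise here.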
Solution 5. From the RDBMS to CSV on HDFS using Sqoop, and then import the CSV into Phoenix? (A sketch follows below.)
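Regarding Solution 5, Phoenix ships a MapReduce CSV bulk loader that is usually the recommended path for exactly this. A rough sketch of the two steps (the target directory, the jar location, and the ZooKeeper quorum are placeholders, and the flags should be double-checked against your Phoenix version):

sqoop import --connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME --username XXX --password XXXXXX \
  --query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE WHERE \$CONDITIONS" \
  --split-by COL1 --as-textfile --fields-terminated-by ',' \
  --target-dir /tmp/phoenix_input

hadoop jar /path/to/phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --zookeeper lat01bigdatahwdn:2181 \
  --schema ODS --table PHOENIX_TABLE_NAME \
  --input /tmp/phoenix_input

The bulk loader writes HFiles in Phoenix's own encoding and avoids the per-row JDBC path entirely, which should also make it the fastest option for a very big table.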
Solution 6. Could NiFi and Kafka be used together in some way?
Solution 7. Any other solutions?
What would be the best practice and the fastest solution for such an ingestion?
Created 06-13-2018 09:04 AM
Hi, did you find a solution for this?
Created 08-16-2018 04:24 AM
Bump! I am also looking for a solution