Hi all
Environment is HDP 2.4.
Sqoop transfers all rows from the RDBMS to HDFS in approximately 1 minute.
CsvBulkLoadTool upserts the same rows into the Phoenix table in about 9 minutes.
There is no secondary index on the table.
CSV import path:
hdfs dfs -ls /user/xxsqoop/TABLE1_NC2
Found 5 items
-rw-r--r-- 3 bigdata hdfs 0 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/_SUCCESS
-rw-r--r-- 3 bigdata hdfs 365,036,758 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00000
-rw-r--r-- 3 bigdata hdfs 188,504,177 2017-04-03 19:12 /user/xxsqoop/TABLE1_NC2/part-m-00001
-rw-r--r-- 3 bigdata hdfs 340,190,219 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00002
-rw-r--r-- 3 bigdata hdfs 256,850,726 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00003
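For reference, the four part-m files above come from a Sqoop import running with 4 mappers; a sketch along these lines (the JDBC connection string, credentials, and Ctrl-A field delimiter are placeholders, not my exact command):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username bigdata -P \
  --table TABLE1 \
  --target-dir /user/xxsqoop/TABLE1_NC2 \
  --fields-terminated-by '\001' \
  -m 4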
Phoenix import command:
HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) hadoop jar \
/usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169-client.jar \
org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dmapreduce.job.reduces=4 \
--table TB.TABLE1 --input /user/xxsqoop/TABLE1_NC2 --zookeeper lat01bigdatahwdn:2181:/hbase-unsecure --delimiter '^A'
...
...
17/04/03 19:24:08 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count
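That log line seems to be the key detail: HFileOutputFormat2 sets the number of reduce partitions to the number of regions of the target table, so the -Dmapreduce.job.reduces=4 hint gets overridden. One way to check how many regions the table currently has is to scan hbase:meta from the HBase shell (the row-key prefix below is an assumption based on the Phoenix table name):

hbase shell
scan 'hbase:meta', {ROWPREFIXFILTER => 'TB.TABLE1,', COLUMNS => 'info:regioninfo'}

Each row returned corresponds to one region of the table; the log above implies there is just one.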
Phoenix table DDL:
CREATE TABLE TB.TABLE1
(
COL1 DATE NOT NULL
,COL2 SMALLINT NOT NULL
,COL3 INTEGER NOT NULL
,COL4 SMALLINT NOT NULL
,COL5 VARCHAR(8)
...
...
,CONSTRAINT pk PRIMARY KEY (COL1,COL2,COL3,COL4)
)
DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY';
CsvBulkLoadTool always uses a single reducer, which makes the CSV import very slow.
I expected the job to get 4 reducers from -Dmapreduce.job.reduces=4, but the setting seems to be ignored.
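Since the reducer count follows the region count, my understanding is that the table would need to be split into more regions at creation time for the bulk load to parallelize. A sketch of the same DDL created as a salted (pre-split) table, where the bucket count of 4 is only an illustrative assumption on my part:

CREATE TABLE TB.TABLE1
(
COL1 DATE NOT NULL
,COL2 SMALLINT NOT NULL
,COL3 INTEGER NOT NULL
,COL4 SMALLINT NOT NULL
,COL5 VARCHAR(8)
...
,CONSTRAINT pk PRIMARY KEY (COL1,COL2,COL3,COL4)
)
DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY', SALT_BUCKETS=4;

With SALT_BUCKETS=N the table should be pre-split into N regions, so the bulk load should get N reducers. Is pre-splitting the table the right way to speed this up, or should -Dmapreduce.job.reduces work here?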