01-13-2016 01:03 PM
The HIVE performance in CDH 5.1.0 is more faster than hive in CDH5.5.1. recently i have migrated datacenter, so currently i have two env in different version. the same hive SQL is fast in 5.1.0, but it's too slow in CDH 5.5.1. it's not just a few statement has this issue, maybe one-third is slower. as i want to say, CDH 5.5.1 is not suitable to production env if your have a lot of work in hive.
01-13-2016 01:07 PM
i am so sad right now, many people complain the HIVE performance, i don't how to get resolved, i am thinking about re-install CD5.1.0, but no much time to do it.
i have checked the SQL plan, some SQL plan is the same, but it's still slower. the hardware most the same, 24cores, 4Tdisk, 48G memory.
the only one big different is old env is 300G disk.
01-13-2016 01:30 PM
Hi, could you give some extra information to see why Hive is slower?
What is the query you're benchmarking between both versions?
What table format are you using?
What's the size of the table you're reading?
Do you have a partitioned table? If so, how many partitions do you have?
Are there any other query running at the same time while running your benchmark?
Are you using SSD or HDD on the nodes?
Is the same CPU/RAM used in both versions?
I will try to help finding the botteneck on this performance issue.
01-14-2016 07:03 AM
here i am going to introduce one specific issue to you.
the SQL is:
select distinct case when a.type=1 then A.User_Id else null end as ff_user from bi.cx_mrs_account_detail a where substr(A.Trade_Date ,1,10 )<= '2016-01-13' ;
it executes 10-20 s in CDH 5.1.0, but it will execute near 30-50mins in CDH 5.5.1
0 CREATE TABLE `bi.cx_mrs_account_detail`( 1 `id` double, 2 `user_id` double, 3 `balance_id` double, 4 `consume_id` double, 5 `type` int, 6 `agency_no` string, 7 `channel_no` string, 8 `channel_source` string, 9 `trade_amount` double, 10 `pay_type` double, 11 `trade_date` string, 12 `before_balance` double, 13 `after_balance` double, 14 `ref_id` string, 15 `trade_content` string, 16 `transaction_id` double, 17 `charge_time` string, 18 `freeze_status` double, 19 `status` double) 20 ROW FORMAT SERDE 21 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 22 STORED AS INPUTFORMAT 23 'org.apache.hadoop.mapred.TextInputFormat' 24 OUTPUTFORMAT 25 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 26 LOCATION 27 'hdfs://nameservice1/user/hive/warehouse/bi.db/cx_mrs_account_detail' 28 TBLPROPERTIES ( 29 'COLUMN_STATS_ACCURATE'='true', 30 'last_modified_by'='bi', 31 'last_modified_time'='1452670798', 32 'numFiles'='4', 33 'numRows'='-1', 34 'rawDataSize'='-1', 35 'totalSize'='475818612', 36 'transient_lastDdlTime'='1452670798')
table records: 2668305 table size: 450M(4 files, two file 200M, two file 20M) map and reduce: map 4 reducee 1 explain also is the same: 0 STAGE DEPENDENCIES: 1 Stage-1 is a root stage 2 Stage-0 depends on stages: Stage-1
hardware mostly i can say it's the same. this is just one example. i have many cases can privode to you.
there are some others issues:
1) some SQL is very quickly in hue or beeline/hive Cli, but it's very slow in oozie job
2) i have found distinct or count(distinct) is very slow in CDH 5.5.1, but it's more fast in CDH 5.1.0 (just like above case, i know some SQL can change to group by, but it doesnt explain why it's faster in CDH 5.1.0)
3) when the SQL is slow, sometimes i will go to check the log, and found below information:
2016-01-14 21:35:01,370 WARN [ResponseProcessor for block BP-880775898-10.32.114.12-1450405765567:blk_1078908247_5174679] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-880775898-10.32.114.12-1450405765567:blk_1078908247_5174679 java.io.IOException: Bad response ERROR for block BP-880775898-10.32.114.12-1450405765567:blk_1078908247_5174679 from datanode DatanodeInfoWithStorage[10.32.114.17:50010,DS-41264c7f-23fd-4cf8-a4f6-c32886727e4b,DISK] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1002) 2016-01-14 21:36:36,761 INFO [LeaseRenewer:bi@nameservice1] org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB over jq-namenode02.hadoop/10.32.114.13:8020. Trying to fail over immediately. java.net.ConnectException: Call From jq-yarn02.hadoop/10.32.114.15 to jq-namenode02.hadoop:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
it seems can't connect namenode:8020, but i have telnet manually or hdfs dfs , all is ok, why the error happend in oozie job ? (maybe the first question is because of this)
01-14-2016 07:10 AM
show you my one example , how can i put this kind job in production? 4M---1hours , who can explain this ?
one thing i can sure to you, the job is the same, the table is the same, the data is also the same.
01-14-2016 07:24 AM
one thing i want to say again , some jobs execution time has so big different may be it's because of the below error.
java.net.ConnectException: Call From jq-yarn02.hadoop/10.32.114.15 to jq-namenode02.hadoop:8020 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
can you tell me how to handle this kind error?