Contributor
Posts: 126
Registered: 10-15-2014

Backend 1:Decompressor: block size is too big. hbase impala

I have an HCatalog table registered on top of an HBase table, and I am trying to query it with Impala.

I get the following error when I try to query it:

Backend 1:Decompressor: block size is too big. Data is likely corrupt. Size: 2564977884

 

The query is very simple, but it sorts on the suffix portion of the key:

 

select *
from stage.acct_txn_hbasetest
order by substr(key, locate('|',key,39)+1,14) desc
limit 25
;
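
To illustrate what that order by expression extracts (the key value below is invented, just for the example): it finds the first '|' at or after position 39 and takes the 14 characters that follow it.

select substr(k, locate('|', k, 39) + 1, 14) as sort_suffix
from (select '12345678901234567890123456789012345678|20160101123045' as k) t;
-- returns '20160101123045'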

 

 

Hive resolves it in 300 seconds.

I figured Impala would be faster

 

Guess I have to try out SparkSQL

 

 

Cloudera Employee
Posts: 25
Registered: 11-12-2014

Re: Backend 1:Decompressor: block size is too big. hbase impala

Can you post the output of 'show create table stage.acct_txn_hbasetest' and the stacktrace from the error you're getting?

 

Thanks,

 

Dimitris

Champion Alumni
Posts: 196
Registered: 11-18-2014

Re: Backend 1:Decompressor: block size is too big. hbase impala

Hello,

 

 

I have the same error. No error shows up when I execute the SHOW command.

The same command works in Hive, so the file is not corrupt.

 

 

Thank you,

 

Best regards,

 

Alina GHERMAN

GHERMAN Alina
Champion Alumni
Posts: 196
Registered: 11-18-2014

Re: Backend 1:Decompressor: block size is too big. hbase impala

[ Edited ]

The SHOW CREATE TABLE output (anonymised):

 

CREATE TABLE database_name.table_name (   a STRING,    b STRING,    c STRING,    d STRING,    e TIMESTAMP,    f TIMESTAMP,    g STRING,    h TIMESTAMP,    i STRING,    j BIGINT,    k STRING,    l STRING,    m STRING,    n STRING,    o STRING,    p STRING,    r STRING,    s STRING,    t STRING,    x STRING,    y TIMESTAMP,    z STRING,    aa STRING,    bb STRING,    cc STRING,    dd STRING,    ee STRING,    ff BIGINT,    gg BIGINT,    hh TIMESTAMP,    ii TIMESTAMP,    jj STRING,    kk STRING,    ll STRING,    mm STRING,    nn STRING,    oo STRING,    oo STRING,    pp STRING,    qq STRING,    rr STRING,    ss STRING,    tt STRING,    uu STRING,    xx STRING,    yy STRING,    zz STRING,    aaa STRING,    bbb STRING,    ccc STRING,    ddd STRING,    eee STRING,    fff BIGINT,    ggg STRING,    hhh STRING,    iii STRING,    jjj STRING,    kkk STRING,    lll STRING,    mmm STRING,    nnn STRING,    ooo STRING,    ppp STRING,    qqq STRING,    rrr STRING,    sss STRING,    ttt STRING,    uuu STRING,    xxx STRING,    yyy STRING,    zzz STRING ) 
PARTITIONED BY (   abc STRING,    abcde STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') 
STORED AS TEXTFILE LOCATION 'hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name' 
TBLPROPERTIES ('STATS_GENERATED_VIA_STATS_TASK'='true', 'transient_lastDdlTime'='1450869167', 'numRows'='106717515')

Query:

select count(distinct a, b, c, d, e),a
from table_name
group by a

 

 

Impala error and logs:

Bad status for request 9360: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)






Query 364e9ad30338567c:3bb748508f6431af: 33% Complete (1370 out of 4149)
Backend 6:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_42.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_41.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-11/000001_0.snappy offset 134217728
Backend 7:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_43.snappy offset 134217728
GHERMAN Alina
Cloudera Employee
Posts: 307
Registered: 10-16-2013

Re: Backend 1:Decompressor: block size is too big. hbase impala

Just to confirm: are you getting the same "Decompressor: block size is too big. Data is likely corrupt." error message? If so, can you provide the size reported in the error message?

 

Or are you concerned about those warnings?
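
If it is the warnings you are worried about: the table is snappy-compressed text, and those files are larger than one HDFS block (the offsets in the warnings are 134217728 bytes, i.e. 128 MB), so Impala cannot split them and each file has to be decompressed by a single node. One rough sketch of a way around that, reusing the anonymised names from your earlier post, is to rewrite the data into Parquet, which Impala can scan natively:

-- sketch only: the new table name is made up, and the original partition
-- columns (abc, abcde) just become ordinary columns in the copy
create table database_name.table_name_parquet stored as parquet
as select * from database_name.table_name;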