Backend 1:Decompressor: block size is too big. hbase impala

Contributor

I have an HCatalog-registered table mapped to an HBase table, and I am trying to query it with Impala.

I get the following error when I run the query:

Backend 1:Decompressor: block size is too big. Data is likely corrupt. Size: 2564977884
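For what it's worth, the reported size is larger than the maximum signed 32-bit integer, so it cannot be a valid block length; my guess (an assumption, not a confirmed diagnosis) is that Impala is misreading bytes as a block-size header rather than hitting a real 2.4 GB block. A quick sanity check in impala-shell:

-- 2^31 - 1 = 2147483647 is the largest value a signed 32-bit length field can hold
select 2564977884 > 2147483647;  -- returns true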


The query is very simple, but it sorts on the suffix portion of the row key:


select *
from stage.acct_txn_hbasetest
order by substr(key, locate('|',key,39)+1,14) desc
limit 25
;
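For context, here is a minimal sketch (with a made-up key) of what that order by expression extracts: the 14 characters after the first '|' found at or beyond position 39.

-- hypothetical key: 38 filler characters, a '|', then a timestamp-like suffix
select substr(k, locate('|', k, 39) + 1, 14) as sort_suffix
from (select concat(repeat('x', 38), '|', '20151221153000123') as k) t;
-- sort_suffix = '20151221153000'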


Hive resolves it in 300 seconds.

I figured Impala would be faster.


I guess I'll have to try Spark SQL.


4 REPLIES

Re: Backend 1:Decompressor: block size is too big. hbase impala

Contributor

Can you post the output of 'show create table stage.acct_txn_hbasetest' and the stack trace from the error you're getting?

Thanks,

Dimitris

Re: Backend 1:Decompressor: block size is too big. hbase impala

Champion Alumni

Hello,

I have the same error. No error shows up when I run the SHOW CREATE TABLE command.

The same query works in Hive, so the data is not corrupt.

Thank you,

Best regards,

Alina GHERMAN

Re: Backend 1:Decompressor: block size is too big. hbase impala

Champion Alumni

The SHOW CREATE TABLE output (anonymised):


CREATE TABLE database_name.table_name (
  a STRING, b STRING, c STRING, d STRING, e TIMESTAMP,
  f TIMESTAMP, g STRING, h TIMESTAMP, i STRING, j BIGINT,
  k STRING, l STRING, m STRING, n STRING, o STRING,
  p STRING, r STRING, s STRING, t STRING, x STRING,
  y TIMESTAMP, z STRING, aa STRING, bb STRING, cc STRING,
  dd STRING, ee STRING, ff BIGINT, gg BIGINT, hh TIMESTAMP,
  ii TIMESTAMP, jj STRING, kk STRING, ll STRING, mm STRING,
  nn STRING, oo STRING, oo STRING, pp STRING, qq STRING,
  rr STRING, ss STRING, tt STRING, uu STRING, xx STRING,
  yy STRING, zz STRING, aaa STRING, bbb STRING, ccc STRING,
  ddd STRING, eee STRING, fff BIGINT, ggg STRING, hhh STRING,
  iii STRING, jjj STRING, kkk STRING, lll STRING, mmm STRING,
  nnn STRING, ooo STRING, ppp STRING, qqq STRING, rrr STRING,
  sss STRING, ttt STRING, uuu STRING, xxx STRING, yyy STRING,
  zzz STRING
)
PARTITIONED BY (abc STRING, abcde STRING)
WITH SERDEPROPERTIES ('serialization.format'='1')
STORED AS TEXTFILE
LOCATION 'hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name'
TBLPROPERTIES ('STATS_GENERATED_VIA_STATS_TASK'='true', 'transient_lastDdlTime'='1450869167', 'numRows'='106717515')

Query:

select count(distinct a, b, c, d, e),a
from table_name
group by a
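As an aside, an equivalent way to write this aggregation (a sketch; NULL handling can differ slightly from a multi-column COUNT(DISTINCT)) is to deduplicate in a subquery first:

select a, count(*) as distinct_combinations
from (select distinct a, b, c, d, e from table_name) t
group by a;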


Impala error and logs:

Bad status for request 9360: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)

Query 364e9ad30338567c:3bb748508f6431af: 33% Complete (1370 out of 4149)
Backend 6:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_42.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_41.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-11/000001_0.snappy offset 134217728
Backend 7:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_43.snappy offset 134217728
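Incidentally, those warnings mean each snappy-compressed text file spans multiple HDFS blocks, so one node has to scan an entire file by itself. A common workaround (a sketch; table_name_parquet is a hypothetical name, reusing the anonymised names above) is to rewrite the data into a splittable format such as Parquet:

-- create an empty Parquet copy of the table, then rewrite the data into it;
-- with select * the partition columns come last, as a dynamic-partition insert expects
create table database_name.table_name_parquet
  like database_name.table_name stored as parquet;

insert overwrite table database_name.table_name_parquet partition (abc, abcde)
select * from database_name.table_name;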
GHERMAN Alina

Re: Backend 1:Decompressor: block size is too big. hbase impala

Master Collaborator

Just to confirm: are you getting the same "Decompressor: block size is too big. Data is likely corrupt." error message? If so, can you provide the size reported in the error message?


Or are you concerned about those warnings?