Reading external tables with Impala

Hi guys,

We are trying to read (select count(*) ...) external tables imported via Sqoop in Impala, but Impala crashes every time and the Impala daemon has to be restarted.

CDH version 5.2.

The external table was imported via Sqoop and loaded into HDFS as a text file, compressed with gzip.

The external table definition was created in Hive.

 

If the external table is a plain (uncompressed) text file, Impala reads it without problems, so I assume the problem is in the decompression.

The query crashes Impala regardless of whether it is run via the Impala ODBC driver or the Impala shell.

 

Does anybody have the same issue?

Any ideas what could be wrong?

 

 

Thanks

Tomas

 

1 ACCEPTED SOLUTION

This issue, with Impala reading large compressed tables, was (in my experience) solved in the release of Impala 2.1 (CDH 5.3.1).

Cloudera did not confirm this as a bug. When I set up a conf call with Cloudera support and they tried to investigate the problem, they were not able to determine the root cause.

I assume that this change helped to solve the problem (from the Impala 2.1.0 release notes):

The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data as it is read, rather than reading the entire gzipped file and decompressing it in memory
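
To illustrate the difference the release note describes, here is a minimal Python sketch of the two approaches. This is not Impala's code, only an illustration of the general technique; the function names are made up for this example, and the streaming version assumes a single-member gzip file.

```python
import gzip
import io
import zlib


def count_lines_whole_file(raw: bytes) -> int:
    """Decompress the entire gzip payload in memory, then count lines.

    Peak memory is roughly the full uncompressed size of the file.
    """
    text = gzip.decompress(raw)
    return text.count(b"\n")


def count_lines_streaming(stream, chunk_size: int = 64 * 1024) -> int:
    """Decompress the data as it is read, one compressed chunk at a time.

    Only the current chunk and its decompressed output are held in memory.
    Assumption: the input is a single-member gzip stream.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    lines = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines += decomp.decompress(chunk).count(b"\n")
    # Count anything still buffered inside the decompressor.
    lines += decomp.flush().count(b"\n")
    return lines
```

Both functions return the same count for the same input; the second one simply never needs the whole uncompressed table in memory at once, which would matter for large gzip files.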

 

But this is not confirmed. All I can say is that after the upgrade Impala did not crash anymore.

