Reading external tables with Impala


Hi guys,

We are trying to read (select count(*) ...) external tables imported via Sqoop in Impala, but Impala crashes every time and the Impala daemon has to be restarted.

CDH version 5.2.

The external table was imported via Sqoop and loaded into HDFS as a text file, compressed with gzip.

The external table definition was created in Hive.

 

If the external table is a plain text file, Impala handles it without problems, so I assume the problem is in decompression.

 

The query crashes Impala regardless of whether it is run via the Impala ODBC driver or the Impala shell.

 

Does anybody have the same issue?

Any ideas what could be wrong?

 

 

Thanks

Tomas

 

1 ACCEPTED SOLUTION


Re: Reading external tables with Impala


This issue, reading large gzip-compressed tables with Impala, was (in my experience) resolved in Impala 2.1 (CDH 5.3.1).

Cloudera did not confirm this as a bug. When I arranged a conference call with Cloudera support and they investigated the problem, they were not able to identify the root cause.

 

I assume that this change helped to solve the problem (from the Impala 2.1.0 release notes):

"The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data as it is read, rather than reading the entire gzipped file and decompressing it in memory."

 

This is not confirmed, but after the upgrade Impala no longer crashed.
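
To illustrate the difference the release note describes, here is a minimal Python sketch. It is not Impala's implementation, only an analogy: the function names, the sample file, and the chunk size are all made up for illustration. The first function reads the whole gzipped file and decompresses it in memory, so memory use grows with the file size; the second decompresses the data as it is read, so memory use stays bounded by the chunk size.

```python
import gzip
import os
import tempfile


def count_rows_whole_file(path):
    """Read the entire gzipped file, decompress it in memory, then count rows."""
    with open(path, "rb") as f:
        compressed = f.read()            # whole compressed file held in memory
    data = gzip.decompress(compressed)   # whole decompressed file held in memory too
    return data.count(b"\n")


def count_rows_streaming(path, chunk_size=1024 * 1024):
    """Decompress as the data is read; memory use is bounded by chunk_size."""
    rows = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # at most chunk_size decompressed bytes
            if not chunk:
                break
            rows += chunk.count(b"\n")
    return rows


def make_sample(rows=1000):
    """Write a small gzipped text file (hypothetical sample data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt.gz")
    os.close(fd)
    with gzip.open(path, "wb") as f:
        for i in range(rows):
            f.write(f"{i},row-{i}\n".encode())
    return path


if __name__ == "__main__":
    sample = make_sample()
    try:
        # Both approaches return the same row count; only peak memory differs.
        print(count_rows_whole_file(sample), count_rows_streaming(sample))
    finally:
        os.remove(sample)
```

Both functions give the same count, but with a large, highly compressible text file the first approach can need far more memory than the file's size on disk suggests, which would be consistent with the crashes appearing only on large gzip-compressed tables.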
