how impala has enabled querying batch and stream data together


New Contributor

Hi,

 

I need detailed documentation on how Impala enables queries that combine views on batch data with views on stream data. Since it is open-source software, I would appreciate any pointers to the relevant source classes and documentation.

 

-Thanks.

-Soheila D.

2 REPLIES

Re: how impala has enabled querying batch and stream data together

Contributor

Impala documentation can be found here: 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

 

Impala source code is shared on GitHub:

https://github.com/cloudera/impala

 

Generally speaking, Impala is a query engine that runs on Hadoop. It can support both batch and streaming data because the Hadoop ecosystem has tools for collecting and processing both types of data.


Re: how impala has enabled querying batch and stream data together

Cloudera Employee

By 'batch' and 'streaming', do you mean data residing in HDFS and HBase, respectively, as the underlying data stores? The documentation for the Impala+HBase combination is here:

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

 

All of the other Impala documentation relates to data files stored in HDFS.  You can query HBase tables directly to get at streaming data, or do join queries between HBase and HDFS tables to access both kinds of data from a single query.
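
As a rough illustration of such a join, here is a minimal Python sketch that builds the query text. The table names (events_hbase, events_hdfs) and column names (event_id, latest_value, daily_total) are hypothetical, and it assumes the HBase-backed table has already been mapped into the metastore so that Impala can see it. The run_query helper works with any DB-API 2.0 cursor; which client library you use to connect to Impala is left open.

```python
# Hypothetical table and column names, for illustration only.
def build_combined_query(hbase_table, hdfs_table, key, limit=10):
    """Return SQL text joining an HBase-backed table with an HDFS-backed table."""
    return (
        f"SELECT s.{key}, s.latest_value, b.daily_total\n"
        f"FROM {hbase_table} s\n"
        f"JOIN {hdfs_table} b ON s.{key} = b.{key}\n"
        f"LIMIT {limit}"
    )


def run_query(cursor, sql):
    """Execute the SQL on any DB-API 2.0 cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()


# Build the query text; running it requires a live connection to Impala.
sql = build_combined_query("events_hbase", "events_hdfs", "event_id")
print(sql)
```

From Impala's point of view both tables are ordinary tables, so the join itself needs no special syntax; the difference lies only in where each table's data is stored.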
