I need detailed documentation on how Impala enables running queries that combine views on batch data with views on stream data. Since it's open-source software, I would appreciate any pointers to the relevant source classes and documentation.
Impala documentation can be found here:
Impala source code is shared on GitHub:
Generally speaking, Impala is a query engine that runs on Hadoop. It can support both batch and streaming data because the Hadoop ecosystem has tools for collecting and processing both types of data.
By 'batch' and 'streaming', do you mean data residing in HDFS and HBase, respectively, as the underlying data stores? The documentation for the Impala+HBase combination is here:
All of the other Impala documentation relates to data files stored in HDFS. You can query HBase tables directly to get at streaming data, or do join queries between HBase and HDFS tables to access both kinds of data from a single query.
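To illustrate what such a mixed query computes, here is a minimal Python sketch of the join semantics. The table names (`orders`, `customer_status`), column names, and sample rows are hypothetical. A list of dicts stands in for an HDFS-backed table and a dict keyed by row key stands in for an HBase-backed table. This is only a conceptual model of the result, not how Impala physically plans or executes the join.

```python
from typing import Dict, List

# Hypothetical "batch" rows, standing in for an HDFS-backed Impala table `orders`.
hdfs_orders: List[Dict] = [
    {"order_id": 1, "customer_key": "c1", "amount": 30.0},
    {"order_id": 2, "customer_key": "c2", "amount": 12.5},
    {"order_id": 3, "customer_key": "c9", "amount": 7.0},
]

# Hypothetical "streaming" state keyed by row key, standing in for an HBase-backed
# table `customer_status` that an ingest pipeline keeps updating.
hbase_customer_status: Dict[str, Dict] = {
    "c1": {"status": "active"},
    "c2": {"status": "suspended"},
}


def join_batch_with_stream(batch_rows: List[Dict],
                           stream_table: Dict[str, Dict],
                           key: str) -> List[Dict]:
    """Inner join: keep each batch row whose key matches a row key in stream_table.

    Conceptually similar to an Impala query such as (hypothetical names):
      SELECT o.order_id, o.amount, c.status
      FROM orders o JOIN customer_status c ON o.customer_key = c.id;
    """
    result = []
    for row in batch_rows:
        match = stream_table.get(row[key])  # match on the HBase-style row key
        if match is not None:
            result.append({**row, **match})  # merge columns from both sides
    return result


if __name__ == "__main__":
    for joined in join_batch_with_stream(hdfs_orders, hbase_customer_status, "customer_key"):
        print(joined)
```

Because the HBase side reflects whatever values were most recently written, rerunning the same query picks up new stream data while the HDFS side stays unchanged. Order 3 is dropped because its key has no matching HBase row, just as in an SQL inner join.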