11-13-2013 07:41 PM
From the Impala documentation and SlideShare presentations, it seems that impalad reads HDFS files directly and does not use the DataNode that other Hadoop components use.
Is this done by a C++ library, part of the Impala C++ code, that handles all the supported file formats? From a search, the Apache libhdfs library seems old and unlikely to be used by Impala.
11-21-2013 08:19 PM
The Java stack can use a SerDe for each of the various file formats.
For the Parquet file format, assuming impalad does not use any Java SerDe, is there an equivalent C++ SerDe for Parquet created by the Impala team?