New Contributor
Posts: 4
Registered: ‎11-13-2013

How does impalad read HDFS files

From the Impala documentation and SlideShare presentations, it seems that impalad reads HDFS files directly and does not go through the DataNode, as other Hadoop components do.

Is this done by a C++ library, part of the Impala C++ code, that handles all the supported file formats? From what I found by searching, the Apache library libhdfs seems old and unlikely to be used by Impala.

Thanks,

Cloudera Employee
Posts: 16
Registered: ‎08-01-2013

Re: How does impalad read HDFS files

Impala uses short-circuit reads via libhdfs.
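
For background: a short-circuit read lets an HDFS client running on the same host as the DataNode read block files straight from local disk instead of streaming them through the DataNode. It is switched on in hdfs-site.xml. A minimal sketch, assuming a typical setup (the two property names are standard HDFS settings; the socket path is only an example value and depends on your installation):

```xml
<!-- hdfs-site.xml fragment: enable short-circuit local reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Unix domain socket shared by the DataNode and local clients.
       Example path only; adjust to your installation. -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
```

The DataNode and the client (here, impalad through libhdfs) both need to see these settings, and the DataNode must be restarted after changing them.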

 

New Contributor
Posts: 4
Registered: ‎11-13-2013

Re: How does impalad read HDFS files

The Java stack can use a SerDe for each of the various file formats.

For the Parquet file format, assuming impalad does not use any Java SerDe, is there an equivalent C++ SerDe for Parquet created by the Impala team?