
How does impalad read HDFS files?

New Contributor

From the Impala documentation and SlideShare presentations, it seems that impalad reads HDFS files directly and does not go through the DataNode the way other Hadoop components do.

Is this done by a C++ library, part of the Impala C++ code, that handles all of the supported file formats? From what I could find, the Apache library libhdfs seems old and unlikely to be used by Impala.

Thanks,


Re: How does impalad read HDFS files?

Cloudera Employee

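Impala uses HDFS short-circuit reads via libhdfs: when the data is on the same host, impalad reads the block files from local disk rather than streaming them through the DataNode.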
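
For reference, short-circuit reads are an HDFS client feature that is switched on in hdfs-site.xml. The fragment below is a minimal sketch, not a complete Impala configuration. The two property names are standard HDFS settings, but the socket path is only an example value and has to match whatever the DataNodes on your cluster are configured with.

```xml
<!-- hdfs-site.xml: illustrative fragment; the values are assumptions -->
<property>
  <!-- Let a client on the same host read block files directly -->
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Example path only; use the path configured on your DataNodes -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
```

With this enabled, a client running on the same host as the DataNode receives file descriptors for the block files over the UNIX domain socket and reads the data from local disk. The DataNode still grants access, but the data itself no longer passes through it.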

 

Re: How does impalad read HDFS files?

New Contributor

The Java stack can use SerDes for various file formats.

For the Parquet file format, assuming impalad does not use any Java SerDe, is there an equivalent C++ SerDe for Parquet created by the Impala team?