Support Questions
Find answers, ask questions, and share your expertise
Use Impala to read parquet from (local) disk instead of HDFS

Explorer

Hi,

we are doing a PoC with Kudu and Impala. For testing purposes we are also using Spark to read Parquet files from the local disk, which is pretty easy:

 

val df_parquet1 = spark.read.format("parquet")
  .load("file:///work/testParquetGZ")

df_parquet1.createOrReplaceTempView("test_parquet1")

 

and then we are able to query it directly within Spark:

%sql
select *
from test_parquet1
limit 100

I'm looking for a similar approach for Impala. Do I really have to load the Parquet files into HDFS storage? In our case this makes no sense: we mainly use Kudu, so the HDFS part is only there to get Impala running. Our idea is to store the Parquet files on a big file share, but without HDFS, as it would generate additional overhead.

 

So my question: how can I access Parquet files with Impala from (local) disk, without HDFS?

 

Cheers

1 REPLY 1

Re: Use Impala to read parquet from (local) disk instead of HDFS

Expert Contributor

Hi @teether,
I think the answer is that there is no way to do it, because Impala is an MPP query engine built on top of Hadoop (including HDFS).
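For reference, the usual route is to copy the files into HDFS first and then expose them to Impala as an external table. A minimal sketch, assuming the files have been uploaded to a hypothetical HDFS path /user/impala/testParquetGZ (e.g. via hdfs dfs -put /work/testParquetGZ /user/impala/) and that part-00000.parquet is one of the data files there:

```sql
-- Infer the table schema from one of the Parquet data files
-- (path and file name are assumptions; adjust to your cluster).
CREATE EXTERNAL TABLE test_parquet1
LIKE PARQUET '/user/impala/testParquetGZ/part-00000.parquet'
STORED AS PARQUET
LOCATION '/user/impala/testParquetGZ';

-- Then query it as usual:
SELECT * FROM test_parquet1 LIMIT 100;
```

Because the table is EXTERNAL, dropping it in Impala leaves the Parquet files in HDFS untouched.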