Impalad COORDINATOR_ONLY with HDFS Short-Circuit Read

Explorer

Hi,

 

I'm trying to deploy a few COORDINATOR_ONLY impalad instances on edge nodes in my cluster (CDH 5.16.1). HDFS short-circuit reads are enabled on the cluster and work fine. When starting the COORDINATOR_ONLY nodes, I get the following error:

 

Invalid short-circuit reads configuration:
  - Impala cannot read or execute the parent directory of dfs.domain.socket.path
Aborting Impala Server startup due to improper configuration. Impalad exiting.

Is there a way to run a COORDINATOR_ONLY impalad with HDFS short-circuit reads enabled?

 

Thanks in advance!

1 ACCEPTED SOLUTION

Super Guru

Thanks Lars for pointing it out.

So the solution is to disable HDFS short-circuit reads for the coordinator-only Impala daemons:

 

a) Create a new role group and add all coordinator-only Impala daemon hosts to it.

b) Go to "CM -> Cluster -> Impala services -> Configuration".

c) Add the following property to "Impala Daemon HDFS Advanced Configuration Snippet (Safety Valve)" for the role group you just created:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
</property>


d) Save the changes and restart the affected Impala Daemon instances.
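
After the restart, one way to confirm that the override actually reached a coordinator is to read the hdfs-site.xml that the daemon was started with and look up the property. Below is a minimal sketch in Python. The helper name `get_property` is hypothetical, and the inline `sample` stands in for the real file, whose location depends on the deployment and is not given in this thread. The sketch assumes that when a property is defined more than once, the last definition is the one that takes effect.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the value of a Hadoop-style property, or None if it is absent.

    Assumption: if the property appears several times, the last
    definition is the effective one, so the last match is returned.
    """
    root = ET.fromstring(xml_text)
    value = None
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value", default="").strip()
    return value


# Inline stand-in for the daemon's hdfs-site.xml (hypothetical content).
sample = """
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
</configuration>
"""

value = get_property(sample, "dfs.client.read.shortcircuit")
print("short-circuit reads disabled:", value == "false")
```

To check a real host, read the file contents into a string and pass that in place of `sample`.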


Hope this helps.

Cheers
Eric


Expert Contributor

Can you check whether short-circuit reads are configured as per the documentation HERE?

Explorer

dfs.client.read.shortcircuit was not enabled for the HDFS Gateway. I enabled it, deployed the client configuration, and restarted all the services in the cluster. Unfortunately, I'm still getting the same error when starting the COORDINATOR_ONLY impalad.

 

The short-circuit read configuration looks like the following:

 

HDFS config

Super Guru
Hi,

A COORDINATOR_ONLY Impala daemon does not perform data reads; it only coordinates query execution by distributing jobs to executors.

Why do you need to enable this for COORDINATOR_ONLY impala daemons? Or is it that all your COORDINATOR_ONLY impala daemons fail with this error?

Also, have you checked whether the socket /var/run/hdfs-sockets/dn exists on that Impala daemon host?
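
A check along these lines can be scripted. The following is a minimal sketch in Python, not what Impala itself runs at startup: the function name `check_socket_path` is hypothetical, and the path is the one mentioned in this thread. It reports whether the parent directory is readable and executable by the current user (the condition named in the error message) and whether the path is actually a socket. Run it as the user that starts impalad so the permission check is meaningful.

```python
import os
import stat


def check_socket_path(socket_path):
    """Return a list of problems found with a short-circuit socket path."""
    problems = []
    parent = os.path.dirname(socket_path)

    # The startup error complains about the parent directory, so test it first.
    if not os.path.isdir(parent):
        problems.append(f"parent directory {parent} does not exist")
    elif not os.access(parent, os.R_OK | os.X_OK):
        problems.append(f"parent directory {parent} is not readable and executable")

    # Then test the socket file itself.
    try:
        mode = os.stat(socket_path).st_mode
    except OSError:
        problems.append(f"{socket_path} is missing or inaccessible")
    else:
        if not stat.S_ISSOCK(mode):
            problems.append(f"{socket_path} exists but is not a socket")
    return problems


if __name__ == "__main__":
    # Path taken from this thread; adjust to your dfs.domain.socket.path.
    for problem in check_socket_path("/var/run/hdfs-sockets/dn"):
        print(problem)
```

An empty result means none of these particular problems were found; it does not prove that short-circuit reads will work.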

Cheers
Eric

Explorer

Hi Eric,

 

> A COORDINATOR_ONLY Impala daemon does not perform data reads; it only coordinates query execution by distributing jobs to executors.

 

That's my understanding as well and that's why this failure was a surprise to me.

 

> Why do you need to enable this for COORDINATOR_ONLY impala daemons? Or is it that all your COORDINATOR_ONLY impala daemons fail with this error?

 

Yes, only the COORDINATOR_ONLY Impala daemons fail. The others work fine.

 

> Also, have you checked whether the socket /var/run/hdfs-sockets/dn exists on that Impala daemon host?

 

/var/run/hdfs-sockets/dn does not exist on the COORDINATOR_ONLY Impala daemon hosts, as there is no DataNode on them.

 

Regards,

Anand

Super Collaborator

This is a known issue and has been fixed in CM 6.2. Here is the relevant item in the release notes.

 

Cheers, Lars


Explorer

Got it working on my cluster. Thanks Lars and Eric.

 

Cheers,

Anand