Created on 07-23-2019 04:29 AM - edited 09-16-2022 07:31 AM
Hi,
I'm trying to deploy a few COORDINATOR_ONLY impalad instances on edge nodes in my cluster (CDH 5.16.1). HDFS short-circuit reads are enabled on the cluster and work fine. When starting the COORDINATOR_ONLY nodes, I get the following error:
Invalid short-circuit reads configuration:
- Impala cannot read or execute the parent directory of dfs.domain.socket.path
Aborting Impala Server startup due to improper configuration. Impalad exiting.
Is there a way to run a COORDINATOR_ONLY impalad with HDFS short-circuit reads enabled?
Thanks in advance!
Created 07-23-2019 10:01 PM
Can you check whether short-circuit reads are configured as described in the documentation here?
Created 07-24-2019 12:11 AM
dfs.client.read.shortcircuit was not enabled for the HDFS Gateway. I enabled it, deployed the client configuration, and restarted all the services in the cluster. Unfortunately, I'm still getting the same error when starting the COORDINATOR_ONLY impalad.
The short-circuit read configuration looks like the following:
Created 07-24-2019 04:29 AM
Created 07-24-2019 04:45 AM
Hi Eric,
> COORDINATOR_ONLY impala daemon will not perform data reads, rather it only coordinates query execution by distributing jobs to executors.
That's my understanding as well and that's why this failure was a surprise to me.
> Why do you need to enable this for COORDINATOR_ONLY impala daemons? Or is it that all your COORDINATOR_ONLY impala daemons fail with this error?
Yes, only the COORDINATOR_ONLY Impala daemons fail. The others work fine.
> Also, have you checked whether socket /var/run/hdfs-sockets/dn exist on that impala daemon host?
/var/run/hdfs-sockets/dn does not exist on the COORDINATOR_ONLY Impala daemon hosts, as there is no DataNode on them.
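For anyone who wants to confirm this on their own hosts, here is a minimal Python sketch that approximates the check named in the error message: whether the parent directory of dfs.domain.socket.path exists and is readable and executable. The function name check_socket_parent is hypothetical, and this is only an assumption about what the startup check looks at, not Impala's actual code. Run it as the user that starts impalad, since the result depends on that user's permissions.

```python
import os


def check_socket_parent(socket_path):
    """Return a list of problems with the parent directory of socket_path.

    Hypothetical helper: it only approximates the condition described in
    the Impala startup error and is not taken from Impala itself.
    """
    parent = os.path.dirname(socket_path)
    problems = []
    if not os.path.isdir(parent):
        # On a host without a DataNode the directory may never be created.
        problems.append(f"{parent} does not exist or is not a directory")
        return problems
    if not os.access(parent, os.R_OK):
        problems.append(f"{parent} is not readable by the current user")
    if not os.access(parent, os.X_OK):
        problems.append(f"{parent} is not executable by the current user")
    return problems


if __name__ == "__main__":
    # Socket path as quoted in this thread; adjust to your cluster's setting.
    found = check_socket_parent("/var/run/hdfs-sockets/dn")
    for line in found or ["parent directory looks fine"]:
        print(line)
```

An empty list means the parent directory passed all three checks for the current user.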
Regards,
Anand
Created 07-24-2019 11:26 AM
This is a known issue and has been fixed in CM 6.2. Here is the relevant item in the release notes.
Cheers, Lars
Created on 07-24-2019 04:48 PM - edited 07-24-2019 04:52 PM
Thanks Lars for pointing it out.
So the solution is to disable HDFS short-circuit reads for the coordinator-only Impala daemons:
a) Create a new role group and add all coordinator-only Impala daemon hosts to this group.
b) Go to "CM -> Cluster -> Impala service -> Configuration".
c) Add the following property to "Impala Daemon HDFS Advanced Configuration Snippet (Safety Valve)" for the new role group that you just created:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
</property>
d) Save the changes and restart the affected Impala Daemon instances.
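If you prefer to generate the snippet rather than type it by hand, the following is a small Python sketch using only the standard library. The helper name hdfs_property is hypothetical; it just renders a well-formed property element that you can paste into the safety valve, and it does not talk to Cloudera Manager in any way.

```python
import xml.etree.ElementTree as ET


def hdfs_property(name, value):
    """Render one Hadoop-style <property> element as a string.

    Hypothetical helper for illustration; the output is meant to be
    pasted into the safety valve field manually.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


snippet = hdfs_property("dfs.client.read.shortcircuit", "false")
print(snippet)

# Parse the result back to confirm it is well-formed XML.
parsed = ET.fromstring(snippet)
print(parsed.findtext("name"), "=", parsed.findtext("value"))
```

Building the element with a library instead of string concatenation avoids typos such as an unclosed tag, which are easy to miss in a single-line paste.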
Hope the above helps.
Cheers
Eric
Created 07-25-2019 03:28 AM
Got it working on my cluster. Thanks Lars and Eric.
Cheers,
Anand