BDR Job failing due to socket timeout

New Contributor

Hi all,

 

I've run into a strange issue: a BDR job sometimes succeeds, but when I re-run the same job it fails with a socket timeout, as shown in the attached screenshot (Capture.JPG).

 

This is a multihomed cluster with one public network (172.x.x.x) and one private network (10.x.x.x). I applied the NameNode RPC bind-host settings (dfs.namenode.rpc-bind-host) as described in https://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html and set the destination proxyuser on the source cluster.
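
For anyone reproducing this setup, here is a minimal sketch for checking which multihoming-related properties are actually present in an hdfs-site.xml. The property names come from the HDFS multihoming guide linked above. The sample XML and the function name multihoming_settings are illustrative assumptions, not this cluster's real configuration.

```python
import xml.etree.ElementTree as ET

# Property names listed in the HDFS multihoming guide.
MULTIHOMING_KEYS = [
    "dfs.namenode.rpc-bind-host",
    "dfs.namenode.servicerpc-bind-host",
    "dfs.namenode.http-bind-host",
    "dfs.namenode.https-bind-host",
    "dfs.client.use.datanode.hostname",
    "dfs.datanode.use.datanode.hostname",
]


def multihoming_settings(xml_text):
    """Return {property: value or None} for the multihoming-related keys."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        found[name] = value
    return {key: found.get(key) for key in MULTIHOMING_KEYS}


# Illustrative input only (assumption); paste the real file contents instead.
SAMPLE = """
<configuration>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in multihoming_settings(SAMPLE).items():
        print(f"{key}: {value if value is not None else '(not set)'}")
```

Running the same check against the configuration on each NameNode host may show whether the node that fails differs from the ones that succeed.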

 

Any advice is deeply appreciated.

Thanks!

3 REPLIES

Re: BDR Job failing due to socket timeout

Guru
@RobinRo ,

BDR picks random hosts to run its jobs. Have you checked whether the failure always comes from the same host? That is the first thing I would check.

The stack trace shows that the error occurred when the BDR job tried to get file info from HDFS and timed out. If you can confirm that the failure always comes from the same host, we should check the connectivity between that host and the NameNode.
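
A rough way to test that connectivity from the suspect host is a plain TCP connect with a timeout, sketched below. The hostname namenode.example.com and port 8020 are placeholders (8020 is assumed to be the NameNode RPC port; check your own configuration), and check_rpc_port is a hypothetical helper, not part of BDR or Hadoop.

```python
import socket
import time


def check_rpc_port(host, port, timeout=5.0):
    """Try a TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refused connections, DNS failures
        return False, time.monotonic() - start, str(exc)


if __name__ == "__main__":
    # Placeholder values (assumptions): use the real NameNode address and port.
    NAMENODE_HOST = "namenode.example.com"
    NAMENODE_RPC_PORT = 8020
    ok, elapsed, err = check_rpc_port(NAMENODE_HOST, NAMENODE_RPC_PORT)
    print(f"{NAMENODE_HOST}:{NAMENODE_RPC_PORT} ok={ok} elapsed={elapsed:.2f}s {err}")
```

A successful connect only shows that the port is reachable; it does not prove that HDFS RPC calls will complete. A timeout here from one host but not from the others would still narrow things down.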

Cheers
Eric

Re: BDR Job failing due to socket timeout

New Contributor

Hi @EricL ,

 

I found that the failure occurs whenever one particular node (a NameNode, whether active or standby) runs the job. When it reaches step 3, "Trigger a HDFS Replication", it fails with the socket timeout error. I also have a personal lab with the same architecture, and the job fails there as well on the same NameNode.

 

Thanks,

@RobinRo 

Re: BDR Job failing due to socket timeout

Super Collaborator

Hi @RobinRo ,

 

Is that particular node a multihomed host? I found this JIRA and wonder whether it might help:

https://issues.apache.org/jira/browse/HADOOP-16006
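
To see how a multihomed host behaves in practice, one option is to ask the OS which local address it would use as the source when talking to a given target. This is a generic networking sketch, not something taken from the JIRA. The target addresses and port 8020 in the example are placeholders, and local_address_towards is a hypothetical helper.

```python
import socket


def local_address_towards(target_host, target_port=8020):
    """Return the local IP the OS would use as source to reach the target.

    Connecting a UDP socket sends no packets; it only makes the kernel
    choose a route, and with it a local source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((target_host, target_port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Placeholder targets (assumptions): one address per network of the cluster.
    for target in ("10.0.0.10", "172.16.0.10"):
        try:
            print(f"to {target}: source {local_address_towards(target)}")
        except OSError as exc:  # e.g. no route to that network
            print(f"to {target}: {exc}")
```

If the failing NameNode reaches the other cluster from a different network than the healthy hosts do, that would be worth comparing against the bind-host settings.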

 

Thanks!

Li Wang, Technical Solution Manager

