Missing activated parcel? Starting cluster stalls due to DataNode role start failure.

Contributor

Our CDH cluster in the cloud sometimes fails to start.

Version of CM:  Cloudera Express 5.13.0
Version of CDH: 5.13.0-1.cdh5.13.0.p0.29

Although the CM Start command reports "All services successfully started", the cluster does not actually start.


Inspecting the logs, I found that all the DataNode roles failed to start, showing the weird message below.


"This role requires the following additional parcels to be activated before it can start: [cdh]."



(on CM Start Command view)

> Execute command Start on service HDFS
   Successfully started HDFS service

   > Start HDFS service
       Successfully started service.

   > Starting 14 roles on service
       Successfully started service, but only 10/14 roles started.

       > Execute command Start this DataNode on role DataNode (dn-x)
           Failed to start role.

           > Start a role
               This role requires the following additional parcels to be activated before it can start: [cdh].


       (the same for other DataNodes)
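Because the Start command reports success even when only 10/14 roles come up, a wrapper script could check the command messages itself instead of trusting the top-level status. The sketch below assumes you already have those messages as plain strings (how to fetch them from CM is not shown); the two regular expressions are built from the messages quoted above, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Patterns built from the Start command messages quoted in this post.
PARTIAL_START = re.compile(r"only (\d+)/(\d+) roles started")
MISSING_PARCELS = re.compile(
    r"requires the following additional parcels to be activated "
    r"before it can start: \[([^\]]*)\]"
)


def partial_start(message: str) -> Optional[Tuple[int, int]]:
    """Return (started, total) if the message reports a partial start, else None."""
    m = PARTIAL_START.search(message)
    return (int(m.group(1)), int(m.group(2))) if m else None


def missing_parcels(message: str) -> List[str]:
    """Return the parcel names a role says must be activated (empty if none)."""
    m = MISSING_PARCELS.search(message)
    if not m:
        return []
    return [p.strip() for p in m.group(1).split(",") if p.strip()]
```

For the messages above, `partial_start` returns `(10, 14)` and `missing_parcels` returns `['cdh']`.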



There seems to be no log output on DataNodes, and I suspect a CM issue.
The parcels, including the CDH parcel, are distributed and activated properly.


Our current workaround is to restart the cluster.
On the second try, it starts fine.

I would like to find a fundamental solution to this issue.

I would appreciate any helpful information. Thank you.


Contributor

I noticed that our DataNodes have too many blocks because of many small files, probably created by misconfigured Spark jobs.
CM shows a warning like this one:


"Concerning : The DataNode has 989,835 blocks. Warning threshold: 500,000 block(s). "


On cluster startup, the DataNodes spend time checking these blocks before reporting to the NameNode.

As a result, cluster startup time is no longer tolerable for our daily development cycle (it takes about 10 minutes for HDFS to become ready after the cluster services report that startup is complete).
Though I'm not sure whether this is related to the missing-parcel issue, I'm going to resolve this warning first by asking users to remove unnecessary files.
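To see why small files inflate the per-DataNode block count, a rough estimate helps: every non-empty file occupies at least one block regardless of its size, and each block is stored `replication` times. This sketch assumes the HDFS defaults of a 128 MB block size and replication factor 3, the 4 DataNodes of this cluster, and perfectly even placement; the helper names are hypothetical.

```python
import math
from typing import Iterable


def blocks_for_file(size_bytes: int, block_size: int) -> int:
    """Blocks used by one file: ceil(size / block_size); empty files use none."""
    return math.ceil(size_bytes / block_size) if size_bytes > 0 else 0


def replicas_per_datanode(
    file_sizes: Iterable[int],
    block_size: int = 128 * 1024 * 1024,  # assumed default dfs.blocksize
    replication: int = 3,                 # assumed default replication factor
    num_datanodes: int = 4,               # DataNode count of this cluster
) -> float:
    """Average block replicas per DataNode, assuming even placement."""
    total_blocks = sum(blocks_for_file(s, block_size) for s in file_sizes)
    return total_blocks * replication / num_datanodes
```

For example, 1,000 files of 1 MiB each give 750 replicas per DataNode, while a single 1,000 MiB file holding the same data gives only 6.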

Champion

For the DataNode block count threshold, try running the balancer and see if that fixes your problem.
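The HDFS balancer (`hdfs balancer -threshold <percent>`, default 10) moves blocks between DataNodes until each node's disk utilization is within the threshold of the cluster-wide average. The simplified sketch below shows only that selection criterion, using hypothetical input dictionaries of used and total capacity per node; the real balancer also plans the concrete block moves. Note that it evens out disk usage and does not reduce the total number of blocks.

```python
from typing import Dict, List, Tuple


def classify_datanodes(
    used: Dict[str, float],
    capacity: Dict[str, float],
    threshold_pct: float = 10.0,
) -> Tuple[List[str], List[str]]:
    """Split DataNodes into over- and under-utilized lists, balancer-style.

    Utilization is used/capacity in percent; a node is out of balance when it
    differs from the cluster-wide average by more than threshold_pct points.
    """
    avg = 100.0 * sum(used.values()) / sum(capacity.values())
    over: List[str] = []
    under: List[str] = []
    for node in sorted(used):
        util = 100.0 * used[node] / capacity[node]
        if util > avg + threshold_pct:
            over.append(node)
        elif util < avg - threshold_pct:
            under.append(node)
    return over, under
```

With four 100-unit nodes using 80, 20, 50, and 50 units, the average is 50%, so the first node is over-utilized and the second under-utilized at the default threshold.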

Contributor

Hi, csguna. Thanks for your reply.

Yes, that mantra is ringing in my head.

I just removed the offending directory and finished rebalancing, and the block count issue is now resolved.
Cluster startup time has returned to normal.

Now I'm going to see how it goes.

Contributor

Ummm...

The initial issue still happens occasionally.

The status of the cluster Start command looks like this:

[Screenshot: cdh-dn-startup-failure.png]

Restarting manually is inconvenient, so I'm going to automate the detection and recovery process :-)
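A minimal sketch of such a detection-and-recovery loop, assuming three hypothetical hooks: `start_cluster`, `all_roles_started`, and `restart_cluster`. They could wrap whatever shell command already drives the automated start/stop; no Cloudera Manager API calls are assumed here, and the default of two restarts is an arbitrary choice.

```python
import time
from typing import Callable


def start_with_retry(
    start_cluster: Callable[[], None],
    all_roles_started: Callable[[], bool],
    restart_cluster: Callable[[], None],
    max_restarts: int = 2,
    wait_seconds: float = 0.0,
) -> bool:
    """Start the cluster, then restart it until every role is up or retries run out.

    Returns True as soon as all_roles_started() reports success, else False.
    """
    start_cluster()
    for attempt in range(max_restarts + 1):
        if wait_seconds:
            time.sleep(wait_seconds)  # give roles time to settle before checking
        if all_roles_started():
            return True
        if attempt < max_restarts:
            restart_cluster()
    return False
```

Bounding the number of restarts keeps a genuinely broken cluster from looping forever; a `False` return is the point to alert a human.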

 


Contributor

Here is some additional information about the cluster in question.

- CDH version: 5.13.0-1.cdh5.13.0.p0.29
- Cloudera Manager version: Cloudera Express 5.13.0
- Java Version: 1.8.0_151
- NameNode HA
- DataNode * 4
- deployed services: HBase, HDFS, Hive, Hue, Oozie, Spark, Spark2, Sqoop2, YARN, ZooKeeper

This cluster is for development purposes. We deploy the cluster in the cloud (GCP VM instances) and have automated its start/stop process. Usually, the cluster is started on demand via a shell command that is transparent to users, several times a day depending on the workload.

This issue is rare, but we have observed it 3 times in the past two weeks, the first occurrences since the cluster was launched last February.
We have also observed a similar phenomenon with ZooKeeper service startup, which is also very rare.

Champion

Having too many small files in a Hadoop cluster goes against its mantra.

A few large files work best in a Hadoop cluster.

The link below explains why too many small files are bad for a Hadoop cluster:

https://blog.cloudera.com/blog/2009/02/the-small-files-problem/
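The linked post gives a rule of thumb: every file, directory, and block is an object in NameNode memory taking about 150 bytes, so 10 million single-block files need roughly 3 GB. A sketch of that arithmetic, ignoring directories (the function name is hypothetical):

```python
def namenode_heap_bytes(
    num_files: int, blocks_per_file: int = 1, bytes_per_object: int = 150
) -> int:
    """Rule-of-thumb NameNode memory: one object per file plus one per block."""
    objects = num_files + num_files * blocks_per_file
    return objects * bytes_per_object
```

`namenode_heap_bytes(10_000_000)` gives 3,000,000,000 bytes, matching the post's figure of about 3 GB.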

 

Just curious: what type of small files are those? If they are in Parquet format, there is code on GitHub that can merge those files and keep them in the cluster sized according to your data block size.
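Whatever tool does the merging, a first step is deciding which files go together so each output is close to one block. The sketch below plans that with first-fit decreasing bin packing over file sizes; it is a hypothetical helper, not the GitHub code mentioned above, and it does not touch the files themselves. Parquet files cannot be merged by concatenating bytes, so the actual rewrite needs a Parquet-aware reader and writer.

```python
from typing import List


def plan_merges(file_sizes: List[int], block_size: int) -> List[List[int]]:
    """Group file indexes so each group's total size fits in one block.

    First-fit decreasing: place each file, largest first, into the first group
    with enough room left; a file larger than a block gets a group of its own.
    """
    order = sorted(range(len(file_sizes)), key=lambda i: file_sizes[i], reverse=True)
    groups: List[List[int]] = []
    room: List[int] = []  # remaining capacity of each group
    for i in order:
        size = file_sizes[i]
        for g, free in enumerate(room):
            if size <= free:
                groups[g].append(i)
                room[g] -= size
                break
        else:
            groups.append([i])
            room.append(max(block_size - size, 0))
    return groups
```

For sizes `[60, 70, 50, 30]` and a block size of 128, it returns `[[1, 2], [0, 3]]` (file indexes), i.e. two outputs of 120 and 90 units instead of four small files.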