Problem with starting CDH cluster on AWS using Cloudera Manager
Created on 08-31-2016 10:20 PM - edited 09-16-2022 03:37 AM
Hi,
I'm running a POC on AWS using CDH 5.7.2. I have created and configured a simple environment using Cloudera Director as follows:
Cloudera manager
1 x Master
3 x Workers
1 x Gateway
All six instances are of the m3.xlarge instance type. The installation is smooth and straightforward using Cloudera Director. After running my jobs for the POC, I stop the cluster from Cloudera Manager and then stop the instances on the EC2 dashboard.
When I restart the instances and the cluster, I always get the following errors, in varying order:
Bad : 659 missing blocks in the cluster. 986 total blocks in the cluster. Percentage missing blocks: 66.84%. Critical threshold: any.
Bad : 659 under replicated blocks in the cluster. 986 total blocks in the cluster. Percentage under replicated blocks: 66.84%. Critical threshold: 40.00%.
Event Server Down (I have to start it manually)
Exception while getting fetch configDefaults hash: none java.net.ConnectException: Connection refused
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
ERROR com.cloudera.cmf.eventcatcher.server.EventCatcherService Could not fetch descriptor after 5 tries, exiting.
Host Monitor Down (I have to start it manually)
I can consistently reproduce these errors on every fresh installation I have done:
- At first, all green lights
- After stopping the cluster/instances and restarting, these errors occur
Is there anything wrong with the approach I use to stop and start my cluster? I've started googling around the missing block issue and understand that it may be related to corrupted files. How can I prevent this issue from happening? Any best practices are welcome...
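For reference, here's roughly how I'm listing the affected files, a minimal sketch that just shells out to the standard `hdfs fsck` tool (the helper name is my own):

```python
import subprocess

def list_corrupt_file_blocks(path="/"):
    """List files with missing/corrupt blocks via `hdfs fsck`.

    Run on a host that has the HDFS client configuration deployed.
    """
    result = subprocess.run(
        ["hdfs", "fsck", path, "-list-corruptfileblocks"],
        capture_output=True, text=True, check=False,
    )
    # fsck prints one "blk_... <path>" line per corrupt/missing block
    return [line for line in result.stdout.splitlines() if "blk_" in line]

if __name__ == "__main__":
    for entry in list_corrupt_file_blocks():
        print(entry)
```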
I've realized that I'm spending more than half of my time actually fixing the environment instead of focusing on my POC.
Thanks
Created 09-01-2016 01:22 AM
The m3 instance types come with ephemeral instance-store volumes, and any data written to them is lost when the instance is stopped. You should instead use instances whose storage is EBS-backed so the data persists.
For cloud deployments we recommend using Cloudera Director to install, deploy, and run your Cloudera Manager and CDH cluster instead of managing it manually, to avoid small problems such as these: https://www.cloudera.com/documentation/director/latest/topics/director_intro.html
You can also check out which instance types Cloudera Director recommends for CM and CDH here: https://www.cloudera.com/documentation/director/latest/topics/director_deployment_requirements.html#...
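If it helps, here is a rough sketch (instance ID and region are placeholders) of how you could verify from boto3 whether an instance is EBS-backed and which EBS volumes are attached:

```python
import boto3

# Note: ephemeral instance-store volumes do NOT appear in the
# DescribeInstances block device mappings; only EBS attachments do.
ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
resp = ec2.describe_instances(
    InstanceIds=["i-0123456789abcdef0"]  # placeholder instance ID
)

for reservation in resp["Reservations"]:
    for inst in reservation["Instances"]:
        # RootDeviceType is either 'ebs' or 'instance-store'
        print(inst["InstanceId"], inst["InstanceType"], inst["RootDeviceType"])
        for bdm in inst.get("BlockDeviceMappings", []):
            print("  ", bdm["DeviceName"], bdm["Ebs"]["VolumeId"])
```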
Created 09-04-2016 10:15 AM
Hi,
Thanks for your reply. I can definitely access the data after stopping and starting my instances. In my case, my m3.xlarge instances are backed by EBS storage: both my boot and block devices are attached to the same EBS volume. That's also what makes it possible to stop and start the instances.
Also, as you can read in my initial post, I'm using Cloudera Director and Cloudera Manager for the deployment/management of my CDH cluster.
At this stage, I still do not see what's causing the issues I have mentioned above.
Regards.
Created 09-04-2016 06:23 PM
Hi,
Are you sure the blocks still exist on the DataNode hosts after rebooting the instances? By default, the location should be under /dfs/dn{1,2,...}.
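As a quick sanity check (a sketch; adjust the directory list to your configured dfs.data.dir values), you could count the block files on each DataNode after a reboot:

```python
import os

# Count HDFS block files under each configured DataNode data directory.
DATA_DIRS = ["/dfs/dn"]  # add /dfs/dn1, /dfs/dn2, ... if more are configured

for data_dir in DATA_DIRS:
    blocks = 0
    for _root, _dirs, files in os.walk(data_dir):
        blocks += sum(1 for f in files
                      if f.startswith("blk_") and not f.endswith(".meta"))
    print(f"{data_dir}: {blocks} block files")
```

If the counts drop to near zero after a stop/start, the data directories are sitting on storage that does not survive the stop.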
Created 09-09-2016 09:34 AM
Hi,
I installed a new cluster from scratch using the m4 instance type (which is EBS-only, with no ephemeral instance store) and could not reproduce the error.
Thanks.
