
Problem with starting CDH cluster on AWS using Cloudera Manager

Explorer

Hi,

 

I'm running a POC on AWS using CDH 5.7.2. I have created and configured a simple environment using Cloudera Director as follows:

1 x Cloudera Manager

1 x Master

3 x Workers

1 x Gateway

All 6 instances are of the m3.xlarge instance type. The installation is smooth and straightforward using Cloudera Director. After running my jobs for the POC, I stop the cluster from Cloudera Manager and then stop the instances from the EC2 dashboard.
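For reference, the stop sequence I use, scripted end to end, looks roughly like the sketch below. It is only an illustration: the CM host, cluster name, credentials, region, and instance IDs are placeholders, and the REST API version prefix may differ for your CM release.

    # Sketch: stop all cluster services through the Cloudera Manager REST
    # API, then stop the EC2 instances with boto3. Every identifier below
    # (host, cluster name, credentials, region, instance IDs) is a placeholder.
    import requests
    import boto3

    CM = "http://cm-host:7180/api/v13"   # CM server; API version may differ
    AUTH = ("admin", "admin")            # placeholder credentials

    # Ask CM to stop every service in the cluster; poll the returned
    # command until it finishes before touching the instances.
    r = requests.post(CM + "/clusters/cluster/commands/stop", auth=AUTH)
    r.raise_for_status()
    print("CM stop command id:", r.json()["id"])

    # Only once the CM stop command has completed, stop the instances.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholders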

 

When I restart the instances and the cluster, I always get the following errors, in varying order:

Bad : 659 missing blocks in the cluster. 986 total blocks in the cluster. Percentage missing blocks: 66.84%. Critical threshold: any.

 

Bad : 659 under replicated blocks in the cluster. 986 total blocks in the cluster. Percentage under replicated blocks: 66.84%. Critical threshold: 40.00%.

 

Event Server Down (I have to start it manually)

Exception while getting fetch configDefaults hash: none
java.net.ConnectException: Connection refused
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
ERROR   com.cloudera.cmf.eventcatcher.server.EventCatcherService   Could not fetch descriptor after 5 tries, exiting.

 

Host Monitor Down (I have to start it manually)

 

I consistently reproduce these errors with every fresh installation I have done:

- At first, all green lights

- After stopping the cluster/instances and restarting, these errors occur

 

Is there anything wrong with the approach I use to stop and start my cluster? I've started googling around the missing-block issue and understand that it may be related to corrupted files. How can I prevent this issue from happening? Any best practices are welcome...
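For what it's worth, this is how I have been inspecting the reported blocks, wrapping the standard hdfs fsck command from Python (run on a cluster host where the HDFS client is configured, typically as the hdfs user):

    # Sketch: check filesystem health and list files with corrupt or
    # missing blocks after a restart. Requires the hdfs CLI on the PATH.
    import subprocess

    # Overall health summary: total blocks, missing and under-replicated.
    subprocess.run(["hdfs", "fsck", "/"], check=True)

    # Just the files whose blocks are reported corrupt or missing.
    subprocess.run(["hdfs", "fsck", "/", "-list-corruptfileblocks"], check=True)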

 

I've realized that I'm spending more than half of my time actually fixing the environment instead of focusing on my POC.

 

Thanks

 

 

1 ACCEPTED SOLUTION

Explorer

Hi,

 

I installed a new cluster from scratch using the m4 instance type and could not reproduce the error.

 

Thanks.


4 REPLIES

Mentor
As you can see at https://aws.amazon.com/ec2/instance-types/ and http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-lifetime, the m3.xlarge uses 2x "instance store" type disks, which are entirely destroyed when you stop an instance. When you bring the instance back, it no longer has any of its previously persisted data, which is not acceptable to many CM and CDH components. Your HDFS blocks would no longer be on disk, so they would be reported as missing too.

You should instead use instances that provide "EBS" storage so the data persists.
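A quick way to verify which case applies to a given instance is to inspect its root device type and block device mappings, for example with a small boto3 sketch like the one below (the instance ID and region are placeholders):

    # Sketch: check whether an instance's storage is EBS-backed.
    # Instance-store volumes are not listed in BlockDeviceMappings and
    # are lost on stop; EBS volumes persist across stop/start.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region
    resp = ec2.describe_instances(InstanceIds=["i-0123456789abcdef0"])
    inst = resp["Reservations"][0]["Instances"][0]

    print("Root device type:", inst["RootDeviceType"])   # "ebs" persists
    for m in inst["BlockDeviceMappings"]:
        print(m["DeviceName"], "-> EBS volume", m["Ebs"]["VolumeId"])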

For cloud deployments we recommend using Cloudera Director to install, deploy, and run your Cloudera CM and CDH cluster instead of managing it manually, to avoid little problems such as these: https://www.cloudera.com/documentation/director/latest/topics/director_intro.html

You can also check out which instance types are recommended by Cloudera Director for CM and CDH here: https://www.cloudera.com/documentation/director/latest/topics/director_deployment_requirements.html#...

Explorer

Hi,

 

Thanks for your reply. I can definitely access the data after stopping and starting my instances. In my case, my m3.xlarge instances are backed by EBS storage: both my boot and block devices are attached to the same EBS volume. That's also what makes it possible to stop and start the instances.

 

 

Also, as you can read in my initial post, I'm using Cloudera Director and Cloudera Manager for the deployment/management of my CDH cluster.

 

At this stage, I still do not see what's causing the issues I have mentioned above.

 

Regards.

Expert Contributor

Hi,

 

Are you sure that the blocks still exist on the DataNode hosts even after rebooting the instances? By default, the location should be under /dfs/dn{1,2,...}.
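For example, a quick sketch like this, run on each DataNode host, would count the block files under those directories (assuming the default /dfs/dn* layout; adjust the glob if your dfs.data.dir setting differs):

    # Sketch: count HDFS block files under the DataNode data directories.
    # Assumes the CDH default layout /dfs/dn, /dfs/dn1, ... on this host.
    import glob
    import os

    total = 0
    for data_dir in glob.glob("/dfs/dn*"):
        for _root, _dirs, files in os.walk(data_dir):
            total += sum(1 for f in files
                         if f.startswith("blk_") and not f.endswith(".meta"))
    print("block files found:", total)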

