Member since: 08-24-2015
Posts: 38
Kudos Received: 10
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
2146 | 03-22-2018 04:39 PM
2851 | 03-22-2018 02:04 PM
2904 | 12-07-2017 11:55 AM
3955 | 11-24-2017 01:02 PM
2819 | 03-28-2017 01:22 PM
12-08-2017
04:21 PM
Any helpful information in cloudera-director-server.out?
12-07-2017
11:55 AM
Is this reproducible? Director does cache some of the parcel repository information, but the cache is cleared at the beginning of bootstrap, so that shouldn't be the problem. You can also try restarting Director (which also clears the cache) just to make sure.
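For reference, on a package-based install the server can be restarted with its service script; this assumes the default cloudera-director-server service name:

# Restart the Director server, which clears its caches
sudo service cloudera-director-server restart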
12-07-2017
11:42 AM
Which OS are you using? Also, can you check the Director server log if it's present? It should be at /var/log/cloudera-director-server/application.log.
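A quick way to pull the most recent entries from that log (path as above):

# Show the last 200 lines of the Director server log
tail -n 200 /var/log/cloudera-director-server/application.log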
11-24-2017
01:02 PM
1. No, Cloudera Director can't point to existing instances; the EC2 instances must be provisioned by Director.
2. Based on my answer to (1), I don't think this question still applies. In general, though, since Director manages provisioning and preparing the instances as well as installing Cloudera Manager and CDH, you'll likely run into far fewer compatibility issues using Director than if you were doing any of these steps manually.
3. Director can install StreamSets via CSD. You can refer to this post to see an example using a bootstrap script. Note that as of Director 2.5 it's actually easier to specify CSDs directly, and a bootstrap script is no longer required; this is documented here. A sketch follows this list.
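A minimal sketch of the Director 2.5+ approach, assuming the cloudera-manager conf block accepts a csds list of JAR URLs (check the linked docs for the exact property; the URL is the StreamSets CSD used elsewhere in this thread):

cloudera-manager {
  ...
  # Director downloads these CSD JARs onto the CM instance; no bootstrap script needed
  csds: ["https://archives.streamsets.com/datacollector/2.3.0.0/csd/STREAMSETS-2.3.0.0.jar"]
}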
10-26-2017
09:57 AM
include file(...) should work, but I believe it requires the full file path. The current latest version of Director (Director 2.6) is on Typesafe Config version 1.2, which doesn't have support for required. You can refer to the following spec to see what's supported in this version: https://github.com/lightbend/config/blob/v1.2.0/HOCON.md
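For example, a sketch of the form that should work, using a hypothetical absolute path:

# full path required; required(...) is not available in Typesafe Config 1.2
include file("/path/to/common.conf")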
09-15-2017
12:29 PM
1 Kudo
You may want to check whether CM is up and running properly. It's possible that CM is out of memory, which can happen when the CM instance size is too small.
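One quick check from the CM host, assuming a default Cloudera Manager install (standard log location):

# Check available memory on the CM instance
free -m
# Look for out-of-memory errors in the CM server log
grep -i "OutOfMemory" /var/log/cloudera-scm-server/cloudera-scm-server.log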
04-24-2017
11:12 PM
1 Kudo
Hey Garren, good catch. Due to the way the NVMe volume devices are named, they unintentionally get skipped by Director's mount script. We will look into fixing the mount script for these volumes in a future release when we add the i3 instances. For now it's still possible to get this working by manually mounting the volumes through an instance bootstrap script. For example, the following script worked for me using i3.4xlarge instances.

bootstrapScripts: ["""#!/bin/sh
# Format the given device and mount it at the given mount point
prepare_disk()
{
  mount=$1
  device=$2
  FS=ext4
  FS_OPTS="-E lazy_itable_init=1"
  echo "Warning: ERASING CONTENTS OF $device"
  mkfs.$FS -F $FS_OPTS -m 0 "$device"
  echo "Mounting $device on $mount"
  if [ ! -e "${mount}" ]; then
    mkdir "${mount}"
  fi
  mount -o defaults,noatime "${device}" "${mount}"
  # Persist the mount across reboots
  echo "$device $mount $FS defaults,noatime 0 0" >> /etc/fstab
}
prepare_disk /data0 /dev/nvme0n1
prepare_disk /data1 /dev/nvme1n1
"""]

The above is a simplified version of Director's mount script; it will mount the two NVMe volumes on an i3.4xlarge.
03-28-2017
01:22 PM
1 Kudo
Hi Luca, I'm not sure if your questions are directed towards using Cloudera Director to deploy CDH or just general CDH deployment on AWS. Some of these answers are tailored towards using Director to deploy CDH on AWS.

1) Is the CDH deployment available only for some kinds of instances, or can I deploy it on all the AWS instance types?

Most AWS instance types should work, but be sure to choose instance types with enough compute and memory for the number of services being deployed; otherwise you will run into health warnings/errors in CM and things may not work as expected. The reference architecture for AWS deployments, which you may have already looked at, makes some recommendations on choosing instances for Master, Worker and Edge nodes.

2) Assuming I want to create a cluster that will be active 24x7. For a long-running cluster I understood it's better to have a cluster based on local-storage instances. If we consider a cluster of 2 PB, I think that d2.8xlarge should be the best choice for the datanodes.

About the Master Nodes:
- If I want to deploy only 3 Master Nodes, is it better to have them as local-storage instances too, or as EBS-attached instances to be able to react quickly to a possible Master Node failure?

The main purpose of EBS volumes in Director is to give a wider range of storage types (gp2, st1, sc1) and to allow pausing the cluster for cost-saving purposes. If quickly reacting to Master Node failures is a priority, the cluster should be set up with High Availability.

- Are there some best practices about the master node instance type (EBS or local storage)?

Both should be viable; no additional recommendations aside from what's in the reference architecture.

About the Data Nodes:
- If a data node fails, does CDH have some automated mechanism to automatically spin up a new instance and connect it to the cluster in order to restore the cluster without downtime? Do we have to build a script from scratch to do this?

To clarify, CDH isn't capable of spinning up a new instance on AWS, but Director is. If an instance that has a data node fails, Director will not automatically spin up a new instance. The user can go through the Director UI and choose the repair option for the failing instance. This will provision a new data node instance in its place and add it to the CDH cluster. Repair can also be done through the Director API, so this can be automated if needed.

About the Edge Nodes:
- Are there some best practices about the instance type (EBS or local storage)?

Both should be viable; no additional recommendations aside from what's in the reference architecture.

3) If I want to do a backup of the cluster on S3: when I do a distcp from CDH to S3, can I move the data directly to Glacier instead of normal S3?

I don't think distcp supports Glacier as a destination. You should be able to use S3 lifecycle policies to send an object from S3 to Glacier some number of days after the object is created. So it's true that distcp can't go directly to Glacier, but a simple data flow through S3 is possible (see the sketch after this answer).

- If I have some compression applied on the data (e.g. snappy, gzip, etc.) and I do a distcp to S3, is the space occupied on S3 the same, or does the distcp command decompress the data for the copy?

I don't think distcp will decompress the data.

- If I have a cluster based on EBS-attached instances, is it possible to snapshot the disks and re-attach a datanode based on the snapshot?

This workflow is currently not supported.

4) If I have the Data Nodes deployed as r4.8xlarge and I need more horsepower, is it possible to scale up the cluster from r4.8xlarge to r4.16xlarge on the fly, attaching and detaching the disks in a few minutes?

This workflow is also currently not supported.
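A minimal sketch of that S3-then-Glacier flow, assuming a hypothetical bucket my-backup-bucket, the s3a connector, and AWS CLI access (all names here are placeholders):

# Back up an HDFS directory to S3 with distcp (s3a credentials configured separately)
hadoop distcp hdfs:///user/backup s3a://my-backup-bucket/backup/

# lifecycle.json transitions objects under backup/ to Glacier after 7 days:
# {
#   "Rules": [{
#     "ID": "backup-to-glacier",
#     "Filter": {"Prefix": "backup/"},
#     "Status": "Enabled",
#     "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}]
#   }]
# }
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-backup-bucket \
  --lifecycle-configuration file://lifecycle.json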
02-09-2017
03:16 PM
1 Kudo
I'm not sure if there's official support with Director, but it can be done using the conf file with the help of a bootstrap script. Specify a bootstrap script (only for the CM instance) to download and place the CSD JAR in the appropriate CSD directory:

instances {
  cminstance {
    type: m4.xlarge
    image: ami-ac5f2fcc
    tags {
      owner: ${?USER}
    }
    bootstrapScript: """#!/bin/sh
# Download the StreamSets CSD JAR and place it where CM looks for CSDs
yum -y install wget
wget https://archives.streamsets.com/datacollector/2.3.0.0/csd/STREAMSETS-2.3.0.0.jar
mkdir -p /opt/cloudera/csd
mv STREAMSETS-2.3.0.0.jar /opt/cloudera/csd/
"""
  }
  ...
}

You also have to specify the product, service, and role name along with the parcel repository URL in the conf file. The following worked for me (I went through a manual install to get these values):

cluster {
  # add the StreamSets Data Collector product
  products {
    CDH: 5
    STREAMSETS_DATACOLLECTOR: 2.3
  }
  # add the StreamSets parcel repository
  parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.9/",
                       "https://archives.streamsets.com/datacollector/latest/parcel/"]
  # add the service
  services: [HDFS, YARN, STREAMSETS, ...]
  ...
  workers {
    ...
    # add the Data Collector role to the StreamSets service
    roles {
      HDFS: [DATANODE]
      YARN: [NODEMANAGER]
      STREAMSETS: [DATACOLLECTOR]
      ...
    }
  }
}
09-28-2016
12:26 PM
The status command is meant for clusters that were bootstrapped locally with the client (standalone mode) and won't work for clusters that were bootstrapped on a remote server. You can refer to https://www.cloudera.com/documentation/director/latest/topics/director_cli_commands.html for additional information on which commands are meant for remote and which are meant for local. The documentation here may also give a better understanding of the difference between bootstrapping with just the client vs bootstrapping against a server: https://www.cloudera.com/documentation/director/latest/topics/director_client_and_server.html
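For illustration, a sketch of the two modes, assuming a conf file named cluster.conf and a Director server at director-host:7189 (both hypothetical; see the linked CLI docs for the exact flags):

# Standalone mode: bootstrap locally with the client; "status" works here
cloudera-director bootstrap cluster.conf
cloudera-director status cluster.conf

# Server mode: bootstrap against a remote Director server; check status
# through the server's UI or API instead of the status command
cloudera-director bootstrap-remote cluster.conf \
  --lp.remote.hostAndPort=director-host:7189 \
  --lp.remote.username=admin \
  --lp.remote.password=admin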