Member since 08-24-2015
38 Posts
10 Kudos Received
7 Solutions
My Accepted Solutions
Views | Posted
---|---
2524 | 03-22-2018 04:39 PM
3362 | 03-22-2018 02:04 PM
3345 | 12-07-2017 11:55 AM
4763 | 11-24-2017 01:02 PM
3414 | 03-28-2017 01:22 PM
12-08-2017
04:21 PM
Is there any helpful information in cloudera-director-server.out?
12-07-2017
11:55 AM
Is this reproducible? Director does cache some of the parcel repository information, but that cache is cleared at the beginning of bootstrap, so it shouldn't be the problem. You can also try restarting Director (which also clears the cache) just to make sure.
12-07-2017
11:42 AM
Which OS are you using? Also, can you check the Director server log? If it's present, it should be at /var/log/cloudera-director-server/application.log
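If it helps, a small shell sketch like the following pulls recent ERROR and exception lines out of that log (or out of cloudera-director-server.out, by passing its path instead). The function name, the 2000-line window and the grep pattern are illustrative assumptions, not something Director ships.

```shell
#!/bin/sh
# Hypothetical helper: summarize recent problems in a Cloudera Director server log.
# Arg 1: log file (defaults to the application.log path mentioned above).
# Arg 2: how many trailing lines to scan (default 2000, an arbitrary choice).
scan_director_log() {
    log_file=${1:-/var/log/cloudera-director-server/application.log}
    lines=${2:-2000}
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Only look at the tail so a large log isn't scanned in full, then keep
    # lines mentioning ERROR or a Java exception. "|| true" keeps the exit
    # status 0 when nothing matches.
    tail -n "$lines" "$log_file" | grep -E 'ERROR|Exception' || true
}
```

For example, scan_director_log /var/log/cloudera-director-server/cloudera-director-server.out checks the .out file instead (adjust the path if yours lives elsewhere).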
11-24-2017
01:02 PM
1. No, Cloudera Director can't point to existing instances; the EC2 instances must be provisioned by Director.
2. Given my answer to (1), I don't think this question still applies. In general, though, since Director manages provisioning and preparing the instances as well as installing Cloudera Manager and CDH, you'll likely run into far fewer compatibility issues using Director than if you did any of these steps manually.
3. Director can install StreamSets via a CSD. You can refer to this post to see an example using a bootstrap script. Note that as of Director 2.5 it's easier to specify CSDs and no bootstrap script is required; this is documented here.
10-26-2017
09:57 AM
include file(...) should work, but I believe it requires the full file path. The current latest version of Director (Director 2.6) uses Typesafe Config version 1.2, which doesn't support required(...). You can refer to the following spec to see what's supported in this version: https://github.com/lightbend/config/blob/v1.2.0/HOCON.md
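Since the include seems to need the full path, one way to avoid typing it by hand is a tiny shell helper that prints the include line with an absolute path. This is a hypothetical sketch: the function name and common.conf are placeholders, and it simply prefixes the current directory, so it does not normalize ".." segments.

```shell
#!/bin/sh
# Hypothetical helper: print a HOCON include line using an absolute path.
# Relative arguments are resolved against the current working directory.
hocon_include_line() {
    target=$1
    case "$target" in
        /*) abs=$target ;;              # already absolute, use as-is
        *)  abs="$(pwd)/$target" ;;     # prefix the current directory
    esac
    printf 'include file("%s")\n' "$abs"
}
```

Usage: run hocon_include_line common.conf from the directory holding that file and paste (or append) the output into your Director conf file.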
09-15-2017
12:29 PM
1 Kudo
You may want to check whether CM is up and running properly. It's possible that CM has run out of memory, which can happen when the CM instance type is too small.
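As a quick first check on the CM host, you can run service cloudera-scm-server status, and a sketch like the one below reads MemAvailable from /proc/meminfo and flags it when it drops below a threshold. The function name and the 2048 MB default are assumptions for illustration only, not a Cloudera sizing recommendation.

```shell
#!/bin/sh
# Hypothetical helper: report whether available memory is below a threshold.
# Arg 1: meminfo file (default /proc/meminfo). Arg 2: threshold in MB (default 2048).
# Returns 0 if OK, 1 if low, 2 if MemAvailable can't be read.
check_available_memory() {
    meminfo=${1:-/proc/meminfo}
    min_mb=${2:-2048}
    # MemAvailable is reported in kB on kernels that provide it.
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo" 2>/dev/null)
    if [ -z "$avail_kb" ]; then
        echo "MemAvailable not found in $meminfo" >&2
        return 2
    fi
    avail_mb=$((avail_kb / 1024))
    if [ "$avail_mb" -lt "$min_mb" ]; then
        echo "LOW: ${avail_mb} MB available (< ${min_mb} MB)"
        return 1
    fi
    echo "OK: ${avail_mb} MB available"
}
```

If this reports LOW while CM is unresponsive, moving CM to a larger instance type is the likely fix.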
04-24-2017
11:12 PM
1 Kudo
Hey Garren, good catch. Due to the way the NVMe volume devices are named, they unintentionally get skipped by Director's mount script. We will look into fixing the mount script for these volumes in a future release when we add the i3 instances. For now, it's still possible to get this working by manually mounting the volumes through an instance bootstrap script. For example, the following script worked for me using i3.4xlarge instances:
bootstrapScripts: ["""#!/bin/sh
prepare_disk()
{
  mount=$1
  device=$2
  FS=ext4
  # FS_OPTS is expanded unquoted below on purpose so it splits into two arguments
  FS_OPTS="-E lazy_itable_init=1"
  echo "Warning: ERASING CONTENTS OF $device"
  mkfs.$FS -F $FS_OPTS "$device" -m 0
  echo "Mounting $device on $mount"
  if [ ! -e "${mount}" ]; then
    mkdir "${mount}"
  fi
  mount -o defaults,noatime "${device}" "${mount}"
  echo "$device $mount $FS defaults,noatime 0 0" >> /etc/fstab
}
prepare_disk /data0 /dev/nvme0n1
prepare_disk /data1 /dev/nvme1n1
"""]
The above is a simplified version of Director's mount script; it will mount the 2 NVMe volumes for i3.4xlarge.
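If you'd rather not hard-code the device names per instance type, a variant could discover the whole NVMe namespaces (nvme<N>n1) and pair each with a /data<N> mount point. This is a hypothetical sketch I haven't run through Director: it only prints the pairs, so it's safe to try before wiring it to prepare_disk, which erases the device. Note the glob sorts lexically (nvme10n1 before nvme2n1), and on instance types that expose EBS volumes as NVMe devices the root volume would show up here too, so the list would need filtering there.

```shell
#!/bin/sh
# Hypothetical helper: list whole NVMe namespaces and assign /data<N> mount points.
# Arg 1: device directory (default /dev). Prints "<mount> <device>" per line.
list_nvme_mounts() {
    dev_dir=${1:-/dev}
    i=0
    # Matches nvme0n1, nvme1n1, ... but not partitions (nvme0n1p1) or controllers (nvme0).
    for device in "$dev_dir"/nvme[0-9]*n1; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$device" ] || continue
        echo "/data$i $device"
        i=$((i + 1))
    done
}
```

To use it inside the bootstrap script, keep prepare_disk as above and replace the two hard-coded calls with: list_nvme_mounts | while read -r mnt dev; do prepare_disk "$mnt" "$dev"; done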
03-28-2017
01:22 PM
1 Kudo
Hi Luca, I'm not sure if your questions are directed towards using Cloudera Director to deploy CDH or just general CDH deployment on AWS. Some of these answers are tailored towards using Director to deploy CDH on AWS.
1) Is the CDH deployment available only for some kinds of instances, or can I deploy it on all the AWS instance types?
Most AWS instance types should work, but be sure to choose instance types with enough compute and memory for the number of services being deployed; otherwise you will run into health warnings/errors in CM and things may not work as expected. The reference architecture for AWS deployments, which you may have already looked at, makes some recommendations on choosing instances for master, worker and edge nodes.
2) Assuming I want to create a cluster that will be active 24x7. For a long-running cluster I understood it's better to have a cluster based on local-storage instances. If we consider a cluster of 2 PB, I think that d2.8xlarge should be the best choice for the data nodes.
About the master nodes:
- If I want to deploy only 3 master nodes, is it better to have them as local-storage instances too, or as EBS-attached instances to be able to react quickly to a possible master node failure?
The main purpose of EBS volumes in Director is to give a wider range of storage types (gp2, st1, sc1) and to allow pausing the cluster for cost-saving purposes. If reacting quickly to master node failures is a priority, the cluster should be set up with High Availability.
- Are there some best practices about the master node instance type (EBS or local storage)?
Both should be viable; there are no additional recommendations aside from what's in the reference architecture.
About the data nodes:
- If a data node fails, does CDH have some automated mechanism to spin up a new instance and connect it to the cluster in order to restore the cluster without downtime? Do we have to build a script from scratch to do this?
To clarify, CDH isn't capable of spinning up a new instance on AWS, but Director is. If an instance that hosts a data node fails, Director will not automatically spin up a new instance. The user can go through the Director UI and choose the repair option for the failing instance. This will provision a new data node instance in its place and add it to the CDH cluster. Repair can also be done through the Director API, so this can be automated if needed.
About the edge nodes:
- Are there some best practices about the instance type (EBS or local storage)?
Both should be viable; there are no additional recommendations aside from what's in the reference architecture.
3) If I want to do a backup of the cluster on S3:
- When I do a distcp from CDH to S3, can I move the data directly to Glacier instead of normal S3?
I don't think distcp supports Glacier as a destination. You should be able to use S3 lifecycle policies to transition an object from S3 to Glacier some number of days after the object is created. So while distcp can't go directly to Glacier, a simple data flow through S3 is possible.
- If I have some compression applied to the data (e.g. Snappy, gzip) and I do a distcp to S3, is the space occupied on S3 the same, or does the distcp command decompress the data for the copy?
I don't think distcp will decompress the data.
- If I have a cluster based on EBS-attached instances, is it possible to snapshot the disks and re-attach a data node based on the snapshot?
This workflow is currently not supported.
4) If I have the data nodes deployed as r4.8xlarge and I need more horsepower, is it possible to scale up the cluster from r4.8xlarge to r4.16xlarge on the fly, attaching and detaching the disks in a few minutes?
This workflow is also currently not supported.
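For the distcp-then-Glacier flow in 3), a minimal sketch of the lifecycle side could look like the following. It prints an S3 lifecycle configuration that transitions objects under a prefix to the GLACIER storage class after a number of days; the bucket name, prefix, rule ID and 30-day default are placeholders, and the distcp and aws commands are shown only as comments since they need real credentials.

```shell
#!/bin/sh
# Hypothetical helper: print an S3 lifecycle configuration (JSON) that moves
# objects under PREFIX to GLACIER DAYS days after creation.
# Arg 1: key prefix (e.g. backups/). Arg 2: days (default 30, arbitrary).
glacier_lifecycle_json() {
    prefix=$1
    days=${2:-30}
    cat <<EOF
{
  "Rules": [
    {
      "ID": "archive-distcp-backups",
      "Filter": { "Prefix": "$prefix" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": $days, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}
# Example flow (not executed here; bucket and paths are placeholders):
#   hadoop distcp hdfs:///user/backup s3a://my-backup-bucket/backups/
#   glacier_lifecycle_json backups/ 30 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket my-backup-bucket --lifecycle-configuration file://lifecycle.json
```

The lifecycle rule only needs to be set once per bucket; each later distcp into that prefix is picked up automatically.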
02-09-2017
03:16 PM
1 Kudo
I'm not sure if there's official support with Director, but it can be done using the conf file with the help of a bootstrap script. Specify a bootstrap script (only for the CM instance) that downloads the CSD jar and places it in the appropriate CSD directory:
instances {
  cminstance {
    type: m4.xlarge
    image: ami-ac5f2fcc
    tags {
      owner: ${?USER}
    }
    bootstrapScript: """#!/bin/sh
yum -y install wget
wget https://archives.streamsets.com/datacollector/2.3.0.0/csd/STREAMSETS-2.3.0.0.jar
mkdir -p /opt/cloudera/csd
mv STREAMSETS-2.3.0.0.jar /opt/cloudera/csd/
"""
  }
  ...
}
You also have to specify the product, service and role names along with the parcel repository URL in the conf file. The following worked for me (I went through a manual install to get these values):
cluster {
  # add the StreamSets Data Collector product
  products {
    CDH: 5
    STREAMSETS_DATACOLLECTOR: 2.3
  }
  # add the StreamSets parcel repository
  parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.9/",
                       "https://archives.streamsets.com/datacollector/latest/parcel/"]
  # add the service
  services: [HDFS, YARN, STREAMSETS,...]
  ...
  workers {
    ...
    # add the Data Collector role to the StreamSets service
    roles {
      HDFS: [DATANODE]
      YARN: [NODEMANAGER]
      STREAMSETS: [DATACOLLECTOR]
      ...
    }
  }
}
09-28-2016
12:26 PM
The status command is meant for clusters that were bootstrapped locally with the client (standalone mode) and won't work for clusters that were bootstrapped on a remote server. You can refer to https://www.cloudera.com/documentation/director/latest/topics/director_cli_commands.html for additional information on which commands are meant for remote and which are meant for local. The documentation here may also give a better understanding of the difference between bootstrapping with just the client vs bootstrapping against a server: https://www.cloudera.com/documentation/director/latest/topics/director_client_and_server.html