Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1214 | 11-02-2016 11:02 AM |
| | 1833 | 10-05-2016 01:58 PM |
| | 6699 | 09-07-2016 08:32 AM |
| | 6233 | 09-07-2016 08:27 AM |
| | 1206 | 08-23-2016 08:35 AM |
11-02-2016
11:02 AM
1 Kudo
There's a file at /var/lib/cloudera-quickstart/tutorial/js/config.js you can edit to manually override the detection. Currently it likely contains the line: var managed = true; I'd recommend changing it to: var managed = 'express'; That should unlock the other parts of the tutorial. Do note that the only parts 'express' unlocks are some sections on checking the health of the services required for each step. The 'enterprise' option of CM will also add a section on using Navigator to audit access to the data and trace the lineage of data sets.
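If you'd rather make the change from a shell, a one-line sed should do it. This is just a sketch: check that the line in your copy of config.js is quoted exactly as shown before running it.

```bash
# Switch the tutorial's edition override from managed-CM detection to 'express'
# (verify the current contents of the line first):
sudo sed -i "s/var managed = true;/var managed = 'express';/" \
    /var/lib/cloudera-quickstart/tutorial/js/config.js
```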
10-05-2016
01:58 PM
CDH (and Cloudera Manager) are supported on Ubuntu 14.04. You can follow the standard documentation: it calls out the necessary details wherever the procedure differs between Linux distributions. See http://www.cloudera.com/documentation/enterprise/latest/topics/installation_installation.html .
09-08-2016
09:22 AM
I seem to recall getting a similar error when the root cause was SQL permissions. I would try specifying the MySQL username and password that you see in the Sqoop command in tutorial 1. Since you can connect as root, you should be able to tweak permissions for the 'cloudera' user if needed, but they should all work out of the box (and they did for me).
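If the permissions do turn out to be the problem, something like the following should repair them. This is a hedged sketch: the database name retail_db and the passwords are assumptions based on the tutorial's defaults, so substitute the values from your own Sqoop command.

```bash
# Grant the 'cloudera' MySQL user access to the tutorial database.
# Database name 'retail_db' and password 'cloudera' are assumptions:
mysql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON retail_db.* TO 'cloudera'@'%' IDENTIFIED BY 'cloudera';
FLUSH PRIVILEGES;
SQL
```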
09-07-2016
08:32 AM
1 Kudo
The easiest way would be to download and install the JDK version you want from Oracle's website. They offer RPM packages which should work in the VM, or a tarball that you can extract yourself anywhere you like. Once it's installed, make a note of the directory it installed to: the RPMs will install under /usr/lib/jvm or /usr/java or something like that. The directory will include the version in the name, and should have a /bin/ directory underneath it. With that directory, you'll want to update the value of JAVA_HOME in /etc/profile and restart any shell sessions you have open. If you want CDH to use that JDK as well, export JAVA_HOME in /etc/default/bigtop-utils.
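As a concrete sketch (the version directory below is an example; use whatever directory the JDK actually installed to on your VM):

```bash
# Find where the JDK landed:
ls /usr/java/ /usr/lib/jvm/

# Make it the system-wide default (or edit the existing JAVA_HOME line in
# /etc/profile in place), then re-open any shell sessions:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_101' | sudo tee -a /etc/profile

# Have CDH pick up the same JDK:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_101' | sudo tee -a /etc/default/bigtop-utils
```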
09-07-2016
08:27 AM
SSH in the VM will listen on port 22 by default. You're hitting port 2222 on your host machine. If you're using VirtualBox, you can set up port forwarding in VirtualBox so that port 2222 on your host machine is forwarded to 22 (this is probably the easiest solution, but it isn't done out of the box). The alternative is to configure the VM to use something other than NAT for the virtual network. If you configure it to use bridged networking or a similar option, it will get its own IP address that you can use to connect to port 22 from your host machine.
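If you go the port-forwarding route, you can do it from the VirtualBox GUI (Settings > Network > Port Forwarding) or with the VBoxManage CLI. A sketch, assuming your VM is named "Cloudera QuickStart" and is powered off:

```bash
# Forward host port 2222 to guest port 22 on the VM's NAT adapter:
VBoxManage modifyvm "Cloudera QuickStart" --natpf1 "ssh,tcp,,2222,,22"

# Then, from the host:
ssh -p 2222 cloudera@localhost
```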
08-23-2016
08:35 AM
1 Kudo
Depending on what you're doing, the Cloudera Management Services are likely not needed for your project. They deal with monitoring the various services. When they're down it's harder to tell from the Cloudera Manager home page whether a service is healthy, but if they crash after 5 minutes it shouldn't affect any of the services themselves. In my experience with the VM, often one service will fail in a way that impacts the others (often it's the Host Monitor). I'd look at the monitoring data for the services to see which one is going down first, and then dig deeper into its logs to see what the problem is. 8 GB should not be seen as plenty, but as the absolute bare minimum required. If you're running all of the Cloudera Manager services and putting load on Flume, Kafka, and Spark / YARN, I'd expect your VM to be straining to keep up. These are all services designed to run on fairly large clusters, not minimal VMs, so it will struggle with certain projects. I'd recommend adding more memory if you're able to - that is likely the reason one of the Cloudera Management Services isn't keeping up.
08-11-2016
07:24 AM
5 Kudos
The term gateway may be used in lots of contexts - it usually refers to a machine or service that acts as an entry point to other services. For example, your entire cluster might be behind a firewall which blocks all inbound traffic, except that it allows you to log in to one of the machines. From that machine, you can submit jobs or interact with any of the services in the cluster. That machine would be called a "gateway". Often in a Cloudera context, a gateway is just that: a machine that you're supposed to log into to carry out some tasks that aren't possible from outside the cluster. Cloudera Manager might manage the machine (meaning it deploys configuration to it and does basic health checks) but not run any CDH services on it. The NFS gateway is a similar idea. It connects to your HDFS cluster and exposes the filesystem via the NFS protocol. So you might not expose all of the HDFS ports to your network, but you might expose just the NFS service, and it therefore acts as a gateway.
06-22-2016
08:32 AM
1 Kudo
I've seen this problem before and it should be fixed in the next release, but I may have a work-around for you. During boot, the VM will try to intelligently select the best IP address to bind the 'quickstart.cloudera' hostname to (as Hadoop configuration is very closely tied to the hostname). It'll try to use eth0 if it's there, but if not it falls back to the loopback device. In your case and one other I've seen, the virtual NIC ends up as a device numbered higher than eth0, and the VM doesn't check for that. The easiest workaround for you would be to edit the file /usr/bin/cloudera-quickstart-ip and replace line 24 that says "DEV='eth0'" with "DEV='eth1'" (or eth2, if you'd prefer things to treat that as the primary interface). After rebooting it should work.
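In shell form, the workaround is a one-line edit. A sketch - confirm that line 24 of your copy really is the DEV assignment before running it:

```bash
# Point the IP-detection script at the device your NIC actually got:
sudo sed -i "s/DEV='eth0'/DEV='eth1'/" /usr/bin/cloudera-quickstart-ip
sudo reboot
```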
06-21-2016
08:36 AM
VirtualBox has the ability to take snapshots of VMs that you can restore to at a later date.
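For example, with the VBoxManage CLI (the VM name here is an assumption; use whatever name appears in your VirtualBox library):

```bash
# Take a snapshot you can roll back to later:
VBoxManage snapshot "Cloudera QuickStart" take "clean-state"

# Restore it later (power the VM off first):
VBoxManage snapshot "Cloudera QuickStart" restore "clean-state"
```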
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
06-13-2016
07:14 AM
Note that there are many variables in that tutorial you'll need to replace with your own values. A copy of the tutorial with all the blanks filled in and the required datasets are available in the QuickStart VM.
06-06-2016
09:39 AM
I'm not sure I've seen this particular problem before; however, I'd suggest comparing the SHA-1 hashes to be sure it's not compromised. The hashes can be found where you download the file. For the 5.7.0-0 VirtualBox image it's 1309591109ebd9b1e44c89bd064b12d8b00feeb6. My copy of the file matches and is slightly smaller than yours, so unless there's a difference in how file sizes are reported on different operating systems, I would suspect your download is corrupted. As Cy said, we do recommend using a download manager. Browsers tend to have inferior support for recovering from problems during the download, and you see that more often on large files like this.
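To check, something like the following (the filename is an assumption; use whatever your download is actually called):

```bash
# Compute the SHA-1 of the downloaded image and compare it with the
# published value:
sha1sum cloudera-quickstart-vm-5.7.0-0-virtualbox.zip
# Expected: 1309591109ebd9b1e44c89bd064b12d8b00feeb6
```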
06-02-2016
12:31 PM
Also, note that there's a script that tries to detect a public IP and set up the hosts file for you on boot. If you're going to edit it manually, you probably want to comment out the line in /etc/init.d/cloudera-quickstart-init that calls /usr/bin/cloudera-quickstart-ip. I don't remember which version that was added in. It might have been 5.5 - so if your VM doesn't have /usr/bin/cloudera-quickstart-ip you can ignore this post and safely edit the hosts file anyway.
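A quick way to check for the script and comment out the call from a shell (a sketch; inspect the matching line yourself before changing it):

```bash
# Does this VM have the IP-detection script at all?
ls /usr/bin/cloudera-quickstart-ip

# If so, find the line in the init script that invokes it...
grep -n cloudera-quickstart-ip /etc/init.d/cloudera-quickstart-init

# ...and comment it out so your manual /etc/hosts edits survive reboots:
sudo sed -i 's|^\([^#]*cloudera-quickstart-ip\)|#\1|' /etc/init.d/cloudera-quickstart-init
```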
06-01-2016
09:56 AM
intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' when the ETL process is finished: Impala caches table metadata, so it won't see tables Hive created until you refresh it.
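From a terminal in the VM, that looks something like this (a sketch using impala-shell's -q flag):

```bash
# Tell Impala to reload its cached metadata so the Hive-created
# tables show up, then query the final table:
impala-shell -q 'INVALIDATE METADATA'
impala-shell -q 'SELECT * FROM tokenized_access_logs LIMIT 10'
```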
06-01-2016
09:53 AM
I don't know enough about Spark internals to give much intelligent advice here, but it's possible it's a matter of resources. You still have the problem in your hosts file that I described above. The hosts file you posted maps 127.0.0.1 AND your public IP to quickstart.cloudera. You should remove quickstart and quickstart.cloudera from the 127.0.0.1 line and have only your public IP map to them, so the file looks like this (substitute your actual public IP):

127.0.0.1 localhost localhost.localdomain
<your public IP> quickstart.cloudera quickstart

You'll need to restart all services after you make this change.
05-20-2016
01:51 PM
The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files to be visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
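For example, over scp (filenames and paths here are placeholders; 'cloudera' is the VM's default user, and you may need a different host/port if you're behind NAT with port forwarding):

```bash
# From the host: copy the archive into the VM:
scp logs.zip cloudera@quickstart.cloudera:/home/cloudera/

# Inside the VM: unpack it, then copy the files into HDFS:
unzip logs.zip
hadoop fs -copyFromLocal access.log /user/cloudera/
```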
05-02-2016
02:43 PM
When you try to stop a service, it will warn you which services depend on it if they are running. If you try to start a service, it will warn you which services it depends on if they are not running. I believe Zookeeper, HDFS, and YARN are the only other services you need to run for Spark, HBase, and Hive.
04-29-2016
07:04 AM
I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, and the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does - that's by design, to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished. I'm not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots for your Hive queries. YARN in the QuickStart VM is not going to be configured with a lot of capacity, and it's not tested with Llama. I know of no other way to manage Impala resources via YARN, though.
04-13-2016
11:08 AM
1 Kudo
The problem is the DataNode service is not running. You can start it with 'sudo service hadoop-hdfs-datanode restart', but it's possible other services are now having issues because it's down, so the easiest thing to do is usually to just reboot. If you continue to have issues, check the logs in /var/log/hadoop-hdfs for more information.
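In full (the log filename glob is an assumption; list the directory to see exactly what's there):

```bash
# Restart the DataNode and watch the tail of its log for errors:
sudo service hadoop-hdfs-datanode restart
sudo tail -n 50 /var/log/hadoop-hdfs/*datanode*.log
```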
04-13-2016
07:40 AM
1 Kudo
If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in this filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive Warehouse directory is in HDFS, which is a separate filesystem.
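You can see the two filesystems side by side from a terminal in the VM:

```bash
# The native Linux filesystem:
ls /opt/examples/log_files/

# HDFS, where the Hive warehouse lives - note the separate command:
hadoop fs -ls /user/hive/warehouse
```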
04-13-2016
07:21 AM
1 Kudo
The 2 tables that are created are called 'intermediate_access_logs' and 'tokenized_access_logs' when shown in Hive or Impala. The intermediate_access_logs table is backed by the raw 'original_access_logs' file which is copied into HDFS. If you want to view it as a table, it should still be queryable in Hive at the end of the tutorial. The underlying data should still be in /user/hive/warehouse/original_access_logs in HDFS or /opt/examples/log_files/ on your local filesystem.
04-11-2016
07:51 AM
1 Kudo
Looks like the YARN Resource Manager process is not running. I would restart it with 'sudo service hadoop-yarn-resourcemanager restart'. If you continue to have issues, other services may have failed to come up as a result of this or as a result of the same root cause. The easiest way to restart everything in order on the VM is to simply reboot. If you have sufficient memory for the VM, running one of the Cloudera Manager options on the desktop makes it a lot easier to see the health of all the services, etc. You might also want to look at the log files in /var/log/hadoop-yarn to see what kinds of exceptions are being thrown as the service dies.
04-11-2016
07:09 AM
I apologize for the confusion - the service got a bit backed up over the weekend because too many people abandoned clusters mid-deployment without cleaning up. I've cleared out everything that looks abandoned, so it should work better now. Note that access codes can't be reused, however, so if you deleted your previous stack you'll need to register for a new access code to try again.
03-30-2016
06:47 AM
Once you're ssh'd in as ec2-user, you can run 'sudo su' to switch to root in your current shell (there are many other ways to use sudo and su - to do things as other users; they're worth reading up on if you're not familiar).
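A few common variants, for reference:

```bash
sudo su                        # root shell, keeping your current environment
sudo su -                      # root login shell, with root's own environment
sudo -u hdfs hadoop fs -ls /   # run a single command as another user
```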
03-24-2016
01:26 PM
Good to know. The difference is just the device name the virtual NIC gets added as in the guest OS. eth0 gets used when I try this out in VMware too - not sure why the difference in this case. But the script could be made a bit more flexible to handle this and similar scenarios.
03-23-2016
03:54 PM
1 Kudo
If your IP will reliably be 192.168.1.125, I would just comment out the lines you show there from /etc/init.d/cloudera-quickstart-init, and I would edit /etc/hosts to be the following:

127.0.0.1 localhost localhost.domain
192.168.1.125 quickstart.cloudera quickstart

Upon reboot, all the services should pick up and use the new IP. For the welcome page and the tutorial to use the new IP (I don't think this is necessary - it won't functionally change anything that I can think of), you can also edit /var/lib/cloudera-quickstart/tutorial/js/config.js. The 2 parts to edit are the values for manager_node_ip and worker_nodes_ip (although note that worker_nodes_ip is a list, with a single element).
03-23-2016
07:51 AM
1 Kudo
I think if you edit /usr/bin/cloudera-quickstart-ip you can work around this easily. On line 24 we set the device we're looking for, DEV, to eth0. Your networking device got added as eth1. When it fails to find eth0, it's falling back to the loopback device so things at least work internally to the VM. So if you edit that variable in your VM to eth1 and reboot, I would expect this to work better for you. I'll expand what devices the script looks for in the next release. More generally, you can edit /etc/hosts and the networking configuration however you want and remove the networking configuration from /etc/init.d/cloudera-quickstart-init (in version 5.5, this is lines 39-42). As long as quickstart.cloudera resolves to a valid IP and reverse lookup of that IP gives you quickstart.cloudera, everything else should work - you'll just need to restart all the services once it's set to the IP you want. The only tricky thing is if your hypervisor wouldn't give you the same IP every time you booted - then you need a script like cloudera-quickstart-ip to try to determine which IP you got before the services start, and edit the right files accordingly.
03-16-2016
12:27 PM
One possibility to have in mind is memory issues. The VM is a very compact environment, and it only gets tested with fairly small demo datasets. If you've loaded other data into HBase prior to trying to access it via Phoenix, you might need to do some tweaking of memory configuration in HBase or add more memory to the VM to get it to work as reliably as it ordinarily would.
03-16-2016
12:25 PM
Your best bet to figure out why it's failing is to check the log for the RegionServer role. Click on the HBase service, and down the left-hand side you'll see the RegionServer. You'll want to open that, go to the "Processes" tab, and then click "See Role Log Details". The most recent messages will be at the bottom, and my guess is the error should be in the last few entries. (I might be missing a link or tab or something in that navigation - hopefully this is clear enough for you to find it!)
03-16-2016
08:11 AM
Can you check that ZooKeeper (and HDFS and HBase, for that matter) are running in Cloudera Manager? Port 2181 is ZooKeeper, and it seems like it's not able to connect to that. Because running every service requires quite a lot of memory for a VM, when you migrate to Cloudera Manager or switch to parcels, it won't start every service for you. If you go to Cloudera Manager and log in, the home screen should show a table of all the services in the cluster. Make sure ZooKeeper, HDFS and HBase are marked with a green dot. Otherwise, they may need to be started or restarted. If they're marked with a question mark, usually that means one of the "Management Services" (really, these are just parts of CM represented as separate services) needs to be restarted.
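You can also probe ZooKeeper directly from a terminal in the VM. A sketch using the standard 'ruok' four-letter command (a healthy server answers 'imok') and the CDH service script:

```bash
# Is anything answering on the ZooKeeper port?
echo ruok | nc localhost 2181

# Check or restart the service itself:
sudo service zookeeper-server status
sudo service zookeeper-server restart
```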