Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1214 | 11-02-2016 11:02 AM |
| | 1833 | 10-05-2016 01:58 PM |
| | 6699 | 09-07-2016 08:32 AM |
| | 6233 | 09-07-2016 08:27 AM |
| | 1206 | 08-23-2016 08:35 AM |
11-02-2016
11:02 AM
1 Kudo
There's a file at /var/lib/cloudera-quickstart/tutorial/js/config.js you can edit to manually override the detection. Currently it likely contains the line: var managed = true; I'd recommend changing it to: var managed = 'express'; That should unlock the other parts of the tutorial. Do note that the only parts 'express' unlocks are some sections on checking the health of the services required for each step. The 'enterprise' option of CM will also add a section on using Navigator to audit access to the data and trace the lineage of data sets.
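If you'd rather make the change from a shell, a one-line sed should do it. This is just a sketch: check that the line in your copy of config.js is quoted exactly as shown before running it.

```bash
# Switch the tutorial's edition override from managed-CM detection to 'express'
# (verify the current contents of the line first):
sudo sed -i "s/var managed = true;/var managed = 'express';/" \
    /var/lib/cloudera-quickstart/tutorial/js/config.js
```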
10-05-2016
01:58 PM
CDH (and Cloudera Manager) are supported on Ubuntu 14.04. You can follow the standard documentation: it calls out the necessary details wherever the procedure differs between Linux distributions. See http://www.cloudera.com/documentation/enterprise/latest/topics/installation_installation.html .
09-08-2016
09:22 AM
I seem to recall getting a similar error when the root cause was SQL permissions. I would try specifying the MySQL username and password that you see in the Sqoop command in tutorial 1. Since you can connect as root, you should be able to tweak permissions for the 'cloudera' user if needed, but they should all work out of the box (and they did for me).
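If the permissions do turn out to be the problem, something like the following should repair them. This is a hedged sketch: the database name retail_db and the passwords are assumptions based on the tutorial's defaults, so substitute the values from your own Sqoop command.

```bash
# Grant the 'cloudera' MySQL user access to the tutorial database.
# Database name 'retail_db' and password 'cloudera' are assumptions:
mysql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON retail_db.* TO 'cloudera'@'%' IDENTIFIED BY 'cloudera';
FLUSH PRIVILEGES;
SQL
```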
09-07-2016
08:32 AM
1 Kudo
The easiest way would be to download and install the JDK version you want from Oracle's website. They offer RPM packages which should work in the VM, or a tarball that you can extract yourself anywhere you like. Once it's installed, make a note of the directory it installed to: the RPMs will install under /usr/lib/jvm or /usr/java or something like that. The directory will include the version in the name, and should have a /bin/ directory underneath it. With that directory, you'll want to update the value of JAVA_HOME in /etc/profile and restart any shell sessions you have open. If you want CDH to use that JDK as well, export JAVA_HOME in /etc/default/bigtop-utils.
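As a concrete sketch (the version directory below is an example; use whatever directory the JDK actually installed to on your VM):

```bash
# Find where the JDK landed:
ls /usr/java/ /usr/lib/jvm/

# Make it the system-wide default (or edit the existing JAVA_HOME line in
# /etc/profile in place), then re-open any shell sessions:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_101' | sudo tee -a /etc/profile

# Have CDH pick up the same JDK:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_101' | sudo tee -a /etc/default/bigtop-utils
```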
09-07-2016
08:27 AM
SSH in the VM will listen on port 22 by default. You're hitting port 2222 on your host machine. If you're using VirtualBox, you can set up port forwarding in VirtualBox so that port 2222 on your host machine is forwarded to 22 (this is probably the easiest solution, but it isn't done out of the box). The alternative is to configure the VM to use something other than NAT for the virtual network. If you configure it to use bridged networking or a similar option, it will get its own IP address that you can use to connect to port 22 from your host machine.
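If you go the port-forwarding route, you can do it from the VirtualBox GUI (Settings > Network > Port Forwarding) or with the VBoxManage CLI. A sketch, assuming your VM is named "Cloudera QuickStart" and is powered off:

```bash
# Forward host port 2222 to guest port 22 on the VM's NAT adapter:
VBoxManage modifyvm "Cloudera QuickStart" --natpf1 "ssh,tcp,,2222,,22"

# Then, from the host:
ssh -p 2222 cloudera@localhost
```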
08-23-2016
08:35 AM
1 Kudo
Depending on what you're doing, the Cloudera Management Services are likely not needed for your project. They deal with monitoring the various services. When they're down it's harder to tell from the Cloudera Manager home page whether a service is healthy, but if they crash after 5 minutes it shouldn't affect any of the services themselves. In my experience with the VM, often one service will fail in a way that impacts the others (often it's the Host Monitor). I'd look at the monitoring data for the services to see which one is going down first, and then dig deeper into its logs to see what the problem is. 8 GB should not be seen as plenty, but as the absolute bare minimum required. If you're running all of the Cloudera Manager services and putting load on Flume, Kafka, and Spark / YARN, I'd expect your VM to be straining to keep up. These are all services designed to run on fairly large clusters, not minimal VMs, so it will struggle with certain projects. I'd recommend adding more memory if you're able to - that is likely the reason one of the Cloudera Management Services isn't keeping up.
08-11-2016
07:24 AM
5 Kudos
The term gateway may be used in lots of contexts - it usually refers to a machine or service that acts as an entry point to other services. For example, your entire cluster might be behind a firewall which blocks all inbound traffic, except that it allows you to log in to one of the machines. From that machine, you can submit jobs or interact with any of the services in the cluster. That machine would be called a "gateway". Often in a Cloudera context, a gateway is just that: a machine that you're supposed to log into to carry out some tasks that aren't possible from outside the cluster. Cloudera Manager might manage the machine (meaning it deploys configuration to it and does basic health checks) but not run any CDH services on it. The NFS gateway is a similar idea. It connects to your HDFS cluster and exposes the filesystem via the NFS protocol. So you might not expose all of the HDFS ports to your network, but you might expose just the NFS service, and it therefore acts as a gateway.
06-22-2016
08:32 AM
1 Kudo
I've seen this problem before and it should be fixed in the next release, but I may have a work-around for you. During boot, the VM will try to intelligently select the best IP address to bind the 'quickstart.cloudera' hostname to (as Hadoop configuration is very closely tied to the hostname). It'll try to use eth0 if it's there, but if not it falls back to the loopback device. In your case and one other I've seen, the virtual NIC ends up as a device numbered higher than eth0, and the VM doesn't check for that. The easiest workaround for you would be to edit the file /usr/bin/cloudera-quickstart-ip and replace line 24 that says "DEV='eth0'" with "DEV='eth1'" (or eth2, if you'd prefer things to treat that as the primary interface). After rebooting it should work.
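In shell form, the workaround is a one-line edit. A sketch - confirm that line 24 of your copy really is the DEV assignment before running it:

```bash
# Point the IP-detection script at the device your NIC actually got:
sudo sed -i "s/DEV='eth0'/DEV='eth1'/" /usr/bin/cloudera-quickstart-ip
sudo reboot
```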
06-21-2016
08:36 AM
VirtualBox has the ability to take snapshots of VMs that you can restore to at a later date.
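For example, with the VBoxManage CLI (the VM name here is an assumption; use whatever name appears in your VirtualBox library):

```bash
# Take a snapshot you can roll back to later:
VBoxManage snapshot "Cloudera QuickStart" take "clean-state"

# Restore it later (power the VM off first):
VBoxManage snapshot "Cloudera QuickStart" restore "clean-state"
```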
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
06-13-2016
07:14 AM
Note that there are many variables in that tutorial you'll need to replace with your own values. A copy of the tutorial with all the blanks filled in and the required datasets are available in the QuickStart VM.
06-06-2016
09:39 AM
I'm not sure I've seen this particular problem before; however, I'd suggest comparing the SHA-1 hashes to be sure it's not compromised. The hashes can be found where you download the file. For the 5.7.0-0 VirtualBox image it's 1309591109ebd9b1e44c89bd064b12d8b00feeb6. My copy of the file matches and is slightly smaller than yours, so unless there's a difference in how file sizes are reported on different operating systems, I would suspect your download is corrupted. As Cy said, we do recommend using a download manager. Browsers tend to have inferior support for recovering from problems during the download, and you see that more often on large files like this.
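To check, something like the following (the filename is an assumption; use whatever your download is actually called):

```bash
# Compute the SHA-1 of the downloaded image and compare it with the
# published value:
sha1sum cloudera-quickstart-vm-5.7.0-0-virtualbox.zip
# Expected: 1309591109ebd9b1e44c89bd064b12d8b00feeb6
```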
06-02-2016
12:31 PM
Also, note that there's a script that tries to detect a public IP and set up the hosts file for you on boot. If you're going to edit it manually, you probably want to comment out the line in /etc/init.d/cloudera-quickstart-init that calls /usr/bin/cloudera-quickstart-ip. I don't remember which version that was added in. It might have been 5.5 - so if your VM doesn't have /usr/bin/cloudera-quickstart-ip you can ignore this post and safely edit the hosts file anyway.
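A quick way to check for the script and comment out the call from a shell (a sketch; inspect the matching line yourself before changing it):

```bash
# Does this VM have the IP-detection script at all?
ls /usr/bin/cloudera-quickstart-ip

# If so, find the line in the init script that invokes it...
grep -n cloudera-quickstart-ip /etc/init.d/cloudera-quickstart-init

# ...and comment it out so your manual /etc/hosts edits survive reboots:
sudo sed -i 's|^\([^#]*cloudera-quickstart-ip\)|#\1|' /etc/init.d/cloudera-quickstart-init
```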
06-01-2016
09:56 AM
intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' when the ETL process is finished: Impala caches table metadata, so it won't see tables Hive created until you refresh it.
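From a terminal in the VM, that looks something like this (a sketch using impala-shell's -q flag):

```bash
# Tell Impala to reload its cached metadata so the Hive-created
# tables show up, then query the final table:
impala-shell -q 'INVALIDATE METADATA'
impala-shell -q 'SELECT * FROM tokenized_access_logs LIMIT 10'
```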
06-01-2016
09:53 AM
I don't know enough about Spark internals to give much intelligent advice here, but it's possible it's a matter of resources. You still have the problem in your hosts file that I described above. The hosts file you posted maps 127.0.0.1 AND your public IP to quickstart.cloudera. You should remove quickstart and quickstart.cloudera from the 127.0.0.1 line and have only your public IP map to them, so the file looks like this (substitute your actual public IP):

127.0.0.1 localhost localhost.localdomain
<your public IP> quickstart.cloudera quickstart

You'll need to restart all services after you make this change.
05-20-2016
01:51 PM
The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files to be visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
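For example, over scp (filenames and paths here are placeholders; 'cloudera' is the VM's default user, and you may need a different host/port if you're behind NAT with port forwarding):

```bash
# From the host: copy the archive into the VM:
scp logs.zip cloudera@quickstart.cloudera:/home/cloudera/

# Inside the VM: unpack it, then copy the files into HDFS:
unzip logs.zip
hadoop fs -copyFromLocal access.log /user/cloudera/
```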
05-02-2016
02:43 PM
When you try to stop a service, it will warn you which services depend on it if they are running. If you try to start a service, it will warn you which services it depends on if they are not running. I believe Zookeeper, HDFS, and YARN are the only other services you need to run for Spark, HBase, and Hive.
04-29-2016
07:04 AM
I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, and the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does - that's by design, to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished. I'm not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots for your Hive queries. YARN in the QuickStart VM is not going to be configured with a lot of capacity, and it's not tested with Llama. I know of no other way to manage Impala resources via YARN, though.
04-13-2016
11:08 AM
1 Kudo
The problem is the DataNode service is not running. You can start it with 'sudo service hadoop-hdfs-datanode restart', but it's possible other services are now having issues because it's down, so the easiest thing to do is usually to just reboot. If you continue to have issues, check the logs in /var/log/hadoop-hdfs for more information.
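In full (the log filename glob is an assumption; list the directory to see exactly what's there):

```bash
# Restart the DataNode and watch the tail of its log for errors:
sudo service hadoop-hdfs-datanode restart
sudo tail -n 50 /var/log/hadoop-hdfs/*datanode*.log
```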
04-13-2016
07:40 AM
1 Kudo
If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in this filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive Warehouse directory is in HDFS, which is a separate filesystem.
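You can see the two filesystems side by side from a terminal in the VM:

```bash
# The native Linux filesystem:
ls /opt/examples/log_files/

# HDFS, where the Hive warehouse lives - note the separate command:
hadoop fs -ls /user/hive/warehouse
```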
04-13-2016
07:21 AM
1 Kudo
The 2 tables that are created are called 'intermediate_access_logs' and 'tokenized_access_logs' when shown in Hive or Impala. The intermediate_access_logs table is backed by the raw 'original_access_logs' file which is copied into HDFS. If you want to view it as a table, it should still be queryable in Hive at the end of the tutorial. The underlying data should still be in /user/hive/warehouse/original_access_logs in HDFS or /opt/examples/log_files/ on your local filesystem.
04-11-2016
07:51 AM
1 Kudo
Looks like the YARN Resource Manager process is not running. I would restart it with 'sudo service hadoop-yarn-resourcemanager restart'. If you continue to have issues, other services may have failed to come up as a result of this or as a result of the same root cause. The easiest way to restart everything in order on the VM is to simply reboot. If you have sufficient memory for the VM, running one of the Cloudera Manager options on the desktop makes it a lot easier to see the health of all the services, etc. You might also want to look at the log files in /var/log/hadoop-yarn to see what kinds of exceptions are being thrown as the service dies.
04-11-2016
07:09 AM
I apologize for the confusion - the service got a bit backed up over the weekend because too many people abandoned clusters mid-deployment without cleaning up. I've cleared out everything that looks abandoned, so it should work better now. Note that access codes can't be reused, however, so if you deleted your previous stack you'll need to register for a new access code to try again.
03-30-2016
06:47 AM
Once you're ssh'd in as ec2-user, you can run 'sudo su' to switch to root in your current shell (there are many other ways to use sudo and su - to do things as other users; they're worth reading up on if you're not familiar).
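A few common variants, for reference:

```bash
sudo su                        # root shell, keeping your current environment
sudo su -                      # root login shell, with root's own environment
sudo -u hdfs hadoop fs -ls /   # run a single command as another user
```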
03-24-2016
01:26 PM
Good to know. The difference is just the device name the virtual NIC gets added as in the guest OS. eth0 gets used when I try this out in VMware too - not sure why the difference in this case. But the script could be made a bit more flexible to handle this and similar scenarios.
03-23-2016
03:54 PM
1 Kudo
If your IP will reliably be 192.168.1.125, I would just comment out the lines you show there from /etc/init.d/cloudera-quickstart-init, and I would edit /etc/hosts to be the following:

127.0.0.1 localhost localhost.domain
192.168.1.125 quickstart.cloudera quickstart

Upon reboot, all the services should pick up and use the new IP. For the welcome page and the tutorial to use the new IP (I don't think this is necessary - it won't functionally change anything that I can think of), you can also edit /var/lib/cloudera-quickstart/tutorial/js/config.js. The 2 parts to edit are the values for manager_node_ip and worker_nodes_ip (although note that worker_nodes_ip is a list, with a single element).
03-23-2016
07:51 AM
1 Kudo
I think if you edit /usr/bin/cloudera-quickstart-ip you can work around this easily. On line 24 we set the device we're looking for, DEV, to eth0. Your networking device got added as eth1. When it fails to find eth0, it's falling back to the loopback device so things at least work internally to the VM. So if you edit that variable in your VM to eth1 and reboot, I would expect this to work better for you. I'll expand what devices the script looks for in the next release. More generally, you can edit /etc/hosts and the networking configuration however you want and remove the networking configuration from /etc/init.d/cloudera-quickstart-init (in version 5.5, this is lines 39-42). As long as quickstart.cloudera resolves to a valid IP and reverse lookup of that IP gives you quickstart.cloudera, everything else should work - you'll just need to restart all the services once it's set to the IP you want. The only tricky thing is if your hypervisor wouldn't give you the same IP every time you booted - then you need a script like cloudera-quickstart-ip to try to determine which IP you got before the services start, and edit the right files accordingly.
03-16-2016
12:27 PM
One possibility to have in mind is memory issues. The VM is a very compact environment, and it only gets tested with fairly small demo datasets. If you've loaded other data into HBase prior to trying to access it via Phoenix, you might need to do some tweaking of memory configuration in HBase or add more memory to the VM to get it to work as reliably as it ordinarily would.
03-16-2016
12:25 PM
Your best bet to figure out why it's failing is to check the log for the RegionServer role. Click on the HBase service, and down the left-hand side you'll see the RegionServer. You'll want to open that, go to the "Processes" tab, and then click "See Role Log Details". The most recent messages will be at the bottom, and my guess is the error should be in the last few entries. (I might be missing a link or tab or something in that navigation - hopefully this is clear enough for you to find it!)
03-16-2016
08:11 AM
Can you check that ZooKeeper (and HDFS and HBase, for that matter) are running in Cloudera Manager? Port 2181 is ZooKeeper, and it seems like it's not able to connect to that. Because running every service requires quite a lot of memory for a VM, when you migrate to Cloudera Manager or switch to parcels, it won't start every service for you. If you go to Cloudera Manager and log in, the home screen should show a table of all the services in the cluster. Make sure ZooKeeper, HDFS and HBase are marked with a green dot. Otherwise, they may need to be started or restarted. If they're marked with a question mark, usually that means one of the "Management Services" (really, these are just parts of CM represented as separate services) needs to be restarted.
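You can also probe ZooKeeper directly from a terminal in the VM. A sketch using the standard 'ruok' four-letter command (a healthy server answers 'imok') and the CDH service script:

```bash
# Is anything answering on the ZooKeeper port?
echo ruok | nc localhost 2181

# Check or restart the service itself:
sudo service zookeeper-server status
sudo service zookeeper-server restart
```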