Member since 07-12-2013
435 Posts
117 Kudos Received
82 Solutions
My Accepted Solutions
Views | Posted
---|---
2008 | 11-02-2016 11:02 AM
3095 | 10-05-2016 01:58 PM
7724 | 09-07-2016 08:32 AM
8174 | 09-07-2016 08:27 AM
2101 | 08-23-2016 08:35 AM
06-02-2016
12:31 PM
Also, note that there's a script that tries to detect a public IP and set up the hosts file for you on boot. If you're going to edit it manually, you probably want to comment out the line in /etc/init.d/cloudera-quickstart-init that calls /usr/bin/cloudera-quickstart-ip. I don't remember which version that was added in. It might have been 5.5 - so if your VM doesn't have /usr/bin/cloudera-quickstart-ip you can ignore this post and safely edit the hosts file anyway.
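If you do go the manual route, something along these lines would comment out the call. It's only a sketch: comment_out_call is a made-up helper name, and it assumes the call sits on its own line in the init script. It keeps a .bak copy of the original next to it.

```shell
# Hypothetical helper (not part of the QuickStart VM): comment out every
# uncommented line in a script that mentions the given command.
comment_out_call() (
    script="$1"      # e.g. /etc/init.d/cloudera-quickstart-init
    pattern="$2"     # e.g. /usr/bin/cloudera-quickstart-ip
    # Keep a backup; the rewritten file keeps its original permissions.
    cp "$script" "$script.bak" || exit 1
    awk -v pat="$pattern" '
        # Prefix matching lines with "# " unless they are already comments.
        index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
        { print }
    ' "$script.bak" > "$script"
)

# Usage, as root, and only if /usr/bin/cloudera-quickstart-ip exists:
#   comment_out_call /etc/init.d/cloudera-quickstart-init /usr/bin/cloudera-quickstart-ip
```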
06-01-2016
09:56 AM
intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' in Impala when the ETL process is finished: Impala caches metadata, so it won't see tables created through Hive until you do.
06-01-2016
09:53 AM
I don't know enough about Spark internals to give very intelligent advice here, but it's possible it's a matter of resources. You still have the problem in your hosts file that I described above. The hosts file you posted maps 127.0.0.1 AND your public IP to quickstart.cloudera. You should remove quickstart and quickstart.cloudera from the 127.0.0.1 line so that only your public IP maps to those names (as shown below). You'll need to restart all services after you make this change.
127.0.0.1 localhost localhost.localdomain
<your public IP> quickstart.cloudera quickstart
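If you'd rather script the hosts file change than edit it by hand, here's a rough sketch. remap_quickstart is a made-up helper name and 203.0.113.10 in the usage line is just a placeholder address, so substitute your own public IP. It strips the two quickstart names from every existing entry, drops any entry left with no names, and appends a single mapping for the address you pass in.

```shell
# Hypothetical helper: make the given address the only one that maps to
# quickstart.cloudera and quickstart. Works on whichever file is passed in.
remap_quickstart() (
    hosts="$1"       # e.g. /etc/hosts
    public_ip="$2"   # placeholder: the VM's public IP
    cp "$hosts" "$hosts.bak" || exit 1
    awk -v ip="$public_ip" '
        # Leave comment lines and blank lines untouched.
        /^[ \t]*#/ || NF == 0 { print; next }
        {
            # Rebuild the entry without the quickstart names.
            line = $1; kept = 0
            for (i = 2; i <= NF; i++)
                if ($i != "quickstart" && $i != "quickstart.cloudera") {
                    line = line " " $i
                    kept++
                }
            # Drop entries that have no names left.
            if (kept > 0) print line
        }
        END { print ip " quickstart.cloudera quickstart" }
    ' "$hosts.bak" > "$hosts"
)

# Usage, as root (the address is a placeholder):
#   remap_quickstart /etc/hosts 203.0.113.10
```

The restart of all services afterwards still applies.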
05-20-2016
01:51 PM
The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
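Once the files are on the VM's filesystem and unzipped, the last step could look roughly like this. copy_dir_to_hdfs is a made-up helper name and both paths in the usage line are placeholders. It only assumes the hadoop command is on the PATH.

```shell
# Hypothetical helper: copy every entry in a local directory into an HDFS
# directory, creating the HDFS directory first if needed.
copy_dir_to_hdfs() (
    local_dir="$1"   # placeholder, e.g. a directory of unzipped files
    hdfs_dir="$2"    # placeholder, e.g. a directory under your HDFS home
    hadoop fs -mkdir -p "$hdfs_dir" || exit 1
    for f in "$local_dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$f" ] || continue
        hadoop fs -copyFromLocal "$f" "$hdfs_dir/" || exit 1
    done
)

# Usage (both paths are placeholders):
#   copy_dir_to_hdfs /home/cloudera/unzipped /user/cloudera/data
```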
05-02-2016
02:43 PM
When you try to stop a service, it will warn you which services depend on it if they are running. If you try to start a service, it will warn you which services it depends on if they are not running. I believe ZooKeeper, HDFS, and YARN are the only other services you need to run for Spark, HBase, and Hive.
04-29-2016
07:04 AM
I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, and the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does - that's by design to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished. Not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots for your Hive queries. YARN in the QuickStart VM is not going to be configured with a lot of capacity and it's not tested with Llama. I know of no other way to manage Impala resources via YARN, though.
04-13-2016
07:40 AM
1 Kudo
If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in this filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive Warehouse directory is in HDFS, which is a separate filesystem.
04-13-2016
07:21 AM
1 Kudo
The two tables that are created are called 'intermediate_access_logs' and 'tokenized_access_logs' when shown in Hive or Impala. The intermediate_access_logs table is backed by the raw 'original_access_logs' file, which is copied into HDFS. If you want to view it as a table, it should still be queryable in Hive at the end of the tutorial. The underlying data should still be in /user/hive/warehouse/original_access_logs in HDFS or /opt/examples/log_files/ on your local filesystem.
04-11-2016
07:51 AM
1 Kudo
Looks like the YARN Resource Manager process is not running. I would restart it with: 'sudo service hadoop-yarn-resourcemanager restart'. If you continue to have issues, other services may have failed to come up as a result of this or as a result of the same root cause. The easiest way to restart everything in order on the VM is to simply reboot. If you have sufficient memory for the VM, running one of the Cloudera Manager options on the desktop makes it a lot easier to see the health of all the services, etc. You might also want to look at the log files in /var/log/hadoop-yarn to see what kinds of exceptions are being thrown as the service dies.
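For a quick first look at those logs, a sketch like the following could tally which exception types show up most often. summarize_exceptions is a made-up helper name, and the pattern is a rough guess at what Java exception class names look like, so it can miss or over-match things.

```shell
# Hypothetical helper: count how often each exception or error class name
# appears under a log directory, most frequent first.
summarize_exceptions() (
    log_dir="${1:-/var/log/hadoop-yarn}"
    # -r: recurse, -h: no file names, -o: print only the matched text.
    grep -rhoE '[A-Za-z0-9_.]+(Exception|Error)' "$log_dir" \
        | sort | uniq -c | sort -rn
)

# Usage (reading the logs may need sudo):
#   summarize_exceptions /var/log/hadoop-yarn
```

The class names at the top of the list are usually a reasonable place to start reading the full stack traces.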
04-11-2016
07:09 AM
I apologize for the confusion - the service got a bit backed up over the weekend because too many people improperly abandoned clusters mid-deployment. I've cleared out everything that looks abandoned, so it should work better now. Note that access codes can't be reused, however, so if you deleted your previous stack you'll need to register for a new access code to try again.