Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2314 | 11-02-2016 11:02 AM |
 | 3601 | 10-05-2016 01:58 PM |
 | 8241 | 09-07-2016 08:32 AM |
 | 8844 | 09-07-2016 08:27 AM |
 | 2495 | 08-23-2016 08:35 AM |
06-21-2016
08:36 AM
VirtualBox can take snapshots of VMs that you can restore at a later date.
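If it helps, here's roughly what that looks like from the host's command line. The VM name below is just a placeholder; use whatever name the VM has in your VirtualBox Manager.

```
# Take a snapshot of the VM in its current state (placeholder VM name).
VBoxManage snapshot "Cloudera QuickStart VM" take "before-tutorial" --description "Clean state"

# Later, roll back to that snapshot (power the VM off first).
VBoxManage snapshot "Cloudera QuickStart VM" restore "before-tutorial"
```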
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
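For reference, the Sqoop step looks roughly like this. This is only a sketch from memory; the JDBC URL, credentials, and directories are placeholders, and the tutorial itself gives the exact command to run inside the VM.

```
# Placeholder import of all tables from the tutorial's sample database into the
# Hive warehouse in HDFS. Hostname, database, and credentials are assumptions.
sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --warehouse-dir /user/hive/warehouse \
    --hive-import
```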
06-04-2016
07:49 AM
Thank you Sean, but the link that you have provided is returning this message: "The core node you are trying to access was not found, it may have been deleted. Please refresh your original page and try the operation again." EDIT: now I have noticed that there is an extra ")".
06-01-2016
09:56 AM
intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' when the ETL process is finished, since Impala caches table metadata and won't automatically see tables created through Hive.
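A minimal sketch of that last step from the VM's shell; the table name is from memory, so substitute whatever the ETL script actually created.

```
# Refresh Impala's cached metadata so it sees tables created through Hive,
# then query the final table from the tutorial's ETL step (name assumed).
impala-shell -q "INVALIDATE METADATA;"
impala-shell -q "SELECT COUNT(*) FROM tokenized_access_logs;"
```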
05-21-2016
12:02 PM
Sorry, mistake on my side. The error I got is different: it was caused by having loaded the examples and then run exercise 1, and was triggered by an existing directory in HDFS.
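For anyone hitting the same thing: if an exercise complains that its output directory already exists in HDFS, removing that directory lets it run again. The path below is purely hypothetical; use the one named in the actual error message.

```
# Remove the conflicting output directory in HDFS (hypothetical path).
hadoop fs -rm -r /user/cloudera/some_existing_output_dir
```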
05-20-2016
01:51 PM
The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
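A rough sketch of the scp route: the VM's IP address and the archive name are placeholders, and the default QuickStart login is cloudera/cloudera.

```
# From the host: copy the archive into the VM (IP and filename are placeholders).
scp logs.zip cloudera@192.168.56.101:/home/cloudera/

# Inside the VM: unzip, then push the files into HDFS.
unzip /home/cloudera/logs.zip -d /home/cloudera/logs
hadoop fs -mkdir -p /user/cloudera/logs
hadoop fs -copyFromLocal /home/cloudera/logs/* /user/cloudera/logs/
```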
04-29-2016
07:04 AM
I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, while the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does - that's by design, to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished.

I'm not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots left for your Hive queries. YARN in the QuickStart VM is not configured with a lot of capacity, and it's not tested with Llama. I know of no other way to manage Impala resources via YARN, though.
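One quick thing to check inside the VM is whether a long-lived application is holding on to YARN capacity while the Hive job sits waiting; the application names you see will vary with your setup.

```
# List applications currently holding YARN resources (e.g. a long-running
# Llama / Impala application) while a Hive query waits in ACCEPTED state.
yarn application -list -appStates RUNNING

# The ResourceManager's scheduler page shows where the capacity went
# (hostname assumes the QuickStart VM's default):
#   http://quickstart.cloudera:8088/cluster/scheduler
```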
04-13-2016
07:40 AM
1 Kudo
If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in that filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive warehouse directory is in HDFS, which is a separate filesystem.
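To see the difference, from a terminal inside the VM (the local path is approximate):

```
# Local Linux filesystem -- what a file browser running in the VM shows.
ls -lh /opt/examples/log_files/

# HDFS -- where the Hive warehouse actually lives.
hadoop fs -ls /user/hive/warehouse
```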
04-11-2016
09:15 AM
Sorry, please ignore. I got the email now.