Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2314 | 11-02-2016 11:02 AM |
 | 3601 | 10-05-2016 01:58 PM |
 | 8241 | 09-07-2016 08:32 AM |
 | 8844 | 09-07-2016 08:27 AM |
 | 2495 | 08-23-2016 08:35 AM |
06-21-2016
08:36 AM
VirtualBox can take snapshots of VMs that you can restore at a later date.
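If it helps, here's roughly what that looks like from the host's command line. The VM name below is just a placeholder; use whatever name the VM has in your VirtualBox Manager.

```
# Take a snapshot of the VM in its current state (placeholder VM name).
VBoxManage snapshot "Cloudera QuickStart VM" take "before-tutorial" --description "Clean state"

# Later, roll back to that snapshot (power the VM off first).
VBoxManage snapshot "Cloudera QuickStart VM" restore "before-tutorial"
```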
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
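For reference, the Sqoop step looks roughly like this. This is only a sketch from memory; the JDBC URL, credentials, and directories are placeholders, and the tutorial itself gives the exact command to run inside the VM.

```
# Placeholder import of all tables from the tutorial's sample database into the
# Hive warehouse in HDFS. Hostname, database, and credentials are assumptions.
sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username retail_dba \
    --password cloudera \
    --warehouse-dir /user/hive/warehouse \
    --hive-import
```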
06-04-2016
07:49 AM
Thank you Sean, but the link that you have provided is returning this message: "The core node you are trying to access was not found, it may have been deleted. Please refresh your original page and try the operation again." EDIT: now I have noticed that there is an extra ")".
06-01-2016
09:56 AM
intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' when the ETL process is finished, since Impala caches table metadata and won't automatically see tables created through Hive.
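A minimal sketch of that last step from the VM's shell; the table name is from memory, so substitute whatever the ETL script actually created.

```
# Refresh Impala's cached metadata so it sees tables created through Hive,
# then query the final table from the tutorial's ETL step (name assumed).
impala-shell -q "INVALIDATE METADATA;"
impala-shell -q "SELECT COUNT(*) FROM tokenized_access_logs;"
```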
05-21-2016
12:02 PM
Sorry, mistake on my side. The error I got is different: it was caused by having loaded the examples and then run exercise 1, and was triggered by an existing directory in HDFS.
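For anyone hitting the same thing: if an exercise complains that its output directory already exists in HDFS, removing that directory lets it run again. The path below is purely hypothetical; use the one named in the actual error message.

```
# Remove the conflicting output directory in HDFS (hypothetical path).
hadoop fs -rm -r /user/cloudera/some_existing_output_dir
```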
05-20-2016
01:51 PM
The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
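A rough sketch of the scp route: the VM's IP address and the archive name are placeholders, and the default QuickStart login is cloudera/cloudera.

```
# From the host: copy the archive into the VM (IP and filename are placeholders).
scp logs.zip cloudera@192.168.56.101:/home/cloudera/

# Inside the VM: unzip, then push the files into HDFS.
unzip /home/cloudera/logs.zip -d /home/cloudera/logs
hadoop fs -mkdir -p /user/cloudera/logs
hadoop fs -copyFromLocal /home/cloudera/logs/* /user/cloudera/logs/
```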
04-29-2016
07:04 AM
I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, while the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does - that's by design, to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished.

I'm not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots left for your Hive queries. YARN in the QuickStart VM is not configured with a lot of capacity, and it's not tested with Llama. I know of no other way to manage Impala resources via YARN, though.
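One quick thing to check inside the VM is whether a long-lived application is holding on to YARN capacity while the Hive job sits waiting; the application names you see will vary with your setup.

```
# List applications currently holding YARN resources (e.g. a long-running
# Llama / Impala application) while a Hive query waits in ACCEPTED state.
yarn application -list -appStates RUNNING

# The ResourceManager's scheduler page shows where the capacity went
# (hostname assumes the QuickStart VM's default):
#   http://quickstart.cloudera:8088/cluster/scheduler
```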
04-13-2016
07:40 AM
1 Kudo
If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in that filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive warehouse directory is in HDFS, which is a separate filesystem.
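To see the difference, from a terminal inside the VM (the local path is approximate):

```
# Local Linux filesystem -- what a file browser running in the VM shows.
ls -lh /opt/examples/log_files/

# HDFS -- where the Hive warehouse actually lives.
hadoop fs -ls /user/hive/warehouse
```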
04-11-2016
09:15 AM
Sorry, please ignore. I got the email now.