Real Practical Tutorial for Hadoop Using HDFS, Hive, Pig and Spark

Rising Star

Hi experts,

Is there a complete tutorial for Hadoop in the Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark?

I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution. I am very new to the Hadoop ecosystem.

I need to deliver a prototype of a Hadoop solution at the end of July, and I am getting worried by the constant difficulties and doubts I keep running into.

I only want to use those components to do some data cleansing and transformation.

I have already downloaded this virtual machine to use Spark:

http://www.cloudera.com/downloads/quickstart_vms/5-7.html


Can anyone help me?

Many thanks 🙂

1 ACCEPTED SOLUTION

Guru
The QuickStart VM includes a tutorial that will walk you through a use case where you:

- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
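
Since you mentioned data cleansing and transformation, here is a rough sketch of what a cleansing step for log files might look like in Python. It is not taken from the tutorial: the log format (an Apache-style access log), the regular expression, and the names `LOG_PATTERN`, `clean_log_line` and `clean_lines` are all assumptions for illustration, so adjust them to your real data.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern for an Apache-style access log line.
# The real log format in your data may differ; adapt as needed.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

Record = Tuple[str, str, str, int]


def clean_log_line(line: str) -> Optional[Record]:
    """Return (ip, method, url, status) for a well-formed line, else None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # malformed record: drop it during cleansing
    return (
        match.group("ip"),
        match.group("method"),
        match.group("url").lower(),   # normalize URL case
        int(match.group("status")),   # cast status code to an integer
    )


def clean_lines(lines: List[str]) -> List[Record]:
    """Keep only the records that parsed successfully."""
    return [rec for rec in map(clean_log_line, lines) if rec is not None]
```

Because `clean_log_line` is a plain function, the same logic should plug into Spark on the VM with something like `sc.textFile(path).map(clean_log_line).filter(lambda r: r is not None)`, where `path` is a placeholder for wherever your logs live in HDFS.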

