
Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Solved

Hi experts,

Does a complete tutorial exist for Hadoop in the Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark?

I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution... I am very new to the Hadoop ecosystem.

I need to deliver a prototype of a Hadoop solution at the end of July, and I am growing worried about the constant difficulties and doubts I keep running into.

I only want to use those components to do some data cleansing and transformation.
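For the cleansing-and-transformation part, one common approach is to prototype the row-level logic in plain Python first and then port it to Pig, Hive, or Spark. The sketch below is only an illustration with made-up field names (`name`, `age`); your real schema will differ.

```python
# Minimal sketch of row-level cleansing: trim whitespace, drop rows with
# missing fields, and cast a numeric column to int. The field names
# (name, age) are invented for illustration only.

def clean_row(row):
    """Return a cleaned (name, age) tuple, or None if the row is unusable."""
    name = row.get("name", "").strip()
    age_raw = str(row.get("age", "")).strip()
    if not name or not age_raw.isdigit():
        return None  # drop incomplete or non-numeric rows
    return (name, int(age_raw))

raw_rows = [
    {"name": "  alice ", "age": "34"},
    {"name": "bob", "age": ""},       # missing age -> dropped
    {"name": "", "age": "22"},        # missing name -> dropped
    {"name": "carol", "age": " 41 "},
]

cleaned = [r for r in (clean_row(row) for row in raw_rows) if r is not None]
print(cleaned)  # [('alice', 34), ('carol', 41)]
```

In Spark, the same function could be applied with something like `rdd.map(clean_row).filter(lambda r: r is not None)`.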

I have already downloaded this virtual machine to use Spark:

http://www.cloudera.com/downloads/quickstart_vms/5-7.html


Can anyone help me?

Many thanks :)

1 ACCEPTED SOLUTION

Re: Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Master Collaborator
The QuickStart VM includes a tutorial that will walk you through a use case
where you:

- ingest some data into HDFS from a relational database using Sqoop, and
query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive,
and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for
searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit
access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
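The "link strength" step above is, at its core, a pair-counting aggregation. As a hedged illustration only (the tutorial's actual data and Spark code will differ), here is that aggregation in plain Python with a hypothetical list of page-to-page edges:

```python
from collections import Counter

# Hypothetical clickstream edges: (source_page, target_page). In the real
# tutorial these would come from ingested log data, not a literal list.
edges = [
    ("home", "products"),
    ("home", "products"),
    ("products", "checkout"),
    ("home", "about"),
]

# Link strength = how many times each directed pair occurs.
link_strength = Counter(edges)
print(link_strength[("home", "products")])  # 2
```

In PySpark, the equivalent would be roughly `edges_rdd.map(lambda e: (e, 1)).reduceByKey(lambda a, b: a + b)`.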


