Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Stewart12586 — Fri, 16 Sep 2022 10:26:02 GMT

Hi experts,

Is there a complete tutorial for Hadoop in a Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark?

I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution. I am very new to the Hadoop ecosystem.

I need to deliver a prototype of a Hadoop solution at the end of July, and the constant difficulties and doubts I keep running into are starting to worry me.

I only want to use those components to do some data cleansing and transformation.

I have already downloaded this virtual machine to use Spark:

http://www.cloudera.com/downloads/quickstart_vms/5-7.html

Can anyone help me?

Many thanks 🙂

Re: Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Sean — Mon, 20 Jun 2016 22:40:57 GMT

The QuickStart VM includes a tutorial that will walk you through a use case
where you:

- ingest some data into HDFS from a relational database using Sqoop, and
query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive,
and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for
searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit
access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
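
Since the goal is data cleansing and transformation, here is a minimal sketch of what the per-record cleaning logic might look like in Python. It is not taken from the QuickStart tutorial. The three-column layout (id, name, amount) and the names `clean_record` and `clean_lines` are made up purely for illustration, so adjust them to your own data.

```python
import csv
from io import StringIO

# Hypothetical record layout, assumed for illustration only: id, name, amount
EXPECTED_FIELDS = 3


def clean_record(line):
    """Return a cleaned (id, name, amount) tuple, or None if the line is malformed."""
    try:
        # Let the csv module handle quoting; an empty line yields no row at all.
        fields = next(csv.reader(StringIO(line)))
    except (csv.Error, StopIteration):
        return None
    if len(fields) != EXPECTED_FIELDS:
        return None

    # Trim stray whitespace around every field.
    raw_id, raw_name, raw_amount = (f.strip() for f in fields)
    try:
        record_id = int(raw_id)
        amount = float(raw_amount)
    except ValueError:
        # Non-numeric id or amount: treat the row as bad data.
        return None
    if not raw_name:
        return None

    # Normalize the name's capitalization and round the amount to two decimals.
    return (record_id, raw_name.title(), round(amount, 2))


def clean_lines(lines):
    """Clean an iterable of raw lines, dropping the malformed ones."""
    cleaned = (clean_record(line) for line in lines)
    return [rec for rec in cleaned if rec is not None]
```

Because `clean_record` is a plain function, it can be tried locally first and then, in the PySpark shell, applied to a file in HDFS with something like `sc.textFile(path).map(clean_record).filter(lambda r: r is not None)`, where `path` is whatever HDFS location holds your data.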