
Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Solved

Hi experts,

Does a complete tutorial exist for Hadoop in the Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark?

I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution... I am very new to the Hadoop ecosystem.

I need to deliver a prototype of a Hadoop solution at the end of July, and I am growing worried about the constant difficulties and doubts I keep running into.

I only want to use those components to do some data cleansing and transformation.
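For the cleansing-and-transformation part, one common approach is to prototype the row-level logic in plain Python first and then port it to Pig, Hive, or Spark. The sketch below is only an illustration with made-up field names (`name`, `age`); your real schema will differ.

```python
# Minimal sketch of row-level cleansing: trim whitespace, drop rows with
# missing fields, and cast a numeric column to int. The field names
# (name, age) are invented for illustration only.

def clean_row(row):
    """Return a cleaned (name, age) tuple, or None if the row is unusable."""
    name = row.get("name", "").strip()
    age_raw = str(row.get("age", "")).strip()
    if not name or not age_raw.isdigit():
        return None  # drop incomplete or non-numeric rows
    return (name, int(age_raw))

raw_rows = [
    {"name": "  alice ", "age": "34"},
    {"name": "bob", "age": ""},       # missing age -> dropped
    {"name": "", "age": "22"},        # missing name -> dropped
    {"name": "carol", "age": " 41 "},
]

cleaned = [r for r in (clean_row(row) for row in raw_rows) if r is not None]
print(cleaned)  # [('alice', 34), ('carol', 41)]
```

In Spark, the same function could be applied with something like `rdd.map(clean_row).filter(lambda r: r is not None)`.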

I have already downloaded this virtual machine to use Spark:

http://www.cloudera.com/downloads/quickstart_vms/5-7.html


Can anyone help me?

Many thanks :)

1 ACCEPTED SOLUTION

Re: Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark

Master Collaborator
The QuickStart VM includes a tutorial that will walk you through a use case
where you:

- ingest some data into HDFS from a relational database using Sqoop, and
query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive,
and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for
searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit
access to the data and visualize its lineage

That sounds like it will cover most of what you're looking for.
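The "link strength" step above is, at its core, a pair-counting aggregation. As a hedged illustration only (the tutorial's actual data and Spark code will differ), here is that aggregation in plain Python with a hypothetical list of page-to-page edges:

```python
from collections import Counter

# Hypothetical clickstream edges: (source_page, target_page). In the real
# tutorial these would come from ingested log data, not a literal list.
edges = [
    ("home", "products"),
    ("home", "products"),
    ("products", "checkout"),
    ("home", "about"),
]

# Link strength = how many times each directed pair occurs.
link_strength = Counter(edges)
print(link_strength[("home", "products")])  # 2
```

In PySpark, the equivalent would be roughly `edges_rdd.map(lambda e: (e, 1)).reduceByKey(lambda a, b: a + b)`.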


