Real Practical Tutorial for Hadoop using HDFS, Hive, Pig and Spark
Created on 06-18-2016 04:29 AM - edited 09-16-2022 03:26 AM
Hi experts,
Is there any complete tutorial for Hadoop in a Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark?
I have seen a lot of guides, but they don't correspond to practical cases, and I have had some difficulty developing a solution... I am very new to the Hadoop ecosystem.
I need to deliver a prototype of a Hadoop solution at the end of July, and I'm getting worried about the constant difficulties and doubts I've been having.
I only want to use those components to do some data cleansing and transformation.
I have already downloaded this virtual machine to use Spark:
http://www.cloudera.com/downloads/quickstart_vms/5-7.html
Can anyone help me?
Many thanks 🙂
Created 06-20-2016 03:40 PM
The tutorial that comes with the QuickStart VM you downloaded walks you through exercises where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage
That sounds like it will cover most of what you're looking for. To give you a feel for the cleansing/transformation and the Spark analysis steps, two rough sketches follow below.
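Since you mentioned data cleansing and transformation with HDFS, Hive and Spark, here is a minimal PySpark sketch of the log-file ETL step: it reads raw logs from HDFS, drops malformed lines, and saves the cleaned result as a Hive table that Impala can also query. The paths, log layout and table name (/user/cloudera/raw_logs, web_logs_clean) are assumptions for illustration, not part of the official tutorial, and the API shown is the Spark 1.6 HiveContext that ships with the CDH 5.7 QuickStart VM.

```python
# Sketch only: paths, column layout and table name below are hypothetical.
from pyspark import SparkContext
from pyspark.sql import HiveContext, Row

sc = SparkContext(appName="log-etl-sketch")
sqlContext = HiveContext(sc)

# 1. Ingest: read raw access logs already copied into HDFS
#    (e.g. with: hadoop fs -put access.log /user/cloudera/raw_logs/)
raw = sc.textFile("hdfs:///user/cloudera/raw_logs/")

# 2. Cleanse: drop blank or malformed lines and keep only the fields we need
def parse(line):
    parts = line.split(" ")
    if len(parts) < 7:
        return None
    return Row(ip=parts[0], timestamp=parts[3].lstrip("["), url=parts[6])

parsed = raw.map(parse).filter(lambda r: r is not None)

# 3. Transform: build a DataFrame and persist it as a Hive table
df = sqlContext.createDataFrame(parsed)
df.write.mode("overwrite").saveAsTable("web_logs_clean")

# 4. Query: the table is now visible to Hive, and to Impala after INVALIDATE METADATA
sqlContext.sql("SELECT url, COUNT(*) AS hits FROM web_logs_clean "
               "GROUP BY url ORDER BY hits DESC LIMIT 10").show()
```

You can run this with spark-submit on the VM, or paste it into the pyspark shell (in the shell, sc and sqlContext already exist, so skip those two assignments).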
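And here is one way to read the "link strength" step with Spark, again with assumed table and column names (orders_clean, order_id, product) rather than the tutorial's actual schema: items that appear in the same order are paired up, and the number of orders a pair shares is taken as the strength of the link between them.

```python
# Sketch only: the Hive table orders_clean(order_id, product) is hypothetical.
from itertools import combinations
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="link-strength-sketch")
sqlContext = HiveContext(sc)

# Load (order_id, product) rows from a Hive table populated earlier
rows = sqlContext.sql("SELECT order_id, product FROM orders_clean").rdd

# Group products by order, then emit every unordered product pair per order
pairs = (rows.map(lambda r: (r.order_id, r.product))
             .groupByKey()
             .flatMap(lambda kv: combinations(sorted(set(kv[1])), 2)))

# Count how many orders each pair shares: that count is the link strength
strength = pairs.map(lambda p: (p, 1)).reduceByKey(lambda a, b: a + b)

# Print the ten strongest links
for (a, b), n in strength.top(10, key=lambda kv: kv[1]):
    print("%s <-> %s : %d" % (a, b, n))
```

Only the SELECT and the pairing key need to change to apply the same idea to whatever data you ingest in the earlier steps.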
