Self Service Hadoop – well, some starting points
Say you want to get started with Big Data and, at the same time, want to start empowering your relatively savvy end users who have been stuck in frustrating desktop data management land for a long time. This very simple article will hopefully help with a few options.
The diagram below gives a little perspective on some of the sources and mechanisms for ingesting, manipulating, and using your data assets. Following the numbers from 1 to 6, you can see that you have many options for working with your data. This article concentrates on a simple end-user example: ingesting data, understanding it, and looking at options for using the data through a self-service approach. It is really an approach to get started and to help promote some of the value of a modern data architecture as your team matures.
An end user wants to self-service some data from their desktop or server into HDFS, to be able to query and understand that data from their existing tools, and to work with it alongside whatever the tech staff is ingesting through other vehicles. This quick example shows how to use the Ambari Hive View to upload data, provide some structure, and create a Hive table that can be used by many available tools. It will also give a very brief starting thought on how Atlas can be used to help organize and track the what, where, and how of your assets.
1. Go to the Ambari Hive View – clicking the table-looking icon at the top right of the Ambari dashboard lists the views. (There are also other views, such as the HDFS Files view, Zeppelin, etc.)
Here is where you select the Ambari View:
2. Once in the Ambari Hive View, you can click on the Upload Table tab.
This is what the Ambari Hive View looks like; there are a lot of options here, some more tech focused, but very functional.
3. Within that tab you can select a CSV, with or without headers, from local storage or HDFS.
4. Then you can change the column names and/or types if necessary.
5. Then you create the Hive table in the Hive database you want. (A scripted equivalent of steps 3–5 is sketched below.)
This is the Upload Table tab, where I selected a CSV (geolocation) from my hard drive; it had headers.
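If you would rather script what the wizard does, the rough shape is: stage the raw CSV as a text table, then copy it into the final ORC table. Below is a minimal Python sketch using the pyhive library; the hostname, credentials, and column list are illustrative assumptions, so adjust them for your cluster and your file's actual headers.

```python
# A minimal sketch of roughly what the upload wizard does: stage the
# CSV as a text table, then copy it into an ORC-backed table.
# Host, credentials, and the column list are illustrative assumptions.
from pyhive import hive

conn = hive.Connection(host='hdp-sandbox.example.com', port=10000,
                       username='maria_dev', database='default')
cur = conn.cursor()

# Staging table that matches the raw CSV layout.
cur.execute("""
    CREATE TABLE IF NOT EXISTS geolocation_stage (
        truckid STRING, driverid STRING, event STRING,
        latitude DOUBLE, longitude DOUBLE, city STRING, state STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
""")

# Load the CSV that was already copied into HDFS.
cur.execute("LOAD DATA INPATH '/user/maria_dev/geolocation.csv' "
            "INTO TABLE geolocation_stage")

# Final ORC table, like the one created by the wizard in this walkthrough.
cur.execute("CREATE TABLE geolocation STORED AS ORC "
            "AS SELECT * FROM geolocation_stage")
```

The wizard infers column types from the headers and sample rows; the stage-then-copy pattern is why the final table can be stored as ORC even though the source is a plain CSV.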
6. Once the Hive table is created, you can use any third-party tool (Tableau), the Ambari Hive View, Excel, Zeppelin, etc., to work with the table. (A small programmatic example follows the screenshot below.)
Here is the Hive table geolocation (stored in ORC format) in the default Hive database, queried in the Hive View:
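As a quick sanity check that the table really is reachable from outside the Hive View, here is a small query over HiveServer2 from Python. Again a sketch: the connection details and column names match the assumptions in the previous snippet.

```python
# Query the new table through HiveServer2, the same endpoint that
# Tableau, Excel (via ODBC), and Zeppelin connect to.
# Host, port, and user are assumptions -- adjust for your cluster.
from pyhive import hive

conn = hive.Connection(host='hdp-sandbox.example.com', port=10000,
                       username='maria_dev', database='default')
cur = conn.cursor()
cur.execute("SELECT city, state, COUNT(*) AS events "
            "FROM geolocation GROUP BY city, state LIMIT 10")
for row in cur.fetchall():
    print(row)
```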
7. OK, one more detail that may help you. Once the geolocation table is created from the Hive View upload, there is no reason why you cannot go out and tie it into a taxonomy in Atlas, tag columns, add details, see lineage, etc. A few screen prints below give perspective, and a small REST sketch follows them. This is a larger topic, but it will help the team locate, organize, secure, and track data assets.
Bottom part of the Atlas screen.
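For those who want to automate the tagging step, Atlas also exposes a REST API. The sketch below uses the v2 endpoints to define a tag and attach it to the new table; the host, port, credentials, cluster name (the part after the @ in the qualified name), and tag name are all assumptions for illustration.

```python
# Tag the uploaded Hive table through the Atlas v2 REST API.
# All connection details and names here are illustrative assumptions.
import requests

ATLAS = 'http://hdp-sandbox.example.com:21000/api/atlas/v2'
AUTH = ('admin', 'admin')

# 1. Define a classification (tag); Atlas returns 409 if it exists.
requests.post(ATLAS + '/types/typedefs', auth=AUTH,
              json={'classificationDefs': [
                  {'name': 'SELF_SERVICE', 'attributeDefs': []}]})

# 2. Look up the hive_table entity by its qualified name
#    (database.table@clusterName).
resp = requests.get(ATLAS + '/entity/uniqueAttribute/type/hive_table',
                    auth=AUTH,
                    params={'attr:qualifiedName':
                            'default.geolocation@Sandbox'})
guid = resp.json()['entity']['guid']

# 3. Attach the tag to the table entity.
requests.post(ATLAS + '/entity/guid/' + guid + '/classifications',
              auth=AUTH, json=[{'typeName': 'SELF_SERVICE'}])
```

Once tagged, the table can be found by tag in the Atlas search UI.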
A good understanding of the latest Atlas
release can be found in the Hadoop Summit presentations listed below.