This is more of a general structure question, but here it is: how and where do analytics people fit into a Hadoop environment? These are people who usually do their analysis in Excel. Hive looks like an option, but are they then expected to know SQL-style querying?
Also, let's say the Hive part is what they manage. How would graphical reports be generated? Do we need to use R for that?
R is again a programming tool. Hive does generate charts for executed queries, but those can't really be downloaded, right? What are other good alternatives for this?
As usual in Hadoop, there is a whole basket of possibilities. Let's start with the simplest ones:
Hive is a SQL database. It has pretty standard JDBC and ODBC drivers, a SQL syntax that is no more non-ANSI than Oracle's, and most BI tools support it. So you could use it like any other database as a frontend. If you want to use other tools for the analytics, you could export the data as Hive tables after the analysis.
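To make the "use it like any other database" point concrete, here is a minimal sketch of querying Hive over ODBC from Python. It assumes a Hive ODBC driver is installed and a data source named "HiveDSN" is configured; the sales table and its columns are invented placeholders:

```python
# Hedged sketch: querying Hive through ODBC like any other SQL database.
# The DSN name "HiveDSN" and the sales table/columns are invented
# placeholders for whatever your cluster actually exposes.

QUERY = "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region"

def fetch_regional_totals(dsn="HiveDSN"):
    """Run an aggregate query against Hive through an ODBC data source."""
    import pyodbc  # third-party ODBC binding; needs a Hive ODBC driver installed
    with pyodbc.connect(f"DSN={dsn}", autocommit=True) as conn:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()  # list of (region, total) rows
```

The same pattern works over JDBC from Java-based BI tools; only the driver and connection string change.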
Essentially there are three levels of sophistication:
- Computer-savvy users: direct access to the cluster using Pig/Hive/Spark and any tools they want, such as Zeppelin
- Access to Hive through JDBC/ODBC: they can use Excel or any other BI tool (millions out there)
- Export of aggregated datasets, prepared Excel sheets, Tableau reports, or an end-user application
Examples of tools I have seen:
- Excel (ODBC)
- Tableau (very nice to use; you can export a couple of GB of aggregated Hive results via CTAS and get very fast interactive analytics)
- BIRT (powerful and free)
- RStudio (perhaps with JDBC data frames or something like SparkR)
- and so on and so forth (MicroStrategy, Spotfire, Cognos ...)
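The CTAS trick mentioned for Tableau above (materializing an aggregated table so the BI tool only touches a few GB instead of the raw data) can be sketched roughly like this. All table and column names, and the ORC format choice, are assumptions for illustration:

```python
# Hedged sketch of the CTAS export pattern: pre-aggregate raw data into a
# compact table that a BI tool (Tableau, Excel, ...) can query quickly.
# Table names, columns, and the ORC storage format are invented here.

CTAS = """
CREATE TABLE sales_daily_agg STORED AS ORC AS
SELECT sale_date, region, SUM(revenue) AS revenue
FROM sales_raw
GROUP BY sale_date, region
"""

def materialize_aggregate(cursor):
    """Run the CTAS through any DB-API cursor (e.g. from pyodbc or PyHive)."""
    cursor.execute("DROP TABLE IF EXISTS sales_daily_agg")
    cursor.execute(CTAS)
```

The BI tool then points at `sales_daily_agg` (or an extract of it) rather than the raw table, which is what makes the interactive analytics fast.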
Other tools that are nice for more interactive analytics:
- Zeppelin (tech preview in HDP): an interactive notebook mainly focused on Spark; powerful, but a bit more complex than the tools above
and much much more.