Support Questions

Integrating from BI tools - Hive vs. Spark

When it comes to BI integration (e.g. consuming data from Cognos, Tableau, Pentaho or SpagoBI), the similarity between Hive and an RDBMS is easy to see. As in the old days of SQL over relational databases, the reporting engine simply issues a query through JDBC/ODBC, and voilà. No question here.

But... what would the equivalent flow look like with Spark / SparkSQL? How does it map to a BI engine?

For example, suppose you have a data store (any Hadoop flavour, such as flat files on HDFS, Hive or HBase) and a Spark process that reads the data, creates RDDs from it, builds a DataFrame, queries that DataFrame with SparkSQL and produces analytics results. This is not just a single query against a data store. How do you execute this from the BI engine?



Re: Integrating from BI tools - Hive vs. Spark

@Fernando Lopez Bello

Unless you are only talking about something like SparkSQL over a Hive context (data stored in Hive), your BI tool needs to be capable of creating RDDs, Datasets or DataFrames before using SparkSQL. This is how notebooks such as Zeppelin, IPython or Jupyter work.

Re: Integrating from BI tools - Hive vs. Spark

SparkSQL does provide a JDBC/ODBC interface via the Spark Thrift Server (STS). It is part of HDP.

You can connect to STS from any BI client and issue SQL queries.
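From the client's side, the flow is then the same as with an RDBMS. A sketch using Python's DB-API 2.0 (the Python counterpart of JDBC/ODBC): `run_report` is a hypothetical helper, `sqlite3` is only a local stand-in so the example runs anywhere, and the commented-out connection uses the third-party PyHive package with an assumed host and port (check the STS port configured on your cluster).

```python
import sqlite3


def run_report(conn, sql):
    """Issue one SQL statement over any DB-API 2.0 connection and return all rows.

    The caller does not need to know whether the engine behind `conn` is an
    RDBMS, HiveServer2 or Spark Thrift Server.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    # Local stand-in endpoint. Against Spark Thrift Server one could instead use
    # PyHive (third-party), e.g.:
    #   from pyhive import hive
    #   conn = hive.connect(host="sts-host", port=10000)  # host and port are assumptions
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report_totals (region TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO report_totals VALUES (?, ?)", [("north", 12.5), ("south", 5.5)]
    )
    print(run_report(conn, "SELECT region, total FROM report_totals ORDER BY region"))
    # prints [('north', 12.5), ('south', 5.5)]
    conn.close()
```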
