Roadmap for Hive future

We are now using Hive in fairly standard ways, with one wrinkle: our data is binary (protobuf), so we have written a SerDe to handle it. I am wondering about the future roadmap for Hive within the Cloudera umbrella.

 

Impala is one route, but it does not support SerDe plugins, as far as I know. What is Cloudera's position on Shark and Stinger, which are explicitly designed as Hive improvements?

 

Thanks.

4 REPLIES

Master Collaborator

Hi Chuck,

 

My observations:

 

1. First, keep in mind that Impala and Hive have different use cases. Impala offers the low latency and high concurrency that analysts doing BI-style queries are going to expect. In contrast, Hive/MR is still more appropriate for batch-oriented processing.

 

2. Based on #1, it stands to reason that any and all improvements to Hive are good news insofar as they help users with those workloads. To that end, Cloudera employs Hive committers, actively contributes code to Hive (e.g., HiveServer2), and provides complementary infrastructure (e.g., the incubating Apache Sentry project for RBAC, which is built for both Hive and Impala and which we hope will be embraced by the entire ecosystem).

 

3. Shark (which is actually a Hive port, not an "improvement" to Hive) is another example of having the right tool for the right job. I think most would agree with the premise that Shark is generally used for complex analytics/iterative machine learning, not "mainstream" BI.

avatar

Thanks for your reply. I disagree somewhat with your reasoning. Many Hive users put up with batch processing and slow response times because they have no other choice, when what they really want is faster results. So Impala and Shark *are* seen by many Hive users as hoped-for improvements.

 

What is Cloudera's plan for Stinger, which is from your competitor Hortonworks, but is explicitly a project to improve Hive? Are you accepting Stinger code changes into future releases of Hive within CDH?

 

Thank you,

Chuck Connell

Nuance Communications

Master Collaborator
Hey Chuck,

You are correct that some Hive users will prefer to take advantage of
Impala (or Shark); my point is only that those solutions were not designed
to displace Hive.

CDH 5 (currently in beta) will ship with Hive 0.12, which contains all
Stinger code that has gone upstream.

avatar

Thanks!