New Contributor
Posts: 1
Registered: ‎12-20-2017

Upgrading CDH to use Hive 1.2.0 or higher

Hello,

 

Please advise: is it possible to upgrade Hive to 1.2.0 or higher on a Cloudera cluster?

 

Thanks,

Maxim

New Contributor
Posts: 3
Registered: ‎12-29-2017

Re: Upgrading CDH to use Hive 1.2.0 or higher

Hi, everybody!

I'm new to Cloudera, but I'm really surprised that Cloudera's Hadoop distribution doesn't support versions of Hive later than 1.1.0.

Many changes have been made since that version that affect performance, SQL support (for example, plain UNION in addition to UNION ALL), and so on.

Maybe there is something that can be used instead of Hive to store and manipulate data in an SQL way?

I'm asking because I can't believe that Cloudera can't include the latest version of Hive, so I assume some other solution is used for this purpose.

 

Best regards, Daniil.

Cloudera Employee
Posts: 375
Registered: ‎03-23-2015

Re: Upgrading CDH to use Hive 1.2.0 or higher

Hi Maxim,

The short answer is no. You can't upgrade Hive, or any other component, independently, because the components depend on each other and are designed to work together within the same CDH version. Upgrading one component will break those dependencies and cause issues in the cluster.

Which specific JIRA are you looking for in Hive 1.2? Chances are that it is already included in the CDH version of Hive. Please see my response to Daniil below for further details.

Cloudera Employee
Posts: 375
Registered: ‎03-23-2015

Re: Upgrading CDH to use Hive 1.2.0 or higher

Hi Daniil,

Even though the latest CDH version ships only Hive 1.1, many upstream JIRAs from higher versions of Hive have already been included in the CDH version of Hive 1.1. The version numbers of upstream Hive and CDH Hive are not directly comparable, meaning they correspond to different code bases. The same applies to every other component, not just Hive.

In the version string hive-1.1.0+cdh5.13.1+1283, the CDH Hive is based on upstream Hive 1.1 plus another 1283 JIRAs committed from upstream that are not available in upstream Hive 1.1.
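To make the three parts of that string concrete, here is a minimal Python sketch that splits it into its fields. The regular expression and the names `parse_cdh_version` and `CdhVersion` are hypothetical, written only for strings shaped exactly like hive-1.1.0+cdh5.13.1+1283; the meaning of the last field follows the explanation above, not any official Cloudera parsing rule.

```python
import re
from typing import NamedTuple


class CdhVersion(NamedTuple):
    upstream: str  # upstream Apache Hive release the build is based on
    cdh: str       # CDH release that ships this build
    patches: int   # trailing count: extra upstream JIRAs on top of the base release


# Hypothetical pattern for strings like "hive-1.1.0+cdh5.13.1+1283".
_PATTERN = re.compile(r"^hive-(\d+(?:\.\d+)*)\+cdh(\d+(?:\.\d+)*)\+(\d+)$")


def parse_cdh_version(s: str) -> CdhVersion:
    """Split a CDH Hive version string into its upstream, CDH and patch-count parts."""
    m = _PATTERN.match(s)
    if m is None:
        raise ValueError(f"unrecognised CDH version string: {s!r}")
    return CdhVersion(m.group(1), m.group(2), int(m.group(3)))


v = parse_cdh_version("hive-1.1.0+cdh5.13.1+1283")
print(v)  # CdhVersion(upstream='1.1.0', cdh='5.13.1', patches=1283)
```

The point of the sketch is that the upstream field alone ("1.1.0") says nothing about which later fixes are present; for that you still need the CDH release notes.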

For example, if you look at this page:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_513.html#cdh_...

It lists the JIRAs fixed in CDH 5.13.1. Even though the Hive version in this CDH release is still 1.1, we have also included JIRAs from Hive 2.0 or 3.0, such as the following Hive JIRAs:

https://issues.apache.org/jira/browse/HIVE-12742
https://issues.apache.org/jira/browse/HIVE-16758
https://issues.apache.org/jira/browse/HIVE-16784

etc.

I hope the above makes sense.

Cheers

New Contributor
Posts: 3
Registered: ‎12-29-2017

Re: Upgrading CDH to use Hive 1.2.0 or higher

Hi, EricL.

I've read about the things you've written before, and I hope that all the innovations and optimizations made by the Hive developers are applied to the Hive distributed with Cloudera.

 

But still.

select * from clickstream_csv
union
select * FROM clickstream_bad
LIMIT 100;

returns

Error while compiling statement: FAILED: ParseException line 3:0 missing ALL at 'select' near '<EOF>'

So plain UNION cannot be used, that much is certain.

 

And this makes me doubt that the changes made by the Hive developers since version 1.1 have really been included.

 

My current cloudera distribution is 5.12.1.

 

With best regards, Daniil.

Cloudera Employee
Posts: 31
Registered: ‎11-20-2015

Re: Upgrading CDH to use Hive 1.2.0 or higher

Daniil,

 

Hive 1.1 (CDH 5.4+) only offers UNION ALL (bag union), in which duplicate rows are not eliminated. Starting with Hive 1.2, the UNION DISTINCT feature was introduced, and if no UNION type is explicitly specified, the default UNION operation is DISTINCT. However, the introduction of UNION DISTINCT also brought some subtle changes to how UNION ALL works. We are unable to introduce those changes into CDH 5 without risking existing workloads. They will be available in CDH 6.
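The difference between the two union types can be illustrated with a small Python sketch over lists of row tuples. This only models the result semantics (bag union versus set union); it is not how Hive executes either operation, and the function names and sample rows are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int]


def union_all(left: List[Row], right: List[Row]) -> List[Row]:
    # Bag union (UNION ALL): every input row survives, duplicates included.
    return list(left) + list(right)


def union_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    # Set union (UNION DISTINCT, the default UNION from Hive 1.2 on): duplicate
    # rows are removed, including duplicates inside a single input.
    # dict.fromkeys keeps first-seen order here; SQL guarantees no row order
    # without ORDER BY.
    return list(dict.fromkeys(union_all(left, right)))


a = [("alice", 1), ("bob", 2)]
b = [("bob", 2), ("carol", 3)]
print(union_all(a, b))       # [('alice', 1), ('bob', 2), ('bob', 2), ('carol', 3)]
print(union_distinct(a, b))  # [('alice', 1), ('bob', 2), ('carol', 3)]
```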

 

In CDH 5, the only supported UNION is UNION ALL. If it fulfils your business requirements, please include the ALL keyword:

 

 

select * from clickstream_csv
UNION ALL
select * FROM clickstream_bad
LIMIT 100;

 

You can then pass the result through DISTINCT to achieve the same effect:

 

select distinct salary from (
select salary from sample_07
union all
select salary from sample_08) z;
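As a rough check that the workaround gives plain-UNION semantics, the following Python sketch mimics the two steps of that query: the inner UNION ALL of the salary columns, then the outer DISTINCT. The rows standing in for sample_07 and sample_08 are hypothetical, and sorting is added only to make the output deterministic.

```python
from typing import Dict, List

# Hypothetical rows standing in for sample_07 / sample_08; only salary matters here.
sample_07: List[Dict[str, int]] = [{"salary": 50000}, {"salary": 60000}, {"salary": 60000}]
sample_08: List[Dict[str, int]] = [{"salary": 60000}, {"salary": 70000}]


def union_all_salaries(t1: List[Dict[str, int]], t2: List[Dict[str, int]]) -> List[int]:
    # Inner subquery z: SELECT salary FROM t1 UNION ALL SELECT salary FROM t2
    return [r["salary"] for r in t1] + [r["salary"] for r in t2]


def select_distinct(values: List[int]) -> List[int]:
    # Outer query: SELECT DISTINCT salary FROM z (sorted only for stable output)
    return sorted(set(values))


z = union_all_salaries(sample_07, sample_08)
print(len(z))              # 5
print(select_distinct(z))  # [50000, 60000, 70000]
```

The result matches a set union of the two salary columns, which is what a plain UNION would return in Hive 1.2 and later.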

 

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union

 

Thanks.

New Contributor
Posts: 3
Registered: ‎12-29-2017

Re: Upgrading CDH to use Hive 1.2.0 or higher

Hello, David.

 

Thanks for your response.

I knew the difference between UNION and UNION ALL, and how to eliminate duplicates by combining UNION ALL with DISTINCT.

 

What's new to me is that you are going to ship a newer version in CDH 6. That's good; I'm looking forward to it.

 

Can I check the CDH release roadmap somewhere?

 

Thanks.

Cloudera Employee
Posts: 31
Registered: ‎11-20-2015

Re: Upgrading CDH to use Hive 1.2.0 or higher

Daniil,

 

We do not have a publicly available roadmap for CDH 6 yet. And while nothing is final until it's final, I think it's safe to say that we will be upgrading to at least Hive 1.2, which includes this requested feature.

 

Thanks.

New Contributor
Posts: 2
Registered: ‎05-09-2018

Re: Upgrading CDH to use Hive 1.2.0 or higher

Need Hive 2 for ACID

New Contributor
Posts: 1
Registered: ‎06-21-2018

Re: Upgrading CDH to use Hive 1.2.0 or higher


I have a similar problem with my data modeling tool (erwin). CDH 5.15 is unable to process queries generated by erwin, which supports Hive 2.1. When I attempt to reverse engineer my MySQL metastore, the following errors are returned:

 

--This file is published using to trace the exected SQL 
--ERwin RE SQL Trace for Hive 2.1.x started on 2018-06-21 10:23:42

--[2018-06-21 10:23:42] Model DBMSMetastoreMySQLVersion
--[2018-06-21 10:23:43] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..VERSION
--[2018-06-21 10:23:43] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..VERSION
--[2018-06-21 10:23:43] Entity ObjectsMetastoreMySQL
--[2018-06-21 10:23:43] Entity DatabaseFilterMetastoreMySQL
--[2018-06-21 10:23:43] Entity TableFilterMetastoreMySQL
--[2018-06-21 10:23:44] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..TBLS
--[2018-06-21 10:23:44] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..TBLS
--[2018-06-21 10:23:44] Model DBMSMetastoreMySQLVersion
--[2018-06-21 10:23:45] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..VERSION
--[2018-06-21 10:23:45] Database error: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..VERSION
--ERwin RE SQL Trace for Hive 2.1.x ended on 2018-06-21 10:23:45

 

When will Cloudera support the required queries so that I can successfully reverse and forward engineer my MySQL metastore database using erwin?

 

***NEIL***
