- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Migrating SQL or PLSQL language to Pig and Hive
- Labels:
-
Apache Hive
-
Apache Pig
Created ‎11-05-2015 05:18 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can someone provide a rough order of magnitude/level-of-effort for converting or migrating PLSQL language (5,000-10,000 lines of code need to be migrated to Pig and Hive)?
Created ‎11-05-2015 05:22 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You may want to look into this
It's not easy task to estimate this ...I would say 1 month (Just picked up random number)
Created ‎11-05-2015 05:22 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You may want to look into this
It's not easy task to estimate this ...I would say 1 month (Just picked up random number)
Created ‎02-24-2016 11:54 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is also HPL/SQL.
It was incorporated into the Hive 2.0 release (Feb 16th) and thus will be automatically included in the next release of HDP that includes Hive 2.0.
It can be very easily added to an existing HDP install as well.
It is an implementation of stored procs, control flow logic, temporary variables, error checking, on-the-fly SQL syntax conversion to HiveQL, RDBMS SQL syntax compatibility, etc. It brings almost all the functionality of DB2, SQL Server, Oracle stored procs and SQL syntax into Hive. An added bonus is that you can mix & match the syntax of SQL Server T-SQL, Oracle PLSQL and DB2. It is slick.
For example - here is a sample script that runs in HPL/SQL (for real):
create procedure spLoadSalesFact (LoadMonth char(10)) BEGIN SET plhql.onerror = seterror; print 'Beginning execution...'; print '*' || LoadMonth || '*'; /* Fail if invalid Input Parameter set LoadMonth = Upper(LoadMonth); if (LoadMonth not in ('LAST', 'CURRENT')) BEGIN print 'Bad Input Paramter'; RETURN -1; END */ /* Load Prior Month */ IF (LoadMonth = 'LAST') BEGIN print 'Inserting Last Month''s Data...'; insert into table saleslineitem select * from saleslineitem where TransactionDate >= '2011-06-01' and transactionDate < '2011-07-01'; if SQLCODE <> 0 RETURN -2; END /* Load Current Month */ ELSE IF (LoadMonth = 'CURRENT') BEGIN print 'Inserting Current month''s data...'; insert into table saleslineitem select * from saleslineitem where TransactionDate >= '2011-07-01' and transactionDate < '2011-08-01'; if SQLCODE <> 0 RETURN -3; END PRINT 'Execution Completed...'; END /* Stored Proc invocation */ -- CALL spLoadSalesFact ('GARBAGE'); -- CALL spLoadSalesFact ('LAST'); CALL spLoadSalesFact ('CURRENT');
Created ‎10-23-2017 03:22 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are there any performance improvements of not creating custom UDF function ? I have currently some text-analytics functions, for which I can either create UDF jar files or use HPLSQL.@bpreachuk Are there any performance issues that you faced on using HPLSQL ?
Created ‎10-23-2017 03:33 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @tanmoy. I would recommend you go the custom route - create UDF jars for Hive. That way you can use them anywhere you use Hive code. HPL/SQL is still not a GA feature, and in my opinion HPL/SQL use a little bit more time to "bake" and become more robust.
