Support Questions

Find answers, ask questions, and share your expertise

Migrating SQL or PLSQL language to Pig and Hive

avatar
New Contributor

Can someone provide a rough order of magnitude/level-of-effort for converting or migrating PLSQL language (5,000-10,000 lines of code need to be migrated to Pig and Hive)?

1 ACCEPTED SOLUTION

avatar
Master Mentor

@Steve Shilling

You may want to look into this

It's not easy task to estimate this ...I would say 1 month (Just picked up random number)

View solution in original post

4 REPLIES 4

avatar
Master Mentor

@Steve Shilling

You may want to look into this

It's not easy task to estimate this ...I would say 1 month (Just picked up random number)

avatar

There is also HPL/SQL.

http://www.hplsql.org/doc

It was incorporated into the Hive 2.0 release (Feb 16th) and thus will be automatically included in the next release of HDP that includes Hive 2.0.

It can be very easily added to an existing HDP install as well.

It is an implementation of stored procs, control flow logic, temporary variables, error checking, on-the-fly SQL syntax conversion to HiveQL, RDBMS SQL syntax compatibility, etc. It brings almost all the functionality of DB2, SQL Server, Oracle stored procs and SQL syntax into Hive. An added bonus is that you can mix & match the syntax of SQL Server T-SQL, Oracle PLSQL and DB2. It is slick.

For example - here is a sample script that runs in HPL/SQL (for real):

create procedure spLoadSalesFact (LoadMonth char(10)) 
BEGIN
   SET plhql.onerror = seterror; 

   print 'Beginning execution...'; 
   print '*' || LoadMonth || '*'; 

   /* Fail if invalid Input Parameter 
   set LoadMonth = Upper(LoadMonth); 
   if (LoadMonth not in ('LAST', 'CURRENT'))
      BEGIN
      print 'Bad Input Paramter'; 
      RETURN -1;
      END
    */
   /* Load Prior Month */
   IF (LoadMonth = 'LAST') 
      BEGIN
      print 'Inserting Last Month''s Data...'; 
      insert into table saleslineitem 
      select * from saleslineitem 
      where TransactionDate >= '2011-06-01' 
        and transactionDate < '2011-07-01';
      if SQLCODE <> 0 
         RETURN -2;
      END
   /* Load Current Month */
   ELSE IF (LoadMonth = 'CURRENT') 
      BEGIN
      print 'Inserting Current month''s data...'; 
      insert into table saleslineitem 
      select * from saleslineitem 
      where TransactionDate >= '2011-07-01' 
        and transactionDate < '2011-08-01';
      if SQLCODE <> 0 
         RETURN -3;
      END
   PRINT 'Execution Completed...'; 
END
/* Stored Proc invocation */
-- CALL spLoadSalesFact ('GARBAGE'); 
-- CALL spLoadSalesFact ('LAST');
CALL spLoadSalesFact ('CURRENT');   

avatar
Contributor

Are there any performance improvements of not creating custom UDF function ? I have currently some text-analytics functions, for which I can either create UDF jar files or use HPLSQL.@bpreachuk Are there any performance issues that you faced on using HPLSQL ?

avatar

Hi @tanmoy. I would recommend you go the custom route - create UDF jars for Hive. That way you can use them anywhere you use Hive code. HPL/SQL is still not a GA feature, and in my opinion HPL/SQL use a little bit more time to "bake" and become more robust.