Created on 11-18-2015 05:18 PM - edited 09-16-2022 02:49 AM
Customer has an Access database application and they would like to (a) connect to Hive and extract data, and also (b) take resulting Access DB data and ODBC connect & load it into an existing Hive table from within Microsoft Access (not via Sqoop or via flat file). I am currently installing the driver and will soon test out option (a), but would like to know if option (b) is possible.
Thanks in advance!
Created 11-21-2015 02:47 PM
@bpreachuk Using ODBC driver is the cleanest way to do it.
Access --> RDBMS --> Hive using Sqoop.
Looking forward to see the final update
Kudos to this guy Link
Created 11-18-2015 05:24 PM
Yes, it can.
Source - http://hortonworks.com/wp-content/uploads/2015/10/Hortonworks-Hive-ODBC-Driver-User-Guide.pdf
Snippet - (Page 46)
The Hortonworks Hive ODBC Driver with SQL Connector supports translation for INSERT, UPDATE, and DELETE syntax when connecting to a Hive Server 2 instance that is running Hive 0.14 or later.
Created 11-25-2015 08:30 PM
Yes it can do Inserts, but it turns out that they're RBAR updates. We may try SIMBA drivers to see if they improve the performance (allow batching) but we are in the meantime moving ahead with Inserts via Access -> SQL Server -> Hive via Sqoop.
Created 11-18-2015 05:30 PM
I am not sure if there is direct path from Access to Hive. I have done Access to RDBMS then eventually Hive.
Created 11-18-2015 05:35 PM
@Neeraj Sabharwal Just wondering if you tried and found that the ODBC driver does not work with MsAccess? Usually ODBC driver that works for Excel works for Access.
Created 11-18-2015 05:36 PM
@bsaini It should work but migration from Access DB to Hive is in the question
Created 11-18-2015 05:43 PM
Whats the migration part? Hope I am not over simplifying but it appears to be a simple case of a) Select & b) Insert / update.
Created 11-18-2015 05:46 PM
Created 11-18-2015 05:49 PM
You mean DDL? Yeah, agreed. But OP is asking "load it into an existing Hive table" - so just insert.
Created 11-18-2015 05:55 PM
There is a desire to do it from within Access so we will definitely work that angle first. From a business standpoint it is cleaner that way - the data is processed in Access and once processed it would indeed be a simple Insert into Hive. I'll let you guys know the end result. The Access developer (part time at this client) will be doing the work so it may be a while before it's all done (FYI).
Created 11-18-2015 05:42 PM
Thanks Birender and Neeraj. I was looking into the Access -> SQL Server -> Hive approach since we already have a SQL Server on the box - but if I can get it to work via MS Access directly I will do so. I'll update the ticket here with how it all turns out. I will write up what worked if I am able to easily get the Access -> Hive Insert connection working.
Created 11-18-2015 05:44 PM
Great, looking forward to hearing the results.
Created 11-21-2015 02:47 PM
@bpreachuk Using ODBC driver is the cleanest way to do it.
Access --> RDBMS --> Hive using Sqoop.
Looking forward to see the final update
Kudos to this guy Link
Created 11-25-2015 07:47 PM
Here is an update on this task:
ODBC read from Hive -> Access works fine (not a surprise).
ODBC from Access to Hive runs as RBAR - one row at a time. This is sub-optimal in the RDBMS world, but REALLY crummy in the Hadoop world. This means 1 Hive session per row updated. We were getting about 150 rows updated every 10 minutes.
When sending data form Access to Hive, we will be implementing using Access -> SQL Server -> Sqoop into Hive.
We may try out Simba drivers in the future - more for curiosity - to see if they perform better with updates - or have the ability to batch updates into Hive.
Thanks Neeraj and Birender!
Created 11-25-2015 07:48 PM
Thanks @bpreachuk for updating the thread!!!