We are planning to improve and standardize our connectivity to Hive on both the ingest and export sides. Performance, parallelism, scalability, maintainability, and support boundaries are the attributes we need to address.
Currently, we see the following approaches:
a) Get file path information via ThriftClient/Metastore, then access the files directly via HDFS
b) Use a JDBC driver
We see different pros and cons for both approaches, and we would like to learn which one you prefer and recommend for Hive connectivity.
Is there an official recommendation or any best practice for this?
We are looking for a sustainable approach that also works well with Hive 2.1 and later Hive 3, and that uses Hive's built-in security mechanisms.
Based on my experience with Views, I suggest Hive View 2. It is more secure, sustainable, and durable, as it offers more functionality. For a better understanding, you can refer to the URLs below:
Hope this helps you.