What is the recommended way to import/export data with a focus on performance, scalability, and maintenance?

We are planning to improve and standardize our connectivity to Hive on both the ingest and export sides. Performance, parallelism, scalability, long-term maintainability, and support boundaries are the attributes we need to address.

Currently, we see the following approaches:

a) Get file path information via the Thrift client/Metastore, but access the files directly via HDFS

b) Use a JDBC driver
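
To make the two options concrete, here is a minimal sketch of what each client has to work with. It only uses the Python standard library and does not connect to anything. The host names, database, and table location are hypothetical values for illustration, and the helper names `table_location_parts` and `hiveserver2_jdbc_url` are made up for this example; 10000 is assumed as the HiveServer2 port because it is the usual default.

```python
from urllib.parse import urlparse


def table_location_parts(location: str) -> dict:
    """Approach (a): split a table location URI, as returned by the Metastore.

    A client would then read or write the files under `path` directly
    through an HDFS client, bypassing HiveServer2.
    """
    parsed = urlparse(location)
    return {
        "scheme": parsed.scheme,
        "namenode": parsed.netloc,
        "path": parsed.path,
    }


def hiveserver2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Approach (b): build the connection URL for a HiveServer2 JDBC driver."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Hypothetical values, for illustration only.
parts = table_location_parts(
    "hdfs://namenode.example.com:8020/warehouse/sales.db/orders"
)
url = hiveserver2_jdbc_url("hs2.example.com", database="sales")

print(parts["path"])
print(url)
```

One difference this makes visible: with (a) the client talks to HDFS itself, so as far as we understand it, only HDFS-level permissions apply to the file access, whereas with (b) every statement goes through HiveServer2.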

We see different pros and cons for both approaches, and we would like to learn which one you prefer and recommend for Hive connectivity.

Is there an official recommendation or a set of best practices for this?

We are looking for a sustainable approach that also works well with Hive 2.1 and, later, Hive 3, and that uses Hive's built-in security mechanisms.

1 REPLY

@Sebastian Fröhlich

Based on my experience with Views, I suggest Hive View 2. It is more secure, sustainable, and durable, and it offers more functionality. For a better understanding, you can refer to the URLs below:

https://hortonworks.com/blog/3-great-reasons-to-try-hive-view-2-0/

https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-views/content/ch_using_hive_view....

Hope this helps you.
