What is the recommended way to import/export data with focus on performance, scalability & maintenance



We are planning to improve and standardize our connectivity to Hive on both the ingest and the export side. Performance, parallelism, scalability, long-term maintainability, and support boundaries are the attributes we need to address.

Currently, we see the following approaches:

a) Get file path information via the Thrift client/Metastore, but access the files directly via HDFS

b) Use a JDBC driver
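
To make the contrast between (a) and (b) concrete, here is a minimal Python sketch. It makes no real Metastore or HiveServer2 calls. `TableLocation`, `split_files`, and `hive_jdbc_url` are hypothetical names, and the host, paths, and principal in the usage lines are placeholders. The sketch only shows where parallelism comes from in (a) and what a client needs to connect in (b), using the standard `jdbc:hive2://` URL form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableLocation:
    """Hypothetical shape of what a Metastore lookup yields for approach (a)."""
    hdfs_path: str      # table or partition directory on HDFS
    files: List[str]    # data files found under that directory


def split_files(files: List[str], workers: int) -> List[List[str]]:
    """Approach (a): deal the data files round-robin to parallel readers."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [files[i::workers] for i in range(workers)]


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default",
                  principal: Optional[str] = None) -> str:
    """Approach (b): build a HiveServer2 JDBC URL.

    Passing a Kerberos principal appends the ';principal=' session variable.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        url += f";principal={principal}"
    return url


# Placeholder values, for illustration only.
loc = TableLocation("hdfs://nn/warehouse/sales", ["f0", "f1", "f2", "f3", "f4"])
plan = split_files(loc.files, workers=2)
url = hive_jdbc_url("hs2.example.com", principal="hive/_HOST@EXAMPLE.COM")
```

In (a) the client decides how to split the work, and access is governed by HDFS permissions rather than by HiveServer2 authorization. In (b) every request goes through HiveServer2, so Hive's own security mechanisms apply, but throughput depends on that service.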

We see different pros and cons for both approaches and would like to learn which one you prefer and recommend for Hive connectivity.

Is there an official recommendation, or are there any best practices for this?

We are looking for a sustainable approach that also interacts well with Hive 2.1 and, later, Hive 3, and that uses Hive's onboard security mechanisms.


Re: What is the recommended way to import/export data with focus on performance, scalability & maintenance

@Sebastian Fröhlich

Based on my experience with Views, I suggest Hive View 2. It is more secure, sustainable, and durable, and it offers more functionality. For a better understanding, you can refer to the URLs below:

https://hortonworks.com/blog/3-great-reasons-to-try-hive-view-2-0/

https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-views/content/ch_using_hive_view....

Hope this helps.