What is the fastest way to load data into Apache Hive ACID Tables?

Super Guru

1. A special utility?

2. NiFi: land the data as an ORC table with PutHDFS, then use PutHiveQL to MERGE it into the ACID table
   (MERGE syntax: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge)

3. Sqoop?

4. NiFi: PutHiveStreaming

5. NiFi: load into Druid, then INSERT into the Hive ACID table from a Hive table on top of Druid

6. NiFi: load into HBase, then INSERT into the Hive ACID table from a Hive table on top of HBase

7. Some SnappyData in-memory pattern?

8. IBM Big SQL?

9. Attunity?

Re: What is the fastest way to load data into Apache Hive ACID Tables?

@Timothy Spann

I would go with either Attunity or a utility/framework that can be adapted to the use case. These kinds of frameworks reduce time and effort, and multiple tables can be processed in parallel.

Re: What is the fastest way to load data into Apache Hive ACID Tables?

Super Guru

That would definitely work, but those tools are neither open source nor free.

Any suggestions for open-source options?

Re: What is the fastest way to load data into Apache Hive ACID Tables?

@Timothy Spann

If open source is a priority, I would go with Hive's MERGE statement. I haven't tried MERGE on very large volumes, but I believe it would perform decently.
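As a rough illustration of the MERGE route, the Python helper below only builds the HiveQL text; it does not connect to Hive. It is a minimal sketch, assuming Hive 2.2 or later (where MERGE was added), a target table created as a transactional (ACID) table, and a hypothetical staging table whose optional `op` column marks deleted rows with `'D'`. The function name and the table and column names in the usage are made up for illustration.

```python
from typing import List, Optional


def build_merge(target: str, staging: str, key: str, columns: List[str],
                op_col: Optional[str] = None) -> str:
    """Build a HiveQL MERGE that applies staged rows to an ACID target table.

    `columns` lists the target's columns in table order, including `key`.
    `op_col` (hypothetical) is a staging-only column where 'D' means delete.
    """
    non_key = [c for c in columns if c != key]
    if key not in columns or not non_key:
        raise ValueError("columns must contain the key and at least one other column")

    set_clause = ", ".join(f"{c} = s.{c}" for c in non_key)  # key is only used to match
    values = ", ".join(f"s.{c}" for c in columns)            # full row for new keys

    lines = [f"MERGE INTO {target} AS t", f"USING {staging} AS s", f"ON t.{key} = s.{key}"]
    if op_col:
        # When both UPDATE and DELETE clauses appear, Hive requires the first one to have a condition.
        lines.append(f"WHEN MATCHED AND s.{op_col} = 'D' THEN DELETE")
    lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    insert_guard = f" AND s.{op_col} <> 'D'" if op_col else ""  # never insert a deleted row
    lines.append(f"WHEN NOT MATCHED{insert_guard} THEN INSERT VALUES ({values})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_merge("sales.orders", "staging.orders_delta", "id",
                      ["id", "amount", "status"], op_col="op"))
```

Generating the statement keeps the column lists in sync with the table definition; the resulting text could then be submitted through PutHiveQL, Beeline, or any other Hive client.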

Re: What is the fastest way to load data into Apache Hive ACID Tables?

Super Guru

Thanks! Merge seems to be recommended by a few sources.

Re: What is the fastest way to load data into Apache Hive ACID Tables? (Accepted Solution)

Hi @Timothy Spann, the recommended approach is Attunity -> Kafka -> NiFi -> Hive -> MERGE. If you want 100% open source, then Sqoop the data into a staging area and run MERGE to apply the deltas.
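To make "run MERGE to apply the deltas" concrete, the sketch below reproduces in plain Python what such a MERGE does to the target table, using dictionaries in place of Hive tables. It assumes a hypothetical change-type column `op` with 'I', 'U', and 'D' values, and deltas listed in the order the changes happened. Because Hive's MERGE by default rejects a source that matches the same target row more than once (its cardinality check), the sketch first keeps only the latest delta per key, which a real staging query would also need to do.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def latest_per_key(deltas: Iterable[Row], key: str) -> List[Row]:
    """Keep only the last delta seen for each key (input assumed to be in change order)."""
    latest: Dict[object, Row] = {}
    for delta in deltas:
        latest[delta[key]] = delta
    return list(latest.values())


def apply_deltas(target: Dict[object, Row], deltas: Iterable[Row],
                 key: str = "id", op_col: str = "op") -> Dict[object, Row]:
    """Return a copy of `target` with staged deltas applied, mirroring MERGE's WHEN clauses."""
    result = {k: dict(v) for k, v in target.items()}  # leave the input untouched
    for delta in latest_per_key(deltas, key):
        k = delta[key]
        row = {c: v for c, v in delta.items() if c != op_col}  # op is staging-only
        is_delete = delta.get(op_col) == "D"
        if k in result:
            if is_delete:
                del result[k]      # WHEN MATCHED AND op = 'D' THEN DELETE
            else:
                result[k] = row    # WHEN MATCHED THEN UPDATE
        elif not is_delete:
            result[k] = row        # WHEN NOT MATCHED AND op <> 'D' THEN INSERT
    return result


if __name__ == "__main__":
    target = {1: {"id": 1, "amount": 10}, 2: {"id": 2, "amount": 20}}
    staged = [
        {"id": 1, "amount": 15, "op": "U"},
        {"id": 2, "amount": 20, "op": "D"},
        {"id": 3, "amount": 30, "op": "I"},
        {"id": 3, "amount": 35, "op": "U"},
    ]
    print(apply_deltas(target, staged))
    # {1: {'id': 1, 'amount': 15}, 3: {'id': 3, 'amount': 35}}
```

Row 1 is updated, row 2 is deleted, and row 3 is inserted once with its latest values, even though it was staged twice.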

Re: What is the fastest way to load data into Apache Hive ACID Tables?

Additionally, the Sqoop/MERGE process is easy to automate with Workflow Manager.
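Workflow Manager is a graphical tool (an Ambari view for designing Oozie workflows), so there is no code to show for it directly. As a stand-in, the sketch below lists the two steps such a workflow would schedule: a Sqoop incremental import into a staging directory, then a Beeline run of a MERGE script. The helper names, connection strings, paths, table names, and the `merge_orders.hql` script are all hypothetical. The sketch also assumes a Hive table is defined over the staging directory and that the directory is emptied after each successful MERGE, so old deltas are not applied twice.

```python
import shlex
import subprocess
from typing import List, Sequence


def sqoop_import_cmd(jdbc_url: str, username: str, password_file: str, table: str,
                     target_dir: str, check_column: str, last_value: str) -> List[str]:
    """Sqoop incremental import of rows whose check_column changed after last_value."""
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,  # keeps the password off the command line
            "--table", table,
            "--target-dir", target_dir,        # staging area read by the MERGE
            "--incremental", "lastmodified",
            "--check-column", check_column,
            "--last-value", last_value,
            "--append"]                        # required if the target directory already exists


def beeline_cmd(hive_jdbc_url: str, script_path: str) -> List[str]:
    """Run a HiveQL script (for example, the MERGE) through Beeline."""
    return ["beeline", "-u", hive_jdbc_url, "-f", script_path]


def run_steps(steps: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping on the first failure."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    steps = [
        sqoop_import_cmd("jdbc:mysql://db.example.com/sales", "etl", "/user/etl/.db.password",
                         "orders", "/staging/orders_delta", "updated_at", "2017-01-01 00:00:00"),
        beeline_cmd("jdbc:hive2://hive.example.com:10000/default", "merge_orders.hql"),
    ]
    run_steps(steps)  # dry run: only prints the commands
```

Note that a timestamp-based incremental import picks up inserts and updates but not deletes in the source, which is one reason a CDC tool such as Attunity is suggested when deletes matter.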

Re: What is the fastest way to load data into Apache Hive ACID Tables?

Super Guru

Does Attunity work with CSV, JSON, XML, and other file formats?
