Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Guru

It looks like you are executing one insert per row, and I see you have 900-byte files in HDFS. The most likely cause of your error is two workflows running in parallel and trying to insert into the same table.

Even if you get the insert flow right, 900-byte files in HDFS will cause a performance hit for Hive and overload the HDFS NameNode. You should change your Oozie workflow and consider microbatching or streaming your data into HDFS/Hive.

You could write your audits to an NFS share with a shell action, then every few minutes load all the audits from that folder into HDFS. This is an example of a microbatching strategy.
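A minimal sketch of that microbatching loop, in Python. The paths here are hypothetical local stand-ins for the NFS spool directory and a staging area; the actual upload (e.g. `hdfs dfs -put` on the batch file, scheduled by cron or an Oozie coordinator) is noted in a comment rather than executed:

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical paths: the NFS share the shell action writes to,
# and a local staging area for the combined batch file.
SPOOL_DIR = Path("/mnt/nfs/audit-spool")
BATCH_DIR = Path("/tmp/audit-batches")

def collect_batch(spool_dir: Path, batch_dir: Path) -> Optional[Path]:
    """Concatenate all pending audit files into one batch file,
    deleting the tiny originals. Returns the batch file path,
    or None if the spool was empty."""
    pending = sorted(p for p in spool_dir.iterdir() if p.is_file())
    if not pending:
        return None
    batch_dir.mkdir(parents=True, exist_ok=True)
    batch = batch_dir / f"audits-{int(time.time())}.log"
    with batch.open("wb") as out:
        for src in pending:
            with src.open("rb") as f:
                shutil.copyfileobj(f, out)
            src.unlink()  # remove the small source file once batched
    return batch

# Every few minutes, a scheduler would call collect_batch() and then
# push the result into HDFS, e.g.:
#   subprocess.run(["hdfs", "dfs", "-put", str(batch), "/warehouse/audits/"])
```

The point of concatenating before the upload is that HDFS and the NameNode see one reasonably sized file per interval instead of hundreds of 900-byte files.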

You could also publish the audit records as JMS messages and use storm-jms to direct them into HDFS. This is a streaming approach.

