It looks like you are executing one insert per row, and I see you have 900-byte files in HDFS. The most likely cause of your error is two workflows running in parallel and trying to insert into the table at the same time.

Even if you get the insert flow right, 900-byte files in HDFS will hurt Hive query performance and overload the HDFS NameNode, which has to keep metadata in memory for every one of those files. You should change your Oozie workflow and consider microbatching or streaming your data into HDFS/Hive instead.

You could write your audits to an NFS share with a shell action, then every few minutes load everything that has accumulated in that folder into HDFS as a single file. This is an example of a microbatching strategy; see the sketch below.
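Here is a minimal sketch of that microbatch step using the Hadoop FileSystem API, assuming the NFS mount point, the target HDFS path, and the delete-after-load cleanup are placeholders you would adapt. The key idea is concatenating the tiny audit files into one HDFS file per batch, so the NameNode tracks one file instead of thousands:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AuditMicrobatchLoader {
    public static void main(String[] args) throws IOException {
        File auditDir = new File("/mnt/audit-nfs");  // hypothetical NFS mount written by the shell action
        Path target = new Path("/warehouse/audit/batch-" + System.currentTimeMillis() + ".log");

        File[] files = auditDir.listFiles();
        if (files == null || files.length == 0) return;  // nothing accumulated this interval

        Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(target)) {
            // Concatenate all the tiny audit files into ONE HDFS file for this batch.
            for (File f : files) {
                out.write(Files.readAllBytes(f.toPath()));
            }
        }

        // Delete the sources only after the HDFS file has been closed successfully.
        for (File f : files) {
            f.delete();
        }
    }
}
```

Once the batch file lands, a single Hive `LOAD DATA INPATH ... INTO TABLE` statement moves it into the table, replacing thousands of per-row inserts with one operation every few minutes.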

You could also publish the audit records as JMS messages and use storm-jms to direct them into HDFS. This is a streaming approach.
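As a rough sketch of that streaming variant, the topology below wires a storm-jms spout into a storm-hdfs bolt. Package names follow Apache Storm 1.x and may differ in older releases; `MyJmsProvider` and `MyTupleProducer` are hypothetical classes you would implement for your broker and message format, and the HDFS URL and output path are placeholders:

```java
import javax.jms.Session;

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.jms.spout.JmsSpout;
import org.apache.storm.topology.TopologyBuilder;

public class AuditStreamTopology {
    public static void main(String[] args) throws Exception {
        // Spout: pulls audit events off a JMS queue.
        JmsSpout spout = new JmsSpout();
        spout.setJmsProvider(new MyJmsProvider());        // hypothetical: supplies your ConnectionFactory/Destination
        spout.setJmsTupleProducer(new MyTupleProducer()); // hypothetical: converts a JMS Message into a tuple
        spout.setJmsAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);

        // Bolt: appends tuples to HDFS, rotating files at 128 MB so the
        // output is block-sized files rather than thousands of tiny ones.
        HdfsBolt bolt = new HdfsBolt()
                .withFsUrl("hdfs://namenode:8020")        // placeholder: your NameNode URL
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/warehouse/audit/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
                .withRotationPolicy(new FileSizeRotationPolicy(128.0f, Units.MB))
                .withSyncPolicy(new CountSyncPolicy(1000));

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("audit-jms", spout, 1);
        builder.setBolt("audit-hdfs", bolt, 1).shuffleGrouping("audit-jms");

        StormSubmitter.submitTopology("audit-stream", new Config(), builder.createTopology());
    }
}
```

The file-size rotation policy is what solves the small-file problem here: events stream in continuously, but HDFS only ever sees large, block-sized files.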
