PutHDFS is writing slowly

Expert Contributor

I am fetching data using GetHDFS, doing some processing, and writing it back to HDFS.

Everything up to PutHDFS processes data quickly, but PutHDFS writes data to HDFS slowly.

Could you please let me know how to improve the speed?

Accepted solution

Super Mentor
@Hadoop User

Please share your PutHDFS processor configuration with us.

How large are the individual files that are being written to HDFS?

Thanks,

Matt

Expert Contributor

@Matt Clarke

Each file is only 1-2 KB.

The configuration is:

[Screenshot of the PutHDFS processor configuration]

Concurrent tasks: 1; all other settings are unchanged.

Thank you

Super Mentor

@Hadoop User

It is unlikely you will see the same performance from Hadoop for writes as for reads. The Hadoop architecture is designed to favor many readers and few writers.

Increasing the number of concurrent tasks may help performance, since multiple files will then be written concurrently.
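
To illustrate why more concurrent tasks can help when each write is dominated by a fixed per-file cost, here is a minimal Python sketch. It does not talk to HDFS at all: `write_file` and `PER_FILE_LATENCY_S` are hypothetical stand-ins that simulate a fixed overhead per file, and a thread pool plays the role of the PutHDFS concurrent tasks setting.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical per-file overhead (connection setup, NameNode round trips).
# For 1-2 KB files this fixed cost dominates the actual data transfer.
PER_FILE_LATENCY_S = 0.01


def write_file(name: str) -> str:
    """Hypothetical stand-in for one PutHDFS write; only simulates latency."""
    time.sleep(PER_FILE_LATENCY_S)
    return name


def write_all(names: List[str], concurrent_tasks: int) -> Tuple[int, float]:
    """Write every file using `concurrent_tasks` workers; return (count, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        written = list(pool.map(write_file, names))
    return len(written), time.perf_counter() - start


if __name__ == "__main__":
    files = [f"part-{i:04d}" for i in range(100)]
    for tasks in (1, 4):
        count, elapsed = write_all(files, tasks)
        print(f"concurrent tasks={tasks}: wrote {count} files in {elapsed:.2f}s")
```

With 100 files and a simulated 10 ms per file, one worker needs about a second while four workers take roughly a quarter of that. Real gains depend on your cluster, the NameNode load, and the number of DataNodes.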

Files of 1-2 KB are very small and do not make optimal use of your Hadoop architecture. NiFi is commonly used to merge such files into bundles of a more optimal size for storage in Hadoop. I believe a bundle size close to the HDFS block size (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later) is optimal.
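
As a rough back-of-the-envelope check, the sketch below compares a hypothetical workload of 1,000,000 files of 2 KB, written as-is versus merged into 64 KB or 128 MB bundles. It assumes one PutHDFS write per output file and uses the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object; `merge_plan` and both figures are assumptions for illustration, not measurements from your cluster.

```python
import math

KIB = 1024
MIB = 1024 * KIB

# Rule of thumb (assumption): each file and each block is a NameNode
# object costing roughly 150 bytes of heap.
BYTES_PER_NAMENODE_OBJECT = 150


def merge_plan(num_files: int, file_size: int, target_size: int,
               block_size: int = 128 * MIB) -> dict:
    """Estimate write count and NameNode heap if files are merged to target_size."""
    total = num_files * file_size
    merged_files = math.ceil(total / target_size)             # one PutHDFS write per output file
    blocks_per_file = math.ceil(target_size / block_size)     # blocks per full bundle
    objects = merged_files * (1 + blocks_per_file)            # file object + its blocks (approximate)
    return {
        "files_per_bundle": target_size // file_size,
        "hdfs_writes": merged_files,
        "namenode_heap_bytes": objects * BYTES_PER_NAMENODE_OBJECT,
    }


if __name__ == "__main__":
    for label, target in (("unmerged", 2 * KIB), ("64 KB", 64 * KIB), ("128 MB", 128 * MIB)):
        plan = merge_plan(1_000_000, 2 * KIB, target)
        print(f"{label}: {plan['files_per_bundle']} files/bundle, "
              f"{plan['hdfs_writes']:,} writes, "
              f"~{plan['namenode_heap_bytes'] / MIB:.1f} MiB NameNode heap")
```

Under these assumptions, merging into 64 KB bundles cuts the number of writes by a factor of 32 (1,000,000 down to 31,250), while block-sized bundles reduce it to 16 files.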

You can remove some of the per-connection overhead by merging files into larger files with the MergeContent processor before writing to Hadoop.
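
Conceptually, MergeContent's bin-packing behavior can be pictured with the simplified Python sketch below. It is not NiFi's implementation: the `Bin` class and its fields are hypothetical and only loosely mirror the MergeContent properties Minimum Group Size, Maximum Number of Entries, and Max Bin Age. Small payloads accumulate until a size or count threshold is reached, and a partially filled bin is flushed once it gets too old, so a slow trickle of data still reaches HDFS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Hypothetical single merge bin (a simplification, not NiFi's code)."""
    min_group_size: int   # bytes; loosely like MergeContent "Minimum Group Size"
    max_entries: int      # loosely like "Maximum Number of Entries"
    max_bin_age: float    # seconds; loosely like "Max Bin Age"
    items: List[bytes] = field(default_factory=list)
    size: int = 0
    opened_at: Optional[float] = None

    def add(self, payload: bytes, now: float) -> Optional[bytes]:
        """Add one small payload; return a merged bundle if a threshold is reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.items.append(payload)
        self.size += len(payload)
        if self.size >= self.min_group_size or len(self.items) >= self.max_entries:
            return self.flush()
        return None

    def expire(self, now: float) -> Optional[bytes]:
        """Flush a partially filled bin once it has been open for max_bin_age seconds."""
        if self.items and self.opened_at is not None and now - self.opened_at >= self.max_bin_age:
            return self.flush()
        return None

    def flush(self) -> bytes:
        """Concatenate the payloads (like binary concatenation) and reset the bin."""
        merged = b"".join(self.items)
        self.items, self.size, self.opened_at = [], 0, None
        return merged


if __name__ == "__main__":
    bundle_bin = Bin(min_group_size=64 * 1024, max_entries=10_000, max_bin_age=300.0)
    for i in range(32):  # 32 x 2 KiB payloads = 64 KiB
        bundle = bundle_bin.add(b"x" * 2048, now=float(i))
        if bundle is not None:
            print(f"flushed a {len(bundle)}-byte bundle after {i + 1} payloads")
```

Each flushed bundle would then become a single PutHDFS write instead of 32 separate ones. In the real processor, the thresholds and the merge format are configurable properties; this sketch only shows the general idea.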

Thanks,

Matt

Expert Contributor

@Matt Clarke

Thank you.

Depending on the frequency of writes, I have merged the files into 64 KB bundles.

Sorry for the late reply.

Super Mentor
@Hadoop User

If merging FlowFiles and adding more concurrent tasks to your PutHDFS processor helps with your performance issue, please take a moment to click "accept" on the answer above to close out this thread.

Thank you,

Matt