Support Questions
Find answers, ask questions, and share your expertise

PutHDFS is writing slowly

Expert Contributor

I am getting data using GetHDFS, doing some processing, and writing it back to HDFS.

Up until PutHDFS the data is processed quickly, but PutHDFS is writing the data to HDFS slowly.

Could you please let me know how to improve the speed?

1 ACCEPTED SOLUTION


Master Guru
@Hadoop User

Please share your PutHDFS processor configuration with us.

How large are the individual files that are being written to HDFS?

Thanks,

Matt


5 REPLIES


Expert Contributor

@Matt Clarke

Each file is only 1-2 KB.

The configuration is attached: 28383-puthdfs-config.jpg

Concurrent Tasks: 1; the rest of the properties are unchanged.

Thank you

Master Guru

@Hadoop User

It is unlikely you will see the same performance out of Hadoop for writes as for reads. The Hadoop architecture is designed to favor many concurrent readers and few writers.

Increasing the number of concurrent tasks may help performance, since you will then have multiple files being written concurrently.

1-2 KB files are very small and do not make optimal use of your Hadoop architecture. Commonly, NiFi is used to merge bundles of files together into a more optimal size for storage in Hadoop. I believe 64 KB is the default optimal size.
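The effect of bundling can be sketched outside NiFi. This is a hypothetical illustration in plain Python (not NiFi code) of the idea behind bin-packing merges: greedily packing many 1-2 KB payloads into roughly 64 KB bundles collapses the number of individual HDFS writes.

```python
def bundle(payloads, target_size):
    """Greedily pack small byte payloads into bundles of at most target_size bytes."""
    bundles, current, current_size = [], [], 0
    for p in payloads:
        # Start a new bundle when adding this payload would exceed the target.
        if current and current_size + len(p) > target_size:
            bundles.append(b"".join(current))
            current, current_size = [], 0
        current.append(p)
        current_size += len(p)
    if current:
        bundles.append(b"".join(current))
    return bundles

# 1000 files of ~1.5 KB each become a few dozen larger writes:
small_files = [b"x" * 1500 for _ in range(1000)]
merged = bundle(small_files, target_size=64 * 1024)
print(len(small_files), "writes ->", len(merged), "writes")
```

Each HDFS write carries fixed per-file overhead (connection setup, NameNode metadata), so cutting the write count this way is where the speedup comes from.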

You can remove some of the overhead of each connection by merging files together into larger files using the MergeContent processor before writing to Hadoop.
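As a rough sketch, the MergeContent properties that matter most here are the group-size and bin-age settings. The property names below come from the standard NiFi MergeContent processor; the values are illustrative assumptions for this thread, not settings confirmed by the original posters:

```
Merge Strategy      : Bin-Packing Algorithm
Merge Format        : Binary Concatenation
Minimum Group Size  : 64 KB      (flush a bin once it reaches the target size)
Max Bin Age         : 5 min      (cap latency so a slow trickle of files still flushes)
```

Minimum Group Size controls how large the merged files get, while Max Bin Age bounds how long FlowFiles wait to be merged.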

Thanks,

Matt

Expert Contributor

@Matt Clarke

Thank you.

I have merged files depending on the frequency of writes to 64 KB.

Sorry for the late reply.

Master Guru
@Hadoop User

If merging FlowFiles and adding more concurrent tasks to your PutHDFS processor helped with your performance issue, please take a moment to click "Accept" on the answer above to close out this thread.

Thank you,

Matt