Created 12-21-2015 07:22 PM
We are currently trying to use the Phoenix CSV bulk loader MapReduce tool. It is taking about an hour and a half for a 170 GB CSV. The map phase usually finishes quickly, but the reduce phase seems to be taking much longer than it should. I believe the fact that we are using a 1 Gb network is a contributing factor. We have some old 10 Gb InfiniBand equipment lying around, and I was considering using it as the backbone for HDFS and MapReduce. I have come across two articles mentioning multihoming, neither of which I believe gives me enough detail to solve this problem. Any documentation or direction is greatly appreciated.
Created 12-24-2015 05:08 PM
Have you explored and tried changing the memory setting for the reducers and the number of reducers?
I totally agree with moving to 10G, but I am wondering whether there is an opportunity to improve performance with the current setup. Recently, while working with a prospect on a POC, we were able to ingest a 600 GB file in about 30 minutes on a small 4-node cluster (64 GB RAM, 10 GbE, other tuning done at the app/service level). I am not sure how big your cluster is or what the hardware spec is, though.
Created 12-24-2015 02:15 PM
This would make an excellent wiki post.
Created 12-28-2015 07:29 PM
Sorry I haven't responded yet; I have been out of the office for the holidays. From what I can tell, the reduce memory is set to 5 GB. I am unsure about the number of reducers. We have an 8-node cluster; each node has 16 cores and 192 GB of RAM.
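To put the network question in perspective, here is a rough back-of-envelope sketch using the figures from this thread (170 GB input, 8 nodes, 1 Gb versus 10 Gb links). It is only an estimate under loud assumptions that are not confirmed anywhere above: map output is about the same size as the input, data is spread evenly across nodes, every link runs at full line rate, and HDFS replication traffic, disk I/O, and CPU time are ignored. The function name `shuffle_seconds` is made up for illustration.

```python
def shuffle_seconds(total_gb, nodes, link_gbit_per_s):
    """Ideal time for one node to receive its share of shuffle data.

    Assumes (hypothetically) even data distribution and full line rate.
    """
    per_node_gb = total_gb / nodes                 # data handled by each node's reducers
    remote_gb = per_node_gb * (nodes - 1) / nodes  # share that must cross the network
    link_gb_per_s = link_gbit_per_s / 8            # gigabits/s -> gigabytes/s
    return remote_gb / link_gb_per_s


# Figures taken from the thread: 170 GB CSV, 8-node cluster.
for link in (1, 10):
    t = shuffle_seconds(total_gb=170, nodes=8, link_gbit_per_s=link)
    print(f"{link:>2} Gb/s link: ~{t / 60:.1f} min of ideal shuffle transfer")
```

Under these assumptions the ideal shuffle transfer on a 1 Gb link comes out to a few minutes, not an hour and a half. Real jobs will be slower than this ideal, but the gap suggests the reducer count and reducer memory are worth checking alongside the network upgrade.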