Support Questions

brian_ramsel · ‎12-21-2015

We are currently trying to use the Phoenix CSV bulk loader MapReduce tool. It is taking about an hour and a half for a 170 GB CSV. The map phase usually finishes quickly, but the reduce phase seems to be taking much longer than it should. I believe the fact that we are using a 1 Gb network is a contributing factor. We have some old 10 Gb InfiniBand equipment lying around, and I was considering using it as the backbone for HDFS and MapReduce. I have come across two articles mentioning multihoming, neither of which I believe gives me enough detail to solve this problem. Any documentation or direction is greatly appreciated.

bsaini · ‎12-24-2015

@Brian Ramsel

Have you explored and tried changing the memory settings for the reducers and the number of reducers?

I am totally in agreement with moving to 10G, but I am just wondering if there is an opportunity to improve performance with the current setup. In the recent past, working with a prospect on a POC, we were able to ingest a 600 GB file in about 30 minutes on a small 4-node cluster (64 GB RAM, 10GbE, other tuning done at the app/service level). I am not sure how big your cluster is or what the hardware spec is, though.
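To put the two figures side by side, here is a rough back-of-the-envelope sketch in Python. It only converts the sizes and durations quoted in this thread into average rates; the function name and the decimal unit convention (1 GB = 1000 MB) are my own assumptions. These are end-to-end job averages, not measured network traffic, so they say nothing on their own about shuffle or HDFS replication volume.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average ingest rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


# Figures quoted in this thread.
current_job = throughput_mb_per_s(170, 90)   # 170 GB CSV in about 1.5 hours
poc_job = throughput_mb_per_s(600, 30)       # 600 GB file in about 30 minutes

# Theoretical ceiling of a single 1 Gb/s link, ignoring protocol overhead.
link_1gb_mb_per_s = 1000 / 8

print(f"current job: {current_job:.1f} MB/s")
print(f"POC job:     {poc_job:.1f} MB/s")
print(f"POC / current ratio: {poc_job / current_job:.1f}x")
print(f"single 1 Gb/s link ceiling: {link_1gb_mb_per_s:.0f} MB/s")
```

The comparison is only indicative, since the two clusters differ in size, hardware, and tuning.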


nsabharwal · ‎12-21-2015

@jeff @Paul Codding

aervits · ‎12-24-2015

This would make an excellent wiki post.

brian_ramsel · ‎12-28-2015

Sorry I haven't responded yet; I have been out of the office for the holidays. From what I can tell, the reduce memory is set to 5 GB. I am unsure about the number of reducers. We have an 8-node cluster; each node has 16 cores and 192 GB of RAM.
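As a rough sanity check on those numbers, the sketch below estimates an upper bound on how many 5 GB reducers could run at once on this hardware. It assumes, purely for illustration, that all RAM and all cores on every node are available to YARN and that each reducer takes one core; neither is confirmed in this thread, and a real cluster reserves resources for the OS, HDFS, and HBase, so the actual figure would be lower. The function name is hypothetical.

```python
def max_concurrent_reducers(nodes: int, cores_per_node: int,
                            ram_gb_per_node: int, reducer_memory_gb: int,
                            cores_per_reducer: int = 1) -> int:
    """Upper bound on simultaneous reducers, limited by memory or cores per node."""
    by_memory = ram_gb_per_node // reducer_memory_gb
    by_cores = cores_per_node // cores_per_reducer
    # Each node can host only as many reducers as its scarcer resource allows.
    return nodes * min(by_memory, by_cores)


# Cluster described above: 8 nodes, 16 cores and 192 GB of RAM each, 5 GB per reducer.
upper_bound = max_concurrent_reducers(nodes=8, cores_per_node=16,
                                      ram_gb_per_node=192, reducer_memory_gb=5)
print(f"upper bound on concurrent reducers: {upper_bound}")
```

Under these assumptions cores, not memory, are the limiting resource, so comparing the job's actual reducer count against this bound would show whether the reduce phase is using the cluster fully.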

How to utilize infiniband backbone during MapReduce.