Member since: 11-18-2014
Posts: 20
Kudos Received: 10
Solutions: 0
12-16-2015
09:53 PM
1 Kudo
I tried, but nothing changed. Where should I configure the collector interval? I don't need real-time monitoring; one measurement per minute would be enough.
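Is it the reporting period of the Hadoop metrics sinks? Here is a sketch of what I would try in hadoop-metrics2.properties on each host; the property names come from the stock Hadoop metrics2 config and the timeline sink class that HDP ships, so treat them as an assumption (and this only covers the Hadoop daemons, not the host monitor):

# hadoop-metrics2.properties (assumed stock HDP layout)
# report to the AMS timeline sink once per minute instead of the default
*.period=60
*.sink.timeline.class=org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink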
12-16-2015
07:01 PM
1 Kudo
Here are my config files for ambari-metrics-collector: ambari-metrics-collector.tar.gz
12-16-2015
06:41 PM
1 Kudo
You did not answer my question completely. I'm the only user on the cluster; there's no reason for 100% CPU.
12-16-2015
06:20 PM
1 Kudo
Indeed. I found this "internal HBase" version. Each node of the cluster has 32 GB of RAM and 4 vCores. All of them are virtualized on top of very good hardware.
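In other words, the HMaster process in my ps output is AMS's own embedded HBase instance under /usr/lib/ams-hbase, not a separate cluster HBase install. For reference, a sketch of the ams-site.xml property that selects this mode, assuming the stock AMS configuration:

<!-- ams-site.xml: "embedded" runs the standalone HBase instance shown in the
     ps output; "distributed" stores the metrics in a real HBase cluster -->
<property>
  <name>timeline.metrics.service.operation.mode</name>
  <value>embedded</value>
</property>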
12-16-2015
06:19 PM
1 Kudo
I added this line to ambari.properties and restarted ambari-server, but the 100% CPU behavior persists. See the screenshot in the comment below.
12-16-2015
04:32 PM
1 Kudo
I have an HDP 2.3.0 cluster with 4 nodes. I noticed this process was consuming 100% CPU on my NameNode:

ams 5386 223 5.0 3596120 1666616 ? Sl 13:50 46:56 /usr/jdk64/jdk1.8.0_40/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -Xmx1536m -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/opt/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201512161350 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-NPAA1809.petrobras.biz.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start

But my cluster does not have HBase installed. I simply killed the process, but Ambari Metrics went down with it. After I restarted Ambari Metrics, the process keeps trying to run and still consumes a lot of CPU. Here is a screenshot of Ambari Metrics during the time of my actions. How can I configure AMS to stop trying to monitor HBase?
Labels:
- Apache Ambari
- Apache HBase
12-15-2015
09:27 AM
From my own experience, Spark runs much faster in standalone mode. I tried a variety of configurations on YARN, but I can't get the same performance. I'll try to upgrade. Is there a guide to standalone mode?
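For comparison, this is a minimal PySpark sketch of how I point a job at the standalone master; the master URL, memory and core values are placeholders, not measurements from this thread:

from pyspark import SparkConf, SparkContext

# placeholders: adjust the master URL, executor memory and core cap to the cluster
conf = (SparkConf()
        .setMaster("spark://host:7077")
        .setAppName("standalone-test")
        .set("spark.executor.memory", "8g")
        .set("spark.cores.max", "12"))  # standalone-only cap on total cores used
sc = SparkContext(conf=conf)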
12-14-2015
03:06 PM
1 Kudo
I'm trying to calculate an average of randomForest predictions in Spark 1.3.1, since the predicted probabilities of all trees are only available in 1.5.0. The best I could do so far is the function below:

import numpy as np
from pyspark.mllib.tree import DecisionTreeModel

def calculaProbs(dados, modelRF):
    # reach the individual trees through the underlying Java model
    trees = modelRF._java_model.trees()
    nTrees = modelRF.numTrees()
    nPontos = dados.count()
    predictions = np.zeros(nPontos)
    for i in range(nTrees):
        dtm = DecisionTreeModel(trees[i])
        # predict() must be called on the driver in this Spark version
        predictions += dtm.predict(dados.map(lambda x: x.features)).collect()
    predictions = predictions / nTrees
    return predictions

This code runs very slowly, as expected, since I'm collecting (collect()) the predictions of each tree and adding them up in the driver. I cannot put dtm.predict() inside a map operation in this version of Spark. Here is the note from the documentation: "Note: In Python, predict cannot currently be used within an RDD transformation or action. Call predict directly on the RDD instead." Any ideas to improve performance? How can I add the values of two RDDs without collecting them into a vector?
Labels:
- Apache Spark
12-10-2015
08:28 PM
Before calling this routine, I introduced the code below and the execution time dropped to 1m8s, a 3x improvement:

rawTrainData = rawTrainData.repartition(8)
rawTrainData.cache()

But passing numPartitions=15 to the distinct method does not affect the result. I'm running Spark 1.3.1 in standalone mode (spark://host:7077) with 12 cores and 20 GB per node allocated to Spark. The hardware is virtual, but I know it's top-end hardware. The cluster has 4 nodes (3 Spark workers).
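To check whether the numPartitions argument is taking effect at all, I would print the partition counts (getNumPartitions() exists in this Spark line; rawTrainData as above):

# sanity checks on partition counts; assumes sc is the active SparkContext
print(rawTrainData.getNumPartitions())               # 8 after repartition(8)
print(rawTrainData.distinct(15).getNumPartitions())  # should report 15
print(sc.defaultParallelism)                         # cluster-wide default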