How to print accumulator in yarn-cluster mode?


Hi All,

I have the program below, which counts occurrences of "ERROR" in log files. At the end, the accumulator's value is printed to the console. When the program runs in yarn-client mode it prints the correct value (509) to the console, but when it runs in yarn-cluster mode no value appears. Please help me figure out how to print it in yarn-cluster mode as well.

import org.apache.spark.{SparkConf, SparkContext}

object ErrorLogsCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("ErrorLogsCount"))

    // Read the log files into 4 partitions
    val logsRDD = sc.textFile(args(0), 4)

    // Named accumulator that counts "ERROR" lines
    val errorsAcc = sc.accumulator(0, "Errors Accumulator")

    val errorsLogRDD = logsRDD.filter(x => x.contains("ERROR"))
    errorsLogRDD.persist()

    // foreach is an action: it runs the job and updates the accumulator
    errorsLogRDD.foreach(x => errorsAcc += 1)

    // Printing accumulator (name is an Option[String] in Spark 1.6)
    println(errorsAcc.name.getOrElse("Errors Accumulator") + " = " + errorsAcc.value)

    // Saving results in HDFS
    errorsLogRDD.coalesce(1).saveAsTextFile(args(1))

    sc.stop()
  }
}

I am trying to run this on the HDP Sandbox 2.4 (Spark 1.6.0).
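
For reference, the submit commands I compare look roughly like this (the jar name and paths are placeholders):

spark-submit --class ErrorLogsCount --master yarn-client errorlogscount.jar /input/logs /output/errors
spark-submit --class ErrorLogsCount --master yarn-cluster errorlogscount.jar /input/logs /output/errors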


Re: How to print accumulator in yarn-cluster mode?


The reason it does not print in yarn-cluster mode is that when a Spark application runs in yarn-cluster mode, the driver runs on one of the cluster nodes rather than in the client shell. Its console output therefore ends up in the log file on that node. If yarn.log-aggregation-enable is set to true in yarn-site.xml, the logs can be viewed with

yarn logs -applicationId [application_id]

If the property is set to false, the logs can be found in the local log directory configured in yarn-site.xml by this property:

yarn.nodemanager.log-dirs
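
For reference, a minimal sketch of how these two properties might appear in yarn-site.xml (the directory shown is only an example value):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/yarn/log</value>
</property>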

In my case, log aggregation was enabled, so I could see the accumulator value printed in the application's log file.
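
Putting it together, a sketch of the full flow (the jar name and application id are placeholders; the real id is printed by spark-submit and shown in the ResourceManager UI):

spark-submit --class ErrorLogsCount --master yarn-cluster errorlogscount.jar /input/logs /output/errors
yarn logs -applicationId application_1462455953345_0007 | grep "Errors Accumulator"

The grep should return a line like "Errors Accumulator = 509" from the driver's stdout.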