question Fetch distinct values of a column in Dataframe using Spark in Support Questions

https://community.cloudera.com/t5/Support-Questions/Fetch-distinct-values-of-a-column-in-Dataframe-using-Spark/m-p/174630#M136893

I am working with Spark 1.6.1 and need to fetch the distinct values of a column using Spark DataFrames. The column contains ~50 million records, and calling collect() slows down further operations on the resulting DataFrame and leaves no parallelism. The code below works fine in local mode, but in yarn-cluster mode I get "java.lang.NoClassDefFoundError".
<PRE>
preProcessedDataFrame.registerTempTable("tTempTable")
preProcessedDataFrame.distinct().foreach(record => {
  val applicationId = record.getAs[Int]("ApplicationId")
  val selectedApplicationDataFrame = sqlContext.sql("SELECT * FROM tTempTable WHERE ApplicationId = " + applicationId)
  selectedApplicationDataFrame.show(20)
  // FURTHER DO SOME MORE CALC BASED ON EACH APPLICATION-ID
})
</PRE>
Can someone tell me the reason for the error, or suggest a better approach to achieve the same result?

Mon, 15 Aug 2016 09:35:53 GMT kaz_narasimhan 2016-08-15T09:35:53Z
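
One possible direction, offered as a sketch rather than a confirmed fix: the closure passed to foreach runs on the executors, and sqlContext is generally only usable on the driver, which may be related to the failure seen in yarn-cluster mode. A common workaround is to deduplicate on the cluster, bring only the distinct ApplicationId values back to the driver, and issue the per-ID queries from there. The sketch below assumes ApplicationId is a non-null Int column and that the number of distinct IDs is small enough to fit in driver memory. The names DistinctApplications, queryFor, and processPerApplication are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical wrapper object; not from the original post.
object DistinctApplications {

  // Builds the per-application query against the temp table used in the question.
  def queryFor(applicationId: Int): String =
    "SELECT * FROM tTempTable WHERE ApplicationId = " + applicationId

  def processPerApplication(sqlContext: SQLContext, preProcessedDataFrame: DataFrame): Unit = {
    preProcessedDataFrame.registerTempTable("tTempTable")

    // Deduplicate on the cluster, then collect only the distinct IDs
    // (assumed to be few compared with the ~50 million rows).
    val applicationIds: Array[Int] = preProcessedDataFrame
      .select("ApplicationId")
      .distinct()
      .collect()
      .map(_.getInt(0)) // assumes a non-null Int column

    // This loop runs on the driver, so using sqlContext here is safe.
    // Each query is still executed in parallel across the cluster.
    applicationIds.foreach { applicationId =>
      val selectedApplicationDataFrame = sqlContext.sql(queryFor(applicationId))
      selectedApplicationDataFrame.show(20)
      // Further per-application calculations would go here.
    }
  }
}
```

Calling DistinctApplications.processPerApplication(sqlContext, preProcessedDataFrame) from the driver program keeps each per-ID query distributed while avoiding any use of sqlContext inside an executor-side closure. If the set of distinct IDs is large, a single groupBy("ApplicationId") aggregation may be a better fit than one query per ID.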