<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Fetch distinct values of a column in Dataframe using Spark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Fetch-distinct-values-of-a-column-in-Dataframe-using-Spark/m-p/174636#M136899</link>
    <description>&lt;P&gt;Yes, I have one fat JAR that contains all the dependencies. The thing is, when I use &lt;STRONG&gt;collect()&lt;/STRONG&gt; in the code below, it works on yarn-cluster. &lt;STRONG&gt;But using collect() removes parallelism from further operations, so I don't want to use collect()&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Without the collect() statement I get the "java.lang.NoClassDefFoundError" exception mentioned above.
&lt;STRONG&gt;The code below works fine in local mode without the collect statement.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Please help me understand this behavior.&lt;/P&gt;&lt;PRE&gt;   preProcessedDataFrame.registerTempTable("tApplication")
      preProcessedDataFrame.select(ApplicationId).distinct().&lt;STRONG&gt;collect()&lt;/STRONG&gt;.foreach(record =&amp;gt; {
      val applicationId = record.getAs[String](ApplicationId)
      // Quote the value: applicationId is read as a String
      val selectedApplicationDataFrame = sqlContext.sql("SELECT * FROM tApplication WHERE ApplicationId = '" + applicationId + "'")
      logger.info("selectedApplicationId: " + applicationId)
      // DO FURTHER PROCESSING on selectedApplicationDataFrame ...
      })

&lt;/PRE&gt;</description>
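A plausible reading of the failure, sketched below: without collect(), the foreach closure is serialized and run on the executors, where driver-side objects such as sqlContext cannot be used, which matches the yarn-cluster-only NoClassDefFoundError. One common pattern (a sketch against the Spark 1.x API used above, untested without a cluster; column and variable names are taken from the snippet) is to collect only the small set of distinct IDs to the driver and keep each per-ID query itself distributed:

```scala
// Collect only the distinct IDs -- one row per ID, so this is small
// and does not defeat parallelism for the later per-ID processing.
val applicationIds = preProcessedDataFrame
  .select("ApplicationId")
  .distinct()
  .collect()
  .map(_.getAs[String]("ApplicationId"))

// This loop runs on the driver; the filter inside is still a
// distributed DataFrame operation executed across the cluster.
applicationIds.foreach { applicationId =>
  val selectedApplicationDataFrame =
    preProcessedDataFrame.filter(preProcessedDataFrame("ApplicationId") === applicationId)
  // ... further distributed processing on selectedApplicationDataFrame ...
}
```

The collect() here only materializes the key list, not the data; each filter still runs in parallel on the executors.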
    <pubDate>Tue, 16 Aug 2016 01:51:37 GMT</pubDate>
    <dc:creator>kaz_narasimhan</dc:creator>
    <dc:date>2016-08-16T01:51:37Z</dc:date>
  </channel>
</rss>

