Member since: 03-06-2016
Posts: 49
Kudos Received: 38
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5613 | 03-10-2016 01:55 PM
01-07-2020
09:16 AM
How did you resolve the issue?
04-20-2016
03:23 PM
@Sridhar Babu M Since cores per container are controlled by YARN configuration, you will need to set the number of executors and the number of cores per executor based on your YARN configuration to control how many executors and cores get scheduled. So if YARN is set to allocate 1 core per container and you want two cores for the job, then ask for 2 executors with 1 core each from spark-submit. That should give you two containers with 1 executor each. YARN will not give you an executor with 2 cores if a container can only have 1 core. But if you can have 8 cores per container, then you can have 8 executors with 1 core each or 4 executors with 2 cores each. Of course, you can continue to add executors as long as your YARN queue has capacity for more containers.

# Run on a YARN cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 2 \
  --executor-cores 1 \
  /path/to/examples.jar
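A minimal sketch (not part of the original answer) of requesting the same allocation programmatically through SparkConf instead of on the spark-submit command line; it assumes the standard Spark-on-YARN properties spark.executor.instances, spark.executor.cores and spark.executor.memory, and an illustrative application name:

import org.apache.spark.{SparkConf, SparkContext}

// Ask YARN for 2 executors, each with 1 core and 2 GB of memory.
// The master and deploy mode are still supplied by spark-submit.
val conf = new SparkConf()
  .setAppName("TwoSmallExecutors")           // illustrative app name
  .set("spark.executor.instances", "2")
  .set("spark.executor.cores", "1")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)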
04-13-2016
02:32 AM
4 Kudos
Hi Babu - The more common approach is to write out a new file. HDFS is essentially an append-only system, so creating a new file that is a derivative of the original is very common practice. You can write a MapReduce program to output a file, or use a Hive query to write the query results to a new file. For example:

INSERT OVERWRITE DIRECTORY '/user/me/output' SELECT UPPER(myColumn) FROM myTable;

This creates one or more new files with the modified data, which is effectively an update. In this case, we are upper-casing the myColumn column of the myTable table.
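A hedged sketch of running the same rewrite from Spark instead of the Hive CLI, assuming a Spark build with Hive support and an existing SparkContext sc:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Writes the upper-cased query results out as new files under /user/me/output.
hiveContext.sql("INSERT OVERWRITE DIRECTORY '/user/me/output' SELECT UPPER(myColumn) FROM myTable")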
05-25-2017
09:23 AM
Hi @Sridhar Babu, Apparently there is a compatibility issue with the Scala 2.11 builds of the library (spark-csv_2.11:1.3.0 and spark-csv_2.11:1.4.0). Please use version com.databricks:spark-csv_2.10:1.4.0 instead.
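For reference, the 2.10 build can be pulled in either from a Zeppelin dependency paragraph or when starting spark-shell (the same coordinates used in the answer from 03-30-2016):

%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")

spark-shell --packages com.databricks:spark-csv_2.10:1.4.0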
03-30-2016
01:14 PM
@Sridhar Babu M You can see the details of what Spark is doing by clicking on the application master in the Resource Manager UI. When you click on the application master link for the Spark job in the Resource Manager UI, it will take you to the Spark UI and show you the job in detail. You may just have to make sure that the Spark History Server is running in Ambari, or the page may come up blank. If you actually need to change the value in the file, then you will need to export the resulting DataFrame to a file. The save function that is part of the DataFrame class creates a file for each partition. If you need a single file, convert back to an RDD and use coalesce(1) to get everything down to a single partition, so you get one file. Make sure that you add the dependency, either in Zeppelin:

%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")

or when starting the shell:

spark-shell --packages com.databricks:spark-csv_2.10:1.4.0

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import sqlContext.implicits._  // needed for toDF(); spark-shell and Zeppelin usually import this automatically
case class Person(name: String, age: Int)
var personRDD = sc.textFile("/user/spark/people.txt")  // each line: name,age
var personDF = personRDD.map(x => x.split(",")).map(x => Person(x(0), x(1).trim.toInt)).toDF()
personDF.registerTempTable("people")
var personeDF = sqlContext.sql("SELECT * FROM people")
var agedPerson = personDF.map(x =>
  if (x.getAs[String]("name") == "Justin")
    Person(x.getAs[String]("name"), x.getAs[Int]("age") + 2)
  else
    Person(x.getAs[String]("name"), x.getAs[Int]("age"))
).toDF()
agedPerson.registerTempTable("people")
var agedPeopleDF = sqlContext.sql("SELECT * FROM people")
agedPeopleDF.show
agedPeopleDF.select("name", "age").write.format("com.databricks.spark.csv").mode(SaveMode.Overwrite).save("agedPeople")
var agedPeopleRDD = agedPeopleDF.rdd
agedPeopleRDD.coalesce(1).saveAsTextFile("agedPeopleSingleFile")
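If a single CSV file (rather than a plain-text dump of the RDD) is the goal, here is a hedged variant, assuming your Spark version has DataFrame.coalesce (1.4+) and that the spark-csv "header" option is available; the output directory name is illustrative:

agedPeopleDF.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")   // write a header row with the column names
  .mode(SaveMode.Overwrite)
  .save("agedPeopleSingleCsv")  // hypothetical output directory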
03-31-2016
11:45 AM
@Sridhar Babu M Glad it worked out. Would you mind accepting this answer and the one from the other thread? https://community.hortonworks.com/questions/24518/spark-sql-query-to-modify-values.html