
Use RDD.foreach to Create a Dataframe and execute actions on the Dataframe in Spark scala


Contributor

Hi All,


I'm reading a config file with spark.read.textFile; the file contains my list of tables. My task is to iterate through the table list and convert each table from Avro to ORC format. The code snippet below implements this logic.

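val tableList = spark.read.textFile("tables.txt")

// inputPath and outputPath are defined elsewhere in the job
tableList.collect().foreach { tblName =>
  // Read the Avro data for this table
  val df = spark.read.format("avro").load(inputPath + "/" + tblName)
  // Write it back out as ORC, replacing any existing output
  df.write.format("orc").mode("overwrite").save(outputPath + "/" + tblName)
}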
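For reference, here is a fuller sketch of the same loop as a standalone job. The object name AvroToOrc, the helper functions cleanTableNames and tablePath, and taking inputPath and outputPath from the command-line arguments are all assumptions made for illustration; they are not part of my actual code. It also assumes the spark-avro module is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the name is an assumption for this sketch.
object AvroToOrc {

  // Trim whitespace and drop blank lines from the table list (illustrative helper).
  def cleanTableNames(lines: Seq[String]): Seq[String] =
    lines.map(_.trim).filter(_.nonEmpty)

  // Join a base path and a table name with exactly one slash (illustrative helper).
  def tablePath(base: String, table: String): String =
    base.stripSuffix("/") + "/" + table

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-to-orc").getOrCreate()

    // Assumption: the input and output base paths are passed as arguments.
    val inputPath  = args(0)
    val outputPath = args(1)

    // collect() returns only the table names (a small Array[String]) to the driver.
    val tables = cleanTableNames(spark.read.textFile("tables.txt").collect().toSeq)

    // This foreach is a plain Scala loop over a local collection;
    // each read/write inside it is submitted as its own Spark job.
    tables.foreach { tblName =>
      spark.read.format("avro")
        .load(tablePath(inputPath, tblName))
        .write.format("orc")
        .mode("overwrite")
        .save(tablePath(outputPath, tblName))
    }

    spark.stop()
  }
}
```

The two helpers only guard against blank lines in tables.txt and doubled slashes in the paths; the read and write calls are the same as in my snippet above.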


My configuration is as follows:

DriverMemory: 4GB

ExecutorMemory: 10GB

NoOfExecutors: 5


Input DataSize: 45GB


My question: will this loop execute on the executors or on the driver? Will it throw an out-of-memory error? Please share your suggestions.


Regards,

MJ


Re: Use RDD.foreach to Create a Dataframe and execute actions on the Dataframe in Spark scala

Community Manager

The above was originally posted in the Community Help track. On Mon May 20 16:14 UTC 2019, the HCC moderation staff moved it to the Data Processing Track. The Community Help Track is intended for questions about using the HCC site itself.

Bill Brooks, Community Manager