New Contributor
Posts: 5
Registered: 11-01-2016

Spark dataframe write to file using Scala


I am trying to read a file and add two extra columns: 1. a sequence number and 2. the filename. When I run the Spark job in the Scala IDE, the output is generated correctly, but when I run it from PuTTY in local or cluster mode, the job gets stuck at stage 2 (save at File_Process). There is no progress even if I wait for an hour. I am testing on 1 GB of data.

Below is the code I am using (a few lines were truncated in the post; the reconstructed parts are marked as placeholders):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object File_Process {
  val SEED = 0L                              // offset for the sequence number
  val spark = SparkSession.builder().appName("File_Process").getOrCreate()
  def main(arg: Array[String]): Unit = {
    val filename = "/data/input/file.txt"    // placeholder; actual path truncated in the post
    val FileDF = spark.read.text(filename)
    // Pair each row with its index, then prepend (index + SEED + 1) as a unique row id
    val rdd = FileDF.rdd.zipWithIndex().map(indexedRow =>
      Row.fromSeq((indexedRow._2.toLong + SEED + 1) +: indexedRow._1.toSeq))
    val FileDFWithSeqNo = StructType(Array(StructField("UniqueRowIdentifier", LongType)) ++ FileDF.schema.fields)
    val datasetnew = spark.createDataFrame(rdd, FileDFWithSeqNo)
    val dataframefinal = datasetnew.withColumn("Filetag", lit(filename))
    dataframefinal.write
      .option("delimiter", "|")
      .csv("/data/text_file/")               // output path mentioned below
  }
}
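For what it's worth, the sequence-numbering logic itself works fine when I check it without Spark on a plain Scala collection (SeqNoSketch and the sample lines here are just made-up names for illustration, with SEED assumed to be 0):

```scala
// Standalone sketch of the zipWithIndex numbering used in the job above.
// No Spark required: Scala's own zipWithIndex behaves the same way as
// RDD.zipWithIndex for this purpose on a single partition.
object SeqNoSketch {
  def main(args: Array[String]): Unit = {
    val SEED = 0L                          // assumed offset, as in the job
    val lines = Seq("alpha", "beta", "gamma")
    // Pair each line with its index, then prepend (index + SEED + 1)
    val numbered = lines.zipWithIndex.map { case (line, idx) =>
      (idx.toLong + SEED + 1, line)
    }
    numbered.foreach(println)
  }
}
```

So the hang seems specific to the distributed write stage, not the numbering itself.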


The output path is the cluster location /data/text_file/. This folder is created by the Spark job when stage 2 starts, and I can see temporary files created, e.g. /data/text_file/_temporary/0/_temporary/attempt_20170426054102_0002_m_000000_0, but that attempt file is 0 KB.


I am using Spark 2.1.0 on CDH 5.10.1.


I am running the job with: spark-submit --deploy-mode cluster --class "File_Process" ~/File_Process.jar


Thanks in advance.

L Raghunath.