Support Questions
Find answers, ask questions, and share your expertise

spark dataframe write to file using scala




I am trying to read a file and add two extra columns: 1. a sequence number and 2. the filename. When I run the Spark job in the Scala IDE, the output is generated correctly, but when I run it from PuTTY in local or cluster mode, the job gets stuck at stage 2 (save at File_Process). There is no progress even after waiting for an hour. I am testing on 1 GB of data.

Below is the code I am using:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import org.apache.spark.sql.functions.lit

object File_Process {
  val spark = SparkSession.builder().getOrCreate()
  def main(arg: Array[String]): Unit = {
    val filename = arg(0)                       // assumed: input path, also used as the row tag
    val SEED = 0L                               // assumed: starting offset for the sequence number
    val FileDF = spark.read.text(filename)
    // Pair each row with its 0-based index, then shift by SEED + 1
    val rdd = FileDF.rdd.zipWithIndex().map(indexedRow =>
      Row.fromSeq((indexedRow._2.toLong + SEED + 1) +: indexedRow._1.toSeq))
    // Prepend the new column to the original schema
    val FileDFWithSeqNo = StructType(Array(StructField("UniqueRowIdentifier", LongType)) ++ FileDF.schema.fields)
    val datasetnew = spark.createDataFrame(rdd, FileDFWithSeqNo)
    val dataframefinal = datasetnew.withColumn("Filetag", lit(filename))
    val query = dataframefinal.write
      .option("delimiter", "|")
      .csv("/data/text_file/")                  // output path from the post
  }
}

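The sequence-number logic above boils down to zipWithIndex plus an offset. Here is a minimal self-contained sketch of that step without Spark (SeqNoSketch, addSeqNo, and the seed value of 100 are illustrative, not from the original job):

  // Sketch: 1-based sequence numbers starting after a seed offset.
  object SeqNoSketch {
    // zipWithIndex pairs each element with its 0-based index;
    // idx + seed + 1 mirrors (indexedRow._2.toLong + SEED + 1) above.
    def addSeqNo(rows: Seq[String], seed: Long): Seq[(Long, String)] =
      rows.zipWithIndex.map { case (row, idx) => (idx.toLong + seed + 1, row) }

    def main(args: Array[String]): Unit = {
      println(addSeqNo(Seq("a", "b", "c"), 100L))  // List((101,a), (102,b), (103,c))
    }
  }
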

The output path is a cluster location, /data/text_file/. This folder is created by the Spark job when stage 2 starts, and I can see temporary files created, e.g. /data/text_file/_temporary/0/_temporary/attempt_20170426054102_0002_m_000000_0, and the attempt_20170426054102_0002_m_000000_0 file is 0 KB.


I am using Spark 2.1.0 on CDH 5.10.1.


I am running the job with spark-submit --deploy-mode cluster --class "File_Process" ~/File_Process.jar.


Thanks in advance.

L Raghunath.