
Spark Streaming issue: FileNotFoundException when trying to load data in path from a temporary location


Here is the code that runs inside the streaming job:

 

// read the existing data for this partition and cache it
val dfOlDDatta = hivecontext.sql(s"select * from $db where part_id=$partId")
dfOlDDatta.persist()
/*.withColumn("testDate2", unix_timestamp(col("timeStamp"), "yyyy MMM dd HH:mm:ss").cast("String"))
  .withColumn("testDate3", from_unixtime(col("testDate2"), "yyyy-MM-dd HH:mm:ss").cast("String"))
  .withColumn("dt_skey1", from_unixtime(col("testDate2"), "yyyyMMdd").cast("Int"))
  .withColumn("date2", from_unixtime(col("testDate2"), "yyyy-MM-dd HH:mm:ss").cast("String"))*/

// union the new batch (dfTransformed, built earlier) with the existing data
val unionDf = dfTransformed.unionAll(dfOlDDatta)

// first write: the merged data goes to a temporary HDFS location
// (maybe change append to an overwrite)
println("writing to hdfs new partition")
unionDf.coalesce(1).write.mode("overwrite").partitionBy("part_id").parquet(tempHdfsLocation)

val fullPartitionString = "part_id=" + partId
//hivecontext.sql(s"load data inpath '$tempHdfsLocation/$fullPartitionString' overwrite into table $db partition ($fullPartitionString)")

// second write: straight to the table's final location
unionDf.coalesce(1).write.mode("overwrite").partitionBy("part_id").parquet(explLocation)
dfOlDDatta.unpersist()

This is all within a Spark Streaming context.
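
For reference, the surrounding driver structure looks roughly like this (a minimal sketch, assuming the snippet above runs inside foreachRDD; the app name, stream source, and batch interval here are placeholders, not my actual job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("partition-merge")  // placeholder app name
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(60))           // placeholder batch interval
val hivecontext = new HiveContext(sc)

val stream = ssc.textFileStream("/incoming")              // placeholder source
stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // build dfTransformed from the batch here, then run the
    // union/write code shown above for the affected part_id
  }
}

ssc.start()
ssc.awaitTermination()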

 

This fails after one run through the Spark Streaming context: the first batch always completes, and the job always fails on the next one.

 

(screenshot attached: sparkFileNotFoundErrorEdited.png)

 


The error, in short:

 

17/11/30 15:24:01 ERROR datasources.InsertIntoHadoopFsRelation: Aborting job.
java.io.FileNotFoundException: File does not exist:

 

I am using Spark 1.6 on CDH 5.7.2.
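
My working guess, for what it's worth: explLocation is the table's backing directory, and the overwrite deletes the very files that dfOlDDatta was lazily read from, so anything that re-reads them afterwards fails. A stripped-down illustration of that pattern (the path and the mergeInPlace name are placeholders, not from my job):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical reduction of the suspected read-then-overwrite conflict.
def mergeInPlace(hivecontext: HiveContext, newData: DataFrame): Unit = {
  val tablePath = "/warehouse/mytable"                // placeholder path
  val existing = hivecontext.read.parquet(tablePath)  // lazy read of the current files
  val merged = newData.unionAll(existing)

  // mode("overwrite") deletes the files under tablePath while the plan for
  // `merged` may still need to re-read them (persist() is lazy and a cache
  // can be partial), which would match the FileNotFoundException above.
  merged.write.mode("overwrite").parquet(tablePath)
}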

 

Any suggestions are greatly appreciated.

 
