How to read the input file name in a Spark DataFrame and use part of it as a column?

This is how I load my CSV file into a Spark DataFrame:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    import org.apache.spark.{ SparkConf, SparkContext }
    import java.sql.{Date, Timestamp}
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions.{lit, udf} // lit is needed for the DataPartition placeholder below
    // Read the pipe-delimited CSV files, taking column names from the header row
    val df = sqlContext.read.format("csv").option("header", "true").option("delimiter", "|").option("inferSchema", "true").load("s3://MAIN")
    // Replace dots in column names with underscores
    val df1With_ = df.toDF(df.columns.map(_.replace(".", "_")): _*)
    // Keep only the columns whose names do not contain "^", "!" or "_c"
    val column_to_keep = df1With_.columns.filter(v => (!v.contains("^") && !v.contains("!") && !v.contains("_c"))).toSeq
    val df1result = df1With_.select(column_to_keep.head, column_to_keep.tail: _*)
    // Placeholder column, currently always null
    val df1Final = df1result.withColumn("DataPartition", lit(null: String))

This is an example of one of my input file names:

    Fundamental.FinancialLineItem.FinancialLineItem.SelfSourcedPrivate.CUS.1.2017-09-07-1056.Full 

Now I want to read this file name, split it on the "." character, and use the CUS segment as the value of the DataPartition column instead of null.
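One possible approach, as a rough sketch rather than a confirmed solution: Spark's built-in `input_file_name()` function returns the path of the file each row was read from, and `substring_index` and `split` can then cut out the wanted segment. The sketch continues from the snippet above (it reuses `df1result`) and assumes that every file follows the same naming pattern as the example, so that CUS is always the fifth dot-separated segment (index 4). The name `baseName` is only illustrative.

```scala
import org.apache.spark.sql.functions.{input_file_name, split, substring_index}

// Keep only the part after the last "/", i.e. the bare file name, e.g.
// Fundamental.FinancialLineItem.FinancialLineItem.SelfSourcedPrivate.CUS.1.2017-09-07-1056.Full
val baseName = substring_index(input_file_name(), "/", -1)

// split() takes a regular expression, so the dot must be escaped.
// Assumption: the wanted segment is always at index 4 ("CUS" in the example name).
val df1Final = df1result.withColumn("DataPartition", split(baseName, "\\.").getItem(4))
```

Taking the base name first means that dots elsewhere in the path cannot shift the segment positions. If the file names do not all have the same number of segments, a fixed index will pick the wrong part, and a `regexp_extract` with a pattern for the naming scheme might be a safer choice.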