
While inferring the schema from a CSV file with Spark, why is the date column taken as String?

Hi, I am trying to infer the schema from a CSV file, but the date column is inferred as String. How can I get a date data type that matches the date format used in the column? We have many different date formats, so the goal is to detect the format from the column values in the file instead of specifying it manually every time.
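One way to avoid passing the format manually is to check a sample of each column's values against a list of candidate patterns and keep the first pattern that parses all of them. The sketch below does this in plain Scala with java.time; the object name, the candidate list, and the use of strict resolution are assumptions for illustration, not part of Spark. If a format is found, it could then be handed to Spark (for example through the CSV reader's `dateFormat` option), after translating `uuuu` back to `yyyy`.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

object ColumnDateFormatInference {

  // Hypothetical candidate list; "uuuu" (proleptic year) is used instead of "yyyy"
  // because ResolverStyle.STRICT cannot resolve "yyyy" (year-of-era) without an era field.
  val candidates: Seq[String] = Seq(
    "uuuu-MM-dd", "dd-MM-uuuu", "MM-dd-uuuu", "dd/MM/uuuu", "MM/dd/uuuu",
    "dd.MM.uuuu", "uuuu/MM/dd", "uuuuMMdd", "ddMMuuuu"
  )

  // True if `value` parses completely, with strict field validation, as a LocalDate.
  def matches(pattern: String, value: String): Boolean = {
    val fmt = DateTimeFormatter.ofPattern(pattern).withResolverStyle(ResolverStyle.STRICT)
    Try(LocalDate.parse(value, fmt)).isSuccess
  }

  // Returns the first candidate that parses every non-empty sample value, or None.
  def inferFormat(samples: Seq[String]): Option[String] = {
    val values = samples.map(_.trim).filter(_.nonEmpty)
    if (values.isEmpty) None
    else candidates.find(p => values.forall(v => matches(p, v)))
  }
}
```

Requiring every sampled value to match is what resolves ambiguous cases: a column containing only `12-09-2017` matches both `dd-MM-uuuu` and `MM-dd-uuuu`, so the candidate order decides, while a value such as `09-25-2017` narrows it to `MM-dd-uuuu`.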

Could anyone help me with this?

I have some sample code, which may be helpful.

package scala_programs

import java.text.{ParsePosition, SimpleDateFormat}

object DateFormatParser {

  // Candidate patterns, checked in order (duplicates removed)
  private val formats: Array[String] = Array(
    "yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd'T'HH:mm:ssZ", "yyyy-MM-dd'T'HH:mm:ss",
    "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd'T'HH:mm:ss.SSSZ", "yyyy-MM-dd HH:mm:ss",
    "MM/dd/yyyy HH:mm:ss", "MM/dd/yyyy'T'HH:mm:ss.SSS'Z'", "MM/dd/yyyy'T'HH:mm:ss.SSSZ",
    "MM/dd/yyyy'T'HH:mm:ss.SSS", "MM/dd/yyyy'T'HH:mm:ssZ", "MM/dd/yyyy'T'HH:mm:ss",
    "yyyy:MM:dd HH:mm:ss", "yyyyddMM", "dd-MM-yyyy", "yyyy-MM-dd",
    "dd.MM.yyyy", "dd/MM/yyyy", "dd:MM:yyyy",
    "MM-dd-yyyy", "MM.dd.yyyy", "MM/dd/yyyy", "MM:dd:yyyy",
    "yyyy.MM.dd", "yyyy:MM:dd", "yyyy/MM/dd", "ddMMyyyy", "yyyyMMdd"
  )

  def main(args: Array[String]): Unit = {
    val sample: String = "12-09-2017"
    parse(sample)
  }

  // Prints every pattern that parses the whole input as a valid date
  def parse(d: String): Unit = {
    if (d != null) {
      for (pattern <- formats) {
        val sdf = new SimpleDateFormat(pattern)
        sdf.setLenient(false) // reject out-of-range fields such as day 2017
        val pos = new ParsePosition(0)
        // parse(String, ParsePosition) returns null instead of throwing on failure
        val date = sdf.parse(d, pos)
        // require the pattern to consume the entire input, not just a prefix
        if (date != null && pos.getIndex == d.length)
          println("Date format: " + pattern)
      }
    }
  }
}
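With `SimpleDateFormat`'s default settings, a loop like the one above can report several formats for a single value: parsing is lenient (out-of-range fields roll over into the next month or year), and `parse(String)` succeeds even when only a prefix of the input matches. The small comparison below, with hypothetical helper names, contrasts the default behaviour with a non-lenient, whole-input check.

```scala
import java.text.{ParseException, ParsePosition, SimpleDateFormat}

object LenientParsingDemo {

  // Default settings: lenient, and parse(String) only needs a matching prefix.
  def looseMatch(pattern: String, value: String): Boolean =
    try { new SimpleDateFormat(pattern).parse(value); true }
    catch { case _: ParseException => false }

  // Stricter check: non-lenient, and the pattern must consume the whole input.
  def strictMatch(pattern: String, value: String): Boolean = {
    val sdf = new SimpleDateFormat(pattern)
    sdf.setLenient(false)
    val pos = new ParsePosition(0)
    sdf.parse(value, pos) != null && pos.getIndex == value.length
  }
}
```

For example, `looseMatch("yyyy-MM-dd", "12-09-2017")` is `true` (year 12, month 9, day 2017 rolled forward), while `strictMatch` rejects the same pair and accepts `dd-MM-yyyy` for that value.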

Re: While inferring the schema from a CSV file with Spark, why is the date column taken as String?

@Shashidhar Janne

You can use Java's java.text.SimpleDateFormat, which parses a string into a java.util.Date:

val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
format.parse("2013-07-06")
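`SimpleDateFormat.parse` returns a `java.util.Date`. If the parsed value is meant to end up in a Spark `DateType` column through the Scala/Java `Row` API, it would typically be converted to `java.sql.Date`, which is Spark's default external type for `DateType`. A minimal sketch, with hypothetical object and method names and non-lenient parsing added as an assumption:

```scala
import java.text.SimpleDateFormat

object ParseToSqlDate {

  // Parses text with the given pattern and converts the resulting java.util.Date
  // to java.sql.Date, Spark's default external type for DateType in Scala/Java.
  def toSqlDate(text: String, pattern: String = "yyyy-MM-dd"): java.sql.Date = {
    val format = new SimpleDateFormat(pattern)
    format.setLenient(false)                          // reject values like 2013-02-30
    val utilDate: java.util.Date = format.parse(text) // throws ParseException on failure
    new java.sql.Date(utilDate.getTime)
  }
}
```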

It is also worth looking at nscala-time: https://github.com/nscala-time/nscala-time/blob/master/README.md