
Date column in hive table showing as 31/12/1969 19:00:00 instead of NULL


Hi Team,

 

I have been facing an issue with a date column when creating Hive tables.

 

First step: we Sqoop the data from the Oracle DB and store it in HDFS as a Parquet file. (Some of the date column values will be "null".)

Second step: after the Sqoop job completes, I run our Spark transformation code and store the data as Hive tables. There, the date column value is stored as "31/12/1969 19:00:00" instead of null.

 

After the Spark transformation completes, we create Hive tables for that entity, and the date column value shows as 31/12/1969 19:00:00 instead of null.
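That value looks like the Unix epoch (0 milliseconds) rendered in a UTC-05:00 timezone rather than a real date, which makes me think the null is being read as 0 somewhere in the pipeline. A quick check (the America/New_York zone here is an assumption; substitute your cluster's timezone):

import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class EpochCheck {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss");
        // assumed cluster timezone; adjust as needed
        fmt.setTimeZone(TimeZone.getTimeZone("America/New_York"));
        // prints 31/12/1969 19:00:00 -- the same value we see in the Hive table
        System.out.println(fmt.format(new Timestamp(0L)));
    }
}

Below is the transformation code: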

private static String format = "dd/MM/yyyy HH:mm:ss";

// initialization: read the Sqoop'ed Parquet file into a DataFrame
DataFrame initParquetDf = sqlContext.read().parquet(SQOOP_PATH.concat("initiative/XGL/initiative_v"));

// map each row to an Initiative bean; column 19 is read as epoch milliseconds
// and converted to a java.sql.Timestamp
JavaRDD<Initiative> initiative = initParquetDf.javaRDD().map(y -> Initiative.builder()
        .initiativeCode(Long.parseLong(y.getString(0)))
        .iniPct1RecorededDate(Optional.ofNullable(y.getLong(19)).map(s -> {
            try {
                return new Timestamp(s);
            }
            catch (Exception e) {
                return null;
            }
        }).orElse(null))
        // ... remaining fields omitted ...
        .build());

// filter out logically deleted rows
JavaRDD<Initiative> initiativeRDD = initiative.filter(x -> x.getDelInd().equals("N"));

// create DataFrame from the bean RDD
DataFrame initiativeDF = sqlContext.createDataFrame(initiativeRDD, Initiative.class);

// join the entities
DataFrame initiativeJoin = initiativeDF
        .join(dictEntityDF.as("df1"), col("df1.ogrdsEntityCode").equalTo(initiativeDF.col("categoryCode")))
        .join(dictEntityDF.as("df2"), initiativeDF.col("brandExtrnCode").equalTo(col("df2.ogrdsEntityCode")));

// write data to HDFS as Parquet, formatting the timestamp as a string
DataFrame nPubinitiative = initiativeJoin.select(initiativeDF.col("initiativeCode").as("ini_code"),
        date_format(initiativeDF.col("iniPct1RecorededDate"), format).as("ini_pct_1_recorded_date"));
nPubinitiative.write().parquet(OUTPUT_PATH.concat("initiative/pub_initiatives"));
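One thing I noticed while re-reading the mapping: Row.getLong(19) returns a primitive long, so Optional.ofNullable never actually sees a null, and a null Parquet value may come through as 0 (which would become the epoch timestamp above). A minimal sketch of a null-safe read, reusing the names from my code (this is only my guess, not a confirmed fix):

// guard the read with Row.isNullAt so a null Parquet value stays null
// instead of being converted to epoch 0
JavaRDD<Initiative> initiative = initParquetDf.javaRDD().map(y -> Initiative.builder()
        .initiativeCode(Long.parseLong(y.getString(0)))
        .iniPct1RecorededDate(y.isNullAt(19) ? null : new Timestamp(y.getLong(19)))
        // ... remaining fields as before ...
        .build());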

Please correct me if I am doing anything wrong in the transformation.

 

Let me know if you need any further information.

 

 

Regards,

Ganeshbabu R