Issue writing Kafka data into a Hive table in HDFS

I am using streaming to process Kafka data and write it into a Hive table in HDFS. The Hive table is partitioned by month/day, which I compute from the timestamp in the Kafka data as below:

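import java.text.SimpleDateFormat
import java.util.Date

val sdf = new SimpleDateFormat("yyyy_MM")   // month partition value, e.g. "2018_09"
val sdf1 = new SimpleDateFormat("dd")       // day partition value, e.g. "30"
val adjustTime = data.getLong(12)           // event time in epoch millis, from column 12 of the record
val month = sdf.format(new Date(adjustTime))
val day = sdf1.format(new Date(adjustTime))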

I also repartition by the partition columns before writing:

repartition($"month",$"day").write.mode(SaveMode.Append).partitionBy("month","day")

When I checked the data in HDFS, I found two problems:

1. Some partitions have impossible dates, such as month="2018_09", day="31", even though September has only 30 days.

2. Some day partitions contain records from other days. For example, the partition for 2018_09_30 (month="2018_09", day="30") should only hold records whose adjustTime is 2018-09-30, but it also contains many records with other dates, such as 2018_03_08 and others. The data in these partitions is not correct.
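To see how widespread both problems are, one option is to recompute the expected partition for every written row and compare. The sketch below is a diagnostic, not a fix. It assumes the rows have been read back as (adjustTime, month, day) triples in a hypothetical `Written` case class, that adjustTime is in epoch milliseconds, and that partitions should be computed in an explicit time zone (UTC in the example). It uses java.time's DateTimeFormatter instead of the original SimpleDateFormat.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical shape of a row read back from the table: the event time in
// epoch millis plus the partition values it was actually written under.
final case class Written(adjustTime: Long, month: String, day: String)

object PartitionCheck {
  // DateTimeFormatter is immutable and thread-safe, so sharing these is fine.
  private val monthFmt = DateTimeFormatter.ofPattern("yyyy_MM")
  private val dayFmt = DateTimeFormatter.ofPattern("dd")

  // The (month, day) partition a timestamp should land in, in the given zone.
  def expected(adjustTime: Long, zone: ZoneId): (String, String) = {
    val t = Instant.ofEpochMilli(adjustTime).atZone(zone)
    (monthFmt.format(t), dayFmt.format(t))
  }

  // Rows whose stored partition values disagree with their own timestamp.
  // An impossible partition such as month=2018_09/day=31 always shows up here,
  // because no timestamp formats to it.
  def mismatches(rows: Seq[Written], zone: ZoneId): Seq[Written] =
    rows.filter(r => expected(r.adjustTime, zone) != ((r.month, r.day)))
}
```

For example, in UTC, `expected(1538265600000L, ZoneId.of("UTC"))` is `("2018_09", "30")`, so a row with that timestamp stored under day=31 is reported by `mismatches`.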

 

I would appreciate any suggestions or ideas for solving these problems. Thanks~

 

1 ACCEPTED SOLUTION

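The problem was caused by SimpleDateFormat, which is not thread-safe. I replaced it with FastDateFormat from Apache Commons Lang, which is thread-safe:

import org.apache.commons.lang3.time.FastDateFormat
val fdf = FastDateFormat.getInstance("yyyy_MM", tz)  // tz: the java.util.TimeZone to format in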

I checked the data, and it is correct now.
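The failure mode can be reproduced outside Spark. SimpleDateFormat keeps mutable internal state (a Calendar), so a single instance shared by several threads (for example, one defined in a Scala object, which is a per-JVM singleton shared by all tasks on an executor) can return a date belonging to another thread's call. The sketch below formats the same timestamps from several threads through one shared formatter and counts wrong results. It uses java.time's DateTimeFormatter, which is immutable and thread-safe, as a standard-library stand-in for FastDateFormat so it runs without Commons Lang; the timestamp range, thread count, and UTC zone are arbitrary choices for the demo.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.{Date, TimeZone}
import java.util.concurrent.{Callable, Executors}

object FormatterRace {
  private val zone = ZoneId.of("UTC") // assumed zone for the demo
  private val pattern = "yyyy_MM_dd"

  // Hourly timestamps starting at 2018-01-01T00:00:00Z (epoch millis).
  val stamps: Vector[Long] = Vector.tabulate(20000)(i => 1514764800000L + i * 3600000L)

  // Reference answers computed on a single thread.
  val expected: Vector[String] = {
    val f = DateTimeFormatter.ofPattern(pattern).withZone(zone)
    stamps.map(t => f.format(Instant.ofEpochMilli(t)))
  }

  // Every thread formats all stamps through the same `format` function;
  // returns the total number of results that differ from `expected`.
  def countMismatches(threads: Int)(format: Long => String): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = (1 to threads).map { _ =>
        pool.submit(new Callable[Int] {
          def call(): Int = stamps.indices.count(i => format(stamps(i)) != expected(i))
        })
      }
      futures.map(_.get()).sum
    } finally pool.shutdown()
  }

  // One SimpleDateFormat shared by every caller: not thread-safe.
  def sharedSimpleDateFormat(): Long => String = {
    val sdf = new SimpleDateFormat(pattern)
    sdf.setTimeZone(TimeZone.getTimeZone(zone))
    t => sdf.format(new Date(t))
  }

  // One DateTimeFormatter shared by every caller: immutable and thread-safe.
  def sharedDateTimeFormatter(): Long => String = {
    val f = DateTimeFormatter.ofPattern(pattern).withZone(zone)
    t => f.format(Instant.ofEpochMilli(t))
  }
}
```

On a multi-core machine, `countMismatches(8)(FormatterRace.sharedSimpleDateFormat())` is likely to return a non-zero count that varies between runs, while the DateTimeFormatter variant should return 0. In the streaming job, any of these should avoid the race: a thread-safe formatter (FastDateFormat or DateTimeFormatter), or a new SimpleDateFormat created per task, for example inside mapPartitions.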