Accepted Solution

Issue with writing Kafka data into a Hive table in HDFS

I am using streaming to consume Kafka data and write it into a Hive table in HDFS. The Hive table is partitioned by month/day, derived from the time field in the Kafka data. I compute the month and day from it as below:

val sdf=new SimpleDateFormat("yyyy_MM")
val sdf1=new SimpleDateFormat("dd")
val adjustTime = data.getLong(12)
val month = sdf.format(new Date(adjustTime))
val day = sdf1.format(new Date(adjustTime))

I also call repartition when parsing the data.


When I checked the data in HDFS, I found these problems:

1. Some partitions have impossible dates, such as month="2018_09" day="31", even though September has no 31st.
2. Some day partitions contain data that does not belong to that day. For example, the partition for an adjustTime of 2018-09-30 (2018_09_30) also holds records with other times, like 2018_03_08 and ... The data in it is not correct.


I would appreciate any suggestions or ideas for solving these problems. Thanks~


Re: Issue with writing Kafka data into a Hive table in HDFS

The problem was caused by SimpleDateFormat, which is not thread-safe: a single instance shared between threads can return corrupted dates. I replaced it with:

val fdf = FastDateFormat.getInstance("yyyy_MM", tz)

I checked the data and it is fine now.
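
For anyone who would rather not depend on FastDateFormat, here is a minimal sketch of the same idea using java.time.DateTimeFormatter from the JDK, which is immutable and thread-safe. This is not the code from the fix above, just an alternative. The object name PartitionKeys, the method monthAndDay, and the UTC time zone are assumptions for illustration; adjustTime is taken to be epoch milliseconds, as in the original snippet.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object PartitionKeys {
  // Assumption: partitions are computed in UTC; substitute the zone your data uses.
  private val zone: ZoneId = ZoneId.of("UTC")

  // DateTimeFormatter is immutable and thread-safe, so one instance
  // can be shared by every task running in the same JVM.
  private val monthFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy_MM").withZone(zone)
  private val dayFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd").withZone(zone)

  /** Returns the (month, day) partition values for an epoch-millisecond timestamp. */
  def monthAndDay(epochMillis: Long): (String, String) = {
    val instant = Instant.ofEpochMilli(epochMillis)
    (monthFormat.format(instant), dayFormat.format(instant))
  }
}
```

With this, the partition values would be computed as val (month, day) = PartitionKeys.monthAndDay(data.getLong(12)). Because the formatters hold no mutable state, concurrent calls cannot mix up fields the way a shared SimpleDateFormat can.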