
Issue handling Kafka data and writing it into a Hive table in HDFS

Contributor

I use Spark streaming to consume Kafka data and write it into a Hive table in HDFS. The Hive table is partitioned by month/day taken from the timestamp in the Kafka data, and I compute the month and day from it as below:

import java.text.SimpleDateFormat
import java.util.Date

val sdf = new SimpleDateFormat("yyyy_MM")    // month partition value, e.g. 2018_09
val sdf1 = new SimpleDateFormat("dd")        // day partition value, e.g. 30
val adjustTime = data.getLong(12)            // event timestamp (ms) taken from the Kafka record
val month = sdf.format(new Date(adjustTime))
val day = sdf1.format(new Date(adjustTime))

and after parsing the data I repartition by these columns for the write:

repartition($"month",$"day").write.mode(SaveMode.Append).partitionBy("month","day")
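For reference, here is a minimal, self-contained sketch of how these pieces fit together, using a tiny in-memory sample in place of the real Kafka records; the Event case class, the sample timestamps, and the output path are placeholders, not my actual job.

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("kafka-to-hive").getOrCreate()
import spark.implicits._

// Placeholder record type carrying the timestamp plus the derived partition columns.
case class Event(adjustTime: Long, month: String, day: String)

// Stand-in for one micro-batch of parsed records: epoch millis for
// 2018-09-30 and 2018-03-08 (UTC), the dates mentioned below.
val sampleTimestamps = Seq(1538265600000L, 1520467200000L)

val sdf = new SimpleDateFormat("yyyy_MM")
val sdf1 = new SimpleDateFormat("dd")

val parsedDf = sampleTimestamps.map { adjustTime =>
  Event(adjustTime, sdf.format(new Date(adjustTime)), sdf1.format(new Date(adjustTime)))
}.toDF()

parsedDf
  .repartition($"month", $"day")           // group each partition's rows together
  .write
  .mode(SaveMode.Append)                   // append each micro-batch
  .partitionBy("month", "day")             // Hive-style month=/day= directories
  .parquet("/tmp/kafka_events")            // placeholder output format and path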

When I checked the data in HDFS, I found these problems:

1. Partitions with impossible dates appear, such as month="2018_09" and day="31"; September cannot have a 31st.

2. Within a given day's partition, some of the data does not belong to that day. For example, records whose adjustTime is 2018-09-30 land in partition 2018_09/30 together with records from other times such as 2018-03-08. The data in these partitions is not correct.

 

I would appreciate any suggestions or ideas to solve these problems. Thanks!

 

1 ACCEPTED SOLUTION

Contributor

The problem was caused by SimpleDateFormat, which is not thread-safe. I replaced it with FastDateFormat:

import org.apache.commons.lang3.time.FastDateFormat
val fdf = FastDateFormat.getInstance("yyyy_MM", tz)

I checked and the data is fine now.
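For completeness, here is a minimal sketch of the replacement (the UTC time zone and the day formatter are assumptions added to mirror the original code; only the month pattern is shown above). FastDateFormat from Apache Commons Lang 3 is thread-safe and immutable, so a single instance can safely be shared across executor threads.

import java.util.{Date, TimeZone}
import org.apache.commons.lang3.time.FastDateFormat

val tz = TimeZone.getTimeZone("UTC")                  // assumed time zone

val fdf = FastDateFormat.getInstance("yyyy_MM", tz)   // month partition value
val fdf1 = FastDateFormat.getInstance("dd", tz)       // day partition value

val adjustTime = data.getLong(12)                     // timestamp (ms) from the Kafka record
val month = fdf.format(new Date(adjustTime))
val day = fdf1.format(new Date(adjustTime))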

