Issue with handle kafka data into hive table in hdfs


I am using a streaming job to consume Kafka data and write it into a Hive table on HDFS. The Hive table is partitioned by month and day, which I compute from the timestamp in the Kafka data as shown below:

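import java.text.SimpleDateFormat
import java.util.Date

val sdf = new SimpleDateFormat("yyyy_MM") // month partition value, e.g. 2018_09
val sdf1 = new SimpleDateFormat("dd")     // day-of-month partition value
val adjustTime = data.getLong(12)         // event time, passed to Date as epoch milliseconds
val month = sdf.format(new Date(adjustTime))
val day = sdf1.format(new Date(adjustTime))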

I also use repartition when I parse the data:

repartition($"month",$"day").write.mode(SaveMode.Append).partitionBy("month","day")

When I checked the data in HDFS, I found two problems:

1. Impossible partitions appear, such as month="2018_09" day="31", even though September has no 31st.

2. Some day partitions contain data that does not belong to that day. For example, when adjustTime is 2018-09-30, the partition 2018_09_30 also contains rows with other times, such as 2018_03_08, so the data in it is not correct.

 

I would appreciate any suggestions or ideas for solving these problems. Thanks~

 

Accepted Solutions

Re: Issue with handle kafka data into hive table in hdfs

The problem was caused by SimpleDateFormat, which is not thread-safe. I solved it by replacing SimpleDateFormat with FastDateFormat:

val fdf = FastDateFormat.getInstance("yyyy_MM", tz

I checked the data again and it is fine now.
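For readers who cannot add the Commons Lang dependency, here is a minimal sketch of the same idea using java.time.DateTimeFormatter from the JDK, which is also thread-safe. This is a swapped-in alternative, not the FastDateFormat code from the answer. The object name PartitionKeys, the method monthAndDay, and the UTC time zone are assumptions made for illustration; the original post does not show its tz value.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object PartitionKeys {
  // Assumption: UTC. Replace with the zone the partitions should follow.
  private val zone: ZoneId = ZoneId.of("UTC")

  // DateTimeFormatter is immutable and thread-safe, so a single instance
  // can be shared by all threads, unlike SimpleDateFormat.
  private val monthFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy_MM").withZone(zone)
  private val dayFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd").withZone(zone)

  /** Returns the (month, day) partition values for an epoch-millisecond timestamp. */
  def monthAndDay(epochMillis: Long): (String, String) = {
    val instant = Instant.ofEpochMilli(epochMillis)
    (monthFormat.format(instant), dayFormat.format(instant))
  }
}
```

A shared SimpleDateFormat keeps mutable internal calendar state, so concurrent format calls can mix fields from different timestamps. That is consistent with both symptoms in the question: an impossible pair such as 2018_09 with day 31, and rows landing in the wrong partition. With the sketch above, val (month, day) = PartitionKeys.monthAndDay(adjustTime) would replace the two sdf.format calls.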
