Issue with handling Kafka data into a Hive table in HDFS
Labels:
- Apache Kafka
- HDFS
Created on ‎11-11-2018 10:41 PM - edited ‎09-16-2022 06:53 AM
I used Spark Streaming to consume Kafka data and write it into a Hive table in HDFS. The table is partitioned by month/day, which I compute from the timestamp field in the Kafka data as below:
import java.text.SimpleDateFormat
import java.util.Date

val sdf  = new SimpleDateFormat("yyyy_MM")    // month partition, e.g. "2018_09"
val sdf1 = new SimpleDateFormat("dd")         // day partition, e.g. "30"
val adjustTime = data.getLong(12)             // event timestamp (millis) from the record
val month = sdf.format(new Date(adjustTime))
val day   = sdf1.format(new Date(adjustTime))
and I repartition the parsed DataFrame (call it df) on the same columns when writing it out:

df.repartition($"month", $"day").write.mode(SaveMode.Append).partitionBy("month", "day")
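In case it helps, here is a stripped-down sketch of the job; the schema, column index 12, and output path are simplified placeholders for the real code, and "records" stands for the rows already parsed from the Kafka stream:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SaveMode, SparkSession}

object Formats {
  // One instance per executor JVM, used by every task thread in it.
  val sdf  = new SimpleDateFormat("yyyy_MM")
  val sdf1 = new SimpleDateFormat("dd")
}

def withPartitions(spark: SparkSession, records: RDD[Row]) = {
  import spark.implicits._
  records.map { data =>
    val adjustTime = data.getLong(12)
    (adjustTime,
     Formats.sdf.format(new Date(adjustTime)),    // month, e.g. "2018_09"
     Formats.sdf1.format(new Date(adjustTime)))   // day,   e.g. "30"
  }.toDF("adjustTime", "month", "day")
}

// then, per batch:
// withPartitions(spark, records)
//   .repartition($"month", $"day")
//   .write.mode(SaveMode.Append)
//   .partitionBy("month", "day")
//   .parquet("/warehouse/events")                // placeholder output path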
When I checked the data in HDFS I found two problems:
1. Some partitions appeared as month="2018_09", day="31", which is impossible: September only has 30 days.
2. The data in a day partition does not all belong to that day. For example, the partition for month="2018_09", day="30" also contains rows whose adjustTime falls on completely different dates, such as 2018-03-08. So the data in the partitions is not correct.
I would appreciate any suggestions or ideas to solve these problems. Thanks~
Created ‎12-06-2018 01:23 AM
The problem was caused by SimpleDateFormat, which is not thread-safe: when one instance is shared by several threads, concurrent format() calls can return garbage such as impossible dates. I replaced it with the thread-safe FastDateFormat from Apache Commons Lang:

import org.apache.commons.lang3.time.FastDateFormat

val fdf = FastDateFormat.getInstance("yyyy_MM", tz)

I checked the data and it is fine now.
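To see the failure mode outside Spark, here is a minimal standalone sketch (plain JVM threads, no Spark; all names are illustrative). Each thread repeatedly formats its own month: with a shared SimpleDateFormat the threads trample each other's internal Calendar state, so on most runs some results come back wrong or throw, while FastDateFormat under the same load stays clean:

import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}
import java.util.concurrent.atomic.AtomicInteger
import org.apache.commons.lang3.time.FastDateFormat

object FormatRaceDemo extends App {
  val tz = TimeZone.getTimeZone("UTC")

  val shared = new SimpleDateFormat("yyyy_MM")           // NOT thread-safe
  shared.setTimeZone(tz)
  val fdf = FastDateFormat.getInstance("yyyy_MM", tz)    // thread-safe

  // Eight dates falling in eight different months of 2018.
  val dates    = (0 until 8).map(m => new Date(1514764800000L + m * 31L * 86400000L))
  val expected = dates.map(d => fdf.format(d))

  def race(format: Date => String): Int = {
    val bad = new AtomicInteger(0)
    val threads = (0 until 8).map { i =>
      new Thread(() => {
        var n = 0
        while (n < 100000) {
          try { if (format(dates(i)) != expected(i)) bad.incrementAndGet() }
          catch { case _: Exception => bad.incrementAndGet() }
          n += 1
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    bad.get()
  }

  println(s"shared SimpleDateFormat, wrong results: ${race(d => shared.format(d))}")
  println(s"FastDateFormat, wrong results:          ${race(d => fdf.format(d))}")
}

The same sharing happens inside one Spark executor when the formatter lives in a singleton object and several task threads call it at once; FastDateFormat (or a per-record or ThreadLocal SimpleDateFormat) avoids it.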
