Support Questions

Striver · 11-11-2018

I am using a streaming job to process Kafka data and write it into a Hive table on HDFS. The Hive table is partitioned by month/day, derived from a timestamp field in the Kafka data. I compute the month and day from it as follows:

val sdf=new SimpleDateFormat("yyyy_MM")
val sdf1=new SimpleDateFormat("dd")
val adjustTime = data.getLong(12)
val month = sdf.format(new Date(adjustTime))
val day = sdf1.format(new Date(adjustTime))

I also repartition by those columns when writing the parsed data:

repartition($"month",$"day").write.mode(SaveMode.Append).partitionBy("month","day")

When I checked the data in HDFS, I found two problems:

1. Some partitions have impossible dates, such as month="2018_09" day="31", even though September has no 31st.

2. Some day partitions contain data that does not belong to that day. For example, the partition for 2018_09_30 should only hold records whose adjustTime is 2018-09-30, but it also contains records from other dates, such as 2018_03_08.

I would appreciate any suggestions or ideas for solving these problems. Thanks~

Striver · 12-06-2018

The problem was caused by SimpleDateFormat, which is not thread-safe: the shared instances were being used by several threads at once. I solved it by replacing SimpleDateFormat with

val fdf = FastDateFormat.getInstance("yyyy_MM", tz)

I checked the data and it is fine now.
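For anyone who hits the same issue, here is a minimal sketch of the same idea without the commons-lang3 dependency. It swaps in java.time.format.DateTimeFormatter, which is immutable and thread-safe, instead of FastDateFormat. The object name PartitionKeys, the method monthAndDay, and the UTC zone are assumptions for illustration only; the original job's time zone (tz) was not shown, so use whatever zone your partitions should follow.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the original job.
object PartitionKeys {
  // Assumption: timestamps are epoch milliseconds and partitions follow UTC.
  private val zone: ZoneId = ZoneId.of("UTC")

  // DateTimeFormatter instances are immutable, so they can be shared
  // safely across threads, unlike SimpleDateFormat.
  private val monthFmt: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy_MM").withZone(zone)
  private val dayFmt: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd").withZone(zone)

  // Returns the (month, day) partition values for one record timestamp.
  def monthAndDay(epochMillis: Long): (String, String) = {
    val instant = Instant.ofEpochMilli(epochMillis)
    (monthFmt.format(instant), dayFmt.format(instant))
  }
}
```

With this helper, the month and day computation from the question would become something like val (month, day) = PartitionKeys.monthAndDay(data.getLong(12)). Creating a new SimpleDateFormat per record, or holding one in a ThreadLocal, should also avoid the race, at the cost of extra allocations or boilerplate.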

