12-24-2018 02:01 AM - last edited on 12-24-2018 05:46 AM by cjervis
The data I collect contains complex types, and queries on it must return in less than 5 seconds.
I could use HBase, but I would prefer to use Impala.
I know that Impala does not support complex types.
What I want is for Impala to skip the complex-type columns.
From my testing, Impala does skip complex-type columns when the data is stored in Parquet files.
How do I write Parquet-format files to HDFS, Hive, Impala, etc.?
Can I write Parquet files using Flume, Morphline, etc.?
My system's data collection flow is as follows:
Kafka -> Flume -> HDFS (Avro files) -> Hive
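Since this flow already ends in a Hive table over Avro files, one possible approach (an assumption, not something confirmed in this thread) is to have Hive rewrite that table as a Parquet table with a `CREATE TABLE ... STORED AS PARQUET AS SELECT ...` statement. The Python sketch below only builds that HiveQL string. The table names, column list, and the `keep_complex` switch for dropping ARRAY/MAP/STRUCT/UNIONTYPE columns are all hypothetical.

```python
# Hypothetical helper: builds a HiveQL CREATE TABLE AS SELECT (CTAS) statement
# that copies an Avro-backed Hive table into a new Parquet-backed table.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type: str) -> bool:
    """Return True for Hive complex types (ARRAY, MAP, STRUCT, UNIONTYPE)."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_parquet_ctas(source_table, target_table, columns, keep_complex=True):
    """Build the CTAS statement.

    columns: list of (name, hive_type) pairs describing source_table.
    keep_complex: if False, complex-type columns are left out of the SELECT.
    """
    selected = [name for name, hive_type in columns
                if keep_complex or not is_complex(hive_type)]
    if not selected:
        raise ValueError("no columns left to select")
    # Backtick-quote column names so reserved words do not break the query.
    col_list = ", ".join(f"`{name}`" for name in selected)
    return (f"CREATE TABLE {target_table} STORED AS PARQUET AS "
            f"SELECT {col_list} FROM {source_table};")


if __name__ == "__main__":
    cols = [("id", "bigint"), ("tags", "array<string>"),
            ("attrs", "map<string,string>")]
    print(build_parquet_ctas("events_avro", "events_parquet", cols))
```

The generated statement would be run in Hive (for example through beeline). Because the table is created outside Impala, Impala usually needs `INVALIDATE METADATA events_parquet;` before it can see the new table.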