flume kafkasource, hdfs sink remove avro field
Labels: Apache Flume, Apache Kafka, HDFS
Created on ‎01-16-2019 11:16 PM - edited ‎09-16-2022 07:04 AM
I want to create a table from the Avro data that keeps the same schema but with the complex-type field removed, because Impala does not skip complex-type columns. The platform is CDH 6.0.1.
For example:
Employee (raw data)
- name : string
- age : int
- additional-info : map<string, string>
Employee (Hive table 1)
- name : string
- age : int
- additional-info : map<string, string>
Employee_For_Impala (Hive table 2)
- name : string
- age : int
Pipeline :
KafkaProducer(Avro Bytes) - Kafka - Flume - HDFS - Hive(Impala)
Flume : KafkaSource - Channel - Sink(AvroEventSerializer$Builder)
I tried changing the sink configuration (pointing serializer.schemaURL at a schema with the complex-type field removed), but it failed.
I am now trying to use Morphlines, but that is also failing.
Is there a better way?
Created ‎01-23-2019 10:58 AM
Morphlines would be the preferred way to control which data passes from the source to the sink. You can use the morphline removeFields command [1] to drop the fields you don't want. To review what is happening with the data, turn on morphline TRACE logging by adding the following to the Flume logging safety valve:
log4j.logger.org.kitesdk.morphline=TRACE
-pd
[1] http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html#removeFields
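As a rough illustration of the removeFields command mentioned in the reply, a morphline file might look like the sketch below. The morphline id (morphline1) is a hypothetical name, and the field name is taken from the example schema in the question. This is a minimal sketch under those assumptions, not a tested configuration for this pipeline.

```
# Minimal morphline sketch; the id and the field name are illustrative assumptions.
morphlines : [
  {
    id : morphline1
    # Make the Kite morphline commands available by their short names.
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        # Drop every record field whose name matches a blacklist entry.
        removeFields {
          blacklist : ["literal:additional-info"]
        }
      }
    ]
  }
]
```

In Flume, a file like this is typically referenced from a MorphlineInterceptor through its morphlineFile and morphlineId properties. Note that removeFields acts on the fields of the morphline record, not on the bytes of an event body, so it only has an effect once earlier commands have parsed the Avro payload into record fields.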
Created ‎01-28-2019 05:33 PM
