flume kafkasource, hdfs sink remove avro field

Explorer

I want to create a table from the Avro data that uses the same schema but with the complex-type field removed, because Impala does not skip complex types. The platform is CDH 6.0.1.

For Example :

Employee(raw data)
  - name : string
  - age : int
  - additional-info : map<string, string>

Employee(Hive table 1)
  - name : string
  - age : int
  - additional-info : map<string, string>

Employee_For_Impala(Hive table 2)
  - name : string
  - age : int

Pipeline :

KafkaProducer(Avro Bytes) - Kafka - Flume - HDFS - Hive(Impala)

Flume : KafkaSource - Channel - Sink(AvroEventSerializer$Builder)

I tried changing the sink configuration (pointing serializer.schemaURL at a schema with the complex-type field removed), but it failed.

I am now trying to use Morphlines, but this is also failing.

Is there a better way?

Accepted Solution

Re: flume kafkasource, hdfs sink remove avro field

Super Collaborator

Morphlines would be the preferred way to select which data passes from the source to the sink. You can use the morphline removeFields command [1] to drop the fields you don't want. To review what is happening with the data, turn on morphline TRACE logging by adding the following to the Flume logging safety valve:
log4j.logger.org.kitesdk.morphline=TRACE

 

-pd

 

[1] http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html#removeFields
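
For illustration, a minimal morphline sketch of the removeFields approach described above might look like the following. The morphline id (dropComplexFields) is a hypothetical name, and the field name is taken from the example in the question. This is an assumption-laden sketch, not a tested configuration: removeFields only drops fields of the morphline record itself, so if the Flume event body is still Avro binary, it would likely first have to be decoded into record fields and re-encoded afterwards (Kite provides commands such as readAvro, extractAvroPaths, toAvro and writeAvroToByteArray for that), which is not shown here.

```
morphlines : [
  {
    # Hypothetical id; it must match the morphlineId configured in Flume.
    id : dropComplexFields

    # Make the Kite morphline commands available.
    importCommands : ["org.kitesdk.**"]

    commands : [
      # Drop the complex-type field from the morphline record.
      # "literal:" matches the field name exactly; "glob:" and "regex:" also exist.
      { removeFields { blacklist : ["literal:additional-info"] } }

      # Log the resulting record; visible once morphline TRACE logging is enabled.
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

With TRACE logging enabled as described above, the logTrace output should show whether the field is actually gone before the event reaches the sink.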

Re: flume kafkasource, hdfs sink remove avro field

Explorer
Thanks. I'll try it the way you told me.