Falcon process producing to multiple feed instances

New Contributor

Hi,

I would like to configure a Falcon process that consumes a Hive feed and produces another Hive feed. Both feeds are partitioned by a date field, but each record to be processed contains two dates: the first is the one used to partition the input feed, and the second must be used as the partition field of the output feed. Of course, the two dates may differ.

As a consequence, consuming the day D instance of the input feed may produce records for the day D instance of the output feed, and possibly also for the day D-1 instance or any day D-n instance.

So far, I have not found a way to specify a "dynamic" value for the output feed's "instance" attribute, so I can only declare a single output instance tied directly to the input feed instance. That leads to an erroneous lineage in the Falcon console.

Is there an out-of-the-box way to handle this use case?

Thanks

3 REPLIES
Re: Falcon process producing to multiple feed instances

Expert Contributor

@Benjamin Bonnet Could you please provide an example so that I can understand your question better? Are you looking for a way to specify dynamic partitions (as provided by Hive) for the output feed instance in Falcon?

Re: Falcon process producing to multiple feed instances

New Contributor

Hi,

@peeyush, here is an example.

This is the content of today's instance of my input feed. That feed is partitioned by ingestion date (year/month/day):

ingestion_year  ingestion_month  ingestion_day  key  value   functional_date
2016            03               16             24   hello   2016-03-15
2016            03               16             14   world   2016-03-01
2016            03               16             54   hadoop  2016-03-15

Now I want my process to write to my output feed, another Hive table that is partitioned by functional_date (not by ingestion date).

Basically, my process will execute a Hive query:

    INSERT INTO target PARTITION(year, month, day)
    SELECT ingestion_year, ingestion_month, ingestion_day, key, value,
           year(functional_date), month(functional_date), day(functional_date)
    FROM source
    WHERE ${falcon_input_partition_filter_hive};

Therefore, today my process will write data to the 2016/03/15 partition (2 rows) and the 2016/03/01 partition (1 row).
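
To make that fan-out concrete, here is a minimal Python sketch that groups the three sample rows by functional_date and counts how many rows land in each output partition. It is only an illustration of the partition arithmetic, not Falcon or Hive code: the row tuples are my reading of the sample table, and `output_partitions` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date

# Sample rows from today's input feed instance (ingestion date 2016-03-16).
# Each tuple is (key, value, functional_date); values are assumed from the
# sample table in this post.
rows = [
    (24, "hello", date(2016, 3, 15)),
    (14, "world", date(2016, 3, 1)),
    (54, "hadoop", date(2016, 3, 15)),
]


def output_partitions(rows):
    """Count rows per (year, month, day) partition derived from functional_date."""
    return Counter((d.year, d.month, d.day) for _, _, d in rows)


partitions = output_partitions(rows)
for (y, m, d), n in sorted(partitions.items()):
    print(f"{y:04d}/{m:02d}/{d:02d}: {n} row(s)")
```

A single input instance therefore maps to a set of output partitions that is only known once the data has been read, which is why one static output "instance" cannot describe it.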

I would like Falcon lineage to show that my process wrote to those two partitions of my output feed.

Thanks

Re: Falcon process producing to multiple feed instances

Expert Contributor

@Benjamin Bonnet Thanks for providing the example. At present, Falcon lineage tracks the partitioned Hive table specified in the feed entity. I don't think Falcon lineage supports dynamic partitions. I will create a JIRA issue to check the feasibility of adding dynamic partitions to lineage.
