
Is it mandatory to have a partition on a Hive table when using it in Apache Falcon?

Super Collaborator

I have run two or more example Hive data pipelines using Apache Falcon by creating Hive table URI feeds (input feed/output feed). Here are the problem statements:

1) Inserting data from one Hive table into another table.

2) Loading data from HDFS into a Hive table.

Both of the above data pipelines run perfectly, but my new requirement is somewhat different.

Requirement:

I have 3 external Hive tables (without partitions) and have written one SELECT query on top of them. I want to load the result into another Hive table using Falcon. I know I will have to use an INSERT OVERWRITE TABLE table2...... SELECT col1, col2 FROM table1 query, but my question is:

Will Falcon allow me to create a table URI feed on a table without partitions?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tables:

1) patient

2) Observation

3)DiagnosticReport

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

SELECT query:

select p.id, p.gender, p.Age, p.birthdate,
       o.component[1].valuequantity.value,
       o.component[1].valuequantity.unit
from (select *,
             floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) as Age
      from patient) p
inner join DiagnosticReport d on p.id = substr(d.subject.reference, 9)
inner join Observation o on p.id = substr(o.subject.reference, 9)
where p.Age > 17 and p.Age < 86
  and o.component[1].valuequantity.value < 140;

Thanks in advance,

Please help me.

1 ACCEPTED SOLUTION


@Manoj Dhake, Falcon schedules data loading that is supposed to be continuous in nature, for example loading data on a daily basis. In that case it makes sense to partition by day, so that each day's new data is loaded into the appropriate partition. Your query looks like a one-time query that does not need to be scheduled (my understanding). Let me know if you have a specific requirement for doing it this way.
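To illustrate the day-partitioned case, here is a rough sketch of what a Falcon table feed could look like. The feed name, cluster name, validity dates, retention limit, and ACL values are placeholders, not taken from this thread; the table URI is the one quoted later in this thread. Treat it as a starting point under those assumptions rather than a tested definition.

```xml
<!-- Sketch only: feed name, cluster name, dates, retention and ACL are hypothetical. -->
<feed name="recCountFeed" description="Daily partitioned Hive table feed" xmlns="uri:falcon:feed:0.1">
    <!-- One feed instance per day, which maps to one partition per day -->
    <frequency>days(1)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <cluster name="primaryCluster" type="source">
            <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
        </cluster>
    </clusters>
    <!-- catalog:<database>:<table>#<partition-key>=<value>; the date variables are resolved per instance -->
    <table uri="catalog:tmp_rishav:rec_count_tbl#feed_date=${YEAR}-${MONTH}-${DAY}"/>
    <ACL owner="falcon" group="users" permission="0755"/>
    <schema location="hcat" provider="hcat"/>
</feed>
```

The partition expression after `#` is what lets Falcon tie each scheduled instance to one slice of the table.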

Super Collaborator

Thank you Saktheesh,

<table uri="catalog:tmp_rishav:rec_count_tbl#feed_date=${YEAR}-${MONTH}-${DAY}" />
Can I remove "feed_date=${YEAR}-${MONTH}-${DAY}" from the above statement and still use it in Falcon?
Is the partition mandatory in Falcon?

Contributor

@Manoj Dhake If you use a table feed, you have to define a partition. For your case, where you don't want to specify a partition, you can use a process with the Hive engine: https://falcon.apache.org/0.4-incubating/docs/EntitySpecification.html#Hive
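As a rough sketch of that suggestion, a process entity with the Hive engine and no input or output feeds might look like the following. The process name, cluster name, validity dates, frequency, and script path are all placeholders; the script at that path is assumed to contain the INSERT OVERWRITE ... SELECT statement over the unpartitioned tables. Please check it against the entity specification for your Falcon version before using it.

```xml
<!-- Sketch only: process name, cluster name, dates, frequency and script path are hypothetical. -->
<process name="loadPatientReport" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="primaryCluster">
            <validity start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>days(1)</frequency>
    <timezone>UTC</timezone>
    <!-- No <inputs>/<outputs>: the Hive script reads and writes the unpartitioned tables directly -->
    <workflow name="load-patient-report" engine="hive" path="/apps/falcon/scripts/load_patient_report.hql"/>
    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
</process>
```

Because no feeds are declared, Falcon only schedules the script in this setup; it will not track data availability or apply retention to those tables.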