Created 05-31-2016 06:17 AM
I have run two or more Hive data pipeline examples using Apache Falcon, defining the input and output feeds as Hive table URI feeds. The problem statements were:
1) Inserting data from one Hive table into another table.
2) Loading data from HDFS into a Hive table.
Both pipelines run correctly, but now my requirement is as follows.
Requirement:
I have 3 external Hive tables (without partitions) and have written one SELECT query over them. I want to load the result into another Hive table using Falcon. I know I will need an INSERT OVERWRITE TABLE table2 ... SELECT col1, col2 FROM table1 query, but my question is:
Will Falcon allow me to create a table URI feed on a table without partitions?
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Tables:
1) patient
2) Observation
3) DiagnosticReport
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT query:
select p.id, p.gender, p.Age, p.birthdate, o.component[1].valuequantity.value, o.component[1].valuequantity.unit
from (select *, floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) as Age FROM patient) p
inner join DiagnosticReport d on p.id = substr(d.subject.reference, 9)
inner join Observation o on p.id = substr(o.subject.reference, 9)
where p.Age > 17 and p.Age < 86 and o.component[1].valuequantity.value < 140;
Any help would be appreciated. Thanks in advance!
Created 05-31-2016 09:53 AM
@Manoj Dhake, Falcon is meant for scheduling data loads that recur continuously, for example loading data on a daily basis. In that case it makes sense to partition the table by day, so that each day's new data is loaded into its own partition. Your query looks like a one-time query that does not need to be scheduled (that is my understanding). Let me know if you have a specific requirement for doing it this way.
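A minimal HiveQL sketch of the daily pattern described above. The table `patient_daily`, its column types, and the date literal are hypothetical and chosen only for illustration; in a Falcon-scheduled run the partition value would presumably be supplied per feed instance rather than hard-coded.

```sql
-- Hypothetical daily-partitioned target table (names and types are assumptions).
CREATE TABLE IF NOT EXISTS patient_daily (
  id        STRING,
  gender    STRING,
  birthdate STRING
)
PARTITIONED BY (feed_date STRING);

-- Each run writes exactly one day's partition (static partition insert).
-- Rerunning the same day replaces only that partition; earlier days are untouched.
INSERT OVERWRITE TABLE patient_daily PARTITION (feed_date = '2016-05-31')
SELECT id, gender, birthdate
FROM patient;
```

Partitioning by day is what lets a recurring schedule stay idempotent: a failed or repeated run for one date can be redone without touching data loaded on other dates.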
Created 05-31-2016 12:16 PM
Thank you, Saktheesh. In a feed definition such as
<table uri="catalog:tmp_rishav:rec_count_tbl#feed_date=${YEAR}-${MONTH}-${DAY}" />
can I remove "feed_date=${YEAR}-${MONTH}-${DAY}" from the statement and still use it in Falcon? Is the partition mandatory in Falcon?
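The URI above refers to database `tmp_rishav`, table `rec_count_tbl`, and partition column `feed_date`, so each feed instance corresponds to one Hive partition. A short HiveQL sketch for checking this from the Hive side; the date `2016-05-31` is only an example value, assuming `${YEAR}-${MONTH}-${DAY}` expands to a zero-padded date string.

```sql
-- Lists the partitions of the table behind the feed.
-- This fails on an unpartitioned table, which is a quick way to see
-- whether the table can back a partitioned table feed at all.
SHOW PARTITIONS tmp_rishav.rec_count_tbl;

-- Rows belonging to a single feed instance (example date).
SELECT COUNT(*)
FROM tmp_rishav.rec_count_tbl
WHERE feed_date = '2016-05-31';
```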
Created 06-01-2016 01:51 AM
@Manoj Dhake, if you use a table feed, you have to define a partition. In your case, where you don't want to specify a partition, you can use a process with the Hive engine instead: https://falcon.apache.org/0.4-incubating/docs/EntitySpecification.html#Hive
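As a sketch of that approach, the Hive script referenced by the process entity's `<workflow engine="hive" path="..."/>` element could simply wrap the SELECT query from the question in an INSERT OVERWRITE. The target table `patient_vitals_summary` is hypothetical and is assumed to already exist as an unpartitioned table with six columns in this order.

```sql
-- Hypothetical .hql script stored on HDFS and run by a Falcon process with the Hive engine.
-- Target table name is an assumption; columns are matched by position, not by name.
INSERT OVERWRITE TABLE patient_vitals_summary
SELECT p.id,
       p.gender,
       p.Age,
       p.birthdate,
       o.component[1].valuequantity.value,
       o.component[1].valuequantity.unit
FROM (
  -- Age in whole years, computed from today's date.
  SELECT *,
         floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) AS Age
  FROM patient
) p
INNER JOIN DiagnosticReport d ON p.id = substr(d.subject.reference, 9)
INNER JOIN Observation o      ON p.id = substr(o.subject.reference, 9)
WHERE p.Age > 17
  AND p.Age < 86
  AND o.component[1].valuequantity.value < 140;
```

Because there is no partition, every scheduled run replaces the whole target table. Using INSERT OVERWRITE rather than INSERT INTO keeps reruns from appending duplicate rows.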