Support Questions

Manus · 05-31-2016

I have run two example Hive data pipelines using Apache Falcon, each built with Hive table URI feeds (input feed/output feed). Here are the problem statements:

1) Inserting data from one Hive table into another table.

2) Loading data from HDFS into a Hive table.

Both of the above pipelines run perfectly, but now I have a new requirement.

Requirement:

I have 3 external Hive tables (without partitions) and have written one SELECT query on top of them. I want to load the result into another Hive table using Falcon. I know I will have to use a query like INSERT OVERWRITE TABLE table2 ... SELECT col1, col2 FROM table1, but my question is:

Will Falcon allow me to create a table URI feed on a table without partitions?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tables:

1) patient

2) Observation

3) DiagnosticReport

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

SELECT query:

select p.id, p.gender, p.Age, p.birthdate,
       o.component[1].valuequantity.value,
       o.component[1].valuequantity.unit
from (select *,
             floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) as Age
      from patient) p
inner join DiagnosticReport d on p.id = substr(d.subject.reference, 9)
inner join Observation o on p.id = substr(o.subject.reference, 9)
where p.Age > 17 and p.Age < 86
  and o.component[1].valuequantity.value < 140;

Please help me.

Thanks in advance.

saktheesh_kumar · 05-31-2016

@Manoj Dhake, Falcon schedules data loading that is supposed to be continuous in nature, for example loading data on a daily basis. It therefore makes sense to partition the table by day so that each day's new data is loaded into the appropriate partition. Your query looks like a one-time query that does not need to be scheduled (my understanding). Let me know if you have a specific requirement for doing it this way.
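To make the daily-partition idea concrete, here is a minimal HiveQL sketch that is not from the original thread. The table names `patient_obs_daily` and `patient_obs_staging`, their columns, and the date literal are hypothetical; `feed_date` is used as the partition key only because that is the key in the table URI discussed in this thread.

```sql
-- Hypothetical day-partitioned target table; column names and types are assumptions.
CREATE TABLE IF NOT EXISTS patient_obs_daily (
  id        STRING,
  gender    STRING,
  age       BIGINT,
  obs_value DOUBLE,
  obs_unit  STRING
)
PARTITIONED BY (feed_date STRING);

-- Each scheduled run writes only its own day's partition (static partition insert).
-- The literal '2016-05-31' stands in for the date of the run being processed.
INSERT OVERWRITE TABLE patient_obs_daily PARTITION (feed_date = '2016-05-31')
SELECT id, gender, age, obs_value, obs_unit
FROM patient_obs_staging;  -- hypothetical source table
```

Because INSERT OVERWRITE with a static PARTITION clause replaces only the named partition, re-running a given day rewrites that day's data and leaves the other days untouched.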

Manus · 05-31-2016

Thank you, Saktheesh.

<table uri="catalog:tmp_rishav:rec_count_tbl#feed_date=${YEAR}-${MONTH}-${DAY}" />
Can I remove "feed_date=${YEAR}-${MONTH}-${DAY}" from the statement above and still use it in Falcon?
Is the partition mandatory in Falcon?

yzheng · 06-01-2016

@Manoj Dhake, if you use a table feed, you have to define a partition. In your case, where you don't want to specify a partition, you can use a process with the Hive engine: https://falcon.apache.org/0.4-incubating/docs/EntitySpecification.html#Hive
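For the non-partitioned case, the Hive-engine process simply runs a HiveQL script, which the process entity references through its `<workflow engine="hive" path="..."/>` element (see the linked Hive section of the entity specification). Below is a minimal sketch of such a script built from the SELECT query in the question; the target table `patient_obs_summary` and its column types are hypothetical and should be matched to the real source schemas.

```sql
-- Hypothetical non-partitioned target table; column types are assumptions.
CREATE TABLE IF NOT EXISTS patient_obs_summary (
  id        STRING,
  gender    STRING,
  age       BIGINT,   -- floor() returns BIGINT in Hive
  birthdate STRING,
  obs_value DOUBLE,   -- assumed type of valuequantity.value
  obs_unit  STRING
);

-- No PARTITION clause: every run replaces the whole table.
-- Note: Hive arrays are 0-indexed, so component[1] is the second element.
INSERT OVERWRITE TABLE patient_obs_summary
SELECT p.id,
       p.gender,
       p.age,
       p.birthdate,
       o.component[1].valuequantity.value,
       o.component[1].valuequantity.unit
FROM (
  SELECT *,
         floor(datediff(to_date(from_unixtime(unix_timestamp())),
                        to_date(birthdate)) / 365.25) AS age
  FROM patient
) p
INNER JOIN DiagnosticReport d ON p.id = substr(d.subject.reference, 9)
INNER JOIN Observation o      ON p.id = substr(o.subject.reference, 9)
WHERE p.age > 17
  AND p.age < 86
  AND o.component[1].valuequantity.value < 140;
```

Since the target has no partitions, INSERT OVERWRITE TABLE replaces its entire contents on every scheduled run; INSERT INTO TABLE would append instead.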

Cloudera Community

Is it mandatory to have partition on hive table while using in Apache Falcon?