I have a requirement to copy data between clusters, from one table to another. Is this possible through Falcon?
I was reading the following articles but I am a bit confused by them:
http://hortonworks.com/blog/introduction-apache-falcon-hadoop/ (see example 3 there).
https://falcon.apache.org/HiveDR.html explains how to set up Hive replication from DB to DB and from table(s) to table(s). Here are the limitations:
1. The DB and table names must be the same on the source and target.
2. Hive must have the property hive.metastore.event.listeners set to org.apache.hive.hcatalog.listener.DbNotificationListener.
3. The user should bootstrap the DB and table(s) on the target from the source after enabling the event listeners.
As of today, Falcon only replicates events captured by hive.metastore.event.listeners. This means that for Hive 1.2.*, the following will NOT be replicated; everything else should be, because Hive saves the events:
- Virtual objects such as views, and other metadata objects such as roles, will not be replicated; it is currently up to the warehouse administrator to manage those.
- Direct HDFS writes that do not register metadata (in ALTER/UPDATE scenarios) will not be replicated.
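The listener setup from limitation 2 would go into hive-site.xml on the metastore. A minimal sketch (property names are from the thread; placement and surrounding config depend on your distribution):

```xml
<!-- hive-site.xml on the Hive metastore: capture DDL/DML events for Falcon HiveDR -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
<property>
  <name>hive.metastore.dml.events</name>
  <value>true</value>
</property>
```

The metastore service needs a restart after changing these properties.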
Here are the steps you need to follow to make Falcon Hive mirroring work:
1) Create two cluster entities (source and target) on the source cluster (you could do this on the target cluster as well).
2) Bootstrap the source databases and tables. Follow the guide at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport. Once exported, you can distcp the data to the target cluster and import it there.
3) In hive-site.xml, set the property "hive.metastore.event.listeners" to org.apache.hive.hcatalog.listener.DbNotificationListener, and also set hive.metastore.dml.events to true.
4) Create a mirror job pointing to the source and target cluster entities.
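For step 2, the bootstrap might look like this on the command line. The DB name, table name, NameNode hosts, and HDFS paths below are placeholders, not values from this thread:

```shell
# On the source cluster: export the table (data + metadata) to an HDFS staging dir.
hive -e "EXPORT TABLE landing_db.my_table TO '/tmp/hivedr_bootstrap/my_table';"

# Copy the exported directory to the target cluster.
hadoop distcp hdfs://source-nn:8020/tmp/hivedr_bootstrap/my_table \
              hdfs://target-nn:8020/tmp/hivedr_bootstrap/my_table

# On the target cluster: import under the SAME DB and table name
# (HiveDR requires matching names on source and target).
hive -e "IMPORT TABLE landing_db.my_table FROM '/tmp/hivedr_bootstrap/my_table';"
```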
Hope this helps.
@jramakrishnan: Thanks a lot for your help.
Your answer explains it quite well, but I have one doubt about step 2. If I am not wrong, it is a manual step, so every time the source table gets updated we would need to export the table or database again.
If the table is updated with new partitions, how exactly will it copy only the updated partitions?
Also, can you please help me understand how, after the bootstrap in step 2, only the updated data will be copied to the target each time, rather than the complete table?
Bootstrapping is only needed for the initial copy of the databases and tables. All subsequent updates are handled by Falcon, which tracks them via the NOTIFICATION_SEQUENCE and NOTIFICATION_LOG tables in the Hive metastore. Falcon also supports table partitions, so you don't need to bootstrap individual partitions.
Hope this helps.
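If you want to confirm that events are actually being captured before running the mirror job, you can inspect the NOTIFICATION_LOG table in the metastore database. This sketch assumes a MySQL-backed metastore whose database is named hive (both are assumptions; adjust to your setup):

```shell
# Hypothetical check: list the most recently captured metastore events.
# Assumes a MySQL metastore DB named "hive" and a "hive" DB user.
mysql -u hive -p hive -e \
  "SELECT EVENT_ID, EVENT_TYPE, DB_NAME, TBL_NAME
   FROM NOTIFICATION_LOG ORDER BY EVENT_ID DESC LIMIT 10;"
```

If this table stays empty after DDL/DML on the source, the DbNotificationListener is not configured correctly.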
I tried it but am getting the error below, so can you please help me?

java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://lxhdpmasttst002.lowes.com:10001/landing_db;auth=delegationToken: Invalid status 72
    at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:50)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://lxhdpmasttst002.lowes.com:10001/landing_db;auth=delegationToken: Invalid status 72
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:210)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:156)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:215)
    at org.apache.falcon.hive.util.EventUtils.setupConnection(EventUtils.java:122)
    at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:48)
    ... 8 more
Caused by: org.apache.thrift.transport.TTransportException: Invalid status 72
    at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:307)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:185)
    ... 14 more
Exception encountered HiveDR failure: Job job_1460461362019_0825 has failed: Task failed task_1460461362019_0825_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0