
Can we copy entire table or database with data and metadata through falcon ?


Guru

Team:

I have a requirement where I have to copy data between clusters, from one table to another. Is this possible through Falcon?

I was reading the following articles, but I am a little confused by them.

https://falcon.apache.org/HiveIntegration.html

http://hortonworks.com/blog/introduction-apache-falcon-hadoop/ (see example 3 there).

18 REPLIES

Re: Can we copy entire table or database with data and metadata through falcon ?

Guru

@Kuldeep Kulkarni: I already tried this, but it did not help. It works for a normal HDFS data mirror but not for Hive mirroring.

Re: Can we copy entire table or database with data and metadata through falcon ?

Rising Star

https://falcon.apache.org/HiveDR.html explains how you can set up Hive replication from DB to DB or from table(s) to table(s). Here are the limitations:

1. The DB and table name(s) should be the same on the source and target.

2. Make sure Hive has the property hive.metastore.event.listeners set to org.apache.hive.hcatalog.listener.DbNotificationListener (see the verification sketch after this list).

3. The user should bootstrap the DB and table(s) on the target from the source after enabling the event listeners.
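For item 2, here is a quick way to confirm the listener is actually in effect; a minimal sketch, assuming you can run the Hive CLI against the metastore in question (on Ambari-managed clusters you would set the properties through the UI, but the check is the same):

```bash
# Ask Hive for the effective value; it should print
# hive.metastore.event.listeners=org.apache.hive.hcatalog.listener.DbNotificationListener
hive -e "set hive.metastore.event.listeners;"

# DML events must also be recorded, otherwise inserts will not be picked up
hive -e "set hive.metastore.dml.events;"
```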

As of today, Falcon only replicates events captured by hive.metastore.event.listeners. This means that for Hive 1.2.*, the following will NOT be replicated; everything else should be replicated by Falcon, because Hive saves the events.

- It will not replicate virtual objects like views, and it will not handle other metadata objects such as roles; currently it is up to the warehouse administrator to manage those aspects.

- It will not replicate direct HDFS writes that do not register metadata (in alter/update scenarios).

Re: Can we copy entire table or database with data and metadata through falcon ?

Guru

@Balu: Sorry, but I am a little confused about the feed and process entity definitions. Can you please shed some more light on them?

Re: Can we copy entire table or database with data and metadata through falcon ?

Rising Star
@Saurabh Kumar

I did not refer to any feed/process entities in my comment, so I do not understand your question. Can you please clarify?

Re: Can we copy entire table or database with data and metadata through falcon ?

@Saurabh Kumar,

Here are the steps you need to follow to make Falcon Hive mirroring work.

1) Create two cluster entities (source and target) on the source cluster (you could do it on the target cluster as well).

For example: [screenshots: 3426-sourceentity.jpg, 3427-tclusterentity.jpg]

2) Bootstrap the source tables and databases on the target. Follow the guide at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport. Once exported, you can distcp the export to the target cluster and import it there (see the bootstrap sketch after these steps).

3) In hive-site.xml, set the property hive.metastore.event.listeners to org.apache.hive.hcatalog.listener.DbNotificationListener, and also set hive.metastore.dml.events to true.

4) Create a mirror job pointing at the source and target cluster entities.

For example: [screenshots: 3424-mirror.jpg, 3425-mirror2.jpg]

Hope this helps.
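To make steps 1, 2, and 4 concrete, here is a minimal sketch run from the source cluster; the entity file names, database/table names, staging paths, and namenode addresses below are placeholders, not values from this thread:

```bash
# Step 1: submit both cluster entities (XML definitions as in the screenshots)
falcon entity -type cluster -submit -file source-cluster.xml
falcon entity -type cluster -submit -file target-cluster.xml

# Step 2: bootstrap - export the table (data + metadata) to a staging dir,
# ship it across, then import it on the target (the database must already
# exist there under the same name)
hive -e "USE mydb; EXPORT TABLE mytable TO '/tmp/hive_bootstrap/mytable';"
hadoop distcp hdfs://source-nn:8020/tmp/hive_bootstrap/mytable \
              hdfs://target-nn:8020/tmp/hive_bootstrap/mytable
# run this one on the target cluster:
hive -e "USE mydb; IMPORT TABLE mytable FROM '/tmp/hive_bootstrap/mytable';"

# Step 4: on the CLI, the mirror job is the hive-disaster-recovery recipe
# described at https://falcon.apache.org/HiveDR.html (the screenshots above
# do the same thing through the Falcon UI)
falcon recipe -name hive-disaster-recovery -operation HIVE_DISASTER_RECOVERY
```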

Re: Can we copy entire table or database with data and metadata through falcon ?

Guru

@jramakrishnan: Thanks a lot for your help.

It is quite well explained in your reply, but I have one doubt about step 2. If I am not wrong, it is a manual step, and every time the source table gets updated we would need to export the table or database to the staging location again.

If the table is updated with new partitions, how exactly will it copy only the updated partitions?

Also, can you please help me understand how exactly it will copy only the updated data from step 2 to the target each time, rather than the complete table?


Re: Can we copy entire table or database with data and metadata through falcon ?

@Saurabh Kumar,

Bootstrapping is needed only for the initial database and table(s). All subsequent updates should be handled by Falcon; this is done via the NOTIFICATION_SEQUENCE and NOTIFICATION_LOG tables in the Hive metastore. Falcon also supports table partitions, so you don't need to bootstrap individual partitions.
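If you want to watch those events being recorded, you can query the two tables directly in the metastore database; a sketch assuming a MySQL-backed metastore database named hive (the exact table layout varies by Hive version):

```bash
# Most recent metastore events that replication will replay on the target
mysql -u hive -p hive -e \
  "SELECT EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME, TBL_NAME
     FROM NOTIFICATION_LOG ORDER BY EVENT_ID DESC LIMIT 10;"

# High-water mark of the event ids handed out so far
mysql -u hive -p hive -e "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE;"
```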

Hope this helps.

Re: Can we copy entire table or database with data and metadata through falcon ?

Guru

@jramakrishnan:

I tried it, but I am getting the error below. So can you please help me?

java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://lxhdpmasttst002.lowes.com:10001/landing_db;auth=delegationToken: Invalid status 72
    at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:50)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://lxhdpmasttst002.lowes.com:10001/landing_db;auth=delegationToken: Invalid status 72
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:210)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:156)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:215)
    at org.apache.falcon.hive.util.EventUtils.setupConnection(EventUtils.java:122)
    at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:48)
    ... 8 more
Caused by: org.apache.thrift.transport.TTransportException: Invalid status 72
    at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:307)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:185)
    ... 14 more
Exception encountered HiveDR failure: Job job_1460461362019_0825 has failed: Task failed task_1460461362019_0825_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0