03-13-2017
11:35 PM
Recipes framework support for HDFS and Hive mirroring was added in the Apache Falcon 0.6.0 release as client-side logic. In the 0.10 release it was moved to the server side and renamed server-side extensions, as part of JIRA https://issues.apache.org/jira/browse/FALCON-1107. Any new mirror job should be submitted and managed using Falcon extensions. Please refer to https://falcon.apache.org/restapi/ExtensionEnumeration.html for more details.

Supported DistCp options for HDFS mirroring in HDP 2.5:
distcpMaxMaps
distcpMapBandwidth
overwrite
ignoreErrors
skipChecksum
removeDeletedFiles
preserveBlockSize
preserveReplicationNumber
preservePermission
preserveUser
preserveGroup
preserveChecksumType
preserveAcl
preserveXattr
preserveTimes

An HDFS mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hdfs-mirroring -file sales-monthly.properties
Content of sales-monthly.properties file:
jobName=sales-monthly
jobValidityStart=2016-06-30T00:00Z
jobValidityEnd=2099-12-31T11:59Z
jobFrequency=minutes(45)
jobTimezone=UTC
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
sourceDir=/user/ambari-qa/sales-monthly/input
targetDir=/user/ambari-qa/sales-monthly/output
removeDeletedFiles=true
skipChecksum=false
preservePermission=true
preserveUser=true
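Before submitting, it can help to see which of the supported DistCp options a properties file actually sets. The following is only a rough sketch, not part of Falcon: the helper names `parse_properties` and `distcp_settings` are hypothetical, the parser assumes simple `key=value` lines with `#` comments, and the option set is copied from the HDP 2.5 list in this post.

```python
# Sketch only: these helpers are hypothetical and not part of Falcon.
# The option names are copied from the HDP 2.5 HDFS mirroring list in this post.
HDFS_DISTCP_OPTIONS = {
    "distcpMaxMaps", "distcpMapBandwidth", "overwrite", "ignoreErrors",
    "skipChecksum", "removeDeletedFiles", "preserveBlockSize",
    "preserveReplicationNumber", "preservePermission", "preserveUser",
    "preserveGroup", "preserveChecksumType", "preserveAcl",
    "preserveXattr", "preserveTimes",
}

SAMPLE = """\
jobName=sales-monthly
jobValidityStart=2016-06-30T00:00Z
jobValidityEnd=2099-12-31T11:59Z
jobFrequency=minutes(45)
jobTimezone=UTC
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
sourceDir=/user/ambari-qa/sales-monthly/input
targetDir=/user/ambari-qa/sales-monthly/output
removeDeletedFiles=true
skipChecksum=false
preservePermission=true
preserveUser=true
"""


def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key=value line: %r" % raw)
        props[key.strip()] = value.strip()
    return props


def distcp_settings(props):
    """Return only the entries whose key is a listed HDFS DistCp option."""
    return {k: v for k, v in props.items() if k in HDFS_DISTCP_OPTIONS}


if __name__ == "__main__":
    for key, value in sorted(distcp_settings(parse_properties(SAMPLE)).items()):
        print(key, "=", value)
```

For the sample file above, this picks out removeDeletedFiles, skipChecksum, preservePermission and preserveUser; the remaining keys are job-level settings rather than DistCp options.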
Refer to hdfs-mirroring-properties.json for the properties supported in HDFS mirroring.

Supported DistCp options for Hive mirroring in HDP 2.5:
distcpMaxMaps
distcpMapBandwidth

A Hive mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hive-mirroring -file hive-sales-monthly.properties
Content of hive-sales-monthly.properties file:
jobName=hive-sales-monthly
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
jobValidityStart=2016-07-19T00:02Z
jobValidityEnd=2018-05-25T11:02Z
jobFrequency=minutes(30)
jobRetryPolicy=periodic
jobRetryDelay=minutes(30)
jobRetryAttempts=3
distcpMaxMaps=1
distcpMapBandwidth=100
maxEvents=-1
replicationMaxMaps=5
sourceDatabases=default
sourceTables=*
sourceHiveServer2Uri=hive2://primary:10000
targetHiveServer2Uri=hive2://backup:10000
Refer to hive-mirroring-properties.json for the properties supported in Hive mirroring.
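Since Hive mirroring lists far fewer DistCp options than HDFS mirroring, one easy mistake is carrying HDFS-only options over into a Hive properties file. Here is a minimal sketch of a pre-submit check, assuming the properties have already been loaded into a dict. The helper names `unsupported_for_hive` and `build_submit_command` are hypothetical, and the command is only assembled from the flags shown in this post, not executed. How Falcon itself treats unlisted options is not covered here.

```python
import shlex

# Sketch only: these helpers are hypothetical and not part of Falcon.
# DistCp options listed for HDFS mirroring but not for Hive mirroring in HDP 2.5.
HDFS_ONLY_DISTCP_OPTIONS = {
    "overwrite", "ignoreErrors", "skipChecksum", "removeDeletedFiles",
    "preserveBlockSize", "preserveReplicationNumber", "preservePermission",
    "preserveUser", "preserveGroup", "preserveChecksumType", "preserveAcl",
    "preserveXattr", "preserveTimes",
}


def unsupported_for_hive(props):
    """Return, sorted, the keys that are DistCp options listed only for HDFS mirroring."""
    return sorted(k for k in props if k in HDFS_ONLY_DISTCP_OPTIONS)


def build_submit_command(extension_name, properties_file):
    """Assemble the falcon CLI call shown in this post as an argv list."""
    return [
        "falcon", "extension", "-submitAndSchedule",
        "-extensionName", extension_name,
        "-file", properties_file,
    ]


if __name__ == "__main__":
    # Example input: the two Hive DistCp options plus one HDFS-only option.
    props = {"distcpMaxMaps": "1", "distcpMapBandwidth": "100", "preserveUser": "true"}
    print("Not listed for Hive mirroring:", unsupported_for_hive(props))
    cmd = build_submit_command("hive-mirroring", "hive-sales-monthly.properties")
    print(shlex.join(cmd))  # requires Python 3.8+
```

Returning the command as a list keeps it ready for something like subprocess.run on a host where the falcon client is installed, without any shell quoting concerns.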