Member since: 01-06-2016
Posts: 36
Kudos Received: 104
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1334 | 03-16-2017 08:21 AM
 | 13103 | 03-15-2017 10:05 AM
 | 4618 | 03-15-2017 08:18 AM
 | 2629 | 09-26-2016 09:22 AM
 | 2708 | 08-18-2016 05:00 AM
03-15-2017
10:05 AM
8 Kudos
@Gnanasekaran G Use the NULL-safe equality operator (<=>) instead of the (=) operator:

SELECT CASE WHEN (NULL <=> NULL) THEN "equals" ELSE "not equals" END AS value;

Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-RelationalOperators
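As a quick check from the shell, a minimal sketch assuming the Hive CLI is on the path (with Beeline, pass the same statements via -e and your own JDBC URL):

# NULL = NULL evaluates to NULL, so the CASE falls through to ELSE
hive -e 'SELECT CASE WHEN (NULL = NULL) THEN "equals" ELSE "not equals" END AS value;'
# prints: not equals

# NULL <=> NULL evaluates to true with the NULL-safe operator
hive -e 'SELECT CASE WHEN (NULL <=> NULL) THEN "equals" ELSE "not equals" END AS value;'
# prints: equals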
03-15-2017
09:11 AM
1 Kudo
@zaenal rifai No, this is a limitation of the Oozie UI.
03-15-2017
08:18 AM
7 Kudos
@zaenal rifai The maximum number of actions the Oozie DAG view will show is 25. References:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/GraphGenerator.java#L62
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/GraphGenerator.java#L341
11-03-2016
03:52 AM
2 Kudos
@zhixun he Yes. Whenever there is a change, a snapshot is created on the source, and the Falcon process instance triggers based on the configured frequency.
10-25-2016
05:29 PM
10 Kudos
HDFS snapshots are read-only, point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system. They are very efficient because only changed data is recorded, and the data can be restored to any previous snapshot. Common use cases for snapshots are data backup and disaster recovery.

HDFS Snapshot Extension: Falcon supports HDFS snapshot-based replication through the HDFS Snapshot extension. Using this feature, you can:

- Create and manage snapshots on source/target directories.
- Mirror data from source to target for disaster recovery using these snapshots.
- Perform retention on the snapshots created on source and target.

Snapshot replication only works from a single source directory to a single target directory. For snapshot replication to work, the following prerequisites must be met:

- Both source and target clusters must run Hadoop 2.7.0 or higher.
- The user submitting and scheduling the Falcon extension must have permissions on both source and target directories.
- Both directories must be snapshottable.
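For context on the snapshot primitives this extension relies on, here is a minimal sketch of the raw HDFS commands (the /tmp/demo path, snapshot name s1, and file name data.tsv are hypothetical):

# Mark a directory as snapshottable (run as the HDFS admin user)
hdfs dfsadmin -allowSnapshot /tmp/demo
# Take a named, read-only point-in-time snapshot; it appears under /tmp/demo/.snapshot/s1
hdfs dfs -createSnapshot /tmp/demo s1
# Restore a file from the snapshot, e.g. after an accidental delete
hdfs dfs -cp /tmp/demo/.snapshot/s1/data.tsv /tmp/demo/
# Remove the snapshot once it is no longer needed
hdfs dfs -deleteSnapshot /tmp/demo s1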
To perform HDFS snapshot replication in Falcon, we need to create the source and target cluster entities, and also create and grant permissions on the staging and working directories. Use the following steps to accomplish this.

Source Cluster:

hdfs dfs -rm -r /tmp/fs /tmp/fw
hdfs dfs -mkdir -p /tmp/fs
hdfs dfs -chmod 777 /tmp/fs
hdfs dfs -mkdir -p /tmp/fw
hdfs dfs -chmod 755 /tmp/fw
hdfs dfs -chown falcon /tmp/fs
hdfs dfs -chown falcon /tmp/fw

Target Cluster:

hdfs dfs -rm -r /tmp/fs /tmp/fw
hdfs dfs -mkdir -p /tmp/fs
hdfs dfs -chmod 777 /tmp/fs
hdfs dfs -mkdir -p /tmp/fw
hdfs dfs -chmod 755 /tmp/fw
hdfs dfs -chown falcon /tmp/fs
hdfs dfs -chown falcon /tmp/fw

Cluster Entities:

primaryCluster.xml:

<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="primaryCluster">
<interfaces>
<interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2" />
<interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2" />
<interface type="execute" endpoint="primaryCluster-12.openstacklocal:8050" version="0.20.2" />
<interface type="workflow" endpoint="http://primaryCluster-14.openstacklocal:11000/oozie" version="3.1" />
<interface type="messaging" endpoint="tcp://primaryCluster-9.openstacklocal:61616?daemon=true" version="5.1.6" />
<interface type="registry" endpoint="thrift://primaryCluster-14.openstacklocal:9083" version="0.11.0" />
</interfaces>
<locations>
<location name="staging" path="/tmp/fs" />
<location name="temp" path="/tmp" />
<location name="working" path="/tmp/fw" />
</locations>
<ACL owner="ambari-qa" group="users" permission="0755" />
<properties>
<property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM" />
<property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM" />
<property name="hive.metastore.sasl.enabled" value="true" />
<property name="hadoop.rpc.protection" value="authentication" />
<property name="hive.metastore.uris" value="thrift://primaryCluster-14.openstacklocal:9083" />
<property name="hive.server2.uri" value="hive2://primaryCluster-14.openstacklocal:10000" />
</properties>
</cluster>

Submit the cluster entity:

falcon entity -submit -type cluster -file primaryCluster.xml   [creates entity: primaryCluster]

backupCluster.xml:

<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="backupCluster">
<interfaces>
<interface type="readonly" endpoint="webhdfs://mycluster2:20070" version="0.20.2" />
<interface type="write" endpoint="hdfs://mycluster2:8020" version="0.20.2" />
<interface type="execute" endpoint="backupCluster-5.openstacklocal:8050" version="0.20.2" />
<interface type="workflow" endpoint="http://backupCluster-6.openstacklocal:11000/oozie" version="3.1" />
<interface type="messaging" endpoint="tcp://backupCluster-1.openstacklocal:61616" version="5.1.6" />
<interface type="registry" endpoint="thrift://backupCluster-6.openstacklocal:9083" version="0.11.0" />
</interfaces>
<locations>
<location name="staging" path="/tmp/fs" />
<location name="temp" path="/tmp" />
<location name="working" path="/tmp/fw" />
</locations>
<ACL owner="ambari-qa" group="users" permission="0755" />
<properties>
<property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM" />
<property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM" />
<property name="hive.metastore.sasl.enabled" value="true" />
<property name="hadoop.rpc.protection" value="authentication" />
<property name="hive.metastore.uris" value="thrift://backupCluster-6.openstacklocal:9083" />
<property name="hive.server2.uri" value="hive2://backupCluster-6.openstacklocal:10000" />
</properties>
</cluster>

falcon entity -submit -type cluster -file backupCluster.xml   [creates entity: backupCluster]

HDFS Snapshot Replication:

Source Cluster [create the directory and copy the data]:

hdfs dfs -mkdir -p /tmp/falcon/HDFSSnapshot/source
hdfs dfs -put NYSE-2000-2001.tsv /tmp/falcon/HDFSSnapshot/source

Note: you can download the NYSE-2000-2001.tsv file from https://s3.amazonaws.com/hw-sandbox/tutorial1/NYSE-2000-2001.tsv.gz

Allow snapshots on the directory:

hdfs dfsadmin -allowSnapshot /tmp/falcon/HDFSSnapshot/source   [run as hdfs]
hdfs lsSnapshottableDir   [run as ambari-qa]

Target Cluster:

hdfs dfs -mkdir -p /tmp/falcon/HDFSSnapshot/target
hdfs dfsadmin -allowSnapshot /tmp/falcon/HDFSSnapshot/target

hdfs-snapshot.properties:

jobName=HDFSSnapshot
jobClusterName=primaryCluster
jobValidityStart=2016-05-09T06:25Z
jobValidityEnd=2016-05-09T08:00Z
jobFrequency=days(1)
sourceCluster=primaryCluster
sourceSnapshotDir=/tmp/falcon/HDFSSnapshot/source
sourceSnapshotRetentionAgeLimit=days(1)
sourceSnapshotRetentionNumber=3
targetCluster=backupCluster
targetSnapshotDir=/tmp/falcon/HDFSSnapshot/target
targetSnapshotRetentionAgeLimit=days(1)
targetSnapshotRetentionNumber=3
jobAclOwner=ambari-qa
jobAclGroup=users
jobAclPermission=0x755

Submit and schedule the job using the property file:

falcon extension -extensionName hdfs-snapshot-mirroring -submitAndSchedule -file hdfs-snapshot.properties

Using the jobName, we can find the Oozie job it has launched:

falcon extension -instances -jobName HDFSSnapshot

Once the job completes, a snapshot is created automatically on the source, and the snapshot along with the source content is replicated to the target cluster:

Source Cluster HDFS Content:

hdfs dfs -ls -R hdfs://mycluster1:8020/tmp/falcon/HDFSSnapshot/source/
drwxr-xr-x - ambari-qa hdfs 0 2016-10-25 02:27 hdfs://mycluster1:8020/tmp/falcon/HDFSSnapshot/source/source
-rw-r--r-- 3 ambari-qa hdfs 44005963 2016-10-25 02:27 hdfs://mycluster1:8020/tmp/falcon/HDFSSnapshot/source/source/NYSE-2000-2001.tsv

Target Cluster HDFS Content:

hdfs dfs -ls -R hdfs://mycluster2:8020/tmp/falcon/HDFSSnapshot/target/
drwxr-xr-x - ambari-qa hdfs 0 2016-10-25 02:28 hdfs://mycluster2:8020/tmp/falcon/HDFSSnapshot/target/source
-rw-r--r-- 3 ambari-qa hdfs 44005963 2016-10-25 02:28 hdfs://mycluster2:8020/tmp/falcon/HDFSSnapshot/target/source/NYSE-2000-2001.tsv

We can see the data has been replicated from the source to the target cluster.

Source Snapshot Directory:

hdfs dfs -ls hdfs://mycluster1:8020/tmp/falcon/HDFSSnapshot/source/.snapshot
Found 1 items
drwxr-xr-x - ambari-qa hdfs 0 2016-10-25 02:27 hdfs://mycluster1:8020/tmp/falcon/HDFSSnapshot/source/.snapshot/falcon-snapshot-HDFSSnapshot-2016-05-09-06-25-1477362461509

Target Snapshot Directory:

hdfs dfs -ls hdfs://mycluster2:8020/tmp/falcon/HDFSSnapshot/target/.snapshot
Found 1 items
drwxr-xr-x - ambari-qa hdfs 0 2016-10-25 02:28 hdfs://mycluster2:8020/tmp/falcon/HDFSSnapshot/target/.snapshot/falcon-snapshot-HDFSSnapshot-2016-05-09-06-25-1477362461509

We can see the snapshot directory has been created automatically on the source and replicated from the source to the target cluster.
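If you want to inspect what changed between two replication runs, HDFS can diff snapshots directly on the source. A sketch, where the two snapshot names are placeholders for the falcon-snapshot-* entries in your own .snapshot listing:

# List the snapshots Falcon has created on the source directory
hdfs dfs -ls /tmp/falcon/HDFSSnapshot/source/.snapshot
# Show files created (+), deleted (-), modified (M), or renamed (R) between two snapshots
hdfs snapshotDiff /tmp/falcon/HDFSSnapshot/source <older-snapshot-name> <newer-snapshot-name>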
09-26-2016
01:32 PM
4 Kudos
@Saurabh Can you run the following command on your Ambari server and share the output for further debugging?

/usr/bin/yum install hdp-select
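If hdp-select is already installed, a quick sketch of how to see what it reports:

# List the HDP versions hdp-select knows about
hdp-select versions
# Show the current component-to-version mapping
hdp-select status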
09-26-2016
10:30 AM
1 Kudo
A good, descriptive article on how to install Atlas HA via Ambari.
09-26-2016
09:22 AM
6 Kudos
@Santhosh B Gowda To increase the maximum workflow definition size, add the following property to oozie-site.xml:

oozie.service.WorkflowAppService.WorkflowDefinitionMaxLength=<maximum length of the workflow definition in bytes>

For example:

oozie.service.WorkflowAppService.WorkflowDefinitionMaxLength=1000000
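After restarting Oozie, one way to confirm the property took effect is via the Oozie admin CLI. A sketch; the server URL is illustrative, so substitute your own:

# Dump the effective server configuration and filter for the new property
oozie admin -oozie http://primaryCluster-14.openstacklocal:11000/oozie -configuration | grep WorkflowDefinitionMaxLength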
08-18-2016
05:00 AM
5 Kudos
@Gaurab D Can you share your shell script? It looks like the script executed, but the output being compared is not what was expected. Thanks.