Member since: 09-25-2015
Posts: 82
Kudos Received: 93
Solutions: 17

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3849 | 06-06-2017 09:57 AM
 | 1066 | 03-01-2017 10:26 AM
 | 1083 | 11-22-2016 10:32 AM
 | 902 | 08-09-2016 12:05 PM
 | 1586 | 08-08-2016 03:57 PM
02-24-2017
10:32 PM
I'm trying to implement the Falcon Email Tutorial in HDP 2.5.3 (self-installed on a single node rather than on the sandbox). Everything is submitted and running, and the rawEmailIngestProcess is creating data, but no instances of the rawEmailFeed are scheduled, so the cleansedEmailProcess is stuck waiting for input. How do I troubleshoot this? Anyone have any ideas? I have a feeling it's something similar to this HCC post which @Sowmya Ramesh helped with, but I can't get my head round the logic of the validities!

Here's my code:

rawEmailFeed

<feed xmlns='uri:falcon:feed:0.1' name='rawEmailFeed' description='Raw customer email feed'>
<tags>externalSystem=USWestEmailServers</tags>
<groups>churnAnalysisDataPipeline</groups>
<availabilityFlag>_success</availabilityFlag>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<late-arrival cut-off='hours(1)'/>
<clusters>
<cluster name='primaryCluster' type='source'>
<validity start='2017-02-24T17:57Z' end='2099-06-05T11:59Z'/>
<retention limit='days(90)' action='delete'/>
<locations>
<location type='data' path='/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='stats' path='/'>
</location>
</locations>
</cluster>
</clusters>
<locations>
<location type='data' path='/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='stats' path='/'>
</location>
</locations>
<ACL owner='ambari-qa' group='users' permission='0755'/>
<schema location='/none' provider='/none'/>
<properties>
<property name='queueName' value='default'>
</property>
<property name='jobPriority' value='NORMAL'>
</property>
</properties>
</feed>
rawEmailIngestProcess

<process xmlns='uri:falcon:process:0.1' name='rawEmailIngestProcess'>
<tags>email=testemail</tags>
<clusters>
<cluster name='primaryCluster'>
<validity start='2017-02-24T17:59Z' end='2099-06-05T18:00Z'/>
</cluster>
</clusters>
<parallel>1</parallel>
<order>FIFO</order>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<outputs>
<output name='output' feed='rawEmailFeed' instance='now(0,0)'>
</output>
</outputs>
<workflow name='emailIngestWorkflow' version='4.0.1' engine='oozie' path='/user/ambari-qa/falcon/demo/apps/ingest/fs'/>
<retry policy='exp-backoff' delay='minutes(3)' attempts='3'/>
<ACL owner='ambari-qa' group='users' permission='0755'/>
</process>
cleansedEmailFeed

<feed xmlns='uri:falcon:feed:0.1' name='cleansedEmailFeed' description='Cleansed customer emails'>
<tags>cleanse=cleaned</tags>
<groups>churnAnalysisDataPipeline</groups>
<availabilityFlag>_success</availabilityFlag>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<late-arrival cut-off='hours(4)'/>
<clusters>
<cluster name='primaryCluster' type='source'>
<validity start='2017-02-24T17:58Z' end='2099-06-05T18:00Z'/>
<retention limit='hours(90)' action='delete'/>
<locations>
<location type='data' path='/user/ambari-qa/falcon/demo/primary/processed/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='stats' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='meta' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
</locations>
</cluster>
<cluster name='backupCluster' type='target'>
<validity start='2017-02-24T17:58Z' end='2099-06-05T18:00Z'/>
<retention limit='hours(90)' action='delete'/>
<locations>
<location type='data' path='/falcon/demo/bcp/processed/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='stats' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='meta' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
</locations>
</cluster>
</clusters>
<locations>
<location type='data' path='/user/ambari-qa/falcon/demo/primary/processed/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='stats' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
<location type='meta' path='/tmp/${YEAR}-${MONTH}-${DAY}-${HOUR}'>
</location>
</locations>
<ACL owner='ambari-qa' group='users' permission='0755'/>
<schema location='/none' provider='/none'/>
<properties>
<property name='queueName' value='default'>
</property>
<property name='jobPriority' value='NORMAL'>
</property>
</properties>
</feed>
cleanseEmailProcess

<process xmlns='uri:falcon:process:0.1' name='cleanseEmailProcess'>
<tags>cleanse=yes</tags>
<clusters>
<cluster name='primaryCluster'>
<validity start='2017-02-24T17:59Z' end='2099-06-05T18:00Z'/>
</cluster>
</clusters>
<parallel>1</parallel>
<order>FIFO</order>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<inputs>
<input name='input' feed='rawEmailFeed' start='now(0,0)' end='now(0,0)'>
</input>
</inputs>
<outputs>
<output name='output' feed='cleansedEmailFeed' instance='now(0,0)'>
</output>
</outputs>
<workflow name='emailCleanseWorkflow' version='pig-0.13.0' engine='pig' path='/user/ambari-qa/falcon/demo/apps/pig/id.pig'/>
<retry policy='exp-backoff' delay='minutes(3)' attempts='3'/>
<ACL owner='ambari-qa' group='users' permission='0755'/>
</process>
Logs output:

[root@anafalcon0 ~]# falcon instance -type feed -name rawEmailFeed -logs
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status RunID Log
-----------------------------------------------------------------------------------------------
Additional Information:
Response: default/STATUS
Request Id: default/1477825851@qtp-1435229983-77 - f7fb75d8-4221-4481-b333-d4b25d105c02
[root@anafalcon0 ~]# falcon instance -type feed -name cleansedEmailFeed -logs
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status RunID Log
-----------------------------------------------------------------------------------------------
2017-02-24T21:59Z backupCluster primaryCluster WAITING latest -
Additional Information:
Response: default/STATUS
Request Id: default/1477825851@qtp-1435229983-77 - 2c22df87-3e68-4af6-b483-3dba6ae85bb3
[root@anafalcon0 ~]# falcon instance -type process -name cleanseEmailProcess -logs
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status RunID Log
-----------------------------------------------------------------------------------------------
2017-02-24T21:59Z primaryCluster - WAITING latest -
2017-02-24T21:59Z primaryCluster - WAITING latest -
2017-02-24T20:59Z primaryCluster - WAITING latest -
2017-02-24T19:59Z primaryCluster - WAITING latest -
2017-02-24T18:59Z primaryCluster - WAITING latest -
2017-02-24T17:59Z primaryCluster - WAITING latest -
Additional Information:
Response: default/STATUS
Request Id: default/1477825851@qtp-1435229983-77 - ecdd07db-ca6c-42a6-b2cd-c0ac02faa116
[root@anafalcon0 ~]# falcon instance -type process -name rawEmailIngestProcess -logs
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status RunID Log
-----------------------------------------------------------------------------------------------
2017-02-24T21:59Z primaryCluster - SUCCEEDED latest http://anafalcon0.test.com:50070/data/apps/falcon/primaryCluster/staging/falcon/workflows/process/rawEmailIngestProcess/logs/job-2017-02-24-21-59/000/oozie.log
2017-02-24T20:59Z primaryCluster - SUCCEEDED latest http://anafalcon0.test.com:50070/data/apps/falcon/primaryCluster/staging/falcon/workflows/process/rawEmailIngestProcess/logs/job-2017-02-24-20-59/000/oozie.log
2017-02-24T19:59Z primaryCluster - SUCCEEDED latest http://anafalcon0.test.com:50070/data/apps/falcon/primaryCluster/staging/falcon/workflows/process/rawEmailIngestProcess/logs/job-2017-02-24-19-59/000/oozie.log
2017-02-24T18:59Z primaryCluster - SUCCEEDED latest http://anafalcon0.test.com:50070/data/apps/falcon/primaryCluster/staging/falcon/workflows/process/rawEmailIngestProcess/logs/job-2017-02-24-18-59/000/oozie.log
2017-02-24T17:59Z primaryCluster - SUCCEEDED latest http://anafalcon0.test.com:50070/data/apps/falcon/primaryCluster/staging/falcon/workflows/process/rawEmailIngestProcess/logs/job-2017-02-24-17-59/001/oozie.log
Additional Information:
Response: default/STATUS
Request Id: default/1477825851@qtp-1435229983-77 - 29845518-e2ac-4363-a9ab-9a0fea5f60e2
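In case it's useful, here's how I've been checking whether the input instances the cleanse process is waiting for actually exist on HDFS, and what Oozie reports as missing. This is just a rough sketch: the hour in the path is one example instance from the runs above, and the coordinator action ID is a placeholder.

# does the rawEmailFeed instance directory (and its _success availability flag) exist for a given hour?
hdfs dfs -ls /user/ambari-qa/falcon/demo/primary/input/enron/2017-02-24-18
# find the Falcon-generated coordinators, then look at a WAITING action's "Missing Dependencies"
oozie jobs -jobtype coordinator -filter status=RUNNING
oozie job -info <coordinator-action-id>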
Appreciate any help! Thanks in advance!
Labels:
- Apache Falcon
02-22-2017
05:45 PM
Hm, it's hard to tell. I doubt it's any Hadoop-specific configuration, because it's fast on the NameNode machine and you would have the same *-site.xml files on the others. Are those other cluster members VMs, or still physical? Do they all consistently behave the same way? What else is running on them? Also have a look at whether those other cluster machines are busy, e.g. whether they have enough free memory to run the JVM.
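For example, something along these lines on one of the slow machines (rough commands, nothing HDP-specific):

# check free memory and current load on the node
free -m
uptime
# see what is actually consuming resources
top -bn1 | head -20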
02-22-2017
04:56 PM
Hi @Silvio del Val. Is communication performance between the other nodes and the NameNode good in general? Try testing how quickly ping works, for example. Those commands contact the NameNode for the information you request, so I'm thinking there might be a problem with your network performance.
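For example, from one of the slow nodes (rough commands, substitute your actual NameNode hostname):

# round-trip time to the NameNode
ping -c 5 your-namenode-host
# time a simple metadata-only operation and compare with the same command run on the NameNode machine
time hdfs dfs -ls /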
11-22-2016
10:32 AM
1 Kudo
Hi @J. D. Bacolod - please see this article I wrote a while ago, which explains how Ranger policies work in relation to HDFS permissions: https://community.hortonworks.com/content/kbentry/49177/how-do-ranger-policies-work-in-relation-to-hdfs-po.html

From HDP 2.5, there is also the option to deny access explicitly via a Deny policy. See this article on how to enable these: https://community.hortonworks.com/content/kbentry/61208/how-to-enable-deny-conditions-and-excludes-in-rang.html

Hope this helps!
11-09-2016
10:02 AM
@vamsi valiveti the jar file is from the XML SerDe created by the community, available on GitHub: https://github.com/dvasilen/Hive-XML-SerDe
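For what it's worth, usage looks roughly like this once the jar is available on the node. This is only a sketch: the jar file name, table, columns and XPaths are made up for illustration, and the class names and property keys are taken from that project's README.

# register the SerDe jar and create a table over XML records (illustrative names and paths)
hive -e "
ADD JAR /tmp/hivexmlserde-1.0.5.3.jar;
CREATE TABLE xml_books (title STRING, author STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  'column.xpath.title'='/book/title/text()',
  'column.xpath.author'='/book/author/text()'
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  'xmlinput.start'='<book',
  'xmlinput.end'='</book>'
);"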
10-20-2016
04:50 PM
Hi @Jasper, sorry, I just spotted this comment. That's interesting - I used the same technique for all of them. Did you get it working since then? Since you will have downloaded it in the first curl, I'm guessing the URL is right. Silly question, but is the Hive plugin enabled?
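A quick way to check from the HiveServer2 node, if it helps (a rough sketch, assuming a standard HDP config layout):

# if the Ranger Hive plugin is enabled, HiveServer2 should be using the Ranger authorizer
grep -A1 hive.security.authorization.manager /etc/hive/conf/conf.server/hiveserver2-site.xml
# expect to see: org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory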
10-12-2016
04:59 PM
13 Kudos
In HDP 2.5, RANGER-606 introduced the ability to explicitly deny access to a Hadoop resource via a Ranger policy.
RANGER-876 makes these deny policies optional: they are disabled by default for all repository types except tag-based policies. To enable them, you must set enableDenyAndExceptionsInPolicies to true in the Service Definition of each Ranger repository type via the REST API, as below:

{
"name": "hdfs",
"description": "HDFS Repository",
"options": {
"enableDenyAndExceptionsInPolicies": "true"
}
}

How To

If deny policies are not enabled, the Ranger “Create Policy” UI will look like this:

1. Get the current service definition of the desired repository via a curl command and output it to a file:

curl -u admin:admin ranger-admin-host.hortonworks.com:6080/service/public/v2/api/servicedef/1 > hdfs.json

It should look something like this:

{"id":1,"guid":"0d047247-bafe-4cf8-8e9b-d5d377284b2d","isEnabled":true,"createTime":1476173228000,"updateTime":1476173228000,"version":1,"name":"hdfs","implClass":"org.apache.ranger.services.hdfs.RangerServiceHdfs","label":"HDFS Repository","description":"HDFS Repository","options":{},"configs":[{"itemId":1,"name":"username","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Username"},{"itemId":2,"name":"password","type":"password","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Password"},{"itemId":3,"name":"fs.default.name","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Namenode URL"},{"itemId":4,"name":"hadoop.security.authorization","type":"bool","subType":"YesTrue:NoFalse","mandatory":true,"defaultValue":"false","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authorization Enabled"},{"itemId":5,"name":"hadoop.security.authentication","type":"enum","subType":"authnType","mandatory":true,"defaultValue":"simple","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authentication Type"},{"itemId":6,"name":"hadoop.security.auth_to_local","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":7,"name":"dfs.datanode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":8,"name":"dfs.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":9,"name":"dfs.secondary.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":10,"name":"hadoop.rpc.protection","type":"enum","subType":"rpcProtection","mandatory":false,"defaultValue":"authentication","validationRegEx":"","validationMessage":"","uiHint":"","label":"RPC Protection Type"},{"itemId":11,"name":"commonNameForCertificate","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Common Name for Certificate"}],"resources":[{"itemId":1,"name":"path","type":"path","level":10,"mandatory":true,"lookupSupported":true,"recursiveSupported":true,"excludesSupported":false,"matcher":"org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher","matcherOptions":{"wildCard":"true","ignoreCase":"false"},"validationRegEx":"","validationMessage":"","uiHint":"","label":"Resource Path","description":"HDFS file or directory path"}],"accessTypes":[{"itemId":1,"name":"read","label":"Read","impliedGrants":[]},{"itemId":2,"name":"write","label":"Write","impliedGrants":[]},{"itemId":3,"name":"execute","label":"Execute","impliedGrants":[]}],"policyConditions":[],"contextEnrichers":[],"enums":[{"itemId":1,"name":"authnType","elements":[{"itemId":1,"name":"simple","label":"Simple"},{"itemId":2,"name":"kerberos","label":"Kerberos"}],"defaultIndex":0},{"itemId":2,"name":"rpcProtection","elements":[{"itemId":1,"name":"authentication","label":"Authentication"},{"itemId":2,"name":"integrity","label":"Integrity"},{"itemId":3,"name":"privacy","label":"Privacy"}],"defaultIndex":0}],"dataMaskDef":{"maskTypes":[],"accessTypes":[],"resources":[]},"rowFilterDef":{"accessTypes":[],"resources":[]}}

2. Update the file to add "options":{"enableDenyAndExceptionsInPolicies":"true"}:

{"id":1,"guid":"0d047247-bafe-4cf8-8e9b-d5d377284b2d","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1476173228000,"updateTime":1476287031622,"version":2,"name":"hdfs","implClass":"org.apache.ranger.services.hdfs.RangerServiceHdfs","label":"HDFS Repository","description":"HDFS Repository","options":{"enableDenyAndExceptionsInPolicies":"true"},"configs":[{"itemId":1,"name":"username","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Username"},{"itemId":2,"name":"password","type":"password","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Password"},{"itemId":3,"name":"fs.default.name","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Namenode URL"},{"itemId":4,"name":"hadoop.security.authorization","type":"bool","subType":"YesTrue:NoFalse","mandatory":true,"defaultValue":"false","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authorization Enabled"},{"itemId":5,"name":"hadoop.security.authentication","type":"enum","subType":"authnType","mandatory":true,"defaultValue":"simple","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authentication Type"},{"itemId":6,"name":"hadoop.security.auth_to_local","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":7,"name":"dfs.datanode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":8,"name":"dfs.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":9,"name":"dfs.secondary.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":10,"name":"hadoop.rpc.protection","type":"enum","subType":"rpcProtection","mandatory":false,"defaultValue":"authentication","validationRegEx":"","validationMessage":"","uiHint":"","label":"RPC Protection Type"},{"itemId":11,"name":"commonNameForCertificate","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Common Name for Certificate"}],"resources":[{"itemId":1,"name":"path","type":"path","level":10,"mandatory":true,"lookupSupported":true,"recursiveSupported":true,"excludesSupported":false,"matcher":"org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher","matcherOptions":{"wildCard":"true","ignoreCase":"false"},"validationRegEx":"","validationMessage":"","uiHint":"","label":"Resource Path","description":"HDFS file or directory path"}],"accessTypes":[{"itemId":1,"name":"read","label":"Read","impliedGrants":[]},{"itemId":2,"name":"write","label":"Write","impliedGrants":[]},{"itemId":3,"name":"execute","label":"Execute","impliedGrants":[]}],"policyConditions":[],"contextEnrichers":[],"enums":[{"itemId":1,"name":"authnType","elements":[{"itemId":1,"name":"simple","label":"Simple"},{"itemId":2,"name":"kerberos","label":"Kerberos"}],"defaultIndex":0},{"itemId":2,"name":"rpcProtection","elements":[{"itemId":1,"name":"authentication","label":"Authentication"},{"itemId":2,"name":"integrity","label":"Integrity"},{"itemId":3,"name":"privacy","label":"Privacy"}],"defaultIndex":0}],"dataMaskDef":{"maskTypes":[],"accessTypes":[],"resources":[]},"rowFilterDef":{"accessTypes":[],"resources":[]}}

3. Put the updated file back into the Service Definition:

curl -iv -u admin:admin -X PUT -H "Accept: application/json" -H "Content-Type: application/json" -d @hdfs.json ranger-admin-host.hortonworks.com:6080/service/public/v2/api/servicedef/1

If successful, the Ranger “Create Policy” UI will look like this:

4. Repeat for any other desired repository.
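As an aside, if jq is available, step 2 can be scripted rather than hand-editing the JSON. A rough sketch, reusing the file name from step 1:

# set the option and write the result to a new file, then PUT that file back as in step 3
jq '.options.enableDenyAndExceptionsInPolicies = "true"' hdfs.json > hdfs-deny.json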
References
- Apache Ranger Wiki: Deny Conditions and Excludes in Ranger Policies
- Apache Ranger Wiki: REST APIs for Service Definition, Service and Policy Management
08-09-2016
12:05 PM
Do you use Ambari? If so, there should be no problems: the service user information is recorded in Ambari and carries through upgrades. If you don't, you probably just have to make sure you copy across any relevant configuration parameters, remember to start your services as that user, and check that HDFS permissions are correct as part of the post-upgrade validation. As you say, custom users are fully supported, so you shouldn't experience any problems.
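For example, post-upgrade spot checks might look something like this (rough commands; adjust the process names and paths to your environment):

# confirm the services are running as the expected custom user
ps -ef | grep -i namenode | grep -v grep
ps -ef | grep -i hiveserver2 | grep -v grep
# confirm ownership of the usual HDFS directories
hdfs dfs -ls /user
hdfs dfs -ls /apps/hive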
08-09-2016
11:58 AM
Hi @Avijeet Dash - audit to Solr is now enabled by default. You can go into Ranger Configs in Ambari and disable it using the slider button as in the screenshot below:
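If you prefer to check or change it outside the UI, the slider corresponds (as far as I recall for HDP 2.x Ranger plugins) to the xasecure.audit.destination.solr property in each plugin's ranger-*-audit configuration, for example:

# see where audit-to-Solr is currently switched on
grep -r xasecure.audit.destination.solr /etc/hadoop/conf /etc/hive/conf 2>/dev/null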
08-08-2016
03:57 PM
1 Kudo
Hi @Raghu Udiyar - that BUG ID is a Hortonworks-internal ID which corresponds to the Apache JIRA HIVE-10500. That fix has been backported to earlier Hive versions in the HDP distribution, so it is included in HDP 2.2.6 and higher.
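If it's useful, you can confirm which HDP stack version a node is actually running with, for example:

# list the installed HDP versions on the node
hdp-select versions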