Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4023 | 08-20-2018 08:26 PM |
| | 1929 | 08-15-2018 01:59 PM |
| | 2359 | 08-13-2018 02:20 PM |
| | 4072 | 07-23-2018 04:37 PM |
| | 4990 | 07-19-2018 12:52 PM |
05-06-2016
05:31 AM
Yes, Cloudbreak can install a cluster without an internet connection if you set up a local repo, skip SSSD, and live with the limitations of recipes. But you still have to install Cloudbreak itself somehow, which is impossible without an internet connection. The easiest way to install and manage a Cloudbreak installation is to use the Cloudbreak deployer, but some of its commands download content, such as init, upgrade, version, and doctor. So first install the deployer on a host with internet access; for example, create an AMI that contains an already-installed deployer and Cloudbreak, and then start an instance of that AMI inside the secured network.
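A minimal sketch of that approach, assuming an AWS environment; the instance id and AMI name below are hypothetical, and the deployer binary (cbd) is assumed to be already downloaded onto the builder host:
# On a builder host that HAS internet access, run the deployer commands that
# download content (init, version, doctor are the ones called out above).
cbd init
cbd version
cbd doctor
# Bake the provisioned builder host into an AMI (instance id is hypothetical).
aws ec2 create-image --instance-id i-0123456789abcdef0 --name cloudbreak-offline-base
# In the secured network, launch instances from that AMI and point the cluster
# installs at your local HDP repositories.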
05-03-2016
10:18 AM
Could you add ansible.verbose = "vvv" to /Users/smanjee/Documents/Hortonworks/Products/incubator-metron-Metron_0.1BETA_rc7/deployment/vagrant/singlenode-vagrant/Vagrantfile between lines 61-62, re-run, and re-attach ansible.log? Also, please run the following from /Users/smanjee/Documents/Hortonworks/Products/incubator-metron-Metron_0.1BETA_rc7/deployment/vagrant/singlenode-vagrant and attach the output.
# is this a git repo?
IS_GIT_REPO=`git rev-parse --is-inside-work-tree`
if [ "$IS_GIT_REPO" == "true" ]; then
# current branch
echo "--"
git branch | grep "*"
# last commit
echo "--"
git log -n 1
# local changes since last commit
echo "--"
git diff --stat
fi
# ansible
echo "--"
ansible --version
# vagrant
echo "--"
vagrant --version
# python
echo "--"
python --version 2>&1
# maven
echo "--"
mvn --version
# operating system
echo "--"
uname -a
Thanks! -David
05-01-2016
04:54 PM
There is a JIRA in the community to allow an NFS location to be used as input/output for a process (FALCON-1785).
05-01-2016
12:19 AM
I got distcp working between the HDP sandbox and an Isilon-based HDP cluster in both directions. If you want to use Falcon to back up from a non-Isilon HDP cluster to an Isilon-based HDP cluster, that should work as well, since it is distcp based.
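For reference, a minimal sketch of the copy in each direction; the hostnames and paths below are hypothetical:
# Sandbox -> Isilon-backed cluster:
hadoop distcp hdfs://sandbox.example.com:8020/data/backup hdfs://isilon.example.com:8020/data/backup
# Isilon-backed cluster -> sandbox:
hadoop distcp hdfs://isilon.example.com:8020/data/backup hdfs://sandbox.example.com:8020/data/restore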
04-28-2016
03:04 PM
3 Kudos
It does not need to be on the source server; the processor listens on the configured port for incoming connections. A syslog server can be configured to forward messages to the host and port that the processor is listening on. Once it accepts connections, it reads as fast as possible. The main properties to consider customizing are Max Batch Size, Max Size of Socket Buffer, and Max Size of Message Queue, but this depends a lot on your environment and the amount of data.
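As a sketch of the forwarding side, assuming rsyslog on the source server and a hypothetical NiFi host and listening port:
# "@@" forwards over TCP in rsyslog; a single "@" would use UDP instead.
echo '*.* @@nifi-host.example.com:5140' | sudo tee /etc/rsyslog.d/90-nifi.conf
sudo systemctl restart rsyslog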
04-27-2016
03:05 AM
Falcon by default comes with authorization turned off. To turn it on, set the following through the Ambari Falcon config:
*.falcon.security.authorization.enabled = true
*.falcon.security.authorization.superusergroup = <linux group>
In my example I am using the linux group "users"; the users oozie, ambari-qa, tez, falcon, hue, and guest are in that group. The purpose of this setup is to allow only users within the group to view, edit, and delete each other's material; any user outside the group should not have access.
Now, logging in as falcon, who is part of the "users" group: the falcon user has created a cluster "authTest" and a feed "feed1". Viewing them, falcon can see both the feed and the cluster.
Next, log in as hdfs, who is NOT part of the "users" group, and do a simple search for every cluster/feed entity that exists in Falcon. The hdfs user's search returns nothing, since that user is not allowed.
Finally, log in as tez, who IS part of the "users" group, and run the same search. As you can see, tez is able to view what the falcon user created, since they are part of the same group; hdfs was not, since it is not in that group.
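A quick way to reproduce the same check from a shell; this assumes the Falcon CLI is on the path, and the users and entity names match the example above:
# Confirm group membership (the superuser group in this example is "users").
id -Gn tez
id -Gn hdfs
# List feed entities as each user; tez (in "users") sees feed1, hdfs does not.
sudo -u tez  falcon entity -type feed -list
sudo -u hdfs falcon entity -type feed -list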
05-06-2016
03:29 AM
3 Kudos
@Sunile Manjee The supported policies for late data handling are:
backoff: take the maximum late cut-off and check every specified time.
exp-backoff (default, recommended): take the maximum late cut-off and check on an exponentially determined schedule.
final: take the maximum late cut-off and check once.
For example, a late cut-off of hours(8) means data can be delayed by up to 8 hours:
<late-arrival cut-off="hours(8)"/>
Then, late input in the following process specification is handled by the /apps/myapp/latehandle workflow:
<late-process policy="exp-backoff" delay="hours(2)">
    <late-input input="input" workflow-path="/apps/myapp/latehandle" />
</late-process>
This means the late workflow will be retried for up to 8 hours until the feed arrives; once the feed arrives within that window, the window is reset. Inside /apps/myapp/latehandle you can put your own logic (it may be a Sqoop, Hive, or shell action, etc.); the processing there determines what happens to the late feed. For simple scenarios you can re-run the actual workflow, or modify it into a special workflow that handles the dependencies and boundary cases. Thanks
04-26-2016
07:59 PM
@Sunile Manjee The dependencies are derived from the entity definitions once you create those entities in Falcon (UI or CLI). For example, you define your cluster in the cluster entity XML and give it a name:
<cluster colo="location1" description="primaryDemoCluster" name="primaryCluster" xmlns="uri:falcon:cluster:0.1">
When you reference this cluster in a feed entity, the dependency gets created when you create the feed entity:
<feed description="Demo Input Data" name="demoEventData" xmlns="uri:falcon:feed:0.1">
    <tags>externalSystem=eventData,classification=clinicalResearch</tags>
    <groups>events</groups>
    <frequency>minutes(3)</frequency>
    <timezone>GMT+00:00</timezone>
    <late-arrival cut-off="hours(4)"/>
    <clusters>
        <cluster name="primaryCluster" type="source">
            <validity start="2015-08-10T08:00Z" end="2016-02-08T22:00Z"/>
            <retention limit="days(5)" action="delete"/>
        </cluster>
    </clusters>
The same concept applies to process-to-feed dependencies. Take a look at this example for a working set of Falcon entities: https://github.com/sainib/hadoop-data-pipeline/tree/master/falcon
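Once the entities above are submitted, you can also ask Falcon for the derived dependencies directly; a sketch, assuming the Falcon CLI is on the path and using the entity names from the example above:
# Show what the feed defined above depends on (the cluster) and what depends on it.
falcon entity -type feed -name demoEventData -dependency
# Walk the dependency graph from the cluster side as well.
falcon entity -type cluster -name primaryCluster -dependency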
05-24-2016
05:03 PM
Is this working in practice?
04-26-2016
07:59 PM
Really weird, but I'm glad you got it worked out. Cheers