Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4023 | 08-20-2018 08:26 PM |
| | 1929 | 08-15-2018 01:59 PM |
| | 2359 | 08-13-2018 02:20 PM |
| | 4072 | 07-23-2018 04:37 PM |
| | 4990 | 07-19-2018 12:52 PM |
05-06-2016
05:31 AM
Yes, Cloudbreak can install a cluster without an internet connection if you set up a local repo, skip SSSD, and live with the limitations of recipes. But you still have to install Cloudbreak itself somehow, which is impossible without an internet connection. The easiest way to install and manage a Cloudbreak installation is to use the Cloudbreak deployer, but some of its commands download content, such as init, upgrade, version, and doctor. So first install the deployer on a host with internet access; for example, create an AMI that contains an already-installed deployer and Cloudbreak, and then start an instance of that AMI inside the secured network.
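A minimal sketch of that approach, assuming an AWS environment; the instance id and AMI name below are hypothetical, and the deployer binary (cbd) is assumed to be already downloaded onto the builder host:
# On a builder host that HAS internet access, run the deployer commands that
# download content (init, version, doctor are the ones called out above).
cbd init
cbd version
cbd doctor
# Bake the provisioned builder host into an AMI (instance id is hypothetical).
aws ec2 create-image --instance-id i-0123456789abcdef0 --name cloudbreak-offline-base
# In the secured network, launch instances from that AMI and point the cluster
# installs at your local HDP repositories.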
05-03-2016
10:18 AM
Could you add ansible.verbose = "vvv" to /Users/smanjee/Documents/Hortonworks/Products/incubator-metron-Metron_0.1BETA_rc7/deployment/vagrant/singlenode-vagrant/Vagrantfile between lines 61-62, re-run, and re-attach ansible.log? Also, please run the following from /Users/smanjee/Documents/Hortonworks/Products/incubator-metron-Metron_0.1BETA_rc7/deployment/vagrant/singlenode-vagrant and attach the output.
# is this a git repo?
IS_GIT_REPO=`git rev-parse --is-inside-work-tree`
if [ "$IS_GIT_REPO" == "true" ]; then
# current branch
echo "--"
git branch | grep "*"
# last commit
echo "--"
git log -n 1
# local changes since last commit
echo "--"
git diff --stat
fi
# ansible
echo "--"
ansible --version
# vagrant
echo "--"
vagrant --version
# python
echo "--"
python --version 2>&1
# maven
echo "--"
mvn --version
# operating system
echo "--"
uname -a
Thanks! -David
05-01-2016
04:54 PM
There is a JIRA in the community to allow an NFS location to be used as input/output for a process (FALCON-1785).
05-01-2016
12:19 AM
I got distcp working between the HDP sandbox and an Isilon-based HDP cluster in both directions. If you want to use Falcon to back up from a non-Isilon HDP cluster to an Isilon-based HDP cluster, that should work as well, since it is distcp based.
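For reference, a minimal sketch of the copy in each direction; the hostnames and paths below are hypothetical:
# Sandbox -> Isilon-backed cluster:
hadoop distcp hdfs://sandbox.example.com:8020/data/backup hdfs://isilon.example.com:8020/data/backup
# Isilon-backed cluster -> sandbox:
hadoop distcp hdfs://isilon.example.com:8020/data/backup hdfs://sandbox.example.com:8020/data/restore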
04-28-2016
03:04 PM
3 Kudos
It does not need to be on the source server; the processor listens on the configured port for incoming connections. A syslog server can be configured to forward messages to the host and port that the processor is listening on. Once it accepts connections, it reads as fast as possible. The main properties to consider customizing are Max Batch Size, Max Size of Socket Buffer, and Max Size of Message Queue, but this depends a lot on your environment and the amount of data.
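As a sketch of the forwarding side, assuming rsyslog on the source server and a hypothetical NiFi host and listening port:
# "@@" forwards over TCP in rsyslog; a single "@" would use UDP instead.
echo '*.* @@nifi-host.example.com:5140' | sudo tee /etc/rsyslog.d/90-nifi.conf
sudo systemctl restart rsyslog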
04-27-2016
03:05 AM
Falcon by default comes with authorization turned off. To turn it on, set the following through the Ambari Falcon config:
*.falcon.security.authorization.enabled = true
*.falcon.security.authorization.superusergroup = <linux group>
In my example I am using the linux group "users"; the users oozie, ambari-qa, tez, falcon, hue, and guest are in that group. The purpose of this setup is to allow only users within the group to view, edit, and delete each other's material; any user outside the group should not have access.
Now, logging in as falcon, who is part of the "users" group: the falcon user has created a cluster "authTest" and a feed "feed1". Viewing them, falcon can see both the feed and the cluster.
Next, log in as hdfs, who is NOT part of the "users" group, and do a simple search for every cluster/feed entity that exists in Falcon. The hdfs user's search returns nothing, since that user is not allowed.
Finally, log in as tez, who IS part of the "users" group, and run the same search. As you can see, tez is able to view what the falcon user created, since they are part of the same group; hdfs was not, since it is not in that group.
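A quick way to reproduce the same check from a shell; this assumes the Falcon CLI is on the path, and the users and entity names match the example above:
# Confirm group membership (the superuser group in this example is "users").
id -Gn tez
id -Gn hdfs
# List feed entities as each user; tez (in "users") sees feed1, hdfs does not.
sudo -u tez  falcon entity -type feed -list
sudo -u hdfs falcon entity -type feed -list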
05-06-2016
03:29 AM
3 Kudos
@Sunile Manjee The supported policies for late data handling are:
backoff: take the maximum late cut-off and check every specified time.
exp-backoff (default, recommended): take the maximum late cut-off and check on an exponentially determined schedule.
final: take the maximum late cut-off and check once.
For example, a late cut-off of hours(8) means data can be delayed by up to 8 hours:
<late-arrival cut-off="hours(8)"/>
Then, late input in the following process specification is handled by the /apps/myapp/latehandle workflow:
<late-process policy="exp-backoff" delay="hours(2)">
    <late-input input="input" workflow-path="/apps/myapp/latehandle" />
</late-process>
This means the late workflow will be retried for up to 8 hours until the feed arrives; once the feed arrives within that window, the window is reset. Inside /apps/myapp/latehandle you can put your own logic (it may be a Sqoop, Hive, or shell action, etc.); the processing there determines what happens to the late feed. For simple scenarios you can re-run the actual workflow, or modify it into a special workflow that handles the dependencies and boundary cases. Thanks
04-26-2016
07:59 PM
@Sunile Manjee The dependencies are derived from the entity definitions once you create those entities in Falcon (UI or CLI). For example, you define your cluster in the cluster entity XML and give it a name:
<cluster colo="location1" description="primaryDemoCluster" name="primaryCluster" xmlns="uri:falcon:cluster:0.1">
When you reference this cluster in a feed entity, the dependency gets created when you create the feed entity:
<feed description="Demo Input Data" name="demoEventData" xmlns="uri:falcon:feed:0.1">
    <tags>externalSystem=eventData,classification=clinicalResearch</tags>
    <groups>events</groups>
    <frequency>minutes(3)</frequency>
    <timezone>GMT+00:00</timezone>
    <late-arrival cut-off="hours(4)"/>
    <clusters>
        <cluster name="primaryCluster" type="source">
            <validity start="2015-08-10T08:00Z" end="2016-02-08T22:00Z"/>
            <retention limit="days(5)" action="delete"/>
        </cluster>
    </clusters>
The same concept applies to process-to-feed dependencies. Take a look at this example for a working set of Falcon entities: https://github.com/sainib/hadoop-data-pipeline/tree/master/falcon
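Once the entities above are submitted, you can also ask Falcon for the derived dependencies directly; a sketch, assuming the Falcon CLI is on the path and using the entity names from the example above:
# Show what the feed defined above depends on (the cluster) and what depends on it.
falcon entity -type feed -name demoEventData -dependency
# Walk the dependency graph from the cluster side as well.
falcon entity -type cluster -name primaryCluster -dependency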
05-24-2016
05:03 PM
Is this working in practice?
04-26-2016
07:59 PM
Really weird, but I'm glad you got it worked out. Cheers