Member since: 04-27-2016
Posts: 218
Kudos Received: 133
Solutions: 25
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1938 | 08-31-2017 03:34 PM |
 | 4224 | 02-08-2017 03:17 AM |
 | 1528 | 01-24-2017 03:37 AM |
 | 6523 | 01-19-2017 03:57 AM |
 | 3699 | 01-17-2017 09:51 PM |
01-22-2019
04:45 PM
Ananya, the script was updated a while back to take care of this. You should be able to use an existing VPC and subnet. The only issue you might face is if an internet gateway is already attached to the VPC, since the script prefers to add a new internet gateway.
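As a quick sanity check before reusing a VPC, here is a minimal sketch (Python/boto3; the region and VPC ID are placeholders, and this is not part of the quickstart script) that reports whether an internet gateway is already attached:

import boto3

def vpc_has_internet_gateway(vpc_id, region="us-east-1"):
    # List internet gateways whose attachment points at the given VPC.
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_internet_gateways(
        Filters=[{"Name": "attachment.vpc-id", "Values": [vpc_id]}]
    )
    return len(resp["InternetGateways"]) > 0

# Example: print(vpc_has_internet_gateway("vpc-0123456789abcdef0"))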
12-14-2018
10:59 PM
4 Kudos
Overview
There are many useful articles, and the official Cloudbreak documentation covers everything in great depth. This short article walks you through how to deploy a Cloudbreak instance within an existing VPC and subnet using the AWS Quickstart deployment.
Cloudbreak deployment options
The Cloudbreak deployment options are explained in detail here. Looking at the AWS-specific networking options, the Quickstart creates a new VPC by default, and the production install is recommended for a custom VPC. If you are doing a PoC and want to quickly try the Quickstart option while still using an existing VPC, you can do that by enhancing the CloudFormation template, as described in the next section.
CloudFormation template changes
When you launch the CloudFormation template for the AWS Quickstart, it selects the existing CloudFormation template https://s3.amazonaws.com/cbd-quickstart/cbd-quickstart-2.7.0.template by default.
Instead of using the default template, use the following template:
https://github.com/mpandithw/cloudbreak/blob/master/CloudFormation_aws_QuickStart-Template
The main change to the original template is the addition of the following two parameters, VpcId and SubnetId:
"VpcId": {
"Type": "AWS::EC2::VPC::Id",
"Description": "VpcId of your existing Virtual Private
Cloud (VPC)"
},"SubnetId": {
"Type": "AWS::EC2::Subnet::Id",
"Description": "SubnetId of your existing Virtual Private
Cloud (VPC)"
}
I am not walking through all the detailed steps, which are already covered in the Cloudbreak documentation. The only modification to the original process is to select your own CloudFormation template, as described above. You will then get drop-down lists of your existing VPCs and subnets. Complete the rest of the process as explained in the Cloudbreak documentation.
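If you prefer to script the stack launch instead of using the console, here is a minimal sketch using Python and boto3 (the stack name, VPC/subnet IDs, and template path are placeholders, and the remaining Quickstart parameters still need to be supplied):

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")  # pick your region
with open("CloudFormation_aws_QuickStart-Template") as f:      # the modified template downloaded locally
    template_body = f.read()

cfn.create_stack(
    StackName="cbd-quickstart-existing-vpc",
    TemplateBody=template_body,
    Parameters=[
        {"ParameterKey": "VpcId", "ParameterValue": "vpc-0123456789abcdef0"},        # existing VPC
        {"ParameterKey": "SubnetId", "ParameterValue": "subnet-0123456789abcdef0"},  # existing subnet
        # ...plus the other parameters the Quickstart template expects (key pair, email, etc.)
    ],
    Capabilities=["CAPABILITY_IAM"],  # needed if the template creates IAM resources
)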
Benefits
You can use the AWS Quickstart deployment of Cloudbreak within your existing VPC/subnet.
Document References
AWS CloudFormation User Guide
- Find more articles tagged with:
- aws
- Cloudbreak
- cloudformation
- How-ToTutorial
- QuickStart
- Sandbox & Learning
- vpc
04-09-2018
08:52 PM
5 Kudos
This is the continuation of the Part 1 article on provisioning HDP/HDF clusters on Google Cloud. Now that we have the Google credential created, we can provision the HDP/HDF cluster. Let's first start with an HDP cluster.
Log in to the Cloudbreak UI and click Create Cluster, which opens the create cluster wizard with both basic and advanced options. On the General Configuration page, select the previously created Google credential, enter a name for the cluster, select a region as shown below, and select either the HDP or HDF version. For the cluster type, select the appropriate cluster blueprint based on your requirements. The blueprint options available in the Cloudbreak 2.5 tech preview are shown below.
Next, configure the hardware and storage. Select the Google VM instance type from the dropdown and enter the number of instances for each group. You must select one node for the Ambari server: for one of the host groups, the Group Size should be set to "1".
Next, set up the network. You can select an existing network, or you have the option to create a new network.
On the Security configuration page, provide the cluster admin username and password. Select either the new SSH public key option or the existing SSH public key option; you will use the matching private key to access your nodes via SSH.
Finally, click Create Cluster, which redirects you to the Cloudbreak dashboard. The left image below shows the cluster creation in progress, and the right image shows the successful creation of the HDP cluster on Google Cloud.
Once the HDP cluster is deployed successfully, you can log in to the HDP nodes using your SSH private key with the tool of your choice. The following image shows a node login using the Google Cloud browser option; a scripted alternative is sketched at the end of this article.
Similarly, you can provision an HDF (NiFi: Flow Management) cluster using Cloudbreak, which is included as part of the 2.5 tech preview. Following are some key screenshots for reference. The network, storage, and security configuration is similar to what we saw in the HDP section earlier. Due to a limitation of my Google Cloud account subscription, I ran into an exception while creating the HDF cluster, which was correctly surfaced in Cloudbreak; I had to select a different region to resolve it. The NiFi cluster was then created successfully, as shown below.
Conclusion: Cloudbreak provides an easy button to provision and monitor the connected data platform (HDP and HDF) on the cloud vendor of your choice to build modern data applications.
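As referenced above, here is a minimal sketch (Python/paramiko) of logging in to a provisioned node with your SSH private key instead of the browser option; the hostname, key path, and the "cloudbreak" login user are assumptions, not something prescribed by Cloudbreak:

import paramiko

# Load the private key matching the public key supplied in the Security configuration step.
key = paramiko.RSAKey.from_private_key_file("/path/to/your-private-key.pem")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("node-public-ip-or-hostname", username="cloudbreak", pkey=key)  # assumed login user

stdin, stdout, stderr = client.exec_command("hostname -f")  # simple smoke test
print(stdout.read().decode())
client.close()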
- Find more articles tagged with:
- Cloud & Operations
- Cloudbreak
- FAQ
- gcp
- how-to-tutorial
- How-ToTutorial
04-09-2018
05:42 PM
5 Kudos
Cloudbreak Overview
Overview
Cloudbreak enables enterprises to provision Hortonworks platforms in public (AWS, GCP, Azure) and private (OpenStack) cloud environments. It simplifies the provisioning, management, and monitoring of on-demand HDP and HDF clusters in virtual and cloud environments. The primary use cases for Cloudbreak are:
- Dynamically configure and manage clusters on public or private clouds.
- Seamlessly manage elasticity requirements as cluster workloads change.
- Support configurations that define network boundaries and configure security groups.
This article focuses on deploying HDP and HDF clusters on Google Cloud.
Cloudbreak Benefits
You can spin up the connected data platform (HDP and HDF clusters) on the cloud vendor of your choice using open source Cloudbreak 2.0, which addresses the following scenarios:
- Defining a comprehensive data strategy irrespective of deployment architecture (cloud or on-premise).
- Addressing hybrid (on-premise and cloud) requirements.
- Supporting the key multi-cloud approach requirements.
- Consistent and familiar security and governance across on-premise and cloud environments.
Cloudbreak 2 Enhancements
Hortonworks recently announced the general availability of the Cloudbreak 2.4 release. Some of the major enhancements in Cloudbreak 2.4 are:
- New UX/UI: a greatly simplified and streamlined user experience.
- New CLI: a new CLI that eases automation, an important capability for cloud DevOps.
- Custom Images: advanced support for “bring your own image”, a critical feature to meet enterprise infrastructure requirements.
- Kerberos: the ability to enable Kerberos security on your clusters, a must for any enterprise deployment.
You can check the following HCC article for a detailed overview of Cloudbreak 2.4: https://community.hortonworks.com/articles/174532/overview-of-cloudbreak-240.html
Also check the following article for the Cloudbreak 2.5 tech preview details: https://community.hortonworks.com/content/kbentry/182293/whats-new-in-cloudbreak-250-tp.html
Prerequisites for Google Cloud Platform
This article assumes that you have already installed and launched the Cloudbreak instance, either on your own custom VM image or on Google Cloud Platform. You can follow the Cloudbreak documentation, which describes both options:
https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.5.0/content/index.html
https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.5.0/content/gcp-launch/index.html
To launch Cloudbreak and provision clusters, make sure you have a Google Cloud account; you can create one at https://console.cloud.google.com.
- Create a new project in GCP (e.g. the GCPIntegration project, as shown below).
- To launch clusters on GCP, you must have a service account that Cloudbreak can use. Assign the admin roles for Compute Engine and Storage; you can check the required service account admin roles at Admin Roles.
- Make sure you create the P12 key and store it safely (a scripted alternative is sketched below).
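Here is a minimal sketch (google-api-python-client and google-auth; the service account email and output path are placeholders) of generating the P12 key programmatically, as an alternative to the console:

import base64
import google.auth
from googleapiclient import discovery

# Application Default Credentials must be available (e.g. via `gcloud auth application-default login`).
credentials, project_id = google.auth.default()
iam = discovery.build("iam", "v1", credentials=credentials)

# Create a P12-format key for the service account that Cloudbreak will use.
key = iam.projects().serviceAccounts().keys().create(
    name="projects/-/serviceAccounts/cloudbreak-sa@your-project.iam.gserviceaccount.com",  # placeholder email
    body={"privateKeyType": "TYPE_PKCS12_FILE"},
).execute()

# The key material is returned base64-encoded; save it as the .p12 file you upload to Cloudbreak.
with open("cloudbreak-gcp.p12", "wb") as f:
    f.write(base64.b64decode(key["privateKeyData"]))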
This article assumes that you have successfully met the prerequisites and are able to launch the Cloudbreak UI, as shown on the left below, by visiting https://<IP_Addr or HostName>. Upon successful login you are redirected to the dashboard, which looks like the image on the right.
Create Cloudbreak Credential for GCP
The first step before provisioning a cluster is to create the Cloudbreak credential for GCP. Cloudbreak uses this GCP credential to create the required resources on GCP. The steps to create the GCP credential are:
- In the Cloudbreak UI, select Credentials from the navigation pane and click Create Credential. Under cloud provider, select Google Cloud Platform.
- As shown below, provide the Google project ID and the service account email ID from the Google project, and upload the P12 key that you created in the section above.
Once you provide all the right details, Cloudbreak will create the GCP credential, and it should be displayed in the Credentials pane. The next article, Part 2, covers in detail how to provision HDP and HDF clusters using the GCP credential.
- Find more articles tagged with:
- Cloud & Operations
- Cloudbreak
- FAQ
- gcp
- how-to-tutorial
04-04-2018
01:34 PM
1 Kudo
Please confirm whether you tried deleting the flowfile repository at $nifi_home/flowfile_repository. Also take a backup of the flow.xml.gz file, delete it, and try again; the file is in your conf directory.
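A minimal sketch (Python; assumes the NIFI_HOME environment variable points at your NiFi install and NiFi is stopped) of backing up flow.xml.gz before removing it:

import os
import shutil

conf_dir = os.path.join(os.environ["NIFI_HOME"], "conf")
flow = os.path.join(conf_dir, "flow.xml.gz")

shutil.copy2(flow, flow + ".bak")  # keep a restorable copy next to the original
os.remove(flow)                    # remove only after the backup succeeds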
04-04-2018
12:55 PM
1 Kudo
Please tell us the Cloudbreak image location you used to import the image, and the value you set for the CB_LATEST_IMAGE variable.
04-04-2018
12:42 PM
Can you please confirm your classpath setting and make sure it's pointing to the correct version of NiFi.
04-04-2018
12:21 PM
1 Kudo
To define the right set of policies, it is important to understand how the Ranger policy engine evaluates them. Once the list of tags for the requested resource is found, the Apache Ranger policy engine evaluates the tag-based policies applicable to those tags:
1. If a policy for one of these tags results in deny, the access is denied.
2. If none of the tags are denied and a policy allows one of the tags, the access is allowed.
3. If there is no result for any tag, or if there are no tags for the resource, the policy engine evaluates the resource-based policies to make the authorization decision.
For masking: to exclude specific users/groups from column masking, create a policy item for those users/groups with ‘Unmasked’ as the masking option, and ensure that the policy item is the first one to appear in the list for those users/groups. I hope this helps.
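To make the evaluation order concrete, here is a minimal sketch in Python of the logic described above; the policy-store helpers are hypothetical, not actual Ranger APIs:

def authorize(resource_tags, tag_policies, resource_policies, request):
    """Mirror the tag-first evaluation order: deny wins, then allow, then fall back."""
    allowed_by_tag = False
    for tag in resource_tags:
        decision = tag_policies.evaluate(tag, request)   # hypothetical helper returning "DENY", "ALLOW", or None
        if decision == "DENY":
            return False                                 # 1. any tag deny ends evaluation
        if decision == "ALLOW":
            allowed_by_tag = True
    if allowed_by_tag:
        return True                                      # 2. no deny and at least one allow
    # 3. no result from tags (or no tags): resource-based policies decide
    return resource_policies.evaluate(request) == "ALLOW"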
12-21-2017
03:10 PM
You can achieve this with a single NiFi flow: get the CSV file using the GetFTP processor ---> PutHDFS processor.
12-21-2017
02:29 PM
NiFi does not replicate data. If you lose a node, the flow can be directed to an available node; flowfiles queued for the failed node will either wait until the node comes back up or must be manually sent to another working node. There is a feature proposal for this: https://cwiki.apache.org/confluence/display/NIFI/Data+Replication
12-21-2017
02:17 PM
What NiFi version are you using? You might be running into https://issues.apache.org/jira/browse/NIFI-516, which is already fixed. If you want to merge groups of 1400 flowfiles into a single file every time, then you should set Minimum Number of Entries to 1400. You can significantly lower your Maximum Number of Bins based on system resources. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.MergeContent/index.html
12-21-2017
01:56 PM
Verify your HDP/Ambari version to see if you are running into this issue. https://issues.apache.org/jira/browse/AMBARI-22005
09-05-2017
04:13 AM
Not sure if any additional configuration is required for beeline, as Hive View jobs are showing in Tez View.
08-31-2017
09:17 PM
1 Kudo
What's the parameter I should be setting in beeline so that I can view all the jobs in Tez View? If I use Hive View, I am able to see all the jobs.
Labels:
- Apache Ambari
- Apache Hive
- Apache Tez
08-31-2017
03:34 PM
It started working after I disabled vectorized execution mode: set hive.vectorized.execution.enabled = false;
08-31-2017
03:04 PM
Thanks @Sindhu, I tried that but got the same exception. The same query works fine if I turn on LLAP mode.
08-31-2017
12:47 AM
I am getting the following exception while running one of the TPCDS benchmarking queries in non-LLAP mode. Please advise. ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504136552198_0002_33_05, diagnostics=[Task failed, taskId=task_1504136552198_0002_33_05_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async Initialization failed. abortRequested=false
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:464)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:398)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:564)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:516)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
... 15 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashTable.allocateBucketArray(VectorMapJoinFastLongHashTable.java:265)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashTable.<init>(VectorMapJoinFastLongHashTable.java:279)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashMap.<init>(VectorMapJoinFastLongHashMap.java:113)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:115)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:108)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:315)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:187)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:183)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:91)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:108)
... 4 more
, errorMessage=Cannot recover from this error:java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async Initialization failed. abortRequested=false
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:464)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:398)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:564)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:516)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
... 15 more
Labels:
- Apache Hive
- Apache Tez
05-08-2017
10:00 PM
My Storm service is down; here is the screenshot from Ambari. I am getting a connection failed exception for all the services: Connection failed: [Errno 111] Connection refused to <host>:3772.
- Tags:
- Hadoop Core
- hdp-2.5.0
Labels:
- Hortonworks Data Platform (HDP)
04-21-2017
02:27 AM
@Adda Fuentes Looks like a similar issue. Can you please check which NAR file you have in the <nifi_home>/lib folder? e.g. nifi-hive-nar-1.1.0.2.1.2.0-10.nar
04-04-2017
08:35 PM
@jwitt PutSQL:
Insert into PCSOR (policy_num, prem_amount, LOB, AdminSystem, Name, DOB, Phone, Address) values ('123456','2000.00','PC','PC SOR','John Smith','1/1/1970','212-010-2345','316 Lincoln Rd. Brooklyn, NY 11225') ON DUPLICATE KEY UPDATE Address='316 Lincoln Rd. Brooklyn, NY 11225';
Sample Flow: Connection Pool.
04-04-2017
08:21 PM
1 Kudo
I am getting the following exception with the PutSQL processor: "2017-04-04 20:18:00,746 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutSQL PutSQL[id=aff94ebf-739d-1936-ffff-ffff9e38bbdf] Failed to update database due to a failed batch update. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not execute and will be routed to retry;" The same SQL works fine if I run it through the command line. There are no additional details for the exception.
Labels:
- Apache NiFi
03-29-2017
08:30 PM
2 Kudos
The idea is to add search criteria based on a custom field (part of the event), using some unique identifier (e.g. an order number). This will help in identifying the unique transaction details. Please suggest if there are any other alternatives to achieve this.
Labels:
- Apache NiFi
03-10-2017
04:12 AM
1 Kudo
Currently Cloudbreak supports AWS, Google Cloud, Azure, OpenStack, etc. If I would like to make it work with a different cloud vendor, what can be done?
Labels:
- Hortonworks Cloudbreak
02-21-2017
04:14 AM
You might be missing the port forwarding step. Verify this https://hortonworks.com/hadoop-tutorial/port-forwarding-azure-sandbox/
02-15-2017
06:10 PM
@Tomas Safarik I haven't tried it, but you can look into procrun: http://commons.apache.org/proper/commons-daemon/procrun.html
02-15-2017
05:00 PM
@Dawid Glowacki Looking at the exception, it seems you have nested records with the same name. Avro does not allow two records with the same name within a schema. Try using a namespace to make the full record names unique and avoid this issue.
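For illustration, here is a minimal sketch (Python/fastavro; record and field names are made up) showing how distinct namespaces let two nested records share the short name "Address" without a conflict:

from fastavro import parse_schema

schema = {
    "type": "record",
    "name": "Customer",
    "namespace": "com.example",
    "fields": [
        # The two nested records may reuse the name "Address" because their namespaces differ,
        # giving the distinct full names com.example.home.Address and com.example.work.Address.
        {"name": "home", "type": {
            "type": "record", "name": "Address", "namespace": "com.example.home",
            "fields": [{"name": "street", "type": "string"}]}},
        {"name": "work", "type": {
            "type": "record", "name": "Address", "namespace": "com.example.work",
            "fields": [{"name": "street", "type": "string"}]}},
    ],
}

parse_schema(schema)  # parses cleanly because the full names differ; a duplicate full name would be rejected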