Member since: 04-27-2016
Posts: 218
Kudos Received: 133
Solutions: 25
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3186 | 08-31-2017 03:34 PM
 | 6534 | 02-08-2017 03:17 AM
 | 2748 | 01-24-2017 03:37 AM
 | 9663 | 01-19-2017 03:57 AM
 | 5265 | 01-17-2017 09:51 PM
08-25-2016
07:14 PM
@Constantin Stanca I am exploring the NiFi option, as the IBM XML SerDe is having issues processing multi-byte characters.
08-23-2016
01:29 AM
2 Kudos
What are the different options to transfer data (HDFS/Hive/HBase) from an old cluster to a new one?
Labels:
- Hortonworks Data Platform (HDP)
08-18-2016
08:33 PM
1 Kudo
Please make sure you use a unique stack name (e.g., the default value is HortonworksCloudController). Change it to something unique and try again.
08-12-2016
03:09 PM
1 Kudo
Make sure you define the auto-terminate relationships properly and handle the failure in the PutFile processor: route failed messages to a failure queue. You should be able to see an exception like the following in the log:

2016-08-12 11:02:41,628 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.PutFile PutFile[id=4540b2c4-9425-4817-a7e1-d9d2930a2d5b] Penalizing StandardFlowFileRecord[uuid=51e1f6cb-3a31-42e3-bd65-1f6a1f20eff0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1471013498033-1, container=default, section=1], offset=23989, length=806],offset=0,name=903851051716952,size=806] and routing to 'failure' because the output directory /Users/mpandit/test has 2 files, which exceeds the configured maximum number of files
08-12-2016
02:00 PM
Can you please try putting your JARs under the <nifi_home>/lib directory?
08-11-2016
08:39 PM
Can you please provide additional logs by setting <logger name="org.apache.nifi.remote" level="DEBUG"/> in your logback.xml?
08-11-2016
07:43 PM
Can you please make sure the firewall is disabled on the NiFi node (service iptables stop) and try again?
08-04-2016
11:03 AM
11 Kudos
Introduction

Hortonworks DataFlow (HDF), powered by Apache NiFi, Kafka, and Storm, collects, curates, analyzes, and delivers real-time data from the Internet of Anything (IoAT) to data stores both on-premises and in the cloud. Apache NiFi automates and manages the flow of information between systems. NiFi dataflows are made up of a series of processors, each with a specific task, and NiFi ships with hundreds of general-purpose processors that can pull data from a wide range of sources. This document discusses in detail the integration with AWS S3 as a data source.

Amazon S3 is cloud storage for the Internet. To upload your data (photos, videos, documents, etc.), you first create a bucket in one of the AWS regions and then upload any number of objects to it. S3 buckets and objects are resources, and Amazon S3 provides APIs for you to manage them. Many existing AWS customers need to integrate with S3 to process data across multiple applications. NiFi provides processors to manage and process S3 objects by integrating with S3 buckets. This document outlines the detailed setup and configuration to integrate S3 with Apache NiFi.

Business Use Cases

Many customers use the Amazon S3 storage service to build applications, including backup and archiving, content storage, big data analytics, cloud-native application data, and disaster recovery. S3 can serve as persistent or temporary storage. Many applications need to process data as it lands in an S3 bucket, retrieve the content, and log the metadata for the S3 object. AWS can publish S3-related events to the following destinations:

- Amazon SNS topic
- Amazon SQS queue
- AWS Lambda

This document describes putting data objects into Amazon S3 and extracting them with Apache NiFi, leveraging Amazon SQS notifications.
Solution Architecture with Apache NiFi

The main purpose of this document is to showcase the ease of integration with S3 using Apache NiFi. The sample demo scenario works as follows:

- The cloud NiFi instance creates a data object in the S3 bucket using the PutS3Object processor.
- When the new object is created in S3, S3 sends a notification as a JSON object to an Amazon SQS queue.
- The GetSQS processor subscribes to the SQS queue and retrieves the metadata of the newly created S3 object.
- The FetchS3Object processor extracts the content of the newly created object and sends it to downstream systems for further processing.

AWS Configurations

This section describes the setup and configuration on the AWS side (SQS and the S3 bucket) for the scenario described in the previous section. Make sure to log in to the AWS dashboard and select the appropriate product section to configure the S3 bucket and SQS queue.

SQS

One way to monitor an S3 bucket is to use SQS notifications. First create the SQS queue, e.g. NiFiEvent. Then configure the security and appropriate permissions on the SQS queue so that it can receive the S3 bucket events.
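If you prefer to script this step rather than use the console, the queue and its permissions can also be created with boto3. The sketch below is an illustration, not part of the original walkthrough: it assumes the NiFiEvent queue name and mphdf bucket used in this article, the us-east-1 region, and AWS credentials already configured for boto3.

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Create the queue and look up its ARN.
queue_url = sqs.create_queue(QueueName="NiFiEvent")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Queue policy that lets the S3 service deliver event notifications
# for the mphdf bucket to this queue.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnLike": {"aws:SourceArn": "arn:aws:s3:::mphdf"}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})
```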
S3 Bucket

Once the SQS configuration is done, create the S3 bucket (e.g. mphdf) and add a folder named "orderEvent" to it. Go to the properties section of the bucket and configure its permissions, event notifications, and policy. For permissions, add the appropriate account with list, upload, delete, view, and edit permissions on the bucket. Then create an AWS bucket policy so that specific accounts have permission to manage the bucket's objects; the policy is generated as a JSON document that you can paste into the bucket configuration. Finally, configure the event notification to publish all the relevant events to the SQS queue created earlier, making sure to select SQS as the destination type.
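As with the queue, the bucket-side setup can be scripted. A minimal boto3 sketch, assuming the mphdf bucket, the orderEvent/ prefix, ObjectCreated events, and the NiFiEvent queue from the previous step (the account ID in the ARN is a placeholder):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create the bucket used in this article (us-east-1 needs no LocationConstraint).
s3.create_bucket(Bucket="mphdf")

# Placeholder ARN for the NiFiEvent queue created earlier; substitute your own account ID.
queue_arn = "arn:aws:sqs:us-east-1:123456789012:NiFiEvent"

# Publish all object-created events under the orderEvent/ prefix to the SQS queue.
s3.put_bucket_notification_configuration(
    Bucket="mphdf",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "Id": "NiFiEvents",
            "QueueArn": queue_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "prefix", "Value": "orderEvent/"}]}
            },
        }]
    },
)
```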
The AWS side configuration is now complete. Next, build the NiFi dataflow using the NiFi processors for S3 and SQS.

NiFi DataFlow Configuration

To demonstrate the S3 integration, I modified an existing NiFi dataflow. There are two NiFi dataflows: one publishes the object to S3, and the second extracts the object content via the SQS notification.

- Flow 1: Accept the ERP events --> transform to JSON --> PutS3Object
- Flow 2: GetSQS notification --> get object metadata --> FetchS3Object

PutS3Object Processor

This processor creates the object in the S3 bucket. Make sure to use the correct bucket name, owner (AWS account), access key, and secret key. To get the access and secret keys, go to the AWS IAM console -> Users -> Security credentials -> Create access keys. You can download the keys or select the Show User Security Credentials option to view the access key and secret access key. Also make sure to select the correct region.
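The processor itself is configured in the NiFi UI, but it can be useful to verify the bucket, region, and keys outside NiFi first. The boto3 sketch below performs the equivalent upload; the key and payload are purely illustrative, and the credential values are placeholders.

```python
import boto3

# The same account credentials the PutS3Object processor is configured with (placeholders here).
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Roughly what PutS3Object does with an incoming FlowFile: write its content
# as an object under the orderEvent/ folder of the mphdf bucket.
s3.put_object(
    Bucket="mphdf",
    Key="orderEvent/651155932749796",        # illustrative key, matching the sample event later in this article
    Body=b'{"orderId": "651155932749796"}',  # illustrative payload
)
```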
GetSQS Processor

This processor fetches messages from an AWS SQS queue. You can get the queue URL from the queue's details section in the AWS SQS console.
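Under the hood this is the standard SQS receive-and-delete pattern. A small boto3 sketch of the same poll, assuming the NiFiEvent queue (the account ID in the URL is a placeholder):

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/NiFiEvent"  # placeholder queue URL

# Poll for S3 event notifications and delete each message once read,
# similar to what GetSQS does when it is configured to delete messages after receipt.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print(msg["Body"])  # the S3 event notification JSON
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```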
FetchS3Object Processor

This processor retrieves the contents of an S3 object and writes it to the content of a FlowFile. The object key is simply the key attribute value from the incoming SQS notification JSON message; the "objectname" property is populated by the JsonPath expression $.Records[*].s3.object.key.
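The same extraction and fetch can be expressed in a few lines of Python, which is a handy way to sanity-check the JsonPath expression against a notification body. The sketch below is illustrative only: it uses the standard json module in place of JsonPath, a trimmed-down stand-in for the SQS message body, and boto3 for the S3 read.

```python
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Stand-in for the Body of an SQS notification (see the full sample message later in this article).
message = '{"Records":[{"s3":{"bucket":{"name":"mphdf"},"object":{"key":"orderEvent/651155932749796"}}}]}'

# Equivalent of the JsonPath expression $.Records[*].s3.object.key
keys = [record["s3"]["object"]["key"] for record in json.loads(message)["Records"]]

# Equivalent of FetchS3Object: read the content of each referenced object.
for key in keys:
    content = s3.get_object(Bucket="mphdf", Key=key)["Body"].read()
    print(key, len(content))
```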
Testing NiFi DataFlow

For testing purposes, the existing dataflow is used. It pulls the ERP events from a JMS queue, maps the event XML to a JSON object, and pushes the object to the AWS S3 bucket. The successful creation of the S3 object triggers a notification to the SQS queue. The second NiFi dataflow reads the SQS event through the GetSQS processor, parses the JSON metadata, and extracts the object name from the SQS message. It then uses the FetchS3Object processor to extract the S3 object content and write it to the local file system. The following NiFi data provenance records confirm the successful creation of the S3 object and the processing of the SQS notification to extract the S3 object in near real time:

- PutS3Object data provenance, with the S3 bucket browser view showing the newly created object.
- GetSQS data provenance, showing the SQS event triggered for the newly created S3 object.

Sample SQS Notification Message

The object key in the message below (orderEvent/651155932749796) is the value that is extracted and used to retrieve the object from the S3 bucket.

{"Records":[{"eventVersion":"2.0","eventSource":"aws:s3","awsRegion":"us-east-1","eventTime":"2016-08-03T14:25:13.147Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"AWS:AIDAJL3JQI6HZAG3MB6JM"},"requestParameters":{"sourceIPAddress":"71.127.248.137"},"responseElements":{"x-amz-request-id":"9A673206F1BFDE85","x-amz-id-2":"UmcqEKQJyXfH+UlgDTWIMfvQDOhuOWREe/lwUSJdMx9CbgCu7wzPWJL+wCeRzL6dgqsnYTopWrM="},"s3":{"s3SchemaVersion":"1.0","configurationId":"NiFiEvents","bucket":{"name":"mphdf","ownerIdentity":{"principalId":"A1O76DMOXCPR44"},"arn":"arn:aws:s3:::mphdf"},"object":{"key":"orderEvent/651155932749796","size":804,"eTag":"53ca61e19b3223763f1b36ed0e9383fa","sequencer":"0057A1FEC9137FBF3A"}}}]}

FetchS3Object data provenance shows the content of the newly created S3 object extracted by the FetchS3Object processor.
Apache NiFi Benefits

- NiFi is data source and destination agnostic: it can move data from any data platform to any data platform. The S3 example demonstrates that it integrates well with non-Hadoop ecosystems as well.
- The NiFi project provides processors to connect to many standard data sources such as S3, and with its standard interfaces new processors can be developed with minimal effort.
- You can create NiFi dataflow templates to accelerate development.
- Apache NiFi is ideal for data sources sitting out on the edge, in the cloud, or on-premises.

Document References

- NiFi System Administrator's Guide: http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDF1/HDF-1-trunk/bk_AdminGuide/content/ch_administration_guide.html
- Amazon S3: https://aws.amazon.com/s3/?nc2=h_l3_sc
- Amazon SQS: https://aws.amazon.com/sqs/?nc2=h_m1
- Apache NiFi documentation: https://nifi.apache.org/docs/nifi-docs/