Member since 07-30-2019
333 Posts, 355 Kudos Received, 76 Solutions
My Accepted Solutions
Views | Posted |
---|---|
3360 | 02-17-2017 10:58 PM |
572 | 02-16-2017 07:55 PM |
3070 | 12-21-2016 06:24 PM |
413 | 12-20-2016 01:29 PM |
327 | 12-16-2016 01:21 PM |
10-16-2015
07:19 PM
Is there a way to tell this SEND event apart from 'regular flow' SEND events? We have people asking about fishing out this event specifically. Replay would probably use the same implementation.
10-16-2015
07:02 PM
NiFi has a special role requirement to allow or disallow downloading content from the provenance view. However, is it possible to record the fact that someone downloaded the contents of an event?
10-16-2015
06:50 PM
Thinking of a use case where one would proactively notify an admin of a potential problem (or just let them know something is happening, but the system is handling it). Is there such an event in NiFi today for backpressure?
10-15-2015
02:42 PM
Yes, it's a known CSV with a header line. I wonder if there's a trick to use the column names and expression language to avoid manual re-typing.
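Outside of NiFi, the idea of keying every field off the header row (so nothing is retyped by hand) can be sketched with Python's standard `csv` module. This is only an illustration of the concept under the assumption of a simple, well-formed CSV; it is not a NiFi expression-language solution, and the function name and sample data are made up.

```python
import csv
import io
import json


def csv_to_json_lines(text: str) -> list[str]:
    """Turn CSV text whose first line is a header into one JSON document per row.

    The keys come from the header line, so column names are never retyped.
    """
    reader = csv.DictReader(io.StringIO(text))  # header row supplies the keys
    return [json.dumps(row) for row in reader]


# Hypothetical sample input with a header line.
sample = "id,name,city\n1,Alice,Austin\n2,Bob,Boston\n"
for doc in csv_to_json_lines(sample):
    print(doc)
```

All values come out as strings; any type conversion would be a separate step.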
10-15-2015
02:17 PM
Wes, the next version of NiFi will have new components like FetchSFTP and ListSFTP [1]. FetchSFTP will take information from an incoming message and download the requested file. ListSFTP will list the contents of a remote directory and send flow files down the pipe to be processed or downloaded next. I think it will also be able to use a distributed cache to maintain state on the NiFi side and avoid re-processing (e.g. we don't always have the option of deleting files on the remote server). I have already used FetchSFTP to tell it which files to get, triggered by an external notification.
[1] https://issues.apache.org/jira/browse/NIFI-673
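The list-then-fetch idea with remembered state can be sketched in a few lines. This is not how ListSFTP is implemented internally; it is a minimal sketch assuming a listing of (path, modification time) pairs, with a plain dict standing in for the distributed cache. `RemoteEntry` and `list_new_entries` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteEntry:
    """One file as reported by a remote directory listing (hypothetical shape)."""
    path: str
    mtime: int  # last-modified time, e.g. epoch seconds


def list_new_entries(listing, state):
    """Return entries not emitted before and record them in `state`.

    `state` maps path -> last seen mtime and stands in for a distributed
    cache. A file is emitted again only if its mtime moved forward, so
    nothing has to be deleted on the remote server to avoid re-processing.
    """
    new = []
    for entry in listing:
        seen = state.get(entry.path)
        if seen is None or entry.mtime > seen:
            new.append(entry)
            state[entry.path] = entry.mtime
    return new
```

Each returned entry would then be handed to a fetch step; on the next listing, unchanged files are skipped because their state is already recorded.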
10-15-2015
02:07 PM
2 Kudos
Hi, what's the recommended processor sequence to parse single-line csv entries into JSON? I'm all set on ingest and egress, but a little fuzzy on the conversion part still.
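One common pattern is an extract step followed by a templating step (ExtractText, then ReplaceText). The two steps can be mimicked in plain Python to show the data flow. This is a sketch, not NiFi code: it assumes exactly three unquoted fields per line, and the `csv.1`/`csv.2`/`csv.3` attribute names and the `${...}` template are illustrative only.

```python
import json
import re

# ExtractText-like step: one capture group per column (assumes 3 unquoted fields).
LINE_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*)$")

# ReplaceText-like step: a JSON template referencing the captured values.
TEMPLATE = '{"id": "${csv.1}", "name": "${csv.2}", "city": "${csv.3}"}'


def extract(line):
    """Turn capture groups into attributes named csv.1, csv.2, ..."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # analogous to routing the line to an 'unmatched' path
    return {f"csv.{i}": value for i, value in enumerate(match.groups(), start=1)}


def replace(attributes):
    """Substitute ${name} references in the template with attribute values."""
    out = TEMPLATE
    for name, value in attributes.items():
        # Escape quotes/backslashes so the result stays valid JSON (extra safety step).
        out = out.replace("${" + name + "}", json.dumps(value)[1:-1])
    return out


print(replace(extract("1,Alice,Austin")))  # {"id": "1", "name": "Alice", "city": "Austin"}
```

The regex approach breaks on quoted fields containing commas; a real CSV parser is safer for anything beyond simple lines.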
10-14-2015
12:54 PM
1 Kudo
There are 2 areas of focus in your question:
1. Getting data into HDP. Here, as @David Streever mentioned, NiFi can be a great fit (and it maintains data lineage for all streams coming into your data lake). Any place where you might have considered Flume is a good candidate.
2. Orchestrating processing and feeds in HDP. What you described in the original question covers all the right concerns, but you should really be looking at Falcon [1] for higher-level visibility into and control over the workflow than Oozie gives you. Falcon uses and generates Oozie workflows under the hood, but exposes a nice DSL and UI for higher-level constructs.
[1] http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_data_governance/content/ch_hdp_data_governance_overview.html
10-14-2015
12:48 PM
FWIW, PermGen space was removed in Java 8, so this last parameter will generate a warning.
10-13-2015
11:21 PM
1 Kudo
Had a chance to hash out the details further with Joe; posting here for the record. The input/output port terminology can get confusing quickly, depending on which side of the communication you're looking at.
- A client adds a Remote Process Group (RPG) to a local canvas/instance. The dialog prompts for a remote NiFi instance (UI port).
- A remote NiFi instance specifies the nifi.remote.input.socket.port property to designate which port to use for incoming site-to-site communications.
- An RPG's input ports are then visible in our local NiFi instance as possible connections to send data to (provided we have permissions).
- An RPG's output ports are optional, and can be used to pull data from the remote instance, or to implement a request/response-like pattern with it.
- The inbound site-to-site port multiplexes all RPG inbound port communications.
In the case of a client talking to a remote NiFi cluster, the following applies:
- The address to specify in the RPG UI is the NiFi Cluster Manager (NCM) address. Technically, in a cluster, talking to a node UI directly is illegal and won't work.
- A local client must be able to reach every node in the cluster for site-to-site communications. The actual port is specified by each node via nifi.remote.input.socket.port.
Some more details here: https://nifi.apache.org/docs.html
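Since the client must reach every node's site-to-site port, a quick pre-flight check from the sending host can save some head-scratching. This is a minimal sketch assuming you already know each node's hostname and its nifi.remote.input.socket.port value; it only tests raw TCP connectivity, not the site-to-site handshake, and `unreachable_nodes` is a hypothetical helper.

```python
import socket


def unreachable_nodes(nodes, timeout=3.0):
    """Return the (host, port) pairs that refuse or time out on a TCP connect.

    `nodes` lists each cluster node's host and the port it configured via
    nifi.remote.input.socket.port.
    """
    failed = []
    for host, port in nodes:
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # covers refused connections, timeouts, DNS failures
            failed.append((host, port))
    return failed
```

Usage with hypothetical node names and port: `unreachable_nodes([("nifi-node1.example.com", 10000), ("nifi-node2.example.com", 10000)])` returns an empty list when every node is reachable.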
10-13-2015
07:02 PM
1 Kudo
This is a bad idea and will be flagged during an architecture review. Master servers must have RAID for production deployments, and we recommend RAID 5 or, more often, RAID 10. No amount of software HA and failover will help when an OS or primary disk is lost and the data wasn't protected in advance. Imagine that HA works perfectly, but the client configuration for HA mode was saved on a disk that failed, with no replica or redundant drive. Master nodes must have RAID arrays for production.
10-13-2015
05:53 PM
Thanks, Joe. I'll follow up with you offline.
10-13-2015
05:47 PM
Hi, I couldn't find direct links to published dev builds on the Apache site, are they being produced regularly? E.g. we are seeing some new features streaming in and don't always have access to a full local build environment to iterate with the NiFi engineering team. Having automated build infrastructure helps here.
10-13-2015
05:23 PM
Hortonworks supports only the products listed and linked to from this page: http://hortonworks.com/hdp/ However, our PS and DS teams have delivered custom solutions that use those projects as part of the overall design. That does not make them officially endorsed in the support offering, though; whether to use them depends on a customer's risk appetite and any pre-existing vendor support arrangements for those tools.
10-13-2015
05:06 PM
Hi, can you provide some details? Is there a stack trace? Is a cluster operational? Was it a rolling upgrade? Maybe it's a case for support to dive into?
10-13-2015
12:58 PM
35 Kudos
Update: added a GitBook link.
The unofficial little black book of Kerberos, created and maintained by a HWX engineer, Steve Loughran. It covers lots of questions you were afraid to ask. Many advanced customers have found it a very useful guide, especially when developing solutions and code for a Kerberized cluster. I felt this guide needed much more exposure than it has had so far. All credit goes to @stevel@hortonworks.com
https://github.com/steveloughran/kerberos_and_hadoop (click on the Contents links)
...or... enjoy it as a GitBook, readable online and on mobile, and exportable as e.g. a PDF: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/
Tags: FAQ, java, Kerberos, operations, Security
10-12-2015
10:27 PM
1 Kudo
Hi, I'd like clarification on the ports required to run NiFi. I was reading through the admin guide here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
Questions:
- In a non-clustered local mode, what are the minimum ports required at runtime (on top of data protocol ones, which will naturally differ based on the flow)? Anything beyond the web port for the UI and API?
- In a clustered mode, things are a little more interesting. It appears that ports will differ based on the casting protocol (if any). Comments on that? The guide mentions many are left blank by default.
- Site-to-site protocol: are there any additional ports for the protocol itself? Does a sender node have to have access to every individual node as well? Any special ports, or is it just data ports?
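To see at a glance which port settings an instance actually uses (and which are left blank), a small script can pull them out of nifi.properties. This is a sketch under assumptions: the key list reflects 0.x-era nifi.properties files as best I recall and should be checked against your version's admin guide, and the parser is simplified (plain `key=value` lines only, no continuation lines).

```python
# Port-related keys assumed from 0.x-era nifi.properties; verify for your version.
PORT_KEYS = (
    "nifi.web.http.port",
    "nifi.web.https.port",
    "nifi.remote.input.socket.port",
    "nifi.cluster.node.protocol.port",
    "nifi.cluster.manager.protocol.port",
)


def port_settings(properties_text):
    """Map each known port key to its value, or None when blank or absent."""
    values = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return {key: (values.get(key) or None) for key in PORT_KEYS}
```

Usage: `port_settings(open("conf/nifi.properties").read())`, where the path is an assumption about where your install keeps its config.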
10-12-2015
06:50 PM
1 Kudo
Are there any gotchas, known issues or tips for deploying NiFi in Azure? I expect that to be a very straightforward one (especially if everything is further wrapped in a docker container), but please share if there are any bits of wisdom around.
10-10-2015
02:03 PM
Kent, there are numerous metrics collected in AMS beyond what's exposed at the top level. There is also an option to add derived metric charts (e.g. sum, avg, math, expression). E.g. try HBase -> Add widget (big plus sign in the UI) and play with available metrics in drop-down/search.
10-10-2015
01:41 PM
3 Kudos
Short answer: yes. Longer answer: this hasn't been tested or certified. There is an extensive matrix of possible connection modes (transactions, listen vs. poll, XA vs. non-XA), so it's non-trivial. Yes, IBM WMQ adheres to the JMS spec much more closely than in the old days, but it still has its 'specialties'. However, please share more details on how you'd like to use NiFi in this context (or PM me to chat); this will help drive the focus and roadmap.
10-08-2015
07:19 PM
It was a field in the UI which wasn't flagged as sensitive (NiFi automatically encrypts such fields).
10-08-2015
07:00 PM
I think it's an omission. It was already mentioned here in the context of digest authentication support: https://issues.apache.org/jira/browse/NIFI-980?focusedCommentId=14940725&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14940725 Perhaps it makes sense to create a top-level JIRA for marking the property sensitive, or to convert this into a subtask?
10-08-2015
03:41 PM
If you installed NiFi as an Ambari service, there is a field controlling the runtime account for NiFi. It's 'nifi' by default.
10-08-2015
03:10 PM
Check out the /var/log/ambari-server/ambari-alerts.log file; Ambari writes the alert stream to it. There may also be another way; I'll flag this with the right team so they can comment on this thread.
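If you want to pick specific alerts out of that file, a small filter works. This is a sketch that assumes each line carries the alert state and service in square brackets, e.g. `2015-10-08 15:10:00,123 [CRITICAL] [HDFS] [datanode_process] (DataNode Process) Connection failed`; that line shape is an assumption, so check it against your own ambari-alerts.log before relying on the pattern.

```python
import re

# Assumed line shape: "<timestamp> [STATE] [SERVICE] [alert_name] (label) text".
ALERT_LINE = re.compile(r"\[(?P<state>OK|WARNING|CRITICAL|UNKNOWN)\] \[(?P<service>[^\]]+)\]")


def alerts_in_state(lines, state="CRITICAL"):
    """Yield (service, line) for alert lines whose state matches `state`."""
    for line in lines:
        match = ALERT_LINE.search(line)
        if match and match.group("state") == state:
            yield match.group("service"), line.rstrip("\n")
```

Usage: open the log with `with open("/var/log/ambari-server/ambari-alerts.log") as f:` and iterate `alerts_in_state(f)` to print or forward each CRITICAL entry.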
10-08-2015
02:58 PM
1 Kudo
If you have an SSD in the node (or, oftentimes, a RAID 1 mirror) and it's large enough, YTS database would be a good candidate for putting on it.
10-08-2015
02:53 PM
1 Kudo
Sure! Hosts -> click into the node -> Host Actions -> Delete Host. It may take some time if data blocks for HDFS need to be moved away from that host.
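For scripted cleanup, Ambari's REST API has a host resource that accepts DELETE. The sketch below only builds the request with the standard library; the Ambari URL, cluster name, host name and credentials are placeholders. Unlike the UI flow, a raw API call may require the host's components to be stopped and removed (and DataNodes decommissioned) beforehand, so treat this as a starting point rather than a drop-in replacement.

```python
import base64
import urllib.request


def delete_host_request(ambari_url, cluster, host, user, password):
    """Build (but do not send) an Ambari REST request that deletes a host.

    Ambari rejects modifying calls that lack the X-Requested-By header.
    """
    url = f"{ambari_url.rstrip('/')}/api/v1/clusters/{cluster}/hosts/{host}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"X-Requested-By": "ambari", "Authorization": f"Basic {token}"},
    )
```

Sending it is then `urllib.request.urlopen(delete_host_request("http://ambari.example.com:8080", "mycluster", "node5.example.com", "admin", "admin"))`, with all values hypothetical.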
10-08-2015
02:50 PM
1 Kudo
Linking your cross-post in another space, where a discussion is already going: http://community.hortonworks.com/questions/953/can-nifi-be-used-to-pipe-the-data-from-oracle-data.html
10-08-2015
02:46 PM
1 Kudo
Alex, I think the API route is a no-go for bulk ingest. You should be looking at native admin tools in SP and Documentum for bulk operations; their APIs (REST, SOAP, Java) may not perform at the desired level.
10-08-2015
01:04 PM
5 Kudos
We created this write-up some time ago; it might be useful: https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
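As a rough mental model of split grouping for map-side vertices, the estimate can be sketched as: capacity-based task count times a wave factor, then clamped so each grouped split stays between a minimum and maximum size. This is a simplified sketch, not Tez's code: the function name is hypothetical, the defaults mirror tez.grouping.split-waves (1.7), tez.grouping.min-size (50 MB) and tez.grouping.max-size (1 GB) as I recall them, and real grouping also accounts for data locality and other details.

```python
MB = 1024 * 1024


def estimate_initial_map_tasks(total_mem_mb, task_mem_mb, total_input_bytes,
                               num_original_splits, waves=1.7,
                               min_group_bytes=50 * MB, max_group_bytes=1024 * MB):
    """Simplified estimate of initial map-vertex parallelism under split grouping."""
    # Step 1: tasks that fit in the available capacity, scaled by the wave factor.
    desired = max(1, int(total_mem_mb * waves / task_mem_mb))
    # Step 2: keep the bytes handled per grouped split within [min, max].
    per_group = total_input_bytes / desired
    if per_group > max_group_bytes:
        desired = int(total_input_bytes / max_group_bytes) + 1
    elif per_group < min_group_bytes:
        desired = int(total_input_bytes / min_group_bytes) + 1
    # Step 3: grouping only merges splits, so it never exceeds the original count.
    return min(desired, num_original_splits)
```

For example, with 100 GB of available memory, 1 GB containers and 10 GB of input in 200 original splits, the sketch gives 170 tasks; shrink the input to 1 GB and the minimum group size pulls it down to 21.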
10-08-2015
01:00 PM
Thanks, Bosco. Definitely worth mentioning in the docs, even simple things, e.g. whether the port and path are the same or different. Please ping me offline so we can track this update.
10-06-2015
12:47 PM
13 Kudos
A series of examples and flow files: https://github.com/xmlking/nifi-examples
NiFi Examples: Apache NiFi example flows.
- collect-stream-logs: This flow shows a workflow for log collection, aggregation, storage and display. Ingest logs from folders. Listen for syslogs on a UDP port. Merge syslogs and drop-in logs, and persist the merged logs to Solr for historical search. Dashboard: stream real-time log events to a dashboard and enable cross-filter search on historical log data.
- csv-to-json: This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.
- decompression: This flow demonstrates taking an archive that was created with several levels of compression and then continuously decompressing it in a loop until the archived file is extracted.
- http-get-route: This flow pulls from a web service (the example is NiFi itself), extracts text from a specific section, makes a routing decision on that extracted value, and prepares to write to disk using PutFile.
- invoke-http-route: This flow demonstrates how to call an HTTP service based on an incoming FlowFile and route the original FlowFile based on the status code returned from the invocation. In this example, every 30 seconds a FlowFile is produced, an attribute that sets q=nifi is added to the FlowFile, google.com is invoked for that FlowFile, and any response with a 200 is routed to a relationship called 200.
- retry-count-loop: This process group can be used to maintain a count of how many times a FlowFile goes through it. If the count reaches a configured threshold, the FlowFile is routed to a 'Limit Exceeded' relationship; otherwise it is routed to 'retry'. Great for processes that you only want to run X number of times before giving up.
- split-route: This flow demonstrates splitting a file on line boundaries, routing the splits based on a regex in the content, merging the less important files together for storage somewhere, and sending the higher-priority files down another path for immediate action.
- twitter-garden-hose: This flow pulls from Twitter using the garden hose setting; it pulls out some basic attributes from the JSON and then routes only those items that are actually tweets.
- twitter-solr: This flow shows how to index tweets with Solr using NiFi. Prerequisites for this flow are NiFi 0.3.0 or later, the creation of a Twitter application, and a running instance of Solr 5.1 or later with a tweets collection. Here are sample steps to set this up (along with a Banana dashboard) on the HDP Sandbox.
Other examples: https://github.com/hortonworks-gallery/nifi-templates
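The counting logic behind a retry-count-loop is simple enough to express as a plain function. This is a sketch of the idea, not the template's actual processors: the `retry.count` attribute name and `route_retry` are hypothetical, and "reaches the threshold" is read as count >= limit, so with limit=3 the third pass is routed to 'Limit Exceeded'.

```python
def route_retry(attributes, limit, counter_name="retry.count"):
    """Increment a retry counter attribute and choose the outgoing relationship.

    Returns (updated_attributes, relationship), where relationship is
    'retry' or 'Limit Exceeded'. Attribute values are strings, as in NiFi.
    """
    updated = dict(attributes)  # leave the caller's attributes untouched
    count = int(updated.get(counter_name, "0")) + 1
    updated[counter_name] = str(count)
    relationship = "Limit Exceeded" if count >= limit else "retry"
    return updated, relationship
```

Feeding the same attributes back through three times with limit=3 yields 'retry', 'retry', then 'Limit Exceeded'.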
Tags: Data Ingestion & Streaming, examples, FAQ, hdf, NiFi