Member since: 07-30-2019
Posts: 333
Kudos Received: 356
Solutions: 76
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9920 | 02-17-2017 10:58 PM |
| | 2316 | 02-16-2017 07:55 PM |
| | 8014 | 12-21-2016 06:24 PM |
| | 1765 | 12-20-2016 01:29 PM |
| | 1243 | 12-16-2016 01:21 PM |
10-16-2015
06:50 PM
Thinking of a use case where one would proactively notify an admin of a potential problem (or just let them know something is happening, but the system is handling it). Is there such an event in NiFi today for backpressure?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-15-2015
02:42 PM
Yes, it's a known CSV with a header line. I wonder if there's a trick to use the column names and expression language to avoid manual re-typing.
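Outside of NiFi, the header-driven conversion being asked about can be sketched in plain Python (a minimal illustration, not a NiFi processor — the sample data is hypothetical): `csv.DictReader` picks up the column names from the header line automatically, so nothing has to be re-typed by hand.

```python
import csv
import io
import json

# Sample single-line-per-record CSV with a header line (hypothetical data).
raw = "id,name,city\n1,Alice,Austin\n2,Bob,Boston\n"

# DictReader reads the column names from the header automatically,
# so each row becomes a dict keyed by those names.
rows = list(csv.DictReader(io.StringIO(raw)))

# One JSON object per input line, mirroring a line-by-line flow.
json_lines = [json.dumps(r) for r in rows]
for line in json_lines:
    print(line)
```

The same idea is what lets a flow avoid hard-coding column names: the header supplies them at runtime.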
10-15-2015
02:17 PM
Wes, the next version of NiFi will have new components, ListSFTP and FetchSFTP [1]. FetchSFTP takes information from an incoming flow file and downloads the requested file. ListSFTP lists the contents of a remote directory and sends flow files down the pipe to process/download next. I believe it will also be able to use a distributed cache to maintain listing state and avoid re-processing (e.g. we don't always have the option of deleting files on the remote server). I have already used FetchSFTP, telling it which files to get, triggered by an external notification. [1] https://issues.apache.org/jira/browse/NIFI-673
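The list-then-fetch pattern described above can be sketched outside NiFi as well. In this minimal Python illustration the remote listing, the fetch, and the `seen` state store are all stand-ins (in NiFi the state would live in a distributed cache, not an in-memory set):

```python
# Minimal sketch of the list-then-fetch pattern with state.

def list_remote():
    # Stand-in for ListSFTP: names of files currently on the server.
    return ["a.csv", "b.csv", "c.csv"]

def fetch(name):
    # Stand-in for FetchSFTP: pretend to download the file.
    return f"contents of {name}"

seen = set()  # stand-in for distributed-cache state

def run_once():
    fetched = []
    for name in list_remote():
        if name in seen:
            continue  # already processed; no need to delete remotely
        fetched.append(fetch(name))
        seen.add(name)
    return fetched

first = run_once()   # fetches all three files
second = run_once()  # fetches nothing: state prevents re-processing
```

The point of the state store is the last line: a second pass over an unchanged remote directory does no work, even though nothing was deleted on the server.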
10-15-2015
02:07 PM
2 Kudos
Hi, what's the recommended processor sequence to parse single-line CSV entries into JSON? I'm all set on ingest and egress, but still a little fuzzy on the conversion part.
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-14-2015
12:54 PM
1 Kudo
There are two areas of focus in your question:
1. Getting data into HDP. Here, as @David Streever mentioned, NiFi can be a great fit (and it maintains data lineage for all streams coming into your data lake). Any place you might have considered Flume is a good candidate.
2. Orchestrating processing and feeds in HDP. The concerns you described in the original question are all valid, but you should really be looking at Falcon [1] for higher-level visibility and control over the workflow than Oozie provides. Falcon uses and generates Oozie workflows under the hood, but exposes a nice DSL and UI for higher-level constructs.
[1] http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_data_governance/content/ch_hdp_data_governance_overview.html
10-14-2015
12:48 PM
FWIW, PermGen space was removed in Java 8, so this last parameter will generate a warning.
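Concretely, a launch line like the first one below draws the warning on Java 8; the second shows the Metaspace flag that replaced PermGen sizing (the `256m` values and the trailing `...` are illustrative):

```
# Java 7 and earlier: sizes the PermGen space.
java -XX:MaxPermSize=256m ...

# Java 8+: PermGen is gone, and the JVM warns
# "ignoring option MaxPermSize=256m; support was removed in 8.0".
# Class metadata now lives in Metaspace:
java -XX:MaxMetaspaceSize=256m ...
```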
10-13-2015
11:21 PM
1 Kudo
Had a chance to hash out the details with Joe; posting here for the record. The input/output port terminology can get confusing quickly, depending on which side of the communication you're looking at:
- A client adds a Remote Process Group (RPG) to the local canvas/instance. The dialog prompts for a remote NiFi instance (UI port).
- A remote NiFi instance sets the nifi.remote.input.socket.port property to designate which port to use for incoming site-to-site communications.
- The RPG's input ports are then visible in our local NiFi instance as possible connections to send data to (granted we have permissions).
- The RPG's output ports are optional and can be used to pull data remotely, or to implement a request/response-like pattern with a remote instance.
- The inbound site-to-site port multiplexes all RPG inbound port communications.
In the case of a client talking to a remote NiFi cluster, the following applies:
- The port to specify in the RPG dialog is the NiFi Cluster Manager (NCM) address. Technically, in a cluster, talking to a node UI directly is illegal and won't work.
- A local client must be able to reach every node in the cluster for site-to-site communications. The actual port is specified by each node via nifi.remote.input.socket.port.
Some more details here: https://nifi.apache.org/docs.html
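On the remote side, the property mentioned above lives in nifi.properties; a sketch with an illustrative port value (only the property name comes from the discussion above):

```
# nifi.properties on the remote instance (illustrative value).
# Port for incoming site-to-site traffic; it multiplexes all
# RPG inbound port communications:
nifi.remote.input.socket.port=10443
```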
10-13-2015
07:02 PM
1 Kudo
This is a bad idea and will be flagged during an architecture review. Master servers must have RAID for production deployments; we recommend RAID 5 or, more often, RAID 10. No amount of software HA and failover will address the loss of an OS or primary disk if care wasn't taken of the data in advance. Imagine an otherwise solid HA solution where the client configuration for HA mode was saved on a disk that failed, with no replica or redundant drive.
10-13-2015
05:53 PM
Thanks Joe, I'll follow up with you offline.
10-13-2015
05:47 PM
Hi, I couldn't find direct links to published dev builds on the Apache site. Are they being produced regularly? E.g., we see new features streaming in and don't always have access to a full local build environment to iterate with the NiFi engineering team. Automated build infrastructure would help here.
Labels:
- Apache NiFi