Member since: 07-30-2019
Posts: 333
Kudos Received: 356
Solutions: 76
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9632 | 02-17-2017 10:58 PM |
| | 2193 | 02-16-2017 07:55 PM |
| | 7779 | 12-21-2016 06:24 PM |
| | 1695 | 12-20-2016 01:29 PM |
| | 1202 | 12-16-2016 01:21 PM |
11-06-2015
02:56 PM
Ali, in step #2 it should read 'on the right'; the current wording guides the reader to the left and is misleading.
11-05-2015
03:49 PM
1 Kudo
Just about any operation on https://nifi.apache.org/docs/nifi-docs/rest-api/index.html has a clientId parameter. What is its purpose and benefits? Is it more like maintaining a session? Is there physically a server-side session created for a REST client? If I don't specify a clientId, do I just keep creating more sessions (as it's auto-generated if not provided)?
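For context, here is roughly how my client handles it today (a sketch; `revision_params` is my own helper, not a NiFi API, and the optimistic-locking reading of clientId is my assumption from the REST docs):

```python
import uuid
from urllib.parse import urlencode

# Assumption (from the REST docs): clientId travels with the revision on
# mutating calls, rather than opening a server-side session.  Generate one
# per client process and reuse it on every request.
CLIENT_ID = str(uuid.uuid4())

def revision_params(version: int, client_id: str = CLIENT_ID) -> str:
    """Query string carrying the revision (version + clientId) that
    mutating NiFi REST calls expect."""
    return urlencode({"version": version, "clientId": client_id})
```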
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-05-2015
03:11 PM
2 Kudos
Shane, only the 'hdfs' user can change ownership of the files; there's no way around it. In a real production environment you would have security in place with Kerberos, at which point you can specify the Kerberos principal that will be used to write to HDFS. Without security in place, the discussion of data ownership is, IMO, pointless. Hope this helps.
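To make the point concrete: on an unsecured cluster, the chown has to be issued as the hdfs superuser itself. A minimal sketch (the helper name is mine; it just builds the CLI invocation to pass to a shell or subprocess):

```python
def hdfs_chown_cmd(path: str, owner: str, group: str) -> list[str]:
    """Build the HDFS chown invocation, run as the 'hdfs' superuser.
    Ordinary users get 'Permission denied': only the superuser may
    change ownership."""
    return ["sudo", "-u", "hdfs", "hdfs", "dfs", "-chown",
            f"{owner}:{group}", path]
```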
11-05-2015
01:05 PM
1 Kudo
There's a difference between modifying your flow in-flight, in real time (possible, but it requires some effort from the developer), and automating deployment for operations. Templates provide a middle ground, where one doesn't need to wire everything together programmatically, but can instead piece together large chunks of the flow. Today, you can simply set up the flow the way you need it and bundle the complete conf/flow.xml.gz file with a newly deployed instance. This way you deploy NiFi with a complete flow already in place. The operations side will see many improvements going forward, but today these are the options.
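The bundling step above can be sketched like this (a sketch only; the function name is mine, and it assumes you copy flow.xml.gz before the new instance's first start):

```python
import pathlib
import shutil

def deploy_flow(source_conf: str, new_instance_home: str) -> pathlib.Path:
    """Copy a known-good flow.xml.gz from an existing NiFi's conf/
    directory into a freshly unpacked instance, so it starts up with
    the complete flow already in place."""
    src = pathlib.Path(source_conf) / "flow.xml.gz"
    dst = pathlib.Path(new_instance_home) / "conf" / "flow.xml.gz"
    dst.parent.mkdir(parents=True, exist_ok=True)  # conf/ may not exist yet
    shutil.copy2(src, dst)
    return dst
```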
11-04-2015
06:31 PM
Thanks Mark, so this is a quick way to search for 'expected' components and get an actual ID handle to work with them next. My use case is programmatic flow control through an API.
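For anyone following along, this is roughly how I pull the IDs out of the response (a sketch; the response shape, searchResultsDTO -> processorResults with 'id' and 'name', is my reading of the REST docs):

```python
import json

def processor_ids(search_results_json: str, name_fragment: str) -> list[str]:
    """Extract the IDs of matching processors from a
    /controller/search-results response body."""
    doc = json.loads(search_results_json)
    hits = doc.get("searchResultsDTO", {}).get("processorResults", [])
    return [h["id"] for h in hits if name_fragment in h.get("name", "")]
```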
11-04-2015
06:28 PM
Mike, are you asking about a commit batch size on the client side? This is controllable in the API, but it may have adverse effects depending on how powerful your SolrCloud cluster is: you trade throughput for latency, a trade-off as old as the world. However, if you can tolerate somewhat higher latency, consider dumping the stream to HDFS or streaming it into Hive, and then using e.g. the MR2, Pig or Hive connectors that Solr provides: https://doc.lucidworks.com/hdpsearch23/Guide-Jobs.html This will allow for maximum parallelism and throughput.
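The client-side batching idea is simple enough to sketch (generic Python, not tied to any Solr client library; the function name is mine):

```python
def batches(docs, size):
    """Group documents into fixed-size commit batches before sending them
    to Solr: larger batches buy throughput at the cost of latency and of
    memory pressure on the receiving cluster."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```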
11-04-2015
04:41 PM
https://nifi.apache.org/docs/nifi-docs/rest-api/index.html I can pull all of the data about a NiFi data flow using the /controller/search-results API call, but the request section also mentions a 'q' parameter (for query). I couldn't find the query syntax documented anywhere, however; any pointers? I'd like to avoid pulling the full config every time and parsing it on a constrained device, if possible.
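In case it helps frame the question, this is how I'm building the call today (a sketch; `search_url` is my own helper, and the assumption that 'q' matches component names, IDs, comments, etc. comes from observing the UI's search box, which appears to hit the same endpoint):

```python
from urllib.parse import urlencode

def search_url(base: str, q: str) -> str:
    """Build the /controller/search-results call with a URL-encoded
    'q' search term."""
    return f"{base}/controller/search-results?{urlencode({'q': q})}"
```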
Labels:
- Cloudera DataFlow (CDF)
10-30-2015
05:32 PM
To close the loop on some offline discussions, here are a few scenarios to help with understanding:

Archiving disabled:
- No more FlowFiles reference the content (e.g. those processors have already been removed).
- The content is deleted (when the overall threshold is reached, to free up disk space).
- Provenance may still have the event metadata (separate retention policies).

Archiving enabled:
- No more FlowFiles reference the content.
- The content will still be available in the archive for the lifetime of the archive.
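The archive behavior above is governed by a few settings in conf/nifi.properties (property names per the Admin Guide; the values shown are only examples):

```properties
# conf/nifi.properties (example values)
nifi.content.repository.archive.enabled=true
# archived content is deleted after this retention period...
nifi.content.repository.archive.max.retention.period=12 hours
# ...or once the content repository disk passes this usage threshold
nifi.content.repository.archive.max.usage.percentage=50%
```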
10-30-2015
05:16 PM
Hi, I'm trying to understand how the content repository, its sizing/capping, and archiving are related while reading through the Admin Guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html By default, the archive feature is disabled.
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-30-2015
02:46 PM
4 Kudos
Wes, current Hive versions with an RDBMS metastore backend should be able to handle 10,000+ partitions. For numerous reasons, the community is moving away from this design to leverage HBase for the metastore; follow https://issues.apache.org/jira/browse/HIVE-9452. The overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf