About andrewg

andrewg · ‎11-06-2015

Ali, in step #2, it should read 'on the right', currently guides a reader to the left and is misleading.

andrewg · ‎11-05-2015

Just about any operation on https://nifi.apache.org/docs/nifi-docs/rest-api/index.html has a clientId parameter. What is its purpose and benefits? Is it more like maintaining a session? Is there physically a server-side session created for a REST client? If I don't specify a clientId, do I just keep creating more sessions (as it's auto-generated if not provided)?

andrewg · ‎11-05-2015

Shane, only the 'hdfs' user can change ownership of the files, there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal which will be used to write to HDFS. Without security in place the discussion of data ownership is, IMO, pointless. Hope this helps.

andrewg · ‎11-05-2015

There's a difference between modifying your flow in-flight, realtime (possible, but requires some effort from the developer), and automating deployment for operations. Templates provide a middle ground, whereas one doesn't need to wire everything programmatically, but rather piece together large chunks of the flow only. Today, you can simply set up the flow the way you need and bundle the complete conf/flow.xml.gz file together with a newly deployed instance. This way you deploy NiFi with a complete flow already deployed. The operations side will see many improvements going forward, but today these are the options.

andrewg · ‎11-04-2015

Thanks Mark, so this is a quick way to search for 'expected' components and get an actual ID handle to work with them next. My use case is programmatic flow control through an API.

andrewg · ‎11-04-2015

Mike, are you asking about a commit batch size from the client side? This is controllable in the API, but it may have adverse effects based on how powerful your SolrCloud cluster is. Trade throughput for latency, old as the world. However, if you can tolerate some larger latency, maybe consider dumping stream to HDFS or streaming into Hive, and then use e.g. MR2, Pig or Hive connectors that Solr provides: https://doc.lucidworks.com/hdpsearch23/Guide-Jobs.html This will allow for ultimate parallelism and throughput.

andrewg · ‎11-04-2015

https://nifi.apache.org/docs/nifi-docs/rest-api/index.html I can pull all of the data about a NiFi data flow using the /controller/search-results api call, but the request section also mentions a 'q' parameter (for query). There was no mention of the query syntax in the doc, however, any pointers? Would like to avoid pulling in a full config every time and parse on a constrained device, if possible.

andrewg · ‎10-30-2015

To close the loop with some offline discussions here are a few scenarios to help with the understanding: Archiving disabled. No more FlowFiles referencing the content (e.g. those processors removed already). Content is deleted (when reaching the overall threshold, to free up disk space). Provenance may still have event metadata (separate retention policies). Archiving enabled. No more FlowFiles referencing the content. The content will still be available in the archive for the lifetime of the archive.

andrewg · ‎10-30-2015

Hi, I'm trying to understand how content repository, sizing/capping and archiving are related, reading through the Admin Guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html By default the archive feature is disabled.

andrewg · ‎10-30-2015

Wes, current Hive versions with RDBMS metastore backend should be able to handle 10 000+ partitions. For numerous reasons, the community is moving away from this design to leverage HBase for the metastore. Follow https://issues.apache.org/jira/browse/HIVE-9452 . Overall design document is available here: https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf

Online	Offline
Last Visited	‎11-29-2021 04:12 PM

Member Since	‎07-30-2019 11:14 AM
Last Visited	‎11-29-2021 04:12 PM
Posts	333
Kudos received	330

Cloudera Community

Re: getfile : nifi does not have sufficient permi...

Re: Back pressure settings not Honored when a Funn...

Re: Urgent need for ListSFTP & FetchSFTP working e...

Re: Raise alert from NiFi if file not available fr...

Re: NiFi: PutHiveQL reflect UDF not working

Re: New Visualization Feature in Hive View

Use of clientId parameter in REST API calls

Re: NiFi PutHDFS Processor - Remote Owner and Remo...

Re: Can we create a NiFi workflow and operate usin...

Re: REST API /controller/search-results query synt...

Re: Bulk Inserts into SOLR

REST API /controller/search-results query syntax?

Re: Content Repository and Archival

Content Repository and Archival

Re: Maximum Hive Table Partitions allowed & recomm...