Member since: 07-30-2019
Posts: 333
Kudos Received: 357
Solutions: 76

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 10197 | 02-17-2017 10:58 PM
 | 2430 | 02-16-2017 07:55 PM
 | 8191 | 12-21-2016 06:24 PM
 | 1827 | 12-20-2016 01:29 PM
 | 1289 | 12-16-2016 01:21 PM
10-08-2015
02:58 PM
1 Kudo
If you have an SSD in the node (or, oftentimes, a RAID 1 mirror) and it's large enough, the ATS (YARN Application Timeline Service) database would be a good candidate for putting on it.
10-08-2015
02:53 PM
1 Kudo
Sure! Hosts -> click into the node -> Host Actions -> Delete Host. It may take some time if HDFS data blocks need to be moved off that host first.
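If you'd rather script it, here's a minimal sketch against the Ambari REST API (assuming Ambari on localhost:8080, a cluster named mycluster, and that all components on the host have already been stopped and removed; the cluster name, host name, and credentials below are placeholders):

```python
import requests

AMBARI = "http://localhost:8080/api/v1"    # assumed Ambari server address
AUTH = ("admin", "admin")                  # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}     # header required by the Ambari API

cluster = "mycluster"                      # placeholder cluster name
host = "node3.example.com"                 # placeholder host name

# Delete the host from the cluster; its components must be stopped/removed first.
resp = requests.delete(f"{AMBARI}/clusters/{cluster}/hosts/{host}",
                       auth=AUTH, headers=HEADERS)
resp.raise_for_status()
print("Host deleted:", resp.status_code)
```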
10-08-2015
02:50 PM
1 Kudo
Linking your cross-post from another space, where a discussion was already going: http://community.hortonworks.com/questions/953/can-nifi-be-used-to-pipe-the-data-from-oracle-data.html
10-08-2015
02:46 PM
1 Kudo
Alex, I think the API route is a no-go for bulk ingest. You should be looking at the native admin tools in SharePoint and Documentum to do the bulk work; their APIs (REST, SOAP, Java) may not perform at the desired level.
10-08-2015
01:04 PM
5 Kudos
We created this write-up some time ago; it might be useful: https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
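In rough terms, the write-up describes how Tez groups input splits so that each task gets a bounded amount of data. A minimal sketch of that clamping arithmetic, under my reading of the wiki (the wave factor and size bounds mirror the tez.grouping.* settings, but the defaults shown here are illustrative; see the write-up for the exact algorithm):

```python
def desired_num_tasks(total_input_bytes: int,
                      available_slots: int,
                      waves: float = 1.7,         # cf. tez.grouping.split-waves (illustrative default)
                      min_group: int = 50 << 20,  # cf. tez.grouping.min-size (illustrative default)
                      max_group: int = 1 << 30):  # cf. tez.grouping.max-size (illustrative default)
    """Sketch of how Tez-style split grouping bounds initial parallelism."""
    # Aim for roughly waves * slots tasks so the cluster runs in overlapping waves.
    target_tasks = max(1, int(available_slots * waves))
    bytes_per_task = total_input_bytes / target_tasks
    # Clamp the per-task data size between the min and max group sizes.
    bytes_per_task = min(max(bytes_per_task, min_group), max_group)
    return max(1, int(total_input_bytes // bytes_per_task))

# e.g. 200 GB of input on a cluster with 100 free containers -> 200 tasks
print(desired_num_tasks(200 << 30, 100))
```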
10-08-2015
01:00 PM
Thanks, Bosco. It's definitely worth mentioning in the docs, even the simple stuff, e.g. whether the port and path are the same or different. Please ping me offline so we can track this update.
10-06-2015
12:47 PM
14 Kudos
A series of examples and flow files: https://github.com/xmlking/nifi-examples

NiFi Examples (Apache NiFi example flows):

- collect-stream-logs: This flow shows a workflow for log collection, aggregation, storage, and display. Ingest logs from folders, listen for syslogs on a UDP port, merge the syslogs and drop-in logs, and persist the merged logs to Solr for historical search. Dashboard: stream real-time log events to a dashboard and enable cross-filter search on historical log data.
- csv-to-json: This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.
- decompression: This flow demonstrates taking an archive created with several levels of compression and continuously decompressing it in a loop until the archived file is extracted.
- http-get-route: This flow pulls from a web service (the example is NiFi itself), extracts text from a specific section, makes a routing decision on that extracted value, and prepares to write to disk using PutFile.
- invoke-http-route: This flow demonstrates how to call an HTTP service based on an incoming FlowFile and route the original FlowFile based on the status code returned from the invocation. In this example, every 30 seconds a FlowFile is produced, an attribute is added that sets q=nifi, google.com is invoked for that FlowFile, and any response with a 200 is routed to a relationship called 200.
- retry-count-loop: This process group can be used to maintain a count of how many times a FlowFile goes through it. If it reaches a configured threshold it routes to a 'Limit Exceeded' relationship, otherwise it routes to 'retry'. Great for processes you only want to run X number of times before giving up. (A minimal sketch of this pattern follows below.)
- split-route: This flow demonstrates splitting a file on line boundaries, routing the splits based on a regex in the content, merging the less important files together for storage somewhere, and sending the higher-priority files down another path for immediate action.
- twitter-garden-hose: This flow pulls from Twitter using the garden hose setting; it pulls out some basic attributes from the JSON and then routes only those items that are actually tweets.
- twitter-solr: This flow shows how to index tweets with Solr using NiFi. Prerequisites for this flow are NiFi 0.3.0 or later, the creation of a Twitter application, and a running instance of Solr 5.1 or later with a tweets collection. There are sample steps to set this up (along with a Banana dashboard) on the HDP Sandbox.

Other examples: https://github.com/hortonworks-gallery/nifi-templates
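To make the retry-count-loop idea concrete, here is a minimal sketch of the same counting-and-routing logic in plain Python (in NiFi this would be built from UpdateAttribute and RouteOnAttribute processors; the attribute name and threshold below are illustrative):

```python
RETRY_ATTRIBUTE = "retry.count"   # illustrative attribute name
MAX_RETRIES = 3                   # illustrative threshold

def route_flowfile(attributes: dict) -> str:
    """Increment a retry counter and pick a relationship, NiFi-style."""
    count = int(attributes.get(RETRY_ATTRIBUTE, 0)) + 1
    attributes[RETRY_ATTRIBUTE] = str(count)  # FlowFile attributes are strings
    # Route to 'Limit Exceeded' once the threshold is passed, else 'retry'.
    return "Limit Exceeded" if count > MAX_RETRIES else "retry"

ff = {}
for _ in range(5):
    print(route_flowfile(ff), ff[RETRY_ATTRIBUTE])
```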
10-06-2015
12:43 PM
3 Kudos
Hi, let me make sure I understand the environment. According to https://www.elastic.co/guide/en/logstash/2.0/plugins-inputs-jdbc.html there's nothing in the jdbc plugin to track incremental runs (e.g. only picking up data added/changed since the last run); this has to be built into the query itself (as a generic mechanism). For that matter, NiFi can absolutely serve the purpose; I'm not sure LogStash would bring much on top. There is also another use case, where the Oracle transaction log is followed for real-time replication, which is a very different architecture. E.g. Oracle GoldenGate is one of the best products in its class (I think it works with other DBs as well).
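As an illustration of building the incremental logic into the query itself, a minimal sketch that remembers a high-water mark between runs (the schema, table, and column names here are made up for the example; sqlite3 stands in for the real database):

```python
import sqlite3

# Made-up schema for illustration: events(id, payload, updated_at).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT, updated_at TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'a', '2015-10-06 12:00:00')")

last_seen = "1970-01-01 00:00:00"  # high-water mark persisted from the previous run

def fetch_incremental(conn, last_seen):
    """Pick up only rows added/changed since the last run."""
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the high-water mark so the next run skips what was just read.
    return rows, (rows[-1][2] if rows else last_seen)

rows, last_seen = fetch_incremental(conn, last_seen)
print(rows, last_seen)
```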
10-06-2015
12:35 PM
1 Kudo
Actually, memory is #3 on this list. You will probably be concerned about available storage space for the content repository before you run out of memory. E.g. NiFi, even in big deployments, is comfortable with 96 GB of available memory, which is below the RAM of an average server today.
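As a rough illustration of why content-repository storage tends to be the first constraint, a back-of-the-envelope calculation (the ingest rate and retention figures are made up for the example):

```python
# Made-up figures for illustration only.
ingest_mb_per_sec = 50    # sustained ingest rate into the content repository
retention_days = 7        # how long content is kept before being aged off

seconds = retention_days * 24 * 3600
needed_tb = ingest_mb_per_sec * seconds / 1024 / 1024
print(f"Content repository needs ~{needed_tb:.1f} TB")  # ~28.8 TB
```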
10-06-2015
02:26 AM
Hi, the reference Hadoop KMS implementation has a REST API: https://hadoop.apache.org/docs/current/hadoop-kms/index.html. Is there anything like that for Ranger KMS? Given that Ranger itself has a complete REST API, I would expect the same for KMS, but I don't see any mention of it here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Ranger_KMS_Admin_Guide/content/ch_ranger_kms_overview.html
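For reference, a minimal sketch of calling the Hadoop KMS REST API mentioned above (assuming a KMS instance reachable at the placeholder host/port below and simple pseudo-authentication via the user.name query parameter; the host, port, and user name are placeholders):

```python
import requests

KMS = "http://localhost:16000/kms/v1"  # placeholder host/port for the KMS instance

# With simple (pseudo) authentication, Hadoop auth accepts a user.name parameter.
resp = requests.get(f"{KMS}/keys/names", params={"user.name": "keyadmin"})
resp.raise_for_status()
print(resp.json())  # list of key names visible to this user
```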
Labels:
- Apache Hadoop
- Apache Ranger