Member since: 01-11-2016

355 Posts | 232 Kudos Received | 74 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 9257 | 06-19-2018 08:52 AM |
|  | 3906 | 06-13-2018 07:54 AM |
|  | 4560 | 06-02-2018 06:27 PM |
|  | 5255 | 05-01-2018 12:28 PM |
|  | 6808 | 04-24-2018 11:38 AM |
10-04-2019 02:46 PM
Hello, I'm looking at your answer 3 years later because I'm in a similar situation :). In my company (a telco) we're planning to use 2 hot clusters with dual ingest because our RTO is demanding, and we're looking for mechanisms to monitor both clusters and keep them in sync. We ingest data in real time with Kafka + Spark Streaming, load it into HDFS, and consume it with Hive/Impala. As a first approach I'm thinking of running simple counts on the Hive/Impala tables on both clusters every hour or half hour and comparing them. If something is missing in one of the clusters, we will have to "manually" re-ingest the missing data (or copy it with Cloudera BDR from one cluster to the other) and re-process the enriched data. I'm wondering whether you have dealt with similar scenarios, or have any suggestions. Thanks in advance!
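As an illustration of the hourly count-comparison approach described above, here is a minimal sketch assuming Impala access through the impyla client; the hosts, table name, and partition column are hypothetical placeholders, not part of the original question:

```python
# Minimal sketch: compare row counts for one hourly partition across two clusters.
# Hosts, table name, and partition column are hypothetical placeholders.
from impala.dbapi import connect

TABLE = "events"               # hypothetical table
PARTITION = "2019-10-04-14"    # hypothetical hourly partition value

def count_rows(host, partition):
    """Count rows of one hourly partition on one cluster via Impala."""
    conn = connect(host=host, port=21050)
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM {} WHERE hour_partition = '{}'".format(TABLE, partition)
    )
    (rows,) = cur.fetchone()
    conn.close()
    return rows

primary = count_rows("impala-cluster-a.example.com", PARTITION)
secondary = count_rows("impala-cluster-b.example.com", PARTITION)

if primary != secondary:
    # Flag the partition for manual re-ingest or a BDR copy from the healthy cluster.
    print("Mismatch on {}: {} vs {}".format(PARTITION, primary, secondary))
```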
						
					
06-18-2018 12:04 PM
1 Kudo
@rajat puchnanda Here is a screenshot showing the menus and icons that go along with the above explanation:

Thanks,
Matt
						
					
06-13-2018 03:18 PM
							 https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html 
						
					
06-05-2018 04:54 PM
Abdelkrim Hadjidj, yes, you can do it if you know what you want to extract. This code helps when a user wants to load attributes from a JSON file, so that the attribute values are not hardcoded in the flow.xml. Often some values are kept in variables for a specific environment, e.g. Dev, Test, Prod, and these can be separated out into a JSON file that does not change with updates to the flow.xml. With the latest version of NiFi (variable registry) this is not required; my intention is just to show the need for it.
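The code referred to above is not included in this thread; as a hedged illustration of the idea, here is a minimal Jython sketch for an ExecuteScript processor that loads environment-specific values from a JSON file into flow file attributes. The file path and keys are hypothetical, and `session` / `REL_SUCCESS` are variables bound automatically by the processor:

```python
# Minimal Jython sketch for a NiFi ExecuteScript processor (illustrative only).
# The config path and its keys are hypothetical; one such file could exist per
# environment (Dev / Test / Prod) outside the flow.xml.
# "session" and "REL_SUCCESS" are bound automatically by ExecuteScript.
import json

flowFile = session.get()
if flowFile is not None:
    # Load environment-specific values kept outside the flow definition.
    with open('/etc/nifi/env-config.json') as f:
        env_config = json.load(f)
    # Copy each key/value pair onto the flow file as an attribute.
    for key, value in env_config.items():
        flowFile = session.putAttribute(flowFile, key, str(value))
    session.transfer(flowFile, REL_SUCCESS)
```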
						
					
05-01-2018 08:35 PM
4 Kudos
DataWorks Summit (DWS) is the industry's premier Big Data community event in Europe and the US. The last DWS was held in Berlin, Germany, on April 18th and 19th. This was its 6th edition in Europe, and this year there were over 1,200 attendees from 51 different countries, 77 breakouts in 8 tracks, 8 Birds-of-a-Feather sessions and 7 Meetups. I had the opportunity to attend as a speaker this year, where I gave a talk on "Best practices and lessons learnt from running Apache NiFi". It was a joint talk with the Big Data squad team from Renault, a French car manufacturer. The presentation recording will be available on the DWS website. In the meantime, I'll share with you the 3 key takeaways from our talk.

NiFi is an accelerator for your Big Data projects

If you have worked on any data project, you already know how hard it is to get data into your platform before "the real work" can start. This is particularly important in Big Data projects, where companies aim to ingest a variety of data sources ranging from databases, to files, to IoT data. Having NiFi as a single ingestion platform that gives you out-of-the-box tools to ingest several data sources in a secure and governed manner is a real differentiator. NiFi accelerates data availability in the data lake, and hence accelerates your Big Data projects and business value extraction. The following numbers from Renault projects are worth a thousand words.

NiFi enables new use cases

NiFi is not only an ingestion tool. It's a data logistics platform. This means that NiFi enables easy collection, curation, analysis and action on any data anywhere (edge, cloud, data center) with built-in end-to-end security and provenance. This unique set of features makes NiFi the best choice for implementing new data-centric use cases that require geographically distributed architectures and high levels of SLA (availability, security and performance). In our talk, two exciting use cases were shared: connected plants and packaging traceability.

NiFi flow design is like software development

When I pitch NiFi to my customers I can see them get excited quickly. They start brainstorming instantly and ask if NiFi can do this or that. In this situation, I usually fire up a NiFi instance on my Mac and start dragging and dropping a few processors to simulate their use case. This is a powerful feature that fosters interaction between the team members in the room and leads us to very interesting business and technical discussions.

When people see the power of NiFi and everything we can easily achieve in a short timeframe, a new set of questions arises (especially from the few skeptics in the room :)). Can I automate this task? Can I monitor my data flows? Can I integrate NiFi flow design with my development process? Can I "industrialize" my use case? All these questions are legitimate when we see how powerful and easy to use NiFi is. The good news is that "yes" is the answer to all of them. However, it's important to put the right process in place to avoid having a POC that becomes production (who has never lived through this situation?).
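For the automation question, NiPyAPI (referenced later in this post) can script interactions with a NiFi instance. The snippet below is a minimal sketch, not material from the talk, and the host URL assumes a local, unsecured NiFi:

```python
# Minimal sketch using NiPyAPI to script flow management (illustrative only).
# The host URL assumes a local, unsecured NiFi instance.
import nipyapi

nipyapi.config.nifi_config.host = 'http://localhost:8080/nifi-api'

# Browse the canvas programmatically, much like browsing functions in a code base.
root_pg_id = nipyapi.canvas.get_root_pg_id()
print('Root process group:', root_pg_id)

for pg in nipyapi.canvas.list_all_process_groups():
    print(pg.component.name, pg.id)
```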
The way I like to answer these questions is to show how much NiFi flow design is like software development. When a developer wants to tackle a problem, he starts designing a solution by asking: "what's the best way to implement this?" The word "best" here covers aspects like complexity, scalability, maintainability, etc. The same logic applies to NiFi flow design. You have several ways to implement your use case, and they are not equivalent. Once a solution is found, you use the NiFi UI as your IDE to implement it.

Your flow is a set of processors, just like your code or your algorithm is a set of instructions. You have "if then else" statements with routing processors, you have "for" or "while" loops with update attributes and self-relations, you have mathematical and logical operators with processors and Expression Language, etc. When you build your flow, you divide it into process groups, similar to the functions you use when you organize your code. This makes your applications easier to understand, to maintain, and to debug. You use templates for repetitive things, just as you build and use libraries across your projects.

From this main consideration, you can derive several best practices. Some of them are generic software development practices, and some of them are specific to NiFi as "a programming language". I share some good principles in the following slide:

Final thoughts

NiFi is a powerful tool that gives you business and technical agility. To master its power, it is important to define and enforce best practices. Lots of these best practices can be borrowed directly from software engineering; others are specific to NiFi. We have shared some of these ideas in the deck available on the DWS webpage.

Some of the ideas explained in the presentation have been discussed by other NiFi enthusiasts, such as the excellent "Monitoring NiFi" series by Pierre [1]. Various Flow Development Lifecycle (FDLC) [2] topics have also been covered by folks like Dan and Tim for NiPyAPI [3][4], Bryan for the flow registry [5] and Pierre for the NiFi CLI [6]. Other topics like NiFi design patterns require a dedicated post that I'll address in the future.

Article initially shared on https://medium.com/@abdelkrim.hadjidj/best-practices-for-using-apache-nifi-in-real-world-projects-3-takeaways-1fe6912101db
						
					
03-29-2018 08:50 PM
1 Kudo
When NiFi refactored its security model between the 0.x and 1.x lines, templates were moved to be associated with the process group where you uploaded the template. This was done so that the template was protected by the same security policies as the process group where it was uploaded. Unfortunately, the "View Templates" capability is still accessed from the global menu, but it should really be offered from the context palette on the left, based on the process group you are in.
						
					
03-29-2018 09:19 AM
Thanks, setting the Max Bin Age property works.
						
					
03-23-2018 06:51 PM
Hi Matt,

The MonitorActivity processor is exceptionally useful. I will use it to monitor the overall health of my NiFi process groups.

Thanks,
Mark
						
					
03-27-2018 12:33 PM
@Jayendra Patil You currently have 120 set as your "Maximum Timer Driven Thread Count". Multiply that by the number of nodes in your NiFi cluster to get the maximum number of usable threads cumulatively across your cluster. Then look at the info bar across the top of your canvas. Do your dataflows appear to be using all the threads you have allocated? You may need to adjust your processor configurations to maximize thread usage. Look for bottlenecks in your dataflow (queues built up in front of processors). What kind of processors are reading from these built-up queues? How have they been configured? Just because you allocated more available threads does not mean NiFi processors are automatically going to start using them, or even be allowed to use them.
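As a quick worked example of the multiplication above (the node count is a hypothetical value, not from this thread):

```python
# Cumulative timer-driven thread pool across the cluster.
max_timer_driven_threads = 120   # "Maximum Timer Driven Thread Count" per node
node_count = 3                   # hypothetical cluster size
cluster_wide_threads = max_timer_driven_threads * node_count
print(cluster_wide_threads)      # 360 usable timer-driven threads across the cluster
```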
						
					