Member since: 06-09-2016

48 Posts
10 Kudos Received
0 Solutions
11-03-2016 01:54 PM
2 Kudos
Background: We are using an Apache NiFi data flow to move data from local file systems to Hadoop-based file systems. We execute the NiFi processors by calling the NiFi REST API from a Groovy script, using Groovy JSON builders to generate the JSON and then passing it to put methods to run the processors. NiFi version: 0.6.0.

While planning the migration to NiFi 1.0.0 and reusing the same Groovy script, we are facing a few errors in the latest version (1.0.0):

1. Having controller/revision in the nifi.get method does not return the response JSON; instead it throws a 404 (verified in the browser as well). This works fine in NiFi 0.6.1.
Reference: resp = nifi.get(path: 'controller/revision')

2. The call below does not work either, since having controller in the path as a prefix to process-groups is no longer valid. It also returns a 404/bad request error. This works fine in NiFi 0.6.1.
Reference:
resp = nifi.put(
  path: "controller/process-groups/$processGroup/processors/$processorId",
  body: builder.toPrettyString(),
  requestContentType: JSON
)
PS: While trying to verify this in the browser, GET only responds for paths like /process-groups/{id} or /process-groups/{id}/processors, i.e. without the controller prefix. This works fine in NiFi 0.6.1.
Reference: host://port/nifi-api/process-groups/root

3. The syntax below does not work in the script either. This works fine in NiFi 0.6.1.
resp = nifi.put(
  path: "process-groups/$processGroup/processors/$processorId",
  body: builder.toPrettyString(),
  requestContentType: JSON
)

Since the syntax provided above works perfectly fine in 0.6.0, I would like to know whether any changes were made in NiFi 1.0.0 to the REST API, or to the way the various HTTP requests are passed to methods like 'get' and 'put'. I could not find any such changes in the release notes or in the NiFi REST API documentation linked below:
https://nifi.apache.org/docs/nifi-docs/rest-api/

Please let me know if you need any other information.

Regards,
Indranil Roy
						
					
Labels:
- Apache Hadoop
- Apache NiFi
    
	
		
		
09-09-2016 11:27 AM
@Bryan Bende Thanks for the input, it really helped a lot in our case. Say I have 2 rows in my table:

1|Indranil|ETL
2|Joy|Reporting

I want to convert them to JSON so that I am able to insert each row as multiple cells of a single HBase row. This is my converted JSON:

{
  "Personal": [
    {
      "id": "1",
      "name": "Indranil",
      "Skill": "ETL"
    },
    {
      "id": "2",
      "name": "Joy",
      "Skill": "Reporting"
    }
  ]
}

Is this JSON in the correct format to be consumed by PutHBaseJSON? My end goal is to insert all the values of a row into different cells. "Personal" refers to the "Column Family" and "id" refers to the "Row Identifier Field Name".
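Since the broader flow already drives NiFi from Groovy, here is a small illustrative sketch of producing that JSON from the pipe-delimited rows with groovy.json.JsonBuilder; the field names and the "Personal" wrapper simply mirror the example above, and whether PutHBaseJSON accepts this nested shape is exactly the open question:

import groovy.json.JsonBuilder

// Pipe-delimited input rows: id|name|skill
def rows = ['1|Indranil|ETL', '2|Joy|Reporting']

// Turn each row into a map of field name -> value.
def records = rows.collect { row ->
    def (id, name, skill) = row.tokenize('|')
    [id: id, name: name, Skill: skill]
}

// Wrap the records under the "Personal" key, as in the example above.
def builder = new JsonBuilder([Personal: records])
println builder.toPrettyString()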
						
					
    
	
		
		
09-08-2016 05:37 PM
@Bryan Bende In such a scenario, since we want to store the rows with different row ids, is there a workaround possible? If I understand correctly, using PutHBaseJSON might help. So is there any processor available to convert the pipe-delimited source file into a JSON file that can be consumed by the PutHBaseJSON processor to insert multiple values?
						
					
    
	
		
		
09-08-2016 04:55 PM
@Bryan Bende As you mentioned, PutHBaseCell is used to write a single cell to HBase, and it uses the content of the FlowFile as the value of that cell. Now if my input FlowFile has, say, 50 lines of pipe-separated values, will it insert those rows into 50 cells with 50 different row ids, or will it put all the rows into the same row?
						
					
    
	
		
		
09-08-2016 01:34 PM
2 Kudos
I want to insert data into HBase from a FlowFile using NiFi. Does PutHBaseCell support HBase tables with multiple column families? Say I have created an HBase table with 2 column families, cf1(column1,column2,column3) and cf2(column4,column5). How do I specify the "Column Family" and "Column Qualifier" properties in the PutHBaseCell configuration? Where do I specify the mapping between the FlowFile (a text file with pipe-separated values) and the HBase table? The FlowFile will have pipe-separated columns, and I want to store a subset of the columns in each column family.

Regards,
Indranil Roy
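To make the intended mapping concrete, a small hypothetical Groovy sketch (for example inside an ExecuteScript-style step; the field order below is invented for illustration, not taken from the actual file) that parses one pipe-delimited record and groups its values by target column family:

// Hypothetical field layout: column1..column5 arrive pipe-delimited in this order.
def line = 'v1|v2|v3|v4|v5'
def fields = line.tokenize('|')

// Group the values by target column family, mirroring cf1/cf2 above.
def cf1 = [column1: fields[0], column2: fields[1], column3: fields[2]]
def cf2 = [column4: fields[3], column5: fields[4]]

// Each map would then be written under its own column family, one cell per column qualifier.
println "cf1 -> $cf1"
println "cf2 -> $cf2"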
						
					
Labels:
- Apache HBase
- Apache NiFi
    
	
		
		
09-07-2016 04:20 PM
@mclark Thanks for your inputs. The above solution worked perfectly fine in my case, both in terms of the error and of performance. But, as you already mentioned, in this situation we end up with a large number of files in HDFS. Even if I use a MergeContent processor in the flow I am getting more than 1 file. From what I can understand by looking at the provenance, the MergeContent processor is merging files in blocks. Say we have 100 flow files arriving at the MergeContent processor in batches of 30, 30, 20, 20: it will not wait for all 100 files, and instead generates 4 output files by merging each group. Is there a way to control this behavior and force it to produce only 1 output file for each output path? mergecontent.png shows the configuration of the MergeContent processor. Any inputs will be very helpful.

Regards,
Indranil Roy
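For reference, one way MergeContent is commonly biased toward a single output file per path is to raise the bin entry thresholds and give the bin time to fill. The property names below are standard MergeContent properties, but the values are illustrative assumptions for this flow (100 expected splits, two output paths), not a verified configuration:

Merge Strategy             : Bin-Packing Algorithm
Correlation Attribute Name : (an attribute that identifies the target output path)
Minimum Number of Entries  : 100      (do not merge until all expected splits arrive)
Maximum Number of Entries  : 100
Max Bin Age                : 10 min   (safety valve so a partial bin is still flushed)
Maximum number of Bins     : 2        (one bin per output path)

Even then, getting exactly one file per path depends on all splits for that path reaching the same node before the bin is flushed.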
						
					
    
	
		
		
09-06-2016 04:14 PM
@jwitt According to you, the flow should look like GetFile -> SplitText -> RouteText -> PutHDFS. Since we are using only a standalone cluster, if we split the file into 5000 splits, do we need an UpdateAttribute/MergeContent after the RouteText processor, or should the flow shown above be fine? Also, do we need to set the "Concurrent Tasks" on all the processors (GetFile, PutHDFS, SplitText) or only on the RouteText processor?

Regards,
Indranil Roy
						
					
    
	
		
		
09-06-2016 04:05 PM
@mclark Sure, I will try that option. I understand that I need to increase my split count in order to achieve better load balancing. But to go back to the main issue I pointed out in the thread above: apart from the performance aspect, we were getting only a subset of the records in HDFS. It seems the process was trying to create the same file and overwrite it multiple times, hence the error shown in the attachment (7300-afiuy.png) when I use the SplitText processor, send the splits to the RPG and then merge them. Just to be on the same page, my flow looks like this: on the NCM of the cluster, flow.png; on the standalone cluster, flow1.png. Does increasing the split count solve this problem? Also, is it necessary to use MergeContent/UpdateAttribute if we use SplitText? Can't we achieve this flow without using the MergeContent/UpdateAttribute processors after the RPG?

Regards,
Indranil Roy
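As an aside on the overwrite symptom: one common pattern (an assumption here, not something confirmed for this particular flow) is to give every split a unique filename before PutHDFS, for example with an UpdateAttribute processor that rewrites the filename attribute using the fragment.index attribute that SplitText places on each split:

filename = ${filename}.${fragment.index}

That way two splits of the same source file can no longer race to create the same path in HDFS.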
						
					
    
	
		
		
09-06-2016 01:36 PM
We have a data flow as shown above, with a single pipe-delimited source file of around 15 GB containing 50 million records. We are routing the rows to two different paths in HDFS based on the routing condition shown above in the RouteText configuration window. The whole process takes around 20 minutes to complete on a standalone server. The number of concurrent tasks is set to 10 for all the processors. Is this performance about the best we can expect, or is there any way to improve it further on this standalone server, considering the server has 4 cores and 16 GB RAM? Also, as far as I can observe, most of the processing time is consumed by the RouteText processor. Is this design suitable for this kind of use case, sending the records of a pipe-delimited file to different outputs based on conditions, given that RouteText processes records line by line? We are using NiFi 0.6.1.
						
					
Labels:
- Apache NiFi
    
	
		
		
09-02-2016 05:21 PM
@mclark Our RouteText processor is configured as shown in 7231-condition.png: records whose first field (the record number) is <= 5000000 go in one direction and records with number >= 5100000 go in another. The SplitText processor is configured with the properties shown in 7234-split-config-final.png.

Just to give an overview of our requirement:

1) We have a single source file arriving on a standalone server.
2) We fetch the file, split it into multiple files, and send them to the cluster in order to distribute the processing across all the nodes of the cluster.
3) In the cluster we route the records based on the condition, so that records whose first field (the record number) is <= 5000000 go to one output directory in HDFS and records with number >= 5100000 go to another output directory in HDFS, as configured in the two PutHDFS processors.
4) But after executing the process we have around 1000000 records in each output directory, whereas ideally we should have approximately 5000000 records in each HDFS directory.

We are also getting the error shown in 7300-afiuy.png in the PutHDFS processors. Please let me know if you need any further information. Just to add, the above setup works perfectly fine when we use a standalone node with PutFile instead of PutHDFS, writing the files to a local path instead of Hadoop.

Regards,
Indranil Roy
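For readers without the screenshot, the two routing conditions described in point 3 would, under RouteText's "Satisfies Expression" matching strategy, look roughly like the Expression Language below; the property names lower/upper, the field index, and the delimiter argument are assumptions reconstructed from the description, not the actual configuration:

lower : ${line:getDelimitedField(1, '|'):trim():toNumber():le(5000000)}
upper : ${line:getDelimitedField(1, '|'):trim():toNumber():ge(5100000)}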
						
					