Member since 11-16-2015

| Posts | Kudos Received | Solutions |
|---|---|---|
| 905 | 665 | 249 |
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 403 | 09-30-2025 05:23 AM |
|  | 724 | 06-26-2025 01:21 PM |
|  | 615 | 06-19-2025 02:48 PM |
|  | 831 | 05-30-2025 01:53 PM |
|  | 11303 | 02-22-2024 12:38 PM |

07-20-2016 03:26 PM

Also, if you don't care about that column, you can set the Unmatched Column Behavior to warn/ignore.
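
The post doesn't name the processor, but "Unmatched Column Behavior" is a ConvertJSONToSQL property, so assuming that processor, a minimal sketch of the relevant settings (verify the exact value names against your NiFi version):

```
# Sketch, assuming ConvertJSONToSQL; values are its standard allowable options
Unmatched Column Behavior : Warn on Unmatched Columns   # or "Ignore Unmatched Columns"
Unmatched Field Behavior  : Ignore Unmatched Fields
```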

07-20-2016 03:25 PM (1 Kudo)

Is Translate Field Names set to true? That should enable matching of the column (which appears capitalized) against the field (which is lowercase).
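
As a minimal illustration of that property (assuming ConvertJSONToSQL; the field and column names below are made up): with Translate Field Names set to true, name matching is case-insensitive and ignores underscores.

```
JSON in flow file : {"employee_id": 1, "name": "Ann"}
Table columns     : EMPLOYEE_ID, NAME

Translate Field Names = true   ->  employee_id matches EMPLOYEE_ID, name matches NAME
Translate Field Names = false  ->  field names must match the column names exactly
```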

07-19-2016 10:52 PM (2 Kudos)

What error are you getting? Also, what version of NiFi/HDF are you using?

In SplitJson, the JsonPath Expression you may want is $.*

As an alternative, you can try QueryDatabaseTable -> SplitAvro -> ConvertAvroToJSON; this will split the Avro records first instead of converting the whole set to JSON and then splitting the JSON.

In Apache NiFi 1.0.0 (and HDF 2.0), there will be a ConvertAvroToORC processor which will allow you to convert directly to ORC. Then you can use PutHDFS and PutHiveQL (also in NiFi 0.7.0 and 1.0.0 and HDF 2.0) to transfer the files to HDFS and create a Hive table atop the target directory, making the data ready for querying.
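
To make the $.* suggestion concrete, here is a sketch with made-up data; ConvertAvroToJSON emits a JSON array of records, and SplitJson fans it out:

```
Input flow file:
  [ {"id": 1, "name": "a"}, {"id": 2, "name": "b"} ]

SplitJson with JsonPath Expression = $.*  produces two flow files:
  {"id": 1, "name": "a"}
  {"id": 2, "name": "b"}
```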

07-19-2016 04:54 PM

Hans,

There is an email thread that talks about how to (currently) get EvaluateXPath to work with namespaces. The email talks about default namespaces, but it works for explicit namespaces too. However, if multiple namespaces have the same "local name", that can cause problems.

There is a Jira case to add support for namespaces in the XPath/XQuery processors: https://issues.apache.org/jira/browse/NIFI-1023
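
A common general-purpose workaround (not necessarily the one from the email thread) is to write the XPath against local names so namespaces are ignored entirely; the document below is hypothetical:

```
<!-- input -->
<ns:order xmlns:ns="http://example.com/orders">
  <ns:id>42</ns:id>
</ns:order>

<!-- EvaluateXPath expression that ignores the namespace -->
/*[local-name()='order']/*[local-name()='id']/text()
```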

07-18-2016 02:40 AM

The error in the bulletin is unfortunately not very descriptive; can you check the logs for the cause of that exception?

07-17-2016 10:21 PM

Here's my answer from StackOverflow:

There is a subproject of Apache NiFi called MiNiFi, which (among other things) aims to put agents on devices and such in order to collect data at its point of creation. This will include native agents, so a JVM will not be required. The proposed roadmap is here; it mentions the development of native agent(s).

07-15-2016 02:18 PM (2 Kudos)

You could use GetFile -> SplitText -> ExtractText -> InvokeHttp (see the property sketch after this list):

- GetFile gets the configuration file; set "Keep Source File" to true and schedule it to run once a day
- SplitText splits the file into multiple flow files, each containing a single line/URL
- ExtractText can put the contents of the flow file into an attribute (called "my.url", for example)
- InvokeHttp can be configured to use an Expression Language construct for the URL property (such as "${my.url}")
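
A minimal sketch of those settings ("my.url" is the example attribute name from above, and the regex assumes each split contains exactly one URL):

```
GetFile     : Keep Source File = true, scheduled to run once per day
SplitText   : Line Split Count = 1
ExtractText : dynamic property  my.url = ^(.*)$    # matched line lands in ${my.url}
InvokeHttp  : HTTP Method = GET, Remote URL = ${my.url}
```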

07-14-2016 04:47 PM (1 Kudo)

QueryDatabaseTable would require a "last modified" column in the table(s) in order to detect updates, and probably a "logical delete" flag (i.e., a boolean column) in that (or a helper) table in order to detect deletes. This is similar to what Apache Sqoop does.

If you have the Enterprise Edition of SQL Server, you may be able to enable its Change Data Capture feature. Then, for incremental changes, you can use QueryDatabaseTable against the "CDC table" rather than your source tables.

For strict migration (no incremental fetching of updates) of multiple tables in SQL Server, if you can generate individual flow files, each containing an attribute such as "table.name", then you could parallelize across a NiFi cluster by sending them to ExecuteSQL with the query set to "SELECT * FROM ${table.name}". In this case each instance of ExecuteSQL will get all the rows from one table into an Avro record and send it along the flow.

Regarding MongoDB, I don't believe the MongoDB processors support incremental fetching. QueryDatabaseTable might work on flat documents, but there is a bug that prevents nested fields from being returned, and aliasing the columns won't work for the incremental-fetch part. However, ExecuteSQL will work if you explicitly list (and alias) the document fields in the SQL statement, though that won't do incremental fetch either.

You might be able to use Sqoop for such things, but there are additional requirements if using sqoop-import-all-tables, and if doing incremental fetch you'd need 250 calls to sqoop import.

Do your tables all have a "last modified" column or some similar structure? Supporting distributed incremental fetch for arbitrary tables is a difficult problem, as you'd need to know the appropriate "last modified" column for each table (if they're not named the same and/or present in every table). When tables all behave the same way from an update perspective, the problem becomes much easier.
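
A sketch of the two query patterns mentioned above (table and column names are hypothetical):

```
-- ExecuteSQL's query, evaluated per flow file via Expression Language:
SELECT * FROM ${table.name}

-- The incremental pattern QueryDatabaseTable automates, written as plain SQL
-- (assumes a "last modified" column; NiFi tracks the maximum value seen so far):
SELECT * FROM customers WHERE last_modified > '2016-07-01 00:00:00'
```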

07-07-2016 03:36 PM (1 Kudo)

In EvaluateJsonPath, you can choose "flowfile-attribute" as the Destination; then the original JSON will still be in the flow file content, and any extracted JSON elements will be in the flow file's attributes. That can go into RouteOnAttribute for "eventname". Then you can use ReplaceText (or ExecuteScript if you prefer) to create a CQL statement, using Expression Language to insert the values from your attributes, or to wrap the entire JSON object in a CQL statement.

I have a template that uses ReplaceText to put an entire JSON object into an "INSERT INTO myTable JSON" CQL statement; it is available as a Gist (here). It doesn't have a PutCassandraQL processor at the end; instead it's a LogAttribute processor, so you can see whether the CQL looks right for what you're trying to do.
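
A sketch of the usual form of that ReplaceText trick ("myTable" is a placeholder; CQL's INSERT ... JSON form requires Cassandra 2.2+):

```
ReplaceText
  Replacement Strategy : Regex Replace
  Evaluation Mode      : Entire text
  Search Value         : (?s)(^.*$)
  Replacement Value    : INSERT INTO myTable JSON '$1'

Content before : {"eventname": "click", "user": "ann"}
Content after  : INSERT INTO myTable JSON '{"eventname": "click", "user": "ann"}'
```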

06-30-2016 07:57 PM (4 Kudos)

In curl, -d means to put the data into the request body. The InvokeHttp processor does not send the contents of the flow file for GET requests, only PUT or POST. However, the Elasticsearch Search/Query API accepts GET, so this approach probably won't work.

What you may be looking for is the URL Search API; I commented on that in another thread, but will post here too. Using this method, you can put your query in the URL itself. Note that the query parameters look a bit different because they are HTTP query parameters, not JSON. In your example you are matching all documents (which I believe is the default), so http://localhost:9200/tweet_library/_search?size=10000 should be all you need for that case. To explicitly match all documents, you can use the q parameter:

http://localhost:9200/tweet_library/_search?size=10000&q=*:*

There are quite a few query options available with the URL Search API; please see the Elasticsearch documentation for more information.
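
For example, the explicit match-all query as a plain GET with no request body (same URL as above); in InvokeHttp this would just be HTTP Method = GET with that value in Remote URL:

```
curl "http://localhost:9200/tweet_library/_search?size=10000&q=*:*"
```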