Member since: 07-14-2017

| Posts | Kudos Received | Solutions |
|---|---|---|
| 99 | 5 | 4 |
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 1887 | 09-05-2018 09:58 AM |
| | 2512 | 07-31-2018 12:59 PM |
| | 1978 | 01-15-2018 12:07 PM |
| | 1722 | 11-23-2017 04:19 PM |
08-02-2017 01:50 PM · 1 Kudo
Hi, I have a stream of data coming into HDFS, and I want to store it in Hive.

Sample data (each record is a single line with multiple attributes):

sample=data1 _source="/s/o/u" destination="/d/e/s" _ip="0.0.0.0" timestamp=20170802 10:00:00 text="sometext_with$spec_char"

sample=data2 destination="/d/e/s" _ip="0.0.0.0" timestamp=20170802 10:00:00 text="sometext_with$spec_char" _source="/s/o/u" technology="r"o"b"ust"

sample=data3 _ip="0.0.0.0" timestamp=20170802 10:00:00 destination="/d/e/s" text="sometext_with$spec_char" _source="/s/o/u"

Problems with the data:

1. The records do not follow the same order (sample data 1 has source, destination, timestamp, text; sample data 2 has destination, timestamp, text, source, etc.).
2. The attribute names don't follow the same convention (_source, destination, _ip, timestamp, text, etc.; some have a leading "_" and some don't).
3. The set of attributes is not fixed (sample data 1 has source, destination, timestamp, text; sample data 2 has destination, _ip, timestamp, text, source, and technology).

Expected Hive output:

| sample | source | destination | ip | text | technology |
|---|---|---|---|---|---|
| data1 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | NULL |
| data2 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | r"o"b"ust |
| data3 | a/b/c | /d/e/s | 0.0.0.0 | sometext_with$spec_char | NULL |

Thanks for your support.
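The normalization described above (unordered keys, optional leading underscore, attributes missing from some records) can be sketched before loading into Hive. A minimal Python sketch follows; the column list and the `parse_record` helper are illustrative, not part of the original question, and malformed values such as the embedded quotes in `technology="r"o"b"ust"` would need extra handling:

```python
import re

# Matches key="quoted value" or key=bare_token; keys may carry a leading "_".
ATTR_RE = re.compile(r'(_?\w+)=(?:"([^"]*)"|(\S+))')

def parse_record(line, columns):
    """Parse one key=value line into a row dict keyed by normalized names.

    A leading underscore is stripped so `_source` and `source` land in the
    same column; attributes missing from a line come out as None.
    """
    found = {}
    for key, quoted, bare in ATTR_RE.findall(line):
        found[key.lstrip('_')] = quoted if quoted else bare
    return {col: found.get(col) for col in columns}

cols = ['sample', 'source', 'destination', 'ip', 'text', 'technology']
row = parse_record(
    'sample=data2 destination="/d/e/s" _ip="0.0.0.0" '
    'text="sometext" _source="/s/o/u"',
    cols)
```

Emitting the fixed-order row values as a delimited line would then match the external-table column order regardless of how the attributes were arranged in the input.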
Labels:
- Apache NiFi
08-02-2017 11:24 AM
@Matt Clarke Also, I need some help; I'd be thankful if you could guide me. I have a file in HDFS with a lot of fields that I want to put into Hive. For example, the text in HDFS is:

"These are the attributes to save in hive _source="/a/b/c" _destination="/a/b/d" - - _ip="a.b.c.d" text="hive should save these attributes in different columns""

I made an external table in Hive with the columns:

| source | destination | ip | text |
|---|---|---|---|

I want to get the key-value pairs from the text above and place them into the respective Hive columns. The HDFS file contains a series of such lines; they are unordered and not always in the same order of source, destination, etc.

Any suggestions? Thank you.
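For the quoted `key="value"` pairs shown above, a small regex pass can map each line onto a fixed column order no matter how the attributes are arranged. A rough sketch, assuming Python and a hypothetical `to_columns` helper; values containing embedded double quotes would break this simple pattern:

```python
import re

# Pull key="value" pairs regardless of order; a leading "_" on the key
# is accepted and dropped.
PAIR_RE = re.compile(r'_?(\w+)="([^"]*)"')

def to_columns(line, columns):
    """Map the pairs found in `line` onto a fixed Hive column order;
    columns absent from the line come back as None."""
    pairs = dict(PAIR_RE.findall(line))
    return [pairs.get(col) for col in columns]

values = to_columns(
    '_source="/a/b/c" _destination="/a/b/d" - - _ip="a.b.c.d" '
    'text="row text"',
    ['source', 'destination', 'ip', 'text'])
```

Joining `values` with the table's field delimiter would give one row ready for the external table, with the stray `- -` tokens simply ignored.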
08-02-2017 09:24 AM
@Matt Clarke Hi Matt, I followed your suggestion and got the expected text. As I am new to NiFi, I still need to learn more, and your suggestions helped me. Thank you.
08-02-2017 09:21 AM
@Wynner I replaced the RouteOnContent processor but kept the same parameters. Surprisingly, it now works pretty fast (seconds); I'm not sure why the old one was not working. Thanks for your extended support.
08-01-2017 03:28 PM
@Matt Clarke I used your suggestion, but the result is the same: it fetches the complete line instead of [hdfs....... .log"]. For clarification, these are the steps I am following:

1. GetHDFS
2. SplitText (Line Split Count: 1)
3. ExtractText: `(\[hdfs.*log"\])`
4. UpdateAttribute
5. PutHDFS

I'm not sure why it is pulling the complete line. Thanks
08-01-2017 09:56 AM
Hi, I have streaming data (GetHDFS runs continuously) that contains a number of lines, e.g.:

<start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

A stream of such lines will be in the file. From the above message I have to extract:

[hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

I tried an ExtractText processor with the custom property `extract: ([hdfs.*log"])`. When I test this in a Java regex evaluator, it shows the correct text extracted, but when I run the flow the output contains the complete text.

Expected: [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"]

Actual: <start>this is 123_@":text coming from [hdfs file="/a/b/c" and' the; '''', "", file is streamed. The location=["/location"] and log is some.log"] linedelimited.

Please help me correct the regex to extract the right text.
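Two points worth checking here. First, the brackets in the pattern need escaping: unescaped, `[hdfs.*log"]` is parsed as a character class rather than a literal `[`. The escaped pattern does pull out the intended segment in plain regex, as this quick Python sketch shows. Second, NiFi's ExtractText writes its matches into flowfile *attributes* and leaves the flowfile *content* untouched, so a downstream PutHDFS will still write the complete line unless something like ReplaceText rewrites the content from the captured attribute.

```python
import re

# Sample line from the question; the target segment starts at "[hdfs"
# and ends at log"].
line = ('<start>this is 123_@":text coming from [hdfs file="/a/b/c" '
        'and\' the; \'\'\'\', "", file is streamed. The '
        'location=["/location"] and log is some.log"] linedelimited.')

# Brackets escaped so they match literally instead of opening a
# character class.
match = re.search(r'\[hdfs.*log"\]', line)
extracted = match.group(0) if match else None
```

With the escaped pattern, `extracted` holds only the bracketed segment, without the `<start>` prefix or the trailing `linedelimited.`.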
Labels:
- Apache NiFi
07-31-2017 04:56 AM
I changed it to 4 concurrent tasks and a run duration of 2s. For 50k messages it took almost 3 hours (never expected that). A message looks like this:

this_is_an_example_message <1> [some_"text_and_digits_here"_number="121212"] [some_text_here] -- similarly for 50k messages

RouteOnContent configuration:
- Scheduling: Concurrent Tasks: 4; Run Schedule: 2s
- Properties: Match Requirement: content must contain match; Character Set: UTF-8; Content Buffer Size: 1 MB; txt: `number="121212"`

UpdateAttribute: filename updated here. PutHDFS: configuration and path updated here.

Thanks in advance
07-27-2017 02:36 PM
I tried changing the concurrent tasks to 100 (for testing) and tested with 1k messages; it took 11 minutes to complete. Any suggestions, please!
07-27-2017 02:32 PM
Typically, each message from the SplitContent processor is <= 3 KB, and Concurrent Tasks is set to 1. Also, more than 50,000 messages are received every second, split, and sent to the RouteOnContent processor. I tested with 50k messages: up to RouteOnContent it takes just 2-3 seconds, but after that it takes almost 3 hours! I will increase the number of concurrent tasks and see if this helps improve the performance.