Member since: 01-23-2016

- 51 Posts
- 41 Kudos Received
- 1 Solution
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2699 | 02-18-2016 04:34 PM |

03-21-2018 02:01 PM
I ended up not using NiFi for this. Looking back, I was trying to force a solution out of NiFi that wasn't a good fit. I spent several weeks, entirely too long, trying to solve the simplest case of this project (formatting some text and dumping it to a DB). I could certainly see NiFi being useful for moving source data files around between the folders I'm working with (copying, moving, etc.), but doing any amount of logic or manipulation of anything but the happy path is extremely tedious and seemingly difficult. Knowing that I was going to have to do a lot more work on the data to make it even close to usable, I scrapped NiFi and implemented it in Python. After dealing with this data and running into edge case after edge case that I wasn't even aware of when I wrote this topic, the data IMO was just too dirty and had too many exceptions to handle with NiFi. On top of that, this covered only the import of the data, not even the use of it, so I would have needed another tool to process the data into a usable form anyway. Appreciate the response. You took the time to respond, so I figured it was reasonable to reply even though I didn't end up using the solution.
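(A minimal sketch of what the Python replacement could look like, assuming the data layout from the original question further down this feed: thirteen fixed columns followed by one free-text column with embedded commas and a trailing comma. The SQLite target and column names are illustrative, not from the thread.)

```python
import sqlite3

# Assumed layout (from the original question below): 13 fixed columns,
# then one free-text column that may contain commas, with a trailing ",".
N_FIXED = 13

def parse_line(line):
    """Split a raw log line into 13 fixed fields plus the free text."""
    line = line.rstrip("\n").rstrip(",")   # drop the trailing comma
    return line.split(",", N_FIXED)        # never split inside the free text

conn = sqlite3.connect("log.db")           # hypothetical target DB
cols = ", ".join(f"c{i}" for i in range(N_FIXED)) + ", comment"
conn.execute(f"CREATE TABLE IF NOT EXISTS log ({cols})")

with open("ExampleFile_2017-09-20.LOG") as f:
    for raw in f:
        parts = parse_line(raw)
        # Real data needs edge-case handling here (short rows, stray
        # quotes, etc.), which is what made this messy in practice.
        placeholders = ", ".join("?" * len(parts))
        conn.execute(f"INSERT INTO log VALUES ({placeholders})", parts)
conn.commit()
```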
						
					
01-05-2018 08:51 PM
I've googled everywhere for this, and everything I run across is super complicated; it should be relatively simple to do. The recommendations say to look at "Example_With_CSV.xml" from https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates, but it doesn't have anything in there related to actually getting each column's value out.

So given a flowfile that's a CSV line:

2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"

I need:

$date = 2017-09-20 23:49:38.637
$id = 162929511757
...
$instanceid = 36095
$comment = "Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"

OR

$csv.date = ...
$csv.id = ...
...
$csv.instanceid = ...
$csv.comment = ...

Is there another, easier option to do this besides regex? I can't stand to do anything with regex given how unreadable and overly complicated it is. To me there should be a significantly easier way of doing this than with regex.
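(An illustration, not from the thread: because the free-text field in the sample line is already quoted, Python's standard csv module splits it correctly without any regex. The csv.* attribute names are the hypothetical ones from the question.)

```python
import csv
import io

line = ('2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
        '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"')

# csv understands the quoted field, so its embedded commas are preserved.
fields = next(csv.reader(io.StringIO(line)))

attrs = {
    "csv.date": fields[0],          # 2017-09-20 23:49:38.637
    "csv.id": fields[1],            # 162929511757
    "csv.instanceid": fields[3],    # 36095
    "csv.comment": fields[-1],      # Failed to fit max attempts (1=>3), ...
}
print(attrs)
```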
						
					
Labels: Apache NiFi
			
    
	
		
		
01-05-2018 08:34 PM
There is no example in the "Working_With_CSV" template of how to extract each individual field into attributes.
						
					
01-04-2018 05:25 PM (1 Kudo)
Thanks! That seems to work correctly. I'll mark this as the answer since it produces the result I'm looking for.
						
					
01-03-2018 10:25 PM (2 Kudos)
					
@Shu Thank you for the great, detailed response. The first part does work, but I don't think the regex will work for my case. (Side note, no fault of yours: I just absolutely despise regex, as it's unreadable to me and extremely difficult to debug, if it can be debugged at all.)

I should have mentioned this, but the only thing I know about the CSV file is that there are X number of columns before the string. So I could see something like:

23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max, attempts,(1=>3), fit failing entirely,(FitFailure=True),

The only thing I know is that there are 13 columns (commas) before the string, and the string will always have a trailing "," (it has always been the last column in the row from what I have seen).

The other issue is that I tried using

(.*),

for all of the columns, so I could then put the data into a database insert query, but the regex seems to blow up and not function with so many columns (the original data has about 150 columns in it; I just truncated it down here).
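(Illustration, not from the thread: a pattern like (.*), repeated for ~150 columns forces the regex engine into heavy backtracking, which would explain the blow-up. A plain split that stops after the known number of leading columns sidesteps regex entirely:)

```python
line = ("23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"
        "Failed to fit max, attempts,(1=>3), fit failing entirely,(FitFailure=True),")

N_FIXED = 13  # known number of columns before the free-text string

# Split at most N_FIXED times; everything after the 13th comma stays together.
parts = line.split(",", N_FIXED)
fixed_cols = parts[:N_FIXED]
free_text = parts[N_FIXED].rstrip(",")  # drop the trailing comma

print(fixed_cols)  # ['23:49:38.637', '162929511757', '$009389BF', '36095', '', ...]
print(free_text)   # Failed to fit max, attempts,(1=>3), fit failing entirely,(FitFailure=True)
```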
						
					
01-03-2018 04:46 PM (1 Kudo)
					
I have a CSV file that is messy. I need to:

1. Get the date from the filename, use that as my date, and append it to one of the columns.
2. Parse the CSV file to get the columns, as the very last column is a string which has the separator "," inside it.

The data looks like this.

Filename: ExampleFile_2017-09-20.LOG

Content:

23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True),
23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,Command Measure, Targets complete - Elapsed: 76064 ms,

The following is what will need to be inserted into the database:

2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
2017-09-20 23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,"Command Measure, Targets complete - Elapsed: 76064 ms"

Would I need to do this inside of NiFi, or in some external script by calling some type of ExecuteScript?
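(One possible shape for this transformation, sketched in plain Python rather than NiFi; the 13-fixed-columns assumption comes from the follow-up comment above, and the filename pattern is inferred from the example.)

```python
import os
import re

def transform(path):
    """Yield DB-ready rows: file date prefixed, free-text column quoted."""
    # Assumed filename pattern: ExampleFile_YYYY-MM-DD.LOG
    date = re.search(r"(\d{4}-\d{2}-\d{2})", os.path.basename(path)).group(1)
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").rstrip(",").split(",", 13)
            parts[0] = f"{date} {parts[0]}"    # prepend the file date to the time
            parts[13] = '"' + parts[13] + '"'  # quote the free-text column
            yield ",".join(parts)

for row in transform("ExampleFile_2017-09-20.LOG"):
    print(row)
```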
						
					
Labels: Apache NiFi
			
    
	
		
		
05-25-2016 05:34 AM
					
What do I need to set in hive-env.sh? It seems that anything I touch gets overwritten. This has to be a bug in Ambari where it won't save the hive.heapsize value. How can I get it to persist?
						
					
05-25-2016 04:49 AM
The hive.heapsize setting does not exist in my hive-site.xml for some reason, and whenever I add it to the file it keeps getting overwritten.
						
					
05-25-2016 04:09 AM
@Divakar Annapureddy Correct, but if you look at my comments, I posted a picture and it shows it has been changed to 12GB in the UI. The services have been restarted (the complete server has been restarted).
						
					
05-25-2016 03:27 AM
hive  24964  0.2  1.7 2094636 566148 ?  Sl  17:03  0:56 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64/bin/java -Xmx1024m -Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -XX:MaxPermSize=512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.3.2.0-2950/hive/lib/hive-service-1.2.1.2.3.2.0-2950.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive

So I can see the process is running with -Xmx1024m, even though in the UI it is set to a really large value. http://imgur.com/3oXfpPj
						
					