Member since 02-22-2016

60 Posts
71 Kudos Received
27 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5070 | 07-14-2017 07:41 PM |
| | 1777 | 07-07-2017 05:04 PM |
| | 6024 | 07-07-2017 03:59 PM |
| | 1259 | 07-06-2017 02:59 PM |
| | 3703 | 07-06-2017 02:55 PM |

10-21-2016 04:43 PM
Not sure what happened to the comment I made on doing this, so re-posting it. Part of this is about code style: you generally don't want to define implicits at the top level because it can make the code more difficult to reason about. For this reason it's common to tuck the implicits into a companion object (e.g., on the relevant class, or into a dedicated Implicits object) and then import them just where you need them. This is probably the best use case for being able to do imports in the scope of a class, object, or function: you can apply an implicit without polluting the whole space.
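To illustrate the pattern, here's a minimal sketch. The Meters class, the Implicits object, and the doubleToMeters conversion are made-up names for the example, not code from the thread:

```scala
import scala.language.implicitConversions

// Hypothetical class used only to illustrate the pattern discussed above.
class Meters(val value: Double) {
  def +(other: Meters): Meters = new Meters(value + other.value)
  override def toString: String = s"${value}m"
}

// The conversion is tucked into a dedicated object rather than defined at the
// top level, so callers opt in to it with a local import.
object Implicits {
  implicit def doubleToMeters(d: Double): Meters = new Meters(d)
}

object Example {
  def demo(): Meters = {
    import Implicits.doubleToMeters // the implicit applies in this scope only
    new Meters(1.5) + 2.0           // 2.0 is converted via doubleToMeters
  }
}
```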
						
					
10-21-2016 04:23 PM
intToRational is only in the scope of Rational in your code, so the conversion isn't available to Ints outside of the Rational class. The reverse order (x + 2) works because 2 ends up being bound by the + inside Rational, where the conversion is available. What you want to do is create a companion Rational object, define intToRational there, and then you can import it in Rational and outside of it (e.g., at the global scope) too. As a minor note, I'd check out the Spire project for a complete set of rational classes, plus a whole lot more.
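A rough sketch of that companion-object arrangement, using a trimmed-down Rational rather than the code from the question:

```scala
import scala.language.implicitConversions

// Trimmed-down stand-in for the Rational class under discussion.
class Rational(val numer: Int, val denom: Int) {
  def +(that: Rational): Rational =
    new Rational(numer * that.denom + that.numer * denom, denom * that.denom)
  override def toString: String = s"$numer/$denom"
}

object Rational {
  // Defined in the companion object so it can be imported both inside and
  // outside the Rational class itself.
  implicit def intToRational(i: Int): Rational = new Rational(i, 1)
}

object Demo extends App {
  import Rational.intToRational
  val x = new Rational(1, 2)
  println(x + 2) // 2 is converted to Rational, then Rational's + applies
  println(2 + x) // 2 is converted to Rational first, so this now works too
}
```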
						
					
10-19-2016 08:35 PM (3 Kudos)
@Raj B The SplitText processor has a "Header Line Count" property. If you set this to 1, you should be able to achieve what you want: multiple flow files, each with the same header. That said, if you're intending to insert these into Hive, you could also use ConvertCSVToAvro, setting the delimiter to '|'; then you'd have the data in batches, which should give you better throughput.
						
					
10-05-2016 08:23 PM (3 Kudos)
@Randy Gelhausen There are a few ways to do this:

- Use the distributed map cache to get runtime attribute lookups and re-populate it as needed with new configs.
- Use a scripted processor to look up your config values and merge the attributes onto the FlowFile.
- I have some work in progress extending a lookup table service by @Andrew Grande that can do lookups against a properties file that is reloaded periodically. It includes a LookupAttribute processor that can merge in either specific properties or all the properties from a properties file. See: https://github.com/jfrazee/nifi-lookup-service/tree/file-based-lookup-service
						
					
10-05-2016 08:05 PM (1 Kudo)
@Timothy Spann ProcessorLog was removed between HDF 1.2/NiFi 0.6.x and HDF 2.0/NiFi 1.0 (see https://github.com/apache/nifi/pull/403), and that processor builds against the NiFi 0.6.x libraries, so it's going to need its dependencies updated to NiFi 1.0.0 to run under HDF 2.0.
						
					
09-29-2016 05:07 PM
Yes, that's sort of what I had in mind, but it'll still depend on how balanced or imbalanced your data is. There are algorithms for doing this more intelligently too, but I've never looked at how to do them in Spark. It looks like the FPGrowth() classes expose a support proportion, but I can't quite tell what it does if you have, e.g., 10k 1's and 100 items with count > 1. I probably can't take you much further without doing some reading.
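For reference, the support knob appears like this on MLlib's FPGrowth; the 1% threshold below is just a made-up value:

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// minSupport is a proportion: an itemset must appear in at least
// minSupport * transactions.count() transactions to be kept.
def frequentItemsets(transactions: RDD[Array[String]]) = {
  val model = new FPGrowth()
    .setMinSupport(0.01)  // made-up threshold: keep itemsets in >= 1% of transactions
    .setNumPartitions(10)
    .run(transactions)
  model.freqItemsets.collect().foreach { itemset =>
    println(s"${itemset.items.mkString("[", ",", "]")}: ${itemset.freq}")
  }
  model
}
```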
						
					
09-29-2016 03:47 PM
That percentage will certainly vary by domain so I don't know what normal will be. I will note that to do that on a large data set you'll need a step in your job to approximate where the cutoff is, but that's easy enough using the sampling methods exposed on the RDD.
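As a hypothetical sketch of that approximation step (the 1% sample fraction and 90th-percentile cutoff are arbitrary choices, not from the thread):

```scala
import org.apache.spark.rdd.RDD

// Estimate a count cutoff from a random sample of the (key, count) pairs
// instead of collecting the full data set.
def approximateCutoff(counts: RDD[(String, Int)]): Int = {
  val sampled = counts
    .sample(withReplacement = false, fraction = 0.01) // small random sample
    .map(_._2)
    .collect()
    .sorted
  if (sampled.isEmpty) 0
  else sampled(math.min((sampled.length * 0.9).toInt, sampled.length - 1)) // ~90th percentile
}
```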
						
					
09-28-2016 11:33 PM
@Pedro Rodgers Sorry, I meant to say to put the .filter() after the .reduceByKey(); I've edited it in the original answer. It should run now. Yes, it's filtering out occurrences with counts less than or equal to 2. If your data/training time isn't too big, you can probably tune that and your confidence level empirically using a grid search.
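For clarity, the intended ordering is roughly this; the pair RDD is a stand-in, but the > 2 threshold matches what's described above:

```scala
import org.apache.spark.rdd.RDD

// `pairs` is assumed to hold one (pattern, 1) entry per occurrence.
def countAndFilter(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
  pairs
    .reduceByKey(_ + _) // sum the occurrences per pattern first
    .filter(_._2 > 2)   // then drop patterns with counts <= 2
```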
						
					
09-28-2016 04:45 PM
@Pedro Rodgers Three things jump out at me:

- There are a ton of examples in your sample data with just one or two occurrences of the pattern. For this particular algorithm and its usual application, they're not going to be very useful.
- Your confidence is quite high, considering the size of the sample data and the evidence for the different patterns.
- The learner came up with 1165 rules for 1185 data points.

I re-ran your code with a .filter(_._2 > 2) added after the .reduceByKey(_ + _) and the confidence lowered to 0.6, and I now get 20 or so rules with confidence varying between 0.6 and 1.0. I suspect that if you carefully go through the results you were getting before, you'll see that it was just learning a 1-to-1 mapping between input and output, so the confidence of 1.0 is justified but the generalization of the model is bad.
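Sketching those two adjustments with MLlib's AssociationRules (this is illustrative, not the code from the thread; the counts RDD is assumed to already be reduced and filtered as described above):

```scala
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.rdd.RDD

// `counts` holds (itemset, occurrence count) pairs that have already gone
// through .reduceByKey(_ + _).filter(_._2 > 2).
def rulesAtLowerConfidence(counts: RDD[(List[String], Long)]) = {
  val freqItemsets = counts.map { case (items, freq) => new FreqItemset(items.toArray, freq) }
  val rules = new AssociationRules()
    .setMinConfidence(0.6) // lowered minimum confidence, as described above
    .run(freqItemsets)
  rules.collect().foreach { r =>
    println(s"${r.antecedent.mkString(",")} => ${r.consequent.mkString(",")} (confidence ${r.confidence})")
  }
  rules
}
```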
						
					
09-01-2016 06:21 PM (1 Kudo)
Not really sure why it's not loading the examples for you, but the JSON input that page should have loaded is:

{
  "Rating": 1,
  "SecondaryRatings": {
    "Design": 4,
    "Price": 2,
    "RatingDimension3": 1
  }
}

And the Jolt spec is:

[
  {
    "operation": "shift",
    "spec": {
      "Rating": "rating-primary",
      //
      // Turn all the SecondaryRatings into prefixed data
      // like "rating-Design" : 4
      "SecondaryRatings": {
        // the "&" in "rating-&" means go up the tree 0 levels,
        // grab what is there and substitute it in
        "*": "rating-&"
      }
    }
  }
]