About jfrazee

jfrazee · ‎10-21-2016

Not sure what happened to my earlier comment on this, so I'm re-posting it. Part of this is about code style. You generally don't want to define implicits at the top level because it can make the code more difficult to reason about. For this reason it's common to tuck the implicits into an object (e.g., the relevant class's companion object or a dedicated Implicits object) and then import them just where you need them. This is probably the best use case for being able to do imports in the scope of a class, object or function: you can apply an implicit without polluting the whole namespace.
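A minimal sketch of the scoped-import pattern described above. The names Meters, Implicits, doubleToMeters and ScopedImportDemo are hypothetical and only there for illustration; they are not from the original thread.

```scala
import scala.language.implicitConversions

// Hypothetical example type, used only to have something to convert to.
final case class Meters(value: Double)

object Implicits {
  // The conversion lives inside an object rather than at the top level.
  implicit def doubleToMeters(d: Double): Meters = Meters(d)
}

object ScopedImportDemo {
  def describe(m: Meters): String = s"${m.value} m"

  def withConversion(): String = {
    // Import the implicit only in the function that needs it.
    import Implicits._
    describe(2.5) // 2.5 is converted to Meters(2.5) here
  }

  // Outside withConversion, describe(2.5) would not compile,
  // because doubleToMeters is not in scope there.
}
```

Keeping the import inside withConversion means a reader can see exactly which lines are subject to the conversion.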

jfrazee · ‎10-21-2016

intToRational is only in the scope of Rational in your code, so the conversion isn't available to Ints outside of the Rational class. The reverse order (x + 2) works because 2 is the argument to the + defined inside Rational, where the conversion is available. What you want to do is create a companion Rational object, define intToRational there, and then import it both inside Rational and outside of it (e.g., in the global scope). As a minor note, I'd check out the Spire project for a complete set of rational classes, plus a whole lot more.
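Here is a rough sketch of that layout. Only the names Rational and intToRational come from the question; the class body, gcd helper and RationalDemo object are assumptions made for illustration, since the original code isn't shown here.

```scala
import scala.language.implicitConversions

class Rational(n: Int, d: Int) {
  require(d != 0, "denominator must be non-zero")
  // Reduce to lowest terms using the helper in the companion object.
  private val g = Rational.gcd(n.abs, d.abs)
  val numer: Int = n / g
  val denom: Int = d / g

  def +(that: Rational): Rational =
    new Rational(numer * that.denom + that.numer * denom, denom * that.denom)

  override def toString: String = s"$numer/$denom"
}

object Rational {
  // Defined in the companion object so it can be imported wherever it is needed.
  implicit def intToRational(i: Int): Rational = new Rational(i, 1)

  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
}

object RationalDemo {
  // Explicit import at the use site, outside of the Rational class.
  import Rational.intToRational

  val x = new Rational(1, 2)
  val left: Rational = x + 2  // 2 is converted as the argument to Rational's +
  val right: Rational = 2 + x // 2 is converted as the receiver of +
}
```

With the import in place, both orders type-check and reduce to the same value.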

jfrazee · ‎10-19-2016

@Raj B The SplitText processor has a "Header Line Count" property. If you set this to 1, you should be able to achieve what you want in generating multiple flow files, each with the same header. That said, if you're intending to insert these into Hive, you could actually use ConvertCSVToAvro too, setting the delimiter to '|' and then you'd have the data in batches which should give you better throughput.
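To make the intended result concrete, here is a plain-Scala sketch of the effect of splitting with a repeated header. This is not NiFi code and does not use the NiFi API; SplitWithHeader, split and its parameters are hypothetical names that only mimic the behavior described above.

```scala
object SplitWithHeader {
  // Splits `lines` into chunks of at most `linesPerSplit` body lines and
  // prepends the first `headerLineCount` lines to every chunk.
  def split(lines: Seq[String], headerLineCount: Int, linesPerSplit: Int): List[Seq[String]] = {
    val (header, body) = lines.splitAt(headerLineCount)
    body.grouped(linesPerSplit).map(chunk => header ++ chunk).toList
  }
}
```

For a pipe-delimited input with one header line and three data rows, split(lines, 1, 2) yields two chunks, each starting with the header line.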

jfrazee · ‎10-05-2016

@Randy Gelhausen There are a few ways to do this:

1. Use the distributed map cache to get runtime attribute lookups and re-populate it as needed with new configs.
2. Use a scripted processor to look up your config values and merge the attributes onto the FlowFile.
3. I have some work in progress extending a lookup table service by @Andrew Grande that can do lookups against a properties file that is reloaded periodically. It includes a LookupAttribute processor that can merge in either specific properties or all the properties from a properties file. See: https://github.com/jfrazee/nifi-lookup-service/tree/file-based-lookup-service

jfrazee · ‎10-05-2016

@Timothy Spann ProcessorLog was removed between HDF 1.2/NiFi 0.6.x and HDF 2.0/NiFi 1.0 (see https://github.com/apache/nifi/pull/403) and that processor builds against the NiFi 0.6.x libraries, so it's going to need its dependencies updated to NiFi 1.0.0 to run under HDF 2.0.

jfrazee · ‎09-29-2016

Yes. That's sort of what I had in mind, but it'll still depend on how balanced/imbalanced your data is. There are algorithms for doing this more intelligently too but I've never looked at how to do them in Spark. It looks like the FPGrowth() classes expose a support proportion, but I can't quite tell what it does if you have, e.g., 10k 1's and 100 items with count > 1. I probably can't take you much further without doing some reading.

jfrazee · ‎09-29-2016

That percentage will certainly vary by domain so I don't know what normal will be. I will note that to do that on a large data set you'll need a step in your job to approximate where the cutoff is, but that's easy enough using the sampling methods exposed on the RDD.

jfrazee · ‎09-28-2016

@Pedro Rodgers Sorry, I meant to write to put the .filter() after the .reduceByKey(). Edited it in the original answer. It should run now. Yes, it's filtering out/eliminating occurrences with counts less than or equal to 2. If your data/training time isn't too big, you can probably tune that and your confidence level empirically using a grid search.

jfrazee · ‎09-28-2016

@Pedro Rodgers Three things jump out at me:

1. There are a ton of examples in your sample data with just one or two occurrences of the pattern. For this particular algorithm and its usual application they're not going to be very useful.
2. Your confidence is quite high, considering the size of the sample data and the evidence for the different patterns.
3. The learner came up with 1165 rules for 1185 data points.

I re-ran your code including a .filter(_._2 > 2) after the .reduceByKey(_ + _) and lowered the confidence to 0.6 and I get 20 or so rules now with confidence varying between 0.6 and 1.0. I suspect if you carefully go through the results you were getting before you might see that it was just learning a 1-to-1 mapping between input and output, so the confidence of 1.0 is justified, but the generalization of the model is bad.
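As a rough illustration of the count-then-filter step, here is a plain-collections analogue that runs without Spark. It is not the original job: CountFilter and frequent are hypothetical names, and groupBy plus size stands in for the pair-RDD .reduceByKey(_ + _).

```scala
object CountFilter {
  // Counts occurrences of each pattern, then keeps only those seen more than
  // twice, mirroring .reduceByKey(_ + _) followed by .filter(_._2 > 2).
  def frequent(patterns: Seq[String]): Map[String, Int] =
    patterns
      .groupBy(identity)
      .map { case (pattern, occurrences) => (pattern, occurrences.size) }
      .filter(_._2 > 2)
}
```

The threshold of 2 is the one used in the re-run above; it is a tunable cutoff, not a recommended value.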

jfrazee · ‎09-01-2016

Not really sure why it's not loading the examples for you, but the JSON input that page should have loaded is:

    {
      "Rating": 1,
      "SecondaryRatings": {
        "Design": 4,
        "Price": 2,
        "RatingDimension3": 1
      }
    }

And the Jolt spec is:

    [
      {
        "operation": "shift",
        "spec": {
          "Rating": "rating-primary",
          //
          // Turn all the SecondaryRatings into prefixed data
          // like "rating-Design" : 4
          "SecondaryRatings": {
            // the "&" in "rating-&" means go up the tree 0 levels,
            // grab what is there and substitute it in
            "*": "rating-&"
          }
        }
      }
    ]
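To show what that shift is meant to produce, here is a small plain-Scala sketch that mimics the renaming on an in-memory map. It does not call Jolt; ShiftSketch and flatten are hypothetical names, and the sketch assumes the ratings are integers as in the sample input.

```scala
object ShiftSketch {
  // Mimics the spec: "Rating" becomes "rating-primary", and each key under
  // "SecondaryRatings" becomes "rating-<key>" at the top level.
  def flatten(rating: Int, secondary: Map[String, Int]): Map[String, Int] =
    Map("rating-primary" -> rating) ++
      secondary.map { case (key, value) => s"rating-$key" -> value }
}
```

Applied to the sample input, this gives a flat map with the keys rating-primary, rating-Design, rating-Price and rating-RatingDimension3.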

Last Visited	‎09-18-2017 09:10 PM

Member Since	‎02-22-2016 03:57 PM
Posts	60
Kudos received	62

Cloudera Community

Re: How can I send FlowFile content to String in J...

Re: non-Kerberized HDF communication with Kerberiz...

Re: Is there a way to do a count Approx for a data...

Re: update attributes with other existing attribut...

Re: NiFi LookupAttribute and UpdateAttributes

Re: Scala Implicit Conversion

Re: Scala Implicit Conversion

Re: Splitting a Nifi flowfile into multiple flowfi...

Re: Does NiFi have a means of updateable variables...

Re: Using NiFi-Soap Processor

Re: Spark Mllib - Frequent Pattern Mining - stran...

Re: Spark Mllib - Frequent Pattern Mining - stran...

Re: Spark Mllib - Frequent Pattern Mining - stran...

Re: Spark Mllib - Frequent Pattern Mining - stran...

Re: Can we flatten complex JSON file using NIFI.?