About mburgess

mburgess · ‎08-03-2016

With the first ReplaceText you could leave yourself a marker like '@@DATE_HERE@@' as the date_time value. Then, as your intuition suggests, you can add another ReplaceText after ConvertJSONtoSQL that matches the marker together with its surrounding quotes and replaces the whole match with the Expression Language statement. That removes the quotes and leaves the correct literal.
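The marker approach can be sketched outside NiFi as a plain regex substitution. This is only an illustration of the matching idea, not ReplaceText configuration: the SQL text, the table name `events`, and the replacement literal are assumptions made up for the example.

```python
import re

MARKER = "@@DATE_HERE@@"


def replace_quoted_marker(text, literal):
    """Replace the marker, plus a quote wrapped directly around it, with `literal`."""
    # Group 1 captures an optional opening quote; \1 requires the same closing quote.
    pattern = r"""(['"]?)""" + re.escape(MARKER) + r"\1"
    # A function replacement keeps `literal` from being parsed for group references.
    return re.sub(pattern, lambda _match: literal, text)


# Hypothetical statement text containing the quoted marker.
sql = "INSERT INTO events (id, date_time) VALUES (1, '@@DATE_HERE@@')"
# Hypothetical unquoted replacement (an Expression Language style string).
print(replace_quoted_marker(sql, "${now():format('yyyy-MM-dd')}"))
```

Matching the quotes as part of the pattern is what lets a single replacement both drop them and insert the unquoted value.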

mburgess · ‎08-01-2016

It could likely be done with a combination of processors. One part of the flow would read the config file and load the conditions into a DistributedMapCache. Another part would read the input file (GetFile, or ListFile -> FetchFile), possibly split it into individual records (with SplitText), extract the desired values with ExtractText, then get the conditions from the DistributedMapCache and route (with RouteOnAttribute) to the various paths. If you are comfortable with a programming language like Groovy, Jython, JRuby, Lua, or Javascript, you could use InvokeScriptedProcessor to accomplish any or all of the above. I'd recommend you keep the script to just reading the config file and filtering the data, as the other processors above handle the remaining tasks very well. If you will only have two routes, you can also use ExecuteScript for scripting, but that processor only gives you "success" and "failure" routes. InvokeScriptedProcessor lets you implement a full Processor, so you can define your own relationships/routes. I have some examples (here and here) of InvokeScriptedProcessor, along with many other examples of scripting in NiFi, on my blog.
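As a rough illustration of the lookup-and-route idea, here is a standalone sketch with no NiFi API in it. The `route=value` config format, the comma-delimited records, and the `unmatched` route name are all assumptions for the example; a dict stands in for the DistributedMapCache.

```python
from collections import defaultdict


def load_conditions(config_lines):
    """Parse hypothetical 'route=value' lines into a value -> route lookup."""
    cache = {}
    for line in config_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        route, value = line.split("=", 1)
        cache[value.strip()] = route.strip()
    return cache


def route_records(records, cache, field_index=0, delimiter=","):
    """Group records by the route that their extracted field maps to."""
    routed = defaultdict(list)
    for record in records:
        fields = record.split(delimiter)
        key = fields[field_index].strip() if field_index < len(fields) else ""
        # Records with no matching condition go to a catch-all route.
        routed[cache.get(key, "unmatched")].append(record)
    return dict(routed)


conditions = load_conditions(["# region routes", "east=NY", "west=CA"])
print(route_records(["NY,alice", "CA,bob", "TX,carol"], conditions))
```

In a real flow the lookup table would live in the cache and the grouping would be expressed as relationships, but the shape of the logic is the same.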

mburgess · ‎07-27-2016

It is certainly worth considering adding support for various clauses to the QueryDatabaseTable (and soon, the GenerateTableFetch) processors. We will also have to consider whether the various drivers (Oracle, MySQL, Postgres, etc.) support such notation (or some variant), and how to handle the clauses if the database type does not support them. Do you mind filing a Jira case for this improvement?

mburgess · ‎07-27-2016

Perhaps you are being rate-limited by the Twitter API? How often is GetTwitter executing? You can check this on the Scheduling tab of the processor configuration dialog. If "Run Schedule" is set to zero seconds, then the processor will run as fast as possible, which could certainly cause throttling / rate-limiting and thus an eventual lack of data.

mburgess · ‎07-27-2016

Certainly! The Module Directory property in the ExecuteScript processor is for this purpose. You can give it a comma-separated list of directories and/or JAR files, and it will add them to the script's classpath. I have a blog post with an example (bringing in Hazelcast to get data into flowfiles): http://funnifi.blogspot.com/2016/02/executescript-using-modules.html Also, if you add the Apache Ivy JAR to your NiFi lib/ folder (normally a no-no, but ok in this case), you can even leverage the Grab annotation to bring in dependencies. I have a post with an example here: http://funnifi.blogspot.com/2016/05/using-groovy-grab-with-executescript.html

mburgess · ‎07-21-2016

If you stop the processor that ConvertJSONtoSQL is connected to, you will see your flow files in the connection queue (between the processors). You can right-click on that and choose List queue, then pick any of the files, click on the Info button (looks like a question mark), and choose the Attributes tab. That should show all the flow file attributes, including the sql.args pairs. Alternatively, you can connect ConvertJSONtoSQL to a LogAttribute processor and check logs/nifi-app.log to see the attributes being printed out.

mburgess · ‎07-21-2016

The SQL generated is a prepared statement (the question marks are placeholders for the values). In the flow file coming out of ConvertJSONtoSQL, you should see attributes such as "sql.args.1.type" and "sql.args.1.value". There should be a pair of attributes like that for each of the columns (looks like 6). Are those attributes present and valid?
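To make the pairing concrete, here is a small sketch of checking that every placeholder has a matching "sql.args.N.type" / "sql.args.N.value" pair, then binding the values. The attribute dict, the `users` table, and the use of sqlite3 are stand-ins chosen for the example; counting "?" characters is a simplification that would miscount a question mark inside a string literal.

```python
import sqlite3


def ordered_args(attributes):
    """Collect sql.args.N.value entries in order, requiring a type for each."""
    values = []
    index = 1
    while f"sql.args.{index}.value" in attributes:
        if f"sql.args.{index}.type" not in attributes:
            raise ValueError(f"sql.args.{index}.type is missing")
        values.append(attributes[f"sql.args.{index}.value"])
        index += 1
    return values


def check_args(sql, attributes):
    """Verify there is exactly one value per '?' placeholder."""
    values = ordered_args(attributes)
    expected = sql.count("?")  # naive count, see the note above
    if len(values) != expected:
        raise ValueError(f"expected {expected} args, found {len(values)}")
    return values


# Hypothetical flow file attributes; 4 and 12 are the java.sql.Types codes
# for INTEGER and VARCHAR.
attrs = {
    "sql.args.1.type": "4", "sql.args.1.value": "1",
    "sql.args.2.type": "12", "sql.args.2.value": "alice",
}
sql = "INSERT INTO users (id, name) VALUES (?, ?)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute(sql, check_args(sql, attrs))
print(conn.execute("SELECT name FROM users").fetchall())
```

A missing or misnumbered pair shows up here as a ValueError before the statement is executed.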

mburgess · ‎07-20-2016

Also, if you don't care about that column, you can set the Unmatched Column Behavior to warn/ignore.

mburgess · ‎07-20-2016

Is Translate Field Names set to true? That should enable the matching of the column (which appears capitalized) against the field (which is lowercase).
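The effect of the setting can be sketched as a name normalization step applied to both sides before comparing. The exact rule used here (upper-casing and dropping underscores) is an assumption for illustration, and the field and column names are made up.

```python
def normalize(name, translate=True):
    """Assumed normalization: upper-case and drop underscores when translating."""
    return name.upper().replace("_", "") if translate else name


def match_fields(json_fields, table_columns, translate=True):
    """Return (field -> column matches, list of fields with no column)."""
    columns = {normalize(col, translate): col for col in table_columns}
    matched, unmatched = {}, []
    for field in json_fields:
        column = columns.get(normalize(field, translate))
        if column is None:
            unmatched.append(field)
        else:
            matched[field] = column
    return matched, unmatched


# Hypothetical lowercase JSON fields against capitalized table columns.
print(match_fields(["id", "first_name", "extra"], ["ID", "FIRSTNAME"]))
print(match_fields(["id", "first_name", "extra"], ["ID", "FIRSTNAME"], translate=False))
```

With translation off, nothing lines up and every field lands in the unmatched list, which is the situation the question above is probing for.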

mburgess · ‎07-19-2016

What error are you getting? Also, what version of NiFi/HDF are you using? In SplitJson, you may want $.* as the JSON Path expression. As an alternative, you can try QueryDatabaseTable -> SplitAvro -> ConvertAvroToJson. This will split the Avro records first, instead of converting the whole set to JSON and then splitting the JSON. In Apache NiFi 1.0.0 (and HDF 2.0), there will be a ConvertAvroToORC processor which will allow you to convert directly to ORC. You can then use PutHDFS and PutHiveQL (also in NiFi 0.7.0 and 1.0.0 and HDF 2.0) to transfer the files to HDFS and create a Hive table atop the target directory to make the data ready for querying.
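For intuition about what $.* selects, here is a standalone sketch that mimics it with the standard json module rather than a JSONPath library: each direct child of the root becomes its own document. The sample records are made up.

```python
import json


def split_top_level(document):
    """Return each direct child of the JSON root as its own JSON string."""
    parsed = json.loads(document)
    if isinstance(parsed, list):
        children = parsed                 # array root: one piece per element
    elif isinstance(parsed, dict):
        children = list(parsed.values())  # object root: one piece per member value
    else:
        children = []                     # scalar root: nothing to split
    return [json.dumps(child) for child in children]


# Hypothetical result set converted to a JSON array of records.
for piece in split_top_level('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'):
    print(piece)
```

For an array of records, this yields one output per record, which is the split you want before handling rows individually.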

Member Since	‎11-16-2015 02:21 PM
Last Visited	‎11-07-2024 11:28 PM
Posts	892
Kudos received	642
