Member since
09-24-2015
105
Posts
82
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1086 | 04-11-2016 08:30 PM |
|  | 917 | 03-11-2016 04:08 PM |
|  | 809 | 12-21-2015 09:51 PM |
|  | 493 | 12-18-2015 10:43 PM |
|  | 5848 | 12-08-2015 03:01 PM |
11-08-2016
11:58 PM
Hi, I have a streaming use case where I'm ingesting JSON data via an MQ. I am trying to pull some key-value pairs out of the JSON to send to a CEP for windowing functions. The issue is that the JSON stores the key-value pairs in a nested JSON map with special characters embedded in it.
Below are the details of the steps I'm trying to take. Any suggestions on how to achieve my goal would be greatly appreciated.
Current JSON (note that the "Message" field holds its array as an escaped string):
{
"IncludeExclude": true,
"Description": "ResponseTimes",
"TimeStamp": "2016-07-02T18:59:59.6162528-05:00",
"Sequence": 0,
"Loglevel": 0,
"$type": "Information",
"OperationName": "BeforeSendReply",
"StateInfos": null,
"FileName": "CSS.cs",
"ClassName": null,
"RequestUri": "https://ILoveHadoop.com",
"AssemblyInfo": null,
"LineNumber": "170",
"TimeZone": null,
"Message": "[{\"Key\":\"Key1\",\"ResponseTime\":\"54\"},{\"Key\":\"Key2\",\"ResponseTime\":\"2186\"},{\"Key\":\"Key3\",\"ResponseTime\":\"2242\"}]",
"EventInfo": {
"EventLevel": null,
"$type": "Event123",
"EventSuccess": null,
"EventType": "Information"
}
}
Trying to remove the special characters so the JSON looks like this:
{
"IncludeExclude": true,
"Description": "ResponseTimes",
"TimeStamp": "2016-07-02T18:59:59.6162528-05:00",
"Sequence": 0,
"Loglevel": 0,
"$type": "Information",
"OperationName": "BeforeSendReply",
"StateInfos": null,
"FileName": "CSS.cs",
"ClassName": null,
"RequestUri": "https://ILoveHadoop.com",
"AssemblyInfo": null,
"LineNumber": "170",
"TimeZone": null,
"Message": [{"Key":"Key1","ResponseTime":"54"},{"Key":"Key2","ResponseTime":"2186"},{"Key":"Key3","ResponseTime":"2242"}],
"EventInfo": {
"EventLevel": null,
"$type": "Event123",
"EventSuccess": null,
"EventType": "Information"
}
}
Then I plan to run the below JOLT shift via the JOLT processor in NiFi to transpose the map to a list:
// JOLT spec for transposing the message data
[
{
"operation": "shift",
"spec": {
"Message": {
"*": {
"@ResponseTime": "ApplicationResponseTimes.@Key"
}
}
}
}
]
With an ultimate end output of:
{
"ApplicationResponseTimes" : {
"Key1" : "54",
"Key3" : "2242",
"Key2" : "2186"
}
}
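For reference, here is a minimal sketch of the un-escaping step outside NiFi (plain Python; the input file name is hypothetical). The point is that the escaped "Message" value is itself valid JSON, so parsing it a second time is equivalent to "removing the special characters":

import json

# Hypothetical input file containing the event shown above.
raw = open("event.json").read()
doc = json.loads(raw)

# "Message" arrives as a JSON-encoded string; a second parse turns it
# into a real list of {"Key": ..., "ResponseTime": ...} objects.
doc["Message"] = json.loads(doc["Message"])

print(json.dumps(doc, indent=2))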
Thanks, Andrew
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-26-2016
02:58 AM
Hi Yolada, How do you pass FlowFile attributes into your JOLT transformation? Thanks,
10-26-2016
02:19 AM
But that overwrites the entire FlowFile, right? I just want to replace that one value in the JSON (while leaving the rest of the JSON as-is).
10-26-2016
01:19 AM
1 Kudo
I have used EvaluateJsonPath to pull a value out of a FlowFile and put it into an attribute. Then, based on the value of that attribute, I have updated the attribute to a new value. Now, how do I replace the old value in the JSON with the new value that is stored as an attribute? I'd assume I could do this with regex and ReplaceText, but I wasn't sure if there is a more elegant way to do it. Thanks, Andrew
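For what it's worth, a minimal sketch of the parse/overwrite/re-serialize approach (plain Python; the key name, attribute value, and sample JSON are all made up). This is what a scripted processor could do instead of regex, and it leaves the rest of the JSON untouched:

import json

# Hypothetical FlowFile content and updated attribute value.
flowfile_content = '{"status": "old", "details": {"kept": true}}'
new_value = "approved"

doc = json.loads(flowfile_content)
doc["status"] = new_value            # replace just the one value
updated_content = json.dumps(doc)    # everything else is preserved

print(updated_content)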
Labels:
- Apache NiFi
10-25-2016
05:44 PM
Hi, Is there a limit (hard-coded or performance-based) on the number of attributes that can be assigned to a single FlowFile? If not, what are the considerations (e.g., performance, space on disk, etc.) for limiting the number of attributes? Thanks,
Labels:
- Apache NiFi
10-21-2016
10:30 PM
Hi All, What are the limits on:
- The number of fields in a Solr collection? 100K? 1 million?
- The max size of a particular field? 1 MB? 100 MB? 1 GB?
Thanks,
Tags:
- Data Processing
- solr
Labels:
- Apache Solr
09-20-2016
04:30 PM
Hi, I currently have a 20-node cluster set up with mount points (/grid01, /grid02, ..., /grid10) on each of my data nodes. Currently all mounts are available to HDFS. However, I would like to reconfigure 3 of the data nodes so that mount points /grid01, /grid02, and /grid03 are no longer used for HDFS (they will be used for Kafka and other non-HDFS processes). How best do I go about reconfiguring the DataNodes in Ambari?
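One approach, sketched under stated assumptions: create an Ambari config group in the HDFS service containing just those 3 hosts, and override dfs.datanode.data.dir there to drop the first three mounts. The directory layout below is illustrative, not taken from your cluster:

<!-- hdfs-site.xml override for the config group covering the 3 Kafka hosts -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- /grid01 - /grid03 removed; only /grid04 - /grid10 remain for HDFS -->
  <value>/grid04/hadoop/hdfs/data,/grid05/hadoop/hdfs/data,/grid06/hadoop/hdfs/data,/grid07/hadoop/hdfs/data,/grid08/hadoop/hdfs/data,/grid09/hadoop/hdfs/data,/grid10/hadoop/hdfs/data</value>
</property>

After restarting the affected DataNodes, the NameNode will re-replicate any blocks that lived on the removed mounts.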
Labels:
- Apache Ambari
- Apache Hadoop
09-19-2016
06:37 PM
2 Kudos
What are the options for row-level filtering in HBase? I am aware that Ranger 0.6 has this capability for Hive, but I wasn't sure what the best option is for doing it in HBase.
Labels:
- Apache HBase
- Apache Ranger
09-06-2016
05:39 PM
1 Kudo
Hi, How do I go about running multiple Kafka brokers on the HDP 2.4 (or 2.5) Sandbox? Thanks,
Labels:
- Apache Kafka
07-14-2016
02:54 PM
3 Kudos
Hi, I am looking to pull all configuration changes (the who, what, and when for every change) made in Ambari (e.g., John Smith changed the YARN min container size to 4 GB on June 9th at 6:09 AM). The reason is that our compliance team wants a report of all changes made to production systems. My assumption would be to use Ambari's REST API; I just wasn't sure if someone had some examples of how best to do this. Thanks, Andrew
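A minimal sketch of that REST API idea, assuming the service_config_versions endpoint exposed by Ambari 2.x (the host, cluster name, and credentials below are placeholders, and the field names are to the best of my knowledge):

import requests

AMBARI = "http://ambari-host:8080"   # placeholder host
CLUSTER = "MyCluster"                # placeholder cluster name
AUTH = ("admin", "admin")            # placeholder credentials

# Each service config version records the user who made the change
# and when it was made (createtime is epoch milliseconds).
url = (AMBARI + "/api/v1/clusters/" + CLUSTER +
       "/configurations/service_config_versions"
       "?fields=service_name,user,createtime,service_config_version_note")

resp = requests.get(url, auth=AUTH, headers={"X-Requested-By": "ambari"})
resp.raise_for_status()

for item in resp.json()["items"]:
    print(item["service_name"], item["user"],
          item["createtime"], item.get("service_config_version_note"))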
Labels:
- Apache Ambari
07-08-2016
01:36 PM
1 Kudo
Hi, Is it possible to use AWS S3 as a storage tier within HDFS Heterogeneous Storage? If so, any insight would be greatly appreciated.
Labels:
- Apache Hadoop
06-09-2016
08:43 PM
1 Kudo
Hi, I currently have an HDP 2.3 cluster with Kafka running. The Kafka topics look to be corrupt, so I want to wipe all data in the topics, delete Kafka from the cluster, and re-add it. What is the best way to do this? The cluster is currently being used for other workloads (Hive, Spark, etc.) that I don't want to impact. Thanks,
Labels:
- Apache Kafka
05-25-2016
03:33 PM
Hi All, I am aware that HDP and Ambari need Python 2.x to run their services. However, are there any concerns with installing Python 3.5.1 to be used for data processing? Thanks, Andrew
Labels:
- Apache Ambari
05-12-2016
07:03 PM
1 Kudo
Hi All, I am looking to move all my Hive and Pig scripts written in HUE over to Ambari Views. Is there a script that I can use to extract the data from the HUE RDBMS and import it into the appropriate Ambari Views? Thanks,
Labels:
- Apache Ambari
- Cloudera Hue
05-03-2016
08:42 PM
1 Kudo
Hi All, I am looking for the overall and per-component performance impact of implementing all security components in HDP, including SSL, TDE, Ranger, Kerberos, and Knox. I have found a few links regarding SSL and Knox but can't seem to find anything comprehensive. Thanks,
Labels:
- Apache Knox
- Apache Ranger
04-27-2016
04:34 PM
@Raghu Gurrala Does the Python script successfully finish when manually run outside of NiFi?
04-15-2016
12:14 AM
1 Kudo
@Laurent Edel That is if the cluster is unsecured. If secured, it would be DataNode: 1004 and WebHDFS: 1006.
04-14-2016
10:22 PM
1 Kudo
Hi, What ports need to be opened between clusters for DistCP?
Labels:
- Apache Ambari
- Apache Hadoop
04-13-2016
06:53 PM
1 Kudo
Hi All, After applying OS patches to data nodes, the servers must be rebooted. Once the servers are rebooted, the Hadoop services running on them do not automatically come up; you have to manually go into Ambari and tell the services to start on that host. What are some best practices and recommendations for automatically bringing up the Hadoop services after an OS reboot?
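One option, sketched below as an assumption-laden example rather than an official recipe (the host, cluster name, and credentials are placeholders): have a boot-time script call Ambari's REST API to start every component on the rebooted host.

import json
import requests

AMBARI = "http://ambari-host:8080"        # placeholder host
CLUSTER = "MyCluster"                     # placeholder cluster name
HOST = "datanode01.example.com"           # placeholder FQDN of the rebooted node
AUTH = ("admin", "admin")                 # placeholder credentials

url = (AMBARI + "/api/v1/clusters/" + CLUSTER +
       "/hosts/" + HOST + "/host_components")
body = {
    "RequestInfo": {"context": "Start components after OS reboot"},
    "Body": {"HostRoles": {"state": "STARTED"}},
}

# Ambari requires the X-Requested-By header on write requests.
resp = requests.put(url, auth=AUTH, data=json.dumps(body),
                    headers={"X-Requested-By": "ambari"})
print(resp.status_code, resp.text)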
Labels:
- Apache Ambari
04-11-2016
08:30 PM
2 Kudos
Okay, got the answer: For CentOS 6 and SLES 11, Hortonworks supports Python 2.6.* (which is what is installed by default); Hortonworks does not support switching to Python 2.7.* on these OSes. For all other supported OSes (CentOS 7, Ubuntu 12 + 14, Debian 7), Hortonworks supports Python 2.7.* (which is what is installed by default). As of Ambari 2.2, there is no longer an issue with 2.7.9 (therefore, you can use 2.7.*).
03-29-2016
08:48 PM
1 Kudo
@Wes Floyd @Scott Shaw I just had a talk with the HDP and Ambari PMs, and they recommended that you don't mix OSes between major releases (e.g., RHEL 6.x and RHEL 7.x). They did state that some people mix OSes from the same family in the same major release (e.g., RHEL 7.x and CentOS 7.x), but while less likely to cause problems, even that could lead to issues as it isn't tested.
03-17-2016
03:07 AM
1 Kudo
@vpoornalingam Okay, and to confirm, Python 2.7.8 is the highest version allowed?
03-16-2016
02:59 PM
3 Kudos
Hi All, The documentation says Python 2.6 is required but then right below it says: "Python v2.7.9 or later is not supported due to changes in how Python performs certificate validation." Does that mean you can use Python 2.7.X so long as it's less than 2.7.9? Thanks,
03-11-2016
06:53 PM
1 Kudo
@DIALLO Sory What database are you configuring Ambari to use for its repository?
03-11-2016
04:24 PM
@Michael Rife Can you please try going to localhost:8080? Does it bring up Ambari?
03-11-2016
04:11 PM
Hi @DIALLO Sory, From what you have posted, I don't see any errors; it says Ambari Server has successfully started. You should be able to access Ambari at hostname:8080. If Ambari Server is indeed down, please send your ambari-server.log so we can better identify the potential issue. To check whether Ambari Server is running, try:
ps -ef | grep ambari
Cheers, Andrew
03-11-2016
04:08 PM
1 Kudo
Storing Ranger audit logs in HDFS is beneficial for a couple of reasons: A) It provides a more scalable, distributed data store, so you can store logs for a lot longer. B) If you are currently leveraging Hadoop to store all security/audit logs, you can store your Ranger audit logs alongside them in HDFS and do better correlation between access requests from different systems to help detect anomalies. Storing in the RDBMS was the original default. It provides better response times on smaller data sets, but it's not as scalable, and you will then need to maintain it (e.g., purge/roll logs) on a set frequency. Cheers, Andrew
03-10-2016
03:08 PM
2 Kudos
@Abdus Sagir Mollah Primary keys can also be useful for bucketing (i.e., partitioning of data), especially if you are trying to leverage the ACID capabilities of Hive. Quote from the blog below:
"Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys."
Entire blog: http://hortonworks.com/blog/adding-acid-to-apache-hive/
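As a hedged illustration (the table and column names are made up), a Hive ACID dimension table bucketed on its primary key might be declared like this:

-- Hypothetical dimension table bucketed on its "primary key" column.
-- Assumes the Hive ACID settings (txn manager, compactor) are enabled.
CREATE TABLE customer_dim (
  customer_id BIGINT,   -- the de facto primary key
  name STRING,
  state STRING
)
CLUSTERED BY (customer_id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');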
02-23-2016
06:30 PM
@jsequeiros See the updated processor configuration screenshot.
02-23-2016
05:18 PM
1 Kudo
Hi All, I leveraged the CSV-to-JSON XML workflow example to create a flow where I wait for a CSV from an HTTP call, then parse and label the CSV values, and lastly send the fields and values to myself via email. The flow is working, except that the email message doesn't send the ReplaceText values of Field1, Field2, Field3, Field4. Instead it is sending the ExtractText values of csv.1, csv.2, csv.3, csv.4. The weird thing is that when we look at the data provenance at the email processor, we see the input claim has the fields correctly labeled as field1, field2, etc. Any idea what the issue is? Email message:
Standard FlowFile Metadata:
id = '4af1cc19-c702-42d0-907e-adcc92b04dab'
entryDate = 'Tue Feb 23 16:58:03 UTC 2016'
fileSize = '130'
FlowFile Attributes:
csv.1 = 'one'
path = './'
flowfile.replay.timestamp = 'Tue Feb 23 16:58:03 UTC 2016'
csv.3 = 'three'
filename = '5773072822254662'
restlistener.remote.user.dn = 'none'
csv.2 = 'two'
csv.4 = 'four'
csv = 'one'
restlistener.remote.source.host = 'XXXX'
flowfile.replay = 'true'
uuid = '4af1cc19-c702-42d0-907e-adcc92b04dab'
Template: csvtojson.xml
PutEmail Processor
Labels:
- Apache NiFi