Member since
09-24-2015
105
Posts
82
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1086 | 04-11-2016 08:30 PM |
|  | 917 | 03-11-2016 04:08 PM |
|  | 809 | 12-21-2015 09:51 PM |
|  | 493 | 12-18-2015 10:43 PM |
|  | 5848 | 12-08-2015 03:01 PM |
11-08-2016
11:58 PM
Hi, I have a streaming use case where I'm ingesting JSON data via an MQ. I am trying to pull some key-value pairs out of the JSON to send to a CEP for windowing functions. The issue is that the JSON stores the key-value pairs in a nested JSON map with special characters embedded in it.
Below are the details of the steps I'm trying to take. Any suggestions on how to achieve my goal would be greatly appreciated.
Current JSON (note that the "Message" field holds its array as an escaped string):
{
"IncludeExclude": true,
"Description": "ResponseTimes",
"TimeStamp": "2016-07-02T18:59:59.6162528-05:00",
"Sequence": 0,
"Loglevel": 0,
"$type": "Information",
"OperationName": "BeforeSendReply",
"StateInfos": null,
"FileName": "CSS.cs",
"ClassName": null,
"RequestUri": "https://ILoveHadoop.com",
"AssemblyInfo": null,
"LineNumber": "170",
"TimeZone": null,
"Message": "[{\"Key\":\"Key1\",\"ResponseTime\":\"54\"},{\"Key\":\"Key2\",\"ResponseTime\":\"2186\"},{\"Key\":\"Key3\",\"ResponseTime\":\"2242\"}]",
"EventInfo": {
"EventLevel": null,
"$type": "Event123",
"EventSuccess": null,
"EventType": "Information"
}
}
Trying to remove the special characters so the JSON looks like this:
{
"IncludeExclude": true,
"Description": "ResponseTimes",
"TimeStamp": "2016-07-02T18:59:59.6162528-05:00",
"Sequence": 0,
"Loglevel": 0,
"$type": "Information",
"OperationName": "BeforeSendReply",
"StateInfos": null,
"FileName": "CSS.cs",
"ClassName": null,
"RequestUri": "https://ILoveHadoop.com",
"AssemblyInfo": null,
"LineNumber": "170",
"TimeZone": null,
"Message": [{"Key":"Key1","ResponseTime":"54"},{"Key":"Key2","ResponseTime":"2186"},{"Key":"Key3","ResponseTime":"2242"}],
"EventInfo": {
"EventLevel": null,
"$type": "Event123",
"EventSuccess": null,
"EventType": "Information"
}
}
Then I plan to run the below JOLT shift via the JOLT processor in NiFi to transpose the map to a list:
// JOLT spec for transposing the message data
[
{
"operation": "shift",
"spec": {
"Message": {
"*": {
"@ResponseTime": "ApplicationResponseTimes.@Key"
}
}
}
}
]
With an ultimate end output of:
{
"ApplicationResponseTimes" : {
"Key1" : "54",
"Key3" : "2242",
"Key2" : "2186"
}
}
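For reference, here is a minimal sketch of the un-escaping step outside NiFi (plain Python; the input file name is hypothetical). The point is that the escaped "Message" value is itself valid JSON, so parsing it a second time is equivalent to "removing the special characters":

import json

# Hypothetical input file containing the event shown above.
raw = open("event.json").read()
doc = json.loads(raw)

# "Message" arrives as a JSON-encoded string; a second parse turns it
# into a real list of {"Key": ..., "ResponseTime": ...} objects.
doc["Message"] = json.loads(doc["Message"])

print(json.dumps(doc, indent=2))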
Thanks, Andrew
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-26-2016
02:58 AM
Hi Yolada, How do you pass FlowFile attributes into your JOLT transformation? Thanks,
10-26-2016
02:19 AM
But that overwrites the entire FlowFile, right? I just want to replace that one value in the JSON (while leaving the rest of the JSON as-is).
10-26-2016
01:19 AM
1 Kudo
I have used EvaluateJsonPath to pull a value out of a FlowFile and put it into an attribute. Then, based on the value of that attribute, I have updated the attribute to a new value. Now, how do I replace the old value in the JSON with the new value that is stored as an attribute? I'd assume I could do this with regex and ReplaceText, but I wasn't sure if there is a more elegant way to do it. Thanks, Andrew
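For what it's worth, a minimal sketch of the parse/overwrite/re-serialize approach (plain Python; the key name, attribute value, and sample JSON are all made up). This is what a scripted processor could do instead of regex, and it leaves the rest of the JSON untouched:

import json

# Hypothetical FlowFile content and updated attribute value.
flowfile_content = '{"status": "old", "details": {"kept": true}}'
new_value = "approved"

doc = json.loads(flowfile_content)
doc["status"] = new_value            # replace just the one value
updated_content = json.dumps(doc)    # everything else is preserved

print(updated_content)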
Labels:
- Apache NiFi
10-25-2016
05:44 PM
Hi, Is there a limit (hard-coded or performance-based) on the number of attributes that can be assigned to a single FlowFile? If not, what are the considerations (e.g., performance, space on disk, etc.) for limiting the number of attributes? Thanks,
Labels:
- Apache NiFi
10-21-2016
10:30 PM
Hi All, What are the limits on:
- The number of fields in a Solr collection? 100K? 1 million?
- The max size of a particular field? 1 MB? 100 MB? 1 GB?
Thanks,
Tags:
- Data Processing
- solr
Labels:
- Apache Solr
09-20-2016
04:30 PM
Hi, I currently have a 20-node cluster set up with mount points (/grid01, /grid02, ..., /grid10) on each of my data nodes. Currently all mounts are available to HDFS. However, I would like to reconfigure 3 of the data nodes so that mount points /grid01, /grid02, and /grid03 are no longer used for HDFS (they will be used for Kafka and other non-HDFS processes). How best do I go about reconfiguring the DataNodes in Ambari?
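One approach, sketched under stated assumptions: create an Ambari config group in the HDFS service containing just those 3 hosts, and override dfs.datanode.data.dir there to drop the first three mounts. The directory layout below is illustrative, not taken from your cluster:

<!-- hdfs-site.xml override for the config group covering the 3 Kafka hosts -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- /grid01 - /grid03 removed; only /grid04 - /grid10 remain for HDFS -->
  <value>/grid04/hadoop/hdfs/data,/grid05/hadoop/hdfs/data,/grid06/hadoop/hdfs/data,/grid07/hadoop/hdfs/data,/grid08/hadoop/hdfs/data,/grid09/hadoop/hdfs/data,/grid10/hadoop/hdfs/data</value>
</property>

After restarting the affected DataNodes, the NameNode will re-replicate any blocks that lived on the removed mounts.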
Labels:
- Apache Ambari
- Apache Hadoop
09-19-2016
06:37 PM
2 Kudos
What are the options for row-level filtering in HBase? I am aware that Ranger 0.6 has this capability for Hive, but I wasn't sure what the best option is for doing it in HBase.
Labels:
- Apache HBase
- Apache Ranger
09-06-2016
05:39 PM
1 Kudo
Hi, How do I go about running multiple Kafka brokers on the HDP 2.4 (or 2.5) Sandbox? Thanks,
Labels:
- Apache Kafka
07-14-2016
02:54 PM
3 Kudos
Hi, I am looking to pull all configuration changes (the who, what, and when for every change) made in Ambari (e.g., John Smith changed the YARN min container size to 4 GB on June 9th at 6:09 AM). The reason is that our compliance team wants a report of all changes made to production systems. My assumption would be to use Ambari's REST API; I just wasn't sure if someone had some examples of how best to do this. Thanks, Andrew
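A minimal sketch of that REST API idea, assuming the service_config_versions endpoint exposed by Ambari 2.x (the host, cluster name, and credentials below are placeholders, and the field names are to the best of my knowledge):

import requests

AMBARI = "http://ambari-host:8080"   # placeholder host
CLUSTER = "MyCluster"                # placeholder cluster name
AUTH = ("admin", "admin")            # placeholder credentials

# Each service config version records the user who made the change
# and when it was made (createtime is epoch milliseconds).
url = (AMBARI + "/api/v1/clusters/" + CLUSTER +
       "/configurations/service_config_versions"
       "?fields=service_name,user,createtime,service_config_version_note")

resp = requests.get(url, auth=AUTH, headers={"X-Requested-By": "ambari"})
resp.raise_for_status()

for item in resp.json()["items"]:
    print(item["service_name"], item["user"],
          item["createtime"], item.get("service_config_version_note"))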
Labels:
- Apache Ambari
07-08-2016
01:36 PM
1 Kudo
Hi, Is it possible to use AWS S3 as a storage tier within HDFS Heterogeneous Storage? If so, any insight would be greatly appreciated.
Labels:
- Apache Hadoop
06-09-2016
08:43 PM
1 Kudo
Hi, I currently have an HDP 2.3 cluster with Kafka running. The Kafka topics look to be corrupt, so I want to wipe all data in the topics, delete Kafka from the cluster, and re-add it. What is the best way to do this? The cluster is currently being used for other workloads (Hive, Spark, etc.) that I don't want to impact. Thanks,
Labels:
- Apache Kafka
05-25-2016
03:33 PM
Hi All, I am aware that HDP and Ambari need Python 2.x to run their services. However, are there any concerns with installing Python 3.5.1 to be used for data processing? Thanks, Andrew
Labels:
- Apache Ambari
05-12-2016
07:03 PM
1 Kudo
Hi All, I am looking to move all my Hive and Pig scripts written in HUE over to Ambari Views. Is there a script that I can use to extract the data from the HUE RDBMS and import it into the appropriate Ambari Views? Thanks,
Labels:
- Apache Ambari
- Cloudera Hue
05-03-2016
08:42 PM
1 Kudo
Hi All, I am looking for the overall and per-component performance impact of implementing all security components in HDP, including SSL, TDE, Ranger, Kerberos, and Knox. I have found a few links regarding SSL and Knox but can't seem to find anything comprehensive. Thanks,
Labels:
- Apache Knox
- Apache Ranger
04-27-2016
04:34 PM
@Raghu Gurrala Does the Python script successfully finish when manually run outside of NiFi?
04-15-2016
12:14 AM
1 Kudo
@Laurent Edel That is if the cluster is unsecured. If secured, it would be DataNode: 1004 and WebHDFS: 1006.
04-14-2016
10:22 PM
1 Kudo
Hi, What ports need to be opened between clusters for DistCP?
Labels:
- Apache Ambari
- Apache Hadoop
04-13-2016
06:53 PM
1 Kudo
Hi All, After applying OS patches to data nodes, the servers must be rebooted. Once the servers are rebooted, the Hadoop services running on them do not automatically come up; you have to manually go into Ambari and tell the services to start on that host. What are some best practices and recommendations for automatically bringing up the Hadoop services after an OS reboot?
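One option, sketched below as an assumption-laden example rather than an official recipe (the host, cluster name, and credentials are placeholders): have a boot-time script call Ambari's REST API to start every component on the rebooted host.

import json
import requests

AMBARI = "http://ambari-host:8080"        # placeholder host
CLUSTER = "MyCluster"                     # placeholder cluster name
HOST = "datanode01.example.com"           # placeholder FQDN of the rebooted node
AUTH = ("admin", "admin")                 # placeholder credentials

url = (AMBARI + "/api/v1/clusters/" + CLUSTER +
       "/hosts/" + HOST + "/host_components")
body = {
    "RequestInfo": {"context": "Start components after OS reboot"},
    "Body": {"HostRoles": {"state": "STARTED"}},
}

# Ambari requires the X-Requested-By header on write requests.
resp = requests.put(url, auth=AUTH, data=json.dumps(body),
                    headers={"X-Requested-By": "ambari"})
print(resp.status_code, resp.text)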
Labels:
- Apache Ambari
04-11-2016
08:30 PM
2 Kudos
Okay, got the answer: For CentOS 6 and SLES 11, Hortonworks supports Python 2.6.* (which is what is installed by default); Hortonworks does not support switching to Python 2.7.* on these OSes. For all other supported OSes (CentOS 7, Ubuntu 12 + 14, Debian 7), Hortonworks supports Python 2.7.* (which is what is installed by default). As of Ambari 2.2, there is no longer an issue with 2.7.9 (therefore, you can use 2.7.*).
03-29-2016
08:48 PM
1 Kudo
@Wes Floyd @Scott Shaw I just had a talk with the HDP and Ambari PMs, and they recommended that you don't mix OSes between major releases (e.g., RHEL 6.x and RHEL 7.x). They did state that some people mix OSes from the same family in the same major release (e.g., RHEL 7.x and CentOS 7.x), but while less likely to cause problems, even that could lead to issues as it isn't tested.
03-17-2016
03:07 AM
1 Kudo
@vpoornalingam Okay, and to confirm, Python 2.7.8 is the highest version allowed?
03-16-2016
02:59 PM
3 Kudos
Hi All, The documentation says Python 2.6 is required but then right below it says: "Python v2.7.9 or later is not supported due to changes in how Python performs certificate validation." Does that mean you can use Python 2.7.X so long as it's less than 2.7.9? Thanks,
03-11-2016
06:53 PM
1 Kudo
@DIALLO Sory What database are you configuring Ambari to use for its repository?
03-11-2016
04:24 PM
@Michael Rife Can you please try going to localhost:8080? Does it bring up Ambari?
03-11-2016
04:11 PM
Hi @DIALLO Sory, From what you have posted, I don't see any errors; it says Ambari Server has successfully started. You should be able to access Ambari at hostname:8080. If Ambari Server is indeed down, please send your ambari-server.log so we can better identify the potential issue. To check whether Ambari Server is running, try:
ps -ef | grep ambari
Cheers, Andrew
03-11-2016
04:08 PM
1 Kudo
Storing Ranger audit logs in HDFS is beneficial for a couple of reasons: A) It provides a more scalable, distributed data store, so you can store logs for a lot longer. B) If you are currently leveraging Hadoop to store all security/audit logs, you can store your Ranger audit logs alongside them in HDFS and do better correlation between access requests from different systems to help detect anomalies. Storing in the RDBMS was the original default. It provides better response times on smaller data sets, but it's not as scalable, and you will then need to maintain it (e.g., purge/roll logs) on a set frequency. Cheers, Andrew
03-10-2016
03:08 PM
2 Kudos
@Abdus Sagir Mollah Primary keys can also be useful for bucketing (i.e., partitioning of data), especially if you are trying to leverage the ACID capabilities of Hive. Quote from the blog below:
"Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys."
Entire blog: http://hortonworks.com/blog/adding-acid-to-apache-hive/
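As a hedged illustration (the table and column names are made up), a Hive ACID dimension table bucketed on its primary key might be declared like this:

-- Hypothetical dimension table bucketed on its "primary key" column.
-- Assumes the Hive ACID settings (txn manager, compactor) are enabled.
CREATE TABLE customer_dim (
  customer_id BIGINT,   -- the de facto primary key
  name STRING,
  state STRING
)
CLUSTERED BY (customer_id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');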
02-23-2016
06:30 PM
@jsequeiros See the updated processor configuration screenshot.
02-23-2016
05:18 PM
1 Kudo
Hi All, I leveraged the CSV-to-JSON XML workflow example to create a flow where I wait for a CSV from an HTTP call, then parse and label the CSV values, and lastly send the fields and values to myself via email. The flow is working, except that the email message doesn't send the ReplaceText values of Field1, Field2, Field3, Field4. Instead it is sending the ExtractText values of csv.1, csv.2, csv.3, csv.4. The weird thing is that when we look at the data provenance at the email processor, we see the input claim has the fields correctly labeled as field1, field2, etc. Any idea what the issue is? Email message:
Standard FlowFile Metadata:
id = '4af1cc19-c702-42d0-907e-adcc92b04dab'
entryDate = 'Tue Feb 23 16:58:03 UTC 2016'
fileSize = '130'
FlowFile Attributes:
csv.1 = 'one'
path = './'
flowfile.replay.timestamp = 'Tue Feb 23 16:58:03 UTC 2016'
csv.3 = 'three'
filename = '5773072822254662'
restlistener.remote.user.dn = 'none'
csv.2 = 'two'
csv.4 = 'four'
csv = 'one'
restlistener.remote.source.host = 'XXXX'
flowfile.replay = 'true'
uuid = '4af1cc19-c702-42d0-907e-adcc92b04dab'
Template: csvtojson.xml
PutEmail Processor
Labels:
- Apache NiFi