Member since: 09-24-2015
Posts: 105
Kudos Received: 82
Solutions: 9
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2119 | 04-11-2016 08:30 PM
 | 1745 | 03-11-2016 04:08 PM
 | 1744 | 12-21-2015 09:51 PM
 | 1021 | 12-18-2015 10:43 PM
 | 8627 | 12-08-2015 03:01 PM
03-29-2016
08:48 PM
1 Kudo
@Wes Floyd @Scott Shaw I just had a talk with the HDP and Ambari PMs, and they recommended that you don't mix OSes across major releases (e.g. RHEL 6.x and RHEL 7.x). They did say some people mix OS families within the same major release (e.g. RHEL 7.x and CentOS 7.x), but even that, while less likely to cause problems, could lead to issues since it isn't tested.
03-17-2016
03:07 AM
1 Kudo
@vpoornalingam Okay, and to confirm: is Python 2.7.8 the highest version allowed?
03-16-2016
02:59 PM
3 Kudos
Hi All, The documentation says Python 2.6 is required, but then right below it says: "Python v2.7.9 or later is not supported due to changes in how Python performs certificate validation." Does that mean you can use Python 2.7.x as long as it's less than 2.7.9? Thanks,
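For what it's worth, a quick way to test whether the local Python falls inside that window (2.6.x, or 2.7.x below 2.7.9) is a small shell check; the range pattern is my own sketch of the rule quoted above, not something from the docs:

```shell
# Print the local Python version and compare it against the range
# discussed above: 2.6.x is fine, 2.7.x is fine only below 2.7.9.
ver=$(python -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null || echo unknown)
case "$ver" in
  2.6.*|2.7.[0-8])
    echo "OK: Python $ver is within the supported range" ;;
  *)
    echo "WARN: Python $ver may not be supported" ;;
esac
```

Note the single-character class `[0-8]` also correctly rejects 2.7.10 and later, since those have two digits in the patch field.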
03-11-2016
06:53 PM
1 Kudo
@DIALLO Sory, what database are you configuring Ambari to use for its repository?
03-11-2016
04:24 PM
@Michael Rife Can you please try going to localhost:8080? Does it bring up Ambari?
03-11-2016
04:11 PM
Hi @DIALLO Sory, From what you have posted, I don't see any errors. It says Ambari Server has started successfully, so you should be able to access Ambari at hostname:8080. If Ambari Server is indeed down, please send your ambari-server.log so we can better identify the potential issue. To check whether Ambari Server is running, try: ps -ef | grep ambari Cheers, Andrew
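Expanding that check into a small sketch (the `[a]` bracket in the pattern is just a common trick to stop grep from matching its own command line):

```shell
# Report whether an Ambari Server process appears in the process list.
# The [a] character class prevents grep from matching itself.
if ps -ef | grep -q '[a]mbari-server'; then
  echo "Ambari Server process found"
else
  echo "Ambari Server does not appear to be running"
fi
```

On hosts where the service script is installed, `ambari-server status` gives a more direct answer.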
03-11-2016
04:08 PM
1 Kudo
Storing Ranger audit logs in HDFS is beneficial for a couple of reasons: A) It provides a more scalable, distributed data store, so you can retain logs for much longer. B) If you are already leveraging Hadoop to store all your security/audit logs, you can store the Ranger audit logs alongside them in HDFS and do better correlation between access requests from different systems to help detect anomalies. Storing in the RDBMS was the original default. It provides better response times on smaller data sets, but it is not as scalable, and you will then need to maintain it (e.g. purge/roll logs) on a set frequency. Cheers, Andrew
03-10-2016
03:08 PM
2 Kudos
@Abdus Sagir Mollah Primary keys can also be useful for bucketing (i.e. partitioning of data), especially if you are trying to leverage the ACID capabilities of Hive. Quote from the blog below:
"Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys."
Entire blog: http://hortonworks.com/blog/adding-acid-to-apache-hive/
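As an illustration, the DDL for such a table might look like the following; the table and column names are hypothetical, and the bucket count would depend on your data volume (this is a sketch in the spirit of the blog quote, not taken from it):

```sql
-- Hypothetical dimension table bucketed and sorted on its primary-key
-- column, set up for Hive ACID: ORC storage plus transactional=true.
CREATE TABLE customer (
  customer_id BIGINT,
  name        STRING
)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```

Bucketing on the key is what lets Hive locate and rewrite the affected buckets when those hourly inserts and updates arrive.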
02-23-2016
06:30 PM
@jsequeiros see the updated processor configuration screenshot.
02-23-2016
05:18 PM
1 Kudo
Hi All, I leveraged the CSV-to-JSON XML workflow example to create a workflow where I wait for a CSV from an HTTP call, then parse and label the CSV values, and lastly send the fields and values to myself via email. The flow is working, except the email message doesn't seem to send the ReplaceText values of Field1, Field2, Field3, Field4. Instead it is sending the ExtractText values of csv.1, csv.2, csv.3, csv.4. The weird thing is that when we look at the data provenance at the email processor, we see the input claim has the fields correctly labeled as field1, field2, etc. Any idea what the issue is? Email message:
Standard FlowFile Metadata:
id = '4af1cc19-c702-42d0-907e-adcc92b04dab'
entryDate = 'Tue Feb 23 16:58:03 UTC 2016'
fileSize = '130'
FlowFile Attributes:
csv.1 = 'one'
path = './'
flowfile.replay.timestamp = 'Tue Feb 23 16:58:03 UTC 2016'
csv.3 = 'three'
filename = '5773072822254662'
restlistener.remote.user.dn = 'none'
csv.2 = 'two'
csv.4 = 'four'
csv = 'one'
restlistener.remote.source.host = 'XXXX'
flowfile.replay = 'true'
uuid = '4af1cc19-c702-42d0-907e-adcc92b04dab'
Template: csvtojson.xml
PutEmail Processor
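One thing that may be relevant when debugging this: the `csv.1` … `csv.4` entries above are FlowFile *attributes*, which are separate from the FlowFile *content* that ReplaceText rewrote, and Expression Language in a processor property resolves against attributes. A hypothetical PutEmail `Message` property referencing them explicitly would render exactly the attribute values shown above (whether this matches the original template's configuration is an assumption on my part):

```
Message: Field1=${csv.1}, Field2=${csv.2}, Field3=${csv.3}, Field4=${csv.4}
```

So if the email is meant to carry the ReplaceText output, the content and the attributes being two different things may be the place to look.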
Labels:
- Apache NiFi