Member since: 09-14-2015
Posts: 47
Kudos Received: 89
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2241 | 09-22-2017 12:32 PM |
| | 11348 | 03-21-2017 11:48 AM |
| | 1023 | 11-16-2016 12:08 PM |
| | 1441 | 09-15-2016 09:22 AM |
| | 3326 | 09-13-2016 07:37 AM |
09-06-2016
05:08 PM
8 Kudos
@justin kuspa In HDP 2.5, R is provided in Zeppelin via the Livy interpreter. Try using the following interpreter directive:

%livy.sparkr

Note, you will need to make sure R is installed on your machines first. If you haven't already, install it on all nodes with:

yum install R R-devel libcurl-devel openssl-devel

Validate it was installed correctly:

R -e "print(1+1)"

Once it is installed, test out SparkR in Zeppelin with Livy to confirm it is working:

%livy.sparkr
foo <- TRUE
print(foo)
09-05-2016
04:53 PM
2 Kudos
@Piyush Jhawar The Ranger Hive plugin protects Hive data when it is accessed via HiveServer2. When you access these tables using HCatalog in Pig, you are not going through HiveServer2; instead, Pig reads the files directly from HDFS (HCatalog is only used to map the table metadata to the HDFS files in this case). To protect this data, you should also define a Ranger HDFS policy on the underlying HDFS directory that stores the marketingDb.saletable data. To clarify:
- Ranger Hive Plugin - protects Hive data when accessed via HiveServer2 (e.g., a user connecting to Hive via JDBC)
- Ranger HDFS Plugin - protects HDFS files and directories (suitable if users need to access the data outside of HiveServer2 - Pig, Spark, etc.)
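If it helps, here is a minimal sketch of creating such an HDFS policy through the Ranger public v2 REST API. The Ranger host, HDFS service name, warehouse path, group, and credentials below are placeholders, and the payload fields should be double-checked against your Ranger version:

```python
# Hypothetical sketch: create a Ranger HDFS policy over the table's warehouse
# directory via the Ranger public v2 REST API. Host, service name, path, group
# and credentials are placeholders; verify the payload against your Ranger version.
import json
import requests

policy = {
    "service": "cluster_hadoop",  # assumed name of the HDFS service/repo in Ranger
    "name": "saletable-hdfs-protection",
    "resources": {
        "path": {
            "values": ["/apps/hive/warehouse/marketingdb.db/saletable"],  # placeholder warehouse path
            "isRecursive": True,
        }
    },
    "policyItems": [
        {
            "accesses": [{"type": "read", "isAllowed": True},
                         {"type": "execute", "isAllowed": True}],
            "groups": ["marketing_analysts"],  # hypothetical group allowed to read the data
        }
    ],
}

resp = requests.post(
    "http://ranger-host:6080/service/public/v2/api/policy",
    auth=("admin", "admin"),
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)
print(resp.status_code, resp.text)
```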
07-05-2016
05:47 AM
2 Kudos
@Manikandan Durairaj Within the PutHDFS processor, you can set the HDFS owner/group using the 'Remote Owner' and 'Remote Group' properties. Note that this will only work if NiFi is running as a user that has HDFS super-user privilege to change owner/group.
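As a quick sanity check that the Remote Owner/Remote Group values actually took effect, here is a minimal sketch that reads the file status back over WebHDFS (the NameNode host/port and the target path are placeholders):

```python
# Minimal sketch: check the owner/group of a file written by PutHDFS using the
# WebHDFS GETFILESTATUS operation. NameNode host/port and path are placeholders.
import requests

path = "/data/landing/example.csv"
url = f"http://namenode-host:50070/webhdfs/v1{path}?op=GETFILESTATUS"

status = requests.get(url).json()["FileStatus"]
print(status["owner"], status["group"])  # should reflect Remote Owner / Remote Group
```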
06-22-2016
11:37 AM
1 Kudo
@Predrag Minovic - Slight update on Storm: we can now run multiple Nimbus servers: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Ambari_Users_Guide/content/ch05s05.html This deals with cases where Nimbus can't be automatically restarted (e.g. disk failure on the node). Details of Nimbus HA are outlined here: http://hortonworks.com/blog/fault-tolerant-nimbus-in-apache-storm/
06-17-2016
01:16 AM
@sankar rao To elaborate on the answer provided by @Artem Ervits: the edge node is typically used to install client tools, so it makes sense to install the AWS S3 CLI on the edge node. For adding new users to the cluster, you need to ensure that the new users exist on ALL nodes. The reason is that Hadoop, by default, takes its user/group mappings from the UNIX users on each node. So for Hadoop to 'know' about the new user you've created on the edge node, that same userid should exist on all nodes.
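To illustrate the point, here is a rough sketch of what Hadoop's default shell-based group mapping effectively does on each node (the username below is a placeholder); if the OS account is missing on a node, the lookup simply fails there:

```python
# Minimal sketch: Hadoop's default group mapping shells out to the OS to
# resolve a user's groups, roughly equivalent to running `id -Gn <user>`.
# If the user does not exist on a node, the lookup fails on that node.
import subprocess

def unix_groups(user):
    result = subprocess.run(["id", "-Gn", user], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"user '{user}' does not exist on this node")
    return result.stdout.split()

print(unix_groups("newuser"))  # hypothetical new user created on the edge node
```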
06-09-2016
10:12 AM
2 Kudos
@KC
Your 'InferAvroSchema' is likely capturing the schema as an attribute called 'inferred.avro.schema' (assuming you followed the tutorial here: https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html ).
If that's the case, you can view its output by looking at one of the flowfiles in the queue after 'InferAvroSchema' (List queue > select a flowfile > view attributes > view the inferred.avro.schema property).
If you want to manually define the schema without changing too much of your flow, you can directly replace your 'InferAvroSchema' processor with an 'UpdateAttribute' processor - within the 'UpdateAttribute', define a new property called inferred.avro.schema and paste in your Avro schema as the value (JSON format).
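For reference, here is a minimal sketch of what that attribute value might look like (the record and field names are made-up examples - adjust them to your CSV columns); running it through json.loads is just a quick way to catch syntax mistakes before pasting the schema into UpdateAttribute:

```python
# Hypothetical Avro schema for a simple CSV; record/field names and types are
# examples only. json.loads() is a quick syntax check before pasting the string
# into the 'inferred.avro.schema' property of UpdateAttribute.
import json

schema = """
{
  "type": "record",
  "name": "sale_record",
  "fields": [
    {"name": "id",     "type": "long"},
    {"name": "region", "type": "string"},
    {"name": "amount", "type": ["null", "double"], "default": null}
  ]
}
"""

json.loads(schema)  # raises JSONDecodeError if the schema isn't valid JSON
print("schema is well-formed JSON")
```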
06-09-2016
09:46 AM
4 Kudos
@KC How are you defining your Avro schema? Typically the 'failed to convert' errors occur when the CSV records don't fit the data types defined in your Avro schema. If you're using the 'InferAvroSchema' processor or the Kite SDK to define the schema, it is possible that the inferred schema isn't a true representation of your data (keep in mind that these methods infer the schema from a subset of the data, so if your data isn't very consistent they may misinterpret the field types and hit errors during conversion). If you know the data, you can get around this by manually defining the Avro schema based on the actual data types.
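As a rough illustration of the failure mode (the field names and sample row below are made up), this sketch tries to coerce CSV strings to the types an inferred schema might expect - a value like 'N/A' in a column inferred as a long is exactly the kind of record that fails conversion:

```python
# Minimal sketch: coerce CSV string values to the types an inferred schema
# expects. A stray non-numeric value in a column inferred as 'long' is the
# typical cause of "failed to convert" errors.
csv_row = {"id": "1042", "region": "EMEA", "amount": "N/A"}   # made-up sample record
schema_types = {"id": int, "region": str, "amount": float}    # types an inferred schema might assume

for field, caster in schema_types.items():
    try:
        caster(csv_row[field])
    except ValueError:
        print(f"field '{field}' with value '{csv_row[field]}' does not match the inferred type")
```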
04-15-2016
05:35 AM
1 Kudo
@Indrajit swain The reason you can't see the Actions option is because you are currently logged in as the 'maria_dev' user, which isn't given full Ambari access by default. You can log in as the 'Admin' user account to change this. Note that the default password for the 'Admin' user in the HDP 2.4 sandbox has been changed. Refer to the following thread for details on resetting the admin account password: https://community.hortonworks.com/questions/20960/no-admin-permission-for-the-latest-sandbox-of-24.html
03-13-2016
03:24 AM
1 Kudo
@zhang dianbo Which web browser are you using? I'm able to download the data using the link you've provided without any issues (using Chrome). If you still can't get the box download to work, here is the file you are looking for (uploaded to this post directly): geolocation.zip
12-08-2015
05:42 AM
5 Kudos
How do we manage authorization control over tables within SparkSQL? Will Ranger enforce existing Hive policies when these Hive tables are accessed via SparkSQL?
If not, what is the recommended approach?
Labels:
- Apache HCatalog
- Apache Ranger
- Apache Spark