Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5127 | 09-21-2018 09:54 PM
 | 6494 | 03-31-2018 03:59 AM
 | 1968 | 03-31-2018 03:55 AM
 | 2179 | 03-31-2018 03:31 AM
 | 4828 | 03-27-2018 03:46 PM
08-18-2016
07:49 PM
1 Kudo
@RAMESH K There is no demonstrated correlation to support that statement. The number of nodes does not matter; what matters is how resources are used. In a complex environment where multiple applications and users compete for resources and SLAs are important (jobs must complete by a given time, users expect a response time under x seconds, etc.), a resource manager is a must. As such, running Spark over YARN just makes sense: it is far better suited to delivering in an environment with competing demands on resources.
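For illustration, a minimal spark-submit against YARN (the class, jar, and queue names here are hypothetical placeholders):

# Submitting to YARN lets a Capacity Scheduler queue enforce your SLAs
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue analytics \
  --num-executors 4 \
  --executor-memory 4g \
  --class com.example.MyApp \
  myapp.jar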
08-18-2016
07:30 PM
1 Kudo
@ripunjay godhani You need to stop the Ambari services for changes to configurations like the log location to take effect. During that time your cluster can keep running, but you will have a gap in Ambari logs and metrics.
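A minimal sketch of that sequence for the Ambari server itself, assuming a default install (the new log path is a made-up example; verify the property name in your own log4j.properties):

# Stop the Ambari server before changing its configuration
ambari-server stop
# Point Ambari at a new log directory
sed -i 's|^ambari.log.dir=.*|ambari.log.dir=/var/log/ambari-new|' /etc/ambari-server/conf/log4j.properties
# Start it back up; logs and metrics resume from here
ambari-server start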
08-18-2016
07:05 PM
5 Kudos
@kishore sanchina Subash is correct; it is not that different. Prerequisites: 1. Access to your EC2 machine using the .pem key, or credentials with root permissions. 2. RSA keys already set up on your local machine, with the private and public keys at ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub, respectively. Steps:
1. Log in to your EC2 machine as the root user.
2. Create a new user and switch to it:
useradd -m <yourname>
sudo su <yourname>
cd
3. Create the SSH directory and authorized keys file:
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
4. Append the contents of the file ~/.ssh/id_rsa.pub on your local machine to ~/.ssh/authorized_keys on the EC2 machine, then lock down permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*
5. Check that the machine permits key-based ssh logins. It should: in /etc/ssh/sshd_config, the line containing "PubkeyAuthentication yes" should be uncommented. Restart the sshd service if you make any change to this file:
service sshd restart # On CentOS
service ssh restart # On Ubuntu
6. Your passwordless login should work now. Try the following on your local machine:
ssh -A <yourname>@ec2-xx-xx-xxx-xxx.ap-southeast-1.compute.amazonaws.com
7. To make yourself a super user, open /etc/sudoers and make sure the following two lines are uncommented:
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
## Same thing without a password
%wheel ALL=(ALL) NOPASSWD: ALL
8. Add yourself to the wheel group:
usermod -aG wheel <yourname>
Try it and let me know.
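Put together, a minimal sketch of the whole sequence, run as root on the EC2 host (<yourname> and the key string below are placeholders for your own values):

#!/bin/bash
NEWUSER="yourname"
PUBKEY="ssh-rsa AAAA... user@laptop"   # contents of your local ~/.ssh/id_rsa.pub

useradd -m "$NEWUSER"                  # create the account with a home directory
usermod -aG wheel "$NEWUSER"           # grant sudo via the wheel group

mkdir -p /home/$NEWUSER/.ssh
echo "$PUBKEY" >> /home/$NEWUSER/.ssh/authorized_keys
chmod 700 /home/$NEWUSER/.ssh
chmod 600 /home/$NEWUSER/.ssh/authorized_keys
chown -R $NEWUSER:$NEWUSER /home/$NEWUSER/.ssh

service sshd restart                   # CentOS; use 'service ssh restart' on Ubuntu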
08-18-2016
06:31 PM
3 Kudos
@Pat McCarthy If you want to keep only numbers and letters, you can match everything else with something like this: [^0-9a-zA-Z]+ If you also want to keep some special characters like {}, you can use something like this: [^0-9a-zA-Z{}]+ Obviously, you can add other special characters to the above regex. Note that \w already covers letters, digits, and underscore, so [^\w] (equivalently \W) matches any character that is not one of those. Try this interactive tutorial: http://regexone.com/
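A quick way to see these in action with sed (the sample string is made up):

# Keep only letters and digits
echo 'a1-b2_{c3}!' | sed -E 's/[^0-9a-zA-Z]+//g'     # prints a1b2c3
# Keep letters, digits, and curly braces
echo 'a1-b2_{c3}!' | sed -E 's/[^0-9a-zA-Z{}]+//g'   # prints a1b2{c3}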
08-18-2016
06:15 PM
1 Kudo
@vpemawat You already tagged the question as HDFS, Hive, and Solr. I guess you already know the answer 🙂
08-18-2016
04:04 PM
4 Kudos
@Adi Jabkowsky That output format refers to the data format of the intermediary stage (the FlowFile), not to a file written somewhere in the OS. If you want to write to a file, you need to continue your flow with a different processor, e.g. PutFile. Use ReplaceText to build a HiveQL statement (an INSERT, for example), either with parameters (if you want prepared statements) or with hard-coded values. PutHiveQL is the processor to use; the SQL needs to be prepared in an upstream processor and passed to it. References: https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html https://community.hortonworks.com/questions/46150/convertjsontosql-in-apache-nifi-for-sending-to-put.html
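As a sketch, the Replacement Value in ReplaceText could generate a statement like the one below, with the FlowFile content becoming the SQL that PutHiveQL then executes (table and attribute names are made up; ${...} is NiFi Expression Language reading FlowFile attributes):

INSERT INTO my_table (id, name) VALUES ('${record.id}', '${record.name}')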
08-18-2016
03:35 AM
3 Kudos
@Prakash M All good advice above. Could you confirm that ACID is turned on globally? Also that Tez is the execution engine.
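For reference, the usual settings for global ACID support plus Tez look like the following (a sketch; confirm the exact names against your HDP version's hive-site.xml):

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.execution.engine=tez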
08-18-2016
03:07 AM
HDP 2.5 will be released at the end of August or the beginning of September. Just a few weeks away...
08-18-2016
03:03 AM
4 Kudos
@sankar rao Killing jobs is not the right way to go about your production environment. Global tuning can help some jobs while hurting others. You need to understand your jobs and their resource requirements. You could size containers so that you maximize the use of resources, you could do a lot with Tez parameters, etc., but again it boils down to understanding your jobs and their requirements. Identify the jobs that use a lot of resources and optimize them with design techniques. Also manage your queues and their allocated resources, and assign applications or users to specific queues based on their workload and SLA needs. Last but not least, plan to add resources to your cluster proactively, setting alert thresholds before YARN hits 100% utilization.
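As a sketch, a capacity-scheduler.xml fragment splitting the cluster between two hypothetical queues (the names and percentages are made up for illustration):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <!-- guaranteed share for SLA-bound ETL jobs -->
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- ad hoc users may borrow idle capacity, up to half the cluster -->
  <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
  <value>50</value>
</property>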
08-18-2016
02:49 AM
4 Kudos
@RAMESH K Use Spark Standalone if you are a Spark-only shop and you don't care about resource contention with other services from the Hadoop ecosystem; Spark then uses all the resources of your cluster. If your Spark is part of the Hortonworks Data Platform and shares resources like HDFS with other services, use Spark over YARN. That lets you allocate the proper resources to Spark, avoid resource contention with other services, and meet your SLAs. I hope this answer helps.
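The difference shows up in the --master flag at submit time (the host name and application jar are placeholders):

# Standalone: Spark's own master hands the app the cluster's resources
spark-submit --master spark://spark-master:7077 --class com.example.MyApp myapp.jar
# YARN: the ResourceManager arbitrates between Spark and other HDP services
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar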