Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5127 | 09-21-2018 09:54 PM
 | 6494 | 03-31-2018 03:59 AM
 | 1968 | 03-31-2018 03:55 AM
 | 2179 | 03-31-2018 03:31 AM
 | 4828 | 03-27-2018 03:46 PM
08-18-2016
07:49 PM
1 Kudo
@RAMESH K There is no demonstrated correlation to support that statement. The number of nodes does not matter; what matters is how resources are used. In a complex environment where multiple applications and users compete for resources and SLAs are important (jobs must complete by a given time, users expect a response time under x seconds, etc.), a resource manager is a must. As such, running Spark over YARN just makes sense: it is far better suited to delivering in an environment with competing demands on resources.
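For illustration, a minimal spark-submit against YARN (the class, jar, and queue names here are hypothetical placeholders):

# Submitting to YARN lets a Capacity Scheduler queue enforce your SLAs
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue analytics \
  --num-executors 4 \
  --executor-memory 4g \
  --class com.example.MyApp \
  myapp.jar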
08-18-2016
07:30 PM
1 Kudo
@ripunjay godhani You need to stop the Ambari services for changes to configurations like the log location to take effect. During that time your cluster can keep running, but you will have a gap in Ambari logs and metrics.
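A minimal sketch of that sequence for the Ambari server itself, assuming a default install (the new log path is a made-up example; verify the property name in your own log4j.properties):

# Stop the Ambari server before changing its configuration
ambari-server stop
# Point Ambari at a new log directory
sed -i 's|^ambari.log.dir=.*|ambari.log.dir=/var/log/ambari-new|' /etc/ambari-server/conf/log4j.properties
# Start it back up; logs and metrics resume from here
ambari-server start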
08-18-2016
07:05 PM
5 Kudos
@kishore sanchina Subash is correct; it is not that different. Prerequisites: 1. Access to your EC2 machine using the .pem key, or credentials with root permissions. 2. RSA keys already set up on your local machine, with the private and public keys at ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub, respectively. Steps:
1. Log in to your EC2 machine as the root user.
2. Create a new user and switch to it:
useradd -m <yourname>
sudo su <yourname>
cd
3. Create the SSH directory and authorized keys file:
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
4. Append the contents of the file ~/.ssh/id_rsa.pub on your local machine to ~/.ssh/authorized_keys on the EC2 machine, then lock down permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*
5. Check that the machine permits key-based ssh logins. It should: in /etc/ssh/sshd_config, the line containing "PubkeyAuthentication yes" should be uncommented. Restart the sshd service if you make any change to this file:
service sshd restart # On CentOS
service ssh restart # On Ubuntu
6. Your passwordless login should work now. Try the following on your local machine:
ssh -A <yourname>@ec2-xx-xx-xxx-xxx.ap-southeast-1.compute.amazonaws.com
7. To make yourself a super user, open /etc/sudoers and make sure the following two lines are uncommented:
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
## Same thing without a password
%wheel ALL=(ALL) NOPASSWD: ALL
8. Add yourself to the wheel group:
usermod -aG wheel <yourname>
Try it and let me know.
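Put together, a minimal sketch of the whole sequence, run as root on the EC2 host (<yourname> and the key string below are placeholders for your own values):

#!/bin/bash
NEWUSER="yourname"
PUBKEY="ssh-rsa AAAA... user@laptop"   # contents of your local ~/.ssh/id_rsa.pub

useradd -m "$NEWUSER"                  # create the account with a home directory
usermod -aG wheel "$NEWUSER"           # grant sudo via the wheel group

mkdir -p /home/$NEWUSER/.ssh
echo "$PUBKEY" >> /home/$NEWUSER/.ssh/authorized_keys
chmod 700 /home/$NEWUSER/.ssh
chmod 600 /home/$NEWUSER/.ssh/authorized_keys
chown -R $NEWUSER:$NEWUSER /home/$NEWUSER/.ssh

service sshd restart                   # CentOS; use 'service ssh restart' on Ubuntu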
08-18-2016
06:31 PM
3 Kudos
@Pat McCarthy If you want to keep only numbers and letters, you can match everything else with something like this: [^0-9a-zA-Z]+ If you also want to keep some special characters like {}, you can use something like this: [^0-9a-zA-Z{}]+ Obviously, you can add other special characters to the above regex. Note that \w already covers letters, digits, and underscore, so [^\w] (equivalently \W) matches any character that is not one of those. Try this interactive tutorial: http://regexone.com/
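A quick way to see these in action with sed (the sample string is made up):

# Keep only letters and digits
echo 'a1-b2_{c3}!' | sed -E 's/[^0-9a-zA-Z]+//g'     # prints a1b2c3
# Keep letters, digits, and curly braces
echo 'a1-b2_{c3}!' | sed -E 's/[^0-9a-zA-Z{}]+//g'   # prints a1b2{c3}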
08-18-2016
06:15 PM
1 Kudo
@vpemawat You already tagged the question as HDFS, Hive, and Solr. I guess you already know the answer 🙂
08-18-2016
04:04 PM
4 Kudos
@Adi Jabkowsky That output format refers to the data format of the intermediary stage (the FlowFile), not to a file written somewhere in the OS. If you want to write to a file, you need to continue your flow with a different processor, e.g. PutFile. Use ReplaceText to build a HiveQL statement (an INSERT, for example), either with parameters (if you want prepared statements) or with hard-coded values. PutHiveQL is the processor to use; the SQL needs to be prepared in an upstream processor and passed to it. References: https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html https://community.hortonworks.com/questions/46150/convertjsontosql-in-apache-nifi-for-sending-to-put.html
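As a sketch, the Replacement Value in ReplaceText could generate a statement like the one below, with the FlowFile content becoming the SQL that PutHiveQL then executes (table and attribute names are made up; ${...} is NiFi Expression Language reading FlowFile attributes):

INSERT INTO my_table (id, name) VALUES ('${record.id}', '${record.name}')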
08-18-2016
03:35 AM
3 Kudos
@Prakash M All good advice above. Could you confirm that ACID is turned on globally? Also that Tez is the execution engine.
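For reference, the usual settings for global ACID support plus Tez look like the following (a sketch; confirm the exact names against your HDP version's hive-site.xml):

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.execution.engine=tez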
08-18-2016
03:07 AM
HDP 2.5 will be released at the end of August or the beginning of September. Just a few weeks away...
08-18-2016
03:03 AM
4 Kudos
@sankar rao Killing jobs is not the right way to go about your production environment. Global tuning can help some jobs while hurting others. You need to understand your jobs and their resource requirements. You could size containers so that you maximize the use of resources, you could do a lot with Tez parameters, etc., but again it boils down to understanding your jobs and their requirements. Identify the jobs that use a lot of resources and optimize them with design techniques. Also manage your queues and their allocated resources, and assign applications or users to specific queues based on their workload and SLA needs. Last but not least, plan to add resources to your cluster proactively, setting alert thresholds before YARN hits 100% utilization.
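As a sketch, a capacity-scheduler.xml fragment splitting the cluster between two hypothetical queues (the names and percentages are made up for illustration):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <!-- guaranteed share for SLA-bound ETL jobs -->
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- ad hoc users may borrow idle capacity, up to half the cluster -->
  <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
  <value>50</value>
</property>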
08-18-2016
02:49 AM
4 Kudos
@RAMESH K Use Spark Standalone if you are a Spark-only shop and you don't care about resource contention with other services from the Hadoop ecosystem; Spark then uses all the resources of your cluster. If your Spark is part of the Hortonworks Data Platform and shares resources like HDFS with other services, use Spark over YARN. That lets you allocate the proper resources to Spark, avoid resource contention with other services, and meet your SLAs. I hope this answer helps.
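The difference shows up in the --master flag at submit time (the host name and application jar are placeholders):

# Standalone: Spark's own master hands the app the cluster's resources
spark-submit --master spark://spark-master:7077 --class com.example.MyApp myapp.jar
# YARN: the ResourceManager arbitrates between Spark and other HDP services
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar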