Member since: 03-29-2018
Posts: 41
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 10958 | 01-24-2018 10:43 AM
 | 2542 | 11-17-2017 02:41 PM
11-23-2017
12:04 PM
@Sai Dileep The only way to look at these types of errors is to check them directly in the log file (as @Jay Kumar SenSharma has already shown). Alternatively, you can take a different route and use the Ambari Log Search service; if you don't have it in your environment it's a good idea to install it, though it's recommended only for clusters of 150 nodes or fewer. Also, this behaviour is normal when a port is already bound by another process and ZooKeeper tries to start a process on the same port. It is not reflected in the Ambari console (as in the image you have provided) because Ambari captures the log only until the PID is started, not after that. Here is a GIF showing the same kind of symptoms with the history server: output-sy8pf5.gif
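If you want to confirm the port clash from the command line, a minimal sketch like the one below can help (the log path and the default ZooKeeper client port 2181 are assumptions based on a typical HDP layout):
# Look for bind failures in the ZooKeeper log
grep -i "BindException" /var/log/zookeeper/zookeeper.out
# Show which process is already listening on the suspected port
netstat -tlnp | grep 2181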
11-21-2017
04:58 PM
@Michael Bronson rm -rf -> This is a Linux/Unix command and will only delete a directory created in the Linux/Unix filesystem, whereas hdfs dfs -rmr /DirectoryPath is for deleting files/directories in the HDFS filesystem. In case I have misinterpreted your question and you mean to ask the difference between "hdfs dfs -rmr" and "hdfs dfs -rm -rf": the HDFS rm command does not behave like the Linux one. "-r" is the option you use to delete a directory and its files recursively; recent Hadoop releases also accept "-f", but it only suppresses the diagnostic when the path does not exist rather than forcing the delete as it does in Linux. A short comparison is sketched below.
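A minimal sketch of the two commands (the directory names are placeholders):
# Local filesystem deletion; only affects the Linux/Unix filesystem on that node
rm -rf /tmp/somelocaldir
# HDFS deletion; -rmr is the older form, -rm -r is the current equivalent
hdfs dfs -rmr /DirectoryPath
hdfs dfs -rm -r /DirectoryPath
# Optionally bypass the HDFS trash so the space is reclaimed immediately
hdfs dfs -rm -r -skipTrash /DirectoryPath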
11-21-2017
12:09 AM
@Michael Bronson That's really a very good question. I can see there is still work going on in this area, and one JIRA (HDFS-107) is in an open state. So the answer to your first question is an obvious NO. Formatting the NameNode will surely impact your whole cluster, since it is the master node and holds the metadata for all of the data stored on the DataNodes; formatting the NameNode is not a good idea in my view. I have to replicate some steps before answering your second question; it's the tricky one, and I will try to answer it ASAP.
11-20-2017
10:45 PM
@chaitanya Kandula Just to complement @Jay Kumar SenSharma's answer: in case you still find your file opened by multiple processes and it gives you this type of error, you can use the command below to check which process is using the file and then kill it to resume normally. lsof <file-name>
lsof /var/log/ambari-server/ambari-server.log
Once your sandbox is up and running, check whether your Ambari server has come up properly: ambari-server status
If it is up and running, you will find the log file in the same location. I would suggest stopping the Ambari services, moving the older files to some temporary location, and then starting the services to take a fresh look at the problem. ambari-server stop
mkdir -p /tmp/AmbariLogBackup
mv /var/log/ambari-server/* /tmp/AmbariLogBackup
ambari-server status
ambari-server start
Once your Ambari log creation problem is sorted, perform the steps again so that the error messages are logged into it. Also, do attach the log here so that we can have a look at the problem. Thanks, SKS
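If lsof shows a stale process still holding the old log, a cleanup sketch looks like the following (the PID is purely illustrative):
lsof /var/log/ambari-server/ambari-server.log
# Suppose lsof reports PID 12345 holding the file
kill 12345       # escalate to kill -9 12345 only if the process does not exit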
11-20-2017
10:26 PM
@Michael Bronson To delete HDFS directories in the cluster, use the command mentioned below: hdfs dfs -rmr /DirectoryPath
This will delete the directory and all files under the path /DirectoryPath.
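If you want to see how much space the directory occupies before removing it, and confirm afterwards that it is gone, a quick sketch (paths are placeholders):
hdfs dfs -du -s -h /DirectoryPath
hdfs dfs -rm -r /DirectoryPath
hdfs dfs -ls /       # the directory should no longer be listed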
11-20-2017
03:43 PM
@Carol Elliott Did you try running the script with the modification I have mentioned?
11-19-2017
08:12 PM
The answer to your problem is simple, but before coming to the solution let me break my answer into steps. I think you want the output of your code in one file, want to change the permissions of that log file, and want that file to be read by some other script or automation; so I assume you are trying to use the exec command for that. Let's try to understand why your script is misbehaving in this manner. Well, it isn't; it is just following the commands written in your script. exec is a way of running a command in Linux that doesn't spawn a new PID and uses the PID of the current shell. So when you run the command below in a Linux shell, you will notice that it appears to stop listening and hang, but it hasn't: exec >1.log 2>&1
We have closed the doors with our own hands, i.e., pointed standard output and standard error to a log file, so when this command is executed inside your script the child process ends up in the same state (a tiny demo follows the terminal output below). Now, let's understand how nohup and background processes work. When you execute your script using nohup ./x.sh &, nohup detaches from the terminal and redirects your script's output to nohup.out, which is expected behaviour (to understand this, have a look at the man page for nohup). And this is the reason you are getting the lines below on your terminal: nohup ./run_beeline_hql.sh &
[1] 58486
[xxxx/home/xxxx/xxx/xxx]$ nohup: ignoring input and appending output to `nohup.out'
[1] + Stopped (SIGTTOU) nohup ./run_beeline_hql.sh &
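Here is a tiny demo of the exec behaviour described above (the file name 1.log is just an example): once both descriptors point at the file, nothing is echoed back to the terminal any more, which looks like a hang.
exec >1.log 2>&1
echo "this line goes into 1.log, not to the screen"
ls /nonexistent      # the error message also lands in 1.log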
Now we know that when your script starts running it goes into that state and doesn't do much until you send a kill signal, at which point the process tries to terminate gracefully, completes its task and exits. The solution is to manage the redirection inside the script itself; the piece of code below should achieve that with very few changes: #!/bin/sh
#Declaring Variables
THISFILE='run_beeline_hql'
EXT1=$(date +%y%m%d)
EXT2=$(date +%H%M%S)
. $(dirname $0)/srv.env
# Build the log file name once (output_dir is assumed to be set in srv.env)
LOGFILE=$output_dir/${THISFILE}_$EXT1.$EXT2.log
# Close STDOUT file descriptor
exec 1<&-
# Close STDERR file descriptor
exec 2<&-
# Open STDOUT as the log file for read and write (creates it if missing)
exec 1<>$LOGFILE
# Redirect STDERR to STDOUT
exec 2>&1
chmod 666 $LOGFILE
beeline -f simple_query.hql
exit
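A quick way to launch the corrected script in the background and watch its log (substitute the directory configured in srv.env for <output_dir>):
nohup ./run_beeline_hql.sh &
tail -f <output_dir>/run_beeline_hql_*.log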
If you also don't want the nohup message, running the script like below would be a better idea: nohup ./run_beeline_hql.sh 1>Temp.log 2>&1 &
Do let me know whether this worked out or not. Thanks, SKS
11-19-2017
12:19 PM
Great tool. Thanks for sharing. If you want to automate the script, I think nohup is the only way to go.
11-18-2017
11:11 PM
This is probably the best link I have found on how to set up passwordless login using the ssh-copy-id command on EC2 instances: https://superuser.com/questions/331167/why-cant-i-ssh-copy-id-to-an-ec2-instance This might help.
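For reference, the approach from that thread boils down to something like the sketch below (the key file, user and host names are placeholders; fresh EC2 instances only accept key-based login):
cat ~/.ssh/id_rsa.pub | ssh -i my-keypair.pem ec2-user@my-ec2-host 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
# Afterwards this should work without the .pem file
ssh ec2-user@my-ec2-host hostname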
11-17-2017
02:41 PM
1 Kudo
Let me break my answer down into parts and give you some initial details to help you understand it in a better manner.

Hadoop: Yes, it's a framework that is currently used to handle the Big Data problems faced by most customers these days. Hadoop runs on distributed computing machines that are cheaper in cost, also referred to as commodity hardware, and instead of sending/moving your data over the network you send your code to the data for faster computation and quicker results. When big data comes into the picture we can think of two major problems: how we are going to store it, and how we are going to process it. Hadoop came into the picture to overcome these two challenges, which are hard to solve in a DBMS system. Obviously it varies with the use case, but for now let's keep ourselves constrained to Hadoop and big data. The two major components of Hadoop are:

Storage: HDFS (Hadoop Distributed File System) came into the picture to store Big Data and manage it efficiently and redundantly. HDFS is the file system that manages your data in a Hadoop cluster, and its major services are:
- NameNode (master daemon)
- DataNode (slave daemon)

Computation: Storage is handled by HDFS; computation is taken care of by the YARN framework, also known as Yet Another Resource Negotiator. YARN's components are:
- ResourceManager (master daemon)
- ApplicationMaster (one per application)
- NodeManager (slave daemon)

HBase: HBase is an open-source, NoSQL, distributed, scalable big data store. It gives you fast read and write access to your big data stored in HDFS; you can think of it as a layer on top of HDFS. It exposes an API you can use to write your NoSQL queries and get results, and you can use it when you need random, real-time read/write access to your Big Data. It was inspired by and modeled after a white paper released by Google: "Bigtable: A Distributed Storage System for Structured Data".

In my view the best way to go ahead is to start with the book "Hadoop: The Definitive Guide" by Tom White.
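To make the storage part concrete, here is a minimal, hypothetical HDFS session (the paths and file name are placeholders) showing how a file lands in the distributed filesystem:
hdfs dfs -mkdir -p /user/demo
hdfs dfs -put ./localdata.csv /user/demo/
# Each file is split into blocks and replicated across DataNodes; -ls shows the replication factor
hdfs dfs -ls /user/demo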