Member since: 12-09-2015
Posts: 97
Kudos Received: 51
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1513 | 05-04-2016 06:00 AM |
| | 3257 | 04-11-2016 09:57 AM |
| | 1009 | 04-08-2016 11:30 AM |
06-27-2016
11:27 AM
I have a 22 GB file that is processed by a MapReduce job. The output is a 1 GB JSON file that I store on HDFS. Currently, I do not want to reduce the information in the output file, because it contains valuable information needed for my visualization (drill-down etc.). The problem is that this file is huge to read from HDFS and to use with charting tools on a web page. What should the strategy be here? My first thought is to go for a NoSQL store such as MongoDB or HBase, but I have other choices such as an RDBMS like Oracle. I understand that the choice depends on the nature of the data, but I would like to hear from experienced Hadoop users who might have faced a similar situation.
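One way to frame the NoSQL suggestion: stores such as HBase or MongoDB let the web page fetch only the records a chart needs, instead of reading the whole 1 GB file from HDFS. A minimal Python sketch of that keyed-access idea, assuming (not stated in the post) that the output is newline-delimited JSON; `bucket_records` and the key field are hypothetical names:

```python
import json
import os

def bucket_records(src_path, out_dir, key):
    """Split one large newline-delimited JSON file into per-key files so a
    charting page can fetch only the slice it needs, not the whole file."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        with open(src_path) as src:
            for line in src:
                rec = json.loads(line)
                k = str(rec[key])  # assumed drill-down key, e.g. a region field
                if k not in handles:
                    handles[k] = open(os.path.join(out_dir, k + ".json"), "w")
                handles[k].write(line)
    finally:
        for h in handles.values():
            h.close()
```

A NoSQL store gives you this keyed random access (plus indexing) natively; the sketch only illustrates the access pattern the chart tools would rely on.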
Labels:
- Apache Hadoop
05-10-2016
06:02 AM
@Ajay I have understood it now, but can you tell me how to view the mappers currently running for an application? Which links should I follow?
05-10-2016
03:39 AM
I have a job running in the cluster, but I am unable to see that job through the JobHistory UI. I can only see the job if I execute the command "hadoop job -list" at the Linux command prompt. I have observed that if I go to the ResourceManager UI I see a running application, but I do not see any jobs of that running application through the JobHistory UI. In the ResourceManager UI, I have also observed that the latest application I executed shows "ApplicationMaster" under the "Tracking UI" field, while the rest of the applications show "History" there. Is this the reason why I cannot see the jobs of this application under the JobHistory UI: because it is still associated with the ApplicationMaster?
Labels:
- Apache Ambari
05-10-2016
03:36 AM
@Predrag Minovic Thanks. I understand it now very well. The problem was indeed that the NodeManager was not available on the other two nodes. Also, I made a mistake in my MB calculation, due to which I misunderstood the process.
05-06-2016
11:26 AM
I have a 4-node cluster and am running a MapReduce job on it. The input file is a JSON file of size 1.53 GB. The Mapper task reads a JSON record and manipulates the text. I observed the following after I executed the job:

1) There are 15 Mapper tasks, which is correct (no issues here).
2) Only 1% of the job was processed in 50 minutes, which is very slow.
3) Only 4 mapper tasks are shown running.
4) Two mappers are running on Machine1 and the other two mappers are running on Machine2.
5) Mapper task 1 on Machine1 shows a total of 21627027 bytes read, and the number keeps increasing every few seconds.

Here is what I need to understand:

1) Why do only two nodes have all the mapper tasks running? Why are the other nodes not running any mapper?
2) If there is one mapper per 128 MB file block, why is the mapper task on Machine1 showing 21627027 bytes (21 MB) of data? (Edited: I had mentioned 21120 MB, which was a calculation mistake. The correct figure is 21 MB.)
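The one-mapper-per-128-MB-block expectation can be sanity-checked with the usual split estimate. A rough sketch, assuming a single splittable input file and a 128 MB block size; `num_splits` is a hypothetical helper, and multi-file or compressed inputs change the real count (which may be why 15 mappers appear here):

```python
import math

BLOCK = 128 * 1024 * 1024  # assumed default HDFS block size

def num_splits(file_size_bytes, block_size=BLOCK):
    # One map task per input split; for a single splittable file this is
    # roughly the number of blocks, i.e. ceil(file_size / block_size).
    return math.ceil(file_size_bytes / block_size)

print(num_splits(int(1.53 * 1024**3)))  # prints 13
```

As for point 2 above: the byte counter of a running mapper just grows as the task reads through its split, so seeing 21 MB mid-run on a ~128 MB split is progress, not an anomaly.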
Labels:
- Apache Hadoop
05-05-2016
10:47 AM
I found out what was occupying the non-DFS space: the log files under /var/log/hive. There were around 67 GB of log files! I removed them and the space has been reclaimed. Thanks for your help. (I used the command "du -kscx *", executed inside the log folder, to find the size of each subfolder.)
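The same hunt can be scripted. A small Python analogue of the `du -kscx *` check above, a sketch only: it sums apparent file sizes rather than allocated blocks, so totals can differ slightly from `du`:

```python
import os

def dir_size_kb(path):
    """Rough Python analogue of `du -k` for one directory tree: sum the
    sizes (in KB) of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total // 1024
```

Running it over each subfolder of /var/log would surface a 67 GB hive log directory immediately.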
05-05-2016
10:21 AM
@Sagar Shimpi I have checked the NameNode UI. I observe that "Non DFS Used" shows 77.15 GB while "DFS Used" shows just 1.25 GB. 77.15 GB is very high compared to the other three nodes. My question is what to do next: how do I free up more space on this node? As for the versions, HDP is 2.4 and Ambari is 2.2.1.1.
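For reference, the figures in the NameNode UI fit together by a simple identity; the 100 GB capacity and 21.6 GB remaining below are assumed illustrative numbers, not values from the post:

```python
def non_dfs_used(configured_capacity, dfs_used, dfs_remaining):
    # Whatever capacity is neither HDFS block data nor free space shows
    # up as "Non DFS Used" (all arguments in the same unit, e.g. GB).
    return configured_capacity - dfs_used - dfs_remaining

# Hypothetical node: 100 GB capacity, 1.25 GB DFS used, 21.6 GB remaining
print(non_dfs_used(100.0, 1.25, 21.6))
```

So a large "Non DFS Used" points at ordinary files on the DataNode's disks (logs, temp files), which is exactly what the /var/log/hive discovery above confirmed.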
05-05-2016
09:50 AM
@Sagar Shimpi Thanks for pointing me to the "Rebalance HDFS" utility. After I clicked Rebalance HDFS, the progress bar quickly ended, reporting success. Shouldn't this be a long procedure, with lots of data being sent from one node to another to balance? How do I know when the process finishes, if it has not ended? After clicking that link, I do not see any change immediately.
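One plausible reason the run ended immediately: the HDFS balancer only moves blocks while some DataNode's DFS utilization deviates from the cluster average by more than the threshold (10% by default), and it ignores non-DFS usage entirely. A toy check of that rule, with made-up utilization figures:

```python
def over_threshold(utilizations_pct, threshold_pct=10.0):
    """Return the DFS utilizations that deviate from the cluster average
    by more than the balancer threshold; empty means nothing to move."""
    avg = sum(utilizations_pct) / len(utilizations_pct)
    return [u for u in utilizations_pct if abs(u - avg) > threshold_pct]

# If DFS usage is tiny and similar everywhere (the big consumer here was
# non-DFS log data), the balancer has nothing to move and exits at once:
print(over_threshold([1.5, 1.2, 1.1, 1.4]))  # prints []
```

That would match this cluster: DFS Used was only ~1.25 GB, so the 77 GB of non-DFS data was invisible to the balancer.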
05-05-2016
09:23 AM
I have a four-node cluster (HDP 2.4). On the Ambari Hosts page, I can see that the space consumption on one of the nodes is very high. First of all, I do not understand the cause of this. I would also like to understand how easy it is to distribute the data evenly across all the nodes, so that every node consumes an equal amount of DFS space.
Labels:
- Apache Hadoop
05-04-2016
06:00 AM
I found the issue and fixed it. The command "hostname -f" on Machine1 was giving a "hostname: Unknown host" error. After I fixed that and added the host again, it succeeded.
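The failing lookup is easy to reproduce outside Hadoop. A small Python check of the same forward resolution that `hostname -f` exercises, useful for verifying every cluster node before adding hosts:

```python
import socket

# Sanity-check that this host's fully qualified name resolves -- the
# lookup that produced "hostname: Unknown host" on Machine1.
fqdn = socket.getfqdn()
try:
    addr = socket.gethostbyname(fqdn)
    print(f"{fqdn} resolves to {addr}")
except socket.gaierror as exc:
    print(f"{fqdn} does not resolve: {exc}")
```

If the except branch fires, fix /etc/hosts or DNS for that node before retrying the Ambari host add.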