Member since: 09-30-2014
Posts: 31
Kudos Received: 13
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2150 | 10-25-2016 07:02 AM
 | 596 | 10-17-2016 11:34 AM
 | 1322 | 01-07-2016 12:46 PM
07-31-2017
08:43 AM
I'm also seeing this with Ambari 2.5.1.0 and HDP-2.4.3.0.
06-09-2017
11:01 AM
@Vani This solution works, but the side effect now is that users are allowed to override which queue their jobs are assigned to. Do you agree? If so, do you know any way around this?
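If the approach in question was Capacity Scheduler queue mappings (an assumption on my part, since the original suggestion isn't quoted here), the relevant knob in capacity-scheduler.xml would be something like:
yarn.scheduler.capacity.queue-mappings-override.enable=true
# true: the configured mapping takes precedence over a queue the user specifies
# false (the default): a user-specified queue wins, which is the side effect described above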
10-25-2016
07:03 AM
Thanks for your reply, Anu. We didn't get around to trying your suggestion, so unfortunately I can't accept your answer, even though it might be valid.
10-25-2016
07:02 AM
1 Kudo
We got it to work by lowering "dfs.datanode.balance.max.concurrent.moves" from 500 to 20, which is more in line with the guide at https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html. It's possible that we could also have gotten it to work by raising the dispatcher threads setting suggested by aengineer below, but we didn't try that once this worked.
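For anyone repeating this, a minimal sketch of the re-run, assuming the same -D override style as the command further down this thread (note that the DataNodes read their own copy of dfs.datanode.balance.max.concurrent.moves from hdfs-site.xml, so a client-side override alone may not be enough on every version):
$ hdfs balancer -D dfs.datanode.balance.max.concurrent.moves=20 -D dfs.datanode.balance.bandwidthPerSec=200000000
$ hdfs getconf -confKey dfs.datanode.balance.max.concurrent.moves   # confirm what the client config resolves to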
10-20-2016
07:36 AM
Hello, I'm trying to rebalance HDFS in our HDP 2.4.3 cluster (which is running NameNode HA), and I am having a problem where the balancer only does actual work for a short time and then just sits idle. If I kill the process and restart it, it does some balancing immediately and then goes idle again. I have repeated this many times now. I enabled debug logging for the balancer but I can't see anything in there that explains why it just stops balancing. Here is the beginning of the log (since it shows some parameters that might be relevant):
16/10/19 16:34:10 INFO balancer.Balancer: namenodes = [hdfs://PROD1]
16/10/19 16:34:10 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
16/10/19 16:34:10 INFO balancer.Balancer: included nodes = []
16/10/19 16:34:10 INFO balancer.Balancer: excluded nodes = []
16/10/19 16:34:10 INFO balancer.Balancer: source nodes = []
16/10/19 16:34:11 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
16/10/19 16:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 16:34:11 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 500 (default=5)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
16/10/19 16:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
16/10/19 16:34:11 INFO net.NetworkTopology: Adding a new node: /default-rack/X.X.X.X:1019
....
16/10/19 16:34:11 INFO balancer.Balancer: Need to move 11.83 TB to make the cluster balanced.
...
16/10/19 16:34:11 INFO balancer.Balancer: Will move 120 GB in this iteration
16/10/19 16:34:11 INFO balancer.Dispatcher: Start moving blk_1661084121_587506756 with size=72776669 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
...
16/10/19 16:34:12 WARN balancer.Dispatcher: No mover threads available: skip moving blk_1457593679_384005217 with size=104909643 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
...
Here is the part of the log just after the last block has successfully been moved: ...
16/10/19 16:36:00 INFO balancer.Dispatcher: Successfully moved blk_1693419961_619844350 with size=134217728 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
16/10/19 16:36:00 INFO balancer.Dispatcher: Successfully moved blk_1693366190_619790579 with size=134217728 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
16/10/19 19:04:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 21:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/20 00:04:11 INFO block.BlockTokenSecretManager: Setting block keys
... In the above log sections I'm not showing the debug output, since it is pretty verbose and from what I can see the only thing mentioned there is a periodic reauthentication of the ipc.Client. I'm launching the balancer from the command line using the following command:
$ hdfs --loglevel DEBUG balancer -D dfs.datanode.balance.bandwidthPerSec=200000000
I have tried other values for the bandwidth setting but it doesn't change the behaviour. Can anyone see if I'm doing something wrong and point me towards a solution? Best Regards /Thomas
Labels:
Apache Hadoop
10-17-2016
11:34 AM
I just found that something like this was added somewhat recently: https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/AvailableSpaceBlockPlacementPolicy.java This seems to be what I was looking for.
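A rough sketch of how it could be wired up, in case it helps someone (the property names are my reading of hdfs-default.xml in recent Hadoop releases, so verify them against your version before relying on this):
# hdfs-site.xml on the NameNode
dfs.block.replicator.classname=org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy
dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction=0.6
# after a NameNode restart, check what the config resolves to:
$ hdfs getconf -confKey dfs.block.replicator.classname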
10-17-2016
11:25 AM
Hello, I am wondering if there is a BlockPlacementPolicy that, in addition to storing replicas safely on different racks as the default one does, also considers how much disk space is available on the different nodes. In a case where the cluster consists of two sets of machines with a big difference in the amount of available disk space, the default policy will lead to the disks of the set with less disk space running out long before you actually reach your total HDFS capacity. Is there any such policy ready to be used? Best Regards Thomas
Labels:
Apache Hadoop
09-02-2016
01:23 PM
2 Kudos
Hello. I would like to monitor the actual memory usage of the YARN containers in our cluster. We are using defaults such as mapreduce.map.memory.mb=X and mapreduce.reduce.memory.mb=Y, but if I have understood this correctly, these values are only used to determine the maximum limit for processes running inside the containers. Is it possible to get metrics out of YARN about the actual memory usage of the process that ran in a container? It looks like something like this was implemented in https://issues.apache.org/jira/browse/YARN-2984 but I'm not sure how I can access that data. Can you give me any tips regarding this? Best Regards /Thomas
Added: I can see what I'm looking for in the NodeManager logs, so I guess those logs could be harvested and analyzed. Any other tips? Example of a NodeManager log line:
2016-09-02 13:31:58,563 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 50811 for container-id container_e21_1472110676349_75100_01_006278: 668.7 MB of 2.5 GB physical memory used; 2.9 GB of 5.3 GB virtual memory used
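Until a better option turns up, a quick-and-dirty way to harvest those NodeManager lines (a sketch; the log path is the usual HDP location on our nodes and may differ on yours):
$ grep "Memory usage of ProcessTree" /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log* | awk -F 'container-id |: ' '{print $2, $3}'
This prints the container id followed by the physical/virtual memory figures from each sample.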
Labels:
Apache YARN
08-11-2016
12:52 PM
Hi @Arpit Agarwal,
That is my understanding as well. Thanks for a short and to the point answer.
07-04-2016
06:43 AM
Hi Artem. I agree that /tmp is just plain wrong for this. I think Ambari chose these directories for us during cluster installation and we hadn't noticed. We will remove /tmp from this configuration.
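Concretely, dropping /tmp from the value quoted in the question would leave the following (to be applied via Ambari and verified after a NameNode restart):
dfs.namenode.name.dir=/var/hadoop/hdfs/namenode,/mnt/data/hadoop/hdfs/namenode
$ hdfs getconf -confKey dfs.namenode.name.dir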
06-28-2016
08:59 AM
Hello, I am seeing an issue with fsimage files not being cleaned away from one of the "dfs.namenode.name.dir" directories. The setting of "dfs.namenode.name.dir" in our cluster is "/tmp/hadoop/hdfs/namenode,/var/hadoop/hdfs/namenode,/mnt/data/hadoop/hdfs/namenode". This fills up the /tmp partition on the host running the namenode. Listing the contents of these folders shows that the /tmp folder contains a lot more fsimage files than the other two folders:
[me@node ~]$ ls -la /tmp/hadoop/hdfs/namenode/current | grep fsimage | wc -l
94
[me@node ~]$ ls -la /var/hadoop/hdfs/namenode/current | grep fsimage | wc -l
9
[me@node ~]$ ls -la /mnt/data/hadoop/hdfs/namenode/current | grep fsimage | wc -l
9
Looking at the namenode logs confirms that the purging seems to only happen for /var and /mnt:
[me@node ~]$ grep NNStorageRetentionManager /var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log* | grep fsimage
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.7:2016-06-27 19:50:25,462 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002281385227, cpktTxId=0000000002281385227)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.7:2016-06-27 19:50:25,640 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002281385227, cpktTxId=0000000002281385227)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.8:2016-06-27 18:38:58,921 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002280372072, cpktTxId=0000000002280372072)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.8:2016-06-27 18:38:59,102 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002280372072, cpktTxId=0000000002280372072)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.9:2016-06-27 17:34:31,800 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002279353884, cpktTxId=0000000002279353884)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.9:2016-06-27 17:34:31,992 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002279353884, cpktTxId=0000000002279353884)
Can anyone explain why only two of the directories are purged? I should mention that we are running namenode HA. Best Regards /Thomas
Labels:
Apache Hadoop
06-16-2016
07:03 AM
Hi Ravi, I'm not sure I understand what you mean. Is there a tool that could detect our type of disk error and automatically remount the drive in read-only mode? Or are you talking about something like the fstab mount option "errors=remount-ro"? The fstab option only means that if errors are encountered when the OS tries to mount the drive in read-write mode, it should try to mount it read-only. But that does not apply to our situation: the machine is not just starting up, it has been up and running for a long while before the disk errors start to occur. If you mean some other tool or configuration that can detect and remount while the system is running, please share a link. Best Regards
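To be explicit about what I'd picture for the runtime case, a sketch only (device and mount point are taken from our earlier logs, so adjust to your case):
$ sudo mount -o remount,ro /mnt/data21                      # force the affected mount read-only right now
$ sudo tune2fs -l /dev/sdv1 | grep -i 'errors behavior'     # show what ext4 is configured to do on errors
$ sudo tune2fs -e remount-ro /dev/sdv1                      # make ext4 itself go read-only on future errors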
05-23-2016
06:51 AM
Hi Predrag, See my comment to Sagar above, our value of that setting is the default, i.e. zero.
05-23-2016
06:26 AM
Hi Ashnee. See my comment to Sagar above.
05-19-2016
12:37 PM
Yes, I agree that is exactly how it seems. There is no problem running ls directly on /mnt/data21.
[thomas.larsson@datavault-prod-data8 ~]$ ls -la /mnt/data21
total 28
drwxr-xr-x. 4 root root 4096 9 nov 2015 .
drwxr-xr-x. 26 root root 4096 9 nov 2015 ..
drwxr-xr-x. 4 root root 4096 28 jan 12.32 hadoop
drwx------. 2 root root 16384 6 nov 2015 lost+found
05-16-2016
12:12 PM
Hi Sagar, I think you misunderstand my question. My question was NOT "In what scenarios does a namenode consider a datanode dead?". It's more a question of why our datanode does not shut itself down when one of its disks is failing. I assumed that this is what should happen, since our setting of dfs.datanode.failed.volumes.tolerated is the default, i.e. zero.
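(For reference, the effective value can be double-checked on a datanode host with:
$ hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated
which should come back as 0 here, since we are on the defaults.)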
05-16-2016
12:06 PM
A follow-up. I forgot to mention our hadoop version: HDP 2.2.6.0, i.e. hadoop 2.6. I looked into the hadoop code and found the org.apache.hadoop.util.DiskChecker class, which seems to be used by a monitoring thread to monitor the health of a datanode's disks. In order to verify that the datanode actually does not detect this error, I created a very simple Main class that just calls the DiskChecker.checkDirs method. Main.java:
import java.io.File;
public class Main {
public static void main(String[] args) throws Exception {
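// Invoke Hadoop's own disk check on the directory passed as the first argument.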
org.apache.hadoop.util.DiskChecker.checkDirs(new File(args[0]));
}
}
If I run this class on one of our problematic directories, nothing is detected:
[thomas.larsson@datavault-prod-data8 ~]$ /usr/jdk64/jdk1.7.0_67/bin/javac Main.java -cp /usr/hdp/2.2.6.0-2800/hadoop/hadoop-common.jar
[thomas.larsson@datavault-prod-data8 ~]$ sudo java -cp .:/usr/hdp/2.2.6.0-2800/hadoop/hadoop-common.jar:/usr/hdp/2.2.6.0-2800/hadoop/lib/* Main /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
However, trying to list the files in this subdir looks like this:
[thomas.larsson@datavault-prod-data8 ~]$ sudo ls -la /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir163: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir155: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir165: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir166: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir164: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir159: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir154: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir153: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir167: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir161: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir157: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir152: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir160: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir156: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir158: Input/output error
total 984
drwxr-xr-x. 258 hdfs hadoop 12288 13 dec 12.52 .
drwxr-xr-x. 258 hdfs hadoop 12288 22 nov 14.50 ..
drwxr-xr-x. 2 hdfs hadoop 4096 12 maj 18.12 subdir0
drwxr-xr-x. 2 hdfs hadoop 4096 12 maj 18.02 subdir1
...
drwxr-xr-x. 2 hdfs hadoop 4096 30 apr 19.21 subdir151
d?????????? ? ? ? ? ? subdir152
d?????????? ? ? ? ? ? subdir153
d?????????? ? ? ? ? ? subdir154
d?????????? ? ? ? ? ? subdir155
d?????????? ? ? ? ? ? subdir156
d?????????? ? ? ? ? ? subdir157
d?????????? ? ? ? ? ? subdir158
d?????????? ? ? ? ? ? subdir159
drwxr-xr-x. 2 hdfs hadoop 4096 12 maj 18.12 subdir16
d?????????? ? ? ? ? ? subdir160
d?????????? ? ? ? ? ? subdir161
d?????????? ? ? ? ? ? subdir162
d?????????? ? ? ? ? ? subdir163
d?????????? ? ? ? ? ? subdir164
d?????????? ? ? ? ? ? subdir165
d?????????? ? ? ? ? ? subdir166
d?????????? ? ? ? ? ? subdir167
drwxr-xr-x. 2 hdfs hadoop 4096 12 maj 18.30 subdir168
drwxr-xr-x. 2 hdfs hadoop 4096 12 maj 18.28 subdir169
...
So, it seems like this problem is undetectable by a datanode.
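Given that, some external probe seems necessary; a crude sketch of one (expensive to run on a full datanode, and the data-dir glob matches our layout only, so adjust it):
for d in /mnt/data*/hadoop/hdfs/data; do
  sudo ls -R "$d" > /dev/null 2> /tmp/ls-probe.err
  grep -q "Input/output error" /tmp/ls-probe.err && echo "ALERT: I/O errors under $d"
done
rm -f /tmp/ls-probe.err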
05-16-2016
09:14 AM
2 Kudos
Hi. We have encountered issues on our cluster that seem to be caused by bad disks. When we run "dmesg" on the datanode host we see warnings such as:
This should not happen!! Data will be lost
sd 1:0:20:0: [sdv] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:20:0: [sdv] Sense Key : Medium Error [current]
Info fld=0x2f800808
sd 1:0:20:0: [sdv] Add. Sense: Unrecovered read error
sd 1:0:20:0: [sdv] CDB: Read(10): 28 00 2f 80 08 08 00 00 08 00
end_request: critical medium error, dev sdv, sector 796919816
EXT4-fs (sdv1): delayed block allocation failed for inode 70660422 at logical offset 2049 with max blocks 2048 with error -5
In the datanode logs we see warnings such as:
2016-05-16 09:41:42,694 WARN util.Shell (DU.java:run(126)) - Could not get disk usage information
ExitCodeException exitCode=1: du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir163': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir155': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir165': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir166': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir164': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir159': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir154': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir153': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir167': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir161': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir157': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir152': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir160': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir156': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir158': Input/output error
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.fs.DU.run(DU.java:190)
at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:119)
at java.lang.Thread.run(Thread.java:745)
and:
2016-05-16 09:31:14,494 ERROR datanode.DataNode (DataXceiver.java:run(253)) - datavault-prod-data8.internal.machines:1019:DataXceiver error processing READ_BLOCK operation src: /x.x.x.x:55220 dst: /x.x.x7.x:1019
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1356445971-x.x.x.x-1430142563027:blk_1367398616_293808003
at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:431)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:229)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:745)
These errors/warnings do not, however, seem to be enough for the datanode to consider the volume "failed" and shut itself down. Some consequences we have seen when this happens are that it becomes impossible to scan an HBase region served by a regionserver on the same host as the datanode, and that mapreduce jobs get stuck accessing the host. This brings me to my question: what is the requirement for a datanode to consider a volume as failed? Best Regards /Thomas
Labels:
Apache Hadoop
02-22-2016
07:37 AM
@Wendy Foslien Perhaps you are having the same problem I had, see here: How to connect Kerberized Hive via ODBC and avoid the “No credentials cache found” error
01-07-2016
12:46 PM
1 Kudo
I found the source code, here: https://github.com/hortonworks/hive-release/releases
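To get the debugger's source to line up, the idea would then be to build the IDE project from the tag that matches the cluster's build (the tag name below is a placeholder; list the tags to find the right one):
$ git clone https://github.com/hortonworks/hive-release.git
$ cd hive-release && git tag -l | grep 2.2.6.0-2800
$ git checkout <matching-tag>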
01-07-2016
10:10 AM
1 Kudo
Hello, I'm running HDP-2.2.6.0-2800 and am trying to remote-debug the HiveServer2 process. My problem is that the line numbers don't match when I set breakpoints and step through the code. This usually happens when the source code on the local machine running the debugger does not match the compiled code running on the server being debugged. Has anyone got this to work? In that case, how? Here are my instructions to reproduce the problem using the HDP-2.2.4 sandbox (there is no 2.2.6 sandbox afaik):
1. Create a new HDP-2.2.4 sandbox instance.
2. Add the following snippet to Advanced hive-env -> hive-env template:
# Enable remote debugging of hiveserver2.
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
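# suspend=n lets HiveServer2 start without waiting for a debugger to attach; it listens for one on port 5005.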
fi
3. Modify the /usr/hdp/2.2.4.2-2/hive/bin/hive.distro file by replacing the following section:
# Make sure we're using a compatible version of Hadoop
if [ "x$HADOOP_VERSION" == "x" ]; then
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
fi
with this (when you start HiveServer2 with the agent flags, it prints an additional line to stdout which confuses this awk script):
# Make sure we're using a compatible version of Hadoop
if [ "$SERVICE" == 'hiveserver2' ]; then
if [ "x$HADOOP_VERSION" == "x" ]; then
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 2) {print $2;}}');
fi
else
if [ "x$HADOOP_VERSION" == "x" ]; then
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
fi
fi
4. Clone the hive git repo and switch to the branch-0.14 branch.
5. Create a remote debug connection to the hiveserver2 java process (I'm using IntelliJ IDEA to set up the project).
6. Set a breakpoint in org.apache.hive.service.cli.session.SessionManager, line 268, i.e.:
if (withImpersonation) {
HiveSessionImplwithUGI sessionWithUGI = new HiveSessionImplwithUGI(protocol, username, password,hiveConf, ipAddress, delegationToken);
session = HiveSessionProxy.getProxy(sessionWithUGI, sessionWithUGI.getSessionUgi());
sessionWithUGI.setProxySession(session);
} else {
session = new HiveSessionImpl(protocol, username, password, hiveConf, ipAddress);
}
session.setSessionManager(this);
session.setOperationManager(operationManager); // <--- Set breakpoint here for example
try {
session.initialize(sessionConf);
if (isOperationLogEnabled) {
session.setOperationLogSessionDir(operationLogRootDir);
}
session.open();
} catch (Exception e) {
throw new HiveSQLException("Failed to open new session", e);
}
7. Start a new hive session; I'm using beeline.
8. See that the hiveserver2 execution is halted at the breakpoint.
9. Try to "Step into" the session.setOperationManager method, and you actually end up in org.apache.hive.service.cli.session.HiveSessionImpl.getSessionHandle(). An obvious line mismatch here, as you can see. Perhaps I am missing something. Grateful for any tips. /Thomas
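For step 7, the beeline connection would be something along these lines (host, port and user are sandbox-style assumptions, adjust as needed):
$ beeline -u jdbc:hive2://localhost:10000 -n hive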
Labels:
Apache Hive