Member since
04-20-2016
86
Posts
27
Kudos Received
7
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2891 | 03-13-2017 04:06 AM |
| | 4054 | 03-09-2017 01:55 PM |
| | 1640 | 01-05-2017 02:13 PM |
| | 5926 | 12-29-2016 05:43 PM |
| | 4977 | 12-28-2016 11:03 PM |
10-17-2018
12:11 PM
1 Kudo
Great article !!!
10-17-2018
12:11 PM
@PJ Since you have Ranger enabled, it's possible that the permission is being denied on the Ranger side. I would definitely check the Ranger audit logs for any events for that user and see whether the permission denial shows up there. Once I had validated that it was Ranger blocking the access, I would add a Ranger HDFS policy granting user1 write access to /user/user1/sparkeventlogs.
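A quick way to confirm which layer is denying access is to reproduce the failure as user1 from the command line; a minimal sketch (the path and user name are taken from the question above, adjust for your environment):

# Reproduce the write as user1; if Ranger is the blocker, the attempt should show up
# in the Ranger audit log as a denied HDFS access for this path.
su - user1 -c "hdfs dfs -mkdir -p /user/user1/sparkeventlogs"
su - user1 -c "hdfs dfs -touchz /user/user1/sparkeventlogs/_permission_test"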
10-17-2018
12:04 PM
@Dukool SHarma
Safe mode is a NameNode state in which the node doesn’t accept any changes to the HDFS namespace, meaning HDFS will be in a read-only state. Safe mode is entered automatically at NameNode startup, and the NameNode leaves safe mode automatically when the configured minimum percentage of blocks satisfies the minimum replication condition.
When you start up the NameNode, it doesn’t start replicating data to the DataNodes right away. The NameNode first automatically enters a special read-only state of operation called safe mode. In this mode, the NameNode doesn’t honor any requests to make changes to its namespace. Thus, it refrains from replicating, or even deleting, any data blocks until it leaves the safe mode.
The DataNodes continuously send two things to the NameNode: a heartbeat indicating they're alive and well, and a block report listing all of the data blocks stored on that DataNode. Hadoop considers a data block "safely" replicated once the NameNode has received enough block reports indicating that a minimum number of replicas of that block exist. The NameNode waits for these block reports so that it doesn't prematurely start re-replicating blocks that already have the correct number of replicas on DataNodes that simply haven't reported their block information yet.
When a preconfigured percentage of blocks are reported as safely replicated, the NameNode leaves the safe mode and starts serving block information to clients. It’ll also start replicating all blocks that the DataNodes have reported as being under replicated.
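The threshold itself is configurable; as a minimal sketch (the property names are the stock ones from hdfs-default.xml, so verify them against your version), you can inspect the current settings with hdfs getconf:

# Fraction of blocks that must be reported before the NameNode leaves safe mode (default 0.999)
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct
# Minimum number of replicas a block needs before it counts as "safe"
hdfs getconf -confKey dfs.namenode.replication.min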
Use the dfsadmin -safemode command to manage safe mode operations for the NameNode. You can check the current safe mode status with the -safemode get command:

$ hdfs dfsadmin -safemode get
Safe mode is OFF in hadoop01.localhost/10.192.2.21:8020
Safe mode is OFF in hadoop02.localhost/10.192.2.22:8020

You can place the NameNode in safe mode with the -safemode enter command:

$ hdfs dfsadmin -safemode enter
Safe mode is ON in hadoop01.localhost/10.192.2.21:8020
Safe mode is ON in hadoop02.localhost/10.192.2.22:8020

Finally, you can take the NameNode out of safe mode with the -safemode leave command:

$ hdfs dfsadmin -safemode leave
Safe mode is OFF in hadoop01.localhost/10.192.2.21:8020
Safe mode is OFF in hadoop02.localhost/10.192.2.22:8020
08-10-2017
12:49 PM
Are we closing the Spark context here? Usually, once a ".close()" call is made, the JVM should be able to clean up those directories.
04-04-2017
02:47 PM
@Nikhil Pawar One thing you could do here is increase "spark.executor.heartbeatInterval", which defaults to 10 seconds, to something higher and test it out. It would also be worth reviewing the executor logs to see whether you have any OOM/GC issues while the executors are running the jobs you kick off from Spark.
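As a rough sketch of how to pass that setting on submission (the 60s value, class name and jar are placeholders only; older Spark versions expect the interval in milliseconds, e.g. 60000):

# Placeholder class and jar; substitute your own application and arguments.
spark-submit \
  --conf spark.executor.heartbeatInterval=60s \
  --class com.example.YourApp \
  your-app.jar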
03-14-2017
01:40 PM
@Jeff Watson Can you give us the spark-submit command and also attach the console o/p here for us to check?
03-13-2017
06:34 PM
Can you check what "io.file.buffer.size" is set to here? You may need to tweak it so that it stays below what "MAX_PACKET_SIZE" is set to. Referencing a great blog post here (http://johnjianfang.blogspot.com/2014/10/hadoop-two-file-buffer-size.html). For example, take a look at the BlockSender in HDFS:

class BlockSender implements java.io.Closeable {
/**
* Minimum buffer used while sending data to clients. Used only if
* transferTo() is enabled. 64KB is not that large. It could be larger, but
* not sure if there will be much more improvement.
*/
private static final int MIN_BUFFER_WITH_TRANSFERTO = 64*1024;
private static final int TRANSFERTO_BUFFER_SIZE = Math.max(
HdfsConstants.IO_FILE_BUFFER_SIZE, MIN_BUFFER_WITH_TRANSFERTO);
}
The BlockSender uses "io.file.buffer.size" as the transfer buffer size. If this parameter is not defined, the default buffer size of 64KB is used. The above explains why most Hadoop IOs were either 4K or 64K chunks in my friend's cluster, since he did not tune the cluster. To achieve better performance, we should tune "io.file.buffer.size" to a much bigger value, for example up to 16MB. The upper limit is set by MAX_PACKET_SIZE in org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.
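As a minimal sketch for checking the effective value before raising it in core-site.xml (assumes a standard HDFS client install; the stock default is 4096 bytes if unset):

# Prints the effective io.file.buffer.size in bytes
hdfs getconf -confKey io.file.buffer.size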
03-13-2017
06:19 PM
Try running it in debug mode and then provide the o/p here. For the Hive CLI you could do as below:

hive --hiveconf hive.root.logger=DEBUG,console

Once done, re-run the query and see where it fails. That should give you better insight into the failure here.
03-13-2017
06:16 PM
@Saikiran Parepally Please accept the answer if it has helped resolve the issue.
03-13-2017
04:12 AM
@Saikiran Parepally Did that fix the issue here?