About phoncy_joseph

phoncy_joseph · ‎12-22-2017

Thank you

phoncy_joseph · ‎12-19-2017

Thanks for the response. I wanted to know if the memory assignment could be done, without providing these while submitting jobs.

phoncy_joseph · ‎12-18-2017

phoncy_joseph · ‎06-16-2017

I have a multi-tenanted HDP2.3 cluster. It has been configured with an S3 end-point in custom hdfs-site.xml. Is it possible to add another S3 end-point for another tenant? If so, what should be the property name? Thanks in Advance.

phoncy_joseph · ‎06-15-2016

I have a HDP 2.0 cluster where I'm executing a mapreduce program which takes Hive(0.14) table as input. There are a large number of small files for the Hive table and hence large number of mapper containers are being requested. Please let me know if there is a way to combine small files before being input to mapreduce job?

phoncy_joseph · ‎04-14-2016

Thanks for the suggestions.Two of the data nodes in the cluster had to be replaced, as it didn't have enough disk space. I have also set the below in hdfs configuration and the jobs started executing fine even though I have noticed "Premature end of fail" error in data node logs. dfs.client.block.write.replace-datanode-on-failure.policy=ALWAYS

phoncy_joseph · ‎04-12-2016

I'm trying to execute a MapReduce streaming job in a 10 node Hadoop cluster(HDP2.2). There are 5 datanodes in the cluster. When the reduce phase reaches almost 100% completion, I'm getting the below error in client logs: Error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[x.x.x.x:50010], original=[x.x.x.x:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration The data node on which the jobs were executing contained below logs: INFO datanode.DataNode (BlockReceiver.java:run(1222)) - PacketResponder: BP-203711345-10.254.65.246-1444744156994:blk_1077645089_3914844, type=HAS_DOWNSTREAM_IN_PIPELINE java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203) java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) 2016-04-10 08:12:14,477 WARN datanode.DataNode (BlockReceiver.java:run(1256)) - IOException in BlockReceiver.run(): java.io.IOException: Connection reset by peer 016-04-10 08:13:22,431 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(816)) - Exception for BP-203711345-x.x.x.x -1444744156994:blk_1077645082_3914836 java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/XX.XXX.XX.XX:50010 remote=/XX.XXX.XX.XXX:57649] The NameNode logs contained the below warning: WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(383)) - Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy I had tried setting the below parameters in hdfs-site.xml dfs.datanode.handler.count =10 dfs.client.file-block-storage-locations.num-threads = 10 dfs.datanode.socket.write.timeout=20000 But still the error persists. Kindly suggest a solution. Thanks

phoncy_joseph · ‎03-22-2016

I have upgraded to Hadoop 2.7 now. I have done configurations changes for s3a and the queries are executing successfully. Thank you.

phoncy_joseph · ‎02-26-2016

Though have not yet upgraded to Hadoop 2.7, I made the configuration changes for s3a as per the documentation. On executing Hive create query, I got the below exception: FAILED: AmazonClientException Unable to execute HTTP request: Connect to hive-bucket.s3.amazonaws.com:443 timed out

phoncy_joseph · ‎02-22-2016

@Artem Ervits Copied jets3t.properties to all data nodes. Currently I'm getting below exception: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)

Online	Offline
Last Visited	‎02-02-2018 12:42 PM

Member Since	‎02-10-2016 01:29 PM
Last Visited	‎02-02-2018 12:42 PM
Posts	36
Kudos received	14

Cloudera Community

Re: Is it possible to dynamically set map and redu...

Re: Is it possible to dynamically set map and redu...

Is it possible to dynamically set map and reduce m...

How to configure multiple S3 end-points in a multi...

How to combine Hive table files for input to mapre...

Re: Getting " IOException: Failed to replace a bad...

Getting " IOException: Failed to replace a bad dat...

Re: Hive aggregate query failing for External tabl...

Re: Hive aggregate query failing for External tabl...

Re: Hive aggregate query failing for External tabl...