Member since: 02-10-2016
Posts: 36
Kudos Received: 14
Solutions: 0
12-22-2017
11:15 AM
Thank you
12-19-2017
01:31 PM
Thanks for the response. I wanted to know if the memory assignment could be done without providing these values while submitting jobs.
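To make the question concrete, here is a minimal sketch of the alternative being asked about, assuming plain YARN MapReduce jobs: cluster-wide defaults in mapred-site.xml (standard Hadoop property names; the values below are placeholders), so that individual submissions do not need to pass -D memory options.
<!-- mapred-site.xml sketch: cluster-wide memory defaults so jobs need not pass -D options at submit time (values are placeholders) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>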
06-29-2017
11:34 AM
The DN process is running if I do a check on the machine using ps -ef, but Ambari incorrectly shows the DataNode process as stopped.
06-27-2017
01:13 PM
In the Ambari UI, the data node is in the stopped state a few seconds after starting it. As mentioned in the earlier reply, the hdfs fsck command also lists the newly added nodes, though Ambari doesn't recognize the addition.
06-27-2017
10:49 AM
I'm trying to add 2 new datanodes to an existing HDP 2.3 cluster through Ambari. The existing 36 data nodes have a configuration of 10 CPUs, 56 GB RAM and 8.5 TB disk, and the data node heap size is set to 1 GB. The 2 new ones to be added have a configuration of 6 CPUs, 25 GB RAM and 1 TB disk. The HDFS disk usage is 7%. I'm able to start the NodeManager and Ambari Metrics services on the new nodes, but the datanode service goes down immediately after starting. Below are the logs from hadoop-hdfs-datanode-worker1.log:
2017-06-27 12:07:30,047 INFO datanode.DataNode (BPServiceActor.java:blockReport(488)) - Successfully sent block report 0x2235b2b47bf3a, containing 1 storage report(s), of which we sent 1. The reports had 19549 total blocks and used 1 RPC(s). This took 10 msec to generate and 695 msecs for RPC and NN processing. Got back no commands.
2017-06-27 12:07:36,003 ERROR datanode.DataNode (DataXceiver.java:run(278)) - worker1.bigdata.net.net:50010:DataXceiver error processing unknown operation src: /10.255.yy.yy:49656 dst: /10.255.xx.xx:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
at java.lang.Thread.run(Thread.java:745)
2017-06-27 12:08:00,180 INFO datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-1320493910-10.255.zz.zz-1479412973603:blk_1100238956_26515824 src: /10.254.yy.yy:45293 dest: /10.255.xx.xx:50010
2017-06-27 12:08:00,326 INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1432)) - src: /10.254.yy.yy:45293, dest: /10.255.xx.xx:50010, bytes: 26872748, op: HDFS_WRITE, cliID: DFSClient_attempt_1498498030455_0521_r_000001_0_-908535141_1, offset: 0, srvID: f148bbe2-8f2a-489b-b03d-c8322aecd43e, blockid: BP-1320493910-10.255.zz.zz-1479412973603:blk_1100238956_26515824, duration: 122445075
2017-06-27 12:08:00,326 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1320493910-10.255.12.202-1479412973603:blk_1100238956_26515824, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
Thanks in advance.
Labels:
- Apache Ambari
- Apache Hadoop
06-16-2017
10:58 AM
I have a multi-tenanted HDP 2.3 cluster. It has been configured with an S3 end-point in the custom hdfs-site.xml. Is it possible to add another S3 end-point for another tenant? If so, what should the property name be?
Thanks in Advance.
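For illustration only: newer Hadoop releases (2.8 and later, so possibly not the Hadoop shipped with HDP 2.3) allow per-bucket s3a settings, which is one way two tenants could point at different endpoints. A sketch with made-up bucket names tenant1-data and tenant2-data:
<!-- core-site.xml sketch; per-bucket s3a settings require Hadoop 2.8+, and the bucket names and endpoints here are hypothetical -->
<property>
  <name>fs.s3a.bucket.tenant1-data.endpoint</name>
  <value>s3-tenant1.example.net</value>
</property>
<property>
  <name>fs.s3a.bucket.tenant2-data.endpoint</name>
  <value>s3-tenant2.example.net</value>
</property>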
Labels:
- Hortonworks Data Platform (HDP)
05-18-2017
05:48 AM
@Mike Riggs Thanks for the response. I'm looking for a backup and recovery option for Hive tables that doesn't require much scripting work. There is an option to mirror HDFS data to S3 from the Falcon Web UI; is something similar available for Hive tables?
05-16-2017
09:58 AM
I have an HDP 2.3 cluster. I need to set up backup and restore of Hive tables in S3. Could you please suggest the best way to do this? Can the Falcon Web UI be used? Can I schedule the replication activity from Falcon?
Labels:
- Apache Falcon
- Apache Hive
06-15-2016
07:08 AM
1 Kudo
I have an HDP 2.0 cluster where I'm executing a MapReduce program which takes a Hive (0.14) table as input. There are a large number of small files in the Hive table, and hence a large number of mapper containers are being requested. Please let me know if there is a way to combine small files before they are input to the MapReduce job.
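Not a definitive answer, just a sketch of one common direction: if the small files are being written by Hive itself, Hive can be asked to merge its output files at write time, which shrinks the file count any downstream MapReduce job has to read. The property names below are standard Hive settings; the threshold value is a placeholder:
<!-- hive-site.xml sketch: merge small output files when Hive writes the table (average-size threshold is a placeholder) -->
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
</property>
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
</property>
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>134217728</value>
</property>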
Labels:
- Apache Hive
- Apache YARN
04-14-2016
10:17 AM
Thanks for the suggestions. Two of the data nodes in the cluster had to be replaced, as they didn't have enough disk space. I have also set the below in the HDFS configuration, and the jobs started executing fine, even though I have noticed a "Premature EOF" error in the data node logs.
dfs.client.block.write.replace-datanode-on-failure.policy=ALWAYS
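Written out as hdfs-site.xml properties, that client-side setting (together with the companion enable switch it is documented alongside) would look roughly like this; a sketch, not the exact configuration used:
<!-- hdfs-site.xml sketch of the replace-datanode-on-failure client settings; ALWAYS is the value mentioned above -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>ALWAYS</value>
</property>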
04-12-2016
12:48 PM
I'm trying to execute a MapReduce streaming job in a 10-node Hadoop cluster (HDP 2.2). There are 5 datanodes in the cluster. When the reduce phase reaches almost 100% completion, I'm getting the below error in the client logs:
Error: java.io.IOException: Failed to replace a bad
datanode on the existing pipeline due to no more good datanodes being available
to try. (Nodes: current=[x.x.x.x:50010], original=[x.x.x.x:50010]).
The current failed datanode replacement policy is DEFAULT, and a client may
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
in its configuration. The data node on which the jobs were executing contained the below logs:
INFO datanode.DataNode (BlockReceiver.java:run(1222)) - PacketResponder:
BP-203711345-10.254.65.246-1444744156994:blk_1077645089_3914844,
type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
2016-04-10 08:12:14,477 WARN datanode.DataNode
(BlockReceiver.java:run(1256)) - IOException in BlockReceiver.run():
java.io.IOException: Connection reset by peer
2016-04-10 08:13:22,431 INFO datanode.DataNode
(BlockReceiver.java:receiveBlock(816)) - Exception for
BP-203711345-x.x.x.x -1444744156994:blk_1077645082_3914836
java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/XX.XXX.XX.XX:50010 remote=/XX.XXX.XX.XXX:57649]
The NameNode logs contained the below warning:
WARN blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(383)) - Failed to place enough
replicas, still in need of 1 to reach 2 (unavailableStorages=[DISK],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
I had tried setting the below parameters in hdfs-site.xml:
dfs.datanode.handler.count=10
dfs.client.file-block-storage-locations.num-threads = 10
dfs.datanode.socket.write.timeout=20000
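For readability, the same three parameters expressed as hdfs-site.xml properties (same values as listed above):
<!-- hdfs-site.xml: the settings already tried above, written out as properties -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.num-threads</name>
  <value>10</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>20000</value>
</property>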
But still the error persists. Kindly suggest a solution. Thanks
Labels:
- Apache Hadoop
03-22-2016
06:00 AM
I have upgraded to Hadoop 2.7 now. I have made the configuration changes for s3a, and the queries are executing successfully. Thank you.
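For anyone hitting the same issue, a minimal sketch of the kind of s3a settings involved, assuming the custom endpoint and credentials go into core-site.xml (standard s3a property names; the key values are placeholders and the endpoint is the one discussed in this thread):
<!-- core-site.xml sketch for s3a against a custom endpoint (values are placeholders) -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-customlocation.net</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>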
02-26-2016
06:49 AM
1 Kudo
Though I have not yet upgraded to Hadoop 2.7, I made the configuration changes for s3a as per the documentation. On executing a Hive create query, I got the below exception:
FAILED: AmazonClientException Unable to execute HTTP request: Connect to hive-bucket.s3.amazonaws.com:443 timed out
02-22-2016
10:09 AM
1 Kudo
@Artem Ervits Copied jets3t.properties to all data nodes. Currently I'm getting the below exception:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error>
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)
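For context, a 403 AccessDenied from s3n usually comes down to credentials; a minimal sketch of where they are typically set for the s3n connector, assuming core-site.xml (standard s3n property names; the values are placeholders):
<!-- core-site.xml sketch: credentials for the s3n connector (values are placeholders) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>MY_SECRET_KEY</value>
</property>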
02-17-2016
04:34 PM
1 Kudo
I'm using Hadoop 2.6.
02-17-2016
01:08 PM
1 Kudo
Thanks for the response. Yes, I'm able to access S3 through simple Hive queries. From the logs, I could see that the map-reduce job is trying to connect to "hive-bucket.s3.amazonaws.com:443", which doesn't exist. I need to connect to a custom S3 endpoint, which is "s3-customlocation.net". I have gone through the hdfs-site configuration, but I couldn't find any parameter to set a custom endpoint.
02-17-2016
12:41 PM
1 Kudo
I'm using a custom S3 for Eucalyptus, not the AWS one. I have been trying to resolve this for the past few weeks.
02-17-2016
12:35 PM
1 Kudo
I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I have created an external table in Hive (0.14) using the below query:
CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';
I'm using a custom S3 location, so I have set the jets3t properties in the Hive configuration directory as below:
set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;
Though I'm able to execute simple select queries on the table successfully, the aggregate queries are failing. Below are the logs:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
From the logs, the map-reduce job seems to access Amazon S3. I have tried using the set command for Hive (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom end-point?
Labels:
- Apache Hive
02-11-2016
12:26 PM
1 Kudo
Noted. Thank you.
02-11-2016
11:53 AM
1 Kudo
Setting no_proxy for the FQDN in /etc/profile solved the issue. Thanks.
02-11-2016
11:24 AM
1 Kudo
The JDK version is 1.6.0
02-11-2016
11:11 AM
1 Kudo
Yes, it says connected.
02-11-2016
11:03 AM
1 Kudo
Thanks for the response. I have set no_proxy for local addresses on the host machine using export no_proxy from the command line. Below is the command that fails while starting the service:
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://fqdn:50070/webhdfs/v1/app-logs?op=GETFILESTATUS&user.name=hdfs'' returned status_code=502.
02-11-2016
06:20 AM
2 Kudos
I'm trying to set up a single-node Hadoop cluster (HDP 2.3) using Ambari 2.2 in a Eucalyptus environment. I used the private IP address, as recommended, while registering the VMs (RHEL 6.5) with Ambari. When I try to bring up the Hadoop services (MR2, Hive, YARN), I'm getting the below error:
Error Code 11001: Host not found
Background: This error indicates that the gateway could not find the IP address of the website you are trying to access. This is usually due to a DNS-related error.
Date: 2/9/2016 5:57:11 AM [GMT]
Server: FIESPRX004.xxx.net
Source: DNS error
Labels:
- Hortonworks Data Platform (HDP)