Member since: 04-11-2018 | Posts: 55 | Kudos Received: 1 | Solutions: 0
08-22-2018
09:12 AM
I did the setup on a CentOS 7 host and get this error when I try to run a command. I am using an AWS Elastic IP for the single-node cluster, so the public IP is in the hosts file on the edge node.

[root@edgenode ~]# sudo -u hdfs hadoop fs -ls /ds-datalake
-ls: java.net.UnknownHostException: ip-172-31-26-58.ec2.internal
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]

core-site.xml has this set (private IP):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ip-172-31-26-58.ec2.internal:8020</value>
  </property>

@bgooley
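For reference, here is what I am planning to check from the edge node (a sketch only, assuming the root cause is that the internal EC2 name simply does not resolve there; ELASTIC_IP is a placeholder for my actual Elastic IP):

# Does the edge node resolve the internal EC2 hostname at all?
getent hosts ip-172-31-26-58.ec2.internal

# If that returns nothing, one workaround is to map the internal name
# to the Elastic IP in /etc/hosts on the edge node (ELASTIC_IP is a placeholder):
echo "ELASTIC_IP ip-172-31-26-58.ec2.internal" | sudo tee -a /etc/hosts

# Then retry the listing:
sudo -u hdfs hadoop fs -ls /ds-datalake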
08-22-2018
08:29 AM
A question on this: does the edge node need to be configured for passwordless SSH to the NameNode?
08-22-2018
07:03 AM
I found this, which seems somewhat relevant: https://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/ But my problem is that I am trying to connect from a remote non-Hadoop edge node machine, so there are no Hadoop config files on it.
08-22-2018
06:41 AM
Another thing came to mind. I am using an Elastic IP for the public IP address, which is what I put in the code. It does resolve to the private IP, as I can see in the error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-172-31-26-58.ec2.internal', port=50075): Max retries exceeded with url: /webhdfs/v1/tmp/sample.txt?op=OPEN&user.name=hdfs&namenoderpcaddress=ip-172-31-26-58.ec2.internal:8020&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000007693828>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))

The security group is also configured to allow these ports from my work IP address range.
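For reference, the redirect behaviour can be seen directly with curl (a sketch; <NAMENODE_PUBLIC_IP> is a placeholder for the Elastic IP). WebHDFS serves a read in two steps: the NameNode answers the OPEN request with a redirect to a DataNode address, and that address is whatever hostname the DataNode registered with, here the internal EC2 name my client cannot resolve:

# Ask the NameNode for the file without following the redirect;
# the Location header shows the DataNode host:port the client is told to use.
curl -i "http://<NAMENODE_PUBLIC_IP>:50070/webhdfs/v1/tmp/sample.txt?op=OPEN&user.name=hdfs"

# Expect something like:
#   HTTP/1.1 307 TEMPORARY_REDIRECT
#   Location: http://ip-172-31-26-58.ec2.internal:50075/webhdfs/v1/tmp/sample.txt?op=OPEN&...
# which is the name the remote client then fails to resolve (getaddrinfo failed).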
08-22-2018
06:31 AM
It is a single-node cluster, so the NN is also the DN. Also, why is it able to list the directory contents but cannot seem to read/write from it?
08-21-2018
08:46 AM
CDH 5.15 single-node cluster installed using CM on CentOS 7.x on an AWS EC2 instance (8 CPU, 64 GB RAM).
I verified WebHDFS is running, and I am connecting from a remote machine (non-Hadoop client) after connecting to the environment using an SSH key.
I am using the PyWebHdfsClient library to list, read and write files on HDFS.
The following code works -
from pywebhdfs.webhdfs import PyWebHdfsClient
from pprint import pprint

hdfs = PyWebHdfsClient(host='IP_ADDR', port='50070', user_name='hdfs', timeout=1)  # your Namenode IP & username here
my_dir = 'ds-datalake/misc'
pprint(hdfs.list_dir(my_dir))
{u'FileStatuses': {u'FileStatus': [{u'accessTime': 1534856157369L,
                                    u'blockSize': 134217728,
                                    u'childrenNum': 0,
                                    u'fileId': 25173,
                                    u'group': u'supergroup',
                                    u'length': 28,
                                    u'modificationTime': 1534856157544L,
                                    u'owner': u'centos',
                                    u'pathSuffix': u'sample.txt',
                                    u'permission': u'644',
                                    u'replication': 3,
                                    u'storagePolicy': 0,
                                    u'type': u'FILE'}]}}
But, when I try to read/write at same location, using something like this:
my_file = 'ds-datalake/misc/sample.txt'
print(hdfs.read_file(my_file))
I get the following error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='HOST_NAME', port=50075): Max retries exceeded with url: /webhdfs/v1/ds-datalake/misc/sample.txt?op=OPEN&user.name=hdfs&namenoderpcaddress=HOST_NAME:8020&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000068F4828>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
This is what the HDFS folder looks like:
hadoop fs -ls /ds-datalake/misc
Found 1 items
-rwxrwxrwx   3 centos supergroup         28 2018-08-21 12:55 /ds-datalake/misc/sample.txt
Can you please help me? I have two single-node test clusters and this happens on both. The HDFS NameNode UI comes up fine from the CM site, and all services look healthy.
Thanks.
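One check I can run to narrow this down (a sketch only): do the same read on the cluster node itself, where the internal hostname resolves. If that works, the file itself is fine and the failure is purely the remote client resolving the DataNode address the NameNode hands back.

# On the cluster node (where ip-172-31-26-58.ec2.internal resolves):
sudo -u hdfs hdfs dfs -cat /ds-datalake/misc/sample.txt

# The same read over WebHDFS, following the DataNode redirect (-L):
curl -L "http://localhost:50070/webhdfs/v1/ds-datalake/misc/sample.txt?op=OPEN&user.name=hdfs"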
Labels: Cloudera Manager, HDFS
05-07-2018
08:20 AM
I have been able to access Azure Data Lake Storage (ADLS) from my CDH 5.14 cluster. However, I wanted to know if there is any way to point the Kudu master and tablet server WAL and data directories directly at ADLS. This does not seem to be possible through the CM configuration page, as entering something like "adls://......" in those fields results in an error. If it is not possible, is there a way to back up the Kudu data and WAL dirs to ADLS or some other storage and restore them if the cluster is rebuilt? Thanks.
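In case it is useful: as far as I know, Kudu expects local filesystem paths for its WAL and data directories, so the fallback I am considering is exporting the Kudu tables to Parquet through Impala and copying that to ADLS. A rough sketch, assuming ADLS access is already wired into the cluster as above; backup_db, kudu_db.my_table, the warehouse path, and the adl:// account are placeholders:

# Export a Kudu table to a Parquet copy via Impala:
impala-shell -q "CREATE TABLE backup_db.my_table_parquet STORED AS PARQUET AS SELECT * FROM kudu_db.my_table"

# Copy the Parquet files from the HDFS warehouse (adjust the path) to ADLS:
hadoop distcp /user/hive/warehouse/backup_db.db/my_table_parquet \
    adl://myaccount.azuredatalakestore.net/kudu-backups/my_table_parquet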
Labels: Apache Kudu
05-01-2018
06:34 AM
sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
  --username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table dim_PatientEncounter --hive-table kudu_40m.dim_PatientEncounter --create-hive-table \
  --hive-import --warehouse-dir /user/hive/warehouse/testdb -m 4
The table has 30 million+ rows.
Running the above command always seems to hang at mapper 3 of 4. There is not much in the logs besides:
2018-05-01 08:22:53,364 INFO [IPC Server handler 7 on 45249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1525122694968_0001_m_000003_0 is : 0.0
Changes I made:
mapreduce.map.memory.mb = 5 GB
mapreduce.reduce.memory.mb = 5 GB
yarn.nodemanager.resource.memory-mb = 20 GB (NodeManager Default Group)
yarn.nodemanager.resource.memory-mb = 8 GB (NodeManager Group 1)
Cluster Metrics:
  Apps Submitted: 1
  Apps Pending: 0
  Apps Running: 1
  Apps Completed: 0
  Containers Running: 3
  Memory Used: 11 GB
  Memory Total: 68 GB
  Memory Reserved: 0 B
  VCores Used: 3
  VCores Total: 32
  VCores Reserved: 0
Cluster Nodes Metrics:
  Active Nodes: 4
  Decommissioning Nodes: 0
  Decommissioned Nodes: 0
  Lost Nodes: 0
  Unhealthy Nodes: 0
  Rebooted Nodes: 0
User Metrics for dr.who:
  Apps Submitted: 0
  Apps Pending: 0
  Apps Running: 0
  Apps Completed: 0
  Containers Running: 0
  Containers Pending: 0
  Containers Reserved: 0
  Memory Used: 0 B
  Memory Pending: 0 B
  Memory Reserved: 0 B
  VCores Used: 0
  VCores Pending: 0
  VCores Reserved: 0
Running application:
  ID: application_1525122694968_0001
  User: root
  Name: dim_PatientEncounter.jar
  Application Type: MAPREDUCE
  Queue: root.users.root
  StartTime: Tue May 1 08:11:49 -0500 2018
  FinishTime: N/A
  State: RUNNING
  FinalStatus: UNDEFINED
  Running Containers: 3
  Allocated CPU VCores: 3
  Allocated Memory MB: 11264
  Reserved CPU VCores: 0
  Reserved Memory MB: 0
  Tracking UI: ApplicationMaster
Any help is appreciated. Also, this is CDH 5.14, so Sqoop 1 is being used by default, I believe. Where are the Sqoop logs written for this? I cannot seem to find any logs besides the YARN logs.
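For what it is worth, a couple of things I plan to try next (a sketch only; SPLIT_COLUMN is a placeholder for a reasonably uniform numeric column on the table): pull the per-mapper container logs for the application listed above, and give Sqoop an explicit split column in case the default primary-key splits are badly skewed and one mapper ends up with most of the 30M+ rows.

# Per-container logs for the stuck job (Sqoop 1 map tasks log through YARN containers):
yarn logs -applicationId application_1525122694968_0001

# Retry with an explicit split column (SPLIT_COLUMN is a placeholder):
sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
  --username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table dim_PatientEncounter --split-by SPLIT_COLUMN \
  --hive-table kudu_40m.dim_PatientEncounter --create-hive-table --hive-import \
  --warehouse-dir /user/hive/warehouse/testdb -m 4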
Labels: Apache Hive, Apache Sqoop, Apache YARN, MapReduce
04-12-2018
11:59 AM
Can you please direct me to some docs that describe how to set encodings and how to declare column-level compression? I am not able to find this. Thanks.
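In the meantime, this is the kind of thing I am after, sketched as an Impala DDL for a Kudu table (table and column names are made up; I am assuming the per-column ENCODING/COMPRESSION attributes in Impala's Kudu support are the right mechanism here):

impala-shell -q "
CREATE TABLE example_kudu_table (
  id BIGINT,
  name STRING ENCODING DICT_ENCODING COMPRESSION LZ4,
  payload STRING COMPRESSION ZLIB,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU"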
04-12-2018
07:27 AM
Looking for documentation. I am using Impala on Kudu at the moment. I found https://kudu.apache.org/docs/schema_design.html , which says Kudu allows compression at the column level. What about at the table level? Thanks.
Labels: Apache Impala, Apache Kudu