Member since: 04-11-2018 | Posts: 55 | Kudos Received: 1 | Solutions: 0
08-22-2018
09:12 AM
I did the setup on a CentOS 7 host and get this error when I try to run a command. I am using an AWS Elastic IP for the single-node cluster, so the public IP is in the hosts file on the edge node.

[root@edgenode ~]# sudo -u hdfs hadoop fs -ls /ds-datalake
-ls: java.net.UnknownHostException: ip-172-31-26-58.ec2.internal
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]

core-site.xml has this set (private IP):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ip-172-31-26-58.ec2.internal:8020</value>
  </property>

@bgooley
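For reference, here is what I am planning to check from the edge node (a sketch only, assuming the root cause is that the internal EC2 name simply does not resolve there; ELASTIC_IP is a placeholder for my actual Elastic IP):

# Does the edge node resolve the internal EC2 hostname at all?
getent hosts ip-172-31-26-58.ec2.internal

# If that returns nothing, one workaround is to map the internal name
# to the Elastic IP in /etc/hosts on the edge node (ELASTIC_IP is a placeholder):
echo "ELASTIC_IP ip-172-31-26-58.ec2.internal" | sudo tee -a /etc/hosts

# Then retry the listing:
sudo -u hdfs hadoop fs -ls /ds-datalake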
08-22-2018
08:29 AM
A question on this: does the edge node need to be configured for passwordless SSH to the NameNode?
08-22-2018
07:03 AM
I found this, which seems somewhat relevant: https://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/ But my problem is that I am trying to connect from a remote non-Hadoop edge node machine, so there are no Hadoop config files on it.
08-22-2018
06:41 AM
Another thing came to mind. I am using an Elastic IP for the public IP address, which is what I put in the code. It does resolve to the private IP, as I can see in the error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-172-31-26-58.ec2.internal', port=50075): Max retries exceeded with url: /webhdfs/v1/tmp/sample.txt?op=OPEN&user.name=hdfs&namenoderpcaddress=ip-172-31-26-58.ec2.internal:8020&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000007693828>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))

The security group is also configured to allow these ports from my work IP address range.
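For reference, the redirect behaviour can be seen directly with curl (a sketch; <NAMENODE_PUBLIC_IP> is a placeholder for the Elastic IP). WebHDFS serves a read in two steps: the NameNode answers the OPEN request with a redirect to a DataNode address, and that address is whatever hostname the DataNode registered with, here the internal EC2 name my client cannot resolve:

# Ask the NameNode for the file without following the redirect;
# the Location header shows the DataNode host:port the client is told to use.
curl -i "http://<NAMENODE_PUBLIC_IP>:50070/webhdfs/v1/tmp/sample.txt?op=OPEN&user.name=hdfs"

# Expect something like:
#   HTTP/1.1 307 TEMPORARY_REDIRECT
#   Location: http://ip-172-31-26-58.ec2.internal:50075/webhdfs/v1/tmp/sample.txt?op=OPEN&...
# which is the name the remote client then fails to resolve (getaddrinfo failed).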
08-22-2018
06:31 AM
It is a single-node cluster, so the NN is also the DN. Also, why is it able to list the directory contents but cannot seem to read/write from it?
08-21-2018
08:46 AM
CDH 5.15 single-node cluster installed using CM on CentOS 7.x on an AWS EC2 instance (8 CPU, 64 GB RAM).
I verified WebHDFS is running, and I am connecting from a remote machine (non-Hadoop client) after connecting to the environment using an SSH key.
I am using the PyWebHdfsClient library to list, read and write files on HDFS.
The following code works -
from pywebhdfs.webhdfs import PyWebHdfsClient
from pprint import pprint

hdfs = PyWebHdfsClient(host='IP_ADDR', port='50070', user_name='hdfs', timeout=1)  # your Namenode IP & username here
my_dir = 'ds-datalake/misc'
pprint(hdfs.list_dir(my_dir))
{u'FileStatuses': {u'FileStatus': [{u'accessTime': 1534856157369L,
                                    u'blockSize': 134217728,
                                    u'childrenNum': 0,
                                    u'fileId': 25173,
                                    u'group': u'supergroup',
                                    u'length': 28,
                                    u'modificationTime': 1534856157544L,
                                    u'owner': u'centos',
                                    u'pathSuffix': u'sample.txt',
                                    u'permission': u'644',
                                    u'replication': 3,
                                    u'storagePolicy': 0,
                                    u'type': u'FILE'}]}}
But, when I try to read/write at same location, using something like this:
my_file = 'ds-datalake/misc/sample.txt'
print(hdfs.read_file(my_file))
I get the following error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='HOST_NAME', port=50075): Max retries exceeded with url: /webhdfs/v1/ds-datalake/misc/sample.txt?op=OPEN&user.name=hdfs&namenoderpcaddress=HOST_NAME:8020&offset=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000068F4828>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
This is what the HDFS folder looks like:
hadoop fs -ls /ds-datalake/misc
Found 1 items
-rwxrwxrwx   3 centos supergroup         28 2018-08-21 12:55 /ds-datalake/misc/sample.txt
Can you please help me? I have two single-node test clusters and this happens on both. The HDFS NameNode UI comes up fine from the CM site, and all services look healthy.
Thanks.
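One check I can run to narrow this down (a sketch only): do the same read on the cluster node itself, where the internal hostname resolves. If that works, the file itself is fine and the failure is purely the remote client resolving the DataNode address the NameNode hands back.

# On the cluster node (where ip-172-31-26-58.ec2.internal resolves):
sudo -u hdfs hdfs dfs -cat /ds-datalake/misc/sample.txt

# The same read over WebHDFS, following the DataNode redirect (-L):
curl -L "http://localhost:50070/webhdfs/v1/ds-datalake/misc/sample.txt?op=OPEN&user.name=hdfs"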
Labels: Cloudera Manager, HDFS
05-07-2018
08:20 AM
I have been able to access Azure Data Lake Storage (ADLS) from my CDH 5.14 cluster. However, I wanted to know if there is any way to point the Kudu master and tablet server WAL and data directories directly at ADLS. This does not seem to be possible through the CM configuration page, as entering something like "adls://......" in those fields results in an error. If it is not possible, is there a way to back up the Kudu data and WAL dirs to ADLS or some other storage and restore them if the cluster is rebuilt? Thanks.
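In case it is useful: as far as I know, Kudu expects local filesystem paths for its WAL and data directories, so the fallback I am considering is exporting the Kudu tables to Parquet through Impala and copying that to ADLS. A rough sketch, assuming ADLS access is already wired into the cluster as above; backup_db, kudu_db.my_table, the warehouse path, and the adl:// account are placeholders:

# Export a Kudu table to a Parquet copy via Impala:
impala-shell -q "CREATE TABLE backup_db.my_table_parquet STORED AS PARQUET AS SELECT * FROM kudu_db.my_table"

# Copy the Parquet files from the HDFS warehouse (adjust the path) to ADLS:
hadoop distcp /user/hive/warehouse/backup_db.db/my_table_parquet \
    adl://myaccount.azuredatalakestore.net/kudu-backups/my_table_parquet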
Labels: Apache Kudu
05-01-2018
06:34 AM
sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
  --username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table dim_PatientEncounter --hive-table kudu_40m.dim_PatientEncounter --create-hive-table \
  --hive-import --warehouse-dir /user/hive/warehouse/testdb -m 4
The table has 30 million+ rows.
Running the above command always seems to hang at mapper 3 of 4. There is not much in the logs besides:
2018-05-01 08:22:53,364 INFO [IPC Server handler 7 on 45249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1525122694968_0001_m_000003_0 is : 0.0
Changes I made:
mapreduce.map.memory.mb = 5 GB
mapreduce.reduce.memory.mb = 5 GB
yarn.nodemanager.resource.memory-mb = 20 GB (NodeManager Default Group)
yarn.nodemanager.resource.memory-mb = 8 GB (NodeManager Group 1)
Cluster Metrics:
  Apps Submitted: 1
  Apps Pending: 0
  Apps Running: 1
  Apps Completed: 0
  Containers Running: 3
  Memory Used: 11 GB
  Memory Total: 68 GB
  Memory Reserved: 0 B
  VCores Used: 3
  VCores Total: 32
  VCores Reserved: 0
Cluster Nodes Metrics:
  Active Nodes: 4
  Decommissioning Nodes: 0
  Decommissioned Nodes: 0
  Lost Nodes: 0
  Unhealthy Nodes: 0
  Rebooted Nodes: 0
User Metrics for dr.who:
  Apps Submitted: 0
  Apps Pending: 0
  Apps Running: 0
  Apps Completed: 0
  Containers Running: 0
  Containers Pending: 0
  Containers Reserved: 0
  Memory Used: 0 B
  Memory Pending: 0 B
  Memory Reserved: 0 B
  VCores Used: 0
  VCores Pending: 0
  VCores Reserved: 0
Running application:
  ID: application_1525122694968_0001
  User: root
  Name: dim_PatientEncounter.jar
  Application Type: MAPREDUCE
  Queue: root.users.root
  StartTime: Tue May 1 08:11:49 -0500 2018
  FinishTime: N/A
  State: RUNNING
  FinalStatus: UNDEFINED
  Running Containers: 3
  Allocated CPU VCores: 3
  Allocated Memory MB: 11264
  Reserved CPU VCores: 0
  Reserved Memory MB: 0
  Tracking UI: ApplicationMaster
Any help is appreciated. Also, this is CDH 5.14, so Sqoop 1 is being used by default, I believe. Where are the Sqoop logs written for this? I cannot seem to find any logs besides the YARN logs.
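For what it is worth, a couple of things I plan to try next (a sketch only; SPLIT_COLUMN is a placeholder for a reasonably uniform numeric column on the table): pull the per-mapper container logs for the application listed above, and give Sqoop an explicit split column in case the default primary-key splits are badly skewed and one mapper ends up with most of the 30M+ rows.

# Per-container logs for the stuck job (Sqoop 1 map tasks log through YARN containers):
yarn logs -applicationId application_1525122694968_0001

# Retry with an explicit split column (SPLIT_COLUMN is a placeholder):
sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
  --username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table dim_PatientEncounter --split-by SPLIT_COLUMN \
  --hive-table kudu_40m.dim_PatientEncounter --create-hive-table --hive-import \
  --warehouse-dir /user/hive/warehouse/testdb -m 4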
Labels: Apache Hive, Apache Sqoop, Apache YARN, MapReduce
04-12-2018
11:59 AM
Can you please direct me to some docs that describe how to set encodings and how to declare column-level compression? I am not able to find this. Thanks.
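In the meantime, this is the kind of thing I am after, sketched as an Impala DDL for a Kudu table (table and column names are made up; I am assuming the per-column ENCODING/COMPRESSION attributes in Impala's Kudu support are the right mechanism here):

impala-shell -q "
CREATE TABLE example_kudu_table (
  id BIGINT,
  name STRING ENCODING DICT_ENCODING COMPRESSION LZ4,
  payload STRING COMPRESSION ZLIB,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU"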
04-12-2018
07:27 AM
Looking for documentation. I am using Impala on Kudu at the moment. I found https://kudu.apache.org/docs/schema_design.html , which says Kudu allows compression at the column level. What about at the table level? Thanks.
Labels: Apache Impala, Apache Kudu