Member since: 05-03-2016
Posts: 23
Kudos Received: 2
Solutions: 4

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 766 | 04-08-2019 01:04 PM |
| | 445 | 02-21-2019 02:37 PM |
| | 812 | 11-22-2017 11:35 AM |
| | 650 | 06-03-2016 06:49 AM |
04-08-2019
01:04 PM
Got the hint: the logs point exactly to the issue. "ls -ld /" shows that "/" has 777 permissions. I removed the write permission for group and other users, and my issue is solved. All this while I had only checked the permissions of the directories under "/", but the problem was with "/" itself.
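For anyone hitting the same error, the check and fix boil down to the commands below (a minimal sketch of the steps described above):

```bash
# The root directory itself was world-writable (0777)
ls -ld /
# Remove write permission for group and others (the error message itself suggests 'chmod o-w /')
chmod go-w /
# Verify: "/" should now show drwxr-xr-x (0755)
ls -ld /
```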
04-08-2019
12:35 PM
After a successful fresh installation of HDP 3.1.0 on two Ubuntu 18.04 instances, we were able to get all the services up and running. But after a night off, the DataNode doesn't start. Following is the error:
ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
java.io.IOException: The path component: '/' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0777 uid 0 and gid 0. It is not protected because it is world-writable. This might help: 'chmod o-w /'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:193)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:1194)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:1161)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1416)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2900)
2019-04-08 17:36:52,452 INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: java.io.IOException: The path component: '/' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0777 uid 0 and gid 0. It is not protected because it is world-writable. This might help: 'chmod o-w /'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
2019-04-08 17:36:52,456 INFO datanode.DataNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG
I did check the socket file and its parent directories, and I don't see 777 permissions at any level, but the error still appears while starting the DataNode. I couldn't find any solution, so I am posting here for help. I have also uploaded the complete DataNode logs: hadoop-hdfs-datanode.txt
Regards, Vinay MP
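For reference, the DataNode validates every component of the socket path, so the permissions can be checked at each level, including "/" itself (a quick sketch using the path from the error above):

```bash
# Check every path component of /var/lib/hadoop-hdfs/dn_socket, starting at "/"
ls -ld / /var /var/lib /var/lib/hadoop-hdfs
# The error above complains about "/" being 0777, so pay particular attention to the first entry
```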
02-21-2019
02:37 PM
I didn't find any configuration for this on the Kylin side. Tomcat is bundled inside Kylin, so changing the port in Tomcat's server.xml helped.
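Concretely, the change amounts to editing the HTTP Connector port in the Tomcat bundled with Kylin (the server.xml path and the new port below are assumptions for illustration):

```bash
# Kylin ships its own Tomcat; change the Connector port in its server.xml
# (7070 is the default Kylin port; 17070 is just an example replacement)
sed -i 's/port="7070"/port="17070"/' "$KYLIN_HOME/tomcat/conf/server.xml"
# Restart Kylin so the new port takes effect
"$KYLIN_HOME/bin/kylin.sh" stop
"$KYLIN_HOME/bin/kylin.sh" start
```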
02-20-2019
06:10 PM
How can I change the Apache Kylin port? The default listen port is 7070, and I have salt-bootstrap running on that port on the Azure VMs. I went through kylin.properties and didn't find a relevant property for the listen port.
- Tags:
- kylin
07-19-2018
11:30 AM
I faced the same problem. I had created a cluster template with Cloudbreak 2.4.0. When I used the same template with Cloudbreak 2.7.0, cluster creation failed with a "Failed to retrieve server certificate" error. After reading through this thread, I compared the image IDs used in the 2.4.0 and 2.7.0 templates and found them to be different. So if there is a problem with the image used to create the instances, it can lead to this error.
11-22-2017
11:35 AM
Hey all, after a series of tests we decided to move to CentOS 7.4 and upgrade to HDP 2.6.3.0. With CentOS 7.4 and Ambari version 2.6.0.0 I don't see this issue, even though I have Python 2.7.5. With reference to my previous comment, it looks to be an Ambari issue.
10-09-2017
07:28 AM
@Akhil S Naik @Jay SenSharma It was a firewall issue; now the Ambari server is responding properly. I did go through Jay's article. Thanks for sharing, it will help in the future. Regards, Vinay MP
10-08-2017
11:22 AM
Ambari server performance is way too slow. I freshly installed the Ambari server on CentOS 7.3 with Oracle JDK 1.8. I know CentOS is only supported up to 7.2, but on the same configuration Ambari 2.2.2 works absolutely fine. I tested with Chrome, IE and Firefox; performance is bad in all of them. It takes nearly 2 minutes to log in. P.S.: This is a fresh installation and I am trying to launch the Install Wizard. Navigation to every page takes nearly 2-3 minutes, and I am not able to proceed past the "Get Started" tab. Are there any known issues? Regards, Vinay MP
09-28-2017
06:09 AM
@Jay SenSharma I haven't found a feasible solution. As mentioned in the issue description, downgrading to Python 2.6 is not feasible because of OS dependencies, and based on the link below I got the suggestion that it is not a good idea to disable certificate verification in Python: https://stackoverflow.com/questions/46274499/ambari-agent-certificate-verify-failed-is-it-safe-to-disable-the-certificate

Sharing some more information from our investigation, in case it helps others. We use AWS EC2. With Python 2.7, JDK 1.8 and CentOS 7.2 there is no issue; everything is smooth. With Python 2.7, JDK 1.8 and CentOS 7.3 or CentOS 7.4 we are seeing this issue. What I have reported here is with respect to CentOS 7.3; with CentOS 7.4 the issue is slightly different: certificate verification fails while adding nodes to the cluster itself.

Downgrading from CentOS 7.3 to 7.2 is not straightforward. The AWS EC2 marketplace provides a CentOS 7.0 image, and when we create an instance from it, the security and patch updates bring it to CentOS 7.3. We could create our own CentOS 7.3 image from existing servers, but it is always good to be on the latest OS update for security reasons.

To finish it shortly, we have workarounds but not a solution yet 🙂 Thanks for your help. I will update with the solution we follow. Regards, Vinay MP
09-18-2017
08:01 AM
Ambari version: 2.2.2.18. HDP stack: 2.4.3. OS: CentOS 7.3. Issue description: the Ambari server can't communicate with the Ambari agent. I can see the error below in the ambari-agent logs: ERROR 2017-09-18 06:35:34,684 NetUtil.py:84 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579) ERROR 2017-09-18 06:35:34,684 NetUtil.py:85 - SSLError: Failed to connect. Please check openssl library versions. I am facing this issue recently, and it can be reproduced consistently after the instances are restarted (I am using EC2 instances). I am able to register agent nodes successfully, install the HDP cluster, run YARN jobs and so on with no problem at all. Once I restart my instances, I see this problem. There are some solutions already posted for this problem, like:
1. Downgrade Python from 2.7 to a lower version; this is a known problem of Ambari with Python 2.7.
2. Control certificate verification by disabling it: set "verify = disable" under /etc/python/cert-verification.cfg (shown below).
I don't want to play with Python, since downgrading it can disrupt many things such as Cassandra, the yum package manager, etc. The second workaround is very easy and works well! Now comes my question: is it safe to disable certificate verification in Python, i.e. by setting the property verify = disable? Regards, Vinay MP
06-06-2017
12:40 PM
1 Kudo
Hi all, please share any links to blogs or documents to follow for using HDP after setting up Kerberos. I am mainly looking for guidelines on: using HDFS to upload files, running Spark jobs, adding new services, etc. Regards, Vinay MP
06-06-2017
11:39 AM
I did face the same problem on an HDP 2.3 cluster with OpenJDK 1.6. I tried the above solution but it didn't work for me. I then decided to try HDP 2.4 with OpenJDK 1.7, and now the Kerberos setup is successful.
02-24-2017
12:39 PM
Here is the scenario: I have data in my Hive DB in 2 tables, and I want to connect Tableau to these 2 tables to build my reports. We have a business requirement to truncate the tables quite often and fetch new reports with new data (the key data in a few columns remains the same, but other columns keep changing and we want to visualize them); that requirement can't be changed. We have a Hortonworks cluster. We used the Hive ODBC driver to connect to the tables and it all works fine except for the performance. When we used the Spark ODBC driver and connected through the Spark Thrift Server, performance was far better than with Hive ODBC. But this has a problem: whenever we truncate and load new data into the tables, Tableau fails with the errors below: [Microsoft][SparkODBC] (35) Error from server: error code: '0' error message: 'java.lang.IllegalArgumentException: orcFileOperator: path hdfs://server:8020/HIVE/my.db/mytable/yearmonth=201702/daytimestamp=02070200 does not have valid orc files matching the pattern'.
The table "[my].[mytable]" does not exist
Hive data is stored under the /HIVE directory in HDFS with yearmonth and daytimestamp partitions. We tried the workarounds below to truncate the tables, but they don't help: 1. Created a dummy record in the table with the key "DELETEID" and executed the query: insert overwrite table mytable PARTITION(yearmonth, daytimestamp) select * from mytable where myid = "DELETEID"; This erases records with the same timestamp as the "DELETEID" row and has no effect beyond that. 2. Went ahead and removed the files in HDFS: "hdfs dfs -rm -R -skipTrash /HIVE/my.db/mytable/*". After uploading the data again and refreshing the reports, Tableau still refers to one of the old HDFS paths of the table data and doesn't work. The interesting thing is that with the Hive CLI I can see the table and query the data, and the same through the Hive view in Ambari or with Hive ODBC in Tableau, but it fails consistently with the above error for the Tableau --> Spark ODBC --> Spark Thrift --> Hive connection. I'm quite sure that if we remove the partitions it would work, but as the data grows, partitioning becomes necessary. Has anyone faced similar problems with Spark ODBC? Please share suggestions. Regards, Vinay MP
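One direction that might be worth trying (an assumption on my part, not something confirmed in this thread) is to drop the stale partitions through Hive instead of deleting the HDFS files directly, and then refresh the Spark Thrift Server's cached metadata before reconnecting from Tableau:

```bash
# Drop the partition through Hive so the metastore and HDFS stay in sync
# (the partition value is taken from the error message above, purely as an example)
hive -e "ALTER TABLE my.mytable DROP IF EXISTS PARTITION (yearmonth='201702');"
# Ask Spark to invalidate its cached file listing for the table
# (the Spark Thrift Server JDBC URL/port is an assumption; adjust to your endpoint)
beeline -u "jdbc:hive2://sparkthrift-host:10015" -e "REFRESH TABLE my.mytable;"
```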
01-25-2017
06:19 AM
Hey @rguruvannagari, sorry for replying so late and taking so long on this. Since the Thrift server wasn't required for our project, we decided to stop it in the cluster. And thank you for the suggestion. I now got some free time and verified it: YARN was keeping the application in the ACCEPTED state as long as memory wasn't available. Once memory became available, I could see the Hive prompt as the application moved to the RUNNING state.
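For anyone checking the same behaviour, the waiting application is easy to spot from the YARN CLI (a quick sketch):

```bash
# Applications stuck waiting for resources show up in the ACCEPTED state
yarn application -list -appStates ACCEPTED
# Once enough memory is freed (e.g. after stopping the Spark Thrift Server), they move to RUNNING
yarn application -list -appStates RUNNING
```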
01-10-2017
09:38 AM
Thanks. I had created a few folders under /usr/hdp and faced the same issue. It's good practice not to create any files or folders under /usr/hdp, as the script doesn't like it. It's easier to move or create the folders somewhere else than to modify the script, if required. And that solves my issue!
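For reference, spotting and relocating stray folders under /usr/hdp is straightforward (the folder name and destination below are just examples):

```bash
# /usr/hdp is expected to contain only the versioned HDP directories and the 'current' symlink
ls -l /usr/hdp
# Move any custom folder somewhere else (example names)
mv /usr/hdp/myfolder /opt/myfolder
```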
01-04-2017
12:50 PM
Hey all, I faced this issue on an HDP 2.4 cluster running on CentOS. When I run the 'hive' command, it always hangs as below: I tried adding the proxyuser properties for hosts and groups and finally found that they are not actually causing this. When I stop the Spark Thrift Server, I immediately get the Hive prompt and can work with the Hive CLI. Has anybody faced similar problems? Regards, Vinay MP
12-13-2016
08:53 AM
1 Kudo
I am trying to create a 3-node HDP cluster with private IPs only, say node1, node2 and node3. I have an additional edge node where I have created a local repository. Without creating the instance with a public IP or assigning an elastic IP, I am not able to install ambari-server on node1: it always dies saying it couldn't connect to the CDS load balancer, even though I have the local repository. A snippet of the "yum repolist" output is below:
# yum repolist
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
Could not contact CDS load balancer rhui2-cds01.us-west-2.aws.ce.redhat.com, trying others.
Could not contact any CDS load balancers: rhui2-cds01.us-west-2.aws.ce.redhat.com, rhui2-cds02.us-west-2.aws.ce.redhat.com.
ambari.repo:
#VERSION_NUMBER=2.4.2.0-136
[Updates-ambari-2.4.2.0]
name=ambari-2.4.2.0 - Updates
baseurl=http://<node1 private IP>/hdp/AMBARI-2.4.2.0/centos7/2.4.2.0-136
gpgcheck=1
gpgkey=http://<node1 private IP>/hdp/AMBARI-2.4.2.0/centos7/2.4.2.0-136/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
Has anyone faced similar problems? Any solutions? Regards, Vinay MP
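One possible direction (an assumption on my part, not confirmed in this thread): since the nodes have no Internet access, the Red Hat RHUI repos can be disabled so that yum resolves only against the local repositories:

```bash
# Disable the RHUI repos that try to reach the CDS load balancers
# (repo IDs vary; check them first with 'yum repolist all')
yum-config-manager --disable 'rhui-*'
# Confirm that only the local repos (e.g. Updates-ambari-2.4.2.0) remain enabled
yum repolist enabled
```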
11-24-2016
07:23 AM
Hey @Kuldeep Kulkarni I haven't installed Ranger.
11-23-2016
01:02 PM
HDP version: 2.5.2. Ambari: 2.4. OS: RHEL 7
11-23-2016
12:59 PM
Below is the error. It happens every time I try to start all the services through "Actions", but the services start up if I start them by navigating to HOSTS. The main problem is when I want to add a new service (like Kafka or Oozie): I get the same error and haven't found a solution yet. The services were installed without any issues during cluster creation.
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 37, in <module>
BeforeInstallHook().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 33, in hook
install_repos()
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 66, in install_repos
_alter_repo("create", params.repo_info, template)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 33, in _alter_repo
repo_dicts = json.loads(repo_string)
File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
06-03-2016
06:49 AM
Finally I managed to get a new 16 GB machine where I can run the VM with good performance. As an initial practice setup I was using an 8 GB machine. I used the same VM: the command went through fine on the 16 GB machine and failed on the 8 GB machine. I am not exactly sure whether insufficient memory was the cause (I didn't see any OOM or related exceptions on the 8 GB machine), but I am glad the problem is solved. @Ian Roberts, @Predrag Minovic thanks for taking the time to reply. Regards, Vinay MP
05-04-2016
07:26 AM
Hi @Ian Roberts, @Predrag Minovic, thanks for the suggestions. I will try them and update. For now I checked netstat and could see that the ResourceManager was up and listening on 8030, 8050 and a few more ports. All of a sudden I am not able to open a terminal session to node1 (one of the hosts in my VM). I will fix that and verify the MapReduce example. Regards, Vinay MP
05-03-2016
12:52 PM
Hello, I am running the command below from the MapReduce examples for pi; it is failing and I can see a socket timeout exception in the logs. I have not been able to find a solution anywhere so far, and would be glad if someone can help. Command: yarn jar hadoop-mapreduce-examples.jar pi 5 10 (from the directory /usr/hdp/2.3.0.0-2557/hadoop-mapreduce). Below is the log trace:
2016-04-20 06:12:48,333 WARN [RMCommunicator Allocator] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]
2016-04-20 06:12:51,884 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM.
java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]; Host Details : local host is: "node1/172.17.0.2"; destination host is: "node1":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773
I can see the property in the Advanced yarn-site configuration: yarn.resourcemanager.scheduler.address = node1:8030. Hosts file entry:
[root@node1 ~]# cat /etc/hosts
172.17.0.2 node1
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
[root@node1 ~]#
I am not sure what the problem is. I can ping localhost, node1 and 127.0.0.1 from the node1 terminal. Regards, Vinay MP
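A couple of quick checks that may help narrow this down (a sketch, assuming the ResourceManager runs on node1):

```bash
# Is anything actually listening on the scheduler port?
netstat -tlnp | grep 8030
# Can node1 reach its own scheduler address? (telnet may need to be installed first)
telnet node1 8030
```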