Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 609 | 06-04-2025 11:36 PM |
|  | 1177 | 03-23-2025 05:23 AM |
|  | 584 | 03-17-2025 10:18 AM |
|  | 2186 | 03-05-2025 01:34 PM |
|  | 1375 | 03-03-2025 01:09 PM |
01-11-2020
10:31 AM
- Run the virtual machine and log in: ssh -p 2122 root@localhost (user root, password hadoop).
- vi /sandbox/proxy/generate-proxy-deploy-script.sh and edit the script to add the port you want to forward. There are two arrays defined in the script; in the one called tcpPortsHDP, add a new entry in the following format, taking port 9200 as the port you need to forward: [9200]=9200 (example for using Elasticsearch in HDP Sandbox 2.6.5).
- Save the file and rerun the script. It generates a new file, proxy-deploy.sh, in the same directory (/sandbox/proxy/); run that script.
- Check that the port is now open: run docker ps. Among all the ports the proxy forwards to the VM you will see the ports you just configured. That's fine.
- Reboot the system.
- In VirtualBox: Settings -> Network -> Advanced -> Port Forwarding -> add a new port-forwarding rule using the 0.0.0.0 IP address (you may have to allow access in the firewall on the host machine; in my case, Windows). Enjoy: your address is then reachable from a browser on the host machine!
- In my case, for Elasticsearch, it was also necessary to add the following to /etc/elasticsearch/elasticsearch.yml (the network.bind_host line is the key to making it work): network.bind_host: 0.0.0.0, transport.host: localhost, node.master: true, node.data: true, transport.tcp.port: 9300
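To make the array edit concrete, here is a minimal sketch of what the tcpPortsHDP entry ends up looking like. The [8080]=8080 entry and the expansion loop are illustrative assumptions, not the sandbox script's actual contents; only the [9200]=9200 line comes from the steps above.

```shell
#!/usr/bin/env bash
# Illustrative tcpPortsHDP array after the edit described above.
declare -A tcpPortsHDP=(
  [8080]=8080   # a pre-existing entry (illustrative)
  [9200]=9200   # new entry: forward host port 9200 to guest port 9200 (Elasticsearch)
)

# The generate script expands each mapping into a docker port-forward flag
# (shape assumed here for illustration):
for p in "${!tcpPortsHDP[@]}"; do
  echo "-p ${p}:${tcpPortsHDP[$p]}"
done
```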
01-11-2020
09:37 AM
Hi Shelton, Tons of thanks to you, you saved my whole weekend; Ambari started as expected. 🙂 Thanks, Niranjan
01-10-2020
09:13 PM
Thank you @Shelton for your help, and thanks for sharing the take-over URL; a very interesting read.
01-10-2020
06:04 PM
1 Kudo
@asmarz On the edge node:

Just to validate your situation I spun up a single-node cluster, tokyo (IP 192.168.0.67), and installed an edge node, busia (IP 192.168.0.66). I will demonstrate the Spark client setup on the edge node and invoke the spark-shell. First I have to configure passwordless ssh to my edge node.

Passwordless setup:
[root@busia ~]# mkdir .ssh
[root@busia ~]# chmod 600 .ssh/
[root@busia ~]# cd .ssh
[root@busia .ssh]# ll
total 0

Networking not yet set up; the master is unreachable from the edge node:
[root@busia .ssh]# ping 198.168.0.67
PING 198.168.0.67 (198.168.0.67) 56(84) bytes of data.
From 198.168.0.67 icmp_seq=1 Destination Host Unreachable
From 198.168.0.67 icmp_seq=3 Destination Host Unreachable

On the master:
The master is a single-node HDP 3.1.0 cluster; I will deploy the clients to the edge node from here.
[root@tokyo ~]# cd .ssh/
[root@tokyo .ssh]# ll
total 16
-rw------- 1 root root  396 Jan  4  2019 authorized_keys
-rw------- 1 root root 1675 Jan  4  2019 id_rsa
-rw-r--r-- 1 root root  396 Jan  4  2019 id_rsa.pub
-rw-r--r-- 1 root root  185 Jan  4  2019 known_hosts

Networking not yet set up; the edge node is still unreachable from the master tokyo:
[root@tokyo .ssh]# ping 198.168.0.66
PING 198.168.0.66 (198.168.0.66) 56(84) bytes of data.
From 198.168.0.66 icmp_seq=1 Destination Host Unreachable
From 198.168.0.66 icmp_seq=2 Destination Host Unreachable

Copied the id_rsa.pub key to the edge node:
[root@tokyo ~]# cat .ssh/id_rsa.pub | ssh root@192.168.0.215 'cat >> .ssh/authorized_keys'
The authenticity of host '192.168.0.215 (192.168.0.215)' can't be established.
ECDSA key fingerprint is SHA256:ZhnKxkn+R3qvc+aF+Xl5S4Yp45B60mPIaPpu4f65bAM.
ECDSA key fingerprint is MD5:73:b3:5a:b4:e7:06:eb:50:6b:8a:1f:0f:d1:07:55:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.215' (ECDSA) to the list of known hosts.
root@192.168.0.215's password:

Validating that passwordless ssh works:
[root@tokyo ~]# ssh root@192.168.0.215
Last login: Fri Jan 10 22:36:01 2020 from 192.168.0.178
[root@busia ~]# hostname -f
busia.xxxxxx.xxx

Single-node cluster:
[root@tokyo ~]# useradd asmarz
[root@tokyo ~]# su - asmarz

On the master as user asmarz I can access the spark-shell and execute any Spark code. Next I added the edge node to the cluster and installed the clients on it; the installed client components on the edge node can be seen in the CLI. I chose to install all the clients on the edge node just for the demo. Having already installed the Hive client on the edge node without any special setup, I can now launch Hive HQL against the master tokyo from the edge node. After installing the Spark client on the edge node I can also launch the spark-shell from the edge node and run any Spark code, so this demonstrates that you can create any user on the edge node and they can run Hive HQL, Spark SQL or Pig scripts. You will notice I didn't update the HDFS, YARN, MapReduce or Hive configurations; Ambari did that automatically during the installation, copying the correct conf files over to the edge node. The asmarz user on the edge node can also access HDFS. Finally, as user asmarz I launched a spark-submit job from the edge node; the launch is successful on the master tokyo (see the Resource Manager URL), which can be confirmed in the RM UI. This walkthrough validates that any user on the edge node can launch a job in the cluster, which poses a security problem in production, hence my earlier hint about Kerberos.

Having said that, you will realize I didn't do any special configuration after the client installation, because Ambari distributes the correct configuration for all the components, and it does that for every installation of a new component; that's why Ambari is a management tool. If this walkthrough answers your question, please do accept the answer and close the thread. Happy hadooping
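As an aside on the permissions used in the passwordless setup above: the common convention is 700 on the .ssh directory and 600 on the files inside it (sshd's strict modes can refuse keys when the directory is more permissive). A minimal sketch in a throwaway directory:

```shell
#!/usr/bin/env bash
# Demonstrate the conventional ~/.ssh permissions in a scratch directory.
d=$(mktemp -d)
mkdir "$d/.ssh"
chmod 700 "$d/.ssh"                      # directory: owner rwx only
touch "$d/.ssh/authorized_keys"
chmod 600 "$d/.ssh/authorized_keys"      # key/auth files: owner rw only
stat -c '%a %n' "$d/.ssh" "$d/.ssh/authorized_keys"
```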
01-07-2020
11:03 PM
Hi @Shelton, I was able to achieve my objective of running multiple Spark sessions under a single Spark context using the YARN capacity scheduler and Spark fair scheduling. However, the issue remains with the YARN fair scheduler: the second Spark job is still not running (with the same memory configuration) due to lack of resources. So, what additional parameters need to be set for the YARN fair scheduler to achieve this? Please help me fix this issue. Thanks and regards, Sudhindra
01-07-2020
01:41 AM
Hi, We understand that logs are not getting deleted even though you have enabled the spark.history.fs properties. Did you find any errors regarding this in the SHS (Spark History Server) logs? Thanks, AKR
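For reference, log cleanup in the Spark History Server is controlled by the cleaner properties in spark-defaults.conf; the names below are the standard Spark ones, and the interval/age values are only illustrative defaults to adjust:

```properties
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.interval  1d
spark.history.fs.cleaner.maxAge    7d
```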
01-06-2020
11:07 PM
@Chittu Can you share your code example? There should be an option to specify mode='overwrite' when saving a DataFrame: myDataFrame.save(path='/output/folder/path', source='parquet', mode='overwrite') Please revert
01-06-2020
11:03 PM
@Shelton Sorry for late response. I followed the thread and was able to configure Fair Scheduler successfully. Thank you for your help. Cheers.
01-06-2020
04:48 AM
Nice one!! Always restart Ambari after any Ambari-server setup commands. Very glad to see you got it!
01-06-2020
02:41 AM
@Shelton Any update on this? It looks like it is looking for some Java packages: java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)] Can we install it externally?
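For what it's worth, the last reason in that stack trace ("Permission denied" on the extracted libleveldbjni file) usually points at the JNI temp directory rather than a missing package: the service user must be able to write to and execute from it, and the backing filesystem must not be mounted noexec. A quick check (the path is taken from the error above; command output will vary by host):

```shell
#!/usr/bin/env bash
tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir   # path from the error message
# Does the directory exist, and who owns it?
ls -ld "$tmpdir" 2>/dev/null || echo "missing or unreadable: $tmpdir"
# Is the backing mount flagged noexec? (JNI libraries cannot load from noexec mounts)
findmnt -T /var/lib 2>/dev/null || true
```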