Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 609 | 06-04-2025 11:36 PM |
|  | 1177 | 03-23-2025 05:23 AM |
|  | 584 | 03-17-2025 10:18 AM |
|  | 2186 | 03-05-2025 01:34 PM |
|  | 1375 | 03-03-2025 01:09 PM |
01-11-2020
10:31 AM
- Run the virtual machine and log in: ssh -p 2122 root@localhost (user root, password hadoop).
- vi /sandbox/proxy/generate-proxy-deploy-script.sh and edit the script to add the port you want to forward. There are two arrays defined in the script; in the one called tcpPortsHDP, add a new entry in the following format, taking port 9200 as the port you need to forward: [9200]=9200 (example for using Elasticsearch in HDP Sandbox 2.6.5).
- Save the file and rerun the script. It generates a new file, proxy-deploy.sh, in the same directory (/sandbox/proxy/); run that script.
- Check that the port is now open: run docker ps. Among all the ports the proxy forwards to the VM you will see the ports you just configured. That's fine.
- Reboot the system.
- In VirtualBox: Settings -> Network -> Advanced -> Port Forwarding -> add a new port-forwarding rule using the 0.0.0.0 IP address (you may have to allow access in the firewall on the host machine; in my case, Windows). Enjoy: your address is then reachable from a browser on the host machine!
- In my case, for Elasticsearch, it was also necessary to add the following to /etc/elasticsearch/elasticsearch.yml (the network.bind_host line is the key to making it work): network.bind_host: 0.0.0.0, transport.host: localhost, node.master: true, node.data: true, transport.tcp.port: 9300
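To make the array edit concrete, here is a minimal sketch of what the tcpPortsHDP entry ends up looking like. The [8080]=8080 entry and the expansion loop are illustrative assumptions, not the sandbox script's actual contents; only the [9200]=9200 line comes from the steps above.

```shell
#!/usr/bin/env bash
# Illustrative tcpPortsHDP array after the edit described above.
declare -A tcpPortsHDP=(
  [8080]=8080   # a pre-existing entry (illustrative)
  [9200]=9200   # new entry: forward host port 9200 to guest port 9200 (Elasticsearch)
)

# The generate script expands each mapping into a docker port-forward flag
# (shape assumed here for illustration):
for p in "${!tcpPortsHDP[@]}"; do
  echo "-p ${p}:${tcpPortsHDP[$p]}"
done
```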
01-11-2020
09:37 AM
Hi Shelton, Tons of thanks to you, you saved my whole weekend; Ambari started as expected. 🙂 Thanks, Niranjan
01-10-2020
09:13 PM
Thank you @Shelton for your help, and thanks for sharing the take-over URL; a very interesting read.
01-10-2020
06:04 PM
1 Kudo
@asmarz On the edge node:

Just to validate your situation I spun up a single-node cluster, tokyo (IP 192.168.0.67), and installed an edge node, busia (IP 192.168.0.66). I will demonstrate the Spark client setup on the edge node and invoke the spark-shell. First I have to configure passwordless ssh to my edge node.

Passwordless setup:
[root@busia ~]# mkdir .ssh
[root@busia ~]# chmod 600 .ssh/
[root@busia ~]# cd .ssh
[root@busia .ssh]# ll
total 0

Networking not yet set up; the master is unreachable from the edge node:
[root@busia .ssh]# ping 198.168.0.67
PING 198.168.0.67 (198.168.0.67) 56(84) bytes of data.
From 198.168.0.67 icmp_seq=1 Destination Host Unreachable
From 198.168.0.67 icmp_seq=3 Destination Host Unreachable

On the master:
The master is a single-node HDP 3.1.0 cluster; I will deploy the clients to the edge node from here.
[root@tokyo ~]# cd .ssh/
[root@tokyo .ssh]# ll
total 16
-rw------- 1 root root  396 Jan  4  2019 authorized_keys
-rw------- 1 root root 1675 Jan  4  2019 id_rsa
-rw-r--r-- 1 root root  396 Jan  4  2019 id_rsa.pub
-rw-r--r-- 1 root root  185 Jan  4  2019 known_hosts

Networking not yet set up; the edge node is still unreachable from the master tokyo:
[root@tokyo .ssh]# ping 198.168.0.66
PING 198.168.0.66 (198.168.0.66) 56(84) bytes of data.
From 198.168.0.66 icmp_seq=1 Destination Host Unreachable
From 198.168.0.66 icmp_seq=2 Destination Host Unreachable

Copied the id_rsa.pub key to the edge node:
[root@tokyo ~]# cat .ssh/id_rsa.pub | ssh root@192.168.0.215 'cat >> .ssh/authorized_keys'
The authenticity of host '192.168.0.215 (192.168.0.215)' can't be established.
ECDSA key fingerprint is SHA256:ZhnKxkn+R3qvc+aF+Xl5S4Yp45B60mPIaPpu4f65bAM.
ECDSA key fingerprint is MD5:73:b3:5a:b4:e7:06:eb:50:6b:8a:1f:0f:d1:07:55:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.215' (ECDSA) to the list of known hosts.
root@192.168.0.215's password:

Validating that passwordless ssh works:
[root@tokyo ~]# ssh root@192.168.0.215
Last login: Fri Jan 10 22:36:01 2020 from 192.168.0.178
[root@busia ~]# hostname -f
busia.xxxxxx.xxx

Single-node cluster:
[root@tokyo ~]# useradd asmarz
[root@tokyo ~]# su - asmarz

On the master as user asmarz I can access the spark-shell and execute any Spark code. Next I added the edge node to the cluster and installed the clients on it; the installed client components on the edge node can be seen in the CLI. I chose to install all the clients on the edge node just for the demo. Having already installed the Hive client on the edge node without any special setup, I can now launch Hive HQL against the master tokyo from the edge node. After installing the Spark client on the edge node I can also launch the spark-shell from the edge node and run any Spark code, so this demonstrates that you can create any user on the edge node and they can run Hive HQL, Spark SQL or Pig scripts. You will notice I didn't update the HDFS, YARN, MapReduce or Hive configurations; Ambari did that automatically during the installation, copying the correct conf files over to the edge node. The asmarz user on the edge node can also access HDFS. Finally, as user asmarz I launched a spark-submit job from the edge node; the launch is successful on the master tokyo (see the Resource Manager URL), which can be confirmed in the RM UI. This walkthrough validates that any user on the edge node can launch a job in the cluster, which poses a security problem in production, hence my earlier hint about Kerberos.

Having said that, you will realize I didn't do any special configuration after the client installation, because Ambari distributes the correct configuration for all the components, and it does that for every installation of a new component; that's why Ambari is a management tool. If this walkthrough answers your question, please do accept the answer and close the thread. Happy hadooping
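As an aside on the permissions used in the passwordless setup above: the common convention is 700 on the .ssh directory and 600 on the files inside it (sshd's strict modes can refuse keys when the directory is more permissive). A minimal sketch in a throwaway directory:

```shell
#!/usr/bin/env bash
# Demonstrate the conventional ~/.ssh permissions in a scratch directory.
d=$(mktemp -d)
mkdir "$d/.ssh"
chmod 700 "$d/.ssh"                      # directory: owner rwx only
touch "$d/.ssh/authorized_keys"
chmod 600 "$d/.ssh/authorized_keys"      # key/auth files: owner rw only
stat -c '%a %n' "$d/.ssh" "$d/.ssh/authorized_keys"
```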
01-07-2020
11:03 PM
Hi @Shelton, I was able to achieve my objective of running multiple Spark sessions under a single Spark context using the YARN capacity scheduler and Spark fair scheduling. However, the issue remains with the YARN fair scheduler: the second Spark job is still not running (with the same memory configuration) due to lack of resources. So, what additional parameters need to be set for the YARN fair scheduler to achieve this? Please help me fix this issue. Thanks and regards, Sudhindra
01-07-2020
01:41 AM
Hi, We understand that logs are not getting deleted even though you have enabled the spark.history.fs properties. Did you find any errors regarding this in the SHS (Spark History Server) logs? Thanks, AKR
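For reference, log cleanup in the Spark History Server is controlled by the cleaner properties in spark-defaults.conf; the names below are the standard Spark ones, and the interval/age values are only illustrative defaults to adjust:

```properties
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.interval  1d
spark.history.fs.cleaner.maxAge    7d
```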
01-06-2020
11:07 PM
@Chittu Can you share your code example? There should be an option to specify mode='overwrite' when saving a DataFrame: myDataFrame.save(path='/output/folder/path', source='parquet', mode='overwrite') Please revert
01-06-2020
11:03 PM
@Shelton Sorry for late response. I followed the thread and was able to configure Fair Scheduler successfully. Thank you for your help. Cheers.
01-06-2020
04:48 AM
Nice one!! Always restart Ambari after any Ambari-server setup commands. Very glad to see you got it!
01-06-2020
02:41 AM
@Shelton Any update on this? It looks like it is looking for some Java packages: java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-4657625312215122883.8 (Permission denied)] Can we install it externally?
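For what it's worth, the last reason in that stack trace ("Permission denied" on the extracted libleveldbjni file) usually points at the JNI temp directory rather than a missing package: the service user must be able to write to and execute from it, and the backing filesystem must not be mounted noexec. A quick check (the path is taken from the error above; command output will vary by host):

```shell
#!/usr/bin/env bash
tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir   # path from the error message
# Does the directory exist, and who owns it?
ls -ld "$tmpdir" 2>/dev/null || echo "missing or unreadable: $tmpdir"
# Is the backing mount flagged noexec? (JNI libraries cannot load from noexec mounts)
findmnt -T /var/lib 2>/dev/null || true
```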