Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 508 | 06-04-2025 11:36 PM |
|  | 1052 | 03-23-2025 05:23 AM |
|  | 548 | 03-17-2025 10:18 AM |
|  | 2046 | 03-05-2025 01:34 PM |
|  | 1280 | 03-03-2025 01:09 PM |
01-11-2020
05:15 AM
@TVGanesh Indeed, the problem is with the input values you have for the zip and .py files. Can you change them as below and try again? It should work:

{
  "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
  "pyFiles": ["test1.py"],
  "archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
}

You had them interchanged. Below is a comparison between the spark-submit command and its equivalent in the Livy REST JSON protocol. Please revert.
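For reference, a minimal sketch of how the two forms map onto each other, assuming a Livy server reachable at livy-server:8998; the main script name, host, and port are placeholders, not values from this thread:

```bash
# Rough mapping between spark-submit options and the Livy batch JSON keys:
#   --jars      ->  "jars"
#   --py-files  ->  "pyFiles"
#   --archives  ->  "archives"
#   <app file>  ->  "file"

# spark-submit form (jar/zip names from the thread, main script is illustrative):
spark-submit \
  --jars hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar \
  --py-files test1.py \
  --archives pyspark_hwc-1.0.0.3.1.0.0-78.zip \
  main_app.py

# Equivalent Livy batch submission over REST (host/port are placeholders):
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{
        "file": "main_app.py",
        "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
        "pyFiles": ["test1.py"],
        "archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
      }' \
  http://livy-server:8998/batches
```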
01-10-2020
06:04 PM
1 Kudo
@asmarz On the edge node: just to validate your situation, I have spun up a single-node cluster, Tokyo (IP 192.168.0.67), and installed an edge node, Busia (IP 192.168.0.66). I will demonstrate the Spark client setup on the edge node and invoke the spark-shell. First I have to configure passwordless SSH for my edge node.

Passwordless setup:
[root@busia ~]# mkdir .ssh
[root@busia ~]# chmod 600 .ssh/
[root@busia ~]# cd .ssh
[root@busia .ssh]# ll
total 0

Networking not set up yet; the master is unreachable from the edge node:
[root@busia .ssh]# ping 198.168.0.67
PING 198.168.0.67 (198.168.0.67) 56(84) bytes of data.
From 198.168.0.67 icmp_seq=1 Destination Host Unreachable
From 198.168.0.67 icmp_seq=3 Destination Host Unreachable

On the master: the master runs a single-node HDP 3.1.0 cluster; I will deploy the clients to the edge node from here.
[root@tokyo ~]# cd .ssh/
[root@tokyo .ssh]# ll
total 16
-rw------- 1 root root  396 Jan  4  2019 authorized_keys
-rw------- 1 root root 1675 Jan  4  2019 id_rsa
-rw-r--r-- 1 root root  396 Jan  4  2019 id_rsa.pub
-rw-r--r-- 1 root root  185 Jan  4  2019 known_hosts

Networking not set up yet; the edge node is still unreachable from the master Tokyo:
[root@tokyo .ssh]# ping 198.168.0.66
PING 198.168.0.66 (198.168.0.66) 56(84) bytes of data.
From 198.168.0.66 icmp_seq=1 Destination Host Unreachable
From 198.168.0.66 icmp_seq=2 Destination Host Unreachable

Copied the id_rsa.pub key to the edge node:
[root@tokyo ~]# cat .ssh/id_rsa.pub | ssh root@192.168.0.215 'cat >> .ssh/authorized_keys'
The authenticity of host '192.168.0.215 (192.168.0.215)' can't be established.
ECDSA key fingerprint is SHA256:ZhnKxkn+R3qvc+aF+Xl5S4Yp45B60mPIaPpu4f65bAM.
ECDSA key fingerprint is MD5:73:b3:5a:b4:e7:06:eb:50:6b:8a:1f:0f:d1:07:55:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.215' (ECDSA) to the list of known hosts.
root@192.168.0.215's password:

Validation that passwordless SSH works:
[root@tokyo ~]# ssh root@192.168.0.215
Last login: Fri Jan 10 22:36:01 2020 from 192.168.0.178
[root@busia ~]# hostname -f
busia.xxxxxx.xxx

Single-node cluster:
[root@tokyo ~]# useradd asmarz
[root@tokyo ~]# su - asmarz

On the master, as user asmarz, I can access the spark-shell and execute any Spark code. I then added the edge node to the cluster and installed the clients on it; the installed client components on the edge node can be seen in the CLI. I chose to install all the clients on the edge node just for the demo. Having already installed the Hive client on the edge node, without any special setup I can now launch Hive HQL on the master Tokyo from the edge node.

After installing the Spark client on the edge node, I can also launch the spark-shell from the edge node and run any Spark code. This demonstrates that you can create any user on the edge node and he/she can run Hive HQL, Spark SQL, or Pig scripts. You will notice I didn't update the HDFS, YARN, MAPRED, or HIVE configurations; Ambari did that automatically during the installation by copying the correct conf files over to the edge node.

The asmarz user on the edge node can also access HDFS. Now, as user asmarz, I have launched a spark-submit job from the edge node (see the sketch below); the launch is successful on the master Tokyo, as can be confirmed in the Resource Manager UI. This walkthrough validates that any user on the edge node can launch a job in the cluster, which poses a security problem in production, hence my earlier hint about Kerberos.
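A minimal sketch of the kind of spark-submit test that local user can run from the edge node; the SparkPi example class and the HDP client path below are the usual defaults and are assumptions here, not taken from the original screenshots:

```bash
# Run as the local user on the edge node; SparkPi ships with the Spark2 client.
# /usr/hdp/current/spark2-client is the typical HDP layout (an assumption here).
su - asmarz
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10

# The application should then appear as RUNNING/FINISHED in the Resource Manager UI.
```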
Having said that, you will realize I didn't do any special configuration after the client installation, because Ambari distributes the correct configuration of all the components, and it does that for every installation of a new component; that's the reason Ambari is a management tool. If this walkthrough answers your question, please accept the answer and close the thread. Happy hadooping
01-10-2020
08:01 AM
@asmarz An HDP client means a set of binaries and libraries for running commands against, and developing software for, a particular Hadoop service. So, if you install the Hive client you can run beeline, if you install the HBase client you can run the HBase shell, if you install the Spark client you can run spark-shell, and so on. I would advise you to install at least these clients on the edge node: zookeeper-client, sqoop-client, spark2-client, slider-client, spark-client, oozie-client, and hbase-client. The local users created on the edge node can then launch spark-shell or spark-submit; the only difference is that on a kerberized cluster you will have to generate keytabs and copy them over to the edge node for every user (a sketch follows below). Hope that answers your question
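For the kerberized case, a minimal sketch of generating and testing a per-user keytab, assuming an MIT KDC; the admin principal, realm, username, and keytab path below are purely illustrative:

```bash
# On a host with kadmin access; EXAMPLE.COM and the user/paths are placeholders.
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey asmarz@EXAMPLE.COM"
kadmin -p admin/admin@EXAMPLE.COM -q "xst -k /etc/security/keytabs/asmarz.keytab asmarz@EXAMPLE.COM"

# Copy the keytab to the edge node, then as that user:
kinit -kt /etc/security/keytabs/asmarz.keytab asmarz@EXAMPLE.COM
klist    # a valid ticket should be listed before running spark-shell or beeline
```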
01-10-2020
05:49 AM
@asmarz Now that I have a better understanding of your deployment, I think that is the wrong technical approach. Having an edge node is a great idea in that you can create users on it and control access to the cluster from it, and usually it carries client software ONLY (YARN, HDFS, OOZIE, ZOOKEEPER, SPARK, SQOOP, PIG clients, etc.), but no master processes. Edge nodes run within the cluster and allow for centralized management of all the Hadoop configuration entries on the cluster nodes, which helps reduce the amount of administration needed to keep the config up to date. When you configure a Linux box as an edge node, Ambari updates the conf files with the correct values during the deployment, so that all commands against the cluster can be run from the edge node (see the quick checks below). For security and good practice, edge nodes need to be multi-homed into the private subnet of the Hadoop cluster as well as into the corporate network. Keeping your Hadoop cluster in its own private subnet is an excellent practice, so these edge nodes serve as a controlled window into the cluster. In a nutshell, you don't need a master process on the edge node, only the clients, to initiate communication with the cluster. Hope that helps
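As a quick illustration, a few hedged sanity checks one could run from such an edge node once the clients are installed; they simply read the Ambari-distributed configs and talk to the cluster:

```bash
# Confirm the HDFS client config points at the cluster's NameNode
hdfs getconf -confKey fs.defaultFS

# List the cluster's NodeManagers via the YARN client config
yarn node -list

# Simple end-to-end check: write and read a file in HDFS from the edge node
hdfs dfs -mkdir -p /tmp/edge-node-check
hdfs dfs -put /etc/hosts /tmp/edge-node-check/
hdfs dfs -cat /tmp/edge-node-check/hosts | head -3
```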
01-10-2020
12:20 AM
@obrobecker Can you check and use the correct principal from the hbase keytab, and use that one to kinit, and see if that works? To get the principal, run the below and press enter:

$ klist -kt /etc/security/keytabs/yarn-ats.hbase-client.headless.keytab

The output should look like:

Keytab name: FILE:/etc/security/keytabs/yarn-ats.hbase-client.headless.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
1 02/01/20 23:00:12 hbase/[FQDN]@[REALM]
1 02/01/20 23:00:12 hbase/[FQDN]@[REALM]
1 02/01/20 23:00:12 hbase/[FQDN]@[REALM]
1 02/01/20 23:00:12 hbase/[FQDN]@[REALM]
1 02/01/20 23:00:12 hbase/[FQDN]@[REALM]

Then kinit using the format kinit -kt $keytab $principal:

$ kinit -kt /etc/security/keytabs/yarn-ats.hbase-client.headless.keytab hbase/[FQDN]@[REALM]

The klist command should then show a valid ticket:

$ klist

Now proceed and revert.
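For convenience, a small hedged sketch that pulls the first principal out of the keytab and runs kinit with it; it assumes the standard MIT Kerberos klist -kt output format shown above:

```bash
# Extract the first principal (the field containing '@') and kinit with it.
KEYTAB=/etc/security/keytabs/yarn-ats.hbase-client.headless.keytab
PRINCIPAL=$(klist -kt "$KEYTAB" | awk '/@/ {print $NF; exit}')

kinit -kt "$KEYTAB" "$PRINCIPAL"
klist   # should now show a valid ticket for that principal
```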
01-09-2020
10:37 AM
@asmarz Your question is ambiguous; can you elaborate? It's possible to add users to a cluster with all the necessary privileges to execute, for example, Spark or Hive jobs. In a kerberized cluster you can merge the different keytabs (i.e. hive, spark, oozie, etc.) or control access through Ranger. But if you can elaborate on your use case, then we can try to find a technical solution.
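A minimal sketch of merging keytabs with MIT Kerberos ktutil; the input keytab names below are typical HDP defaults and are assumptions, so adjust to what actually exists under /etc/security/keytabs on your cluster:

```bash
# Combine several service keytabs into a single merged keytab.
ktutil <<'EOF'
read_kt /etc/security/keytabs/hive.service.keytab
read_kt /etc/security/keytabs/spark.headless.keytab
read_kt /etc/security/keytabs/oozie.service.keytab
write_kt /etc/security/keytabs/merged.keytab
EOF

# Verify the merged keytab lists all the expected principals
klist -kt /etc/security/keytabs/merged.keytab
```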
01-07-2020
01:02 AM
@md88 I was just wondering why you have chosen to go down a very difficult road and deploy unpackaged Hadoop when you have a simplified way using Cloudera; see the HDP 3.1.4 deployment documentation, a well-documented procedure that is relatively straightforward, especially when you are new to Hadoop. In your initial thread you said you had HDP 2.6 already installed, but the document you are referencing isn't Cloudera documentation! Please refer to the earlier link I shared above, which thoroughly documents the installation of the HDP ecosystem. Below are the components that have been tested and integrated, giving you additional tools to be more productive, i.e. security, streaming, ETL, the one-stop notebook Zeppelin, etc.

Official Apache component versions for HDP 3.1.4:
- Apache Accumulo 1.7.0
- Apache Atlas 1.1.0
- Apache Calcite 1.16.0
- Apache DataFu 1.3.0
- Apache Druid 0.12.1 (incubating)
- Apache Hadoop 3.1.1
- Apache HBase 2.0.2
- Apache Hive 3.1.0
- Apache Kafka 2.0.0
- Apache Knox 1.0.0
- Apache Livy 0.5.0
- Apache Oozie 4.3.1
- Apache Phoenix 5.0.0
- Apache Pig 0.16.0
- Apache Ranger 1.2.0
- Apache Spark 2.3.2
- Apache Sqoop 1.4.7
- Apache Storm 1.2.1
- Apache TEZ 0.9.1
- Apache Zeppelin 0.8.0
- Apache ZooKeeper 3.4.6

Having said that, if you want Ambari to manage your standalone Hadoop cluster, you will have to walk in uncharted waters, because that process, called an Ambari takeover, is not well documented and hence quite frustrating for a newbie. Here is a link to Ryba, an open-source project. You will have to be proficient in Python; the challenge will be to give Apache Ambari full knowledge of your cluster's topology: which services are running and where they are running. Please let me know if you need more help.
01-06-2020
11:07 PM
@Chittu Can you share your code example? There should be an option to specify mode='overwrite' when saving a DataFrame:

myDataFrame.save(path='/output/folder/path', source='parquet', mode='overwrite')

(On Spark 2.x the equivalent writer form is myDataFrame.write.mode('overwrite').parquet('/output/folder/path').) Please revert
01-06-2020
01:15 PM
@md88 Are you trying to install Ambari on an existing HDP 2.6 cluster, or are you upgrading Ambari? If you have already installed HDP 2.6 as you stated, how is the registration of the host failing? Here are the steps to help resolve your issue. Can you please rephrase your question, because I personally can't understand it?
01-05-2020
04:28 PM
@sow Impala does not support binary data. What you can do is use a serialize/deserialize approach: you convert your image to a string format that still contains all the information necessary to transform it back. When you need to retrieve an image from HDFS, you deserialize it, meaning you convert the string back to the original format. I found an example using Python; it would work like this:

import base64

def img_to_string(image_path):
    # Serialize: read the image bytes and encode them as a base64 string
    with open(image_path, "rb") as imageFile:
        image_string = base64.b64encode(imageFile.read())
    return image_string

def string_to_img(image_string):
    # Deserialize: decode the base64 string back to bytes and write out the image
    with open("new_image.png", "wb") as imageFile:
        imageFile.write(base64.b64decode(image_string))