Member since: 06-02-2020 | Posts: 331 | Kudos Received: 63 | Solutions: 49
10-25-2021
10:59 PM
Hi @SimonBergerard Spark configuration parameters take precedence in the following order (lowest to highest): spark-defaults.conf --> spark-submit/spark-shell flags --> spark code (Scala/Java/Python). If you want to see the resolved parameter values, run spark-submit in --verbose mode:
spark-submit --verbose
Please recheck the spark-submit command and its parameters once again:
--conf spark.eventLog.enabled=true
--conf spark.eventLog.dir=<directory>
--conf spark.submit.deployMode=cluster
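As a quick illustration of the precedence order (a minimal sketch; the property and its values are assumptions for illustration), suppose spark-defaults.conf contains:
spark.eventLog.enabled=false
Then a command-line flag overrides it, because spark-submit flags sit higher in the order:
spark-submit --verbose \
--conf spark.eventLog.enabled=true \
--class <main_class> <application_jar>
With --verbose, the resolved properties are printed at startup, so you can confirm which value actually won.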
10-17-2021
11:54 PM
Hi @Paop We don't have enough information (data volume, the spark-submit command, etc.) to provide a solution. Please raise a support case for this issue.
10-15-2021
11:55 PM
@LegallyBind For each Python version, you need to create a separate interpreter.
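As a sketch of the setup: in the Zeppelin Interpreter page, you can create (or clone) one Python interpreter per version and point each at its own binary via the zeppelin.python property (the interpreter names and paths below are assumptions for illustration):
python2 interpreter: zeppelin.python = /usr/bin/python2.7
python3 interpreter: zeppelin.python = /usr/bin/python3
Then select the matching interpreter in each notebook paragraph, e.g. %python3.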
10-07-2021
04:28 AM
Hi @shivanageshch EMR is not part of Cloudera. If you are using a CDP/HDP cluster, go through the following steps.
Livy Configuration:
Add the following properties to the livy.conf file:
# Use this keystore for the SSL certificate and key.
livy.keystore = <path-to-ssl_keystore>
# Specify the keystore password.
livy.keystore.password = <keystore_password>
# Specify the key password.
livy.key-password = <key_password>
Access Livy Server:
After enabling SSL, the Livy server should be accessible over the https protocol:
https://<livy host>:<livy port>
References:
1. https://docs.cloudera.com/cdp-private-cloud-base/latest/security-encrypting-data-in-transit/topics/livy-configure-tls-ssl.html
Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
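As a follow-up check once the Livy server restarts with SSL enabled, the REST API should respond over https (8998 is Livy's default port, assumed here; -k skips certificate validation and is only appropriate for self-signed test certificates):
curl -k https://<livy host>:8998/sessions
A JSON list of sessions in the response indicates the HTTPS endpoint is up.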
10-06-2021
09:34 PM
Hi @LegallyBind Please see the following tutorial: https://community.cloudera.com/t5/Customer/How-to-use-multiple-versions-of-Python-in-Zeppelin/ta-p/271226
09-24-2021
03:53 AM
Hi @Tomas79 While launching spark-shell, you need to add the spark.yarn.access.hadoopFileSystems parameter. Also ensure that the dfs.namenode.kerberos.principal.pattern parameter is set to * in the core-site.xml file. For example:
# spark-shell --conf spark.yarn.access.hadoopFileSystems="hdfs://c1441-node2.coelab.cloudera.com:8020"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/09/24 07:23:25 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Spark context Web UI available at http://c2441-node2.supportlab.cloudera.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1632395260786_0004).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0.7.1.6.0-297
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val textDF = spark.read.textFile("hdfs://c1441-node2.coelab.cloudera.com:8020/tmp/ranga_clusterb_test.txt")
textDF: org.apache.spark.sql.Dataset[String] = [value: string]
scala> textDF.show(false)
+---------------------+
|value                |
+---------------------+
|Hello Ranga,         |
|                     |
+---------------------+
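For reference, the principal-pattern entry mentioned above would look like this in core-site.xml (a minimal sketch):
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <value>*</value>
</property>
The wildcard relaxes the client-side principal check so that NameNode principals from the remote cluster's realm are accepted.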
09-14-2021
11:38 PM
1 Kudo
Hi @Seaport As you know, resource managers like YARN, Standalone, and Kubernetes create the containers. Internally, the resource manager uses a shell script to launch each container. Based on the available resources, it creates one or more containers on the same node.
09-14-2021
10:46 PM
Hi @Seaport Please check the following example. It may help. https://kontext.tech/column/spark/284/pyspark-convert-json-string-column-to-array-of-object-structtype-in-data-frame
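The link shows a PySpark version; for a quick idea of the approach in spark-shell, here is a minimal Scala sketch using from_json (the column name, sample data, and schema are assumptions for illustration):
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
// Hypothetical DataFrame with a JSON string column holding an array of objects.
val df = Seq("""[{"name":"Ranga","age":33},{"name":"Raja","age":59}]""").toDF("json_col")
// Schema of the array of structs encoded in the string.
val schema = ArrayType(new StructType().add("name", StringType).add("age", IntegerType))
// from_json parses the string into an array<struct> column that can be exploded or queried.
df.withColumn("parsed", from_json($"json_col", schema)).show(false)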
09-08-2021
03:50 AM
In this tutorial, we will learn how to create Apache Ozone volumes, buckets, and keys. After that, we will see how to create an Apache Hive table using Apache Ozone, and finally how we can insert/read the data from Apache Spark.
Ozone
Create the volume with the name vol1:
# ozone sh volume create /vol1
21/08/25 06:23:27 INFO rpc.RpcClient: Creating Volume: vol1, with root as owner.
Create the bucket with the name bucket1 under vol1:
# ozone sh bucket create /vol1/bucket1
21/08/25 06:24:09 INFO rpc.RpcClient: Creating Bucket: vol1/bucket1, with Versioning false and Storage Type set to DISK and Encryption set to false
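Optionally, verify the bucket by writing and listing a key (the key name and local file path are assumptions for illustration):
# ozone sh key put /vol1/bucket1/key1 /tmp/test.txt
# ozone sh key list /vol1/bucket1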
Hive
Launch the beeline shell.
Create the employee table in Hive.
Note: Update the om.host.example.com value.
CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;
CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(
`id` bigint,
`name` string,
`age` smallint)
STORED AS parquet
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
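To confirm that the table picked up the Ozone location, you can inspect its metadata from the same Beeline session (a quick sanity check; the output varies by environment):
DESCRIBE FORMATTED employee;
The Location field in the output should show the o3fs:// path above.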
Spark
Spark2:
Launch the spark-shell:
spark-shell
Run the following queries to insert/read the data from the Hive employee table:
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark3:
Launch the spark3-shell:
spark3-shell --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
Run the following queries to insert/read the data from the Hive employee table:
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Kerberized environment
Prerequisites:
Create a user and grant it the proper Ranger permissions to create Ozone volumes, buckets, etc.
kinit as that user.
Spark2:
Launch the spark-shell.
Note: Before launching spark-shell, update the om.host.example.com value.
spark-shell \
--conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862
Run the following queries to insert/read the data from the Hive employee table:
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark3:
Launch the spark3-shell.
Note: Before launching spark3-shell, update the om.host.example.com value.
spark3-shell \
--conf spark.kerberos.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862 \
--jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
Run the following queries to insert/read the data from the Hive employee table:
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Notes:
If you get java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.ozone.OzoneFileSystem not found, then add the /opt/cloudera/parcels/CDH/jars/hadoop-ozone-filesystem-hadoop3-*.jar to the Spark classpath using the --jars option.
In a Kerberized environment, you must specify the spark.yarn.access.hadoopFileSystems configuration; otherwise, the following error is displayed: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
Thanks for reading this article. If you liked this article, you can give kudos.
08-31-2021
11:55 PM
Hi @yudh3 Is this application deployed for the first time, or is it an existing application? If it is the first time, you need to tune it according to the kind of operations you are performing. If it is an existing application, has this issue started occurring recently, or has it been there for a long time? If it started recently, check whether there were any data changes or any HDFS/Hive issues. Without looking at the logs, it is difficult to tell what the exact issue is. Please go ahead and create a case for this issue, and we will work on it.