Member since: 11-12-2018
Posts: 218
Kudos Received: 179
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1261 | 08-08-2025 04:22 PM |
| | 1647 | 07-11-2025 08:48 PM |
| | 2591 | 07-09-2025 09:33 PM |
| | 1543 | 04-26-2024 02:20 AM |
| | 2159 | 04-18-2024 12:35 PM |
06-25-2022
03:05 PM
It seems like your Spark workers are pointing to the default/system installation of Python rather than your virtual environment. By setting environment variables, you can tell Spark to use your virtual environment. You can set the two configs below in <spark_home_dir>/conf/spark-env.sh:

export PYSPARK_PYTHON=<Python_binaries_Path>
export PYSPARK_DRIVER_PYTHON=<Python_binaries_Path>
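If you prefer a per-application setting instead of a global spark-env.sh change, the executor interpreter can also be passed as a Spark config. A minimal sketch is below; the path /opt/my_venv/bin/python is a hypothetical placeholder, and in practice these configs are usually supplied via spark-submit --conf, since the driver's own interpreter is already fixed once a Python script starts.

```python
from pyspark.sql import SparkSession

# Placeholder venv interpreter; executors pick this up via spark.pyspark.python.
VENV_PYTHON = "/opt/my_venv/bin/python"

spark = (SparkSession.builder
         .appName("venv-example")
         .config("spark.pyspark.python", VENV_PYTHON)
         .getOrCreate())
```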
06-15-2022
10:41 AM
2 Kudos
Hi @suri789 Can you try the below and share your feedback?

>>> df.show()
+----------------+
|           value|
+----------------+
|   N. Plainfield|
|North Plainfield|
|  West Home Land|
|         NEWYORK|
|         newyork|
|  So. Plainfield|
|   S. Plaindield|
|    s Plaindield|
|North Plainfield|
+----------------+

>>> from pyspark.sql.functions import regexp_replace, lower
>>> df_tmp = df.withColumn('value', regexp_replace('value', r'\.', ''))
>>> df_tmp.withColumn('value', lower(df_tmp.value)).distinct().show()
+----------------+
|           value|
+----------------+
|    s plaindield|
|    n plainfield|
|  west home land|
|         newyork|
|   so plainfield|
|north plainfield|
+----------------+
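For what it's worth, the same normalization can also be chained into a single expression; this is just a compact sketch of the identical logic above:

```python
from pyspark.sql.functions import regexp_replace, lower, col

# Strip periods, lowercase, then deduplicate — same result as the two-step version.
df_clean = (df
            .select(lower(regexp_replace(col("value"), r"\.", "")).alias("value"))
            .distinct())
df_clean.show()
```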
06-15-2022
09:40 AM
1 Kudo
Hi @dfdf, I am not able to reproduce this issue; I can get the table details while running the queries in a Spark3 session. Can you help us with the exact Spark3 and Hive versions running in your environment? For example, you can get the Spark version by running spark3-shell --version. Please also verify whether you are seeing any errors or alerts related to the Hive service. Finally, can you try running similar queries directly from Hive and see whether you get results?
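For reference, a minimal sketch of the kind of check I ran from a PySpark session (the database and table names here are placeholders):

```python
# Run inside a Spark3 PySpark session; confirms the session can reach
# the Hive metastore and retrieve table details.
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN default").show()
spark.sql("DESCRIBE FORMATTED default.my_table").show(truncate=False)
```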
06-15-2022
08:35 AM
1 Kudo
Hi @haze5736, You need to use the Hive Warehouse Connector (HWC) to query Apache Hive managed tables from Apache Spark. Using the HWC API, you can read and write Apache Hive tables from Apache Spark. For example, to write to a managed table:

df.write.format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", <tableName>).option("partition", <partition_spec>).save()

Ref: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive-read-write-operations.html

For more details, you can refer to the documentation below:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive_submit_a_hivewarehouseconnector_python.html
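A minimal end-to-end PySpark sketch of the pattern, assuming the HWC jar and the pyspark_llap module are available on your cluster (my_db.my_table and my_db.my_table_copy are placeholder names):

```python
from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of an existing SparkSession `spark`.
hive = HiveWarehouseSession.session(spark).build()

# Read a managed Hive table into a Spark DataFrame.
df = hive.executeQuery("SELECT * FROM my_db.my_table")

# Write the DataFrame back to a managed table through the connector.
(df.write
   .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
   .option("table", "my_db.my_table_copy")
   .save())
```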
06-06-2022
01:49 PM
@QiDam Can you please check that the Ozone service is installed in your CDP Base Cluster, and also ensure the Ozone service is up and running? Once the Ozone service is set up and functional, remove the failed CDE service and enable the CDE service afresh. Also, can you check that all pods are up and running?
02-23-2021
01:44 AM
Thanks @adrijand for sharing your updates; it's highly appreciated.
02-09-2021
04:57 AM
Hi @joyabrata I think you are looking in the Data Lake tab, which is a different one. Go to the Summary tab, scroll down to the FreeIPA section, click Actions, and select Get FreeIPA Certificate from the drop-down menu. Hope this helps.
02-08-2021
10:37 PM
Hi @joyabrata Obtain the FreeIPA certificate of your environment:
1. From the CDP Home Page, navigate to Management Console > Environments.
2. Locate and select your environment from the list of available environments.
3. Go to the Summary tab and scroll down to the FreeIPA section.
4. Click Actions and select Get FreeIPA Certificate from the drop-down menu. The FreeIPA certificate downloads.
Then follow this document.
02-03-2021
09:39 PM
Hi @adrijand Thanks for your detailed explanation here. Yes, indeed, we need all versions to be the same to avoid a ClassNotFoundException caused by jar conflicts. We encourage you to explore these and provide feedback on your experiences.
01-23-2021
06:41 AM
Hi @adrijand Yeah, it seems there are some jar conflicts somewhere. You are loading Hive 1.1.0 classes before the ones included with Spark, so Spark's Hive client ends up referencing a configuration field that didn't exist in Hive 1.1.0, like below:

: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)

However, the description mentions you are using CDH 5.15, while your log snippets show Apache Spark (spark-2.3.0-bin-without-hadoop) and Apache Hive (apache-hive-1.1.0-bin), which are not the pre-built package versions that ship with the CDH stack. Are you trying to build with varying versions of Hive that you would like to connect to from a remote Airflow Docker container?
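As a side note, when Spark needs to talk to an older Hive metastore, the usual pattern is to pin the metastore client version via Spark configs rather than placing older Hive jars ahead of Spark's own. A minimal sketch under that assumption (the version string and jar source are illustrative):

```python
from pyspark.sql import SparkSession

# Sketch: tell Spark which Hive metastore version to target, so it loads a
# compatible client instead of whatever Hive jars happen to be on the classpath.
spark = (SparkSession.builder
         .appName("hive-metastore-version-example")
         .config("spark.sql.hive.metastore.version", "1.1.0")
         # "maven" makes Spark download matching jars; a local jar classpath also works.
         .config("spark.sql.hive.metastore.jars", "maven")
         .enableHiveSupport()
         .getOrCreate())
```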