Member since: 11-12-2018
Posts: 189
Kudos Received: 177
Solutions: 32
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 513 | 04-26-2024 02:20 AM
 | 667 | 04-18-2024 12:35 PM
 | 3228 | 08-05-2022 10:44 PM
 | 2938 | 07-30-2022 04:37 PM
 | 6424 | 07-29-2022 07:50 PM
06-27-2022
03:16 PM
Hi @sss123, this seems to be a bug. Please refer to https://issues.cloudera.org/browse/LIVY-3. Kindly note that Spark Notebook is not currently supported. Also please review the discussion in https://github.com/cloudera/hue/issues/254
06-27-2022
07:46 AM
Hi @ds_explorer, it seems the edit log is too large and the NameNode cannot read it completely within the default/configured timeout.

2022-06-25 08:32:24,872 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
.....
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)

To fix this, add the parameter and value below (if it is already set, increase the value):
HDFS > Configuration > JournalNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
hadoop.http.idle_timeout.ms=180000
Then restart the required services.
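If you prefer entering the safety valve through the View as XML editor, the equivalent entry would look something like the snippet below (a minimal sketch; 180000 ms is the value suggested above, so adjust it to your cluster's needs):

<property>
  <name>hadoop.http.idle_timeout.ms</name>
  <value>180000</value>
</property>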
06-25-2022
03:05 PM
It seems your Spark workers are pointing to the default/system installation of Python rather than your virtual environment. By setting the environment variables below, you can tell Spark to use your virtual environment. You can set these two entries in <spark_home_dir>/conf/spark-env.sh:
export PYSPARK_PYTHON=<Python_binaries_Path>
export PYSPARK_DRIVER_PYTHON=<Python_binaries_Path>
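As a quick check (a minimal sketch, assuming an active SparkSession named spark), you can confirm which interpreter the driver and the executors actually use:

import sys

# Interpreter used by the driver
print("driver:", sys.executable)

# Interpreter used by the executors (runs one trivial task on the cluster)
print("executor:", spark.sparkContext.parallelize([0], 1).map(lambda _: sys.executable).first())

Both should point at your virtual environment's Python binary after the spark-env.sh change.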
06-15-2022
10:41 AM
2 Kudos
Hi @suri789, can you try the below and share your feedback?

>>> df.show()
+----------------+
|           value|
+----------------+
|   N. Plainfield|
|North Plainfield|
|  West Home Land|
|         NEWYORK|
|         newyork|
|  So. Plainfield|
|   S. Plaindield|
|    s Plaindield|
|North Plainfield|
+----------------+

>>> from pyspark.sql.functions import regexp_replace, lower
>>> df_tmp = df.withColumn('value', regexp_replace('value', r'\.', ''))
>>> df_tmp.withColumn('value', lower(df_tmp.value)).distinct().show()
+----------------+
|           value|
+----------------+
|    s plaindield|
|    n plainfield|
|  west home land|
|         newyork|
|   so plainfield|
|north plainfield|
+----------------+
06-15-2022
09:40 AM
1 Kudo
Hi @dfdf, I am not able to reproduce this issue; I can get the table details while running the queries in a Spark3 session. Could you share the exact Spark3 and Hive versions running in your environment? For example, you can get the Spark3 version by running spark3-shell --version. Please also verify whether you are seeing any errors or alerts related to the Hive service. Finally, can you try running similar queries directly from Hive and check whether you get results?
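As a quick sanity check (a minimal sketch from a pyspark3 session with Hive support enabled; your_db and your_table are placeholder names), you can compare what the Spark3 session sees in the metastore with what Hive itself returns:

# List the databases and tables visible to the Spark3 session via the Hive metastore
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN your_db").show()
spark.sql("DESCRIBE FORMATTED your_db.your_table").show(truncate=False)

If these fail from Spark but succeed directly in Hive (e.g., via beeline), the problem is more likely in the Spark-to-metastore configuration than in the Hive service itself.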
06-15-2022
08:35 AM
1 Kudo
Hi @haze5736, you need to use the Hive Warehouse Connector (HWC) to query Apache Hive managed tables from Apache Spark. Using the HWC API, you can read and write Apache Hive tables from Apache Spark. For example, to write to a managed table:
df.write.format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", <tableName>).option("partition", <partition_spec>).save()
Ref: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive-read-write-operations.html
For more details, refer to the documentation below:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive_submit_a_hivewarehouseconnector_python.html
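For context, an end-to-end PySpark sketch would look roughly like the following (assuming the HWC jar and Python package are already on the application's classpath/py-files, and using placeholder table names your_db.source_table and your_db.target_table):

from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession
hive = HiveWarehouseSession.session(spark).build()

# Read a Hive managed table through HWC
df = hive.executeQuery("SELECT * FROM your_db.source_table")

# Write the result back to a Hive managed table through HWC
df.write.format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR) \
    .option("table", "your_db.target_table") \
    .save()

The exact session-builder options (JDBC URL, execution mode, and so on) depend on your cluster, so please follow the linked documentation for the configuration matching your CDP version.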
06-06-2022
01:49 PM
@QiDam Can you please check that the Ozone service is installed in your CDP Base cluster and ensure it is up and running? Once Ozone is set up and functional, remove the failed CDE service and enable the CDE service afresh. Also, can you check that all pods are up and running?
04-10-2021
12:53 AM
1 Kudo
Hi @ryu, then you might need to build some custom in-house monitoring scripts using the YARN APIs, or use other tools such as Prometheus or Grafana, for your use case. Please also refer to the links below for more insights:
https://www.programmersought.com/article/61565532790/
http://rokroskar.github.io/monitoring-spark-on-hadoop-with-prometheus-and-grafana.html
https://www.linkedin.com/pulse/how-monitor-yarn-application-via-restful-api-wayne-zhu/
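As a starting point, here is a minimal sketch that lists running applications through the ResourceManager REST API (assuming the RM web UI is reachable at a placeholder address rm-host:8088 and is not Kerberos-protected):

import requests

# Query the ResourceManager REST API for currently running applications
RM_URL = "http://rm-host:8088"  # placeholder ResourceManager address
resp = requests.get(f"{RM_URL}/ws/v1/cluster/apps", params={"states": "RUNNING"})
resp.raise_for_status()

# The response has the shape {"apps": {"app": [...]}}; "apps" can be null when nothing is running
apps = (resp.json().get("apps") or {}).get("app", [])
for app in apps:
    print(app["id"], app["name"], app["state"], app.get("progress"))

You can schedule something like this with cron and push the results into Prometheus/Grafana or whatever alerting system you use.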
02-23-2021
01:44 AM
Thanks @adrijand for sharing your updates; it's highly appreciated.
02-09-2021
04:57 AM
Hi @joyabrata, I think you are looking in the Data Lake tab, which is a different one. Go to the Summary tab, scroll down to the FreeIPA section, click Actions, and select Get FreeIPA Certificate from the drop-down menu. Hope this helps.