Member since: 06-02-2020
331 Posts
67 Kudos Received
49 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4097 | 07-11-2024 01:55 AM |
|  | 11350 | 07-09-2024 11:18 PM |
|  | 8539 | 07-09-2024 04:26 AM |
|  | 8567 | 07-09-2024 03:38 AM |
|  | 7500 | 06-05-2024 02:03 AM |
02-05-2024
08:58 PM
Hi @sonnh The way Spark and Hive handle reading from and writing back to the same table differs. Spark typically clears the target path before writing new data, while Hive writes to a temporary directory first and then replaces the target path with the result data upon task completion. When working with file formats like ORC or Parquet through the Hive metastore, consider adjusting these Spark settings as needed:

--conf spark.sql.hive.convertMetastoreParquet=false
--conf spark.sql.hive.convertMetastoreOrc=false

References:
https://community.cloudera.com/t5/Support-Questions/Insert-overwrite-with-in-the-same-table-in-spark/m-p/242780
https://www.baifachuan.com/posts/da7bb348.html
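If it helps, here is a minimal PySpark sketch of setting those two properties when building the session (equivalent to passing the --conf flags to spark-submit); the app name and the query/table are hypothetical:

```python
from pyspark.sql import SparkSession

# Ask Spark to use the Hive SerDe path instead of its built-in
# Parquet/ORC readers for metastore tables.
spark = (
    SparkSession.builder
    .appName("hive-convert-metastore-example")  # hypothetical app name
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .config("spark.sql.hive.convertMetastoreOrc", "false")
    .enableHiveSupport()
    .getOrCreate()
)

# db.events is a hypothetical Hive table used only for illustration.
spark.sql("SELECT * FROM db.events LIMIT 10").show()
```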
02-04-2024
08:05 PM
1 Kudo
Hi @Meepoljd Please let me know if you still need any help with this issue. If any of the above solutions helped, please mark it as an accepted solution.
02-04-2024
08:11 AM
Hi @zhuw.bigdata To locate Spark logs, follow these steps:
1. Access the Spark UI: Open the Spark UI in your web browser.
2. Identify nodes: Navigate to the Executors tab to view information about the driver and executor nodes involved in the Spark application.
3. Determine the log directory: Locate the value of the yarn.nodemanager.log-dirs property in the Hadoop settings. This specifies the base directory for container logs on each node in the cluster.
4. Access the log location: Using a terminal or SSH, log in to the relevant node (driver or executor) where the logs you need are located.
5. Navigate to the application log directory: Within the yarn.nodemanager.log-dirs directory, access the subdirectory for the specific application using the pattern application_${appid}, where ${appid} is the unique application ID of the Spark job.
6. Find container logs: Within the application directory, locate the individual container log directories named container_${contid}, where ${contid} is the container ID.
7. Review log files: Each container directory contains the log files generated by that container: stderr (standard error output), stdout (standard output), and syslog (system-level logs).
Alternatively, if YARN log aggregation is enabled, you can collect all container logs for a finished application with yarn logs -applicationId <application_id>. A sketch of the directory layout described above follows.
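To illustrate that layout, here is a small Python sketch that assembles a container's log path from the pieces above; the log directory, application ID, and container ID are hypothetical placeholders you would replace with values from your own cluster:

```python
import os

# Hypothetical placeholders -- substitute values from your own cluster.
log_dirs = "/var/log/hadoop-yarn/container"         # yarn.nodemanager.log-dirs
app_id = "application_1706000000000_0042"           # application_${appid}
cont_id = "container_1706000000000_0042_01_000001"  # container_${contid}

# Layout: ${log-dirs}/application_${appid}/container_${contid}/{stderr,stdout,syslog}
container_dir = os.path.join(log_dirs, app_id, cont_id)
for name in ("stderr", "stdout", "syslog"):
    log_path = os.path.join(container_dir, name)
    if os.path.exists(log_path):
        print("found log file:", log_path)
```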
02-04-2024
07:58 AM
Hi @sonnh Generally, it is not advisable to read from and write to the same table at the same time; in case of failure it can result in anything from data corruption to complete data loss. As a workaround, first read the table data and materialize it (a temporary view alone is lazily evaluated, so write the data to a temporary/intermediate table), then use that copy to compute the result and finally save the data back to the destination table.
References:
https://stackoverflow.com/questions/38746773/read-from-a-hive-table-and-write-back-to-it-using-spark-sql
https://issues.apache.org/jira/browse/SPARK-27030
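A minimal PySpark sketch of that workaround; the table names (db.target, db.target_tmp) and the filter are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Materialize the source data first; a temporary view alone is lazy and
# would still read the target path while it is being overwritten.
spark.table("db.target").write.mode("overwrite").saveAsTable("db.target_tmp")

# Compute the result from the materialized copy (example transformation)
# and write it back to the destination table.
result = spark.table("db.target_tmp").where("event_date >= '2024-01-01'")
result.write.mode("overwrite").insertInto("db.target")

# Clean up the intermediate table.
spark.sql("DROP TABLE IF EXISTS db.target_tmp")
```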
02-04-2024
07:46 AM
1 Kudo
Hi @Taries I hope you are doing well. Do you need any further help with this issue? If any of the above solutions helped in your case, please accept it as a solution; it will help others as well.
02-04-2024
07:44 AM
Hi @yagoaparecidoti Unfortunately, Cloudera does not support installing or using open-source Spark, because Cloudera's Spark distribution includes customizations needed to support integration with other components.
01-31-2024
06:57 AM
Hi @Taries As I mentioned previously, only the hbase.spark.query.timerange.start and hbase.spark.query.timerange.end parameters can be used to filter data during the read; the hbase.spark.scan parameters are not meant for this purpose. To filter the data after reading, you can apply a Spark where or filter clause with your desired conditions, as in the sketch below.
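For illustration, a minimal PySpark sketch of filtering after the read; the table name, columns mapping, and filter conditions are hypothetical, and it assumes the hbase-spark connector is on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read from HBase via the hbase-spark connector
# (hypothetical table name and columns mapping).
df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "events")
    .option("hbase.columns.mapping", "rowkey STRING :key, value INT cf:value")
    .load()
)

# Filter after the read with ordinary Spark where/filter clauses.
df.where("value > 100").filter(df.rowkey.startswith("user_")).show()
```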
01-31-2024
12:34 AM
1 Kudo
Hi @Taries You need to use the following two parameters to filter by time range during the read:
1. hbase.spark.query.timerange.start
2. hbase.spark.query.timerange.end
Reference: https://github.com/apache/hbase-connectors/blob/307607cf7287084b3ce49cdd96d094e2ede9363a/spark/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/datasources/HBaseSparkConf.scala#L65
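A minimal PySpark sketch of passing those two options to the reader; the table name, columns mapping, and timestamp values are hypothetical (HBase cell timestamps are epoch milliseconds by default):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Scan only cells whose HBase timestamp falls inside [start, end).
start_ms = 1704067200000  # 2024-01-01 00:00:00 UTC
end_ms = 1706745600000    # 2024-02-01 00:00:00 UTC

df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "events")  # hypothetical table
    .option("hbase.columns.mapping", "rowkey STRING :key, value INT cf:value")
    .option("hbase.spark.query.timerange.start", str(start_ms))
    .option("hbase.spark.query.timerange.end", str(end_ms))
    .load()
)
df.show()
```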
01-22-2024
09:51 PM
Hi @sind65 Could you please clarify a few things before we proceed with any solution:
1. Are you using CDP or CDH?
2. How are you submitting the Spark job?
3. Are you using kinit, or have you specified a principal/keytab?
4. Have you followed the following documentation to set up and run the sample example? https://community.cloudera.com/t5/Community-Articles/Spark-Ozone-Integration-in-CDP/ta-p/323132
01-02-2024
03:38 AM
Thanks @Chandler641 Glad your issue is resolved after building the Spark code properly. Note: We do not support upstream Spark installations on Cloudera clusters because we have done a lot of customization in Cloudera Spark to support multiple integration components. Please let me know if you have further concerns about this issue.