Member since: 09-16-2021
Posts: 330
Kudos Received: 52
Solutions: 23
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 239 | 11-10-2024 11:19 PM |
|  | 370 | 10-25-2024 05:02 AM |
|  | 1941 | 09-10-2024 07:50 AM |
|  | 697 | 09-04-2024 05:35 AM |
|  | 1553 | 08-28-2024 12:40 AM |
09-10-2024
07:50 AM
2 Kudos
The error you are encountering, "java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38", occurs because Spark automatically infers the schema for the Oracle NUMBER type. When the data has a very large precision, such as the 35 digits in your case, Spark may overestimate the precision due to how it handles floating-point and decimal values.

To explain the issue further: Oracle's NUMBER data type is highly flexible and can store values with very large precision. Spark's Decimal type, however, has a maximum precision of 38, which limits the number of digits it can accurately represent. According to the documentation, Spark's decimal data type supports a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).

To resolve this issue, ensure that your Oracle data does not exceed the maximum precision and scale allowed by Spark. You can verify this by running the following query in Oracle:

SELECT MAX(LENGTH(large_number)) FROM example_table

If the result is greater than 38, you can read the data as a string instead of a decimal by converting it in the pushed-down query:

SELECT TO_CHAR(large_number) AS large_number FROM example_table

Spark schema:

>>> df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) as large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
|-- LARGE_NUMBER: string (nullable = true)
For comparison, reading the column directly (without TO_CHAR) infers the original decimal type:
>>> df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
|-- LARGE_NUMBER: decimal(35,5) (nullable = true)
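Alternatively, the inferred type can be overridden on the Spark side with the JDBC reader's customSchema option, so the original SELECT can stay unchanged. A minimal sketch, assuming the same connection options, table, and column name as in the example above:

df = (spark.read.format("jdbc")
      .option("url", oracle_url)
      .option("query", "SELECT large_number FROM example_table_with_decimal")
      .option("user", "user1")
      .option("password", "password")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      # Override the inferred decimal(35,5) and read the column as a plain string
      .option("customSchema", "LARGE_NUMBER STRING")
      .load())
df.printSchema()  # expected: LARGE_NUMBER: string (nullable = true)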
09-10-2024
04:02 AM
1 Kudo
@zhuodongLi Upon reviewing the screenshot, it was observed that the child tasks failed due to "too many output errors". It is recommended to validate the failed attempts and determine whether the "blamed for read error" messages point to the same NodeManager host. Please review the NodeManager logs for that node during the relevant time period. If feasible, consider stopping the NodeManager on that host and then rerunning the query. Additionally, please follow the instructions provided in the KB article to remove the usercache directories from YARN. After completing these steps, re-run the query.
09-04-2024
05:35 AM
If setting the proper queue name resolves the problem, it is possible that the query was being submitted to the default queue, where it competes for resources with other queries and fails with a timeout error.
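The engine behind the query isn't shown in this thread, so purely as one illustration, here is a minimal sketch of pinning a Spark application to a specific (non-default) YARN queue at submission time; the queue name below is a placeholder and must exist in your YARN scheduler configuration:

from pyspark.sql import SparkSession

# Submit the application to a dedicated YARN queue instead of 'default'
# ("etl" is a placeholder queue name)
spark = (SparkSession.builder
         .appName("queue-example")
         .config("spark.yarn.queue", "etl")
         .getOrCreate())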
09-02-2024
10:22 PM
1 Kudo
Since the failure occurred within the Tez job's child tasks, please share the YARN logs or the complete stack trace from one of the failed child task attempts. This will help us identify the root cause of the failure and provide appropriate recommendations to resolve the problem.
08-29-2024
11:09 PM
1 Kudo
You need to use the Hive Warehouse Connector (HWC) to query Hive managed tables from Spark. Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
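For reference, a minimal PySpark sketch of reading a managed (ACID) table through HWC, assuming the HWC jar and the pyspark_llap module are on the application's classpath/PYTHONPATH and the spark.datasource.hive.warehouse.* and HiveServer2 JDBC URL settings from the linked document are in place; db.managed_table is a placeholder:

from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession
hive = HiveWarehouseSession.session(spark).build()

# Read the managed table through HWC instead of plain spark.sql()/spark.table()
df = hive.executeQuery("SELECT * FROM db.managed_table")
df.show()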
08-28-2024
01:40 AM
Unfortunately, it is not possible to change the Application-Name of an already started Application Master in Apache Hadoop YARN. The Application-Name is set when the application is submitted and cannot be modified during runtime. The Application-Name is typically specified as a parameter when submitting the application using the spark-submit command or the YARN REST API. Once the application is started, the Application-Name is fixed and cannot be changed. If you need to change the Application-Name, you will need to stop the existing application and submit a new one with the desired name.
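For illustration, a minimal sketch of where the name gets fixed on the Spark side: it has to be supplied when the session/application is created (or via spark-submit --name), not afterwards; the name below is a placeholder:

from pyspark.sql import SparkSession

# The YARN application name can only be set before the application starts,
# e.g. here at session creation time (or with: spark-submit --name <name> ...)
spark = (SparkSession.builder
         .appName("my-desired-app-name")
         .getOrCreate())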
08-28-2024
12:45 AM
When connecting through ODBC, make sure that Ranger is enabled on the connected HiveServer2 (HS2). If possible, validate the configuration with LLAP for further verification.
08-28-2024
12:40 AM
1 Kudo
When writing to a statically partitioned table using HWC, the following query is internally fired to Hive through JDBC after writing data to a temporary location:

Spark write statement:
df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1='val1',c2='val2'").option("table", "t1").save();

HWC internal query:
LOAD DATA INPATH '<spark.datasource.hive.warehouse.load.staging.dir>' [OVERWRITE] INTO TABLE db.t1 PARTITION (c1='val1',c2='val2');

During static partitioning, the partition information is known at compile time, so the staging directory is created in the partition directory.

On the other hand, when writing to a dynamically partitioned table using HWC, the following queries are internally fired to Hive through JDBC after writing data to a temporary location:

Spark write statement:
df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1='val1',c2").option("table", "t1").save();

HWC internal query:
CREATE TEMPORARY EXTERNAL TABLE db.job_id_table(cols....) STORED AS ORC LOCATION '<spark.datasource.hive.warehouse.load.staging.dir>';
INSERT INTO TABLE t1 PARTITION (c1='val1',c2) SELECT <cols> FROM db.job_id_table;

During dynamic partitioning, the partition information is only known at runtime, hence the staging directory is created at the table level. Once the DAG is completed, the MOVE TASK moves the files to the respective partitions.
08-28-2024
12:07 AM
Based on the INFO logs, it appears that there is an open transaction blocking the compaction cleaner process. This requires a separate investigation, so I advise raising a support case to resolve the problem. Additionally, we would need to examine the HMS logs, a backend DB dump, and the output of the "hdfs dfs -ls -R" command.
08-23-2024
06:09 AM
1 Kudo
Since the partition-related information is not mentioned in the write statement, the staging directory is created in the table directory instead of the partition directory.