Created 12-27-2023 10:53 PM
At first, I wanted to reuse my model saved in HDFS, but it reported an error:
It seems the Spark version is not set up correctly. I read the source code and found that Spark calls sc.version when it stores a model. Embarrassingly, the Spark 2.4.0 that ships with the CDH 6.3.2 I use produces the following result: SPARK_VERSION is empty.
I need to use CDH Spark to save and load the model. What should I do in this case? Thanks very much!
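To make the symptom concrete, here is a quick check (a minimal sketch, assuming spark-shell on the affected cluster); sc.version simply exposes the SPARK_VERSION constant from spark-core:

```scala
// Run in spark-shell. Spark ML writers record sc.version in a saved
// model's metadata, so an empty version string breaks save/load.
println(org.apache.spark.SPARK_VERSION) // empty on the affected build
println(sc.version)                     // same value, via SparkContext
```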
Created 12-28-2023 07:21 AM
@Chandler641 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Spark experts @Bharati @jagadeesan who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 12-28-2023 07:31 AM
Hi @Chandler641
You're correct that Spark 2.4.0 is the version compatible with CDH 6.3.2. Have you checked that the Spark setup was done properly? Could you also share how you downloaded and installed the CDH 6.3.2 cluster, since support for CDH/HDP clusters has been discontinued?
References:
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.4.0-cdh6.3.2
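As a quick sanity check on the setup (a sketch, assuming spark-shell is available), you could print the reported version and locate the spark-core jar actually on the classpath; a mismatch here usually points at a wrong or shadowed jar:

```scala
// Print the version Spark reports and the spark-core jar it was loaded from.
println(sc.version)
println(classOf[org.apache.spark.SparkContext]
  .getProtectionDomain.getCodeSource.getLocation)
```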
Created 12-28-2023 05:43 PM
Thanks for the reply! I installed CDH 6.3.2 from a local installation package handed down by my predecessors. I followed an online tutorial to install and configure it, and it has worked very well for a long time. However, I recently discovered this problem of missing version information. Meanwhile, I found that the package is no longer accessible at the reference link you provided.
Created 12-29-2023 06:35 PM
After several days of trying, I studied the source code and found that the package object named package in the spark-core module reads a properties file and sets SPARK_VERSION. I rewrote it as a static version string instead of obtaining it from the properties file, then recompiled the spark-core module and replaced it. Finally, it works!
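For anyone hitting the same problem, the change looks roughly like this (a simplified sketch of spark-core's package.scala, not the full CDH source; the real file also defines SPARK_BRANCH, SPARK_REVISION, and friends from the same properties file):

```scala
// core/src/main/scala/org/apache/spark/package.scala (simplified).
// Upstream, SPARK_VERSION is read from spark-version-info.properties:
//   val SPARK_VERSION = SparkBuildInfo.spark_version
// The workaround pins it to a static string instead:
package org.apache

package object spark {
  val SPARK_VERSION = "2.4.0-cdh6.3.2" // hardcoded workaround
}
```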
I know this is just a temporary workaround, and the problem may be caused by a more serious underlying error, such as a problem with the Scala dependency packages. I suspect this because the source code throws an exception when the properties file cannot be read, yet no exception was thrown and the properties file really does exist. I hope others with better insights will share them with me. Thanks again for paying attention to this issue! Good luck!
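To probe whether the properties file is really visible at runtime (a diagnostic sketch; paste into spark-shell), one can mimic the same classpath lookup that spark-core's package object performs:

```scala
import java.util.Properties

// Mimic spark-core's lookup of spark-version-info.properties to check
// whether the file is visible on the runtime classpath and what it contains.
val in = Thread.currentThread().getContextClassLoader
  .getResourceAsStream("spark-version-info.properties")
if (in == null) {
  println("spark-version-info.properties NOT found on the classpath")
} else {
  try {
    val props = new Properties()
    props.load(in)
    println(s"version = ${props.getProperty("version")}")
  } finally in.close()
}
```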
Created 01-02-2024 03:38 AM
Thanks, @Chandler641. Your issue is resolved after building the Spark code properly.
Note: We do not support upstream Spark installations in our Cloudera clusters, because we have done a lot of customization in Cloudera Spark to support multiple integration components.
Please let me know if you have further concerns on this issue.