Created on 03-31-2021 10:42 AM - edited 03-31-2021 10:44 AM
I am trying to build a new Maven Spark project, but I cannot pull the Spark Maven dependencies. I keep getting the following error:
Could not resolve dependencies for project za.co.sanlam.custhub.spark:party-matching-rules:jar:1.0.0-SNAPSHOT: Failed to collect dependencies at org.apache.spark:spark-core_2.11:jar:2.4.0.7.1.4.9-1 -> org.apache.avro:avro:jar:1.8.2.7.1.4.9-1: Failed to read artifact descriptor for org.apache.avro:avro:jar:1.8.2.7.1.4.9-1: Could not transfer artifact org.apache.avro:avro:pom:1.8.2.7.1.4.9-1 from/to cloudera (https://repository.cloudera.com/artifactory/cloudera-repos/): Transfer failed for https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/avro/avro/1.8.2.7.1.4.9-1/avro-1.8.2.7.1.4.9-1.pom ProxyInfo{host='10.0.0.132', userName='null', port=8080, type='http', nonProxyHosts='null'}: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target ->
This happens with mvn clean compile or any other Maven command, since they all fail while downloading the dependencies.
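(To see which proxy and mirror configuration Maven is actually applying here, the effective settings can be dumped with the standard maven-help-plugin goal; this is just a diagnostic, not a fix:)

mvn help:effective-settings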
In my POM I have (nothing fancy):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>za.co.my.project</groupId>
<artifactId>My-Project</artifactId>
<version>1.0.${revision}</version>
<name>My-Project</name>
<url>http://party-matching-rules</url>
<properties>
<revision>0-SNAPSHOT</revision> <!--Default if not given via ci-->
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<java.version>1.8</java.version>
<scala.version>2.11</scala.version>
<spark.version>2.4.0.7.1.4.9-1</spark.version>
<spark.scope>provided</spark.scope>
<hwc.version>1.0.0.7.1.4.9-11</hwc.version>
<maven-compiler-plugin.version>3.8.0</maven-compiler-plugin.version>
<maven-shade-plugin.version>3.2.3</maven-shade-plugin.version>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.3000.7.1.4.9-1</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>com.hortonworks.hive</groupId>
<artifactId>hive-warehouse-connector_2.11</artifactId>
<version>1.0.0.7.1.4.9-1</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compiler-plugin.version}</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugins>
</build>
<repositories>
<repository>
<id>cloudera</id>
<name>Cloudera public repo</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
<id>central</id>
<name>Maven Plugin Repository</name>
<url>https://repo1.maven.org/maven2</url>
<layout>default</layout>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>central</id>
<name>Central Repository</name>
<url>https://repo.maven.apache.org/maven2</url>
<layout>default</layout>
</pluginRepository>
</pluginRepositories>
</project>
I have read that this means Java doesn't trust the SSL certificate from https://repository.cloudera.com.
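From what I have read, if the proxy is intercepting TLS, the usual workaround is to import the proxy's CA certificate into the JVM truststore, something like this (the .pem file name and alias are placeholders for whatever CA certificate the proxy actually presents):

# Import the intercepting proxy's CA certificate into the JDK 8 truststore
# (on JDK 9+ the default truststore lives at $JAVA_HOME/lib/security/cacerts)
keytool -importcert -trustcacerts -alias corp-proxy-ca \
  -file corp-proxy-ca.pem \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" \
  -storepass changeit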
I am not quite sure how to proceed, though; I can't get past this point.
Any help would be appreciated.
Created 04-08-2021 09:18 AM
Are you able to download any dependencies from Maven Central? If not, remove the Cloudera libraries and try the open-source Apache artifacts instead; that will confirm whether the Cloudera repo is the issue.
Created 04-12-2021 08:20 AM
I have done a test, and pulling the standard Spark dependencies from Maven Central works fine.
So this works:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>XXXXX</groupId>
<artifactId>XXXXXXXX</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
<scala.version>2.12</scala.version>
<spark.version>3.0.0</spark.version>
<hwc.version>1.0.0.7.1.4.9-11</hwc.version>
<maven-compiler-plugin.version>3.8.0</maven-compiler-plugin.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>central</id>
<name>Maven Plugin Repository</name>
<url>https://repo1.maven.org/maven2</url>
<layout>default</layout>
<snapshots>
<enabled>false</enabled>
</snapshots>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
</repository>
</repositories>
<!-- Build -->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compiler-plugin.version}</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
But this gives an error:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>XXXXX</groupId>
<artifactId>XXXXXXXX</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
<scala.version>2.11</scala.version>
<spark.version>2.4.0.7.1.4.9-1</spark.version>
<spark.scope>provided</spark.scope>
<hwc.version>1.0.0.7.1.4.9-11</hwc.version>
<maven-compiler-plugin.version>3.8.0</maven-compiler-plugin.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
<!--We are using HIVE 3.1.3-->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.3000.7.1.4.9-1</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>com.hortonworks.hive</groupId>
<artifactId>hive-warehouse-connector_2.11</artifactId>
<version>1.0.0.7.1.4.9-1</version>
<scope>${spark.scope}</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>cloudera</id>
<name>Cloudera public repo</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
<id>central</id>
<name>Maven Plugin Repository</name>
<url>https://repo1.maven.org/maven2</url>
<layout>default</layout>
</repository>
</repositories>
<!-- Build -->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compiler-plugin.version}</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
The error again is:
[INFO] Building XXXXXXXX 1.0.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from internal-repository: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.4.0.7.1.4.9-1/spark-core_2.11-2.4.0.7.1.4.9-1.pom
Downloading from internal-repository: https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.4.0.7.1.4.9-1/spark-sql_2.11-2.4.0.7.1.4.9-1.pom
Downloading from internal-repository: https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/3.1.3000.7.1.4.9-1/hive-jdbc-3.1.3000.7.1.4.9-1.pom
Downloading from internal-repository: https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.11/2.4.0.7.1.4.9-1/spark-hive_2.11-2.4.0.7.1.4.9-1.pom
Downloading from internal-repository: https://repo1.maven.org/maven2/com/hortonworks/hive/hive-warehouse-connector_2.11/1.0.0.7.1.4.9-1/hive-warehouse-connector_2.11-1.0.0.7.1.4.9-1.pom
[INFO] ------------------------------------------------------------------------
/repo1.maven.org/maven2): Transfer failed for https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.4.0.7.1.4.9-1/spark-core_2.11-2.4.0.7.1.4.9-1.pom ProxyInfo{host='proxysouth.mud.internal.co.za', userName='null', port=8080, type='http', nonProxyHosts='null'}: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Created 04-15-2021 06:35 AM
For anybody facing this issue: I found that it was the corporate firewall/proxy blocking Maven's access to the Cloudera repository. The solution is to add the Cloudera repository to a corporate Nexus or Artifactory proxy group, since one is likely already in use and already trusted by Maven. You must also remove the Cloudera repository reference from your POM and from settings.xml, or the build still fails.
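As a rough sketch (the id and URL below are placeholders for your own Nexus/Artifactory instance, and the proxy group must include cloudera-repos), the mirror entry in settings.xml would look something like:

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <!-- Route all repository traffic through the corporate proxy group,
         whose TLS certificate the build hosts already trust -->
    <mirror>
      <id>corporate-group</id>
      <name>Corporate Nexus/Artifactory proxy group</name>
      <url>https://nexus.example.internal/repository/maven-public/</url>
      <mirrorOf>*</mirrorOf>
    </mirror>
  </mirrors>
</settings>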
Created 04-15-2021 06:54 AM
Most likely your company firewall is blocking Maven's downloads from the Cloudera/Central repositories.
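A quick way to confirm this from the build host is to fetch the repository URL through the proxy and watch the TLS handshake that curl prints; if the certificate chain is signed by a corporate CA rather than a public one, the proxy is intercepting the connection. (The proxy host/port below are taken from the ProxyInfo in the original error.)

# -v prints the TLS handshake, including the certificate chain the client sees
curl -v -x http://10.0.0.132:8080 https://repository.cloudera.com/artifactory/cloudera-repos/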