Created on 07-06-2016 04:24 PM - edited 09-16-2022 01:35 AM
Developing solutions with Hadoop commonly requires the use of several different HDP component libraries. Whether you’re building solutions with Pig, Spark, Cascading, or HBase, at some point you will need to create extensions, and to build them you will need the artifacts for each component.
This guide serves as an overview of where to find those artifacts and how to get them quickly integrated with your preferred build tool and IDE.
At Hortonworks, we store all of our artifacts in a public Sonatype Nexus repository. That repository can be easily accessed and searched for commonly used library, source code, and javadoc archives simply by navigating to http://repo.hortonworks.com.
Jar files containing compiled classes, source, and javadocs are all available in our public repository, and finding the right artifact with the right version is as easy as searching the repository for the classes you need to resolve.
For example, if you are creating a solution that requires a class such as org.apache.hadoop.fs.FileSystem, you can search our public repository for the artifact that contains that class using the search capabilities available through http://repo.hortonworks.com. Searching for that class will locate the hadoop-common artifact that is part of the org.apache.hadoop group. There will be multiple artifacts, each with a different version.
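Artifacts in our repository use a 7 digit version scheme followed by a build number. So if we’re looking at the 2.7.1.2.3.2.0-2650 version of this artifact: the first three digits (2.7.1) are the Apache Hadoop base version, the next four digits (2.3.2.0) are the HDP release, and the number after the hyphen (2650) is the build number.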
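To make the version scheme concrete, here is a minimal shell sketch that splits such a version string into its parts. It assumes the layout is three digits for the Apache base version, four digits for the HDP release, and a build number after the hyphen; the variable names are only illustrative.

```shell
# Split an HDP artifact version into its parts.
# Assumed layout: <3-digit Apache version>.<4-digit HDP release>-<build>
version="2.7.1.2.3.2.0-2650"

build="${version##*-}"     # everything after the last hyphen
digits="${version%-*}"     # everything before the last hyphen

apache_version="$(echo "$digits" | cut -d. -f1-3)"   # first three dot-separated fields
hdp_release="$(echo "$digits" | cut -d. -f4-7)"      # next four dot-separated fields

echo "Apache base version: $apache_version"
echo "HDP release:         $hdp_release"
echo "Build number:        $build"
```

The HDP release and build number are the parts to compare against the HDP version installed on your target cluster.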
As you’re looking for the right artifact, it’s important to use the artifact version that corresponds to the HDP version you plan to deploy to. You can determine this by running hdp-select versions from the command line, or in Ambari by navigating to Admin > Stack and Versions. If neither of these is available in your version of HDP or Ambari, you can use yum, zypper, or dpkg to query the RPM or Debian packages installed for HDP and note their versions.
Once the right artifact has been found with the version that corresponds to your target HDP environment, it’s time to configure your build tool to both resolve our repository and include the artifact as a dependency. The following sections outline how to do both with commonly used build tools such as Maven, SBT, and Gradle.
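As a rough sketch, the HDP version reported by the cluster can be combined with a component's Apache base version to form the artifact version to look for. The helper name artifact_version is hypothetical, and the sketch assumes that hdp-select versions prints one version per line in a form such as 2.3.2.0-2650; check the actual output on your cluster before relying on it.

```shell
# Hypothetical helper: build the artifact version for a component from its
# Apache base version and an HDP version string such as 2.3.2.0-2650.
artifact_version() {
  base_version="$1"   # e.g. 2.7.1 for hadoop-common
  hdp_version="$2"    # e.g. 2.3.2.0-2650
  printf '%s.%s\n' "$base_version" "$hdp_version"
}

# On a cluster node you might pass in the real value instead
# (assumes hdp-select prints one version per line):
#   artifact_version 2.7.1 "$(hdp-select versions | tail -n 1)"
artifact_version 2.7.1 2.3.2.0-2650
```

The resulting string is what you would then search for in the repository and use as the dependency version in your build file.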
Apache Maven is an incredibly flexible build tool used by many Hadoop ecosystem projects. In this section we will outline what updates to your project’s pom.xml file are required to start resolving HDP artifacts.
The pom.xml file enables flexible definition of project dependencies and build procedures. To add the Hortonworks repository to your project, allowing HDP artifacts to be resolved, edit the <repositories/> section and add a <repository/> entry as illustrated below:
<repositories>
  <repository>
    <id>HDP</id>
    <name>HDP Releases</name>
    <url>http://repo.hortonworks.com/content/repositories/releases/</url>
  </repository>
</repositories>
Dependencies are added to Maven using the <dependency/> tag within the <dependencies/> section of the pom.xml. To add a dependency such as hadoop-common, add this fragment:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.1.2.3.2.0-2650</version>
</dependency>
Once the repository has been added to the <repositories/> section and the artifacts have been added to the <dependencies/> section, run mvn compile from the base directory of your project to verify that the syntax is correct and that the appropriate dependencies are downloaded.
When using Maven with an IDE, it is often helpful to have the accompanying JavaDoc and source code. To obtain both from our repository for the artifacts that you have defined in your pom.xml, run the following commands from the base directory of your project:
mvn dependency:sources
mvn dependency:resolve -Dclassifier=javadoc
SBT is the build tool commonly used with Scala-based projects. It provides simple configuration and many flexible options for dependency and build management.
In order for SBT projects to resolve Hortonworks Data Platform dependencies, an additional resolvers entry must be added to your build.sbt file, or equivalent, as illustrated below:
resolvers += "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/"
Dependencies can be added to SBT’s libraryDependencies as illustrated below:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.1.2.3.2.0-2650"
To explicitly ask SBT to also download source code and JavaDocs, an alternate notation can be used:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.1.2.3.2.0-2650" withSources() withJavadoc()
Once the repository has been added to resolvers and the artifacts have been added to libraryDependencies, run sbt compile from the base directory of your project to verify that the syntax is correct and that the appropriate dependencies are downloaded.
The Gradle build management tool is used frequently in open source Java projects, and provides a simple Groovy-based DSL for project dependency and build definition.
Gradle uses plugins to add new tasks, domain objects, and conventions to your Gradle build. Add the following plugins to your build.gradle file, or equivalent, as illustrated below:
apply plugin: 'java'
apply plugin: 'maven'
apply plugin: 'idea'    // Pick IDE appropriate for you
apply plugin: 'eclipse' // Pick IDE appropriate for you
In order for Gradle projects to resolve Hortonworks Data Platform dependencies, an additional entry must be added to your build.gradle file, or equivalent, as illustrated below:
repositories {
    maven { url "http://repo.hortonworks.com/content/repositories/releases/" }
}
Dependencies can be added to Gradle’s dependencies section as illustrated below:
dependencies {
    compile group: "org.apache.hadoop", name: "hadoop-common", version: "2.7.1.2.3.2.0-2650"
}

idea { // Pick IDE appropriate for you
    module {
        downloadJavadoc = true
        downloadSources = true
    }
}

eclipse { // Pick IDE appropriate for you
    classpath {
        downloadSources = true
        downloadJavadoc = true
    }
}
Once the repositories and the dependencies have been added to the build file, run gradle clean build from the base directory of your project to verify that the syntax is correct and that the appropriate dependencies are downloaded.