Created on 08-27-2016 06:45 PM - edited 08-17-2019 10:21 AM
I have written a small Java 8 Spring Boot application to view tweets that I ingested with NiFi and stored as JSON files in HDFS. An external Hive table sits on top of those files. From the raw tweets, I ran a Spark Scala job that added Stanford CoreNLP sentiment and saved the result to an ORC Hive table. That is the table I query in my Spring Boot visualization program. To show something in a simple AngularJS HTML5 page, I query that microservice, which also has a method that calls Spring Social Twitter to get live tweets. For this you will need JDK 1.8 and Maven installed on your machine or VM. I used Eclipse as my IDE here, though I usually use IntelliJ; either will work fine.
Java Bean
I have specified only a few of the fields. This bean holds the Hive data and is serialized to JSON for transport to AngularJS.
import java.io.Serializable;

public class Twitter2 implements Serializable {
    private static final long serialVersionUID = 7409772495079484269L;
    private String geo;
    private String unixtime;
    private String handle;
    private String location;
    private String tag;
    private String tweet_id;
    ....
}
Core Spring Boot App
package com.dataflowdeveloper;
import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.social.twitter.api.Twitter;
import org.springframework.social.twitter.api.impl.TwitterTemplate;
// @SpringBootApplication already combines @Configuration, @EnableAutoConfiguration
// and @ComponentScan; the explicit annotations are redundant but harmless.
@Configuration
@ComponentScan
@EnableAutoConfiguration
@SpringBootApplication
public class HiveApplication {

    public static void main(String[] args) {
        SpringApplication.run(HiveApplication.class, args);
    }

    @Configuration
    @Profile("default")
    static class LocalConfiguration {
        Logger logger = LoggerFactory.getLogger(LocalConfiguration.class);

        @Value("${consumerkey}")
        private String consumerKey;

        @Value("${consumersecret}")
        private String consumerSecret;

        @Value("${accesstoken}")
        private String accessToken;

        @Value("${accesstokensecret}")
        private String accessTokenSecret;

        // Spring Social Twitter client built from the credentials above
        @Bean
        public Twitter twitter() {
            Twitter twitter = null;
            try {
                twitter = new TwitterTemplate(consumerKey, consumerSecret, accessToken, accessTokenSecret);
            } catch (Exception e) {
                logger.error("Error:", e);
            }
            return twitter;
        }

        @Value("${hiveuri}")
        private String databaseUri;

        @Value("${hivepassword}")
        private String password;

        @Value("${hiveusername}")
        private String username;

        // DBCP connection pool pointed at HiveServer2 through the Hive JDBC driver
        @Bean
        public DataSource dataSource() {
            BasicDataSource dataSource = new BasicDataSource();
            dataSource.setUrl(databaseUri);
            dataSource.setDriverClassName("org.apache.hive.jdbc.HiveDriver");
            dataSource.setUsername(username);
            dataSource.setPassword(password);
            // Informational message, so log at INFO rather than ERROR
            logger.info("Initialized Hive");
            return dataSource;
        }
    }
}
Rest Controller
This is a Spring Boot class annotated with @RestController. It exposes a pretty simple query endpoint that can be called from curl or any REST client, such as AngularJS via:
$http({method: 'GET', url: '/query/' + $query}).success(function(data) {
    $scope.tweetlist = data; // response data
});
...
@RequestMapping("/query/{query}")
public List<Twitter2> query(@PathVariable(value="query") String query)
{
    return dataSourceService.search(query);
}
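To show how a non-browser client could call this endpoint, here is a minimal Java 8 sketch that uses only the JDK. The class and method names are hypothetical, and the base URL you pass in is an assumption (for example http://localhost:9999, the port mentioned later in this article).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch; not part of the original application.
final class QueryClientSketch {

    // Builds baseUrl + "/query/{query}", percent-encoding the search text
    // so it is safe to use as a path segment.
    static String buildQueryUrl(String baseUrl, String query) {
        try {
            // URLEncoder targets form encoding and emits "+" for spaces;
            // a path segment needs "%20" instead.
            String encoded = URLEncoder.encode(query, "UTF-8").replace("+", "%20");
            return baseUrl + "/query/" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Issues a GET request and returns the raw JSON body as a string.
    static String fetch(String baseUrl, String query) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(buildQueryUrl(baseUrl, query)).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        } finally {
            connection.disconnect();
        }
        return body.toString();
    }
}
```

With the application running, a call such as QueryClientSketch.fetch("http://localhost:9999", "hadoop") should return the same JSON list that the AngularJS page receives.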
Datasource Service
Just regular plain old JDBC.
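As a rough illustration of that plain-JDBC pattern, here is a sketch of what a search method could look like. It is not my actual implementation (the body of search is elided in the listing that follows). The table name tweets_orc, the msg column, and the TweetSearchSketch and TweetRow names are hypothetical; only handle, location and tweet_id come from the bean shown earlier.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

// Hypothetical sketch: the table "tweets_orc" and the column "msg" are
// assumptions, not names taken from the article.
final class TweetSearchSketch {

    // Minimal stand-in for the Twitter2 bean so the sketch is self-contained.
    static final class TweetRow {
        final String handle;
        final String location;
        final String tweetId;

        TweetRow(String handle, String location, String tweetId) {
            this.handle = handle;
            this.location = location;
            this.tweetId = tweetId;
        }
    }

    // Builds the SQL text. The limit is parsed and validated as a positive
    // integer before being inlined; the search term is bound as a parameter.
    static String buildSql(String queryLimit) {
        int limit = Integer.parseInt(queryLimit.trim());
        if (limit <= 0) {
            throw new IllegalArgumentException("querylimit must be positive: " + queryLimit);
        }
        return "SELECT handle, location, tweet_id FROM tweets_orc "
                + "WHERE msg LIKE ? LIMIT " + limit;
    }

    // Wraps the user's text in wildcards for a substring match.
    // Any % or _ already in the input is treated as a wildcard too.
    static String likePattern(String query) {
        return "%" + query + "%";
    }

    // try-with-resources closes the connection, statement and result set.
    static List<TweetRow> search(DataSource dataSource, String query, String queryLimit)
            throws SQLException {
        List<TweetRow> rows = new ArrayList<>();
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(buildSql(queryLimit))) {
            statement.setString(1, likePattern(query));
            try (ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    // Columns are read by position, matching the SELECT list.
                    rows.add(new TweetRow(
                            results.getString(1),
                            results.getString(2),
                            results.getString(3)));
                }
            }
        }
        return rows;
    }
}
```

Binding the search text with setString rather than concatenating it into the SQL keeps user input from the URL out of the statement text.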
package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
@Component("DataSourceService")
public class DataSourceService {
    Logger logger = LoggerFactory.getLogger(DataSourceService.class);

    @Autowired
    public DataSource dataSource;

    public Twitter2 defaultValue() {
        return new Twitter2();
    }

    @Value("${querylimit}")
    private String querylimit;

    public List<Twitter2> search(String query) {
        ..
    }
}
application.properties
Under src/main/resources I have a properties file (could be YAML or properties style) with a few name/value pairs like hivepassword=secretstuff.
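Pulling together the ${...} placeholders used in the classes above, the file might look roughly like the fragment below. Every value is a placeholder or an assumption: the hiveuri value is the JDBC URI that appears in the log output later in this article, the querylimit value is made up, and server.port is Spring Boot's standard property for the port (9999 is the port mentioned later).

```properties
# Twitter API credentials (placeholder values)
consumerkey=your-consumer-key
consumersecret=your-consumer-secret
accesstoken=your-access-token
accesstokensecret=your-access-token-secret

# HiveServer2 connection (user name is a placeholder)
hiveuri=jdbc:hive2://localhost:10000/default
hiveusername=your-hive-user
hivepassword=secretstuff

# Maximum rows returned per query (assumed value)
querylimit=250

# HTTP port for the embedded server
server.port=9999
```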
Maven Build Script
Spring Boot, Hadoop and Hive each pull in their own copy of log4j, which caused me some issues, so see the exclusions in my POM to prevent build problems.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.dataflowdeveloper</groupId>
    <artifactId>hive</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>hive</name>
    <description>Apache Hive Spring Boot</description>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.0.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-tomcat</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jetty</artifactId>
            <scope>provided</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.2.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.eclipse.jetty.aggregate</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>javax.servlet</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.1</version>
            <type>jar</type>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>javax.servlet</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.1</version>
            <exclusions>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>javax.servlet</groupId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
            <type>jar</type>
            <exclusions>
                <exclusion>
                    <artifactId>servlet-api</artifactId>
                    <groupId>javax.servlet</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-social-twitter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-jdbc-core</artifactId>
            <version>1.2.1.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
To Build
mvn package -DskipTests
To Run
I have included Jetty in my POM, so the server runs with Jetty.
java -Xms512m -Xmx512m -Dhdp.version=2.4.0.0-169 -Djava.net.preferIPv4Stack=true -jar target/hive-0.0.1-SNAPSHOT.jar
The IPv4 stack setting is required in some networking environments, and I set the HDP version to match the Sandbox I am calling. If you are using the Sandbox, make sure the Thrift port is open and available. You may need more RAM depending on what you are doing. A few gigabytes wouldn't hurt if you have it.
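One quick way to confirm that the Thrift port is reachable before starting the application is a plain TCP connect test. The sketch below uses only the JDK; the class and method names are hypothetical. It only shows that something is listening on the port, not that HiveServer2 is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of the original application.
final class ThriftPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unresolvable host: treat all as not reachable.
            return false;
        }
    }
}
```

For the JDBC URI shown in the log output in this article, the call would be ThriftPortCheck.isPortOpen("localhost", 10000, 2000).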
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.982 s
[INFO] Finished at: 2016-08-26T16:35:56-04:00
[INFO] Final Memory: 28M/447M
[INFO] ------------------------------------------------------------------------
Spring Boot lets you set an ASCII art banner, as seen above, with src/main/resources/banner.txt. You can see I set the port to 9999 so as not to collide with Ambari or other HDP services.
2016-08-26 17:11:28.721 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.Utils : Supplied authorities: localhost:10000
2016-08-26 17:11:28.722 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.Utils : Resolved authority: localhost:10000
2016-08-26 17:11:28.722 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.HiveConnection : Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default
2016-08-26 17:12:24.768 ERROR 38783 --- [tp1841396611-12] com.dataflowdeveloper.DataSourceService : Size=1
2016-08-26 17:12:24.768 ERROR 38783 --- [tp1841396611-12] com.dataflowdeveloper.DataController : Query:hadoop,IP:127.0.0.1 Browser:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
URLs Made Available By Application
http://localhost:9999/timeline/<twitter handle>
http://localhost:9999/profile/<twitter handle>
http://localhost:9999/query/<hive query text>
http://localhost:9999/?query=hadoop
References
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
Created on 02-28-2017 05:27 PM
Is it possible for the input to come from a search bar on a website's HTML page and be used to query HDFS through the above microservice?
What I understood from the above microservice implementation is that it provides a Maven-based CLI to query the HDFS database.
Correct me if I am wrong. I am a beginner in Hadoop.
Thanks & Regards
Vikram Pal
Created on 11-22-2017 03:39 PM
Sure, the input can come from anywhere you want, since it is REST: GET or POST.