1973 Posts · 1225 Kudos Received · 124 Solutions
09-07-2016
10:52 PM
https://community.hortonworks.com/questions/10868/best-practices-for-storm-deployment-on-a-hadoop-cl.html https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
09-05-2016
10:52 PM
11 Kudos
JSON Batch to Single Row Phoenix

I grabbed open data on crime from Philly's Open Data (https://www.opendataphilly.org/dataset/crime-incidents); after a free sign-up you get access to JSON crime data (https://data.phila.gov/resource/sspu-uyfa.json). You can grab individual dates or ranges for thousands of records. I wanted to spool each JSON record as a separate HBase row. With the flexibility of Apache NiFi 1.0.0, I can specify run times via cron or another familiar setup.

This is my master flow. First I use GetHTTP to retrieve the SSL JSON messages; I split the records up, store them as raw JSON in HDFS, send some of them via email, format them for Phoenix SQL, and store them in Phoenix/HBase. All with no coding and in a simple flow. For extra output, I can send them to a Riemann server for monitoring.

Setting up SSL for accessing HTTPS data like the Philly crime feed requires a little configuration and knowing which Java JRE you are using to run NiFi. You can run service nifi status to quickly find the JRE.

Split the Records

The open data set has many rows of data; let's split them up and pull out the attributes we want from the JSON.

Phoenix

Another part that requires specific formatting is setting up the Phoenix connection. Make sure you point to the correct driver, and if you have security enabled, make sure that is set.

Load the Data (Upsert)

Once your data is loaded you can check quickly with:

/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure

The SQL for this data set is pretty straightforward:

CREATE TABLE phillycrime (
  dc_dist varchar,
  dc_key varchar not null primary key,
  dispatch_date varchar,
  dispatch_date_time varchar,
  dispatch_time varchar,
  hour varchar,
  location_block varchar,
  psa varchar,
  text_general_code varchar,
  ucr_general varchar);
{"dc_dist":"18","dc_key":"200918067518","dispatch_date":"2009-10-02","dispatch_date_time":"2009-10-02T14:24:00.000","dispatch_time":"14:24:00","hour":"14","location_block":"S 38TH ST / MARKETUT ST","psa":"3","text_general_code":"Other Assaults","ucr_general":"800"}
upsert into phillycrime values ('18', '200918067518', '2009-10-02','2009-10-02T14:24:00.000','14:24:00','14', 'S 38TH ST / MARKETUT ST','3','Other Assaults','800');
!tables
!describe phillycrime

The DC_KEY is unique, so I used that as the Phoenix primary key. Now all the data I get will be added, and any repeats will safely update. Sometimes we may re-fetch some of the same data; that's okay, it will just upsert to the same value.
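If you ever need to load a record outside NiFi, a minimal Java JDBC sketch against Phoenix could look like the following. This is an illustration, not part of the flow above: it assumes the phoenix-client jar is on the classpath and uses the same unsecured quorum as the sqlline command, with values taken from the sample record.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhillyCrimeUpsert {
    public static void main(String[] args) throws Exception {
        // JDBC URL mirrors the sqlline quorum above; adjust to your cluster.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:localhost:2181:/hbase-unsecure");
             PreparedStatement ps = conn.prepareStatement(
                 "upsert into phillycrime values (?,?,?,?,?,?,?,?,?,?)")) {
            ps.setString(1, "18");                      // dc_dist
            ps.setString(2, "200918067518");            // dc_key (primary key)
            ps.setString(3, "2009-10-02");              // dispatch_date
            ps.setString(4, "2009-10-02T14:24:00.000"); // dispatch_date_time
            ps.setString(5, "14:24:00");                // dispatch_time
            ps.setString(6, "14");                      // hour
            ps.setString(7, "S 38TH ST / MARKETUT ST"); // location_block
            ps.setString(8, "3");                       // psa
            ps.setString(9, "Other Assaults");          // text_general_code
            ps.setString(10, "800");                    // ucr_general
            ps.executeUpdate();
            conn.commit(); // Phoenix does not auto-commit by default
        }
    }
}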
09-02-2016
02:57 AM
In Apache NiFi 1.0.0 this is fixed
09-01-2016
02:41 PM
Caused by: java.lang.IllegalArgumentException: No enum constant org.wali.UpdateType.
at java.lang.Enum.valueOf(Enum.java:238) ~[na:1.8.0_91]
at org.wali.UpdateType.valueOf(UpdateType.java:24) ~[nifi-write-ahead-log-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.StateMapSerDe.deserializeRecord(StateMapSerDe.java:76) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.StateMapSerDe.deserializeEdit(StateMapSerDe.java:69) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.StateMapSerDe.deserializeEdit(StateMapSerDe.java:30) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.wali.MinimalLockingWriteAheadLog$Partition.recoverNextTransaction(MinimalLockingWriteAheadLog.java:1028) ~[nifi-write-ahead-log-1.0.0-BETA.jar:1.0.0-BETA]
at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:448) ~[nifi-write-ahead-log-1.0.0-BETA.jar:1.0.0-BETA]
at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:293) ~[nifi-write-ahead-log-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider.init(WriteAheadLocalStateProvider.java:99) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.providers.AbstractStateProvider.initialize(AbstractStateProvider.java:34) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.manager.StandardStateManagerProvider.createStateProvider(StandardStateManagerProvider.java:189) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.manager.StandardStateManagerProvider.createLocalStateProvider(StandardStateManagerProvider.java:81) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.state.manager.StandardStateManagerProvider.create(StandardStateManagerProvider.java:67) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.FlowController.<init>(FlowController.java:470) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.controller.FlowController.createStandaloneInstance(FlowController.java:381) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.apache.nifi.spring.FlowControllerFactoryBean.getObject(FlowControllerFactoryBean.java:74) ~[nifi-framework-core-1.0.0-BETA.jar:1.0.0-BETA]
at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:168) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
... 36 common frames omitted
2016-09-01 14:46:23,006 INFO [main] /nifi-content-viewer No Spring WebApplicationInitializer types detected on classpath
2016-09-01 14:46:23,024 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@3f465ae{/nifi-content-viewer,file:///opt/nifi-1.0.0-BETA/work/jetty/nifi-web-content-viewer-1.0.0-BETA.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.0.0-BETA.nar-unpacked/META-INF/bundled-dependencies/nifi-web-content-viewer-1.0.0-BETA.war}
2016-09-01 14:46:23,025 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.s.h.ContextHandler@55731ccd{/nifi-docs,null,AVAILABLE}
2016-09-01 14:46:23,070 INFO [main] /nifi-docs No Spring WebApplicationInitializer types detected on classpath
2016-09-01 14:46:23,089 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@750adad8{/nifi-docs,file:///opt/nifi-1.0.0-BETA/work/jetty/nifi-web-docs-1.0.0-BETA.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.0.0-BETA.nar-unpacked/META-INF/bundled-dependencies/nifi-web-docs-1.0.0-BETA.war}
2016-09-01 14:46:23,146 INFO [main] / No Spring WebApplicationInitializer types detected on classpath
2016-09-01 14:46:23,148 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@3fed0c04{/,file:///opt/nifi-1.0.0-BETA/work/jetty/nifi-web-error-1.0.0-BETA.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.0.0-BETA.nar-unpacked/META-INF/bundled-dependencies/nifi-web-error-1.0.0-BETA.war}
2016-09-01 14:46:23,186 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector@48224381{HTTP/1.1,[http/1.1]}{0.0.0.0:8090}
2016-09-01 14:46:23,187 INFO [main] org.eclipse.jetty.server.Server Started @56596ms
2016-09-01 14:46:23,190 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
at org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextInitialized(ApplicationStartupContextListener.java:93) ~[na:na]
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:837) ~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:533) ~[jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:810) ~[jetty-server-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:345) ~[jetty-servlet-9.3.9.v20160517.jar:9.3.9.v2016051
Labels: Apache NiFi
08-30-2016
12:07 AM
Try double quotes: https://docs.mongodb.com/manual/reference/mongodb-extended-json/
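For example, MongoDB extended JSON expects the type operators as double-quoted keys; a generic illustration (not the asker's exact document):

{"created_at": {"$date": "2016-08-29T00:00:00.000Z"}, "count": {"$numberLong": "42"}}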
08-29-2016
01:41 PM
I have a Spring JDBC application that can connect to Phoenix if I run it from the Sandbox that has Phoenix running (all the ports are open, I think). But if I run it from my PC, it doesn't connect.

2016-08-29 09:26:00.169 INFO 79113 --- [ main] o.a.h.hbase.client.RpcRetryingCaller : Call exception, tries=22, retries=35, started=289781 ms ago, cancelled=false, msg=
2016-08-29 09:26:20.359 INFO 79113 --- [ main] o.a.h.hbase.client.RpcRetryingCaller : Call exception, tries=23, retries=35, started=309971 ms ago, cancelled=false, msg=
2016-08-29 09:26:40.532 INFO 79113 --- [ main] o.a.h.hbase.client.RpcRetryingCaller : Call exception, tries=24, retries=35, started=330144 ms ago, cancelled=false, msg=
2016-08-29 09:27:00.672 INFO 79113 --- [ main] o.a.h.hbase.client.RpcRetryingCaller : Call exception, tries=25, retries=35, started=350284 ms ago, cancelled=false, msg=
2016-08-29 09:27:20.734 INFO 79113 --- [ main] o.a.h.hbase.client.RpcRetryingCaller : Call exception, tries=26, retries=35, started=370346 ms ago, cancelled=false, msg=

Any ideas? I have ports 2181, 16010, 16020, and 16030 open.
Labels: Apache Phoenix
08-28-2016
07:20 PM
Are there firewall issues? Can you connect to Hive from other apps on that machine? That doesn't look like a valid Thrift port. Is that port open? Is hive1.wdp the real hostname? Is Thrift running?
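A quick way to rule out basic reachability is a plain socket connect; a sketch using the hostname from the question and HiveServer2's default Thrift port 10000 (adjust both to your setup):

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) throws Exception {
        // Fails fast with an exception if the host is unknown or the port is closed.
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("hive1.wdp", 10000), 5000);
            System.out.println("Thrift port reachable");
        }
    }
}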
08-27-2016
06:45 PM
3 Kudos
I have written a small Java 8 Spring Boot application to view Twitter tweets that I ingested with NiFi and stored as JSON files in HDFS. I have an external Hive table on top of those raw tweets, and I ran a Spark Scala job that added Stanford CoreNLP sentiment and saved the results to an ORC Hive table. That is the table I am querying in my Spring Boot visualization program. To show something in a simple AngularJS HTML5 page, I also query that microservice, which has a method for calling Spring Social Twitter to get live tweets. For this you will need JDK 1.8 and Maven installed on your machine or VM. I used Eclipse as my IDE, but I usually use IntelliJ; either will work fine.

Java Bean

I have a few of the fields specified; this bean holds our Hive data and is transported to AngularJS as serialized JSON.

public class Twitter2 implements Serializable {
private static final long serialVersionUID = 7409772495079484269L;
private String geo;
private String unixtime;
private String handle;
private String location;
private String tag;
private String tweet_id;
....
}
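Jackson serializes this bean through its accessors; the .... above elides the remaining fields plus the usual getters and setters, which follow the standard pattern (illustrative, not the author's exact code):

public String getHandle() {
    return handle;
}

public void setHandle(String handle) {
    this.handle = handle;
}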
Core Spring Boot App

package com.dataflowdeveloper;
import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.social.twitter.api.Twitter;
import org.springframework.social.twitter.api.impl.TwitterTemplate;
@Configuration
@ComponentScan
@EnableAutoConfiguration
@SpringBootApplication
public class HiveApplication {
public static void main(String[] args) {
SpringApplication.run(HiveApplication.class, args);
}
@Configuration
@Profile("default")
static class LocalConfiguration {
Logger logger = LoggerFactory.getLogger(LocalConfiguration.class);
@Value("${consumerkey}")
private String consumerKey;
@Value("${consumersecret}")
private String consumerSecret;
@Value("${accesstoken}")
private String accessToken;
@Value("${accesstokensecret}")
private String accessTokenSecret;
@Bean
public Twitter twitter() {
Twitter twitter = null;
try {
twitter = new TwitterTemplate(consumerKey, consumerSecret, accessToken, accessTokenSecret);
} catch (Exception e) {
logger.error("Error:", e);
}
return twitter;
}
@Value("${hiveuri}")
private String databaseUri;
@Value("${hivepassword}")
private String password;
@Value("${hiveusername}")
private String username;
@Bean
public DataSource dataSource() {
BasicDataSource dataSource = new BasicDataSource();
dataSource.setUrl(databaseUri);
dataSource.setDriverClassName("org.apache.hive.jdbc.HiveDriver");
dataSource.setUsername(username);
dataSource.setPassword(password);
logger.error("Initialized Hive");
return dataSource;
}
}
}

Rest Controller

This is a Spring Boot class annotated with @RestController. A pretty simple query that can be called from curl or any REST client, like AngularJS via:

$http({method: 'GET', url: '/query/' + $query}).success(function(data) { $scope.tweetlist = data; // response data });

...
@RequestMapping("/query/{query}")
public List<Twitter2> query(@PathVariable(value="query") String query)
{
return dataSourceService.search(query);
}
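Calling http://localhost:9999/query/hadoop then returns the matching rows as a JSON array of Twitter2 beans, shaped roughly like this (values are illustrative, not real data):

[{"geo":"","unixtime":"1472246683","handle":"someuser","location":"Philadelphia","tag":"hadoop","tweet_id":"769000000000000000"}]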
Datasource Service

Just regular plain old JDBC.

package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
@Component("DataSourceService")
public class DataSourceService {
Logger logger = LoggerFactory.getLogger(DataSourceService.class);
@Autowired
public DataSource dataSource;
public Twitter2 defaultValue() {
return new Twitter2();
}
@Value("${querylimit}")
private String querylimit;
public List<Twitter2> search(String query) {
..
}
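	// The search() body is elided above (".."); as a sketch only, a plain-JDBC
	// version might look like the following. The table and column names are
	// assumptions based on the Twitter2 fields, not the author's exact code.
	//
	//   try (Connection conn = dataSource.getConnection();
	//        PreparedStatement ps = conn.prepareStatement(
	//            "select geo, unixtime, handle, location, tag, tweet_id from twitter2"
	//            + " where tag like ? limit " + querylimit)) {
	//       ps.setString(1, "%" + query + "%");
	//       try (ResultSet rs = ps.executeQuery()) {
	//           List<Twitter2> list = new ArrayList<>();
	//           while (rs.next()) {
	//               // copy each column into a Twitter2 bean and add it to list
	//           }
	//           return list;
	//       }
	//   } catch (Exception e) {
	//       logger.error("Query failed", e);
	//       return new ArrayList<>();
	//   }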
}

application.properties

Under src/main/resources I have a properties file (could be YAML or properties style) with a few name/value pairs like hivepassword=secretstuff, matching the @Value placeholders above (hiveuri, hiveusername, the Twitter keys, and querylimit).

Maven Build Script

I had some issues with Spring Boot, Hadoop and Hive having multiple copies of log4j, so see my POM exclusions to prevent build issues.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dataflowdeveloper</groupId>
<artifactId>hive</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>hive</name>
<description>Apache Hive Spring Boot</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.4.0.RELEASE</version>
<relativePath /> <!-- lookup parent from repository -->
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jetty</artifactId>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
<exclusions>
<exclusion>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.1</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-social-twitter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-jdbc-core</artifactId>
<version>1.2.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
To Build

mvn package -DskipTests

To Run

I have included Jetty in my POM, so the server runs with Jetty.

java -Xms512m -Xmx512m -Dhdp.version=2.4.0.0-169 -Djava.net.preferIPv4Stack=true -jar target/hive-0.0.1-SNAPSHOT.jar

The IPv4 stack is required in some networking environments, and I set the HDP version to the Sandbox version I am calling. If you are using the Sandbox, make sure the Thrift port is open and available. You may need more RAM depending on what you are doing; a few gigabytes wouldn't hurt if you have it.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.982 s
[INFO] Finished at: 2016-08-26T16:35:56-04:00
[INFO] Final Memory: 28M/447M
[INFO] ------------------------------------------------------------------------

Spring Boot lets you set an ASCII art banner via src/main/resources/banner.txt. You can see I set the port to 9999 so as not to collide with Ambari or other HDP services.

2016-08-26 17:11:28.721 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.Utils : Supplied authorities: localhost:10000
2016-08-26 17:11:28.722 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.Utils : Resolved authority: localhost:10000
2016-08-26 17:11:28.722 INFO 38783 --- [tp1841396611-12] org.apache.hive.jdbc.HiveConnection : Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default
2016-08-26 17:12:24.768 ERROR 38783 --- [tp1841396611-12] com.dataflowdeveloper.DataSourceService : Size=1
2016-08-26 17:12:24.768 ERROR 38783 --- [tp1841396611-12] com.dataflowdeveloper.DataController : Query:hadoop,IP:127.0.0.1 Browser:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36

URLs Made Available By Application

http://localhost:9999/timeline/<twitter handle>
http://localhost:9999/profile/<twitter handle>
http://localhost:9999/query/<hive query text>
http://localhost:9999/?query=hadoop

References

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC