09-15-2016
03:15 AM
1 Kudo
Do you have any logs? Did both compile successfully? Are you running both? You need to connect to ZooKeeper from the client:
val kafkaConf = Map(
  "metadata.broker.list" -> "sandbox.hortonworks.com:6667",
  "zookeeper.connect" -> "sandbox.hortonworks.com:2181")
Make sure you have ports 2181, 6667 and 9092 open: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_HDP_Reference_Guide/content/kafka-ports.html
Were you able to use the built-in scripts for the console producer and consumer? http://kafka.apache.org/07/quickstart.html
What JDK are you using? Where are you running: on the Sandbox or a local machine? It's best to build both projects and run them from the command line. Also make sure the Kafka service is running in the Sandbox; it may not be. When running from Java, use: -Djava.net.preferIPv4Stack=true
09-15-2016
02:15 AM
Flow File: sensor.xml
09-14-2016
11:54 PM
3 Kudos
If you have seen my article on a microservice on Hive, this is the Phoenix version. Phoenix seems to be a better option for REST microservices. I like having HBase as my main data store; it's very performant and highly scalable for application-style queries. See: https://community.hortonworks.com/articles/53629/writing-a-spring-boot-microservices-to-access-hive.html This microservice is a Spring Boot REST service on top of the data loaded by this NiFi data flow. See: https://community.hortonworks.com/content/kbentry/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
Pom
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dataflowdeveloper</groupId>
<artifactId>phoenix</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>phoenix</name>
<description>Apache Hbase Phoenix Spring Boot</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.4.0.RELEASE</version>
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.0</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api-2.5</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>1.4.0.RELEASE</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jetty</artifactId>
<scope>provided</scope>
<version>1.4.0.RELEASE</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.1</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api-2.5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-social-twitter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.7.4</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.7.4</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.7.4</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-jdbc-core</artifactId>
<version>1.2.1.RELEASE</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Follow this Maven build carefully; the dependency versions are very specific.
src/main/resources/application.properties
purl=jdbc:phoenix:serverIP:2181:/hbase-unsecure
pdriver=org.apache.phoenix.jdbc.PhoenixDriver
querylimit=250
The querylimit property is read by the DataSourceService below to cap result sizes (the example run shown later uses 250).
Java Bean to Hold Philly Crime Data
package com.dataflowdeveloper;
import java.io.Serializable;
public class PhillyCrime implements Serializable {
private String dcDist;
private String dcKey;
private String dispatchDate;
private String dispatchDateTime;
private String dispatchTime;
private String hour;
private String locationBlock;
private String psa;
private String textGeneralCode;
private String ucrGeneral;
// getters and setters omitted for brevity
}
Application Class to Bootstrap Spring Boot
package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.social.twitter.api.Twitter;
import org.springframework.social.twitter.api.impl.TwitterTemplate;
@Configuration
@ComponentScan
@EnableAutoConfiguration
@SpringBootApplication
public class HBaseApplication {
public static void main(String[] args) {
SpringApplication.run(HBaseApplication.class, args);
}
@Configuration
@Profile("default")
static class LocalConfiguration {
Logger logger = LoggerFactory.getLogger(LocalConfiguration.class);
@Value("${consumerkey}")
private String consumerKey;
@Value("${consumersecret}")
private String consumerSecret;
@Value("${accesstoken}")
private String accessToken;
@Value("${accesstokensecret}")
private String accessTokenSecret;
@Bean
public Twitter twitter() {
Twitter twitter = null;
try {
twitter = new TwitterTemplate(consumerKey, consumerSecret, accessToken, accessTokenSecret);
} catch (Exception e) {
logger.error("Error:", e);
}
return twitter;
}
@Value("${purl}")
private String databaseUri;
@Bean
public Connection connection() {
Connection con = null;
try {
con = DriverManager.getConnection(databaseUri);
} catch (SQLException e) {
e.printStackTrace();
logger.error("Connection fail: ", e);
}
//dataSource.setDriverClassName("org.apache.phoenix.jdbc.PhoenixDriver");
logger.error("Initialized hbase");
return con;
}
}
}
Phoenix Query Server does not require a connection pool!
DataSourceService
package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
@Component("DataSourceService")
public class DataSourceService {
Logger logger = LoggerFactory.getLogger(DataSourceService.class);
@Autowired
public Connection connection;
// default to empty
public PhillyCrime defaultValue() {
return new PhillyCrime();
}
// querylimit
@Value("${querylimit}")
private String querylimit;
/**
*
* @param query
* - search msg
* @return List of Twitter2
*/
public List<PhillyCrime> search(String query) {
List<PhillyCrime> crimes = new ArrayList<>();
String sql = "";
try {
logger.error("Query: " + query);
logger.error("Limit:" + querylimit);
if ( connection == null ) {
logger.error("Null connection");
return crimes;
}
if ( query == null || query.trim().length() <= 0 ) {
query = "";
sql = "select * from phillycrime";
}
else {
query = "%" + query.toUpperCase() + "%";
sql = "select * from phillycrime WHERE upper(text_general_code) like ? LIMIT ?";
}
PreparedStatement ps = connection
.prepareStatement(sql);
if ( query.length() > 1 ) {
ps.setString(1, query);
ps.setInt(2, Integer.parseInt(querylimit));
}
ResultSet res = ps.executeQuery();
PhillyCrime crime = null;
while (res.next()) {
crime = new PhillyCrime();
crime.setDcKey(res.getString("dc_key"));
crime.setDcDist(res.getString("dc_dist"));
crime.setDispatchDate(res.getString("dispatch_date"));
crime.setDispatchDateTime(res.getString("dispatch_date_time"));
crime.setDispatchTime(res.getString("dispatch_time"));
crime.setHour(res.getString("hour"));
crime.setLocationBlock(res.getString("location_block"));
crime.setPsa(res.getString("psa"));
crime.setTextGeneralCode(res.getString("text_general_code"));
crime.setUcrGeneral(res.getString("ucr_general"));
crimes.add(crime);
}
res.close();
ps.close();
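// note: closing and nulling the autowired singleton Connection means a later request will hit the "Null connection" check above; for repeated queries you may want to open a fresh connection per request instead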
connection.close();
res = null;
ps = null;
connection = null;
crime = null;
logger.error("Size=" + crimes.size());
} catch (Exception e) {
e.printStackTrace();
logger.error("Error in search", e);
}
return crimes;
}
}
This class does the basic JDBC SQL queries and returns the results.
Spring Boot REST Controller
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.RequestAttributes;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
@RestController
public class DataController {
Logger logger = LoggerFactory.getLogger(DataController.class);
@Autowired
private DataSourceService dataSourceService;
@RequestMapping("/query/{query}")
public List<PhillyCrime> query(
@PathVariable(value="query") String query)
{
List<PhillyCrime> value = dataSourceService.search(query);
return value;
}
}
To Build
mvn package -DskipTests
To Run
java -Xms512m -Xmx2048m -Dhdp.version=2.4.0.0-169 -Djava.net.preferIPv4Stack=true -jar target/phoenix-0.0.1-SNAPSHOT.jar
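Once the jar is running you can hit the REST endpoint from a browser, curl, or a few lines of Java. A minimal client sketch, assuming the service is listening on port 9999 as in the log output below:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class QueryClient {
    public static void main(String[] args) throws Exception {
        // Call the /query/{query} endpoint exposed by DataController
        URL url = new URL("http://localhost:9999/query/Theft");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of PhillyCrime records
            }
        }
        conn.disconnect();
    }
}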
To check your Phoenix data from the command line (2181 is the ZooKeeper port):
/usr/hdp/current/phoenix-client/bin/sqlline.py server:2181:/hbase
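You can also sanity-check the Phoenix connection from plain JDBC before wiring it into Spring. A minimal sketch, assuming the same URL format as the purl property above and that the phillycrime table already exists:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixCheck {
    public static void main(String[] args) throws Exception {
        // Same URL format as the purl property: jdbc:phoenix:<zookeeper host>:2181:/hbase-unsecure
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:serverIP:2181:/hbase-unsecure");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select count(*) from phillycrime")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}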
Phoenix Table DDL
CREATE TABLE phillycrime (dc_dist varchar,
dc_key varchar not null primary key,dispatch_date varchar,dispatch_date_time varchar,dispatch_time varchar,hour varchar,location_block varchar,psa varchar,
text_general_code varchar,ucr_general varchar);
Results
2016-09-14 20:09:25.135 INFO 11937 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2016-09-14 20:09:25.150 INFO 11937 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 0
2016-09-14 20:09:25.275 INFO 11937 --- [ main] application : Initializing Spring FrameworkServlet 'dispatcherServlet'
2016-09-14 20:09:25.275 INFO 11937 --- [ main] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2016-09-14 20:09:25.294 INFO 11937 --- [ main] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 19 ms
2016-09-14 20:09:25.322 INFO 11937 --- [ main] org.mortbay.log : Logging to Logger[org.mortbay.log] via org.mortbay.log.Slf4jLog
2016-09-14 20:09:25.333 INFO 11937 --- [ main] o.e.jetty.server.AbstractConnector : Started ServerConnector@7a55af6b{HTTP/1.1,[http/1.1]}{0.0.0.0:9999}
2016-09-14 20:09:25.335 INFO 11937 --- [ main] .s.b.c.e.j.JettyEmbeddedServletContainer : Jetty started on port(s) 9999 (http/1.1)
2016-09-14 20:09:25.339 INFO 11937 --- [ main] com.dataflowdeveloper.HBaseApplication : Started HBaseApplication in 13.783 seconds (JVM running for 14.405)
2016-09-14 20:09:37.961 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Query: Theft
2016-09-14 20:09:37.961 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Limit:250
2016-09-14 20:09:39.050 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Size=250
2016-09-14 20:09:39.050 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataController : Query:Theft,IP:127.0.0.1 Browser:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Connection Handshake for App
2016-09-14 20:09:18.042 INFO 11937 --- [ main] o.a.h.h.zookeeper.RecoverableZooKeeper : Process identifier=hconnection-0x4116aac9 connecting to ZooKeeper ensemble=server:2181
2016-09-14 20:09:18.042 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=server:2181 sessionTimeout=90000 watcher=hconnection-0x4116aac90x0, quorum=server:2181, baseZNode=/hbase-unsecure
2016-09-14 20:09:18.044 INFO 11937 --- [1.1.1.1:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server 192.11.1.5:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-14 20:09:18.163 INFO 11937 --- [1.1.1.1:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established to 192.11.1.:2181, initiating session
2016-09-14 20:09:18.577 INFO 11937 --- [26.195.56:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server :2181, sessionid = 0x157063034991d14, negotiated timeout = 40000
2016-09-14 20:09:19.953 INFO 11937 --- [ main] nectionManager$HConnectionImplementation : Closing master protocol: MasterService
2016-09-14 20:09:19.953 INFO 11937 --- [ main] nectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x157063034991d14
2016-09-14 20:09:20.040 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Session: 0x157063034991d14 closed
2016-09-14 20:09:20.040 INFO 11937 --- [ain-EventThread] org.apache.zookeeper.ClientCnxn : EventThread shut down
2016-09-14 20:09:20.470 INFO 11937 --- [ main] o.a.p.query.ConnectionQueryServicesImpl : Found quorum: server.com:2181
2016-09-14 20:09:20.471 INFO 11937 --- [ main] o.a.h.h.zookeeper.RecoverableZooKeeper : Process identifier=hconnection-0x36b4fe2a connecting to ZooKeeper ensemble=tspanndev10.field.hortonworks.com:2181
2016-09-14 20:09:20.471 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=tspanndev10.field.hortonworks.com:2181 sessionTimeout=90000 watcher=hconnection-0x36b4fe2a0x0, quorum=tspanndev10.field.hortonworks.com:2181, baseZNode=/hbase-unsecure
2016-09-14 20:09:20.472 INFO 11937 --- [222.2.2.2.2.:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server 1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-14 20:09:20.555 INFO 11937 --- [26:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established to 172....:2181, initiating session
2016-09-14 20:09:20.641 INFO 11937 --- [22.2.2:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server 2.2.2/2.2.2:2181, sessionid = 0x157063034991d15, negotiated timeout = 40000
Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/ch_using-phoenix.html https://community.hortonworks.com/articles/19016/connect-to-phoenix-hbase-using-dbvisualizer.html
09-14-2016
11:29 PM
To install a Mosquitto MQTT server on CentOS 7:
yum -y install unzip
Step 1: Add the CentOS 7 mosquitto repository
cd /etc/yum.repos.d
wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-7/home:oojah:mqtt.repo
sudo yum update
Step 2: Install mosquitto and mosquitto-clients
sudo yum install -y mosquitto mosquitto-clients
Step 3: Run mosquitto
sudo su
/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf > /var/log/mosquitto.log 2>&1
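To verify the broker, you can publish a test message with the mosquitto_pub and mosquitto_sub clients installed above, or from Java with the Eclipse Paho client. A minimal sketch, assuming the org.eclipse.paho.client.mqttv3 dependency is on the classpath; the "test" topic and client id are placeholders:
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class MqttSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect to the local Mosquitto broker on the default port 1883
        MqttClient client = new MqttClient("tcp://localhost:1883", "smoke-test", new MemoryPersistence());
        client.connect();
        // Publish a test message to a test topic
        client.publish("test", new MqttMessage("hello from paho".getBytes()));
        client.disconnect();
    }
}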
09-14-2016
02:59 AM
3 Kudos
Running Spark Jobs Through Apache Beam on HDP 2.5 Yarn Cluster
Using the Spark Runner with Apache Beam
Apache Beam is still incubating and is not supported on HDP 2.5 or other platforms.
sudo yum -y install git
wget http://www.gtlib.gatech.edu/pub/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
After you have downloaded Maven, move it to /opt/demo/maven or into your path. The Maven download mirror will change, so grab a fresh URL from http://maven.apache.org/. Installing Maven via yum will give you an older, unsupported version that may interfere with other packages, so I recommend getting a new Maven just for this build. Make sure you have Java 7 or greater, which you should already have on an HDP node; I recommend Java 8 on your new HDP 2.5 nodes if possible.
cd /opt/demo/
git clone https://github.com/apache/incubator-beam
cd incubator-beam
/opt/demo/maven/bin/mvn clean install -DskipTests
If you want to run this on Spark 2.0 and not Spark 1.6.2, look here for changing the environment: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-choose-version.html
For HDP 2.5, these are the parameters:
spark-submit --class org.apache.beam.runners.spark.examples.WordCount --master yarn-client target/beam-runners-spark-0.3.0-incubating-SNAPSHOT-spark-app.jar --inputFile=kinglear.txt --output=out --runner=SparkRunner --sparkMaster=yarn-client
Note: I had to change the parameters to get this to work in my environment. You may also need to run /opt/demo/maven/bin/mvn package from the /opt/demo/incubator-beam/runners/spark directory. This runs a Java 7 example from the built-in examples: https://github.com/apache/incubator-beam/tree/master/examples/java
These are the results of running our small Spark job:
16/09/14 02:35:08 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 34.0 KB, free 518.7 KB)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.26.195.58:39575 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1008
16/09/14 02:35:08 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[14] at mapToPair at TransformTranslator.java:568)
16/09/14 02:35:08 INFO YarnScheduler: Adding task set 1.0 with 2 tasks
16/09/14 02:35:08 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, tspanndev13.field.hortonworks.com, partition 0,NODE_LOCAL, 1994 bytes)
16/09/14 02:35:08 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, tspanndev13.field.hortonworks.com, partition 1,NODE_LOCAL, 1994 bytes)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on tspanndev13.field.hortonworks.com:36438 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on tspanndev13.field.hortonworks.com:36301 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to tspanndev13.field.hortonworks.com:52646
16/09/14 02:35:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 177 bytes
16/09/14 02:35:08 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to tspanndev13.field.hortonworks.com:52640
16/09/14 02:35:09 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 681 ms on tspanndev13.field.hortonworks.com (1/2)
16/09/14 02:35:09 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1112 ms on tspanndev13.field.hortonworks.com (2/2)
16/09/14 02:35:09 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/09/14 02:35:09 INFO DAGScheduler: ResultStage 1 (saveAsNewAPIHadoopFile at TransformTranslator.java:745) finished in 1.113 s
16/09/14 02:35:09 INFO DAGScheduler: Job 0 finished: saveAsNewAPIHadoopFile at TransformTranslator.java:745, took 5.422285 s
16/09/14 02:35:09 INFO SparkRunner: Pipeline execution complete.
16/09/14 02:35:09 INFO SparkContext: Invoking stop() from shutdown hook
[root@tspanndev13 spark]# hdfs dfs -ls
Found 5 items
drwxr-xr-x - root hdfs 0 2016-09-14 02:35 .sparkStaging
-rw-r--r-- 3 root hdfs 0 2016-09-14 02:35 _SUCCESS
-rw-r--r-- 3 root hdfs 185965 2016-09-14 01:44 kinglear.txt
-rw-r--r-- 3 root hdfs 27304 2016-09-14 02:35 out-00000-of-00002
-rw-r--r-- 3 root hdfs 26515 2016-09-14 02:35 out-00001-of-00002
[root@tspanndev13 spark]# hdfs dfs -cat out-00000-of-00002
oaths: 1
bed: 7
hearted: 5
warranties: 1
Refund: 1
unnaturalness: 1
sea: 7
sham'd: 1
Only: 2
sleep: 8
sister: 29
Another: 2
carbuncle: 1
As you can see, it produced the expected two-part output file in HDFS with the word counts. Not much configuration is required to run your Apache Beam Java jobs on your HDP 2.5 YARN Spark cluster, so if you have a development cluster, or your own HDP 2.5 sandbox, this is a great place to try it out.
Resources: http://beam.incubator.apache.org/learn/programming-guide/ https://github.com/apache/incubator-beam/tree/master/runners/spark
09-13-2016
01:35 AM
2 Kudos
I just installed HDP 2.5 from Ambari on a small OpenStack cluster, choosing Zeppelin from Ambari.
%jdbc(phoenix) select * from phillycrime limit 100
worked for me. Try restarting the jdbc interpreter, and make sure that under the jdbc (%jdbc) interpreter settings you have:
Option: shared interpreter for note
phoenix.driver = org.apache.phoenix.jdbc.PhoenixDriver
phoenix.hbase.client.retries.number = 1
phoenix.password =
phoenix.url = jdbc:phoenix:192.168.1.100:/hbase-unsecure
phoenix.user = phoenixuser
09-12-2016
07:51 PM
Yes, if you use the Zeppelin that is now installed with Spark, this should be resolved.
09-12-2016
07:46 PM
2 Kudos
TitanDB is no longer active. The project is now DataStax Graph, and I haven't heard of or seen any updates in open source. http://www.slideshare.net/MohamedTaherAlRefaie/graph-databases-tinkerpop-and-titan-db I did install it last year and it kind of works. Follow this: http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html Use this page to change your configuration to point to your HDP HBase: http://s3.thinkaurelius.com/docs/titan/1.0.0/configuration.html#_example_configurations Download 1.0.0 with Hadoop 2: https://github.com/thinkaurelius/titan/wiki/Downloads Spark is not used in TitanDB itself; Spark is used with TinkerPop. I would suggest trying Apache TinkerPop with Hadoop Gremlin http://tinkerpop.apache.org/ See the instructions at http://tinkerpop.apache.org/docs/current/reference/#hadoop-gremlin I think that may serve your needs.
09-12-2016
07:18 PM
Take a look at this article, which has very detailed instructions: http://www.coding-daddy.xyz/node/7 You can also run a local Spark server, which is easy. You could also use Zeppelin or the Spark REPL so you can test as you develop, or use the Spark testing framework: https://github.com/holdenk/spark-testing-base
09-09-2016
08:13 PM
10 Kudos
Sensor Reading with Apache NiFi 1.0.0
There are many types of sensors, devices and meters that can be great sources of data. Some can push data, some can pull data, some provide APIs, some give you access to install software.
How To Access Sensors
One option is to install MiNiFi on the device if you have root access. This will provide fast access and allow you to script and manage your local data. Another option for bigger devices is to install a full Java-based NiFi node. It starts becoming harder once you have tens of thousands of devices. You can install an HDF edge node and communicate from this node to your HDF cluster via the Site-to-Site protocol. This edge node acts as an accumulator for many devices, which is a good idea so that you don't send 10,000 network requests a second from each set of devices; keeping as much traffic as possible local saves time, time-outs, networking and cloud costs. You can also aggregate and send larger batches of data, and process some summaries and aggregates locally in NiFi. This also lets you populate local databases, dashboards and statistics that may only be of interest to the local source of the sensors (perhaps a plant manager or automated monitoring system). Another option is to have devices push or pull to a local or remote NiFi install via various protocols including TCP/IP, UDP/IP, REST HTTP, JMS, MQTT, SFTP and email.
Device Push to NiFi
Your device can send messages to NiFi via any of the protocols listed. For my example, I push via MQTT. My local NiFi node will consume these messages via ConsumeMQTT.
Reference: Paho-MQTT
Your device will need to run Linux (or something similar), with Python 2.7 or later and pip installed. With pip, you can install the Eclipse library that you need to send MQTT messages:
pip install paho-mqtt
import paho.mqtt.client as paho
client = paho.Client()
client.connect("servername", 1883, 60)
client.publish("sensor", payload="Test", qos=0, retain=True)
Where "servername" is the name of the server you are sending the message to (it could also be on the NiFi Node, another server, a bigger device, a central aggregator or messaging server). I would recommend having it as close in the network as possible. "sensor" is the name of the topic that we are publishing the message to, NiFi will consume this message. I have cron job setup to run every minute and publish messages (* * * * * /opt/demo/sendit.sh ) NiFi Poll Device NiFi can poll your device and consume from various protocols like JMS, MQTT, SFTP, TCP and UDP. For my example, I chose a REST API over HTTP to get past hurdles of firewalls and such. I setup a Flask Server on RPI, to run my REST API, I run this in a shell script. export FLASK_APP=hello.py
flask run --host=0.0.0.0 --port=8888 --no-debugger
To install Flask, you need to run pip install flask
Sensor Reading Code
#!flask/bin/python
from flask import Flask, jsonify
import sys
import time
import datetime
import subprocess
import urllib2
import json
import paho.mqtt.client as paho
from sense_hat import SenseHat
sense = SenseHat()
sense.clear()
app = Flask(__name__)
@app.route('/pi/api/v1.0/sensors', methods=['GET'])
def get_sensors():
p = subprocess.Popen(['/opt/vc/bin/vcgencmd','measure_temp'], stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
temp = sense.get_temperature()
temp = round(temp, 1)
temph = sense.get_temperature_from_humidity()
temph = round(temph, 1)
tempp = sense.get_temperature_from_pressure()
tempp = round(tempp, 1)
humidity = sense.get_humidity()
humidity = round(humidity, 1)
pressure = sense.get_pressure()
pressure = round(pressure, 1)
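# note: the standard Celsius-to-Fahrenheit conversion is (temp * 1.8) + 32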
tasks = [ { 'tempp': tempp, 'temph': temph, 'cputemp': out, 'temp': temp, 'tempf': ((temp * 1.8) + 12), 'humidity': humidity, 'pressure': pressure } ]
# As an option we can push this message when we get called as well
client = paho.Client()
client.connect("mqttmessageserver", 1883, 60)
client.publish("sensor", payload=jsonify({'readings': tasks}), qos=0, retain=True)
return jsonify({'readings': tasks})
@app.route('/pi/api/v1.0/location', methods=['GET'])
def get_loc():
orientation = sense.get_orientation()
pitch = orientation['pitch']
roll = orientation['roll']
yaw = orientation['yaw']
acceleration = sense.get_accelerometer_raw()
x = acceleration['x']
y = acceleration['y']
z = acceleration['z']
x=round(x, 0)
y=round(y, 0)
z=round(z, 0)
tasks = [ { 'pitch': pitch, 'roll': roll, 'yaw': yaw, 'x': x, 'y': y, 'z': z } ]
return jsonify({'readings': tasks})
@app.route('/pi/api/v1.0/show', methods=['GET'])
def get_pi():
temp = sense.get_temperature()
temp = round(temp, 1)
humidity = sense.get_humidity()
humidity = round(humidity, 1)
pressure = sense.get_pressure()
pressure = round(pressure, 1)
# 8x8 RGB
sense.clear()
info = 'T(C): ' + str(temp) + 'H: ' + str(humidity) + 'P: ' + str(pressure)
sense.show_message(info, text_colour=[255, 0, 0])
sense.clear()
tasks = [ { 'temp': temp, 'tempf': ((temp * 1.8) + 12), 'humidity': humidity, 'pressure': pressure } ]
return jsonify({'readings': tasks})
if __name__ == '__main__':
app.run(debug=True)
The device I am testing is a Raspberry Pi 3 Model B with a Sense HAT sensor attachment. Besides having sensors for temperature, humidity and barometric pressure, it also has an 8x8 light grid for displaying text and simple graphics. We can use this to print messages (sense.show_message) or warnings that we send from NiFi. This allows for two-way, very visceral communication with remote devices and could be used to notify local personnel of conditions.
NiFi 1.0.0 Flows
JSON File Landed in HDFS in our HDP 2.5 Cluster
[root@myserverhdp sensors]# hdfs dfs -ls /sensor
Found 2 items
-rw-r--r-- 3 root hdfs 202 2016-09-09 17:26 /sensor/181528179026826
drwxr-xr-x - hdfs hdfs 0 2016-09-09 15:43 /sensor/failure
[root@tspanndev13 sensors]# hdfs dfs -cat /sensor/181528179026826
{
"readings": [
{
"cputemp": "temp=55.8'C\n",
"humidity": 40.8,
"pressure": 1014.1,
"temp": 40.0,
"tempf": 84.0,
"temph": 40.0,
"tempp": 39.1
}
]
}
The final result of our flow is a JSON file on HDFS. We could easily send a copy of the data to Phoenix via PutSQL, to Hive via PutHiveQL, or to Spark Streaming for additional processing via Site-to-Site or Kafka.
Resources: https://github.com/topshed/RPi_8x8GridDraw https://www.raspberrypi.org/learning/sense-hat-data-logger/worksheet/ https://www.raspberrypi.org/learning/astro-pi-flight-data-analysis/worksheet/ https://www.raspberrypi.org/learning/astro-pi-guide/sensors/temperature.md https://breadfit.wordpress.com/2015/06/24/installing-mosquitto-under-centos/