09-15-2016
03:15 AM
1 Kudo
Do you have any logs? Did both compile successfully? Are you running both? You need to connect to ZooKeeper from the client:
val kafkaConf = Map(
  "metadata.broker.list" -> "sandbox.hortonworks.com:6667",
  "zookeeper.connect" -> "sandbox.hortonworks.com:2181")
Make sure you have ports 2181, 6667 and 9092 open: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_HDP_Reference_Guide/content/kafka-ports.html
Were you able to use the built-in scripts for the console producer and consumer? http://kafka.apache.org/07/quickstart.html
What JDK are you using? Where are you running: on the Sandbox or a local machine? It's best to build both projects and run them from the command line. Also make sure the Kafka service is running in the Sandbox; it may not be. When running from Java, use: -Djava.net.preferIPv4Stack=true
09-15-2016
02:15 AM
Flow File: sensor.xml
09-14-2016
11:54 PM
3 Kudos
If you have seen my article on a microservice on Hive, this is the Phoenix version. Phoenix seems to be a better option for REST microservices. I like having HBase as my main data store; it's very performant and highly scalable for application-style queries. See: https://community.hortonworks.com/articles/53629/writing-a-spring-boot-microservices-to-access-hive.html This microservice is a Spring Boot REST service on top of the data loaded by this NiFi data flow. See: https://community.hortonworks.com/content/kbentry/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
Pom
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dataflowdeveloper</groupId>
<artifactId>phoenix</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>phoenix</name>
<description>Apache Hbase Phoenix Spring Boot</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.4.0.RELEASE</version>
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.0</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api-2.5</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>1.4.0.RELEASE</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jetty</artifactId>
<scope>provided</scope>
<version>1.4.0.RELEASE</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.1</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api-2.5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-social-twitter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.7.4</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.7.4</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.7.4</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-jdbc-core</artifactId>
<version>1.2.1.RELEASE</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Follow this Maven build carefully; the dependency versions are very specific.
src/main/resources/application.properties
purl=jdbc:phoenix:serverIP:2181:/hbase-unsecure
pdriver=org.apache.phoenix.jdbc.PhoenixDriver
querylimit=250
The querylimit property is read by the DataSourceService below to cap result sizes (the example run shown later uses 250).
Java Bean to Hold Philly Crime Data
package com.dataflowdeveloper;
import java.io.Serializable;
public class PhillyCrime implements Serializable {
private String dcDist;
private String dcKey;
private String dispatchDate;
private String dispatchDateTime;
private String dispatchTime;
private String hour;
private String locationBlock;
private String psa;
private String textGeneralCode;
private String ucrGeneral;
// getters and setters omitted for brevity
}
Application Class to Bootstrap Spring Boot
package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.social.twitter.api.Twitter;
import org.springframework.social.twitter.api.impl.TwitterTemplate;
@Configuration
@ComponentScan
@EnableAutoConfiguration
@SpringBootApplication
public class HBaseApplication {
public static void main(String[] args) {
SpringApplication.run(HBaseApplication.class, args);
}
@Configuration
@Profile("default")
static class LocalConfiguration {
Logger logger = LoggerFactory.getLogger(LocalConfiguration.class);
@Value("${consumerkey}")
private String consumerKey;
@Value("${consumersecret}")
private String consumerSecret;
@Value("${accesstoken}")
private String accessToken;
@Value("${accesstokensecret}")
private String accessTokenSecret;
@Bean
public Twitter twitter() {
Twitter twitter = null;
try {
twitter = new TwitterTemplate(consumerKey, consumerSecret, accessToken, accessTokenSecret);
} catch (Exception e) {
logger.error("Error:", e);
}
return twitter;
}
@Value("${purl}")
private String databaseUri;
@Bean
public Connection connection() {
Connection con = null;
try {
con = DriverManager.getConnection(databaseUri);
} catch (SQLException e) {
e.printStackTrace();
logger.error("Connection fail: ", e);
}
//dataSource.setDriverClassName("org.apache.phoenix.jdbc.PhoenixDriver");
logger.error("Initialized hbase");
return con;
}
}
}
Phoenix Query Server does not require a connection pool!
DataSourceService
package com.dataflowdeveloper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
@Component("DataSourceService")
public class DataSourceService {
Logger logger = LoggerFactory.getLogger(DataSourceService.class);
@Autowired
public Connection connection;
// default to empty
public PhillyCrime defaultValue() {
return new PhillyCrime();
}
// querylimit
@Value("${querylimit}")
private String querylimit;
/**
*
* @param query
* - search msg
* @return List of Twitter2
*/
public List<PhillyCrime> search(String query) {
List<PhillyCrime> crimes = new ArrayList<>();
String sql = "";
try {
logger.error("Query: " + query);
logger.error("Limit:" + querylimit);
if ( connection == null ) {
logger.error("Null connection");
return crimes;
}
if ( query == null || query.trim().length() <= 0 ) {
query = "";
sql = "select * from phillycrime";
}
else {
query = "%" + query.toUpperCase() + "%";
sql = "select * from phillycrime WHERE upper(text_general_code) like ? LIMIT ?";
}
PreparedStatement ps = connection
.prepareStatement(sql);
if ( query.length() > 1 ) {
ps.setString(1, query);
ps.setInt(2, Integer.parseInt(querylimit));
}
ResultSet res = ps.executeQuery();
PhillyCrime crime = null;
while (res.next()) {
crime = new PhillyCrime();
crime.setDcKey(res.getString("dc_key"));
crime.setDcDist(res.getString("dc_dist"));
crime.setDispatchDate(res.getString("dispatch_date"));
crime.setDispatchDateTime(res.getString("dispatch_date_time"));
crime.setDispatchTime(res.getString("dispatch_time"));
crime.setHour(res.getString("hour"));
crime.setLocationBlock(res.getString("location_block"));
crime.setPsa(res.getString("psa"));
crime.setTextGeneralCode(res.getString("text_general_code"));
crime.setUcrGeneral(res.getString("ucr_general"));
crimes.add(crime);
}
res.close();
ps.close();
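// note: closing and nulling the autowired singleton Connection means a later request will hit the "Null connection" check above; for repeated queries you may want to open a fresh connection per request instead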
connection.close();
res = null;
ps = null;
connection = null;
crime = null;
logger.error("Size=" + crimes.size());
} catch (Exception e) {
e.printStackTrace();
logger.error("Error in search", e);
}
return crimes;
}
}
This class does the basic JDBC SQL queries and returns the results.
Spring Boot REST Controller
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.RequestAttributes;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
@RestController
public class DataController {
Logger logger = LoggerFactory.getLogger(DataController.class);
@Autowired
private DataSourceService dataSourceService;
@RequestMapping("/query/{query}")
public List<PhillyCrime> query(
@PathVariable(value="query") String query)
{
List<PhillyCrime> value = dataSourceService.search(query);
return value;
}
}
To Build
mvn package -DskipTests
To Run
java -Xms512m -Xmx2048m -Dhdp.version=2.4.0.0-169 -Djava.net.preferIPv4Stack=true -jar target/phoenix-0.0.1-SNAPSHOT.jar
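Once the jar is running you can hit the REST endpoint from a browser, curl, or a few lines of Java. A minimal client sketch, assuming the service is listening on port 9999 as in the log output below:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class QueryClient {
    public static void main(String[] args) throws Exception {
        // Call the /query/{query} endpoint exposed by DataController
        URL url = new URL("http://localhost:9999/query/Theft");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of PhillyCrime records
            }
        }
        conn.disconnect();
    }
}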
To check your Phoenix data from the command line (2181 is the ZooKeeper port):
/usr/hdp/current/phoenix-client/bin/sqlline.py server:2181:/hbase
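You can also sanity-check the Phoenix connection from plain JDBC before wiring it into Spring. A minimal sketch, assuming the same URL format as the purl property above and that the phillycrime table already exists:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixCheck {
    public static void main(String[] args) throws Exception {
        // Same URL format as the purl property: jdbc:phoenix:<zookeeper host>:2181:/hbase-unsecure
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:serverIP:2181:/hbase-unsecure");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select count(*) from phillycrime")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}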
Phoenix Table DDL
CREATE TABLE phillycrime (dc_dist varchar,
dc_key varchar not null primary key,dispatch_date varchar,dispatch_date_time varchar,dispatch_time varchar,hour varchar,location_block varchar,psa varchar,
text_general_code varchar,ucr_general varchar);
Results
2016-09-14 20:09:25.135 INFO 11937 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2016-09-14 20:09:25.150 INFO 11937 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Starting beans in phase 0
2016-09-14 20:09:25.275 INFO 11937 --- [ main] application : Initializing Spring FrameworkServlet 'dispatcherServlet'
2016-09-14 20:09:25.275 INFO 11937 --- [ main] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2016-09-14 20:09:25.294 INFO 11937 --- [ main] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 19 ms
2016-09-14 20:09:25.322 INFO 11937 --- [ main] org.mortbay.log : Logging to Logger[org.mortbay.log] via org.mortbay.log.Slf4jLog
2016-09-14 20:09:25.333 INFO 11937 --- [ main] o.e.jetty.server.AbstractConnector : Started ServerConnector@7a55af6b{HTTP/1.1,[http/1.1]}{0.0.0.0:9999}
2016-09-14 20:09:25.335 INFO 11937 --- [ main] .s.b.c.e.j.JettyEmbeddedServletContainer : Jetty started on port(s) 9999 (http/1.1)
2016-09-14 20:09:25.339 INFO 11937 --- [ main] com.dataflowdeveloper.HBaseApplication : Started HBaseApplication in 13.783 seconds (JVM running for 14.405)
2016-09-14 20:09:37.961 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Query: Theft
2016-09-14 20:09:37.961 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Limit:250
2016-09-14 20:09:39.050 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataSourceService : Size=250
2016-09-14 20:09:39.050 ERROR 11937 --- [tp1282287470-17] com.dataflowdeveloper.DataController : Query:Theft,IP:127.0.0.1 Browser:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Connection Handshake for App
2016-09-14 20:09:18.042 INFO 11937 --- [ main] o.a.h.h.zookeeper.RecoverableZooKeeper : Process identifier=hconnection-0x4116aac9 connecting to ZooKeeper ensemble=server:2181
2016-09-14 20:09:18.042 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=server:2181 sessionTimeout=90000 watcher=hconnection-0x4116aac90x0, quorum=server:2181, baseZNode=/hbase-unsecure
2016-09-14 20:09:18.044 INFO 11937 --- [1.1.1.1:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server 192.11.1.5:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-14 20:09:18.163 INFO 11937 --- [1.1.1.1:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established to 192.11.1.:2181, initiating session
2016-09-14 20:09:18.577 INFO 11937 --- [26.195.56:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server :2181, sessionid = 0x157063034991d14, negotiated timeout = 40000
2016-09-14 20:09:19.953 INFO 11937 --- [ main] nectionManager$HConnectionImplementation : Closing master protocol: MasterService
2016-09-14 20:09:19.953 INFO 11937 --- [ main] nectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x157063034991d14
2016-09-14 20:09:20.040 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Session: 0x157063034991d14 closed
2016-09-14 20:09:20.040 INFO 11937 --- [ain-EventThread] org.apache.zookeeper.ClientCnxn : EventThread shut down
2016-09-14 20:09:20.470 INFO 11937 --- [ main] o.a.p.query.ConnectionQueryServicesImpl : Found quorum: server.com:2181
2016-09-14 20:09:20.471 INFO 11937 --- [ main] o.a.h.h.zookeeper.RecoverableZooKeeper : Process identifier=hconnection-0x36b4fe2a connecting to ZooKeeper ensemble=tspanndev10.field.hortonworks.com:2181
2016-09-14 20:09:20.471 INFO 11937 --- [ main] org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=tspanndev10.field.hortonworks.com:2181 sessionTimeout=90000 watcher=hconnection-0x36b4fe2a0x0, quorum=tspanndev10.field.hortonworks.com:2181, baseZNode=/hbase-unsecure
2016-09-14 20:09:20.472 INFO 11937 --- [222.2.2.2.2.:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server 1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-14 20:09:20.555 INFO 11937 --- [26:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established to 172....:2181, initiating session
2016-09-14 20:09:20.641 INFO 11937 --- [22.2.2:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server 2.2.2/2.2.2:2181, sessionid = 0x157063034991d15, negotiated timeout = 40000
Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/ch_using-phoenix.html https://community.hortonworks.com/articles/19016/connect-to-phoenix-hbase-using-dbvisualizer.html
09-14-2016
11:29 PM
To install a Mosquitto MQTT server on CentOS 7:
yum -y install unzip
Step 1: Add the CentOS 7 mosquitto repository
cd /etc/yum.repos.d
wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-7/home:oojah:mqtt.repo
sudo yum update
Step 2: Install mosquitto and mosquitto-clients
sudo yum install -y mosquitto mosquitto-clients
Step 3: Run mosquitto
sudo su
/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf > /var/log/mosquitto.log 2>&1
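To verify the broker, you can publish a test message with the mosquitto_pub and mosquitto_sub clients installed above, or from Java with the Eclipse Paho client. A minimal sketch, assuming the org.eclipse.paho.client.mqttv3 dependency is on the classpath; the "test" topic and client id are placeholders:
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class MqttSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect to the local Mosquitto broker on the default port 1883
        MqttClient client = new MqttClient("tcp://localhost:1883", "smoke-test", new MemoryPersistence());
        client.connect();
        // Publish a test message to a test topic
        client.publish("test", new MqttMessage("hello from paho".getBytes()));
        client.disconnect();
    }
}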
09-14-2016
02:59 AM
3 Kudos
Running Spark Jobs Through Apache Beam on HDP 2.5 Yarn Cluster
Using the Spark Runner with Apache Beam
Apache Beam is still incubating and is not supported on HDP 2.5 or other platforms.
sudo yum -y install git
wget http://www.gtlib.gatech.edu/pub/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
After you have downloaded Maven, move it to /opt/demo/maven or into your path. The Maven download mirror will change, so grab a fresh URL from http://maven.apache.org/. Installing Maven via yum will give you an older, unsupported version that may interfere with other packages, so I recommend getting a new Maven just for this build. Make sure you have Java 7 or greater, which you should already have on an HDP node; I recommend Java 8 on your new HDP 2.5 nodes if possible.
cd /opt/demo/
git clone https://github.com/apache/incubator-beam
cd incubator-beam
/opt/demo/maven/bin/mvn clean install -DskipTests
If you want to run this on Spark 2.0 and not Spark 1.6.2, look here for changing the environment: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-choose-version.html
For HDP 2.5, these are the parameters:
spark-submit --class org.apache.beam.runners.spark.examples.WordCount --master yarn-client target/beam-runners-spark-0.3.0-incubating-SNAPSHOT-spark-app.jar --inputFile=kinglear.txt --output=out --runner=SparkRunner --sparkMaster=yarn-client
Note: I had to change the parameters to get this to work in my environment. You may also need to run /opt/demo/maven/bin/mvn package from the /opt/demo/incubator-beam/runners/spark directory. This runs a Java 7 example from the built-in examples: https://github.com/apache/incubator-beam/tree/master/examples/java
These are the results of running our small Spark job:
16/09/14 02:35:08 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 34.0 KB, free 518.7 KB)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.26.195.58:39575 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1008
16/09/14 02:35:08 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[14] at mapToPair at TransformTranslator.java:568)
16/09/14 02:35:08 INFO YarnScheduler: Adding task set 1.0 with 2 tasks
16/09/14 02:35:08 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, tspanndev13.field.hortonworks.com, partition 0,NODE_LOCAL, 1994 bytes)
16/09/14 02:35:08 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, tspanndev13.field.hortonworks.com, partition 1,NODE_LOCAL, 1994 bytes)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on tspanndev13.field.hortonworks.com:36438 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on tspanndev13.field.hortonworks.com:36301 (size: 34.0 KB, free: 511.1 MB)
16/09/14 02:35:08 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to tspanndev13.field.hortonworks.com:52646
16/09/14 02:35:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 177 bytes
16/09/14 02:35:08 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to tspanndev13.field.hortonworks.com:52640
16/09/14 02:35:09 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 681 ms on tspanndev13.field.hortonworks.com (1/2)
16/09/14 02:35:09 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1112 ms on tspanndev13.field.hortonworks.com (2/2)
16/09/14 02:35:09 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/09/14 02:35:09 INFO DAGScheduler: ResultStage 1 (saveAsNewAPIHadoopFile at TransformTranslator.java:745) finished in 1.113 s
16/09/14 02:35:09 INFO DAGScheduler: Job 0 finished: saveAsNewAPIHadoopFile at TransformTranslator.java:745, took 5.422285 s
16/09/14 02:35:09 INFO SparkRunner: Pipeline execution complete.
16/09/14 02:35:09 INFO SparkContext: Invoking stop() from shutdown hook
[root@tspanndev13 spark]# hdfs dfs -ls
Found 5 items
drwxr-xr-x - root hdfs 0 2016-09-14 02:35 .sparkStaging
-rw-r--r-- 3 root hdfs 0 2016-09-14 02:35 _SUCCESS
-rw-r--r-- 3 root hdfs 185965 2016-09-14 01:44 kinglear.txt
-rw-r--r-- 3 root hdfs 27304 2016-09-14 02:35 out-00000-of-00002
-rw-r--r-- 3 root hdfs 26515 2016-09-14 02:35 out-00001-of-00002
[root@tspanndev13 spark]# hdfs dfs -cat out-00000-of-00002
oaths: 1
bed: 7
hearted: 5
warranties: 1
Refund: 1
unnaturalness: 1
sea: 7
sham'd: 1
Only: 2
sleep: 8
sister: 29
Another: 2
carbuncle: 1
As you can see, it produced the expected two-part output file in HDFS with the word counts. Not much configuration is required to run your Apache Beam Java jobs on your HDP 2.5 YARN Spark cluster, so if you have a development cluster, or your own HDP 2.5 sandbox, this is a great place to try it out.
Resources: http://beam.incubator.apache.org/learn/programming-guide/ https://github.com/apache/incubator-beam/tree/master/runners/spark
09-13-2016
01:35 AM
2 Kudos
I just installed HDP 2.5 from Ambari on a small OpenStack cluster, choosing Zeppelin from Ambari.
%jdbc(phoenix) select * from phillycrime limit 100
worked for me. Try restarting the jdbc interpreter, and make sure that under the jdbc (%jdbc) interpreter settings you have:
Option: shared interpreter for note
phoenix.driver = org.apache.phoenix.jdbc.PhoenixDriver
phoenix.hbase.client.retries.number = 1
phoenix.password =
phoenix.url = jdbc:phoenix:192.168.1.100:/hbase-unsecure
phoenix.user = phoenixuser
09-12-2016
07:51 PM
Yes, if you use the Zeppelin that is now installed with Spark, this should be resolved.
09-12-2016
07:46 PM
2 Kudos
TitanDB is no longer active. The project is now DataStax Graph, and I haven't heard of or seen any updates in open source. http://www.slideshare.net/MohamedTaherAlRefaie/graph-databases-tinkerpop-and-titan-db I did install it last year and it kind of works. Follow this: http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html Use this page to change your configuration to point to your HDP HBase: http://s3.thinkaurelius.com/docs/titan/1.0.0/configuration.html#_example_configurations Download 1.0.0 with Hadoop 2: https://github.com/thinkaurelius/titan/wiki/Downloads Spark is not used in TitanDB itself; Spark is used with TinkerPop. I would suggest trying Apache TinkerPop with Hadoop Gremlin http://tinkerpop.apache.org/ See the instructions at http://tinkerpop.apache.org/docs/current/reference/#hadoop-gremlin I think that may serve your needs.
09-12-2016
07:18 PM
Take a look at this article, which has very detailed instructions: http://www.coding-daddy.xyz/node/7 You can also run a local Spark server, which is easy. You could also use Zeppelin or the Spark REPL so you can test as you develop, or use the Spark testing framework: https://github.com/holdenk/spark-testing-base
09-09-2016
08:13 PM
10 Kudos
Sensor Reading with Apache NiFi 1.0.0
There are many types of sensors, devices and meters that can be great sources of data. Some can push data, some can pull data, some provide APIs, some give you access to install software.
How To Access Sensors
One option is to install MiNiFi on the device if you have root access. This will provide fast access and allow you to script and manage your local data. Another option for bigger devices is to install a full Java-based NiFi node. It starts becoming harder once you have tens of thousands of devices. You can install an HDF edge node and communicate from this node to your HDF cluster via the Site-to-Site protocol. This edge node acts as an accumulator for many devices, which is a good idea so that you don't send 10,000 network requests a second from each set of devices; keeping as much traffic as possible local saves time, time-outs, networking and cloud costs. You can also aggregate and send larger batches of data, and process some summaries and aggregates locally in NiFi. This also lets you populate local databases, dashboards and statistics that may only be of interest to the local source of the sensors (perhaps a plant manager or automated monitoring system). Another option is to have devices push or pull to a local or remote NiFi install via various protocols including TCP/IP, UDP/IP, REST HTTP, JMS, MQTT, SFTP and email.
Device Push to NiFi
Your device can send messages to NiFi via any of the protocols listed. For my example, I push via MQTT. My local NiFi node will consume these messages via ConsumeMQTT.
Reference: Paho-MQTT
Your device will need to run Linux (or something similar), with Python 2.7 or later and pip installed. With pip, you can install the Eclipse library that you need to send MQTT messages:
pip install paho-mqtt
import paho.mqtt.client as paho
client = paho.Client()
client.connect("servername", 1883, 60)
client.publish("sensor", payload="Test", qos=0, retain=True)
Where "servername" is the name of the server you are sending the message to (it could also be on the NiFi Node, another server, a bigger device, a central aggregator or messaging server). I would recommend having it as close in the network as possible. "sensor" is the name of the topic that we are publishing the message to, NiFi will consume this message. I have cron job setup to run every minute and publish messages (* * * * * /opt/demo/sendit.sh ) NiFi Poll Device NiFi can poll your device and consume from various protocols like JMS, MQTT, SFTP, TCP and UDP. For my example, I chose a REST API over HTTP to get past hurdles of firewalls and such. I setup a Flask Server on RPI, to run my REST API, I run this in a shell script. export FLASK_APP=hello.py
flask run --host=0.0.0.0 --port=8888 --no-debugger
To install Flask, you need to run pip install flask
Sensor Reading Code
#!flask/bin/python
from flask import Flask, jsonify
import sys
import time
import datetime
import subprocess
import urllib2
import json
import paho.mqtt.client as paho
from sense_hat import SenseHat
sense = SenseHat()
sense.clear()
app = Flask(__name__)
@app.route('/pi/api/v1.0/sensors', methods=['GET'])
def get_sensors():
p = subprocess.Popen(['/opt/vc/bin/vcgencmd','measure_temp'], stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out, err = p.communicate()
temp = sense.get_temperature()
temp = round(temp, 1)
temph = sense.get_temperature_from_humidity()
temph = round(temph, 1)
tempp = sense.get_temperature_from_pressure()
tempp = round(tempp, 1)
humidity = sense.get_humidity()
humidity = round(humidity, 1)
pressure = sense.get_pressure()
pressure = round(pressure, 1)
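# note: the standard Celsius-to-Fahrenheit conversion is (temp * 1.8) + 32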
tasks = [ { 'tempp': tempp, 'temph': temph, 'cputemp': out, 'temp': temp, 'tempf': ((temp * 1.8) + 12), 'humidity': humidity, 'pressure': pressure } ]
# As an option we can push this message when we get called as well
client = paho.Client()
client.connect("mqttmessageserver", 1883, 60)
client.publish("sensor", payload=jsonify({'readings': tasks}), qos=0, retain=True)
return jsonify({'readings': tasks})
@app.route('/pi/api/v1.0/location', methods=['GET'])
def get_loc():
orientation = sense.get_orientation()
pitch = orientation['pitch']
roll = orientation['roll']
yaw = orientation['yaw']
acceleration = sense.get_accelerometer_raw()
x = acceleration['x']
y = acceleration['y']
z = acceleration['z']
x=round(x, 0)
y=round(y, 0)
z=round(z, 0)
tasks = [ { 'pitch': pitch, 'roll': roll, 'yaw': yaw, 'x': x, 'y': y, 'z': z } ]
return jsonify({'readings': tasks})
@app.route('/pi/api/v1.0/show', methods=['GET'])
def get_pi():
temp = sense.get_temperature()
temp = round(temp, 1)
humidity = sense.get_humidity()
humidity = round(humidity, 1)
pressure = sense.get_pressure()
pressure = round(pressure, 1)
# 8x8 RGB
sense.clear()
info = 'T(C): ' + str(temp) + 'H: ' + str(humidity) + 'P: ' + str(pressure)
sense.show_message(info, text_colour=[255, 0, 0])
sense.clear()
tasks = [ { 'temp': temp, 'tempf': ((temp * 1.8) + 12), 'humidity': humidity, 'pressure': pressure } ]
return jsonify({'readings': tasks})
if __name__ == '__main__':
app.run(debug=True)
The device I am testing is a Raspberry Pi 3 Model B with a Sense HAT sensor attachment. Besides having sensors for temperature, humidity and barometric pressure, it also has an 8x8 light grid for displaying text and simple graphics. We can use this to print messages (sense.show_message) or warnings that we send from NiFi. This allows for two-way, very visceral communication with remote devices and could be used to notify local personnel of conditions.
NiFi 1.0.0 Flows
JSON File Landed in HDFS in our HDP 2.5 Cluster
[root@myserverhdp sensors]# hdfs dfs -ls /sensor
Found 2 items
-rw-r--r-- 3 root hdfs 202 2016-09-09 17:26 /sensor/181528179026826
drwxr-xr-x - hdfs hdfs 0 2016-09-09 15:43 /sensor/failure
[root@tspanndev13 sensors]# hdfs dfs -cat /sensor/181528179026826
{
"readings": [
{
"cputemp": "temp=55.8'C\n",
"humidity": 40.8,
"pressure": 1014.1,
"temp": 40.0,
"tempf": 84.0,
"temph": 40.0,
"tempp": 39.1
}
]
}
The final result of our flow is a JSON file on HDFS. We could easily send a copy of the data to Phoenix via PutSQL, to Hive via PutHiveQL, or to Spark Streaming for additional processing via Site-to-Site or Kafka.
Resources: https://github.com/topshed/RPi_8x8GridDraw https://www.raspberrypi.org/learning/sense-hat-data-logger/worksheet/ https://www.raspberrypi.org/learning/astro-pi-flight-data-analysis/worksheet/ https://www.raspberrypi.org/learning/astro-pi-guide/sensors/temperature.md https://breadfit.wordpress.com/2015/06/24/installing-mosquitto-under-centos/