08-22-2017
06:53 PM
1 Kudo
That may run, but it's best to run it on a separate, unrelated node; it doesn't need to be on the Hadoop cluster. For HDP 2.6 and later, it can be installed via Ambari as part of the cluster and will work fine with Druid. I was just running it standalone for testing. The best and easiest course of action is to upgrade to HDP 2.6 and install Druid and Superset through that.
08-22-2017
04:58 PM
For sentiment analysis with NiFi processors, download and build these: https://github.com/tspannhw/nifi-corenlp-processor and https://github.com/tspannhw/nifi-nlp-processor
08-22-2017
02:32 PM
1 Kudo
Try another browser. Make sure port 8888 is open in the VM settings. Try localhost:8888 or yourcomputername:8888; some machines don't have 127.0.0.1 set up properly, or it is blocked by virus scanners or corporate firewalls.
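If you want to rule out the browser entirely, here is a minimal sketch (hypothetical, assuming Python is available on the host machine and the sandbox forwards port 8888) that checks whether anything is listening on that port:

#!/usr/bin/env python
# Quick reachability check for the sandbox UI port (assumed: 8888).
import socket

def port_open(host, port, timeout=3):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except (socket.error, socket.timeout):
        return False
    finally:
        s.close()

for host in ("localhost", "127.0.0.1"):
    print("%s:8888 reachable: %s" % (host, port_open(host, 8888)))

If this prints False for both, the problem is the VM port forwarding or a firewall rather than the browser.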
08-21-2017
10:02 PM
6 Kudos
The MiniFi flow executes two scripts: one calls TensorFlow Python, which captures an OpenCV Raspberry Pi camera image and runs Inception on it; that message is formatted as JSON and sent on. The second script reads GPS values from a USB GPS sensor and outputs JSON. GetFile reads the Pi camera image produced by the ClassifyImages process. Cleanup Logs is a standalone timed script that cleans up old logs on the Raspberry Pi.
Using InferAvroSchema, I created a schema for the GPS unit and stored it in the Hortonworks Schema Registry. This is the provenance event for a typical GPS message sent; you can see which shell script we ran and from which host. In Apache NiFi we process the message, routing it to the correct place, setting a schema, and querying it for a latitude. Then we convert the Avro record to ORC to save as a Hive table.
MiniFi requires that we convert the NiFi-created template to a configuration file via the command-line MiniFi Toolkit:
minifi-toolkit-0.2.0/bin/config.sh transform gpstensorflowpiminifi2.xml config.yml
scp config.yml pi@192.168.1.167:/opt/demo/minifi/conf/
./gpsrun.sh
{"ipaddress": "192.168.1.167", "utc": "2017-08-21T20:00:06.000Z", "epx": "10.301", "epv": "50.6", "serialno": "000000002a1f1e34", "altitude": "38.393", "cputemp": 58.0, "eps": "37.16",
"longitude": "-74.52923472", "ts": "2017-08-21 20:00:03", "public_ip": "71.168.184.247", "track": "236.6413", "host": "vid5", "mode": "3", "time": "2017-08-21T20:00:06.000Z",
"latitude": "40.268194845", "climb": "-0.054", "speed": "0.513", "ept": "0.005"}
2017-08-21 16:20:33,199 INFO [Timer-Driven Process Thread-6] o.apache.nifi.remote.client.PeerSelector New Weighted Distribution of Nodes:
PeerStatus[hostname=HW13125.local,port=8080,secure=false,flowFileCount=0] will receive 100.0% of data
2017-08-21 16:20:34,261 INFO [Timer-Driven Process Thread-6] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=MiniFi TensorFlowImage,targets=http://hw13125.local:8080/nifi]
Successfully sent [StandardFlowFileRecord[uuid=f84767ec-c627-4b63-9e88-bba1dfb4eb9b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1503346615133-2, container=default, section=2], offset=2198, length=441],offset=0,name=3460526041973,size=441]] (441 bytes) to http://HW13125.local:8080/nifi-api in 117 milliseconds at a rate of 3.65 KB/sec
{"ipaddress": "192.168.1.167", "utc": "2017-08-21T20:17:21.010Z", "epx": "10.301", "epv": "50.6", "serialno": "000000002a1f1e34",
"altitude": "43.009", "cputemp": 52.0, "eps": "1.33", "longitude": "-74.529242206", "ts": "2017-08-21 20:16:55", "public_ip": "71.168.184.247",
"track": "190.894", "host": "vid5", "mode": "3", "time": "2017-08-21T20:17:21.010Z", "latitude": "40.268159632",
"climb": "0.022", "speed": "0.353", "ept": "0.005"} To collect our GPS information, below is my script called by MiniFi. Source: #! /usr/bin/python
import os
from gps import *
from time import *
import time
import threading
import json
import colorsys
import sys, socket
import subprocess
import datetime
from time import sleep
from time import gmtime, strftime
import signal
import urllib2
# Need sudo apt-get install gpsd gpsd-clients python-gps ntp
# Based on
#Author: Callum Pritchard, Joachim Hummel
#Project Name: Flick 3D Gesture
#Project Description: Sending Flick 3D Gesture sensor data to mqtt
#Version Number: 0.1
#Date: 15/6/17
#Release State: Alpha testing
#Changes: Created
# Based on
# Written by Dan Mandle http://dan.mandle.me September 2012
# License: GPL 2.0
# Based on: https://hortonworks.com/tutorial/analyze-iot-weather-station-data-via-connected-data-architecture/section/3/
#### Initialization
# yyyy-mm-dd hh:mm:ss
currenttime= strftime("%Y-%m-%d %H:%M:%S",gmtime())
external_IP_and_port = ('198.41.0.4', 53) # a.root-servers.net
socket_family = socket.AF_INET
host = os.uname()[1]
def getCPUtemperature():
    res = os.popen('vcgencmd measure_temp').readline()
    return(res.replace("temp=","").replace("'C\n",""))
def IP_address():
    try:
        s = socket.socket(socket_family, socket.SOCK_DGRAM)
        s.connect(external_IP_and_port)
        answer = s.getsockname()
        s.close()
        return answer[0] if answer else None
    except socket.error:
        return None
# Get Raspberry Pi Serial Number
def get_serial():
    # Extract serial from cpuinfo file
    cpuserial = "0000000000000000"
    try:
        f = open('/proc/cpuinfo','r')
        for line in f:
            if line[0:6]=='Serial':
                cpuserial = line[10:26]
        f.close()
    except:
        cpuserial = "ERROR000000000"
    return cpuserial
# Get Raspberry Pi Public IP via IPIFY Rest Call
def get_public_ip():
    ip = json.load(urllib2.urlopen('https://api.ipify.org/?format=json'))['ip']
    return ip
cpuTemp=int(float(getCPUtemperature()))
ipaddress = IP_address()
# Attempt to get Public IP
public_ip = get_public_ip()
# Attempt to get Raspberry Pi Serial Number
serial = get_serial()
gpsd = None
class GpsPoller(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        global gpsd #bring it in scope
        gpsd = gps(mode=WATCH_ENABLE) #starting the stream of info
        self.current_value = None
        self.running = True #setting the thread running to true
    def run(self):
        global gpsd
        while gpsp.running:
            gpsd.next() #this will continue to loop and grab EACH set of gpsd info to clear the buffer
if __name__ == '__main__':
    gpsp = GpsPoller() # create the thread
    stopthis = False
    try:
        gpsp.start() # start it up
        while not stopthis:
            if gpsd.fix.latitude > 0:
                row = { 'latitude': str(gpsd.fix.latitude),
                        'longitude': str(gpsd.fix.longitude),
                        'utc': str(gpsd.utc),
                        'time': str(gpsd.fix.time),
                        'altitude': str(gpsd.fix.altitude),
                        'eps': str(gpsd.fix.eps),
                        'epx': str(gpsd.fix.epx),
                        'epv': str(gpsd.fix.epv),
                        'ept': str(gpsd.fix.ept),
                        'speed': str(gpsd.fix.speed),
                        'climb': str(gpsd.fix.climb),
                        'track': str(gpsd.fix.track),
                        'ts': currenttime,
                        'public_ip': public_ip,
                        'serialno': serial,
                        'host': host,
                        'cputemp': round(cpuTemp,2),
                        'ipaddress': ipaddress,
                        'mode': str(gpsd.fix.mode)}
                json_string = json.dumps(row)
                print json_string
                gpsp.running = False
                stopthis = True
    except (KeyboardInterrupt, SystemExit): #when you press ctrl+c
        gpsp.running = False
        gpsp.join() # wait for the thread to finish what it's doing
Link: https://github.com/tspannhw/dws2017sydney
08-21-2017
07:53 PM
I uploaded a sample of the data in case you don't want to generate your own with Mockaroo. It's in simplecsv.txt. Drop this file into the GetFile directory.
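If you do want to generate your own data without Mockaroo, a minimal sketch like this (hypothetical file name and name list; it just mimics the two-column first-name/row-number layout of the simple schema) produces a similar CSV:

#!/usr/bin/env python
# Generate a small CSV matching the "simple" schema: first (string), second (int).
# File name and sample names are illustrative only.
import random

names = ["James", "John", "Robert", "Michael", "William"]

with open("simplecsv.txt", "w") as f:
    f.write("first,second\n")  # header row, as in the Mockaroo schema
    for i in range(1, 1001):
        f.write("%s,%d\n" % (random.choice(names), i))

Drop the resulting file into the same GetFile directory.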
08-16-2017
05:33 PM
5 Kudos
Technology: CSV, Avro, Hortonworks Schema Registry, Apache NiFi, Streaming Analytics Manager (SAM), Kafka, Hadoop HDFS
This is a simple example of reading CSV data, using its schema to convert it to Avro, and sending it via Kafka to SAM, which then reads it and stores it in HDFS. It is a simple flow, but a start for setting up flows of any complexity. A number of questions have come up on how to set up this basic flow.
Gotchas: you must send Avro, beware of nulls, set your schema, put your schema name somewhere, create your Kafka topic, and make sure you have permissions.
Building Sample CSV Data
curl "http://api.mockaroo.com/api/ANID?count=1000&key=AKEY" > "simple.csv"
You need to sign up for Mockaroo and create your own schema, then add the key and account id to the above link. Import this schema to your account:
{
"id": 76671,
"num_rows": 1000,
"file_format": "csv",
"name": "simple",
"include_header": true,
"columns": [
{
"name": "first",
"type": "First Name (Male)",
"formula": ""
},
{
"name": "second",
"type": "Row Number",
"formula": ""
}
]
}
Create a Kafka Topic (simple)
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic simple
You can test consuming the messages:
./kafka-console-consumer.sh --zookeeper princeton10.field.hortonworks.com:2181 --topic simple
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].{metadata.broker.list=princeton10.field.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-7447, security.protocol=PLAINTEXT}aabaabb
Apache NiFi 1.x Flow for CSV Ingest
Set the schema name via UpdateAttribute in NiFi.
PublishKafkaRecord settings (you need a CSV Reader and an Avro Record Set Writer).
CSV Reader (make sure you use the same Schema Registry, Access Strategy, and Schema Name).
Avro Writer (make sure you use the same Schema Registry, Write Strategy, and Schema Name).
Add your simple schema to the Hortonworks Schema Registry.
Simple Streaming Analytics Manager Store to HDFS
SAM Kafka Source:
SAM HDFS Sink: set a directory and the fields you want to output.
Result File in HDFS
hdfs dfs -cat /simple/38-HDFS-2-0-1502904631885.txt
a,1
a,2
b,1
a,5
a,-1
b,-5
b,9
a,1
a,2
b,1
a,5
a,-1
b,-5
b,9
Confirmations via Data Provenance
Source File
wc -l simple.csv
1000 simple.csv
PublishKafka
msg.count 1000
hdfs dfs -cat /simple/38-HDFS-2-3-1502906678558.txt | wc -l
472
hdfs dfs -cat /simple/38-HDFS-2-2-1502906677976.txt | wc -l
296
hdfs dfs -cat /simple/38-HDFS-2-1-1502906676961.txt | wc -l
232
Reference:
https://community.hortonworks.com/questions/122811/schema-registry-error-from-nifi-unrecognized-field.html?childToView=122075#answer-122075
https://hortonworks.com/open-source/streaming-analytics-manager/
https://hortonworks.com/tutorial/real-time-event-processing-in-nifi-sam-schema-registry-and-superset/
Templates (load greg.xml into NiFi via templates; rename simplejson.txt to simple.json and load it into SAM): greg.xml, simplejson.txt
Sample Data: simplecsv.txt
Create a New Schema in the Registry from this Schema
{
"name": "simple",
"type": "record",
"fields": [
{
"name": "first",
"type": "string"
},
{
"name": "second",
"type": "int"
}
]
}
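One way to sanity-check records against this schema before pushing anything through Kafka is the Avro Python library. This is just a sketch (hypothetical file names, assumes the Python 2 avro package is installed), not part of the original flow:

#!/usr/bin/env python
# Sketch: validate a couple of records against the "simple" schema locally.
# Assumes simple.avsc contains the schema above; names are illustrative only.
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

schema = avro.schema.parse(open("simple.avsc").read())

# DataFileWriter validates each datum against the schema on append, so a null
# where a string or int is required (the "no nulls" gotcha) raises an error.
writer = DataFileWriter(open("simple.avro", "wb"), DatumWriter(), schema)
writer.append({"first": "James", "second": 1})
writer.append({"first": "John", "second": 2})
writer.close()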
08-16-2017
05:16 PM
1 Kudo
You have to set the schema name as an attribute. Make sure you create your Kafka topic, and make sure you set up the CSV reader and Avro writer correctly. Set the schema; setting the content-type doesn't hurt. Add the schema to the Hortonworks Schema Registry and match those names (simple). Set the reader and writer, server, and topic; configure the reader; configure the writer. Create the topic beforehand, and then you can consume from it after the data gets pushed through NiFi:
./kafka-console-consumer.sh --zookeeper princeton10.field.hortonworks.com:2181 --topic simple
Using the ConsoleConsumer with old consumer is deprecated and will
be removed in a future major release. Consider using the new consumer by
passing [bootstrap-server] instead of [zookeeper].{metadata.broker.list=princeton10.field.hortonworks.com:6667,
request.timeout.ms=30000, client.id=console-consumer-7447,
security.protocol=PLAINTEXT}aabaabb
08-16-2017
03:52 PM
Can you post the schema? Can you post an example CSV? Are you running Kafka locally? I see a localhost link there. You must use the same versions of NiFi and Schema Registry; they must be from the same HDF 3.x release, so you can't download NiFi from nifi.apache.org and try that.
08-16-2017
01:05 PM
1 Kudo
1. Go to Ambari.
2. Under HBase, under Service Actions, restart the Phoenix Query Servers.
3. Under HBase, go to Configs and type your parameter into the filter.
4. Change it, save, and restart.
https://community.hortonworks.com/questions/68570/cannot-connect-to-phoenix-query-server-using-jdbc.html
https://phoenix.apache.org/server.html
https://community.hortonworks.com/articles/2663/phoenix-queryserver.html @joshelser
https://community.hortonworks.com/questions/53827/phoenixdb-connection-to-phoenix-query-server.html
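Once the Phoenix Query Server is back up, a quick way to confirm it is reachable from Python is the phoenixdb thin client. This is a hedged sketch (hypothetical host and table name, default PQS port 8765, assumes pip install phoenixdb), not something from the original answer:

#!/usr/bin/env python
# Sketch: connect to the Phoenix Query Server over its HTTP (thin) protocol.
import phoenixdb

conn = phoenixdb.connect('http://your-pqs-host:8765/', autocommit=True)
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS pqs_smoke_test (id INTEGER PRIMARY KEY, note VARCHAR)")
cursor.execute("UPSERT INTO pqs_smoke_test VALUES (?, ?)", (1, 'hello'))
cursor.execute("SELECT * FROM pqs_smoke_test")
print(cursor.fetchall())
conn.close()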
08-15-2017
03:34 PM
Make sure your queue is empty; old bad messages might still be sent. I would create a new empty queue, add one checked, valid message (check your Avro message with avro-tools), and send that. No nulls!
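If you'd rather check from Python instead of the avro-tools jar, a minimal sketch like this (hypothetical file name, assumes the Python 2 avro package is installed) dumps the records and flags any null fields before you send the message:

#!/usr/bin/env python
# Sketch: read an Avro file and flag any null fields before sending it on.
from avro.datafile import DataFileReader
from avro.io import DatumReader

reader = DataFileReader(open("message.avro", "rb"), DatumReader())
for record in reader:
    nulls = [k for k, v in record.items() if v is None]
    if nulls:
        print("WARNING: null fields %s in %s" % (nulls, record))
    else:
        print(record)
reader.close()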