Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2460 | 04-03-2024 06:39 AM |
| | 3807 | 01-12-2024 08:19 AM |
| | 2054 | 12-07-2023 01:49 PM |
| | 3040 | 08-02-2023 07:30 AM |
| | 4164 | 03-29-2023 01:22 PM |
01-10-2017
04:50 PM
https://help.sumologic.com/Send_Data/Sources/02Sources_for_Hosted_Collectors/HTTP_Source
11-11-2016
06:13 PM
8 Kudos
It's easier than I would have thought to add images to your SQL results tables in Apache Zeppelin. It's pretty simple to do in HDP 2.5's version of Zeppelin: you use the %html tag to output HTML instead of plain text.

Use Case: Displaying an image with TensorFlow Inception image recognition results in the same list.

Example SQL:

SELECT user_name, handle,
       concat('%html <img width=50 height=60 src="', media_url, '">') as media,
       substring(inception,0,150) as inception,
       msg, sentiment, stanfordsentiment, location, time
FROM twitterorc
WHERE inception not like '%Not found%'
  AND inception is not null
  AND trim(inception) != ''
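The concat() above just assembles a literal string that Zeppelin's %html display system picks up in the result cell. The same assembly can be sketched in Python terms (the image URL is a made-up example, not from the original data):

```python
# Build the %html cell value the way the SQL concat() does.
def html_img_cell(media_url, width=50, height=60):
    """Return the %html snippet Zeppelin renders as an inline image."""
    return '%html <img width={} height={} src="{}">'.format(width, height, media_url)

cell = html_img_cell("http://example.com/pic.jpg")
print(cell)  # %html <img width=50 height=60 src="http://example.com/pic.jpg">
```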
11-07-2016
04:48 PM
I added an ExtractMediaMetadata processor and examined the MP3s:

hdfs dfs -cat /music/meta/amclassical_beethoven_fur_elise.mp3
{"xmpDM:trackNumber":"","invokehttp.tx.id":"b4690a5a-ec60-4f68-8d1a-de9344eced8d","xmpDM:releaseDate":"2001","Server":"Apache","gethttp.remote.source":"192.168.1.2","xmpDM:artist":"A-M Classical","fragment.identifier":"d8b6da55-c512-42c5-b9a0-a93b0f32fd71","link":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","dc:creator":"A-M Classical","Last-Modified":"Sun, 05 Oct 2008 20:59:37 GMT","title":"Für Elise","xmpDM:audioChannelType":"Stereo","uuid":"d1e6b6cb-63cc-46a8-ac63-a588da01ecff","invokehttp.request.url":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","path":"./","xmpDM:logComment":"eng -","xmpDM:audioSampleRate":"44100","dc:title":"Für Elise","OkHttp-Sent-Millis":"1478537014168","segment.original.filename":"mp3.json","Content-Length":"3393536","Content-Type":"audio/mpeg","samplerate":"44100","Keep-Alive":"timeout=2, max=95","xmpDM:genre":"Classical","xmpDM:composer":"Ludwig van Beethoven","X-Parsed-By":"org.apache.tika.parser.DefaultParser, org.apache.tika.parser.mp3.Mp3Parser","creator":"A-M Classical","xmpDM:album":"","meta:author":"A-M Classical","invokehttp.status.code":"200","Connection":"Keep-Alive","fragment.index":"0","xmpDM:audioCompressor":"MP3","mime.type":"audio/mpeg","version":"MPEG 3 Layer III Version 1","Date":"Mon, 07 Nov 2016 16:38:13 GMT","Accept-Ranges":"bytes","descr":"Beethoven: Für Elise","filename":"amclassical_beethoven_fur_elise.mp3","OkHttp-Received-Millis":"1478537014194","channels":"2","ETag":"\"33c800-45887d8256040\"","Author":"A-M Classical","fragment.count":"28","invokehttp.status.message":"OK","xmpDM:duration":"211996.546875"}
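A quick sketch of pulling a few fields out of that Tika metadata JSON. The record below is a trimmed-down sample of the output above, and treating xmpDM:duration as milliseconds is an assumption:

```python
import json

# Trimmed sample of the ExtractMediaMetadata output shown above.
record = json.loads(
    '{"dc:title": "F\\u00fcr Elise",'
    ' "xmpDM:composer": "Ludwig van Beethoven",'
    ' "xmpDM:duration": "211996.546875",'
    ' "mime.type": "audio/mpeg"}'
)

title = record["dc:title"]
composer = record["xmpDM:composer"]
# Assumption: the duration value looks like milliseconds here.
duration_sec = float(record["xmpDM:duration"]) / 1000.0
print(title, composer, round(duration_sec, 1))
```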
11-07-2016
04:36 AM
2 Kudos
Use Case

Before meetups, I wanted to play some music. NiFi seemed like a great choice for streaming free music through my Mac.

MIDI Command Line Player for OSX

brew install timidity

Then you can simply play MIDI files with: timidity file.mid

Microservice to Extract Links from Web Pages

Java 8 Source Code: https://github.com/tspannhw/linkextractor

The Spring Boot REST API accepts a URL, parses out the .mid file links, and returns JSON containing the links and descriptions.

Example REST Call to the Service

curl -G -v "http://<urL>:8080/extract/url?url=http://www.midiworld.com/classic.htm/&type=mid"

Run the Microservice

java -Xms512m -Xmx2048m -Djava.net.preferIPv4Stack=true -jar target/linkextractor-0.0.1-SNAPSHOT.jar

Java Snippet

Using JSoup to extract links from a URL (HTML):

pLink = new PrintableLink();
pLink.setLink(link.attr("abs:href"));
pLink.setDescr(trim(link.text(), 100));
linksReturned.add(pLink);

Output

hdfs dfs -ls /music/*.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/2_ase.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/4_mtking.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:50 /music/EspanjaCaphriccoCatalan.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:50 /music/EspanjaPrelude.mid
-rw-r--r-- 3 tspann hdfs 162 2016-11-07 03:50 /music/J_M_Bach_Auf_lasst_uns_den_Herren_loben.mid
-rw-r--r-- 3 tspann hdfs 93 2016-11-07 03:52 /music/adelina.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/aida_ii2.mid
-rw-r--r-- 3 tspann hdfs 89 2016-11-07 03:50 /music/al_adagi.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/alborada.mid
-rw-r--r-- 3 tspann hdfs 82 2016-11-07 03:52 /music/aquarium.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:52 /music/barbero.mid
-rw-r--r-- 3 tspann hdfs 101 2016-11-07 03:51 /music/barimyst.mid
-rw-r--r-- 3 tspann hdfs 80 2016-11-07 03:52 /music/beevar2.mid
-rw-r--r-- 3 tspann hdfs 111 2016-11-07 03:50 /music/biz_arls.mid
-rw-r--r-- 3 tspann hdfs 94 2016-11-07 03:51 /music/blas1.mid
-rw-r--r-- 3 tspann hdfs 114 2016-11-07 03:50 /music/boccher.mid
-rw-r--r-- 3 tspann hdfs 78 2016-11-07 03:52 /music/bolero.mid
-rw-r--r-- 3 tspann hdfs 100 2016-11-07 03:51 /music/cantique.mid
-rw-r--r-- 3 tspann hdfs 88 2016-11-07 03:51 /music/carminab.mid
-rw-r--r-- 3 tspann hdfs 96 2016-11-07 03:51 /music/clairdelune.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/cmveder.mid
-rw-r--r-- 3 tspann hdfs 108 2016-11-07 03:52 /music/coucou.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/coup8a.mid
-rw-r--r-- 3 tspann hdfs 97 2016-11-07 03:51 /music/cpf-bird.mid
2016-11-06 22:57:20.095 ERROR 28694 --- [nio-8080-exec-1] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.313 ERROR 28694 --- [nio-8080-exec-3] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.500 ERROR 28694 --- [nio-8080-exec-5] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
ls -lt /opt/demo/midi | more
total 20456
-rw-r--r-- 1 tspann staff 117731 Nov 6 22:58 appspg13.mid
-rw-r--r-- 1 tspann staff 13449 Nov 6 22:58 intrlude.mid
-rw-r--r-- 1 tspann staff 8777 Nov 6 22:58 latalant.mid
-rw-r--r-- 1 tspann staff 1911 Nov 6 22:58 lbvar2.mid
-rw-r--r-- 1 tspann staff 2230 Nov 6 22:58 lbvar4.mid
-rw-r--r-- 1 tspann staff 1458 Nov 6 22:58 lbvar6ep.mid
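For comparison, the link extraction the JSoup microservice performs can be sketched in stdlib Python. The HTML below is a made-up sample, not content from midiworld.com:

```python
from html.parser import HTMLParser

# Pull href links ending in .mid out of an HTML page, roughly what the
# JSoup-based Spring Boot microservice does.
class MidiLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.endswith(".mid"):
                self.links.append(href)

parser = MidiLinkParser()
parser.feed('<a href="http://example.com/song.mid">Song</a>'
            '<a href="http://example.com/page.html">Page</a>')
print(parser.links)  # ['http://example.com/song.mid']
```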
NiFi Flow

GetHTTP: Call the JSoup microservice that converts an HTML page full of MIDI links into a JSON file of links and descriptions of MIDI files.
SplitJSON: Split that big JSON file into individual link/description pairs for working with individual songs.
EvaluateJSONPath: Use JSONPath to pull out the link and description as attributes.
InvokeHTTP: Download the MIDI file from the link.
UpdateAttribute: Give it a good file name. I just want the file name from the link (for example, http://sdfsdf:8080/test.mid becomes test.mid).
PutFile: Store the MIDI file on the OSX filesystem.
ExecuteStreamCommand: Run the timidity CLI to play the MIDI file, passing it the path to the stored file.
PutHDFS: Store the MIDI file on HDP 2.5 HDFS.
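The file-name logic in the UpdateAttribute step can be sketched as follows (the URL is the article's own placeholder example):

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

# Keep only the file name from the download link, as the
# UpdateAttribute step in the flow does.
def filename_from_link(url):
    return PurePosixPath(urlparse(url).path).name

print(filename_from_link("http://sdfsdf:8080/test.mid"))  # test.mid
```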
References
http://jsonpath.com/
http://macappstore.org/timidity/
11-03-2016
08:08 PM
1 Kudo
Starting My Hadoop Tools

NiFi can interface directly with Hive, HDFS, HBase, Flume, and Phoenix, and I can also trigger Spark and Flink through Kafka and Site-to-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing. You can easily add this piece to a larger pipeline or process.

Pig Setup

I like to use Ambari to install the HDP 2.5 clients on my NiFi box so I have access to all the tools I may need. Then I can just do:

yum install pig

Pig from Apache NiFi 1.0.0 via ExecuteProcess

We call a shell script that wraps the Pig script. The output of the script is stored to HDFS:

hdfs dfs -ls /nifi-logs
Shell Script

export JAVA_HOME=/opt/jdk1.8.0_101/
pig -x local -l /tmp/pig.log -f /opt/demo/pigscripts/test.pig
You can run Pig in different modes, like local, mapreduce, and tez, and you can also pass parameters to the script.

Pig Script

messages = LOAD '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log';
warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
DUMP warns;
STORE warns INTO 'warns.out';
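For a quick local sanity check, the Pig WARN filter above can be mimicked in plain Python. The sample lines below stand in for nifi-app.log:

```python
# Sample log lines standing in for nifi-app.log.
lines = [
    "2016-11-03 10:00:00,000 INFO  [main] o.a.nifi.NiFi started",
    "2016-11-03 10:00:01,000 WARN  [main] o.a.nifi.NiFi low on heap",
    "2016-11-03 10:00:02,000 DEBUG [main] o.a.nifi.NiFi heartbeat",
]

# Rough equivalent of: warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
warns = [line for line in lines if "WARN" in line]
print(len(warns))  # 1
```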
This is a basic example from the internet, with the NiFi 1.0 log used as the source. As an aside, I run a daily script with the schedule 1 * * * * ? to clean up my logs. Simply:

/bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*

PutHDFS

Hadoop Configuration: /etc/hadoop/conf/core-site.xml. Pick a directory and store away.

Results

HadoopVersion: 2.7.3.2.5.0.0-1245
PigVersion: 0.16.0.2.5.0.0-1245
UserId: root
StartedAt: 2016-11-03 19:53:57
FinishedAt: 2016-11-03 19:53:59
Features: FILTER

Success!

Job Stats (time in seconds):

JobId: job_local72884441_0001
Maps: 1
Reduces: 0
MaxMapTime: n/a
MinMapTime: n/a
AvgMapTime: n/a
MedianMapTime: n/a
MaxReduceTime: 0
MinReduceTime: 0
AvgReduceTime: 0
MedianReduceTime: 0
Alias: messages,warns
Feature: MAP_ONLY
Outputs: file:/tmp/temp1540654561/tmp-600070101,
Input(s):
Successfully read 30469 records from: "/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log"
Output(s):
Successfully stored 1347 records in: "file:/tmp/temp1540654561/tmp-600070101"
Counters:
Total records written : 1347
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local72884441_0001

Reference:
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_5
http://hortonworks.com/apache/pig/#section_2
http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
https://github.com/dbist/pig
https://github.com/sudar/pig-samples
http://hortonworks.com/hadoop-tutorial/how-to-use-basic-pig-commands/
http://hadooptutorial.info/built-in-load-store-functions-in-pig/
https://cwiki.apache.org/confluence/display/PIG/PigTutorial
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/validate_the_installation_pig.html
http://pig.apache.org/docs/r0.16.0/start.html
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig
https://github.com/alanfgates/programmingpig/tree/master/examples/ch2
11-15-2016
04:23 PM
A big note: run visudo and make sure there is no Defaults requiretty entry. That setting will block the Ambari agent from doing sudo during installs, which it needs.
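A quick way to check for that setting, sketched in Python (the sample text below stands in for /etc/sudoers):

```python
import re

# Scan sudoers-style text for an uncommented "Defaults requiretty" line,
# the setting that blocks the Ambari agent's sudo calls.
def has_requiretty(sudoers_text):
    pattern = re.compile(r"^\s*Defaults\s+requiretty\b", re.MULTILINE)
    return bool(pattern.search(sudoers_text))

sample = "Defaults    requiretty\nroot ALL=(ALL) ALL\n"
print(has_requiretty(sample))  # True
```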
11-16-2017
10:06 PM
incrementalstream-1.xml
03-16-2018
03:15 PM
Now use the record processors instead.
10-24-2016
10:13 PM
1 Kudo
SysDig

Sysdig (GitHub) is an open source tool for the exploration, analysis, and troubleshooting of Linux systems and containers. It is well documented and very easy to install and use. It can be used for container and Linux system diagnostics, security analysis, monitoring, and basic system information capture. Remember that sysdig can produce thousands of lines of messages and can continue doing so forever, depending on the options selected. Check out the examples and read through all the options; you can monitor a ton of data really fast and also check for security anomalies.

NiFi Ingesting SysDig

Sysdig can produce amazing amounts of logs. I chose to ingest one-second chunks as ASCII JSON, using the options listed below. The results are arrays of JSON. I decided it's best to save them as large JSON files for now and convert them to ORC later for Hive analysis in Zeppelin. You could also split them into individual JSON rows and process those. I also save them to Apache Phoenix for fast queries.

ExecuteProcess Command

sysdig -A -j -M 1 --unbuffered

I just wrap that in a shell script for neatness.

HDF 2.0 / NiFi 1.0.0 Flow

Event JSON from SysDig

{"evt.cpu":6,"evt.dir":">","evt.info":"fd=7(<f>/usr/lib64/python2.7/lib-dynload/_elementtree.so) ","evt.num":111138,"evt.outputtime":1477313882635597873,"evt.type":"fstat","proc.name":"python","thread.tid":14602}
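Flattening such an event into the columns of the Phoenix table below can be sketched like this; dropping the dots from the JSON keys to match the table's column names is an assumption:

```python
import json

# A trimmed sysdig event like the one shown above.
event = json.loads(
    '{"evt.cpu":6,"evt.dir":">","evt.num":111138,'
    '"evt.type":"fstat","proc.name":"python","thread.tid":14602}'
)

# Map JSON keys to Phoenix-style column names (evt.cpu -> evtcpu, etc.);
# everything is stringified since the table uses varchar columns.
row = {key.replace(".", ""): str(value) for key, value in event.items()}
print(row["evtnum"], row["procname"])
```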
Apache Phoenix Table

CREATE TABLE sysdigevents
(
evtcpu varchar,
evtdir varchar,
evtinfo varchar,
evtoutputtime varchar,
evttype varchar,
procname varchar,
threadtid varchar,
evtnum varchar not null primary key
);
Links

Sysdig Examples
Sysdig Cheatsheet mapping to legacy tools
Monitor Linux Server with sysdig

NiFi Flow

sysdig.xml
10-14-2016
07:16 PM
2 Kudos
Sometimes you have JMS messages that you would like to easily ingest into HDFS as raw files, or into Phoenix, Hive, and other destinations. You can do that pretty easily with Apache NiFi 1.0.0 as part of HDF 2.0. For this simple example, I also added a REST gateway for bulk loading, testing, and another easy way to send JMS messages.

ListenHTTP accepts HTTP POSTs on port 8099, which I made the listener port for that processor. It takes what you send and publishes it to a JMS queue. I am using ActiveMQ.

I have a little Python 2.7 script that I found on GitHub that generates fake log records; I modified it to send 1,000 JSON messages via REST to our REST-to-JMS gateway in NiFi for testing. You could also do this with a shell script and curl, Apache JMeter, Java code, a Go script, or many other open source REST testers and clients.

# snippet from the generator script (needs "import requests";
# timestamp, random_ip(), country, and status are defined elsewhere in the script)
url = 'http://server.com:8099/contentListener'
r = requests.post(url, json={"rtimestamp": timestamp, "ip": random_ip(), "country": country, "status": status})

I installed ActiveMQ as my example JMS broker, which is very simple on CentOS 7. All you need to do is download the gzipped tar and untar it; after a chmod it's ready to run. That download also includes the client JAR that we will need on the HDF 2.0 server for accessing the message broker. You must also have the port open; on ActiveMQ that defaults to 61616. ActiveMQ also includes a nice web console, so you may want to unblock its port as well to view the status of queues and messages. In my simple example, I start it via:

bin/activemq start > /tmp/smlog 2>&1 &

I recommend changing your HTTP Listening Port so you can run a bunch of these processors as needed.

Processors used: ConsumeJMS, MergeContent, and PutHDFS. You need to set Destination Name, which here is the name of the queue but could also be the name of a topic. I picked a Destination Type of QUEUE since I am using a queue in Apache ActiveMQ.

It's very easy to add more output processors for sinking data into Apache Phoenix, HBase, Hive, email, Slack, and other stores. It's also easy to convert messages into Avro, ORC, and other optimized big data file formats. As you can see, we get a number of jms_ attributes, including priority, message ID, and other attributes associated with the JMS message.

Example Message

ActiveMQ Screens

References:
https://community.hortonworks.com/articles/59349/hdf-20-flow-for-ingesting-real-time-tweets-from-st.html
https://community.hortonworks.com/articles/59975/ingesting-edi-into-hdfs-using-hdf-20.html
http://activemq.apache.org/uri-protocols.html
http://activemq.apache.org/initial-configuration.html
http://activemq.apache.org/version-5-getting-started.html
http://www.apache.org/dyn/closer.cgi?filename=/activemq/5.14.1/apache-activemq-5.14.1-bin.tar.gz&action=download