1973
Posts
1225
Kudos Received
124
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1843 | 04-03-2024 06:39 AM |
| | 2875 | 01-12-2024 08:19 AM |
| | 1585 | 12-07-2023 01:49 PM |
| | 2349 | 08-02-2023 07:30 AM |
| | 3241 | 03-29-2023 01:22 PM |
11-11-2016
06:13 PM
8 Kudos
It's easier than I would have thought to add images to your SQL results tables in Apache Zeppelin, and it works in the version of Apache Zeppelin that ships with HDP 2.5. You use the %html tag to output HTML instead of text.
Use Case: Displaying an image alongside TensorFlow Inception image recognition results in the same list.
Example SQL:
SELECT user_name, handle, concat('%html <img width=50 height=60 src="', media_url, '">') as media, substring(inception,0,150) as inception, msg, sentiment, stanfordsentiment, location, time
FROM twitterorc where inception not like '%Not found%' and inception is not null and trim(inception) != ''
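For anyone generating these cells outside of SQL, here is a minimal Python sketch of the same string concatenation. The function name `zeppelin_img_cell` is made up for illustration, and the HTML escaping is an addition that the original query does not do.

```python
from html import escape


def zeppelin_img_cell(media_url: str, width: int = 50, height: int = 60) -> str:
    """Build a table cell value that Zeppelin renders as an image.

    Hypothetical helper: mirrors the concat(...) expression in the SQL above.
    """
    # Escape the URL so quotes or angle brackets in the data cannot break the tag.
    safe_url = escape(media_url, quote=True)
    return f'%html <img width={width} height={height} src="{safe_url}">'


print(zeppelin_img_cell("http://example.com/pic.jpg"))
```

The cell value has to start with %html, otherwise it is shown as plain text.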
04-18-2019
06:17 PM
This is an old thread, but for anyone looking at it, count(1) is the same query as count(*), so there are no performance benefits to using one over the other.
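A quick way to see that the two forms return the same answer is to run them side by side. The sketch below uses Python's built-in SQLite as a stand-in, so it says nothing about Hive's query planner; the table `t` and its rows are invented for the example. It also shows the case that does differ, count(column), which skips NULLs.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])

count_star = conn.execute("SELECT count(*) FROM t").fetchone()[0]
count_one = conn.execute("SELECT count(1) FROM t").fetchone()[0]
# count(name) only counts rows where name is not NULL.
count_name = conn.execute("SELECT count(name) FROM t").fetchone()[0]

print(count_star, count_one, count_name)  # 3 3 2
```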
11-23-2016
07:00 PM
2 Kudos
@bala krishnan It works for me when I set Replacement Strategy to "Literal Replace": my input file has control-a (but no \001) and my output file has control-a followed by test. When I use the default Replacement Strategy ("Regex Replace"), my output file has \001test.
11-08-2016
08:11 PM
1 Kudo
Avro doesn't like the dots in the attribute names (which become field names). Perhaps you could rename them (with UpdateAttribute or at the source processor(s)), replacing the dots with underscores or other valid characters. Avro names must start with a letter or underscore and contain only letters, digits, and underscores (see the link for the rules).
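To make the renaming rule concrete, here is a small Python sketch of the mapping you would want. It is not NiFi code, and `to_avro_name` is a hypothetical helper; in a flow you would express the same substitution in UpdateAttribute.

```python
import re


def to_avro_name(attribute_name: str) -> str:
    """Map an arbitrary attribute name to a valid Avro field name (hypothetical helper)."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", attribute_name)
    # Avro names must start with a letter or underscore, so prefix anything else.
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name


print(to_avro_name("invokehttp.status.code"))  # invokehttp_status_code
```

Watch out for collisions: "a.b" and "a_b" both end up as "a_b" with this approach.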
06-08-2018
05:25 PM
Hi @Tran Quyet Thang. You can check https://github.com/iheb-boughzala/AllFbPostsUsingNifi
11-09-2016
08:58 PM
Was the HTML content actually valid XML? Did the content viewer open? If so, is the content viewer unable to show the content in its 'formatted' form? What about the 'original' or 'hex' forms?
11-07-2016
04:48 PM
hdfs dfs -cat /music/meta/amclassical_beethoven_fur_elise.mp3
{"xmpDM:trackNumber":"","invokehttp.tx.id":"b4690a5a-ec60-4f68-8d1a-de9344eced8d","xmpDM:releaseDate":"2001","Server":"Apache","gethttp.remote.source":"192.168.1.2","xmpDM:artist":"A-M Classical","fragment.identifier":"d8b6da55-c512-42c5-b9a0-a93b0f32fd71","link":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","dc:creator":"A-M Classical","Last-Modified":"Sun, 05 Oct 2008 20:59:37 GMT","title":"Für Elise","xmpDM:audioChannelType":"Stereo","uuid":"d1e6b6cb-63cc-46a8-ac63-a588da01ecff","invokehttp.request.url":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","path":"./","xmpDM:logComment":"eng -","xmpDM:audioSampleRate":"44100","dc:title":"Für Elise","OkHttp-Sent-Millis":"1478537014168","segment.original.filename":"mp3.json","Content-Length":"3393536","Content-Type":"audio/mpeg","samplerate":"44100","Keep-Alive":"timeout=2, max=95","xmpDM:genre":"Classical","xmpDM:composer":"Ludwig van Beethoven","X-Parsed-By":"org.apache.tika.parser.DefaultParser, org.apache.tika.parser.mp3.Mp3Parser","creator":"A-M Classical","xmpDM:album":"","meta:author":"A-M Classical","invokehttp.status.code":"200","Connection":"Keep-Alive","fragment.index":"0","xmpDM:audioCompressor":"MP3","mime.type":"audio/mpeg","version":"MPEG 3 Layer III Version 1","Date":"Mon, 07 Nov 2016 16:38:13 GMT","Accept-Ranges":"bytes","descr":"Beethoven: Für Elise","filename":"amclassical_beethoven_fur_elise.mp3","OkHttp-Received-Millis":"1478537014194","channels":"2","ETag":"\"33c800-45887d8256040\"","Author":"A-M Classical","fragment.count":"28","invokehttp.status.message":"OK","xmpDM:duration":"211996.546875"}
I added an ExtractMediaMetaData processor and examined the mp3s.
11-07-2016
04:36 AM
2 Kudos
Use Case
Before meetups, I wanted to play some music. NiFi seemed like a great choice for streaming free music through my Mac.
MIDI Command Line Player for OSX
brew install timidity
Then you can simply play MIDI files with timidity file.mid.
Microservice to Extract Links from Web Pages
Java 8 Source Code: https://github.com/tspannhw/linkextractor
The Spring Boot REST API accepts a URL, parses out the links to mid files, and returns JSON containing the links and descriptions.
Example REST Call to Service
curl -G -v "http://<urL>:8080/extract/url?url=http://www.midiworld.com/classic.htm/&type=mid"
Run the Microservice
java -Xms512m -Xmx2048m -Djava.net.preferIPv4Stack=true -jar target/linkextractor-0.0.1-SNAPSHOT.jar
Java Snippet Using JSoup to Extract Links from URL (HTML)
pLink = new PrintableLink();
pLink.setLink(link.attr("abs:href"));
pLink.setDescr(trim(link.text(), 100));
linksReturned.add(pLink);
Output
hdfs dfs -ls /music/*.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/2_ase.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/4_mtking.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:50 /music/EspanjaCaphriccoCatalan.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:50 /music/EspanjaPrelude.mid
-rw-r--r-- 3 tspann hdfs 162 2016-11-07 03:50 /music/J_M_Bach_Auf_lasst_uns_den_Herren_loben.mid
-rw-r--r-- 3 tspann hdfs 93 2016-11-07 03:52 /music/adelina.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/aida_ii2.mid
-rw-r--r-- 3 tspann hdfs 89 2016-11-07 03:50 /music/al_adagi.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/alborada.mid
-rw-r--r-- 3 tspann hdfs 82 2016-11-07 03:52 /music/aquarium.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:52 /music/barbero.mid
-rw-r--r-- 3 tspann hdfs 101 2016-11-07 03:51 /music/barimyst.mid
-rw-r--r-- 3 tspann hdfs 80 2016-11-07 03:52 /music/beevar2.mid
-rw-r--r-- 3 tspann hdfs 111 2016-11-07 03:50 /music/biz_arls.mid
-rw-r--r-- 3 tspann hdfs 94 2016-11-07 03:51 /music/blas1.mid
-rw-r--r-- 3 tspann hdfs 114 2016-11-07 03:50 /music/boccher.mid
-rw-r--r-- 3 tspann hdfs 78 2016-11-07 03:52 /music/bolero.mid
-rw-r--r-- 3 tspann hdfs 100 2016-11-07 03:51 /music/cantique.mid
-rw-r--r-- 3 tspann hdfs 88 2016-11-07 03:51 /music/carminab.mid
-rw-r--r-- 3 tspann hdfs 96 2016-11-07 03:51 /music/clairdelune.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/cmveder.mid
-rw-r--r-- 3 tspann hdfs 108 2016-11-07 03:52 /music/coucou.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/coup8a.mid
-rw-r--r-- 3 tspann hdfs 97 2016-11-07 03:51 /music/cpf-bird.mid
2016-11-06 22:57:20.095 ERROR 28694 --- [nio-8080-exec-1] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.313 ERROR 28694 --- [nio-8080-exec-3] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.500 ERROR 28694 --- [nio-8080-exec-5] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
ls -lt /opt/demo/midi | more
total 20456
-rw-r--r-- 1 tspann staff 117731 Nov 6 22:58 appspg13.mid
-rw-r--r-- 1 tspann staff 13449 Nov 6 22:58 intrlude.mid
-rw-r--r-- 1 tspann staff 8777 Nov 6 22:58 latalant.mid
-rw-r--r-- 1 tspann staff 1911 Nov 6 22:58 lbvar2.mid
-rw-r--r-- 1 tspann staff 2230 Nov 6 22:58 lbvar4.mid
-rw-r--r-- 1 tspann staff 1458 Nov 6 22:58 lbvar6ep.mid
NiFi Flow
GetHTTP: Call the JSoup microservice, which converts an HTML page full of MIDI links into a JSON file of links and descriptions of MIDI files.
SplitJSON: Split that big JSON file into individual link, description pairs for working with individual songs.
EvaluateJSONPath: Use JSONPath to pull out link and description as attributes.
InvokeHTTP: Download the MIDI file from the link.
UpdateAttribute: Give it a good file name. I just want the file name from the link (example http://sdfsdf:8080/test.mid).
PutFile: Store the MIDI on the OSX filesystem.
ExecuteStreamCommand: Run the timidity CLI to play the MIDI file. Pass a link to the stored MIDI file to the timidity player.
PutHDFS: Store the MIDI on HDP 2.5 HDFS.
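The SplitJSON, EvaluateJSONPath and UpdateAttribute steps can be sketched outside NiFi in a few lines of Python. This is only an illustration of what those steps do to the data: it assumes the microservice returns a JSON array of objects with `link` and `descr` fields (inferred from the setLink/setDescr calls in the Java snippet), and both function names are made up.

```python
import json
from posixpath import basename
from urllib.parse import urlparse


def split_links(payload: str):
    """Yield (link, descr) pairs from a JSON array, like SplitJSON + EvaluateJSONPath.

    Assumes each element looks like {"link": "...", "descr": "..."}.
    """
    for item in json.loads(payload):
        yield item["link"], item.get("descr", "")


def file_name_from_link(link: str) -> str:
    """Keep only the last path segment of the URL, like the UpdateAttribute step."""
    return basename(urlparse(link).path)


sample = '[{"link": "http://sdfsdf:8080/test.mid", "descr": "Test"}]'
for link, descr in split_links(sample):
    print(file_name_from_link(link), descr)  # test.mid Test
```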
References
http://jsonpath.com/
http://macappstore.org/timidity/
11-03-2016
08:08 PM
1 Kudo
Starting My Hadoop Tools
NiFi can interface directly with Hive, HDFS, HBase, Flume and Phoenix, and I can also trigger Spark and Flink through Kafka and Site-To-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing. You can easily add this piece to a larger pipeline or process.
Pig Setup
I like to use Ambari to install the HDP 2.5 clients on my NiFi box to have access to all the tools I may need. Then I can just do:
yum install pig
Pig to Apache NiFi 1.0.0
ExecuteProcess
We call a shell script that wraps the Pig script. The output of the script is stored to HDFS:
hdfs dfs -ls /nifi-logs
Shell Script
export JAVA_HOME=/opt/jdk1.8.0_101/
pig -x local -l /tmp/pig.log -f /opt/demo/pigscripts/test.pig
You can run in different Pig modes like local, mapreduce and tez. You can also pass in parameters or the script.
Pig Script
messages = LOAD '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log';
warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
DUMP warns;
STORE warns INTO 'warns.out';
This is a basic example from the internet, with the NiFi 1.0 log used as the source.
As an aside, I run a daily script with the schedule 1 * * * * ? to clean up my logs (as written, this Quartz-style expression fires at second 1 of every minute; a true once-a-day schedule would be something like 0 0 1 * * ?). Simply:
/bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*
PutHDFS
Hadoop Configuration: /etc/hadoop/conf/core-site.xml
Pick a directory and store away.
Results
| HadoopVersion | PigVersion | UserId | StartedAt | FinishedAt | Features |
|---|---|---|---|---|---|
| 2.7.3.2.5.0.0-1245 | 0.16.0.2.5.0.0-1245 | root | 2016-11-03 19:53:57 | 2016-11-03 19:53:59 | FILTER |
Success!
Job Stats (time in seconds):
| JobId | Maps | Reduces | MaxMapTime | MinMapTime | AvgMapTime | MedianMapTime | MaxReduceTime | MinReduceTime | AvgReduceTime | MedianReducetime | Alias | Feature | Outputs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| job_local72884441_0001 | 1 | 0 | n/a | n/a | n/a | n/a | 0 | 0 | 0 | 0 | messages,warns | MAP_ONLY | file:/tmp/temp1540654561/tmp-600070101, |
Input(s):
Successfully read 30469 records from: "/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log"
Output(s):
Successfully stored 1347 records in: "file:/tmp/temp1540654561/tmp-600070101"
Counters:
Total records written : 1347
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local72884441_0001
Reference:
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_5
http://hortonworks.com/apache/pig/#section_2
http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
https://github.com/dbist/pig
https://github.com/sudar/pig-samples
http://hortonworks.com/hadoop-tutorial/how-to-use-basic-pig-commands/
http://hadooptutorial.info/built-in-load-store-functions-in-pig/
https://cwiki.apache.org/confluence/display/PIG/PigTutorial
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/validate_the_installation_pig.html
http://pig.apache.org/docs/r0.16.0/start.html
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig
https://github.com/alanfgates/programmingpig/tree/master/examples/ch2
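If you want to sanity-check the FILTER step from the Pig script without a Pig install, a rough Python equivalent is below. It is a sketch, not part of the flow: the sample log lines are invented, and it relies on Pig's MATCHES needing the pattern to match the whole value, which is why it uses `fullmatch` rather than `search`.

```python
import re

# Same pattern as the Pig script: FILTER messages BY $0 MATCHES '.*WARN+.*'
WARN_PATTERN = re.compile(r".*WARN+.*")


def filter_warns(lines):
    """Keep the lines the Pig FILTER would keep (lines must not end with a newline)."""
    return [line for line in lines if WARN_PATTERN.fullmatch(line)]


# Invented sample lines, loosely shaped like nifi-app.log entries.
sample = [
    "2016-11-03 19:50:01,123 INFO [main] o.a.nifi.NiFi started",
    "2016-11-03 19:50:02,456 WARN [Timer-Driven] o.a.n.SomeProcessor slow",
]

for line in filter_warns(sample):
    print(line)
```

Note that the pattern also matches words such as WARNING, since anything may follow WARN.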
11-03-2016
03:22 PM
Hello, thanks everyone for the prompt response. With some aid I was able to figure it out.
Mostly my problem was understanding the difference between the Grouping Regular Expression and the expression that extracts the date parameter, which in my case are pretty much the same expression. I also have to admit that the RouteText.Group attribute was not easy to find, even in the documentation.
I feel that reading logs from a TCP connection and storing them partitioned directly into a Hive table should be a fairly common use case, so I'm attaching the template as my small contribution: recordtexttopartition.xml
Thanks again
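For anyone who finds this thread later, the idea behind grouping by a capture group can be sketched in plain Python. This is not NiFi code: the log format (a line starting with a yyyy-MM-dd date) and the function name are assumptions for illustration, and in the flow the captured value is what ends up in the RouteText.Group attribute.

```python
import re
from collections import defaultdict

# Assumed log format: each line starts with a date such as 2016-11-03.
# The capture group plays the role of the Grouping Regular Expression.
DATE_GROUP = re.compile(r"^(\d{4}-\d{2}-\d{2}) .*")


def partition_by_date(lines):
    """Group log lines by the captured date; lines without a date are skipped."""
    partitions = defaultdict(list)
    for line in lines:
        match = DATE_GROUP.match(line)
        if match:
            partitions[match.group(1)].append(line)
    return dict(partitions)


logs = ["2016-11-03 first", "2016-11-03 second", "2016-11-04 third", "no date here"]
for date, group in partition_by_date(logs).items():
    print(date, len(group))
```

Each key would then map to one partition value when the grouped lines are written to the Hive table.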