11-22-2016
10:44 PM
3 Kudos
Download URLCrazy (http://www.morningstarsecurity.com/downloads/urlcrazy-0.5.tar.gz)
An Example Command Line Run for URLCrazy
[root@tspanndev13 security]# ./url.sh dataflowdeveloper.com
Typo Type,Typo,Valid,Pop,DNS-A,CC-A,Country-A,DNS-MX,Extn
Character Omission,daaflowdeveloper.com,true,,,?,,com,
Character Omission,dataflodeveloper.com,true,,,?,,com,
Character Omission,dataflowdeeloper.com,true,,,?,,com,
Character Omission,dataflowdeveloer.com,true,,,?,,com,
Character Omission,dataflowdevelope.com,true,,,?,,com,
Character Omission,dataflowdeveloper.cm,true,,,?,,cm,
Character Omission,dataflowdeveloper.co,false,,,?,,,
Character Omission,dataflowdeveloper.om,false,,,?,,,
Character Omission,dataflowdevelopercom,false,,,?,,,
...
Shell Script to Call From Apache NiFi
/opt/demo/security/urlcrazy-0.5/urlcrazy -i -f csv -p $@
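The flow actually calls /opt/demo/security/url.sh (see execution.command in the JSON output below), so the NiFi processor invokes a small wrapper script rather than urlcrazy directly. The exact contents of url.sh are not shown here; a minimal sketch based on the command line above might look like:
#!/bin/bash
# url.sh - wrapper around URLCrazy so NiFi can pass the domain in as an argument
# "$@" is the domain handed in by the calling NiFi processor
/opt/demo/security/urlcrazy-0.5/urlcrazy -i -f csv -p "$@"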
An Example Command Line Run for NSLookup
Non-authoritative answer:
sparkdeveloper.com text = "v=spf1 ip4:00.000.0.0/24 ip4:00.000.00.0/24 ip4:11.111.111.0/19 ?all"
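The SPF record above comes from an nslookup TXT query; the exact command isn't shown in the original, but it was presumably something along the lines of:
nslookup -q=txt sparkdeveloper.com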
The Final JSON Output:
{
"path" : "./",
"execution.command" : "/opt/demo/security/url.sh",
"urlcrazy" : "Typo Type,Typo,Valid,Pop,DNS-A,CC-A,Country-A,DNS-MX,Extn\nCharacter Omission,sarkdeveloper.com,true,,,?,,com,\nCharacter Omission,spakdeveloper.com,true,,,?,,com,\nCharacter Omission,spardeveloper.com,true,,,?,,com,\nCharacter Omission,sparkdeeloper.com,true,,,?,,com,\nCharacter Omission,sparkdeveloer.com,true,,,?,,com,\nCharacter Omission,sparkdevelope.com,true,543,,?,,com,\nCharacter Omission,sparkdeveloper.cm,true,214000,,?,,cm,\nCharacter Omission,sparkdeveloper.co,false,,,?,,,\nCharacter Omission,sparkdeveloper.om,false,,,?,,,\nCharacter Omission,sparkdevelopercom,false,,,?,,,\nCharacter Omission,sparkdevelopr.com,true,,,?,,com,\nCharacter Omission,sparkdevelper.com,true,2190,,?,,com,\nCharacter Omission,sparkdeveoper.com,true,,,?,,com,\nCharacter Omission,sparkdevloper.com,true,2230,,?,,com,\nCharacter Omission,sparkdveloper.com,true,,,?,,com,\nCharacter Omission,sparkeveloper.com,true,,,?,,com,\nCharacter Omission,sprkdeveloper.com,true,,,?,,com,\nCharacter Repeat,spaarkdeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkddeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkdeeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkdeveeloper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelloper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelooper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelopeer.com,true,,,?,,com,\nCharacter Repeat,sparkdeveloper..com,false,,,?,,com,\nCharacter Repeat,sparkdeveloper.ccom,false,,,?,,,\nCharacter Repeat,sparkdeveloper.comm,false,,,?,,,\nCharacter Repeat,sparkdeveloper.coom,false,,,?,,,\nCharacter Repeat,sparkdeveloperr.com,true,2120,,?,,com,\nCharacter Repeat,sparkdevelopper.com,true,203,,?,,com,\nCharacter Repeat,sparkdevveloper.com,true,,,?,,com,\nCharacter Repeat,sparkkdeveloper.com,true,,,?,,com,\nCharacter Repeat,sparrkdeveloper.com,true,,,?,,com,\nCharacter Repeat,spparkdeveloper.com,true,,,?,,com,\nCharacter Repeat,ssparkdeveloper.com,true,,,?,,com,\nCharacter Swap,psarkdeveloper.com,true,,,?,,com,\nCharacter Swap,saprkdeveloper.com,true,,,?,,com,\nCharacter Swap,spakrdeveloper.com,true,,,?,,com,\nCharacter Swap,spardkeveloper.com,true,,,?,,com,\nCharacter Swap,sparkdeevloper.com,true,,,?,,com,\nCharacter Swap,sparkdeveloepr.com,true,,,?,,com,\nCharacter Swap,sparkdevelope.rcom,false,,,?,,,\nCharacter Swap,sparkdeveloper.cmo,false,,,?,,,\nCharacter Swap,sparkdeveloper.ocm,false,,,?,,,\nCharacter Swap,sparkdeveloperc.om,false,,,?,,,\nCharacter Swap,sparkdevelopre.com,true,,,?,,com,\nCharacter Swap,sparkdevelpoer.com,true,,,?,,com,\nCharacter Swap,sparkdeveolper.com,true,,,?,,com,\nCharacter Swap,sparkdevleoper.com,true,,,?,,com,\nCharacter Swap,sparkdveeloper.com,true,,,?,,com,\nCharacter Swap,sparkedveloper.com,true,,,?,,com,\nCharacter Swap,sprakdeveloper.com,true,18,,?,,com,\nCharacter Replacement,aparkdeveloper.com,true,129,,?,,com,\nCharacter Replacement,dparkdeveloper.com,true,,,?,,com,\nCharacter Replacement,soarkdeveloper.com,true,,,?,,com,\nCharacter Replacement,spaekdeveloper.com,true,,,?,,com,\nCharacter Replacement,sparjdeveloper.com,true,,,?,,com,\nCharacter Replacement,sparkdebeloper.com,true,,,?,,com,\nCharacter Replacement,sparkdeceloper.com,true,,,?,,com,\nCharacter Replacement,sparkdevekoper.com,true,,,?,,com,\nCharacter Replacement,sparkdeveliper.com,true,,,?,,com,\nCharacter Replacement,sparkdevelooer.com,true,92,,?,,com,\nCharacter Replacement,sparkdevelopee.com,true,,,?,,com,\nCharacter Replacement,sparkdeveloper.cim,false,,,?,,,\nCharacter Replacement,sparkdeveloper.con,false,,,?,,,\nCharacter 
Replacement,sparkdeveloper.cpm,false,,,?,,,\nCharacter Replacement,sparkdeveloper.vom,false,,,?,,,\nCharacter Replacement,sparkdeveloper.xom,false,,,?,,,\nCharacter Replacement,sparkdevelopet.com,true,,,?,,com,\nCharacter Replacement,sparkdeveloprr.com,true,,,?,,com,\nCharacter Replacement,sparkdevelopwr.com,true,,,?,,com,\nCharacter Replacement,sparkdevelpper.com,true,,,?,,com,\nCharacter Replacement,sparkdevrloper.com,true,,,?,,com,\nCharacter Replacement,sparkdevwloper.com,true,,,?,,com,\nCharacter Replacement,sparkdrveloper.com,true,,,?,,com,\nCharacter Replacement,sparkdwveloper.com,true,,,?,,com,\nCharacter Replacement,sparkfeveloper.com",
"filename" : "4963644600105857",
"execution.command.args" : "sparkdeveloper.com",
"execution.status" : "0",
"spf" : "Server:\t\t10.42.1.20\nAddress:\t10.42.1.20#53\n\nNon-authoritative answer:\nsparkdeveloper.com\ttext = \"v=spf1 ip4:38.113.1.0/24 ip4:38.113.20.0/24 ip4:65.254.224.0/19 ?all\"\n\nAuthoritative answers can be found from:\n\n",
"execution.error" : "",
"uuid" : "f13ca0f5-bac7-4da7-b5c3-8b1c145591bf",
"url" : "sparkdeveloper.com",
"enrich.dns.record0.group0" : "\"v=spf1 ip4:00.000.0.0/24 ip4:00.000.00.0/24 ip4:11.111.111.0/19 ?all\""
}
You can grab results from many different command-line tools and REST services to augment your existing data, tools and feeds.
A URLCrazy report is useful for intelligence on look-alike domains that people may be squatting on close to yours. These are often used by spammers, for malware distribution, and for other nefarious purposes.
11-18-2016
09:37 PM
1 Kudo
Use Case: Store log data in a Hadoop data lake and send a curated, reduced set to Sumologic via REST API integration.
The integration point for sending log data to Sumologic is their HTTP Source. To send data you must set up an HTTP Source in Sumologic from your web console. Take the HTTP endpoint string they give you and put it into an InvokeHTTP processor using POST; it will look something like this:
https://endpoint1.collection.us2.sumologic.com/receiver/v1/http/ZaLongCodeLong
I noticed an IP address in the data, so I decided to parse it out:
${regex.6:substringAfterLast('source ip: '):replaceAll('\)','')}
Then I send it to MaxMind for processing. The free MaxMind GeoIP database is easy to download and use with NiFi: just add the GeoIP processor and point it at the field and the database file location. Finally, displaying and charting the data is easy as pie in Zeppelin; I just query my Phoenix data.
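Before wiring the endpoint into InvokeHTTP, it can help to verify the Sumologic HTTP Source with a quick POST from the command line. A rough sketch, using the placeholder endpoint above:
curl -v -X POST -H "Content-Type: text/plain" --data "test message from the NiFi pipeline" "https://endpoint1.collection.us2.sumologic.com/receiver/v1/http/ZaLongCodeLong"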
The flow is a bit long, as I am using regular expressions to convert the logs from NiFi's Log4j format into individual attributes, then turn them into a JSON file and convert that into a SQL upsert for the Phoenix insert. I log all failures to a file.
Transmitting Log Data
It's pretty easy to integrate with Sumologic. They have a nice HTTP endpoint to send this data and will accept JSON and many other text formats. They also have a native agent, which can be interfaced with via several logging mechanisms; I asked them about it and may work on that in the future.
Apache Phoenix Table for Log Data
0: jdbc:phoenix:tspannserver> !describe nifilogs
+------------+--------------+-------------+--------------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+---------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS |
+------------+--------------+-------------+--------------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+---------+
| | | NIFILOGS | SDATE | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | FRAGID | 12 | VARCHAR | null | null | null | null | 0 | |
| | | NIFILOGS | MSG | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | MODULE | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | STIME | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | STYPE | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | SCLASS | 12 | VARCHAR | null | null | null | null | 1 | |
| | | NIFILOGS | GEOCITY | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | GEOLATITUDE | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | GEOLONGITUDE | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | GEOCOUNTRY | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | GEOPOSTALCODE | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | GEOCOUNTRYISOCODE | 12 | VARCHAR | 255 | null | null | null | 1 | |
| | | NIFILOGS | IPADDRESS | 12 | VARCHAR | 255 | null | null | null | 1 | |
+------------+--------------+-------------+--------------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+---------+
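For reference, here is a DDL sketch that would create roughly this table, reconstructed from the !describe output above. FRAGID is the only non-nullable column, so I assume it is the primary key; the GEOSTATE column used in the upsert below is an assumption as well:
CREATE TABLE IF NOT EXISTS nifilogs (
  fragid VARCHAR NOT NULL PRIMARY KEY,
  sdate VARCHAR,
  msg VARCHAR,
  module VARCHAR,
  stime VARCHAR,
  stype VARCHAR,
  sclass VARCHAR,
  geocity VARCHAR(255),
  geolatitude VARCHAR(255),
  geolongitude VARCHAR(255),
  geocountry VARCHAR(255),
  geopostalcode VARCHAR(255),
  geocountryisocode VARCHAR(255),
  ipaddress VARCHAR(255),
  geostate VARCHAR(255)
);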
NiFi Apache Phoenix (HBase) SQL Upsert (ReplaceText)
upsert into nifilogs (sdate,fragid,msg,module,stime,stype,sclass, geocity, geolatitude, geolongitude, geocountry, geopostalcode, geocountryisocode, ipaddress, geostate)
values ('${'date'}','${'fragment.identifier'}', '${'msg'}','${'module'}','${'time'}','${'type'}','${'class'}','${'ipaddress.geo.city'}',
'${'ipaddress.geo.latitude'}','${'ipaddress.geo.longitude'}','${'ipaddress.geo.country'}','${'ipaddress.geo.postalcode'}','${'ipaddress.geo.country.isocode'}','${'ipaddress'}','${'ipaddress.geo.subdivision.isocode.0'}')
Note the use of stime, stype, sclass and sdate; I am trying to avoid using built-in SQL keywords. I added some fields for the geo encoding that will come from the MaxMind database, and I parse the IP address out of the main log record.
References: Fun with Regex, Sumologic, Download a MaxMind GeoLite Database for Geo Enrichment
11-15-2016
04:23 PM
A big note: run visudo and make sure there is no "Defaults requiretty" line. That setting will block the Ambari agent from doing sudo within installs, which it needs.
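To check whether the setting is present, something like this works (a quick sketch):
sudo grep requiretty /etc/sudoers
If it shows up, comment it out or remove it through visudo:
# Defaults    requiretty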
11-11-2016
06:13 PM
8 Kudos
It's easier than I would have thought to add images to your SQL results tables in Apache Zeppelin, and it's pretty simple to do in HDP 2.5's version. You use the %html tag to output HTML instead of text.
Use Case: Displaying the image alongside TensorFlow Inception image recognition results in the same list.
Example SQL:
SELECT user_name, handle, concat('%html <img width=50 height=60 src="', media_url, '">') as media, substring(inception,0,150) as inception, msg, sentiment, stanfordsentiment, location, time
FROM twitterorc where inception not like '%Not found%' and inception is not null and trim(inception) != ''
11-07-2016
04:48 PM
hdfs dfs -cat /music/meta/amclassical_beethoven_fur_elise.mp3
{"xmpDM:trackNumber":"","invokehttp.tx.id":"b4690a5a-ec60-4f68-8d1a-de9344eced8d","xmpDM:releaseDate":"2001","Server":"Apache","gethttp.remote.source":"192.168.1.2","xmpDM:artist":"A-M Classical","fragment.identifier":"d8b6da55-c512-42c5-b9a0-a93b0f32fd71","link":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","dc:creator":"A-M Classical","Last-Modified":"Sun, 05 Oct 2008 20:59:37 GMT","title":"Für Elise","xmpDM:audioChannelType":"Stereo","uuid":"d1e6b6cb-63cc-46a8-ac63-a588da01ecff","invokehttp.request.url":"http://www.amclassical.com/mp3/amclassical_beethoven_fur_elise.mp3","path":"./","xmpDM:logComment":"eng -","xmpDM:audioSampleRate":"44100","dc:title":"Für Elise","OkHttp-Sent-Millis":"1478537014168","segment.original.filename":"mp3.json","Content-Length":"3393536","Content-Type":"audio/mpeg","samplerate":"44100","Keep-Alive":"timeout=2, max=95","xmpDM:genre":"Classical","xmpDM:composer":"Ludwig van Beethoven","X-Parsed-By":"org.apache.tika.parser.DefaultParser, org.apache.tika.parser.mp3.Mp3Parser","creator":"A-M Classical","xmpDM:album":"","meta:author":"A-M Classical","invokehttp.status.code":"200","Connection":"Keep-Alive","fragment.index":"0","xmpDM:audioCompressor":"MP3","mime.type":"audio/mpeg","version":"MPEG 3 Layer III Version 1","Date":"Mon, 07 Nov 2016 16:38:13 GMT","Accept-Ranges":"bytes","descr":"Beethoven: Für Elise","filename":"amclassical_beethoven_fur_elise.mp3","OkHttp-Received-Millis":"1478537014194","channels":"2","ETag":"\"33c800-45887d8256040\"","Author":"A-M Classical","fragment.count":"28","invokehttp.status.message":"OK","xmpDM:duration":"211996.546875"} I added an ExtractMediaMetaData processor and examined the mp3s
11-07-2016
04:31 PM
Next steps: use the audio processors, generate music with DL4J and TensorFlow, and connect NiFi to a MIDI synthesizer or a Raspberry Pi running Sonic Pi.
11-07-2016
04:21 PM
1 Kudo
As a follow-up to my article on downloading and playing MIDI files (https://community.hortonworks.com/content/kbentry/65154/nifi-1x-for-automatic-music-playing-pipelines.html), this quick article highlights how you can support both MP3 and MIDI. As you can see, it's very easy to have multiple input channels and multiple processing paths, and to process many types of inputs and files in NiFi.
So how do we add MP3 playback to our jukebox? Add another GetHTTP processor: http://192.168.1.2:8080/extract/url?url=http://www.amclassical.com/piano//&type=mp3 This still uses my microservice to convert web pages into links in JSON, this time filtering for MP3 links. Fortunately there are a few web pages out there with free classical MP3s.
Next, after UpdateAttribute, I check the file extension in a RouteOnAttribute: ${filename:contains('midi')} If it's MIDI, call the same MIDI player. I added a second PutFile to save to /opt/demo/mp3, so that I keep my music files separate. Otherwise, call OSX's command-line MP3 player: in my second ExecuteStreamCommand I call /usr/bin/afplay to play those newly downloaded MP3s (a sketch is shown below). One file plays, and once it completes the next song plays.
I don't recommend feeding both MIDI and MP3 pages in at the same time; it's best to pick one type, let it load a lot of files into the queues, and play those. I keep my GetHTTP processors stopped once I have the page in, since I don't want more. It is very easy to feed in a list of pages to load, or add a scheduler or other feed logic at the start to control your experience; I like to control this part manually. You could also trigger the flow by the presence of a file, or maybe when a Jenkins build fails. It's limited only by your imagination and the more than 180 available processors.
Another thing that can be added to the flow is some audio processing via Simon Elliston Ball's audio processors, which you can easily add to your NiFi: https://community.hortonworks.com/content/repo/47306/nifi-audio-processors.html
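The afplay step is just a plain command-line call; conceptually the second ExecuteStreamCommand runs something like the following for each downloaded file (the file name here is a made-up example):
/usr/bin/afplay /opt/demo/mp3/example.mp3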
11-07-2016
04:36 AM
2 Kudos
Use Case
Before meetups, I wanted to play some music. NiFi seemed like a great choice for streaming free music through my Mac.
MIDI Command Line Player For OSX
brew install timidity
Then you can simply play MIDI files with timidity file.mid.
Microservice to Extract Links from Web Pages
Java 8 Source Code: https://github.com/tspannhw/linkextractor
The Spring Boot REST API accepts a URL, parses out the .mid file links and returns JSON containing the links and descriptions.
Example REST Call to Service
curl -G -v "http://<url>:8080/extract/url?url=http://www.midiworld.com/classic.htm/&type=mid"
Run the Microservice
java -Xms512m -Xmx2048m -Djava.net.preferIPv4Stack=true -jar target/linkextractor-0.0.1-SNAPSHOT.jar
Java Snippet Using JSoup to Extract Links from a URL (HTML)
// iterate over the anchor tags JSoup finds in the fetched document (surrounding loop shown for context)
for (Element link : document.select("a[href]")) {
    PrintableLink pLink = new PrintableLink();
    pLink.setLink(link.attr("abs:href"));
    pLink.setDescr(trim(link.text(), 100));
    linksReturned.add(pLink);
}
Output
hdfs dfs -ls /music/*.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/2_ase.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/4_mtking.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:50 /music/EspanjaCaphriccoCatalan.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:50 /music/EspanjaPrelude.mid
-rw-r--r-- 3 tspann hdfs 162 2016-11-07 03:50 /music/J_M_Bach_Auf_lasst_uns_den_Herren_loben.mid
-rw-r--r-- 3 tspann hdfs 93 2016-11-07 03:52 /music/adelina.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/aida_ii2.mid
-rw-r--r-- 3 tspann hdfs 89 2016-11-07 03:50 /music/al_adagi.mid
-rw-r--r-- 3 tspann hdfs 95 2016-11-07 03:52 /music/alborada.mid
-rw-r--r-- 3 tspann hdfs 82 2016-11-07 03:52 /music/aquarium.mid
-rw-r--r-- 3 tspann hdfs 105 2016-11-07 03:52 /music/barbero.mid
-rw-r--r-- 3 tspann hdfs 101 2016-11-07 03:51 /music/barimyst.mid
-rw-r--r-- 3 tspann hdfs 80 2016-11-07 03:52 /music/beevar2.mid
-rw-r--r-- 3 tspann hdfs 111 2016-11-07 03:50 /music/biz_arls.mid
-rw-r--r-- 3 tspann hdfs 94 2016-11-07 03:51 /music/blas1.mid
-rw-r--r-- 3 tspann hdfs 114 2016-11-07 03:50 /music/boccher.mid
-rw-r--r-- 3 tspann hdfs 78 2016-11-07 03:52 /music/bolero.mid
-rw-r--r-- 3 tspann hdfs 100 2016-11-07 03:51 /music/cantique.mid
-rw-r--r-- 3 tspann hdfs 88 2016-11-07 03:51 /music/carminab.mid
-rw-r--r-- 3 tspann hdfs 96 2016-11-07 03:51 /music/clairdelune.mid
-rw-r--r-- 3 tspann hdfs 87 2016-11-07 03:51 /music/cmveder.mid
-rw-r--r-- 3 tspann hdfs 108 2016-11-07 03:52 /music/coucou.mid
-rw-r--r-- 3 tspann hdfs 99 2016-11-07 03:51 /music/coup8a.mid
-rw-r--r-- 3 tspann hdfs 97 2016-11-07 03:51 /music/cpf-bird.mid
2016-11-06 22:57:20.095 ERROR 28694 --- [nio-8080-exec-1] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.313 ERROR 28694 --- [nio-8080-exec-3] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
2016-11-06 22:57:20.500 ERROR 28694 --- [nio-8080-exec-5] com.dataflowdeveloper.DataController : Query:http://www.midiworld.com/classic.htm/ mid,IP:192.168.1.2 Browser:nifi-agent
ls -lt /opt/demo/midi | more
total 20456
-rw-r--r-- 1 tspann staff 117731 Nov 6 22:58 appspg13.mid
-rw-r--r-- 1 tspann staff 13449 Nov 6 22:58 intrlude.mid
-rw-r--r-- 1 tspann staff 8777 Nov 6 22:58 latalant.mid
-rw-r--r-- 1 tspann staff 1911 Nov 6 22:58 lbvar2.mid
-rw-r--r-- 1 tspann staff 2230 Nov 6 22:58 lbvar4.mid
-rw-r--r-- 1 tspann staff 1458 Nov 6 22:58 lbvar6ep.mid
NiFi Flow
GetHTTP: Call the JSoup microservice that converts an HTML page full of MIDI links into a JSON file of links and descriptions of MIDI files.
SplitJSON: Split that big JSON file into individual link/description pairs for working with individual songs.
EvaluateJSONPath: Use JSONPath to pull out link and description as attributes.
InvokeHTTP: Download the MIDI file from the link.
UpdateAttribute: Give it a good file name; I just want the file name from the link (example: http://sdfsdf:8080/test.mid). See the expression sketch below.
PutFile: Store the MIDI on the OSX filesystem.
ExecuteStreamCommand: Run the timidity CLI to play the MIDI file, passing the path of the stored MIDI file to the timidity player.
PutHDFS: Store the MIDI on HDP 2.5 HDFS.
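For the UpdateAttribute step, a NiFi Expression Language expression along these lines would pull the file name off the end of the link attribute (the exact expression is my assumption; the original does not show it):
${link:substringAfterLast('/')}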
References
http://jsonpath.com/
http://macappstore.org/timidity/
11-03-2016
08:08 PM
1 Kudo
Starting My Hadoop Tools
NiFi can interface directly with Hive, HDFS, HBase, Flume and Phoenix, and I can also trigger Spark and Flink through Kafka and Site-to-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing, and you can easily add this piece to a larger pipeline or process.
Pig Setup
I like to use Ambari to install the HDP 2.5 clients on my NiFi box so I have access to all the tools I may need. Then I can just do:
yum install pig
Pig to Apache NiFi 1.0.0 ExecuteProcess
We call a shell script that wraps the Pig script. The output of the script is stored to HDFS:
hdfs dfs -ls /nifi-logs
Shell Script
export JAVA_HOME=/opt/jdk1.8.0_101/
pig -x local -l /tmp/pig.log -f /opt/demo/pigscripts/test.pig
You can run Pig in different modes like local, mapreduce and tez, and you can also pass parameters to the script (see the example after the script below).
Pig Script
messages = LOAD '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log';
warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
DUMP warns;
STORE warns INTO 'warns.out';
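For example, to parameterize the log path and run on Tez instead of local mode, something like this should work (the LOGFILE parameter name is my choice, not from the original):
pig -x tez -param LOGFILE=/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log -f /opt/demo/pigscripts/test.pig
-- inside the script, reference the parameter instead of the hard-coded path
messages = LOAD '$LOGFILE';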
This is a basic example from the internet, with the NiFi 1.0 log used as the source. As an aside, I run a daily script on the schedule 1 * * * * ? to clean up my logs. Simply:
/bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*
PutHDFS
Hadoop Configuration: /etc/hadoop/conf/core-site.xml. Pick a directory and store away.
Results
HadoopVersion          PigVersion             UserId  StartedAt            FinishedAt           Features
2.7.3.2.5.0.0-1245     0.16.0.2.5.0.0-1245    root    2016-11-03 19:53:57  2016-11-03 19:53:59  FILTER
Success!
Job Stats (time in seconds):
JobId                   Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias           Feature   Outputs
job_local72884441_0001  1     0        n/a         n/a         n/a         n/a            0              0              0              0                 messages,warns  MAP_ONLY  file:/tmp/temp1540654561/tmp-600070101,
Input(s):
Successfully read 30469 records from: "/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log"
Output(s):
Successfully stored 1347 records in: "file:/tmp/temp1540654561/tmp-600070101"
Counters:
Total records written : 1347
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local72884441_0001
References:
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_5
http://hortonworks.com/apache/pig/#section_2
http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
https://github.com/dbist/pig
https://github.com/sudar/pig-samples
http://hortonworks.com/hadoop-tutorial/how-to-use-basic-pig-commands/
http://hadooptutorial.info/built-in-load-store-functions-in-pig/
https://cwiki.apache.org/confluence/display/PIG/PigTutorial
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/validate_the_installation_pig.html
http://pig.apache.org/docs/r0.16.0/start.html
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig
https://github.com/alanfgates/programmingpig/tree/master/examples/ch2
11-01-2016
09:03 PM
1 Kudo
Hadoop Installation on RHEL 7.2 Tips
Custom Install Tips:
Read this article and have it available.
Make sure you know your Hadoop mount points.
Follow the networking best practices.
Make sure you follow the Hortonworks HDP minimum requirements.
Make sure you have root access and that yum, rpm, scp, curl, wget, unzip, tar, yum-utils, createrepo and reposync are installed, working and in your path.
Make sure networking, iptables, proxies and firewalls are all open enough for you to access the Hortonworks repos, and that the bandwidth is decent.
Download OpenJDK 1.8 64-bit, update 51 or higher.
Make sure you set up passwordless SSH to all machines, including the machine you are starting from, which is the Ambari server; you may need to SSH to yourself (see the sketch below).
Make sure you have 20G+ of /var space, 20G+ of /usr space and plenty of /tmp space. Many things will go under /usr/hdp.
One time I needed to manually install MySQL via sudo yum -y install mysql.
Any space where you will install your DataNodes will need to be owned by hdfs:hadoop and have 755 access.
Keep community.hortonworks.com open in a browser; you can quickly search and find more answers.
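A quick sketch of the passwordless SSH setup and the DataNode directory permissions mentioned above (the host name and the /grid/data path are placeholders):
ssh-keygen -t rsa                 # accept the defaults, no passphrase
ssh-copy-id root@ambari-host      # repeat for every node, including the Ambari server itself
ssh root@ambari-host hostname     # verify login works without a password
mkdir -p /grid/data
chown -R hdfs:hadoop /grid/data
chmod 755 /grid/data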