Member since: 01-17-2016
Posts: 42
Kudos Received: 50
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1892 | 04-21-2016 10:41 PM
 | 576 | 04-15-2016 03:22 AM
 | 799 | 04-13-2016 04:03 PM
 | 3158 | 04-12-2016 01:59 PM
01-17-2017
06:43 PM
7 Kudos
If you have ever tried to spawn multiple Cloudbreak shells, you may have run into an error. That is because the default "cbd util cloudbreak-shell" uses Docker containers. The fastest workaround is to use the JARs directly. These JARs can be run remotely from your personal machine or on the Cloudbreak machine itself.

Prepping the Cloudbreak machine (only needed if running the JAR locally on the AWS image):
- Log into your Cloudbreak instance and go to /etc/yum.repos.d
- Remove the Centos-Base.repo file (this is a Red Hat machine and that repo can cause conflicts)
- Install Java 8 (yum install java-1.8.0*)
- Change directory back to /home/cloudbreak

Downloading the JAR:
- Set a global variable equal to your Cloudbreak version (export CB_SHELL_VERSION=1.6.1)
- Download the JAR (curl -o cloudbreak-shell.jar https://s3-eu-west-1.amazonaws.com/maven.sequenceiq.com/releases/com/sequenceiq/cloudbreak-shell/$CB_SHELL_VERSION/cloudbreak-shell-$CB_SHELL_VERSION.jar)

Using the JAR:
- Interactive mode: java -jar ./cloudbreak-shell.jar --cloudbreak.address=https://<your-public-hostname> --sequenceiq.user=admin@example.com --sequenceiq.password=cloudbreak --cert.validation=false
- Using a command file: java -jar ./cloudbreak-shell.jar --cloudbreak.address=https://<your-public-hostname> --sequenceiq.user=admin@example.com --sequenceiq.password=cloudbreak --cert.validation=false --cmdfile=<your-FILE>
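For convenience, the same steps can be strung together as a short script. This is a minimal sketch based on the commands above; the version number, hostname placeholder, and the default admin credentials are the article's example values and should be replaced with your own.

```bash
#!/usr/bin/env bash
# Minimal sketch of the steps above; run on the Cloudbreak host or any machine with Java 8.
set -euo pipefail

export CB_SHELL_VERSION=1.6.1                 # match your Cloudbreak version
CB_HOST="your-cloudbreak-public-hostname"     # placeholder

# Pull the shell JAR for the chosen version
curl -o cloudbreak-shell.jar \
  "https://s3-eu-west-1.amazonaws.com/maven.sequenceiq.com/releases/com/sequenceiq/cloudbreak-shell/${CB_SHELL_VERSION}/cloudbreak-shell-${CB_SHELL_VERSION}.jar"

# Start an interactive shell session (append --cmdfile=<your-FILE> for batch mode)
java -jar ./cloudbreak-shell.jar \
  --cloudbreak.address="https://${CB_HOST}" \
  --sequenceiq.user=admin@example.com \
  --sequenceiq.password=cloudbreak \
  --cert.validation=false
```

Because each invocation is its own JVM process rather than a shared Docker container, several of these shells can run side by side.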
Tags: aws, cloud, Cloud & Operations, Cloudbreak, How-To Tutorial
11-08-2016
02:27 AM
1 Kudo
The easiest "hack" is to give it a filename ending in .xml. This could be done with an UpdateAttribute processor: Filename | ${Filename}.xml
10-02-2016
04:20 PM
Is it possible to create autoscaling policies using either the CLI or the REST API? I was reviewing the documentation and was unable to find anything.
Labels: Hortonworks Cloudbreak
10-02-2016
04:08 PM
Thanks a ton. This really clarifies things for me. One side question, just for my own understanding: the fixed 50 GB volumes are for non-HDFS storage only, right? If I understand you correctly, HDFS does not go on there.
10-01-2016
09:13 PM
All servers seem to be starting with a 50 GiB EBS volume as the root device. Is it possible to change this to just use ephemeral storage on nodes that have substantial ephemeral storage? Below is a picture I took from my 4-node cluster.
Labels: Hortonworks Cloudbreak
09-15-2016
11:28 AM
6 Kudos
In this article I will review the steps required to enrich and filter logs. It is assumed that the logs are landing one at a time as a stream into the NiFi cluster. The steps involved:
- Extract attributes - IP and action
- Cold store non-IP logs
- GeoEnrich the IP address
- Cold store local IP addresses
- Route the remaining logs based on threat level
- Store the low-threat logs in HDFS
- Place high-threat logs into an external table

Extract IP Address and Action - ExtractText Processor
This processor evaluates each log and parses the information into attributes. To create a new attribute, add a property and give it a name (soon to be the attribute name) and a Java-style regex. As the processor runs, it evaluates the regex and creates an attribute with the result. If there is no match, the log is sent to the 'unmatched' relationship, which is a simple way of filtering out different logs.

GeoEnrichIP - GeoEnrichIP Processor
This processor takes the ipaddr attribute generated in the previous step and compares it to a geo-database (.mmdb). I am using the GeoLite City database found here.

Route on Threat - RouteOnAttribute Processor
This processor takes the IsDenied attribute from the previous step and tests whether it exists. It will only exist if the "Extract IP Address" processor found "iptables denied" in the log. The log is then routed to a connection with that property's name. More properties can be added, each with its own rule following the NiFi Expression Language. Note: I plan on adding location filtering but did not want to obscure the demo with too many steps.

Cold and Medium Storage - Processor Groups
These two processor groups are very similar in function. Eventually they could be combined into one shared group using attributes for rules, but for now they are separate.
- Merge Content - takes each individual line and combines them into a larger aggregated file. This helps avoid the too-many-small-files problem that arises in large clusters.
- Compress Content - simply saves disk space by compressing the aggregates.
- Set Filename As Timestamp (UpdateAttribute processor) - takes each aggregate and sets the 'filename' attribute to the current time. This lets us sort the aggregates by when they were written for later review.
- PutHDFS processor - takes the aggregate and saves it to HDFS.

High Threat - Processor Group
In order to be read by a Hive external table, we need to convert the data to a JSON format and save it to the correct directory.
- Rename Attributes (UpdateAttribute processor) - renames the fields to match the Hive field names.
- Put Into JSON (AttributesToJSON) - takes the renamed fields and saves them as a JSON string that the Hive SerDe can read natively.
- Set Filename As Timestamp (UpdateAttribute processor) - once again sets the filename to the timestamp. This may be better served as system name + timestamp moving forward.
- PutHDFS - stores the data to the Hive external table location.

Hive Table Query
Using the Ambari Hive view I am now able to query my logs with SQL-style queries:
CREATE TABLE `securitylogs`(
  `ctime` varchar(255) COMMENT 'from deserializer',
  `country` varchar(255) COMMENT 'from deserializer',
  `city` varchar(255) COMMENT 'from deserializer',
  `ipaddr` varchar(255) COMMENT 'from deserializer',
  `fullbody` varchar(5000) COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://sandbox.hortonworks.com:8020/user/nifi/High_Threat'
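As a quick illustration (not part of the original flow), the table can also be queried from the command line rather than the Ambari Hive view. This is a minimal sketch assuming HiveServer2 is reachable on the sandbox at the default port 10000; adjust the JDBC URL for your environment.

```bash
# Minimal sketch: summarize high-threat hits by location from the securitylogs table.
# The JDBC URL and port are assumptions for the sandbox and may differ in your cluster.
beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000/default" -e "
  SELECT country, city, COUNT(*) AS hits
  FROM securitylogs
  GROUP BY country, city
  ORDER BY hits DESC
  LIMIT 20;"
```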
Tags: Data Ingestion & Streaming, Hive, How-To Tutorial, logs, NiFi, Security
09-06-2016
08:01 PM
8 Kudos
I was recently tinkering with the Walmart REST API. This is a publicly available interface and can be used for a quick price lookup for products. The overall goal of the project is to keep track of the cost of specific shopping carts day to day, but this intermediate step provides an interesting example case.

The requirements of this stage:
- Use UPC codes to provide a lookup
- Avoid having to pass the Walmart API key to the internal client making the call
- Extract features such as price in preparation for a JDBC database entry

Core concepts in NiFi:
- NiFi has the ability to serve as a custom RESTful API, managed with the "HandleHttpRequest" and "HandleHttpResponse" processors. This can be a GET/POST or any of the other common types.
- NiFi can make calls to an external REST API via the "InvokeHTTP" processor.
- XML data can be extracted with the "EvaluateXPath" processor.

The HandleHttpRequest Processor
This processor receives the incoming REST call and makes a flow file with information pertaining to the headers. As you can see in the image below, it is listening on port 9091 and only responding to the path '/lookup'. Additionally, the flow file it creates has an attribute value for each of the headers it received, particularly "upc_code", as shown in the example flow file.

The InvokeHTTP Processor
This processor takes the header and makes a call to the Walmart API directly. As you can see, I am using the upc_code attribute received from the request handler. This then sends an XML file in the body of the flow file to the next stage.

The EvaluateXPath Processor
In this article I covered how the XPath processor works in more detail: https://community.hortonworks.com/articles/25720/parsing-xml-logs-with-nifi-part-1-of-3.html. Here I am extracting key attributes for later analysis.

HandleHTTPResponse (Code 200 or Dead Letter)
After successfully extracting the attributes, I send a response code of 200 (success) back to the REST client along with the XML that Walmart provided. In my example above, if I do not successfully extract the values, the message goes to a dead letter queue. This is not ideal, and in a production setting I would send the appropriate HTTP error code.

Closing Thoughts
This process group provides a solid basis for my pricing engine. I still need to write in the error handling, but this start provides a feature-rich flow file to the next stage of my project.
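For testing, the endpoint described above can be exercised with a plain curl call. This is a hedged sketch rather than the exact client used in the project; the host name and UPC value are placeholders.

```bash
# Minimal sketch: call the NiFi HandleHttpRequest endpoint on port 9091 at /lookup.
# Replace <nifi-host> and <your-upc>; the upc_code header becomes the flow file
# attribute that InvokeHTTP uses for the Walmart price lookup.
curl -v "http://<nifi-host>:9091/lookup" -H "upc_code: <your-upc>"
```

On success the response comes back with code 200 and the XML that Walmart returned; on a failed extraction the flow currently drops the request into the dead letter queue instead of answering.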
Tags: api, Data Ingestion & Streaming, How-To Tutorial, NiFi, use-cases
08-26-2016
07:58 PM
It's probably reading the same file repeatedly without permissions to delete it. On the GetFile processor, configure it to only run every 5 seconds. Then in the flow view, right-click and refresh the page and you will probably see the outbound queue with a file. If you don't refresh the view, you may not see the flow files building up until they build up enough that you run out of memory.
08-10-2016
04:22 PM
A quick thing to check before we dig into any other problems. The processor needs to be in the stopped state before an update is attempted.
04-22-2016
06:51 PM
Shannon, try extracting the entire JSON into an attribute like you do now. Then perhaps use "AttributesToJSON" to push it all into one flow file?
04-22-2016
06:46 PM
1 Kudo
To build on what Bernhard said, this is normal YARN behavior if it can't assign new containers. You said you are running VirtualBox; the sandbox defaults to 8 GB of RAM, and if you are running on a laptop with the same amount, that could be an issue. If this is the case, drop the VirtualBox VM down to 4 GB of RAM and restart the "node".
04-22-2016
03:47 AM
Architecturally, each message is immutable; changing anything in it means rewriting the flow file contents. Adding text to a message like that is tough. Is the message already in JSON format?
04-21-2016
11:14 PM
Are you looking to replace the flowfile content with the extracted attribute? Perhaps AttributesToJSON would work. If that solves the first one, could you mark my answer as correct? That closes out this question!
04-21-2016
10:41 PM
Hey Shannon, you would call the new attribute with the Expression Language, so you would type /root/${label}/. The ${VARIABLE} syntax can be used anywhere the Expression Language is allowed.
04-21-2016
09:35 PM
Hey Shannon, you want to use EvaluateJsonPath to make an attribute called "msg" with the value $.msg. You then want to use RouteOnAttribute with ${msg:equals('xyz')}. You are very close to having it down correctly!
04-19-2016
09:11 PM
{"hub_insteon_id":"","device_group":"1","device_insteon_id":"368D4E","recieved_at":"2016-04-11T23:36:36.332240Z","status":"on"} was taken after i changed it
04-19-2016
08:59 PM
{"hub_insteon_id":"","device_group":"1","device_insteon_id":"368D4E","recieved_at":"2016-04-11T23:36:36.332240Z","status":"on"}
04-19-2016
07:47 PM
The flow leading into that, and the AttributesToJSON configuration:
04-19-2016
07:33 PM
So the ExecuteSQL processor successfully connects with the same settings. It returns: Objavro.schemaú{"type":"record","name":"Home_Events","namespace":"any.data","fields":[{"name":"hub_insteon_id","type":["null","string"]},{"name":"device_insteon_id","type":["null","string"]},{"name":"device_group","type":["null","string"]},{"name":"status","type":["null","string"]},{"name":"recieved_at","type":["null","string"]}]} and the ConvertJSONToSQL processor is configured as shown.
04-19-2016
07:05 PM
3 Kudos
I am having trouble with my ConvertJSONToSQL processor in NiFi.
I am trying to post to this table:
'Home_Events', 'CREATE TABLE `Home_Events` (\n `hub_insteon_id` varchar(255) DEFAULT NULL,\n `device_insteon_id` varchar(45) DEFAULT NULL,\n `device_group` varchar(45) DEFAULT NULL,\n `status` varchar(45) DEFAULT NULL,\n `recieved_at` varchar(45) DEFAULT NULL\n) ENGINE=InnoDB DEFAULT CHARSET=latin1'
With this JSON:
{"hub_insteon_id":"","device_group":"1","device_insteon_id":"3F68A2","recieved_at":"","status":"on"}
Getting this error:
ConvertJSONToSQL[id=d0dd4cc5-f2ab-43ab-8921-b2aafea03cb5] Failed to convert StandardFlowFileRecord[uuid=611848ee-f0e8-40a7-8119-0539d4b531dd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461081489294-74, container=default, section=74], offset=8383, length=127],offset=0,name=180088917788802,size=127] to a SQL INSERT statement due to org.apache.nifi.processor.exception.ProcessException: None of the fields in the JSON map to the columns defined by the Home_Automation.Home_Events table; routing to failure: org.apache.nifi.processor.exception.ProcessException: None of the fields in the JSON map to the columns defined by the Home_Automation.Home_Events table
Any ideas how to resolve this?
Labels: Apache NiFi
04-15-2016
03:22 AM
2 Kudos
Two answers: For a rolling window, look into "DistributedSetCache", as it allows the most recent X events to be looked up. For time chunking, this JIRA issue (also asked by you) resolves it: https://issues.apache.org/jira/browse/NIFI-1775
04-14-2016
11:14 PM
https://github.com/spark-jobserver/spark-jobserver#ad-hoc-mode---single-unrelated-jobs-transient-context details how jobs can be started from the Spark Job Server if one is present. I don't believe the Hortonworks stack has it by default, but it could still be a good option if this is a requirement.
04-14-2016
10:58 PM
1 Kudo
Can we use Spark's REST API to invoke the job when the flow file hits the InvokeHTTP processor? http://arturmkrtchyan.com/apache-spark-hidden-rest-api
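For reference, the "hidden" REST API described at that link accepts a JSON submission document, so InvokeHTTP could POST something along these lines. This is a rough sketch based on that blog post; the master host, port, jar location, class name, and Spark version are placeholders and assumptions, not values verified here.

```bash
# Rough sketch of a standalone-mode submission request per the blog post above.
# Host, paths, class name, and versions are placeholders.
curl -X POST "http://<spark-master>:6066/v1/submissions/create" \
  -H "Content-Type: application/json" \
  -d '{
        "action": "CreateSubmissionRequest",
        "appResource": "hdfs:///jobs/my-spark-job.jar",
        "mainClass": "com.example.MyJob",
        "appArgs": ["arg-from-flowfile"],
        "clientSparkVersion": "1.6.1",
        "sparkProperties": {
          "spark.app.name": "nifi-triggered-job",
          "spark.master": "spark://<spark-master>:6066",
          "spark.submit.deployMode": "cluster"
        },
        "environmentVariables": {"SPARK_ENV_LOADED": "1"}
      }'
```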
04-13-2016
04:03 PM
We recently redid our site. Check out this link; there is a download for a .gz on this page: http://hortonworks.com/downloads/#dataflow
04-12-2016
01:59 PM
4 Kudos
Hey Sunile, I believe you are looking for the UnpackContent processor, found here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.UnpackContent/index.html
Allowable file types:
- use mime.type attribute
- tar
- zip
- flowfile-stream-v3
- flowfile-stream-v2
- flowfile-tar-v1
04-06-2016
12:16 PM
5 Kudos
Recently I had a client ask how we would go about connecting a Windows share to NiFi to HDFS, or if it was even possible. This is how you build a working proof of concept to demo the capabilities!

You will need two servers or virtual machines: one for Windows, one for Hadoop + NiFi. I personally elected to use these two:
- The Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
- A Windows VM running Win 7: https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/linux/

You then need to install NiFi on the sandbox; I find this repo to be the easiest to follow: https://github.com/abajwa-hw/ambari-nifi-service

Be sure the servers can talk to each other directly. I personally used a bridged network connection in VirtualBox and looked up the IPs on my router's control panel.

Next you need to set up a Windows share of some format. This can be combined with Active Directory, but I personally just enabled guest accounts and made an account called Nifi_Test. These instructions were the basis of creating a Windows share: http://emby.media/community/index.php?/topic/703-how-to-make-unc-folder-shares/ Keep in mind network user permissions may get funky, and the example above will enforce read-only permission unless you do additional work.

Now you mount the share into the Hadoop machine using CIFS + Samba. The instructions I followed are here: http://blog.zwiegnet.com/linux-server/mounting-windows-share-on-centos/

Finally, we are able to set up NiFi to read the mounted drive and post it to HDFS. The GetFile processor retrieves the files while PutHDFS stores them. To configure HDFS for the incoming data, I ran the following commands on the sandbox: "su hdfs"; "hadoop dfs -mkdir /user/nifi"; "hadoop dfs -chmod 777 /user/nifi". I elected to keep the source file for troubleshooting purposes, so that every time the processor ran it would just stream the data in.

GetFile configuration and PutHDFS configuration for the sandbox are shown in the screenshots. And finally, run it and confirm the data lands in HDFS!
Tags: Data Ingestion & Streaming, How-To Tutorial, NiFi, Windows
04-03-2016
04:20 PM
6 Kudos
I have a plan to write a 3-part "intro" series on how to handle your XML files. The subjects will be:
- Basic XML and feature extraction via text management, splitting, and XPath
- Interactive text handling with XQuery and regex in relation to XMLs
- XML schema validation and transformations

XML data is read into the flowfile contents when the file lands in NiFi. As long as it is a valid XML format, the five dedicated XML processors can be applied to it for management and feature extraction. Commonly a user will want to get this XML data into a database, which requires us to do a feature extraction and convert to a new format such as JSON or Avro.

The simplest of the XML processors is the "SplitXml" processor. This simply takes the current selection of data and breaks the children off into their own files. The depth of the split in relation to the root is configurable, as shown below. An example of when this may be helpful is when you have a list of events, each of which should be treated separately.

XPath is a syntax for extracting information from an XML document. It allows you to search for nodes based on hierarchy, name, or even attribute. It has limited regex integration and a framework for moderately complex queries. More complete documentation can be found here: http://www.w3schools.com/xsl/xpath_syntax.asp The processor below shows the "EvaluateXPath" processor combined with XPath expressions to extract node data and an attribute. It should not be confused with XQuery, which I will cover in my next article.

Once the XPath step executes, something very important happens: the XML attributes are now NiFi attributes. This allows us to apply routing and the other intelligence that is NiFi's signature. One of the transformations I have previously worked on is how to get the XML data into an Avro format for easy ingestion. At this time, all of the Avro processors in NiFi play nicely with JSON, so the "AttributesToJSON" processor can be used as an out-of-the-box intermediary to get the format you need. Note that I have set the destination of the processor to "flowfile-content", which will override the existing XML contents with the JSON. With a JSON + attributes, this is a very easy flow file to work with and can be easily merged into existing workflows or written out to a file for the Hive SerDe.
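To make the XPath step concrete, here is a small example that is not from the original article: a made-up event XML and the kind of expressions an EvaluateXPath property could be set to, tested with xmllint purely so they can be run outside NiFi. The element and attribute names are hypothetical.

```bash
# Minimal sketch with a made-up event XML; element and attribute names are hypothetical.
cat > /tmp/event.xml <<'EOF'
<events>
  <event id="42">
    <source>sensor-7</source>
    <level>high</level>
  </event>
</events>
EOF

# Extract node text (what a property named "level" on EvaluateXPath might use)
xmllint --xpath 'string(/events/event/level)' /tmp/event.xml   # -> high

# Extract an attribute value
xmllint --xpath 'string(/events/event/@id)' /tmp/event.xml     # -> 42
```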
Tags: Data Ingestion & Streaming, How-To Tutorial, logs, NiFi, xml