Member since: 05-22-2019
Posts: 26
Kudos Received: 26
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2177 | 03-01-2017 09:38 PM |
09-13-2017 10:30 PM
3 Kudos
Hortonworks DataFlow (HDF) includes Apache NiFi with a wealth of processors that make it easy to ingest syslog data from multiple servers. Information collected from syslog can be stored on the HDFS distributed filesystem as well as forwarded to other systems such as Splunk. Furthermore, you can parse the stream and select which information should be stored on HDFS and which should be routed to a Splunk indexer.

To demonstrate this capability, let us first review the NiFi ListenSyslog processor. The processor above corresponds to the syslog configuration in /etc/rsyslog.conf, which includes the following line:

*.* @127.0.0.1:7780

This causes syslog messages to be streamed into the NiFi flow, which we can then direct to another processor, PutSplunk, configured as follows:

In the Splunk UI you configure a data input under Settings -> Data inputs -> TCP (Listen on a TCP port for incoming data, e.g. syslog). Use the port corresponding to the one configured in the PutSplunk processor above (516), then configure the source type as linux_syslog.

At this point you can start the flow and NiFi will ingest Linux syslog messages into Splunk. Once data is received, you can search it in Splunk as follows:

To retrieve information back from Splunk, you can use the GetSplunk processor and connect it to a PutFile or PutHDFS processor; as an example, I have used GetSplunk as follows:

For more details on HDF: https://hortonworks.com/products/data-center/hdf/
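As a quick sanity check of the rsyslog side of the flow above, the forwarding rule and a test message boil down to the lines below. This is only a sketch; the restart command and the host/port depend on your distribution and on how ListenSyslog is configured.

```
# /etc/rsyslog.conf -- forward all facilities and severities to the NiFi ListenSyslog port
# (a single @ means UDP; use @@ for TCP; 7780 matches the ListenSyslog processor above)
*.* @127.0.0.1:7780
```

```
# apply the change and emit a test message that should appear in the NiFi flow
sudo systemctl restart rsyslog    # or: sudo service rsyslog restart
logger "NiFi syslog ingestion test"
```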
03-02-2017 03:39 PM
1 Kudo
@eorgad To protect the S3A access/secret keys, it is recommended that you use either IAM role-based authentication (such as an EC2 instance profile) or the Hadoop Credential Provider Framework, which stores the keys securely and makes them available through configuration. The Credential Provider Framework allows secure "credential providers" to keep secrets outside Hadoop configuration files, storing them in encrypted files on the local or Hadoop filesystems and including them in requests. The Hadoop-AWS module documentation describes how to configure this properly.
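As a rough sketch of the credential-provider approach (the jceks path below is only an example location):

```
# store the S3A keys in an encrypted credential store; you are prompted for each value
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/admin/s3.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/admin/s3.jceks
```

Then point Hadoop at the store by setting hadoop.security.credential.provider.path to jceks://hdfs/user/admin/s3.jceks in core-site.xml (or pass it per job with -D), and remove the plain-text keys from your configuration files.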
02-27-2017 06:58 PM
5 Kudos
This also includes an on-the-fly analysis showing the odds in a game of Craps. The example shows a simple use of NiFi (HDF) handling multiple streams of dice data, each one simulating a separate Craps table, and presents a Monte Carlo simulation of 1000 runs, emulating one throw per second.

To demonstrate this capability we generate some random dice data, with each stream generated by an independent thread. We throttle the threads to sleep for a second between throws, mainly to demonstrate an ongoing stream of data over time. Source for the data generation: https://github.com/eorgad/Dice-nifi-streams-example/tree/master/Dice-nifi-stream-example/Dice-nifi-streams/src
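For context, the generator in the linked repository boils down to something like the sketch below: one thread per table, two dice thrown once per second, and each throw appended to a file that NiFi can pick up. The class name, file names, and CSV layout here are illustrative, not the actual TwoCrapsTest source.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

// Simplified sketch of a per-table dice stream: each thread simulates one Craps table,
// throws two dice once per second, and appends the result to a file for NiFi to ingest.
public class DiceStreamSketch implements Runnable {
    private final String outFile;
    private final Random rnd = new Random();

    public DiceStreamSketch(String outFile) {
        this.outFile = outFile;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1000; i++) {            // 1000 throws per table, as in the article
            int d1 = rnd.nextInt(6) + 1;
            int d2 = rnd.nextInt(6) + 1;
            try (FileWriter w = new FileWriter(outFile, true)) {
                w.write(System.currentTimeMillis() + "," + d1 + "," + d2 + "," + (d1 + d2) + "\n");
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            try {
                Thread.sleep(1000);                 // throttle: one throw per second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        // two independent streams, each simulating a separate Craps table
        new Thread(new DiceStreamSketch("table1.csv")).start();
        new Thread(new DiceStreamSketch("table2.csv")).start();
    }
}
```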
We use NiFi to create a streaming flow of that data as it is being generated. The simulation uses the following NiFi processors:

- HandleHttpRequest (starts an HTTP server and listens for HTTP requests)
- RouteOnAttribute (routes FlowFiles based on their attributes using the Attribute Expression Language)
- ExecuteStreamCommand (executes an external command on the contents of a flow file and creates a new flow file with the results of the command)
- HandleHttpResponse (sends an HTTP response to the requestor that generated a FlowFile)
- Site-to-site (to send data from one instance of NiFi to another)

You can use a template that handles each stream with an individual NiFi flow: https://github.com/eorgad/Dice-nifi-streams-example/blob/master/Multi-stream-dice-example.xml
The NiFi flow looks as follows when you import the XML template:

Web services: We can use NiFi to host web services either on your HDP instance (you can use the edge node or the same host serving Ambari) or on a standalone server. However, in many cases organizations already run web servers internally and externally, so you can link the UI example to an existing instance or create one using the following steps.

Set up a local web service: You can set up the web service either on a server or on your local Mac for demo purposes.

2.1. Installation on a CentOS server: to install Apache, open a terminal and run: sudo yum install httpd

2.2. Make configuration changes for your web service: vi /etc/httpd/conf/httpd.conf and place the content of the UI folder in the DocumentRoot location (DocumentRoot "/var/www/html") so it can be served by the web server.

2.3. Start Apache by running: sudo service httpd start

Our simple architecture looks as follows:
3. You can import the Java project into Eclipse, or run TwoCrapsTest from the CLI, to generate the two files that NiFi streams to your web instance. The template includes a port that you can use to stream the feed via site-to-site to another NiFi instance, such as one running on the edge node of your HDP cluster (the HDP 2.5 sandbox VM was used for this example).
When you launch this example you will be able to view real-time streaming data from NiFi, handled by your web server, showing a real-time analysis of a game of Craps. Each stream represents one table. The bars show the accumulated dollars won or lost for a theoretical gamble on one of the options: pass line, six, eight, five, nine, and so on.

This simulation runs only 1000 iterations per thread (per table, in this case), so to get a better approximation of the odds you can scale the Monte Carlo simulation up to a million throws per thread.

The following is the result of launching index.html with the two streams displayed in real time as they arrive:

The following is a bell curve, with reference to UI/dice8.html
04-01-2016 06:08 PM
1 Kudo
@eorgadn You should wrap the geoDistance functions as Hive UDFs; that will be a lot friendlier for most people who want to use them in Hive.
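For illustration only, a wrapper along these lines would let people call the function straight from HiveQL. The package, class name, and haversine math below are my own sketch, not necessarily the original geoDistance implementation.

```java
package com.example.hive.udf;  // hypothetical package

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

// Simple Hive UDF sketch: great-circle (haversine) distance in kilometers.
@Description(name = "geo_distance",
             value = "_FUNC_(lat1, lon1, lat2, lon2) - distance in km between two points")
public class GeoDistanceUDF extends UDF {
    private static final double EARTH_RADIUS_KM = 6371.0;

    public Double evaluate(Double lat1, Double lon1, Double lat2, Double lon2) {
        if (lat1 == null || lon1 == null || lat2 == null || lon2 == null) {
            return null;  // propagate NULLs the way built-in Hive functions do
        }
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

After packaging this into a jar, users would register it with ADD JAR and CREATE TEMPORARY FUNCTION geo_distance AS 'com.example.hive.udf.GeoDistanceUDF' and call it like any built-in function.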
03-11-2016 10:56 PM
1 Kudo
That certainly works, but going forward wouldn't it cause problems?
11-03-2015 08:57 PM
5 Kudos
Spark reads from HDFS and submits jobs to YARN, so the security that Ranger manages for both HDFS and YARN works with Spark. From a security point of view this is very similar to MapReduce jobs running on YARN. Since Spark reads from HDFS using the HDFS client, the HDFS TDE (transparent data encryption) feature is transparent to Spark: with the right key permissions for the user running the Spark job, there is nothing to configure in Spark itself. Knox isn't yet relevant to Spark; in the future, when we have a REST API for Spark, we will integrate Knox with it.
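To illustrate that transparency, a minimal sketch of a TDE setup follows; the key, zone, and file names are made up for the example, and creating the zone requires HDFS superuser privileges and a running Hadoop KMS.

```
# create an encryption key and an HDFS encryption zone
hadoop key create demo_key
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName demo_key -path /secure

# files written into the zone are encrypted/decrypted inside the HDFS client
hdfs dfs -put events.csv /secure/

# Spark reads the path like any other HDFS data -- no Spark-side encryption settings
spark-shell
scala> sc.textFile("/secure/events.csv").count()
```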