Member since
05-22-2019
26
Posts
26
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2177 | 03-01-2017 09:38 PM |
01-02-2019
08:26 PM
The tls-toolkit.sh supports --configJsonIn; it is covered in the same guide that you reference.
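As a rough sketch only (the standalone mode, hostname, and JSON file name below are placeholders; check the toolkit guide you linked for the exact option semantics):

./bin/tls-toolkit.sh standalone --configJsonIn existing-nifi-cert-config.json -n 'nifi01.example.com'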
09-13-2017
10:30 PM
3 Kudos
Hortonworks DataFlow (HDF) includes Apache NiFi with a wealth of processors that make the process of ingesting various syslogs from multiple servers easy. Information collected from syslog can be stored on the HDFS distributed filesystem as well as forwarded to other systems such as Splunk. Furthermore, you can parse the stream and select which information should be stored on HDFS and which should be routed to an indexer on Splunk.

To demonstrate this capability, let us first review the NiFi ListenSyslog processor. The above processor corresponds to the syslog configuration in /etc/rsyslog.conf, which includes the following line:

...
*.* @127.0.0.1:7780

This causes syslog messages to be streamed into the NiFi flow, which we can direct to another processor, PutSplunk, configured as follows. In the Splunk UI you can configure data inputs under Settings -> Data inputs -> TCP - Listen on a TCP port for incoming data, e.g. syslog. To complete the selection, use the port corresponding to the one configured in the NiFi PutSplunk processor above (516), then follow the next step to configure linux_syslog.

At this point you can start the flow and NiFi will ingest Linux syslog messages into Splunk. Once data is received you can search it in Splunk. To retrieve information from Splunk you can use the GetSplunk processor and connect it to a PutFile or PutHDFS processor; as an example I have used GetSplunk as follows.

For more details on HDF: https://hortonworks.com/products/data-center/hdf/
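For a quick end-to-end check of this flow, you can restart rsyslog after adding the forwarding line and emit a test message (standard rsyslog/logger commands; the ports are the ones used above):

# /etc/rsyslog.conf forwarding line from above; a single @ forwards over UDP, @@ over TCP
#   *.* @127.0.0.1:7780
sudo service rsyslog restart        # or: sudo systemctl restart rsyslog on systemd systems
# generate a test syslog entry that should travel ListenSyslog -> PutSplunk (TCP 516)
logger "HDF syslog-to-Splunk test $(date)"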
03-01-2017
09:38 PM
Sounds like this is a sandbox issue of the NameNode being in safe mode; you can confirm this by reviewing the logs. If this is the case, you can exit safe mode with: hdfs dfsadmin -safemode leave
To start hdfs manually:
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"
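To confirm the safe-mode state before forcing the NameNode out of it:

su -l hdfs -c "hdfs dfsadmin -safemode get"
# if it reports "Safe mode is ON", leave it with:
su -l hdfs -c "hdfs dfsadmin -safemode leave"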
02-28-2017
01:04 PM
1 Kudo
Labels: Hortonworks Data Platform (HDP)
02-27-2017
06:58 PM
5 Kudos
This also includes an on-the-fly analysis showing the odds in a game of Craps. The example shows a simple use of NiFi (HDF) handling multiple streams of dice data, each one simulating a separate Craps table, and shows a Monte Carlo simulation with the results of 1,000 runs, emulating throws each second.

To demonstrate this capability we generate some random dice data; each stream is generated by an independent thread. We throttle the threads to sleep for a second between throws, mainly to demonstrate an ongoing stream of data over time. Source for data generation: https://github.com/eorgad/Dice-nifi-streams-example/tree/master/Dice-nifi-stream-example/Dice-nifi-streams/src
We use NiFi to create a streaming flow of that data as it is being generated. This simulation uses the following NiFi processors:

- HandleHttpRequest (starts an HTTP server and listens for HTTP requests)
- RouteOnAttribute (routes FlowFiles based on their attributes using the Attribute Expression Language)
- ExecuteStreamCommand (executes an external command on the contents of a flow file and creates a new flow file with the results of the command)
- HandleHttpResponse (sends an HTTP response to the requestor that generated a FlowFile)
- Site-to-site (to send data from one instance of NiFi to another)

You can use a template that handles each stream with an individual NiFi flow: https://github.com/eorgad/Dice-nifi-streams-example/blob/master/Multi-stream-dice-example.xml
The NiFi flow looks as follows when importing the XML template:

Web services: We can use NiFi to host web services either on your HDP instance (you can use an edge node or the same host serving Ambari) or on a standalone server. However, in many cases organizations already use web servers internally and externally, so you can link the UI example to an existing instance or set one up using the following steps.

Set up a local web service: You can set up your web services either on a server or on your local Mac for demo purposes.

2.1. Installation on a CentOS server: to install Apache, open a terminal and type in this command: sudo yum install httpd

2.2. Make configuration changes for your web services: vi /etc/httpd/conf/httpd.conf - place the content of the UI folder in the DocumentRoot location so it can be served by the webserver: DocumentRoot "/var/www/html"

2.3. Start Apache by running: sudo service httpd start
Our simple architecture will look as follows:

3. You can import the Java project into Eclipse, or run TwoCrapsTest from the CLI, to generate two files that NiFi will stream to your web instance. In the template there is a port that you can use to stream the feed via site-to-site to another NiFi instance, such as an instance running on the edge node of your HDP cluster (an HDP 2.5 sandbox VM was used for this example).
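For example, a hedged sketch of running the generator from the command line (the package path and build layout below are placeholders; adjust them to match the repository):

git clone https://github.com/eorgad/Dice-nifi-streams-example.git
cd Dice-nifi-streams-example
# compile all sources into bin/ and run the generator class
javac -d bin $(find . -name '*.java')
java -cp bin TwoCrapsTest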
When launching this example you will be able to view real-time streaming data from NiFi, handled by your webserver, showing a real-time analysis of a game of Craps. Each stream represents one table. The bar shows an accumulation of dollars won or lost for the theoretical gamble on one of the options: pass line, six, eight, five, nine, etc.

This simulation runs only 1,000 iterations per thread (one table in this case), so for a better approximation of the odds you can scale up the Monte Carlo simulation and run a million throws per thread.

The following is the result of launching your index.html with the two streams displayed in real time as they arrive. The following is a bell curve with reference to UI/dice8.html.
12-28-2016
03:51 PM
I suggest following this tutorial; it shows how to load data and copy files: http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/
12-28-2016
03:27 PM
In Zeppelin you can use a %sh paragraph to check which user the interpreter runs as and where its home directories are:
%sh
id
pwd
hdfs dfs -ls /user/zeppelin
The first two commands return:
uid=503(zeppelin) gid=501(hadoop) groups=501(hadoop)
/home/zeppelin
So you can use this user's local home directory (/home/zeppelin), or store files on HDFS in the user's home directory: /user/zeppelin
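For example, a small %sh paragraph that copies a local file into that HDFS home directory (the file name here is hypothetical):

%sh
hdfs dfs -put /home/zeppelin/mydata.csv /user/zeppelin/
hdfs dfs -ls /user/zeppelin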
12-28-2016
02:04 PM
1 Kudo
If you are looking for a way to load data from an external table into ORC, here is an example:
CREATE TABLE New_ORC_table
  STORED AS ORC
  TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM old_external_table;
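To confirm the new table is actually stored as ORC with Snappy compression, one option (assuming the hive CLI is on the path; beeline works the same way) is:

hive -e "DESCRIBE FORMATTED New_ORC_table;" | grep -i -E 'inputformat|orc.compress'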
03-25-2016
07:01 PM
6 Kudos
Geo Distance calculations in Hive and Java
March 23, 2016
Eran Orgad
Geographical distance is the distance measured along the surface of the earth. The distance between two points is defined by their geographical coordinates (geolocation coordinates) in terms of latitude and longitude. Each pair represents a location, for example:
Lexington, MA: Latitude 42.4428, Longitude -71.2317
Mountain View, CA: Latitude 37.405990600586, Longitude -122.07851409912
Distance: 2676.09225228497 miles
To be able to calculate the distance on Hortonworks HDP using Hive (and Tez), let's generate some sample data that includes some pairs of geolocation coordinates. Data file, called distance1.csv:
42.28,-71.87,42.28,-71.86
42.00,-71.87,42.28,-71.11
42.28,-71.87,42.28,-71.86
42.28,-71.87,42.28,-71.22
42.00,-71.87,42.28,-72.33
42.28,-71.87,42.28,-70.44
42.00,-71.87,42.28,-71.55
42.28,-71.87,41.28,-71.66
42.00,-71.87,43.28,-71.77
42.28,-71.87,44.28,-71.88
42.00,-71.87,45.28,-71.99
42.28,-71.87,46.11,-71.00
42.00,-71.87,47.22,-71.00
42.4428,-71.2317,37.405990600586,-122.07851409912
….
We can create a schema to be able to read the content in Hive. Here is the table I created:
hive> show create table distancecalc;
OK
CREATE EXTERNAL TABLE distancecalc(
src_lat double,
src_long double,
dest_lat double,
dest_long double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY
','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://sandbox.hortonworks.com:8020/tmp/dist2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'numFiles'='1',
'numRows'='-1',
'rawDataSize'='-1',
'totalSize'='338',
'transient_lastDdlTime'='1458866694')
Next, I place my distance1.csv in the /tmp/dist2/ directory that the table points to; for example:
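# upload the sample data into the table's HDFS location (path taken from the DDL above)
hdfs dfs -mkdir -p /tmp/dist2
hdfs dfs -put distance1.csv /tmp/dist2/
hdfs dfs -ls /tmp/dist2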
Now I can query the content in Hive. The following query produces the distance for every given pair of geolocation coordinates; acos(...) gives the central angle between the two points, 180/PI() converts it to degrees, and 60 * 1.1515 converts degrees of arc to statute miles (1 degree is about 60 nautical miles, or about 69.09 statute miles):
select src_lat, src_long, dest_lat, dest_long,
  60 * 1.1515 * (180 / PI()) *
  acos( (sin(radians(src_lat)) * sin(radians(dest_lat)))
      + (cos(radians(src_lat)) * cos(radians(dest_lat)) * cos(radians(src_long - dest_long))) )
  as distancecalc
from distancecalc;
[Screenshot of the query output: screen-shot-2016-03-25-at-122643-pm.png]
The last entry is the distance between Lexington, MA and Mountain View, CA.
Java code to calculate the distance:

/*
 * Class to calculate geo-location distance
 * eorgad - Hortonworks.com 9/30/2014
 */
package geodistance;

public class calcGeoDistance {

    // Returns the distance between two lat/long pairs.
    // unit: "M" = statute miles (default), "K" = kilometers, "N" = nautical miles
    public static double distance(double lat1, double lon1, double lat2, double lon2, String unit) {
        double theta = lon1 - lon2;
        double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2))
                    + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
        dist = Math.acos(dist);
        dist = rad2deg(dist);
        dist = dist * 60 * 1.1515;          // degrees of arc -> statute miles
        if ("K".equals(unit)) {
            dist = dist * 1.609344;         // miles -> kilometers
        } else if ("N".equals(unit)) {
            dist = dist * 0.8684;           // miles -> nautical miles
        }
        return dist;
    }

    // Converts decimal degrees to radians
    private static double deg2rad(double deg) {
        return deg * Math.PI / 180.0;
    }

    // Converts radians to decimal degrees
    private static double rad2deg(double rad) {
        return rad * 180 / Math.PI;
    }

    public static void main(String[] args) {
        // Lexington MA (42.4428, -71.2317) to Mountain View CA (37.405990600586, -122.07851409912)
        System.out.println(distance(42.4428, -71.2317, 37.405990600586, -122.07851409912, "M") + " Miles\n");
    }
}
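To compile and run the class above from a shell (assuming it is saved as geodistance/calcGeoDistance.java):

javac geodistance/calcGeoDistance.java
java -cp . geodistance.calcGeoDistance
# should print roughly the 2676-mile Lexington-to-Mountain-View distance quoted earlier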
03-11-2016
09:53 PM
1 Kudo
This is an upgraded cluster at one of our customers. I have no explanation for the missing class in the jar, but I found that class in the pre-upgrade jar.
I was looking to make configuration changes and ended up swapping the jars, and the old one works!