Member since
05-22-2019
26
Posts
26
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2177 | 03-01-2017 09:38 PM |
01-02-2019
08:26 PM
The tls-toolkit.sh supports --configJsonIn; it is covered in the same guide that you reference.
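As a rough sketch only (the standalone mode, hostname, and JSON file name below are placeholders; check the toolkit guide you linked for the exact option semantics):

./bin/tls-toolkit.sh standalone --configJsonIn existing-nifi-cert-config.json -n 'nifi01.example.com'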
09-13-2017
10:30 PM
3 Kudos
Hortonworks DataFlow (HDF) includes Apache NiFi with a wealth of processors that make the process of ingesting various syslogs from multiple servers easy. Information collected from syslog can be stored on the HDFS distributed filesystem as well as forwarded to other systems such as Splunk. Furthermore, you can parse the stream and select which information should be stored on HDFS and which should be routed to an indexer on Splunk.

To demonstrate this capability, let us first review the NiFi ListenSyslog processor. The above processor corresponds to the syslog configuration in /etc/rsyslog.conf, which includes the following line:

...
*.* @127.0.0.1:7780

This causes syslog messages to be streamed into the NiFi flow, which we can direct to another processor, PutSplunk, configured as follows. In the Splunk UI you can configure data inputs under Settings -> Data inputs -> TCP - Listen on a TCP port for incoming data, e.g. syslog. To complete the selection, use the port corresponding to the one configured in the NiFi PutSplunk processor above (516), then follow the next step to configure linux_syslog.

At this point you can start the flow and NiFi will ingest Linux syslog messages into Splunk. Once data is received you can search it in Splunk. To retrieve information from Splunk you can use the GetSplunk processor and connect it to a PutFile or PutHDFS processor; as an example I have used GetSplunk as follows.

For more details on HDF: https://hortonworks.com/products/data-center/hdf/
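For a quick end-to-end check of this flow, you can restart rsyslog after adding the forwarding line and emit a test message (standard rsyslog/logger commands; the ports are the ones used above):

# /etc/rsyslog.conf forwarding line from above; a single @ forwards over UDP, @@ over TCP
#   *.* @127.0.0.1:7780
sudo service rsyslog restart        # or: sudo systemctl restart rsyslog on systemd systems
# generate a test syslog entry that should travel ListenSyslog -> PutSplunk (TCP 516)
logger "HDF syslog-to-Splunk test $(date)"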
03-01-2017
09:38 PM
Sounds like this is a sandbox issue of the NameNode being in safe mode; you can confirm this by reviewing the logs. If this is the case, you can exit safe mode with: hdfs dfsadmin -safemode leave
To start hdfs manually:
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"
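To confirm the safe-mode state before forcing the NameNode out of it:

su -l hdfs -c "hdfs dfsadmin -safemode get"
# if it reports "Safe mode is ON", leave it with:
su -l hdfs -c "hdfs dfsadmin -safemode leave"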
02-28-2017
01:04 PM
1 Kudo
Labels: Hortonworks Data Platform (HDP)
02-27-2017
06:58 PM
5 Kudos
This also includes an on-the-fly analysis showing the odds in a game of Craps. The example shows a simple use of NiFi (HDF) handling multiple streams of dice data, each one simulating a separate Craps table, and shows a Monte Carlo simulation with the results of 1,000 runs, emulating throws each second.

To demonstrate this capability we generate some random dice data; each stream is generated by an independent thread. We throttle the threads to sleep for a second between throws, mainly to demonstrate an ongoing stream of data over time. Source for data generation: https://github.com/eorgad/Dice-nifi-streams-example/tree/master/Dice-nifi-stream-example/Dice-nifi-streams/src
We use NiFi to create a streaming flow of that data as it is being generated. This simulation uses the following NiFi processors:

- HandleHttpRequest (starts an HTTP server and listens for HTTP requests)
- RouteOnAttribute (routes FlowFiles based on their attributes using the Attribute Expression Language)
- ExecuteStreamCommand (executes an external command on the contents of a flow file and creates a new flow file with the results of the command)
- HandleHttpResponse (sends an HTTP response to the requestor that generated a FlowFile)
- Site-to-site (to send data from one instance of NiFi to another)

You can use a template that handles each stream with an individual NiFi flow: https://github.com/eorgad/Dice-nifi-streams-example/blob/master/Multi-stream-dice-example.xml
The NiFi flow looks as follows when importing the XML template:

Web services: We can use NiFi to host web services either on your HDP instance (you can use an edge node or the same host serving Ambari) or on a standalone server. However, in many cases organizations already use web servers internally and externally, so you can link the UI example to an existing instance or set one up using the following steps.

Set up a local web service: You can set up your web services either on a server or on your local Mac for demo purposes.

2.1. Installation on a CentOS server: to install Apache, open a terminal and type in this command: sudo yum install httpd

2.2. Make configuration changes for your web services: vi /etc/httpd/conf/httpd.conf - place the content of the UI folder in the DocumentRoot location so it can be served by the webserver: DocumentRoot "/var/www/html"

2.3. Start Apache by running: sudo service httpd start
Our simple architecture will look as follows:

3. You can import the Java project into Eclipse, or run TwoCrapsTest from the CLI, to generate two files that NiFi will stream to your web instance. In the template there is a port that you can use to stream the feed via site-to-site to another NiFi instance, such as an instance running on the edge node of your HDP cluster (an HDP 2.5 sandbox VM was used for this example).
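For example, a hedged sketch of running the generator from the command line (the package path and build layout below are placeholders; adjust them to match the repository):

git clone https://github.com/eorgad/Dice-nifi-streams-example.git
cd Dice-nifi-streams-example
# compile all sources into bin/ and run the generator class
javac -d bin $(find . -name '*.java')
java -cp bin TwoCrapsTest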
When launching this example you will be able to view real-time streaming data from NiFi, handled by your webserver, showing a real-time analysis of a game of Craps. Each stream represents one table. The bar shows an accumulation of dollars won or lost for the theoretical gamble on one of the options: pass line, six, eight, five, nine, etc.

This simulation runs only 1,000 iterations per thread (one table in this case), so for a better approximation of the odds you can scale up the Monte Carlo simulation and run a million throws per thread.

The following is the result of launching your index.html with the two streams displayed in real time as they arrive. The following is a bell curve with reference to UI/dice8.html.
12-28-2016
03:51 PM
I suggest following this tutorial; it shows how to load data and copy files: http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/
12-28-2016
03:27 PM
In Zeppelin you can use a %sh paragraph to check which user the interpreter runs as and where its home directories are:
%sh
id
pwd
hdfs dfs -ls /user/zeppelin
The first two commands return:
uid=503(zeppelin) gid=501(hadoop) groups=501(hadoop)
/home/zeppelin
So you can use this user's local home directory (/home/zeppelin), or store files on HDFS in the user's home directory: /user/zeppelin
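For example, a small %sh paragraph that copies a local file into that HDFS home directory (the file name here is hypothetical):

%sh
hdfs dfs -put /home/zeppelin/mydata.csv /user/zeppelin/
hdfs dfs -ls /user/zeppelin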
12-28-2016
02:04 PM
1 Kudo
If you are looking for a way to load data from an external table into ORC, here is an example:
CREATE TABLE New_ORC_table
  STORED AS ORC
  TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM old_external_table;
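To confirm the new table is actually stored as ORC with Snappy compression, one option (assuming the hive CLI is on the path; beeline works the same way) is:

hive -e "DESCRIBE FORMATTED New_ORC_table;" | grep -i -E 'inputformat|orc.compress'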
03-25-2016
07:01 PM
6 Kudos
Geo Distance calculations in Hive and Java
March 23, 2016
Eran Orgad
Geographical distance is the distance measured along the surface of the earth. The distance between two points is defined by their geographical coordinates (geolocation coordinates) in terms of latitude and longitude. Each pair represents a location, for example:
Lexington, MA: Latitude 42.4428, Longitude -71.2317
Mountain View, CA: Latitude 37.405990600586, Longitude -122.07851409912
Distance: 2676.09225228497 miles
To be able to calculate the distance on Hortonworks HDP using Hive (and Tez), let's generate some sample data that includes some pairs of geolocation coordinates. Data file, called distance1.csv:
42.28,-71.87,42.28,-71.86
42.00,-71.87,42.28,-71.11
42.28,-71.87,42.28,-71.86
42.28,-71.87,42.28,-71.22
42.00,-71.87,42.28,-72.33
42.28,-71.87,42.28,-70.44
42.00,-71.87,42.28,-71.55
42.28,-71.87,41.28,-71.66
42.00,-71.87,43.28,-71.77
42.28,-71.87,44.28,-71.88
42.00,-71.87,45.28,-71.99
42.28,-71.87,46.11,-71.00
42.00,-71.87,47.22,-71.00
42.4428,-71.2317,37.405990600586,-122.07851409912
….
We can create a schema to be able to read the content in Hive. Here is the table I created:
hive> show create table distancecalc;
OK
CREATE EXTERNAL TABLE distancecalc(
src_lat double,
src_long double,
dest_lat double,
dest_long double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY
','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://sandbox.hortonworks.com:8020/tmp/dist2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'numFiles'='1',
'numRows'='-1',
'rawDataSize'='-1',
'totalSize'='338',
'transient_lastDdlTime'='1458866694')
Next, I place my distance1.csv in the /tmp/dist2/ directory that the table points to; for example:
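# upload the sample data into the table's HDFS location (path taken from the DDL above)
hdfs dfs -mkdir -p /tmp/dist2
hdfs dfs -put distance1.csv /tmp/dist2/
hdfs dfs -ls /tmp/dist2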
Now I can query the content in Hive. The following query produces the distance for every given pair of geolocation coordinates; acos(...) gives the central angle between the two points, 180/PI() converts it to degrees, and 60 * 1.1515 converts degrees of arc to statute miles (1 degree is about 60 nautical miles, or about 69.09 statute miles):
select src_lat, src_long, dest_lat, dest_long,
  60 * 1.1515 * (180 / PI()) *
  acos( (sin(radians(src_lat)) * sin(radians(dest_lat)))
      + (cos(radians(src_lat)) * cos(radians(dest_lat)) * cos(radians(src_long - dest_long))) )
  as distancecalc
from distancecalc;
[Screenshot of the query output: screen-shot-2016-03-25-at-122643-pm.png]
The last entry is the distance between Lexington, MA and Mountain View, CA.
Java code to calculate the distance:

/*
 * Class to calculate geo-location distance
 * eorgad - Hortonworks.com 9/30/2014
 */
package geodistance;

public class calcGeoDistance {

    // Returns the distance between two lat/long pairs.
    // unit: "M" = statute miles (default), "K" = kilometers, "N" = nautical miles
    public static double distance(double lat1, double lon1, double lat2, double lon2, String unit) {
        double theta = lon1 - lon2;
        double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2))
                    + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
        dist = Math.acos(dist);
        dist = rad2deg(dist);
        dist = dist * 60 * 1.1515;          // degrees of arc -> statute miles
        if ("K".equals(unit)) {
            dist = dist * 1.609344;         // miles -> kilometers
        } else if ("N".equals(unit)) {
            dist = dist * 0.8684;           // miles -> nautical miles
        }
        return dist;
    }

    // Converts decimal degrees to radians
    private static double deg2rad(double deg) {
        return deg * Math.PI / 180.0;
    }

    // Converts radians to decimal degrees
    private static double rad2deg(double rad) {
        return rad * 180 / Math.PI;
    }

    public static void main(String[] args) {
        // Lexington MA (42.4428, -71.2317) to Mountain View CA (37.405990600586, -122.07851409912)
        System.out.println(distance(42.4428, -71.2317, 37.405990600586, -122.07851409912, "M") + " Miles\n");
    }
}
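To compile and run the class above from a shell (assuming it is saved as geodistance/calcGeoDistance.java):

javac geodistance/calcGeoDistance.java
java -cp . geodistance.calcGeoDistance
# should print roughly the 2676-mile Lexington-to-Mountain-View distance quoted earlier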
03-11-2016
09:53 PM
1 Kudo
This is an upgraded cluster at one of our customers. I have no explanation for the missing class in the jar, but I found that class in the pre-upgrade jar.
I was looking to make configuration changes and ended up swapping the jars, and the old one works!