04-03-2024 06:39 AM
01-12-2024 08:19 AM
12-07-2023 01:49 PM
08-02-2023 07:30 AM
03-29-2023 01:22 PM
04:30 PM
1 Kudo
Twitter has open sourced another real-time, distributed, fault-tolerant stream processing engine called Heron. They see it as the successor to Storm. It is backwards compatible with Storm's topology API.
First I followed the getting started guide, downloading and installing on Mac OS X.
➜ Downloads ./heron-client-install-0.14.0-darwin.sh --user
Heron client installer
Heron is now installed!
Make sure you have "/usr/local/bin" in your path.
See http://heronstreaming.io/docs/getting-started.html for how to use Heron.
heron.build.version : 0.14.0
heron.build.time : Tue May 24 22:44:01 PDT 2016
heron.build.timestamp : 1464155053000
heron.build.host : tw-mbp-kramasamy
heron.build.user : kramasamy
heron.build.git.revision : be87b09f348e0ed05f45503340a2245a4ef68a35
heron.build.git.status : Clean
➜ Downloads export PATH=$PATH:/usr/local/bin
➜ Downloads ./heron-tools-install-0.14.0-darwin.sh --user
Heron tools installer
Heron Tools is now installed!
Make sure you have "/usr/local/bin" in your path.
See http://heronstreaming.io/docs/getting-started.html for how to use Heron.
http://twitter.github.io/heron/docs/getting-started/
Run the example to make sure everything is installed:
heron submit local ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology
[2016-05-25 16:16:32 -0400] com.twitter.heron.scheduler.local.LocalLauncher INFO: For checking the status and logs of the topology, use the working directory /Users/tspann/.herondata/topologies/local/tspann/ExclamationTopology
INFO: Topology 'ExclamationTopology' launched successfully
INFO: Elapsed time: 4.722s.
heron activate local ExclamationTopology
[2016-05-25 16:19:38 -0400] com.twitter.heron.spi.utils.TMasterUtils SEVERE: Topology is already activateed
INFO: Successfully activated topology 'ExclamationTopology'
INFO: Elapsed time: 2.739s.
Run the UI:
sudo heron-ui
25 May 2016 16:20:31-INFO:main.py:101: Listening at
25 May 2016 16:20:31-INFO:main.py:102: Using tracker url: http://localhost:8888
To avoid stepping on HDP ports, I changed the tracker port:
sudo heron-tracker --port 8881
25 May 2016 16:24:14-INFO:main.py:183: Running on port: 8881
25 May 2016 16:24:14-INFO:main.py:184: Using config file: /usr/local/herontools/conf/heron_tracker.yaml
Look at the Heron tracker endpoint: http://localhost:8881/topologies
{"status": "success", "executiontime": 4.291534423828125e-05, "message": "", "version": "1.0.0", "result": {}}
Let's run the UI on its own port, pointed at that tracker:
sudo heron-ui --port 8882 --tracker_url http://localhost:8881
25 May 2016 16:28:53-INFO:main.py:101: Listening at
25 May 2016 16:28:53-INFO:main.py:102: Using tracker url: http://localhost:8881
Look at the Heron clusters: http://localhost:8881/clusters
{"status": "success", "executiontime": 1.9073486328125e-05, "message": "", "version": "1.0.0", "result": ["localzk", "local"]}
Using the Heron CLI:
heron
usage: heron <command> <options> ...
Available commands:
activate Activate a topology
deactivate Deactivate a topology
help Prints help for commands
kill Kill a topology
restart Restart a topology
submit Submit a topology
version Print version of heron-cli
Getting more help:
heron help <command> Prints help and options for <command>
For detailed documentation, go to http://heronstreaming.io
If you need to restart a topology:
heron restart local ExclamationTopology
INFO: Successfully restarted topology 'ExclamationTopology'
INFO: Elapsed time: 3.928s.
Look at my topology: http://localhost:8881/topologies#/all/all/ExclamationTopology
{
"status": "success", "executiontime": 7.104873657226562e-05, "message": "",
"version": "1.0.0",
"result": {"local": {"default": ["ExclamationTopology"]}}
}
Adding --verbose will add a ton of debug logs. Attached are some screenshots. The Heron UI is decent. I am hoping Heron screens will be integrated into Ambari.
01:46 PM
Was there anything on the Spark history server or in the logs?
06:33 PM
In the Zeppelin setup installed with HDP 2.4, it's available. Just run your queries:
%sql select * from hivetable
%hive select * from hivetable
You should be able to connect the Hive interpreter the standard way.
03:07 PM
With PXF, you can do anything. I'm not sure Pig + HAWQ is the best combo, though.
08:29 PM
1 Kudo
Create a Hive table stored as ORC through Spark SQL in Zeppelin:
%sql
create table default.logs_orc_table (clientIp STRING, clientIdentity STRING, user STRING, dateTime STRING, request STRING, statusCode INT, bytesSent FLOAT, referer STRING, userAgent STRING) stored as orc
Load data from a DataFrame into this table:
%sql
insert into table default.logs_orc_table select t.* from accessLogsDF t
I can create a table in the Hive View from Ambari:
CREATE TABLE IF NOT EXISTS survey
( firstName STRING, lastName STRING, gender STRING,
phone STRING, email STRING,
address STRING,
city STRING,
postalcode STRING,
surveyanswer STRING)
Then it is really easy to load some data from a CSV file:
LOAD DATA INPATH '/demo/survey1.csv' OVERWRITE INTO TABLE survey;
I can create an ORC-based table in Hive from the Hive View in Ambari, from Spark / Spark SQL, or from the Hive areas in Zeppelin:
create table survey_orc(
firstName varchar(255),
lastName varchar(255),
gender varchar(255),
phone varchar(255),
email varchar(255),
address varchar(255),
city varchar(255),
postalcode varchar(255),
surveyanswer varchar(255)
) stored as orc tblproperties
I can do the same insert from Hive:
%hive
insert into table default.survey_orc select t.* from survey t
I can query Hive tables from Spark SQL or Hive easily.
03:20 PM
I will give this a try and I'll post the results.
For Windows and DBVisualizer, there's an article with step-by-step details: DBVisualizer Windows
For Tableau: http://kb.tableau.com/articles/knowledgebase/connecting-to-hive-server-2-in-secure-mode
For Squirrel SQL: https://community.hortonworks.com/questions/17381/hive-with-dbvisualiser-or-squirrel-sql-client.html
11:31 PM
2 Kudos
I have used Apache Ignite with Spark 1.6 on a standalone cluster, and it did provide acceleration for file reading.
04:33 PM
3 Kudos
Example Log
- - [24/Feb/2016:00:11:58 -0500] "GET / HTTP/1.1" 200 91966 "http://creativelabs.biz" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
SBT
name := "Logs"
version := "1.0"
scalaVersion := "2.10.6"
jarName in assembly := "Logs.jar"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" % "provided"
libraryDependencies += "com.databricks" %% "spark-avro" % "2.0.1"
Scala Program Pieces
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.serializer.{KryoRegistrator, KryoSerializer}
import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._
// One parsed access-log entry.
case class LogRecord(clientIp: String, clientIdentity: String, user: String, dateTime: String, request: String, statusCode: Int, bytesSent: Long, referer: String, userAgent: String)
object Logs {
  // Fields: client IP, identity, user, [timestamp], "method URL protocol", status, bytes, "referer", "user agent".
  val PATTERN = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+) "(\S+)" "([^"]*)"""".r

  // Parse one log line. Lines that do not match become a placeholder record with clientIp "Empty".
  def parseLogLine(log: String): LogRecord = {
    try {
      val res = PATTERN.findFirstMatchIn(log)
      if (res.isEmpty) {
        println("Rejected Log Line: " + log)
        LogRecord("Empty", "-", "-", "", "", -1, -1, "-", "-")
      } else {
        val m = res.get
        // NOTE: HEAD does not have a content size.
        if (m.group(9).equals("-")) {
          LogRecord(m.group(1), m.group(2), m.group(3), m.group(4),
            m.group(5), m.group(8).toInt, 0, m.group(10), m.group(11))
        } else {
          LogRecord(m.group(1), m.group(2), m.group(3), m.group(4),
            m.group(5), m.group(8).toInt, m.group(9).toLong, m.group(10), m.group(11))
        }
      }
    } catch {
      case e: Exception =>
        println("Exception on line:" + log + ":" + e.getMessage)
        LogRecord("Empty", "-", "-", "", "-", -1, -1, "-", "-")
    }
  }
  //// Main Spark Program
  def main(args: Array[String]): Unit = {
    val log = Logger.getLogger("com.hortonworks.spark.Logs")
    log.info("Started Logs Analysis")

    val sparkConf = new SparkConf().setAppName("Logs")
    sparkConf.set("spark.cores.max", "16")
    sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
    sparkConf.set("spark.sql.tungsten.enabled", "true")
    sparkConf.set("spark.eventLog.enabled", "true")
    sparkConf.set("spark.app.id", "Logs")
    sparkConf.set("spark.io.compression.codec", "snappy")
    sparkConf.set("spark.rdd.compress", "false")
    sparkConf.set("spark.shuffle.compress", "true")

    val sc = new SparkContext(sparkConf)
    val logFile = sc.textFile("data/access3.log")
    // Drop the placeholder records produced for unparseable lines.
    val accessLogs = logFile.map(parseLogLine).filter(!_.clientIp.equals("Empty"))
    log.info("# of Partitions %s".format(accessLogs.partitions.size))

    try {
      println("===== Log Count: %s".format(accessLogs.count()))

      try {
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._
        val df1 = accessLogs.toDF()
      } catch {
        case e: Exception =>
          log.error("Writing files after job. Exception:" + e.getMessage)
      }

      // Calculate statistics based on the content size.
      val contentSizes = accessLogs.map(log => log.bytesSent)
      val contentTotal = contentSizes.reduce(_ + _)
      // Arguments follow the labels in the format string.
      println("===== Number of Log Records: %s Content Size Total: %s, Avg: %s, Min: %s, Max: %s".format(
        contentSizes.count,
        contentTotal,
        contentTotal / contentSizes.count,
        contentSizes.min,
        contentSizes.max))
    } finally {
      // Release the Spark context whether or not the job succeeded.
      sc.stop()
    }
  }
}
First, I set up a Scala case class that holds the parsed record (clientIP, clientID, User, DateTime, Request, StatusCode, BytesSent, Referer, UserAgent). Next, I have a regex and a method that parse the logs, one line at a time, into case class instances. I filter out the empty records. I use Spark SQL to examine the data. Then I write the data out to Avro and finally do some counts.
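If you want to try the parsing and counting steps without a Spark cluster, here is a minimal plain-Scala sketch. It reuses the regex from the post, but everything else is an assumption for illustration: the `LogParseSketch` object, the trimmed-down `Parsed` case class, and the sample lines (the IP addresses and user agents are placeholders, not real data). A plain `Seq` stands in for the RDD, so this only approximates what the Spark job does.

```scala
// Plain-Scala sketch (no Spark needed): parse access-log lines and summarise bytes sent.
object LogParseSketch {
  // Same pattern as in the post.
  val Pattern = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+) "(\S+)" "([^"]*)"""".r

  // Hypothetical trimmed-down record; the post's LogRecord has nine fields.
  case class Parsed(clientIp: String, method: String, statusCode: Int, bytesSent: Long)

  // Returns None for lines that do not match, instead of a placeholder record.
  def parse(line: String): Option[Parsed] =
    Pattern.findFirstMatchIn(line).map { m =>
      // HEAD requests log "-" instead of a byte count.
      val bytes = if (m.group(9) == "-") 0L else m.group(9).toLong
      Parsed(m.group(1), m.group(5), m.group(8).toInt, bytes)
    }

  // Hypothetical sample lines; the addresses are placeholders.
  val samples = Seq(
    """127.0.0.1 - - [24/Feb/2016:00:11:58 -0500] "GET / HTTP/1.1" 200 91966 "http://creativelabs.biz" "Mozilla/5.0"""",
    """10.0.0.2 - - [24/Feb/2016:00:12:03 -0500] "HEAD / HTTP/1.1" 200 - "-" "curl/7.43.0"""",
    "not a log line"
  )

  def main(args: Array[String]): Unit = {
    // flatMap drops the lines that failed to parse.
    val records = samples.flatMap(parse)
    val sizes = records.map(_.bytesSent)
    println("Records: %s, Total: %s, Avg: %s, Min: %s, Max: %s".format(
      sizes.size, sizes.sum, sizes.sum / sizes.size, sizes.min, sizes.max))
  }
}
```

Returning `Option` here is a design choice for the sketch; the post's version returns a record with clientIp "Empty" and filters those out afterwards, which amounts to the same thing on an RDD.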
01:22 PM
I would like to access SnappyData in-memory data from NiFi (Get and Put). Has anyone tried this? I am also interested in connectors to other in-memory stores such as Apache Geode, Apache Ignite, SnappyData, Redis, GemfireXD, and Memcache.
Apache NiFi
Cloudera DataFlow (CDF)
01:21 PM
I would like to send phone sensor data to NiFi. Has anyone tried this? Otherwise we can just send REST calls out, but it would be interesting to have NiFi or an agent on the phone.
Apache NiFi