Posts: 1973 | Kudos Received: 1225 | Solutions: 124
05-25-2016
04:30 PM
1 Kudo
Twitter has open sourced another real-time, distributed, fault-tolerant stream processing engine called Heron. They see it as the successor to Storm, and it is backwards compatible with Storm's topology API. First I followed the getting started guide, downloading and installing on Mac OS X:
➜ Downloads ./heron-client-install-0.14.0-darwin.sh --user
Heron client installer
----------------------
Uncompressing......
Heron is now installed!
Make sure you have "/usr/local/bin" in your path.
See http://heronstreaming.io/docs/getting-started.html for how to use Heron.
heron.build.version : 0.14.0
heron.build.time : Tue May 24 22:44:01 PDT 2016
heron.build.timestamp : 1464155053000
heron.build.host : tw-mbp-kramasamy
heron.build.user : kramasamy
heron.build.git.revision : be87b09f348e0ed05f45503340a2245a4ef68a35
heron.build.git.status : Clean
➜ Downloads export PATH=$PATH:/usr/local/bin
➜ Downloads ./heron-tools-install-0.14.0-darwin.sh --user
Heron tools installer
---------------------
Uncompressing......
Heron Tools is now installed!
Make sure you have "/usr/local/bin" in your path.
See http://heronstreaming.io/docs/getting-started.html for how to use Heron.
heron.build.version : 0.14.0
heron.build.time : Tue May 24 22:44:01 PDT 2016
heron.build.timestamp : 1464155053000
heron.build.host : tw-mbp-kramasamy
heron.build.user : kramasamy
heron.build.git.revision : be87b09f348e0ed05f45503340a2245a4ef68a35
heron.build.git.status : Clean
http://twitter.github.io/heron/docs/getting-started/
Run the example topology to make sure everything is installed:
heron submit local ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology
[2016-05-25 16:16:32 -0400] com.twitter.heron.scheduler.local.LocalLauncher INFO: For checking the status and logs of the topology, use the working directory /Users/tspann/.herondata/topologies/local/tspann/ExclamationTopology
INFO: Topology 'ExclamationTopology' launched successfully
INFO: Elapsed time: 4.722s.
heron activate local ExclamationTopology
[2016-05-25 16:19:38 -0400] com.twitter.heron.spi.utils.TMasterUtils SEVERE: Topology is already activateed
INFO: Successfully activated topology 'ExclamationTopology'
INFO: Elapsed time: 2.739s.
Run the UI:
sudo heron-ui
25 May 2016 16:20:31-INFO:main.py:101: Listening at http://192.168.1.5:8889
25 May 2016 16:20:31-INFO:main.py:102: Using tracker url: http://localhost:8888
To avoid stepping on HDP ports, I changed the tracker port:
sudo heron-tracker --port 8881
25 May 2016 16:24:14-INFO:main.py:183: Running on port: 8881
25 May 2016 16:24:14-INFO:main.py:184: Using config file: /usr/local/herontools/conf/heron_tracker.yaml
Look at the tracker's topologies endpoint: http://localhost:8881/topologies
{"status": "success", "executiontime": 4.291534423828125e-05, "message": "", "version": "1.0.0", "result": {}}
Let's run the UI against that tracker:
sudo heron-ui --port 8882 --tracker_url http://localhost:8881
25 May 2016 16:28:53-INFO:main.py:101: Listening at http://192.168.1.5:8882
25 May 2016 16:28:53-INFO:main.py:102: Using tracker url: http://localhost:8881
Look at the Heron clusters: http://localhost:8881/clusters
{"status": "success", "executiontime": 1.9073486328125e-05, "message": "", "version": "1.0.0", "result": ["localzk", "local"]}
Using the Heron CLI:
heron
usage: heron <command> <options> ...
Available commands:
activate Activate a topology
deactivate Deactivate a topology
help Prints help for commands
kill Kill a topology
restart Restart a topology
submit Submit a topology
version Print version of heron-cli
Getting more help:
heron help <command> Prints help and options for <command>
For detailed documentation, go to http://heronstreaming.io
If you need to restart a topology:
heron restart local ExclamationTopology
INFO: Successfully restarted topology 'ExclamationTopology'
INFO: Elapsed time: 3.928s.
Look at my topology: http://localhost:8881/topologies#/all/all/ExclamationTopology
{
"status": "success", "executiontime": 7.104873657226562e-05, "message": "",
"version": "1.0.0",
"result": {"local": {"default": ["ExclamationTopology"]}}
}
Adding --verbose will add a ton of debug logs. Attached are some screenshots. The Heron UI is decent. I am hoping the Heron screens will be integrated into Ambari.
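When you are finished, the topology can be taken down with the deactivate and kill commands from the CLI list above (a sketch, using the same local cluster and topology name as before):
heron deactivate local ExclamationTopology
heron kill local ExclamationTopology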
05-25-2016
01:46 PM
Was there anything on the Spark History Server or in the logs?
05-24-2016
06:33 PM
In the Zeppelin setup installed with HDP 2.4 it's available. Just run your queries:
%sql
select * from hivetable
%hive
select * from hivetable
You should be able to connect to the Hive interpreter in the standard way.
05-23-2016
03:07 PM
With PXF, you can do anything. I'm not sure Pig + HAWQ is the best combo, though.
05-22-2016
08:29 PM
1 Kudo
Create a Hive table stored as ORC through Spark SQL in Zeppelin:
%sql
create table default.logs_orc_table (clientIp STRING, clientIdentity STRING, user STRING, dateTime STRING, request STRING, statusCode INT, bytesSent FLOAT, referer STRING, userAgent STRING) stored as orc
Load data from a DataFrame into this table:
%sql
insert into table default.logs_orc_table select t.* from accessLogsDF t
I can create a table in the Hive View from Ambari:
CREATE TABLE IF NOT EXISTS survey
( firstName STRING, lastName STRING, gender STRING,
phone STRING, email STRING,
address STRING,
city STRING,
postalcode STRING,
surveyanswer STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE;
Then it's really easy to load some data from a CSV file:
LOAD DATA INPATH '/demo/survey1.csv' OVERWRITE INTO TABLE survey;
I can create an ORC-based table in Hive from the Hive View in Ambari, or from the Spark, Spark SQL, or Hive areas in Zeppelin:
create table survey_orc(
firstName varchar(255),
lastName varchar(255),
gender varchar(255),
phone varchar(255),
email varchar(255),
address varchar(255),
city varchar(255),
postalcode varchar(255),
surveyanswer varchar(255)
) stored as orc tblproperties
("orc.compress"="NONE");
I can do the same insert from Hive:
%hive
insert into table default.survey_orc select t.* from survey t
I can query Hive tables from Spark SQL or Hive easily.
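For example, once the ORC table is loaded, it can be queried from either the %sql or %hive interpreter in Zeppelin (a sketch; the aggregate below is just an illustration, not from the original post):
%sql
select surveyanswer, count(*) from default.survey_orc group by surveyanswer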
05-20-2016
03:20 PM
I will give this a try and I'll post the results. For Windows and DBVisualizer, there's an article with step-by-step details: DBVisualizer Windows.
For Tableau: http://kb.tableau.com/articles/knowledgebase/connecting-to-hive-server-2-in-secure-mode
For SQuirreL SQL: https://community.hortonworks.com/questions/17381/hive-with-dbvisualiser-or-squirrel-sql-client.html
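All of these clients end up using the standard HiveServer2 JDBC URL; a sketch for a Kerberos-secured cluster (the host, port, and principal below are placeholders, not values from this thread):
jdbc:hive2://hiveserver-host:10000/default;principal=hive/_HOST@EXAMPLE.COM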
05-19-2016
11:31 PM
2 Kudos
I have used Apache Ignite with Spark 1.6 on a standalone cluster, and it did provide acceleration for file reading.
05-19-2016
04:33 PM
3 Kudos
Example log line:
94.158.95.124 - - [24/Feb/2016:00:11:58 -0500] "GET / HTTP/1.1" 200 91966 "http://creativelabs.biz" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
SBT build (build.sbt):
name := "Logs"
version := "1.0"
scalaVersion := "2.10.6"
jarName in assembly := "Logs.jar"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" % "provided"
libraryDependencies += "com.databricks" %% "spark-avro" % "2.0.1"
Scala program pieces:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.serializer.{KryoSerializer, KryoRegistrator}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._
case class LogRecord( clientIp: String, clientIdentity: String, user: String, dateTime: String, request:String,statusCode:Int, bytesSent:Long, referer:String, userAgent:String )
object Logs {
val PATTERN = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+) "(\S+)" "([^"]*)"""".r
def parseLogLine(log: String): LogRecord = {
try {
val res = PATTERN.findFirstMatchIn(log)
if (res.isEmpty) {
println("Rejected Log Line: " + log)
LogRecord("Empty", "-", "-", "", "", -1, -1, "-", "-" )
}
else {
val m = res.get
// NOTE: HEAD does not have a content size.
if (m.group(9).equals("-")) {
LogRecord(m.group(1), m.group(2), m.group(3), m.group(4),
m.group(5), m.group(8).toInt, 0, m.group(10), m.group(11))
}
else {
LogRecord(m.group(1), m.group(2), m.group(3), m.group(4),
m.group(5), m.group(8).toInt, m.group(9).toLong, m.group(10), m.group(11))
}
}
} catch
{
case e: Exception =>
println("Exception on line:" + log + ":" + e.getMessage);
LogRecord("Empty", "-", "-", "", "-", -1, -1, "-", "-" )
}
}
//// Main Spark Program
def main(args: Array[String]) {
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.storage.BlockManager").setLevel(Level.ERROR)
Logger.getLogger("com.hortonworks.spark.Logs").setLevel(Level.INFO)
val log = Logger.getLogger("com.hortonworks.spark.Logs")
log.info("Started Logs Analysis")
val sparkConf = new SparkConf().setAppName("Logs")
sparkConf.set("spark.cores.max", "16")
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
sparkConf.set("spark.sql.tungsten.enabled", "true")
sparkConf.set("spark.eventLog.enabled", "true")
sparkConf.set("spark.app.id", "Logs")
sparkConf.set("spark.io.compression.codec", "snappy")
sparkConf.set("spark.rdd.compress", "false")
sparkConf.set("spark.suffle.compress", "true")
val sc = new SparkContext(sparkConf)
val logFile = sc.textFile("data/access3.log")
val accessLogs = logFile.map(parseLogLine).filter(!_.clientIp.equals("Empty"))
log.info("# of Partitions %s".format(accessLogs.partitions.size))
try {
println("===== Log Count: %s".format(accessLogs.count()))
accessLogs.take(5).foreach(println)
try {
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df1 = accessLogs.toDF()
df1.registerTempTable("accessLogsDF")
df1.printSchema()
df1.describe("bytesSent").show()
df1.first()
df1.head()
df1.explain()
df1.write.format("avro").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("statusCode").avro("avroresults")
} catch {
case e: Exception =>
log.error("Writing files after job. Exception:" + e.getMessage);
e.printStackTrace();
}
// Calculate statistics based on the content size.
val contentSizes = accessLogs.map(log => log.bytesSent)
val contentTotal = contentSizes.reduce(_ + _)
println("===== Number of Log Records: %s Content Size Total: %s, Avg: %s, Min: %s, Max: %s".format(
contentSizes.count,
contentTotal,
contentTotal / contentSizes.count,
contentSizes.min,
contentSizes.max))
sc.stop()
}
}
}
First, I set up a Scala case class that will hold the parsed record (clientIp, clientIdentity, user, dateTime, request, statusCode, bytesSent, referer, userAgent). Next, I have a regex and a method to parse the logs (one line at a time) into case classes, and I filter out the empty records. I use Spark SQL to examine the data, then write the data out to Avro, and finally do some counts.
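A sketch of how this might be run, assuming the Logs.jar built with sbt assembly above and a local master (the master URL and the data/access3.log path would change on a real cluster):
spark-submit --class Logs --master local[*] target/scala-2.10/Logs.jar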
05-19-2016
01:22 PM
I would like to access SnappyData in-memory data from NiFi (Get and Put). Has anyone tried this? I'm interested in connectors to in-memory stores like Apache Geode, Apache Ignite, SnappyData, Redis, GemFire XD, and Memcached.
Labels: Apache NiFi, Cloudera DataFlow (CDF)
05-19-2016
01:21 PM
I would like to send phone sensor data to NiFi. We could just send REST calls out, but it would be interesting to have NiFi or an agent on the phone. Has anyone tried this?
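For the plain REST route, a minimal sketch: NiFi's ListenHTTP processor exposes an HTTP endpoint that a phone app could POST sensor readings to (the port, path, and JSON payload here are illustrative assumptions, not a tested setup):
curl -X POST -H "Content-Type: application/json" -d '{"device":"phone1","accelX":0.12,"accelY":-0.03,"accelZ":9.81}' http://nifi-host:8081/contentListener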
Labels: Apache NiFi