Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6033 | 12-17-2016 09:21 PM |
| | 3657 | 11-01-2016 02:28 PM |
| | 1686 | 09-23-2016 09:50 PM |
| | 2694 | 09-21-2016 03:08 AM |
| | 1453 | 09-19-2016 06:41 PM |
08-16-2016 05:04 PM
More details would probably help. I'm using minifi-cpp to produce data and push it to NiFi via Site2Site. minifi-cpp does not give me the ability to include additional metadata in the flowfiles. Specifically, I need to access the Transit URI so I can extract the domain name and use it to route in my flow. I believe I'm pushing too many events through Site2Site to issue a provenance query for every flowfile. So you're suggesting exporting provenance records in bulk via the reporting task, then downstream (say, in Spark) joining my raw flowfiles to their provenance metadata on FlowFile UUID? Something like the sketch below?
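A minimal sketch of that downstream join, assuming the provenance export lands as JSON; the paths and the field names ("uuid", "entityId", "transitUri") are assumptions for illustration:

```scala
// Hedged sketch (Spark 1.6-style sqlContext, as elsewhere in this thread).
// Paths and JSON field names ("entityId", "transitUri") are assumptions.
val raw = sqlContext.read.json("hdfs:///data/raw_flowfiles") // raw events carrying a "uuid" column
val prov = sqlContext.read.json("hdfs:///data/provenance")   // bulk provenance export
  .select("entityId", "transitUri")

// Attach the Transit URI to each raw event so the domain can be extracted for routing.
val enriched = raw.join(prov, raw("uuid") === prov("entityId"), "left_outer")
enriched.show(5)
```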
08-16-2016 02:10 PM
Events in my flow have the pictured metadata. Can I access these provenance fields as part of the flowfile using expression language?
Labels:
- Apache NiFi
08-10-2016 04:18 PM
I found the answer. In the Spark interpreter menu there is a "zeppelin.spark.printREPLOutput" property which you can set to false.
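For reference, the property as it appears in the Spark interpreter settings (the value shown is what suppresses the REPL echo):

```
zeppelin.spark.printREPLOutput = false
```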
08-10-2016 04:05 PM
In report view, Zeppelin is still printing object reference information. I want only the dropdown (green circle) to show to the end-user, and hide the standard Scala repl output (red rectangle).
08-10-2016 03:10 AM
2 Kudos
I suggest having a gander at the related comments here. NiFi can replace Storm for some use cases, and cannot for others. NiFi's sweet spot is simple event processing (one event context at a time). Storm, Spark, Flink, etc. are all more powerful when it comes to complex (windowed, or cross-event) processing, but they require writing code. Do you have a particular use case in mind?
08-09-2016 11:23 PM
I'm using Zeppelin to build a simple interactive dashboard. I allow the user to select values from a dropdown, and want to show them the picklist, but not the Scala output underneath it. If I use the "Hide Output" button, my picklist is hidden as well. See attached screenshot and code snippet:

```scala
val nodes = sqlContext.sql("select distinct node from nodes").collect().map(x => (x(0).asInstanceOf[String], x(0).asInstanceOf[String]))

val query = """
select
  concat_ws(',', collect_list(date_format(datetime, 'HH:mm'))) as time,
  concat_ws(',', collect_list(cast(diskbusy as string))) as Disk,
  concat_ws(',', collect_list(cast(cpuuser as string))) as CPUUser,
  concat_ws(',', collect_list(cast(cpusys as string))) as CPUSys,
  concat_ws(',', collect_list(cast(round(cpusys+cpuuser, 2) as string))) as CPU,
  concat_ws(',', collect_list(cast(round(100*memavailable/memtotal, 2) as string))) as Mem,
  concat_ws(',', collect_list(cast(greatest(unknown.requests, 0) as string))) as ExternalRequests,
  concat_ws(',', collect_list(cast(greatest(known.requests, 0) as string))) as InternalRequests,
  concat_ws(',', collect_list(cast(greatest(known.requests, 0) + greatest(unknown.requests, 0) as string))) as TotalRequests,
  concat_ws(',', collect_list(cast(round(memavailable/1000, 0) as string))) as MemAvail
from node_monitoring
left outer join (
  select date_format(datetime, 'HH:mm') as time, app_host, count(*) as requests
  from web_logs_enriched
  where source_host is not null or source_ip = '127.0.0.1'
  group by date_format(datetime, 'HH:mm'), app_host
) known
  on node_monitoring.node = known.app_host and date_format(node_monitoring.datetime, 'HH:mm') = known.time
left outer join (
  select date_format(datetime, 'HH:mm') as time, app_host, count(*) as requests
  from web_logs_enriched
  where source_host is null
  group by date_format(datetime, 'HH:mm'), app_host
) unknown
  on node_monitoring.node = unknown.app_host and date_format(node_monitoring.datetime, 'HH:mm') = unknown.time
where node = '""" + z.select("node", nodes) + """'
group by node
"""

val data2 = sqlContext.sql(query)
z.angularBind("data2", data2.collect())
z.angularBind("data2Schema", data2.schema)
```
Labels:
- Apache Spark
- Apache Zeppelin
08-05-2016 07:30 AM
I ended up creating an additional source upstream that generates "tick" events at my specified interval, then joining the two RDDs. Every interval, the RDD element from the "tick" stream has a non-zero value, so every interval produces output. A sketch of the pattern is below.
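A minimal sketch of that pattern, assuming Spark Streaming; the socket source, host/port, and join key are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

// Hedged sketch of the "tick stream" pattern described above.
val conf = new SparkConf().setAppName("tick-join")
val ssc = new StreamingContext(conf, Seconds(10)) // batch interval = tick interval

// Main event stream, keyed so it can be joined (key "tick" is illustrative).
val events = ssc.socketTextStream("localhost", 9999).map(line => ("tick", line))

// Tick stream: emits the same single marker element every batch interval.
val ticks = new ConstantInputDStream(ssc, ssc.sparkContext.parallelize(Seq(("tick", 1))))

// Joining from the tick side guarantees output every interval,
// even when no events arrived in that batch.
ticks.leftOuterJoin(events).print()

ssc.start()
ssc.awaitTermination()
```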
08-05-2016 06:58 AM
2 Kudos
From the Spark 1.6.x docs: "For other Hadoop InputFormats, you can use the SparkContext.hadoopRDD method, which takes an arbitrary JobConf and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source. You can also use SparkContext.newAPIHadoopRDD for InputFormats based on the 'new' MapReduce API (org.apache.hadoop.mapreduce)." The InputFormat class you'd specify is (I believe) TableSnapshotInputFormat. I recommend reading a bit of that API doc; it notes the need to use a CellScanner, which gives you access to Cells, which in turn give you access to values and their timestamps. A rough sketch is below. If you get an example working, an article would be excellent!
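A rough, untested sketch of the newAPIHadoopRDD route; the snapshot name, restore directory, and the ambient `sc` (as in the Spark shell or Zeppelin) are assumptions:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import scala.collection.JavaConverters._

// Hedged sketch: read an HBase snapshot directly from HDFS (no region servers).
// "mySnapshot" and the restore path are hypothetical.
val conf = HBaseConfiguration.create()
val job = Job.getInstance(conf)
TableSnapshotInputFormat.setInput(job, "mySnapshot", new Path("/tmp/snapshot-restore"))

val rdd = sc.newAPIHadoopRDD(
  job.getConfiguration,
  classOf[TableSnapshotInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Each Result exposes Cells; CellUtil pulls out values, and each Cell
// carries its own timestamp.
rdd.flatMap { case (_, result) =>
  result.listCells().asScala.map { cell =>
    (Bytes.toString(CellUtil.cloneValue(cell)), cell.getTimestamp)
  }
}.take(5).foreach(println)
```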
08-05-2016 06:05 AM
1 Kudo
```sql
select transform(inputCol1, inputCol2) using 'myScript.py' as myCol1, myCol2 from myTable;
```

'myScript.py' should be in the root of the archive file you're adding. Alternatively, and I think better, ship a virtualenv as an archive and add your UDFs as separate files. This way you can produce one commonly used virtualenv archive and use it with many separate UDFs that depend on it, and you won't have to produce a new archive every time you create a new UDF:

```sql
add archive virtualenv.tgz;
add file myUDF1.py;
add file myUDF2.py;
select transform(inputCol1, inputCol2) using 'myUDF1.py' as myCol1, myCol2 from myTable;
select transform(inputCol1, inputCol2) using 'myUDF2.py' as myCol1, myCol2 from myTable;
```
07-27-2016 09:20 PM
5 Kudos
The best options right now are the HBase and Phoenix (via JDBC) capabilities built into Apache Zeppelin. A quick Phoenix-over-JDBC sketch is below. If you specifically want an Ambari View, there's a very basic community implementation here, but it hasn't been updated for recent HDP versions.
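To illustrate the Phoenix-via-JDBC path (the same route Zeppelin's JDBC interpreter takes), a minimal sketch; the ZooKeeper quorum and the query are assumptions:

```scala
import java.sql.DriverManager

// Hedged sketch: querying Phoenix over JDBC. The quorum "zk1:2181" is hypothetical.
val conn = DriverManager.getConnection("jdbc:phoenix:zk1:2181")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("select TABLE_NAME from SYSTEM.CATALOG limit 10")
while (rs.next()) println(rs.getString("TABLE_NAME"))
rs.close(); stmt.close(); conn.close()
```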