Member since: 09-21-2015
Posts: 133
Kudos Received: 130
Solutions: 24
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6033 | 12-17-2016 09:21 PM |
| | 3657 | 11-01-2016 02:28 PM |
| | 1686 | 09-23-2016 09:50 PM |
| | 2694 | 09-21-2016 03:08 AM |
| | 1453 | 09-19-2016 06:41 PM |
08-16-2016 05:04 PM
More details would probably help. I'm using minifi-cpp to produce data and push it to NiFi via Site2Site. minifi-cpp does not give me the ability to include additional metadata in the flowfiles. Specifically, I need to access the Transit URI so I can extract the domain name and use it to route in my flow. I believe I'm pushing too many events through Site2Site to issue a provenance query for every flowfile. So you're suggesting exporting provenance records in bulk via the reporting task, then downstream (say, in Spark) joining my raw flowfiles to their provenance metadata on FlowFile UUID? Something like the sketch below?
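A minimal sketch of that downstream join, assuming the provenance export lands as JSON; the paths and the field names ("uuid", "entityId", "transitUri") are assumptions for illustration:

```scala
// Hedged sketch (Spark 1.6-style sqlContext, as elsewhere in this thread).
// Paths and JSON field names ("entityId", "transitUri") are assumptions.
val raw = sqlContext.read.json("hdfs:///data/raw_flowfiles") // raw events carrying a "uuid" column
val prov = sqlContext.read.json("hdfs:///data/provenance")   // bulk provenance export
  .select("entityId", "transitUri")

// Attach the Transit URI to each raw event so the domain can be extracted for routing.
val enriched = raw.join(prov, raw("uuid") === prov("entityId"), "left_outer")
enriched.show(5)
```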
08-16-2016 02:10 PM
Events in my flow have the pictured metadata. Can I access these provenance fields as part of the flowfile using expression language?
Labels:
- Apache NiFi
08-10-2016 04:18 PM
I found the answer. In the Spark interpreter menu there is a "zeppelin.spark.printREPLOutput" property which you can set to false.
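For reference, the property as it appears in the Spark interpreter settings (the value shown is what suppresses the REPL echo):

```
zeppelin.spark.printREPLOutput = false
```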
08-10-2016 04:05 PM
In report view, Zeppelin is still printing object reference information. I want only the dropdown (green circle) to show to the end-user, and hide the standard Scala repl output (red rectangle).
08-10-2016 03:10 AM
2 Kudos
I suggest having a gander at the related comments here. NiFi can replace Storm for some use cases, and cannot for others. NiFi's sweet spot is simple event processing (one event context at a time). Storm, Spark, Flink, etc. are all more powerful when it comes to complex (windowed, or cross-event) processing, but they require writing code. Do you have a particular use case in mind?
08-09-2016 11:23 PM
I'm using Zeppelin to build a simple interactive dashboard. I allow the user to select values from a dropdown, and want to show them the picklist, but not the Scala output underneath it. If I use the "Hide Output" button, my picklist is hidden as well. See attached screenshot and code snippet:

```scala
val nodes = sqlContext.sql("select distinct node from nodes").collect().map(x => (x(0).asInstanceOf[String], x(0).asInstanceOf[String]))

val query = """
select
  concat_ws(',', collect_list(date_format(datetime, 'HH:mm'))) as time,
  concat_ws(',', collect_list(cast(diskbusy as string))) as Disk,
  concat_ws(',', collect_list(cast(cpuuser as string))) as CPUUser,
  concat_ws(',', collect_list(cast(cpusys as string))) as CPUSys,
  concat_ws(',', collect_list(cast(round(cpusys+cpuuser, 2) as string))) as CPU,
  concat_ws(',', collect_list(cast(round(100*memavailable/memtotal, 2) as string))) as Mem,
  concat_ws(',', collect_list(cast(greatest(unknown.requests, 0) as string))) as ExternalRequests,
  concat_ws(',', collect_list(cast(greatest(known.requests, 0) as string))) as InternalRequests,
  concat_ws(',', collect_list(cast(greatest(known.requests, 0) + greatest(unknown.requests, 0) as string))) as TotalRequests,
  concat_ws(',', collect_list(cast(round(memavailable/1000, 0) as string))) as MemAvail
from node_monitoring
left outer join (
  select date_format(datetime, 'HH:mm') as time, app_host, count(*) as requests
  from web_logs_enriched
  where source_host is not null or source_ip = '127.0.0.1'
  group by date_format(datetime, 'HH:mm'), app_host
) known
  on node_monitoring.node = known.app_host and date_format(node_monitoring.datetime, 'HH:mm') = known.time
left outer join (
  select date_format(datetime, 'HH:mm') as time, app_host, count(*) as requests
  from web_logs_enriched
  where source_host is null
  group by date_format(datetime, 'HH:mm'), app_host
) unknown
  on node_monitoring.node = unknown.app_host and date_format(node_monitoring.datetime, 'HH:mm') = unknown.time
where node = '""" + z.select("node", nodes) + """'
group by node
"""

val data2 = sqlContext.sql(query)
z.angularBind("data2", data2.collect())
z.angularBind("data2Schema", data2.schema)
```
Labels:
- Apache Spark
- Apache Zeppelin
08-05-2016 07:30 AM
I ended up creating an additional source upstream that generates "tick" events at my specified interval, then joining the two RDDs. Every interval, the RDD element from the "tick" stream has a non-zero value, so every interval produces output. A sketch of the pattern is below.
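A minimal sketch of that pattern, assuming Spark Streaming; the socket source, host/port, and join key are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

// Hedged sketch of the "tick stream" pattern described above.
val conf = new SparkConf().setAppName("tick-join")
val ssc = new StreamingContext(conf, Seconds(10)) // batch interval = tick interval

// Main event stream, keyed so it can be joined (key "tick" is illustrative).
val events = ssc.socketTextStream("localhost", 9999).map(line => ("tick", line))

// Tick stream: emits the same single marker element every batch interval.
val ticks = new ConstantInputDStream(ssc, ssc.sparkContext.parallelize(Seq(("tick", 1))))

// Joining from the tick side guarantees output every interval,
// even when no events arrived in that batch.
ticks.leftOuterJoin(events).print()

ssc.start()
ssc.awaitTermination()
```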
08-05-2016 06:58 AM
2 Kudos
From the Spark 1.6.x docs: "For other Hadoop InputFormats, you can use the SparkContext.hadoopRDD method, which takes an arbitrary JobConf and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source. You can also use SparkContext.newAPIHadoopRDD for InputFormats based on the 'new' MapReduce API (org.apache.hadoop.mapreduce)." The InputFormat class you'd specify is (I believe) TableSnapshotInputFormat. I recommend reading a bit of that API doc; it notes the need to use a CellScanner, which gives you access to Cells, which in turn give you access to values and their timestamps. A rough sketch is below. If you get an example working, an article would be excellent!
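A rough, untested sketch of the newAPIHadoopRDD route; the snapshot name, restore directory, and the ambient `sc` (as in the Spark shell or Zeppelin) are assumptions:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import scala.collection.JavaConverters._

// Hedged sketch: read an HBase snapshot directly from HDFS (no region servers).
// "mySnapshot" and the restore path are hypothetical.
val conf = HBaseConfiguration.create()
val job = Job.getInstance(conf)
TableSnapshotInputFormat.setInput(job, "mySnapshot", new Path("/tmp/snapshot-restore"))

val rdd = sc.newAPIHadoopRDD(
  job.getConfiguration,
  classOf[TableSnapshotInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Each Result exposes Cells; CellUtil pulls out values, and each Cell
// carries its own timestamp.
rdd.flatMap { case (_, result) =>
  result.listCells().asScala.map { cell =>
    (Bytes.toString(CellUtil.cloneValue(cell)), cell.getTimestamp)
  }
}.take(5).foreach(println)
```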
08-05-2016 06:05 AM
1 Kudo
```sql
select transform(inputCol1, inputCol2) using 'myScript.py' as myCol1, myCol2 from myTable;
```

'myScript.py' should be in the root of the archive file you're adding. Alternatively, and I think better, ship a virtualenv as an archive and add your UDFs as separate files. This way you can produce one commonly used virtualenv archive and use it with many separate UDFs that depend on it, and you won't have to produce a new archive every time you create a new UDF:

```sql
add archive virtualenv.tgz;
add file myUDF1.py;
add file myUDF2.py;
select transform(inputCol1, inputCol2) using 'myUDF1.py' as myCol1, myCol2 from myTable;
select transform(inputCol1, inputCol2) using 'myUDF2.py' as myCol1, myCol2 from myTable;
```
07-27-2016 09:20 PM
5 Kudos
The best options right now are the HBase and Phoenix (via JDBC) capabilities built into Apache Zeppelin. A quick Phoenix-over-JDBC sketch is below. If you specifically want an Ambari View, there's a very basic community implementation here, but it hasn't been updated for recent HDP versions.
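To illustrate the Phoenix-via-JDBC path (the same route Zeppelin's JDBC interpreter takes), a minimal sketch; the ZooKeeper quorum and the query are assumptions:

```scala
import java.sql.DriverManager

// Hedged sketch: querying Phoenix over JDBC. The quorum "zk1:2181" is hypothetical.
val conn = DriverManager.getConnection("jdbc:phoenix:zk1:2181")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("select TABLE_NAME from SYSTEM.CATALOG limit 10")
while (rs.next()) println(rs.getString("TABLE_NAME"))
rs.close(); stmt.close(); conn.close()
```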