2017 in Review

First off, this was an amazing year for Big Data, IoT, Streaming, Machine Learning and Deep Learning. So many cool events, updates, new products, new projects, new libraries and community growth. I've seen a lot of people adopt and grow Big Data and streaming projects from nothing. Using the power of Open Source and the tools made available by Apache, companies are growing with the help of trusted partners and a community of engineers and users.

We had three awesome DataWorks Summits (formerly Hadoop Summit, but now covering a lot more, including IoT, AI and Streaming).

I attended Munich and spoke in Sydney. I missed California, but all the videos and slides were online and I loved those.

I spoke at Oracle Code in NYC, which was a fun little event. I was surprised to learn that many people had never heard of Apache NiFi or of how easily you can use it to build real-time dataflows, including Deep Learning and Big Data.

I got to talk to a lot of interesting people while working the Hortonworks booth at Strata NYC. It was such a huge event; fidget spinners and streaming were the main takeaways there.

We had a lot of awesome meetups in Princeton and in the NYC and Philadelphia areas. The Princeton Future of Data Group grew to over 750 members! It is a great community of data scientists, engineers, students, analysts, techies and business thought leaders. I am really proud to be a part of this amazing group.



I got to speak at most of the meetups, except when we had special guests. I had some great NY/NJ/Philly teammates co-running the meetup: @milind pandit @Greg Keys. Greg and I also created a North Jersey meetup.

November 14th - Enterprise Data at Scale

I spoke on IBM DSX, Apache NiFi, Apache Spark, Python, Jupyter and Data Science. Fortunately, I had two excellent IBM resources assisting me.

October 5th - Deep Learning with DeepLearning4J (DL4J). A great talk by my friend from SkyMind. It's nice to see their project get accepted to Eclipse.

August 8th - Deep Dive into HDF 3.0 @ Honeywell

June 20th - Latest Innovation - Schema Registry and More @ TRAC Intermodal

May 16th - Hadoop Tools Overview

March 28th - Apache NiFi: Ingesting Enterprise Data at Scale


Libraries, SDKs, Tools, Frameworks

  • TensorFlow
  • Apache MXNet
  • NLTK
  • Apache OpenNLP
  • Apache Tika
  • Apache NiFi Custom Processors
  • OpenCV
  • Apache NiFi 1.4
  • Apache Zeppelin
  • Apache Spark 2.x
  • Apache Hive LLAP
  • Apache HBase with Apache Phoenix
  • Apache ORC
  • Apache Hadoop
  • Hortonworks Schema Registry
  • Hortonworks Streaming Analytics Manager
  • Druid
  • Apache SuperSet - Now in Apache
  • PyTorch
  • Apache Storm - Big Updates


  • Raspberry Pi Zero Wireless
  • Raspberry Pi 3B+
  • Movidius
  • Nvidia Jetson TX1
  • Matrix Creator
  • Google AIY Voice Kit
  • Kudrone
  • Christmas Tree Hat
  • Sense Hat
  • Many Cameras and Video Cameras
  • NanoPi Duo
  • Tinkerboard

There was a lot of big news this year. Apache Hive LLAP became a real production thing and brought Apache Hadoop into the world of EDW, completely Open Source. On the Apache Spark front, we passed version 2.0, and Livy became a production standby and became Apache Livy. The JanusGraph database appeared and is quickly becoming the standard for graphs. Apache Calcite went into so many projects that SQL queries are everywhere, including in Apache NiFi. A huge number of interesting software projects arose, including Hortonworks Data Plane, Hortonworks Schema Registry and Hortonworks Streaming Analytics Manager. This was an awesome year for software.



Presentations From Talks Available

My HCC Articles of 2017

My Articles on DZone

My RefCard

My Guide


My GitHub Source Code

I have some example Apache NiFi custom processors developed with JDK 8, including ones for TensorFlow, OpenNLP, DL4J, Apache Tika, Stanford CoreNLP and more. I also published all the Python scripts, documentation, shell scripts, SQL, Apache NiFi templates and Apache Zeppelin notebooks as Apache-licensed open source on GitHub.


Next year will be amazing: more libraries, more use cases for Deep Learning, and enhancements to all the great projects and tools out there. Another Google AIY Kit, more DataWorks Summits, Hadoop 3, HDF 4 and HDP 3 are all on the way, so there are many things to look forward to.

See you at meetups, summits and online next year.