Cloudera Employee
May 25th Dallas CUG: New Apache Hadoop Storage for Fast Analytics on Fast Data

Date/Time: Wednesday, May 25th 6-8pm

Location: TBD

Signup link:


6:00-6:30pm Food, Drinks and Networking
6:30-7:15pm Tech Talk with Q&A
7:15-8:00pm Networking


Tech Talk Description:
If you're building relational, time-series, IoT, or real-time architectures using Hadoop, you will find Apache Kudu an attractive choice. With Kudu, you'll be able to build your applications more simply and with fewer moving parts.

Hadoop has become faster and more capable, and has continued to narrow the gap with traditional database technologies. However, for developers looking for up-to-the-second analytics on fast-moving data, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing and analytical workloads.

This talk will describe Kudu, a new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark and Apache Impala. Kudu fills the gap described above, offering both fast scans and fast random access through a single API.


Ryan Bosshart is a Systems Engineer who leads the field storage specialization team.