Community Articles

mkumar13 · 07-15-2016

Apache Kylin origin

In today's big data era, Hadoop has become the de facto standard, and a large number of tools have been built around the Hadoop platform, one after another, to address the needs of different scenarios.

For example, Hive is a data warehouse tool for Hadoop: it maps data files stored on the HDFS distributed file system to database tables and provides SQL queries over them. Hive's execution engine converts SQL into MapReduce tasks, which makes it well suited to data analysis in a data warehouse. Another example is HBase, a highly available, high-performance, column-oriented, scalable distributed storage system built on Hadoop; HDFS provides the highly reliable underlying storage for HBase.
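To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a query such as SELECT region, SUM(amount) FROM sales GROUP BY region could be split into a map phase, a shuffle, and a reduce phase. It is only an illustration of the concept, not Hive's actual execution plan; the table, column names, and sample rows are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical rows of a "sales" table, as they might be parsed from files on HDFS.
ROWS = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]


def map_phase(rows):
    """Emit one (key, value) pair per row: GROUP BY column, aggregated column."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three phases to answer the GROUP BY query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

In a real cluster the map and reduce phases run in parallel on many machines, and every query has to scan the raw rows again, which is why such queries are slow on very large tables.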

What has been missing, however, is a distributed analysis engine based on Hadoop. Existing business analytics tools such as Tableau are available, but they have significant limitations: they are difficult to scale horizontally, cannot handle very large data sets, and lack support for Hadoop.

Apache Kylin (Chinese: 麒麟, the qilin or kirin) emerged to solve these problems on top of Hadoop. Apache Kylin is an open source distributed analytics engine, originally developed by eBay and contributed to the open source community. It provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop for very large data sets. It can handle analysis tasks at the TB to PB scale, query huge Hive tables with sub-second latency, and support high concurrency.

Apache Kylin scenarios

(1) Your data resides in the Hadoop HDFS distributed file system, you use Hive to build a data warehouse on HDFS and analyze the data, and the data volume is huge, for example at the TB level.

(2) You also use HBase on the Hadoop platform for data storage, and your applications use HBase row keys for fast data queries.

(3) Your Hadoop platform accumulates a huge amount of data every day, and you want to perform multidimensional analysis on it.

If your application is similar to the above, Apache Kylin is well suited to multidimensional analysis of your large data volumes. The core idea of Apache Kylin is to trade space for time: the results of multidimensional computations are calculated in advance and stored in HBase, so that queries are fast. And because Apache Kylin offers a variety of flexible strategies for queries that further improve space utilization, this space-for-time trade-off is worthwhile in practice.
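The space-for-time idea can be sketched as follows: aggregate the fact data once for every combination of dimensions, store the results under a lookup key, and answer later queries with a single key lookup instead of a scan. This is a minimal sketch under stated assumptions, not Kylin's actual implementation: a Python dictionary stands in for HBase, the tuple key plays the role of a row key, and the dimension names, measure, and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension columns and fact rows, for illustration only.
DIMENSIONS = ("year", "region", "product")

FACTS = [
    {"year": 2015, "region": "east", "product": "A", "amount": 10},
    {"year": 2015, "region": "west", "product": "A", "amount": 5},
    {"year": 2016, "region": "east", "product": "B", "amount": 7},
    {"year": 2016, "region": "east", "product": "A", "amount": 3},
]


def build_cube(facts, measure="amount"):
    """Precompute SUM(measure) for every combination of dimensions.

    The result is a dict keyed by (dimension names, dimension values);
    this key stands in for a row key in the storage layer.
    """
    cube = defaultdict(int)
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


def query(cube, **filters):
    """Answer SUM(amount) for equality filters with one key lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return cube.get(key, 0)


if __name__ == "__main__":
    cube = build_cube(FACTS)
    print(query(cube, year=2016, region="east"))
    print(query(cube, product="A"))
    print(query(cube))
```

With three dimensions the sketch materializes 2^3 = 8 groupings; the number grows exponentially with the number of dimensions, which is why strategies that limit what is precomputed matter for keeping the space cost reasonable.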

Apache Kylin development history

Apache Kylin was open-sourced on GitHub in October 2014 and joined the Apache Incubator soon after, in November 2014. In November 2015 it officially graduated to become a top-level Apache project, making it the first top-level Apache project designed and developed entirely by a Chinese team.

The official Apache Kylin website is: http://kylin.apache.org

In March 2016, the core Apache Kylin developers founded the company Kyligence in Shanghai to better promote the rapid development of the project and its community.

The company's official website is: http://kyligence.io

In April 2016, to support its further growth, the big data company Kyligence (Kui-Technology) secured a multi-million dollar angel investment round.

Past and Future of Apache Kylin
