Member since 09-22-2015
24 Posts
24 Kudos Received
2 Solutions
My Accepted Solutions
Views | Posted |
---|---|
1290 | 07-26-2016 02:23 PM |
4623 | 07-11-2016 12:46 PM |
06-07-2018
03:40 PM
1 Kudo
We are pleased to announce the immediate availability of Hortonworks Data Flow (HDF) version 3.1.2 for both x86 and IBM Power Systems. This version is a maintenance release and includes critical bug fixes for NiFi, MiNiFi, Ranger, Streaming Analytics Manager (SAM), Schema Registry, and more. HDF 3.1.2 includes the following components:
- Apache Ambari 2.6.2
- Apache Kafka 1.0.0
- Apache NiFi 1.5.0
- NiFi Registry 0.1.0
- Apache Ranger 0.7.0
- Apache Storm 1.1.1
- Apache ZooKeeper 3.4.6
- Apache MiNiFi Java Agent 0.4.0
- Apache MiNiFi C++ 0.4.0
- Hortonworks Schema Registry 0.5.0
- Hortonworks Streaming Analytics Manager 0.6.0
The release and documentation are available at:
Hortonworks Data Flow v3.1.2 Download: (link)
Hortonworks Data Flow v3.1.2 Documentation: (link)
03-01-2018
12:15 PM
Hortonworks Data Flow v3.1.1 Documentation (link)
Thank you to the Hortonworks Data Flow Development, Product Management, Quality Engineering, Partner Certification, Documentation, and Release Engineering teams.
12-21-2017
08:35 PM
We are pleased to announce the availability of Hortonworks Data Flow (HDF) v3.0.3 for IBM Power Systems on RHEL 7.2. This is an important release, as it is the second Hortonworks product ported to IBM Power Systems, specifically Power8 processors, in 2017. This release is a win-win for both IBM and Hortonworks customers: HDF v3.0.3 is the next generation of our open source data-in-motion platform and enables customers to collect, curate, analyze, and act on all data in real time, across the enterprise. Combined with the Hortonworks Data Platform (HDP) currently available for IBM Power Systems, we are improving the experience for customers by simplifying how they create and deploy streaming analytics applications to deliver real-time analytics while benefiting from the flexibility, cost of operation, and performance of IBM Power8 processors.
For additional information, please refer to:
Hortonworks Data Flow v3.0.3 Documentation (link)
Hortonworks Data Platform v2.6.3 / Ambari v2.6 Documentation (link)
11-11-2017
11:38 AM
1 Kudo
We are pleased to announce the availability of Hortonworks Data Flow (HDF) version 3.0.2. This maintenance release of HDF is the final version of our flagship 3.0 product line and includes critical patches and bug fixes. An increasing number of Hortonworks customers are using HDF to meet their needs for enterprise flow and streaming use cases. The team is very pleased to deliver a high-quality, well-tested release before moving on to the next big release. The next release of HDF, version 3.1, is currently in development.
For additional information, please refer to:
Hortonworks Data Flow (bits) (docs)
11-11-2017
11:36 AM
2 Kudos
We are pleased to announce the certification of IBM Data Science Experience (DSX) Local V1.1.2.01 with Hortonworks Data Platform (HDP) 2.6.3 / Ambari 2.6 on RHEL 7. As an important part of the agreement between the two companies, IBM and Hortonworks collaborated on an extensive set of test cases to specifically validate IBM DSX with both Hortonworks Data Platform and Ambari. This version of IBM DSX integrates with Zeppelin 0.7.3 and allows users to configure the Livy interpreter to run workloads on HDP clusters (both secure and unsecure). Users also have the option to launch their DSX jobs on either Spark1 or Spark2.
This certification is a win-win for both DSX and HDP customers: it brings a production-ready data science experience to HDP customers and at the same time gives DSX customers access to information stored in HDP data lakes with an enterprise-grade compute grid. An increasing number of Hortonworks customers are using data science to get greater value from their data and to support use cases ranging from churn prediction and predictive maintenance to optimizing product placement and store layout.
For additional information, please refer to:
IBM Data Science Experience Local (link)
Hortonworks Documentation (link) *The Data Science portlet is at the far right of the second row
Hortonworks Data Platform 2.6.3 / Ambari 2.6 (link)
06-28-2017
11:56 PM
5 Kudos
IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6 / Ambari 2.5 on IBM Power Systems. IBM and Hortonworks collaborated on an optimized and integrated solution that was validated against a comprehensive suite of integration test cases across the full stack of HDP components and Ambari. Testing covered secure and non-secure scenarios with Accumulo, Atlas, Falcon, Flume, HBase, HDFS, Hive, HiveServer2, Kafka, Knox, Mahout, MapReduce, Oozie, Phoenix, Pig, Spark, Sqoop, Storm, Tez, YARN, Zeppelin, and ZooKeeper.
This certification is for Spectrum Scale software and hence applies to all deployment models of Spectrum Scale, including Elastic Storage Server (ESS). Further, this certification includes a paper certification for the use of Hortonworks Data Flow (HDF) V3.0 with IBM Spectrum Scale.
IBM’s Power platform is already certified to run HDP and offers 3x price performance compared with x86. IBM ESS (a pre-integrated system powered by IBM Spectrum Scale) includes a software RAID function that eliminates the need for the three-way replication for data protection that is required with other solutions. Instead, IBM ESS requires just 30% extra capacity to offer similar data protection benefits. IBM Power Systems along with IBM ESS offer the most optimized hardware stack for running analytics workloads. Clients can enjoy up to a 3x reduction of storage and compute infrastructure on Power Systems and IBM ESS compared to commodity scale-out x86 systems.
IBM Spectrum Scale is scheduled to be certified with HDP running on x86 systems by the end of July.
Additional references:
https://hortonworks.com/blog/hdp-ibm-spectrum-scale-brings-enterprise-class-storage-place-analytics/
https://developer.ibm.com/storage/2017/06/16/top-five-benefits-ibm-spectrum-scale-hortonworks-data-platform/
https://www-03.ibm.com/press/us/en/pressrelease/51562.wss
Labels:
- Apache Accumulo
- Apache Ambari
- Apache Atlas
- Apache Falcon
- Apache Flume
- Apache HBase
- Apache Hive
- Apache Kafka
- Apache Knox
- Apache Oozie
- Apache Phoenix
- Apache Pig
- Apache Spark
- Apache Sqoop
- Apache Storm
- Apache Tez
- Apache YARN
- Apache Zeppelin
- Apache Zookeeper
- Certification
- HDFS
- Hortonworks Data Platform (HDP)
01-25-2017
06:00 PM
2 Kudos
This is the first in a series of short articles for beginners on using Apache Spark with Hortonworks HDP. If you’re reading this, you don’t need me to define what Spark is; there are numerous references on the web that cover its API and its data-structure-centric design. In my opinion, it is one of the most important open source projects. The intent of this article is to let you know what helped me get started using Spark on the Hortonworks Data Platform. This is not a tutorial.

I’m assuming you have access to an HDP 2.5.3 cluster or to the Hortonworks Sandbox for HDP 2.5.3 or above. I’m also going to assume that you are familiar with SQL, Apache Hive, and the Linux/Unix Bourne shell.

The problem that pushed me to use Spark was that using Hive for data processing was limiting. Things I was used to in my past life as an Oracle DBA were not available. Hive is a fantastic product, but Hive SQL didn’t give me all the bells and whistles I needed to do my job, and complex Hive SQL statements can be time consuming. In my case, I needed to summarize a lot of time series data and then store that summarized data in a Hive table so others could query it using Apache Zeppelin.

For the sake of this article, I’ll keep the table layout simple:

txn_detail
txn_date string, txn_action string, txn_value double

The example below illustrates using the Spark command line to summarize data from a Hive table and store the results in a new Hive table. There are a couple of ways to run Spark commands, but I prefer using a command line. The command-line tool, or more precisely the Spark REPL, is called spark-shell. See http://spark.apache.org/docs/latest/quick-start.html. Another good option is Apache Zeppelin, but we will use spark-shell.

Starting spark-shell is very easy. From the Linux shell prompt, type:

$ spark-shell

The standard spark-shell is verbose; you can turn this off by lowering the log level. Search the web for the details.
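One way to quiet the shell, as a minimal sketch: once you are at the scala> prompt, the SparkContext that spark-shell exposes as sc can lower the log level for the current session. This assumes Spark 1.4 or later, where SparkContext.setLogLevel is available; it does not change any configuration file.

```scala
// Paste at the scala> prompt inside spark-shell, where `sc` is already defined.
// Assumption: Spark 1.4 or later, which provides SparkContext.setLogLevel.
sc.setLogLevel("WARN") // show only warnings and errors for this session
```

The setting lasts only until you exit the shell, so it has to be repeated in each new session.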
Executing spark-shell will bring you to the scala> prompt. From there, the first thing we’ll do is create a dataframe with all the contents of the txn_detail table. But before executing any SQL we need to define a SQL context object. Because txn_detail is a Hive table, use a HiveContext; a plain SQLContext cannot see tables in the Hive metastore.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

Next, the command below executes a SQL statement that queries all rows from the txn_detail table and puts the result set into a Spark dataframe called dataframe_A.

scala> val dataframe_A = sqlContext.sql("""select txn_date, txn_action, txn_value from txn_detail""")
Now that we have data in a dataframe, we can summarize it by grouping on either txn_action or txn_date.

Summarize on txn_date

scala> dataframe_A.groupBy($"txn_date").agg(sum("txn_value").alias("txn_value")).show()
+----------+---------+
|  txn_date|txn_value|
+----------+---------+
|2015-12-27|     22.0|
|2015-12-28|     74.0|
|2015-11-20|     59.0|
|2015-12-29|     44.0|
|2015-11-21|     98.0|
|2015-11-22|     52.0|
|2015-11-23|     35.0|
|2015-11-24|     31.0|
|2015-11-25|     62.0|
|2015-11-26|     74.0|
|2015-11-27|     14.0|
|2015-09-21|     25.0|
|2015-10-20|     17.0|
|2015-09-22|     14.0|
|2015-11-29|     14.0|
|2015-10-21|     21.0|
|2015-09-23|     54.0|
|2016-12-01|     42.0|
|2015-10-22|     52.0|
|2015-09-24|     73.0|
+----------+---------+
only showing top 20 rows
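To make explicit what groupBy followed by agg(sum(...)) computes, here is a minimal sketch in plain Scala collections, with no Spark involved. The Txn case class, the sumBy helper, and the sample rows are all hypothetical; the values are made up and are not taken from the txn_detail table. It can be pasted into any Scala REPL, including spark-shell.

```scala
// Hypothetical row type mirroring the txn_detail layout used in this article.
case class Txn(txnDate: String, txnAction: String, txnValue: Double)

// Made-up sample rows, for illustration only.
val rows = Seq(
  Txn("2015-11-20", "Open", 10.0),
  Txn("2015-11-20", "Close", 5.0),
  Txn("2015-11-21", "Open", 7.5)
)

// Group rows by a key and sum txnValue within each group, which is
// what groupBy(...).agg(sum("txn_value")) does on a dataframe.
def sumBy(data: Seq[Txn])(key: Txn => String): Map[String, Double] =
  data.groupBy(key).map { case (k, group) => k -> group.map(_.txnValue).sum }

val byDate = sumBy(rows)(_.txnDate)     // totals per txn_date
val byAction = sumBy(rows)(_.txnAction) // totals per txn_action
```

As with the dataframe version, the result has one entry per distinct key, and the order of the groups is not guaranteed.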
Summarize on txn_action

scala> dataframe_A.groupBy($"txn_action").agg(sum("txn_value").alias("txn_value")).show()
+----------+---------+
|txn_action|txn_value|
+----------+---------+
|      Open|     11.0|
|     Close|     99.0|
+----------+---------+

Let’s store the summarized results for txn_date in a separate dataframe and then save those results to a Hive table.

Save the result set into a new dataframe

scala> val dataframe_B = dataframe_A.groupBy($"txn_date").agg(sum("txn_value").alias("txn_value"))

Create a temporary table. This will allow us to query it like any other Hive table.

scala> dataframe_B.registerTempTable("txn_date_temp")

Create a Hive table and save the data

scala> sqlContext.sql("""create table hive_txn_data as select * from txn_date_temp""")

Now that you have the data summarized in the hive_txn_data Hive table, users can query it using Apache Zeppelin or any other tool.

Summary

There are numerous ways to perform this type of work, but Spark is a very efficient way to summarize data and run calculations. In coming articles, I’ll discuss other Spark functions.
For additional Hortonworks tutorials check out: http://hortonworks.com/tutorials/
12-07-2016
03:44 PM
What about registering a temp table and then creating a static table to hold onto the results? Drop and recreate it as needed.
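The drop-and-recreate idea could be sketched as follows. The helper refreshTable and its parameter names are hypothetical, and the table names are only examples. runSql stands in for whatever executes a statement; in spark-shell with a Hive-enabled context that could be (s: String) => { sqlContext.sql(s); () }. The source is assumed to be a temp table already registered with registerTempTable.

```scala
// Hypothetical helper: rebuild a static Hive table from a registered temp table.
// `runSql` executes one SQL statement; the caller decides how.
def refreshTable(target: String, source: String)(runSql: String => Unit): Unit = {
  // Remove the previous copy, if any, so the create below cannot collide with it.
  runSql(s"drop table if exists $target")
  // Materialize the current contents of the temp table as a static table.
  runSql(s"create table $target as select * from $source")
}

// Dry run: collect the statements instead of executing them.
val issued = scala.collection.mutable.ArrayBuffer.empty[String]
refreshTable("hive_txn_data", "txn_date_temp") { stmt => issued += stmt; () }
```

Passing the executor as a function keeps the helper independent of any particular context object, and the dry run shows exactly which statements would be issued before they touch a real table.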
11-30-2016
04:47 PM
1 Kudo
Over the last year, Oracle has continued to update and add support for Hortonworks HDP. Below is a list of products that support Hortonworks HDP 2.5.0.0:
Big Data SQL
https://www.oracle.com/database/big-data-sql/index.html
Big Data Connectors
https://www.oracle.com/database/big-data-connectors/certifications.html
Includes:
- Oracle SQL Connector for HDFS
- Oracle Loader for Hadoop
- Oracle Data Integrator
- Oracle XQuery for Hadoop
- Oracle R Advanced Analytics for Hadoop
- Oracle Datasource for Hadoop
Spatial and Graph
https://www.oracle.com/database/spatial/index.html
GoldenGate for Big Data
https://www.oracle.com/middleware/data-integration/goldengate/big-data/index.html
Oracle Data Integrator Enterprise Edition
https://www.oracle.com/middleware/data-integration/enterprise-edition-big-data/index.html
Big Data Discovery
https://www.oracle.com/big-data/big-data-discovery/index.html