Member since 09-22-2015

Posts: 24 | Kudos Received: 24 | Solutions: 2
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1692 | 07-26-2016 02:23 PM |
|  | 6469 | 07-11-2016 12:46 PM |

06-07-2018 03:40 PM (1 Kudo)
We are pleased to announce the immediate availability of Hortonworks Data Flow (HDF) version 3.1.2 for both x86 and IBM Power Systems. This version is a maintenance release and includes critical bug fixes for NiFi, MiNiFi, Ranger, Streams Analytics Manager (SAM), Schema Registry, and more.

HDF 3.1.2 includes the following components:

- Apache Ambari 2.6.2
- Apache Kafka 1.0.0
- Apache NiFi 1.5.0
- NiFi Registry 0.1.0
- Apache Ranger 0.7.0
- Apache Storm 1.1.1
- Apache ZooKeeper 3.4.6
- Apache MiNiFi Java Agent 0.4.0
- Apache MiNiFi C++ 0.4.0
- Hortonworks Schema Registry 0.5.0
- Hortonworks Streaming Analytics Manager 0.6.0

The release and documentation are available at:

- Hortonworks Data Flow v3.1.2 Download: (link)
- Hortonworks Data Flow v3.1.2 Documentation: (link)

03-01-2018 12:15 PM
Hortonworks Data Flow v3.1.1 Documentation: (link)

Thank you to the Hortonworks Data Flow Development, Product Management, Quality Engineering, Partner Certification, Documentation, and Release Engineering teams.

12-21-2017 08:35 PM
We are pleased to announce the availability of Hortonworks Data Flow (HDF) v3.0.3 for IBM Power Systems on RHEL 7.2. This is an important release: it is the second Hortonworks product ported to IBM Power Systems, specifically POWER8 processors, in 2017. This release is a win-win for both IBM and Hortonworks customers, as HDF v3.0.3 is the next generation of our open source data-in-motion platform and enables customers to collect, curate, analyze, and act on all data in real time, across the enterprise. Combined with the Hortonworks Data Platform (HDP) already available for IBM Power Systems, it improves the customer experience by simplifying how streaming analytics applications are created and deployed to deliver real-time analytics, while benefiting from the flexibility, cost of operation, and performance of IBM POWER8 processors.

For additional information, please refer to:

- Hortonworks Data Flow v3.0.3 Documentation: (link)
- Hortonworks Data Platform v2.6.3 / Ambari v2.6 Documentation: (link)

11-11-2017 11:38 AM (1 Kudo)
We are pleased to announce the availability of Hortonworks Data Flow (HDF) version 3.0.2. This maintenance release is the final version of our flagship 3.0 product line and includes critical patches and bug fixes. An increasing number of Hortonworks customers are using HDF to meet their needs for enterprise flow and streaming use cases, and the team is very pleased to deliver a high-quality, well-tested release before moving on to the next big release. The next release of HDF, version 3.1, is currently in development.

For additional information, please refer to:

- Hortonworks Data Flow (bits) (docs)

11-11-2017 11:36 AM (2 Kudos)
We are pleased to announce the certification of IBM Data Science Experience (DSX) Local V1.1.2.01 with Hortonworks Data Platform (HDP) 2.6.3 / Ambari 2.6 on RHEL 7. As an important part of the agreement between the two companies, IBM and Hortonworks collaborated on an extensive set of test cases to specifically validate IBM DSX with both Hortonworks Data Platform and Ambari. This version of IBM DSX integrates with Zeppelin 0.7.3 and allows users to configure the Livy interpreter to run workloads on HDP clusters (both secure and non-secure). Users also have the option to launch their DSX jobs on either Spark 1 or Spark 2.

This certification is a win-win for both DSX and HDP customers: it brings a production-ready data science experience to HDP customers and, at the same time, gives DSX customers access to information stored in HDP data lakes with an enterprise-grade compute grid. An increasing number of Hortonworks customers are using data science to get greater value from their data, supporting use cases ranging from churn prediction and predictive maintenance to optimizing product placement and store layout.

For additional information, please refer to:

- IBM Data Science Experience Local: (link)
- Hortonworks Documentation: (link) (the Data Science portlet is at the far right of the second row)
- Hortonworks Data Platform 2.6.3 / Ambari 2.6: (link)

06-28-2017 11:56 PM (5 Kudos)
IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6 / Ambari 2.5 on IBM Power Systems. IBM and Hortonworks collaborated on an optimized, integrated solution that was validated against a comprehensive suite of integration test cases across the full stack of HDP components and Ambari. Testing covered secure and non-secure scenarios with Accumulo, Atlas, Falcon, Flume, HBase, HDFS, Hive, HiveServer2, Kafka, Knox, Mahout, MapReduce, Oozie, Phoenix, Pig, Spark, Sqoop, Storm, Tez, YARN, Zeppelin, and ZooKeeper. The certification is for Spectrum Scale software and hence applies to all deployment models of Spectrum Scale, including the Elastic Storage Server (ESS). It also includes a paper certification for Hortonworks Data Flow (HDF) v3.0 use with IBM Spectrum Scale. IBM's Power platform is already certified to run HDP and offers 3x price performance compared with x86.

IBM ESS (a pre-integrated system powered by IBM Spectrum Scale) includes a software RAID function that eliminates the need for the three-way replication other solutions require for data protection; instead, IBM ESS requires just 30% extra capacity to offer similar data protection benefits. IBM Power Systems together with IBM ESS offer a highly optimized hardware stack for running analytics workloads: clients can see up to a 3x reduction in storage and compute infrastructure compared to commodity scale-out x86 systems. IBM Spectrum Scale is scheduled to be certified with HDP running on x86 systems by the end of July.

Additional references:

- https://hortonworks.com/blog/hdp-ibm-spectrum-scale-brings-enterprise-class-storage-place-analytics/
- https://developer.ibm.com/storage/2017/06/16/top-five-benefits-ibm-spectrum-scale-hortonworks-data-platform/
- https://www-03.ibm.com/press/us/en/pressrelease/51562.wss
Labels: Apache Accumulo, Apache Ambari, Apache Atlas, Apache Falcon, Apache Flume, Apache HBase, Apache Hive, Apache Kafka, Apache Knox, Apache Oozie, Apache Phoenix, Apache Pig, Apache Spark, Apache Sqoop, Apache Storm, Apache Tez, Apache YARN, Apache Zeppelin, Apache Zookeeper, Certification, HDFS, Hortonworks Data Platform (HDP)

01-25-2017 06:00 PM (2 Kudos)
This is the first in a series of short articles on using Apache Spark with Hortonworks HDP, aimed at beginners. If you're reading this, you don't need me to define what Spark is; there are numerous references on the web that cover its API and its data-structure-centric design, and in my opinion it is one of the most important open source projects. The intent of this article is to share what helped me get started using Spark on the Hortonworks Data Platform. This is not a tutorial. I'm assuming you have access to an HDP 2.5.3 cluster or to the Hortonworks Sandbox for HDP 2.5.3 or above, and that you are familiar with SQL, Apache Hive, and the Linux/Unix Bourne shell.

The problem that pushed me to Spark was that Hive alone was limiting for my data processing. There were things I was used to from my past life as an Oracle DBA that were not available. Hive is a fantastic product, but Hive SQL didn't give me all the bells and whistles to do my job, and complex Hive SQL statements can be time consuming.

In my case, I needed to summarize a lot of time series data and then store the summarized data in a Hive table so others could query it using Apache Zeppelin. For the sake of this article, I'll keep the table layout simple (Hive has no "number" type, so txn_value is a double):

txn_detail
  txn_date    string
  txn_action  string
  txn_value   double

The example below illustrates using the Spark command line to summarize data from a Hive table and save the results back to Hive. There are a couple of ways to run Spark commands, but I prefer the command line. The command-line tool, or more precisely the Spark REPL, is called spark-shell (see http://spark.apache.org/docs/latest/quick-start.html). Apache Zeppelin is another good option, but we will use spark-shell here.

Starting spark-shell is easy; from the Linux shell prompt, type:

$ spark-shell

The standard spark-shell is verbose, which you can turn off (Google for how to do this). Executing spark-shell brings you to the scala> prompt.

From the scala> prompt, the first thing we'll do is create a DataFrame with all the contents of the txn_detail table. Before executing any SQL we need a SQL context object; to query Hive tables it must be a HiveContext (on HDP, spark-shell actually pre-creates one for you named sqlContext):

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

Next, the command below executes a SQL statement to query all rows from the txn_detail table and puts the result set into a Spark DataFrame called dataframe_A:

scala> val dataframe_A = sqlContext.sql("""select txn_date, txn_action, txn_value from txn_detail""")
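As an aside, if you want to follow along but don't have a txn_detail table yet, here is a minimal sketch for creating one from spark-shell. The table and column names match the article, but the rows are made-up sample data, and this assumes the Spark 1.6-era API that ships with HDP 2.5.3:

scala> import sqlContext.implicits._
scala> // Hypothetical sample rows, just enough for the grouping examples below.
scala> val sample = Seq(
     |   ("2015-12-27", "Open",  22.0),
     |   ("2015-12-28", "Close", 74.0),
     |   ("2015-11-20", "Open",  59.0)
     | ).toDF("txn_date", "txn_action", "txn_value")
scala> sample.write.saveAsTable("txn_detail")  // writes a managed Hive table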
Now that we have data in a DataFrame, we can summarize it by grouping on either txn_date or txn_action. Two imports are needed first, for the $"col" syntax and the sum() function:

scala> import sqlContext.implicits._
scala> import org.apache.spark.sql.functions._

Summarize on txn_date:

scala> dataframe_A.groupBy($"txn_date").agg(sum("txn_value").alias("txn_value")).show()

+----------+---------+
|  txn_date|txn_value|
+----------+---------+
|2015-12-27|     22.0|
|2015-12-28|     74.0|
|2015-11-20|     59.0|
|2015-12-29|     44.0|
|2015-11-21|     98.0|
|2015-11-22|     52.0|
|2015-11-23|     35.0|
|2015-11-24|     31.0|
|2015-11-25|     62.0|
|2015-11-26|     74.0|
|2015-11-27|     14.0|
|2015-09-21|     25.0|
|2015-10-20|     17.0|
|2015-09-22|     14.0|
|2015-11-29|     14.0|
|2015-10-21|     21.0|
|2015-09-23|     54.0|
|2016-12-01|     42.0|
|2015-10-22|     52.0|
|2015-09-24|     73.0|
+----------+---------+
only showing top 20 rows
Summarize on txn_action:

scala> dataframe_A.groupBy($"txn_action").agg(sum("txn_value").alias("txn_value")).show()

+----------+---------+
|txn_action|txn_value|
+----------+---------+
|      Open|     11.0|
|     Close|     99.0|
+----------+---------+

Let's store the summarized results for txn_date in a separate DataFrame and then save those results to a Hive table.

Save the result set into a new DataFrame:

scala> val dataframe_B = dataframe_A.groupBy($"txn_date").agg(sum("txn_value").alias("txn_value"))

Register a temporary table. This lets us query the DataFrame like any other Hive table:

scala> dataframe_B.registerTempTable("txn_date_temp")

Create a Hive table and save the data (note the name in the SELECT must match the temporary table we just registered):

scala> sqlContext.sql("""create table hive_txn_data as select * from txn_date_temp""")

Now that the data is summarized in the hive_txn_data Hive table, users can query it using Apache Zeppelin or any other tool.
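As an aside, registering a temporary table and running CREATE TABLE AS SELECT is not the only route; with the Spark 1.6-era API you can also write a DataFrame to Hive directly through its DataFrameWriter. A minimal sketch, assuming the same dataframe_B as above (mode("overwrite") drops and recreates the table on each run):

scala> dataframe_B.write.mode("overwrite").saveAsTable("hive_txn_data")  // persist straight to Hive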
Summary: there are numerous ways to perform this type of work, but Spark is a very efficient way to summarize data and execute calculations. In coming articles, I'll discuss other features of Spark.

For additional Hortonworks tutorials, check out: http://hortonworks.com/tutorials/

12-07-2016 03:44 PM
What about registering a temp table and then creating a static table to hold the results? You can drop and recreate it as needed; a sketch of what I mean is below.
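A minimal sketch of that pattern, assuming a Spark 1.6-era HiveContext named sqlContext, a DataFrame df holding your results, and a hypothetical table name txn_summary:

scala> sqlContext.sql("drop table if exists txn_summary")      // throw away the old copy
scala> df.registerTempTable("txn_summary_temp")                // expose the DataFrame to SQL
scala> sqlContext.sql("create table txn_summary as select * from txn_summary_temp")  // persist a static Hive table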
						
					

11-30-2016 04:47 PM (1 Kudo)
Over the last year, Oracle has continued to update and add support for Hortonworks HDP. Below is a list of Oracle products that support Hortonworks HDP 2.5.0.0:

- Big Data SQL: https://www.oracle.com/database/big-data-sql/index.html
- Big Data Connectors: https://www.oracle.com/database/big-data-connectors/certifications.html
  Includes:
  - Oracle SQL Connector for HDFS
  - Oracle Loader for Hadoop
  - Oracle Data Integrator
  - Oracle XQuery for Hadoop
  - Oracle R Advanced Analytics for Hadoop
  - Oracle Datasource for Hadoop
- Spatial and Graph: https://www.oracle.com/database/spatial/index.html
- GoldenGate for Big Data: https://www.oracle.com/middleware/data-integration/goldengate/big-data/index.html
- Oracle Data Integrator Enterprise Edition: https://www.oracle.com/middleware/data-integration/enterprise-edition-big-data/index.html
- Big Data Discovery: https://www.oracle.com/big-data/big-data-discovery/index.html