Member since: 07-07-2016
Posts: 79
Kudos Received: 17
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1352 | 08-01-2017 12:00 PM |
| | 3316 | 08-01-2017 08:28 AM |
| | 1627 | 07-28-2017 01:43 PM |
| | 1738 | 06-15-2017 11:56 AM |
| | 1975 | 06-01-2017 09:28 AM |
05-16-2017
04:39 PM
@ken jiang The version of Linux in the VirtualBox sandbox is 64-bit CentOS, per the sandbox guide: https://hortonworks.com/hadoop-tutorial/hortonworks-sandbox-guide/#system-information HDP 2.6 supports the 64-bit Linux operating systems listed in the support matrix, including Red Hat: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_support-matrices/content/ch_matrices-hdp.html These you can install on premises, or via Azure, AWS, GCP, etc. The Sandbox is designed to be more prescriptive, hence the fewer options in terms of OS choice. Regards,
05-16-2017
02:38 PM
@Shafi Ahmad Atlas depends on HBase (storage), Kafka (tag sync), and Ambari Infra (Solr, for graph index search; this depends on the HDP version) already running before it starts. Starting Atlas in Ambari does not start Kafka, HBase, or Ambari Infra for you.
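A quick way to confirm those dependencies are up is the Ambari REST API. A minimal Python sketch, assuming an Ambari host, cluster name, and credentials of your own (service names per HDP 2.x, where Ambari Infra is registered as AMBARI_INFRA):

```python
# Sketch: verify Atlas's dependencies are running via the Ambari REST API
# before starting Atlas. Host, cluster name, and credentials below are
# placeholders -- substitute your own.
import requests

AMBARI = "http://ambari-host:8080/api/v1"
CLUSTER = "mycluster"
AUTH = ("admin", "admin")

for service in ("HBASE", "KAFKA", "AMBARI_INFRA"):
    r = requests.get(
        f"{AMBARI}/clusters/{CLUSTER}/services/{service}",
        params={"fields": "ServiceInfo/state"},
        auth=AUTH,
    )
    r.raise_for_status()
    state = r.json()["ServiceInfo"]["state"]
    print(f"{service}: {state}")  # expect STARTED before starting Atlas
```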
04-24-2017
12:35 PM
@Smart Data You are correct, Falcon has been deprecated in 2.6. Please keep in mind that it is still included in 2.6, so it will have 2+1 years of support from the GA date (GA was April 2017). https://hortonworks.com/agreements/support-services-policy/ http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_release-notes/content/deprecated_items.html Falcon will be replaced by another solution, to be announced. Apache Atlas continues to be a key project, with active development by Hortonworks. There is no impact on Atlas from the Falcon announcement.
04-20-2017
03:44 PM
Please read the article below. Titan has been forked as JanusGraph, which is supported by a number of organisations, including Hortonworks: https://opensource.googleblog.com/2017/01/janusgraph-connects-past-and-future-of-titan.html?m=1
03-29-2017
02:51 PM
1 Kudo
@sushil nagur Some options:

- Hive can be used and is a common pattern: land the data in HDFS, then use HiveQL to cleanse and transform it into a Hive table (e.g. ORC format). HBase can also be a target (or indeed Solr).
- SparkSQL is often used to ingest data. Again, land the data in HDFS, then use SparkSQL to process it and add it to Hive/HBase tables (see the sketch after this list).
- HDF (NiFi) is more of a lightweight ETL tool or simple event processor, but it can perform a number of transforms (it also includes an expression builder/language and many out-of-the-box processors for different sources/targets).
- Pig can be used to build data pipelines. Sqoop can be used to extract data, but only performs basic transforms.
- Hortonworks has an ecosystem of partners with ETL solutions (e.g. Syncsort, etc.).
- Storm and Spark Streaming are options for streaming operations, often with Kafka as a buffer.

In terms of commercial ETL vs. open source, it comes down to many points: requirements, budget, time, skills, strategy, etc. The commercial ETL tools are mature, and some have sophisticated functionality, transformations, and connectivity. Hortonworks partners with commercial ETL vendors when the scenario fits; in other scenarios, native HDP tooling (as listed above) is sufficient. HTH, Graham
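A minimal PySpark sketch of the land-then-transform pattern above, assuming Spark 2.x on HDP; the HDFS path, column names, and table name are illustrative only:

```python
# Sketch: read raw data landed in HDFS (e.g. by NiFi or Sqoop), cleanse it,
# and persist it as an ORC-backed Hive table for downstream HiveQL.
# Path, columns, and table name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("ingest-cleanse")
         .enableHiveSupport()
         .getOrCreate())

# Raw CSV files previously landed in HDFS
raw = spark.read.csv("hdfs:///landing/sales/", header=True, inferSchema=True)

# Basic cleansing: drop incomplete rows, normalise a column
clean = (raw.dropna(subset=["order_id"])
            .withColumn("country", F.upper(F.col("country"))))

# Write as an ORC Hive table
clean.write.format("orc").mode("overwrite").saveAsTable("sales_clean")
```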
03-22-2017
03:12 PM
1 Kudo
@sbx hadoop You can access the audit logs through the Ranger UI. The logs are stored in Solr (Ambari Infra), so you are able to apply filter and search conditions to the logs. The search conditions include Result (Denied/Allowed), Date, User, IP, Component, Access Type, Tag, etc. (screenshot: screen-shot-2017-03-22-at-150306.png)
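You can also query the Solr collection directly. A hedged Python sketch, assuming the default Ambari Infra Solr port (8886) and collection name (ranger_audits); the host and the exact audit field names may differ in your deployment:

```python
# Sketch: filter Ranger audit events straight from the Ambari Infra Solr
# collection. Host is a placeholder; field names (result, evtTime, reqUser,
# resource, access) follow the common ranger_audits schema but should be
# verified against your Solr instance.
import requests

SOLR = "http://infra-solr-host:8886/solr/ranger_audits/select"

params = {
    "q": "result:0",       # 0 = denied, 1 = allowed
    "rows": 20,
    "sort": "evtTime desc",
    "wt": "json",
}
docs = requests.get(SOLR, params=params).json()["response"]["docs"]
for d in docs:
    print(d.get("evtTime"), d.get("reqUser"), d.get("resource"), d.get("access"))
```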
03-17-2017
01:56 PM
1 Kudo
With the latest version of HDP, 2.5.3, you can install Spark 1.6.2 (GA and stable) as well as Spark 2.0.1 (Tech Preview). http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_release-notes/content/ch_relnotes_v253.html Installing via Ambari will create two separate Spark environments on the cluster. You can access the different Spark versions as detailed in: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-choose-version.html Regards
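The selector described in that guide is the SPARK_MAJOR_VERSION environment variable. A small Python sketch of launching a job against the Spark 2 client (the script name and master setting are placeholders):

```python
# Sketch: pick the Spark 2 client on an HDP node where both versions are
# installed, by setting SPARK_MAJOR_VERSION for the spark-submit process.
# "my_app.py" and the master URL are placeholders.
import os
import subprocess

env = dict(os.environ, SPARK_MAJOR_VERSION="2")
subprocess.check_call(
    ["spark-submit", "--master", "yarn", "my_app.py"],
    env=env,
)
```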
03-06-2017
09:01 AM
@voca voca An example is the tutorial below: https://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/ A bit more adventurous would be to ingest Twitter data using NiFi, visualise it via Solr/Banana, and then do some query processing using Hive: https://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/ Full list of tutorials: https://hortonworks.com/tutorials/
03-03-2017
10:11 AM
Please review Cloudbreak's capabilities, as it is designed to rapidly provision clusters, whether on premises or on any cloud provider. http://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-1.6.3/index.html
03-02-2017
09:17 PM
1 Kudo
If this is the HDP Sandbox, the root password is hadoop. But if you installed HDP via IaaS on Azure, the root password is whatever you defined in Azure (if you chose the password, non-SSH option); that also assumes you defined the 'root' account as the password user. The other option in Azure is SSH, in which case you will need to download the pem file to access the host.
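For the SSH route, a minimal Python sketch using paramiko with the downloaded pem file; the hostname, username, and key path are placeholders for your own values:

```python
# Sketch: connect to an Azure-provisioned HDP host with the downloaded
# .pem key using paramiko (pip install paramiko). Hostname, user, and
# key path are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    hostname="my-hdp-node.cloudapp.azure.com",
    username="azureuser",
    key_filename="/path/to/key.pem",
)
stdin, stdout, stderr = client.exec_command("whoami")
print(stdout.read().decode())
client.close()
```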