Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
09-18-2016
08:33 PM
2 Kudos
Could you restart the VM image and try again? After it restarts, verify Ambari is running by executing (as root) the command ambari-server status. If it is running, try to log into http://127.0.0.1:8080/ as maria_dev / maria_dev and see what happens.
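For reference, a minimal check from inside the sandbox might look like the sketch below (assuming you can SSH in as root; the start step is only needed if the status check reports the server as down):

    # check whether the Ambari server is up
    ambari-server status

    # if it reports that the server is not running, start it
    ambari-server start

    # then retry the UI at http://127.0.0.1:8080/ (login: maria_dev / maria_dev)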
09-18-2016
07:55 PM
One more thing ... a new version of the sandbox was released on Friday. (It is 2.5 GA, not the previous 2.5 TP.) If you downloaded your sandbox before Friday, please download the new one and try again. If you already have the recent GA version, please let me know.
09-18-2016
01:27 PM
@Ajit Bajwa Could you identify the browser and version you are using?
09-18-2016
01:27 PM
@Armel Djangone Could you identify the browser and version you are using?
09-18-2016
03:19 AM
@Ajit Bajwa Please confirm that your issue is identical to https://community.hortonworks.com/storage/attachments/7739-ambari-issue-1.jpg which is described in greater detail in https://community.hortonworks.com/questions/56938/sandbox-hello-udp-lab-test-2-issue.html#comment-57056 Which hosting environment is your sandbox in (VirtualBox or Azure)? Please share any other details as well.
09-17-2016
10:02 PM
3 Kudos
So let's define a Big Data scenario. Typically this is defined in terms of the 3 Vs: it is a Big Data scenario when one or more of the following is true:
Volume: data exists in such large volumes (typically TB or PB) that a traditional relational database cannot store it physically or at a reasonable cost.
Variety: in addition to structured data, data is also semi-structured (e.g. tweets) or unstructured (e.g. video).
Velocity: data arrives at extremely high rates, typically as streams.
If none of these is true, we are in the world of traditional data -- and your question. Hadoop still has advantages over SQL Server or ODI in this case, and often will coexist with them. Advantages of Hadoop are:
Easy data ingest: Hadoop does not need the data structure or schema to be known at ingest time. You dump the data into the lake and structure it when you need to process it. You can structure the same data differently at different times, according to your needs. This is called schema-on-read (see the sketch at the end of this answer). Traditional relational databases are schema-on-write: you have to define the schema when you write to them, and you are stuck with that schema unless you transform the data into something else. These design needs and commitments make acquiring data slow and reusing it inflexible.
Batch processing: Hadoop processes data in parallel (MapReduce or Spark) and excels at batch processing quickly and cheaply.
Cheap storage: storing data on Hadoop is much cheaper than storing it in a relational database.
Note that the above leads to a common EDW offloading use case. In a typical Enterprise Data Warehouse, 70% of the data is stored in temporary staging tables, where it sits waiting to be ETLed into the tables that are actually queried. It is much cheaper to store this staging data in Hadoop. Additionally, the ETL process typically uses 50-60% of the database CPU; this background processing slows the queries end users run for reports, Business Intelligence, etc. Organizations that offload the staged data to Hadoop, and the ETL to Hadoop batch processing, save literally millions of dollars per year by avoiding expensive EDW storage, and their queries on the EDW become significantly faster. Other advantages of Hadoop in a non Big Data scenario are the following:
Central data store: storing data from various sources on the same platform provides new opportunities to analyze it and deliver business value. For example, it is possible to know more about a customer (i.e. achieve a Customer 360 view) and therefore cross-sell, upsell, market, and recommend in ways that are not otherwise possible or easy.
Great toolset: Hadoop has excellent tools like Hive, Spark, Zeppelin, HBase, and Phoenix for working with data. These tools all come out of the box with the Hortonworks HDP (Hadoop distribution) and are easily installed, managed, and monitored through Ambari, which is also part of the distribution.
And another advantage of Hadoop in a non Big Data scenario is that you will most likely move into a Big Data scenario and need Hadoop anyway. You will either be forced to move to Big Data because of one or more of the 3 Vs above, or because you want to achieve new capabilities (like Customer 360) that Hadoop enables, often because your competitors are already doing this and you are falling behind. These, I believe, cover the main advantages of using Hadoop even in a non Big Data scenario. I am sure others have some more points ... let's hear them!
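To make schema-on-read concrete, here is a minimal sketch, assuming a sandbox with HDFS and Hive available, and a hypothetical CSV file sales.csv (all names and paths are illustrative, not from the question):

    # land the raw file in HDFS -- no schema is declared at ingest time
    hdfs dfs -mkdir -p /data/raw/sales
    hdfs dfs -put sales.csv /data/raw/sales/

    # impose a schema only when reading: an external Hive table over the raw files
    hive -e "CREATE EXTERNAL TABLE sales_typed (id INT, amount DOUBLE, ts STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             LOCATION '/data/raw/sales';"

    # the same files can be projected differently later, without rewriting the data
    hive -e "CREATE EXTERNAL TABLE sales_raw (line STRING)
             LOCATION '/data/raw/sales';"

Dropping either external table leaves the underlying files untouched, which is exactly the flexibility described above.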
09-17-2016
01:21 PM
3 Kudos
@Fabian Schreiber This is a standard DMZ network architecture, where a subset of hosts (Knox gateway, edge node) forms a communication layer between the external network and the rest of the hosts on the internal network. Hosts in the DMZ can be seen as being in both the internal and external networks. Their purpose is to isolate the rest of the hosts (the Hadoop clusters) from any direct communication with the external network.
In the above example, the first firewall forces all internet communication to talk only to the Knox gateway. Communication that passes the security challenges at the gateway (IP, ports, Kerberos/LDAP authentication, other) is routed to the cluster. Theoretically, the first firewall should be sufficient to secure the cluster. This firewall, however, is exposed to the entire global internet and all of the hackers and evolving hacking techniques out there. As such, there is still a risk of attacks from the internet directly into the cluster and its data, mission-critical operations, etc. The second firewall further isolates the cluster by forcing it to accept communication only from the gateway, which is a known host on the internal network. The overall result is that any malicious attacks cannot penetrate into the cluster: compromises are contained in the DMZ hosts.
The DMZ concept is based on demilitarized zones in the military, where a zone is built to hold buildings etc. that are used by parties inside and outside the military, but only the military in the DMZ can communicate with the militarized zone (the internal network). For details on HDP Knox Gateway security settings: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html
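To make the "everything goes through the gateway" point concrete, a WebHDFS call routed through Knox might look like the sketch below (the hostname, the "default" topology, and the guest credentials are illustrative assumptions, not settings from the question):

    # from the external network, the only reachable endpoint is Knox (typically port 8443)
    curl -iku guest:guest-password \
      'https://knox-gateway.example.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS'

    # Knox authenticates the caller (e.g. against LDAP) and proxies the request to the
    # NameNode inside the cluster; the firewalls allow no direct route to the NameNode

The client never learns the cluster's internal hostnames or ports, which is what keeps an attacker's reach confined to the DMZ.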
09-15-2016
03:54 PM
3 Kudos
From a relational database to HDFS or Hive, Sqoop is your best tool: http://hortonworks.com/apache/sqoop/ You can schedule it through Oozie: http://hortonworks.com/apache/oozie/
For diverse sources like logs, emails, RSS, etc., NiFi is your best bet: http://hortonworks.com/apache/nifi/ This includes RESTful API capabilities via easy-to-configure HTTP processors, and it has its own scheduler. HCC has many articles on NiFi.
You could also do a RESTful wget from a Linux server and push the result to HDFS, or use Zeppelin to pull via wget as above and to pull streaming data via Spark. Zeppelin lets you visualize as well, and it has its own scheduler.
https://zeppelin.apache.org/docs/0.5.5-incubating/tutorial/tutorial.html http://hortonworks.com/apache/zeppelin/
Sqoop, Oozie, and Zeppelin come out of the box with the HDP platform. NiFi is part of the HDF platform and integrates easily with HDFS. It is not difficult to set up a Linux box to communicate with HDFS; a minimal Sqoop import and a wget-to-HDFS push are sketched below.
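As an illustration (the JDBC URL, database, table, user, and paths here are hypothetical examples, not from the question):

    # import one table from MySQL into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders

    # pull a RESTful feed on a Linux box, then push the file into HDFS
    wget -O feed.json 'http://api.example.com/feed'
    hdfs dfs -put feed.json /data/raw/feeds/

The same sqoop command can be wrapped in an Oozie workflow action to run on a schedule.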
09-13-2016
06:30 PM
2 Kudos
This will give you the commands to control your services manually: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_HDP_Reference_Guide/content/ch_controlling_hdp_svcs_manually.html It should work for whatever recent version of HDP you are running.
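For example, manually stopping and starting the HDFS NameNode looks roughly like the sketch below (paths follow the usual HDP 2.x layout; treat this as an illustration and check the guide above for the exact commands for your version and service):

    # run the daemon script as the hdfs service user
    su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh stop namenode"
    su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"

Each service in the guide follows a similar pattern: a service user, a daemon script under /usr/hdp/current, and start/stop arguments.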
09-13-2016
04:07 PM
@Randy Gelhausen Thanks. What threw me off is that when creating a new JDBC interpreter (at least in the sandbox), it is prepopulated with default-prefixed properties and psql values. I did not know that the entire property and its value needed to be deleted and recreated with the new prefix (vs. only changing the values).
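For anyone else who hits this, a sketch of the renaming under discussion (the Hive values are illustrative for a sandbox; verify the driver class and URL for your own setup):

    # prepopulated default-prefixed properties -- delete these entirely:
    #   default.driver = org.postgresql.Driver
    #   default.url    = jdbc:postgresql://localhost:5432/
    #   default.user   = gpadmin
    # recreate property AND value under the new prefix, e.g. for Hive:
    #   hive.driver = org.apache.hive.jdbc.HiveDriver
    #   hive.url    = jdbc:hive2://localhost:10000
    #   hive.user   = hive

In a paragraph you would then use %jdbc(hive) to select this prefix.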