Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3106 | 08-25-2017 03:09 PM |
| | 1965 | 08-22-2017 06:52 PM |
| | 3393 | 08-09-2017 01:10 PM |
| | 8063 | 08-04-2017 02:34 PM |
| | 8115 | 08-01-2017 11:35 AM |
04-12-2017
05:43 AM
I also faced the same issue. In my case I was running on Google Compute Engine, and inside Docker installed on the Google VM we could not delete the files. If we tried to delete a file, it behaved like a symbolic link and was not deleted. Use Google's Container-Optimized OS for the Docker VM. (Note that this VM is not fully open and has only Toolbox available; I was not able to install Docker Compose through it.)
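If it helps, here is a minimal way to observe the symptom from inside the container (just a sketch; the path is a placeholder, not from my setup):

```python
import os

# Placeholder path: point this at the file you cannot delete.
PATH = "/data/example.db"

# The file looks like a regular file but behaves like a symlink
# and survives the unlink on the affected VM image.
print("is symlink:", os.path.islink(PATH))
try:
    os.remove(PATH)
except OSError as e:
    print("delete failed:", e)
print("still exists:", os.path.exists(PATH))
```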
09-20-2016
07:15 PM
3 Kudos
In the HDP 2.5 release notes it says that Hive 2.1 is TP: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/tech_previews.html

Hive, new in this release:
- Stored Procedures
- Client-Side LLAP Daemons (HIVE-7193)
- LLAP / Spark security interface
- Hive 2.1

In the HDP 2.5 release email to customers it was stated that Hive 2.1 is TP but that Hive ACID is certified for production with Hive 1.2.1.
Apache Hive
Includes Apache Hive 1.2.1 for production and Hive 2.1 (Technical Preview) for cutting-edge performance:
- Hive LLAP (Technical Preview): persistent query servers and optimized in-memory caching for blazing-fast SQL. Up to 25x faster for BI workloads. 100% compatible with existing Hive workloads.
- Hive ACID and Streaming Ingest certified for production use with Hive 1.2.1
- Dynamic user-based security policies for data masking and filtering
- HPL/SQL: procedural programming within Hive
- Hive View v1.5.0, with improved robustness and security
- Parquet format fully certified with Hive 1.2.1 / 2.1

In the Hortonworks.com Hive overview section it states (confusingly) that ACID is GA in Hive 2.1 (though it originated in 0.14): http://hortonworks.com/apache/hive/#section_3
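For what it's worth, the production-certified path today is ACID on Hive 1.2.1. A minimal sketch of creating a transactional table (assuming the PyHive client, which is not mentioned in this thread; hostname and table names are placeholders, and the server must already have the ACID transaction manager enabled):

```python
from pyhive import hive  # assumed client library, not from the original post

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="hive")
cur = conn.cursor()

# Hive 1.2.1 ACID tables must be bucketed, stored as ORC,
# and flagged transactional.
cur.execute("""
    CREATE TABLE demo_acid (id INT, payload STRING)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true')
""")
```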
04-18-2017
06:38 PM
Works great! Thanks.
03-02-2017
07:38 AM
I am also facing the same issue. What is the workaround for this? I have set up the HDP cluster on EC2.
09-17-2016
01:21 PM
3 Kudos
@Fabian Schreiber This is a standard DMZ network architecture, where a subset of hosts (Knox gateway, edge node) forms a communication layer between the external network and the rest of the hosts in the internal network. Hosts in the DMZ can be seen as being in both the internal and external networks. Their purpose is to isolate the rest of the hosts (the Hadoop clusters) from any direct communication with the external network.

In the above example, the first firewall forces all internet communication to talk only to the Knox gateway. Communication that passes the security challenges at the gateway (IP, ports, Kerberos/LDAP authentication, etc.) is routed to the cluster.

Theoretically the first firewall should be sufficient to secure the cluster. That firewall, however, is exposed to the entire global internet and all of the hackers and evolving hacking techniques out there. As such, there is still a risk of attacks from the internet directly reaching the cluster and its data, mission-critical operations, etc. The second firewall further isolates the cluster by forcing it to accept communication only from the gateway, which is a known host on the internal network. The overall result is that any malicious attacks are confined to the DMZ hosts and cannot penetrate into the cluster; compromises are isolated to the DMZ.

The DMZ concept is based on demilitarized zones in the military, where a zone is built to hold buildings and so on used by parties both inside and outside the military, but only the military personnel in the DMZ can communicate with the militarized zone (the internal network).

For details on HDP Knox Gateway security settings: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html
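If you want to sanity-check the two-firewall behavior from different vantage points, a simple TCP probe is enough (a sketch; the hosts and ports below are placeholders, not from this answer):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoints. Run from the internet: only the Knox
# gateway should answer; direct cluster ports should be blocked
# by the first firewall.
print("knox gateway:", can_connect("knox.example.com", 8443))
print("namenode    :", can_connect("namenode.internal.example.com", 8020))
```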
09-13-2016
04:07 PM
@Randy Gelhausen Thanks. What threw me off is that when creating a new JDBC interpreter (at least in the sandbox) it is prepopulated with default prefix properties and psql values. I did not know that the entire property and value needed to be deleted and recreated with the new prefix (vs. only new values).
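For anyone else hitting this, the shape of the fix (property names follow Zeppelin's JDBC prefix convention; the mysql prefix and values below are placeholders for your own datasource) is to recreate the full property set under the new prefix:

```
mysql.driver   = com.mysql.jdbc.Driver
mysql.url      = jdbc:mysql://dbhost.example.com:3306/mydb
mysql.user     = zeppelin
mysql.password = ******
```

You then select the datasource in a paragraph with %jdbc(mysql) rather than editing only the values under the old prefix.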
09-22-2016
01:41 AM
Thank you for confirming.
09-09-2016
01:14 PM
1 Kudo
For profiling data off Hadoop, see https://community.hortonworks.com/questions/35396/data-quality-analysis.html

For profiling data on Hadoop, the best solution for you would be:
- Zeppelin as your client/UI
- Spark in Zeppelin as your toolset to profile

Both Zeppelin and Spark are extremely powerful tools for interacting with data, and both are packaged in HDP. Zeppelin is a browser-based notebook UI (like IPython/Jupyter) that excels at interacting with and exploring data. Spark, of course, is in-memory data analysis and is lightning fast. Both are key pieces in the future of Big Data analysis. BTW, you can use Python in Spark or you can use Scala, including integration of external libraries. See the following links to get started: http://hortonworks.com/apache/zeppelin/ http://www.social-3.com/solutions/personal_data_profiling.php
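To give a taste of the workflow, here is a minimal PySpark profiling sketch you could run in a Zeppelin paragraph (assuming Spark 2.x; the file path and dataset are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("profile").getOrCreate()

# Placeholder path: point this at the dataset you want to profile.
df = spark.read.csv("/data/sample.csv", header=True, inferSchema=True)

# Basic per-column profile: count, mean, stddev, min, max.
df.describe().show()

# Null counts per column, a common first data-quality check.
df.select([F.sum(F.col(c).isNull().cast("int")).alias(c)
           for c in df.columns]).show()
```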
09-20-2016
04:14 PM
Do you mean 50 Mbps per mapper or for the cluster as a whole? (I assume you mean the former, as the latter would imply almost two days to read a TB of S3 data.) Assuming you do mean 50 Mbps per mapper, what is the limit on S3 throughput to the whole cluster? That's the key information. Do you have a ballpark number for this?
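For reference, the back-of-the-envelope arithmetic behind that parenthetical:

```python
# 1 TB at a sustained 50 Mbps for the whole cluster:
bits = 1e12 * 8          # 1 TB in bits (decimal TB)
seconds = bits / 50e6    # at 50 megabits per second
print(seconds / 3600, "hours")   # ~44.4 hours, i.e. almost two days
```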