Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3106 | 08-25-2017 03:09 PM |
| | 1965 | 08-22-2017 06:52 PM |
| | 3393 | 08-09-2017 01:10 PM |
| | 8063 | 08-04-2017 02:34 PM |
| | 8115 | 08-01-2017 11:35 AM |
04-12-2017
05:43 AM
I also faced the same issue. In my case I was running on Google Compute Engine, and inside Docker installed on the Google VM we could not delete the files. If we tried to delete a file, it behaved like a symbolic link and was not deleted. Use Google's Container-Optimized OS for the Docker VM. (Note that this VM is not fully open and has only Toolbox available; I was not able to install Docker Compose through it.)
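If it helps, here is a minimal way to observe the symptom from inside the container (just a sketch; the path is a placeholder, not from my setup):

```python
import os

# Placeholder path: point this at the file you cannot delete.
PATH = "/data/example.db"

# The file looks like a regular file but behaves like a symlink
# and survives the unlink on the affected VM image.
print("is symlink:", os.path.islink(PATH))
try:
    os.remove(PATH)
except OSError as e:
    print("delete failed:", e)
print("still exists:", os.path.exists(PATH))
```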
09-20-2016
07:15 PM
3 Kudos
In the HDP 2.5 release notes it says that Hive 2.1 is TP: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/tech_previews.html

Hive, new in this release:
- Stored Procedures
- Client-Side LLAP Daemons (HIVE-7193)
- LLAP / Spark security interface
- Hive 2.1

In the HDP 2.5 release email to customers it was stated that Hive 2.1 is TP but that Hive ACID is certified for production with Hive 1.2.1.
Apache Hive
Includes Apache Hive 1.2.1 for production and Hive 2.1 (Technical Preview) for cutting-edge performance:
- Hive LLAP (Technical Preview): persistent query servers and optimized in-memory caching for blazing-fast SQL. Up to 25x faster for BI workloads. 100% compatible with existing Hive workloads.
- Hive ACID and Streaming Ingest certified for production use with Hive 1.2.1
- Dynamic user-based security policies for data masking and filtering
- HPL/SQL: procedural programming within Hive
- Hive View v1.5.0, with improved robustness and security
- Parquet format fully certified with Hive 1.2.1 / 2.1

In the Hortonworks.com Hive overview section it states (confusingly) that ACID is GA in Hive 2.1 (though it originated in 0.14): http://hortonworks.com/apache/hive/#section_3
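For what it's worth, the production-certified path today is ACID on Hive 1.2.1. A minimal sketch of creating a transactional table (assuming the PyHive client, which is not mentioned in this thread; hostname and table names are placeholders, and the server must already have the ACID transaction manager enabled):

```python
from pyhive import hive  # assumed client library, not from the original post

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="hive")
cur = conn.cursor()

# Hive 1.2.1 ACID tables must be bucketed, stored as ORC,
# and flagged transactional.
cur.execute("""
    CREATE TABLE demo_acid (id INT, payload STRING)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true')
""")
```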
04-18-2017
06:38 PM
Works great! Thanks.
03-02-2017
07:38 AM
I am also facing the same issue. What is the workaround for this? I have set up the HDP cluster on EC2.
09-17-2016
01:21 PM
3 Kudos
@Fabian Schreiber This is a standard DMZ network architecture, where a subset of hosts (Knox gateway, edge node) forms a communication layer between the external network and the rest of the hosts in the internal network. Hosts in the DMZ can be seen as being in both the internal and external networks. Their purpose is to isolate the rest of the hosts (the Hadoop clusters) from any direct communication with the external network.

In the above example, the first firewall forces all internet communication to talk only to the Knox gateway. Communication that passes the security challenges at the gateway (IP, ports, Kerberos/LDAP authentication, etc.) is routed to the cluster.

Theoretically the first firewall should be sufficient to secure the cluster. That firewall, however, is exposed to the entire global internet and all of the hackers and evolving hacking techniques out there. As such, there is still a risk of attacks from the internet directly reaching the cluster and its data, mission-critical operations, etc. The second firewall further isolates the cluster by forcing it to accept communication only from the gateway, which is a known host on the internal network. The overall result is that any malicious attacks are confined to the DMZ hosts and cannot penetrate into the cluster; compromises are isolated to the DMZ.

The DMZ concept is based on demilitarized zones in the military, where a zone is built to hold buildings and so on used by parties both inside and outside the military, but only the military personnel in the DMZ can communicate with the militarized zone (the internal network).

For details on HDP Knox Gateway security settings: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html
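If you want to sanity-check the two-firewall behavior from different vantage points, a simple TCP probe is enough (a sketch; the hosts and ports below are placeholders, not from this answer):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoints. Run from the internet: only the Knox
# gateway should answer; direct cluster ports should be blocked
# by the first firewall.
print("knox gateway:", can_connect("knox.example.com", 8443))
print("namenode    :", can_connect("namenode.internal.example.com", 8020))
```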
09-13-2016
04:07 PM
@Randy Gelhausen Thanks. What threw me off is that when creating a new JDBC interpreter (at least in the sandbox) it is prepopulated with default prefix properties and psql values. I did not know that the entire property and value needed to be deleted and recreated with the new prefix (vs. only new values).
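For anyone else hitting this, the shape of the fix (property names follow Zeppelin's JDBC prefix convention; the mysql prefix and values below are placeholders for your own datasource) is to recreate the full property set under the new prefix:

```
mysql.driver   = com.mysql.jdbc.Driver
mysql.url      = jdbc:mysql://dbhost.example.com:3306/mydb
mysql.user     = zeppelin
mysql.password = ******
```

You then select the datasource in a paragraph with %jdbc(mysql) rather than editing only the values under the old prefix.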
09-22-2016
01:41 AM
Thank you for confirming.
09-09-2016
01:14 PM
1 Kudo
For profiling data off Hadoop, see https://community.hortonworks.com/questions/35396/data-quality-analysis.html

For profiling data on Hadoop, the best solution for you would be:
- Zeppelin as your client/UI
- Spark in Zeppelin as your toolset to profile

Both Zeppelin and Spark are extremely powerful tools for interacting with data, and both are packaged in HDP. Zeppelin is a browser-based notebook UI (like IPython/Jupyter) that excels at interacting with and exploring data. Spark, of course, is in-memory data analysis and is lightning fast. Both are key pieces in the future of Big Data analysis. BTW, you can use Python in Spark or you can use Scala, including integration of external libraries. See the following links to get started: http://hortonworks.com/apache/zeppelin/ http://www.social-3.com/solutions/personal_data_profiling.php
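To give a taste of the workflow, here is a minimal PySpark profiling sketch you could run in a Zeppelin paragraph (assuming Spark 2.x; the file path and dataset are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("profile").getOrCreate()

# Placeholder path: point this at the dataset you want to profile.
df = spark.read.csv("/data/sample.csv", header=True, inferSchema=True)

# Basic per-column profile: count, mean, stddev, min, max.
df.describe().show()

# Null counts per column, a common first data-quality check.
df.select([F.sum(F.col(c).isNull().cast("int")).alias(c)
           for c in df.columns]).show()
```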
09-20-2016
04:14 PM
Do you mean 50 Mbps per mapper or for the cluster as a whole? (I assume you mean the former, as the latter would imply almost two days to read a TB of S3 data.) Assuming you do mean 50 Mbps per mapper, what is the limit on S3 throughput to the whole cluster? That's the key information. Do you have a ballpark number for this?
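For reference, the back-of-the-envelope arithmetic behind that parenthetical:

```python
# 1 TB at a sustained 50 Mbps for the whole cluster:
bits = 1e12 * 8          # 1 TB in bits (decimal TB)
seconds = bits / 50e6    # at 50 megabits per second
print(seconds / 3600, "hours")   # ~44.4 hours, i.e. almost two days
```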