Member since: 08-23-2016
Posts: 261
Kudos Received: 201
Solutions: 106

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1762 | 01-26-2018 07:28 PM |
| | 1402 | 11-29-2017 04:02 PM |
| | 35340 | 11-29-2017 03:56 PM |
| | 3518 | 11-28-2017 01:01 AM |
| | 962 | 11-22-2017 04:08 PM |
04-06-2017
03:48 PM
2 Kudos
Hi @Vinay R, thanks for the follow-up comment; I didn't see the update here on Mar 21. Remember what I said above (and perhaps described more clearly in the tutorial I linked) about snapshots. When you take an HDFS snapshot, the blocks become protected (think read-only). The snapshot records what the NameNode state was at that point in time, but the blocks themselves remain in HDFS in a read-only state. Future deletes affect the NameNode metadata only, because the blocks are immutable until the snapshot is manually removed by the admin.

Therefore, unless I'm still not understanding your scenario, step #3 in your description will not free up space to allow for the 10% addition. Deleting 50% of the cluster data after snapshotting the entire filesystem will result in NameNode transactions only, because the blocks will remain on disk in a read-only state.

If adding more disks/DataNodes is not an option, you may want to focus on investigating what is occupying the space. Hortonworks has made this a bit easier in the latest versions of HDP 2.6 / Ambari 2.5.x by adding an HDFS Top N feature to help cluster admins focus on areas of pressure on the NameNode.
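If you do decide space has to be reclaimed, it only comes back once the snapshots themselves are deleted. As a rough sketch (the /data path and snapshot name below are placeholders, not from your cluster), the relevant commands look like this:

```
# List directories that currently allow snapshots
hdfs lsSnapshottableDir

# See how much space is in use under a directory
hdfs dfs -du -h /data

# Delete an old snapshot so its blocks become eligible for reclamation
hdfs dfs -deleteSnapshot /data snap-2017-03-01
```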
04-03-2017
01:38 PM
1 Kudo
@Raj Kumar The pictures are not working for me, so I am answering based on a scenario we see commonly. There are two layers in the HDP Sandbox (the Docker container and the VM). When using an SSH client, double-check that you are using port 2222 for the Sandbox, and port 2122 if you need to access the Docker side (normally not needed unless you are adding ports).
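For example, assuming the default port forwarding to 127.0.0.1 (adjust the host if your VM is bridged), the two SSH entry points look like this:

```
# HDP sandbox environment (Ambari, HDFS, Hive, and friends)
ssh root@127.0.0.1 -p 2222

# Docker host side of the sandbox (normally only needed for things like forwarding extra ports)
ssh root@127.0.0.1 -p 2122
```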
03-27-2017
07:30 PM
@Marcy All of them can work. They commonly access Hive through a notebook tool called Apache Zeppelin (included in the Hortonworks Data Platform). Hortonworks has many tutorials that walk you step by step through connecting these:

https://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
https://hortonworks.com/hadoop-tutorial/getting-started-apache-zeppelin/
03-27-2017
06:53 PM
@Marcy If you disable the Hive CLI, your best and recommended option is to have users use Beeline for HiveQL. It is supported by Hortonworks and is the most popular client. Additionally, you may wish to explore a GUI-based tool included in Ambari called the Ambari Hive View (which gets even better in the upcoming HDP 2.6 release).

The first link I included outlines the major differences between the Hive CLI and Beeline, but in a nutshell, Beeline goes through HiveServer2, which means it respects Ranger-based authorization, whereas the Hive CLI is more of a brute-force direct connection that bypasses many of the security features.

All of the options you listed are possible. When looking at different methods of accessing data in Hive, what you want to ensure is that they go through HiveServer2 so that the Ranger-based security is respected. This is normally a Hadoop administrator's primary concern. Here is an additional link that covers various Hive clients: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

In my experience, Beeline and the Ambari Hive View are where most Hadoopers start their journey and remain until a use case comes along that requires additional technologies like Spark, R, or Python.
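For reference, a typical Beeline session against HiveServer2 looks like the sketch below; the hostname is a placeholder and 10000 is the default HiveServer2 port, so adjust both for your cluster:

```
# Interactive session through HiveServer2 (Ranger policies are enforced on this path)
beeline -u "jdbc:hive2://your-hiveserver2-host:10000/default" -n your_username

# Run a single statement non-interactively
beeline -u "jdbc:hive2://your-hiveserver2-host:10000/default" -n your_username -e "SHOW TABLES;"
```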
03-27-2017
06:42 PM
2 Kudos
@Marcy Using the Hive CLI, the connection goes directly to the Hive Metastore and relies on storage-based authorization. To take advantage of Ranger-based central security, Hortonworks recommends using Beeline (instead of the Hive CLI), as it goes through HiveServer2 and the Ranger-based policies will apply. In fact, in production environments it is often suggested that administrators disable the Hive CLI and force users to issue CLI-based interactions through Beeline.

Here are some relevant links that you may find useful. As always, if you find this post useful, don't forget to upvote and/or accept the answer.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-access/content/beeline-vs-hive-cli.html
https://community.hortonworks.com/articles/10367/apache-ranger-and-hive-column-level-security.html
https://community.hortonworks.com/questions/10760/how-to-disable-hive-shell-for-all-users.html
03-23-2017
05:38 PM
1 Kudo
@Eric England The goal of LogSearch was indeed cluster components, not third-party logs. Hortonworks does provide a solution that could work for third-party logs: HDP Search (Solr/Banana) on the HDP cluster, which you may want to look at. Here are some links that may be useful:

https://hortonworks.com/apache/solr/
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_solr-search-installation/content/ch_hdp-search.html

As always, if you find this post useful, don't forget to upvote and/or accept the answer.
03-16-2017
03:19 PM
5 Kudos
Hi @Vinay R HDFS snapshots are point-in-time copies of the filesystem, taken either on a directory or on the entire filesystem, depending on the administrator's preferences/policies. When you take a snapshot of a directory with the -createSnapshot command, a ".snapshot" directory is created (usually with a timestamp appended by default, but you can name it something else if you wish). The blocks of data within that HDFS directory are then protected (meaning the snapshotted contents become read-only), and any subsequent delete commands alter only the metadata stored in the NameNode. Since the blocks are preserved, you can also use snapshots to restore the data.

There is no time limit on snapshots, so in your example you could recover blocks from a few weeks back if someone took a snapshot of them before any delete commands. There is, however, an upper limit on the number of simultaneous snapshots (though it is large, at 65536). When snapshots are being used, care should also be taken to clean them up to avoid clogging up the system.

Here are a couple of useful links on snapshots that you may want to review:

http://hortonworks.com/hadoop-tutorial/using-hdfs-snapshots-protect-important-enterprise-datasets/
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

As always, if you find this post useful, don't forget to upvote and/or accept the answer.
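As a quick sketch of the workflow (the directory, file, and snapshot names are just examples):

```
# Allow snapshots on a directory (admin operation, done once)
hdfs dfsadmin -allowSnapshot /data/important

# Take a snapshot; the name is optional and defaults to a timestamp
hdfs dfs -createSnapshot /data/important snap1

# Show what changed between two snapshots (assuming a later snapshot snap2 was also taken)
hdfs snapshotDiff /data/important snap1 snap2

# Restore a file that was deleted after snap1 was taken
hdfs dfs -cp /data/important/.snapshot/snap1/report.csv /data/important/

# Remove a snapshot once it is no longer needed, so its blocks can be reclaimed
hdfs dfs -deleteSnapshot /data/important snap1
```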
02-23-2017
11:25 PM
1 Kudo
@Murray S The sandbox-version command should definitely work if you are SSH'd into the correct environment using port 2222 and logged in as root. I think your command to start Ambari is invalid, though; I believe it should be as follows:

service ambari-server [status|start|stop|restart]

The other way in is to use the web SSH tool at http://127.0.0.1:4200/, which I find a bit more convenient than direct access in VirtualBox. If you can log in there as root, you can try changing the Ambari admin password again:

ambari-admin-password-reset

Once you change the Ambari admin password, try logging in as admin via http://127.0.0.1:8080. Otherwise, you always have the option of deleting the VirtualBox VM and re-importing the appliance to start over.
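Putting that together, a typical recovery sequence from an SSH session or the web shell (assuming the default sandbox address) would be:

```
# SSH into the sandbox (or use the web shell at http://127.0.0.1:4200/)
ssh root@127.0.0.1 -p 2222

# Reset the Ambari admin password
ambari-admin-password-reset

# Check Ambari and restart it if needed
service ambari-server status
service ambari-server restart
```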
02-23-2017
10:45 PM
1 Kudo
Hi @Murray S Can you confirm the port you were using to SSH into the Sandbox? I'm wondering if you were perhaps SSH'ing into the container environment instead. You should be using port 2222 to access the Hadoop ecosystem components, including changing the Ambari admin password via the CLI and stopping/starting components. So for example:

ssh -l root -p 2222 127.0.0.1