Member since: 08-23-2016
Posts: 261
Kudos Received: 201
Solutions: 106
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 920 | 01-26-2018 07:28 PM |
| | 767 | 11-29-2017 04:02 PM |
| | 27218 | 11-29-2017 03:56 PM |
| | 1688 | 11-28-2017 01:01 AM |
| | 394 | 11-22-2017 04:08 PM |
05-15-2017
05:20 PM
1 Kudo
@George Meltser The Sandbox is a custom-built image. You might be better off downloading the latest version rather than trying to upgrade pieces of your existing image.
05-11-2017
05:31 AM
1 Kudo
Hi @Anil Reddy If you are like me and dislike regex, one trick you can try is to use the SplitContent processor first. Change the format dropdown in its configuration to use Text instead of Hexadecimal, and use your pair delimiter & as the byte sequence. This simplifies the regex if you still want to use ExtractText. Alternatively, you can explore chaining another SplitContent processor on the = to get the field and value tokens separately and avoid regex entirely. As always, if you find this post helpful, please accept the answer.
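Purely as an illustration of the split-on-& then split-on-= idea (this is not NiFi code, and the sample content is made up), a quick bash sketch of the same tokenization looks like this:

```bash
#!/usr/bin/env bash
# Hypothetical FlowFile content: key=value pairs delimited by '&'.
content="name=alice&city=paris&age=30"

# First split on the pair delimiter '&' (what the first SplitContent does).
IFS='&' read -ra pairs <<< "$content"

# Then split each pair on '=' to separate the field and value tokens.
for pair in "${pairs[@]}"; do
  field="${pair%%=*}"
  value="${pair#*=}"
  echo "field=${field} value=${value}"
done
```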
05-10-2017
02:56 PM
1 Kudo
@Roberto Sancho Flume is included in the HDP repo and is usually installed on an edge node. The installation instructions can be found here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-installation/content/installing_flume.html As always, if you find this post useful, don't forget to accept the answer.
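For reference, on a RHEL/CentOS edge node that already has the HDP repo configured, the linked doc boils down to roughly the following; the package names here are from memory, so treat them as an assumption and confirm against the doc:

```bash
# Install the Flume packages from the HDP repository (RHEL/CentOS example).
yum install -y flume flume-agent

# Confirm the client is on the path.
flume-ng version
```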
05-08-2017
07:53 PM
2 Kudos
Hi @Stefan Schuster I'm not sure why they aren't on the site anymore; perhaps they were re-organized. I believe the one you are looking for is still available here: https://github.com/hortonworks/data-tutorials/blob/1f3893c64bbf5ffeae4f1a5cbf1bd667dcea6b06/tutorials/hdp/hdp-2.6/hadoop-tutorial-getting-started-with-hdp/tutorial-8.md As always, if you find this post useful, don't forget to accept the answer.
05-08-2017
07:46 PM
1 Kudo
@Roberto Sancho Assuming the box meets the other prerequisites (OS, etc.), the client tools can be installed on the same box, though it isn't typical.
05-08-2017
03:13 PM
1 Kudo
@Roberto Sancho The HDP client tools, including Sqoop and Flume, would typically be installed on HDP edge nodes, and not usually in an HDF cluster. As always, if you find this post useful, don't forget to accept the answer.
05-08-2017
03:11 PM
1 Kudo
@Simran Kaur To the best of my knowledge, Hue, being a lightweight web UI, doesn't support displaying large tables in full. For SELECT * style queries, you may want to consider saving the results to a file in HDFS and downloading them to view offline. As always, if you find this post useful, don't forget to accept the answer.
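A minimal sketch of the save-to-HDFS approach, assuming a hypothetical HiveServer2 URL, user, table, and export path (adjust all of these to your environment):

```bash
# Write the full result set to an HDFS directory instead of rendering it in the UI.
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n myuser \
  -e "INSERT OVERWRITE DIRECTORY '/tmp/big_table_export'
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      SELECT * FROM big_table;"

# Pull the exported files down to the local machine for offline viewing.
hdfs dfs -get /tmp/big_table_export ./big_table_export
```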
05-04-2017
02:30 AM
1 Kudo
@Rakesh Maheshwari Can you confirm the version of the Sandbox you downloaded? Also, can you share a screenshot of the Ambari menu, including the user you are logging in with? If it is not admin, that user needs the view permissions assigned. With the admin user, click the user dropdown (in my screenshot, I'd click on Admin) -> Manage Ambari -> Views -> Hive -> Hive View 2.0, then scroll down to Permissions and make sure the users you require are added. As always, if you find this post useful, don't forget to accept the answer and/or upvote.
05-03-2017
11:22 PM
3 Kudos
Hi @Aref Asvadi The Sandbox archive link is a little easy to miss, but it is right on the Sandbox download page: https://hortonworks.com/downloads/ Scroll down past the current Sandbox and past the Sandbox in the Cloud section, and you should see an expandable section for Hortonworks Sandbox Archive. Expand the section and download the version of your choice! As always, if you find this post useful, don't forget to accept and/or upvote the answer.
05-03-2017
04:08 PM
2 Kudos
@Dinesh Das Remember that Apache Phoenix is a SQL skin over HBase. The underlying database is HBase, accessed via Phoenix if one wishes to use SQL. Here are a couple of good links that explain this further: https://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html https://hortonworks.com/hadoop-tutorial/introduction-apache-hbase-concepts-apache-phoenix-new-backup-restore-utility-hbase/ As always, if you find this post useful, don't forget to accept and/or upvote the answer.
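As a rough sketch of what the SQL-skin idea looks like in practice (the table is hypothetical, and the sqlline.py path and ZooKeeper quorum below assume an HDP-style sandbox install, so adjust them to your cluster):

```bash
# Phoenix DDL/DML goes into a plain SQL file...
cat > /tmp/phoenix_demo.sql <<'SQL'
CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, name VARCHAR);
UPSERT INTO demo VALUES (1, 'alice');
SELECT * FROM demo;
SQL

# ...and is executed through sqlline.py, while HBase stores the data underneath.
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure /tmp/phoenix_demo.sql
```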
04-06-2017
03:48 PM
2 Kudos
Hi @Vinay R Thanks for the follow-up comment; I didn't see the update here on Mar 21. Remember what I said above (and perhaps described more clearly in the tutorial I linked) about snapshots. When you take an HDFS snapshot, the blocks become protected (think read-only). The snapshot records what the NameNode state was at that point in time, but the blocks themselves remain in HDFS in a read-only state. Future deletes affect the NameNode metadata only, because the blocks are immutable until the snapshot is manually removed by the admin. Therefore, unless I'm still not understanding your scenario, step #3 in your description will not free up space to allow for the 10% addition. Deleting 50% of the cluster data after taking a full snapshot results in NameNode transactions only, because the blocks remain on disk in a read-only state. If adding more disks/DataNodes is not an option, you may want to focus on investigating what is occupying the space. Hortonworks has made this a bit easier in the latest versions (HDP 2.6 / Ambari 2.5.x) by adding an HDFS Top N feature to help cluster admins pinpoint areas of pressure on the NameNode.
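If you want to verify this behaviour yourself, a small test on a throwaway directory looks roughly like the following (the paths are made up; -skipTrash keeps the trash directory out of the measurement):

```bash
# Snapshot the directory, then delete data and watch usage stay flat.
hdfs dfsadmin -allowSnapshot /data/demo
hdfs dfs -createSnapshot /data/demo before-cleanup

hdfs dfs -rm -r -skipTrash /data/demo/old_files

# The deleted blocks are still referenced by the snapshot, so the space
# reported for the directory (and used on the DataNodes) does not drop.
hdfs dfs -du -s /data/demo
hdfs dfs -ls /data/demo/.snapshot/before-cleanup

# Space is only reclaimed once the snapshot itself is removed.
hdfs dfs -deleteSnapshot /data/demo before-cleanup
```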
04-03-2017
01:38 PM
1 Kudo
@Raj Kumar The pictures are not working for me, so I am answering based on an assumption about an issue we commonly see. There are two layers on the HDP Sandbox (Docker and the VM). When using an SSH client, double-check that you are using port 2222 for the Sandbox, and port 2122 if you need to access the Docker side (normally not needed unless you are adding ports).
03-27-2017
07:30 PM
@Marcy All of them can work. Access to Hive from these tools is commonly done using a notebook tool called Apache Zeppelin (included in the Hortonworks Data Platform). Hortonworks has many tutorials that show, step by step, how to connect them: https://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/ https://hortonworks.com/hadoop-tutorial/getting-started-apache-zeppelin/
03-27-2017
06:53 PM
@Marcy If you disable the Hive CLI, your best and recommended option is to have users use Beeline for HiveQL. It is supported by Hortonworks and is the most popular client. Additionally, you may wish to explore a GUI-based tool included in Ambari called the Ambari Hive View (which gets even better in the upcoming HDP 2.6 release). The first link I included outlines the major differences between the Hive CLI and Beeline, but in a nutshell, Beeline goes through HiveServer2, which means it respects Ranger-based authorization, whereas the Hive CLI is more of a brute-force direct connection that bypasses many of the security features. All of the options you listed are possible. When looking at different methods of accessing data in Hive, what you want to ensure is that they go through HiveServer2 so that Ranger-based security is respected; this is normally a Hadoop administrator's primary concern. Here is an additional link that covers the various Hive clients: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients In my experience, Beeline and the Ambari Hive View are where most Hadoopers start their journey and remain until a use case comes along that requires additional technologies like Spark, R, or Python.
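For reference, a minimal Beeline invocation, assuming a hypothetical HiveServer2 host on an unsecured cluster (swap in your own JDBC URL, user, and query):

```bash
# Connect through HiveServer2 (not the metastore directly) so Ranger policies apply.
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n myuser \
  -e "SHOW DATABASES;"
```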
03-27-2017
06:42 PM
2 Kudos
@Marcy With the Hive CLI, the connection goes directly to the Hive metastore and relies on storage-based authorization. To take advantage of Ranger-based central security, Hortonworks recommends using Beeline (instead of the Hive CLI), as it goes through HiveServer2 and the Ranger-based policies will apply. In fact, in production environments it is often suggested that administrators disable the Hive CLI and force users to issue CLI-based interactions through Beeline. Here are some relevant links that you may find useful: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-access/content/beeline-vs-hive-cli.html https://community.hortonworks.com/articles/10367/apache-ranger-and-hive-column-level-security.html https://community.hortonworks.com/questions/10760/how-to-disable-hive-shell-for-all-users.html As always, if you find this post useful, don't forget to upvote and/or accept the answer.
03-23-2017
05:38 PM
1 Kudo
@Eric England The goal of LogSearch was indeed cluster components, not third-party logs. Hortonworks does provide a solution that could work for third-party logs, HDP Search (Solr/Banana), which runs on the HDP cluster and may be worth a look. Here are some links that may be useful: https://hortonworks.com/apache/solr/ http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_solr-search-installation/content/ch_hdp-search.html As always, if you find this post useful, don't forget to upvote and/or accept the answer.
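From memory, the install described in the linked HDP Search guide comes down to adding the HDP-Solr repo and installing a single package on each node that will run Solr; the package name below is an assumption, so confirm it against the doc:

```bash
# Install HDP Search (Solr + Banana) on a node (RHEL/CentOS example);
# package name assumed from the HDP Search install guide.
yum install -y lucidworks-hdpsearch
```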
03-20-2017
03:48 PM
@Emmanuel Portelli Can you also post a screenshot of how the network cards/config are set up for the VM? Normally you should have two network cards, one of them being a host-only adapter.
03-16-2017
03:19 PM
5 Kudos
Hi @Vinay R HDFS snapshots are point-in-time copies of the filesystem, taken either on a directory or the entire filesystem depending on the administrator's preferences/policies. When you take a snapshot using the -createSnapshot command on a directory, a ".snapshot" directory is created (usually with a timestamp appended by default, though it can be named something else if you wish). The blocks of data within that HDFS directory are then protected (think read-only), and any subsequent delete commands alter the metadata stored in the NameNode only. Since the blocks are preserved, snapshots can also be used to restore the data. There is no time limit on snapshots, so in your example you could recover blocks from a few weeks back if someone took a snapshot of them before any delete commands. There is, however, an upper limit on the number of simultaneous snapshots (though it is large, at 65536). When snapshots are being used, care should also be taken to clean them up to avoid clogging up the system. Here are a couple of useful links on snapshots that you may want to review: http://hortonworks.com/hadoop-tutorial/using-hdfs-snapshots-protect-important-enterprise-datasets/ https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html As always, if you find this post useful, don't forget to upvote and/or accept the answer.
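For completeness, the basic snapshot lifecycle from the linked Apache doc looks like this (the directory and snapshot names below are made up):

```bash
# Make a directory snapshottable, then take a named snapshot.
hdfs dfsadmin -allowSnapshot /data/important
hdfs dfs -createSnapshot /data/important nightly-2017-03-16

# List directories that currently allow snapshots.
hdfs lsSnapshottableDir

# Recover an accidentally deleted file from the read-only snapshot copy.
hdfs dfs -cp /data/important/.snapshot/nightly-2017-03-16/report.csv /data/important/

# Remove the snapshot when it is no longer needed so the blocks can be reclaimed.
hdfs dfs -deleteSnapshot /data/important nightly-2017-03-16
```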
03-08-2017
06:54 PM
1 Kudo
@voca voca Was the hiveContext instantiated properly? I think I've seen this error before when the hiveContext instantiation was missed. It might be worth checking your Zeppelin notebook for the paragraph that matches the instantiation lines from the tutorial (just above the section you are currently at): Instantiate HiveContext
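The exact paragraph in your notebook may differ, but the instantiation you are looking for has roughly this shape (shown here through spark-shell rather than Zeppelin, using the Spark 1.x API of that era; the SHOW TABLES query is just a smoke test):

```bash
# Start the Spark shell and create a HiveContext before running any Hive queries.
spark-shell <<'EOF'
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("SHOW TABLES").show()
EOF
```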
02-23-2017
11:25 PM
1 Kudo
@Murray S The sandbox-version command should definitely work if you are SSH'd into the correct environment using port 2222 and logged in as root. I think your command to start Ambari is invalid, though; I believe it should be: service ambari-server [status|start|stop|restart] The other way in is to use the web SSH tool at http://127.0.0.1:4200/ which I find a bit more convenient than the direct console in VirtualBox. If you can log in there as root, you can try changing the Ambari admin password again with: ambari-admin-password-reset Once you change the Ambari admin password, try logging in as admin via http://127.0.0.1:8080 Otherwise, you always have the option of deleting the VirtualBox VM and re-importing the appliance to start over.
02-23-2017
10:45 PM
1 Kudo
Hi @Murray S Can you confirm the port you were using to SSH into the Sandbox? I'm wondering if you were perhaps SSH'ing into the container environment instead. You should be using port 2222 to access the Hadoop ecosystem components, including changing the Ambari admin password via the CLI and stopping/starting components. So, for example: ssh -l root -p 2222 127.0.0.1
02-22-2017
09:13 PM
1 Kudo
Hi @Vladislav Falfushinsky Hortonworks is aiming for early Q2 for the Ambari 2.5 GA release! Very exciting.
02-22-2017
06:56 PM
1 Kudo
Hi, is there a reason you wish to add them using the Sandbox? The Sandbox, being a specialized product, is likely adding unnecessary complexity. It might be easier to avoid the complications of the Sandbox and just do a fresh Ambari-based install.
02-21-2017
10:07 PM
You can also use the Shell-in-a-Box web terminal included in the Sandbox to remove ambiguity: http://127.0.0.1:4200/
01-26-2017
11:27 PM
3 Kudos
@diegoavella If you are having trouble with the Ambari admin credentials, you can always SSH into the Sandbox as the root user and use an included command-line tool to reset the password: ssh root@127.0.0.1 -p 2222 then run: ambari-admin-password-reset There is a good explanation with screenshots in Step 2.2 of the following tutorial that might be a useful reference for you: http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
12-16-2016
05:08 PM
With VirtualBox, it is a good idea to have a second adapter enabled and configured as a host-only adapter, as in the screenshot here.
12-15-2016
09:34 PM
@Arsalan Siddiqi Can you post screenshots of the VirtualBox network settings (specifically Adapter 1 and Adapter 2), and also the output of ifconfig?
12-14-2016
05:04 PM
2 Kudos
Each use case may have different requirements. I know of several organizations using Hadoop that simply have not encountered a need for YARN node labels yet, but all of them use queues and the Capacity Scheduler heavily.