Member since: 09-29-2015
Posts: 14
Kudos Received: 8
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1144 | 04-04-2016 07:18 PM |
| | 1409 | 10-14-2015 03:39 PM |
05-18-2016 08:52 PM
If you are using the Sandbox with VirtualBox, you need to open several additional ports. You need the HBase Master and RegionServer ports forwarded: 16000, 16010, 16020, and 16030. I think I had to add ZooKeeper's port (2181) as well. To do this, click the Settings button for the VM instance you are working with (HDP 2.3.x or HDP 2.4), then click Network in the pop-up window. There is a Port Forwarding button at the bottom; click it and add a rule for each of the ports above that you don't already have, following the pattern of the existing rules. I hope this helps. Eric
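If you would rather script this than click through the GUI, here is a minimal sketch of the same rules using VBoxManage from Python. The VM name, the one-to-one host/guest port mapping, and the assumption that the VM is powered off are illustrative only, so adjust them for your setup.

```python
import subprocess

# Assumed VM name -- check yours with `VBoxManage list vms`.
VM_NAME = "Hortonworks Sandbox with HDP 2.4"

# HBase Master (16000, 16010), RegionServer (16020, 16030), and ZooKeeper (2181).
PORTS = [16000, 16010, 16020, 16030, 2181]

for port in PORTS:
    # Adds a NAT port-forwarding rule "tcp<port>" mapping host <port> -> guest <port>.
    # The VM must be powered off; for a running VM use `VBoxManage controlvm <name> natpf1 ...`.
    rule = "tcp{0},tcp,,{0},,{0}".format(port)
    subprocess.check_call(["VBoxManage", "modifyvm", VM_NAME, "--natpf1", rule])
```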
05-16-2016 01:07 PM
Pedro, you are thinking correctly. The best way to leverage Hadoop is to store all raw data in HDFS. The goal is to keep raw data as long as possible (cheap storage) in its original format. Then materialize views of the data in Hive (transformations, cleansing, and aggregations) for SQL workloads. Now you are ready to analyze the data with any enterprise tool that has an ODBC/JDBC interface to connect to Hive (Excel, MicroStrategy, etc.). Spark is also a perfect tool for bringing in Hive data for analysis; try the Zeppelin Notebook to make it really easy. Any output can be written from Spark back to Hive for consumption by the same tools. I hope this helps. Eric
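To make that last step concrete, here is a minimal PySpark sketch (Spark 2.x API; the table names web_logs and daily_page_views are made up for illustration) that reads a Hive table, aggregates it, and writes the result back to Hive, where any ODBC/JDBC tool can pick it up.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark see the same tables your BI tools query over ODBC/JDBC.
spark = (SparkSession.builder
         .appName("hive-analysis-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read from a materialized Hive table (the name is assumed for illustration).
logs = spark.sql("SELECT page, visit_date FROM web_logs")

# A simple aggregation -- the kind of transformation you might otherwise do in Hive.
daily_views = logs.groupBy("visit_date", "page").count()

# Write the result back to Hive so Excel, MicroStrategy, etc. can consume it.
daily_views.write.mode("overwrite").saveAsTable("daily_page_views")
```

The resulting daily_page_views table then appears alongside your other Hive tables for the BI tools mentioned above.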
04-04-2016 07:18 PM
2 Kudos
If you don't have Ranger Admin HA, then you can't add, change, or remove policies while the Ranger Admin GUI is down. Existing policies are still enforced from the copies the plugins have already cached; only administration is affected. So really, HA is there to ensure your admins can manage policies without downtime. Decide whether you need it based on how often you change your policies and how much administrative downtime you can tolerate. I hope this helps. Eric
03-30-2016 06:16 PM
Narasimha, here are some great docs on Knox: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html. Also see the guides posted by others here to help you with the setup. Eric
03-29-2016 02:11 PM
1 Kudo
You can use any load balancer in front of Knox, and typically a round-robin pattern works well. You will need at least 2 Knox instances to ensure HA, but look at your expected load and make sure the servers that remain up during a failure can handle it. For example, if you have 2 instances of Knox and one fails, can the remaining instance handle the full load on its own? If not, you may need 3 or more Knox instances. I hope this helps.
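To make the sizing math concrete, here is a small sketch of that N+1 reasoning; the load and per-instance capacity numbers are invented, so plug in your own measurements.

```python
import math

# Illustrative numbers only -- substitute your own measurements.
peak_requests_per_sec = 900    # expected peak load across the cluster
capacity_per_instance = 500    # what one Knox instance can sustain

# Enough instances to carry peak load...
needed_at_peak = math.ceil(peak_requests_per_sec / capacity_per_instance)

# ...plus one spare so losing a single instance doesn't overload the rest,
# and never fewer than 2 so you still have HA at all.
total_instances = max(needed_at_peak + 1, 2)

print("Run at least {} Knox instances behind the load balancer".format(total_instances))
```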
10-14-2015 03:39 PM
1 Kudo
@Cassandra, ideally you don't need to back up HDFS, since it stores 3 copies of each block by default. If you need a DR strategy, a good approach is to have a separate cluster in another data center and use Apache Falcon or distcp to mirror the data to the DR cluster. If you need to back up certain high-value datasets, take a snapshot of the data and back it up to tape (ugh!) or put it on your corporate SAN/NAS (if permitted). This gives you a way to recover the data if disaster strikes. I don't know if you are averse to cloud storage (based on your S3 comment), but it is cheap and online all the time to recover data when needed. I hope this helps, Eric
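To sketch what the snapshot-plus-mirror approach looks like in practice, here is a rough outline using the standard hdfs and distcp command-line tools; the cluster addresses and paths are placeholders, not from the original post, and Falcon can schedule the same kind of replication for you.

```python
import subprocess
from datetime import date

# Placeholder paths and NameNode addresses -- adjust for your clusters.
DATASET = "/data/high_value"
PROD_NN = "hdfs://prod-nn:8020"
DR_NN = "hdfs://dr-nn:8020"

snapshot_name = "backup-{}".format(date.today().isoformat())

# One-time step: allow snapshots on the directory (requires HDFS admin rights).
subprocess.check_call(["hdfs", "dfsadmin", "-allowSnapshot", DATASET])

# Take a read-only, point-in-time snapshot of the dataset.
subprocess.check_call(["hdfs", "dfs", "-createSnapshot", DATASET, snapshot_name])

# Mirror the snapshot to the DR cluster; -update only copies what changed.
src = "{}{}/.snapshot/{}".format(PROD_NN, DATASET, snapshot_name)
dst = "{}{}".format(DR_NN, DATASET)
subprocess.check_call(["hadoop", "distcp", "-update", src, dst])
```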