Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 16847 | 11-01-2016 08:16 AM |
| | 12459 | 11-01-2016 07:45 AM |
| | 11376 | 10-25-2016 09:50 AM |
| | 2436 | 10-21-2016 03:50 AM |
| | 5095 | 10-14-2016 03:12 PM |
01-23-2016
05:59 AM
Agree! Upgrades are more complicated than just adding a service or changing some configuration. From an ops perspective I want to see what's happening and control the upgrade process.
01-21-2016
06:14 AM
2 Kudos
I don't think HBase should be a data lake (storing many files with different sizes and formats), but you can certainly use HBase to store the content of your small files (depending on the content; what's in those files?). HBase is massively scalable; look at this example: https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/ Facebook stores billions of messages in its HBase (HydraBase) setup, and Bloomberg uses HBase to store terabytes of data and serve about 5 billion requests per day (http://www.slideshare.net/HBaseCon/case-studies-session-4a-35937605).
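As a rough sketch of that pattern in the hbase shell (table name, column family, and row key below are made up for illustration), you could keep one row per small file:

# Create a table with a single column family for file contents (names are hypothetical)
create 'small_files', 'f'
# Store a small file's content under a row key derived from its path
put 'small_files', '/data/in/file-0001.xml', 'f:content', '<doc>...</doc>'
# Read it back
get 'small_files', '/data/in/file-0001.xml'

Using the file path as the row key gives you fast point lookups by path; just watch out for hot-spotting if your paths share long sequential prefixes.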
01-21-2016
06:05 AM
2 Kudos
HiveServer2 instances register themselves in ZooKeeper, so you can connect to ZooKeeper...

zookeeper-client -server <zookeeper-server>

...and read the "hiveserver2" znode (note: the znode name is configured via hive.server2.zookeeper.namespace):

ls /hiveserver2

As a result you get all the available HiveServer2 instances:

[serverUri=horton02.example.com:10000;version=1.2.1.2.3.2.0-2950;sequence=0000000014, serverUri=horton03.example.com:10000;version=1.2.1.2.3.2.0-2950;sequence=0000000015]
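The same registration enables dynamic service discovery from clients. As a sketch (reusing the <zookeeper-server> placeholder and assuming the default "hiveserver2" namespace), beeline can connect through ZooKeeper instead of a fixed host:

# HiveServer2 is resolved via ZooKeeper instead of a hard-coded host:port
beeline -u "jdbc:hive2://<zookeeper-server>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"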
01-20-2016
06:50 PM
1 Kudo
That's not possible; you need some kind of middleware between your frontend (HTML/jQuery) and your data service (Hive). So you basically have to create a backend, e.g. with Spring or Play, which takes requests from your frontend, queries Hive, and sends the result back to your frontend once the Hive query has finished. You can also use ODBC; take a look at this: http://hortonworks.com/hadoop-tutorial/how-use-excel-2013-to-analyze-hadoop-data/ Is the website (HTML/jQuery) only used to display data for a single user, or will this be something like: one of many users visits a private page, and individual data is pulled from Hive and displayed on the frontend?
01-20-2016
06:35 PM
13 Kudos
@Vipin Rathor Great question 🙂 I have implemented a script at one of my customers that automatically adds Ranger policies and HDFS directories as soon as a new user joins an AD group, so here is the part about how to use Ranger's REST API to add policies.

HDFS Policy Template:

{
"policyName": "name_of_policy",
"resourceName": "/path1,/path2/blub",
"description": "",
"repositoryName": "",
"repositoryType": "hdfs",
"isEnabled": "true",
"isRecursive": "true",
"isAuditEnabled": "true",
"permMapList": [{
"groupList": ["somegroup"],
"permList": ["Read","Execute", "Write", "Admin"]
}]
}
Curl:

curl -iv -u <user>:<password> -d @<policy payload> -H "Content-Type: application/json" -X POST http://<RANGER-Host>:6080/service/public/api/policy/

Hive Policy Template:

{
"policyName":"name_of_policy",
"databases":"db1,db2",
"tables":"mytable,yourtable",
"columns":"",
"udfs":"",
"description":"",
"repositoryName":"",
"repositoryType":"hive",
"tableType":"Inclusion",
"columnType":"Inclusion",
"isEnabled":"true",
"isAuditEnabled":"true",
"permMapList": [{
"groupList": ["somegroup"],
"permList": ["Select"]
}]
}
Curl:

curl -iv -u <user>:<password> -d @<policy payload> -H "Content-Type: application/json" -X POST http://<RANGER-Host>:6080/service/public/api/policy/

Getting Policies

I just tested the REST API to get some of my policies from Ranger; it worked. Make sure the policy ID is valid, otherwise you'll get a "Data not found" error.

Curl:

curl -iv -u <user>:<password> -H "Content-Type: application/json" -X GET http://horton01.example.com:6080/service/public/api/policy/2

Result:

{
  "id": 2,
  "createDate": "2015-11-21T07:03:21Z",
  "updateDate": "2015-12-08T05:54:24Z",
  "owner": "Admin",
  "updatedBy": "Admin",
  "policyName": "Ranger_audits",
  "resourceName": "/apps/solr/ranger_audits",
  "description": "",
  "repositoryName": "bigdata_hadoop",
  "repositoryType": "hdfs",
  "permMapList": [{
    "userList": ["solr"],
    "groupList": [],
    "permList": ["Read", "Write", "Execute"]
  }],
  "isEnabled": true,
  "isRecursive": true,
  "isAuditEnabled": false,
  "version": "5",
  "replacePerm": false
}
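For completeness: in the same legacy public API, policies can also be updated and deleted by ID. A sketch (verify the verbs and paths against your Ranger version):

# Update an existing policy (here: policy ID 2) with a modified JSON payload
curl -iv -u <user>:<password> -d @<policy payload> -H "Content-Type: application/json" -X PUT http://<RANGER-Host>:6080/service/public/api/policy/2

# Delete a policy by ID
curl -iv -u <user>:<password> -X DELETE http://<RANGER-Host>:6080/service/public/api/policy/2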
Let me know if you have any questions.
01-20-2016
05:29 AM
Awesome, good to hear. Good luck with your Coursera course 🙂
01-19-2016
06:13 PM
2 Kudos
Unfortunately, this is one of the remaining YARN components that does not support HA at the moment. However, there are already plans for a new Timeline Server (v2), which will be more scalable and reliable. If your Timeline Server is unavailable, the client will retry publishing the application data a couple of times before it gives up. This can be configured via "yarn.timeline-service.client.max-retries" (defaults to 30). Check out this page: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html
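As a sketch, that retry setting lives in yarn-site.xml; the value 60 below is just an illustrative override of the default 30:

<!-- yarn-site.xml: number of retries before the timeline client gives up -->
<property>
  <name>yarn.timeline-service.client.max-retries</name>
  <value>60</value>
</property>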
01-19-2016
06:15 AM
3 Kudos
I am not familiar with this Coursera course and Hadoop setup. What course is this? You are getting a "permission denied" error because you are trying to access a folder that is owned by the hdfs user, and the permissions do not allow write access for others.

A) You could run your application/script as the hdfs user:

su hdfs

or

export HADOOP_USER_NAME=hdfs

B) Change the owner of the mp2 folder (note: to change the owner you have to be a superuser or the current owner => hdfs):

hdfs dfs -chown -R <username_of_new_owner> /mp2
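To decide which option you need, check the current owner and permissions first; a quick sketch (the output line is illustrative):

hdfs dfs -ls / | grep mp2
# illustrative output: drwxr-xr-x - hdfs hdfs 0 2016-01-10 10:00 /mp2
# owner hdfs, group hdfs, and no write access for others => permission denied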
01-19-2016
06:03 AM
1 Kudo
Could you post some of your heap configurations? How much memory is available on the machine? An OOM error usually means the heap configuration is not correct or there is not enough memory available on the machine. You also might want to check the open-files limit (ulimit -a); if it's too low, it can cause OOM errors (see https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ref-729d1fb0-6d1b-459f-a18a-b5eba4540ab5.1.html). Even though you might be able to run Hadoop on a 32-bit system, I wouldn't recommend it. You should use a 64-bit system (see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/meet-min-system-requirements.html)
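A sketch of that check, plus the persistent fix along the lines of the HDP docs (the 32768/65536 values are the commonly documented recommendations; adjust for your workload):

# Check the limits for the user that runs the Hadoop daemons
ulimit -a | grep -E 'open files|max user processes'

# Raise them permanently in /etc/security/limits.conf, e.g. for the hdfs user:
# hdfs - nofile 32768
# hdfs - nproc  65536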