Member since: 07-05-2016
Posts: 42
Kudos Received: 32
Solutions: 6
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3637 | 02-13-2018 10:56 PM |
| | 1600 | 08-25-2017 05:25 AM |
| | 9615 | 03-01-2017 05:01 AM |
| | 5102 | 12-14-2016 07:00 AM |
| | 1228 | 12-13-2016 05:43 PM |
04-04-2017
03:24 AM
Thanks so much, @Beverley Andalora. I somehow missed this in the docs. 🙂 Per the link you sent, I added the following properties to oozie-site and it's working perfectly:

```
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].groups=*
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].hosts=*
```
04-03-2017
01:39 AM
I'm running a kerberized HDP cluster and am unable to access the Workflow Manager view because the Oozie check is failing. Looking at the Workflow Manager view logs (/var/log/ambari-server/wfmanager-view/wfmanager-view.log), I can see that a null pointer exception is thrown when the view attempts to access the Oozie admin URL:

```
02 Apr 2017 18:33:41,326 INFO [main] [ ] OozieProxyImpersonator:111 - OozieProxyImpersonator initialized for instance: WORKFLOW_MANAGER_VIEW
02 Apr 2017 18:35:13,927 INFO [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieDelegate:149 - Proxy request for url: [GET] http://hdp1.woolford.io:11000/oozie/v1/admin/configuration
02 Apr 2017 18:35:14,163 ERROR [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieProxyImpersonator:484 - Error in GET proxy
java.lang.RuntimeException: java.lang.NullPointerException
```

I'm able to access the Oozie web UI and the admin configuration URL (http://hdp1.woolford.io:11000/oozie/v1/admin/configuration) in a browser, so Oozie itself appears to be working. I'm not sure if it's relevant to this issue, but the Oozie user can impersonate any user from any host (from core-site.xml):

```
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```

Can you see what I'm doing wrong? Is there any other relevant information I should add to this question?
Labels:
- Apache Ambari
- Apache Oozie
03-22-2017
03:46 PM
Thank you, @jfrazee. Per your suggestion (#2), I used HAProxy and it's working perfectly.
03-22-2017
01:59 PM
By convention, syslog listens on port 514, which is a privileged port (i.e. < 1024), meaning that only processes running as root can bind to it. For security reasons, NiFi runs as a non-root user, so the ListenSyslog processor can't listen on port 514. Because port 514 is the standard for syslog, devices don't always have the option to send to a different port; the firewall UI I'm working with, for example, offers no way to change it. If port 514 is used for the `ListenSyslog` processor, the processor is unable to bind the port, and error messages containing `Caused by: java.net.SocketException: Permission denied` show up in /var/log/nifi-app.log. Is there an easy way to configure NiFi so that only ListenSyslog runs with root permissions? Or perhaps a workaround in Linux where messages destined for port 514 are forwarded to port 1514 so they can be picked up by the processor?
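One workaround along those lines, sketched below as an assumption rather than something verified on this cluster, is an iptables NAT rule that redirects inbound traffic arriving on 514 to an unprivileged port such as 1514, where a non-root ListenSyslog processor can bind:

```shell
# Redirect inbound syslog traffic (UDP and TCP) from the privileged
# port 514 to 1514, where a non-root process can listen.
# Must be run as root; rules do not persist across reboots unless saved
# (e.g. with iptables-save / a distro-specific persistence mechanism).
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-port 1514
iptables -t nat -A PREROUTING -p tcp --dport 514 -j REDIRECT --to-port 1514
```

Note that the PREROUTING chain only sees traffic from other hosts, which matches the firewall-to-NiFi case here; ListenSyslog would then be configured to listen on 1514.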
Labels:
- Apache NiFi
03-16-2017
03:32 PM
Thanks @pdarvasi. The CLI tool source code was very helpful for understanding the step that I missed (i.e. role assignment). For some reason, though, the role assignment step is failing:

```
[root@cloudbreak cloudbreak-deployment]# azure role assignment create --objectId 0d49187f-6ca7-4a27-b276-b570c8dcba5a -o Owner -c /subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797 &> $APP_NAME-assign.log
[root@cloudbreak cloudbreak-deployment]# cat awoolford-assign.log
info:    Executing command role assignment create
info:    Finding role with specified name
info:    Creating role assignment
error:   The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
error:   Error information has been recorded to /root/.azure/azure.err
error:   role assignment create command failed
```

The associated error log has a very similar, but more verbose, error:

```
[root@cloudbreak cloudbreak-deployment]# cat /root/.azure/azure.err
2017-03-16T14:59:12.520Z:
{ Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
  at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
  at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
  at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
  at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
  at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
  at emitTwo (events.js:106:13)
  at Request.emit (events.js:191:7)
  at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
  at emitOne (events.js:101:20)
  at Request.emit (events.js:188:7)
  at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
  stack: [Getter/Setter],
  code: 'AuthorizationFailed',
  statusCode: 403,
  requestId: '49bd5570-2c2c-49a7-aead-c30581a158a2',
  __frame:
   { name: '__1',
     line: 73,
     file: '/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js',
     prev: undefined,
     calls: 1,
     active: false,
     offset: 79,
     col: 54 },
  rawStack: [Getter] }
Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
  at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
  at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
  at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
  at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
  at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
  at emitTwo (events.js:106:13)
  at Request.emit (events.js:191:7)
  at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
  at emitOne (events.js:101:20)
  at Request.emit (events.js:188:7)
  at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
```

I'm a bit confused, because I know this works for other people. I'd be surprised if my Azure account was set up with different permissions from my colleagues' - though that's what the error seems to suggest.
03-16-2017
05:12 AM
I'm trying to get Cloudbreak to deploy a cluster on Azure. The first step is to create a set of Azure credentials in Cloudbreak. To do this, it's necessary to create a resource group, storage account, application, and application service principal:

```
# create a resource group in the West US region
azure group create woolford "westus"

# create a storage account in that resource group
azure resource create woolford woolfordstorage "Microsoft.Storage/storageAccounts" "westus" -o "2015-06-15" -p "{\"accountType\": \"Standard_LRS\"}"

# create an application and service principal
azure ad sp create -n awoolford -p Password123
# info:    Executing command ad sp create
# +        Creating application awoolford
# +        Creating service principal for application 2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data:    Object Id:               d14aa306-9d7c-41a5-809b-c27f86167ad5
# data:    Display Name:            awoolford
# data:    Service Principal Names:
# data:                             2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data:                             http://awoolford
# info:    ad sp create command OK
```

Once that was done, I collected all the IDs required by Cloudbreak and created a set of credentials in the Cloudbreak UI:

```
# get the subscription ID
azure account list
# info:    Executing command account list
# data:    Name  Id                                    Current  State
# data:    ----  ------------------------------------  -------  -------
# data:    SE    ********-****-****-****-*********797  true     Enabled

# get the app owner tenant ID
azure account show --json | jq -r '.[0].tenantId'
# b60c9401-2154-40aa-9cff-5e3d1a20085d

# get the storage account key
azure storage account keys list woolfordstorage --resource-group woolford
# info:    Executing command storage account keys list
# +        Getting storage account keys
# data:    Name  Key                                                                                       Permissions
# data:    ----  ----------------------------------------------------------------------------------------  -----------
# data:    key1  a9jeK3iRSgHlGlgiM4HTCVnKPpgt7srFz+WE8bGz7tiUuTfVSjl8jRR/CuA+tQ6yiaNBtkTv3E5yGBsMW1H4Cg==  Full
# data:    key2  ozhjirLlt3pp96lLtrPzaNziPQtfJ0QGiG+ETL9uJgQnM+vrMU/qhzVUa5fhdZ8xa6xItSH/NiImL45zir7KwA==  Full
# info:    storage account keys list command OK
```

When I try to launch the cluster in Cloudbreak, an error is thrown:

```
Cluster Status
{error={code=AuthorizationFailed, message=The client 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' with object id 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' does not have authorization to perform action 'Microsoft.Resources/subscriptions/resourcegroups/write' over scope '/subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797/resourcegroups/woolford-cloudbreak18'.}}
```

It seems that there's a permissions issue in Azure and I'm not sure how to resolve it. Can you see what I'm doing wrong? Any suggestions?
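One step that does not appear in the sequence above is granting the application's service principal a role on the subscription, which is what the `roleAssignments`/`resourcegroups` write errors point at. With the classic Azure xplat CLI, that step would look roughly like the sketch below; the object ID and subscription ID are placeholders, and the account running it needs `Microsoft.Authorization/roleAssignments/write` on the subscription:

```shell
# Grant the service principal the Owner role on the subscription so that
# Cloudbreak can create resource groups and other resources.
# <sp-object-id> and <subscription-id> are placeholders for your own values.
azure role assignment create \
  --objectId <sp-object-id> \
  -o Owner \
  -c /subscriptions/<subscription-id>
```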
Labels:
- Hortonworks Cloudbreak
03-01-2017
05:01 AM
3 Kudos
If you have permissions to query the Hive metastore database directly, you could:

```
[root@hadoop01 ~]# mysql -u hive -p
Enter password:
mysql> USE hive;
Database changed
mysql> SELECT
    ->   name AS db_name,
    ->   tbl_name
    -> FROM TBLS
    -> INNER JOIN DBS
    ->   ON TBLS.DB_ID = DBS.DB_ID;
+----------+-------------------+
| db_name  | tbl_name          |
+----------+-------------------+
| medicaid | physician_compare |
| medicaid | sample            |
| medicaid | sample_orc        |
+----------+-------------------+
3 rows in set (0.00 sec)
```
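If you can't reach the metastore database, a similar listing can be produced from the Hive client itself. This is a sketch that assumes the `hive` CLI is on the PATH and can reach the metastore:

```shell
# List every table in every database by iterating over SHOW DATABASES.
# Output format: <db_name><TAB><tbl_name>
for db in $(hive -e 'SHOW DATABASES;' 2>/dev/null); do
  hive -e "SHOW TABLES IN $db;" 2>/dev/null | while read -r tbl; do
    printf '%s\t%s\n' "$db" "$tbl"
  done
done
```

This spawns one Hive session per database, so it is slower than the metastore query, but it needs no MySQL credentials.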
02-08-2017
03:22 PM
@ed day: if you're trying to access files on the local file system, try adding the `file://` protocol to the path, e.g. `file:///home/ed/...`.
12-15-2016
11:40 PM
1 Kudo
A typical use-case for this sort of data would be to recommend items to a customer based on what similar customers have purchased. In 2001, Amazon introduced item-based collaborative filtering, and it's still popular today. A short, very accessible IEEE paper describes the technique. There's a good practical example of collaborative filtering in the Spark docs: https://spark.apache.org/docs/1.6.2/mllib-collaborative-filtering.html
12-14-2016
07:00 AM
1 Kudo
Let's create two Hive tables: table_a and table_b. table_a contains the column you want to aggregate, and has only one record per id (i.e. the key):

```
hive> CREATE TABLE table_a (
    >   id STRING,
    >   quantity INT
    > );
hive> INSERT INTO table_a VALUES (1, 30);
hive> INSERT INTO table_a VALUES (2, 20);
hive> INSERT INTO table_a VALUES (3, 10);
```

table_b has duplicate id's; note that id=1 appears twice:

```
hive> CREATE TABLE table_b (
    >   id STRING
    > );
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (2);
hive> INSERT INTO table_b VALUES (3);
```

If we aggregate the quantity column in table_a, we see that the total quantity is 60:

```
hive> SELECT
    >   SUM(quantity)
    > FROM table_a;
60
```

If we join table_a and table_b together, you can see that the duplicate keys in table_b have produced four rows, not three:

```
hive> SELECT
    >   *
    > FROM table_a
    > LEFT JOIN table_b
    >   ON table_a.id = table_b.id;
1	30	1
1	30	1
2	20	2
3	10	3
```

Since joins happen before aggregations, when we now aggregate the quantity in table_a, the quantity for id=1 is counted twice:

```
hive> SELECT
    >   SUM(quantity)
    > FROM table_a
    > LEFT JOIN table_b
    >   ON table_a.id = table_b.id;
90
```

I suspect that's what's happening with your query.
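If that is the cause, one common fix (a sketch, not tested against your schema) is to deduplicate table_b in a subquery before joining, so each id in table_a matches at most one row and the total comes back to 60:

```
-- Deduplicate table_b before joining so the quantity for id=1
-- is only counted once.
SELECT SUM(quantity)
FROM table_a
LEFT JOIN (SELECT DISTINCT id FROM table_b) b
  ON table_a.id = b.id;
```

Depending on what your real query pulls from the second table, pre-aggregating it with a GROUP BY may be more appropriate than DISTINCT.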