Member since: 07-05-2016
Posts: 42
Kudos Received: 32
Solutions: 6
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3637 | 02-13-2018 10:56 PM |
| | 1600 | 08-25-2017 05:25 AM |
| | 9615 | 03-01-2017 05:01 AM |
| | 5102 | 12-14-2016 07:00 AM |
| | 1228 | 12-13-2016 05:43 PM |
04-04-2017
03:24 AM
Thanks so much, @Beverley Andalora. I somehow missed this in the docs. 🙂 Per the link you sent, I added the following properties to oozie-site and it's working perfectly:

```
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].groups=*
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].hosts=*
```
04-03-2017
01:39 AM
I'm running a kerberized HDP cluster and am unable to access the Workflow Manager view because the Oozie check is failing. Looking at the Workflow Manager view logs (/var/log/ambari-server/wfmanager-view/wfmanager-view.log), I can see that a null pointer exception is thrown when the view attempts to access the Oozie admin URL:

```
02 Apr 2017 18:33:41,326 INFO [main] [ ] OozieProxyImpersonator:111 - OozieProxyImpersonator initialized for instance: WORKFLOW_MANAGER_VIEW
02 Apr 2017 18:35:13,927 INFO [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieDelegate:149 - Proxy request for url: [GET] http://hdp1.woolford.io:11000/oozie/v1/admin/configuration
02 Apr 2017 18:35:14,163 ERROR [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieProxyImpersonator:484 - Error in GET proxy
java.lang.RuntimeException: java.lang.NullPointerException
```

I'm able to access the Oozie web UI and the admin configuration URL (http://hdp1.woolford.io:11000/oozie/v1/admin/configuration) in a browser, so Oozie itself appears to be working. I'm not sure if it's relevant to this issue, but the Oozie user can impersonate any user from any host (from core-site.xml):

```
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```

Can you see what I'm doing wrong? Is there any other relevant information I should add to this question?
Labels:
- Apache Ambari
- Apache Oozie
03-22-2017
03:46 PM
Thank you, @jfrazee. Per your suggestion (#2), I used HAProxy and it's working perfectly.
03-22-2017
01:59 PM
By convention, syslog listens on port 514, which is a privileged port (i.e. < 1024), meaning that only processes running as root can bind to it. For security reasons, NiFi runs as a non-root user, so the ListenSyslog processor can't listen on port 514. Because port 514 is the standard for syslog, devices don't always have the option to send to a different port; the firewall UI I'm working with, for example, offers no way to change it. If port 514 is used for the `ListenSyslog` processor, the processor is unable to bind the port, and error messages containing `Caused by: java.net.SocketException: Permission denied` show up in /var/log/nifi-app.log. Is there an easy way to configure NiFi so that only ListenSyslog runs with root permissions? Or perhaps a workaround in Linux where messages destined for port 514 are forwarded to port 1514 so they can be picked up by the processor?
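One workaround along those lines, sketched below as an assumption rather than something verified on this cluster, is an iptables NAT rule that redirects inbound traffic arriving on 514 to an unprivileged port such as 1514, where a non-root ListenSyslog processor can bind:

```shell
# Redirect inbound syslog traffic (UDP and TCP) from the privileged
# port 514 to 1514, where a non-root process can listen.
# Must be run as root; rules do not persist across reboots unless saved
# (e.g. with iptables-save / a distro-specific persistence mechanism).
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-port 1514
iptables -t nat -A PREROUTING -p tcp --dport 514 -j REDIRECT --to-port 1514
```

Note that the PREROUTING chain only sees traffic from other hosts, which matches the firewall-to-NiFi case here; ListenSyslog would then be configured to listen on 1514.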
Labels:
- Apache NiFi
03-16-2017
03:32 PM
Thanks @pdarvasi. The CLI tool source code was very helpful for understanding the step that I missed (i.e. role assignment). For some reason, though, the role assignment step is failing:

```
[root@cloudbreak cloudbreak-deployment]# azure role assignment create --objectId 0d49187f-6ca7-4a27-b276-b570c8dcba5a -o Owner -c /subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797 &> $APP_NAME-assign.log
[root@cloudbreak cloudbreak-deployment]# cat awoolford-assign.log
info:    Executing command role assignment create
info:    Finding role with specified name
info:    Creating role assignment
error:   The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
error:   Error information has been recorded to /root/.azure/azure.err
error:   role assignment create command failed
```

The associated error log has a very similar, but more verbose, error:

```
[root@cloudbreak cloudbreak-deployment]# cat /root/.azure/azure.err
2017-03-16T14:59:12.520Z:
{ Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
  at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
  at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
  at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
  at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
  at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
  at emitTwo (events.js:106:13)
  at Request.emit (events.js:191:7)
  at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
  at emitOne (events.js:101:20)
  at Request.emit (events.js:188:7)
  at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
  stack: [Getter/Setter],
  code: 'AuthorizationFailed',
  statusCode: 403,
  requestId: '49bd5570-2c2c-49a7-aead-c30581a158a2',
  __frame:
   { name: '__1',
     line: 73,
     file: '/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js',
     prev: undefined,
     calls: 1,
     active: false,
     offset: 79,
     col: 54 },
  rawStack: [Getter] }
Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
  at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
  at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
  at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
  at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
  at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
  at emitTwo (events.js:106:13)
  at Request.emit (events.js:191:7)
  at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
  at emitOne (events.js:101:20)
  at Request.emit (events.js:188:7)
  at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
```

I'm a bit confused, because I know this works for other people. I'd be surprised if my Azure account was set up with different permissions from my colleagues' - though that's what the error seems to suggest.
03-16-2017
05:12 AM
I'm trying to get Cloudbreak to deploy a cluster on Azure. The first step is to create a set of Azure credentials in Cloudbreak. To do this, it's necessary to create a resource group, storage account, application, and application service principal:

```
# create a resource group in the West US region
azure group create woolford "westus"

# create a storage account in that resource group
azure resource create woolford woolfordstorage "Microsoft.Storage/storageAccounts" "westus" -o "2015-06-15" -p "{\"accountType\": \"Standard_LRS\"}"

# create an application and service principal
azure ad sp create -n awoolford -p Password123
# info:    Executing command ad sp create
# +        Creating application awoolford
# +        Creating service principal for application 2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data:    Object Id:               d14aa306-9d7c-41a5-809b-c27f86167ad5
# data:    Display Name:            awoolford
# data:    Service Principal Names:
# data:                             2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data:                             http://awoolford
# info:    ad sp create command OK
```

Once that was done, I collected all the IDs required by Cloudbreak and created a set of credentials in the Cloudbreak UI:

```
# get the subscription ID
azure account list
# info:    Executing command account list
# data:    Name  Id                                    Current  State
# data:    ----  ------------------------------------  -------  -------
# data:    SE    ********-****-****-****-*********797  true     Enabled

# get the app owner tenant ID
azure account show --json | jq -r '.[0].tenantId'
# b60c9401-2154-40aa-9cff-5e3d1a20085d

# get the storage account key
azure storage account keys list woolfordstorage --resource-group woolford
# info:    Executing command storage account keys list
# +        Getting storage account keys
# data:    Name  Key                                                                                       Permissions
# data:    ----  ----------------------------------------------------------------------------------------  -----------
# data:    key1  a9jeK3iRSgHlGlgiM4HTCVnKPpgt7srFz+WE8bGz7tiUuTfVSjl8jRR/CuA+tQ6yiaNBtkTv3E5yGBsMW1H4Cg==  Full
# data:    key2  ozhjirLlt3pp96lLtrPzaNziPQtfJ0QGiG+ETL9uJgQnM+vrMU/qhzVUa5fhdZ8xa6xItSH/NiImL45zir7KwA==  Full
# info:    storage account keys list command OK
```

When I try to launch the cluster in Cloudbreak, an error is thrown:

```
Cluster Status
{error={code=AuthorizationFailed, message=The client 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' with object id 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' does not have authorization to perform action 'Microsoft.Resources/subscriptions/resourcegroups/write' over scope '/subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797/resourcegroups/woolford-cloudbreak18'.}}
```

It seems that there's a permissions issue in Azure and I'm not sure how to resolve it. Can you see what I'm doing wrong? Any suggestions?
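One step that does not appear in the sequence above is granting the application's service principal a role on the subscription, which is what the `roleAssignments`/`resourcegroups` write errors point at. With the classic Azure xplat CLI, that step would look roughly like the sketch below; the object ID and subscription ID are placeholders, and the account running it needs `Microsoft.Authorization/roleAssignments/write` on the subscription:

```shell
# Grant the service principal the Owner role on the subscription so that
# Cloudbreak can create resource groups and other resources.
# <sp-object-id> and <subscription-id> are placeholders for your own values.
azure role assignment create \
  --objectId <sp-object-id> \
  -o Owner \
  -c /subscriptions/<subscription-id>
```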
Labels:
- Hortonworks Cloudbreak
03-01-2017
05:01 AM
3 Kudos
If you have permissions to query the Hive metastore database directly, you could:

```
[root@hadoop01 ~]# mysql -u hive -p
Enter password:
mysql> USE hive;
Database changed
mysql> SELECT
    ->   name AS db_name,
    ->   tbl_name
    -> FROM TBLS
    -> INNER JOIN DBS
    ->   ON TBLS.DB_ID = DBS.DB_ID;
+----------+-------------------+
| db_name  | tbl_name          |
+----------+-------------------+
| medicaid | physician_compare |
| medicaid | sample            |
| medicaid | sample_orc        |
+----------+-------------------+
3 rows in set (0.00 sec)
```
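If you can't reach the metastore database, a similar listing can be produced from the Hive client itself. This is a sketch that assumes the `hive` CLI is on the PATH and can reach the metastore:

```shell
# List every table in every database by iterating over SHOW DATABASES.
# Output format: <db_name><TAB><tbl_name>
for db in $(hive -e 'SHOW DATABASES;' 2>/dev/null); do
  hive -e "SHOW TABLES IN $db;" 2>/dev/null | while read -r tbl; do
    printf '%s\t%s\n' "$db" "$tbl"
  done
done
```

This spawns one Hive session per database, so it is slower than the metastore query, but it needs no MySQL credentials.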
02-08-2017
03:22 PM
@ed day: if you're trying to access files on the local file system, try adding the `file://` protocol to the path, e.g. `file:///home/ed/...`.
12-15-2016
11:40 PM
1 Kudo
A typical use-case for this sort of data would be to recommend items to a customer based on what similar customers have purchased. In 2001, Amazon introduced item-based collaborative filtering, and it's still popular today. A short, very accessible IEEE paper describes the technique. There's a good practical example of collaborative filtering in the Spark docs: https://spark.apache.org/docs/1.6.2/mllib-collaborative-filtering.html
12-14-2016
07:00 AM
1 Kudo
Let's create two Hive tables: table_a and table_b. table_a contains the column you want to aggregate, and has only one record per id (i.e. the key):

```
hive> CREATE TABLE table_a (
    >   id STRING,
    >   quantity INT
    > );
hive> INSERT INTO table_a VALUES (1, 30);
hive> INSERT INTO table_a VALUES (2, 20);
hive> INSERT INTO table_a VALUES (3, 10);
```

table_b has duplicate id's; note that id=1 appears twice:

```
hive> CREATE TABLE table_b (
    >   id STRING
    > );
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (2);
hive> INSERT INTO table_b VALUES (3);
```

If we aggregate the quantity column in table_a, we see that the total quantity is 60:

```
hive> SELECT
    >   SUM(quantity)
    > FROM table_a;
60
```

If we join table_a and table_b together, you can see that the duplicate keys in table_b have produced four rows, not three:

```
hive> SELECT
    >   *
    > FROM table_a
    > LEFT JOIN table_b
    >   ON table_a.id = table_b.id;
1	30	1
1	30	1
2	20	2
3	10	3
```

Since joins happen before aggregations, when we now aggregate the quantity in table_a, the quantity for id=1 is counted twice:

```
hive> SELECT
    >   SUM(quantity)
    > FROM table_a
    > LEFT JOIN table_b
    >   ON table_a.id = table_b.id;
90
```

I suspect that's what's happening with your query.
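If that is the cause, one common fix (a sketch, not tested against your schema) is to deduplicate table_b in a subquery before joining, so each id in table_a matches at most one row and the total comes back to 60:

```
-- Deduplicate table_b before joining so the quantity for id=1
-- is only counted once.
SELECT SUM(quantity)
FROM table_a
LEFT JOIN (SELECT DISTINCT id FROM table_b) b
  ON table_a.id = b.id;
```

Depending on what your real query pulls from the second table, pre-aggregating it with a GROUP BY may be more appropriate than DISTINCT.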