Member since: 07-05-2016
Posts: 42
Kudos Received: 32
Solutions: 6

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1944 | 02-13-2018 10:56 PM |
| | 1050 | 08-25-2017 05:25 AM |
| | 6662 | 03-01-2017 05:01 AM |
| | 3580 | 12-14-2016 07:00 AM |
| | 463 | 12-13-2016 05:43 PM |
02-14-2018
12:12 AM
1 Kudo
You might start by using the `logger` command to send some sample syslog messages. Don't forget to add the `--port 1514` argument. Try running that on the Nifi host, and then on a host that's external to your Nifi cluster. If it works from a Nifi host but not from outside Nifi, you might need to tweak iptables or a firewall rule. You might try using tcpdump to monitor network traffic for port 1514. I'd also recommend running a `tail -f /var/log/nifi/nifi-app.log` on the Nifi host(s) while you're running the syslog listener to see if there are any interesting messages.
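For example, here's a minimal test sequence (the hostname is hypothetical, and the log path assumes a default install):
# send a test message to the listener (assumes ListenSyslog is configured for UDP on port 1514)
logger --udp --server nifi01.example.com --port 1514 "syslog test from $(hostname)"
# on the Nifi host, confirm the packets are actually arriving
tcpdump -i any -nn port 1514
# and watch the Nifi application log for listener activity
tail -f /var/log/nifi/nifi-app.log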
02-13-2018
10:56 PM
Port 514 is a privileged port, which means only a superuser can bind to it. Since there are security implications to running Nifi as root, it is typically run as the nifi user. There are a few options (sketched below):
- run the syslog listener on a port > 1024, e.g. port 1514 instead of 514.
- use iptables to forward the external port 514 to a non-privileged internal port, and have the syslog listener listen on that port.
- use authbind to give the nifi user permission to bind to port 514.
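A rough sketch of the second and third options (rule and path details are assumptions; adjust for your distribution):
# iptables: redirect inbound syslog traffic from port 514 to 1514 (UDP shown; add a TCP rule if needed)
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-ports 1514
# note: locally generated traffic isn't affected by PREROUTING; it would need a similar OUTPUT rule

# authbind: allow the nifi user to bind port 514 directly
touch /etc/authbind/byport/514
chown nifi /etc/authbind/byport/514
chmod 500 /etc/authbind/byport/514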
10-17-2017
03:59 AM
1 Kudo
In Cloudbreak, there are two ways to launch clusters on Azure:
- interactive login: requires admin or co-admin credentials on Azure. I don't have these permissions.
- app based: can deploy a cluster using an existing 'Contributor' role.
Cloudbreak requires the following attributes in order to launch a cluster using the app based method: subscription id, tenant id, app id, and password. Here's what we did to get them:
# login
az login
# create resource group
az group create --name woolford --location westus
# subscription ID
az account show | jq -r '.id'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx797
# tenant ID
az account show | jq -r '.tenantId'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx85d
# create an application
az ad app create --display-name woolford --homepage https://woolford.azurehdinsight.net --identifier-uris https://woolford.azurehdinsight.net --password myS3cret!
# get the application ID
az ad app list --query "[?displayName=='woolford']" | jq -r '.[0].appId'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxa31
We tried to deploy a cluster with Cloudbreak and received the following error:
Failed to verify the credential: Status code 401, {"error":{"code":"InvalidAuthenticationToken","message":"The received access token is not valid: at least one of the claims 'puid' or 'altsecid' or 'oid' should be present. If you are accessing as application please make sure service principal is properly created in the tenant."}}
We then attempted to create the service principal:
az ad sp create-for-rbac --name woolford --password "myS3cret!" --role Owner (same outcome for --role Contributor)
... and received the following error:
role assignment response headers: {'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'x-ms-failure-cause': 'gateway', 'x-ms-request-id': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxe01', 'x-ms-correlation-request-id': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxe01', 'x-ms-routing-request-id': 'EASTUS:20171017T025354Z:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxe01', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'Date': 'Tue, 17 Oct 2017 02:53:53 GMT', 'Connection': 'close', 'Content-Length': '305'}
The client 'awoolford@hortonworks.com' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxb67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/7xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx797'.
Can you see what we're doing wrong? Is it possible to create a service principal for an application that I created (if I'm not an admin or co-admin)? If so, how?
Labels:
- Hortonworks Cloudbreak
08-25-2017
05:25 AM
1 Kudo
Zeppelin stores a lot of its settings in interpreter.json. The default dpi (dots per inch) for R plots is 72, hence the blurry plots. This value can be increased by adding a dpi property to Zeppelin's R render options. Search for the "zeppelin.R.render.options" key and add "dpi=300":
"zeppelin.R.render.options": "out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, dpi=300",
After editing the setting, restart the R interpreter so the change takes effect. You can see an example of the output below:
08-24-2017
11:22 PM
1 Kudo
R's ggplot2 is a popular and versatile data visualization package.
Facet plots are a particularly useful way to identify characteristics and anomalies. We can break a scatter plot into facets simply by adding + facet_wrap(~myVar) to an existing plot. Here's an example plot (screenshot from RStudio):
Zeppelin supports the R interpreter. Provided R is installed, we can run R commands inside a notebook cell:
In this case, we ran a Hive query and created a facet plot.
The question: it looks to me like Zeppelin has created a raster image that's been upscaled and therefore looks fuzzy. In RStudio, the plots look a lot crisper. I notice that there's an open JIRA for this: https://issues.apache.org/jira/browse/ZEPPELIN-1445 Is there a way to make ggplots look crisp in Zeppelin? Is there a way to render plots as PDFs, i.e. a vector format that doesn't get blurry when scaled, and then display those PDFs inside the Zeppelin notebook?
Labels:
- Apache Zeppelin
08-21-2017
04:05 PM
6 Kudos
haveibeenpwned has downloadable files that contain about 320 million password hashes that have been involved in known data breaches. The site has a search feature that allows you to check whether a password exists in the list of known breached passwords. From a security perspective, entering passwords into a public website is a very bad idea. Thankfully, the downloadable files make it possible to perform this analysis offline. Fast random access of a dataset that contains hundreds of millions of records is a great fit for HBase: queries execute in a few milliseconds. In the example below, we'll load the data into HBase. We'll then use a few lines of Python to convert passwords into SHA-1 hashes and query HBase to see if they exist in the pwned list.
On a cluster node, download the files:
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-1.0.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-1.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-2.txt.7z
The files are in 7zip format which, on CentOS, can be unzipped with 7za:
7za x pwned-passwords-1.0.txt.7z
7za x pwned-passwords-update-1.txt.7z
7za x pwned-passwords-update-2.txt.7z
Unzipped, the raw data looks like this:
[hdfs@hdp01 ~]$ head -n 3 pwned-passwords-1.0.txt
00000016C6C075173C163757BCEA8139D4CC69CF
00000042F053B3F16733DFB83D431126D64331FC
000003449AD45B0DB016B895EC6CEA92EA2F91BE
Note that the hashes are in all caps. Now we create an HDFS location for these files and upload them:
hdfs dfs -mkdir /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-1.0.txt /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-update-1.txt /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-update-2.txt /data/pwned-hashes
We can then create an external Hive table:
CREATE EXTERNAL TABLE pwned_hashes (
sha1 STRING
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/pwned-hashes';
Hive has storage handlers that enable us to use familiar SQL syntax while benefiting from the characteristics of the underlying storage technology. In this case, we'll create an HBase-backed Hive table:
CREATE TABLE `pwned_hashes_hbase` (
`sha1` string,
`hash_exists` boolean)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,hash_exists:hash_exists',
'serialization.format'='1')
TBLPROPERTIES (
'hbase.mapred.output.outputtable'='pwned_hashes',
'hbase.table.name'='pwned_hashes')
Note the second column, 'hash_exists', in the HBase-backed table. It's necessary to do this because HBase is a columnar database and cannot return just a rowkey. Now we can simply insert the data into the HBase table using Hive:
INSERT INTO pwned_hashes_hbase SELECT sha1, true FROM pwned_hashes;
To query this HBase table from Python, there's an easy-to-use HBase library called HappyBase that relies on the Thrift protocol. To use it, it's necessary to start Thrift:
/usr/hdp/2.6.1.0-129/hbase/bin/hbase-daemon.sh start thrift -p 9090 --infoport 9095
We wrote a small Python function that takes a password, converts it to an (upper case) SHA-1 hash, and then checks the HBase `pwned_hashes` table to see if it exists:
import happybase
import hashlib
def pwned_check(password):
    connection = happybase.Connection(host='hdp01.woolford.io', port=9090)
    table = connection.table('pwned_hashes')
    sha1 = hashlib.sha1(password).hexdigest().upper()
    row = table.row(sha1)
    if row:
        return True
    else:
        return False
For example:
>>> pwned_check('G0bbleG0bble')
True
>>> pwned_check('@5$~ lPaQ5<.`')
False
For folks who prefer Java, we also created a RESTful 'pwned-check' service using Spring Boot: https://github.com/alexwoolford/pwned-check
We were surprised to find some of our own hard-to-guess passwords in this dataset. Thanks to @Timothy Spann for identifying the haveibeenpwned datasource. This was a fun micro-project.
Tags:
- CyberSecurity
- HBase
- How-ToTutorial
- leak
- password
08-11-2017
01:42 AM
1 Kudo
I'm curious to know how the Avro data was serialized. I suspect you're experiencing the same issue as me (see https://community.hortonworks.com/questions/114646/sam-application-unknown-protocol-id-12-received-wh.html) and possibly @Brad Penelli (see https://community.hortonworks.com/questions/114758/sam-application-kafka-source-fails.html).
07-24-2017
04:44 PM
1 Kudo
I created a minimal SAM application that reads Avro messages from Kafka and writes them to Druid. The Avro schema for the data in the Kafka topic was previously added to the schema registry. When I run the topology, the following error is thrown:
com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [12] received while deserializing the payload
at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:75)
at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:32)
at com.hortonworks.registries.schemaregistry.serde.AbstractSnapshotDeserializer.deserialize(AbstractSnapshotDeserializer.java:145)
at com.hortonworks.streamline.streams.runtime.storm.spout.AvroKafkaSpoutTranslator.apply(AvroKafkaSpoutTranslator.java:61)
at org.apache.storm.kafka.spout.KafkaSpout.emitTupleIfNotEmitted(KafkaSpout.java:335)
at org.apache.storm.kafka.spout.KafkaSpout.emit(KafkaSpout.java:316)
at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:236)
at org.apache.storm.daemon.executor$fn__5136$fn__5151$fn__5182.invoke(executor.clj:647)
at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:745)
I took a peek at the code that's throwing the SerDesException. It seems that the first byte of the Avro inputstream is supposed to contain the protocol version/id:
protected byte retrieveProtocolId(InputStream inputStream) throws SerDesException {
// first byte is protocol version/id.
// protocol format:
// 1 byte : protocol version
byte protocolId;
try {
protocolId = (byte) inputStream.read();
} catch (IOException e) {
throw new SerDesException(e);
}
if (protocolId == -1) {
throw new SerDesException("End of stream reached while trying to read protocol id");
}
checkProtocolHandlerExists(protocolId);
return protocolId;
}
private void checkProtocolHandlerExists(byte protocolId) {
if (SerDesProtocolHandlerRegistry.get().getSerDesProtocolHandler(protocolId) == null) {
throw new SerDesException("Unknown protocol id [" + protocolId + "] received while deserializing the payload");
}
}
The first byte of the Avro inputstream appears to be a form-feed character (ASCII code 12). Looking at the registry metastore, the only ID that exists is a 2:
mysql> SELECT id, type, schemaGroup, name FROM registry.schema_metadata_info;
+----+------+-------------+----------------------+
| id | type | schemaGroup | name |
+----+------+-------------+----------------------+
| 2 | avro | Kafka | temperature_humidity |
+----+------+-------------+----------------------+
1 row in set (0.00 sec)
I don't understand how the first byte of the Avro byte array could contain the ID for the schema registry unless it were created with a schema registry aware serializer. Can you see what I'm doing wrong?
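For reference, here's a hedged way to peek at the leading bytes of a message straight off the topic (the broker address, port, and install path are assumptions for an HDP install):
# read one message from the beginning of the topic and dump its bytes as unsigned decimals;
# the first byte printed is what the deserializer treats as the protocol id (12, i.e. form feed, in this case)
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka01.example.com:6667 \
  --topic temperature_humidity \
  --from-beginning --max-messages 1 | od -An -tu1 | head -1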
Labels:
- Schema Registry
06-06-2017
02:15 PM
HBase has a convenient REST service. For example, if we create some records in an HBase table:
$ hbase shell
hbase(main):001:0> create 'profile', 'demographics'
hbase(main):002:0> put 'profile', 1234, 'demographics:age', 42
hbase(main):003:0> put 'profile', 1234, 'demographics:gender', 'F'
hbase(main):004:0> put 'profile', 2345, 'demographics:age', 8
hbase(main):005:0> put 'profile', 2345, 'demographics:gender', 'M'
hbase(main):006:0> scan 'profile'
ROW COLUMN+CELL
1234 column=demographics:age, timestamp=1496754873362, value=42
1234 column=demographics:gender, timestamp=1496754880025, value=F
2345 column=demographics:age, timestamp=1496754886334, value=8
2345 column=demographics:gender, timestamp=1496754891898, value=M
... and start the HBase REST service:
[root@hdp03 ~]# hbase rest start
We can retrieve the values by making calls to the HBase REST service:
$ curl 'http://hdp03.woolford.io:8080/profile/1234' -H "Accept: application/json"
{
"Row": [{
"key": "MTIzNA==",
"Cell": [{
"column": "ZGVtb2dyYXBoaWNzOmFnZQ==",
"timestamp": 1496754873362,
"$": "NDI="
}, {
"column": "ZGVtb2dyYXBoaWNzOmdlbmRlcg==",
"timestamp": 1496754880025,
"$": "Rg=="
}]
}]
}
I notice that the HBase column names and cell values returned by the HBase REST service are base64 encoded:
$ python
>>> import base64
>>> base64.b64decode("MTIzNA==")
'1234'
>>> base64.b64decode("ZGVtb2dyYXBoaWNzOmFnZQ==")
'demographics:age'
>>> base64.b64decode("NDI=")
'42'
>>> base64.b64decode("ZGVtb2dyYXBoaWNzOmdlbmRlcg==")
'demographics:gender'
>>> base64.b64decode("Rg==")
'F'
That's great for machine-to-machine communication, e.g. a webservice, but isn't very user-friendly since base64 isn't human readable. Is there a simple way (e.g. header parameter, HBase property) to make the HBase REST service return human-readable JSON? I realize I could write my own service, but I'd rather re-use existing code/functionality if possible.
Labels:
- Apache HBase
04-05-2017
03:49 AM
For private cloud, Cloudbreak can deploy clusters on OpenStack or Mesos. There's currently no option to deploy directly on ESXi. That's a good question, and it's something I've wanted myself.
04-05-2017
03:16 AM
show partitions mytable;
Note: if you have more than 500 partitions, you may want to write the output to a file:
$ hive -e 'show partitions mytable;' > partitions
ref: http://stackoverflow.com/questions/15616290/hive-how-to-show-all-partitions-of-a-table
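If you only care about a subset of partitions, SHOW PARTITIONS also accepts a partition filter (the partition column name below is hypothetical):
$ hive -e "show partitions mytable partition(dt='2017-04-04');"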
04-04-2017
03:24 AM
Thanks so much, @Beverley Andalora. I somehow missed this in the docs. 🙂 Per the link you sent, I added the following properties to oozie-site and it's working perfectly:
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].groups=*
oozie.service.ProxyUserService.proxyuser.[ambari-server-cl1].hosts=*
04-03-2017
01:39 AM
I'm running a kerberized HDP cluster and am unable to access the Workflow Manager view because the Oozie check is failing. Looking at the Workflow Manager view logs (/var/log/ambari-server/wfmanager-view/wfmanager-view.log), I can see that a null pointer exception is thrown when the Oozie admin URL is accessed:
02 Apr 2017 18:33:41,326 INFO [main] [ ] OozieProxyImpersonator:111 - OozieProxyImpersonator initialized for instance: WORKFLOW_MANAGER_VIEW
02 Apr 2017 18:35:13,927 INFO [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieDelegate:149 - Proxy request for url: [GET] http://hdp1.woolford.io:11000/oozie/v1/admin/configuration
02 Apr 2017 18:35:14,163 ERROR [ambari-client-thread-85] [WORKFLOW_MANAGER 1.0.0 WORKFLOW_MANAGER_VIEW] OozieProxyImpersonator:484 - Error in GET proxy
java.lang.RuntimeException: java.lang.NullPointerException
I'm able to access the Oozie web UI and the admin configuration URL (http://hdp1.woolford.io:11000/oozie/v1/admin/configuration) in a browser, so it appears that Oozie is working. I'm not sure if it's relevant to this issue, but the Oozie user can impersonate any user from any host (from core-site.xml):
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
Can you see what I'm doing wrong? Is there any other relevant information I should add to this question?
Labels:
- Apache Ambari
- Apache Oozie
03-22-2017
03:46 PM
Thank you, @jfrazee. Per your suggestion (#2), I used HAproxy and it's working perfectly.
03-22-2017
01:59 PM
By convention, syslog listens on port 514, which is a privileged port (i.e. < 1024), meaning that only processes running as root can bind to it. For security reasons, Nifi runs as a non-root user and so the ListenSyslog processor can't listen on port 514. Because port 514 is the standard for syslog, devices don't always have the option to output to a different port, e.g. here's a screenshot from a firewall UI: If port 514 is used for the `ListenSyslog` processor, the processor is unable to bind the port and error messages containing `Caused by: java.net.SocketException: Permission denied` show up in /var/log/nifi-app.log. Is there an easy way to configure Nifi so that only ListenSyslog runs with root permissions? Or perhaps a workaround in Linux where messages destined for port 514 are forwarded to port 1514 so they can be picked up by the processor?
Labels:
- Apache NiFi
03-16-2017
03:32 PM
Thanks @pdarvasi. The CLI tool source code was very helpful to understand the step that I missed (i.e. role assignment). For some reason, the role assignment step is failing, e.g.
[root@cloudbreak cloudbreak-deployment]# azure role assignment create --objectId 0d49187f-6ca7-4a27-b276-b570c8dcba5a -o Owner -c /subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797 &> $APP_NAME-assign.log
[root@cloudbreak cloudbreak-deployment]# cat awoolford-assign.log
info: Executing command role assignment create
info: Finding role with specified name
info: Creating role assignment
error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
error: Error information has been recorded to /root/.azure/azure.err
error: role assignment create command failed
The associated error log has a very similar, but more verbose, error:
[root@cloudbreak cloudbreak-deployment]# cat /root/.azure/azure.err
2017-03-16T14:59:12.520Z:
{ Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
at emitTwo (events.js:106:13)
at Request.emit (events.js:191:7)
at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
at emitOne (events.js:101:20)
at Request.emit (events.js:188:7)
at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
stack: [Getter/Setter],
code: 'AuthorizationFailed',
statusCode: 403,
requestId: '49bd5570-2c2c-49a7-aead-c30581a158a2',
__frame:
{ name: '__1',
line: 73,
file: '/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js',
prev: undefined,
calls: 1,
active: false,
offset: 79,
col: 54 },
rawStack: [Getter] }
Error: The client 'awoolford@hortonworks.com' with object id '7d18df3a-d9fc-41cf-902e-2fc26a7f0b67' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/********-****-****-****-*********797'.
<<< async stack >>>
at __1 (/usr/lib/node_modules/azure-cli/lib/commands/arm/role/role.assignment.js:152:55)
<<< raw stack >>>
at Function.ServiceClient._normalizeError (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/serviceclient.js:814:23)
at /usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/services/filters/errorhandlingfilter.js:44:29
at Request._callback (/usr/lib/node_modules/azure-cli/node_modules/azure-common/lib/http/request-pipeline.js:109:14)
at Request.self.callback (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:187:22)
at emitTwo (events.js:106:13)
at Request.emit (events.js:191:7)
at Request.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:1044:10)
at emitOne (events.js:101:20)
at Request.emit (events.js:188:7)
at IncomingMessage.<anonymous> (/usr/lib/node_modules/azure-cli/node_modules/request/request.js:965:12)
I'm a bit confused, because I know this works for other people. I'd be surprised if my Azure account was set up with different permissions from my colleagues - though that's what the error seems to suggest.
03-16-2017
05:12 AM
I'm trying to get Cloudbreak to deploy a cluster on Azure. The first step is to create a set of Azure credentials in Cloudbreak. To do this, it's necessary to create a resource group, storage account, application, and application service principal:
# create a resource group in the West US region
azure group create woolford "westus"
# create a storage account in that resource group
azure resource create woolford woolfordstorage "Microsoft.Storage/storageAccounts" "westus" -o "2015-06-15" -p "{\"accountType\": \"Standard_LRS\"}"
# create an application and service principal
azure ad sp create -n awoolford -p Password123
# info: Executing command ad sp create
# + Creating application awoolford
# + Creating service principal for application 2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data: Object Id: d14aa306-9d7c-41a5-809b-c27f86167ad5
# data: Display Name: awoolford
# data: Service Principal Names:
# data: 2a105e3d-f330-4a6f-b5e3-57de672e91c1
# data: http://awoolford
# info: ad sp create command OK
Once this is done, I collected all the IDs required by Cloudbreak and created a set of credentials in the Cloudbreak UI:
# get the subscription ID
azure account list
# info: Executing command account list
# data: Name Id Current State
# data: ------------- ------------------------------------ ------- --------
# data: SE ********-****-****-****-*********797 true Enabled
# get the app owner tenant ID
azure account show --json | jq -r '.[0].tenantId'
# b60c9401-2154-40aa-9cff-5e3d1a20085d
# get the storage account key
azure storage account keys list woolfordstorage --resource-group woolford
# info: Executing command storage account keys list
# + Getting storage account keys
# data: Name Key Permissions
# data: ---- ---------------------------------------------------------------------------------------- -----------
# data: key1 a9jeK3iRSgHlGlgiM4HTCVnKPpgt7srFz+WE8bGz7tiUuTfVSjl8jRR/CuA+tQ6yiaNBtkTv3E5yGBsMW1H4Cg== Full
# data: key2 ozhjirLlt3pp96lLtrPzaNziPQtfJ0QGiG+ETL9uJgQnM+vrMU/qhzVUa5fhdZ8xa6xItSH/NiImL45zir7KwA== Full
# info: storage account keys list command OK
When I try to launch the cluster in Cloudbreak an error is thrown:
Cluster Status
{error={code=AuthorizationFailed, message=The client 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' with object id 'bbd3275e-34ba-4614-94a7-4ed09cc0f3aa' does not have authorization to perform action 'Microsoft.Resources/subscriptions/resourcegroups/write' over scope '/subscriptions/7d204bd6-841e-43fb-8638-c5eedf2ea797/resourcegroups/woolford-cloudbreak18'.}}
It seems that there's a permissions issue in Azure and I'm not sure how to resolve it. Can you see what I'm doing wrong? Any suggestions?
Labels:
- Hortonworks Cloudbreak
03-01-2017
05:01 AM
3 Kudos
If you have permissions to query the Hive metastore database directly, you could:
[root@hadoop01 ~]# mysql -u hive -p
Enter password:
mysql> USE hive;
Database changed
mysql> SELECT
-> name AS db_name,
-> tbl_name
-> FROM TBLS
-> INNER JOIN DBS
-> ON TBLS.DB_ID = DBS.DB_ID;
+----------+-------------------+
| db_name | tbl_name |
+----------+-------------------+
| medicaid | physician_compare |
| medicaid | sample |
| medicaid | sample_orc |
+----------+-------------------+
3 rows in set (0.00 sec)
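If you can't reach the metastore database directly, a rough alternative is to enumerate the tables from the Hive CLI itself (slower, since it launches a CLI session per database):
# list every table as db.table using the Hive CLI in silent mode
for db in $(hive -S -e 'show databases;'); do
  hive -S -e "show tables in $db;" | sed "s/^/$db./"
done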
02-08-2017
03:22 PM
@ed day: if you're trying to access files on the local file system, try adding the `file://` protocol to the path, e.g. `file:///home/ed/...`.
01-28-2017
01:46 AM
If you drill down into the word 'Failed' in the status column, you should see a more specific error message. What does it say? Also, your RSA key looks strange to me. Here's an example of an RSA key that's very similar to mine:
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAw9WR5oDCrSYhqzhjhBzV1SlQ54Pgpl6goScsvInn3uNJqV8E
d1yx2kyXARkfS+lya97UlHKS2DcU/AeX6AfCGPXdhz25j0CgZKM4lzgv7QxXY/mc
nQToRJA4yCLRcde3XiLn5MZCO0f8+czRz0OkSoBav+5MkcbiH3YvLXai4EzZuxu7
G00KP6icU1xdDQX9WSAww0URR8TGgsv1oZHxzMOgV1HuxdymZqs0OD0s2FJH6JkH
mGtqvXAMsbEDRhFqX3xDhgjglUZGGdlt4KLgQwR+Ktkem4FoJHJmJ8hdsHRYRfJc
UvvwOA5owVQPQqftXIcvP777SVSXPpK93BtuQQIDAQABAoIBAQCo6Top1d+UVzJt
K3ryhaiObk+BEQegmDf2KAL3L/+WCPcNJo6EoagpwSvx34hWAqoVjqJO1DACXCg2
ZhpJIP/yZYbI0p2NiNGPXBVAoU79KErsSW0jJgtsr/S58wYyKjzX7kWT1sljtmjl
0RsaqZ44QFOF/nV+u0tolZiFnzFHQp+nJCkXLihsElsgPRoCVcEAEMdm97gxgiCJ
y0cIkWIeHnG4fxPEEThcVPSsGBes7ZvqnubjdVRyw+iCz+rt6gGdGK2OZvJup/tF
ZpZPobm/WkXZZm8kSQjQUspbwCxCAfKhAAyH12R3KDHU5gWMqR4Sj19HsrhxMa+b
lN6nJsJBAoGBAOTAajhDLf3t38PCuzF0/j4M5TLn9TUwqpbEuXOiaFrv2UJGK3BQ
KD3EfGmfV2tagU0afHZ5Y10NQNdEpCcXaRFoGH8CHCAHnFb0kPvzTSU2qmZcTahf
d1yERuFAThmRAkiEXcyR6kqh09rn9i0uEFsX2p8BKiR/G4A17wE1jtwZAoGBANsp
Xj92itzdldruSxqKegMamtCIlEDJi4Pkg39ynQ6oIAoCDofm1Z3V4dn3wk0CqcLL
2j/M35NNHp95TaHdlCDnXY9qKHu+k92YQtN6Ba1+QFrX/r/R5Gfxni8CpNhEt3sU
g2m+j3Ex8JXJpLPesjvRvEP7yA4AQ8PpKF3XAWhpAoGBAN1T1vJM/cj9SU4tsdUu
b3g2HeVdTYGBbuqluRHLB9FE1B8tqYXn6Keq3v2LMJgsX4Lsp5Qx6xPzaNNgFLvG
CODQqTLqJbBP7NKtm0JLrE7fT4vuryzEAcdALRvI7RGLnnvvppnybJB9d3AMk8Iv
GaApuluyUsYxPbiVdoTi/zCxAoGADhoeV2UQUF/tuZWlvYJ1kWeP2KVBLN4LHSSC
FZxRYNUOorY5KyN+UVam3rijhwMJ21/0njBXnonS054hka3JT0iz63uAOV4s85BN
lIAAh4ZdK7tEGObBLfPhItM/ui7Jw6CxSAecAUOeYHUGJRDKVTEMtS8pU0VPFvcU
wt0H2SkCgYEAsPmBYiatI5rDhdpbmOezMFm3RgkgZFpxSPxpbNKF2XuEq4npE6yS
m+eeB670sGNo9AxyEz09+BxMmIg/MPmrPew4/ki00Lv0lNsxyOLWkDvgeDNC0HCd
2dygqZCGGimVIikmgH29yetab6rDLMfD/fZirBOj+PvZpa2r1NrcYy8=
-----END RSA PRIVATE KEY-----
From the Ambari node, you should be able to `ssh localhost`. If you see a message like this:
[root@hadoop01 ~]# ssh localhost
The authenticity of host 'localhost (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is 78:42:69:fe:e5:90:22:5c:cb:a8:f7:74:18:9c:be:ab.
Are you sure you want to continue connecting (yes/no)?
... then say "yes" and then retry the installation from Ambari.
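If you need to (re)generate the key pair used for host registration, here's a minimal sketch (the target hostname is hypothetical):
# on the Ambari server, as root: create a passwordless RSA key pair
ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/id_rsa
# push the public key to every host that Ambari will register
ssh-copy-id root@hadoop02.example.com
# then paste the contents of ~/.ssh/id_rsa (the private key) into the Ambari install wizard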
01-25-2017
05:23 AM
2 Kudos
I have a Kerberized cluster (HDP 2.5.3) with Solr (5.5.2) that's working well, except for the Banana dashboard.
With a Kerberos ticket, I'm able to access the Solr UI and run queries. Without a Kerberos ticket, the Solr UI returns `Authentication required` as expected. However, when I have a ticket and try to access the Banana dashboard the following error is returned:
Steve Loughran's excellent book, Kerberos on Hadoop, identifies the possible causes of a `request is a replay` error:
- The KDC is seeing too many attempts by the caller to authenticate as a specific principal, assumes some kind of attack, and rejects the request. This can happen if you have too many processes/nodes all sharing the same principal. Fix: make sure you have service/_HOST@REALM principals for all the services, rather than simple service@REALM principals.
- The timestamps of the systems are out of sync, so it looks like an old token is being re-issued. Check them all, including that of the KDC, and make sure NTP is working.
All the Solr principals are host-specific and NTP is running everywhere. Has anyone seen this before? Or have suggestions why Banana and Kerberos aren't working well together?
Labels:
- Apache Solr
12-15-2016
11:40 PM
1 Kudo
A typical use-case for this sort of data would be to recommend items to a customer based on what similar customers have purchased. In 2001, Amazon introduced item-based collaborative filtering, and it's still popular today. This short and very accessible IEEE paper describes the technique. There's a good practical example of collaborative filtering in the Spark docs: https://spark.apache.org/docs/1.6.2/mllib-collaborative-filtering.html
12-14-2016
07:00 AM
1 Kudo
Let's create two Hive tables: table_a and table_b. table_a contains the column you want to aggregate, and has only one record per id (i.e. the key):
hive> CREATE TABLE table_a (
> id STRING,
> quantity INT
> );
hive> INSERT INTO table_a VALUES (1, 30);
hive> INSERT INTO table_a VALUES (2, 20);
hive> INSERT INTO table_a VALUES (3, 10);
table_b has duplicate ids: note that id=1 appears twice:
hive> CREATE TABLE table_b (
> id STRING
> );
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (1);
hive> INSERT INTO table_b VALUES (2);
hive> INSERT INTO table_b VALUES (3);
If we aggregate the quantity column in table_a, we see that the aggregated quantity is 60:
hive> SELECT
> SUM(quantity)
> FROM table_a;
60
If we join table_a and table_b together, you can see that the duplicate keys in table_b have caused there to be four rows, not three:
hive> SELECT
> *
> FROM table_a
> LEFT JOIN table_b
> ON table_a.id = table_b.id;
1 30 1
1 30 1
2 20 2
3 10 3
Since joins happen before aggregations, when we aggregate the quantity in table_a, the quantity for id=1 has been duplicated:
hive> SELECT
> SUM(quantity)
> FROM table_a
> LEFT JOIN table_b
> ON table_a.id = table_b.id;
90
I suspect that's what's happening with your query.
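To avoid the double counting, you could de-duplicate (or pre-aggregate) table_b before joining. Here's a sketch using the tables above:
# join against the distinct ids so each row in table_a matches at most once
hive -e "
SELECT SUM(a.quantity)
FROM table_a a
LEFT JOIN (SELECT DISTINCT id FROM table_b) b
  ON a.id = b.id;
"
# this returns 60, matching the un-joined aggregate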
12-14-2016
06:23 AM
The join happens before the aggregation. You're aggregating the result of the join, which has inflated the row count because there are duplicates.
12-14-2016
05:15 AM
Are there duplicates in the outgrn.out_id column?
12-13-2016
05:43 PM
2 Kudos
HBase column families have a time-to-live (TTL) property which, by default, is set to FOREVER. If you wanted to delete the HBase cell values a week after being inserted, you could set the TTL to 604800 (which is the number of seconds in a week: 60 * 60 * 24 * 7).
Here's an example:
Create a table where the column family has a TTL of 10 seconds:
hbase(main):001:0> create 'test', {'NAME' => 'cf1', 'TTL' => 10}
0 row(s) in 2.5940 seconds
Put a record into that table:
hbase(main):002:0> put 'test', 'my-row-key', 'cf1:my-col', 'my-value'
0 row(s) in 0.1420 seconds
If we scan the table right away, we can see the record:
hbase(main):003:0> scan 'test'
ROW COLUMN+CELL
my-row-key column=cf1:my-col, timestamp=1481650256841, value=my-value
1 row(s) in 0.0260 seconds
10 seconds later, the record has disappeared:
hbase(main):004:0> scan 'test'
ROW COLUMN+CELL
0 row(s) in 0.0130 seconds
So, perhaps you could use TTL to manage your data retention.
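If the table already exists, the TTL can be altered on the column family. A rough sketch for a one-week retention (604800 seconds):
# disable/enable is only required on older HBase versions, but is harmless here
echo "disable 'test'; alter 'test', {NAME => 'cf1', TTL => 604800}; enable 'test'" | hbase shell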
11-04-2016
03:26 AM
4 Kudos
The `SYSTEM.CATALOG` table contains a column called `KEY_SEQ`. If this contains an integer, the column is part of the primary key. E.g.
SELECT column_name FROM system.catalog WHERE table_name = '{{ your table name }}' AND key_seq IS NOT NULL;
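A hedged example of running that from the Phoenix sqlline client (the client path, ZooKeeper quorum, and table name are assumptions):
# write the query to a file and pass it to sqlline
echo "SELECT column_name FROM system.catalog WHERE table_name = 'MY_TABLE' AND key_seq IS NOT NULL;" > key_columns.sql
/usr/hdp/current/phoenix-client/bin/sqlline.py zk01.example.com:2181:/hbase-unsecure key_columns.sql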
10-04-2016
12:58 PM
Thanks @Matt Burgess. That worked perfectly. Very much appreciated. @mkalyanpur: I'm running HDP 2.5.0.0 and HDF 2.0.0.0.
10-04-2016
06:32 AM
4 Kudos
I'm trying to stream data to Hive with the PutHiveStreaming Nifi processor. I saw this post: https://community.hortonworks.com/questions/59411/how-to-use-puthivestreaming.html and can confirm that:
- org.apache.hadoop.hive.ql.lockmgr.DbTxnManager is the transaction manager
- ACID transactions are enabled
- run compactor is enabled
- there are threads available for the compactor
I created an ORC-backed Hive table:
CREATE TABLE `streaming_messages`(
`message` string,
etc...)
CLUSTERED BY (message) INTO 5 BUCKETS
STORED AS ORC
LOCATION
'hdfs://hadoop01.woolford.io:8020/apps/hive/warehouse/mydb.db/streaming_messages'
TBLPROPERTIES('transactional'='true');
I then created a `PutHiveStreaming` processor that takes Avro messages and writes them to a Hive table. I notice that the Nifi processor uses Thrift to send data to Hive. There are some errors in nifi-app.log and hivemetastore.log. Stacktrace from nifi-app.log:
2016-10-03 23:40:25,348 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.hive.PutHiveStreaming
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:80) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:45) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:827) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:738) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:462) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1880) ~[na:na]
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851) ~[na:na]
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:389) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1064) ~[na:na]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) ~[na:na]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) ~[na:na]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) ~[na:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_77]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:255) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:74) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
... 19 common frames omitted
Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:252) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
... 20 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: null
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906) ~[hive-metastore-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893) ~[hive-metastore-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1863) ~[hive-metastore-1.2.1.jar:1.2.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152) ~[hive-metastore-1.2.1.jar:1.2.1]
at com.sun.proxy.$Proxy148.lock(Unknown Source) ~[na:na]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
... 22 common frames omitted
Stacktrace from hivemetastore.log:
2016-10-03 23:40:24,322 ERROR [pool-5-thread-114]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(195)) - java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:98201
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:938)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:814)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5751)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:139)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:97)
at com.sun.proxy.$Proxy12.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:11860)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:11844)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Do you have any suggestions to resolve this?
Labels:
- Apache NiFi
10-03-2016
04:46 PM
That worked perfectly, @Bryan Bende. Very much appreciated.