Member since: 07-05-2016
Posts: 42
Kudos Received: 32
Solutions: 6
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3637 | 02-13-2018 10:56 PM |
 | 1600 | 08-25-2017 05:25 AM |
 | 9613 | 03-01-2017 05:01 AM |
 | 5099 | 12-14-2016 07:00 AM |
 | 1225 | 12-13-2016 05:43 PM |
06-23-2021
02:41 AM
@Sunny93 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
02-26-2018
09:12 PM
@Alex Woolford Used option 2 and set NiFi to listen on that port. Works fine. Thanks!
10-17-2017
09:01 AM
3 Kudos
You should ask your Azure administrator to run the "az ad sp create-for-rbac..." command for you, since only admin users can assign roles. It creates an application, a service principal for that application, and assigns the role to the principal. You can then use the output of this command (it contains the app ID) with Cloudbreak.
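For reference, the command the administrator runs looks something like this (the application name and role below are illustrative, not required values):

az ad sp create-for-rbac --name cloudbreak-app --role Contributor

The JSON output of that command includes the appId, password, and tenant values, which is the information Cloudbreak asks for when you set up the Azure credential.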
08-25-2017
05:25 AM
1 Kudo
Zeppelin stores many of its settings in interpreter.json. The default dpi (dots per inch) for R plots is 72, hence the blurry plots. This value can be increased by adding a dpi property to Zeppelin's R render options: search for the "zeppelin.R.render.options" key and add "dpi=300", e.g.

"zeppelin.R.render.options": "out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, dpi=300",

You can see an example of the output below:
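For illustration, the entry sits inside the R interpreter's properties block in interpreter.json roughly like this (structure abbreviated; the exact layout depends on the Zeppelin version):

"properties": {
  ...
  "zeppelin.R.render.options": "out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, dpi=300"
}

Restart the interpreter after editing the file so the new value is picked up.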
08-21-2017
04:05 PM
6 Kudos
haveibeenpwned has downloadable files that contain about 320 million password hashes that have been involved in known data breaches. The site has a search feature that allows you to check whether a password exists in the list of known breached passwords. From a security perspective, entering passwords into a public website is a very bad idea. Thankfully, the downloadable files make it possible to perform this analysis offline. Fast random access over a dataset that contains hundreds of millions of records is a great fit for HBase: queries execute in a few milliseconds. In the example below, we'll load the data into HBase. We'll then use a few lines of Python to convert passwords into SHA-1 hashes and query HBase to see if they exist in the pwned list. On a cluster node, download the files:

wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-1.0.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-1.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-2.txt.7z

The files are in 7zip format, which on CentOS can be extracted with 7za:

7za x pwned-passwords-1.0.txt.7z
7za x pwned-passwords-update-1.txt.7z
7za x pwned-passwords-update-2.txt.7z

Unzipped, the raw data looks like this:

[hdfs@hdp01 ~]$ head -n 3 pwned-passwords-1.0.txt
00000016C6C075173C163757BCEA8139D4CC69CF
00000042F053B3F16733DFB83D431126D64331FC
000003449AD45B0DB016B895EC6CEA92EA2F91BE

Note that the hashes are in all caps. Now we create an HDFS location for these files and upload them:

hdfs dfs -mkdir /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-1.0.txt /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-update-1.txt /data/pwned-hashes
hdfs dfs -copyFromLocal pwned-passwords-update-2.txt /data/pwned-hashes

We can then create an external Hive table over the uploaded files:

CREATE EXTERNAL TABLE pwned_hashes (
sha1 STRING
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/pwned-hashes';

Hive has storage handlers that let us keep the familiar SQL syntax while benefiting from the characteristics of the underlying storage technology. In this case, we'll create an HBase-backed Hive table:

CREATE TABLE `pwned_hashes_hbase` (
`sha1` string,
`hash_exists` boolean)
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,hash_exists:hash_exists',
'serialization.format'='1')
TBLPROPERTIES (
'hbase.mapred.output.outputtable'='pwned_hashes',
'hbase.table.name'='pwned_hashes')

Note the second column, 'hash_exists', in the HBase-backed table. It's necessary because HBase is a column-oriented store and cannot return just a rowkey. Now we can simply insert the data into the HBase table using Hive:

INSERT INTO pwned_hashes_hbase SELECT sha1, true FROM pwned_hashes;

To query this HBase table from Python, there is an easy-to-use HBase library called HappyBase that relies on the Thrift protocol, so it's necessary to start the Thrift server first:

/usr/hdp/2.6.1.0-129/hbase/bin/hbase-daemon.sh start thrift -p 9090 --infoport 9095

We wrote a small Python function that takes a password, converts it to an (upper-case) SHA-1 hash, and then checks the HBase `pwned_hashes` table to see if it exists:

import happybase
import hashlib
def pwned_check(password):
    connection = happybase.Connection(host='hdp01.woolford.io', port=9090)
    table = connection.table('pwned_hashes')
    sha1 = hashlib.sha1(password).hexdigest().upper()
    row = table.row(sha1)
    if row:
        return True
    else:
        return False

For example:

>>> pwned_check('G0bbleG0bble')
True
>>> pwned_check('@5$~ lPaQ5<.`')
False

For folks who prefer Java, we also created a RESTful 'pwned-check' service using Spring Boot: https://github.com/alexwoolford/pwned-check

We were surprised to find some of our own hard-to-guess passwords in this dataset. Thanks to @Timothy Spann for identifying the haveibeenpwned datasource. This was a fun micro-project.
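As an afterthought: once the Hive INSERT has finished, the load can be sanity-checked straight from the hbase shell. A minimal sketch, using the first sample hash from the head output above:

hbase shell
get 'pwned_hashes', '00000016C6C075173C163757BCEA8139D4CC69CF'

A loaded hash returns its hash_exists:hash_exists cell; an unknown hash returns zero rows.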
01-28-2019
11:09 AM
@Greg Keys I am having the same issue with HDF 3.3.1. I have checked the schema file and the input file as well, and I have done what was mentioned by @Sriharsha Chintalapani.

Schema:

{
"namespace": "hdf.heaptrace.com",
"type": "record",
"name": "PatientField",
"fields": [
{"name": "Patient_name","type": "string"}
]
}

JSON data:

{"Patient_name":"john"}
Please help! I have also converted the data from JSON to Avro and back again using avro-tools.
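For reference, a JSON-to-Avro-and-back round trip with avro-tools looks roughly like this (file names are illustrative, and the avro-tools jar version will differ per environment):

java -jar avro-tools-1.8.2.jar fromjson --schema-file PatientField.avsc patient.json > patient.avro
java -jar avro-tools-1.8.2.jar tojson patient.avro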
08-08-2017
02:00 PM
1 Kudo
https://github.com/hortonworks/registry/blob/master/examples/schema-registry/avro/src/main/java/com/hortonworks/registries/schemaregistry/examples/avro/KafkaAvroSerDesApp.java
06-06-2017
03:04 PM
1 Kudo
No, there is presently no option to do what you're asking. HBase stores arbitrary bytes, which means that data in any portion of the response object may generate invalid JSON. If you do choose to write some software to solve your issue, I would guess that the Apache HBase community would accept an option/configuration that adds what you're asking for to the REST server.
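For context, this is roughly how a single row is fetched from the HBase REST server today (host, table, and rowkey are placeholders); in the JSON response, rowkeys, column names, and cell values come back base64-encoded rather than as raw text:

curl -H "Accept: application/json" "http://rest-host:8080/mytable/rowkey1"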
04-05-2017
01:14 PM
@Namit Maheshwari Yes, there is a pattern for creating the partitions (yyyy-mm-dd). OK, so your idea is to run the command, store the result, and check for the existence of the partition? Is there any other simple way to check?
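A minimal sketch of that approach, assuming a hypothetical table my_table partitioned by a dt column in yyyy-mm-dd format:

hive -e "SHOW PARTITIONS my_table" | grep -q "dt=2017-04-05" && echo "partition exists"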
09-04-2018
12:20 PM
org.apache.ambari.view.commons.exceptions.ServiceFormattedException at org.apache.ambari.view.commons.hdfs.UserService.homeDir(UserService.java:7

User home directory not found... The cluster is Kerberized. Please help!
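In case it helps anyone hitting the same error: the message indicates the HDFS home directory for the logged-in view user doesn't exist. A minimal sketch of creating it (the username is a placeholder; on a Kerberized cluster, kinit as an HDFS superuser principal first):

hdfs dfs -mkdir -p /user/<username>
hdfs dfs -chown <username>:hdfs /user/<username>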