Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1850 | 04-03-2024 06:39 AM |
|  | 2886 | 01-12-2024 08:19 AM |
|  | 1594 | 12-07-2023 01:49 PM |
|  | 2353 | 08-02-2023 07:30 AM |
|  | 3246 | 03-29-2023 01:22 PM |
10-26-2017
06:47 AM
Thanks for your information. I think `virtualenv venv. ./venv/bin/activate` should be `virtualenv venv` followed by `. ./venv/bin/activate`.
01-28-2017
04:50 PM
3 Kudos
Preparing a Raspberry Pi to Run TensorFlow Image Recognition

I can easily have a Python script that polls my webcam (using the official Raspberry Pi camera), calls TensorFlow, and then sends the results to NiFi via MQTT. You need to install the Python MQTT library paho-mqtt (https://pypi.python.org/pypi/paho-mqtt/1.1). For setting up Python and a Raspberry Pi with a camera, see https://dzone.com/articles/picamera-ingest-real-time.

Raspberry Pi 3 B+ preparation: buy a good-quality 16 GB SD card and, from OSX, run SD Formatter to overwrite-format the device as FAT (download here: https://www.sdcard.org/downloads/formatter_4/). Download the BerryBoot image, unzip it, and then copy it to the formatted SD card.

For examples of RPi TensorFlow you can run, see https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/pi_examples/. You need to build TensorFlow for the Pi, which took me over 4 hours. See:

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/pi_examples/

Process:

```
wget https://github.com/tensorflow/tensorflow/archive/master.zip
apt-get install -y libjpeg-dev
cd tensorflow-master
tensorflow/contrib/makefile/download_dependencies.sh
sudo apt-get install -y autoconf automake libtool gcc-4.8 g++-4.8
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure
make
sudo make install
sudo ldconfig  # refresh shared library cache
cd ../../../../..
make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI \
  OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8
curl https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015_stripped.zip \
  -o /tmp/inception_dec_2015_stripped.zip
unzip /tmp/inception_dec_2015_stripped.zip \
  -d tensorflow/contrib/pi_examples/label_image/data/
make -f tensorflow/contrib/pi_examples/label_image/Makefile
```

```
root@raspberrypi:/opt/demo/tensorflow-master# tensorflow/contrib/pi_examples/label_image/gen/bin/label_image
2017-01-28 01:46:48: I tensorflow/contrib/pi_examples/label_image/label_image.cc:144] Loaded JPEG: 512x600x3
2017-01-28 01:46:50: W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:378] Running model succeeded!
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:272] military uniform (866): 0.624294
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:272] suit (794): 0.0473981
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:272] academic gown (896): 0.0280925
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:272] bolo tie (940): 0.0156955
2017-01-28 01:46:52: I tensorflow/contrib/pi_examples/label_image/label_image.cc:272] bearskin (849): 0.0143348
```

It took over 4 hours to build, but only 4 seconds to run, and it gave good results analyzing a picture of computer legend Grace Hopper.

```
root@raspberrypi:/opt/demo/tensorflow-master# tensorflow/contrib/pi_examples/label_image/gen/bin/label_image --help
2017-01-28 01:51:26: E tensorflow/contrib/pi_examples/label_image/label_image.cc:337]
usage: tensorflow/contrib/pi_examples/label_image/gen/bin/label_image
Flags:
  --image="tensorflow/contrib/pi_examples/label_image/data/grace_hopper.jpg"  string  image to be processed
  --graph="tensorflow/contrib/pi_examples/label_image/data/tensorflow_inception_stripped.pb"  string  graph to be executed
  --labels="tensorflow/contrib/pi_examples/label_image/data/imagenet_comp_graph_label_strings.txt"  string  name of file containing labels
  --input_width=299  int32  resize image to this width in pixels
  --input_height=299  int32  resize image to this height in pixels
  --input_mean=128  int32  scale pixel values to this mean
  --input_std=128  int32  scale pixel values to this std deviation
  --input_layer="Mul"  string  name of input layer
  --output_layer="softmax"  string  name of output layer
  --self_test=false  bool  run a self test
  --root_dir=""  string  interpret image and graph file names relative to this directory
```
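To tie this back to the NiFi/MQTT idea at the top of the post, here is a minimal, hedged sketch (not from the original post) that runs the label_image binary built above on a captured image and publishes the raw output to an MQTT topic with paho-mqtt, where a NiFi MQTT consumer could pick it up. The broker host, port, topic name and image path are illustrative assumptions.

```python
#!/usr/bin/python
# Minimal sketch: classify an image with the label_image binary built above,
# then publish the raw log output to MQTT for NiFi to consume.
# The broker host "cloudmqttiothoster", port 14162, topic "tensorflow" and
# the image path are illustrative assumptions, not values from the post.
import subprocess
import paho.mqtt.client as mqtt

LABEL_IMAGE = "tensorflow/contrib/pi_examples/label_image/gen/bin/label_image"
IMAGE_PATH = "/tmp/capture.jpg"  # e.g. written by raspistill or picamera

# Capture both stdout and stderr, since the classifier logs its labels there.
output = subprocess.check_output(
    [LABEL_IMAGE, "--image=" + IMAGE_PATH],
    stderr=subprocess.STDOUT)

client = mqtt.Client()
client.connect("cloudmqttiothoster", 14162, 60)
client.publish("tensorflow", payload=output, qos=1, retain=False)
client.disconnect()
```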
03-06-2018
05:02 PM
Check HCC for articles on connecting NiFi to secure Phoenix. You must make sure NiFi has permission to read the keytabs.
01-20-2017
09:47 PM
3 Kudos
You can run the attack library on OSX or Linux, from an edge node or from outside the cluster. I ran it from my OSX laptop against a cluster that I had network access to. You should try to scan from inside your network, from an edge node, and from a remote site on the Internet. You will need Python 2.7 or Python 3.x installed first.

```
git clone git@github.com:CERT-W/hadoop-attack-library.git
pip install requests lxml
```

You may need root or sudo access to install on your machine. One of the scanners hits the WebHDFS link that you may have seen a warning about (a quick manual check of the same endpoint is sketched after the output below).

```
python hdfsbrowser.py timscluster
Beginning to test services accessibility using default ports ...
Testing service WebHDFS
[+] Service WebHDFS is available
Testing service HttpFS
[-] Exception during requesting the service
[+] Sucessfully retrieved 1 services
drwxrwxrwx hdfs:hdfs 2017-01-15T05:50:27+0000 /
drwxrwxrwx yarn:hadoop 2017-01-11T19:25:26+0000 app-logs /app-logs
drwxrwxrwx hdfs:hdfs 2016-12-21T23:12:40+0000 apps /apps
drwxrwxrwx yarn:hadoop 2016-09-15T21:02:30+0000 ats /ats
drwxrwxrwx root:hdfs 2016-12-21T23:08:34+0000 avroresults /avroresults
drwxrwxrwx hdfs:hdfs 2016-12-13T03:42:55+0000 banking /banking
```
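Not part of the attack library, but the same check can be done by hand with a quick hedged sketch using the Python requests module against the WebHDFS REST API; the hostname matches the example cluster name above, port 50070 is the usual unsecured default, and both are assumptions about your environment.

```python
# Quick manual check of WebHDFS exposure (hostname and port are assumptions).
# An open, unauthenticated WebHDFS endpoint will return a JSON directory
# listing for a LISTSTATUS call on the root path.
import requests

url = "http://timscluster:50070/webhdfs/v1/?op=LISTSTATUS"
resp = requests.get(url, timeout=10)
print resp.status_code
for entry in resp.json().get("FileStatuses", {}).get("FileStatus", []):
    print entry["permission"], entry["owner"], entry["pathSuffix"]
```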
To see how exposed your Hadoop configuration is, you can use Hadoop Snooper. In the repository this lives under "Tools Techniques and Procedures / Getting the target environment configuration".

```
python hadoopsnooper.py timscluster -o test
Specified destination path does not exist, do you want to create it ? [y/N]y
[+] Creating configuration directory
[+] core-site.xml successfully created
[+] mapred-site.xml successfully created
[+] yarn-site.xml successfully created
```

This downloads all of those configuration files to a directory named test. These were not the full configuration files, but they pointed to the correct internal servers and give an attacker more information.

Another scan worth running is sqlmap. This tool lets you check the various SQL tools in the system. SQLMap requires Python 2.6 or 2.7.

```
➜ projects git clone https://github.com/sqlmapproject/sqlmap.git sqlmap-dev
Cloning into 'sqlmap-dev'...
remote: Counting objects: 55560, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 55560 (delta 22), reused 0 (delta 0), pack-reused 55519
Receiving objects: 100% (55560/55560), 47.25 MiB | 2.28 MiB/s, done.
Resolving deltas: 100% (42960/42960), done.
Checking connectivity... done.
```
```
➜ projects python sqlmap.py --update
➜ projects cd sqlmap-dev
➜ sqlmap-dev git:(master) python sqlmap.py --update
        ___
       __H__
 ___ ___[.]_____ ___ ___  {1.1.1.14#dev}
|_ -| . [)]     | .'| . |
|___|_  [']_|_|_|__,|  _|
      |_|V          |_|   http://sqlmap.org

[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program

[*] starting at 16:49:13

[16:49:13] [INFO] updating sqlmap to the latest development version from the GitHub repository
[16:49:13] [INFO] update in progress .
[16:49:14] [INFO] already at the latest revision 'f542e82'

[*] shutting down at 16:49:14
```
References: http://sqlmap.org/ http://www.slideshare.net/bunkertor/hadoop-security-54483815 http://tools.kali.org/ https://github.com/savio-code/hexorbase https://community.hortonworks.com/articles/73035/running-dns-and-domain-scanning-tools-from-apache.html
02-09-2017
06:44 PM
Thank you! For future reference, in Zeppelin you set this attribute in the interpreter configuration, not in the paragraph where the SQL is being executed.
03-01-2018
11:35 PM
1 Kudo
@Timothy Spann @Matt Burgess Thanks for this very helpful query. In our case, the validation query seemed to be what made the flow function properly. For good measure, we also put a retry queue on the PutHiveQL processor.
01-15-2017
05:42 PM
2 Kudos
Raspberry Pis and other small devices often have cameras built in or attached. The Raspberry Pi has cheap camera add-ons that can capture still images and videos (https://www.raspberrypi.org/products/camera-module/). Using a simple Python script we can capture images and then ingest them into our central Hadoop Data Lake. This is a nice, simple use case for Connected Data Platforms with both Data in Motion and Data at Rest. The data can be processed in-line with deep learning libraries like TensorFlow for image recognition and assessment. Using OpenCV and other tools we can process it in motion and look for issues like security breaches, leaks and other events.

The most difficult part is the Python code, which reads from the camera, adds a watermark, converts the image to bytes, sends it to MQTT and then FTPs it to an FTP server. I do both since networking is always tricky. You could also add a fallback: if it fails to connect to either, store the file to a directory on a mapped USB drive and send it out once the network returns; that would be easy to do with MiniFi reading that directory. Once the file lands in the MQTT broker or FTP server, NiFi pulls it and brings it into the flow (a small test consumer is sketched after the script below). I first store to HDFS for our Data at Rest permanent storage for future deep learning processing. I also run three processors to extract image metadata and then call jp2a to convert the image into an ASCII picture.

[Screenshots in the original post: ExecuteStreamCommand for running jp2a; the output ASCII; HDFS directory of uploaded files; metadata extracted from the image; an example imported image; other metadata extracted; a JPG converted to ASCII.]

Running jp2a on images stored in HDFS via the WebHDFS REST API:

```
/opt/demo/jp2a-master/src/jp2a "http://hdfsnode:50070/webhdfs/v1/images/$@?op=OPEN"
```

Python on RPI:

```python
#!/usr/bin/python
import os
import datetime
import ftplib
import traceback
import math
import random, string
import base64
import json
import paho.mqtt.client as mqtt
import picamera
from time import sleep
from time import gmtime, strftime

packet_size = 3000

def randomword(length):
    return ''.join(random.choice(string.lowercase) for i in range(length))

# Create unique image name
img_name = 'pi_image_{0}_{1}.jpg'.format(randomword(3), strftime("%Y%m%d%H%M%S", gmtime()))

# Capture Image from Pi Camera
try:
    camera = picamera.PiCamera()
    camera.annotate_text = " Stored with Apache NiFi "
    camera.capture(img_name, resize=(500, 281))
    pass
finally:
    camera.close()

# MQTT
client = mqtt.Client()
client.username_pw_set("CloudMqttUserName", "!MakeSureYouHaveAV@5&L0N6Pa55W0$4!")
client.connect("cloudmqttiothoster", 14162, 60)

# Read the JPEG as binary
f = open(img_name, 'rb')
fileContent = f.read()
byteArr = bytearray(fileContent)
f.close()

# base64-encode the raw JPEG bytes so the payload is valid JSON text
message = '{"image": {"bytearray":"' + base64.b64encode(str(byteArr)) + '"}}'
print client.publish("image", payload=message, qos=1, retain=False)
client.disconnect()

# FTP
ftp = ftplib.FTP()
ftp.connect("ftpserver", 21)
try:
    ftp.login("reallyLongUserName", "FTP PASSWORDS SHOULD BE HARD")
    ftp.storbinary('STOR ' + img_name, open(img_name, 'rb'))
finally:
    ftp.quit()

# clean up sent file
os.remove(img_name)
```
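In the flow, NiFi pulls the message from the broker (for example with its MQTT consumer processor), but for testing without NiFi the payload can be decoded back to a JPEG with a small hedged sketch like the one below; the broker host, port, credentials and output path are assumptions, and it relies on the publisher base64-encoding the image bytes as in the script above.

```python
#!/usr/bin/python
# Test consumer: subscribe to the "image" topic, decode the base64 payload
# produced by the Pi script above, and write it back out as a JPEG.
# Broker host/port, credentials and the output path are illustrative assumptions.
import base64
import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    doc = json.loads(msg.payload)
    jpeg_bytes = base64.b64decode(doc["image"]["bytearray"])
    with open("/tmp/received.jpg", "wb") as f:
        f.write(jpeg_bytes)
    print "wrote %d bytes" % len(jpeg_bytes)

client = mqtt.Client()
client.username_pw_set("CloudMqttUserName", "password")
client.on_message = on_message
client.connect("cloudmqttiothoster", 14162, 60)
client.subscribe("image", qos=1)
client.loop_forever()
```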
References: https://community.hortonworks.com/repos/77987/rpi-picamera-mqtt-nifi.html?shortDescriptionMaxLength=140 https://github.com/bikash/RTNiFiStreamProcessors http://stackoverflow.com/questions/37499739/how-can-i-send-a-image-by-using-mosquitto https://www.raspberrypi.org/learning/getting-started-with-picamera/worksheet/ https://www.cloudmqtt.com/ https://developer.ibm.com/recipes/tutorials/sending-and-receiving-pictures-from-a-raspberry-pi-via-mqtt/ https://developer.ibm.com/recipes/tutorials/displaying-image-from-raspberry-pi-in-nodered-ui-hosted-on-bluemix/ https://www.raspberrypi.org/learning/getting-started-with-picamera/worksheet/ https://github.com/jpmens/twitter2mqtt http://www.ev3dev.org/docs/tutorials/sending-and-receiving-messages-with-mqtt/ https://github.com/njh/mqtt-http-bridge https://www.raspberrypi.org/learning/parent-detector/worksheet/ http://picamera.readthedocs.io/en/release-1.10/recipes1.html http://picamera.readthedocs.io/en/release-1.10/faq.html http://www.eclipse.org/paho/ http://picamera.readthedocs.io/en/release-1.10/recipes1.html#capturing-to-an-opencv-object https://github.com/cslarsen/jp2a https://www.raspberrypi.org/learning/getting-started-with-picamera/ https://www.raspberrypi.org/learning/tweeting-babbage/worksheet/ https://csl.name/jp2a/
01-12-2017
10:04 AM
4 Kudos
Some people say I must have a bot to read and reply to email at all crazy hours of the day. An awesome email assistant, indeed; I decided to prototype one.
This is the first piece. After this I will add some Spark machine learning to intelligently reply to emails from a list of pretrained responses. With supervised learning it will learn which emails to send to whom, based on Subject, From, body content, attachments, time of day, sender domain and many other variables.
For now, it just reads some emails and checks for a hard-coded subject.
I could use this to trigger other processes, such as running a batch Spark job.
Since most people send and use HTML email (that is what Outlook, Outlook.com and Gmail do), I will send and receive HTML emails to make it look more legitimate.
I could also run my fortune script and return that as my email content, making me sound wise, or pull in a random selection of tweets about Hadoop or even recent news, making the email very current and fresh.
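As a hedged sketch of the HTML-sending side only (not the actual flow from this post; the SMTP host, addresses and body are placeholders), a reply could be assembled and sent like this:

```python
#!/usr/bin/python
# Hedged sketch: send an HTML reply via SMTP. The host, addresses and body
# are placeholders; a real flow would plug in a pretrained response here.
import smtplib
from email.mime.text import MIMEText

html_body = "<html><body><p>Thanks for your email. I will get back to you shortly.</p></body></html>"

msg = MIMEText(html_body, "html")
msg["Subject"] = "Re: test"
msg["From"] = "nifi@example.com"
msg["To"] = "x@example.com"

smtp = smtplib.SMTP("smtp.example.com", 25)
try:
    smtp.sendmail(msg["From"], [msg["To"]], msg.as_string())
finally:
    smtp.quit()
```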
Snippet Example of a Mixed Content Email Message (Attachments Removed to Save Space)
Return-Path: <x@example.com>
Delivered-To: nifi@example.com
Received: from x.x.net
by x.x.net (Dovecot) with LMTP id +5RhOfCcB1jpZQAAf6S19A
for <nifi@example.com>; Wed, 19 Oct 2016 12:19:13 -0400
Return-path: <x@example.com>
Envelope-to: nifi@example.com
Delivery-date: Wed, 19 Oct 2016 12:19:13 -0400
Received: from [x.x.x.x] (helo=smtp.example.com)
by x.example.com with esmtp (Exim)
id 1bwtaC-0006dd-VQ
for nifi@example.com; Wed, 19 Oct 2016 12:19:12 -0400
Received: from x.x.net ([x.x.x.x])
by x with bizsmtp
id xUKB1t0063zlEh401UKCnK; Wed, 19 Oct 2016 12:19:12 -0400
X-EN-OrigIP: 64.78.52.185
X-EN-IMPSID: xUKB1t0063zlEh401UKCnK
Received: from x.x.net (localhost [127.0.0.1])
(using TLSv1 with cipher AES256-SHA (256/256 bits))
(No client certificate requested)
by emg-ca-1-1.localdomain (Postfix) with ESMTPS id BEE9453F81
for <nifi@example.com>; Wed, 19 Oct 2016 09:19:10 -0700 (PDT)
Subject: test
MIME-Version: 1.0
x-echoworx-msg-id: e50ca00a-edc5-4030-a127-f5474adf4802
x-echoworx-emg-received: Wed, 19 Oct 2016 09:19:10.713 -0700
x-echoworx-message-code-hashed: 5841d9083d16bded28a3c4d33bc505206b431f7f383f0eb3dbf1bd1917f763e8
x-echoworx-action: delivered
Received: from 10.254.155.15 ([10.254.155.15])
by emg-ca-1-1 (JAMES SMTP Server 2.3.2) with SMTP ID 503
for <nifi@example.com>;
Wed, 19 Oct 2016 09:19:10 -0700 (PDT)
Received: from x.x.net (unknown [x.x.x.x])
(using TLSv1 with cipher AES256-SHA (256/256 bits))
(No client certificate requested)
by emg-ca-1-1.localdomain (Postfix) with ESMTPS id 6693053F86
for <nifi@example.com>; Wed, 19 Oct 2016 09:19:10 -0700 (PDT)
Received: from x.x.net (x.x.x.x) by
x.x.net (x.x.x.x) with Microsoft SMTP
Server (TLS) id 15.0.1178.4; Wed, 19 Oct 2016 09:19:09 -0700
Received: from x.x.x.net ([x.x.x.x]) by
x.x.x.net ([x.x.x.x]) with mapi id
15.00.1178.000; Wed, 19 Oct 2016 09:19:09 -0700
From: x x<x@example.com>
To: "nifi@example.com" <nifi@example.com>
Thread-Topic: test
Thread-Index: AQHSKiSFTVqN9ugyLEirSGxkMiBNFg==
Date: Wed, 19 Oct 2016 16:19:09 +0000
Message-ID: <D49AD137-3765-4F9A-BF98-C4E36D11FFD8@hortonworks.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator:
x-ms-exchange-messagesentrepresentingtype: 1
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [71.168.178.39]
x-source-routing-agent: Processed
Content-Type: multipart/related;
boundary="_004_D49AD13737654F9ABF98C4E36D11FFD8hortonworkscom_";
type="multipart/alternative"
--_004_D49AD13737654F9ABF98C4E36D11FFD8hortonworkscom_
Content-Type: multipart/alternative;
boundary="_000_D49AD13737654F9ABF98C4E36D11FFD8hortonworkscom_"
--_000_D49AD13737654F9ABF98C4E36D11FFD8hortonworkscom_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Python Script to Parse Email Messages
```python
#!/usr/bin/env python
"""Unpack a MIME message into a directory of files."""

import json
import os
import sys
import email
import errno
import mimetypes

from optparse import OptionParser
from email.parser import Parser


def main():
    parser = OptionParser(usage="""Unpack a MIME message into a directory of files.

Usage: %prog [options] msgfile
""")
    parser.add_option('-d', '--directory',
                      type='string', action='store',
                      help="""Unpack the MIME message into the named
                      directory, which will be created if it doesn't already
                      exist.""")
    opts, args = parser.parse_args()
    if not opts.directory:
        parser.print_help()
        sys.exit(1)
    try:
        os.mkdir(opts.directory)
    except OSError as e:
        # Ignore directory exists error
        if e.errno != errno.EEXIST:
            raise
    msgstring = ''.join(str(x) for x in sys.stdin.readlines())
    msg = email.message_from_string(msgstring)
    headers = Parser().parsestr(msgstring)
    response = {'To': headers['to'], 'From': headers['from'], 'Subject': headers['subject'], 'Received': headers['Received']}
    print json.dumps(response)
    counter = 1
    for part in msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        # Applications should really sanitize the given filename so that an
        # email message can't be used to overwrite important files
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
            if not ext:
                # Use a generic bag-of-bits extension
                ext = '.bin'
            filename = 'part-%03d%s' % (counter, ext)
        counter += 1
        fp = open(os.path.join(opts.directory, filename), 'wb')
        fp.write(part.get_payload(decode=True))
        fp.close()


if __name__ == '__main__':
    main()
```
mailnifi.sh

```
python mailnifi.py -d /opt/demo/email/"$@"
```
Python needs the email package for parsing the message; you can install it via pip.

```
pip install email
```

I am using Python 2.7; you could use a newer Python 3.x.
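As a hedged illustration of the hard-coded subject check mentioned at the top of this post, the JSON line printed by the parser above could be filtered before triggering anything downstream; the subject string and the triggered command are illustrative assumptions, not part of the original flow.

```python
#!/usr/bin/python
# Hedged sketch: read the JSON header summary printed by the parser above and
# trigger a follow-up action on a hard-coded subject. The subject value and
# the command being triggered are illustrative assumptions.
import json
import subprocess
import sys

headers = json.loads(sys.stdin.readline())
if headers.get("Subject", "").strip().lower() == "run spark job":
    # e.g. kick off a batch Spark job; the script path is a placeholder
    subprocess.call(["/opt/demo/run_spark_job.sh"])
```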
Here is the flow:
For the final part of the flow, I read the files created by the parser, load them into HDFS, and delete them from the file system using the standard GetFile.
Reference:
https://docs.python.org/2/library/email-examples.html
https://jsonpath.curiousconcept.com/
Files:
email-assistant-12-jan-2017.xml
01-12-2017
09:01 AM
Protect Your Cloud Big Data Assets

Step 1: Do not put anything into the cloud unless you have a CISO, Chief Security Architect, Certified Cloud Administrator, full understanding of your PII and private data, a lawyer to defend you against the coming lawsuits, full understanding of Hadoop, Hadoop Certified Administrators, a Hadoop premier support contract, a security plan, and full understanding of your Hadoop architecture and layout.
Step 2: Study all running services in Ambari.
Step 3: Confirm and check all of your TCP/IP ports. Hadoop has a lot of them! (A quick check script is sketched at the end of this post.)
Step 4: If you are not using a service, do not run it.
Step 5: By default, disable all access to everything, always. Only open ports and access when something and someone critical cannot access them.
Step 6: SSL, SSH, VPN and encryption everywhere.
Step 7: Run Knox! Set it up correctly.
Step 8: Run Kali and audit all your IPs and ports.
Step 9: Use Kali hacking tools to attempt to access all your web ports, shells and other access points.
Step 10: Run in a VPC.
Step 11: Set up security groups. Never open to 0.0.0.0, all ports, or all IPs!
Step 12: If this seems too hard, don't run in the cloud.
Step 14: Step 13 is unlucky, skip that one.
Step 15: Read all the recommended security documentation and use it.
Step 16: Kerberize everything.
Step 17: Run Metron.

My recommendation is to get a professional services contract with an experienced Hadoop organization, or use something managed like Microsoft HDInsight or HDC.

TCP/IP Ports

50070 : Name Node Web UI
50470 : Name Node HTTPS Web UI
8020, 8022, 9000 : Name Node via HDFS
50075 : Data Node(s) Web UI
50475 : Data Node(s) HTTPS Web UI
50090 : Secondary Name Node
60000 : HBase Master
8080 : HBase REST
9090 : Thrift Server
50111 : WebHCat
8005 : Sqoop2
2181 : Zookeeper
9010 : Zookeeper JMX

Others: 50020, 50010, 50030, 8021, 50060, 51111, 9083, 10000, 60010, 60020, 60030, 2888, 3888, 8660, 8661, 8662, 8663, 8660, 8651, 3306, 80, 8085, 1004, 1006, 8485, 8480, 2049, 4242, 14000, 14001, 8021, 9290, 50060, 8032, 8030, 8031, 8033, 8088, 8040, 8042, 8041, 10020, 13562, 19888, 9090, 9095, 9083, 16000, 12000, 12001, 3181, 4181, 8019, 9010, 8888, 11000, 11001, 7077, 7078, 18080, 18081, 50100

There are more of these if you are also running your own visualization tools, other data websites, other tools, Oracle, SQL Server, mail, NiFi, Druid, etc.

Reference http://www.slideshare.net/bunkertor/hadoop-security-54483815 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/set_up_validate_knox_gateway_installation.html https://aws.amazon.com/articles/1233/ http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html https://www.quora.com/What-are-the-best-practices-in-hardening-Amazon-EC2-instance https://stratumsecurity.com/2012/12/03/practical-tactical-cloud-security-ec2/ http://hortonworks.com/solutions/security-and-governance/ http://metron.incubator.apache.org/
01-12-2017
06:11 AM
1 Kudo
Protect Your Cloud Big Data Assets

Step 1: Do not put anything into the cloud unless you have a CISO, Chief Security Architect, Certified Cloud Administrator, full understanding of your PII and private data, a lawyer to defend you against the coming lawsuits, full understanding of Hadoop, Hadoop Certified Administrators, a Hadoop premier support contract, a security plan, and full understanding of your Hadoop architecture and layout.
Step 2: Study all running services in Ambari.
Step 3: Confirm and check all of your TCP/IP ports. Hadoop has a lot of them!
Step 4: If you are not using a service, do not run it.
Step 5: By default, disable all access to everything, always. Only open ports and access when something and someone critical cannot access them.
Step 6: SSL, SSH, VPN and encryption everywhere.
Step 7: Run Knox! Set it up correctly.
Step 8: Run Kali and audit all your IPs and ports.
Step 9: Use Kali hacking tools to attempt to access all your web ports, shells and other access points.
Step 10: Run in a VPC.
Step 11: Set up security groups. Never open to 0.0.0.0, all ports, or all IPs!
Step 12: If this seems too hard, don't run in the cloud.
Step 14: Step 13 is unlucky, skip that one.
Step 15: Read all the recommended security documentation and use it.
Step 16: Kerberize everything.
Step 17: Run Metron.

My recommendation is to get a professional services contract with an experienced Hadoop organization, or use something managed like Microsoft HDInsight or HDC.

Reference http://www.slideshare.net/bunkertor/hadoop-security-54483815 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/set_up_validate_knox_gateway_installation.html https://aws.amazon.com/articles/1233/ http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html https://www.quora.com/What-are-the-best-practices-in-hardening-Amazon-EC2-instance https://stratumsecurity.com/2012/12/03/practical-tactical-cloud-security-ec2/ http://hortonworks.com/solutions/security-and-governance/ http://metron.incubator.apache.org/