Member since
04-27-2016
61
Posts
61
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 5221 | 09-19-2016 05:42 PM |
| 1954 | 06-11-2016 06:41 AM |
| 4744 | 06-10-2016 05:17 PM |
06-20-2016
08:52 PM
Thanks for letting me know this.
06-15-2016
01:21 AM
5 Kudos
Maintaining a Hadoop cluster in the cloud for long periods is often expensive. Engineers may also find themselves without immediate access to a cloud environment when they want to quickly spin up a cluster and play around. As an easy alternative, an HDP cluster can be set up on your own laptop using Vagrant with VirtualBox as the provider.

Step 1 - Install prerequisites
Download and install Vagrant.
Download and install Oracle VirtualBox as the Vagrant provider.

Step 2 - Generation of Vagrantfile
Create a working directory for generating the Vagrantfile and running the deployment via Vagrant:
$ mkdir hdp22
$ cd hdp22
The following command generates the Vagrantfile in the current directory. This file defines the VMs that make up the cluster.
$ vagrant init
$ vi Vagrantfile

Step 3 - Configuration of VMs in Vagrantfile
Let's configure Vagrant to use CentOS 6.6 or CentOS 6.7 as the base box. Add one of the following lines to the Vagrantfile:
config.vm.box = "bento/centos-6.7"
or
config.vm.box = "chef/centos-6.6"
The script below should be included in the Vagrantfile to do some basic provisioning of the VMs:
1. Install the NTP service
2. Disable the firewall and SELinux
3. (Optional) Install wget
4. Raise the open-file limits and disable transparent huge pages
$script = <<SCRIPT
sudo yum -y install ntp
sudo chkconfig ntpd on
sudo chkconfig iptables off
sudo /etc/init.d/iptables stop
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo sh -c 'echo "* soft nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo "* hard nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled'
SCRIPT
config.vm.provision "shell", inline: $script
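Once a VM is up, the effect of the provisioning script can be spot-checked from inside it. The sketch below is one possible way to do that; the function names are hypothetical, and the paths are the ones the script above writes to. It relies on the kernel marking the active transparent huge page setting with square brackets in the sysfs file.

```shell
#!/bin/sh
# Sketch only: thp_is_never, nofile_limits_set and check_provisioning are
# hypothetical helper names, not part of Vagrant or HDP.

# Succeeds when the given THP sysfs file has "never" as the active setting.
# The active value is the one shown in square brackets.
thp_is_never() {
  grep -q '\[never\]' "$1" 2>/dev/null
}

# Succeeds when the limits file contains both nofile lines added by the script.
nofile_limits_set() {
  grep -qxF '* soft nofile 10000' "$1" && grep -qxF '* hard nofile 10000' "$1"
}

# Print a short report for the machine this runs on.
check_provisioning() {
  thp_dir=/sys/kernel/mm/redhat_transparent_hugepage
  for thp_file in "$thp_dir/defrag" "$thp_dir/enabled"; do
    if thp_is_never "$thp_file"; then echo "OK   $thp_file"; else echo "FAIL $thp_file"; fi
  done
  if nofile_limits_set /etc/security/limits.conf; then
    echo "OK   nofile limits"
  else
    echo "FAIL nofile limits"
  fi
  # After "setenforce 0" this reports Permissive until the next reboot.
  echo "SELinux mode: $(getenforce)"
}
```

After `vagrant ssh` into a machine, sourcing this file and calling `check_provisioning` prints one line per check.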
Step 4 - Configure the definition of VMs
The following configuration defines the four virtual machines used in the HDP cluster: one Ambari server, one Hadoop master, and two slaves. The machines have the following hostnames:
1. ambari1.mycluster
2. master1.mycluster
3. slave1.mycluster
4. slave2.mycluster
# Ambari1
config.vm.define :ambari1 do |a1|
a1.vm.hostname = "ambari1.mycluster"
a1.vm.network :private_network, ip: "192.168.0.11"
a1.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
a1.vm.network "forwarded_port", guest: 8080, host: 8080
a1.vm.network "forwarded_port", guest: 80, host: 80
end
# Master1
config.vm.define :master1 do |m1|
m1.vm.hostname = "master1.mycluster"
m1.vm.network :private_network, ip: "192.168.0.12"
m1.vm.provider :virtualbox do |vb|
vb.memory = "4096"
end
end
# Slave1
config.vm.define :slave1 do |s1|
s1.vm.hostname = "slave1.mycluster"
s1.vm.network :private_network, ip: "192.168.0.21"
s1.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
end
# Slave2
config.vm.define :slave2 do |s2|
s2.vm.hostname = "slave2.mycluster"
s2.vm.network :private_network, ip: "192.168.0.22"
s2.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
end

Step 5 - Start the Machines and Install Ambari Server
Start the Ambari server machine from Vagrant; Vagrant automatically runs the shell provisioner defined in the Vagrantfile. Then SSH to the Ambari server:
$ vagrant up ambari1
$ vagrant ssh ambari1
As the root user, run the commands below:
# Install
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum -y install ambari-server
sudo service ntpd start
# Setup. There are several options to configure during setup.
ambari-server setup
# Start Ambari Server
ambari-server start
Bring up the remaining machines as well (from the host: vagrant up master1 slave1 slave2). Add the following entries to the /etc/hosts file on each VM:
192.168.0.11 ambari1.mycluster ambari1
192.168.0.12 master1.mycluster master1
192.168.0.21 slave1.mycluster slave1
192.168.0.22 slave2.mycluster slave2
Set up passwordless SSH from the Ambari node to all other nodes (VMs):
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Copy the Ambari server's public key into the authorized_keys file on each of the other nodes to allow communication later.

Step 6 - Deploy HDP Cluster
We are ready to deploy an HDP cluster from the Ambari Web UI. Because the UI is really simple, I will omit the screenshots here.
1. Access http://192.168.0.11:8080/ from your laptop. The username and password are both admin.
2. Give the cluster a name.
3. Select the latest HDP version.
4. Enter the hostnames of the VMs (one per line) and the SSH private key of the Ambari server. The SSH user should be vagrant.
5. Accept the default options for the rest of the wizard.
6. Complete the wizard. It takes about 30 minutes to finish.
Now we are all set, with a 4-node HDP cluster ready on your local machine!
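The /etc/hosts and key-distribution steps from Step 5 are easy to get wrong when repeated by hand on four machines. The sketch below shows one way they could be scripted. The function names are hypothetical, the IPs and hostnames are the ones from the Vagrantfile, and it assumes password login is still possible for the target user so that ssh-copy-id can work. The key-copy function only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: add_cluster_hosts and print_key_copy_commands are hypothetical
# helper names. IPs and hostnames match the Vagrantfile in this post.

CLUSTER_HOSTS='192.168.0.11 ambari1.mycluster ambari1
192.168.0.12 master1.mycluster master1
192.168.0.21 slave1.mycluster slave1
192.168.0.22 slave2.mycluster slave2'

# Append each cluster entry to the given hosts file unless that exact line is
# already present, so the function can be re-run safely.
add_cluster_hosts() {
  hosts_file=$1
  printf '%s\n' "$CLUSTER_HOSTS" | while IFS= read -r entry; do
    grep -qxF -- "$entry" "$hosts_file" 2>/dev/null || printf '%s\n' "$entry" >> "$hosts_file"
  done
}

# Print (do not run) one ssh-copy-id command per node other than the Ambari
# server. The SSH user defaults to vagrant.
print_key_copy_commands() {
  ssh_user=${1:-vagrant}
  printf '%s\n' "$CLUSTER_HOSTS" | while read -r ip fqdn short_name; do
    [ "$fqdn" = "ambari1.mycluster" ] && continue
    printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' "$ssh_user" "$fqdn"
  done
}
```

On each VM, `add_cluster_hosts /etc/hosts` would be run as root. On the Ambari node, the output of `print_key_copy_commands` can be reviewed and then pasted into the shell.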
06-11-2016
06:41 AM
2 Kudos
1. H2O is an open-source in-memory solution from 0xdata for predictive analytics on big data. With familiar APIs like R and JSON, and HDFS as a common storage layer, H2O makes advanced analysis possible. Please refer to the tutorial below from Hortonworks to get started: http://hortonworks.com/hadoop-tutorial/predictive-analytics-h2o-hortonworks-data-platform/
2. Apache Spark's MLlib is a very popular library for predictive analytics.
3. scikit-learn is another open-source machine learning tool.
06-10-2016
10:39 PM
5 Kudos
This beautiful demo is courtesy of Vadim Vaks. It utilizes a Lambda architecture built using the Hortonworks Data Platform and Hortonworks Data Flow. The demo shows how a telecom can manage customer device outages using predictive maintenance and a connected workforce.
Overview: The customer devices simulated in this telecom use case are set-top boxes (STBs) in individual homes that might need assistance from a technician when something goes wrong. The attributes associated with an STB are:
1. SignalStrength
2. Internal Temperature
3. Status
The location of the technician is tracked and plotted on the MapUI using latitude and longitude. The two cycles of operation of an STB are:
1. Normal Cycle: the status is normal and the internal temperature of the STB fluctuates up and down.
2. Failure Cycle: the status is not normal and the internal temperature of the STB rises incrementally until it reaches 109 degrees.
Step 1: Prerequisites for the Demo Set Up:
For instructions on installing this demo on an HDP 2.4 Sandbox, a good place to start is the README in the DeviceManagerDemo repository.
1. Clone the DeviceManagerDemo repository and run the installer:
git clone https://github.com/vakshorton/DeviceManagerDemo.git
cd DeviceManagerDemo
./install.sh
2. The install.sh script handles the installation and starting of the artifacts necessary for the demo on the Sandbox:
Looking for the latest Hortonworks sandbox version
Creating the NiFi service configuration, installing and starting it using the Ambari ReST API
Importing the DeviceManagerDemo NiFi template, then instantiating and starting the NiFi flow using the NiFi ReST API
TEMPLATEID=$(curl -v -F template=@"Nifi/template/DeviceManagerDemo.xml" -X POST http://sandbox.hortonworks.com:9090/nifi-api/controller/templates | grep -Po '<id>([a-z0-9-]+)' | grep -Po '>([a-z0-9-]+)' | grep -Po '([a-z0-9-]+)')
REVISION=$(curl -u admin:admin -i -X GET http://sandbox.hortonworks.com:9090/nifi-api/controller/revision |grep -Po '\"version\":([0-9]+)' | grep -Po '([0-9]+)')
curl -u admin:admin -i -H "Content-Type:application/x-www-form-urlencoded" -d "templateId=$TEMPLATEID&originX=100&originY=100&version=$REVISION" -X POST http://sandbox.hortonworks.com:9090/nifi-api/controller/process-groups/root/template-instance
Starting the Kafka Ambari service using the Ambari ReST API and configuring the TechnicianEvent and DeviceEvents topics using the kafka-topics shell script
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic DeviceEvents
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic TechnicianEvent
Changing the YARN container memory size:
/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox yarn-site "yarn.scheduler.maximum-allocation-mb" "6144"
Starting the HBase service using the Ambari ReST API
Installing and starting the Docker service, downloading the Docker images, and creating the working folder with the Slider for MapUI
Starting the Storm service using the Ambari ReST API and deploying the Storm topology:
storm jar /home/storm/DeviceMonitor-0.0.1-SNAPSHOT.jar com.hortonworks.iot.topology.DeviceMonitorTopology
3. install.sh restarts the Ambari server. Wait for it to come back up, then run the steps below:
cd DeviceManagerDemo
./startDemoServices.sh
4. The startDemoServices.sh script should be run each time the Sandbox VM is (re)started, after all of the default Sandbox services come up successfully. It handles the initialization of all of the application-specific components of the demo. The script starts the following Ambari services via the ReST API:
Kafka
NiFi
Storm
Docker
UI Servlet and CometD Server on YARN using Slider
HBase
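Both scripts start services through the Ambari ReST API. For readers who want to start a single service by hand, the sketch below shows the general shape of such a request: a PUT to the service resource with the target state in the body and an X-Requested-By header. This is not taken from the demo scripts. The function names and the AMBARI_URL and AMBARI_CLUSTER variables are hypothetical, and port 8080 and the admin/admin credentials are assumed defaults. The cluster name Sandbox is the one used in the configs.sh call above. The command is only printed, not sent.

```shell
#!/bin/sh
# Sketch only: helper names and environment variables are hypothetical.
# Assumes Ambari listens on port 8080 with admin/admin credentials.

# Build the JSON body that asks Ambari to move a service to a target state.
# $1 = service name (for example KAFKA), $2 = state (STARTED or INSTALLED).
ambari_state_payload() {
  printf '{"RequestInfo":{"context":"Set %s to %s"},"Body":{"ServiceInfo":{"state":"%s"}}}' "$1" "$2" "$2"
}

# Print the curl command that would start the given service; nothing is sent.
print_start_service_command() {
  ambari_url=${AMBARI_URL:-http://sandbox.hortonworks.com:8080}
  cluster=${AMBARI_CLUSTER:-Sandbox}
  printf "curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '%s' %s/api/v1/clusters/%s/services/%s\n" \
    "$(ambari_state_payload "$1" STARTED)" "$ambari_url" "$cluster" "$1"
}
```

Calling `print_start_service_command KAFKA` prints a command that can be inspected before it is run. Using INSTALLED as the state in `ambari_state_payload` corresponds to stopping a service.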
Step 2: Understanding the code and the NiFi processors, then navigating to the UI
1. First, make sure the Kafka topics were created by the install.sh script by running the commands below. You should see the two topic names.
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --list --zookeeper localhost:2181
2. The install.sh script that was run in the previous section creates and submits a Storm topology named DeviceMonitor that has spouts and bolts.
Spouts:
1. DeviceSpout
2. Technician Spout
Bolts:
EnrichDeviceStatus.java
IncidentDetector.java
PersistTechnicianLocation.java
PrintDeviceAlert.java
PublishDeviceStatus.java
Each spout simply looks at the status of the device ('Normal' or not) or of the technician ('Assigned' or not) and emits that status to the bolts for later event decisions. The bolts process the data based on the event type (device or technician): publishing the device status and technician location, updating the HBase tables, enriching the device status, intelligent incident detection, printing alerts, routing the technician, and so on. The spouts and bolts are wired together in DeviceMonitorTopology using classes and methods such as TopologyBuilder, setBolt(), setSpout(), and SpoutConfig.
3. Verify that the Storm topology was submitted properly by going to the Storm UI quick link in Ambari, where the topology's spouts and bolts are listed.
4. In order to see the events being simulated, run the simulator jars from two different CLIs:
cd DeviceManagerDemo
java -jar DeviceSimulator-0.0.1-SNAPSHOT-jar-with-dependencies.jar Technician 1000 Simulation
java -jar DeviceSimulator-0.0.1-SNAPSHOT-jar-with-dependencies.jar STB 1000 Simulation
The jars generate DeviceStatus and TechnicianLocation events.
5. After the DeviceManagerDemo NiFi template is imported, the flow consists of several connected processors that run the data flow in sequence.
6. The configuration of the major decision-making processors can be viewed in each processor's 'Properties' tab:
1. RouteOnAttribute:
ChannelTuneEventSearchIndex :${routingTarget:equalsIgnoreCase('ChannelTuneEventSearchIndex')}
DeviceEvents :${routingTarget:equalsIgnoreCase('DeviceEvents')}
TechnicianEvents :${routingTarget:equalsIgnoreCase('TechnicianEvents')}
DeviceInterface :${routingTarget:equalsIgnoreCase('DeviceInterface')}
2. DeviceHTTPInterface:
HTTP Method: POST
RemoteURL: ${targetProtocol:append(${targetIpAddress}):append(':'):append(${targetPort}):append('/server/TechnicianService')}
3. TechnicianlocationEvents: Technician information is pushed to the corresponding Kafka topic.
4. DeviceStatusEvents: Device status and other related event data are pushed to the corresponding Kafka topic.
5. EvaluateJSONPath: Several user-defined JSON path properties, such as channel_i, deviceId_s, and eventTimeStamp_I, are evaluated against the content of the flowfile and written to the corresponding attributes for the next step of the data flow.
7. Now navigate to the MapUI, which was started by Slider, to see the technician's car moving around: http://sandbox.hortonworks.com:8090/MapUI/DeviceMap
8. The DeviceMonitorNostradamus section of the code uses Spark Streaming and its prediction capabilities. The enriched technician data from HBase is streamed into a Spark model to predict a possible outage of a customer device, and the predictions are then published to the MapUI web application using the CometD server.
Conclusion: This telecom demo gives an overview of an IoT data application scenario and shows the power of Hortonworks DataFlow technologies like NiFi, Kafka, and Storm along with HBase, Spark, and Slider.
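The topic check at the start of Step 2 can also be automated instead of read by eye. The sketch below is one way to do it: the hypothetical helper has_topics reads a topic listing on standard input and succeeds only if every topic named on its command line appears as a whole line. The topic names are the ones created by install.sh.

```shell
#!/bin/sh
# Sketch only: has_topics is a hypothetical helper, not part of Kafka.

# Read a topic listing on stdin and succeed only if every topic passed as an
# argument is present as an exact line. Missing topics are reported on stderr.
has_topics() {
  listing=$(cat)
  for topic in "$@"; do
    printf '%s\n' "$listing" | grep -qxF -- "$topic" || {
      echo "missing topic: $topic" >&2
      return 1
    }
  done
}
```

On the Sandbox it could be fed from the listing command, for example `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper localhost:2181 | has_topics DeviceEvents TechnicianEvent`.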
06-10-2016
05:17 PM
1 Kudo
Can you try the --map-column-hive option? It overrides the default mapping from SQL type to Hive type for the configured columns. Refer to the documentation here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html
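As a rough illustration, --map-column-hive takes a comma-separated list of column=HiveType pairs. The sketch below builds that value and prints a full import command without running it. The helper names are hypothetical, and the connection string, table, and column names in the usage note are made up for the example.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the command is printed, not run.

# Join "column=HiveType" pairs into the comma-separated value that
# --map-column-hive expects.
map_column_hive_arg() {
  joined=""
  for pair in "$@"; do
    joined="${joined:+$joined,}$pair"
  done
  printf '%s\n' "$joined"
}

# Print a sqoop import command.
# $1 = JDBC connect string, $2 = source table, remaining args = column=HiveType pairs.
print_sqoop_import() {
  connect=$1
  table=$2
  shift 2
  printf 'sqoop import --connect %s --table %s --hive-import --map-column-hive %s\n' \
    "$connect" "$table" "$(map_column_hive_arg "$@")"
}
```

For example, `print_sqoop_import jdbc:mysql://db.example.com/shop orders id=STRING price=DOUBLE` prints an import command that forces the two named columns to the given Hive types.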
06-10-2016
05:12 PM
@Janos Matyas It's just an observation, and I was curious whether there had been any such case. Thanks for the response!
06-07-2016
06:00 PM
When you run 'yum update' after installing new software on one or more nodes of the cluster, it upgrades the Docker container as well. Will this not cause issues, such as node failures, from the Cloudbreak shell or CLI?
Labels: Docker, Hortonworks Cloudbreak
06-07-2016
05:45 PM
Import the classes below and try something along these lines:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

private static final TableName TABLENAME = TableName.valueOf("NewTable");

// Reads the HBase settings (hbase-site.xml) found on the classpath
Configuration config = HBaseConfiguration.create();
// A Connection is created once and then used to obtain Table handles
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TABLENAME);
06-07-2016
03:33 AM
7 Kudos
A well-known way to handle JSON in Hive is the JSON SerDe, which originated in HCatalog. There is another, rather unconventional, way to handle JSON data in Hive: json_tuple with LATERAL VIEW. The table here has only one column, which holds each JSON record as a single string. json_tuple() is a user-defined table function (UDTF) introduced in Hive 0.7. It takes a JSON string and a set of names (keys), and returns a tuple of values in a single function call. In the words of the ASF: "A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias".
Consider the following sample data line:
{"user":{"location":"","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell yah\n http://t.co/qCNjba57Dm","createddate":"2013-06-20T12:08:44","geolocation":null}
Table creation:
CREATE EXTERNAL TABLE tweets (tweet STRING);
LOAD DATA LOCAL INPATH '/pathtotxtfile/tweets.txt' INTO TABLE tweets;
My goal is to obtain the tweetmessage of the user Auzzie Jet. Using json_tuple and LATERAL VIEW, the query on the JSON tweet data is:
SELECT t2.name, t1.tweetmessage
FROM tweets t
LATERAL VIEW json_tuple(t.tweet, 'user', 'tweetmessage') t1 AS `user`, tweetmessage
LATERAL VIEW json_tuple(t1.`user`, 'name', 'location') t2 AS name, location
WHERE t2.name = "Auzzie Jet";
The AS clauses name the columns that each json_tuple call returns; user is backquoted because it is a reserved word in recent Hive versions.
Imagine the tweets being parsed as a JSON tree in a LATERAL VIEW using json_tuple. The first LATERAL VIEW gives us a virtual table with two columns, user and tweetmessage. The same process is then repeated to extract data from the next level of the JSON tree, which gives us another virtual table with the columns name and location. Finally, the WHERE clause asks for the tweetmessage of the particular user. As the Apache documentation puts it: "The function json_tuple explodes a JSON node and return the child node values. The first argument is the node to explode. The rest of the arguments are the child node names."