Member since
04-27-2016
61 Posts
61 Kudos Received
3 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5153 | 09-19-2016 05:42 PM
 | 1935 | 06-11-2016 06:41 AM
 | 4668 | 06-10-2016 05:17 PM
06-20-2016
08:52 PM
Thanks for letting me know this.
06-15-2016
01:21 AM
5 Kudos
Maintaining a Hadoop cluster in the cloud for long periods is often expensive, and engineers do not always have immediate access to a cloud environment where they can quickly spin up their own cluster and play around. As an easy alternative, an HDP cluster can be set up on your own laptop using Vagrant with VirtualBox as the provider. Step 1 - Install prerequisites
Download and install Vagrant from here.
Download and install Oracle VirtualBox as the Vagrant provider. Step 2 - Generate the Vagrantfile
Create a working directory for generating the Vagrantfile and initiating the deployment via Vagrant. $ mkdir hdp22
$ cd hdp22 The following command generates the Vagrantfile in the current directory. This file defines the VMs that make up the cluster. $ vagrant init
$ vi Vagrantfile Step 3 - Configure the VMs in the Vagrantfile Let's configure Vagrant to use CentOS 6.7 or CentOS 6.6 as the base box. These lines go inside the Vagrantfile (not the shell): config.vm.box = "bento/centos-6.7" or
$ config.vm.box = "chef/centos-6.6" The below script should be included in the Vagrantfile to allow some basic provisioning for VMs like 1. Install NTP service 2. Disable firewall, SElinux 3.(Optional) Install wget $script = <<SCRIPT
sudo yum -y install ntp
sudo chkconfig ntpd on
sudo chkconfig iptables off
sudo /etc/init.d/iptables stop
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo sh -c 'echo "* soft nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo "* hard nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled'
SCRIPT
config.vm.provision "shell", inline: $script
Step 4 - Define the VMs The following configuration defines 4 virtual machines for the HDP cluster: 1 Ambari server, 1 Hadoop master and 2 slaves. The machines have the hostnames: 1. ambari1.mycluster 2. master1.mycluster 3. slave1.mycluster 4. slave2.mycluster # Ambari1
config.vm.define :ambari1 do |a1|
a1.vm.hostname = "ambari1.mycluster"
a1.vm.network :private_network, ip: "192.168.0.11"
a1.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
a1.vm.network "forwarded_port", guest: 8080, host: 8080
a1.vm.network "forwarded_port", guest: 80, host: 80
end
# Master1
config.vm.define :master1 do |m1|
m1.vm.hostname = "master1.mycluster"
m1.vm.network :private_network, ip: "192.168.0.12"
m1.vm.provider :virtualbox do |vb|
vb.memory = "4096"
end
end
# Slave1
config.vm.define :slave1 do |s1|
s1.vm.hostname = "slave1.mycluster"
s1.vm.network :private_network, ip: "192.168.0.21"
s1.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
end
# Slave2
config.vm.define :slave2 do |s2|
s2.vm.hostname = "slave2.mycluster"
s2.vm.network :private_network, ip: "192.168.0.22"
s2.vm.provider :virtualbox do |vb|
vb.memory = "2048"
end
end Step 5 - Start the Machines and Install the Ambari Server Vagrant automatically runs the shell provisioning defined in the Vagrantfile when a machine is brought up. Bring up the Ambari server machine and then SSH into it. $ vagrant up ambari1
$ vagrant ssh ambari1 As the root user, run the commands below. # Install
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum -y install ambari-server
sudo service ntpd start
# Setup. There are several options to configure during setup.
ambari-server setup
# Start Ambari Server
ambari-server start Add the following FQDN entries to the /etc/hosts file on each VM. 192.168.0.11 ambari1.mycluster ambari1
192.168.0.12 master1.mycluster master1
192.168.0.21 slave1.mycluster slave1
192.168.0.22 slave2.mycluster slave2 Set up passwordless SSH from the Ambari node to all other nodes (VMs): $ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys Copy the Ambari server's public key into the authorized_keys file on each of the other nodes so that Ambari can reach them later; one way to do this is sketched below.
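A minimal sketch, assuming the key pair generated above and that the other VMs still accept the default vagrant password (ssh-copy-id ships with the openssh-clients package on CentOS):
# On the host, from the hdp22 directory, make sure the other VMs are running
$ vagrant up master1 slave1 slave2
# From inside ambari1, push the public key to the vagrant user on each node
$ for host in master1.mycluster slave1.mycluster slave2.mycluster; do ssh-copy-id vagrant@$host; done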
Step 6 - Deploy the HDP Cluster We are ready to deploy an HDP cluster from the Ambari Web UI. Because the UI is really simple, I will omit the screenshots here. Access http://192.168.0.11:8080/ from your laptop. The username and password are admin and admin respectively. Give the cluster a name and select the latest HDP version. Input the hostnames of the VMs (one per line) and the SSH private key of the Ambari server; the SSH user should be vagrant. Accept the default options for the rest of the wizard and complete it. It takes about 30 minutes to finish. Now we are all set, with a 4-node HDP cluster ready on your local machine!
06-11-2016
06:41 AM
2 Kudos
1. H2O is an open-source in-memory solution from 0xdata for predictive analytics on big data. With familiar APIs such as R and JSON, and with HDFS as a common storage layer, H2O enables advanced analysis. Please refer to the tutorial from Hortonworks to get started: http://hortonworks.com/hadoop-tutorial/predictive-analytics-h2o-hortonworks-data-platform/ 2. Apache Spark's MLlib is very popular for predictive analytics. 3. scikit-learn is another open-source machine learning tool.
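If you go the H2O route, the tutorial above launches H2O on YARN; the invocation looks roughly like the sketch below (h2odriver.jar and the output directory are placeholders whose exact names depend on the H2O-on-Hadoop download):
# Start a 1-node H2O cloud on YARN; adjust -nodes and -mapperXmx to your cluster
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 2g -output /user/h2o/h2o_out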
06-10-2016
10:39 PM
5 Kudos
This beautiful demo is courtesy of Vadim Vaks. It utilizes a Lambda architecture built using the Hortonworks Data Platform and Hortonworks DataFlow, and shows how a telecom can manage customer device outages using predictive maintenance and a connected workforce. Overview: The customer devices simulated in this telecom use case are set-top boxes (STBs) in individual homes that might need assistance from a technician when something goes wrong. Attributes associated with an STB are: 1. SignalStrength 2. Internal Temperature 3. Status The location of each technician is tracked and plotted on the MapUI using latitude and longitude. The STBs have two cycles of operation: 1. Normal Cycle: the status is normal and the internal temperature of the STB fluctuates up and down 2. Failure Cycle: the status is not normal and the internal temperature of the STB climbs incrementally until it reaches 109 degrees.
Step 1: Prerequisites for the Demo Setup:
For instructions on installing this demo on an HDP 2.4 Sandbox, a good place to start is the README in the DeviceManagerDemo repository. 1. Clone the DeviceManagerDemo repository and follow the steps below: git clone https://github.com/vakshorton/DeviceManagerDemo.git
cd DeviceManagerDemo
./install.sh 2. The install.sh script handles installing and starting the artifacts necessary for the demo on the Sandbox:
Looking for the latest Hortonworks sandbox version
Creating the NiFi service configuration, then installing and starting NiFi using the Ambari REST API
Importing the DeviceManagerDemo NiFi template, then instantiating and starting the NiFi flow using the NiFi REST API
TEMPLATEID=$(curl -v -F template=@"Nifi/template/DeviceManagerDemo.xml" -X POST http://sandbox.hortonworks.com:9090/nifi-api/controller/templates | grep -Po '<id>([a-z0-9-]+)' | grep -Po '>([a-z0-9-]+)' | grep -Po '([a-z0-9-]+)')
REVISION=$(curl -u admin:admin -i -X GET http://sandbox.hortonworks.com:9090/nifi-api/controller/revision |grep -Po '\"version\":([0-9]+)' | grep -Po '([0-9]+)')
curl -u admin:admin -i -H "Content-Type:application/x-www-form-urlencoded" -d "templateId=$TEMPLATEID&originX=100&originY=100&version=$REVISION" -X POST http://sandbox.hortonworks.com:9090/nifi-api/controller/process-groups/root/template-instance
Starting the Kafka service using the Ambari REST API and configuring the TechnicianEvent and DeviceEvents topics using the kafka-topics shell script
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic DeviceEvents
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper sandbox.hortonworks.com:2181 --replication-factor 1 --partitions 1 --topic TechnicianEvent
Changing the YARN container memory size: /var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox yarn-site "yarn.scheduler.maximum-allocation-mb" "6144"
Starting the HBase service using the Ambari REST API
Installing and starting the Docker service, downloading the Docker images, and creating the working folder with Slider for the MapUI
Starting the Storm service using the Ambari REST API and deploying the Storm topology: storm jar /home/storm/DeviceMonitor-0.0.1-SNAPSHOT.jar com.hortonworks.iot.topology.DeviceMonitorTopology 3. install.sh reboots the Ambari server; wait for it to come back and then run the steps below: cd DeviceManagerDemo
./startDemoServices.sh 4. The startDemoServices.sh script should be run each time the Sandbox VM is (re)started, after all of the default Sandbox services come up successfully. It handles the initialization of all of the application-specific components of the demo. The script starts the following services via the Ambari REST API (a quick way to verify them is sketched after this list):
Kafka
NiFi
Storm
Docker
UI Servlet and CometD Server on YARN using Slider
HBase
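If you want to confirm these came up, the service states can be checked over the Ambari REST API. A minimal sketch, assuming the default admin/admin credentials and the Sandbox cluster name used by install.sh (the custom NiFi and Docker service names may differ on your Sandbox):
# Expect "state" : "STARTED" for each Ambari-managed service
for svc in KAFKA STORM HBASE NIFI; do
  curl -s -u admin:admin "http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/$svc?fields=ServiceInfo/state"
done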
Step 2: Understand the code and the NiFi processors, then navigate to the UI 1. First, make sure the Kafka topics were created by the install.sh script by running the commands below. You should see the two topic names.
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --list --zookeeper localhost:2181 2. The install.sh script that was run in the previous section creates and submits a Storm topology named DeviceMonitor with the following spouts and bolts. Spouts: 1. DeviceSpout 2. Technician Spout Bolts:
EnrichDeviceStatus.java
IncidentDetector.java
PersistTechnicianLocation.java
PrintDeviceAlert.java
PublishDeviceStatus.java Each spout simply looks at the status of the device ('Normal' or not) or of the technician ('Assigned' or not) and emits that status to the bolts for downstream event decisions. The bolts process the data according to the event type (device or technician): publishing the device status and technician location, updating the HBase tables, enriching the device status, intelligent incident detection, printing alerts, routing the technician, and so on. DeviceMonitorTopology (detailed code here) wires the spouts and bolts together using methods such as TopologyBuilder(), setSpout(), setBolt() and SpoutConfig(). 3. Verify that the Storm topology was submitted properly by going to the Storm UI quick link in Ambari; you should see the spouts and bolts listed there. 4. In order to see the events being simulated, run the simulator jars from two different CLIs: cd DeviceManagerDemo
java -jar DeviceSimulator-0.0.1-SNAPSHOT-jar-with-dependencies.jar Technician 1000 Simulation
java -jar DeviceSimulator-0.0.1-SNAPSHOT-jar-with-dependencies.jar STB 1000 Simulation The jars generate the DeviceStatus and TechnicianLocation events. 5. After importing the DeviceManagerDemo NiFi template, the flow looks like below, with several processors connected and running the dataflow in sequential fashion.
6. Viewing the configurations of some of the major decision-making processors, one can see the following content in their 'Properties' tabs:
1.RouteOnAttribute:
ChannelTuneEventSearchIndex :${routingTarget:equalsIgnoreCase('ChannelTuneEventSearchIndex')}
DeviceEvents :${routingTarget:equalsIgnoreCase('DeviceEvents')}
TechnicianEvents :${routingTarget:equalsIgnoreCase('TechnicianEvents')}
DeviceInterface :${routingTarget:equalsIgnoreCase('DeviceInterface')}
2.DeviceHTTPInterface:
HTTP Method: POST
RemoteURL: ${targetProtocol:append(${targetIpAddress}):append(':'):append(${targetPort}):append('/server/TechnicianService')}
3. TechnicianLocationEvents: the technician information is pushed to the corresponding Kafka topic. 4. DeviceStatusEvents: the device status and other related event data is pushed to the corresponding Kafka topic. 5. EvaluateJsonPath: several user-defined JSONPath properties such as channel_i, deviceId_s, eventTimeStamp_I, etc. are evaluated against the content of the flowfile and written to the corresponding attributes for the next step of the dataflow. 7. Now navigate to the MapUI, launched by Slider, to see the technician's car moving around: http://sandbox.hortonworks.com:8090/MapUI/DeviceMap 8. The DeviceMonitorNostradamus part of the code uses Spark Streaming and its prediction capabilities: the enriched technician data from HBase is streamed into a Spark model to predict possible outages of a customer device, and the predictions are later published to the MapUI web application via the CometD server. Conclusion: This telecom demo gives an overview of an IoT data application scenario and shows the power of Hortonworks DataFlow technologies such as NiFi, Kafka and Storm, along with HBase, Spark and Slider.
06-10-2016
05:17 PM
1 Kudo
Can you try the --map-column-hive option? It overrides the default mapping from SQL type to Hive type for the configured columns; a sketch of the usage is below. Refer to the documentation here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html
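A rough sketch of typical usage with a Hive import (the connection string, table, and column names are hypothetical placeholders):
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --hive-import \
  --map-column-hive order_id=INT,order_notes=STRING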
06-10-2016
05:16 PM
Can you try the --map-column-hive option? It overrides the default mapping from SQL type to Hive type for the configured columns; a sketch of the usage is below. Refer to the documentation here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html
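A rough sketch of typical usage with a Hive import (the connection string, table, and column names are hypothetical placeholders):
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --hive-import \
  --map-column-hive order_id=INT,order_notes=STRING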
06-10-2016
05:12 PM
@Janos Matyas It's just an observation and I was curious whether any such case exists. Thanks for the response!
06-07-2016
06:00 PM
When you do a 'yum update' after installing some new software on one or more nodes of the cluster, it upgrades the Docker container as well. Will this not cause any issues, such as node failure, from the Cloudbreak shell or CLI?
Labels:
- Docker
- Hortonworks Cloudbreak
06-07-2016
05:45 PM
Import the APIs below and try something along these lines:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

private static final TableName TABLENAME = TableName.valueOf("NewTable");

Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TABLENAME);
06-07-2016
03:33 AM
7 Kudos
A very well-known way to handle JSON is to use a JSON SerDe, which originated from HCatalog. There is another interesting, rather unconventional method to handle JSON data in Hive: json_tuple and LATERAL VIEW. The table here has only one column, which loads the JSON data as a single string. json_tuple() is a user-defined table function (UDTF) introduced in Hive 0.7. It takes a set of names (keys) and a JSON string, and returns a tuple of values using one function. In the words of the ASF, "A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias". Consider the sample data line below: {"user":{"location":"","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell yah\n http://t.co/qCNjba57Dm","createddate":"2013-06-20T12:08:44","geolocation":null} Table creation: CREATE EXTERNAL TABLE tweets (tweet STRING); LOAD DATA LOCAL INPATH '/pathtotxtfile/tweets.txt' INTO TABLE tweets; My goal is to obtain the tweetmessage of the user Auzzie Jet. Using json_tuple and LATERAL VIEW, the query on the JSON tweet data is below. select t2.name, t1.tweetmessage from tweets t LATERAL VIEW json_tuple(t.tweet, 'user', 'tweetmessage') t1 as `user`, tweetmessage
LATERAL VIEW json_tuple(t1.`user`, 'name', 'location') t2 as name, location where t2.name = 'Auzzie Jet'; Imagine the tweets being parsed as a JSON tree in a LATERAL VIEW using the utility json_tuple. The first instance gives us a virtual table with two columns, user and tweetmessage. A similar process is repeated to extract data from the next level of the JSON tree; this time it gives us another virtual table with the columns name and location. Then we query for the tweetmessage of the particular user. "The function json_tuple explodes a JSON node and return the child node values. The first argument is the node to explode. The rest of the arguments are the child node names." - Apache