Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8190 | 06-19-2018 08:52 AM
 | 3147 | 06-13-2018 07:54 AM
 | 3574 | 06-02-2018 06:27 PM
 | 3878 | 05-01-2018 12:28 PM
 | 5397 | 04-24-2018 11:38 AM
04-25-2016
11:43 PM
1 Kudo
Hi @Revathy Mourouguessane, you can use IsEmpty to check whether A1 is empty or not. Try something like this:
grouped = COGROUP ..... ;
filtered = FILTER grouped BY NOT IsEmpty($2);
DUMP filtered;
Here's an example that shows how this works for something similar:
cat > owners.csv
adam,cat
adam,dog
alex,fish
david,horse
alice,cat
steve,dog
cat > pets.csv
nemo,fish
fido,dog
rex,dog
paws,cat
wiskers,cat
owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray,animal:chararray);
pets = LOAD 'pets.csv' USING PigStorage(',') AS (name:chararray,animal:chararray);
grouped = COGROUP owners BY animal, pets BY animal;
filtered = FILTER grouped BY NOT IsEmpty($2);
DUMP grouped;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(horse,{(david,horse)},{})
(fish,{(alex,fish)},{(nemo,fish)})
DUMP filtered;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(fish,{(alex,fish)},{(nemo,fish)})
04-22-2016
05:05 PM
Hi @AKILA VEL, please check this tutorial on how to do a wordcount with Spark on HDP 2.3: http://fr.hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/ Section 1 shows how to upgrade Spark to version 1.6; you can skip it and go directly to section 2. I hope this helps.
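For reference, here is a minimal sketch of the classic wordcount using the Spark 1.6-era RDD API in Python; the HDFS paths are hypothetical, and in the pyspark shell the SparkContext sc is already available:
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")  # not needed in the pyspark shell, where sc is predefined

counts = (sc.textFile("hdfs:///tmp/input.txt")    # hypothetical input path
            .flatMap(lambda line: line.split())   # split each line into words
            .map(lambda word: (word, 1))          # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b))     # sum the counts per word

counts.saveAsTextFile("hdfs:///tmp/wordcount-out")  # hypothetical output path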
04-21-2016
12:38 PM
Can you delete this question, please? It's a duplicate. Thanks.
04-21-2016
12:36 PM
Hi @Klaus Lucas, the VM has Ambari installed and configured, so you should get the Ambari UI on port 8080. Can you check your VM settings (port redirection, network, etc.) and see if you can access Ambari?
03-29-2016
07:47 PM
4 Kudos
Hi @Vadim, OpenCV is well known for image processing in general, and it provides several tools for image and face recognition. Here is an example of how to do face recognition with OpenCV: tutorial. In terms of integration with Hadoop, there's a framework called HIPI, developed by the University of Virginia, for leveraging HDFS and MapReduce for large-scale image processing. This framework supports OpenCV too. Finally, for image processing on data in motion, you can use HDF with an OpenCV processor like the one published here.
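To give an idea of the OpenCV side, here is a minimal face-detection sketch in Python, assuming the opencv-python package is installed; the image paths are hypothetical, and the Haar cascade file ships with OpenCV:
import cv2

# Load the pre-trained frontal-face Haar cascade bundled with OpenCV.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")                 # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # detection works on grayscale

# detectMultiScale returns (x, y, w, h) bounding boxes for detected faces.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # draw a green box per face

cv2.imwrite("people_faces.jpg", img)           # hypothetical output path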
03-16-2016
05:12 PM
Hi @Lubin Lemarchand, try changing the parameter through Ambari: go to HDFS -> Configs and search for dfs.permissions.superusergroup. Ambari stores the configuration in a database, which is the source of truth for configuration. If you directly modify configuration files that are managed by Ambari, Ambari will overwrite the file and discard your modification at the next service restart. See this doc.
03-06-2016
10:21 PM
5 Kudos
@Abha R Panchal What user are you currently logged in as? The user dev_maria doesn't have admin access, so you will not see the Add Service button. To add services, you have to log in as admin. The admin user is deactivated in the HDP 2.4 sandbox; to activate it, use the following command:
ambari-admin-password-reset
03-05-2016
03:26 PM
2 Kudos
@Kyle Prins The sandbox gives you an easy way to have a working Hadoop installation in a VM. If you need a multi-node cluster, my advice is to install an HDP cluster yourself. This way, you will understand what has been installed and how it was configured. Use Ambari for the installation; it's straightforward and quick: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Installing_HDP_AMB/content/index.html If you want all the nodes as VMs on your local machine, you can use Vagrant too. Look at these links for an idea of how to do it: http://uprush.github.io/hdp/2014/12/29/hdp-cluster-on-your-laptop/ and https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
03-05-2016
03:00 PM
@vinay kumar Maybe you have a problem with disk partitioning. Can you check how much space you have allocated to the partitions used by HDP? Here's a link with partitioning recommendations: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_cluster-planning-guide/content/ch_partitioning_chapter.html
03-04-2016
04:25 PM
3 Kudos
Hi @Prakash Punj
- You can use NiFi to monitor a directory and ingest each new file into HDFS (GetFile and PutHDFS processors): https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetFile/index.html
- You can run Spark in a browser with Zeppelin, and you can get it in Ambari with the Zeppelin view (see the sketch after this list). Some tutorials here: http://hortonworks.com/hadoop/zeppelin/#tutorials
- To avoid a SPOF you need HDFS HA. Federation is different: it means having multiple NameNodes to manage very big clusters and reduce the stress on a single NameNode.
- In Ambari you can have admin users and regular users; regular users have fewer privileges in Ambari.
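As a taste of Spark in the browser, a Zeppelin paragraph could look like the sketch below, assuming the %pyspark interpreter is configured and that Zeppelin provides the SparkContext as sc; the sample data is made up:
%pyspark
# Zeppelin injects a SparkContext as sc; the data below is invented for illustration.
pets = sc.parallelize([("cat", 2), ("dog", 2), ("fish", 1)])
total = pets.map(lambda pair: pair[1]).sum()  # sum the per-animal counts
print("total pets: %d" % total)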