Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8190 | 06-19-2018 08:52 AM
 | 3147 | 06-13-2018 07:54 AM
 | 3574 | 06-02-2018 06:27 PM
 | 3878 | 05-01-2018 12:28 PM
 | 5397 | 04-24-2018 11:38 AM
05-07-2016
05:04 PM
1 Kudo
On the documentation page for "Configure Hive and HiveServer2 for Tez" there are two properties that look similar to me:
tez.queue.name: property to specify which queue will be used for Hive-on-Tez jobs.
hive.server2.tez.default.queues: a list of comma-separated values corresponding to YARN queues of the same name. When HiveServer2 is launched in Tez mode, this configuration needs to be set for multiple Tez sessions to run in parallel on the cluster.
The only difference I see is that with "hive.server2.tez.default.queues" we can specify several queues, so I guess jobs will be distributed over these queues. Hence, if we need all Hive jobs to run in one queue, we should use "tez.queue.name". Am I missing something here?
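For instance, this is how I picture the two settings (the queue names below are just made up for illustration):
# send every Hive-on-Tez job to a single queue
tez.queue.name=hive
# let HiveServer2 spread its parallel Tez sessions over several queues
hive.server2.tez.default.queues=hive1,hive2,hive3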
Labels:
- Apache Hive
- Apache YARN
05-07-2016
04:56 PM
1 Kudo
Hi @Veera B. Budhi,
Job-by-job approach: one solution to your problem is to specify the queue to use when you submit your Spark job or when you connect to Hive.
When submitting your Spark job, you can specify the queue with --queue, as in this example:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue SparkQueue lib/spark-examples*.jar 10
To specify the queue at connection time to HiveServer2:
beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000/default?tez.queue.name=HiveQueue" -n it1 -p it1 -d org.apache.hive.jdbc.HiveDriver
Or you can set the queue after you are connected, using set tez.queue.name=HiveQueue;
beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000/default" -n it1 -p it1 -d org.apache.hive.jdbc.HiveDriver
> set tez.queue.name=HiveQueue;
Change the default queue: the second approach is to specify a default queue for Hive or Spark to use.
To do it for Spark, set spark.yarn.queue to SparkQueue instead of default in Ambari.
To do it for Hive, add tez.queue.name to the custom hiveserver2-site configuration in Ambari.
Hope this helps.
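To make that concrete, the resulting default-queue settings would end up looking roughly like this (SparkQueue and HiveQueue are just the example names used above):
# Spark configuration in Ambari
spark.yarn.queue=SparkQueue
# Custom hiveserver2-site in Ambari
tez.queue.name=HiveQueue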
05-07-2016
02:26 PM
@Premasish Dan What lab exercise are you doing? Is it a tutorial?
05-07-2016
01:02 AM
4 Kudos
Hi @Sunile Manjee There's no Zeppelin interpreter for Solr. The list of available interpreters is here. You can turn Solr data into a Spark RDD and hence access it with the Spark interpreter in Zeppelin. Another approach (that I didn't test) is to use a Solr JDBC connection and the Zeppelin JDBC interpreter. A Jira ticket makes me think that some problems may be encountered.
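As a rough, untested sketch of that JDBC approach (the host, port, and collection names below are placeholders, not something I verified), the Zeppelin JDBC interpreter would roughly need:
# hypothetical Zeppelin JDBC interpreter settings for Solr
default.driver=org.apache.solr.client.solrj.io.sql.DriverImpl
default.url=jdbc:solr://zk-host:2181?collection=my_collection
# plus the solr-solrj artifact added as an interpreter dependency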
05-07-2016
12:47 AM
1 Kudo
Hi @Subhasis Roy Tuples are used to represent complex data types. A tuple is enclosed in parentheses, as in this example:
cat data
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)
A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
X = FOREACH A GENERATE t1.t1a,t2.$0;
DUMP X;
(3,4)
(1,3)
(2,9)
In your case, your data is simple and not enclosed in parentheses, so you don't need a tuple in your schema. Just run this:
A = LOAD '/tmp/test.csv' USING PigStorage(',') AS (a:chararray, b:chararray, c:chararray, d:chararray, e:chararray);
DUMP A;
(1201,gopal, manager, 50000, TP)
(1202,manisha, proof reader, 50000, TP)
If you want to access only some fields of your data, use this (here I show only the first 4 fields):
X = FOREACH A GENERATE $0, $1, $2, $3;
DUMP X;
(1201,gopal, manager, 50000)
(1202,manisha, proof reader, 50000)
Does this answer your question?
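As a side note, since the LOAD statement above declares field names, the same projection can also be written by name instead of position:
-- equivalent to the positional projection above
X = FOREACH A GENERATE a, b, c, d;
DUMP X;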
05-06-2016
11:56 PM
2 Kudos
@jbarnett
I say not Flume 🙂 Have you tried NiFi? You can have several processors for your app and configure each one of them with a few clicks in the GUI. You want to reconfigure a particular processor? No problem: stop it, right-click, configure it, and run it again. If you really want to use Flume, I recommend using a config file per agent, as stated in the doc: "Hortonworks recommends that administrators use a separate configuration file for each Flume agent. ... While it is possible to use one large configuration file that specifies all the Flume components needed by all the agents, this is not typical of most production deployments." Since you have several agents on the same host, Ambari is not an option. Use NiFi!
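If you do end up staying on Flume, a minimal sketch of the one-config-file-per-agent approach could look like this (agent and file names are only examples):
# one config file per agent, each agent started separately on the same host
flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/agent1.conf --name agent1
flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/agent2.conf --name agent2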
05-06-2016
05:14 PM
4 Kudos
Hi @Indrajit swain, You are hitting the Elasticsearch instance that Atlas is running in the background for its operations. This is why you get an older version of ES when you curl port 9200.
To check it, stop your own ES instance and see if something is still listening on port 9200:
netstat -npl | grep 9200
You should still see something listening even when your ES is down. You can see the configuration of the existing ES in the Atlas configuration in Ambari.
When your ES starts and finds its port (9200) already in use, it picks the next available one, so your ES instance will be running on port 9201. You can see it in the startup logs (like in my example):
[2016-05-06 17:09:41,452][INFO ][http ] [Speedball] publish_address {127.0.0.1:9201}, bound_addresses {127.0.0.1:9201}
You can curl the two ports to check:
[root@sandbox ~]# curl localhost:9200
{
"status" : 200,
"name" : "Gravity",
"version" : {
"number" : "1.2.1",
"build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
"build_timestamp" : "2014-06-03T15:02:52Z",
"build_snapshot" : false,
"lucene_version" : "4.8"
},
"tagline" : "You Know, for Search"
}
[root@sandbox ~]# curl localhost:9201
{
"name" : "Speedball",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.3.2",
"build_hash" : "b9e4a6acad4008027e4038f6abed7f7dba346f94",
"build_timestamp" : "2016-04-21T16:03:47Z",
"build_snapshot" : false,
"lucene_version" : "5.5.0"
},
"tagline" : "You Know, for Search"
}
You can also change the port of ES to something you want in the yaml file. Hope this helps
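For reference, that setting lives in elasticsearch.yml (the port value below is just an example, pick any free port):
# elasticsearch.yml -- move your own ES off 9200 so it no longer collides with the Atlas-managed instance
http.port: 9400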
05-05-2016
05:46 PM
Hi @Revathy Mourouguessane, have you tried this solution?
04-30-2016
03:10 PM
4 Kudos
Hi @Rendiyono Wahyu Saputro, What you are trying to build is what we call the Connected Data Platform at Hortonworks. You need to understand that you have two types of workloads/requirements, and you need to use HDF and HDP jointly.
ML model construction: the first step towards your goal is to build your machine learning model. This requires processing a lot of historical data (data at rest) to detect patterns related to what you are trying to predict. This phase is called the "training phase". The best tool to do this is HDP, and more specifically Spark.
Applying the ML model: once step 1 is completed, you will have a model that you can apply to new data to predict something. In my understanding, you want to apply this to real-time data coming from Twitter (data in motion). To get the data in real time and transform it into what the ML model needs, you can use NiFi. Next, NiFi sends the data to Storm or Spark Streaming, which applies the model and gets the prediction.
So you will use HDP to construct the model, HDF to get and transform the data, and finally a combination of HDF/HDP to apply the model and make the prediction.
To build a web service with NiFi you need several processors: one to listen to incoming requests, one or several processors to implement your logic (transformation, extraction, etc.), and one to publish the result. You can check this page, which contains several data flow examples; the "Hello_NiFi_Web_Service.xml" template gives an example of how to do it: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
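As a rough sketch (the processor choices here are my assumption, not necessarily what the template uses), such a web-service flow could be wired as:
HandleHttpRequest (receive the incoming HTTP request)
-> EvaluateJsonPath / other processors (implement your transformation and extraction logic)
-> HandleHttpResponse (return the result to the caller)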
04-29-2016
03:35 PM
Hi, Unfortunately I can't do a WebEx with you. Describe your problem here and the community and I will be happy to help you. Also, contact support if you have a subscription. Thanks