Member since: 09-17-2015
Posts: 70
Kudos Received: 79
Solutions: 20
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 2309 | 02-27-2018 08:03 AM |
| | 2122 | 02-27-2018 08:00 AM |
| | 2464 | 10-09-2016 07:59 PM |
| | 864 | 10-03-2016 07:27 AM |
| | 947 | 06-17-2016 03:30 PM |
04-29-2016
08:39 AM
Hello Pedro, Spark Core is a general-purpose in-memory analytics engine. By adding components such as Spark SQL or Spark ML on top of Spark Core, you can do many interesting analytics or data science modelling tasks, in either a programmatic or a SQL fashion. These tutorials can help you with your first steps: http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/ http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/
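To make the "programmatic or SQL" point concrete, here is a minimal Scala sketch using the Spark 1.6-era API; the JSON input path and the column names are hypothetical, it just shows the same query written both ways:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkQuickTour {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-quick-tour"))
    val sqlContext = new SQLContext(sc)

    // Load a JSON file into a DataFrame (path and schema are hypothetical)
    val people = sqlContext.read.json("/tmp/people.json")

    // Programmatic (DataFrame API) style
    people.filter(people("age") > 21).select("name", "age").show()

    // SQL style over the same data
    people.registerTempTable("people")
    sqlContext.sql("SELECT name, age FROM people WHERE age > 21").show()

    sc.stop()
  }
}
```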
04-14-2016
08:03 AM
Hello Nelson, I don't think you need to set the Hive configuration explicitly anymore, i.e. this part: "-Djavax.jdo.option.ConnectionURL=jdbc:mysql://testip/hive?createDatabaseIfNotExist=true -Dhive.metastore.uris=thrift://testip:9083"
04-13-2016
09:10 AM
1 Kudo
Hello Nelson, instead of putting the Hive info into separate properties, could you try adding the hive-site.xml (--files=/etc/hive/conf/hive-site.xml), just to make sure everything is consistent? Without it, Spark could launch an embedded metastore, causing the out-of-memory condition. Could you also share a little more about the app: what type of data (ORC, CSV, etc.) and the size of the table? Let's see if this helps.
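For reference, a minimal sketch of what the driver code could look like, assuming Spark 1.6 on YARN with hive-site.xml shipped via --files so the HiveContext talks to the real metastore; the table name below is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-table-check"))

    // With hive-site.xml distributed (e.g. --files /etc/hive/conf/hive-site.xml),
    // this HiveContext connects to the existing metastore instead of spinning up
    // an embedded one.
    val hiveContext = new HiveContext(sc)

    // Simple sanity query against an existing Hive table (name is hypothetical)
    hiveContext.sql("SELECT COUNT(*) FROM default.sample_table").show()

    sc.stop()
  }
}
```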
04-13-2016
08:18 AM
3 Kudos
Hello Sumit, increasing the ZooKeeper session timeout is often a quick first fix when GC pauses are "killing" region servers in HBase. In the longer run, if you have GC pauses it is because your process is struggling to find memory, and there can be architectural approaches to that problem. For example, does this happen during heavy write loads? In that case, consider doing bulk loads when possible. You can also look at your HBase configuration: what is the overall memory allocated to HBase, and how is it distributed between writes and reads? Do you flush your memstore often, and does this lead to many compactions? Lastly, you can look at GC tuning. I won't dive into that one here, but Lars has written a nice introductory blog post on it: http://hadoop-hbase.blogspot.ie/2014/03/hbase-gc-tuning-observations.html Hope some of this helps.
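As a starting point, these are the kinds of hbase-site.xml settings involved. The property names are standard HBase 1.x, but the values below are only illustrative, not recommendations for your cluster:

```xml
<!-- Give region servers more room to survive a GC pause before ZooKeeper expires them -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<!-- Share of the heap reserved for memstores (writes) -->
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value>
</property>
<!-- Share of the heap reserved for the block cache (reads) -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
```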
04-12-2016
01:31 PM
4 Kudos
Hello Sunile, Zeppelin in HDP 2.4 has a bug that has since been fixed. If you issue any query containing "()", the new prefix parser gets lost. In your log, if your query is "%hive select count(*) from table", you will see the query being sent looks like "elect count(*) from table". This is because the parser looks for a "(prefix_name)" and here mistakes "(*)" for a prefix. The workaround is to use %hive(default) or to wait for the next release.
04-11-2016
11:12 AM
3 Kudos
Hello Kiran, Spark is not yet a GA execution engine for Hive; it is still very much in the development phase. You can, however, use Spark SQL with a Hive context to issue queries against Hive tables.
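For example, a quick check from spark-shell, assuming an HDP build where Spark is compiled with Hive support so the provided sqlContext is already a HiveContext; the table name is hypothetical:

```scala
// Query an existing Hive table directly from spark-shell
sqlContext.sql("SELECT * FROM default.sample_table LIMIT 10").show()
```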
04-06-2016
07:05 AM
4 Kudos
Hello Arunkumar, as a general rule it comes back to what you are trying to achieve and how you want to serve the data. Remember that HBase's performance is directly derived from the rowkey and hence from how you access data. HBase splits data into regions served by region servers, and at a lower level data is split by column family; a single entry, however, is always served by the same region. At a high level, the difference between tall-narrow and flat-wide comes down to scans vs gets, since HBase stores data ordered by rowkey and full scans are costly. A tall-narrow approach uses a more complex rowkey that keeps similar elements adjacent, allowing focused scans over a logical group of entries. A flat-wide approach puts much more information in the entry itself: you "get" the entry through its rowkey, and the entry carries enough information to do your compute or answer your query. Hope this helps.
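A small Scala sketch of the two styles with the HBase 1.x client API; the "userId#timestamp" key layout, the example rowkeys, and the column family name are all hypothetical:

```scala
import org.apache.hadoop.hbase.client.{Get, Scan}
import org.apache.hadoop.hbase.util.Bytes

// Tall-narrow: composite rowkey "userId#timestamp", one small entry per row.
// Rows for the same user sort next to each other, so a bounded scan answers
// "all events for user42" without touching other users' data.
val tallNarrowScan = new Scan()
  .setStartRow(Bytes.toBytes("user42#"))
  .setStopRow(Bytes.toBytes("user42$")) // '$' sorts just after '#', closing the prefix range

// Flat-wide: one row per user with many columns, so a single Get on the rowkey
// returns everything needed to answer the query.
val flatWideGet = new Get(Bytes.toBytes("user42"))
  .addFamily(Bytes.toBytes("events"))
```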
04-04-2016
12:32 PM
As you can see, when you increased the user-limit-factor it allocated more containers and you got 200% of the queue; if you were to set it to 2.5 you would get the full queue. For the second part, if you want the it queue to release the extra containers to service the price queue, you can either wait for it to happen naturally as the job rolls out or, better, set up the YARN preemption mechanism: http://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
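A sketch of the yarn-site.xml switch that turns on the monitor driving CapacityScheduler preemption; the linked blog covers the remaining tuning knobs, and nothing beyond this one property is implied here:

```xml
<!-- Enable the scheduler monitor that drives CapacityScheduler preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
```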
04-04-2016
09:10 AM
Hello Alena, on top of the queue distribution and elasticity there are other elements that can be configured to help share the resources. For example, you have .root.it.user-limit-factor=1, which means a single user cannot use more than 100% of the allocated queue capacity; this can limit or negate the optional elasticity given to a queue. Try setting it to 2 and then 3 to see the result. Regards
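For reference, a capacity-scheduler.xml sketch of the property discussed; the queue path root.it comes from your setup, and the value 2 is only the first step suggested above:

```xml
<!-- Allow a single user to grow up to 2x the queue's configured capacity -->
<property>
  <name>yarn.scheduler.capacity.root.it.user-limit-factor</name>
  <value>2</value>
</property>
```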
03-29-2016
08:58 AM
6 Kudos
Hello Santosh, when creating a Phoenix table through the DSL, as you are doing, Phoenix handles all the magic before pushing the data down to HBase as the store. In this scenario you get a composite rowkey in the order you have written it, Market_key-Product_Key-Period_Key, so the order in which you declare the columns in your statement is important, as it will be the order of your composite rowkey. Furthermore, to separate the values Phoenix uses a 0 byte between elements of the key, or relies on the size/encoding info where applicable. So, for example, if you have a (varchar, u_long, varchar) primary key, the rowkey for values like 'X', 1, 'Y' will be: ('X', 0x00) (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01) ('Y'). Lastly, since the primary key elements become the rowkey, they will not be HBase columns of your table, so keep that in mind while designing it if that has importance.
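To illustrate, a minimal Scala sketch that creates such a table over Phoenix's JDBC driver. The column names echo this thread, while the table name, the AMOUNT column, and the connection string (ZooKeeper quorum and root node) are hypothetical:

```scala
import java.sql.DriverManager

object PhoenixCompositeKey {
  def main(args: Array[String]): Unit = {
    // Connection string is hypothetical; adjust quorum/port/znode for your cluster
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase")
    val stmt = conn.createStatement()

    // The declaration order of the PRIMARY KEY columns becomes the order of the
    // HBase rowkey: MARKET_KEY, then PRODUCT_KEY, then PERIOD_KEY.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS SALES_FACT (
        |  MARKET_KEY  VARCHAR NOT NULL,
        |  PRODUCT_KEY UNSIGNED_LONG NOT NULL,
        |  PERIOD_KEY  VARCHAR NOT NULL,
        |  AMOUNT      DECIMAL
        |  CONSTRAINT pk PRIMARY KEY (MARKET_KEY, PRODUCT_KEY, PERIOD_KEY)
        |)""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```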