Member since: 09-17-2015
Posts: 70
Kudos Received: 79
Solutions: 20
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 2309 | 02-27-2018 08:03 AM |
| | 2122 | 02-27-2018 08:00 AM |
| | 2464 | 10-09-2016 07:59 PM |
| | 864 | 10-03-2016 07:27 AM |
| | 947 | 06-17-2016 03:30 PM |
04-29-2016
08:39 AM
Hello Pedro, Spark Core is a general-purpose in-memory analytics engine. By adding components such as Spark SQL or Spark ML on top of Spark Core, you can do many interesting analytics or data science modelling tasks, in either a programmatic or a SQL fashion. These tutorials can help you with your first steps: http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/ http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/
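To make the "programmatic or SQL" point concrete, here is a minimal Scala sketch using the Spark 1.6-era API; the JSON input path and the column names are hypothetical, it just shows the same query written both ways:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkQuickTour {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-quick-tour"))
    val sqlContext = new SQLContext(sc)

    // Load a JSON file into a DataFrame (path and schema are hypothetical)
    val people = sqlContext.read.json("/tmp/people.json")

    // Programmatic (DataFrame API) style
    people.filter(people("age") > 21).select("name", "age").show()

    // SQL style over the same data
    people.registerTempTable("people")
    sqlContext.sql("SELECT name, age FROM people WHERE age > 21").show()

    sc.stop()
  }
}
```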
04-14-2016
08:03 AM
Hello Nelson, I don't think you need to set the Hive configuration explicitly anymore, i.e. this part: "-Djavax.jdo.option.ConnectionURL=jdbc:mysql://testip/hive?createDatabaseIfNotExist=true -Dhive.metastore.uris=thrift://testip:9083"
04-13-2016
09:10 AM
1 Kudo
Hello Nelson, instead of putting the Hive info into separate properties, could you try adding the hive-site.xml (--files=/etc/hive/conf/hive-site.xml), just to make sure everything is consistent? Without it, Spark could launch an embedded metastore, causing the out-of-memory condition. Could you also share a little more about the app: what type of data (ORC, CSV, etc.) and the size of the table? Let's see if this helps.
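For reference, a minimal sketch of what the driver code could look like, assuming Spark 1.6 on YARN with hive-site.xml shipped via --files so the HiveContext talks to the real metastore; the table name below is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-table-check"))

    // With hive-site.xml distributed (e.g. --files /etc/hive/conf/hive-site.xml),
    // this HiveContext connects to the existing metastore instead of spinning up
    // an embedded one.
    val hiveContext = new HiveContext(sc)

    // Simple sanity query against an existing Hive table (name is hypothetical)
    hiveContext.sql("SELECT COUNT(*) FROM default.sample_table").show()

    sc.stop()
  }
}
```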
04-13-2016
08:18 AM
3 Kudos
Hello Sumit, increasing the ZooKeeper session timeout is often a quick first fix when GC pauses are "killing" region servers in HBase. In the longer run, if you have GC pauses it is because your process is struggling to find memory, and there can be architectural approaches to that problem. For example, does this happen during heavy write loads? In that case, consider doing bulk loads when possible. You can also look at your HBase configuration: what is the overall memory allocated to HBase, and how is it distributed between writes and reads? Do you flush your memstore often, and does this lead to many compactions? Lastly, you can look at GC tuning. I won't dive into that one here, but Lars has written a nice introductory blog post on it: http://hadoop-hbase.blogspot.ie/2014/03/hbase-gc-tuning-observations.html Hope some of this helps.
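As a starting point, these are the kinds of hbase-site.xml settings involved. The property names are standard HBase 1.x, but the values below are only illustrative, not recommendations for your cluster:

```xml
<!-- Give region servers more room to survive a GC pause before ZooKeeper expires them -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<!-- Share of the heap reserved for memstores (writes) -->
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value>
</property>
<!-- Share of the heap reserved for the block cache (reads) -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
```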
04-12-2016
01:31 PM
4 Kudos
Hello Sunile, Zeppelin in HDP 2.4 has a bug that has since been fixed. If you issue any query containing "()", the new prefix parser gets lost. In your log, if your query is "%hive select count(*) from table", you will see the query being sent looks like "elect count(*) from table". This is because the parser looks for a "(prefix_name)" and here mistakes "(*)" for a prefix. The workaround is to use %hive(default) or to wait for the next release.
04-11-2016
11:12 AM
3 Kudos
Hello Kiran, Spark is not yet a GA execution engine for Hive; it is still very much in the development phase. You can, however, use Spark SQL with a Hive context to issue queries against Hive tables.
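For example, a quick check from spark-shell, assuming an HDP build where Spark is compiled with Hive support so the provided sqlContext is already a HiveContext; the table name is hypothetical:

```scala
// Query an existing Hive table directly from spark-shell
sqlContext.sql("SELECT * FROM default.sample_table LIMIT 10").show()
```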
04-06-2016
07:05 AM
4 Kudos
Hello Arunkumar, as a general rule it comes back to what you are trying to achieve and how you want to serve the data. Remember that HBase's performance is directly derived from the rowkey and hence from how you access data. HBase splits data into regions served by region servers, and at a lower level data is split by column family; a single entry, however, is always served by the same region. At a high level, the difference between tall-narrow and flat-wide comes down to scans vs gets, since HBase stores data ordered by rowkey and full scans are costly. A tall-narrow approach uses a more complex rowkey that keeps similar elements adjacent, allowing focused scans over a logical group of entries. A flat-wide approach puts much more information in the entry itself: you "get" the entry through its rowkey, and the entry carries enough information to do your compute or answer your query. Hope this helps.
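A small Scala sketch of the two styles with the HBase 1.x client API; the "userId#timestamp" key layout, the example rowkeys, and the column family name are all hypothetical:

```scala
import org.apache.hadoop.hbase.client.{Get, Scan}
import org.apache.hadoop.hbase.util.Bytes

// Tall-narrow: composite rowkey "userId#timestamp", one small entry per row.
// Rows for the same user sort next to each other, so a bounded scan answers
// "all events for user42" without touching other users' data.
val tallNarrowScan = new Scan()
  .setStartRow(Bytes.toBytes("user42#"))
  .setStopRow(Bytes.toBytes("user42$")) // '$' sorts just after '#', closing the prefix range

// Flat-wide: one row per user with many columns, so a single Get on the rowkey
// returns everything needed to answer the query.
val flatWideGet = new Get(Bytes.toBytes("user42"))
  .addFamily(Bytes.toBytes("events"))
```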
04-04-2016
12:32 PM
As you can see, when you increased the user-limit-factor it allocated more containers and you got 200% of the queue; if you were to set it to 2.5 you would get the full queue. For the second part, if you want the it queue to release the extra containers to service the price queue, you can either wait for it to happen naturally as the job rolls out or, better, set up the YARN preemption mechanism: http://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
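A sketch of the yarn-site.xml switch that turns on the monitor driving CapacityScheduler preemption; the linked blog covers the remaining tuning knobs, and nothing beyond this one property is implied here:

```xml
<!-- Enable the scheduler monitor that drives CapacityScheduler preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
```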
04-04-2016
09:10 AM
Hello Alena, on top of the queue distribution and elasticity there are other elements that can be configured to help share the resources. For example, you have .root.it.user-limit-factor=1, which means a single user cannot use more than 100% of the allocated queue capacity; this can limit or negate the optional elasticity given to a queue. Try setting it to 2 and then 3 to see the result. Regards
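For reference, a capacity-scheduler.xml sketch of the property discussed; the queue path root.it comes from your setup, and the value 2 is only the first step suggested above:

```xml
<!-- Allow a single user to grow up to 2x the queue's configured capacity -->
<property>
  <name>yarn.scheduler.capacity.root.it.user-limit-factor</name>
  <value>2</value>
</property>
```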
03-29-2016
08:58 AM
6 Kudos
Hello Santosh, when creating a Phoenix table through the DSL, as you are doing, Phoenix handles all the magic before pushing the data down to HBase as the store. In this scenario you get a composite rowkey in the order you have written it, Market_key-Product_Key-Period_Key, so the order in which you declare the columns in your statement is important, as it will be the order of your composite rowkey. Furthermore, to separate the values Phoenix uses a 0 byte between elements of the key, or relies on the size/encoding info where applicable. So, for example, if you have a (varchar, u_long, varchar) primary key, the rowkey for values like 'X', 1, 'Y' will be: ('X', 0x00) (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01) ('Y'). Lastly, since the primary key elements become the rowkey, they will not be HBase columns of your table, so keep that in mind while designing it if that has importance.
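To illustrate, a minimal Scala sketch that creates such a table over Phoenix's JDBC driver. The column names echo this thread, while the table name, the AMOUNT column, and the connection string (ZooKeeper quorum and root node) are hypothetical:

```scala
import java.sql.DriverManager

object PhoenixCompositeKey {
  def main(args: Array[String]): Unit = {
    // Connection string is hypothetical; adjust quorum/port/znode for your cluster
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase")
    val stmt = conn.createStatement()

    // The declaration order of the PRIMARY KEY columns becomes the order of the
    // HBase rowkey: MARKET_KEY, then PRODUCT_KEY, then PERIOD_KEY.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS SALES_FACT (
        |  MARKET_KEY  VARCHAR NOT NULL,
        |  PRODUCT_KEY UNSIGNED_LONG NOT NULL,
        |  PERIOD_KEY  VARCHAR NOT NULL,
        |  AMOUNT      DECIMAL
        |  CONSTRAINT pk PRIMARY KEY (MARKET_KEY, PRODUCT_KEY, PERIOD_KEY)
        |)""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```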