Member since: 09-27-2016
Posts: 73
Kudos Received: 9
Solutions: 2
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 1126 | 09-15-2017 01:37 PM
 | 2073 | 09-14-2017 10:08 AM
06-15-2017
01:08 PM
Well, I did clean everything as suggested and double-checked the configuration you mentioned, but the same problem still occurs. The ps command returns this:
[root@sandbox ~]# ps -ef | grep ^ams
ams 14734 1 0 19:38 ? 00:00:00 bash /usr/lib/ams-hbase/bin/hbase-daemon.sh --config /etc/ams-hbase/conf foreground_start master
ams 14748 14734 10 19:38 ? 00:00:26 /usr/lib/jvm/java/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201706141938 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-sandbox.hortonworks.com.log -Dhbase.home.dir=/usr/lib/ams-hbase/ -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
The ambari-metrics-collector.log file (see the attachment ambari-metrics-collector.zip) contains several errors. The latest ones are:
2017-06-14 19:39:29,423 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.NullPointerException
at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-06-14 19:39:29,430 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
Any clue about this? Could it be the reason why AMS automatically stops? Thanks
06-15-2017
09:11 AM
@Jay Thanks a lot for your quick answer! I checked the property you mentioned and it indeed has the "embedded" value, so I stopped the HBase service on the sandbox. I also stopped Flume, Hive, Spark2, and Knox, but the behavior is still the same: the Metrics Collector starts, then goes down after a while... Is there a particular place (logs...) to check whether my VM runs out of memory? (I dedicated 4 cores and 16 GB to the sandbox, but I don't know how much Ambari Metrics needs in order to work.) Your comments also raise an additional question: in distributed mode, the AMS "embedded HBase" writes metrics to HDFS instead of the local disk. Does it mean that AMS never uses the standard HBase service of the cluster, but only an embedded HBase, or is there a way to change this somehow? Thanks again
06-15-2017
08:17 AM
Hi, I'm trying to follow the tutorial about Ambari Metrics (http://bryanbende.com/development/2015/07/31/ambari-metrics-part1-metrics-collector) using the HDP sandbox, but I got stuck at the very first steps 😞 I did everything about port forwarding successfully, but when I tried starting Ambari Metrics, it didn't complain about anything and displayed a success status in the progress bar, yet it stayed in "stopped" status afterward. It took me a while to figure out that this was (maybe) because HBase is not running out of the box in the HDP sandbox. After I started HBase, Ambari Metrics started successfully, but it automatically stopped after a few minutes... My questions are quite simple:
- Is it possible to run Ambari Metrics without HBase (in the context of the HDP sandbox)?
- How do HBase and Ambari Metrics "cooperate"? HBase seems to send data to Ambari Metrics, and Ambari Metrics relies on HBase (or did I miss something?).
Thanks for your help
Labels:
- Apache Ambari
- Apache HBase
05-30-2017
07:47 PM
In fact, it was my fault: when my box came back from standby mode, I had not restarted HBase... It works now 🙂
05-30-2017
07:47 PM
1 Kudo
Thanks for your help, I got rid of the Python error. However, I still cannot connect from my Python client to the HBase Thrift server (about your question: I'm trying to use a vanilla HBase, without any other component), because I can see the following error on the server side:
2017-05-30 08:01:11,816 WARN [thrift-worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-30 08:01:11,916 ERROR [thrift-worker-0] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2017-05-30 08:01:11,917 WARN [thrift-worker-0] zookeeper.ZKUtil: hconnection-0x3330972d0x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:905)
It seems that HBase is trying to reach ZooKeeper, but I didn't set up any ZooKeeper... Is it possible to run HBase without ZooKeeper? Thanks again, Regards
05-28-2017
08:10 AM
1 Kudo
Hi,
I've been successfully using HBase with a Java client for months, but now I would like to implement a Python application to access my data.
I've read several posts about Thrift but cannot figure a few things out:
I installed a standalone HBase instance on my Linux VM, and it seems that I can start the Thrift server directly with the "hbase thrift start" command. Did I understand correctly? Does HBase provide an embedded Thrift server?
All the posts I've read suggest downloading a Thrift distribution and compiling it in order to generate language-specific (Python) bindings.
Can't I use the "embedded" Thrift from HBase for that?
Does it mean that the downloaded & compiled Thrift server will "only" be used to generate a few Python packages that my application will use (but will never itself be launched)?
After going through all the steps from https://acadgild.com/blog/connecting-hbase-with-python-application-using-thrift-server/, when I try to execute table.py, I get the following error:
Traceback (most recent call last):
File "table.py", line 1, in <module>
from thrift.transport.TSocket import TSocket
ModuleNotFoundError: No module named 'thrift'
I don't understand what I missed here... Should I install a Thrift package for Python somehow (in addition to my generated bindings)?
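For reference, here is a minimal sketch of what I understand the client side should look like once the bindings are generated (these are just my assumptions: that the thrift Python package is installed with "pip install thrift", that the gen-py output of the Thrift compiler is on the path, and that the server listens on the default localhost:9090):

# Minimal sketch of a client built on the Thrift-generated bindings.
# Assumptions: "pip install thrift" was done, and the gen-py directory produced by
# "thrift --gen py Hbase.thrift" is available; host and port are placeholders.
import sys
sys.path.append('gen-py')  # directory produced by the Thrift compiler

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated module (gen-py/hbase/Hbase.py)

# Connect to the server started with "hbase thrift start" (default port 9090).
transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

transport.open()
print(client.getTableNames())  # list existing tables as a smoke test
transport.close()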
Thanks for your help,
Regards,
Sebastien
Labels:
- Apache HBase
04-13-2017
03:38 PM
Hi,
I created a Spark application that is configured through a bunch of properties files that I specify at runtime with the --files option of the spark-submit command.
These local files are automatically copied to the Spark containers, so my job running in the executors can read them and adjust its behavior.
Great, this works like a charm.
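To illustrate the mechanism, here is a simplified PySpark sketch of how such a file can be read from inside the executors (my actual application differs; app.properties and batch.size are made-up names):

# Simplified sketch: reading a properties file shipped with "--files app.properties"
# from inside executor tasks (app.properties and batch.size are placeholders).
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="config-demo")

def load_props(path):
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                key, _, value = line.partition('=')
                props[key.strip()] = value.strip()
    return props

def process(record):
    # SparkFiles.get() resolves the local copy of the file in each container.
    props = load_props(SparkFiles.get("app.properties"))
    return (record, props.get("batch.size", "100"))

print(sc.parallelize(["a", "b"]).map(process).collect())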
Now I want to schedule this spark-submit action every hour with Oozie, but I couldn't find out how to pass these configuration files properly to my Spark job through Oozie...
I guess I have to copy these files to HDFS and ask Oozie to launch the Spark action and pass it those HDFS files, but I cannot figure out how to achieve this...
Does anyone have a clue about this?
Thanks a lot for your help
Sebastien
Labels:
- Apache Hadoop
- Apache Oozie
- Apache Spark
04-06-2017
09:15 AM
2 Kudos
After more reading, it seems that region replication may be used for read high availability... If I understand properly, when a RS fails its regions are moved to other "valid" region servers and remain available, but that may take a while... So region replication's purpose is just to reduce this waiting period? Nothing related to physical data replication to guarantee that we won't lose any data, right?
04-06-2017
07:27 AM
Hi, I'm currently looking at the "HA" feature of HBase, but cannot figure out how it works exactly. I first created tables using the default Java API, without specifying any region replication value, thinking that the default HDFS replication mechanism would guarantee data availability.
Actually, when I look at the region files on HDFS, they are shown with a replication factor of 3. For example:
[myuser@myhost ~]$ hdfs dfs -ls /apps/hbase/data/data/default/MY_TEST_TABLE/f24af874470de9b85c2e1bd0ff5f80b3/0
Found 1 items
-rw------- 3 hbase hdfs 12234 2017-03-29 15:44 /apps/hbase/data/data/default/MY_TEST_TABLE/f24af874470de9b85c2e1bd0ff5f80b3/0/125b6555b2274e64b1ba4e9a8ef42885
So why should I set a region replication value (e.g. 3) in addition to the default HDFS one?
Does it mean that my data will eventually be replicated 9 times? Thanks for any clue about this... Sebastien
Labels:
- Apache Hadoop
- Apache HBase
02-24-2017
09:40 AM
Among the things I tried: I suspected that parallelizing the whole set of file paths (more than 100,000) in one shot, without specifying the number of partitions, was a problem, because it seemed that some containers were assigned very few files whereas others had many, many files to process; as a result, some containers sat idle after a while, waiting for the others to complete their tasks...
For that reason, I tried to reduce the number of files distributed at once: in my driver application, I added a loop to "parallelize" those files in "batches" of 1000 files, so that a fair distribution is done every 1000 files and a better repartition over my containers is obtained, but I couldn't understand exactly how Spark proceeded.
Basically, for each iteration of my driver, I have 1000 files to process. I arbitrarily split these files into 100 partitions, so Spark has 100 tasks to complete, each one focusing on 10 files.
If I ask for 20 executors with 2 cores per executor (for example, and assuming that YARN actually grants me such resources), it means that Spark should be able to run 40 tasks in parallel and keep 60 tasks to feed the containers that finish their current task first... Am I right?
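To make this concrete, here is a simplified PySpark sketch of the kind of driver loop I described (process_file, the path list, and the numbers are placeholders; my real processing is different):

# Simplified sketch of the driver loop: parallelize the file paths in batches of
# 1000, with 100 partitions per batch (process_file and the paths are placeholders).
from pyspark import SparkContext

sc = SparkContext(appName="file-batches")

def process_file(path):
    # placeholder for the real per-file processing
    return (path, len(path))

all_paths = ["/data/file_%05d" % i for i in range(100000)]  # made-up paths

batch_size = 1000
num_partitions = 100  # => 100 tasks per batch, roughly 10 files per task

for start in range(0, len(all_paths), batch_size):
    batch = all_paths[start:start + batch_size]
    results = (sc.parallelize(batch, num_partitions)
                 .map(process_file)
                 .collect())
    # ...do something with the results of this batch...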
One additional detail that may be important: it seems that the first tasks are "pretty fast" (almost as fast as the non-Spark implementation), but the subsequent ones get slower and slower over time, until a new batch of 1000 files is started.