Member since: 02-13-2017
Posts: 9
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 11043 | 03-23-2017 03:19 PM
 | 2606 | 02-14-2017 02:45 PM
03-23-2017
03:19 PM
I configured the execution engine to use MapReduce instead of Tez, and that works for me. Since I'm only testing and learning, this is fine for my purposes. (I did not know that Hive has several execution engines.)
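For reference, a minimal sketch of switching the engine per JDBC session; hive.execution.engine is a standard Hive property, while the class name and connection details below are illustrative only (copied from the question further down):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
public class SwitchEngineSketch {
    public static void main(String[] args) throws Exception {
        // Sketch: switch the Hive execution engine to MapReduce for this JDBC session only.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        stmt.execute("set hive.execution.engine=mr");
        // ... run queries as usual ...
        con.close();
    }
}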
03-22-2017
01:44 PM
I just found that if I adjust the code like the following, the execution works:
// execute statement
ResultSet res = stmt.executeQuery("select Country from hbase_table_climate");
while (res.next()) {
System.out.println(res.getString(1));
}
If I use a slightly more complex select statement, it crashes with the mentioned exception:
ResultSet res = stmt.executeQuery("select distinct City, Country from hbase_table_climate where country like \"Germany\"");
03-22-2017
01:03 PM
Hi, I'm using the Ambari VMs and installed a cluster of 5 nodes (c7001 - c7005). I imported a CSV file into HBase and mapped a Hive table to it. I want to write a Java program that executes some simple queries like count, read, filter, and group by. I get the following exception:
[hive@c7002 vagrant]$ java -cp SparkThesisTest-4.3.0-SNAPSHOT-jar-with-dependencies.jar thesis.test.sparkthesistest.hbase.read.HiveRead
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:348)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:251)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:434)
at thesis.test.sparkthesistest.hbase.read.HiveRead.main(HiveRead.java:35)
The Java code is simple and looks like this:
// Register driver and create driver instance
Class.forName("org.apache.hive.jdbc.HiveDriver");
// get connection
Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select distinct City, Country from hbase_table_climate where country like \"Germany\"");
con.close();
I'm executing the Java application on c7002, which is where HiveServer2 runs. I'm using Maven and installed the following dependency:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.0</version>
</dependency>
If I execute the exact same query through the "Hive View" or inside the Hive shell, it works:
INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select Country from hbase_table_climate
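As a side note (not necessarily the cause of the TezTask error): INSERT OVERWRITE does not produce a result set, so plain Statement.execute() is the usual JDBC call for it rather than executeQuery(). A minimal sketch, reusing the stmt from the code above:
// Sketch: run the INSERT OVERWRITE as a statement that returns no ResultSet.
stmt.execute("INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select distinct City, Country from hbase_table_climate where country like \"Germany\"");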
Labels:
- Apache Ambari
- Apache Hive
03-20-2017
07:08 PM
Thanks for the answer. I am doing research for university, and for that I really want to use Spark on HBase. I did not find a single working example. Why do you suggest not using Spark with HBase? Is it hard, or just not a good idea?
03-20-2017
05:17 PM
Hi, I'm currently trying to do some simple reads, sorts, aggregations and filters on some data in HBase using Spark (Java). I found some sample code and tried a lot, but nothing really works. Since I'm a beginner, I don't really know how to get this working. I loaded a dataset of climate data into HBase via bulk load. The table name is "climate" and I have the column families "date", "temp" and "place". The rows themselves are organized as follows:
id,date:dt,temp:AverageTemperature,temp:AverageTemperatureUncertainty,place:City,place:Country,place:Latitude,place:Longitude
I tried some sample code from a guide I found and started to work on it. It currently looks like this:
public class IBMREAD {
public static void main(String[] args) throws Exception {
// define Spark Context
SparkConf sparkConf = new SparkConf().setAppName("SparkHBaseTest");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
// create connection with HBase
Configuration config = null;
try {
config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/core-site.xml"));
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
HBaseAdmin.checkHBaseAvailable(config);
System.out.println("-----------------------HBase is running!");
}
catch (MasterNotRunningException e) {
System.out.println("--------------------HBase is not running!");
System.exit(1);
}catch (Exception ce){
ce.printStackTrace();
}
System.out.println("--------------------SET TABLES!");
config.set(TableInputFormat.INPUT_TABLE, "climate");
System.out.println("--------------------Creating hbase rdd!");
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
jsc.newAPIHadoopRDD(config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
// in rowPairRDD the key is HBase's row key, the value is the HBase row data
JavaPairRDD<String, TestData> rowPairRDD = hBaseRDD.mapToPair(
new PairFunction<Tuple2<ImmutableBytesWritable, Result>, String, TestData>() {
@Override
public Tuple2<String, TestData> call(
Tuple2<ImmutableBytesWritable, Result> entry) throws Exception {
System.out.println("--------------------Getting ID!");
Result r = entry._2;
String keyRow = Bytes.toString(r.getRow());
System.out.println("--------------------Define JavaBean!");
TestData cd = new TestData();
cd.setId(keyRow);
// map each column into the TestData bean (assumes a matching setter exists for each field)
cd.setDt((String) Bytes.toString(r.getValue(Bytes.toBytes("date"), Bytes.toBytes("dt"))));
cd.setAverageTemperature((double) Bytes.toDouble(r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperature"))));
cd.setAverageTemperatureUncertainty((double) Bytes.toDouble(r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperatureUncertainty"))));
cd.setCity((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("City"))));
cd.setCountry((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Country"))));
cd.setLatitude((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Latitude"))));
cd.setLongitude((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Longitude"))));
return new Tuple2<String, TestData>(keyRow, cd);
}
});
System.out.println("--------------------COUNT RDD " + rowPairRDD.count());
System.out.println("--------------------Create DataFrame!");
DataFrame schemaRDD = sqlContext.createDataFrame(rowPairRDD.values(), TestData.class);
System.out.println("--------------------Loading Schema");
schemaRDD.printSchema();
System.out.println("--------------------COUNT Schema " + schemaRDD.count());
}
}
The code seems to connect to HBase, which I'm really happy about, and it seems to find the table "climate". However, I'm not sure about that. If I submit this Spark application I get the following exception:
17/03/21 04:11:41 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, c7010.ambari.apache.org): java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 5
at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:632)
at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:606)
at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:730)
at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:721)
at thesis.test.sparkthesistest.hbase.read.IBMREAD$1.call(IBMREAD.java:75)
at thesis.test.sparkthesistest.hbase.read.IBMREAD$1.call(IBMREAD.java:62)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1018)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1018)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1633)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1164)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1164)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This exception occurs about 3 times with different TIDs until Spark aborts the job. What am I doing wrong?
----------------------------- EDIT -----------------------------
I also tried this example:
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
Scan scan = new Scan();
scan.setCaching(100);
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan);
System.out.println("Number of Records found : " + hbaseRdd.count());
If I execute this code, I get:
Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter
Why is it so hard to reach HBase with Spark?
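One possible direction for the offset/length error (a sketch under an assumption, not a confirmed fix): if the climate CSV was bulk loaded as text, the temperature cells hold string bytes such as "12.34", so Bytes.toDouble() on a 5-byte value fails with exactly this kind of length mismatch. Reading the cell as a string and parsing it inside the PairFunction above would sidestep that:
// Sketch: read a numeric cell that was bulk loaded as text (assumed data layout).
byte[] rawTemp = r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperature"));
if (rawTemp != null) {
    cd.setAverageTemperature(Double.parseDouble(Bytes.toString(rawTemp)));
}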
Labels:
- Apache HBase
- Apache Spark
02-14-2017
02:45 PM
Fixed it by using CentOS 7.0 instead of the SUSE VMs. It seems something is broken with the SUSE VMs.
02-14-2017
09:35 AM
I actually destroyed my VMs via Vagrant and am trying to install everything from scratch again. I did configure Vagrant to use 4 GB of RAM, but I haven't configured any CPU count, since it wasn't in the Vagrantfile.
02-13-2017
02:25 PM
There is no firewall on this VM. I tried to configure ambari-agent.ini as follows:
[server]
hostname=suse1101.ambari.apache.org
url_port=8440
secured_url_port=8441
ambari-agent.log:
INFO 2017-02-13 15:18:51,558 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 15:18:51,558 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 15:18:51,559 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 15:18:51,561 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 15:18:51,561 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 15:18:51,579 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 15:18:51,580 main.py:289 - Connecting to Ambari server at https://suse1101.ambari.apache.org:8440 (192.168.11.101)
INFO 2017-02-13 15:18:51,581 NetUtil.py:60 - Connecting to https://suse1101.ambari.apache.org:8440
INFO 2017-02-13 15:18:51,691 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 15:18:51,691 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 15:18:51,691 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe15950>; currently running: False
INFO 2017-02-13 15:18:53,697 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 15:18:53,794 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 15:18:53,795 ExitHelper.py:67 - Cleanup finished, exiting with code:0
ambari-agent.out is still the same as mentioned above. Note that suse1101.ambari.apache.org runs the ambari-server and an agent.
02-13-2017
02:05 PM
Hi, I tried to install an Ambari server on a local machine (SUSE Linux Enterprise Server) with several VMs. I just followed the QuickStart guide. So far I created 2 VMs (suse1101 and suse1102) and installed the ambari-server on suse1101. But if I try to add the agent to suse1101 and suse1102 via the installation wizard, it fails. There is no error message.
==========================
Creating target directory...
==========================
Command start time 2017-02-13 14:41:25
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying common functions script...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_commons
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying OS type check script...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Running OS type check...
==========================
Command start time 2017-02-13 14:41:26
Cluster primary/cluster OS family is suse11 and local/current OS family is suse11
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2017-02-13 14:41:26
sudo-1.6.9p17-21.3.1
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2017-02-13 14:41:26
scp /etc/zypp/repos.d/ambari.repo
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Moving file to repo dir...
==========================
Command start time 2017-02-13 14:41:26
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Changing permissions for ambari.repo...
==========================
Command start time 2017-02-13 14:41:26
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying setup script file...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Running setup agent script...
==========================
Command start time 2017-02-13 14:41:26
It is important to mention that I've configured a proxy in /etc/sysconfig/proxy and added the IP to the hosts file. I also added the IP addresses of the VMs, etc. Since I don't know why this happens, I tried to install the agent manually on suse1101 (this is also the VM where the ambari-server is running). I configured /etc/ambari-agent/conf/ambari-agent.ini like this:
[server]
hostname=127.0.0.1
url_port=8440
secured_url_port=8441
If I start the agent I can access several logs:
/var/log/ambari-agent/ambari-agent.log:
INFO 2017-02-13 14:33:54,507 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:33:54,507 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:33:54,508 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 14:33:54,509 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 14:33:54,509 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 14:33:54,529 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 14:33:54,531 main.py:289 - Connecting to Ambari server at https://localhost:8440 (127.0.0.1)
INFO 2017-02-13 14:33:54,531 NetUtil.py:60 - Connecting to https://localhost:8440/ca
INFO 2017-02-13 14:33:54,806 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 14:33:54,806 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 14:33:54,806 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe18850>; currently running: False
INFO 2017-02-13 14:33:56,826 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 14:33:56,916 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 14:33:56,916 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2017-02-13 14:47:29,614 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:47:29,614 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:47:29,615 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 14:47:29,616 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 14:47:29,616 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 14:47:29,638 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 14:47:29,640 main.py:289 - Connecting to Ambari server at https://127.0.0.1:8440 (127.0.0.1)
INFO 2017-02-13 14:47:29,640 NetUtil.py:60 - Connecting to https://127.0.0.1:8440/ca
INFO 2017-02-13 14:47:29,727 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 14:47:29,727 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 14:47:29,727 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe17890>; currently running: False
INFO 2017-02-13 14:47:31,731 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 14:47:31,831 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 14:47:31,831 ExitHelper.py:67 - Cleanup finished, exiting with code:0
/var/log/ambari-agent/ambari-agent.out:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 522, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 374, in run
self.register = Register(self.config)
File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 34, in __init__
self.hardware = Hardware()
File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 43, in __init__
self.hardware['mounts'] = Hardware.osdisks()
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 91, in osdisks
df = subprocess.Popen(command, stdout=subprocess.PIPE)
File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1106, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I don't know how to fix this, since I'm quite new to Linux and Ambari.
Labels:
- Apache Ambari