Member since: 02-13-2017
Posts: 9
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 11043 | 03-23-2017 03:19 PM
 | 2606 | 02-14-2017 02:45 PM
03-23-2017
03:19 PM
I configured the execution engine to use MapReduce instead of Tez, and that works for me. Since I'm only testing and learning, this is fine for my purposes. (I did not know that Hive has several execution engines.)
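For reference, a minimal sketch of switching the engine per JDBC session; hive.execution.engine is a standard Hive property, while the class name and connection details below are illustrative only (copied from the question further down):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
public class SwitchEngineSketch {
    public static void main(String[] args) throws Exception {
        // Sketch: switch the Hive execution engine to MapReduce for this JDBC session only.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        stmt.execute("set hive.execution.engine=mr");
        // ... run queries as usual ...
        con.close();
    }
}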
03-22-2017
01:44 PM
I just found that if I adjust the code like the following, the execution works:
// execute statement
ResultSet res = stmt.executeQuery("select Country from hbase_table_climate");
while (res.next()) {
System.out.println(res.getString(1));
}
If I use a slightly more complex select statement, it crashes with the mentioned exception:
ResultSet res = stmt.executeQuery("select distinct City, Country from hbase_table_climate where country like \"Germany\"");
03-22-2017
01:03 PM
Hi, I'm using the Ambari VMs and installed a cluster of 5 nodes (c7001 - c7005). I imported a CSV file into HBase and mapped a Hive table to it. I want to write a Java program that executes some simple queries like count, read, filter, and group by. I get the following exception:
[hive@c7002 vagrant]$ java -cp SparkThesisTest-4.3.0-SNAPSHOT-jar-with-dependencies.jar thesis.test.sparkthesistest.hbase.read.HiveRead
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:348)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:251)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:434)
at thesis.test.sparkthesistest.hbase.read.HiveRead.main(HiveRead.java:35)
The Java code is simple and looks like this:
// Register driver and create driver instance
Class.forName("org.apache.hive.jdbc.HiveDriver");
// get connection
Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
// create statement
Statement stmt = con.createStatement();
// execute statement
stmt.executeQuery("INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select distinct City, Country from hbase_table_climate where country like \"Germany\"");
con.close();
I'm executing the Java application on c7002, which is where HiveServer2 runs. I'm using Maven and installed the following dependency:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.0</version>
</dependency>
If I execute the exact same query through the "Hive View" or inside the Hive shell, it works:
INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select Country from hbase_table_climate
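As a side note (not necessarily the cause of the TezTask error): INSERT OVERWRITE does not produce a result set, so plain Statement.execute() is the usual JDBC call for it rather than executeQuery(). A minimal sketch, reusing the stmt from the code above:
// Sketch: run the INSERT OVERWRITE as a statement that returns no ResultSet.
stmt.execute("INSERT OVERWRITE DIRECTORY '/tmp/hive_read_countries.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select distinct City, Country from hbase_table_climate where country like \"Germany\"");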
Labels:
- Apache Ambari
- Apache Hive
03-20-2017
07:08 PM
Thanks for the answer. I am doing research for university, and for that I really want to use Spark on HBase. I did not find a single working example. Why do you suggest not using Spark with HBase? Is it hard, or just not a good idea?
03-20-2017
05:17 PM
Hi, I'm currently trying to do some simple reads, sorts, aggregations and filters on some data in HBase using Spark (Java). I found some sample code and tried a lot, but nothing really works. Since I'm a beginner, I don't really know how to get this working. I loaded a dataset of climate data into HBase via bulk load. The table name is "climate" and I have the column families "date", "temp" and "place". The rows themselves are organized as follows:
id,date:dt,temp:AverageTemperature,temp:AverageTemperatureUncertainty,place:City,place:Country,place:Latitude,place:Longitude
I tried some sample code from a guide I found and started to work on it. It currently looks like this:
public class IBMREAD {
public static void main(String[] args) throws Exception {
// define Spark Context
SparkConf sparkConf = new SparkConf().setAppName("SparkHBaseTest");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
// create connection with HBase
Configuration config = null;
try {
config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/core-site.xml"));
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
HBaseAdmin.checkHBaseAvailable(config);
System.out.println("-----------------------HBase is running!");
}
catch (MasterNotRunningException e) {
System.out.println("--------------------HBase is not running!");
System.exit(1);
}catch (Exception ce){
ce.printStackTrace();
}
System.out.println("--------------------SET TABLES!");
config.set(TableInputFormat.INPUT_TABLE, "climate");
System.out.println("--------------------Creating hbase rdd!");
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
jsc.newAPIHadoopRDD(config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
// in rowPairRDD the key is HBase's row key, the value is the HBase row data
JavaPairRDD<String, TestData> rowPairRDD = hBaseRDD.mapToPair(
new PairFunction<Tuple2<ImmutableBytesWritable, Result>, String, TestData>() {
@Override
public Tuple2<String, TestData> call(
Tuple2<ImmutableBytesWritable, Result> entry) throws Exception {
System.out.println("--------------------Getting ID!");
Result r = entry._2;
String keyRow = Bytes.toString(r.getRow());
System.out.println("--------------------Define JavaBean!");
TestData cd = new TestData();
cd.setId(keyRow);
// map each column into the TestData bean (assumes a matching setter exists for each field)
cd.setDt((String) Bytes.toString(r.getValue(Bytes.toBytes("date"), Bytes.toBytes("dt"))));
cd.setAverageTemperature((double) Bytes.toDouble(r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperature"))));
cd.setAverageTemperatureUncertainty((double) Bytes.toDouble(r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperatureUncertainty"))));
cd.setCity((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("City"))));
cd.setCountry((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Country"))));
cd.setLatitude((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Latitude"))));
cd.setLongitude((String) Bytes.toString(r.getValue(Bytes.toBytes("place"), Bytes.toBytes("Longitude"))));
return new Tuple2<String, TestData>(keyRow, cd);
}
});
System.out.println("--------------------COUNT RDD " + rowPairRDD.count());
System.out.println("--------------------Create DataFrame!");
DataFrame schemaRDD = sqlContext.createDataFrame(rowPairRDD.values(), TestData.class);
System.out.println("--------------------Loading Schema");
schemaRDD.printSchema();
System.out.println("--------------------COUNT Schema " + schemaRDD.count());
}
}
The code seems to connect to HBase, which I'm really happy about, and it seems to find the table "climate". However, I'm not sure about that. If I submit this Spark application I get the following exception:
17/03/21 04:11:41 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, c7010.ambari.apache.org): java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 5
at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:632)
at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:606)
at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:730)
at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:721)
at thesis.test.sparkthesistest.hbase.read.IBMREAD$1.call(IBMREAD.java:75)
at thesis.test.sparkthesistest.hbase.read.IBMREAD$1.call(IBMREAD.java:62)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1018)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1018)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1633)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1164)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1164)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1882)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This exception occurs about 3 times with different TIDs until Spark aborts the job. What am I doing wrong?
----------------------------- EDIT -----------------------------
I also tried this example:
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
Scan scan = new Scan();
scan.setCaching(100);
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan);
System.out.println("Number of Records found : " + hbaseRdd.count());
If I execute this code, I get:
Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter
Why is it so hard to reach HBase with Spark?
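One possible direction for the offset/length error (a sketch under an assumption, not a confirmed fix): if the climate CSV was bulk loaded as text, the temperature cells hold string bytes such as "12.34", so Bytes.toDouble() on a 5-byte value fails with exactly this kind of length mismatch. Reading the cell as a string and parsing it inside the PairFunction above would sidestep that:
// Sketch: read a numeric cell that was bulk loaded as text (assumed data layout).
byte[] rawTemp = r.getValue(Bytes.toBytes("temp"), Bytes.toBytes("AverageTemperature"));
if (rawTemp != null) {
    cd.setAverageTemperature(Double.parseDouble(Bytes.toString(rawTemp)));
}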
Labels:
- Apache HBase
- Apache Spark
02-14-2017
02:45 PM
Fixed it by using CentOS 7.0 instead of the SUSE VMs. It seems something is broken with the SUSE VMs.
02-14-2017
09:35 AM
I actually destroyed my VMs via Vagrant and am trying to install everything from scratch again. I did configure Vagrant to use 4 GB of RAM, but I haven't configured any CPU count, since it wasn't in the Vagrantfile.
02-13-2017
02:25 PM
There is no firewall on this VM. I tried to configure ambari-agent.ini as follows:
[server]
hostname=suse1101.ambari.apache.org
url_port=8440
secured_url_port=8441
ambari-agent.log:
INFO 2017-02-13 15:18:51,558 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 15:18:51,558 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 15:18:51,559 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 15:18:51,561 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 15:18:51,561 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 15:18:51,579 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 15:18:51,580 main.py:289 - Connecting to Ambari server at https://suse1101.ambari.apache.org:8440 (192.168.11.101)
INFO 2017-02-13 15:18:51,581 NetUtil.py:60 - Connecting to https://suse1101.ambari.apache.org:8440
INFO 2017-02-13 15:18:51,691 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 15:18:51,691 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 15:18:51,691 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe15950>; currently running: False
INFO 2017-02-13 15:18:53,697 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 15:18:53,794 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 15:18:53,795 ExitHelper.py:67 - Cleanup finished, exiting with code:0
ambari-agent.out is still the same as mentioned above. Note that suse1101.ambari.apache.org runs the ambari-server and an agent.
02-13-2017
02:05 PM
Hi, I tried to install an Ambari server on a local machine (SUSE Linux Enterprise Server) with several VMs. I just followed the QuickStart guide. So far I created 2 VMs (suse1101 and suse1102) and installed the ambari-server on suse1101. But if I try to add the agent to suse1101 and suse1102 via the installation wizard, it fails. There is no error message.
==========================
Creating target directory...
==========================
Command start time 2017-02-13 14:41:25
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying common functions script...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_commons
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying OS type check script...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Running OS type check...
==========================
Command start time 2017-02-13 14:41:26
Cluster primary/cluster OS family is suse11 and local/current OS family is suse11
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2017-02-13 14:41:26
sudo-1.6.9p17-21.3.1
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2017-02-13 14:41:26
scp /etc/zypp/repos.d/ambari.repo
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Moving file to repo dir...
==========================
Command start time 2017-02-13 14:41:26
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Changing permissions for ambari.repo...
==========================
Command start time 2017-02-13 14:41:26
Connection to suse1101.ambari.apache.org closed.
SSH command execution finished
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Copying setup script file...
==========================
Command start time 2017-02-13 14:41:26
scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=suse1101.ambari.apache.org, exitcode=0
Command end time 2017-02-13 14:41:26
==========================
Running setup agent script...
==========================
Command start time 2017-02-13 14:41:26
It is important to mention that I've configured a proxy in /etc/sysconfig/proxy and added the IP to the hosts file. I also added the IP addresses of the VMs, etc. Since I don't know why this happens, I tried to install the agent manually on suse1101 (this is also the VM where the ambari-server is running). I configured /etc/ambari-agent/conf/ambari-agent.ini like this:
[server]
hostname=127.0.0.1
url_port=8440
secured_url_port=8441
If I start the agent I can access several logs:
/var/log/ambari-agent/ambari-agent.log:
INFO 2017-02-13 14:33:54,507 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:33:54,507 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:33:54,508 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 14:33:54,509 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 14:33:54,509 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 14:33:54,529 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 14:33:54,531 main.py:289 - Connecting to Ambari server at https://localhost:8440 (127.0.0.1)
INFO 2017-02-13 14:33:54,531 NetUtil.py:60 - Connecting to https://localhost:8440/ca
INFO 2017-02-13 14:33:54,806 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 14:33:54,806 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 14:33:54,806 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe18850>; currently running: False
INFO 2017-02-13 14:33:56,826 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 14:33:56,916 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 14:33:56,916 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2017-02-13 14:47:29,614 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:47:29,614 main.py:74 - loglevel=logging.INFO
INFO 2017-02-13 14:47:29,615 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-02-13 14:47:29,616 DataCleaner.py:120 - Data cleanup started
INFO 2017-02-13 14:47:29,616 DataCleaner.py:122 - Data cleanup finished
INFO 2017-02-13 14:47:29,638 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-02-13 14:47:29,640 main.py:289 - Connecting to Ambari server at https://127.0.0.1:8440 (127.0.0.1)
INFO 2017-02-13 14:47:29,640 NetUtil.py:60 - Connecting to https://127.0.0.1:8440/ca
INFO 2017-02-13 14:47:29,727 threadpool.py:52 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-02-13 14:47:29,727 AlertSchedulerHandler.py:246 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-02-13 14:47:29,727 AlertSchedulerHandler.py:142 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0xe17890>; currently running: False
INFO 2017-02-13 14:47:31,731 hostname.py:89 - Read public hostname 'suse1101.ambari.apache.org' using socket.getfqdn()
INFO 2017-02-13 14:47:31,831 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2017-02-13 14:47:31,831 ExitHelper.py:67 - Cleanup finished, exiting with code:0
/var/log/ambari-agent/ambari-agent.out:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 522, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 374, in run
self.register = Register(self.config)
File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 34, in __init__
self.hardware = Hardware()
File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 43, in __init__
self.hardware['mounts'] = Hardware.osdisks()
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 91, in osdisks
df = subprocess.Popen(command, stdout=subprocess.PIPE)
File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1106, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I don't know how to fix this, since I'm quite new to Linux and Ambari.
Labels:
- Apache Ambari