About vergari

vergari · 04-21-2016

If you are using VirtualBox, you can access Ambari simply by opening a browser on your host and pointing it to localhost:8080. If that doesn't work, you have to set up port forwarding from Machine -> Settings -> Network -> Port Forwarding.
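Before changing the VirtualBox settings, it can help to confirm from the host whether anything answers on localhost:8080 at all. Below is a minimal sketch that uses only the JDK's Socket API. The class name and the 2-second timeout are arbitrary choices for illustration, not anything Ambari requires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port is accepted within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout or unknown host all mean "not reachable" here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Run on the VirtualBox host; assumes guest port 8080 (Ambari) is forwarded to host port 8080.
        if (isReachable("localhost", 8080, 2000)) {
            System.out.println("localhost:8080 answers: open http://localhost:8080 in the browser");
        } else {
            System.out.println("localhost:8080 does not answer: check the Port Forwarding rule");
        }
    }
}
```

If the check fails, you can also add the same NAT rule from the command line with VBoxManage modifyvm "<vm name>" --natpf1 "ambari,tcp,,8080,,8080" while the VM is powered off. Here "<vm name>" and the rule name "ambari" are placeholders.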

vergari · 03-15-2016

Hi Tamás, I see you are using OpenJDK 1.7. Try using OpenJDK 1.8 instead. Davide
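When several OpenJDK packages are installed side by side, a process can easily end up on a different JVM than the one you expected. Below is a minimal sketch that prints the version of the JVM it actually runs on and flags anything older than 1.8. It parses java.specification.version, which is "1.7" or "1.8" on older JVMs and "9", "11" and so on from Java 9 onward. The class name is hypothetical.

```java
public class JavaVersionCheck {

    /** Major version from java.specification.version: "1.7" -> 7, "1.8" -> 8, "11" -> 11. */
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        // Before Java 9 the scheme was "1.N"; from Java 9 on it is just "N".
        return parts[0].equals("1") ? Integer.parseInt(parts[1]) : Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Running Java " + spec + (major >= 8 ? " (1.8 or newer)" : " (older than 1.8)"));
        System.out.println("java.home = " + System.getProperty("java.home"));
    }
}
```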

vergari · 03-10-2016

Hi Ryan, could you check whether you have the right permissions on the local directory? For reference, this is what I have:

```
[hawqadmin@hdpmaster01 ~]$ ls -ld /data01/hawq/masterdd/
drwx------ 16 hawqadmin hadoop 4096 Mar 1 09:19 /data01/hawq/masterdd/
[hawqadmin@hdpmaster01 ~]$ ls -l /data01/hawq/masterdd/
total 40
drwx------ 5 hawqadmin hawqadmin 38 Feb 29 15:38 base
drwx------ 2 hawqadmin hawqadmin 4096 Mar 1 09:19 global
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_changetracking
drwx------ 2 hawqadmin hawqadmin 17 Feb 29 15:38 pg_clog
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_distributedlog
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_distributedxidmap
-rw-rw-r-- 1 hawqadmin hawqadmin 4021 Feb 29 15:38 pg_hba.conf
-rw------- 1 hawqadmin hawqadmin 1636 Feb 29 15:38 pg_ident.conf
drwx------ 2 hawqadmin hawqadmin 156 Mar 1 00:00 pg_log
drwx------ 4 hawqadmin hawqadmin 34 Feb 29 15:38 pg_multixact
drwx------ 2 hawqadmin hawqadmin 6 Mar 1 09:19 pg_stat_tmp
drwx------ 2 hawqadmin hawqadmin 17 Feb 29 15:38 pg_subtrans
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_tblspc
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_twophase
drwx------ 2 hawqadmin hawqadmin 6 Feb 29 15:38 pg_utilitymodedtmredo
-rw------- 1 hawqadmin hawqadmin 4 Feb 29 15:38 PG_VERSION
drwx------ 3 hawqadmin hawqadmin 58 Feb 29 15:38 pg_xlog
-rw------- 1 hawqadmin hawqadmin 18393 Feb 29 15:38 postgresql.conf
-rw------- 1 hawqadmin hawqadmin 104 Feb 29 15:40 postmaster.opts
```

Also, what are the permissions on the directory on HDFS?

```
[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -d /hawq_default
drwxr-xr-x - hawqadmin hdfs 0 2016-02-29 15:38 /hawq_default
[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -R /hawq_default
drwx------ - hawqadmin hdfs 0 2016-02-29 15:47 /hawq_default/16385
drwx------ - hawqadmin hdfs 0 2016-03-01 08:54 /hawq_default/16385/16387
drwx------ - hawqadmin hdfs 0 2016-03-01 08:55 /hawq_default/16385/16387/16513
-rw------- 3 hawqadmin hdfs 48 2016-03-01 08:55 /hawq_default/16385/16387/16513/1
-rw------- 3 hawqadmin hdfs 4 2016-02-29 15:47 /hawq_default/16385/16387/PG_VERSION
```
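To compare the local side quickly on several hosts, here is a minimal sketch that reads the owner and POSIX mode of a data directory with java.nio.file. It checks against the drwx------ / hawqadmin combination from the listing above. That combination is an assumption about what a healthy master directory looks like, not a documented HAWQ requirement. The HDFS side still has to be checked with hdfs dfs -ls as shown. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class DataDirCheck {

    /** Permission string such as "rwx------" (the ls -l mode without the leading type char). */
    static String permissions(Path dir) throws IOException {
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
    }

    /** Owner name, as shown in the third column of ls -l. */
    static String owner(Path dir) throws IOException {
        return Files.getOwner(dir).getName();
    }

    /** True if dir is a directory owned by expectedOwner with mode rwx------, as in the listing above. */
    static boolean looksLikeHawqDataDir(Path dir, String expectedOwner) throws IOException {
        return Files.isDirectory(dir)
                && owner(dir).equals(expectedOwner)
                && permissions(dir).equals("rwx------");
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the paths used in this thread; pass other values as arguments.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/data01/hawq/masterdd");
        String user = args.length > 1 ? args[1] : "hawqadmin";
        if (!Files.exists(dir)) {
            System.out.println(dir + " does not exist");
            return;
        }
        System.out.printf("%s owner=%s mode=%s matches=%b%n",
                dir, owner(dir), permissions(dir), looksLikeHawqDataDir(dir, user));
    }
}
```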

vergari · 03-10-2016

We used the guides you posted and now everything works! Thank you!

vergari · 03-09-2016

Hi all, we are developing a Storm topology to write streaming data into a Hive database, but the following errors occur during execution:

1) Using Hive library version 1.2.1 (http://search.maven.org/#artifactdetails|org.apache.hive|hive|1.2.1|pom) and the configuration in the attached pom1.xml file, the error is:

```
43088 [Thread-12-hiveBolt] ERROR b.s.d.executor - java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoSuchFieldError: METASTORE_FILTER_HOOK
    at org.apache.storm.hive.common.HiveWriter.callWithTimeout(HiveWriter.java:357) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveWriter.newConnection(HiveWriter.java:226) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveWriter.<init>(HiveWriter.java:69) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveUtils.makeHiveWriter(HiveUtils.java:45) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.bolt.HiveBolt.getOrCreateWriter(HiveBolt.java:219) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.bolt.HiveBolt.execute(HiveBolt.java:102) [StormTopology-0.1.jar:?]
    at backtype.storm.daemon.executor$fn__5694$tuple_action_fn__5696.invoke(executor.clj:690) [StormTopology-0.1.jar:?]
    at backtype.storm.daemon.executor$mk_task_receiver$fn__5615.invoke(executor.clj:436) [StormTopology-0.1.jar:?]
    at backtype.storm.disruptor$clojure_handler$reify__5189.onEvent(disruptor.clj:58) [StormTopology-0.1.jar:?]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:132) [StormTopology-0.1.jar:?]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:106) [StormTopology-0.1.jar:?]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [StormTopology-0.1.jar:?]
    at backtype.storm.daemon.executor$fn__5694$fn__5707$fn__5758.invoke(executor.clj:819) [StormTopology-0.1.jar:?]
    at backtype.storm.util$async_loop$fn__545.invoke(util.clj:479) [StormTopology-0.1.jar:?]
    at clojure.lang.AFn.run(AFn.java:22) [StormTopology-0.1.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_71]
Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchFieldError: METASTORE_FILTER_HOOK
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_71]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_71]
    at org.apache.storm.hive.common.HiveWriter.callWithTimeout(HiveWriter.java:337) ~[StormTopology-0.1.jar:?]
    ... 15 more
Caused by: java.lang.NoSuchFieldError: METASTORE_FILTER_HOOK
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.loadFilterHooks(HiveMetaStoreClient.java:240) ~[StormTopology-0.1.jar:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:192) ~[StormTopology-0.1.jar:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:181) ~[StormTopology-0.1.jar:?]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.getMetaStoreClient(HiveEndPoint.java:448) ~[StormTopology-0.1.jar:?]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:274) ~[StormTopology-0.1.jar:?]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243) ~[StormTopology-0.1.jar:?]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180) ~[StormTopology-0.1.jar:?]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:157) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveWriter$5.call(HiveWriter.java:229) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveWriter$5.call(HiveWriter.java:226) ~[StormTopology-0.1.jar:?]
    at org.apache.storm.hive.common.HiveWriter$9.call(HiveWriter.java:332) ~[StormTopology-0.1.jar:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_71]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_71]
    ... 1 more
```

2) Using Hive library version 2.0.0 (http://search.maven.org/#artifactdetails|org.apache.hive|hive|2.0.0|pom) and the configuration in the attached pom2.xml file, the error returned is:

```
32028 [Thread-12-hiveBolt] ERROR b.s.d.executor - java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf
    at org.apache.storm.hive.common.HiveWriter.callWithTimeout(HiveWriter.java:357) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.common.HiveWriter.newConnection(HiveWriter.java:226) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.common.HiveWriter.<init>(HiveWriter.java:69) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.common.HiveUtils.makeHiveWriter(HiveUtils.java:45) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.bolt.HiveBolt.getOrCreateWriter(HiveBolt.java:219) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.bolt.HiveBolt.execute(HiveBolt.java:102) [storm-hive-0.10.0.jar:0.10.0]
    at backtype.storm.daemon.executor$fn__5694$tuple_action_fn__5696.invoke(executor.clj:690) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.daemon.executor$mk_task_receiver$fn__5615.invoke(executor.clj:436) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.disruptor$clojure_handler$reify__5189.onEvent(disruptor.clj:58) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:132) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:106) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.daemon.executor$fn__5694$fn__5707$fn__5758.invoke(executor.clj:819) [storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.util$async_loop$fn__545.invoke(util.clj:479) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]
Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_31]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_31]
    at org.apache.storm.hive.common.HiveWriter.callWithTimeout(HiveWriter.java:337) ~[storm-hive-0.10.0.jar:0.10.0]
    ... 15 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.createHiveConf(HiveEndPoint.java:842) ~[hive-hcatalog-streaming-0.14.0.jar:0.14.0]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:268) ~[hive-hcatalog-streaming-0.14.0.jar:0.14.0]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243) ~[hive-hcatalog-streaming-0.14.0.jar:0.14.0]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180) ~[hive-hcatalog-streaming-0.14.0.jar:0.14.0]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:157) ~[hive-hcatalog-streaming-0.14.0.jar:0.14.0]
    at org.apache.storm.hive.common.HiveWriter$5.call(HiveWriter.java:229) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.common.HiveWriter$5.call(HiveWriter.java:226) ~[storm-hive-0.10.0.jar:0.10.0]
    at org.apache.storm.hive.common.HiveWriter$9.call(HiveWriter.java:332) ~[storm-hive-0.10.0.jar:0.10.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_31]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_31]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_31]
    ... 1 more
32029 [Thread-14-__acker] INFO b.s.d.executor - BOLT ack TASK: 1 TIME: TUPLE: source: parserBolt:5, stream: __ack_ack, id: {}, [820336490148731685 6454746331808199053]
32029 [Thread-14-__acker] INFO b.s.d.executor - Execute done TUPLE source
```

We also included the external configuration files (hive-site.xml and hive-env.sh) in the project, as indicated in the Hortonworks guidelines. This is the Hive bolt code:

```java
private void createHiveBolt(TopologyBuilder builder) {
    try {
        // Record Writer configuration
        DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(DataScheme.GetHiveFields());
        HiveOptions hiveOptions;
        hiveOptions = new HiveOptions(topologyConf.HiveMetastore,
                topologyConf.HiveDbName,
                topologyConf.HiveTableName,
                mapper)
                .withTxnsPerBatch(2)
                .withBatchSize(100)
                .withIdleTimeout(10);
        builder.setBolt(HIVE_BOLT_ID, new HiveBolt(hiveOptions), topologyConf.ParallelHint)
                .shuffleGrouping(PARSER_BOLT_ID);
    } catch (Exception ex) {
        logger.error(ex.getMessage());
    }
}
```

How can we solve these issues? Thank you
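In case 1, a NoSuchFieldError means that a class was compiled against a version of another class that declared the field, but a different version without that field was loaded at runtime. Here HiveMetaStoreClient looks up METASTORE_FILTER_HOOK, presumably a constant of the HiveConf$ConfVars enum, so two Hive versions are probably mixed in the topology jar. Case 2 points the same way. The trace shows hive-hcatalog-streaming-0.14.0.jar next to the Hive 2.0.0 libraries, and "Could not initialize class" means HiveConf's static initializer had already failed once, typically with an ExceptionInInitializerError logged earlier.

Below is a minimal diagnostic sketch. It assumes you run it with the same classpath as the topology (for example from inside the shaded jar). It prints which jar each class from the traces was loaded from, and whether the field exists. ClasspathProbe is a hypothetical helper, not part of Storm or Hive.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Location (jar or directory) a class was loaded from, or a marker string. */
    static String locationOf(String className) {
        try {
            // initialize=false: load without running static initializers, so a broken
            // HiveConf static block (as in case 2) does not break the probe itself.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "<bootstrap/JDK>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        } catch (LinkageError e) {
            return "<linkage error: " + e + ">";
        }
    }

    /** True if the class declares a field with this name (enum constants are fields). */
    static boolean declaresField(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            cls.getDeclaredField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack traces above.
        String[] classes = {
            "org.apache.hadoop.hive.conf.HiveConf",
            "org.apache.hadoop.hive.metastore.HiveMetaStoreClient",
            "org.apache.hive.hcatalog.streaming.HiveEndPoint"
        };
        for (String c : classes) {
            System.out.println(c + " -> " + locationOf(c));
        }
        System.out.println("HiveConf$ConfVars.METASTORE_FILTER_HOOK present: "
                + declaresField("org.apache.hadoop.hive.conf.HiveConf$ConfVars", "METASTORE_FILTER_HOOK"));
    }
}
```

Comparing this output with mvn dependency:tree -Dincludes=org.apache.hive should show which dependency pulls in the second Hive version.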

vergari · 03-01-2016

In this article, we will install Apache HAWQ 2.0.0.0_beta in a cluster composed of:

- 2 masters (1 active, 1 standby)
- 3 segments (slaves)

On each node of the cluster, install the repository to fetch libhdfs3:

```
[root@hdpmaster01 ~]# curl -s -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
```

Install the EPEL repository:

```
[root@hdpmaster01 ~]# yum -y install epel-release
```

Install the missing dependencies:

```
[root@hdpmaster01 ~]# yum -y install man passwd sudo tar which git mlocate links make bzip2 net-tools autoconf automake libtool m4 gcc gcc-c++ gdb bison flex cmake gperf maven indent libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel libesmtp-devel xerces-c-devel python-pip json-c-devel libhdfs3-devel apache-ivy java-1.7.0-openjdk-devel openssh-clients openssh-server
```

Install postgresql-devel to compile the Python dependencies:

```
[root@hdpmaster01 ~]# yum install postgresql-devel
```

Now, install the Python dependencies with pip:

```
pip install pg8000 simplejson unittest2 pycrypto pygresql pyyaml lockfile paramiko psi
```

You can now remove the postgresql-* packages; be careful not to erase any existing PostgreSQL instances.

Download the source code from GitHub:

```
[root@hdpmaster01 ~]# cd /root
[root@hdpmaster01 ~]# git clone https://github.com/apache/incubator-hawq.git
Cloning into 'incubator-hawq'...
remote: Counting objects: 34883, done.
remote: Total 34883 (delta 0), reused 0 (delta 0), pack-reused 34883
Receiving objects: 100% (34883/34883), 144.95 MiB | 30.04 MiB/s, done.
Resolving deltas: 100% (21155/21155), done.
```
Before compiling HAWQ, you need to compile and install libyarn, the C/C++ interface to YARN that is shipped with the HAWQ source code:

```
[root@hdpmaster01 ~]# cd /root/incubator-hawq/depends/libyarn/ && mkdir build/ && cd build
[root@hdpmaster01 build]# pwd
/root/incubator-hawq/depends/libyarn/build
[root@hdpmaster01 build]# ../bootstrap
[...]
bootstrap success. Run "make" to build.
[root@hdpmaster01 build]# make -j && make install
[...]
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARN_containermanagement_protocol.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARNSecurity.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/libyarncommon/Token.h
```

Copy the include directory and the shared library to the correct file system paths, and make the library visible to the operating system with ldconfig:

```
[root@hdpmaster01 build]# cp -R /root/incubator-hawq/depends/libyarn/dist/include/libyarn/ /usr/include/
[root@hdpmaster01 build]# cp /root/incubator-hawq/depends/libyarn/dist/lib/libyarn.so.0.1.13 /usr/lib64/
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.0.1.13 /usr/lib64/libyarn.so.1
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.1 /usr/lib64/libyarn.so
[root@hdpmaster01 build]# ldconfig && ldconfig -p | grep libyarn
        libyarn.so.1 (libc6,x86-64) => /lib64/libyarn.so.1
        libyarn.so (libc6,x86-64) => /lib64/libyarn.so
```

Now we can compile and install Apache HAWQ. I use /opt/hawq as the installation directory:

```
[root@hdpmaster01 build]# cd /root/incubator-hawq
[root@hdpmaster01 incubator-hawq]# ./configure --prefix=/opt/hawq
[...]
[root@hdpmaster01 incubator-hawq]# make -j8 && make install
[...]
make[2]: Leaving directory `/root/incubator-hawq/tools/gpnetbench'
make[1]: Leaving directory `/root/incubator-hawq/tools'
HAWQ installation complete.
```
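The two ln -s commands build the chain libyarn.so -> libyarn.so.1 -> libyarn.so.0.1.13. If ldconfig -p does not list the library, one thing worth ruling out is a broken hop. Below is a minimal sketch that walks the chain one link at a time with java.nio.file and fails on a dangling link. The default path mirrors the one used above; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SharedLibLinkCheck {

    /** Every hop of a symlink chain, e.g. [libyarn.so, libyarn.so.1, libyarn.so.0.1.13]. */
    static List<Path> chain(Path start) throws IOException {
        List<Path> hops = new ArrayList<>();
        Path current = start;
        hops.add(current);
        while (Files.isSymbolicLink(current)) {
            // Relative link targets are resolved against the link's own directory.
            current = current.resolveSibling(Files.readSymbolicLink(current));
            hops.add(current);
            if (hops.size() > 40) {
                throw new IOException("too many levels of symbolic links: " + start);
            }
        }
        if (!Files.isRegularFile(current)) {
            throw new NoSuchFileException(current.toString()); // dangling last hop
        }
        return hops;
    }

    public static void main(String[] args) throws IOException {
        Path lib = Paths.get(args.length > 0 ? args[0] : "/usr/lib64/libyarn.so");
        System.out.println(String.join(" -> ", chain(lib).stream().map(Path::toString).toArray(String[]::new)));
    }
}
```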
Create the user hawqadmin and change the ownership of the HAWQ installation directory:

```
[root@hdpmaster01 incubator-hawq]# useradd -s /bin/bash hawqadmin
[root@hdpmaster01 incubator-hawq]# passwd hawqadmin
Changing password for user hawqadmin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hdpmaster01 incubator-hawq]# chown -R hawqadmin.hawqadmin /opt/hawq/
```

Repeat the previous steps on all hosts in your cluster.

Now, on the primary master, create an SSH key for the hawqadmin user and distribute the public key to the other hosts (do not set a passphrase for the private key). As the hawqadmin user:

```
[hawqadmin@hdpmaster01 ~]$ ssh-keygen
[...]
[hawqadmin@hdpmaster01 ~]$ for i in hdpmaster01 hdpmaster02 hdpslave01 hdpslave02 hdpslave03; do
> ssh-copy-id $i
> done
[...]
```

Repeat the previous loop on the standby master.

On the primary master host, edit /opt/hawq/etc/hdfs-client.xml and /opt/hawq/etc/yarn-client.xml to fit your needs (e.g. for NameNode and ResourceManager high availability, or for Kerberos authentication), then edit the following properties in hawq-site.xml:

```xml
<property>
  <name>hawq_master_address_host</name>
  <value>hdpmaster01</value>
  <description>The host name of hawq master.</description>
</property>
<property>
  <name>hawq_master_address_port</name>
  <value>5432</value>
  <description>The port of hawq master.</description>
</property>
<property>
  <name>hawq_standby_address_host</name>
  <value>hdpmaster02</value>
  <description>The host name of hawq standby master.</description>
</property>
<property>
  <name>hawq_segment_address_port</name>
  <value>40000</value>
  <description>The port of hawq segment.</description>
</property>
<property>
  <name>hawq_dfs_url</name>
  <value>hdfsha/hawq_default</value>
  <description>URL for accessing HDFS.</description>
</property>
<property>
  <name>hawq_master_directory</name>
  <value>/data01/hawq/masterdd</value>
  <description>The directory of hawq master.</description>
</property>
<property>
  <name>hawq_segment_directory</name>
  <value>/data01/hawq/segmentdd</value>
  <description>The directory of hawq segment.</description>
</property>
<property>
  <name>hawq_global_rm_type</name>
  <value>yarn</value>
</property>
<property>
  <name>hawq_rm_yarn_address</name>
  <value>hdpmaster02:8032</value>
</property>
<property>
  <name>hawq_rm_yarn_scheduler_address</name>
  <value>hdpmaster02:8030</value>
</property>
<property>
  <name>hawq_rm_yarn_queue_name</name>
  <value>default</value>
  <description>The YARN queue name to register hawq resource manager.</description>
</property>
<property>
  <name>hawq_rm_yarn_app_name</name>
  <value>hawq</value>
  <description>The application name to register hawq resource manager in YARN.</description>
</property>
```

You can leave the other options unchanged.

NOTE: if you have a PostgreSQL instance running on the master nodes, you must change the property hawq_master_address_port (5432 is also PostgreSQL's default port).

Write the slaves' FQDNs to the /opt/hawq/etc/slaves file, e.g.:
```
[hawqadmin@hdpmaster01 etc]$ echo -e "hdpslave01\nhdpslave02\nhdpslave03" > slaves
[hawqadmin@hdpmaster01 etc]$ cat slaves
hdpslave01
hdpslave02
hdpslave03
```

Copy the configuration files to the /opt/hawq/etc/ directory on all the other hosts.

Now, as the hdfs user, create the hawqadmin home directory and the data directory on HDFS:

```
[hdfs@hdpmaster01 ~]$ hdfs dfs -mkdir /user/hawqadmin && hdfs dfs -chown hawqadmin /user/hawqadmin
[hdfs@hdpmaster01 ~]$ hdfs dfs -mkdir /hawq_default && hdfs dfs -chown hawqadmin /hawq_default
```

On both masters, create the master data directory:

```
mkdir -p /data01/hawq/masterdd && chown -R hawqadmin /data01/hawq
```

Create the segment data directory on all slaves:

```
mkdir -p /data01/hawq/segmentdd && chown -R hawqadmin /data01/hawq
```

Initialize the cluster as the hawqadmin user. Remember to source the environment file (/opt/hawq/greenplum_path.sh) before executing any action:

```
[hawqadmin@hdpmaster01 hawq]$ cd /opt/hawq/
[hawqadmin@hdpmaster01 hawq]$ source greenplum_path.sh
[hawqadmin@hdpmaster01 hawq]$ hawq init cluster -av
[...]
20160229:15:42:40:158114 hawq_init:hdpmaster01:hawqadmin-[INFO]:-Init HAWQ cluster successfully
```

The init command also starts the cluster, so you can now check the cluster state with the following command:

```
hawq state cluster
```

You can also see the running application on YARN:

```
[hawqadmin@hdpmaster01 ~]$ yarn application -list | awk '/application_/ {printf ("%s\t%s\t%s\t%s\t%s\n", $1,$2,$3,$4,$5)}'
application_1456240841318_0026 hawq YARN hawqadmin default
```

Now, connect to the database and create a sample table:

```
[hawqadmin@hdpmaster01 hawq]$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# \d
No relations found.
```
```
postgres=# create table test (field1 int, field2 varchar(30));
CREATE TABLE
postgres=# \d+ test
Append-Only Table "public.test"
 Column |         Type          | Modifiers | Storage  | Description
--------+-----------------------+-----------+----------+-------------
 field1 | integer               |           | plain    |
 field2 | character varying(30) |           | extended |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Has OIDs: no
Options: appendonly=true
Distributed randomly

postgres=# insert into test (field1, field2) values (1, 'May the hawq be with you');
INSERT 0 1
postgres=# select * from test;
 field1 |          field2
--------+--------------------------
      1 | May the hawq be with you
(1 row)
```

That's all! 🙂
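Before running hawq init, especially on the standby master after copying the files, it can be handy to print back the hawq-site.xml values edited earlier. Below is a minimal sketch that uses the JDK's built-in DOM parser on the Hadoop-style property/name/value layout. The default path /opt/hawq/etc/hawq-site.xml is an assumption based on the other files in /opt/hawq/etc/, and the class name is hypothetical.

```java
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HawqSiteReader {

    /** Parses <property><name/><value/></property> entries into an ordered name -> value map. */
    static Map<String, String> readProperties(InputSource source) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name, childText(property, "value")); // <description> is ignored
            }
        }
        return props;
    }

    /** Trimmed text of the first child element with this tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/opt/hawq/etc/hawq-site.xml"; // assumed location
        Map<String, String> props = readProperties(new InputSource(Paths.get(path).toUri().toString()));
        for (String key : new String[] {"hawq_master_address_host", "hawq_master_address_port",
                "hawq_standby_address_host", "hawq_dfs_url", "hawq_rm_yarn_address"}) {
            System.out.println(key + " = " + props.get(key));
        }
    }
}
```

With the values from this article, it should print for example hawq_master_address_port = 5432 and hawq_standby_address_host = hdpmaster02.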

vergari · 02-23-2016

Hi all, I get a java.lang.IndexOutOfBoundsException while trying to execute a select distinct(...) on a big Hive table (about 60 GB). This is the log of the Tez vertex:

```
2016-02-23 16:35:03,039 [ERROR] [TezChild] |tez.TezProcessor|: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:326)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
    ... 16 more
Caused by: java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:567)
    at java.nio.ByteBuffer.get(ByteBuffer.java:686)
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:285)
    at org.apache.hadoop.hdfs.BlockReaderLocal.readWithBounceBuffer(BlockReaderLocal.java:609)
    at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:569)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:737)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:793)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:853)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
    at org.apache.hadoop.hive.ql.exec.Utilities.skipHeader(Utilities.java:3911)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:337)
    ... 22 more
```

I already tried disabling vectorization and increasing the Tez container size, but nothing changed. If I execute the query on the same table with less data in it, everything works. Have you ever seen this kind of error? Thank you, D.
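The innermost frames can be read as follows. ByteBuffer.get(byte[] dst, int offset, int length) runs Buffer.checkBounds and throws IndexOutOfBoundsException when the offset/length range does not fit inside the destination array, before any byte is copied. If the range fits but the buffer has too few bytes left, it throws BufferUnderflowException instead. So the trace says that BlockReaderLocal (the HDFS short-circuit local reader) asked its direct bounce buffer to copy into a region outside the caller's array, here while Hive was skipping header lines (Utilities.skipHeader). The exception alone does not prove that the data is corrupt.

Below is a minimal sketch that reproduces only the exception type with a direct buffer, as in the trace. It does not reproduce the Hive/HDFS problem itself. The class name is hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class BounceBufferDemo {

    /** Copies length bytes from src into dst[offset..offset+length) and reports the outcome. */
    static String tryCopy(ByteBuffer src, byte[] dst, int offset, int length) {
        try {
            src.get(dst, offset, length);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            // offset/length do not fit inside dst: the failure seen at Buffer.checkBounds
            return "IndexOutOfBoundsException";
        } catch (BufferUnderflowException e) {
            // dst range is valid, but src has fewer than length bytes remaining
            return "BufferUnderflowException";
        }
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // direct, like the buffer in the trace
        byte[] dst = new byte[8];
        System.out.println(tryCopy(direct.duplicate(), dst, 0, 8)); // ok
        System.out.println(tryCopy(direct.duplicate(), dst, 4, 8)); // IndexOutOfBoundsException (4 + 8 > 8)
        ByteBuffer almostEmpty = ByteBuffer.allocateDirect(16);
        almostEmpty.position(12);                                   // only 4 bytes remaining
        System.out.println(tryCopy(almostEmpty, dst, 0, 8));        // BufferUnderflowException
    }
}
```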

vergari · 02-16-2016

You're right, I had not thought about config groups! Sorry, and thanks a lot! 🙂

vergari · 02-16-2016

Hi all, I have an Ambari-managed cluster with 2 ingestion servers running Flume. Since I need different Flume agents, I have to define them in Ambari and make each of them run on only one server. So, I need to start a given agent on server ingestion1 and stop it on server ingestion2. This way, Ambari reports the Flume service as stopped and sends me a notification about it. Is there a way to monitor Flume in this configuration, or can I tell Ambari not to define the agent on both servers? Thank you, D.

vergari · 02-05-2016

After installing Zeppelin everything seems fine, but when you try to connect you may get a "disconnected" status on the Notebook tab and be unable to create any new note, while the Interpreter tab works fine. If you have checked that you can reach the port Zeppelin is listening on, it may be an issue with your content-filtering firewall, and you should see a log like this in your firewall application:

```
2016-02-04 15:47:58 Deny 192.168.0.128 40.112.76.49 http/tcp 57772 9995 1-Ecube 0-Internet ProxyDeny: HTTP Invalid Request-Line Format (TCP-UDP-isoardi OUT-00) HTTP-Client.isoardi proc_id="http-proxy" rc="594" msg_id="1AFF-0005" proxy_act="HTTP-Client.isoardi" line="\x81\x8d\xf2\x9eW\xfe\x89\xbc8\x8e\xd0\xa4u\xae\xbb\xd0\x10\xdc\x8f\x81\x8d\xf8]\xb8z\x83\x7f\xd7\x0a" Traffic
```

The solution is to disable the content filter for the domain on which Zeppelin is running.
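The line="..." field in that log is not an HTTP request line at all. It is a raw WebSocket frame, which is why the HTTP proxy rejects it. Decoding the first frame per RFC 6455 works like this: 0x81 is FIN plus the text opcode, 0x8d means masked with a 13-byte payload, then come a 4-byte mask key and the XOR-masked payload. The result is the text {"op":"PING"}, apparently a keep-alive sent by the Notebook page. That would be consistent with only the Notebook tab showing "disconnected".

Below is a minimal decoder sketch that handles only small masked text frames, which is enough for this log line. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WsFrameDecoder {

    /** Turns firewall-style escapes ("\x81\x8dW...") into raw bytes; other chars are taken literally. */
    static byte[] unescape(String logged) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < logged.length(); i++) {
            char c = logged.charAt(i);
            if (c == '\\' && i + 3 < logged.length() && logged.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(logged.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xHH" part of the escape
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    /** Payload of the first frame; assumes a masked text frame shorter than 126 bytes (RFC 6455). */
    static String firstFramePayload(byte[] frame) {
        boolean fin = (frame[0] & 0x80) != 0;
        int opcode = frame[0] & 0x0F;            // 0x1 = text frame
        boolean masked = (frame[1] & 0x80) != 0;  // client-to-server frames are masked
        int length = frame[1] & 0x7F;             // 126/127 would announce an extended length
        if (!fin || opcode != 0x1 || !masked || length >= 126) {
            throw new IllegalArgumentException("not a small masked text frame");
        }
        byte[] key = {frame[2], frame[3], frame[4], frame[5]};
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            payload[i] = (byte) (frame[6 + i] ^ key[i % 4]); // unmask with the 4-byte key
        }
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First frame of the line="..." field from the firewall log above.
        String logged = "\\x81\\x8d\\xf2\\x9eW\\xfe\\x89\\xbc8\\x8e\\xd0\\xa4u\\xae\\xbb\\xd0\\x10\\xdc\\x8f";
        System.out.println(firstFramePayload(unescape(logged))); // prints {"op":"PING"}
    }
}
```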

Last Visited	03-13-2019 12:45 PM

Member Since	12-10-2015 08:08 AM
Posts	48
Kudos received	27

Cloudera Community

Re: SSH key question

Re: Install zeppelin on HDP (not Sandbox)

Re: Tutorial: Tag based policies with Apache Range...

Re: Install Apache Hawq on HDP 2.3.4

Re: Install Apache Hawq on HDP 2.3.4

Re: Error in executing Hive Bolt with Storm

Error in executing Hive Bolt with Storm

Install Apache Hawq on HDP 2.3.4

Tez IndexOutOfBoundsException in select distinct

Re: monitoring flume in a clustered environment us...

monitoring flume in a clustered environment using ...

Error accessing Apache Zeppelin Notebook tab