
Custom Hive Window Function


REQUEST:

Does anyone have a complete working custom "hello world" windowing UDAF that compiles and runs on HDP 2.3, including the pom file etc.?

I strongly suspect the problem has to do with the pom file. My goal is to make a Hive windowing UDAF that writes to Phoenix, so if you know of an example that does that, I'd appreciate it too.

DETAILS:

I can create working UDFs and non-windowing UDAFs, but not a custom windowing UDAF. In fact, I've tried compiling the source code for the standard Hive ROW_NUMBER() function, and it does not work properly. What I see is:

Custom Hive UDFs and non-windowing UDAFs execute properly via hive-cli and beeline, as both temporary and permanent functions.

A custom Hive windowing UDAF always fails in beeline. In the hive CLI it works only when registered as a temporary function, and only from the second call onward: the first call in each session fails. Logs show it trying to open connections to the worker nodes in the cluster and getting errors. The errors are:

-- hiveserver2.log

2016-07-06 19:15:12,634 INFO [HiveServer2-Background-Pool: Thread-334]: ipc.Client (Client.java:handleConnectionFailure(869)) - Retrying connect to server: <HOSTNAME>/<IP>:58695. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)

-- worker node log

2016-07-06 19:14:46,411 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-useast1b-<HOSTNAME>:50010:DataXceiver error processing unknown operation src: /127.0.0.1:33586 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)

The code I was using is taken from the source for row_number() in the Apache Hive code base, at ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRowNumber.java.
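For concreteness, here is a minimal sketch of the kind of "hello world" windowing UDAF I have in mind: a pared-down clone of GenericUDAFRowNumber. The package, class, and function names are placeholders of my own, and this is a sketch of what I believe the Hive 1.2 API expects, not something I have gotten running on the cluster:

package com.example.hive; // placeholder package, use your own

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.WindowFunctionDescription;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.io.IntWritable;

@WindowFunctionDescription(
    description = @Description(
        name = "hello_number",
        value = "_FUNC_() - hello-world clone of row_number(): numbers rows within each partition"),
    supportsWindow = false, // row_number() semantics: no window frame
    pivotResult = true)     // terminate() returns a list; Hive hands out one element per row
public class GenericUDAFHelloNumber extends AbstractGenericUDAFResolver {

  @Override
  public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters) throws SemanticException {
    if (parameters.length != 0) {
      throw new UDFArgumentLengthException("hello_number() accepts 0 arguments.");
    }
    return new HelloNumberEvaluator();
  }

  static class HelloNumberBuffer implements GenericUDAFEvaluator.AggregationBuffer {
    ArrayList<IntWritable> rowNums = new ArrayList<IntWritable>();
    int nextRow = 1;
  }

  public static class HelloNumberEvaluator extends GenericUDAFEvaluator {

    @Override
    public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
      super.init(m, parameters);
      // Windowing UDAFs run in COMPLETE mode only (no map-side partials).
      if (m != Mode.COMPLETE) {
        throw new HiveException("Only COMPLETE mode supported for hello_number");
      }
      return ObjectInspectorFactory.getStandardListObjectInspector(
          PrimitiveObjectInspectorFactory.writableIntObjectInspector);
    }

    @Override
    public AggregationBuffer getNewAggregationBuffer() throws HiveException {
      return new HelloNumberBuffer();
    }

    @Override
    public void reset(AggregationBuffer agg) throws HiveException {
      HelloNumberBuffer b = (HelloNumberBuffer) agg;
      b.rowNums = new ArrayList<IntWritable>();
      b.nextRow = 1;
    }

    @Override
    public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
      // One row in, one sequential number out.
      HelloNumberBuffer b = (HelloNumberBuffer) agg;
      b.rowNums.add(new IntWritable(b.nextRow++));
    }

    @Override
    public Object terminatePartial(AggregationBuffer agg) throws HiveException {
      throw new HiveException("terminatePartial not supported");
    }

    @Override
    public void merge(AggregationBuffer agg, Object partial) throws HiveException {
      throw new HiveException("merge not supported");
    }

    @Override
    public Object terminate(AggregationBuffer agg) throws HiveException {
      return ((HelloNumberBuffer) agg).rowNums;
    }
  }
}

If something like this worked, registering it with CREATE TEMPORARY FUNCTION hello_number AS 'com.example.hive.GenericUDAFHelloNumber'; and calling hello_number() OVER (PARTITION BY ... ORDER BY ...) should behave like row_number().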

I'm using HDP 2.3, but get the same behavior in the 2.4 sandbox.

My pom file is:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bosecm</groupId>
  <artifactId>phoenix-thick-client-test</artifactId>
  <version>0.1.0</version>

  <properties>
    <!-- For Hive UDFs, this is not used; the Hive function name is bound to the class name (in the specified jar) -->
    <defaultExecution.mainClass>com.bosecm.phoenixtest.HiveUDAFSelectDeviceState</defaultExecution.mainClass>
    <!-- HDP sandbox comes with 1.7, and the Hive version 1.2.1.2.3.0.0-2557 source code's pom uses 1.7 -->
    <jdk.version>1.7</jdk.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <test.threadCount>4</test.threadCount>
    <!--
      rpm -qa | grep hadoop
        hadoop_2_3_0_0_2557-2.7.1.2.3.0.0-2557.el6.x86_64
        Hadoop version = 2.7.1.2.3.0.0
      rpm -qa | grep hive
        hive_2_3_0_0_2557-1.2.1.2.3.0.0-2557.el6.noarch
      So,
        Hortonworks/HDP version = 2.3.0.0_2557
        Hadoop version = 2.7.1.2.3.0.0
        Hive version = 1.2.1.2.3.0.0-2557
    -->
    <apache.hadoop.version>2.7.1.2.3.0.0-2557</apache.hadoop.version> <!-- ls /usr/hdp/*/hadoop/lib/*hadoop-common* -->
    <apache.hive.version>1.2.1.2.3.0.0-2557</apache.hive.version>     <!-- ls /usr/hdp/current/hive-client/lib/*hive-exec* -->
  </properties>

  <repositories>
    <!-- Hadoop/Hive -->
    <!-- Examining http://repo.hortonworks.com/content/ shows the structure of what is available -->
    <!-- This has hadoop-common 2.x -->
    <!--
    <repository>
      <id>hortonworks-repositories-releases</id>
      <url>http://repo.hortonworks.com/content/repositories/releases</url>
    </repository>
    For org.mortbay.jetty:jetty:jar:6.1.26.hwx
    <repository>
      <id>hortonworks-repositories-public</id>
      <url>http://repo.hortonworks.com/content/repositories/public</url>
    </repository>
    -->
    <!-- Per http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_user-guide/content/user-guide-setup-maven... -->
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
        <checksumPolicy>fail</checksumPolicy>
      </snapshots>
      <id>HDPReleases</id>
      <name>HDP Releases</name>
      <url>http://repo.hortonworks.com/content/repositories/releases/</url>
      <layout>default</layout>
    </repository>
    <!-- The above is missing org.mortbay.jetty:jetty:jar:6.1.26.hwx,
         which is in http://repo.hortonworks.com/content/repositories/jetty-hadoop/org/mortbay/jetty/jetty/ -->
    <repository>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
        <checksumPolicy>fail</checksumPolicy>
      </snapshots>
      <id>HDP-Patched-Jetty</id>
      <name>HDP Patched Jetty</name>
      <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
      <layout>default</layout>
    </repository>
  </repositories>

  <dependencies>
    <!-- Bogus... the client jars should be in Maven repos...
    <dependency>
      <groupId>org.apache.phoenix</groupId>
      <artifactId>phoenix-hbase-client-jar</artifactId>
      <version>4.7.0-HBase-1.1</version>
      <scope>system</scope>
      <systemPath>${basedir}/lib/phoenix-4.7.0-HBase-1.1-client.jar</systemPath>
    </dependency>
    -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${apache.hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${apache.hive.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Add local/project jars to the build -->
      <!-- This takes care of the fact that the Phoenix client is not in Maven
      <plugin>
        <groupId>com.googlecode.addjars-maven-plugin</groupId>
        <artifactId>addjars-maven-plugin</artifactId>
        <version>1.0.5</version>
        <executions>
          <execution>
            <goals>
              <goal>add-jars</goal>
            </goals>
            <configuration>
              <resources>
                <resource>
                  <directory>${basedir}/lib</directory>
                </resource>
              </resources>
            </configuration>
          </execution>
        </executions>
      </plugin>
      -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.4.0</version>
        <executions>
          <execution>
            <goals>
              <goal>exec</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <executable>maven</executable>
          <workingDirectory>/tmp</workingDirectory>
          <arguments>
          </arguments>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>${jdk.version}</source>
          <target>${jdk.version}</target>
          <compilerArgs>
            <arg>-Xlint:unchecked</arg> <!-- Show detailed warnings for things like unchecked casts -->
            <!-- <arg>-Werror</arg>  Treat compiler warnings as errors. The code has no warnings, and it doesn't suppress any. Make sure you don't introduce any. DO NOT REMOVE THIS! -->
          </compilerArgs>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <!-- This avoids problems with signed jar dependencies (e.g. Hadoop/Hive) -->
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <!-- Additional configuration. -->
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <manifestEntries>
                    <Main-Class>${defaultExecution.mainClass}</Main-Class>
                  </manifestEntries>
                </transformer>
              </transformers>
              <artifactSet>
              </artifactSet>
              <outputFile>${project.build.directory}/${project.artifactId}-${project.version}-fat.jar</outputFile>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.19</version>
        <configuration>
          <!-- Testing in parallel -->
          <parallel>methods</parallel>
          <threadCount>${test.threadCount}</threadCount> <!-- http://maven.apache.org/surefire/maven-surefire-plugin/examples/junit.html -->
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>${project.basedir}/src/main/resources</directory>
      </resource>
    </resources>
  </build>
</project>
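One thing I wonder about (a guess on my part, not something I have verified): hadoop-common and hive-exec are compile scope above, so the shade plugin folds the Hadoop and Hive classes into the fat jar that gets shipped to the cluster, where they could shadow the cluster's own copies. Marking them provided would keep them out of the shaded jar:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${apache.hadoop.version}</version>
      <scope>provided</scope> <!-- guess: needed at compile time, supplied by the cluster at run time -->
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${apache.hive.version}</version>
      <scope>provided</scope>
    </dependency>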

Re: Custom Hive Window Function

Try increasing your DataNode heap size, as a custom Hive windowing UDAF may require more data to be generated over your dataset. You may need to decrease the heaps of other roles to make space, or move roles around so there isn't so much contention for memory on a single host.