Member since: 03-03-2017
Posts: 4
Kudos Received: 0
Solutions: 0
03-27-2017 01:08 PM
My getGeo UDF sometimes cannot read the IP mapping file stored in HDFS when I run it with Hive on MR. It works in a simple query such as select getGeo(ip, 'code') from xxxx; but it fails in an aggregate query such as select a, max(getGeo(ip, 'code')) from xxxx group by a; The failure is a NullPointerException at the line for (Path p : paths).
public class UDFGetGeo extends UDF {
    private static String filePath = null;
    static {
        String dirPath = "/group/avazu/user/avazu/data/raw_log/ip_geo/";
        Configuration conf = new Configuration();
        Path inputPath = new Path(dirPath);
        FileSystem fs = null;
        FileStatus[] fss = null;
        try {
            fs = FileSystem.get(inputPath.toUri(), conf);
            fss = fs.listStatus(inputPath);
        } catch (Exception e) {
            e.printStackTrace();
        }
        Path[] paths = FileUtil.stat2Paths(fss);
        for (Path p : paths) {
            try {
                fs = FileSystem.get(p.toUri(), conf);
                fss = fs.listStatus(p);
                if(fss.length > 0) {
                    filePath = p.toString();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
... View more
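One plausible reading of the error: if listStatus throws inside the static block, the exception is only printed, fss stays null, FileUtil.stat2Paths(null) returns null, and the for-each loop over paths then throws the NullPointerException. The printed stack trace in the task log should show the underlying cause. A minimal sketch of guarding that case follows. It uses java.io.File as a stand-in for the Hadoop FileSystem so that it runs without Hadoop on the classpath, and the names GeoPathResolver and findNonEmptySubdir are hypothetical, not part of the original UDF.

```java
import java.io.File;

class GeoPathResolver {

    /**
     * Returns the path of a subdirectory of dirPath that contains at least
     * one entry, or null if dirPath cannot be listed or has no such
     * subdirectory. When several subdirectories qualify, the last one
     * visited wins, as in the loop from the post.
     */
    public static String findNonEmptySubdir(String dirPath) {
        // File.listFiles() returns null when the directory is missing or
        // unreadable. This plays the role of the null fss left behind by
        // the swallowed exception in the original static block.
        File[] children = new File(dirPath).listFiles();
        if (children == null) {
            return null; // guard: do not iterate over a failed listing
        }
        String found = null;
        for (File child : children) {
            File[] entries = child.listFiles(); // null for plain files
            if (entries != null && entries.length > 0) {
                found = child.getPath();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A directory that is assumed not to exist on this machine.
        String result = findNonEmptySubdir("/no/such/dir/for/this/sketch");
        System.out.println("missing dir -> " + result);
    }
}
```

In the real UDF the same guard would go around the result of fs.listStatus(inputPath), and the caller would need to decide what to do when no path is found (for example, fail the task with a clear message instead of continuing with filePath == null).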
- Tags:
- Hadoop Core
- HDFS
- udf
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Hadoop
03-03-2017 10:53 AM
@mquresh Thank you very much. You are right: I had set only mapreduce.job.queuename and not tez.queue.name, so Hive on Tez ran in the default queue, which gets only 30% of the cluster's computational resources.
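The fix described in this reply can be applied per session from the Hive prompt. A sketch, assuming a YARN queue named etl exists (the queue name is hypothetical; substitute your own):

```sql
-- Queue used when hive.execution.engine=mr
set mapreduce.job.queuename=etl;
-- Queue used when hive.execution.engine=tez
set tez.queue.name=etl;
```

Setting both properties keeps the two engines on the same queue, so switching engines does not silently fall back to the default queue.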
03-03-2017 03:37 AM
Hive on MR can use 100% of my cluster's computational resources, but Hive on Tez uses only 50%. I deployed HDFS, YARN, Hive, Tez, and ZooKeeper with Ambari and kept the recommended configuration. How can I make Hive on Tez use 100% of the computational resources?
- Tags:
- Data Processing
- Mapreduce
- tez
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Hadoop
- Apache Tez