Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3656 | 07-12-2018 01:58 PM |
| | 7671 | 03-08-2018 10:44 AM |
| | 3618 | 06-24-2017 11:18 AM |
| | 23041 | 02-10-2017 04:54 PM |
| | 2218 | 01-19-2017 01:41 PM |
05-18-2016
05:09 PM
For Oozie, I changed the URL to point to my server, but it gives me back an empty array: {
"href" : "<my server>:8080/api/v1/clusters/c1/configurations/service_config_versions?service_name=OOZIE&is_current=true",
"items" : [ ]
}
Any ideas?
We don't use Oozie for anything, as far as I know.
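In case it helps, here is a rough sketch of how I'd double-check whether Oozie is even registered with Ambari before querying its config versions. It uses the Python requests library; the base URL and cluster name match the call above, and the credentials are placeholders.

```python
# Rough sketch: list the services Ambari has registered before asking for
# OOZIE config versions. Host, cluster name, and credentials are placeholders.
import requests

AMBARI = "http://<my server>:8080/api/v1"   # same host as in the URL above
CLUSTER = "c1"
AUTH = ("admin", "admin")                   # replace with real credentials

resp = requests.get(AMBARI + "/clusters/" + CLUSTER + "/services", auth=AUTH)
resp.raise_for_status()
services = [item["ServiceInfo"]["service_name"] for item in resp.json()["items"]]
print("Installed services:", services)

# If OOZIE is not in the list, the service_config_versions query will always
# come back with an empty "items" array, which would explain what I'm seeing.
if "OOZIE" in services:
    cfg = requests.get(
        AMBARI + "/clusters/" + CLUSTER + "/configurations/service_config_versions",
        params={"service_name": "OOZIE", "is_current": "true"},
        auth=AUTH,
    )
    print(cfg.json())
```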
05-18-2016
02:54 PM
We just encountered this upgrading from 2.1.1 to 2.2.2. We removed the settings referenced and that got us around the issue, but are those settings important?
05-11-2016
06:32 PM
Thanks, @mbalakrishnan. To further clarify, I found my Ambari properties file at /etc/ambari-server/conf/ambari.properties.
05-11-2016
06:21 PM
I'm looking for help in determining which database (PostgreSQL, Oracle, MySQL, etc.) is being used by Ambari, the Hive Metastore, and Oozie on an existing cluster. Can someone please point me in the right direction?
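Update for anyone who finds this later: a rough sketch of where these settings live, assuming the standard HDP config paths (they may differ on your cluster). The property names are the usual JDBC settings for each service.

```python
# Rough sketch: print the JDBC settings that reveal which database backs
# Ambari, the Hive Metastore, and Oozie. The paths are standard HDP locations
# and may differ; run on a host that has the Ambari server and the Hive/Oozie
# client configs.
import xml.etree.ElementTree as ET

def xml_property(path, name):
    """Return the value of the named <property> from a Hadoop-style XML config."""
    for prop in ET.parse(path).getroot().findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Ambari keeps its database settings in a flat properties file.
with open("/etc/ambari-server/conf/ambari.properties") as f:
    for line in f:
        if line.startswith("server.jdbc."):
            print("Ambari:        ", line.strip())

print("Hive Metastore:", xml_property("/etc/hive/conf/hive-site.xml",
                                      "javax.jdo.option.ConnectionURL"))
print("Oozie:         ", xml_property("/etc/oozie/conf/oozie-site.xml",
                                      "oozie.service.JPAService.jdbc.url"))
```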
Labels:
- Apache Ambari
- Apache Hive
- Apache Oozie
04-04-2016
04:12 PM
1 Kudo
I'm trying to tune our cluster to optimize performance. Currently, we still have default values for hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max. According to the documentation, in Hive 0.13 hive.exec.reducers.bytes.per.reducer should default to 256 MB, but Ambari (our HDP stack is 2.2.8) appears to be defaulting this to 64 MB. And on Hive 0.14, the default is all the way up to 1 GB. Then for hive.exec.reducers.max, the HDP default is 1,009.

I'm trying to understand how best to set these values. It seems like there is a relationship between these values, the cluster specs, and the YARN settings, and I'm trying to understand that relationship.

For hive.exec.reducers.max, I would think it should be a multiple of the number of data nodes times the number of CPUs per node. So for a cluster with 10 data nodes and 16 CPUs per node, it would probably be a multiple of 160, right? Maybe 320 or 480?

hive.exec.reducers.bytes.per.reducer is a bit more mysterious. The default went up by a factor of 4 between 0.13 and 0.14. Why? And then how does this all relate to YARN container sizes? Any thoughts?
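To make the relationship concrete for myself, here is a back-of-the-envelope sketch of how I understand Hive picks the reducer count when mapred.reduce.tasks is not set explicitly. The formula and the 500 GB input size are my own assumptions, so please correct me if they're off.

```python
# Back-of-the-envelope sketch (my understanding, not official docs): when
# mapred.reduce.tasks is not set, Hive estimates the reducer count roughly as
#   reducers = min(hive.exec.reducers.max,
#                  ceil(total_input_bytes / hive.exec.reducers.bytes.per.reducer))
# The 500 GB input below is a made-up example.
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimated_reducers(total_input_bytes, bytes_per_reducer, reducers_max=1009):
    return min(reducers_max, max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

total_input = 500 * GB
for bytes_per_reducer in (64 * MB, 256 * MB, 1 * GB):
    n = estimated_reducers(total_input, bytes_per_reducer)
    print("%5d MB per reducer -> %4d reducers" % (bytes_per_reducer // MB, n))

# 64 MB asks for 8000 reducers and hits the 1009 cap, 256 MB asks for 2000 and
# still hits the cap, and 1 GB settles at 500 -- so on big inputs the
# bytes-per-reducer setting matters more than the max, which mostly acts as a
# guard against flooding the cluster with tiny reducers.
```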
03-29-2016
03:08 PM
1 Kudo
Cool! I'll check it out. And to answer your question: we are on 2.2.8.
03-29-2016
01:30 PM
1 Kudo
Thanks, @Sourygna Luangsay. We do use Tez for many things. I honestly haven't found it to be "much faster than MR", though it is usually a bit faster. But I like MR because it integrates very well with the Application Manager GUI. I can find all my logs very easily through the GUI, and even share links with my team when there is a stack trace or something in the logging that needs attention. It also makes it very easy to diagnose when one node on our cluster is a bottleneck: when a query runs slowly, I can watch the mappers and reducers and easily see which servers are taking the longest. I don't know of a good way to do any of those things with Tez. We use the Tez View, but it is buggy, and when it works, it takes many more clicks to find answers. That's just my experience. Maybe there's a better way to leverage Tez...
03-29-2016
01:13 PM
3 Kudos
We have several queries that fail on MR but succeed on Tez. When they fail, the logs are full of errors like the ones below. They usually point to specific rows. However, if I reduce the scope of the query but include the "bad" rows, the queries usually succeed without errors, so it clearly isn't specific to those rows. I'm guessing there is some kind of overflow happening internally.

I have submitted several instances of this in support tickets, and the feedback is always "please upgrade or just use Tez", but that really isn't a solution, and we just upgraded recently. I'm looking for guidance on ways that we might tune our Hive or MR settings to work around this. Thanks.

2016-03-29 08:30:03,751 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {<row data>}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:159)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1450)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1346)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:186)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1146)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:607)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:380)
... 15 more
Labels:
- Apache Hive
- Apache Tez
03-23-2016
08:14 PM
Thanks Benjamin
03-23-2016
04:34 PM
2 Kudos
If I use a Pig script like the one below, I am able to leverage MapReduce to compress a ton of data, and I get a pretty good ratio. However, when I try to decompress the data, I lose the individual files. For example, if my original, uncompressed folder has a.dat through z.dat, the compressed folder will have something like part-m-00001.bz2, part-m-00002.bz2, etc. That's fine. But then, when I do the same thing in reverse to get back my original content, I just get larger files that look like part-m-00001, part-m-00002, etc.

Is there a way to leverage our cluster to compress HDFS files in such a way that I can get back the original files, including the file names? Thanks!

set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

InputFiles = LOAD '/my/hdfs/path/' USING PigStorage();
STORE InputFiles INTO '/my/hdfs/path_compressed/' USING PigStorage();
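For context, the closest I've come to keeping the original file names is to compress each file individually from an edge node instead of through Pig; a rough sketch is below. The paths are placeholders, and since it streams each file through bzip2 on a single machine, it gives up the cluster parallelism that made the Pig approach attractive in the first place.

```python
# Rough sketch: compress each HDFS file individually so the original name
# survives (a.dat -> a.dat.bz2). Runs from an edge node, streaming each file
# through bzip2 via the hdfs CLI, so it does NOT use MapReduce parallelism.
# The source and destination paths are placeholders.
import subprocess

SRC = "/my/hdfs/path"
DST = "/my/hdfs/path_compressed"

subprocess.run(["hdfs", "dfs", "-mkdir", "-p", DST], check=True)

# `hdfs dfs -ls` prints one line per entry; plain files start with "-" and the
# last whitespace-separated field is the full path.
listing = subprocess.run(["hdfs", "dfs", "-ls", SRC], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True)
files = [line.split()[-1] for line in listing.stdout.splitlines()
         if line.startswith("-")]

for path in files:
    name = path.rsplit("/", 1)[-1]
    # Equivalent to: hdfs dfs -cat <file> | bzip2 -c | hdfs dfs -put -f - <dest>.bz2
    pipeline = "hdfs dfs -cat {0} | bzip2 -c | hdfs dfs -put -f - {1}/{2}.bz2".format(
        path, DST, name)
    subprocess.run(pipeline, shell=True, check=True)
```

Running it in reverse (cat each .bz2 through bzip2 -d back into the original name) undoes it, but I'd still prefer something that runs on the cluster itself, which is really what I'm asking about.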
Labels:
- Apache Pig