Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3656 | 07-12-2018 01:58 PM |
| | 7671 | 03-08-2018 10:44 AM |
| | 3618 | 06-24-2017 11:18 AM |
| | 23041 | 02-10-2017 04:54 PM |
| | 2218 | 01-19-2017 01:41 PM |
05-18-2016
05:09 PM
For Oozie, I changed the URL to point to my server, but it gives me back an empty array: {
"href" : "<my server>:8080/api/v1/clusters/c1/configurations/service_config_versions?service_name=OOZIE&is_current=true",
"items" : [ ]
}
Any ideas?
We don't use Oozie for anything, as far as I know.
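In case it helps, here is a rough sketch of how I'd double-check whether Oozie is even registered with Ambari before querying its config versions. It uses the Python requests library; the base URL and cluster name match the call above, and the credentials are placeholders.

```python
# Rough sketch: list the services Ambari has registered before asking for
# OOZIE config versions. Host, cluster name, and credentials are placeholders.
import requests

AMBARI = "http://<my server>:8080/api/v1"   # same host as in the URL above
CLUSTER = "c1"
AUTH = ("admin", "admin")                   # replace with real credentials

resp = requests.get(AMBARI + "/clusters/" + CLUSTER + "/services", auth=AUTH)
resp.raise_for_status()
services = [item["ServiceInfo"]["service_name"] for item in resp.json()["items"]]
print("Installed services:", services)

# If OOZIE is not in the list, the service_config_versions query will always
# come back with an empty "items" array, which would explain what I'm seeing.
if "OOZIE" in services:
    cfg = requests.get(
        AMBARI + "/clusters/" + CLUSTER + "/configurations/service_config_versions",
        params={"service_name": "OOZIE", "is_current": "true"},
        auth=AUTH,
    )
    print(cfg.json())
```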
05-18-2016
02:54 PM
We just encountered this upgrading from 2.1.1 to 2.2.2. We removed the settings referenced and that got us around the issue, but are those settings important?
05-11-2016
06:32 PM
Thanks, @mbalakrishnan. To further clarify, I found my Ambari properties file at /etc/ambari-server/conf/ambari.properties.
05-11-2016
06:21 PM
I'm looking for help in determining which database (PostgreSQL, Oracle, MySQL, etc.) is being used by Ambari, the Hive Metastore, and Oozie on an existing cluster. Can someone please point me in the right direction?
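Update for anyone who finds this later: a rough sketch of where these settings live, assuming the standard HDP config paths (they may differ on your cluster). The property names are the usual JDBC settings for each service.

```python
# Rough sketch: print the JDBC settings that reveal which database backs
# Ambari, the Hive Metastore, and Oozie. The paths are standard HDP locations
# and may differ; run on a host that has the Ambari server and the Hive/Oozie
# client configs.
import xml.etree.ElementTree as ET

def xml_property(path, name):
    """Return the value of the named <property> from a Hadoop-style XML config."""
    for prop in ET.parse(path).getroot().findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Ambari keeps its database settings in a flat properties file.
with open("/etc/ambari-server/conf/ambari.properties") as f:
    for line in f:
        if line.startswith("server.jdbc."):
            print("Ambari:        ", line.strip())

print("Hive Metastore:", xml_property("/etc/hive/conf/hive-site.xml",
                                      "javax.jdo.option.ConnectionURL"))
print("Oozie:         ", xml_property("/etc/oozie/conf/oozie-site.xml",
                                      "oozie.service.JPAService.jdbc.url"))
```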
Labels:
- Apache Ambari
- Apache Hive
- Apache Oozie
04-04-2016
04:12 PM
1 Kudo
I'm trying to tune our cluster to optimize performance. Currently, we still have default values for hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max. According to the documentation, in Hive 0.13 hive.exec.reducers.bytes.per.reducer should default to 256 MB, but Ambari (our HDP stack is 2.2.8) appears to be defaulting this to 64 MB. And on Hive 0.14, the default is all the way up to 1 GB. Then for hive.exec.reducers.max, the HDP default is 1,009.

I'm trying to understand how best to set these values. It seems like there is a relationship between these values, the cluster specs, and the YARN settings, and I'm trying to understand that relationship.

For hive.exec.reducers.max, I would think it should be a multiple of the number of data nodes times the number of CPUs per node. So for a cluster with 10 data nodes and 16 CPUs per node, it would probably be a multiple of 160, right? Maybe 320 or 480?

hive.exec.reducers.bytes.per.reducer is a bit more mysterious. The default went up by a factor of 4 between 0.13 and 0.14. Why? And then how does this all relate to YARN container sizes? Any thoughts?
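To make the relationship concrete for myself, here is a back-of-the-envelope sketch of how I understand Hive picks the reducer count when mapred.reduce.tasks is not set explicitly. The formula and the 500 GB input size are my own assumptions, so please correct me if they're off.

```python
# Back-of-the-envelope sketch (my understanding, not official docs): when
# mapred.reduce.tasks is not set, Hive estimates the reducer count roughly as
#   reducers = min(hive.exec.reducers.max,
#                  ceil(total_input_bytes / hive.exec.reducers.bytes.per.reducer))
# The 500 GB input below is a made-up example.
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimated_reducers(total_input_bytes, bytes_per_reducer, reducers_max=1009):
    return min(reducers_max, max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

total_input = 500 * GB
for bytes_per_reducer in (64 * MB, 256 * MB, 1 * GB):
    n = estimated_reducers(total_input, bytes_per_reducer)
    print("%5d MB per reducer -> %4d reducers" % (bytes_per_reducer // MB, n))

# 64 MB asks for 8000 reducers and hits the 1009 cap, 256 MB asks for 2000 and
# still hits the cap, and 1 GB settles at 500 -- so on big inputs the
# bytes-per-reducer setting matters more than the max, which mostly acts as a
# guard against flooding the cluster with tiny reducers.
```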
03-29-2016
03:08 PM
1 Kudo
Cool! I'll check it out. And to answer your question: we are on 2.2.8.
03-29-2016
01:30 PM
1 Kudo
Thanks, @Sourygna Luangsay. We do use Tez for many things. I honestly haven't found it to be "much faster than MR", though it is usually a bit faster. But I like MR because it integrates very well with the Application Manager GUI. I can find all my logs very easily through the GUI, and even share links with my team when there is a stack trace or something in the logging that needs attention. It also makes it very easy to diagnose when one node on our cluster is a bottleneck: when a query runs slowly, I can watch the mappers and reducers and easily see which servers are taking the longest. I don't know of a good way to do any of those things with Tez. We use the Tez View, but it is buggy, and when it works, it takes many more clicks to find answers. That's just my experience. Maybe there's a better way to leverage Tez...
03-29-2016
01:13 PM
3 Kudos
We have several queries that fail on MR but succeed on Tez. When they fail, the logs are full of errors like the ones below. They usually point to specific rows. However, if I reduce the scope of the query but include the "bad" rows, the queries usually succeed without errors, so it clearly isn't specific to those rows. I'm guessing there is some kind of overflow happening internally.

I have submitted several instances of this in support tickets, and the feedback is always "please upgrade or just use Tez", but that really isn't a solution, and we just upgraded recently. I'm looking for guidance on ways that we might tune our Hive or MR settings to work around this. Thanks.

2016-03-29 08:30:03,751 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {<row data>}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:159)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1450)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1346)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:186)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1146)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:607)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:380)
... 15 more
Labels:
- Apache Hive
- Apache Tez
03-23-2016
08:14 PM
Thanks Benjamin
03-23-2016
04:34 PM
2 Kudos
If I use a Pig script like the one below, I am able to leverage MapReduce to compress a ton of data, and I get a pretty good ratio. However, when I try to decompress the data, I lose the individual files. For example, if my original, uncompressed folder has a.dat through z.dat, the compressed folder will have something like part-m-00001.bz2, part-m-00002.bz2, etc. That's fine. But then, when I do the same thing in reverse to get back my original content, I just get larger files that look like part-m-00001, part-m-00002, etc.

Is there a way to leverage our cluster to compress HDFS files in such a way that I can get back the original files, including the file names? Thanks!

set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

InputFiles = LOAD '/my/hdfs/path/' USING PigStorage();
STORE InputFiles INTO '/my/hdfs/path_compressed/' USING PigStorage();
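For context, the closest I've come to keeping the original file names is to compress each file individually from an edge node instead of through Pig; a rough sketch is below. The paths are placeholders, and since it streams each file through bzip2 on a single machine, it gives up the cluster parallelism that made the Pig approach attractive in the first place.

```python
# Rough sketch: compress each HDFS file individually so the original name
# survives (a.dat -> a.dat.bz2). Runs from an edge node, streaming each file
# through bzip2 via the hdfs CLI, so it does NOT use MapReduce parallelism.
# The source and destination paths are placeholders.
import subprocess

SRC = "/my/hdfs/path"
DST = "/my/hdfs/path_compressed"

subprocess.run(["hdfs", "dfs", "-mkdir", "-p", DST], check=True)

# `hdfs dfs -ls` prints one line per entry; plain files start with "-" and the
# last whitespace-separated field is the full path.
listing = subprocess.run(["hdfs", "dfs", "-ls", SRC], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True)
files = [line.split()[-1] for line in listing.stdout.splitlines()
         if line.startswith("-")]

for path in files:
    name = path.rsplit("/", 1)[-1]
    # Equivalent to: hdfs dfs -cat <file> | bzip2 -c | hdfs dfs -put -f - <dest>.bz2
    pipeline = "hdfs dfs -cat {0} | bzip2 -c | hdfs dfs -put -f - {1}/{2}.bz2".format(
        path, DST, name)
    subprocess.run(pipeline, shell=True, check=True)
```

Running it in reverse (cat each .bz2 through bzip2 -d back into the original name) undoes it, but I'd still prefer something that runs on the cluster itself, which is really what I'm asking about.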
Labels:
- Apache Pig