About ravi1

ravi1 · ‎11-02-2015

Going through the code, I found a way to set the GzipCodec compression level. Right now, GzipCodec supports BEST_COMPRESSION, BEST_SPEED, NO_COMPRESSION and DEFAULT. Gzip itself supports compression levels 1-9, but GzipCodec can use only BEST_COMPRESSION (9), BEST_SPEED (1) and DEFAULT (6). You can set them by passing zlib.compress.level = BEST_SPEED or BEST_COMPRESSION. However, going by the numbers in our tests, a compression level of 4 seems to give the best compression per CPU time, and it is not possible to set level 4 right now. P.S. From HDP 2.4 onwards, you can use other compression levels such as 4. https://issues.apache.org/jira/browse/HADOOP-12794 has more details.
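To get a feel for the size versus CPU-time trade-off between levels, a rough measurement can be made outside Hadoop. The following is only a minimal sketch: it uses Python's zlib module (the same DEFLATE algorithm that gzip uses, with a different header) rather than GzipCodec, the function name compare_levels is made up for this illustration, and the sample input is synthetic, not the data from the tests mentioned above.

```python
import time
import zlib


def compare_levels(data, levels=(1, 4, 6, 9)):
    """Compress data at each zlib level; return {level: (compressed_size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: every level must round-trip to the original bytes.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical sample: repetitive log-like text, generated only for this sketch.
    sample = b"".join(
        b"2015-11-02 12:00:%02d INFO task_%04d finished in %d ms\n"
        % (i % 60, i, i * 7 % 1000)
        for i in range(20000)
    )
    for level, (size, secs) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes, {secs * 1000:.1f} ms")
```

Timings depend heavily on the data and the machine, so the best level for a given workload should be measured on representative input.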

ravi1 · ‎10-29-2015

We are not thinking only along the lines of errors and warnings from logs. It's more along the lines of things that are in the logs but not in metrics. Maybe it would be good to put them into metrics at some point. But until then, we would like to read things like timeouts, if there are any, fetcher times, etc.

ravi1 · ‎10-29-2015

Good pointer on TFile. We can read TFiles. I just loaded one in Pig using org.apache.tez.tools.TFileLoader, which is in tez-tools (built from source from git).

ravi1 · ‎10-28-2015

Interesting find. It is still not clear how we can pass this configuration for MapReduce output. Right now, we just specify the codec.

ravi1 · ‎10-28-2015

I went into /app-logs/<username>/ to get the logs, but I don't see how these files are stored. I tried fetching a file and identifying its format with 'file', but it just says 'data'. hdfs dfs -text also yields only garbled text. We are looking to run some Pig jobs on container logs to gain some insights.

ravi1 · ‎10-28-2015

Do we have a list of things that can be monitored from Hadoop logs (datanode/nodemanager/namenode/resourcemanager)? Right now, I am working on ingesting logs into Kibana to monitor errors, exceptions and application statistics. Are there any other things that we can get from these logs, or has someone already worked on these logs to gain some intelligence on the workings of the cluster?

ravi1 · ‎10-28-2015

Looking at the code, it looks like GzipCodec uses Deflater.DEFAULT_COMPRESSION. Is there a way to tweak the Gzip compression level for MapReduce output?

ravi1 · ‎10-07-2015

I looked at the Avro table and it has 23 files. Grouping min and max are at their defaults, so they are 16MB and 1GB. There were 56 blocks on 24 files and a total size of 300MB. It seemed to use 16MB blocks in grouping since the queue is empty. However, it ran longer with the smaller maps than when it ran with 50 mappers.
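The effect of the grouping min and max on the mapper count can be approximated with a little arithmetic. The sketch below is a simplified model, not Tez's actual code: it assumes the grouper starts from a desired number of splits (taken here as a plain input, since how Tez derives it from free queue capacity is not modelled) and then keeps the bytes per group between the min and max sizes. The function name and parameters are hypothetical.

```python
MB = 1024 * 1024


def estimate_grouped_splits(total_bytes, desired_splits,
                            min_group_bytes=16 * MB,
                            max_group_bytes=1024 * MB):
    """Estimate the grouped split count after clamping group size to [min, max]."""
    if total_bytes <= 0 or desired_splits <= 0:
        return 0
    bytes_per_group = total_bytes // desired_splits
    if bytes_per_group > max_group_bytes:
        # Groups would be too large: fall back to max-sized groups.
        return total_bytes // max_group_bytes + 1
    if bytes_per_group < min_group_bytes:
        # Groups would be too small: fall back to min-sized groups.
        return total_bytes // min_group_bytes + 1
    return desired_splits


if __name__ == "__main__":
    # 300MB of input with a large desired split count is clamped to 16MB groups.
    print(estimate_grouped_splits(300 * MB, desired_splits=367))
```

Under this model, raising the min grouping size is the knob that lowers the number of mappers when the cluster has a lot of free capacity.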

ravi1 · ‎10-05-2015

I am looking into a simple select count(*) query on a table backed by Avro. If we use MapReduce, I see around 50 mappers spawned for this. If we use Tez, I see 367 mappers being used. Overall query time increased with more mappers, from 55 secs to 105 secs. What factors determine the number of mappers? What is the best way to reduce the number of mappers in this case? Could it be related to the table being in Avro format?

ravi1 · ‎09-29-2015

Once you set up a one-way trust and the Windows workstation is in the domain, you don't need a separate Kerberos client to request tickets. You can refer to http://hortonworks.com/blog/enabling-kerberos-hdp-active-directory-integration/ (slightly older but should work).

Last Visited	‎12-18-2021 05:54 PM

Member Since	‎01-09-2019 05:01 PM
Posts	401
Kudos received	163

Cloudera Community

Re: 2 hosts not running master services

Re: ambari restart and service restart updating kr...

Re: How to automate sqoop incremental import using...

Re: Path to core-site.xml in sandbox?

Re: Curious to know why majority of the people are...

Re: Is there a way to change compression level in ...

Re: Hadoop Log Monitoring

Re: In which format are yarn container logs stored...

Re: Is there a way to change compression level in ...

In which format are yarn container logs stored in ...

Hadoop Log Monitoring

Is there a way to change compression level in Gzip...

Re: How are number of mappers determined for a que...

How are number of mappers determined for a query w...

Re: Kerberos client for Windows workstations?