Member since
01-09-2019
401
Posts
163
Kudos Received
80
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1963 | 06-21-2017 03:53 PM |
| | 2965 | 03-14-2017 01:24 PM |
| | 1909 | 01-25-2017 03:36 PM |
| | 3050 | 12-20-2016 06:19 PM |
| | 1484 | 12-14-2016 05:24 PM |
11-02-2015
04:24 AM
3 Kudos
Going through the code, I found a way to set the GzipCodec compression level. Right now, GzipCodec supports BEST_COMPRESSION, BEST_SPEED, NO_COMPRESSION and DEFAULT. Gzip itself supports compression levels 1-9, but GzipCodec can use only BEST_COMPRESSION (9), BEST_SPEED (1) and DEFAULT (6). You can set them by passing zlib.compress.level = BEST_SPEED or BEST_COMPRESSION. However, looking at the numbers in our tests, a compression level of 4 seems to give the best compression per unit of CPU time, and right now it is not possible to set level 4. P.S. From HDP 2.4 onwards, you can use other compression levels such as 4. https://issues.apache.org/jira/browse/HADOOP-12794 has more details.
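To get a feel for the size versus CPU trade-off between gzip levels, here is a minimal sketch using Python's standard gzip module rather than Hadoop itself. The NAMED_LEVELS mapping just mirrors the numbers quoted above, and the sample data and the compare_levels helper are made up for illustration; results on real job output will differ.

```python
import gzip
import time

# Level names and numbers as quoted in the post above (illustrative only).
NAMED_LEVELS = {"BEST_SPEED": 1, "DEFAULT": 6, "BEST_COMPRESSION": 9}


def compare_levels(data, levels=(1, 4, 6, 9)):
    """Return {level: (compressed_size_in_bytes, seconds)} for each gzip level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        # Sanity check: the payload must round-trip unchanged.
        assert gzip.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample; use your own data for real numbers.
    sample = b"2015-10-28 22:37:01 INFO container log line\n" * 50000
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Running this against a representative slice of your own output is probably the quickest way to check whether a middle level such as 4 is also the sweet spot for your data.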
10-29-2015
10:09 PM
We are not thinking only along the lines of errors and warnings from logs. It's more along the lines of things that are in logs but not in metrics. Maybe it would be good to put them into metrics at some point. But until then, we would like to read things like timeouts (if there are any), fetcher times, etc.
10-29-2015
12:16 AM
Good pointer on TFile. We can read TFiles. I just loaded one in Pig using org.apache.tez.tools.TFileLoader, which is in tez-tools (built from source from git).
10-28-2015
10:48 PM
Interesting find. It is still not clear how we can pass this configuration for MapReduce output. Right now, we just specify the codec.
10-28-2015
10:37 PM
I went into /app-logs/<username>/ to get the logs, but I don't see how these files are stored. I tried fetching a file and finding its format using 'file', but it just says 'data'. hdfs dfs -text also just yields garbled text. We are looking to run some Pig jobs on container logs to gain some insights.
Labels:
- Apache Pig
- Apache YARN
- HDFS
10-28-2015
07:54 PM
Do we have a list of things that can be monitored from Hadoop logs (datanode/nodemanager/namenode/resourcemanager)? Right now, I am working on ingesting logs into Kibana, and they monitor errors, exceptions and application statistics. Are there any other things that we can get from these logs, or has someone already worked on these logs to gain some intelligence on the working of the cluster?
Labels:
- Apache Hadoop
10-28-2015
04:39 PM
Looking at the code, it looks like GzipCodec uses Deflater.DEFAULT_COMPRESSION. Is there a way to tweak the Gzip compression level for MapReduce output?
Labels:
- MapReduce
10-07-2015
10:19 PM
I looked at the Avro table and it has 23 files. Grouping min and max are at their defaults, so they are 16MB and 1GB. There were 56 blocks on 24 files and a total size of 300MB. It seemed to use 16MB blocks in grouping since the queue is empty. However, the query ran longer with these smaller maps than when it ran with 50 mappers.
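For reference, here is a rough sketch of how I understand the min/max grouping sizes to bound the number of grouped splits. This is a simplified model, not the actual Tez code: grouped_split_count is a hypothetical helper, and the desired split count (which Tez derives from available capacity, so it is high when the queue is empty) is just an input here. Real grouping also considers locality and file boundaries, so actual counts will differ.

```python
MB = 1024 * 1024


def grouped_split_count(total_bytes, desired_splits,
                        min_size=16 * MB, max_size=1024 * MB):
    """Estimate the grouped split count after clamping the per-group size.

    Simplified model (assumption): if the per-group length implied by
    desired_splits falls outside [min_size, max_size], the count is
    recomputed from the violated bound.
    """
    if total_bytes <= 0 or desired_splits <= 0:
        raise ValueError("total_bytes and desired_splits must be positive")
    length_per_group = total_bytes // desired_splits
    if length_per_group > max_size:
        # Groups would be too large: cap each group at max_size.
        return total_bytes // max_size + 1
    if length_per_group < min_size:
        # Groups would be too small: grow each group to min_size.
        return total_bytes // min_size + 1
    return desired_splits


if __name__ == "__main__":
    # 300MB of input with many free slots: groups are pinned at the 16MB minimum.
    print(grouped_split_count(300 * MB, desired_splits=367))
```

Under this model, raising the minimum group size is the knob that reduces the number of mappers when the cluster has plenty of free capacity.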
10-05-2015
05:13 PM
4 Kudos
I am looking into a simple select count(*) query on a table backed by Avro. If we use MapReduce, I see around 50 mappers spawned for this. If we use Tez, I see 367 mappers being used. Overall query time increased with more mappers, from 55 seconds to 105 seconds. What factors determine the number of mappers? What is the best way to reduce the number of mappers in this case? Could it be related to the table being in Avro format?
Labels:
- Apache Hive
- Apache Tez
09-29-2015
02:59 PM
1 Kudo
Once you set up a one-way trust and the Windows workstation is in the domain, you don't need a separate Kerberos client requesting tickets. You can refer to http://hortonworks.com/blog/enabling-kerberos-hdp-active-directory-integration/ (slightly older, but it should work).