Member since 09-23-2015
800 Posts
898 Kudos Received
185 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 5432 | 08-12-2016 01:02 PM |
| 2205 | 08-08-2016 10:00 AM |
| 2613 | 08-03-2016 04:44 PM |
| 5523 | 08-03-2016 02:53 PM |
| 1430 | 08-01-2016 02:38 PM |
02-17-2016
07:23 PM
2 Kudos
I only understand half of the S3 problems, but you may need to upgrade if a custom URL is what you want. https://issues.apache.org/jira/browse/HADOOP-11261: "It also enables using a custom url pointing to an S3-compatible object store."
Fix Version/s: 2.7.0
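A minimal sketch of the upgrade check, assuming the JIRA's Fix Version (2.7.0) is the first release with custom-endpoint support and that the property it introduced is `fs.s3a.endpoint` (worth confirming in the core-default.xml of your release). The endpoint URL used in the test is a placeholder.

```python
import re
from typing import Dict, Tuple

# Fix Version/s of HADOOP-11261 as listed on the JIRA.
MIN_VERSION: Tuple[int, int, int] = (2, 7, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Read major.minor[.patch] from the front of a version string.

    Anything after the third number (vendor build suffixes, -SNAPSHOT, ...)
    is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Hadoop version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def supports_custom_s3a_endpoint(version: str) -> bool:
    """True if the release is at or above the JIRA's fix version."""
    return parse_version(version) >= MIN_VERSION


def s3a_endpoint_properties(version: str, endpoint: str) -> Dict[str, str]:
    """Return the core-site override for a custom S3A endpoint.

    Assumes the property name fs.s3a.endpoint; verify it against the
    core-default.xml of the release you run.
    """
    if not supports_custom_s3a_endpoint(version):
        minimum = ".".join(str(part) for part in MIN_VERSION)
        raise RuntimeError(f"Hadoop {version} is older than {minimum}; upgrade first")
    return {"fs.s3a.endpoint": endpoint}
```

`hadoop version` prints the string to feed in; vendor-suffixed strings such as `2.7.1.2.3.4.0-3485` are cut down to their first three numbers.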
02-17-2016
04:13 PM
4 Kudos
So you want to read and write files in HDFS? That is what WebHDFS is for: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
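A rough sketch of the REST calls using only the Python standard library, assuming a cluster without Kerberos (so the `user.name` query parameter is accepted) and placeholder host names. Port 50070 was the NameNode HTTP default in Hadoop 2.x (9870 in 3.x); use whatever your cluster is configured with.

```python
import json
from typing import List, Optional
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def webhdfs_url(host: str, port: int, path: str, op: str,
                user: Optional[str] = None, **params: str) -> str:
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&... as in the WebHDFS docs."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = {"op": op}
    if user is not None:
        # Simple (non-Kerberos) authentication: the caller names the user.
        query["user.name"] = user
    query.update(params)
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def list_directory(host: str, port: int, path: str, user: Optional[str] = None) -> List[str]:
    """LISTSTATUS returns JSON; keep only the entry names."""
    with urlopen(webhdfs_url(host, port, path, "LISTSTATUS", user)) as response:
        body = json.load(response)
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]


def read_file(host: str, port: int, path: str, user: Optional[str] = None) -> bytes:
    """OPEN redirects to a DataNode; urlopen follows the redirect for GET requests."""
    with urlopen(webhdfs_url(host, port, path, "OPEN", user)) as response:
        return response.read()


def write_file(host: str, port: int, path: str, data: bytes, user: Optional[str] = None) -> None:
    """CREATE is two steps: ask the NameNode where to write, then PUT the bytes there."""
    first = Request(webhdfs_url(host, port, path, "CREATE", user, overwrite="true"), method="PUT")
    try:
        # urllib does not follow a 307 for PUT, so the redirect surfaces as an HTTPError.
        urlopen(first).close()
        raise RuntimeError("expected a redirect to a DataNode")
    except HTTPError as err:
        if err.code != 307:
            raise
        location = err.headers["Location"]
    with urlopen(Request(location, data=data, method="PUT")) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected status {response.status}")
```

On a Kerberized cluster the `user.name` parameter is not enough; the requests would need SPNEGO authentication, which this sketch does not cover.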
02-17-2016
03:59 PM
1 Kudo
Multicluster mode in Ambari is perhaps one of the most requested features. However, it's a BIG implementation effort.
02-17-2016
03:30 PM
1 Kudo
Good question. The YARN documentation says: "Also, there are safe-guards to ensure that users cannot view and/or modify applications from other users." However, we have set up the administer settings in our cluster and I can still see all applications, so there must be some other setting that enables these "safeguards".
02-17-2016
03:23 PM
So you definitely can restrict control over a subset of queues. (We had problems getting it to work in a non-kerberized cluster, but in a kerberized cluster the ACLs work fine.) Let me see if I can find a way to restrict viewing applications as well. The relevant Capacity Scheduler properties are:

- `yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications`: The ACL which controls who can submit applications to the given queue. If the given user/group has the necessary ACLs on the given queue or one of the parent queues in the hierarchy, they can submit applications. ACLs for this property are inherited from the parent queue if not specified.
- `yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue`: The ACL which controls who can administer applications on the given queue. If the given user/group has the necessary ACLs on the given queue or one of the parent queues in the hierarchy, they can administer applications. ACLs for this property are inherited from the parent queue if not specified.
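To make the "given queue or one of the parent queues" rule concrete, here is a small model of the check. It assumes the usual Hadoop ACL string format (`user1,user2 group1,group2`, with `*` meaning everyone and a single space meaning nobody) and assumes that an unset ACL defaults to `*` on the root queue and to nobody on child queues. Those defaults are my recollection, not something stated in the post, so check them against the Capacity Scheduler docs for your version.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

PREFIX = "yarn.scheduler.capacity."


def parse_acl(acl: str) -> Optional[Tuple[Set[str], Set[str]]]:
    """Split 'user1,user2 group1,group2' into (users, groups).

    Returns None for the wildcard '*' (everyone). A lone space parses to two
    empty sets, i.e. nobody.
    """
    if acl.strip() == "*":
        return None
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return users, groups


def acl_allows(acl: str, user: str, groups: Iterable[str]) -> bool:
    parsed = parse_acl(acl)
    if parsed is None:
        return True
    users, acl_groups = parsed
    return user in users or not acl_groups.isdisjoint(groups)


def configured_acl(conf: Dict[str, str], queue_path: str, acl_name: str) -> str:
    # Assumed defaults when unset: root allows everyone, child queues allow nobody
    # (so in effect they inherit whatever their parents grant).
    default = "*" if queue_path == "root" else " "
    return conf.get(f"{PREFIX}{queue_path}.{acl_name}", default)


def has_queue_access(conf: Dict[str, str], queue_path: str, acl_name: str,
                     user: str, groups: Iterable[str]) -> bool:
    """Grant access if the queue or any parent queue's ACL lets the user in."""
    groups = list(groups)
    path = queue_path
    while True:
        if acl_allows(configured_acl(conf, path, acl_name), user, groups):
            return True
        if "." not in path:
            return False
        path = path.rsplit(".", 1)[0]  # step up to the parent queue
```

Under these assumptions, a restrictive ACL on `root.prod` does nothing while `root` is left at its default `*`, because the parent already admits everyone; the root ACLs have to be narrowed (for example to a single space) first. The sketch also ignores the cluster-wide `yarn.acl.enable` switch, which, as far as I know, must be true for any of these ACLs to be enforced.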
02-17-2016
01:33 PM
2 Kudos
Which version of Hadoop are you using? HADOOP-11261 (https://issues.apache.org/jira/browse/HADOOP-11261): "It also enables using a custom url pointing to an S3-compatible object store."
02-17-2016
12:59 PM
It still works on YARN. The official new name is mapreduce.job.reduces, but I have always used the old one (mapred.reduce.tasks) and it is still accepted.
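A toy model of why both names work: the deprecated key is simply mapped onto its replacement, so either spelling ends up in the same setting. Hadoop's own `Configuration` class keeps its deprecation table internally; the dictionary and the `normalize` helper here are hypothetical illustrations, and the precedence rule when both names are given is a choice made for this sketch only.

```python
from typing import Dict

# Old MRv1 property name -> current name. Only the pair from this thread is listed.
DEPRECATED_KEYS: Dict[str, str] = {"mapred.reduce.tasks": "mapreduce.job.reduces"}


def normalize(props: Dict[str, str]) -> Dict[str, str]:
    """Rewrite deprecated keys to their current names.

    If both spellings are present, the current name wins (a choice made for
    this sketch, not a statement about Hadoop's own precedence).
    """
    result: Dict[str, str] = {}
    for key, value in props.items():
        new_key = DEPRECATED_KEYS.get(key)
        if new_key is None:
            result[key] = value  # current or unrelated key: always wins
        else:
            result.setdefault(new_key, value)  # deprecated key: only fills a gap
    return result
```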
02-17-2016
12:57 PM
OK, just to repeat: you can access S3 through Hive with simple queries? Then it cannot be a connection problem, right? Perhaps too many parallel connections are timing out when all the mappers spin up. Do you see some tasks complete successfully and then other tasks fail after 3 retries? In that case it sounds like a timeout issue. I have seen similar issues on Google where people tried to fix them by increasing connection timeouts and retries, although mostly in Presto forums. However, there are S3 parameters available in the core-site configuration (https://hadoop.apache.org/docs/r2.6.3/hadoop-project-dist/hadoop-common/core-default.xml), for example fs.s3a.connection.timeout.
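A small sketch that renders timeout and retry overrides as core-site.xml entries. `fs.s3a.connection.timeout` comes from the post; `fs.s3a.attempts.maximum` is assumed to be the matching retry setting in the linked core-default.xml, and the values are illustrative rather than recommendations. The millisecond unit is how I remember the 2.x description; double-check it there.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def core_site_fragment(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> block."""
    configuration = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Illustrative values only; tune them for your cluster.
S3A_TIMEOUT_OVERRIDES = {
    "fs.s3a.connection.timeout": "60000",  # socket timeout, in milliseconds
    "fs.s3a.attempts.maximum": "20",       # retries on transient errors
}

if __name__ == "__main__":
    print(core_site_fragment(S3A_TIMEOUT_OVERRIDES))
```

The `<property>` elements would be merged into the existing core-site.xml (or the equivalent custom core-site section in Ambari) rather than replacing the file.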
02-17-2016
12:00 PM
2 Kudos
It's mapred.reduce.tasks. If you run a MapReduce program from the Hadoop client, you would set it like this: -Dmapred.reduce.tasks=x. Pig and Hive have their own ways of estimating the number of reducers.
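A sketch of where the option goes on a `hadoop jar` command line. The jar and class names in the test are hypothetical. The `-D` flag is a generic option handled by `GenericOptionsParser`, so it only takes effect when the job's driver is run through `ToolRunner`.

```python
from typing import List, Sequence


def hadoop_jar_command(jar: str, main_class: str, reducers: int,
                       job_args: Sequence[str]) -> List[str]:
    """Build argv for `hadoop jar` with an explicit reducer count.

    The -D generic option goes after the main class and before the job's own
    arguments. It is parsed by GenericOptionsParser, so it only takes effect if
    the driver is launched through ToolRunner.
    """
    if reducers < 0:
        raise ValueError("reducer count cannot be negative")
    return ["hadoop", "jar", jar, main_class, f"-Dmapred.reduce.tasks={reducers}", *job_args]
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`. A count of 0 is allowed on purpose: it produces a map-only job.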
02-17-2016
11:14 AM
3 Kudos
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html The edit log is the "transaction log" of HDFS: a transaction (create a file, delete it, ...) is committed once it has been persisted to the edit log. In the good old days the edit log was local to the NameNode and was merged into the FSImage by the Secondary NameNode. In HDFS HA the edit log is distributed across a set of JournalNodes (at least three). They still write an edit log, but now as a quorum (i.e. a change must be persisted by a majority of the JournalNodes). But really, the link explains it all very nicely.
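The majority arithmetic behind the quorum, as a simplified model: it only counts acknowledgements and ignores the epoch and fencing logic the real Quorum Journal Manager also uses.

```python
def quorum_size(journal_nodes: int) -> int:
    """Smallest majority of the JournalNodes."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """JournalNodes that can be down while writes still reach a majority."""
    return journal_nodes - quorum_size(journal_nodes)


def is_committed(acks: int, journal_nodes: int) -> bool:
    """An edit counts as committed once a majority has persisted it."""
    return acks >= quorum_size(journal_nodes)
```

With three JournalNodes a write needs two acknowledgements, so one node can fail. A fourth node raises the quorum to three without tolerating any extra failure, which is why the linked page recommends an odd number of JournalNodes.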