Created on 02-17-2016 04:38 AM - edited 09-16-2022 03:04 AM
Namespace1:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster1</value>
</property>
Namespace2:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster2</value>
</property>
Please advise.
Created 02-17-2016 06:59 AM
It is definitely possible to do that; however, I would not recommend it, especially in a production environment. JournalNode (JN) processes are lightweight daemons, so each cluster can have its own set placed on the same nodes as other master services. Using one quorum for multiple clusters increases the risk that a problem in one cluster affects the health and stability of all the attached clusters. For example, if Cluster A brings down your JN quorum (for whatever reason), the NameNodes of Cluster B can't synchronize their state and will eventually shut down because the quorum is not available:
2016-02-16 22:55:55,550 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [XXXXX:8485, XXXXXX:8485, xXXXX:8485], stream=QuorumOutputStream starting at txid 51260)) java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
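As a rough sketch of the alternative, each cluster could point at its own JournalNode quorum instead of a shared one. The hostnames below (jn-a*.example.com, jn-b*.example.com) are hypothetical placeholders, not hosts from the original question; the journal IDs mycluster1 and mycluster2 are taken from the question, and 8485 is the JN port used there.

```xml
<!-- hdfs-site.xml on the mycluster1 NameNodes (hostnames are hypothetical) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn-a1.example.com:8485;jn-a2.example.com:8485;jn-a3.example.com:8485/mycluster1</value>
</property>

<!-- hdfs-site.xml on the mycluster2 NameNodes (hostnames are hypothetical) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn-b1.example.com:8485;jn-b2.example.com:8485;jn-b3.example.com:8485/mycluster2</value>
</property>
```

With a layout like this, losing the first quorum should only affect the mycluster1 NameNodes, and the two sets of JournalNodes can be restarted or upgraded independently.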
Created 02-17-2016 06:42 PM
+1. Another consideration is upgrades. Sharing the same set of JournalNodes across multiple clusters would complicate upgrade plans, because a software upgrade on those JournalNodes potentially impacts every cluster they serve.