
Does Atlas automatically fail over to the secondary metastore?

Master Guru

I read the Atlas HA doc here

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-Trunk/bk_Ambari_Users_Guide/content/...

Does Atlas fail over to the secondary metastore? How many metastores are allowed?

1 ACCEPTED SOLUTION

Super Collaborator

Configuring clients to use the High Availability feature

The Atlas Web Service can be accessed in two ways:

  • Using the Atlas Web UI: This is a browser-based client that can be used to query the metadata stored in Atlas.
  • Using the Atlas REST API: As Atlas exposes a RESTful API, one can use any standard REST client, including client libraries in other applications. In fact, Atlas ships with a client class called AtlasClient that can be used as an example for building REST client access (a minimal request sketch follows this list).
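
As an illustration, the admin status REST endpoint can be queried with any HTTP client. Below is a minimal sketch using Python and the third-party requests library; the host, port, and credentials are placeholders for illustration, not values from this thread.

# Minimal sketch: query the Atlas admin status REST endpoint.
# Host, port, and credentials below are placeholders, not actual values.
import requests

ATLAS_BASE_URL = "http://atlas-host:21000"   # assumed Atlas server address
AUTH = ("admin", "admin")                    # assumed credentials

resp = requests.get(ATLAS_BASE_URL + "/api/atlas/admin/status", auth=AUTH, timeout=5)
print(resp.json())   # e.g. {"Status": "ACTIVE"} on the active instance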

To take advantage of the High Availability feature in clients, two options are possible.

Using HAProxy

Here is an example HAProxy configuration that can be used. Note that it is provided for illustration only, not as a recommended production configuration; for that, please refer to the HAProxy documentation.

frontend atlas_fe
  bind *:41000
  default_backend atlas_be

backend atlas_be
  mode http
  option httpchk GET /api/atlas/admin/status
  http-check expect string ACTIVE
  balance roundrobin
  server host1_21000 host1:21000 check
  server host2_21000 host2:21000 check backup

listen atlas
  bind localhost:42000

The above configuration binds HAProxy to listen on port 41000 for incoming client connections. It then routes the connections to either of the hosts host1 or host2 depending on an HTTP status check. The status check is done using an HTTP GET on the REST URL /api/atlas/admin/status, and is deemed successful only if the HTTP response contains the string ACTIVE.

Using automatic detection of the active instance

If one does not want to set up and manage a separate proxy, then the other option for using the High Availability feature is to build a client application that is capable of detecting status and retrying operations. In such a setting, the client application can be launched with the URLs of all Atlas Web Service instances that form the ensemble. The client should then call the REST URL /api/atlas/admin/status on each of these to determine which is the active instance. The response from the active instance would be of the form {Status:ACTIVE}. Also, when the client encounters any exceptions in the course of an operation, it should again determine which of the remaining URLs is active and retry the operation.
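
As a sketch of that detection-and-retry logic, here is a minimal Python example (again using the requests library). The base URLs, credentials, and the example REST path are assumptions for illustration, and error handling is deliberately simplified.

# Minimal sketch: detect the active Atlas instance in an ensemble and retry on failure.
# Base URLs and credentials are placeholders; error handling is simplified.
import requests

ATLAS_URLS = ["http://host1:21000", "http://host2:21000"]   # assumed ensemble members
AUTH = ("admin", "admin")                                   # assumed credentials

def find_active(urls):
    # Return the base URL whose admin status reports ACTIVE, or None if none respond.
    for base in urls:
        try:
            resp = requests.get(base + "/api/atlas/admin/status", auth=AUTH, timeout=5)
            if resp.ok and resp.json().get("Status") == "ACTIVE":
                return base
        except requests.RequestException:
            continue   # instance unreachable or not active; try the next one
    return None

def get_with_failover(path, urls=ATLAS_URLS, attempts=3):
    # Issue a GET against whichever instance is currently active, re-detecting on failure.
    for _ in range(attempts):
        active = find_active(urls)
        if active is None:
            continue
        try:
            return requests.get(active + path, auth=AUTH, timeout=10)
        except requests.RequestException:
            continue   # the active instance may have failed over; re-detect and retry
    raise RuntimeError("No active Atlas instance responded")

# Example usage (assumed v2 endpoint for listing type definitions):
# print(get_with_failover("/api/atlas/v2/types/typedefs").status_code)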

The AtlasClient class that ships with Atlas can be used as an example client library that implements the logic for working with an ensemble and selecting the correct active server instance.

Utilities in Atlas, like quick_start.py and import-hive.sh, can be configured to run with multiple server URLs. When launched in this mode, the AtlasClient automatically selects and works with the current active instance. If a proxy is set up in between, its address can be used when running quick_start.py or import-hive.sh.

