GET Health Check
Design discussion for new Health Check implementation.
Objectives
The goal of this design is to implement a new health check for Mojaloop switch services that reports a greater level of detail.
It features:
Clear HTTP Statuses (no need to inspect the response to know there are no issues)
Backwards compatibility with existing health checks - No longer a requirement. See this discussion.
Information about the version of the API, and how long it has been running
Information about sub-service (Kafka, logging sidecar and MySQL) connections
Request Format
/health
Uses the newly implemented health check. As discussed here, implementing the health check adds no connection overhead (e.g. pinging a database), so there is no need to complicate things with separate simple and detailed versions.
Response Codes:
200 - Success. The API is up and running, and is successfully connected to the necessary services.
502 - Bad Gateway. The API is up and running, but cannot connect to a necessary service (e.g. Kafka).
503 - Service Unavailable. This response is not implemented in this design, but will be the default if the API is not up and running.
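The mapping from overall health to HTTP status could be sketched as follows. This is a minimal illustration rather than the services' actual code: the function name `http_status_for` is hypothetical, and the strings `OK` and `DOWN` are the statusEnum values described later in this document.

```python
from http import HTTPStatus


def http_status_for(status: str) -> int:
    """Map an overall health status to the HTTP code returned by /health.

    503 is deliberately absent: it is what a client sees when the API
    itself is not running, so the API never produces it here.
    """
    if status == "OK":
        return HTTPStatus.OK.value  # 200
    if status == "DOWN":
        return HTTPStatus.BAD_GATEWAY.value  # 502
    raise ValueError(f"unknown status: {status!r}")
```

Because the status code alone carries the result, a caller such as a load balancer or orchestrator probe does not need to parse the response body.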
Response Format
| Name | Type | Description | Example |
|---|---|---|---|
|  |  | The status of the service. Options are OK and DOWN. |  |
|  |  | How long (in seconds) the service has been alive for. |  |
|  |  | When the service was started (UTC) |  |
|  |  | The current version of the service. |  |
|  |  | A list of services this service depends on, and their connection status | see below |
serviceHealth
| Name | Type | Description | Example |
|---|---|---|---|
|  |  | The sub-service name. See subServiceEnum. |  |
|  |  | The status of the service. Options are OK and DOWN. |  |
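To make the shape of the response concrete, here is a minimal sketch of how the payload could be assembled. The field names used here (`status`, `uptime`, `startTime`, `versionNumber`, `services`, `name`) are assumptions chosen for illustration, as are the ISO 8601 timestamp format, the placeholder version `0.0.0`, and the example dates; none of them is confirmed by the tables above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ServiceHealth:
    name: str    # a subServiceEnum value, e.g. "broker"
    status: str  # a statusEnum value: "OK" or "DOWN"


@dataclass
class HealthResponse:
    status: str         # overall statusEnum value
    uptime: float       # seconds the service has been alive
    startTime: str      # start time in UTC (hypothetical field name)
    versionNumber: str  # current service version (hypothetical field name)
    services: List[ServiceHealth] = field(default_factory=list)


def build_response(status, started_at, now, version, services):
    """Assemble the response body; uptime is derived from the start time."""
    return HealthResponse(
        status=status,
        uptime=(now - started_at).total_seconds(),
        startTime=started_at.isoformat(),
        versionNumber=version,
        services=list(services),
    )


# Arbitrary example values: a service that started one minute ago.
started = datetime(2019, 5, 31, 5, 9, 25, tzinfo=timezone.utc)
now = datetime(2019, 5, 31, 5, 10, 25, tzinfo=timezone.utc)
response = build_response(
    "OK", started, now, "0.0.0", [ServiceHealth("broker", "OK")]
)
print(json.dumps(asdict(response), indent=2))
```

Deriving `uptime` from the recorded start time keeps the two values consistent with each other in every response.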
subServiceEnum
The subServiceEnum enum describes the name of the sub-service.
Options:
datastore -> The database for this service (typically a MySQL database).
broker -> The message broker for this service (typically Kafka).
sidecar -> The logging sidecar sub-service this service attaches to.
cache -> The caching sub-service this service attaches to.
statusEnum
The statusEnum enum represents the status of the system or a sub-service.
It has two options:
OK -> The service or sub-service is healthy.
DOWN -> The service or sub-service is unhealthy.
When a service is OK, the API is considered healthy, and all sub-services are also considered healthy.
If any sub-service is DOWN, then the entire health check will fail, and the API will be considered DOWN.
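The aggregation rule above can be sketched in a few lines. The names `Status` and `overall_status` are hypothetical, and treating a service with no sub-services as OK is an assumption that the text does not address.

```python
from enum import Enum
from typing import Iterable


class Status(str, Enum):
    OK = "OK"
    DOWN = "DOWN"


def overall_status(sub_statuses: Iterable[Status]) -> Status:
    """Return DOWN if any sub-service is DOWN, otherwise OK.

    An empty collection yields OK (assumption: no dependencies, nothing to fail).
    """
    if all(s == Status.OK for s in sub_statuses):
        return Status.OK
    return Status.DOWN
```

For example, `overall_status([Status.OK, Status.DOWN])` evaluates to `Status.DOWN`, so a single failing dependency fails the whole check.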
Defining Sub-Service health
Simply pinging a sub-service is not enough to know whether it is healthy; we want to go one step further. The criteria differ for each sub-service.
datastore
For datastore, a status of OK means:
An existing connection to the database
The database is not empty (contains more than 1 table)
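The two datastore criteria could be checked along these lines. This sketch uses Python's built-in `sqlite3` as a stand-in so that it runs without a server; a MySQL-backed service would query its own catalog instead. The function name `datastore_status` is hypothetical, and the table-count threshold follows the wording above literally.

```python
import sqlite3


def datastore_status(conn: sqlite3.Connection) -> str:
    """Return "OK" only if the connection works and holds more than one table."""
    try:
        conn.execute("SELECT 1")  # criterion 1: the connection is usable
        (table_count,) = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
    except sqlite3.Error:
        return "DOWN"
    # criterion 2: "not empty", read literally as more than 1 table
    return "OK" if table_count > 1 else "DOWN"


conn = sqlite3.connect(":memory:")
empty_result = datastore_status(conn)      # connected, but no tables yet

conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
populated_result = datastore_status(conn)  # connected, two tables

conn.close()
closed_result = datastore_status(conn)     # connection no longer usable
```

Checking for tables as well as connectivity catches the case where the database is reachable but has not been migrated yet.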
broker
For broker, a status of OK means:
An existing connection to the Kafka broker
The necessary topics exist. This will change depending on which service the health check is running for.
For example, for the central-ledger service to be considered healthy, the following topics need to be found:
sidecar
For sidecar, a status of OK means:
An existing connection to the sidecar
cache
For cache, a status of OK means:
An existing connection to the cache
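One way to combine the per-sub-service criteria is a small runner that calls one check per sub-service and treats any exception as DOWN. Everything named here is hypothetical: `run_checks`, `broker_check`, the placeholder topic names `topic-a` and `topic-b`, and the simulated sidecar failure. The real checks would use each service's own client libraries and required topic list.

```python
from typing import Callable, Dict, Iterable, List


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[dict]:
    """Run each sub-service check and report OK or DOWN per sub-service.

    A check that raises is treated the same as one that returns False.
    """
    results = []
    for name, check in checks.items():
        try:
            healthy = bool(check())
        except Exception:
            healthy = False
        results.append({"name": name, "status": "OK" if healthy else "DOWN"})
    return results


def broker_check(existing_topics: Iterable[str], required_topics: Iterable[str]) -> bool:
    """Broker criterion: every required topic exists on the broker."""
    return set(required_topics) <= set(existing_topics)


def failing_sidecar_check() -> bool:
    raise ConnectionError("sidecar unreachable")  # simulated failure


services = run_checks({
    "broker": lambda: broker_check(["topic-a", "topic-b"], ["topic-a"]),
    "sidecar": failing_sidecar_check,
    "cache": lambda: True,  # stands in for "an existing connection to the cache"
})
```

Catching exceptions inside the runner means a failing dependency is reported as DOWN in the response rather than crashing the health endpoint itself.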
Swagger Definition
Note: These will be added to the existing Swagger definitions for the following services:
ml-api-adapter
central-ledger
central-settlement
central-event-processor
email-notifier
Example Requests and Responses:
Successful Legacy Health Check:
Successful New Health Check:
Failed Health Check, but API is up:
Failed Health Check:
Sequence Diagram
Sequence design diagram for the GET Health