GET Health Check
Design discussion for new Health Check implementation.
Objectives
The goal of this design is to implement a new health check for Mojaloop switch services that provides a greater level of detail.
It features:
Clear HTTP Statuses (no need to inspect the response to know there are no issues)
Backwards compatibility with existing health checks - No longer a requirement. See this discussion.
Information about the version of the API, and how long it has been running
Information about sub-service (Kafka, logging sidecar, and MySQL) connections
Request Format
/health
Uses the newly implemented health check. As discussed here, implementing the health check adds no connection overhead (e.g. pinging a database), so there is no need to complicate things with separate simple and detailed versions.
Response Codes:
200 - Success. The API is up and running, and is successfully connected to the necessary services.
502 - Bad Gateway. The API is up and running, but cannot connect to a necessary service (e.g. Kafka).
503 - Service Unavailable. This response is not implemented in this design, but will be the default if the API is not up and running.
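The mapping from health status to response code can be sketched as follows. This is a minimal TypeScript illustration rather than the services' actual implementation; the names `HealthStatus` and `httpCodeForHealth` are hypothetical.

```typescript
// Assumption: the two states mirror the statusEnum described in this design.
type HealthStatus = "OK" | "DOWN";

// Pick the HTTP status code for the /health response.
// 503 does not appear here: per the design it is not implemented by the
// health check and is only what a caller sees when the API is not running.
function httpCodeForHealth(overall: HealthStatus): 200 | 502 {
  return overall === "OK" ? 200 : 502;
}
```

A caller can then rely on the status code alone, without inspecting the response body.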
Response Format
| Name | Type | Description | Example |
| --- | --- | --- | --- |
| status | statusEnum | The status of the service. Options are OK and DOWN. See statusEnum below. | "OK" |
| uptime | number | How long (in seconds) the service has been alive for. | 123456 |
| started | string (ISO formatted date-time) | When the service was started (UTC) | "2019-05-31T05:09:25.409Z" |
| versionNumber | string (semver) | The current version of the service. | "5.2.5" |
| services | Array<serviceHealth> | A list of services this service depends on, and their connection status | see below |
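To make the response shape concrete, the sketch below expresses the fields listed above as TypeScript types and assembles a response body. Only the field names and their meanings come from this design; the type names (`HealthResponse`, `ServiceHealth`, `SubServiceName`) and the function `buildHealthBody` are assumptions made for this example.

```typescript
type HealthStatus = "OK" | "DOWN";
type SubServiceName = "datastore" | "broker" | "sidecar" | "cache";

// One entry of the `services` array (see serviceHealth below).
interface ServiceHealth {
  name: SubServiceName;
  status: HealthStatus;
}

interface HealthResponse {
  status: HealthStatus;
  uptime: number;        // seconds the service has been alive
  started: string;       // ISO formatted date-time, UTC
  versionNumber: string; // semver
  services: ServiceHealth[];
}

// Hypothetical helper: builds the body from values the service already holds.
// `now` is passed in explicitly so the result is deterministic.
function buildHealthBody(
  overall: HealthStatus,
  startedAt: Date,
  now: Date,
  versionNumber: string,
  services: ServiceHealth[]
): HealthResponse {
  return {
    status: overall,
    uptime: Math.floor((now.getTime() - startedAt.getTime()) / 1000),
    started: startedAt.toISOString(),
    versionNumber,
    services,
  };
}
```

In a running service, `startedAt` would be captured once at boot and `now` would be `new Date()` at request time.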
serviceHealth
| Name | Type | Description | Example |
| --- | --- | --- | --- |
| name | subServiceEnum | The sub-service name. See subServiceEnum below. | "broker" |
| status | enum | The status of the service. Options are OK and DOWN. | "OK" |
subServiceEnum
The subServiceEnum enum describes the name of the sub-service.
Options:
datastore -> The database for this service (typically a MySQL database).
broker -> The message broker for this service (typically Kafka).
sidecar -> The logging sidecar sub-service this service attaches to.
cache -> The caching sub-service this service attaches to.
statusEnum
The statusEnum enum represents the status of the system or a sub-service.
It has two options:
OK -> The service or sub-service is healthy.
DOWN -> The service or sub-service is unhealthy.
When a service is OK: the API is considered healthy, and all sub-services are also considered healthy.
If any sub-service is DOWN, then the entire health check will fail, and the API will be considered DOWN.
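The aggregation rule above (any sub-service DOWN makes the whole check DOWN) can be sketched like this. The function name `overallStatus` and the type names are hypothetical and are not taken from the Mojaloop codebase.

```typescript
type HealthStatus = "OK" | "DOWN";

interface ServiceHealth {
  name: string;
  status: HealthStatus;
}

// The service is OK only when every sub-service is OK.
// Note: with no sub-services at all, `every` returns true, so the result is OK.
function overallStatus(services: ServiceHealth[]): HealthStatus {
  return services.every((s) => s.status === "OK") ? "OK" : "DOWN";
}
```

For example, a healthy datastore combined with an unreachable broker yields DOWN for the service as a whole.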
Defining Sub-Service health
Simply pinging a sub-service is not enough to know whether it is healthy; we want to go one step further. The criteria differ for each sub-service.
datastore
For datastore, a status of OK means:
An existing connection to the database
The database is not empty (contains more than 1 table)
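The two datastore criteria can be expressed as a small pure function. This is a sketch under stated assumptions: `DatastoreProbe` is a hypothetical snapshot of facts the service has already gathered (how they are gathered is out of scope here), and the table threshold follows the wording "more than 1 table" literally.

```typescript
type HealthStatus = "OK" | "DOWN";

// Hypothetical snapshot of what the service knows about its database.
interface DatastoreProbe {
  connected: boolean; // an existing connection to the database is open
  tableCount: number; // number of tables visible on that connection
}

function datastoreStatus(probe: DatastoreProbe): HealthStatus {
  // Both criteria must hold: a live connection and a non-empty database.
  // The "> 1" threshold mirrors the criterion as written; adjust it if
  // "at least one table" is what is intended.
  return probe.connected && probe.tableCount > 1 ? "OK" : "DOWN";
}
```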
broker
For broker, a status of OK means:
An existing connection to the Kafka broker
The necessary topics exist. This will change depending on which service the health check is running for.
For example, for the central-ledger service to be considered healthy, the following topics need to be found:
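Independently of any particular service's topic list, the broker criteria can be sketched as a check of required topics against the topics the broker reports. The function name `brokerStatus` and its parameters are hypothetical, and the topic names in the usage note are placeholders, not the actual central-ledger topics.

```typescript
type HealthStatus = "OK" | "DOWN";

// `existingTopics` is assumed to be the list of topic names reported by the
// broker; `requiredTopics` is the per-service list of topics that must exist.
function brokerStatus(
  connected: boolean,
  existingTopics: string[],
  requiredTopics: string[]
): HealthStatus {
  if (!connected) {
    return "DOWN";
  }
  // Collect every required topic the broker does not know about.
  const missing = requiredTopics.filter((t) => existingTopics.indexOf(t) === -1);
  return missing.length === 0 ? "OK" : "DOWN";
}
```

With placeholder names, `brokerStatus(true, ["topic-a", "topic-b"], ["topic-a"])` is OK, while requiring a topic the broker does not list gives DOWN.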
sidecar
For sidecar, a status of OK means:
An existing connection to the sidecar
cache
For cache, a status of OK means:
An existing connection to the cache
Swagger Definition
Note: These will be added to the existing swagger definitions for the following services:
ml-api-adapter
central-ledger
central-settlement
central-event-processor
email-notifier
Example Requests and Responses:
Successful Legacy Health Check:
Successful New Health Check:
Failed Health Check, but API is up:
Failed Health Check:
Sequence Diagram
Sequence design diagram for the GET Health