HTTP-gRPC Status Codes guidelines

As we build more and more services every day, using a standard response definition becomes increasingly important. This doc tries to define standard HTTP/gRPC status codes to be used in our services.

Use the HTTP code for HTTP/1.1 services and the gRPC code for gRPC services.

Success

HTTP: 200 / gRPC: OK

This is the easiest and sometimes the most controversial case: all success cases should return HTTP: 200 / gRPC: OK.

One thing to note is that we should avoid returning 200/OK with a body that represents an error. We use the HTTP/gRPC status to signal whether the business case succeeded or failed, and 200/OK implies that the response contains a payload representing the outcome of the requested resource or operation. An error message is usually not such a representation.
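To make the difference concrete, here is a minimal sketch contrasting the two response shapes. The function names and the `(status, body)` tuple shape are hypothetical, chosen only for this illustration, and the default of 500 for errors is an assumption.

```python
from http import HTTPStatus


def respond_with_error_in_body(payload, error=None):
    # Anti-pattern: the status claims success even when `error` is set,
    # so clients must parse the body to detect a failure.
    if error is not None:
        return HTTPStatus.OK, {"error": error}
    return HTTPStatus.OK, payload


def respond_with_status(payload, error=None,
                        error_status=HTTPStatus.INTERNAL_SERVER_ERROR):
    # Preferred: 200 only ever carries the requested representation.
    if error is not None:
        return error_status, {"description": error}
    return HTTPStatus.OK, payload


def client_succeeded(status):
    # A client, proxy, or dashboard can decide from the status alone.
    return status == HTTPStatus.OK
```

With the first shape, `client_succeeded` reports success for a failed call; with the second, the status and the outcome always agree.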

Client Error

HTTP: 400 / gRPC: OUT_OF_RANGE, FAILED_PRECONDITION

All cases where the request does not make sense to the server belong here: invalid requests, invalid parameters, request parsing errors, etc.

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)

HTTP: 401 / gRPC: UNAUTHENTICATED

All authentication-related errors should use this. Please read AuthN & AuthZ to understand the difference between authentication and authorization.

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)

HTTP: 403 / gRPC: PERMISSION_DENIED

All authorization-related errors should use this. Please read AuthN & AuthZ to understand the difference between authentication and authorization.

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)
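The split between 401 and 403 can be sketched as two separate checks. This is an illustrative sketch only: `check_access` and its parameters are hypothetical, and a real service would verify credentials rather than test for `None`.

```python
from http import HTTPStatus
from typing import Optional


def check_access(user: Optional[str], permissions: set, required: str) -> HTTPStatus:
    if user is None:
        # No valid identity was established: authentication failed.
        return HTTPStatus.UNAUTHORIZED
    if required not in permissions:
        # The identity is known but lacks the permission: authorization failed.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```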

HTTP: 404 / gRPC: NOT_FOUND

When the resource is not found or the operation is not available, the service should return this.

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)

Server Error

HTTP: 500 / gRPC: UNKNOWN, INTERNAL, DATA_LOSS

When the request was valid but the server is unable to serve it because of any of the following:

  • Internal error in the server
  • Some unknown/unexpected scenario
  • Unrecoverable error returned by some dependent service

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)

HTTP: 503 / gRPC: UNAVAILABLE

When the request was valid but the server is unable to serve it at this moment, and the client can resolve this by retrying the request with a back-off.

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)

Please keep in mind that it may not be safe to retry operations that are NOT idempotent.
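A client-side retry with back-off might look like the sketch below. It assumes a hypothetical zero-argument `send` callable that returns an HTTP status, uses exponential back-off with jitter (one common choice, not something this guide mandates), and treats the HTTP methods defined as idempotent by the HTTP specification as the only ones safe to retry.

```python
import random
import time
from http import HTTPStatus

# Methods the HTTP specification defines as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def call_with_retry(send, method, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call send() and retry on 503 with exponential back-off and jitter.

    Non-idempotent methods (for example POST) are sent exactly once.
    """
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    status = send()
    for attempt in range(1, attempts):
        if status != HTTPStatus.SERVICE_UNAVAILABLE:
            break
        # The delay doubles on each retry: base, 2*base, 4*base, ... plus jitter.
        delay = base_delay * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay))
        status = send()
    return status
```

The `sleep` parameter exists only so that the waiting can be replaced in tests.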

HTTP: 504 / gRPC: DEADLINE_EXCEEDED

When the request was valid but the server is unable to serve it because of any of the following:

  • Internal timeouts
  • Unrecoverable timeout on some dependent service

The response should contain an explanation of what went wrong along with a trace ID. (see Reporting Errors)
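The pairings described in the sections above can be collected in a single lookup table. The following is a minimal sketch: `GRPC_TO_HTTP` and `http_status_for` are hypothetical names, gRPC codes are written as plain strings rather than taken from a gRPC library, and the fallback to 500 for unlisted codes is an assumption. INVALID_ARGUMENT is not among the codes this guide names; it is included here because gRPC's own status code documentation pairs it with HTTP 400.

```python
from http import HTTPStatus

# gRPC status name -> HTTP status, following the pairings in this guide.
GRPC_TO_HTTP = {
    "OK": HTTPStatus.OK,
    "INVALID_ARGUMENT": HTTPStatus.BAD_REQUEST,  # assumption, see lead-in
    "OUT_OF_RANGE": HTTPStatus.BAD_REQUEST,
    "FAILED_PRECONDITION": HTTPStatus.BAD_REQUEST,
    "UNAUTHENTICATED": HTTPStatus.UNAUTHORIZED,
    "PERMISSION_DENIED": HTTPStatus.FORBIDDEN,
    "NOT_FOUND": HTTPStatus.NOT_FOUND,
    "UNKNOWN": HTTPStatus.INTERNAL_SERVER_ERROR,
    "INTERNAL": HTTPStatus.INTERNAL_SERVER_ERROR,
    "DATA_LOSS": HTTPStatus.INTERNAL_SERVER_ERROR,
    "UNAVAILABLE": HTTPStatus.SERVICE_UNAVAILABLE,
    "DEADLINE_EXCEEDED": HTTPStatus.GATEWAY_TIMEOUT,
}


def http_status_for(grpc_code: str) -> HTTPStatus:
    # Assumption: codes this guide does not cover fall back to 500.
    return GRPC_TO_HTTP.get(grpc_code, HTTPStatus.INTERNAL_SERVER_ERROR)
```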

Reporting Errors

When an error happens, it is important for your service to let the client know that something went wrong. It's also important to provide them with some description of the issue.

Your service should

  • Provide a trace ID that the client can use to report errors back to you
  • Provide a small description of what went wrong

Your service should NOT

  • Return debug info like stack-traces, private data, etc.
  • Return dependent service errors as is to clients
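The two lists above could be implemented along the lines of the sketch below. The response shape (`status`, `trace_id`, `description`) and the `error_response` helper are assumptions made for illustration, not a standard format, and a real service would usually reuse the trace ID of the incoming request instead of generating a fresh one.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("service")


def error_response(status: int, description: str,
                   exc: Optional[BaseException] = None) -> dict:
    """Build a client-safe error body (hypothetical shape)."""
    trace_id = uuid.uuid4().hex
    # Full details, including any stack trace, stay in the server logs,
    # keyed by the same trace ID that the client receives.
    logger.error("trace_id=%s status=%s %s", trace_id, status, description,
                 exc_info=exc)
    # Only the short description and the trace ID leave the service.
    return {"status": status, "trace_id": trace_id, "description": description}
```

The client can quote the trace ID when reporting a problem, and the service owner can use it to find the detailed log entry.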

FAQ

My service calls multiple other services. What should I do if one or more of them return an error?

Clients will always treat your service as a black box, which means that clients should NOT be aware of the other services you end up calling. Any errors in them should be handled in your service or bubbled up to the client.

The best way to approach this is to think of your API as an action. If a number of other services are in the critical path for completing that action, then when any of these dependent services returns an error, you should return an error too. In the other case, where you call a number of different services but only one of them is required to perform the action, it is safe to return 200/OK when that critical service succeeds.
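The critical-path rule can be sketched as a small decision function. The names are hypothetical, dependency outcomes are reduced to a boolean, and returning 500 for every critical failure is a simplification, since the appropriate error code depends on what went wrong.

```python
from http import HTTPStatus


def overall_status(results: dict, critical: set) -> HTTPStatus:
    """Decide the API's status from the outcomes of its dependencies.

    `results` maps a dependency name to True (succeeded) or False (failed);
    `critical` names the dependencies the action cannot complete without.
    """
    failed_critical = [name for name in critical if not results.get(name, False)]
    if failed_critical:
        return HTTPStatus.INTERNAL_SERVER_ERROR
    # Failures in non-critical dependencies are handled inside the service.
    return HTTPStatus.OK
```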

My service does a search for items. Should it return 404 when the item is not found?

Search is a good example of a case where we can distinguish between a 404 and a 200.

Your API is a search API only when it can return multiple valid items for a query. For example, searching for a flight from London to Singapore on a particular date can result in multiple such flights, so your API response will have some type of list/array in it (e.g. see the Elasticsearch API). In this case it is completely valid to return a 200 with an empty list/array, since this is an expected outcome.

Your API is a get/fetch API when it always returns a single item, for example /flight/<flight-code>/. In this case it's perfectly valid to return a 404 with a not-found error when the <flight-code> is not found.
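The two kinds of API can be sketched side by side. The in-memory `FLIGHTS` table, the flight code, and both function names are made up for this illustration.

```python
from http import HTTPStatus

# Hypothetical data store with a single made-up flight.
FLIGHTS = {
    "AB123": {"origin": "London", "destination": "Singapore"},
}


def search_flights(origin: str, destination: str):
    # A search returns a list; an empty list is a valid, expected outcome.
    matches = [code for code, flight in FLIGHTS.items()
               if flight["origin"] == origin and flight["destination"] == destination]
    return HTTPStatus.OK, {"flights": matches}


def get_flight(code: str):
    # A get/fetch addresses exactly one resource, so a miss is a 404.
    flight = FLIGHTS.get(code)
    if flight is None:
        return HTTPStatus.NOT_FOUND, {"description": "flight not found"}
    return HTTPStatus.OK, flight
```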

One of the dependent services returns a 404. Can I forward it back to the client?

It makes sense for your service to return a 400 and NOT a 404 to clients when they sent you information that is wrong. As an example, suppose your service is a booking engine. Clients have to send a userID and a flightID in the request so that you can book that flight for them. You call a dependent service to resolve the flight from the flightID, and the dependent service returns a 404. This is a validation failure for you, and you can return a 400 to the client with the description “invalid flightID”.

Your service should return a 5XX to the client when you receive an error from a dependent service that the client is not aware of. For instance, in the booking example above, the lookups for userID and flightID return valid data but the OTA returns an error.
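The translation described in this answer can be sketched as follows. `translate_dependency_status` and its `client_supplied_input` flag are hypothetical, and collapsing every other dependency failure into a plain 500 is a simplification of "return 5XX".

```python
from http import HTTPStatus


def translate_dependency_status(dep_status: int,
                                client_supplied_input: bool) -> HTTPStatus:
    """Map a dependent service's status to the status returned to the client.

    `client_supplied_input` is True when the failing lookup used a value
    the client sent, such as flightID.
    """
    if dep_status < 400:
        return HTTPStatus.OK
    if dep_status == HTTPStatus.NOT_FOUND and client_supplied_input:
        # The client sent an ID that does not resolve: a validation failure.
        return HTTPStatus.BAD_REQUEST
    # Any other failure concerns a service the client knows nothing about.
    return HTTPStatus.INTERNAL_SERVER_ERROR
```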
