The need for asynchronous FaaS call chains in serverless systems

Time spent waiting is money wasted with serverless — and synchronous invocation of other functions means double billing

Functions as a Service (FaaS) provide massive gains in efficiency since users are no longer billed for idle time. FaaS is billed based on execution time and allocated resources — whether you use those resources or not. So time spent waiting is money wasted — and synchronous invocation of other functions means double billing.

Existing server-based techniques for reducing the resources expended on waiting don't work well for FaaS. Despite steps in the right direction, asynchronous call chains are not yet sufficiently supported by providers' platforms.

Update: Azure has an alpha preview available for “Durable Functions”, which do not charge during “await” calls. Read the blog post; it’s a very interesting and relevant feature.

Asynchronous call chaining is the missing piece

In a modern microservices architecture, RPC is usually the communication method of choice between services — whether over HTTP, gRPC, or others.

There are alternatives to RPC — and in the next blog we’ll explore natively event-based architectures that are better suited for the serverless future. For this post, I’ll focus on RPC.

In an RPC-centric architecture, if Service A requires some information from Service B to take some action, a synchronous RPC call is made from A to B to retrieve this information. While Service B is collecting the information to return, Service A must wait.

In this post, we'll discover that the tactics used to minimize the wastefulness of waiting on synchronous RPC calls in server-based architectures are ill-suited to serverless architectures. Since RPC is the most common use case, new functionality is needed to better support asynchronous RPC calls.

10 Million Idle Servers
With a server-based architecture, server utilization is a very important topic. A user pays for the time a server is running (per hour on AWS, or per minute on GCP) whether the CPU on that server is running at 100% or 0%.

The lower the utilization, the less value you're getting for your money. Yes, you need to allow headroom in a server, but to a first approximation, more utilization = more better.

It’s been estimated that there are 10 million servers out there sitting completely idle. During the time that a server for Service A is waiting for Service B to return, that server is in danger of idling.

Asynchronous IO
Enter asynchronous IO to save the day! Asynchronous IO is a feature of languages and libraries that allows the “waiting” sections of a program to consume very little resources.

Asynchronous IO allows the CPU to be utilized by other parts of the program, enabling more concurrent work to take place and increasing the overall utilization of the server. Note that the server still needs to be provisioned for the maximum utilization, and is billed at that fixed rate.
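As a minimal sketch of the idea using Python's asyncio (fetch_record and its contents are hypothetical stand-ins for real network IO):

import asyncio

async def fetch_record(record_id):
    # While this coroutine awaits the (simulated) IO, the event loop
    # is free to run other coroutines on the same CPU.
    await asyncio.sleep(0.1)  # stand-in for a real network call
    return {'id': record_id}

async def main():
    # Three IO-bound requests overlap instead of running back-to-back,
    # so the CPU isn't parked while each one waits.
    results = await asyncio.gather(*(fetch_record(i) for i in range(3)))
    print(results)

asyncio.run(main())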

Callbacks
There are multiple ways to implement asynchronous IO support — a common one is using callbacks. When you begin an asynchronous IO operation, you provide a function that should be run using the output of that operation.

For example, when a web page makes an HTTP request in the background to retrieve some data, the callback will take the HTTP response and update the page to display that data. In essence, a callback allows synchronous calls to be replaced by a sequence of asynchronous calls — without changing the overall information flow.
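In Python terms, the pattern looks something like this (all of the names here are illustrative, not a real API):

def simulated_http_get(url):
    # Stand-in for a real background network request.
    return {'url': url, 'body': 'some data'}

def fetch_data(url, callback):
    # Begin the operation, then hand its output to the
    # caller-supplied callback instead of returning it.
    response = simulated_http_get(url)
    callback(response)

def update_page(response):
    # The callback: take the HTTP response and update the display.
    print('rendering', response['body'])

fetch_data('https://example.com/data', update_page)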

Paying for Wait Time
An often-touted feature of Functions as a Service (FaaS) like AWS Lambda, Azure Functions, and a growing list of others is that you don’t pay for idle.

When your function is executing, you pay for some fine-grained interval — often 100 milliseconds. When your function isn’t executing, you don’t pay anything. This is great! The problem of those 10 million idle servers will go away forever!

Unfortunately, it’s not that simple. Let’s explore what happens with a single service that relies on asynchronous IO for efficiency within a serverless architecture. Each function gets its own container with a guaranteed amount of resources. You’re billed for only those resources and the amount of time you’re running. But remember — it doesn’t matter whether you use those resources or not.

While a server-based architecture can make more efficient use of resources with asynchronous IO, this isn't true of Functions as a Service. You don't pay for idle, but you do pay for wait. As I've discussed in prior posts, it's not that useful to use asynchronous code within a function. Functions should generally be single-threaded and single-task; otherwise, you're just using FaaS as a platform to build servers.

Compounding the Wait Time
This problem gets worse as we expand our view from a single service to a set of services interacting over RPC.

When a function makes a call to another service, you pay for the waiting time — even if the code is using async IO. As we discovered earlier, you’re also paying for the execution that’s happening in the called service — so you’re paying double during this time.

The problem is compounded if the called service itself makes calls to other services. At any given time, only one function is actually making use of its resources, but all of the functions waiting on it are being billed for their resources. We've essentially brought back the idle server problem.
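A back-of-the-envelope calculation makes the compounding concrete (all of the prices and durations below are made up for illustration):

# Cost of a synchronous chain A -> B -> C, where each caller blocks
# (and is billed) while its callee runs. Illustrative numbers only.
RATE = 0.0000166667   # $ per GB-second (placeholder price)
MEM_GB = 0.5          # memory allocated to each function

work = {'A': 0.1, 'B': 0.1, 'C': 1.0}   # seconds of real work each

billed = {
    'C': work['C'],                          # C: just its own work
    'B': work['B'] + work['C'],              # B: own work + waiting on C
    'A': work['A'] + work['B'] + work['C'],  # A: own work + all of B's wall clock
}

useful_cost = sum(work.values()) * MEM_GB * RATE
billed_cost = sum(billed.values()) * MEM_GB * RATE
print(f'${billed_cost:.7f} billed for ${useful_cost:.7f} of useful work')

Here the chain is billed for 3.3 seconds of function time to accomplish 1.2 seconds of useful work, and the ratio only gets worse as chains get deeper.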

Let’s take another look at the resource graph for a function using asynchronous IO with callbacks on a server:

The Best of Both Worlds
Instead of referring to functions in our code, what if the diagram referred to functions in our infrastructure — and the FaaS platform managed the invocation of our callbacks? If that was the case, then our server resource diagram becomes our infrastructure diagram:

Since each function is billed independently, there's no fixed provisioning. As a result, we get the benefits of FaaS without the paying-for-wait problem: the best of both worlds!

The Wrinkle
A callback-based serverless architecture would provide really good resource utilization. Unfortunately, it’s not possible on today’s FaaS providers.

A callback is a cooperative effort between caller and callee. The caller provides the callback function, and the callee agrees to execute it when the work the callee promises to carry out is finished. As we’ll discover in a moment, the latter point is an important wrinkle.

First, it’s crucial that the caller provide the callback function. This allows multiple callers to leverage the same callee. Azure Functions provides bindings, a way of adding a downstream receiver for data generated by a function — but the binding lives with the function, rather than being dynamically provided by the caller. This means that the results of a function can’t be reused in two separate, isolated data flows.
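For illustration, here's roughly the shape of an Azure Functions output binding (rendered as a Python dict mirroring the function.json format; the queue name and connection setting are placeholders). The key point is that the downstream destination is declared in the callee's own configuration, fixed at deploy time, rather than supplied by the caller:

binding_config = {
    'bindings': [
        {
            'type': 'queue',
            'direction': 'out',
            'name': 'output',
            'queueName': 'downstream-queue',      # fixed when the function is deployed
            'connection': 'AzureWebJobsStorage',  # placeholder connection setting
        },
    ],
}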

Second, it is important that the callback only be executed once the contract provided by the callee is finished. For a simple function — for example, a quick database lookup — this occurs when the function returns. But what if the callee itself also calls some third function with a callback?

For instance, let’s say that the database lookup is a long-running operation that exceeds the maximum timeout allowed by the FaaS platform (this is a great use case for callbacks, by the way).

Let’s take some pseudo-Python code for two FaaS functions …

def my_func_handler(event, context):
    input = get_inputs(event)
    # Synchronous invocation: we block, and are billed, while
    # get_from_db runs.
    output = faas.invoke_sync('get_from_db', input)
    process_output(output)

def get_from_db_handler(event, context):
    input = get_inputs(event)
    response = db.get(input)
    return format(response)

… and turn it into an asynchronous version:

def my_func_handler(event, context):
    input = get_inputs(event)
    # Fire and forget: ask the platform to run get_from_db and deliver
    # its result to my_func_callback. No billed waiting here.
    faas.invoke_async('get_from_db', input,
                      callback='my_func_callback')

def my_func_callback_handler(event, context):
    process_output(event)

def get_from_db_handler(event, context):
    input = get_inputs(event)
    # The database call is itself callback-based, so this handler
    # returns before the lookup has actually completed.
    db.get(input, callback='db_callback')

def db_callback_handler(event, context):
    return format(event)

We want the execution order to be my_func, get_from_db, db_callback, my_func_callback. But if the FaaS platform executes callbacks as soon as the immediately-called function returns, the execution order will be my_func, get_from_db, and then, independently, my_func_callback invoked with the (empty) result from get_from_db, and db_callback.

One possible remedy is for the context object to allow a function to transfer ownership of the callback:

def get_from_db_handler(event, context):
    input = get_inputs(event)
    # Transfer ownership of our caller's callback to the db.get
    # operation: my_func_callback should only fire once db_callback
    # has run.
    context.dispatch('db.get', input, callback='db_callback')

It’s possible to implement some of this in the function code itself:

def my_func_handler(event, context):
    input = get_inputs(event)
    # Thread the callback name through the payload by hand.
    input['callback'] = 'my_func_callback'
    faas.invoke_async('get_from_db', input)

def my_func_callback_handler(event, context):
    process_output(event)

def get_from_db_handler(event, context):
    input = get_inputs(event)
    response = db.get(input)  # this is no longer callback-based
    # Manually invoke whatever callback the caller asked for.
    faas.invoke_async(event['callback'], response)

When you start piling error handling, retries, dead letter queues, etc. on top of this, it very quickly becomes complicated, and it's all undifferentiated heavy lifting. This method also can't integrate with other services in the platform, like the database call above.

An alternative approach is to leverage a coordination service, for example Azure Logic Apps or AWS Step Functions. These services provide the ability to establish a flow — “call this function, and route its output to this other function”.

While I think this avenue has a lot of potential, it's not tuned for this use case, and is therefore too expensive. In addition, if these flows can't be created ad hoc at invocation time, they would need integration with Service Discovery as a Service to allow the called functions to change over time.
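For a flavor of what such a flow looks like, here is a minimal sketch of an AWS Step Functions state machine, written as a Python dict in the Amazon States Language (the ARNs are placeholders):

# 'Call this function, and route its output to this other function.'
state_machine = {
    'StartAt': 'GetFromDb',
    'States': {
        'GetFromDb': {
            'Type': 'Task',
            'Resource': 'arn:aws:lambda:us-east-1:123456789012:function:get_from_db',
            'Next': 'ProcessOutput',
        },
        'ProcessOutput': {
            'Type': 'Task',
            'Resource': 'arn:aws:lambda:us-east-1:123456789012:function:my_func_callback',
            'End': True,
        },
    },
}

Note that the chain is fixed when the state machine is defined; a caller can't substitute a different ProcessOutput step per invocation, which is exactly the ad hoc limitation described above.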

Summing This All Up
RPC-based architectures generally rely on turning synchronous RPC calls into sequences of asynchronous function invocations for efficiency. The move from server-based to serverless is going to require “sequence of asynchronous function invocations” to be a construct expressible in infrastructure — but FaaS providers’ existing platforms don’t have sufficient support for this yet.

In the next post, we'll discuss how non-RPC-based architectures (i.e., those that are event-based) are a more natural fit for serverless, but still lack the critical features needed for production-ready systems.
