Unidirectional data flow architectures in JS

A new breed of JavaScript framework is emerging that emphasizes unidirectional data flow and reactive programming. These architectures, like Flux, its derivatives, and my favorite, re-frame, are billed as ways to escape the so-called "callback hell" of the event-driven async programming model by simplifying and explicitly describing how state changes propagate through the application. Of course, the ideas these architectures use aren't exactly new -- they've been used in GUI development for a long time. But they are starting to be rediscovered in the single-page application (SPA) design sphere, which is still a fledgling field (relatively speaking).

I have fallen in love with unidirectional data flow (UDF) as an SPA design pattern. But at the same time it's clear that these architectures are so simple that using a third-party framework is usually overkill. Even Flux does not come with a framework, only example code (although it has been implemented as a framework by many other people). To construct our own UDF applications without a framework, we need to figure out what characterizes a UDF architecture. Toward that end I have been thinking about the following "attributes" or characteristics of a unidirectional data flow architecture.

These are not meant to be prescriptive or comprehensive. They are just my attempt at describing the current state of UDF architectures in SPA development, based on my own observations. But I do think that each attribute has benefits that it brings to the table, and all four of them work together to make an application simpler and more stable. Combined, they lay a flexible and powerful groundwork that your application can build on.

  1. Centralized data store contains all state
  2. Events are managed by a dispatcher
  3. Handlers update state (or raise events that do)
  4. The UI is a function of the application state

Centralized data store contains all state

A major paradigm shift in these architectures is using a central store for all application state. In Flux, the Store components together contain all the application's state. In re-frame there is an app-db hash table that contains everything. The important thing is that the application state is centralized. Once all application state is collected into a single "in-memory database", the rest of the application can be stateless.

Compare this with an object-oriented event-based architecture, where components manage their individual state and hide it from others. This means that each component becomes responsible for watching its own internal state and triggering state change updates if it's modified. Also, since each component is stateful, we have to be very careful about how we handle updating or recreating component objects, and we can't reason about state updates in general because there is no single thing that corresponds to a state update which can be observed or logged or analyzed.

If all application state is in one place, we have a lot of power. We can easily implement undo and redo by keeping a history of state updates, which is much harder if components have internal private state. We can easily know when to redraw the UI by watching for state updates. We can easily implement saving/loading of the state, which, for example, lets us resume user sessions at exactly the same spot when they revisit the page. We can also avoid situations where shared state is stored in two components, unnecessarily wasting space and causing potential synchronization issues, or where one component wants to read the "private" state of another component. There's no privacy among friends, and we're all friends here, right?
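To make the undo/redo and save/load claims concrete, here is a minimal sketch of a central store that keeps a history of whole-state snapshots. The names (createStore, update, undo, redo, save, load) are invented for this illustration and do not come from Flux or re-frame, and the sketch assumes state is replaced rather than mutated and is JSON-serializable.

```javascript
// Hypothetical central store; every name here is illustrative only.
function createStore(initialState) {
  let past = [];            // earlier states, oldest first
  let present = initialState;
  let future = [];          // undone states, available for redo
  const listeners = [];

  function notify() {
    listeners.forEach((listener) => listener(present));
  }

  return {
    getState: () => present,
    subscribe: (listener) => { listeners.push(listener); },
    // updater receives the current state and returns a new one (no mutation).
    update(updater) {
      past.push(present);
      present = updater(present);
      future = [];            // a fresh update invalidates the redo stack
      notify();
    },
    undo() {
      if (past.length === 0) return;
      future.unshift(present);
      present = past.pop();
      notify();
    },
    redo() {
      if (future.length === 0) return;
      past.push(present);
      present = future.shift();
      notify();
    },
    // Saving and loading the whole state is a JSON round trip.
    save: () => JSON.stringify(present),
    load(json) {
      past = [];
      future = [];
      present = JSON.parse(json);
      notify();
    },
  };
}

const store = createStore({ todos: [], selectedId: null });
store.update((s) => ({ ...s, todos: [...s.todos, { id: 1, text: "write post" }] }));
store.update((s) => ({ ...s, selectedId: 1 }));
store.undo(); // the selection is undone, the todo is still there
```

Keeping full snapshots is the simplest thing that works; a larger application would probably want structural sharing or a capped history instead.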

It's important to note that we are not just talking about long-term state. Transient, interface-related state (like which todos are selected for editing, or what prompts or error messages should be visible) is also stored in the same global structure. This is the type of state that tends to be privately held by the component diaspora in an object-oriented architecture.

Events are managed by a dispatcher

Events -- like a user interaction or the completion of an asynchronous process -- are not allowed to propagate willy-nilly. This prevents several kinds of problems: state is never accidentally captured in callback closures, callbacks are never nested, and business logic stays centralized. Although we can't get away from attaching event listeners or promise callbacks, because we are stuck with the JavaScript browser API, we can make them as simple as possible. Event listeners and callbacks contain no logic and do not accidentally enclose any state. All they do is immediately hand off their event to a centralized event handling switchboard, which Flux and re-frame both call a dispatcher. When the user hits Ctrl+S, the listener doesn't validate the current state, or make an AJAX call, or attach another callback handler to update the UI when the save is complete. All it does is inform the dispatcher, "Hey: I'm raising a 'save' event". Events passed to the dispatcher usually have an event type and possibly an event value, which the listener's closure may deliberately capture.

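// all event listeners look like these
// (dispatch is the application's own dispatcher function, named so that it
// does not clash with the browser's built-in window.dispatchEvent)
function onSave(e) { dispatch({ type: "save" }); }
function onSelect(e) { dispatch({ type: "selection", value: selectedThing }); }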

The dispatcher is responsible for deciding what to do with incoming events, but usually it farms out this decision by dispatching the event to registered handlers. In Flux, the dispatcher informs the Stores about the event, expecting them to handle it by updating their state. In re-frame, the dispatcher calls a handler function instead, which is able to update the application state directly. Of course, the dispatcher is completely stateless (in terms of application state -- it may have a dispatch table or something similar, but we do not expect it to be observable).
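A dispatcher of this kind can be very small. The following is a rough sketch, not the Flux or re-frame implementation: createDispatcher, register, and dispatch are names made up for this example, and the only state it holds is the dispatch table itself.

```javascript
// Hypothetical dispatcher; names are illustrative, not a real library's API.
function createDispatcher() {
  const table = new Map(); // dispatch table: event type -> array of handlers

  return {
    register(type, handler) {
      if (!table.has(type)) table.set(type, []);
      table.get(type).push(handler);
    },
    dispatch(event) {
      const handlers = table.get(event.type) || [];
      if (handlers.length === 0) {
        console.warn("No handler registered for event type:", event.type);
      }
      // The handler receives nothing but the event object itself.
      handlers.forEach((handler) => handler(event));
    },
  };
}

const dispatcher = createDispatcher();
const seen = [];
dispatcher.register("selection", (event) => seen.push(event.value));
dispatcher.dispatch({ type: "selection", value: "todo-42" });
```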

There are surprising benefits to using a layer of indirection (the dispatcher) over directly attaching the handlers as event listeners. The handler is not directly defined or called within the context of the event listener attachment, so the only information it receives is the event object itself. This also means that callbacks cannot be nested since each callback will just raise an event whose handler is defined elsewhere, which keeps code "flat". The dispatcher itself gives us an opportunity to add logging or state tracking or throttling or validation over all events. Finally, handlers are completely separate from the DOM, so events cannot (easily) sidestep the architecture by directly manipulating the UI.
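Because every event passes through one function, cross-cutting concerns can be added by wrapping that function. Here is a small sketch of event logging done this way; withEventLog and the other names are hypothetical, and throttling or validation could be layered on in the same manner.

```javascript
// Hypothetical wrapper: records every event before passing it to dispatchFn.
function withEventLog(dispatchFn, log) {
  return function wrapped(event) {
    log.push({ type: event.type, at: Date.now() });
    return dispatchFn(event);
  };
}

const eventLog = [];
const handledTypes = [];

// Stand-in for a real dispatcher's dispatch function.
const rawDispatch = (event) => { handledTypes.push(event.type); };
const loggedDispatch = withEventLog(rawDispatch, eventLog);

loggedDispatch({ type: "save" });
loggedDispatch({ type: "selection", value: 3 });
```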

Handlers update state (or raise events that do)

Once the dispatcher hands off an event to a handler, there are only two things that a handler can do. It can raise another event (or begin an asynchronous action that raises an event when it completes), or it can update the application state. Of course it could also do other things along the way; handlers can and should contain most of your application's business logic. But in the end, if they don't update the application state or raise another event that eventually will result in a state update, the handler might as well be a no-op.
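The two kinds of handler can be sketched as follows. Everything here is an assumption made for illustration: raise plays the role of the dispatcher, and fakeSaveRequest stands in for a real AJAX call by queueing its reply so the example runs without a network.

```javascript
// All names are hypothetical.
let appState = { draft: "hello", saving: false, lastSavedAt: null };
const stateTrace = [];
const pendingReplies = [];
const eventHandlers = new Map();

function setState(next) {
  appState = next;
  stateTrace.push(next); // every state update is observable in one place
}

function raise(event) {
  const handler = eventHandlers.get(event.type);
  if (handler) handler(event);
}

function fakeSaveRequest(draft, onDone) {
  pendingReplies.push(() => onDone({ savedAt: 1234 }));
}

// Updates state, then begins async work that raises an event when it completes.
eventHandlers.set("save", () => {
  setState({ ...appState, saving: true });
  fakeSaveRequest(appState.draft, (reply) =>
    raise({ type: "save-complete", value: reply.savedAt })
  );
});

// Only updates state.
eventHandlers.set("save-complete", (event) => {
  setState({ ...appState, saving: false, lastSavedAt: event.value });
});

raise({ type: "save" });
pendingReplies.forEach((reply) => reply()); // simulate the server answering later
```

Note that the completion callback contains no logic of its own; it only raises another event, so the code stays flat.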

If the handler doesn't seem like it updates the state or raises an event, but it still affects the application, be careful: you may have stumbled on some hidden state that should be dealt with. If possible, exhume the state and inter it in the centralized store.

On a related note, handlers should be the only things that update your application state. When you think about it, there is no other place for it to really happen, as long as the views don't trigger state updates, and you don't use two-way data binding or event listeners that are not handled by your dispatcher. The fact that the handlers are the only things that update your state is what makes the application's data flow "unidirectional". Data flows from the state, to the views, to the dispatcher, to the handler, and back to the state. If you lose this property, it becomes much harder to think about your application. Repeat after me: only handlers should modify application state.

The UI is a function of the application state

The application UI should depend on the centralized application state -- for example in Flux the views can depend on one or more Stores. The UI cannot depend on anything else. This rule makes it simple to know when you should re-render the view: do it every time the centralized state is changed (by a handler, hopefully). Of course, we don't always want to actually re-render the whole UI whenever the state changes. Since browser reflows are expensive, we don't want to trigger them by updating DOM elements that don't need to be changed. Usually, only a tiny part of the UI needs to actually be changed when the state is updated.
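In its most naive form, "the UI is a function of the state" is literally a pure function that is called again after every state change. The sketch below returns a markup string so that it runs without a DOM; renderTodoList and setViewState are hypothetical names, and there is no HTML escaping, so treat it as an illustration only.

```javascript
// Hypothetical view: a pure function from state to markup.
function renderTodoList(state) {
  const items = state.todos.map((todo) => {
    const marker = todo.id === state.selectedId ? "*" : " ";
    return `<li>${marker} ${todo.text}</li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}

let viewState = { todos: [{ id: 1, text: "write post" }], selectedId: null };
let currentMarkup = renderTodoList(viewState);

// Stand-in for "a handler updated the state, so re-render everything".
function setViewState(next) {
  viewState = next;
  currentMarkup = renderTodoList(viewState);
}

setViewState({ ...viewState, selectedId: 1 });
```

Re-rendering everything like this is correct but wasteful, which is exactly the problem the strategies that follow try to solve.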

There are different ways to handle this issue. A reactive programming strategy would be to model the central state as a group of streams (a.k.a. observables), so each component can listen to only the streams it cares about, and redraw itself when a new value comes from upstream. This is conceptually similar to the approach taken by Facebook's combination of Relay and GraphQL.
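The stream idea can be sketched without any reactive library. In this rough example createStream is a made-up minimal observable (real libraries offer far more), and the central state is just an object holding one stream per slice, so a component is only notified about the slices it subscribed to.

```javascript
// Hypothetical minimal observable value; not a real library's API.
function createStream(initial) {
  let value = initial;
  const subscribers = new Set();
  return {
    get: () => value,
    set(next) {
      if (next === value) return; // nothing changed, so nobody redraws
      value = next;
      subscribers.forEach((fn) => fn(value));
    },
    subscribe(fn) {
      subscribers.add(fn);
      return () => subscribers.delete(fn); // call to unsubscribe
    },
  };
}

// Central state modeled as a group of streams.
const centralState = {
  todos: createStream([]),
  errorMessage: createStream(null),
};

const redraws = [];
centralState.todos.subscribe(() => redraws.push("todo-list"));
centralState.errorMessage.subscribe(() => redraws.push("error-banner"));

// Only the component listening to errorMessage is asked to redraw.
centralState.errorMessage.set("Save failed");
```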

Another powerful solution in vogue is the use of a virtual DOM. Instead of directly modifying the DOM, your views return a "virtual DOM": a plain data structure that represents the DOM but can be created and updated without triggering UI reflows. The virtual DOM your view returns is "diffed" with the previous virtual DOM to determine what has changed. Then those changes are "patched" to the actual DOM in a way that minimizes the reflow cost, for example, only changing the text inside a single cell of a table instead of removing and re-creating the entire table. The virtual DOM serves as an intermediary layer which allows the application to freely trigger a re-render of the entire application on each state change, safe in the knowledge that the actual expensive DOM manipulation is minimized even if you are re-creating the virtual DOM tree each time.
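The table-cell example can be worked through with a deliberately naive diff. This is only a sketch of the idea, not how React or any other library does it: the node shape, h, and diff are all assumptions for this example, there are no attributes or keys, and the patches are merely computed, not applied to a real DOM.

```javascript
// Hypothetical virtual node: a string (text) or { tag, children }.
function h(tag, ...children) {
  return { tag, children };
}

// Returns the patches needed to turn oldNode into newNode.
// path is the list of child indexes leading from the root to the node.
function diff(oldNode, newNode, path = []) {
  if (oldNode === undefined) return [{ op: "create", path, node: newNode }];
  if (newNode === undefined) return [{ op: "remove", path }];
  if (typeof oldNode === "string" || typeof newNode === "string") {
    return oldNode === newNode ? [] : [{ op: "replace", path, node: newNode }];
  }
  if (oldNode.tag !== newNode.tag) return [{ op: "replace", path, node: newNode }];

  const patches = [];
  const length = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < length; i++) {
    patches.push(...diff(oldNode.children[i], newNode.children[i], [...path, i]));
  }
  return patches;
}

// The view rebuilds the whole virtual table from state every time.
const tableView = (state) =>
  h("table", ...state.rows.map((row) =>
    h("tr", h("td", row.name), h("td", String(row.count)))));

const before = tableView({ rows: [{ name: "a", count: 1 }, { name: "b", count: 2 }] });
const after = tableView({ rows: [{ name: "a", count: 1 }, { name: "b", count: 3 }] });

// A single patch: replace the text at row 1, cell 1, child 0 with "3".
const patches = diff(before, after);
```

Even though the whole virtual table was rebuilt, only one text node would have to be touched in the real DOM.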

Libraries

UI rendering is probably the only place where using a library is almost a necessity -- if you don't use one you will end up writing one yourself. Flux recommends React. Re-frame uses Reagent, which is a ClojureScript wrapper for React. Relay and GraphQL are built to be used with React. But React is not the only option. Other virtual DOM providers include the unassuming virtual-dom and the lightning-fast Mithril. Even Ember is working on a virtual DOM rendering engine called Glimmer in order to take advantage of the dramatic performance improvements that a virtual DOM can provide. The library du jour changes over time, but it's not that important to pick the latest and hippest. It's more important that the view layer supports the UDF paradigm, which means using one-way data bindings only, and that it performs fast enough that the user experience isn't disturbed, whether through a virtual DOM or some other performance-enhancing technique.

Next episode: the ultralightweight antiframework

Things like Flux and re-frame are designed for large applications. But I am interested in getting all the benefits of UDF even for relatively simple applications where a Flux-style architecture, whether from a third-party framework or not, would be plain overkill. My next post in this series will discuss an ultra-lightweight application design pattern that suits exactly those applications. We will arrive at the design pattern by looking very skeptically at every component in a Flux-style architecture. What is unnecessary for a not-Facebook-scale SPA? What components can be simplified, and how? In the end we will find that with a little forethought, a UDF architecture can be baked into your SPA with hardly any effort and with practically no "framework-style" glue code at all.