How floro works (part 2)

(back to part 1)

Floro wouldn't be much of a version control system if it didn't have diffs. But before explaining how diffs work in floro, it's best to take a step back and do a quick overview of diffing in general.

Diffing is the process of describing how one sequence transforms into a different sequence. Typically diffing involves finding the most concise way to transform between the two sequences. For sake of consistency, we will be discussing the two states of a diff as the "before" state and the "after" state. 

The algorithm used for diffing is the Longest Common Subsequence or LCS algorithm. There's a plethora of YouTube videos giving explanations of how to find the LCS for two sequences (we liked this one)
. However, to help build intuition for the algorithm, we've provided you an LCS string calculator that delimits on characters. Feel free to type around and see how the LCS changes, when you change either of the strings.

string 1
string 2

String 1: ACBAACD

String 2: ADBDADC


The key thing to know about LCS, is that the opposite of the LCS is the diff, which makes sense when you think about it for a moment. The LCS tells us what is the same between two sequences, by comparing the LCS to the before sequence and after sequence, we can easily determine what was removed from the before sequence and what was added to the after sequence.

The diff calculator below demonstrates this process.

before string
after string

Before: This is the string before

After: And this is the string after



T ->@ idx: 0

b ->@ idx: 19

f ->@ idx: 21

o ->@ idx: 22

e ->@ idx: 24

A ->@ idx: 0

n ->@ idx: 1

d ->@ idx: 2

‚óľ ->@ idx: 3

t ->@ idx: 4

a ->@ idx: 23

f ->@ idx: 24

t ->@ idx: 25

Finally, we can return back to our Localization example to see how we can apply diffing to our plugin states. The big philosophical question in diffing is how do you produce a meaningful diff? Unfortunately, diffing documents is highly subjective to the task at hand. However, floro has a couple of tricks for making good diffs easy for plugin developers to create.

First let's think about the before and after states of our localization document.

Before State¬† --őĒ-> After State

If we imagine these two states, in their key value form it becomes a lot easier to see how we could actually diff them. If we just imagine the keys of the spreadsheets above, we see the two states would be the following

Before Keys -> After Keys

To see the diff we're going to change the colors that are highlighted. Instead of highlighting the key relationships of the plugins, we highlight the keys that were removed from the before state and the keys that were added to the after state.

We should now be able to see this diff far more easily in our spreadsheet interface!

And that's the way that plugin diffing really works. While the user interfaces with the plugin via a user interface and alters the plugin's state tree through the GUI, the plugin interfaces with floro through its keys. In fact all that is communicated to a plugin about a diff, is which keys were removed and which keys were added.

The idea that all a document diff requires is two views, the before, and the after, as well as the keys that were removed from the before state and the keys that were added to the after state is the fundamental idea of document diffing.

In the next section we discuss merging and data granularity. One key thing to know at this point, is that if something can be diffed (meaningfully) then it can also be merged (meaningfully)... even when conflicts arise.

Bringing it all together

Unfortunately, there is not a lot of high-quality content that comprehensively covers the sequence merging cases Floro deals with. Some similar problems appear in Conflict Free Replicated Data Types, but they diverge significantly enough from our use case that it is probably best for us to explain merging ourselves. That being said, this section will be slightly more technical than the previous sections.

There are two distinct types of merges that occur in version control that we need to delineate. The first type of merging is the merging of sequences. We will cover this in this section. The second type of merging is the merging of commit trees, which concentrates on how commit graphs are merged (if you search 3-way-merge 99% of the results are discussing commit tree merging, that is not what we are talking about here). To see how floro manages merging the commit tree, please see the docs on merging.

Like most version control systems, Floro uses a variant of three-way merging. You can view the implementation of Floro's merge algorithm.

To illustrate how it works, we need to visualize a starting state (origin) of a sequence and two changes that occur concurrently.

In our example, our origin is "BCE". Our left change is "BCDE", and our right change is "ABC".

We need to take the Greatest Common Longest Common Subsequence (GC-LCS) of our diagram.

  1. We first take the LCS of our origin sequence (BCE) and the left sequence (BCDE). This gives us a left LCS of BCE.
  2. Next we take the LCS of our origin sequence (BCE) and the right sequence (ABC). This gives us a right LCS of BC.
  3. Since the left LCS is BCE and the right LCS is BC our Greatest Common LCS (the LCS of both the left and right LCS with respect to the origin) is BC.

Next, we need to reconcile both our left and right merge segments using the GC-LCS.

First, we reconcile the left side:

You can consider reconciliation as the process of subtracting the origin from the changed sequence.

You could think of this as saying BCDE - BCE = BCD.

This is heavily oversimplified, but conceptually that's all that's happening here.

Next, we reconcile the right side:

Again, you could think of this as saying ABC - BCE = ABC.

Finally, we add our reconciled left and right sequences:

We can view this as saying BCD + ABC = ABCD.

So does this always work?

Let's consider this example.

In our second example, our origin is "BCE". Our left change is "BCDE", and our right change is "ABCF". The only difference in this example is our right change now ends with an "F".

Again, we take the Greatest Common Longest Common Subsequence (GC-LCS).

  1. We first take the LCS of our origin sequence (BCE) and the left sequence (BCDE). This gives us a left LCS of BCE.
  2. Next we take the LCS of our origin sequence (BCE) and the right sequence (ABCF). This gives us a right LCS of BC.
  3. Since the left LCS is BCE and the right LCS is BC, our Greatest Common LCS (the LCS of both the left and right LCS with respect to the origin) is BC.

Next, we reconcile the left side:

Again, you could think of this as saying BCDE - BCE = BCD.

Next, we reconcile the right side:

We could think of this as saying ABCF - BCE = ABCF.

Finally, we add our reconciled left and right sequences:

...Ruh Roh!!!

This leaves us with a bit of a predicament. Is the sequence ABCDF or ABCFD correct?

This is a merge conflict!

Either sequence is technically correct.

When a merge operation results in the same sequence, whether merged from the left or from the right (i.e. if the merge operation commutes a + b = b + a), we say the sequences are auto-mergeable. However, when the two sequences differ, it indicates a merge conflict and the sequences cannot be merged automatically.

Technical Note: There can sometimes be more than one automatic merge sequence. This can occur when there are repeated GC-LCS sequences present in one of the reconciled LCS segments. This topic is beyond the scope of this article, but floro is biased towards "right-sided" reconciliation. Sometimes, left-sided automatic merges exist when a right automatic merge does not exist, given the infrequency of this and the performance overhead of checking both left and right reconciliations, floro only uses right sided reconciliation. Please see the link to the merge algorithm.

Perhaps, neither ABCDF nor ABCFD are correct, the correct sequence could be ABCXYZ. This is one area where floro significantly differs from automatic merge technologies. Rather than attempting to guess the merge (or use a last write wins strategy or Lamport timestamps), when conflicts occur, floro, like git, uses user intent to reconcile conflicts. Unlike git, merge conflicts in floro are not too cumbersome to resolve. The primary reason for this is that structured data is significantly easier to merge than plain text. Floro cannot have syntactic conflicts, only semantic conflicts.

Merge granularity¬†ūüćö

It's easy to see how single-character sequences can be merged. However, the real power of merging emerges when we consider the granularity of what's being merged. It was entirely our decision to delimit our merge sequences by characters. We could have chosen to merge on a binary level, a word level, or a sentence level. The choice of character level was made because 'BCE', 'ABC', and 'BCDE' did not contain spaces or periods. Strictly speaking, any information can be merged at a binary level, but without structure or a sense of granularity, the result is likely to be nonsensical.

To illustrate, let's use a merge calculator.

origin string
left string
right string

Left Merge (char delimited): Left side first, Hiello there world! End on the right.

Right Merge (char delimited): Left side first, Helloi there world! End on the right.

Auto-Merges? (char delimited):

Left Merge (space delimited): Left side first, Hi there world! End on the right.

Right Merge (space delimited): Left side first, Hi there world! End on the right.

Auto-Merges? (space delimited):

If you experiment a little, you'll find that issues start cropping up when we don't delimit by spaces. Looking at the default sentences, the words "Hello" and "Hi" merge into a conflicted state and become "Hiello" and "Helloi", when they're delimited on a character level.

So, what is the granularity of git? Line numbers!

Line numbers make a lot of sense for plain text. Yet, merging on line numbers is not sensible when it comes to merging structured data. Fortunately, a Floro plugin state naturally orders into granularity levels that are suitable for merging. The choice remains with the plugin designer to select the appropriate granularity levels for their application. A significant part of designing a schema in floro is considering the merge and diff ramifications of the data structures.

Now we finally get to ask the question, how do you merge colors?

Generally, in programming, we think of colors as combinations of red, green, and blue values. If we considered red, green, and blue as independent variables, we could end up with a schema like the one pictured below.

This could be a valid schema and an effective way to represent the data structure. However, it might lead to some unexpected merge behaviors.

Should changes from the color red to yellow and magenta result in an automatic resolution to the color white? Probably not, but maybe that's the expected behavior of the application.

To resolve this issue, we could opt to not treat each variable as independent, and instead define a schema where red, green, and blue are grouped as part of the same category. Hex codes naturally do this. This means that red, green, and blue cannot merge independently of each other. A user would then have to choose either yellow or magenta, or manually resolve the color to something else.

This is an area where floro significantly differs from traditional relational data stores. Whether the properties of a category are normalized or denormalized directly affects how the category and its sub-categories are merged and differentiated.

Order matters ūüď∂

As we said before. Floro uses sets for its KVs. What we mean by sets, is lists that have elements that have primary keys that can only occur once. When floro deserializes a KV set into a tree state, if it encounters an element with the same key more than once, it will skip over the elements with keys it has already encountered.

In the example above, we can see that the key $(LocalizedKeys).Key<Welcome Hero>.Translations.Locale<English>, occurs twice. If we go back to our merge conflict example this could occur if we had a conflict between the sequences ABCD and ABDC. If we choose the first order we would end up with the string (ABCD) "Welcome to our website". However, if we choose the second sequence we would end up with the string (ABDC) "Hi there, welcome our app!".

Technical Side Note: Floro does actually support array structures as well, although during serialization processes, they are really treated as sets. The difference between an array and a set in floro is that the primary key of an array element is its index. We remove the indices of array elements before diffing and merging operations, which allows the LCS based algorithms to still work, since duplicate values and their ordinality are necessary for diffing and merging to work correctly with LCS.

Why not git (for structured data)?ūüėļ

If you recall in an earlier section we said, "Floro does not really ever see the plugin state as a tree state." We lied a little bit... sorry. After floro merges two KV sequences, it subsequently cascades the merge result. This is how floro guarantees consistency even though it is blindly merging data structures it knows nothing about other than their schemas. While you can write custom merge-drivers for git, it would be impossible to enforce cascading rules in an unstructured version control system (if not impossible, impractical to the point that you might as well write a new version control system).

Cascaded Merging:

If we didn't cascade the merge results in the above diagram, we would end up with an inconsistent state. That also means that many merges that would end up with conflicts in a plain text version control system, (often) do not have conflicts when structured. Structure and merging are intimately coupled. The reason this isn't a problem for git is because git depends on programmers to resolve the inconsistencies it has when conflicts arise.

We're not advocating anyone stop using git. Git, Mercurial, Pijul and Fossil are all fantastic for code and plain text. Floro would be an awful tool for plain text use cases (like code). Floro is designed to interface with plain text version control systems, not replace them.

Wrapping up & Offline-first

If you made it this far, thank you for reading all of this. Offline-first technology is fascinating to us and we hope we conveyed some of the ideas that get us most excited. We've been heavily inspired by many folks working on CRDTs and other offline-first technologies. One talk we found particularly relevant and inspiring to the development of floro is a talk by Martin Fowler from 2016 on CQRS & Event Sourcing.

In our experience we've seen many great offline-first applications for personal productivity. We think there's a lot of potential for offline-first to be used in many business cases too. Regardless of if you are thinking of building on top of floro, contributing to floro or just want our help thinking through a hard merge problem, we'd love to hear from you. Feel free to reach out to us on Discord.