Luau is the primary programming language to place the facility of semantic subtyping within the fingers of thousands and thousands of creators.

## Minimizing false positives

One of many points with sort error reporting in instruments just like the Script Evaluation widget in Roblox Studio is *false positives*. These are warnings which can be artifacts of the evaluation, and don’t correspond to errors which might happen at runtime. For instance, this system

native x = CFrame.new() native y if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z = x * y

stories a sort error which can’t occur at runtime, since `CFrame`

helps multiplication by each `Vector3`

and `CFrame`

. (Its sort is `((CFrame, CFrame) -> CFrame) & ((CFrame, Vector3) -> Vector3)`

.)

False positives are particularly poor for onboarding new customers. If a type-curious creator switches on typechecking and is instantly confronted with a wall of spurious purple squiggles, there’s a robust incentive to right away swap it off once more.

Inaccuracies in sort errors are inevitable, since it’s not possible to resolve forward of time whether or not a runtime error will probably be triggered. Kind system designers have to decide on whether or not to stay with false positives or false negatives. In Luau that is decided by the mode: `strict`

mode errs on the facet of false positives, and `nonstrict`

mode errs on the facet of false negatives.

Whereas inaccuracies are inevitable, we attempt to take away them at any time when attainable, since they end in spurious errors, and imprecision in type-driven tooling like autocomplete or API documentation.

## Subtyping as a supply of false positives

One of many sources of false positives in Luau (and lots of different comparable languages like TypeScript or Move) is *subtyping*. Subtyping is used at any time when a variable is initialized or assigned to, and at any time when a perform is known as: the sort system checks that the kind of the expression is a subtype of the kind of the variable. For instance, if we add varieties to the above program

native x : CFrame = CFrame.new() native y : Vector3 | CFrame if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z : Vector3 | CFrame = x * y

then the sort system checks that the kind of `CFrame`

multiplication is a subtype of `(CFrame, Vector3 | CFrame) -> (Vector3 | CFrame)`

.

Subtyping is a really helpful characteristic, and it helps wealthy sort constructs like sort union (`T | U`

) and intersection (`T & U`

). For instance, `quantity?`

is applied as a union sort `(quantity | nil)`

, inhabited by values which can be both numbers or `nil`

.

Sadly, the interplay of subtyping with intersection and union varieties can have odd outcomes. A easy (however reasonably synthetic) case in older Luau was:

native x : (quantity?) & (string?) = nil native y : nil = nil y = x -- Kind '(quantity?) & (string?)' couldn't be transformed into 'nil' x = y

This error is attributable to a failure of subtyping, the outdated subtyping algorithm stories that `(quantity?) & (string?)`

shouldn’t be a subtype of `nil`

. This can be a false optimistic, since `quantity & string`

is uninhabited, so the one attainable inhabitant of `(quantity?) & (string?)`

is `nil`

.

That is a man-made instance, however there are actual points raised by creators attributable to the issues, for instance https://devforum.roblox.com/t/luau-recap-july-2021/1382101/5. At the moment, these points largely have an effect on creators making use of subtle sort system options, however as we make sort inference extra correct, union and intersection varieties will grow to be extra frequent, even in code with no sort annotations.

This class of false positives now not happens in Luau, as we now have moved from our outdated strategy of *syntactic subtyping* to an alternate referred to as *semantic subtyping*.

## Syntactic subtyping

AKA “what we did earlier than.”

Syntactic subtyping is a syntax-directed recursive algorithm. The fascinating instances to take care of intersection and union varieties are:

- Reflexivity:
`T`

is a subtype of`T`

- Intersection L:
`(T₁ & … & Tⱼ)`

is a subtype of`U`

at any time when a number of the`Tᵢ`

are subtypes of`U`

- Union L:
`(T₁ | … | Tⱼ)`

is a subtype of`U`

at any time when the entire`Tᵢ`

are subtypes of`U`

- Intersection R:
`T`

is a subtype of`(U₁ & … & Uⱼ)`

at any time when`T`

is a subtype of the entire`Uᵢ`

- Union R:
`T`

is a subtype of`(U₁ | … | Uⱼ)`

at any time when`T`

is a subtype of a number of the`Uᵢ`

.

For instance:

- By Reflexivity:
`nil`

is a subtype of`nil`

- so by Union R:
`nil`

is a subtype of`quantity?`

- and:
`nil`

is a subtype of`string?`

- so by Intersection R:
`nil`

is a subtype of`(quantity?) & (string?)`

.

Yay! Sadly, utilizing these guidelines:

`quantity`

isn’t a subtype of`nil`

- so by Union L:
`(quantity?)`

isn’t a subtype of`nil`

- and:
`string`

isn’t a subtype of`nil`

- so by Union L:
`(string?)`

isn’t a subtype of`nil`

- so by Intersection L:
`(quantity?) & (string?)`

isn’t a subtype of`nil`

.

That is typical of syntactic subtyping: when it returns a “sure” outcome, it’s right, however when it returns a “no” outcome, it could be mistaken. The algorithm is a *conservative approximation*, and since a “no” outcome can result in sort errors, it is a supply of false positives.

## Semantic subtyping

AKA “what we do now.”

Slightly than pondering of subtyping as being syntax-directed, we first take into account its semantics, and later return to how the semantics is applied. For this, we undertake semantic subtyping:

- The semantics of a sort is a set of values.
- Intersection varieties are regarded as intersections of units.
- Union varieties are regarded as unions of units.
- Subtyping is regarded as set inclusion.

For instance:

Kind | Semantics |
---|---|

`quantity` |
{ 1, 2, 3, … } |

`string` |
{ “foo”, “bar”, … } |

`nil` |
{ nil } |

`quantity?` |
{ nil, 1, 2, 3, … } |

`string?` |
{ nil, “foo”, “bar”, … } |

`(quantity?) & (string?)` |
{ nil, 1, 2, 3, … } ∩ { nil, “foo”, “bar”, … } = { nil } |

and since subtypes are interpreted as set inclusions:

Subtype | Supertype | As a result of |
---|---|---|

`nil` |
`quantity?` |
{ nil } ⊆ { nil, 1, 2, 3, … } |

`nil` |
`string?` |
{ nil } ⊆ { nil, “foo”, “bar”, … } |

`nil` |
`(quantity?) & (string?)` |
{ nil } ⊆ { nil } |

`(quantity?) & (string?)` |
`nil` |
{ nil } ⊆ { nil } |

So in keeping with semantic subtyping, `(quantity?) & (string?)`

is equal to `nil`

, however syntactic subtyping solely helps one route.

That is all tremendous and good, but when we wish to use semantic subtyping in instruments, we want an algorithm, and it seems checking semantic subtyping is non-trivial.

## Semantic subtyping is tough

NP-hard to be exact.

We are able to scale back graph coloring to semantic subtyping by coding up a graph as a Luau sort such that checking subtyping on varieties has the identical outcome as checking for the impossibility of coloring the graph

For instance, coloring a three-node, two shade graph might be performed utilizing varieties:

sort Purple = "purple" sort Blue = "blue" sort Shade = Purple | Blue sort Coloring = (Shade) -> (Shade) -> (Shade) -> boolean sort Uncolorable = (Shade) -> (Shade) -> (Shade) -> false

Then a graph might be encoded as an overload perform sort with subtype `Uncolorable`

and supertype `Coloring`

, as an overloaded perform which returns `false`

when a constraint is violated. Every overload encodes one constraint. For instance a line has constraints saying that adjoining nodes can’t have the identical shade:

sort Line = Coloring & ((Purple) -> (Purple) -> (Shade) -> false) & ((Blue) -> (Blue) -> (Shade) -> false) & ((Shade) -> (Purple) -> (Purple) -> false) & ((Shade) -> (Blue) -> (Blue) -> false)

A triangle is analogous, however the finish factors additionally can’t have the identical shade:

sort Triangle = Line & ((Purple) -> (Shade) -> (Purple) -> false) & ((Blue) -> (Shade) -> (Blue) -> false)

Now, `Triangle`

is a subtype of `Uncolorable`

, however `Line`

shouldn’t be, because the line might be 2-colored. This may be generalized to any finite graph with any finite variety of colours, and so subtype checking is NP-hard.

We take care of this in two methods:

- we cache varieties to scale back reminiscence footprint, and
- hand over with a “Code Too Advanced” error if the cache of varieties will get too massive.

Hopefully this doesn’t come up in observe a lot. There may be good proof that points like this don’t come up in observe from expertise with sort programs like that of Normal ML, which is EXPTIME-complete, however in observe you must exit of your solution to code up Turing Machine tapes as varieties.

## Kind normalization

The algorithm used to resolve semantic subtyping is *sort normalization*. Slightly than being directed by syntax, we first rewrite varieties to be normalized, then examine subtyping on normalized varieties.

A normalized sort is a union of:

- a normalized nil sort (both
`by no means`

or`nil`

) - a normalized quantity sort (both
`by no means`

or`quantity`

) - a normalized boolean sort (both
`by no means`

or`true`

or`false`

or`boolean`

) - a normalized perform sort (both
`by no means`

or an intersection of perform varieties) and so forth

As soon as varieties are normalized, it’s simple to examine semantic subtyping.

Each sort might be normalized (sigh, with some technical restrictions round generic sort packs). The necessary steps are:

- eradicating intersections of mismatched primitives, e.g.
`quantity & bool`

is changed by`by no means`

, and - eradicating unions of capabilities, e.g.
`((quantity?) -> quantity) | ((string?) -> string)`

is changed by`(nil) -> (quantity | string)`

.

For instance, normalizing `(quantity?) & (string?)`

removes `quantity & string`

, so all that’s left is `nil`

.

Our first try at implementing sort normalization utilized it liberally, however this resulted in dreadful efficiency (complicated code went from typechecking in lower than a minute to working in a single day). The rationale for that is annoyingly easy: there may be an optimization in Luau’s subtyping algorithm to deal with reflexivity (`T`

is a subtype of `T`

) that performs an inexpensive pointer equality examine. Kind normalization can convert pointer-identical varieties into semantically-equivalent (however not pointer-identical) varieties, which considerably degrades efficiency.

Due to these efficiency points, we nonetheless use syntactic subtyping as our first examine for subtyping, and solely carry out sort normalization if the syntactic algorithm fails. That is sound, as a result of syntactic subtyping is a conservative approximation to semantic subtyping.

## Pragmatic semantic subtyping

Off-the-shelf semantic subtyping is barely completely different from what’s applied in Luau, as a result of it requires fashions to be *set-theoretic*, which requires that inhabitants of perform varieties “act like capabilities.” There are two explanation why we drop this requirement.

**Firstly**, we normalize perform varieties to an intersection of capabilities, for instance a horrible mess of unions and intersections of capabilities:

((quantity?) -> quantity?) | (((quantity) -> quantity) & ((string?) -> string?))

normalizes to an overloaded perform:

((quantity) -> quantity?) & ((nil) -> (quantity | string)?)

Set-theoretic semantic subtyping doesn’t help this normalization, and as a substitute normalizes capabilities to *disjunctive regular kind* (unions of intersections of capabilities). We don’t do that for ergonomic causes: overloaded capabilities are idiomatic in Luau, however DNF shouldn’t be, and we don’t wish to current customers with such non-idiomatic varieties.

Our normalization depends on rewriting away unions of perform varieties:

((A) -> B) | ((C) -> D) → (A & C) -> (B | D)

This normalization is sound in our mannequin, however not in set-theoretic fashions.

**Secondly**, in Luau, the kind of a perform utility `f(x)`

is `B`

if `f`

has sort `(A) -> B`

and `x`

has sort `A`

. Unexpectedly, this isn’t all the time true in set-theoretic fashions, resulting from uninhabited varieties. In set-theoretic fashions, if `x`

has sort `by no means`

then `f(x)`

has sort `by no means`

. We don’t wish to burden customers with the concept perform utility has a particular nook case, particularly since that nook case can solely come up in useless code.

In set-theoretic fashions, `(by no means) -> A`

is a subtype of `(by no means) -> B`

, it doesn’t matter what `A`

and `B`

are. This isn’t true in Luau.

For these two causes (that are largely about ergonomics reasonably than something technical) we drop the set-theoretic requirement, and use *pragmatic* semantic subtyping.

## Negation varieties

The opposite distinction between Luau’s sort system and off-the-shelf semantic subtyping is that Luau doesn’t help all negated varieties.

The frequent case for wanting negated varieties is in typechecking conditionals:

-- initially x has sort T if (sort(x) == "string") then -- on this department x has sort T & string else -- on this department x has sort T & ~string finish

This makes use of a negated sort `~string`

inhabited by values that aren’t strings.

In Luau, we solely enable this sort of typing refinement on *check varieties* like `string`

, `perform`

, `Half`

and so forth, and *not* on structural varieties like `(A) -> B`

, which avoids the frequent case of common negated varieties.

## Prototyping and verification

Through the design of Luau’s semantic subtyping algorithm, there have been adjustments made (for instance initially we thought we had been going to have the ability to use set-theoretic subtyping). Throughout this time of fast change, it was necessary to have the ability to iterate rapidly, so we initially applied a prototype reasonably than leaping straight to a manufacturing implementation.

Validating the prototype was necessary, since subtyping algorithms can have surprising nook instances. Because of this, we adopted Agda because the prototyping language. In addition to supporting unit testing, Agda helps mechanized verification, so we’re assured within the design.

The prototype doesn’t implement all of Luau, simply the useful subset, however this was sufficient to find delicate characteristic interactions that might most likely have surfaced as difficult-to-fix bugs in manufacturing.

Prototyping shouldn’t be excellent, for instance the primary points that we hit in manufacturing had been about efficiency and the C++ commonplace library, that are by no means going to be caught by a prototype. However the manufacturing implementation was in any other case pretty simple (or at the least as simple as a 3kLOC change might be).

## Subsequent steps

Semantic subtyping has eliminated one supply of false positives, however we nonetheless have others to trace down:

- Overloaded perform purposes and operators
- Property entry on expressions of complicated sort
- Learn-only properties of tables
- Variables that change sort over time (aka typestates)

The hunt to take away spurious purple squiggles continues!

## Acknowledgments

Because of Giuseppe Castagna and Ben Greenman for useful feedback on drafts of this publish.

*Alan coordinates the design and implementation of the Luau sort system, which helps drive lots of the options of growth in Roblox Studio. Dr. Jeffrey has over 30 years of expertise with analysis in programming languages, has been an lively member of quite a few open-source software program initiatives, and holds a DPhil from the College of Oxford, England.*