Tagged union
In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which type is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.
Description
[edit]Tagged unions are most important in functional programming languages such as ML and Haskell, where they are called datatypes (see algebraic data type) and the compiler can verify that all cases of a tagged union are always handled, avoiding many types of errors. Compile-time checked sum types are also extensively used in Rust, where they are called enum. They can, however, be constructed in nearly any programming language, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly track which member of a union is currently in use.
Tagged unions are often accompanied by the concept of a constructor, which is similar but not the same as a constructor for a class. A constructor is a function or an expression that produces a value of the tagged union type, given a tag and a value of the corresponding type.
Mathematically, tagged unions correspond to disjoint or discriminated unions, usually written using +. Given an element of a disjoint union , it is possible to determine whether it came from or . If an element lies in both, there will be two effectively distinct copies of the value in , one from and one from .
In type theory, a tagged union is called a sum type. Sum types are the dual of product types. Notations vary, but usually the sum type comes with two introduction forms (injections) and . The elimination form is case analysis, known as pattern matching in ML-style languages: if has type and and have type under the assumptions and respectively, then the term has type . The sum type corresponds to intuitionistic logical disjunction under the Curry–Howard correspondence.
An enumerated type can be seen as a degenerate case: a tagged union of unit types. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag.
Many programming techniques and data structures, including rope, lazy evaluation, class hierarchy (see below), arbitrary-precision arithmetic, CDR coding, the indirection bit, and other kinds of tagged pointers, are usually implemented using some sort of tagged union.
A tagged union can be seen as the simplest kind of self-describing data format. The tag of the tagged union can be seen as the simplest kind of metadata.
In languages with flow-sensitive typing, tagged unions can be implemented by a combination of union types and record types.[1]
Advantages and disadvantages
[edit]The primary advantage of a tagged union over an untagged union is that all accesses are safe, and the compiler can even check that all cases are handled. Untagged unions depend on program logic to correctly identify the currently active field, which may result in strange behavior and hard-to-find bugs if that logic fails.
The primary advantage of a tagged union over a simple record containing a field for each type is that it saves storage by overlapping storage for all the types. Some implementations reserve enough storage for the largest type, while others dynamically adjust the size of a tagged union value as needed. When the value is immutable, it is simple to allocate just as much storage as is needed.
The main disadvantage of tagged unions is that the tag occupies space. Since there are usually a small number of alternatives, the tag can often be squeezed into 2 or 3 bits wherever space can be found, but sometimes even these bits are not available. In this case, a helpful alternative may be folded, computed or encoded tags, where the tag value is dynamically computed from the contents of the union field. Common examples are the use of reserved values, where, for example, a function returning a positive number may return -1 to indicate failure, and sentinel values, most often used in tagged pointers.
Sometimes, untagged unions are used to perform bit-level conversions between types, or type punning. Tagged unions are not intended for this purpose; typically a new value is assigned whenever the tag is changed.
Many languages support, to some extent, a universal data type, which is a type that includes every value of every other type, and often a way is provided to test the actual type of a value of the universal type. These are sometimes referred to as variants. While universal data types are comparable to tagged unions in their formal definition, typical tagged unions include a relatively small number of cases, and these cases form different ways of expressing a single coherent concept, such as a data structure node or instruction. Also, there is an expectation that every possible case of a tagged union will be dealt with when it is used. The values of a universal data type are not related and there is no feasible way to deal with them all.
Like option types and exception handling, tagged unions are sometimes used to handle the occurrence of exceptional results. Often these tags are folded into the type as reserved values, and their occurrence is not consistently checked: this is a fairly common source of programming errors. This use of tagged unions can be formalized as a monad with the following functions:
where and are the constructors of the union type, and are valid result types and is the type of error conditions. Alternately, the same monad may be described by and two additional functions, and :
Examples
[edit]Say we wanted to build a binary tree of integers. In ML, we would do this by creating a datatype like this:
datatype tree = Leaf
| Node of (int * tree * tree)
This is a tagged union with two cases: one, the leaf, is used to terminate a path of the tree, and functions much like a null value would in imperative languages. The other branch holds a node, which contains an integer and a left and right subtree. Leaf and Node are the constructors, which enable us to actually produce a particular tree, such as:
Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf)))
which corresponds to this tree:

Now we can easily write a typesafe function that, for example, counts the number of nodes in the tree:
fun countNodes(Leaf) = 0
| countNodes(Node(int, left, right)) =
1 + countNodes(left) + countNodes(right)
Language support
[edit]ALGOL 68
[edit]In ALGOL 68, tagged unions are called united modes, the tag is implicit, and the case construct is used to determine which field is tagged:
mode node = union (real, int, compl, string);
Usage example for union case of node:
node n := "1234";
case n in
(real r): print(("real:", r)),
(int i): print(("int:", i)),
(compl c): print(("compl:", c)),
(string s): print(("string:", s))
out print(("?:", n))
esac
In ALGOL 68, a union can be automatically coerced into a wider union, for example if all its constituents can be handled by the union parameter of print, a union can simply be passed to print as in the out case above.
Ada
[edit]In Ada, these are called "discriminated types".
type Shape_Kind is (Square, Rectangle, Circle);
type Shape (Kind : Shape_Kind) is record
Center_X : Integer;
Center_Y : Integer;
case Kind is
when Square =>
Side : Integer;
when Rectangle =>
Width, Height : Integer;
when Circle =>
Radius : Integer;
end case;
end record;
-- Any attempt to access a member which existence depends
-- on a certain value of the discriminant, while the
-- discriminant is not the expected one, raises an error.
C++
[edit]While C only provides union, an untagged union type, C++ offers a class std::variant<Ts...> (since C++17), but rather than being a core language feature like union, it is a standard library class.
import std;
using std::string;
using std::variant;
struct Cat {
string name;
};
struct Dog {
string name;
};
struct Bird {
string name;
};
using Pet = variant<Cat, Dog, Bird>;
Pet p1 = Cat("Whiskers");
Pet p2 = Dog("Rex");
The "overload pattern" is a common design pattern that implements algebraic pattern matching using variadic template inheritance.[2]
// Helper type for visitor
template <typename... Ts>
struct Overload : public Ts... {
using Ts::operator()...;
}
// Deduction guide
template <typename... Ts>
Overload(Ts...) -> Overload<Ts...>;
Then, the std::visit() function may be used to call each respective case, with each case handled by a lambda:[3]
using Value = variant<int, double, string>;
Value v = 3.14;
// prints "double: 3.14"
std::visit(
Overload {
[](int i) -> void { std::println("int: {}", i); },
[](double d) -> void { std::println("double: {}", d); },
[](const string& s) -> void { std::println("string: {}", s); }
},
v
);
Additionally, std::visit() may be made to return a type:
Pet p = Bird("Polly");
string name = std::visit(
Overload {
[](const Cat& c) -> string { return c.name; },
[](const Dog& d) -> string { return d.name; },
[](const Bird& b) -> string { return b.name; }
},
p
);
// Prints "Pet name: Polly"
std::println("Pet name: {}", name);
The result type for error handling (like std::result::Result<T, E> in Rust), was introduced to C++23 as std::expected<T, E>.
C#
[edit]C# traditionally did not have tagged unions. The closest way to approximate these was through pattern matching over record types. With C# 15, tagged unions were introduced with the keyword union.[4]
record class Car(string Model);
record class Bicycle(string Model);
record class Bus(string Model);
union Vehicle(Car, Bicycle, Bus);
Vehicle car = new Car("Tesla Model 3");
Console.WriteLine(car.Value); // Car { Model = Tesla Model 3 }
Vehicle bike = new Bicycle("Giant Escape 3");
Console.WriteLine(bike.Value); // Bicycle { Model = Giant Escape 3 }
Vehicle bus = new Bus("Volvo 9700");
Console.WriteLine(bus.Value); // Bus { Model = Volvo 9700 }
Vehicle v = /* some vehicle here */;
string model = v switch
{
Car c => c.Model,
Bicycle bk => b.Model,
Bus bs => bs.Model,
};
The default value of a union is null, but if all types in the union are non-nullable, then a switch expression need not check for null.
Cyclone
[edit]Cyclone, a dialect of C with safety enhancements, offers tagged unions.[5] These were declared with a @tagged qualifier.
@tagged union Foo {
int i;
double d;
char* @fat s;
};
void printFoo(union Foo x) {
// A missed case would be warned by the compiler
switch (x) {
case { .i = i }:
printf("%d", i);
break;
case { .d = d }:
printf("%g", d);
break;
case { .s = s }:
printf("%s", s);
break;
}
}
D
[edit]D provides the std.variant module, with types like Variant (representing an any type), and std.variant.Algebraic!(T...) (representing an algebraic data type).[6]
import std.variant;
Algebraic!(int, string) v = 10;
int result = v.visit!(
(string s) => cast(int) s.length,
(int i) => i,
() => -1
)();
writeln(result); // 10
F#
[edit]F# has discriminated unions:
type Tree =
| Leaf
| Node of value: int * left: Tree * right: Tree
let tree = Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf)))
Because the defined cases are exhaustive, the compiler can check that all cases are handled in a pattern match:
match tree with
| Node (x, _, _) -> printfn "top level node value: %i" x
| Leaf -> printfn "top level node is a leaf"
Haxe
[edit]Haxe's enums also work as tagged unions:[7]
enum Color {
Red;
Green;
Blue;
Rgb(r:Int, g:Int, b:Int);
}
These can be matched using a switch expression:
switch (color) {
case Red: trace("Color was red");
case Green: trace("Color was green");
case Blue: trace("Color was blue");
case Rgb(r, g, b): trace("Color had a red value of " +r);
}
Java
[edit]In Java, the closest way to implement a tagged union is through a sealed class, which directly constrains what types may inherit the class. Pattern matching itself may be done over switch expressions.[8]
sealed interface Shape permits Circle, Rectangle, Triangle {}
record Circle(double radius) implements Shape {}
record Rectangle(double width, double height) implements Shape {}
record Triangle(double base, double height) implements Shape {}
double area(Shape s) {
return switch (shape) {
case Circle(double r) -> Math.PI * r * r;
case Rectangle(double w, double h) -> w * h;
case Triangle(double b, double h) -> 0.5 * b * h;
};
}
Standard ML
[edit]In Standard ML, a tagged union is an "algebraic data type" or a "sum type".
datatype shape = Circle of real
| Rectangle of real * real
| Point
fun area s =
case s of
Circle r => 3.1415926535 * r * r
| Rectangle (w, h) => w * h
| Point => 0.0
Nim
[edit]Nim has object variants[9] similar in declaration to those in Pascal and Ada:
type
ShapeKind = enum
skSquare, skRectangle, skCircle
Shape = object
centerX, centerY: int
case kind: ShapeKind
of skSquare:
side: int
of skRectangle:
length, height: int
of skCircle:
radius: int
Macros can be used to emulate pattern matching or to create syntactic sugar for declaring object variants, seen here as implemented by the package patty:
import patty
proc `~`[A](a: A): ref A =
new(result)
result[] = a
variant List[A]:
Nil
Cons(x: A, xs: ref List[A])
proc listHelper[A](xs: seq[A]): List[A] =
if xs.len == 0: Nil[A]()
else: Cons(xs[0], ~listHelper(xs[1 .. xs.high]))
proc list[A](xs: varargs[A]): List[A] = listHelper(@xs)
proc sum(xs: List[int]): int = (block:
match xs:
Nil: 0
Cons(y, ys): y + sum(ys[])
)
echo sum(list(1, 2, 3, 4, 5))
OCaml
[edit]In OCaml, tagged union syntax varies slightly from Standard ML, but is still roughly the same.
type shape =
| Circle of float
| Rectangle of float * float
| Point
let area = function
| Circle r -> Float.pi *. r *. r
| Rectangle (w, h) -> w *. h
| Point -> 0.0
Pascal
[edit]In Pascal, these are called "variant records".
type shapeKind = (square, rectangle, circle);
shape = record
centerx : integer;
centery : integer;
case kind : shapeKind of
square : (side : integer);
rectangle : (width, height : integer);
circle : (radius : integer);
end;
Python
[edit]Python 3.9 introduces support for typing annotations that can be used to define a tagged union type (PEP-593[10]):
from typing import Annotated, TypedDict
Currency = Annotated[
TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
TaggedUnion,
]
Rust
[edit]The Rust language has extensive support for tagged unions, called enums.[11] For example:
enum Tree {
Leaf,
Node(i64, Box<Tree>, Box<Tree>)
}
It also allows matching on unions:
let tree: Tree = Tree::Node(
2,
Box::new(Tree::Node(0, Box::new(Tree::Leaf), Box::new(Tree::Leaf))),
Box::new(Tree::Node(3, Box::new(Tree::Leaf),
Box::new(Tree::Node(4, Box::new(Tree::Leaf), Box::new(Tree::Leaf)))))
);
fn add_values(tree: Tree) -> i64 {
match tree {
Tree::Node(v, a, b) => v + add_values(*a) + add_values(*b),
Tree::Leaf => 0
}
}
fn main() {
assert_eq!(add_values(tree), 9);
}
Rust's error handling model relies extensively on these tagged unions, especially the std::option::Option<T> type, which is either None or Some(T), and the std::result::Result<T, E> type, which is either Ok(T) or Err(E).[12]
Scala
[edit]Scala has case classes:
sealed abstract class Tree
case object Leaf extends Tree
case class Node(value: Int, left: Tree, right: Tree) extends Tree
val tree = Node(5, Node(1, Leaf, Leaf), Node(3, Leaf, Node(4, Leaf, Leaf)))
Because the class hierarchy is sealed, the compiler can check that all cases are handled in a pattern match:
tree match {
case Node(x, _, _) => println("top level node value: " + x)
case Leaf => println("top level node is a leaf")
}
Scala's case classes also permit reuse through subtyping:
sealed abstract class Shape(centerX: Int, centerY: Int)
case class Square(side: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY)
case class Rectangle(length: Int, height: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY)
case class Circle(radius: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY)
Enums are added in Scala 3,[13] allowing us to rewrite the earlier Scala examples more concisely:
enum Tree[+T]:
case Leaf
case Node(x: Int, left: Tree[T], right: Tree[T])
enum Shape(centerX: Int, centerY: Int):
case Square(side: Int, centerX: Int, centerY: Int) extends Shape(centerY, centerX)
case Rectangle(length: Int, height: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY)
case Circle(radius: Int, centerX: Int, centerY: Int) extends Shape(centerX, centerY)
Swift
[edit]Swift also has substantial support for tagged unions via enumerations.[14] For example:
enum Tree {
case leaf
indirect case node(Int, Tree, Tree)
}
let tree = Tree.node(
2,
.node(0, .leaf, .leaf),
.node(3, .leaf, .node(4, .leaf, .leaf))
)
func add_values(_ tree: Tree) -> Int {
switch tree {
case let .node(v, a, b):
return v + add_values(a) + add_values(b)
case .leaf:
return 0
}
}
assert(add_values(tree) == 9)
TypeScript
[edit]With TypeScript it is also possible to create tagged unions. For example:
interface Leaf {
kind: "leaf";
}
interface Node {
kind: "node";
value: number;
left: Tree;
right: Tree;
}
type Tree = Leaf | Node;
const root: Tree = {
kind: "node",
value: 5,
left: {
kind: "node",
value: 1,
left: { kind: "leaf" },
right: { kind: "leaf" }
},
right: {
kind: "node",
value: 3,
left: { kind: "leaf" },
right: {
kind: "node",
value: 4,
left: { kind: "leaf" },
right: { kind: "leaf" }
}
}
};
function visit(tree: Tree) {
switch (tree.kind) {
case "leaf":
break;
case "node":
console.log(tree.value);
visit(tree.left);
visit(tree.right);
break;
}
}
Class hierarchies as tagged unions
[edit]In a typical class hierarchy in object-oriented programming, each subclass can encapsulate data unique to that class. The metadata used to perform virtual method lookup (for example, the object's vtable pointer in most C++ implementations) identifies the subclass and so effectively acts as a tag identifying the data stored by the instance (see RTTI). An object's constructor sets this tag, and it remains constant throughout the object's lifetime.
Nevertheless, a class hierarchy involves true subtype polymorphism. It can be extended by creating further subclasses of the same base type, which could not be handled correctly under a tag/dispatch model. Hence, it is usually not possible to do case analysis or dispatch on a subobject's 'tag' as one would for tagged unions. Some languages such as Scala allow base classes to be "sealed", and unify tagged unions with sealed base classes.
See also
[edit]- Discriminator, the type tag for discriminated unions in CORBA
- Variant type (COM)
References
[edit]- ↑ https://arxiv.org/pdf/2111.03354 p. 8
- ↑ Bartlomiej Filipek (10 November 2023). "2 Lines of Code and 3 C++17 Features - The Overload Pattern". isocpp.org. ISO C++.
- ↑ cppreference.com (29 June 2026). "std::visit". cppreference.com. cppreference.com.
- ↑ Bill Wagner (2 April 2026). "Explore union types in C# 15". devblogs.microsoft.com. Microsoft.
- ↑ "Cyclone: Tagged Unions".
- ↑ Andrei Alexandrescu (29 June 2026). "std.variant". dlang.org. D Language.
- ↑ "Using Enums - Haxe - The Cross-platform Toolkit". Haxe Foundation.
- ↑ Oracle Corporation (29 June 2026). "Switch Expressions". docs.oracle.com. Oracle Corporation.
- ↑ "Nim Manual". nim-lang.org. Retrieved 2020-01-23.
- ↑ "PEP 593 -- Flexible function and variable annotations". Python.org. Retrieved 2021-06-20.
- ↑ "The Rust Programming Language". Mozilla.
- ↑ "Rust By Example". Mozilla.
- ↑ "Scala 3 Language Reference: Enumerations". The Scala Team.
- ↑ "Enumerations — The Swift Programming Language (Swift 5.4)". docs.swift.org. Retrieved 2021-04-28.
