Language Specification

Overview

ƿit uses DEC64 as its number format. DEC64 represents numbers as coefficient * 10^exponent in a 64-bit word. Decimal fractions such as 0.1 are therefore stored exactly, which avoids the representation errors of IEEE 754 binary floating point: 0.1 + 0.2 is exactly 0.3.

DEC64 was designed by Douglas Crockford as a general-purpose number type suitable for both business and scientific computation.

Format

A DEC64 number is a 64-bit value:

[coefficient: 56 bits][exponent: 8 bits]

Coefficient — a 56-bit signed integer (two’s complement)
Exponent — an 8-bit signed integer (range: -127 to 127)

The value of a DEC64 number is: coefficient * 10^exponent

Examples

Value	Coefficient	Exponent	Hex
`0`	0	0	`0000000000000000`
`1`	1	0	`0000000000000100`
`3.14159`	314159	-5	`0000000004CB2FFB`
`-1`	-1	0	`FFFFFFFFFFFFFF00`
`1000000`	1	6	`0000000000000106`
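The layout above can be checked with a few lines of C. This is a minimal sketch, not the runtime's code: the function names are hypothetical, and the decoding relies on arithmetic right shift of signed values, which mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

// Pack a coefficient and exponent into one 64-bit DEC64 word.
// The coefficient occupies the upper 56 bits, the exponent the low 8 bits.
static uint64_t dec64_pack(int64_t coefficient, int exponent) {
  assert(exponent >= -127 && exponent <= 127);
  assert(coefficient >= -(INT64_C(1) << 55) && coefficient < (INT64_C(1) << 55));
  return ((uint64_t)coefficient << 8) | (uint8_t)exponent;
}

// Recover the signed coefficient (assumes arithmetic right shift).
static int64_t dec64_coefficient(uint64_t word) {
  return (int64_t)word >> 8;
}

// Recover the signed exponent from the low byte.
static int dec64_exponent(uint64_t word) {
  return (int8_t)(word & 0xFF);
}
```

Packing coefficient -1 with exponent 0 gives the all-ones coefficient field followed by a zero exponent byte, matching the row for `-1` in the table.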

Special Values

Null

The exponent 0x80 (-128) indicates null. This is the only special value — there is no infinity, no NaN, no negative zero. Operations that would produce undefined results (such as division by zero) return null.

coefficient: any, exponent: 0x80  →  null

Arithmetic Properties

Exact decimals: All decimal fractions with up to 16 significant digits are represented exactly, along with those 17-digit values whose coefficient fits in 56 bits
No binary rounding: 0.1 + 0.2 == 0.3 is true
Integer range: Exact integers up to 2^55 - 1 (about 3.6 * 10^16)
Normalized on demand: The runtime normalizes coefficients to remove trailing zeros when needed for comparison
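The properties above can be illustrated on an unpacked coefficient/exponent pair. The sketch below is an assumption-laden illustration, not the runtime's arithmetic: the `Dec` type and function names are hypothetical, and overflow handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  int64_t coefficient;
  int exponent;
} Dec;  // hypothetical unpacked form of a DEC64 number

// Strip trailing decimal zeros from the coefficient, raising the exponent.
static Dec dec_normalize(Dec d) {
  if (d.coefficient == 0) { d.exponent = 0; return d; }
  while (d.coefficient % 10 == 0 && d.exponent < 127) {
    d.coefficient /= 10;
    d.exponent += 1;
  }
  return d;
}

// Two numbers are equal when their normalized forms match field by field.
static bool dec_equal(Dec a, Dec b) {
  a = dec_normalize(a);
  b = dec_normalize(b);
  return a.coefficient == b.coefficient && a.exponent == b.exponent;
}

// Add by rescaling the operand with the larger exponent to the smaller one.
// The assert stands in for the overflow handling this sketch omits.
static Dec dec_add(Dec a, Dec b) {
  assert(a.exponent - b.exponent < 18 && b.exponent - a.exponent < 18);
  while (a.exponent > b.exponent) { a.coefficient *= 10; a.exponent -= 1; }
  while (b.exponent > a.exponent) { b.coefficient *= 10; b.exponent -= 1; }
  return (Dec){ a.coefficient + b.coefficient, a.exponent };
}
```

With 0.1 as (1, -1) and 0.2 as (2, -1), the sum is (3, -1), which is exactly 0.3; no binary approximation is involved at any step.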

Comparison with IEEE 754

Property	DEC64	IEEE 754 double
Decimal fractions	Exact	Approximate
Significant digits	~16-17	~15-17
Special values	null only	NaN, ±Infinity, -0
Rounding errors	None (decimal)	Common
Financial arithmetic	Correct	Requires libraries
Scientific range	±10^127	±10^308

DEC64 trades a smaller exponent range for exact decimal arithmetic. Most applications never need exponents beyond ±127.

In ƿit

All numbers in ƿit are DEC64. There is no separate integer type at the language level — the distinction is internal. The is_integer function checks whether a number has no fractional part.

var x = 42        // coefficient: 42, exponent: 0
var y = 3.14      // coefficient: 314, exponent: -2
var z = 1000000   // coefficient: 1, exponent: 6 (normalized)

is_integer(x)     // true
is_integer(y)     // false
1 / 0             // null

Overview

Every value in ƿit is a 64-bit word called a JSValue. The runtime uses LSB (least significant bit) tagging to pack type information directly into the value, avoiding heap allocation for common types.

Tag Encoding

The lowest bits of a JSValue determine its type:

LSB Pattern	Type	Payload
`xxxxxxx0`	Integer	31-bit signed integer in upper bits
`xxxxx001`	Pointer	61-bit aligned heap pointer
`xxxxx101`	Short float	8-bit exponent + 52-bit mantissa
`xxxxxx11`	Special	5-bit tag selects subtype
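A dispatch on these low bits might look like the sketch below. The enum and function names are hypothetical; the special case tests only the low two bits, following the 5-bit special tags listed under Specials, several of which end in 111.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t JSValue;

typedef enum {
  KIND_INTEGER,
  KIND_POINTER,
  KIND_SHORT_FLOAT,
  KIND_SPECIAL
} ValueKind;

// Classify a value by its low bits, checking the shortest tag first.
static ValueKind value_kind(JSValue v) {
  if ((v & 1) == 0) return KIND_INTEGER;  // low bit 0
  if ((v & 3) == 3) return KIND_SPECIAL;  // low bits 11
  if ((v & 7) == 1) return KIND_POINTER;  // low bits 001
  return KIND_SHORT_FLOAT;                // low bits 101
}

// For specials, the low 5 bits select the subtype (for example 00111 for null).
static unsigned special_tag(JSValue v) {
  assert(value_kind(v) == KIND_SPECIAL);
  return (unsigned)(v & 0x1F);
}
```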

Integers

If the least significant bit is 0, the value is an immediate 31-bit signed integer. The integer is stored in the upper bits, extracted via v >> 1.

[integer: 31 bits][0]

Range: -1073741824 to 1073741823. Numbers outside this range are stored as short floats or heap-allocated.
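Encoding and decoding an immediate integer can be sketched as follows. The helper names are hypothetical, and the sketch assumes the integer is sign-extended across the 64-bit word and that the compiler shifts signed values arithmetically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t JSValue;

#define INT_MIN31 (-1073741824)
#define INT_MAX31 1073741823

// True when n fits the 31-bit immediate range.
static bool fits_immediate_int(int64_t n) {
  return n >= INT_MIN31 && n <= INT_MAX31;
}

// Shift left by one so the tag bit (bit 0) is 0.
static JSValue int_to_value(int32_t n) {
  assert(fits_immediate_int(n));
  return (JSValue)(int64_t)n << 1;
}

// Undo the shift; the arithmetic right shift restores the sign.
static int32_t value_to_int(JSValue v) {
  assert((v & 1) == 0);
  return (int32_t)((int64_t)v >> 1);
}
```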

Pointers

If the lowest 3 bits are 001, the value is a pointer to a heap object. The pointer is 8-byte aligned, so the low 3 bits are available for the tag. The actual address is extracted by clearing the low 3 bits.

[pointer: 61 bits][001]

All heap objects (arrays, records, blobs, text, functions, etc.) are referenced through pointer-tagged JSValues.

Short Floats

If the lowest 3 bits are 101, the value encodes a floating-point number directly. The format uses an 8-bit exponent (bias 127) and 52-bit mantissa, similar to IEEE 754 but with reduced range.

[sign: 1][exponent: 8][mantissa: 52][101]

Range: approximately ±3.4 * 10^38. Numbers outside this range fall back to null. Zero is always positive zero.

Specials

If the lowest 2 bits are 11, the next 3 bits select a special type:

5-bit Tag	Value
`00011`	Boolean (true/false in upper bits)
`00111`	Null
`01111`	Exception marker
`10111`	Uninitialized
`11011`	Immediate string
`11111`	Catch offset

Immediate Strings

Short ASCII strings (up to 7 characters) are packed directly into the JSValue without heap allocation:

[char6][char5][char4][char3][char2][char1][char0][length: 3][11011]

Each character occupies 8 bits. The length (0-7) is stored in bits 5-7. Only ASCII characters (0-127) qualify — any non-ASCII character forces heap allocation.

var s = "hello"   // 5 chars, fits in immediate string
var t = ""         // immediate (length 0)
var u = "longtext" // 8 chars, heap-allocated
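The packing rule can be sketched in C as below. The function names are hypothetical; the bit positions follow the layout above, with the 5-bit tag in bits 0-4, the length in bits 5-7, and char0 in bits 8-15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t JSValue;
#define TAG_IMMEDIATE_STRING 0x1B  // binary 11011

// Try to pack an ASCII string of up to 7 characters. Returns false when the
// string is too long or contains a non-ASCII byte (the heap-allocated case).
static bool pack_immediate_string(const char *s, JSValue *out) {
  size_t len = strlen(s);
  if (len > 7) return false;
  JSValue v = TAG_IMMEDIATE_STRING | ((JSValue)len << 5);
  for (size_t i = 0; i < len; i++) {
    unsigned char c = (unsigned char)s[i];
    if (c > 127) return false;
    v |= (JSValue)c << (8 + 8 * i);  // char0 sits just above the length
  }
  *out = v;
  return true;
}

// Unpack into a buffer of at least 8 bytes; returns the length.
static size_t unpack_immediate_string(JSValue v, char buf[8]) {
  assert((v & 0x1F) == TAG_IMMEDIATE_STRING);
  size_t len = (size_t)((v >> 5) & 7);
  for (size_t i = 0; i < len; i++)
    buf[i] = (char)((v >> (8 + 8 * i)) & 0xFF);
  buf[len] = '\0';
  return len;
}
```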

Null

Null is encoded as a special-tagged value with tag 00111. There is no undefined in ƿit — only null.

var x = null       // special tag null
var y = 1 / 0      // also null (division by zero)
var z = {}.missing // null (missing field)

Boolean

True and false are encoded as specials with tag 00011, distinguished by a bit in the upper payload.

Summary

The tagging scheme ensures that the most common values — small integers, booleans, null, and short strings — require zero heap allocation. This significantly reduces GC pressure and improves cache locality.

Object Header

Every heap-allocated object begins with a 64-bit header word (objhdr_t):

[capacity: 56 bits][flags: 5 bits][type: 3 bits]

Type Field (bits 0-2)

Value	Type	Description
0	`OBJ_ARRAY`	Dynamic array of JSValues
1	`OBJ_BLOB`	Binary data (bits)
2	`OBJ_TEXT`	Unicode text string
3	`OBJ_RECORD`	Key-value object with prototype chain
4	`OBJ_FUNCTION`	Function (C, bytecode, register, or mcode)
5	`OBJ_CODE`	Compiled bytecode
6	`OBJ_FRAME`	Stack frame for closures
7	`OBJ_FORWARD`	Forwarding pointer (GC)

Flags (bits 3-7)

Bit 3 (S) — Stone flag. If set, the object is immutable and excluded from GC.
Bit 4 (P) — Properties flag.
Bit 5 (A) — Array flag.
Bit 7 (R) — Reserved.

Capacity (bits 8-63)

The interpretation of the 56-bit capacity field depends on the object type.
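The three header fields can be pulled apart with shifts and masks. The accessor names below are hypothetical; only the bit positions come from the layout described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t objhdr_t;

// Bits 0-2: object type.
static unsigned hdr_type(objhdr_t h) { return (unsigned)(h & 0x7); }

// Bit 3: stone flag (S).
static bool hdr_is_stone(objhdr_t h) { return ((h >> 3) & 1) != 0; }

// Bits 8-63: capacity, interpreted per object type.
static uint64_t hdr_capacity(objhdr_t h) { return h >> 8; }

// Build a header from its parts; flags holds bits 3-7 shifted down to 0-4,
// so a flags value of 1 sets the stone flag.
static objhdr_t hdr_make(unsigned type, unsigned flags, uint64_t capacity) {
  assert(type < 8 && flags < 32 && capacity < (UINT64_C(1) << 56));
  return (capacity << 8) | ((objhdr_t)flags << 3) | type;
}
```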

Array

struct JSArray {
  objhdr_t header;     // type=0, capacity=element slots
  word_t len;          // current number of elements
  JSValue values[];    // inline flexible array
};

Capacity is the number of JSValue slots allocated. Length is the number currently in use. Arrays grow by reallocating with a larger capacity.

Blob

struct JSBlob {
  objhdr_t header;     // type=1, capacity=allocated bits
  word_t length;       // length in bits
  uint8_t bits[];      // bit-packed data
};

Blobs are bit-addressable. The length field tracks the exact number of bits written. A blob starts as antestone (mutable) for writing, then becomes stone (immutable) for reading.

Text

struct JSText {
  objhdr_t header;     // type=2, capacity=character slots
  word_t length;       // length in codepoints (or hash if stoned)
  word_t packed[];     // two UTF-32 chars per 64-bit word
};

Text is stored as UTF-32, with two 32-bit codepoints packed per 64-bit word. When a text object is stoned, the length field is repurposed to cache the hash value (computed via fash64), since stoned text is immutable and the hash never changes.
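Indexing into the packed words is a divide and a shift. In this sketch the accessor names are hypothetical, and placing even-indexed codepoints in the low 32 bits of each word is an assumption; the source does not say which half comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;

// Read codepoint i. Assumption: even indices occupy the low 32 bits.
static uint32_t text_get(const word_t *packed, size_t i) {
  return (uint32_t)(packed[i / 2] >> ((i & 1) * 32));
}

// Overwrite codepoint i, leaving its neighbour in the same word untouched.
static void text_set(word_t *packed, size_t i, uint32_t cp) {
  assert(cp <= 0x10FFFF);  // highest valid Unicode codepoint
  unsigned shift = (unsigned)(i & 1) * 32;
  word_t mask = (word_t)0xFFFFFFFFu << shift;
  packed[i / 2] = (packed[i / 2] & ~mask) | ((word_t)cp << shift);
}
```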

Record

struct JSRecord {
  objhdr_t header;     // type=3, capacity=hash table slots
  JSRecord *proto;     // prototype chain pointer
  word_t len;          // number of entries
  slot slots[];        // key-value pairs (hash table)
};

Records use a hash table with linear probing. Slot 0 is reserved for internal metadata (class ID and record ID). Empty slots use JS_NULL as the key; deleted slots use JS_EXCEPTION as a tombstone.
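A lookup with linear probing and tombstones might look like the following sketch. It makes several assumptions that are not in the source: the concrete bit patterns for JS_NULL and JS_EXCEPTION (special tags with zero upper bits), keys compared by word equality, a hash supplied by the caller, and usable slots stored at indices 1 through capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t JSValue;
#define JS_NULL      ((JSValue)0x07)  // special tag 00111 (assumed encoding)
#define JS_EXCEPTION ((JSValue)0x0F)  // special tag 01111 (assumed encoding)

typedef struct { JSValue key, value; } slot;

// Find key among slots[1..capacity]; slot 0 is reserved for metadata.
// Returns the slot index, or 0 when the key is absent.
static size_t record_find(const slot *slots, size_t capacity, JSValue key, uint64_t hash) {
  assert(capacity > 0);
  for (size_t n = 0; n < capacity; n++) {
    size_t i = 1 + (size_t)((hash + n) % capacity);  // probe, wrapping around
    if (slots[i].key == JS_NULL) return 0;           // empty slot ends the chain
    if (slots[i].key == key) return i;               // tombstones fall through
  }
  return 0;
}

// Update an existing key, otherwise claim the first empty or tombstoned slot.
// Returns the slot index, or 0 when the table is full.
static size_t record_insert(slot *slots, size_t capacity, JSValue key, JSValue value, uint64_t hash) {
  size_t i = record_find(slots, capacity, key, hash);
  for (size_t n = 0; i == 0 && n < capacity; n++) {
    size_t j = 1 + (size_t)((hash + n) % capacity);
    if (slots[j].key == JS_NULL || slots[j].key == JS_EXCEPTION) i = j;
  }
  if (i != 0) { slots[i].key = key; slots[i].value = value; }
  return i;
}
```

Deleting a key by writing the tombstone rather than JS_NULL keeps later entries on the same probe chain reachable, which is why the two markers are distinct.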

The prototype chain is a linked list of JSRecord pointers, traversed during property lookup.

Function

struct JSFunction {
  objhdr_t header;     // type=4
  JSValue name;        // function name
  int16_t length;      // arity (-1 for variadic)
  uint8_t kind;        // C, bytecode, register, or mcode
  union {
    struct { ... } cfunc;      // C function pointer
    struct { ... } bytecode;   // bytecode + frame
    struct { ... } regvm;      // register VM code
    struct { ... } mcode;      // mcode IR
  } u;
};

The kind field selects which union variant is active. Functions can be implemented in C (native), bytecode (stack VM), register code (mach VM), or mcode (JSON interpreter).

Frame

struct JSFrame {
  objhdr_t header;     // type=6, capacity=slot count
  JSValue function;    // owning function
  JSValue caller;      // parent frame
  uint32_t return_pc;  // return address
  JSValue slots[];     // [this][args][captured][locals][temps]
};

Frames capture the execution context for closures. The slots array contains the function’s this binding, arguments, captured upvalues, local variables, and temporaries. Frames are linked via the caller field for upvalue resolution across closure depth.

Forwarding Pointer

[pointer: 61 bits][111]

During garbage collection, when an object is copied to the new heap, the old header is replaced with a forwarding pointer to the new location. This is type 7 (OBJ_FORWARD) and stores the new address in bits 3-63. See Garbage Collection for details.

Object Sizing

All objects are aligned to 8 bytes. The total size in bytes for each type, before rounding up to that alignment:

Type	Size
Array	`8 + 8 + capacity * 8`
Blob	`8 + 8 + ceil(capacity / 8)`
Text	`8 + 8 + ceil(capacity / 2) * 8`
Record	`8 + 8 + 8 + (capacity + 1) * 16`
Function	`sizeof(JSFunction)` (fixed)
Code	`sizeof(JSFunctionBytecode)` (fixed)
Frame	`8 + 8 + 8 + 4 + capacity * 8`
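The formulas translate directly into C. This is a sketch with hypothetical function names; rounding the blob and frame sizes up to a multiple of 8 is an assumption based on the alignment rule stated above, since those two formulas do not always yield a multiple of 8 on their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Round a byte count up to the next multiple of 8.
static size_t align8(size_t n) {
  assert(n <= SIZE_MAX - 7);
  return (n + 7) & ~(size_t)7;
}

// header + len + one 8-byte JSValue per slot
static size_t array_size(size_t capacity) { return 8 + 8 + capacity * 8; }

// header + length + one byte per 8 bits of capacity
static size_t blob_size(size_t capacity) { return align8(8 + 8 + (capacity + 7) / 8); }

// header + length + one 8-byte word per two characters
static size_t text_size(size_t capacity) { return 8 + 8 + ((capacity + 1) / 2) * 8; }

// header + proto + len + 16-byte slots, including reserved slot 0
static size_t record_size(size_t capacity) { return 8 + 8 + 8 + (capacity + 1) * 16; }

// header + function + caller + return_pc + one JSValue per slot
static size_t frame_size(size_t capacity) { return align8(8 + 8 + 8 + 4 + capacity * 8); }
```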

Overview

Stone memory is a separate allocation arena for immutable values. Objects in stone memory are permanent — they are never moved, never freed, and never touched by the garbage collector.

The stone() function in ƿit petrifies a value, deeply freezing it and all its descendants. Stoned objects have the S bit set in their object header.

The Stone Arena

Stone memory uses bump allocation from a contiguous arena:

stone_base ──────── stone_free ──────── stone_end
[allocated objects] [free space        ]

Allocation advances stone_free forward. When the arena is exhausted, overflow pages are allocated via the system allocator and linked together:

struct StonePage {
  struct StonePage *next;
  size_t size;
  uint8_t data[];
};
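A bump allocator with an overflow path can be sketched as follows. The `StoneArena` struct and `stone_alloc` name are hypothetical, and giving each overflow allocation its own page is a simplification made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct StonePage {
  struct StonePage *next;
  size_t size;
  uint8_t data[];
};

typedef struct {
  uint8_t *stone_base, *stone_free, *stone_end;
  struct StonePage *pages;  // overflow pages, newest first
} StoneArena;

// Allocate size bytes of stone memory, rounded up to 8-byte alignment.
static void *stone_alloc(StoneArena *a, size_t size) {
  assert(a->stone_free <= a->stone_end);
  size = (size + 7) & ~(size_t)7;
  if (size <= (size_t)(a->stone_end - a->stone_free)) {
    void *p = a->stone_free;  // bump: hand out the current free pointer
    a->stone_free += size;
    return p;
  }
  // Arena exhausted: fall back to a linked overflow page.
  struct StonePage *page = malloc(sizeof *page + size);
  if (!page) return NULL;
  page->next = a->pages;
  page->size = size;
  a->pages = page;
  return page->data;
}
```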

The S Bit

Bit 3 of the object header is the stone flag. When set:

The object is immutable — writes disrupt
The object is excluded from GC — the collector skips it entirely
For text objects, the length field caches the hash instead of the character count (since the text cannot change, the hash is computed once and reused)

What Gets Stoned

When stone(value) is called:

If the value is already stone, return immediately
Recursively walk all nested values (array elements, record fields, etc.)
Copy each mutable object into the stone arena
Set the S bit on each copied object
Return the stoned value

The operation is deep — an entire object graph becomes permanently immutable.

Text Interning

The stone arena maintains a hash table for text interning. When a text value is stoned, it is looked up in the intern table. If an identical string already exists in stone memory, the existing one is reused. This deduplicates strings and makes equality comparison O(1) for stoned text.

The hash is computed with fash64 over the packed UTF-32 words.

Usage Patterns

Module Return Values

Every module’s return value is automatically stoned:

// config.cm
return {
  debug: true,
  timeout: 30
}
// The returned object is stone — shared safely between actors

Message Passing

Messages between actors are stoned before delivery, ensuring actors never share mutable state.

Constants

Literal objects and arrays that can be determined at compile time may be allocated directly in stone memory.

Relationship to GC

The Cheney copying collector operates only on the mutable heap. When the collector encounters a pointer to stone memory (S bit set), it leaves the pointer unchanged: stone objects never move, and because stoning is deep they reference only other stone objects. Stone memory therefore behaves like a permanent root set that adds no GC work.

Overview

ƿit uses a Cheney copying collector for automatic memory management. Each actor has its own independent heap — actors never share mutable memory, so garbage collection is per-actor with no global pauses.

Algorithm

The Cheney algorithm is a two-space copying collector:

Allocate new space — a fresh memory block for the new heap
Copy roots — copy all live root objects from old space to new space
Scan — walk the new space, updating all internal references
Free old space — the entire old heap is freed at once

Copying and Forwarding

When an object is copied from old space to new space:

The object’s data is copied to the next free position in new space
The old object’s header is overwritten with a forwarding pointer (OBJ_FORWARD) containing the new address
Future references to the old address find the forwarding pointer and follow it to the new location

Old space:                 New space:
┌──────────────┐          ┌──────────────┐
│ OBJ_FORWARD ─┼────────> │ copied object│
│ (new addr)   │          │              │
└──────────────┘          └──────────────┘

Scan Phase

After roots are copied, the collector scans new space linearly. For each object, it examines every JSValue field:

If the field points to old space, copy the referenced object (or follow its forwarding pointer if already copied)
If the field points to stone memory, skip it (stone objects are permanent)
If the field is an immediate value (integer, boolean, null, immediate string), skip it

The scan continues until the scan pointer catches up with the allocation pointer — at that point, all live objects have been found and copied.
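The copy, forward, and scan steps fit in a short toy collector. The sketch below is not the runtime's collector: it uses word indices instead of real addresses, treats every field as a value, keeps only the field count in the header's capacity bits, and leaves out stone memory. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t word_t;
enum { OBJ_ARRAY = 0, OBJ_FORWARD = 7 };

typedef struct {
  word_t *from, *to;  // old space and new space
  size_t alloc;       // next free word index in new space
  size_t capacity;    // size of new space in words
} Gc;

static int is_ptr(word_t v) { return (v & 7) == 1; }
static word_t make_ptr(size_t index) { return ((word_t)index << 3) | 1; }

// Copy one object, or follow its forwarding pointer, and return the new reference.
static word_t gc_copy(Gc *gc, word_t v) {
  if (!is_ptr(v)) return v;  // immediates are left alone
  word_t *old = gc->from + (size_t)(v >> 3);
  if ((old[0] & 7) == OBJ_FORWARD)  // already copied
    return make_ptr((size_t)(old[0] >> 3));
  size_t nfields = (size_t)(old[0] >> 8);
  size_t new_index = gc->alloc;
  assert(new_index + nfields + 1 <= gc->capacity);
  memcpy(gc->to + new_index, old, (nfields + 1) * sizeof(word_t));
  gc->alloc += nfields + 1;
  old[0] = ((word_t)new_index << 3) | OBJ_FORWARD;  // leave a forwarding pointer
  return make_ptr(new_index);
}

// Copy the roots, then scan new space until the scan pointer reaches alloc.
static void gc_collect(Gc *gc, word_t *roots, size_t nroots) {
  gc->alloc = 0;
  for (size_t i = 0; i < nroots; i++) roots[i] = gc_copy(gc, roots[i]);
  size_t scan = 0;
  while (scan < gc->alloc) {
    size_t nfields = (size_t)(gc->to[scan] >> 8);
    for (size_t f = 1; f <= nfields; f++)
      gc->to[scan + f] = gc_copy(gc, gc->to[scan + f]);
    scan += nfields + 1;
  }
}
```

Because new space doubles as the work queue, no separate mark stack is needed, and cycles terminate as soon as a forwarding pointer is found.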

Roots

The collector traces from these root sources:

Global object — all global variables
Class prototypes — built-in type prototypes
Exception — the current exception value
Value stack — all values on the operand stack
Frame stack — all stack frames (bytecode and register VM)
GC reference stack — manually registered roots (via JS_PUSH_VALUE / JS_POP_VALUE)
Parser constant pool — during compilation, constants being built

Per-Actor Heaps

Each actor maintains its own heap with independent collection:

No stop-the-world pauses across actors
No synchronization between collectors
Each actor’s GC runs at the end of a turn (between message deliveries)
Heap sizes adapt independently based on each actor’s allocation patterns

Heap Growth

The collector uses a buddy allocator for heap blocks. After each collection, if less than 20% of the heap was recovered, the next block size is doubled. The new space size is: max(live_estimate + alloc_size, next_block_size).
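The sizing rule can be written as a small function. This sketch covers only the doubling rule and the max() formula; the buddy allocator itself is not modeled, and the struct, function, and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  size_t next_block_size;  // minimum size of the next heap block
} HeapPolicy;

// Decide the size of the next new space after a collection.
// heap_size: bytes in the space just collected; recovered: bytes freed;
// live_estimate: bytes that survived; alloc_size: the pending allocation.
static size_t next_space_size(HeapPolicy *p, size_t heap_size, size_t recovered,
                              size_t live_estimate, size_t alloc_size) {
  assert(recovered <= heap_size);
  if (recovered * 5 < heap_size)  // less than 20% recovered
    p->next_block_size *= 2;
  size_t needed = live_estimate + alloc_size;
  return needed > p->next_block_size ? needed : p->next_block_size;
}
```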

All allocations within a heap block use bump allocation (advance a pointer), which is extremely fast.

Alignment

All objects are aligned to 8-byte boundaries. Object sizes are rounded up to ensure this alignment, which guarantees that the low 3 bits of any heap pointer are always zero — available for JSValue tag bits.

Interaction with Stone Memory

Stone memory objects (S bit set) are never copied by the collector. When the scanner encounters a pointer to stone memory, it leaves it unchanged. This means:

Stone objects are effectively permanent GC roots
No overhead for tracing through immutable object graphs
Module return values and interned strings impose zero GC cost

Overview

The bytecode VM is a stack-based virtual machine. Instructions operate on an implicit operand stack, pushing and popping values. This is the original execution backend for ƿit.
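To make the stack discipline concrete, here is a toy interpreter loop over a handful of the opcodes described in this section. The opcode numbering, the 32-bit cell encoding, and the fixed 16-entry stack are all assumptions for illustration; the real bytecode format is not specified here.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical opcode numbering; only the names come from this document.
enum { OP_push_i32, OP_dup, OP_swap, OP_drop, OP_add, OP_mul, OP_return };

// Run a program of 32-bit cells: an opcode, followed by an operand for push_i32.
static int32_t run(const int32_t *code) {
  int32_t stack[16];
  int sp = 0;  // number of values currently on the stack
  for (;;) {
    switch (*code++) {
    case OP_push_i32: assert(sp < 16); stack[sp++] = *code++; break;
    case OP_dup:      assert(sp >= 1 && sp < 16); stack[sp] = stack[sp - 1]; sp++; break;
    case OP_swap: {
      assert(sp >= 2);
      int32_t t = stack[sp - 1];
      stack[sp - 1] = stack[sp - 2];
      stack[sp - 2] = t;
      break;
    }
    case OP_drop:   assert(sp >= 1); sp--; break;
    case OP_add:    assert(sp >= 2); stack[sp - 2] += stack[sp - 1]; sp--; break;
    case OP_mul:    assert(sp >= 2); stack[sp - 2] *= stack[sp - 1]; sp--; break;
    case OP_return: assert(sp >= 1); return stack[sp - 1];
    default:        assert(0 && "unknown opcode"); return 0;
    }
  }
}
```

The expression (2 + 3) * 4 compiles to push 2, push 3, add, push 4, mul, return: operands never name a location, they are always the top of the stack.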

Compilation Pipeline

Source → Tokenize → Parse (AST) → Bytecode → Link → Execute

The compiler emits JSFunctionBytecode objects containing opcode sequences, constant pools, and debug information.

Instruction Categories

Value Loading

Opcode	Description
`push_i32`	Push a 32-bit immediate integer
`push_const`	Push a value from the constant pool
`null`	Push null
`push_false`	Push false
`push_true`	Push true

Stack Manipulation

Opcode	Description
`drop`	Remove top of stack
`dup`	Duplicate top of stack
`dup1` / `dup2` / `dup3`	Duplicate item at depth
`swap`	Swap top two items
`rot3l` / `rot3r`	Rotate top three items
`insert2` / `insert3`	Insert top item deeper
`nip`	Remove second item

Variable Access

Opcode	Description
`get_var`	Load variable by name (pre-link)
`put_var`	Store variable by name (pre-link)
`get_loc` / `put_loc`	Access local variable by index
`get_arg` / `put_arg`	Access function argument by index
`get_env_slot` / `set_env_slot`	Access closure variable (post-link)
`get_global_slot` / `set_global_slot`	Access global variable (post-link)

Variable access opcodes are patched during linking. get_var instructions are rewritten to get_loc, get_env_slot, or get_global_slot depending on where the variable is resolved.

Arithmetic

Opcode	Description
`add` / `sub` / `mul` / `div`	Basic arithmetic
`mod` / `pow`	Modulo and power
`neg` / `inc` / `dec`	Unary operations
`add_loc` / `inc_loc` / `dec_loc`	Optimized local variable update

Comparison and Logic

Opcode	Description
`strict_eq` / `strict_neq`	Equality (ƿit uses strict only)
`lt` / `lte` / `gt` / `gte`	Ordered comparison
`not` / `lnot`	Bitwise / logical not
`and` / `or` / `xor`	Bitwise operations

Control Flow

Opcode	Description
`goto`	Unconditional jump
`if_true` / `if_false`	Conditional jump
`goto8` / `goto16`	Short jumps (size-optimized)
`if_true8` / `if_false8`	Short conditional jumps
`catch`	Set exception handler

Function Calls

Opcode	Description
`call`	Call function with N arguments
`tail_call`	Tail-call optimization
`call_method`	Call method on object
`return`	Return value from function
`return_undef`	Return null from function
`throw`	Throw exception (disrupt)

Property Access

Opcode	Description
`get_field`	Get named property
`put_field`	Set named property
`get_array_el`	Get computed property
`put_array_el`	Set computed property
`define_field`	Define property during object literal

Object Creation

Opcode	Description
`object`	Create new empty object
`array_from`	Create array from stack values

Bytecode Patching

During the link/integrate phase, symbolic variable references are resolved to concrete access instructions. This is a critical optimization — the interpreter does not perform name lookups at runtime.

A get_var "x" instruction becomes:

get_loc 3 — if x is local variable at index 3
get_env_slot 1, 5 — if x is captured from outer scope (depth 1, slot 5)
get_global_slot 7 — if x is a global

Overview

The Mach VM is a register-based virtual machine using 32-bit instructions. It is modeled after Lua’s register VM — operands are register indices rather than stack positions, reducing instruction count and improving performance.

Instruction Formats

All instructions are 32 bits wide. Four encoding formats are used; the instruction tables also write iA for instructions that use only the A operand.

iABC — Three-Register

[op: 8][A: 8][B: 8][C: 8]

Used for operations on three registers: R(A) = R(B) op R(C).

iABx — Register + Constant

[op: 8][A: 8][Bx: 16]

Used for loading constants: R(A) = K(Bx).

iAsBx — Register + Signed Offset

[op: 8][A: 8][sBx: 16]

Used for conditional jumps: if R(A) then jump by sBx.

isJ — Signed Jump

[op: 8][sJ: 24]

Used for unconditional jumps with a 24-bit signed offset.
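Encoding and decoding the four formats is a matter of shifts and masks. The sketch assumes the fields are laid out from the most significant byte in the order drawn, and that signed fields are stored in two's complement; neither detail is confirmed by the source, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MachInstr32;

// iABC: [op][A][B][C], op in the top byte (assumed bit order).
static MachInstr32 make_abc(uint8_t op, uint8_t a, uint8_t b, uint8_t c) {
  return ((uint32_t)op << 24) | ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
}

// iAsBx: [op][A][sBx]; iABx shares the layout with an unsigned low field.
static MachInstr32 make_asbx(uint8_t op, uint8_t a, int16_t sbx) {
  return ((uint32_t)op << 24) | ((uint32_t)a << 16) | (uint16_t)sbx;
}

// isJ: [op][sJ], a 24-bit signed offset.
static MachInstr32 make_sj(uint8_t op, int32_t sj) {
  assert(sj >= -(1 << 23) && sj < (1 << 23));
  return ((uint32_t)op << 24) | ((uint32_t)sj & 0xFFFFFF);
}

static uint8_t  get_op(MachInstr32 i) { return (uint8_t)(i >> 24); }
static uint8_t  get_a(MachInstr32 i)  { return (uint8_t)(i >> 16); }
static uint8_t  get_b(MachInstr32 i)  { return (uint8_t)(i >> 8); }
static uint8_t  get_c(MachInstr32 i)  { return (uint8_t)i; }
static uint16_t get_bx(MachInstr32 i) { return (uint16_t)i; }

// Sign-extend the 16-bit and 24-bit fields without relying on signed shifts.
static int32_t get_sbx(MachInstr32 i) {
  int32_t v = (int32_t)(i & 0xFFFF);
  return v >= 0x8000 ? v - 0x10000 : v;
}
static int32_t get_sj(MachInstr32 i) {
  int32_t v = (int32_t)(i & 0xFFFFFF);
  return v >= 0x800000 ? v - 0x1000000 : v;
}
```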

Registers

Each function frame has a fixed number of register slots, determined at compile time. Registers hold:

R(0) — this binding
R(1)..R(arity) — function arguments
R(arity+1).. — local variables and temporaries

Instruction Set

Loading

Opcode	Format	Description
`LOADK`	iABx	`R(A) = K(Bx)` — load from constant pool
`LOADI`	iAsBx	`R(A) = sBx` — load small integer
`LOADNULL`	iA	`R(A) = null`
`LOADTRUE`	iA	`R(A) = true`
`LOADFALSE`	iA	`R(A) = false`
`MOVE`	iABC	`R(A) = R(B)` — register copy

Arithmetic

Opcode	Format	Description
`ADD`	iABC	`R(A) = R(B) + R(C)`
`SUB`	iABC	`R(A) = R(B) - R(C)`
`MUL`	iABC	`R(A) = R(B) * R(C)`
`DIV`	iABC	`R(A) = R(B) / R(C)`
`MOD`	iABC	`R(A) = R(B) % R(C)`
`POW`	iABC	`R(A) = R(B) ^ R(C)`
`NEG`	iABC	`R(A) = -R(B)`
`INC`	iABC	`R(A) = R(B) + 1`
`DEC`	iABC	`R(A) = R(B) - 1`

Comparison

Opcode	Format	Description
`EQ`	iABC	`R(A) = R(B) == R(C)`
`NEQ`	iABC	`R(A) = R(B) != R(C)`
`LT`	iABC	`R(A) = R(B) < R(C)`
`LE`	iABC	`R(A) = R(B) <= R(C)`
`GT`	iABC	`R(A) = R(B) > R(C)`
`GE`	iABC	`R(A) = R(B) >= R(C)`

Property Access

Opcode	Format	Description
`GETFIELD`	iABC	`R(A) = R(B)[K(C)]` — named property
`SETFIELD`	iABC	`R(A)[K(B)] = R(C)` — set named property
`GETINDEX`	iABC	`R(A) = R(B)[R(C)]` — computed property
`SETINDEX`	iABC	`R(A)[R(B)] = R(C)` — set computed property

Variable Resolution

Opcode	Format	Description
`GETNAME`	iABx	Unresolved variable (compiler placeholder)
`GETINTRINSIC`	iABx	Global intrinsic / built-in
`GETENV`	iABx	Module environment variable
`GETUP`	iABC	`R(A) = UpFrame(B).slots[C]` — closure upvalue
`SETUP`	iABC	`UpFrame(A).slots[B] = R(C)` — set closure upvalue

Control Flow

Opcode	Format	Description
`JMP`	isJ	Unconditional jump
`JMPTRUE`	iAsBx	Jump if `R(A)` is true
`JMPFALSE`	iAsBx	Jump if `R(A)` is false
`JMPNULL`	iAsBx	Jump if `R(A)` is null

Function Calls

Opcode	Format	Description
`CALL`	iABC	Call `R(A)` with `B` args starting at `R(A+1)`, `C`=keep result
`RETURN`	iA	Return `R(A)`
`RETNIL`	—	Return null
`CLOSURE`	iABx	Create closure from function pool entry `Bx`

Object / Array

Opcode	Format	Description
`NEWOBJECT`	iA	`R(A) = {}`
`NEWARRAY`	iABC	`R(A) = array(B)`
`PUSH`	iABC	Push `R(B)` to array `R(A)`

JSCodeRegister

The compiled output for a function:

struct JSCodeRegister {
  uint16_t arity;           // argument count
  uint16_t nr_slots;        // total register count
  uint32_t cpool_count;     // constant pool size
  JSValue *cpool;           // constant pool
  uint32_t instr_count;     // instruction count
  MachInstr32 *instructions; // 32-bit instruction array
  uint32_t func_count;      // nested function count
  JSCodeRegister **functions; // nested function table
  JSValue name;             // function name
  uint16_t disruption_pc;   // exception handler offset
};

The constant pool holds all non-immediate values referenced by LOADK instructions: strings, large numbers, and other constants.

Overview

Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.

Pipeline

Source → Tokenize → Parse (AST) → Mcode (JSON) → Interpret
                                                → Compile to Mach (planned)
                                                → Compile to native (planned)

Mcode is produced by the JS_Mcode compiler pass, which emits a cJSON tree. The mcode interpreter walks this tree directly, dispatching on instruction name strings.

JSMCode Structure

struct JSMCode {
  uint16_t nr_args;        // argument count
  uint16_t nr_slots;       // register count
  cJSON **instrs;          // pre-flattened instruction array
  uint32_t instr_count;    // number of instructions

  struct {
    const char *name;      // label name
    uint32_t index;        // instruction index
  } *labels;
  uint32_t label_count;

  struct JSMCode **functions; // nested functions
  uint32_t func_count;

  cJSON *json_root;        // keeps JSON alive
  const char *name;        // function name
  const char *filename;    // source file
  uint16_t disruption_pc;  // exception handler offset
};

Instruction Format

Each instruction is a JSON array. The first element is the instruction name (string), followed by operands:

["LOADK", 0, 42]
["ADD", 2, 0, 1]
["JMPFALSE", 3, "else_label"]
["CALL", 0, 2, 1]

The instruction set mirrors the Mach VM opcodes — same operations, same register semantics, but with string dispatch instead of numeric opcodes.

Labels

Control flow uses named labels instead of numeric offsets:

["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
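Label collection can be sketched without a JSON library by reducing each instruction to its name and label operand. The types and function names here are hypothetical, and a linear scan stands in for the name-to-index map to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Simplified instruction: only the name and an optional label operand.
typedef struct { const char *name; const char *label; } Instr;
typedef struct { const char *name; uint32_t index; } Label;

// Record every LABEL instruction with its instruction index. Returns the count.
static uint32_t collect_labels(const Instr *code, uint32_t n, Label *out) {
  uint32_t count = 0;
  for (uint32_t i = 0; i < n; i++)
    if (strcmp(code[i].name, "LABEL") == 0)
      out[count++] = (Label){ code[i].label, i };
  return count;
}

// Resolve a label name to its instruction index, or -1 if it is unknown.
static int64_t resolve_label(const Label *labels, uint32_t count, const char *name) {
  assert(name != NULL);
  for (uint32_t i = 0; i < count; i++)
    if (strcmp(labels[i].name, name) == 0) return labels[i].index;
  return -1;
}
```

Applied to the loop example above, "loop_start" resolves to index 0 and "loop_end" to index 4, so both jumps can be taken without searching the instruction array.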

Differences from Mach

Property	Mcode	Mach
Instructions	cJSON arrays	32-bit binary
Dispatch	String comparison	Switch on opcode byte
Constants	Inline in JSON	Separate constant pool
Jump targets	Named labels	Numeric offsets
Memory	Heap (cJSON nodes)	Off-heap (malloc)

Purpose

Mcode serves as an inspectable, debuggable intermediate format:

Human-readable — the JSON representation can be printed and examined
Language-independent — any tool that produces the correct JSON can target the ƿit runtime
Compilation target — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation

The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.