Language Specification

Overview

ƿit uses DEC64 as its number format. DEC64 represents a number as coefficient * 10^exponent packed into a 64-bit word. Because the base is 10, decimal fractions are stored exactly instead of being approximated as they are in IEEE 754 binary floating point: 0.1 + 0.2 is exactly 0.3.

DEC64 was designed by Douglas Crockford as a general-purpose number type suitable for both business and scientific computation.

Format

A DEC64 number is a 64-bit value:

[coefficient: 56 bits][exponent: 8 bits]
  • Coefficient — a 56-bit signed integer (two’s complement)
  • Exponent — an 8-bit signed integer (range: -127 to 127)

The value of a DEC64 number is: coefficient * 10^exponent

Examples

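| Value | Coefficient | Exponent | Hex |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0000000000000000 |
| 1 | 1 | 0 | 0000000000000100 |
| 3.14159 | 314159 | -5 | 0000000004CB2FFB |
| -1 | -1 | 0 | FFFFFFFFFFFFFF00 |
| 1000000 | 1 | 6 | 0000000000000106 |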
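
The layout above can be sketched in C as a pair of shift-and-mask helpers. This is an illustrative sketch only: the names dec64_pack, dec64_exponent, and dec64_coefficient are hypothetical and are not taken from the ƿit runtime.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dec64;

/* Pack a coefficient (must fit in 56 signed bits) and an exponent
   (-127..127) into one word: coefficient in the upper 56 bits,
   exponent in the low 8 bits. */
static dec64 dec64_pack(int64_t coefficient, int exponent) {
    assert(exponent >= -127 && exponent <= 127);
    assert(coefficient >= -(INT64_C(1) << 55) && coefficient < (INT64_C(1) << 55));
    return ((uint64_t)coefficient << 8) | (uint8_t)exponent;
}

/* Low 8 bits, reinterpreted as a signed byte. */
static int dec64_exponent(dec64 v) {
    return (int8_t)(v & 0xFF);
}

/* Upper 56 bits as a signed integer. Dividing the cleared word by 256
   avoids relying on arithmetic right shift of negative values. */
static int64_t dec64_coefficient(dec64 v) {
    return (int64_t)(v & ~UINT64_C(0xFF)) / 256;
}
```

With these helpers, dec64_pack(314159, -5) produces 0x0000000004CB2FFB: the coefficient 0x4CB2F shifted left by 8, with the exponent byte 0xFB in the low bits.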

Special Values

Null

The exponent 0x80 (-128) indicates null. This is the only special value — there is no infinity, no NaN, no negative zero. Operations that would produce undefined results (such as division by zero) return null.

coefficient: any, exponent: 0x80  →  null

Arithmetic Properties

  • Exact decimals: All decimal fractions with up to 16 significant digits are represented exactly (the 56-bit signed coefficient also holds 17-digit values up to about 3.6 * 10^16)
  • No binary rounding: 0.1 + 0.2 == 0.3 is true
  • Integer range: Exact integers up to 2^55 (about 3.6 * 10^16)
  • Normalized on demand: The runtime normalizes coefficients to remove trailing zeros when needed for comparison
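
To illustrate why 0.1 + 0.2 compares equal to 0.3, here is a minimal sketch of decimal addition and normalized comparison on an unpacked (coefficient, exponent) pair. The struct and function names are hypothetical, and overflow and null handling are deliberately left out; it is not the runtime's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Unpacked form for readability. */
typedef struct { int64_t coef; int exp; } dec;

/* Strip trailing decimal zeros so equal values get equal fields. */
static dec dec_normalize(dec a) {
    if (a.coef == 0) { a.exp = 0; return a; }
    while (a.coef % 10 == 0) { a.coef /= 10; a.exp += 1; }
    return a;
}

/* Add by rescaling the operand with the larger exponent down to the
   smaller one, then adding coefficients. */
static dec dec_add(dec a, dec b) {
    int gap = a.exp > b.exp ? a.exp - b.exp : b.exp - a.exp;
    assert(gap <= 15); /* sketch limit: larger gaps risk overflowing the rescaled coefficient */
    while (a.exp > b.exp) { a.coef *= 10; a.exp -= 1; }
    while (b.exp > a.exp) { b.coef *= 10; b.exp -= 1; }
    dec sum = { a.coef + b.coef, a.exp };
    return sum;
}

/* Compare after normalization, so 30 * 10^-2 equals 3 * 10^-1. */
static int dec_equal(dec a, dec b) {
    a = dec_normalize(a);
    b = dec_normalize(b);
    return a.coef == b.coef && a.exp == b.exp;
}
```

Here 0.1 is (1, -1) and 0.2 is (2, -1), so the sum is (3, -1) with no approximation at any step.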

Comparison with IEEE 754

| Property | DEC64 | IEEE 754 double |
| --- | --- | --- |
| Decimal fractions | Exact | Approximate |
| Significant digits | ~16 | ~15-16 |
| Special values | null only | NaN, ±Infinity, -0 |
| Rounding errors | None (decimal) | Common |
| Financial arithmetic | Correct | Requires libraries |
| Scientific range | ±10^127 | ±10^308 |

DEC64 trades a smaller exponent range for exact decimal arithmetic. Most applications never need exponents beyond ±127.

In ƿit

All numbers in ƿit are DEC64. There is no separate integer type at the language level — the distinction is internal. The is_integer function checks whether a number has no fractional part.

var x = 42        // coefficient: 42, exponent: 0
var y = 3.14      // coefficient: 314, exponent: -2
var z = 1000000   // coefficient: 1, exponent: 6 (normalized)

is_integer(x)     // true
is_integer(y)     // false
1 / 0             // null

Overview

Every value in ƿit is a 64-bit word called a JSValue. The runtime uses LSB (least significant bit) tagging to pack type information directly into the value, avoiding heap allocation for common types.

Tag Encoding

The lowest bits of a JSValue determine its type:

| LSB Pattern | Type | Payload |
| --- | --- | --- |
| xxxxxxx0 | Integer | 31-bit signed integer in upper bits |
| xxxxx001 | Pointer | 61-bit aligned heap pointer |
| xxxxx101 | Short float | 8-bit exponent + 52-bit mantissa |
| xxxxxx11 | Special | 5-bit tag selects subtype |
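
A tag check along these lines could be written as a small classifier. The sketch below is an assumption-laden illustration: the enum and function names are hypothetical, and specials are recognized by their low two bits being 11, as the Specials section describes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t JSValue;

/* Hypothetical names for the four top-level kinds. */
typedef enum { KIND_INT, KIND_POINTER, KIND_SHORT_FLOAT, KIND_SPECIAL } value_kind;

static value_kind js_kind(JSValue v) {
    if ((v & 1) == 0) return KIND_INT;        /* xxxxxxx0 */
    switch (v & 7) {
    case 1:  return KIND_POINTER;             /* xxxxx001 */
    case 5:  return KIND_SHORT_FLOAT;         /* xxxxx101 */
    default: return KIND_SPECIAL;             /* low two bits are 11 */
    }
}

/* The 5-bit tag that selects the special subtype (00111 for null). */
static unsigned js_special_tag(JSValue v) {
    assert(js_kind(v) == KIND_SPECIAL);
    return (unsigned)(v & 0x1F);
}
```

Testing the integer bit first keeps the most common case to a single mask and compare.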

Integers

If the least significant bit is 0, the value is an immediate 31-bit signed integer. The integer is stored in the upper bits, extracted via v >> 1.

[integer: 31 bits][0]

Range: -1073741824 to 1073741823. Numbers outside this range are stored as short floats or heap-allocated.
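
As a sketch of the integer encoding, the helpers below box and unbox an immediate integer. The names are hypothetical, and the sketch assumes the payload is sign-extended across the whole 64-bit word; the runtime's actual placement of the 31 bits may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t JSValue;

#define JS_INT_MIN (-1073741824)   /* -2^30 */
#define JS_INT_MAX 1073741823      /*  2^30 - 1 */

/* True if n can be stored as an immediate integer. */
static int js_fits_int(int64_t n) {
    return n >= JS_INT_MIN && n <= JS_INT_MAX;
}

/* Shift left by one so the low tag bit is 0. */
static JSValue js_box_int(int32_t n) {
    assert(js_fits_int(n));
    return (JSValue)(int64_t)n << 1;
}

/* Inverse of js_box_int. Division by 2 is exact because the low bit
   is 0, and it avoids a right shift of a negative signed value. */
static int32_t js_unbox_int(JSValue v) {
    assert((v & 1) == 0);
    return (int32_t)((int64_t)v / 2);
}
```

A caller would check js_fits_int first and fall back to a short float or heap number when it returns 0.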

Pointers

If the lowest 3 bits are 001, the value is a pointer to a heap object. The pointer is 8-byte aligned, so the low 3 bits are available for the tag. The actual address is extracted by clearing the low 3 bits.

[pointer: 61 bits][001]

All heap objects (arrays, records, blobs, text, functions, etc.) are referenced through pointer-tagged JSValues.

Short Floats

If the lowest 3 bits are 101, the value encodes a floating-point number directly. The format uses an 8-bit exponent (bias 127) and 52-bit mantissa, similar to IEEE 754 but with reduced range.

[sign: 1][exponent: 8][mantissa: 52][101]

Range: approximately ±3.4 * 10^38. Numbers outside this range fall back to null. Zero is always positive zero.

Specials

If the lowest 2 bits are 11, the next 3 bits select a special type:

| 5-bit Tag | Value |
| --- | --- |
| 00011 | Boolean (true/false in upper bits) |
| 00111 | Null |
| 01111 | Exception marker |
| 10111 | Uninitialized |
| 11011 | Immediate string |
| 11111 | Catch offset |

Immediate Strings

Short ASCII strings (up to 7 characters) are packed directly into the JSValue without heap allocation:

[char6][char5][char4][char3][char2][char1][char0][length: 3][11011]

Each character occupies 8 bits. The length (0-7) is stored in bits 5-7. Only ASCII characters (0-127) qualify — any non-ASCII character forces heap allocation.

var s = "hello"   // 5 chars, fits in immediate string
var t = ""         // immediate (length 0)
var u = "longtext" // 8 chars, heap-allocated
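
The packing rule can be sketched in C as follows. The function names are hypothetical; the bit positions follow the layout shown above (tag in bits 0-4, length in bits 5-7, char0 starting at bit 8).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t JSValue;
#define TAG_IMM_STRING 0x1B   /* 11011 */

/* Try to pack s as an immediate string. Returns 1 and writes *out on
   success, 0 if s is longer than 7 bytes or contains a non-ASCII byte. */
static int imm_string_pack(const char *s, JSValue *out) {
    size_t len = strlen(s);
    if (len > 7) return 0;
    JSValue v = TAG_IMM_STRING | ((JSValue)len << 5);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 127) return 0;
        v |= (JSValue)c << (8 + 8 * i);   /* char0 sits just above the length */
    }
    *out = v;
    return 1;
}

static size_t imm_string_length(JSValue v) {
    assert((v & 0x1F) == TAG_IMM_STRING);
    return (size_t)((v >> 5) & 7);
}

/* Copy the characters into buf (at least 8 bytes) and NUL-terminate. */
static void imm_string_unpack(JSValue v, char buf[8]) {
    size_t len = imm_string_length(v);
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)((v >> (8 + 8 * i)) & 0xFF);
    buf[len] = '\0';
}
```

When imm_string_pack returns 0, the caller would allocate a heap text object instead.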

Null

Null is encoded as a special-tagged value with tag 00111. There is no undefined in ƿit — only null.

var x = null       // special tag null
var y = 1 / 0      // also null (division by zero)
var z = {}.missing // null (missing field)

Boolean

True and false are encoded as specials with tag 00011, distinguished by a bit in the upper payload.

Summary

The tagging scheme ensures that the most common values — small integers, booleans, null, and short strings — require zero heap allocation. This significantly reduces GC pressure and improves cache locality.

Object Header

Every heap-allocated object begins with a 64-bit header word (objhdr_t):

[capacity: 56 bits][flags: 5 bits][type: 3 bits]

Type Field (bits 0-2)

| Value | Type | Description |
| --- | --- | --- |
| 0 | OBJ_ARRAY | Dynamic array of JSValues |
| 1 | OBJ_BLOB | Binary data (bits) |
| 2 | OBJ_TEXT | Unicode text string |
| 3 | OBJ_RECORD | Key-value object with prototype chain |
| 4 | OBJ_FUNCTION | Function (C, bytecode, register, or mcode) |
| 5 | OBJ_CODE | Compiled bytecode |
| 6 | OBJ_FRAME | Stack frame for closures |
| 7 | OBJ_FORWARD | Forwarding pointer (GC) |

Flags (bits 3-7)

  • Bit 3 (S) — Stone flag. If set, the object is immutable and excluded from GC.
  • Bit 4 (P) — Properties flag.
  • Bit 5 (A) — Array flag.
  • Bit 7 (R) — Reserved.

Capacity (bits 8-63)

The interpretation of the 56-bit capacity field depends on the object type.
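
The three header fields can be read and written with plain shifts and masks. The sketch below uses hypothetical accessor names; only objhdr_t, the type constants, and the bit positions come from the text.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t objhdr_t;

enum { OBJ_ARRAY, OBJ_BLOB, OBJ_TEXT, OBJ_RECORD,
       OBJ_FUNCTION, OBJ_CODE, OBJ_FRAME, OBJ_FORWARD };

#define OBJHDR_STONE (UINT64_C(1) << 3)   /* bit 3, the S flag */

/* Build a header from a type (bits 0-2), the stone flag (bit 3),
   and a capacity (bits 8-63). Other flag bits are left clear. */
static objhdr_t objhdr_make(unsigned type, uint64_t capacity, int stone) {
    assert(type <= 7 && capacity < (UINT64_C(1) << 56));
    return (capacity << 8) | (stone ? OBJHDR_STONE : 0) | type;
}

static unsigned objhdr_type(objhdr_t h)     { return (unsigned)(h & 7); }
static int      objhdr_is_stone(objhdr_t h) { return (h & OBJHDR_STONE) != 0; }
static uint64_t objhdr_capacity(objhdr_t h) { return h >> 8; }
```

For example, a stoned text object with capacity 10 has the header 0xA0A: capacity 10 in bits 8-63, the S bit set, and type 2.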

Array

struct JSArray {
  objhdr_t header;     // type=0, capacity=element slots
  word_t len;          // current number of elements
  JSValue values[];    // inline flexible array
};

Capacity is the number of JSValue slots allocated. Length is the number currently in use. Arrays grow by reallocating with a larger capacity.

Blob

struct JSBlob {
  objhdr_t header;     // type=1, capacity=allocated bits
  word_t length;       // length in bits
  uint8_t bits[];      // bit-packed data
};

Blobs are bit-addressable. The length field tracks the exact number of bits written. A blob starts as antestone (mutable) for writing, then becomes stone (immutable) for reading.

Text

struct JSText {
  objhdr_t header;     // type=2, capacity=character slots
  word_t length;       // length in codepoints (or hash if stoned)
  word_t packed[];     // two UTF-32 chars per 64-bit word
};

Text is stored as UTF-32, with two 32-bit codepoints packed per 64-bit word. When a text object is stoned, the length field is repurposed to cache the hash value (computed via fash64), since stoned text is immutable and the hash never changes.

Record

struct JSRecord {
  objhdr_t header;     // type=3, capacity=hash table slots
  JSRecord *proto;     // prototype chain pointer
  word_t len;          // number of entries
  slot slots[];        // key-value pairs (hash table)
};

Records use a hash table with linear probing. Slot 0 is reserved for internal metadata (class ID and record ID). Empty slots use JS_NULL as the key; deleted slots use JS_EXCEPTION as a tombstone.

The prototype chain is a linked list of JSRecord pointers, traversed during property lookup.

Function

struct JSFunction {
  objhdr_t header;     // type=4
  JSValue name;        // function name
  int16_t length;      // arity (-1 for variadic)
  uint8_t kind;        // C, bytecode, register, or mcode
  union {
    struct { ... } cfunc;      // C function pointer
    struct { ... } bytecode;   // bytecode + frame
    struct { ... } regvm;      // register VM code
    struct { ... } mcode;      // mcode IR
  } u;
};

The kind field selects which union variant is active. Functions can be implemented in C (native), bytecode (stack VM), register code (mach VM), or mcode (JSON interpreter).

Frame

struct JSFrame {
  objhdr_t header;     // type=6, capacity=slot count
  JSValue function;    // owning function
  JSValue caller;      // parent frame
  uint32_t return_pc;  // return address
  JSValue slots[];     // [this][args][captured][locals][temps]
};

Frames capture the execution context for closures. The slots array contains the function’s this binding, arguments, captured upvalues, local variables, and temporaries. Frames are linked via the caller field for upvalue resolution across closure depth.

Forwarding Pointer

[pointer: 61 bits][111]

During garbage collection, when an object is copied to the new heap, the old header is replaced with a forwarding pointer to the new location. This is type 7 (OBJ_FORWARD) and stores the new address in bits 3-63. See Garbage Collection for details.

Object Sizing

All objects are aligned to 8 bytes. The total size in bytes for each type:

| Type | Size |
| --- | --- |
| Array | 8 + 8 + capacity * 8 |
| Blob | 8 + 8 + ceil(capacity / 8) |
| Text | 8 + 8 + ceil(capacity / 2) * 8 |
| Record | 8 + 8 + 8 + (capacity + 1) * 16 |
| Function | sizeof(JSFunction) (fixed) |
| Code | sizeof(JSFunctionBytecode) (fixed) |
| Frame | 8 + 8 + 8 + 4 + capacity * 8 |
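
The variable-size formulas can be expressed as one sizing function. This is a sketch under assumptions: the names obj_size and align8 are hypothetical, only the four capacity-driven types with unambiguous formulas are covered, and the result is rounded up to the 8-byte alignment stated above.

```c
#include <assert.h>
#include <stddef.h>

enum { OBJ_ARRAY, OBJ_BLOB, OBJ_TEXT, OBJ_RECORD };

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* Size in bytes of an object of the given type and capacity. */
static size_t obj_size(int type, size_t capacity) {
    size_t n = 0;
    switch (type) {
    case OBJ_ARRAY:  n = 8 + 8 + capacity * 8;             break;
    case OBJ_BLOB:   n = 8 + 8 + (capacity + 7) / 8;       break; /* capacity is in bits */
    case OBJ_TEXT:   n = 8 + 8 + ((capacity + 1) / 2) * 8; break; /* two codepoints per word */
    case OBJ_RECORD: n = 8 + 8 + 8 + (capacity + 1) * 16;  break; /* +1 for reserved slot 0 */
    default: assert(!"type not covered by this sketch");
    }
    return align8(n);
}
```

Only the blob formula can produce a size that is not already a multiple of 8, so align8 matters mainly there.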

Overview

Stone memory is a separate allocation arena for immutable values. Objects in stone memory are permanent — they are never moved, never freed, and never touched by the garbage collector.

The stone() function in ƿit petrifies a value, deeply freezing it and all its descendants. Stoned objects have the S bit set in their object header.

The Stone Arena

Stone memory uses bump allocation from a contiguous arena:

stone_base ──────── stone_free ──────── stone_end
[allocated objects] [free space        ]

Allocation advances stone_free forward. When the arena is exhausted, overflow pages are allocated via the system allocator and linked together:

struct StonePage {
  struct StonePage *next;
  size_t size;
  uint8_t data[];
};
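
A bump allocator with an overflow fallback might look like the sketch below. StoneArena and stone_alloc are hypothetical names, and for brevity the sketch creates one overflow page per overflowing request, which is a simplification and not necessarily how the runtime sizes its pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct StonePage {
    struct StonePage *next;
    size_t size;
    uint8_t data[];
};

/* Hypothetical container for the three arena pointers. */
typedef struct {
    uint8_t *stone_base, *stone_free, *stone_end;
    struct StonePage *overflow;   /* linked list of overflow pages */
} StoneArena;

static void *stone_alloc(StoneArena *a, size_t size) {
    assert(a->stone_free <= a->stone_end);
    size = (size + 7) & ~(size_t)7;               /* keep 8-byte alignment */
    if ((size_t)(a->stone_end - a->stone_free) >= size) {
        void *p = a->stone_free;                  /* bump: hand out the current position */
        a->stone_free += size;
        return p;
    }
    /* Arena exhausted: fall back to the system allocator. */
    struct StonePage *page = malloc(sizeof *page + size);
    if (!page) return NULL;
    page->next = a->overflow;
    page->size = size;
    a->overflow = page;
    return page->data;
}
```

Because stone objects are never freed individually, the allocator needs no free list; the fast path is a comparison and an addition.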

The S Bit

Bit 3 of the object header is the stone flag. When set:

  • The object is immutable — writes disrupt
  • The object is excluded from GC — the collector skips it entirely
  • For text objects, the length field caches the hash instead of the character count (since the text cannot change, the hash is computed once and reused)

What Gets Stoned

When stone(value) is called:

  1. If the value is already stone, return immediately
  2. Recursively walk all nested values (array elements, record fields, etc.)
  3. Copy each mutable object into the stone arena
  4. Set the S bit on each copied object
  5. Return the stoned value

The operation is deep — an entire object graph becomes permanently immutable.

Text Interning

The stone arena maintains a hash table for text interning. When a text value is stoned, it is looked up in the intern table. If an identical string already exists in stone memory, the existing one is reused. This deduplicates strings and reduces equality comparison between stoned texts to an O(1) pointer comparison.

The hash is computed with fash64 over the packed UTF-32 words.

Usage Patterns

Module Return Values

Every module’s return value is automatically stoned:

// config.cm
return {
  debug: true,
  timeout: 30
}
// The returned object is stone — shared safely between actors

Message Passing

Messages between actors are stoned before delivery, ensuring actors never share mutable state.

Constants

Literal objects and arrays that can be determined at compile time may be allocated directly in stone memory.

Relationship to GC

The Cheney copying collector only operates on the mutable heap. During collection, when the collector encounters a pointer to stone memory (S bit set), it skips it — stone objects are roots that never move. This means stone memory acts as a permanent root set with zero GC overhead.

Overview

ƿit uses a Cheney copying collector for automatic memory management. Each actor has its own independent heap — actors never share mutable memory, so garbage collection is per-actor with no global pauses.

Algorithm

The Cheney algorithm is a two-space copying collector:

  1. Allocate new space — a fresh memory block for the new heap
  2. Copy roots — copy all live root objects from old space to new space
  3. Scan — walk the new space, updating all internal references
  4. Free old space — the entire old heap is freed at once

Copying and Forwarding

When an object is copied from old space to new space:

  1. The object’s data is copied to the next free position in new space
  2. The old object’s header is overwritten with a forwarding pointer (OBJ_FORWARD) containing the new address
  3. Future references to the old address find the forwarding pointer and follow it to the new location

Old space:                 New space:
┌──────────────┐          ┌──────────────┐
│ OBJ_FORWARD ─┼────────> │ copied object│
│ (new addr)   │          │              │
└──────────────┘          └──────────────┘

Scan Phase

After roots are copied, the collector scans new space linearly. For each object, it examines every JSValue field:

  • If the field points to old space, copy the referenced object (or follow its forwarding pointer if already copied)
  • If the field points to stone memory, skip it (stone objects are permanent)
  • If the field is an immediate value (integer, boolean, null, immediate string), skip it

The scan continues until the scan pointer catches up with the allocation pointer — at that point, all live objects have been found and copied.
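
The copy, forward, and scan steps can be sketched as a toy collector. Everything here is a simplification made for illustration: objects are a header word followed by JSValue fields, the header holds the field count in bits 8-63, new space is a fixed array, and stone objects are not modeled. None of the names are taken from the runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t word_t;
enum { OBJ_ARRAY = 0, OBJ_FORWARD = 7 };

static word_t new_space[64];
static size_t alloc_top;   /* allocation pointer, in words */

static int     is_pointer(word_t v)   { return (v & 7) == 1; }
static word_t *address_of(word_t v)   { return (word_t *)(uintptr_t)(v & ~(word_t)7); }
static word_t  tag_pointer(word_t *p) { return (word_t)(uintptr_t)p | 1; }

/* Return the new-space value for v, copying its target on first visit. */
static word_t forward(word_t v) {
    if (!is_pointer(v)) return v;                 /* immediates are left alone */
    word_t *old = address_of(v);
    if ((old[0] & 7) == OBJ_FORWARD)              /* already copied: follow it */
        return tag_pointer(address_of(old[0]));
    size_t words = 1 + (size_t)(old[0] >> 8);     /* header + fields */
    assert(alloc_top + words <= 64);
    word_t *copy = &new_space[alloc_top];
    memcpy(copy, old, words * sizeof(word_t));
    alloc_top += words;
    old[0] = (word_t)(uintptr_t)copy | OBJ_FORWARD;   /* leave a forwarding pointer */
    return tag_pointer(copy);
}

/* Copy the roots, then scan new space until scan catches up with alloc_top. */
static void collect(word_t *roots, size_t nroots) {
    size_t scan = 0;
    alloc_top = 0;
    for (size_t i = 0; i < nroots; i++) roots[i] = forward(roots[i]);
    while (scan < alloc_top) {
        word_t *obj = &new_space[scan];
        size_t fields = (size_t)(obj[0] >> 8);
        for (size_t f = 1; f <= fields; f++) obj[f] = forward(obj[f]);
        scan += 1 + fields;
    }
}
```

Note that new space doubles as the work queue: objects between scan and alloc_top are copied but not yet scanned, so no separate mark stack is needed, and cycles terminate because a second visit finds the forwarding pointer.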

Roots

The collector traces from these root sources:

  • Global object — all global variables
  • Class prototypes — built-in type prototypes
  • Exception — the current exception value
  • Value stack — all values on the operand stack
  • Frame stack — all stack frames (bytecode and register VM)
  • GC reference stack — manually registered roots (via JS_PUSH_VALUE / JS_POP_VALUE)
  • Parser constant pool — during compilation, constants being built

Per-Actor Heaps

Each actor maintains its own heap with independent collection:

  • No stop-the-world pauses across actors
  • No synchronization between collectors
  • Each actor’s GC runs at the end of a turn (between message deliveries)
  • Heap sizes adapt independently based on each actor’s allocation patterns

Heap Growth

The collector uses a buddy allocator for heap blocks. After each collection, if less than 20% of the heap was recovered, the next block size is doubled. The new space size is: max(live_estimate + alloc_size, next_block_size).
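
The growth rule reduces to two small functions. In this sketch HeapPolicy, heap_after_gc, and new_space_size are hypothetical names, sizes are in bytes, and the buddy allocator itself is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-actor sizing state. */
typedef struct { size_t next_block_size; } HeapPolicy;

/* After a collection: double the next block size if less than 20% of
   the heap was recovered. */
static void heap_after_gc(HeapPolicy *p, size_t heap_size, size_t recovered) {
    assert(recovered <= heap_size);
    if (recovered * 5 < heap_size)        /* recovered / heap_size < 0.20 */
        p->next_block_size *= 2;
}

/* max(live_estimate + alloc_size, next_block_size) */
static size_t new_space_size(const HeapPolicy *p, size_t live_estimate, size_t alloc_size) {
    size_t needed = live_estimate + alloc_size;
    return needed > p->next_block_size ? needed : p->next_block_size;
}
```

The first term of the max guarantees that the pending allocation fits even when the live set alone exceeds the planned block size.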

All allocations within a heap block use bump allocation: an allocation is a bounds check followed by a pointer increment.

Alignment

All objects are aligned to 8-byte boundaries. Object sizes are rounded up to ensure this alignment, which guarantees that the low 3 bits of any heap pointer are always zero — available for JSValue tag bits.

Interaction with Stone Memory

Stone memory objects (S bit set) are never copied by the collector. When the scanner encounters a pointer to stone memory, it leaves it unchanged. This means:

  • Stone objects are effectively permanent GC roots
  • No overhead for tracing through immutable object graphs
  • Module return values and interned strings impose zero GC cost

Overview

The bytecode VM is a stack-based virtual machine. Instructions operate on an implicit operand stack, pushing and popping values. This is the original execution backend for ƿit.

Compilation Pipeline

Source → Tokenize → Parse (AST) → Bytecode → Link → Execute

The compiler emits JSFunctionBytecode objects containing opcode sequences, constant pools, and debug information.

Instruction Categories

Value Loading

| Opcode | Description |
| --- | --- |
| push_i32 | Push a 32-bit immediate integer |
| push_const | Push a value from the constant pool |
| null | Push null |
| push_false | Push false |
| push_true | Push true |

Stack Manipulation

| Opcode | Description |
| --- | --- |
| drop | Remove top of stack |
| dup | Duplicate top of stack |
| dup1 / dup2 / dup3 | Duplicate item at depth |
| swap | Swap top two items |
| rot3l / rot3r | Rotate top three items |
| insert2 / insert3 | Insert top item deeper |
| nip | Remove second item |

Variable Access

| Opcode | Description |
| --- | --- |
| get_var | Load variable by name (pre-link) |
| put_var | Store variable by name (pre-link) |
| get_loc / put_loc | Access local variable by index |
| get_arg / put_arg | Access function argument by index |
| get_env_slot / set_env_slot | Access closure variable (post-link) |
| get_global_slot / set_global_slot | Access global variable (post-link) |

Variable access opcodes are patched during linking. get_var instructions are rewritten to get_loc, get_env_slot, or get_global_slot depending on where the variable is resolved.

Arithmetic

| Opcode | Description |
| --- | --- |
| add / sub / mul / div | Basic arithmetic |
| mod / pow | Modulo and power |
| neg / inc / dec | Unary operations |
| add_loc / inc_loc / dec_loc | Optimized local variable update |
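
To show how stack instructions compose, here is a toy interpreter for a handful of the opcodes listed so far. The opcode names come from the tables, but the numeric encodings, the plain int operands (instead of JSValues), and the run function are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Numeric values are made up for this sketch. */
enum { OP_push_i32, OP_dup, OP_swap, OP_drop, OP_add, OP_sub, OP_mul, OP_return };

/* Execute code until OP_return and yield the top of stack. */
static int32_t run(const int32_t *code) {
    int32_t stack[32];
    int sp = 0;                       /* number of values on the stack */
    for (;;) {
        switch (*code++) {
        case OP_push_i32: assert(sp < 32); stack[sp++] = *code++; break;
        case OP_dup:      assert(sp >= 1 && sp < 32); stack[sp] = stack[sp - 1]; sp++; break;
        case OP_swap: {
            assert(sp >= 2);
            int32_t t = stack[sp - 1];
            stack[sp - 1] = stack[sp - 2];
            stack[sp - 2] = t;
            break;
        }
        case OP_drop:   assert(sp >= 1); sp--; break;
        case OP_add:    assert(sp >= 2); stack[sp - 2] += stack[sp - 1]; sp--; break;
        case OP_sub:    assert(sp >= 2); stack[sp - 2] -= stack[sp - 1]; sp--; break;
        case OP_mul:    assert(sp >= 2); stack[sp - 2] *= stack[sp - 1]; sp--; break;
        case OP_return: assert(sp >= 1); return stack[sp - 1];
        default:        assert(!"unknown opcode"); return 0;
        }
    }
}
```

The program push_i32 2, push_i32 3, add, dup, mul, return computes (2 + 3) * (2 + 3): operands never name a location, they are always the top of the stack.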

Comparison and Logic

| Opcode | Description |
| --- | --- |
| strict_eq / strict_neq | Equality (ƿit uses strict only) |
| lt / lte / gt / gte | Ordered comparison |
| not / lnot | Logical / bitwise not |
| and / or / xor | Bitwise operations |

Control Flow

| Opcode | Description |
| --- | --- |
| goto | Unconditional jump |
| if_true / if_false | Conditional jump |
| goto8 / goto16 | Short jumps (size-optimized) |
| if_true8 / if_false8 | Short conditional jumps |
| catch | Set exception handler |

Function Calls

| Opcode | Description |
| --- | --- |
| call | Call function with N arguments |
| tail_call | Tail-call optimization |
| call_method | Call method on object |
| return | Return value from function |
| return_undef | Return null from function |
| throw | Throw exception (disrupt) |

Property Access

| Opcode | Description |
| --- | --- |
| get_field | Get named property |
| put_field | Set named property |
| get_array_el | Get computed property |
| put_array_el | Set computed property |
| define_field | Define property during object literal |

Object Creation

| Opcode | Description |
| --- | --- |
| object | Create new empty object |
| array_from | Create array from stack values |

Bytecode Patching

During the link/integrate phase, symbolic variable references are resolved to concrete access instructions. This is a critical optimization — the interpreter does not perform name lookups at runtime.

A get_var "x" instruction becomes:

  • get_loc 3 — if x is local variable at index 3
  • get_env_slot 1, 5 — if x is captured from outer scope (depth 1, slot 5)
  • get_global_slot 7 — if x is a global

Overview

The Mach VM is a register-based virtual machine using 32-bit instructions. It is modeled after Lua’s register VM — operands are register indices rather than stack positions, reducing instruction count and improving performance.

Instruction Formats

All instructions are 32 bits wide. Four encoding formats are used:

iABC — Three-Register

[op: 8][A: 8][B: 8][C: 8]

Used for operations on three registers: R(A) = R(B) op R(C).

iABx — Register + Constant

[op: 8][A: 8][Bx: 16]

Used for loading constants: R(A) = K(Bx).

iAsBx — Register + Signed Offset

[op: 8][A: 8][sBx: 16]

Used for conditional jumps: if R(A) then jump by sBx.

isJ — Signed Jump

[op: 8][sJ: 24]

Used for unconditional jumps with a 24-bit signed offset.
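
The four formats can be sketched as encode and decode helpers. This sketch assumes that the leftmost field in each diagram occupies the most significant bits (so op is bits 24-31) and that signed fields use two's complement; the function names are hypothetical, and the runtime may encode signed offsets differently.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MachInstr32;

static MachInstr32 mach_iABC(unsigned op, unsigned a, unsigned b, unsigned c) {
    assert(op < 256 && a < 256 && b < 256 && c < 256);
    return ((uint32_t)op << 24) | ((uint32_t)a << 16) | ((uint32_t)b << 8) | (uint32_t)c;
}

static MachInstr32 mach_iABx(unsigned op, unsigned a, unsigned bx) {
    assert(op < 256 && a < 256 && bx < 65536);
    return ((uint32_t)op << 24) | ((uint32_t)a << 16) | (uint32_t)bx;
}

static MachInstr32 mach_iAsBx(unsigned op, unsigned a, int sbx) {
    assert(sbx >= -32768 && sbx <= 32767);
    return mach_iABx(op, a, (uint16_t)sbx);      /* keep the low 16 bits */
}

static MachInstr32 mach_isJ(unsigned op, int32_t sj) {
    assert(op < 256 && sj >= -(1 << 23) && sj < (1 << 23));
    return ((uint32_t)op << 24) | ((uint32_t)sj & 0xFFFFFF);
}

static unsigned mach_op(MachInstr32 i) { return i >> 24; }
static unsigned mach_A(MachInstr32 i)  { return (i >> 16) & 0xFF; }
static unsigned mach_B(MachInstr32 i)  { return (i >> 8) & 0xFF; }
static unsigned mach_C(MachInstr32 i)  { return i & 0xFF; }
static unsigned mach_Bx(MachInstr32 i) { return i & 0xFFFF; }
static int      mach_sBx(MachInstr32 i) { return (int16_t)(i & 0xFFFF); }

static int32_t mach_sJ(MachInstr32 i) {
    int32_t v = (int32_t)(i & 0xFFFFFF);
    return (v & 0x800000) ? v - 0x1000000 : v;   /* sign-extend 24 bits */
}
```

Since every format keeps op in the same position, the interpreter can dispatch on mach_op before it knows which of the four layouts applies.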

Registers

Each function frame has a fixed number of register slots, determined at compile time. Registers hold:

  • R(0) — this binding
  • R(1)..R(arity) — function arguments
  • R(arity+1).. — local variables and temporaries

Instruction Set

Loading

| Opcode | Format | Description |
| --- | --- | --- |
| LOADK | iABx | R(A) = K(Bx) — load from constant pool |
| LOADI | iAsBx | R(A) = sBx — load small integer |
| LOADNULL | iA | R(A) = null |
| LOADTRUE | iA | R(A) = true |
| LOADFALSE | iA | R(A) = false |
| MOVE | iABC | R(A) = R(B) — register copy |

Arithmetic

| Opcode | Format | Description |
| --- | --- | --- |
| ADD | iABC | R(A) = R(B) + R(C) |
| SUB | iABC | R(A) = R(B) - R(C) |
| MUL | iABC | R(A) = R(B) * R(C) |
| DIV | iABC | R(A) = R(B) / R(C) |
| MOD | iABC | R(A) = R(B) % R(C) |
| POW | iABC | R(A) = R(B) ^ R(C) |
| NEG | iABC | R(A) = -R(B) |
| INC | iABC | R(A) = R(B) + 1 |
| DEC | iABC | R(A) = R(B) - 1 |

Comparison

| Opcode | Format | Description |
| --- | --- | --- |
| EQ | iABC | R(A) = R(B) == R(C) |
| NEQ | iABC | R(A) = R(B) != R(C) |
| LT | iABC | R(A) = R(B) < R(C) |
| LE | iABC | R(A) = R(B) <= R(C) |
| GT | iABC | R(A) = R(B) > R(C) |
| GE | iABC | R(A) = R(B) >= R(C) |

Property Access

| Opcode | Format | Description |
| --- | --- | --- |
| GETFIELD | iABC | R(A) = R(B)[K(C)] — named property |
| SETFIELD | iABC | R(A)[K(B)] = R(C) — set named property |
| GETINDEX | iABC | R(A) = R(B)[R(C)] — computed property |
| SETINDEX | iABC | R(A)[R(B)] = R(C) — set computed property |

Variable Resolution

| Opcode | Format | Description |
| --- | --- | --- |
| GETNAME | iABx | Unresolved variable (compiler placeholder) |
| GETINTRINSIC | iABx | Global intrinsic / built-in |
| GETENV | iABx | Module environment variable |
| GETUP | iABC | R(A) = UpFrame(B).slots[C] — closure upvalue |
| SETUP | iABC | UpFrame(A).slots[B] = R(C) — set closure upvalue |

Control Flow

| Opcode | Format | Description |
| --- | --- | --- |
| JMP | isJ | Unconditional jump |
| JMPTRUE | iAsBx | Jump if R(A) is true |
| JMPFALSE | iAsBx | Jump if R(A) is false |
| JMPNULL | iAsBx | Jump if R(A) is null |

Function Calls

| Opcode | Format | Description |
| --- | --- | --- |
| CALL | iABC | Call R(A) with B args starting at R(A+1), C=keep result |
| RETURN | iA | Return R(A) |
| RETNIL | | Return null |
| CLOSURE | iABx | Create closure from function pool entry Bx |

Object / Array

| Opcode | Format | Description |
| --- | --- | --- |
| NEWOBJECT | iA | R(A) = {} |
| NEWARRAY | iABC | R(A) = array(B) |
| PUSH | iABC | Push R(B) to array R(A) |

JSCodeRegister

The compiled output for a function:

struct JSCodeRegister {
  uint16_t arity;           // argument count
  uint16_t nr_slots;        // total register count
  uint32_t cpool_count;     // constant pool size
  JSValue *cpool;           // constant pool
  uint32_t instr_count;     // instruction count
  MachInstr32 *instructions; // 32-bit instruction array
  uint32_t func_count;      // nested function count
  JSCodeRegister **functions; // nested function table
  JSValue name;             // function name
  uint16_t disruption_pc;   // exception handler offset
};

The constant pool holds all non-immediate values referenced by LOADK instructions: strings, large numbers, and other constants.

Overview

Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.

Pipeline

Source → Tokenize → Parse (AST) → Mcode (JSON) → Interpret
                                                → Compile to Mach (planned)
                                                → Compile to native (planned)

Mcode is produced by the JS_Mcode compiler pass, which emits a cJSON tree. The mcode interpreter walks this tree directly, dispatching on instruction name strings.

JSMCode Structure

struct JSMCode {
  uint16_t nr_args;        // argument count
  uint16_t nr_slots;       // register count
  cJSON **instrs;          // pre-flattened instruction array
  uint32_t instr_count;    // number of instructions

  struct {
    const char *name;      // label name
    uint32_t index;        // instruction index
  } *labels;
  uint32_t label_count;

  struct JSMCode **functions; // nested functions
  uint32_t func_count;

  cJSON *json_root;        // keeps JSON alive
  const char *name;        // function name
  const char *filename;    // source file
  uint16_t disruption_pc;  // exception handler offset
};

Instruction Format

Each instruction is a JSON array. The first element is the instruction name (string), followed by operands:

["LOADK", 0, 42]
["ADD", 2, 0, 1]
["JMPFALSE", 3, "else_label"]
["CALL", 0, 2, 1]

The instruction set mirrors the Mach VM opcodes — same operations, same register semantics, but with string dispatch instead of numeric opcodes.

Labels

Control flow uses named labels instead of numeric offsets:

["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
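
Label collection can be sketched as a single pass over the instruction array. To stay self-contained the sketch replaces cJSON nodes with a simplified Instr struct, and it resolves names with a linear search where the runtime would use a map for O(1) lookup; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a cJSON instruction: the instruction name
   plus its label operand, if it has one. */
typedef struct { const char *name; const char *label; } Instr;

typedef struct { const char *name; uint32_t index; } Label;

/* Record every LABEL instruction with the index it occupies.
   Returns the number of labels found. */
static uint32_t collect_labels(const Instr *code, uint32_t n, Label *out, uint32_t max) {
    uint32_t count = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (strcmp(code[i].name, "LABEL") == 0) {
            assert(count < max);
            out[count].name = code[i].label;
            out[count].index = i;
            count++;
        }
    }
    return count;
}

/* Map a label name to its instruction index, or UINT32_MAX if absent.
   Linear search for brevity only. */
static uint32_t resolve_label(const Label *labels, uint32_t count, const char *name) {
    for (uint32_t i = 0; i < count; i++)
        if (strcmp(labels[i].name, name) == 0) return labels[i].index;
    return UINT32_MAX;
}
```

With the table built once at load time, a jump such as ["JMP", "loop_start"] only needs one lookup to find its target index.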

Differences from Mach

| Property | Mcode | Mach |
| --- | --- | --- |
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |

Purpose

Mcode serves as an inspectable, debuggable intermediate format:

  • Human-readable — the JSON representation can be printed and examined
  • Language-independent — any tool that produces the correct JSON can target the ƿit runtime
  • Compilation target — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation

The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.