Language Specification
Overview
ƿit uses DEC64 as its number format. DEC64 represents numbers as coefficient * 10^exponent in a 64-bit word. This eliminates the rounding errors that plague IEEE 754 binary floating point — 0.1 + 0.2 is exactly 0.3.
DEC64 was designed by Douglas Crockford as a general-purpose number type suitable for both business and scientific computation.
Format
A DEC64 number is a 64-bit value:
[coefficient: 56 bits][exponent: 8 bits]
- Coefficient — a 56-bit signed integer (two’s complement)
- Exponent — an 8-bit signed integer (range: -127 to 127)
The value of a DEC64 number is: coefficient * 10^exponent
Examples
| Value | Coefficient | Exponent | Hex |
|---|---|---|---|
| 0 | 0 | 0 | 0000000000000000 |
| 1 | 1 | 0 | 0000000000000100 |
| 3.14159 | 314159 | -5 | 0000000004CB2FFB |
| -1 | -1 | 0 | FFFFFFFFFFFFFF00 |
| 1000000 | 1 | 6 | 0000000000000106 |
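The layout can be sketched in C as follows. This is a minimal illustration rather than the runtime's code: the names `dec64_pack`, `dec64_coefficient`, and `dec64_exponent` are hypothetical, and the sketch assumes the coefficient sits in the upper 56 bits and the exponent in the low 8 bits, as the Hex column suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a coefficient and exponent into one word.
   Assumes coefficient in bits 8-63 and exponent in bits 0-7. */
uint64_t dec64_pack(int64_t coefficient, int exponent) {
    assert(exponent >= -127 && exponent <= 127);
    assert(coefficient >= -(INT64_C(1) << 55));
    assert(coefficient < (INT64_C(1) << 55));
    return ((uint64_t)coefficient << 8) | ((uint64_t)exponent & 0xFF);
}

/* Recover the signed 56-bit coefficient without relying on
   implementation-defined right shifts of negative values. */
int64_t dec64_coefficient(uint64_t word) {
    int64_t c = (int64_t)(word >> 8);        /* 56 bits, zero-extended */
    if (c >= (INT64_C(1) << 55)) {
        c -= (INT64_C(1) << 56);             /* sign-extend */
    }
    return c;
}

/* Recover the signed 8-bit exponent (0x80 decodes to -128, the null marker). */
int dec64_exponent(uint64_t word) {
    int e = (int)(word & 0xFF);
    return e >= 128 ? e - 256 : e;
}
```

Packing coefficient 1 with exponent 6 under these assumptions yields 0x0000000000000106, matching the last row of the table.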
Special Values
Null
The exponent 0x80 (-128) indicates null. This is the only special value — there is no infinity, no NaN, no negative zero. Operations that would produce undefined results (such as division by zero) return null.
coefficient: any, exponent: 0x80 → null
Arithmetic Properties
- Exact decimals: All decimal fractions with up to 16 significant digits are represented exactly, as are 17-digit values whose coefficient stays below 2^55
- No rounding: 0.1 + 0.2 == 0.3 is true
- Integer range: Exact integers up to 2^55 (about 3.6 * 10^16)
- Normalized on demand: The runtime normalizes coefficients to remove trailing zeros when needed for comparison
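Normalization for comparison might look like the sketch below. The `Dec` struct, `dec_normalize`, and `dec_equal` are hypothetical names working on an unpacked coefficient/exponent pair, and mapping zero to exponent 0 is an assumption made here so that all zeros compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unpacked form of a DEC64 number. */
typedef struct {
    int64_t coefficient;
    int exponent;
} Dec;

/* Strip trailing decimal zeros from the coefficient, raising the
   exponent by one for each zero removed. */
Dec dec_normalize(Dec d) {
    assert(d.exponent >= -127 && d.exponent <= 127);
    if (d.coefficient == 0) {
        d.exponent = 0;          /* assumption: canonical zero */
        return d;
    }
    while (d.coefficient % 10 == 0 && d.exponent < 127) {
        d.coefficient /= 10;
        d.exponent += 1;
    }
    return d;
}

/* Two numbers are equal when their normalized forms match. */
bool dec_equal(Dec a, Dec b) {
    a = dec_normalize(a);
    b = dec_normalize(b);
    return a.coefficient == b.coefficient && a.exponent == b.exponent;
}
```

With this scheme, 30 * 10^-2 and 3 * 10^-1 both normalize to the same pair, so they compare equal without any rounding.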
Comparison with IEEE 754
| Property | DEC64 | IEEE 754 double |
|---|---|---|
| Decimal fractions | Exact | Approximate |
| Significant digits | ~17 | ~15-16 |
| Special values | null only | NaN, ±Infinity, -0 |
| Rounding errors | None (decimal) | Common |
| Financial arithmetic | Correct | Requires libraries |
| Scientific range | ±10^127 | ±10^308 |
DEC64 trades a smaller exponent range for exact decimal arithmetic. Most applications never need exponents beyond ±127.
In ƿit
All numbers in ƿit are DEC64. There is no separate integer type at the language level — the distinction is internal. The is_integer function checks whether a number has no fractional part.
var x = 42 // coefficient: 42, exponent: 0
var y = 3.14 // coefficient: 314, exponent: -2
var z = 1000000 // coefficient: 1, exponent: 6 (normalized)
is_integer(x) // true
is_integer(y) // false
1 / 0 // null
Overview
Every value in ƿit is a 64-bit word called a JSValue. The runtime uses LSB (least significant bit) tagging to pack type information directly into the value, avoiding heap allocation for common types.
Tag Encoding
The lowest bits of a JSValue determine its type:
| LSB Pattern | Type | Payload |
|---|---|---|
| xxxxxxx0 | Integer | 31-bit signed integer in upper bits |
| xxxxx001 | Pointer | 61-bit aligned heap pointer |
| xxxxx101 | Short float | 8-bit exponent + 52-bit mantissa |
| xxxxxx11 | Special | 5-bit tag selects subtype |
Integers
If the least significant bit is 0, the value is an immediate 31-bit signed integer. The integer is stored in the upper bits, extracted via v >> 1.
[integer: 31 bits][0]
Range: -1073741824 to 1073741823. Numbers outside this range are stored as short floats or heap-allocated.
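A rough C sketch of integer tagging follows. The function names are hypothetical, and the sketch assumes the 31 payload bits occupy bits 1-31 of the word with the upper 32 bits left at zero, which the description does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t JSValue;

#define INT31_MIN (-1073741824)
#define INT31_MAX 1073741823

/* An integer is any value whose least significant bit is 0. */
bool value_is_int(JSValue v) {
    return (v & 1) == 0;
}

/* Store the low 31 bits of n above the tag bit (assumed layout). */
JSValue value_from_int(int32_t n) {
    assert(n >= INT31_MIN && n <= INT31_MAX);
    return (JSValue)((uint32_t)n & 0x7FFFFFFFu) << 1;
}

/* Drop the tag bit, then sign-extend from 31 bits. */
int32_t value_to_int(JSValue v) {
    assert(value_is_int(v));
    int64_t n = (int64_t)((v >> 1) & 0x7FFFFFFF);
    if (n >= 0x40000000) {
        n -= INT64_C(0x80000000);
    }
    return (int32_t)n;
}
```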
Pointers
If the lowest 3 bits are 001, the value is a pointer to a heap object. The pointer is 8-byte aligned, so the low 3 bits are available for the tag. The actual address is extracted by clearing the low 3 bits.
[pointer: 61 bits][001]
All heap objects (arrays, records, blobs, text, functions, etc.) are referenced through pointer-tagged JSValues.
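Pointer tagging can be illustrated with a few lines of C. The helper names and the `TAG_*` constants are made up for this sketch; only the 001 tag and the 8-byte alignment come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t JSValue;

#define TAG_MASK UINT64_C(0x7)   /* low 3 bits */
#define TAG_PTR  UINT64_C(0x1)   /* 001 */

bool value_is_ptr(JSValue v) {
    return (v & TAG_MASK) == TAG_PTR;
}

/* Tag an 8-byte-aligned address; the low 3 bits must be free. */
JSValue value_from_ptr(void *p) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);
    return (JSValue)bits | TAG_PTR;
}

/* Clear the low 3 bits to recover the address. */
void *value_to_ptr(JSValue v) {
    assert(value_is_ptr(v));
    return (void *)(uintptr_t)(v & ~TAG_MASK);
}
```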
Short Floats
If the lowest 3 bits are 101, the value encodes a floating-point number directly. The format uses an 8-bit exponent (bias 127) and 52-bit mantissa, similar to IEEE 754 but with reduced range.
[sign: 1][exponent: 8][mantissa: 52][101]
Range: approximately ±3.4 * 10^38. Numbers outside this range fall back to null. Zero is always positive zero.
Specials
If the lowest 2 bits are 11, the next 3 bits select a special type:
| 5-bit Tag | Value |
|---|---|
| 00011 | Boolean (true/false in upper bits) |
| 00111 | Null |
| 01111 | Exception marker |
| 10111 | Uninitialized |
| 11011 | Immediate string |
| 11111 | Catch offset |
Immediate Strings
Short ASCII strings (up to 7 characters) are packed directly into the JSValue without heap allocation:
[char6][char5][char4][char3][char2][char1][char0][length: 3][11011]
Each character occupies 8 bits. The length (0-7) is stored in bits 5-7. Only ASCII characters (0-127) qualify — any non-ASCII character forces heap allocation.
var s = "hello" // 5 chars, fits in immediate string
var t = "" // immediate (length 0)
var u = "longtext" // 8 chars, heap-allocated
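The packing rule can be sketched in C as below. The function names are hypothetical, and the sketch assumes char0 occupies bits 8-15 with later characters in successively higher bytes, following the layout diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t JSValue;

#define TAG_IMM_STRING UINT64_C(0x1B)   /* 11011 */

/* Try to pack s as an immediate string. Returns false when the string
   is too long or contains a non-ASCII byte, i.e. when it would need
   heap allocation instead. */
bool imm_string_pack(const char *s, JSValue *out) {
    size_t len = strlen(s);
    if (len > 7) {
        return false;
    }
    JSValue v = TAG_IMM_STRING | ((JSValue)len << 5);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 127) {
            return false;
        }
        v |= (JSValue)c << (8 + 8 * i);
    }
    *out = v;
    return true;
}

/* Length lives in bits 5-7. */
size_t imm_string_length(JSValue v) {
    assert((v & 0x1F) == TAG_IMM_STRING);
    return (size_t)((v >> 5) & 0x7);
}

/* Copy the characters back out; buf must hold at least 8 bytes. */
void imm_string_unpack(JSValue v, char *buf) {
    size_t len = imm_string_length(v);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (char)((v >> (8 + 8 * i)) & 0xFF);
    }
    buf[len] = '\0';
}
```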
Null
Null is encoded as a special-tagged value with tag 00111. There is no undefined in ƿit — only null.
var x = null // special tag null
var y = 1 / 0 // also null (division by zero)
var z = {}.missing // null (missing field)
Boolean
True and false are encoded as specials with tag 00011, distinguished by a bit in the upper payload.
Summary
The tagging scheme ensures that the most common values — small integers, booleans, null, and short strings — require zero heap allocation. This significantly reduces GC pressure and improves cache locality.
Object Header
Every heap-allocated object begins with a 64-bit header word (objhdr_t):
[capacity: 56 bits][flags: 5 bits][type: 3 bits]
Type Field (bits 0-2)
| Value | Type | Description |
|---|---|---|
| 0 | OBJ_ARRAY | Dynamic array of JSValues |
| 1 | OBJ_BLOB | Binary data (bits) |
| 2 | OBJ_TEXT | Unicode text string |
| 3 | OBJ_RECORD | Key-value object with prototype chain |
| 4 | OBJ_FUNCTION | Function (C, bytecode, register, or mcode) |
| 5 | OBJ_CODE | Compiled bytecode |
| 6 | OBJ_FRAME | Stack frame for closures |
| 7 | OBJ_FORWARD | Forwarding pointer (GC) |
Flags (bits 3-7)
- Bit 3 (S) — Stone flag. If set, the object is immutable and excluded from GC.
- Bit 4 (P) — Properties flag.
- Bit 5 (A) — Array flag.
- Bit 7 (R) — Reserved.
Capacity (bits 8-63)
The interpretation of the 56-bit capacity field depends on the object type.
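Decoding the header fields is a matter of masks and shifts. The sketch below uses hypothetical accessor names and models only the stone flag among the flag bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t objhdr_t;

enum {
    OBJ_ARRAY = 0, OBJ_BLOB, OBJ_TEXT, OBJ_RECORD,
    OBJ_FUNCTION, OBJ_CODE, OBJ_FRAME, OBJ_FORWARD
};

#define OBJHDR_STONE (UINT64_C(1) << 3)   /* S flag, bit 3 */

/* Build a header from its parts (illustrative helper). */
objhdr_t objhdr_make(unsigned type, bool stone, uint64_t capacity) {
    assert(type <= 7);
    assert(capacity < (UINT64_C(1) << 56));
    return (capacity << 8) | (stone ? OBJHDR_STONE : 0) | type;
}

unsigned objhdr_type(objhdr_t h)     { return (unsigned)(h & 0x7); }
bool     objhdr_is_stone(objhdr_t h) { return (h & OBJHDR_STONE) != 0; }
uint64_t objhdr_capacity(objhdr_t h) { return h >> 8; }
```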
Array
struct JSArray {
    objhdr_t header;    // type=0, capacity=element slots
    word_t len;         // current number of elements
    JSValue values[];   // inline flexible array
};
Capacity is the number of JSValue slots allocated. Length is the number currently in use. Arrays grow by reallocating with a larger capacity.
Blob
struct JSBlob {
    objhdr_t header;    // type=1, capacity=allocated bits
    word_t length;      // length in bits
    uint8_t bits[];     // bit-packed data
};
Blobs are bit-addressable. The length field tracks the exact number of bits written. A blob starts as antestone (mutable) for writing, then becomes stone (immutable) for reading.
Text
struct JSText {
    objhdr_t header;    // type=2, capacity=character slots
    word_t length;      // length in codepoints (or hash if stoned)
    word_t packed[];    // two UTF-32 chars per 64-bit word
};
Text is stored as UTF-32, with two 32-bit codepoints packed per 64-bit word. When a text object is stoned, the length field is repurposed to cache the hash value (computed via fash64), since stoned text is immutable and the hash never changes.
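Indexing into the packed array might be sketched as follows. Which half of the word holds the even-indexed codepoint is not stated above; this sketch assumes the low 32 bits, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words needed for `capacity` codepoints. */
size_t text_words(size_t capacity) {
    return (capacity + 1) / 2;
}

/* Read codepoint i. Assumption: even indices use the low half. */
uint32_t text_get(const uint64_t *packed, size_t i) {
    uint64_t w = packed[i / 2];
    return (uint32_t)((i & 1) ? (w >> 32) : (w & UINT64_C(0xFFFFFFFF)));
}

/* Write codepoint i, leaving its neighbor in the same word untouched. */
void text_set(uint64_t *packed, size_t i, uint32_t cp) {
    assert(cp <= 0x10FFFF);   /* valid Unicode range */
    uint64_t *w = &packed[i / 2];
    if (i & 1) {
        *w = (*w & UINT64_C(0x00000000FFFFFFFF)) | ((uint64_t)cp << 32);
    } else {
        *w = (*w & UINT64_C(0xFFFFFFFF00000000)) | (uint64_t)cp;
    }
}
```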
Record
struct JSRecord {
    objhdr_t header;    // type=3, capacity=hash table slots
    JSRecord *proto;    // prototype chain pointer
    word_t len;         // number of entries
    slot slots[];       // key-value pairs (hash table)
};
Records use a hash table with linear probing. Slot 0 is reserved for internal metadata (class ID and record ID). Empty slots use JS_NULL as the key; deleted slots use JS_EXCEPTION as a tombstone.
The prototype chain is a linked list of JSRecord pointers, traversed during property lookup.
Function
struct JSFunction {
    objhdr_t header;    // type=4
    JSValue name;       // function name
    int16_t length;     // arity (-1 for variadic)
    uint8_t kind;       // C, bytecode, register, or mcode
    union {
        struct { ... } cfunc;       // C function pointer
        struct { ... } bytecode;    // bytecode + frame
        struct { ... } regvm;       // register VM code
        struct { ... } mcode;       // mcode IR
    } u;
};
The kind field selects which union variant is active. Functions can be implemented in C (native), bytecode (stack VM), register code (mach VM), or mcode (JSON interpreter).
Frame
struct JSFrame {
    objhdr_t header;    // type=6, capacity=slot count
    JSValue function;   // owning function
    JSValue caller;     // parent frame
    uint32_t return_pc; // return address
    JSValue slots[];    // [this][args][captured][locals][temps]
};
Frames capture the execution context for closures. The slots array contains the function’s this binding, arguments, captured upvalues, local variables, and temporaries. Frames are linked via the caller field for upvalue resolution across closure depth.
Forwarding Pointer
[pointer: 61 bits][111]
During garbage collection, when an object is copied to the new heap, the old header is replaced with a forwarding pointer to the new location. This is type 7 (OBJ_FORWARD) and stores the new address in bits 3-63. See Garbage Collection for details.
Object Sizing
All objects are aligned to 8 bytes. The total size in bytes for each type:
| Type | Size |
|---|---|
| Array | 8 + 8 + capacity * 8 |
| Blob | 8 + 8 + ceil(capacity / 8) |
| Text | 8 + 8 + ceil(capacity / 2) * 8 |
| Record | 8 + 8 + 8 + (capacity + 1) * 16 |
| Function | sizeof(JSFunction) (fixed) |
| Code | sizeof(JSFunctionBytecode) (fixed) |
| Frame | 8 + 8 + 8 + 4 + capacity * 8 |
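The variable-size formulas translate directly into C. The function names below are hypothetical, and rounding the blob size up to a multiple of 8 is an interpretation of the alignment rule rather than something the table states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 8. */
size_t align8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* header + len + capacity slots of 8 bytes each */
size_t array_size(size_t capacity) {
    assert(capacity <= (SIZE_MAX - 16) / 8);   /* guard against overflow */
    return 8 + 8 + capacity * 8;
}

/* header + length + one byte per 8 bits, padded to 8 (assumption) */
size_t blob_size(size_t capacity_bits) {
    return align8(8 + 8 + (capacity_bits + 7) / 8);
}

/* header + length + one 64-bit word per two codepoints */
size_t text_size(size_t capacity) {
    return 8 + 8 + ((capacity + 1) / 2) * 8;
}

/* header + proto + len + (capacity + 1) slots of 16 bytes;
   the extra slot is the reserved slot 0 */
size_t record_size(size_t capacity) {
    return 8 + 8 + 8 + (capacity + 1) * 16;
}
```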
Overview
Stone memory is a separate allocation arena for immutable values. Objects in stone memory are permanent — they are never moved, never freed, and never touched by the garbage collector.
The stone() function in ƿit petrifies a value, deeply freezing it and all its descendants. Stoned objects have the S bit set in their object header.
The Stone Arena
Stone memory uses bump allocation from a contiguous arena:
stone_base ──────── stone_free ──────── stone_end
[allocated objects] [free space ]
Allocation advances stone_free forward. When the arena is exhausted, overflow pages are allocated via the system allocator and linked together:
struct StonePage {
    struct StonePage *next;
    size_t size;
    uint8_t data[];
};
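Bump allocation with an overflow fallback could look like the sketch below. The `StoneArena` struct and `stone_alloc` function are hypothetical, and giving every overflow allocation its own page is a simplification chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct StonePage {
    struct StonePage *next;
    size_t size;
    uint8_t data[];
};

/* Hypothetical arena state mirroring the diagram above. */
typedef struct {
    uint8_t *stone_base;
    uint8_t *stone_free;
    uint8_t *stone_end;
    struct StonePage *overflow;   /* linked overflow pages */
} StoneArena;

/* Allocate size bytes, rounded up to 8. Falls back to a freshly
   malloc'ed overflow page when the arena is exhausted. */
void *stone_alloc(StoneArena *a, size_t size) {
    assert(a->stone_free <= a->stone_end);
    size = (size + 7) & ~(size_t)7;
    if ((size_t)(a->stone_end - a->stone_free) >= size) {
        void *p = a->stone_free;
        a->stone_free += size;    /* bump the free pointer */
        return p;
    }
    struct StonePage *page = malloc(sizeof *page + size);
    if (page == NULL) {
        return NULL;
    }
    page->next = a->overflow;
    page->size = size;
    a->overflow = page;
    return page->data;
}
```

Nothing is ever freed individually, which is what makes the allocation path this short.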
The S Bit
Bit 3 of the object header is the stone flag. When set:
- The object is immutable — writes disrupt
- The object is excluded from GC — the collector skips it entirely
- For text objects, the length field caches the hash instead of the character count (since the text cannot change, the hash is computed once and reused)
What Gets Stoned
When stone(value) is called:
- If the value is already stone, return immediately
- Recursively walk all nested values (array elements, record fields, etc.)
- Copy each mutable object into the stone arena
- Set the S bit on each copied object
- Return the stoned value
The operation is deep — an entire object graph becomes permanently immutable.
Text Interning
The stone arena maintains a hash table for text interning. When a text value is stoned, it is looked up in the intern table. If an identical string already exists in stone memory, the existing one is reused. This deduplicates strings and makes equality comparison O(1) for stoned text.
The hash is computed with fash64 over the packed UTF-32 words.
Usage Patterns
Module Return Values
Every module’s return value is automatically stoned:
// config.cm
return {
    debug: true,
    timeout: 30
}
// The returned object is stone — shared safely between actors
Message Passing
Messages between actors are stoned before delivery, ensuring actors never share mutable state.
Constants
Literal objects and arrays that can be determined at compile time may be allocated directly in stone memory.
Relationship to GC
The Cheney copying collector only operates on the mutable heap. During collection, when the collector encounters a pointer to stone memory (S bit set), it skips it — stone objects are roots that never move. This means stone memory acts as a permanent root set with zero GC overhead.
Overview
ƿit uses a Cheney copying collector for automatic memory management. Each actor has its own independent heap — actors never share mutable memory, so garbage collection is per-actor with no global pauses.
Algorithm
The Cheney algorithm is a two-space copying collector:
- Allocate new space — a fresh memory block for the new heap
- Copy roots — copy all live root objects from old space to new space
- Scan — walk the new space, updating all internal references
- Free old space — the entire old heap is freed at once
Copying and Forwarding
When an object is copied from old space to new space:
- The object’s data is copied to the next free position in new space
- The old object’s header is overwritten with a forwarding pointer (OBJ_FORWARD) containing the new address
- Future references to the old address find the forwarding pointer and follow it to the new location
Old space: New space:
┌──────────────┐ ┌──────────────┐
│ OBJ_FORWARD ─┼────────> │ copied object│
│ (new addr) │ │ │
└──────────────┘ └──────────────┘
Scan Phase
After roots are copied, the collector scans new space linearly. For each object, it examines every JSValue field:
- If the field points to old space, copy the referenced object (or follow its forwarding pointer if already copied)
- If the field points to stone memory, skip it (stone objects are permanent)
- If the field is an immediate value (integer, boolean, null, immediate string), skip it
The scan continues until the scan pointer catches up with the allocation pointer — at that point, all live objects have been found and copied.
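The copy and scan phases fit in a short C sketch. This is a toy model, not the runtime's collector: every object is assumed to be a header word followed by `capacity` JSValue fields, the names `Collector`, `gc_copy`, and `gc_collect` are hypothetical, and stone objects are recognized by the S bit in their header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t JSValue;

enum { TAG_PTR = 1, OBJ_FORWARD = 7, STONE_BIT = 8 };

typedef struct {
    uint64_t *alloc;   /* next free word in new space */
} Collector;

/* Toy object model: one header word plus `capacity` fields. */
static size_t object_words(uint64_t header) {
    return 1 + (size_t)(header >> 8);
}

/* Return the new-space value for v, copying its object if needed. */
JSValue gc_copy(Collector *gc, JSValue v) {
    if ((v & 7) != TAG_PTR) {
        return v;                                  /* immediate value */
    }
    uint64_t *old = (uint64_t *)(uintptr_t)(v & ~(uint64_t)7);
    if ((old[0] & 7) == OBJ_FORWARD) {             /* already copied */
        return (old[0] & ~(uint64_t)7) | TAG_PTR;
    }
    if (old[0] & STONE_BIT) {
        return v;                                  /* stone: never moves */
    }
    size_t n = object_words(old[0]);
    uint64_t *copy = gc->alloc;
    memcpy(copy, old, n * sizeof *copy);
    gc->alloc += n;
    old[0] = (uint64_t)(uintptr_t)copy | OBJ_FORWARD;
    return (uint64_t)(uintptr_t)copy | TAG_PTR;
}

/* Copy the roots, then scan new space until scan catches up with alloc. */
void gc_collect(Collector *gc, uint64_t *to_space,
                JSValue *roots, size_t nroots) {
    assert(((uintptr_t)to_space & 7) == 0);
    uint64_t *scan = to_space;
    gc->alloc = to_space;
    for (size_t i = 0; i < nroots; i++) {
        roots[i] = gc_copy(gc, roots[i]);
    }
    while (scan < gc->alloc) {
        size_t n = object_words(scan[0]);
        for (size_t i = 1; i < n; i++) {
            scan[i] = gc_copy(gc, scan[i]);
        }
        scan += n;
    }
}
```

Because new space doubles as the work queue, no separate mark stack is needed, and cycles terminate once a forwarding pointer is found.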
Roots
The collector traces from these root sources:
- Global object: all global variables
- Class prototypes: built-in type prototypes
- Exception: the current exception value
- Value stack: all values on the operand stack
- Frame stack: all stack frames (bytecode and register VM)
- GC reference stack: manually registered roots (via JS_PUSH_VALUE/JS_POP_VALUE)
- Parser constant pool: during compilation, constants being built
Per-Actor Heaps
Each actor maintains its own heap with independent collection:
- No stop-the-world pauses across actors
- No synchronization between collectors
- Each actor’s GC runs at the end of a turn (between message deliveries)
- Heap sizes adapt independently based on each actor’s allocation patterns
Heap Growth
The collector uses a buddy allocator for heap blocks. After each collection, if less than 20% of the heap was recovered, the next block size is doubled. The new space size is: max(live_estimate + alloc_size, next_block_size).
All allocations within a heap block use bump allocation (advance a pointer), which is extremely fast.
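The growth rule can be written as two small functions. The names and the `HeapPolicy` struct are hypothetical, and the sketch assumes "recovered" is measured against the size of the heap that was just collected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-actor growth state. */
typedef struct {
    size_t next_block_size;
} HeapPolicy;

/* Double the next block size when a collection recovered
   less than 20% of the heap. */
void heap_after_collect(HeapPolicy *p, size_t heap_size, size_t recovered) {
    assert(recovered <= heap_size);
    /* recovered / heap_size < 0.20, written without floating point */
    if (recovered * 5 < heap_size) {
        p->next_block_size *= 2;
    }
}

/* New space size: max(live_estimate + alloc_size, next_block_size). */
size_t heap_new_space_size(const HeapPolicy *p,
                           size_t live_estimate, size_t alloc_size) {
    size_t need = live_estimate + alloc_size;
    return need > p->next_block_size ? need : p->next_block_size;
}
```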
Alignment
All objects are aligned to 8-byte boundaries. Object sizes are rounded up to ensure this alignment, which guarantees that the low 3 bits of any heap pointer are always zero — available for JSValue tag bits.
Interaction with Stone Memory
Stone memory objects (S bit set) are never copied by the collector. When the scanner encounters a pointer to stone memory, it leaves it unchanged. This means:
- Stone objects are effectively permanent GC roots
- No overhead for tracing through immutable object graphs
- Module return values and interned strings impose zero GC cost
Overview
The bytecode VM is a stack-based virtual machine. Instructions operate on an implicit operand stack, pushing and popping values. This is the original execution backend for ƿit.
Compilation Pipeline
Source → Tokenize → Parse (AST) → Bytecode → Link → Execute
The compiler emits JSFunctionBytecode objects containing opcode sequences, constant pools, and debug information.
Instruction Categories
Value Loading
| Opcode | Description |
|---|---|
| push_i32 | Push a 32-bit immediate integer |
| push_const | Push a value from the constant pool |
| null | Push null |
| push_false | Push false |
| push_true | Push true |
Stack Manipulation
| Opcode | Description |
|---|---|
| drop | Remove top of stack |
| dup | Duplicate top of stack |
| dup1 / dup2 / dup3 | Duplicate item at depth |
| swap | Swap top two items |
| rot3l / rot3r | Rotate top three items |
| insert2 / insert3 | Insert top item deeper |
| nip | Remove second item |
Variable Access
| Opcode | Description |
|---|---|
| get_var | Load variable by name (pre-link) |
| put_var | Store variable by name (pre-link) |
| get_loc / put_loc | Access local variable by index |
| get_arg / put_arg | Access function argument by index |
| get_env_slot / set_env_slot | Access closure variable (post-link) |
| get_global_slot / set_global_slot | Access global variable (post-link) |
Variable access opcodes are patched during linking. get_var instructions are rewritten to get_loc, get_env_slot, or get_global_slot depending on where the variable is resolved.
Arithmetic
| Opcode | Description |
|---|---|
| add / sub / mul / div | Basic arithmetic |
| mod / pow | Modulo and power |
| neg / inc / dec | Unary operations |
| add_loc / inc_loc / dec_loc | Optimized local variable update |
Comparison and Logic
| Opcode | Description |
|---|---|
| strict_eq / strict_neq | Equality (ƿit uses strict only) |
| lt / lte / gt / gte | Ordered comparison |
| not / lnot | Logical / bitwise not |
| and / or / xor | Bitwise operations |
Control Flow
| Opcode | Description |
|---|---|
| goto | Unconditional jump |
| if_true / if_false | Conditional jump |
| goto8 / goto16 | Short jumps (size-optimized) |
| if_true8 / if_false8 | Short conditional jumps |
| catch | Set exception handler |
Function Calls
| Opcode | Description |
|---|---|
| call | Call function with N arguments |
| tail_call | Tail-call optimization |
| call_method | Call method on object |
| return | Return value from function |
| return_undef | Return null from function |
| throw | Throw exception (disrupt) |
Property Access
| Opcode | Description |
|---|---|
| get_field | Get named property |
| put_field | Set named property |
| get_array_el | Get computed property |
| put_array_el | Set computed property |
| define_field | Define property during object literal |
Object Creation
| Opcode | Description |
|---|---|
| object | Create new empty object |
| array_from | Create array from stack values |
Bytecode Patching
During the link/integrate phase, symbolic variable references are resolved to concrete access instructions. This is a critical optimization — the interpreter does not perform name lookups at runtime.
A get_var "x" instruction becomes:
- get_loc 3: if x is a local variable at index 3
- get_env_slot 1, 5: if x is captured from an outer scope (depth 1, slot 5)
- get_global_slot 7: if x is a global
Overview
The Mach VM is a register-based virtual machine using 32-bit instructions. It is modeled after Lua’s register VM — operands are register indices rather than stack positions, reducing instruction count and improving performance.
Instruction Formats
All instructions are 32 bits wide. Four encoding formats are used:
iABC — Three-Register
[op: 8][A: 8][B: 8][C: 8]
Used for operations on three registers: R(A) = R(B) op R(C).
iABx — Register + Constant
[op: 8][A: 8][Bx: 16]
Used for loading constants: R(A) = K(Bx).
iAsBx — Register + Signed Offset
[op: 8][A: 8][sBx: 16]
Used for conditional jumps: if R(A) then jump by sBx.
isJ — Signed Jump
[op: 8][sJ: 24]
Used for unconditional jumps with a 24-bit signed offset.
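Encoding and decoding the four formats might be sketched as follows. The field positions are an assumption: the diagrams are read most-significant field first, as the other layouts in this document are, which puts the opcode in the top 8 bits. The signed fields are assumed to be two's complement, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MachInstr32;

/* iABC: [op: 8][A: 8][B: 8][C: 8], op assumed in the top byte */
MachInstr32 mach_abc(unsigned op, unsigned a, unsigned b, unsigned c) {
    assert(op <= 0xFF && a <= 0xFF && b <= 0xFF && c <= 0xFF);
    return ((uint32_t)op << 24) | ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
}

/* iABx: [op: 8][A: 8][Bx: 16] */
MachInstr32 mach_abx(unsigned op, unsigned a, unsigned bx) {
    assert(op <= 0xFF && a <= 0xFF && bx <= 0xFFFF);
    return ((uint32_t)op << 24) | ((uint32_t)a << 16) | bx;
}

/* iAsBx: [op: 8][A: 8][sBx: 16], sBx as 16-bit two's complement */
MachInstr32 mach_asbx(unsigned op, unsigned a, int sbx) {
    assert(op <= 0xFF && a <= 0xFF && sbx >= -32768 && sbx <= 32767);
    return ((uint32_t)op << 24) | ((uint32_t)a << 16)
         | ((uint32_t)sbx & 0xFFFF);
}

/* isJ: [op: 8][sJ: 24], sJ as 24-bit two's complement */
MachInstr32 mach_sj(unsigned op, int32_t sj) {
    assert(op <= 0xFF && sj >= -8388608 && sj <= 8388607);
    return ((uint32_t)op << 24) | ((uint32_t)sj & 0xFFFFFF);
}

unsigned mach_get_op(MachInstr32 i) { return i >> 24; }
unsigned mach_get_a(MachInstr32 i)  { return (i >> 16) & 0xFF; }
unsigned mach_get_b(MachInstr32 i)  { return (i >> 8) & 0xFF; }
unsigned mach_get_c(MachInstr32 i)  { return i & 0xFF; }
unsigned mach_get_bx(MachInstr32 i) { return i & 0xFFFF; }

/* Sign-extend the low 16 bits. */
int mach_get_sbx(MachInstr32 i) {
    int v = (int)(i & 0xFFFF);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Sign-extend the low 24 bits. */
int32_t mach_get_sj(MachInstr32 i) {
    int32_t v = (int32_t)(i & 0xFFFFFF);
    return v >= 0x800000 ? v - 0x1000000 : v;
}
```

A 24-bit signed offset covers jumps of roughly eight million instructions in either direction, while the 16-bit sBx field reaches about 32 thousand.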
Registers
Each function frame has a fixed number of register slots, determined at compile time. Registers hold:
- R(0): this binding
- R(1)..R(arity): function arguments
- R(arity+1)..: local variables and temporaries
Instruction Set
Loading
| Opcode | Format | Description |
|---|---|---|
| LOADK | iABx | R(A) = K(Bx): load from constant pool |
| LOADI | iAsBx | R(A) = sBx: load small integer |
| LOADNULL | iA | R(A) = null |
| LOADTRUE | iA | R(A) = true |
| LOADFALSE | iA | R(A) = false |
| MOVE | iABC | R(A) = R(B): register copy |
Arithmetic
| Opcode | Format | Description |
|---|---|---|
| ADD | iABC | R(A) = R(B) + R(C) |
| SUB | iABC | R(A) = R(B) - R(C) |
| MUL | iABC | R(A) = R(B) * R(C) |
| DIV | iABC | R(A) = R(B) / R(C) |
| MOD | iABC | R(A) = R(B) % R(C) |
| POW | iABC | R(A) = R(B) ^ R(C) |
| NEG | iABC | R(A) = -R(B) |
| INC | iABC | R(A) = R(B) + 1 |
| DEC | iABC | R(A) = R(B) - 1 |
Comparison
| Opcode | Format | Description |
|---|---|---|
| EQ | iABC | R(A) = R(B) == R(C) |
| NEQ | iABC | R(A) = R(B) != R(C) |
| LT | iABC | R(A) = R(B) < R(C) |
| LE | iABC | R(A) = R(B) <= R(C) |
| GT | iABC | R(A) = R(B) > R(C) |
| GE | iABC | R(A) = R(B) >= R(C) |
Property Access
| Opcode | Format | Description |
|---|---|---|
| GETFIELD | iABC | R(A) = R(B)[K(C)]: named property |
| SETFIELD | iABC | R(A)[K(B)] = R(C): set named property |
| GETINDEX | iABC | R(A) = R(B)[R(C)]: computed property |
| SETINDEX | iABC | R(A)[R(B)] = R(C): set computed property |
Variable Resolution
| Opcode | Format | Description |
|---|---|---|
| GETNAME | iABx | Unresolved variable (compiler placeholder) |
| GETINTRINSIC | iABx | Global intrinsic / built-in |
| GETENV | iABx | Module environment variable |
| GETUP | iABC | R(A) = UpFrame(B).slots[C]: closure upvalue |
| SETUP | iABC | UpFrame(A).slots[B] = R(C): set closure upvalue |
Control Flow
| Opcode | Format | Description |
|---|---|---|
| JMP | isJ | Unconditional jump |
| JMPTRUE | iAsBx | Jump if R(A) is true |
| JMPFALSE | iAsBx | Jump if R(A) is false |
| JMPNULL | iAsBx | Jump if R(A) is null |
Function Calls
| Opcode | Format | Description |
|---|---|---|
| CALL | iABC | Call R(A) with B args starting at R(A+1), C=keep result |
| RETURN | iA | Return R(A) |
| RETNIL | - | Return null |
| CLOSURE | iABx | Create closure from function pool entry Bx |
Object / Array
| Opcode | Format | Description |
|---|---|---|
| NEWOBJECT | iA | R(A) = {} |
| NEWARRAY | iABC | R(A) = array(B) |
| PUSH | iABC | Push R(B) to array R(A) |
JSCodeRegister
The compiled output for a function:
struct JSCodeRegister {
    uint16_t arity;                 // argument count
    uint16_t nr_slots;              // total register count
    uint32_t cpool_count;           // constant pool size
    JSValue *cpool;                 // constant pool
    uint32_t instr_count;           // instruction count
    MachInstr32 *instructions;      // 32-bit instruction array
    uint32_t func_count;            // nested function count
    JSCodeRegister **functions;     // nested function table
    JSValue name;                   // function name
    uint16_t disruption_pc;         // exception handler offset
};
The constant pool holds all non-immediate values referenced by LOADK instructions: strings, large numbers, and other constants.
Overview
Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.
Pipeline
Source → Tokenize → Parse (AST) → Mcode (JSON) → Interpret
→ Compile to Mach (planned)
→ Compile to native (planned)
Mcode is produced by the JS_Mcode compiler pass, which emits a cJSON tree. The mcode interpreter walks this tree directly, dispatching on instruction name strings.
JSMCode Structure
struct JSMCode {
    uint16_t nr_args;               // argument count
    uint16_t nr_slots;              // register count
    cJSON **instrs;                 // pre-flattened instruction array
    uint32_t instr_count;           // number of instructions
    struct {
        const char *name;           // label name
        uint32_t index;             // instruction index
    } *labels;
    uint32_t label_count;
    struct JSMCode **functions;     // nested functions
    uint32_t func_count;
    cJSON *json_root;               // keeps JSON alive
    const char *name;               // function name
    const char *filename;           // source file
    uint16_t disruption_pc;         // exception handler offset
};
Instruction Format
Each instruction is a JSON array. The first element is the instruction name (string), followed by operands:
["LOADK", 0, 42]
["ADD", 2, 0, 1]
["JMPFALSE", 3, "else_label"]
["CALL", 0, 2, 1]
The instruction set mirrors the Mach VM opcodes — same operations, same register semantics, but with string dispatch instead of numeric opcodes.
Labels
Control flow uses named labels instead of numeric offsets:
["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]
Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
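Collecting labels at load time can be sketched without cJSON by using a simplified instruction record. Everything here is hypothetical: the `Instr` and `Label` types, the function names, and in particular the linear search in `find_label`, which stands in for the name-to-index map to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a flattened mcode instruction. */
typedef struct {
    const char *op;       /* instruction name, e.g. "JMP" */
    const char *target;   /* label name, or NULL if unused */
} Instr;

typedef struct {
    const char *name;     /* label name */
    uint32_t index;       /* instruction index */
} Label;

/* Walk the instruction array once and record every LABEL.
   Returns the number of labels written to out. */
uint32_t collect_labels(const Instr *code, uint32_t n,
                        Label *out, uint32_t max) {
    uint32_t count = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (strcmp(code[i].op, "LABEL") == 0) {
            assert(count < max);
            out[count].name = code[i].target;
            out[count].index = i;
            count++;
        }
    }
    return count;
}

/* Look up a label by name; returns UINT32_MAX when it is missing.
   Linear scan for brevity; a hash map would avoid the search. */
uint32_t find_label(const Label *labels, uint32_t count, const char *name) {
    for (uint32_t i = 0; i < count; i++) {
        if (strcmp(labels[i].name, name) == 0) {
            return labels[i].index;
        }
    }
    return UINT32_MAX;
}
```

Resolving each jump target once at load time and caching the index would keep string comparison out of the jump path itself.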
Differences from Mach
| Property | Mcode | Mach |
|---|---|---|
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |
Purpose
Mcode serves as an inspectable, debuggable intermediate format:
- Human-readable — the JSON representation can be printed and examined
- Language-independent — any tool that produces the correct JSON can target the ƿit runtime
- Compilation target — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation
The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.