Precode – I'm Still Free!

Precode is the intermediate representation used between Precedent syntax and native machine code. It is not a bytecode, and there is no Precode virtual machine. Precode is a structured compiler representation that can also be written in an assembler-like source form when low-level control is needed. This intermediate representation is a necessary abstraction point allowing Precedent code to target multiple architectures. In some cases, Precode also provides controlled access to platform-specific calling or linking behavior.

When written directly by a programmer, Precode appears as an assembler-like syntax. Conceptually, Precode can be understood as assembly language for a hypothetical CPU. No such CPU exists, and there is no byte-level encoding of Precode instructions. Instead, Precode lives at the abstraction point between Precedent syntax and a real native CPU. In this document, the “Precode CPU” may be referred to as though it were a real CPU, but this is only a metaphor used to explain Precode semantics.

All Precedent code operations are lowered to a Precode representation by the compiler. However, Precode may also be written directly by the programmer in source form. Direct Precode bypasses some of the higher-level semantic protections provided by normal Precedent syntax, so it should be used sparingly: primarily for runtime libraries, compiler testing, ABI work, or operations that require low-level control. Normal application code should usually be written in Precedent syntax and allowed to lower to Precode automatically.

Precode can be written directly in one of two ways: as the entire body of a routine, or as an inline statement within a larger routine. The two listings below show each form in turn.

program example;

procedure DoSomething;
precode
  LDI A, 5  ; <- This is a Precode instruction within a Precode routine body.
end;

begin

end.

program example;

begin
  precode
    LDI A, 5 ; <- This is a Precode instruction within a Precode inline statement.
  end;
end.

In either case, each line of Precode takes one of the following three forms…

label_Identifier:
  Mnemonic  [<datatype>] operand1, operand2, ..., operandN
  datatype <datatype>

The three forms are as follows.
1) A label: a named location in the Precode source, used for example as a branch target.
2) A Precode instruction, consisting of a mnemonic, an optional data-type specifier, and a comma-separated list of operands.
3) A datatype directive, consisting of the keyword datatype followed by a data-type specifier.

An additional form exists for the CALL and ICALL instructions. It is described separately in the documentation for each of those mnemonics.

One key to understanding Precode is that every instruction carries a data type, whether that type is written explicitly on the instruction line or inherited from the current datatype mode. As explained above, the data-type specifier is optional on instruction lines. When it is omitted, the instruction takes a running default that is maintained while the Precode is parsed. The default datatype mode is initially uint64, and an instruction without an explicit type is encoded with the current default, which remains in force until a datatype directive changes it.

The datatype directive changes the default datatype applied to all instructions that appear after it. This allows the programmer to switch between data types in a way that feels like “switching modes”: starting in uint64 mode, the directive “datatype uint8” switches to uint8 mode, and so on.
An instruction line that contains an explicit type, e.g. “LDI uint16 A, 5”, does not alter the default for subsequent instructions. It encodes only that LDI instruction with the uint16 type, regardless of the current default.
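
The interaction between the running default and explicit types can be modeled outside the compiler. The following Python sketch is not part of Precedent and is not how the compiler is implemented; it is a minimal illustration that assumes comments start with “;”, that labels end with “:”, and that only the type names mentioned in this document exist. The function name resolve_types is hypothetical, and the CALL and ICALL forms are not handled.

```python
DEFAULT_TYPE = "uint64"
# Assumption: only the type names that appear in this document.
KNOWN_TYPES = {"uint8", "uint16", "uint64"}


def resolve_types(source):
    """Return (mnemonic, effective_type, operands) for each instruction line."""
    current = DEFAULT_TYPE
    resolved = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop the trailing comment
        if not line or line.endswith(":"):   # blank line or label (form 1)
            continue
        parts = line.split(None, 1)
        head = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if head == "datatype":               # directive (form 3)
            current = rest.strip()           # changes the running default
            continue
        # Instruction (form 2): an explicit type overrides the default
        # for this one line only.
        first = rest.split(None, 1)
        if first and first[0] in KNOWN_TYPES:
            effective = first[0]
            operand_text = first[1] if len(first) > 1 else ""
        else:
            effective = current
            operand_text = rest
        operands = [op.strip() for op in operand_text.split(",") if op.strip()]
        resolved.append((head, effective, operands))
    return resolved


example = """
start:
    LDI A, 5          ; no directive yet, so the default applies
    datatype uint8    ; switch the running default
    LDI A, 5
    LDI uint16 A, 5   ; explicit type, default left unchanged
    LDI A, 5
"""

for mnemonic, dtype, operands in resolve_types(example):
    print(mnemonic, dtype, operands)
```

In this model the four LDI lines resolve to uint64, uint8, uint16 and uint8 respectively: the explicit uint16 affects only its own line.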

Understanding data-type semantics is critical to making effective use of Precode and, in particular, to working with Precode registers…

Precode Registers

From the programmer's perspective, the “Precode CPU” is intentionally very simple, designed to resemble a familiar older 8-bit CPU, the MOS 6502. That simplicity is deceptive: were the Precode CPU real hardware, it would be far more complex to manufacture. Because Precode is not and never will be real hardware, its design can take liberties that are only possible for a soft CPU that exists purely as a programming interface. The MOS 6502 had only three registers for holding data: the accumulator A and the index registers X and Y. In keeping with that simplicity, the Precode CPU supports five general-purpose registers: A, W, X, Y and Z.

It is important to think of these registers as temporaries. When Precode is lowered to a native x86-64 target, for instance, none of the Precode registers maps directly to a native register: the A register is not equivalent to RAX, RBX, or any other fixed register. Instead, the A register is a virtual “slot” into which data may be placed. During lowering, the compiler determines from the current context which native register, or even which memory location, should represent the A register. This approach lets Precode express the intent of a program with a relatively small number of registers, rather than forcing the imaginary Precode CPU to conform to some common representation shared by its real target architectures.

The size of each register depends on the data-type specified in the executing instruction.
For instance…

  LDI uint8 A, 5    ; <- The A register is a uint8 for the purposes of this instruction.
  LDI uint16 A, 6  ; <- The A register is a uint16 for the purposes of this instruction.

This makes Precode registers even more temporary in nature.
Precode registers are typed temporary slots. A value written to a register under one datatype must not be read later under another datatype unless an explicit conversion instruction is used. Once a register is used with a different data type, there is no guarantee that the data placed in it under the earlier type is retained. For instance, if the A register is written as the uint8 value ‘5’ but later read as a uint16 value, the result of the read is undefined. The earlier value is not implicitly zero-extended or sign-extended from uint8 to uint16; the register has simply changed modes, which invalidates its earlier contents.

The Precode instruction set includes conversion mnemonics for converting register-held data between data types when necessary.
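
The “typed temporary slot” rule can be pictured with a small model. The Python sketch below is an illustration only, not part of Precedent: the names RegisterSlot, UndefinedRead and convert are hypothetical, and convert merely stands in for whichever conversion mnemonic applies. Two assumptions are made that this document does not state: the model raises an error where real Precode simply leaves the result undefined, and an unsigned conversion is assumed to keep the low bits of the value.

```python
# Bit widths for the type names used in this document.
BITS = {"uint8": 8, "uint16": 16, "uint64": 64}


class UndefinedRead(Exception):
    """Marks a read that Precode would leave undefined."""


class RegisterSlot:
    """A typed temporary slot: the value is only valid under its own type."""

    def __init__(self, name):
        self.name = name
        self.dtype = None
        self.value = None

    def write(self, dtype, value):
        if not 0 <= value < (1 << BITS[dtype]):
            raise ValueError(f"{value} does not fit in {dtype}")
        # Writing under a new type replaces whatever the slot held before.
        self.dtype, self.value = dtype, value

    def read(self, dtype):
        if self.dtype != dtype:
            raise UndefinedRead(f"{self.name} holds {self.dtype}, read as {dtype}")
        return self.value

    def convert(self, new_dtype):
        if self.dtype is None:
            raise UndefinedRead(f"{self.name} has never been written")
        # Assumption: unsigned conversion keeps the low bits of the value.
        self.value &= (1 << BITS[new_dtype]) - 1
        self.dtype = new_dtype


a = RegisterSlot("A")
a.write("uint8", 5)
try:
    a.read("uint16")           # type mismatch: undefined in Precode
except UndefinedRead as err:
    print("undefined:", err)
a.convert("uint16")            # explicit conversion makes the read valid
print(a.read("uint16"))
```

The point of the model is that a read is only meaningful under the type of the most recent write or conversion. A real lowering pass is under no obligation to detect the mismatch, so it should not be relied upon to fail loudly.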

Beyond these five general-purpose registers, there are no others: no program counter, no stack pointer, and no flags register. None of these is needed in Precode, because the abstraction of the imaginary CPU makes them irrelevant.

Because the program counter doesn’t exist, it cannot be set manually. Instead, branching instructions branch to labeled locations in code, and data storage defined in Precedent is accessed through symbolic references.

The flags register doesn’t exist and therefore can’t be read or set manually. Instead, Precode provides instruction mnemonics to make the flags register unnecessary. For instance, instead of a typical x86-64 style “compare” operation to set flags, followed by a “conditional branch” to branch based on the earlier comparison, Precode provides all-in-one “compare and conditionally branch” instructions.

The omission of typical CPU-level semantics is intentional. It allows any target CPU to implement Precode instructions without having to match the flags, program counter, or stack pointer semantics of some imaginary CPU. When writing Precode it is not necessary to know, for example, whether the target is an Intel processor with a “DF” direction flag or an aarch64 processor, which lacks that flag. Instead, the programmer (or the Precedent compiler) can focus on the features supported in Precode and trust the lowering process to determine the right means of implementing each instruction.

This is not a lowest-common-denominator approach; in many ways it is the opposite. Precode instructions may embed far more complex behaviors than those implemented in silicon, leaving the lowering process to translate those behaviors into the instructions available on the target architecture. What is true, however, is that the Precode “architecture” is intentionally kept simple so that it remains easy to reason about.

Precode Data

Unlike real assemblers for real CPUs, Precode syntax makes no provision for declaring data. Data is instead declared at the Precedent syntax level in the form of variables, which Precode can then use through symbolic references (written with a leading “?” in the example below). The following example is contrived, but it illustrates variable use from within Precode blocks.

function AddFive( const Value: uint8 ): uint8;
var
  B: uint8;
begin
  B := 5;
  precode
    datatype uint8     ; <- Put us into uint8 data-type mode to match the outer variable/parameter types.
    LDI A, ?B          ; <- Load the value from the B variable into the A register.
    ADD A, ?Value      ; <- Add the value from the Value parameter to the A register.
    STORE ?Result, A   ; <- Store the value from the A register into the function's Result.
  end;
end;

The remainder of this Precode section lists the available Precode mnemonics and their semantics.