The New Programming Language Ace

Ace Rodstin
Feb 10, 2023


There is little need to explain why a new programming language has become necessary. One of many reasons is the redundancy of existing programming languages (the exactly-once statement of Python, assignment as an expression in almost every single statement in Kotlin, and so on). Another reason is the legacy interface concept of C++ and the outdated snake_case notation of the C++ standard library.

Prerequisites

The goal is to compile source code written in the Ace programming language into an executable file (a console app). The compiler should read the source code line by line and perform the calculations defined in each line. As output, the compiler prints the calculation results to the console.

Requirements:

  • Constant declaration
  • Variable declaration
  • Integer type support
  • Floating-point type support
  • Durability (if an error occurs, the compiler prints an error message instead of crashing)

Architecture

The compiler’s core consists of:

  • Source Code Manager: opens the source code file located on the hard drive and loads its content into RAM
  • Text Reader: reads source code and tracks the cursor position
  • Token Parser: translates a line of source code into a list of lexical tokens
  • Syntactic Analyzer: builds an abstract syntax tree from tokens in accordance with the programming language grammar
  • Scope: stores a list of declared variables
  • Compiler: coordinates the interaction of all the listed modules

Source Code Manager

Compiling files requires a module that interacts with the operating system (the file system, to be more precise). Its purpose is to open the source code file and read it (load it into RAM). The RAII (Resource Acquisition Is Initialization) technique known from C++ is important here, so memory deallocation should be considered too.

The module has this interface.

class SourceCodeManager {
    var sourceCode: String { get }
    var fileName: String { get }
    var fileExtension: String { get }

    func open(file path: String) throws
    func close()
}
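
As a rough illustration of how such an interface could be backed by Foundation, here is a minimal sketch. It is not the project's actual implementation: the stored properties, the UTF-8 assumption, and the behavior of close() are all choices made for this example.

```swift
import Foundation

// Hypothetical sketch of the interface above, assuming UTF-8 source files.
final class SourceCodeManager {
    private(set) var sourceCode = ""
    private(set) var fileName = ""
    private(set) var fileExtension = ""

    // Loads the whole file into memory and remembers its name parts.
    func open(file path: String) throws {
        let url = URL(fileURLWithPath: path)
        sourceCode = try String(contentsOf: url, encoding: .utf8)
        fileName = url.deletingPathExtension().lastPathComponent
        fileExtension = url.pathExtension
    }

    // Drops the loaded text so the memory can be reclaimed.
    func close() {
        sourceCode = ""
        fileName = ""
        fileExtension = ""
    }
}
```

Because the file content is read in one call, no file handle stays open after open(file:) returns; close() only releases the in-memory copy.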

Text Reader

This module takes a string as its input parameter and reads it while tracking the position. This is what allows the compiler to print a line and a column in an error message when an error occurs.

An undo operation is also often required. The requirement arises because, while reading source code, the reader sometimes encounters an unexpected character that must not be skipped, so it has to be put back.

The module has this interface.

class TextReader {
    struct Position {
        var line: Int { get }
        var column: Int { get }
    }

    enum Unit {
        case beginOfFile
        case character(Character)
        case endOfFile
    }

    var position: Position { get }
    var unit: Unit { get }

    func load(string: String)
    func read() -> Unit
    func unread(_ unit: Unit)
}
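
To make the position tracking and the undo operation concrete, here is a simplified sketch. It is an assumption-laden illustration rather than the project's code: it returns an optional Character instead of the Unit enum, unread() takes no argument, and the history array is a hypothetical helper.

```swift
// Position of the most recently read character (column 0 means "start of line").
struct Position: Equatable {
    var line = 1
    var column = 0
}

// Hypothetical simplified reader: read() returns nil at the end of input.
final class TextReader {
    private var characters: [Character] = []
    private var index = 0
    private var history: [Position] = []  // position before each read, used by unread()
    private(set) var position = Position()

    func load(string: String) {
        characters = Array(string)
        index = 0
        history = []
        position = Position()
    }

    // Consumes one character and advances the line and column counters.
    func read() -> Character? {
        guard index < characters.count else { return nil }
        let character = characters[index]
        history.append(position)
        index += 1
        if character.isNewline {
            position.line += 1
            position.column = 0
        } else {
            position.column += 1
        }
        return character
    }

    // Puts the most recently read character back and restores the position.
    func unread() {
        guard let previous = history.popLast() else { return }
        index -= 1
        position = previous
    }
}
```

Storing the previous position on every read keeps unread() trivial even across line breaks, at the cost of one extra array entry per character.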

Token Parser

This component parses a line of source code (a string) and returns an array of lexical tokens. A parsing error may be thrown here, for example if the programmer makes a typo. This component usually serves as the first step of source code parsing, which simplifies the subsequent processing.

There are several kinds of lexical tokens. The full set is listed below.

enum Token {
    case keyword(Keyword)
    case punctuator(Punctuator)
    case literal(Literal)
    case identifier(String)
}

The Token Parser’s interface is simple: the input is a source code string, and the output is an array of lexical tokens.

class TokenParser {
    func parse(string: String) throws -> [Token]
}
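
The following sketch shows one way a line could be split into the four token kinds. It is only an illustration: the payloads are plain strings instead of the Keyword, Punctuator, and Literal types, and the keyword and punctuator sets, the TokenError type, and the tokenize function are assumptions made for this example.

```swift
enum Token: Equatable {
    case keyword(String)
    case punctuator(String)
    case literal(String)
    case identifier(String)
}

enum TokenError: Error {
    case unexpectedCharacter(Character, column: Int)
}

// Hypothetical tokenizer for a single line; the keyword and punctuator sets are assumed.
func tokenize(_ line: String) throws -> [Token] {
    let keywords: Set<String> = ["val", "var"]
    let punctuators: Set<Character> = ["=", "+", "-", "*", "/", ":", "(", ")"]
    let isDigit: (Character) -> Bool = { $0 >= "0" && $0 <= "9" }
    let characters = Array(line)
    var tokens: [Token] = []
    var index = 0

    while index < characters.count {
        let character = characters[index]
        if character.isWhitespace {
            index += 1
        } else if character.isLetter || character == "_" {
            // A word is either a keyword or an identifier.
            var end = index
            while end < characters.count,
                  characters[end].isLetter || isDigit(characters[end]) || characters[end] == "_" {
                end += 1
            }
            let word = String(characters[index..<end])
            tokens.append(keywords.contains(word) ? .keyword(word) : .identifier(word))
            index = end
        } else if isDigit(character) {
            // Digits and dots are collected as one literal; validation is left to later stages.
            var end = index
            while end < characters.count, isDigit(characters[end]) || characters[end] == "." {
                end += 1
            }
            tokens.append(.literal(String(characters[index..<end])))
            index = end
        } else if punctuators.contains(character) {
            tokens.append(.punctuator(String(character)))
            index += 1
        } else {
            throw TokenError.unexpectedCharacter(character, column: index + 1)
        }
    }
    return tokens
}
```

A typo such as a stray "?" surfaces as a thrown error carrying the column, which is the kind of failure the paragraph above describes.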

Syntactic Analyzer

This is the most valuable and most important module. It builds an abstract syntax tree according to the programming language grammar. Tree building can clearly fail, because a programmer can write incorrect syntactic constructions, such as a malformed variable declaration.

Another implementation complication is the number of subnodes in a tree node. A node can have more than two, so the tree is not binary, which makes it harder to debug and to print.

The atomic unit of the tree is a node. A node can have any number of subnodes (a non-terminal) or none at all (a terminal). (See the programming language grammar.) The interface of the entity is below.

class SyntacticNode: Node {
    var id: UUID { get }
    var children: [SyntacticNode] { get }
    var kind: Kind { get }
    var value: String? { get set }

    init(kind: Kind, value: String?)
    init(kind: Kind, value: Character)

    func joined() -> String
}

It is worth mentioning that a stream concept (Stream) is used to obtain lexical tokens. A stream makes it possible to take one token, put it back, or preview the next token.
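
The three stream operations can be sketched as follows. The type name TokenStream and the method names peek, next, and putBack are assumptions for this example; the project's Stream type may look different.

```swift
// Hypothetical stream over an array: take an element, put it back, or preview the next one.
struct TokenStream<Element> {
    private let elements: [Element]
    private var index = 0

    init(_ elements: [Element]) {
        self.elements = elements
    }

    // Looks at the next element without consuming it.
    func peek() -> Element? {
        index < elements.count ? elements[index] : nil
    }

    // Consumes and returns the next element, or nil when the stream is exhausted.
    mutating func next() -> Element? {
        guard index < elements.count else { return nil }
        defer { index += 1 }
        return elements[index]
    }

    // Steps back one element so that it can be read again.
    mutating func putBack() {
        if index > 0 { index -= 1 }
    }
}
```

Since the stream only moves an index over an immutable array, putting a token back is a constant-time operation and never loses data.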

The module is large, so separation of concerns is required. The module therefore consists of the following components:

class SyntacticAnalyzer {
    func parse(tokens: [Token]) throws -> SyntacticNode
}

class StatementsParser {
    func parseStatement(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses statements.

class DeclarationsParser {
    func parseDeclaration(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses declarations, either constant or variable.

class IdentifiersParser {
    func parseIdentifierList(stream: Stream<[Token]>) throws -> SyntacticNode
    func parseIdentifier(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses a single identifier or an identifier list.

class TypesParser {
    func parseType(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses types. A type is usually specified by an annotation that follows a “:” (colon).

class ExpressionsParser {
    func parseExpression(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses expressions, taking into account the precedence of the addition (subtraction) and multiplication (division) operators.
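
One common way to respect this precedence is recursive descent with one function per precedence level. The sketch below is an illustration under stated assumptions, not the project's parser: it works on plain string tokens, evaluates the result directly instead of building SyntacticNode objects, and the ExpressionEvaluator and ParseError names are made up for the example.

```swift
enum ParseError: Error {
    case unexpectedEnd
    case unexpectedToken(String)
}

// Hypothetical evaluator: each grammar level gets its own method.
struct ExpressionEvaluator {
    private let tokens: [String]
    private var index = 0

    init(tokens: [String]) {
        self.tokens = tokens
    }

    // expression = term { ("+" | "-") term }
    mutating func parseExpression() throws -> Double {
        var value = try parseTerm()
        while index < tokens.count, tokens[index] == "+" || tokens[index] == "-" {
            let symbol = tokens[index]
            index += 1
            let rhs = try parseTerm()
            value = symbol == "+" ? value + rhs : value - rhs
        }
        return value
    }

    // term = factor { ("*" | "/") factor }
    private mutating func parseTerm() throws -> Double {
        var value = try parseFactor()
        while index < tokens.count, tokens[index] == "*" || tokens[index] == "/" {
            let symbol = tokens[index]
            index += 1
            let rhs = try parseFactor()
            value = symbol == "*" ? value * rhs : value / rhs
        }
        return value
    }

    // factor = number
    private mutating func parseFactor() throws -> Double {
        guard index < tokens.count else { throw ParseError.unexpectedEnd }
        let token = tokens[index]
        guard let number = Double(token) else { throw ParseError.unexpectedToken(token) }
        index += 1
        return number
    }
}
```

Because parseExpression only ever combines complete terms, multiplication and division bind tighter than addition and subtraction, and the loops make every operator left-associative.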

class LiteralsParser {
    func parseLiteral(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses literals, both integer and floating-point.

class IntegerLiteralsParser {
    func parseIntegerLiteral(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses integer literals.

class FloatingPointLiteralsParser {
    func parseFloatLiteral(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses floating-point literals.

class OperatorsParser {
    func parseBinaryOperator(stream: Stream<[Token]>) throws -> SyntacticNode
    func parseUnaryOperator(stream: Stream<[Token]>) throws -> SyntacticNode
    func parseAssignOperator(stream: Stream<[Token]>) throws -> SyntacticNode
}

Parses unary, binary, and assignment operators.

class CharactersParser {
    func parseLetter(character: Character) throws -> SyntacticNode
    func parseDecimalDigit(character: Character) throws -> SyntacticNode
    func parseBinaryDigit(character: Character) throws -> SyntacticNode
    func parseOctalDigit(character: Character) throws -> SyntacticNode
    func parseHexadecimalDigit(character: Character) throws -> SyntacticNode
    func parseUnicodeLetter(character: Character) throws -> SyntacticNode
    func parseUnicodeDigit(character: Character) throws -> SyntacticNode
}

Parses characters, the base units of the programming language.

Executor

Once the abstract syntax tree is built, it needs to be processed, which here means executed: the instructions of the source code are carried out. This module is designed for that purpose.

Given the large number of tree branches that require analysis, an additional component is needed. It collects semantic information while the tree is traversed and is named SemanticCollector.

class SemanticCollector {
    enum Operation {
        case declaration
        case assignment
    }

    var operation: Operation? { get set }
    var id: Symbol.Id? { get set }
    var type: BaseType? { get set }
    var value: Symbol.Value? { get set }
    var mutability: Symbol.Mutability? { get set }
}

As the listing shows, the concept of a Symbol was added. It models a variable that a programmer declares and operates on in source code.

class Symbol {
    typealias Id = String
    typealias Value = Any

    enum Mutability {
        case constant
        case variable
    }

    var id: Id { get }
    var type: BaseType { get }
    var mutability: Mutability { get }
    var value: Value? { get set }
}

The module’s interface is trivial: it has one method, which takes the abstract syntax tree as its input parameter. For it to work correctly, however, one more object is required, to which the module delegates operations on the symbol list.

protocol ExecutorDelegate: AnyObject {
    func declare(symbol: Symbol) throws
    func value(of identifier: Symbol.Id) throws -> Symbol.Value?
    func changeValue(of identifier: Symbol.Id, newValue: Symbol.Value) throws
}

As for the Executor itself, what remains is to implement the tree traversal. Keep in mind that the abstract syntax tree is a complicated structure composed of a large number of branches, which is why the module is split into components.

class Executor {
    weak var delegate: ExecutorDelegate? { get set }
    func execute(syntacticTree root: SyntacticNode) throws
}

class ExpressionsExecutor {
    weak var delegate: ExecutorDelegate? { get set }
    func executeExpression(node: SyntacticNode) throws -> Symbol.Value
}

Evaluates an expression in the tree.

class IntegerLiteralsExecutor {
    func executeIntegerLiteral(node: SyntacticNode) throws -> Symbol.Value
}

Evaluates an integer literal.

class FloatLiteralsExecutor {
    func executeFloatLiteral(node: SyntacticNode) throws -> Symbol.Value
}

Evaluates a floating-point literal.
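
The traversal described in this section can be illustrated with a much smaller tree. The sketch below is hypothetical: the Node enum stands in for SyntacticNode, the lookup closure stands in for the delegate's value(of:) method, and all values are Double for brevity.

```swift
// Hypothetical expression tree; the real SyntacticNode carries a kind, a value, and children.
indirect enum Node {
    case number(Double)
    case identifier(String)
    case binary(symbol: Character, lhs: Node, rhs: Node)
}

enum ExecutionError: Error {
    case undefined(String)
    case unknownOperator(Character)
}

// Evaluates a node recursively; identifiers are resolved through the lookup closure.
func evaluate(_ node: Node, lookup: (String) -> Double?) throws -> Double {
    switch node {
    case .number(let value):
        return value
    case .identifier(let name):
        guard let value = lookup(name) else { throw ExecutionError.undefined(name) }
        return value
    case .binary(let symbol, let lhs, let rhs):
        let left = try evaluate(lhs, lookup: lookup)
        let right = try evaluate(rhs, lookup: lookup)
        switch symbol {
        case "+": return left + right
        case "-": return left - right
        case "*": return left * right
        case "/": return left / right
        default: throw ExecutionError.unknownOperator(symbol)
        }
    }
}
```

Passing the symbol lookup in from outside mirrors the delegate design: the evaluator knows how to walk the tree but not where variables are stored.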

Scope

A module that stores all variables is also required. Such a module is usually called Scope.

For now, the Ace programming language has only one global scope. But once the concept of a Scope is present, it is not difficult to develop a class scope, a method scope, and so on.

As said above, the module stores all variables and provides operations on them. Its interface is shown below.

class Scope: ExecutorDelegate {
    var recordTable: RecordTable { get }

    func declare(symbol: Symbol) throws
    func value(of identifier: Symbol.Id) throws -> Symbol.Value?
    func changeValue(of identifier: Symbol.Id, newValue: Symbol.Value) throws
}

An attentive reader may ask about the var recordTable: RecordTable property. This is a table that stores information about variables, but the information is read-only. This design simplifies program debugging. A Record entity has the following interface.

struct Record {
    typealias Id = Symbol.Id
    typealias Value = Symbol.Value

    var id: Id { get }
    var type: BaseType { get }
    var value: Value? { get }
}
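
A global scope of this kind can be sketched with a dictionary keyed by identifier. Everything here is an assumption for illustration: the ScopeError cases, the Double-only values, and in particular the rules that redeclaration and assignment to a constant are rejected, which the article does not spell out.

```swift
enum Mutability {
    case constant
    case variable
}

enum ScopeError: Error {
    case alreadyDeclared(String)
    case undeclared(String)
    case constantMutation(String)
}

// Simplified symbol: values are Double only, unlike the article's Symbol.Value = Any.
struct Symbol {
    let id: String
    let mutability: Mutability
    var value: Double?
}

// Hypothetical global scope backed by a dictionary.
final class Scope {
    private var symbols: [String: Symbol] = [:]

    // Registers a new symbol; declaring the same identifier twice is treated as an error.
    func declare(symbol: Symbol) throws {
        guard symbols[symbol.id] == nil else { throw ScopeError.alreadyDeclared(symbol.id) }
        symbols[symbol.id] = symbol
    }

    // Returns the current value of a declared symbol.
    func value(of identifier: String) throws -> Double? {
        guard let symbol = symbols[identifier] else { throw ScopeError.undeclared(identifier) }
        return symbol.value
    }

    // Updates a variable; constants are assumed to be immutable after declaration.
    func changeValue(of identifier: String, newValue: Double) throws {
        guard var symbol = symbols[identifier] else { throw ScopeError.undeclared(identifier) }
        guard symbol.mutability == .variable else { throw ScopeError.constantMutation(identifier) }
        symbol.value = newValue
        symbols[identifier] = symbol
    }
}
```

Keeping the dictionary private and exposing only these three operations is what lets a read-only view, such as the record table, be handed out without risk of outside mutation.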

Compiler

The final module coordinates the interactions of all the modules above: reading the source code, parsing lexical tokens, building the abstract syntax tree, executing instructions, and returning the results.

Thanks to the modular architecture, the code is simple, readable, and maintainable, and the module has exactly one method.

import Foundation

final class Compiler {

    // MARK: - Methods

    func compile(file path: String) throws -> RecordTable {
        let sourceCodeManager = SourceCodeManager()
        try sourceCodeManager.open(file: path)

        let sourceCode = sourceCodeManager.sourceCode
        let lines = sourceCode.split(separator: "\n")

        let tokenParser = TokenParser()
        let syntacticAnalyzer = SyntacticAnalyzer()

        let executor = Executor()
        let scope = Scope()
        executor.delegate = scope

        for line in lines {
            let tokens = try tokenParser.parse(string: String(line))
            let syntacticTree = try syntacticAnalyzer.parse(tokens: tokens)
            try executor.execute(syntacticTree: syntacticTree)
        }

        return scope.recordTable
    }
}

Conclusion

As a result, a compiler for the new programming language Ace was developed. Furthermore, all prerequisites were satisfied, and the modular architecture simplifies the development of new features such as method declarations, class declarations, and polymorphic behavior.

An example of program execution is shown below. This source code:

val value = 0.5

var another = 2.5
another = 3.0

val result = value + another

produces this output:

value: Double = 0.5
another: Double = 3.0
result: Double = 3.5

The project is open source and available at The Ace Programming Language.
