aboutsummaryrefslogtreecommitdiff
╒═════════════════════════════════════════════════════════════════════════════╕
│ THIS PROJECT IS NEW and the code is UNFINISHED. DON'T USE IT                │
╘═════════════════════════════════════════════════════════════════════════════╛

Littlescript

LittleScript is a BASIC dialect for microcontrollers. It's meant for adding scriptability to your firmware, not for writing your entire code in. LS has some restrictions to make it more embedded-friendly:

  • Identifiers capped at 6 characters
  • Integer arithmetic by default
  • No garbage collection (strings work like C strings)
  • Allocation from a pool of fixed blocks
  • Code is not stored all at once in RAM - can execute in place from slow external devices with some caching, etc.

Copying and use

This software disclaims copyright. Do what you want with it. Be gay, do crime. Originally written by Alexis Lockwood in 2021. Ⓐ

Building

Dependencies (for all builds):

  • Python 3

Dependencies (command-line tools only):

  • libexplain (apt install libexplain-dev)
  • uthash (apt install uthash-dev)

Building locally, running tests, and cleaning:

make
make check
make clean

Cross-compiling only the library and/or with extra flags:

make CC=avr-gcc EXTRA_FLAGS=-fshort-enums

So, how small is it?

The short answer: about 15 kB of flash and a bit over 1 kB of RAM.

That's not so small!

Well, it's Littlescript, not Microscript. Not so small that it's impractical to use. My target platforms are RAM-limited but have plenty of flash (e.g. larger 8-bit microcontrollers like AVR-Dx and ATxmega). The flash usage is pretty high, around 15 kB (TODO update this when closer to a release). The RAM usage, however, is much smaller than a typical BASIC. For example, here is an estimation of RAM usage on an 8-bit platform with 16-bit addressing like AVR:

Count Each Total
Pool entries 64 15 960
Main ls_t object 1 57 57
Input buffers 2 64 128
Grand total 1145

All data is allocated from the pool, which can be arbitrarily sized (up to 64k entries). The 64-byte buffers listed are for storing source code, and are not needed if the entire source is already in RAM or ROM.

The following things use pool entries:

  • The global scope (1)
  • Named labels, which are stored as they are found (1 each)
  • Variables (1 each for integers, 2 each for other types, plus as many as required for the data inside containers: each holds one list element or 8 string characters)
  • Expression evaluation (number determined by the expression complexity, typically 2-4, these are freed when the expression is finished being evaluated)
  • GOSUB calls, to save the execution context (1, plus 1 per argument for the new scope)

What's implemented so far

  • Basic high-level API
  • Main lex-parse loop
  • Expression evaluator
  • Variable storage
  • Keyword: GOSUB...RETURN, GOTO, IF, LET, PRINT, REM, WHILE...WEND, FOR...NEXT, ON...GOTO, ON...GOSUB

Todo list

  • Other keywords
  • Non-integer variable types
  • String will always be supported
  • Floating point as optional compile-in
  • Better testing
  • Compile-time options for:
  • Internal asserts for better LS_INTERNAL_ERROR info
  • Hardwiring the ls object to a static, this should reduce code size a lot
  • Disabling features not needed, including hooks, fetcher (xiram)
  • It'd be cool if ls_minify could access full idents beyond LS_IDENT_LEN; the number of possible minified six-letter idents is enormous (about 52 billion if I did the math right) so if you're always going to minify idents you might as well get full-length in the pre-min code.
  • BUG? Something weird is happening when running minify_ident_test. ls_minify locks up somewhere suspiciously near pc = UINT16_MAX / 2. That file is way bigger than the max though so I'll need to make a better test case.
  • Can we make the minifier accept code longer than the max? I think it'd have to have its own build of the lexer... We could just make ls_addr_t into size_t, that wouldn't have any effect on 8-bit anyway, though it'd make the pool more wasteful on 32 bit.
  • All the _handle* functions in ls_expr.c return differently, make them more consistent
  • Some sort of whole-system test framework, probably involving .bas files (this probably requires ON ERROR implemented so they can intentionally fail)
  • I kinda like the idea of "cursor" instead of "pc"
  • Scanning Statement is a bad name, these should be Block Statements
  • A lot of small internal functions are bloating the code size by requiring a self argument for error-throwing. They should return errors instead.

Coding style, contributions

I'm trying to follow a fairly consistent coding style now, please see my style guide for that.

Because this project is still in its early stages, I don't really want contributions -- a lot of this is still massively in flux and quite a few things still exist mostly in my brain. It'll mostly just confuse me. Once I reach a releasable stage, I will cheerfully accept useful patches.

THE SCRIPTING LANGUAGE

Positions

Unlike oldschool BASIC which uses line numbers, LittleScript uses a system of named labels and numbered small directional jumps. These are inspired by GNU Assembler jump targets:

REM examples/positions.bas
GOSUB hello
PRINT "Returned"
END

hello:  PRINT "Hello, world"
        GOTO +1
        PRINT "This line is skipped"
1:      RETURN

There is no requirement of declaration order. If the target of a GOSUB, GOTO, or any other line number argument is an identifier that has not been seen yet, the interpreter will scan forward until it finds it.

When a number is used, it must be prefixed with + or -, and refers to the next line forward or backward matching that number.

Note that named labels consume memory, and are meant to be used for globally important sections like functions. For local control flow, use numbers.

Types

Note that strings are not implemented yet.

LittleScript currently has no floating or fixed point support; all numbers are integers and this is the default type. As with most BASIC variants, the $ suffix can be used for string variables:

S$ = "Hello, world"
PRINT S$

Strings are interpreted as ASCII and can contain all byte values including 0. This allows them to be used to contain arbitrary binary data. Internally, they are represented as a list of eight-character chunks.

Lists are single-dimensional only, and are implemented using linked lists rather than arrays. No DIM is necessary to set them up since new elements can always be linked in from the object pool, but DIM may still be used to fail early if memory is limited. To optimize sequential access, a pointer to the last element accessed is stored in the array object, thus re-accessing the last element or the next element is O(1) and iterating over the sequence is O(n).

Identifiers

Identifiers are six characters long. An identifier must be created (via LET or another assignment keyword) before their values can be accessed - they have no default. With no suffix, an identifier has integer type. With a $ suffix, it is a string, and with a () suffix, it is a list.

Minifier

Because small embedded arguments are the target, a minifier is provided that condenses a script to a smaller form. In particular, every keyword becomes a single byte. The minifier can also un-minify a script to make it readable again (though most comments are lost).

Functions

Littlescript supports true functions as an extention to the GOSUB syntax. Arguments are given in parentheses in key=value format, and a new scope is created in the called function with these values. The AS keyword can create a return value:

REM examples/add.bas
a = 2
b = 3
GOSUB Add(a = a, b = b) AS c
PRINT "a + b = ", c
END

REM It is recommended to document the function arguments and return value with
REM a comment.
Add: 'Add(a, b) => a + b
        RETURN a + b

Because a new scope is created, values are passed by value: if the function modifies them, the modification only occurs in the new scope. To pass by value, name the variables accordingly and omit from the GOSUB statement; they will be located and modified in the next scope up.

Native functions

(TODO)

Files

Many LittleScript systems will not have a traditional file storage, but may want a way to read and write streams. The OPEN, CLOSE, READ, and WRITE keywords defer to callbacks provided to LittleScript by the implementing system. The following syntax is proposed for stream descriptors:

COMn [speed] [parity] [data]
FILE device /path/to/file

Operators

The following operators are supported, with higher precedence values binding more tightly:

OPER PREC OPERATION
ABS n 13 Absolute value of n
NOT n 13 Bitwise inverse of n
-n 13 Negation of n
x * y 12 Multiplication of x by y
x / y 12 Division of x by y
x % y 12 Remainder of x divided by y
x + y 11 Addition of x and y
x - y 11 Subtraction from x of y
x << y 10 Shift of x left by y bits. If y < 0, shift right.
x >> y 10 Shift of x right by y bits. If y < 0, shift left.
x < y 9 Comparison of x < y
x > y 9 Comparison of x > y
x <= y 9 Comparison of x ≤ y
x >= y 9 Comparison of x ≥ y
x = y 8 Comparison of x = y
x <> y 8 Comparison of x ≠ y
x and y 7 Bitwise AND of x with y
x xor y 7 Bitwise XOR of x with y
x or y 7 Bitwise OR of x with y
x eqv y 7 Bitwise equivalency of x and y (not (x xor y))
x imp y 7 Bitwise implication of x and y (not (x and not y)

The operator precedence is based on that of C.

Statements

(More verbose documentation is provided below, when useful)

KEYWORD DOES
ASC(s$[, n]) Return the byte value of the first or nth character of s$
BIN s$ n Append the binary representation of n to s$
CALL fun(...) Call native function fun with arguments ...
CAT s$ t$ Concatenate string t$ onto the end of s$
CHR s$ n Append the byte value n onto the end of string s$
CLOSE [#n[,#n]...] Close file numbers (or all files). Defined by implementation
DATA expr[ expr...] Data for READ
DEC s$ n Append the decimal representation of n to s$
DEF FN name(args...) expr Create a function with the given name
END Stop execution and return to caller
ERASE ident [, ident...] Deallocate any space held by ident and delete it.
FN name(args...) Execute the function name, returning its value
FOR ident = a TO b [STEP c] Repeat until NEXT with ident running from a to b in steps of c.
GOSUB f[(args...)] [AS s] Call label f as a function, optionally passing and returning data
GOTO label Go directly to label
GOTO +n or GOTO -n Go directly forward or backward to numbered label n
HEX s$ n Append the hexadecimal representation of n to s$
IF cond THEN ... If cond is nonzero, execute statement ...
IF cond GOTO ... Equivalent to IF cond THEN GOTO ...
LEN(s$) Return the length of the string or list
[LET] ident = val Assign the variable.
NEXT Go back to FOR (see that keyword).
OCT s$ n Append the octal representation of n to s$
ON expr GOTO n1, n2, ... Go to the exprth line given.
ON expr GOSUB [AS s] s1, s2, ... Call the exprth subroutine given.
PACK n... AS s$ [USING fmt] Pack values into binary format
PRINT [TO #n] val... [,] Print values, optionally to file #n, optionally without newline (,)
RANDOMIZE [seed] Initialize the PRNG.
READ ident... Read from DATA statements
REM ... Comment
RESTORE [label] Restore the read pointer, either to null or to the location of label
RETURN [expr] Return to the last GOSUB, optionally returning a value
RND Return a random number from 0 to 65535.
SWAP m, n Swap the contents of two variables or list indices
UNPACK n... FROM s$ [USING fmt] Unpack values from binary format
VAL(s$) Returns the numerical value of a string
WEND Go back to the last WHILE. Must occur by itself, not inside an IF or similar.
WHILE cond Loop as long as cond is nonzero.
Mathematical functions
ABS n Return the absolute value of n
ATN n Arctangent of n (floating point only)
COS n Cosine of n (floating point only)
LOG n Natural logarithm of n (floating point only)
SIN n Sine of n (floating point only)
SQR n Square root of n (floating point only)
TAN n Tangent of n (floating point only)
EXP n Exponential of n (floating point only)
SGN n Signum of n (-1, 0, or 1 depending on sign)
RND Random number between 0 and 65535
Type conversions
INT n Convert n to integer
FLOAT n Convert n to floating point

(TODO: define file I/O variables. This will probably look nothing like traditional BASICs)

Scanning Statements

A Scanning Statement is a statement that scans around in the code rather than simply executing forward. These are not permitted by some nestable keywords like IF. Scanning Statments are:

  • FOR and NEXT
  • WHILE and WEND

Statements which simply set the program counter somewhere else are not Scanning Statements. These include GOTO, GOSUB, and RETURN.

ERASE ident [, ident...]

Note that because there is no garbage collection, you are responsible for erasing any unused variables.

FOR ident = a TO b [STEP c]

If the iterator does not land directly on b because of the step size, iteration stops just before it passes b.

FOR loops, like WHILE loops, do not create a full scope. The loop iterator can shadow other variables, but all other variables created inside the loop exist in its enclosing context.

Starting and terminating limits may be any valid integer value. Step is limited to the range of a signed 16-bit integer, from -32768 to 32767. Loops that will iterate forever (step = 0) or for a very long time (overflowing and wrapping around) are not guaranteed to produce errors.

GOSUB f[(args...)] [AS s]

Pushes a new stack frame and jumps to label f. Label must be named, not numeric. args... should be formatted as key1 = value1, key2 = value2, ...; each argument will be added to the new scope as a variable. On return, if a value has been returned it will be stored in s, which will be created if necessary.

`IF cond [THEN|GOTO] ... [ELSE ...]

If the value of cond is true, THEN or GOTO is executed (GOTO is equivalent to THEN GOTO). If it is false, then the interpreter moves ahead to ELSE, or to the end of the line (whichever comes next), and resumes execution.

IF statmeents may nest. When scanning for ELSE, the nesting level is tracked and scanning only stops if the nesting level is zero. This translates to the GW-BASIC equivalent behavior: if the number of ELSE and THEN sections are not matched, then each ELSE pairs with the nearest unmatched THEN.

It is not permitted to include Scanning Statements inside an IF statement.

[LET] ident = val

This is the default keyword: the LET itself may be omitted.

ON expr GOTO n1, n2, ...

n1 and so on can be anything valid to appear after GOTO. If expr is not positive or if it is too high, proceed directly to the next statement. There is no bound for the number of choices.

ON expr GOSUB [AS s] s1, s2, ...

See also ON expr GOTO. If AS is given, the return value of the subroutine will be placed in this variable (if the subroutine returns nothing, the variable is not created or modified). Each choice may have arguments:

ON x GOSUB AS rtn FOO(x), FOO(x + 2), BAR(x)

PACK and UNPACK

These work a bit like Python's struct.pack/struct.unpack and can be used to assemble arbirary binary data. TODO: format TBD.

RANDOMIZE and RND

If the seed is not specified for RANDOMIZE, it will be initialized with the system's preferred random seed source. Not all systems guarantee good quality seeds.

READ and DATA

Because Littlescript doesn't have access to the entire program at once like a traditional BASIC, READ/DATA work a bit differently. By default, the read pointer is uninitialized; on the first READ, the interpreter seeks forward to the first DATA statement. Once it executes the DATA statement, the read pointer is reset. Every stack context has its own read pointer. The intended use is for every group of READ statements to have a set of DATA statements just a short while after.

Note that if the input medium is slow, for example a serial EEPROM, the distance between the beginning of the READ statement and the end of its associated DATA statements should be small enough to fit in whatever cache the implementation uses. Otherwise, READ and DATA operations could be quite slow.

REM

Comments may also start with an apostrophe ('), and apostrophe comments may appear anywhere in a line.

Note that the minifier strips out all apostrophe comments, but may be configured to leave alone REM comments. It is therefore recommended that any important comments that must remain in the minified text use REM.

Keywords

TODO: list all valid keywords and their abbreviations