│ THIS PROJECT IS NEW and the code is UNFINISHED. DON'T USE IT │
LittleScript is a BASIC dialect for microcontrollers. It's meant for
adding scriptability to your firmware, not for writing your entire code
in. LS has some restrictions to make it more embedded-friendly:
- Identifiers capped at 6 characters
- Integer arithmetic by default
- No garbage collection (strings work like C strings)
- Allocation from a pool of fixed blocks
- Code is not stored all at once in RAM - can execute in place from slow
external devices with some caching, etc.
## Copying and use
This software disclaims copyright. Do what you want with it. Be gay, do crime.
Originally written by Alexis Lockwood in 2021. Ⓐ
### Dependencies (for all builds):
- Python 3
### Dependencies (command-line tools only):
- libexplain (`apt install libexplain-dev`)
- uthash (`apt install uthash-dev`)
### Building locally, running tests, and cleaning:
### Cross-compiling only the library:
make CC=avr-gcc EXTRA_FLAGS=-fshort-enums
## So, how small is it?
The short answer: about 15 kB of flash and a bit over 1 kB of RAM.
_That's not so small!_
Well, it's Littlescript, not Microscript. Not so small that it's impractical to
use. My target platforms are RAM-limited but have plenty of flash (e.g. larger
8-bit microcontrollers like AVR-Dx and ATxmega). The flash usage is pretty
high, around 15 kB (TODO update this when closer to a release). The RAM usage,
however, is much smaller than a typical BASIC. For example, here is an
estimation of RAM usage on an 8-bit platform with 16-bit addressing like AVR:
| | Count | Each | Total |
| Pool entries | 64 | 15 | 960 |
| Main `ls_t` object | 1 | 57 | 57 |
| Input buffers | 2 | 64 | 128 |
| **Grand total** | | | 1145 |
All data is allocated from the pool, which can be arbitrarily sized (up to
64k entries). The 64-byte buffers listed are for storing source code, and are
not needed if the entire source is already in RAM or ROM.
The following things use pool entries:
- The global scope (1)
- Named labels, which are stored as they are found (1 each)
- Variables (1 each for integers, 2 each for other types, plus as many as
required for the data inside containers: each holds one list element or
8 string characters)
- Expression evaluation (number determined by the expression complexity,
typically 2-4, these are freed when the expression is finished being
- GOSUB calls, to save the execution context (1, plus 1 per argument for the
## What's implemented so far
- Basic high-level API
- Main lex-parse loop
- Expression evaluator
- Variable storage
- Keyword: GOSUB...RETURN, GOTO, IF, LET, PRINT, REM, WHILE...WEND, FOR...NEXT,
## Todo list
- Other keywords
- Non-integer variable types
- String will always be supported
- Floating point as optional compile-in
- Better testing
- Compile-time options for:
- Internal asserts for better `LS_INTERNAL_ERROR` info
- Hardwiring the ls object to a static, this should reduce code size a lot
- Disabling features not needed, including hooks, fetcher (xiram)
- All keywords (except block definers like loops) should cleanly stop parsing
at the first keyword they don't recognize. This will enable things like
- Add leftshift, rightshift operators
- It'd be cool if `ls_minify` could access full idents beyond `LS_IDENT_LEN`;
the number of possible minified six-letter idents is enormous (about 52
billion if I did the math right) so if you're always going to minify idents
you might as well get full-length in the pre-min code.
- BUG? Something weird is happening when running `minify_ident_test`. `ls_minify`
locks up somewhere suspiciously near `pc = UINT16_MAX / 2`. That file is
way bigger than the max though so I'll need to make a better test case.
- Can we make the minifier accept code longer than the max? I think it'd have
to have its own build of the lexer... We could just make `ls_addr_t` into
`size_t`, that wouldn't have any effect on 8-bit anyway, though it'd make
the pool more wasteful on 32 bit.
- All the `_handle*` functions in `ls_expr.c` return differently, make them
- Some sort of whole-system test framework, probably involving .bas files
(this probably requires ON ERROR implemented so they can intentionally fail)
- I kinda like the idea of "cursor" instead of "pc"
- Scanning Statement is a bad name, these should be Block Statements
## Coding style, contributions
I'm trying to follow a fairly consistent coding style now, please see
[my style guide](https://git.alexisvl.rocks/avlstyle/plain/avlstyle.7.html?h=trunk)
Because this project is still in its early stages, I don't really want
contributions -- a lot of this is still massively in flux and quite a few things
still exist mostly in my brain. It'll mostly just confuse me. Once I reach a
releasable stage, I will cheerfully accept useful patches.
# THE SCRIPTING LANGUAGE
Unlike oldschool BASIC which uses line numbers, LittleScript uses a system of
named labels and numbered small directional jumps. These are inspired by GNU
Assembler jump targets:
hello: PRINT "Hello, world"
PRINT "This line is skipped"
There is no requirement of declaration order. If the target of a GOSUB, GOTO,
or any other line number argument is an identifier that has not been seen yet,
the interpreter will scan forward until it finds it.
When a number is used, it must be prefixed with `+` or `-`, and refers to the
next line forward or backward matching that number.
Note that named labels consume memory, and are meant to be used for globally
important sections like functions. For local control flow, use numbers.
**Note** that strings are not implemented yet.
LittleScript currently has no floating or fixed point support; all numbers are
integers and this is the default type. As with most BASIC variants, the `$`
suffix can be used for string variables:
S$ = "Hello, world"
Strings are interpreted as ASCII and can contain all byte values including 0.
This allows them to be used to contain arbitrary binary data. Internally,
they are represented as a list of eight-character chunks.
Lists are single-dimensional only, and are implemented using linked lists
rather than arrays. No DIM is necessary to set them up since new elements can
always be linked in from the object pool, but DIM may still be used to fail
early if memory is limited. To optimize sequential access, a pointer to the
last element accessed is stored in the array object, thus re-accessing the
last element or the next element is O(1) and iterating over the sequence is
Identifiers are six characters long. An identifier must be created (via LET or
another assignment keyword) before their values can be accessed - they have no
default. With no suffix, an identifier has integer type. With a $ suffix, it is
a string, and with a () suffix, it is a list.
Because small embedded arguments are the target, a minifier is provided that
condenses a script to a smaller form. In particular, every keyword becomes a
single byte. The minifier can also un-minify a script to make it readable
again (though most comments are lost).
Littlescript supports true functions as an extention to the GOSUB syntax.
Arguments are given in parentheses in key=value format, and a new scope is
created in the called function with these values. The AS keyword can create
a return value:
a = 2
b = 3
GOSUB Add(a = a, b = b) AS c
PRINT "a + b = ", c
REM It is recommended to document the function arguments and return value with
REM a comment.
Add: 'Add(a, b) => a + b
RETURN a + b
Because a new scope is created, values are passed by value: if the function
modifies them, the modification only occurs in the new scope. To pass by
value, name the variables accordingly and omit from the GOSUB statement; they
will be located and modified in the next scope up.
## Native functions
Many LittleScript systems will not have a traditional file storage, but may
want a way to read and write streams. The OPEN, CLOSE, READ, and WRITE keywords
defer to callbacks provided to LittleScript by the implementing system. The
following syntax is proposed for stream descriptors:
COMn [speed] [parity] [data]
FILE device /path/to/file
(More verbose documentation is provided below, when useful)
| **KEYWORD** | **DOES** |
| `ABS n` | Return the absolute value of `n` |
| `ASC(s$[, n])` | Return the byte value of the first or `n`th character of `s$` |
| `BIN s$ n` | Append the binary representation of `n` to `s$` |
| `CALL fun(...)` | Call native function `fun` with arguments `...` |
| `CAT s$ t$` | Concatenate string `t$` onto the end of `s$` |
| `CHR s$ n` | Append the byte value `n` onto the end of string `s$` |
| `CLOSE [#n[,#n]...]` | Close file numbers (or all files). Defined by implementation |
| `DATA expr[ expr...]` | Data for `READ` |
| `DEC s$ n` | Append the decimal representation of `n` to `s$` |
| `DEF FN name(args...) expr` | Create a function with the given name |
| `END` | Stop execution and return to caller |
| `ERASE ident [, ident...]` | Deallocate any space held by `ident` and delete it. |
| `FN name(args...)` | Execute the function `name`, returning its value |
| `FOR ident = a TO b [STEP c]` | Repeat until `NEXT` with `ident` running from `a` to `b` in steps of `c`. |
| `GOSUB f[(args...)] [AS s]` | Call label `f` as a function, optionally passing and returning data |
| `GOTO label` | Go directly to `label` |
| `GOTO +n` or `GOTO -n` | Go directly forward or backward to numbered label `n` |
| `HEX s$ n` | Append the hexadecimal representation of `n` to `s$` |
| `IF cond THEN ...` | If `cond` is nonzero, execute statement `...` |
| `IF cond GOTO ...` | Equivalent to `IF cond THEN GOTO ...` |
| `LEN(s$)` | Return the length of the string or list |
| `[LET] ident = val` | Assign the variable. |
| `NEXT` | Go back to `FOR` (see that keyword). |
| `OCT s$ n` | Append the octal representation of `n` to `s$` |
| `ON expr GOTO n1, n2, ...` | Go to the `expr`th line given. |
| `ON expr GOSUB [AS s] s1, s2, ...` | Call the `expr`th subroutine given. |
| `PACK n... AS s$ [USING fmt]` | Pack values into binary format |
| `PRINT [TO #n] val... [,]` | Print values, optionally to file `#n`, optionally without newline (`,`) |
| `RANDOMIZE [seed]` | Initialize the PRNG. |
| `READ ident...` | Read from `DATA` statements |
| `REM ...` | Comment |
| `RESTORE [label]` | Restore the read pointer, either to null or to the location of `label` |
| `RETURN [expr]` | Return to the last GOSUB, optionally returning a value |
| `RND` | Return a random number from 0 to 65535. |
| `SWAP m, n` | Swap the contents of two variables or list indices |
| `UNPACK n... FROM s$ [USING fmt]` | Unpack values from binary format |
| `VAL(s$)` | Returns the numerical value of a string |
| `WEND` | Go back to the last `WHILE`. Must occur by itself, not inside an IF or similar. |
| `WHILE cond` | Loop as long as `cond` is nonzero. |
(TODO: define file I/O variables. This will probably look nothing like
### Scanning Statements
A Scanning Statement is a statement that scans around in the code rather than
simply executing forward. These are not permitted by some nestable keywords
like IF. Scanning Statments are:
- `FOR` and `NEXT`
- `WHILE` and `WEND`
Statements which simply set the program counter somewhere else are not Scanning
Statements. These include `GOTO`, `GOSUB`, and `RETURN`.
### `ERASE ident [, ident...]`
Note that because there is no garbage collection, you are responsible for
erasing any unused variables.
### `FOR ident = a TO b [STEP c]`
If the iterator does not land directly on `b` because of the step size,
iteration stops just before it passes `b`.
FOR loops, like WHILE loops, do not create a full scope. The loop iterator
can shadow other variables, but all other variables created inside the loop
exist in its enclosing context.
Starting and terminating limits may be any valid integer value. Step is
limited to the range of a signed 16-bit integer, from -32768 to 32767. Loops
that will iterate forever (step = 0) or for a very long time (overflowing and
wrapping around) are not guaranteed to produce errors.
### `GOSUB f[(args...)] [AS s]`
Pushes a new stack frame and jumps to label `f`. Label must be named, not
numeric. `args...` should be formatted as `key1 = value1, key2 = value2, ...`;
each argument will be added to the new scope as a variable. On return, if a
value has been returned it will be stored in `s`, which will be created if
### `IF cond [THEN|GOTO] ... [ELSE ...]
If the value of `cond` is true, `THEN` or `GOTO` is executed (`GOTO` is
equivalent to `THEN GOTO`). If it is false, then the interpreter moves ahead
to `ELSE`, or to the end of the line (whichever comes next), and resumes
IF statmeents may nest. When scanning for ELSE, the nesting level is tracked
and scanning only stops if the nesting level is zero. This translates to the
GW-BASIC equivalent behavior: if the number of ELSE and THEN sections are not
matched, then each ELSE pairs with the nearest unmatched THEN.
It is not permitted to include Scanning Statements inside an IF statement.
### `[LET] ident = val`
This is the default keyword: the `LET` itself may be omitted.
### `ON expr GOTO n1, n2, ...`
`n1` and so on can be anything valid to appear after `GOTO`. If `expr` is
not positive or if it is too high, proceed directly to the next statement.
There is no bound for the number of choices.
### `ON expr GOSUB [AS s] s1, s2, ...`
See also `ON expr GOTO`. If `AS` is given, the return value of the subroutine
will be placed in this variable (if the subroutine returns nothing, the
variable is not created or modified). Each choice may have arguments:
ON x GOSUB AS rtn FOO(x), FOO(x + 2), BAR(x)
### `PACK` and `UNPACK`
These work a bit like Python's struct.pack/struct.unpack and can be used to
assemble arbirary binary data. TODO: format TBD.
### `RANDOMIZE` and `RND`
If the seed is not specified for `RANDOMIZE`, it will be initialized with the
system's preferred random seed source. Not all systems guarantee good quality
### `READ` and `DATA`
Because Littlescript doesn't have access to the entire program at once like
a traditional BASIC, READ/DATA work a bit differently. By default, the read
pointer is uninitialized; on the first `READ`, the interpreter seeks forward
to the first `DATA` statement. Once it _executes_ the `DATA` statement, the
read pointer is reset. Every stack context has its own read pointer. The
intended use is for every group of `READ` statements to have a set of `DATA`
statements just a short while after.
Note that if the input medium is slow, for example a serial EEPROM, the
distance between the beginning of the `READ` statement and the end of its
associated `DATA` statements should be small enough to fit in whatever cache
the implementation uses. Otherwise, READ and DATA operations could be quite
Comments may also start with an apostrophe (`'`), and apostrophe comments
may appear anywhere in a line.
Note that the minifier strips out all apostrophe comments, but may be
configured to leave alone `REM` comments. It is therefore recommended that
any important comments that must remain in the minified text use `REM`.
TODO: list all valid keywords and their abbreviations