aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: 2203099990e2fd043197c01297a52ebf40b564c4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
```
╒═════════════════════════════════════════════════════════════════════════════╕
│ THIS PROJECT IS NEW and the code is UNFINISHED. DON'T USE IT                │
╘═════════════════════════════════════════════════════════════════════════════╛
```

# Littlescript

LittleScript is a BASIC dialect for microcontrollers. It's meant for
adding scriptability to your firmware, not for writing your entire code
in. LS has some restrictions to make it more embedded-friendly:

- Identifiers capped at 6 characters
- Integer arithmetic by default
- No garbage collection (strings work like C strings)
- Allocation from a pool of fixed blocks
- Code is not stored all at once in RAM - can execute in place from slow
  external devices with some caching, etc.

## Copying and use

This software disclaims copyright. Do what you want with it. Be gay, do crime.
Originally written by Alexis Lockwood in 2021. Ⓐ

## Building

### Dependencies (for all builds):

- Python 3

### Dependencies (command-line tools only):

- libexplain (`apt install libexplain-dev`)
- uthash (`apt install uthash-dev`)

### Building locally, running tests, and cleaning:

```
make
make check
make clean
```

### Cross-compiling only the library and/or with extra flags:

```
make CC=avr-gcc EXTRA_FLAGS=-fshort-enums
```

## So, how small is it?

The short answer: about 15 kB of flash and a bit over 1 kB of RAM.

_That's not so small!_

Well, it's Littlescript, not Microscript. Not so small that it's impractical to
use. My target platforms are RAM-limited but have plenty of flash (e.g. larger
8-bit microcontrollers like AVR-Dx and ATxmega). The flash usage is pretty
high, around 15 kB (TODO update this when closer to a release). The RAM usage,
however, is much smaller than a typical BASIC. For example, here is an
estimation of RAM usage on an 8-bit platform with 16-bit addressing like AVR:

|                    | Count | Each | Total |
|--------------------|------:|-----:|------:|
| Pool entries       |    64 |   15 |   960 |
| Main `ls_t` object |     1 |   57 |    57 |
| Input buffers      |     2 |   64 |   128 |
| **Grand total**    |       |      |  1145 |

All data is allocated from the pool, which can be arbitrarily sized (up to
64k entries). The 64-byte buffers listed are for storing source code, and are
not needed if the entire source is already in RAM or ROM.

The following things use pool entries:

- The global scope (1)
- Named labels, which are stored as they are found (1 each)
- Variables (1 each for integers, 2 each for other types, plus as many as
   required for the data inside containers: each holds one list element or
   8 string characters)
- Expression evaluation (number determined by the expression complexity,
   typically 2-4, these are freed when the expression is finished being
   evaluated)
- GOSUB calls, to save the execution context (1, plus 1 per argument for the
   new scope)

## What's implemented so far

- Basic high-level API
- Main lex-parse loop
- Expression evaluator
- Variable storage
- Keyword: GOSUB...RETURN, GOTO, IF, LET, PRINT, REM, WHILE...WEND, FOR...NEXT,
  ON...GOTO, ON...GOSUB

## Todo list

- Other keywords
- Non-integer variable types
  - String will always be supported
  - Floating point as optional compile-in
- Better testing
- Compile-time options for:
  - Internal asserts for better `LS_INTERNAL_ERROR` info
  - Hardwiring the ls object to a static, this should reduce code size a lot
  - Disabling features not needed, including hooks, fetcher (xiram)
- It'd be cool if `ls_minify` could access full idents beyond `LS_IDENT_LEN`;
  the number of possible minified six-letter idents is enormous (about 52
  billion if I did the math right) so if you're always going to minify idents
  you might as well get full-length in the pre-min code.
- BUG? Something weird is happening when running `minify_ident_test`. `ls_minify`
  locks up somewhere suspiciously near `pc = UINT16_MAX / 2`. That file is
  way bigger than the max though so I'll need to make a better test case.
- Can we make the minifier accept code longer than the max? I think it'd have
  to have its own build of the lexer... We could just make `ls_addr_t` into
  `size_t`, that wouldn't have any effect on 8-bit anyway, though it'd make
  the pool more wasteful on 32 bit.
- All the `_handle*` functions in `ls_expr.c` return differently, make them
  more consistent
- Some sort of whole-system test framework, probably involving .bas files
  (this probably requires ON ERROR implemented so they can intentionally fail)
- I kinda like the idea of "cursor" instead of "pc"
- Scanning Statement is a bad name, these should be Block Statements
- A lot of small internal functions are bloating the code size by requiring a
    self argument for error-throwing. They should return errors instead.

## Coding style, contributions

I'm trying to follow a fairly consistent coding style now, please see
[my style guide](https://git.alexisvl.rocks/avlstyle/plain/avlstyle.7.html?h=trunk)
for that.

Because this project is still in its early stages, I don't really want
contributions -- a lot of this is still massively in flux and quite a few things
still exist mostly in my brain. It'll mostly just confuse me. Once I reach a
releasable stage, I will cheerfully accept useful patches.

# THE SCRIPTING LANGUAGE

## Positions

Unlike oldschool BASIC which uses line numbers, LittleScript uses a system of
named labels and numbered small directional jumps. These are inspired by GNU
Assembler jump targets:

```
REM examples/positions.bas
GOSUB hello
PRINT "Returned"
END

hello:  PRINT "Hello, world"
        GOTO +1
        PRINT "This line is skipped"
1:      RETURN
```

There is no requirement of declaration order. If the target of a GOSUB, GOTO,
or any other line number argument is an identifier that has not been seen yet,
the interpreter will scan forward until it finds it.

When a number is used, it must be prefixed with `+` or `-`, and refers to the
next line forward or backward matching that number.

Note that named labels consume memory, and are meant to be used for globally
important sections like functions. For local control flow, use numbers.


## Types

**Note** that strings are not implemented yet.

LittleScript currently has no floating or fixed point support; all numbers are
integers and this is the default type. As with most BASIC variants, the `$`
suffix can be used for string variables:

```
S$ = "Hello, world"
PRINT S$
```

Strings are interpreted as ASCII and can contain all byte values including 0.
This allows them to be used to contain arbitrary binary data. Internally,
they are represented as a list of eight-character chunks.

Lists are single-dimensional only, and are implemented using linked lists
rather than arrays. No DIM is necessary to set them up since new elements can
always be linked in from the object pool, but DIM may still be used to fail
early if memory is limited. To optimize sequential access, a pointer to the
last element accessed is stored in the array object, thus re-accessing the
last element or the next element is O(1) and iterating over the sequence is
O(n).


## Identifiers

Identifiers are six characters long.  An identifier must be created (via LET or
another assignment keyword) before their values can be accessed - they have no
default. With no suffix, an identifier has integer type. With a $ suffix, it is
a string, and with a () suffix, it is a list.


## Minifier

Because small embedded arguments are the target, a minifier is provided that
condenses a script to a smaller form. In particular, every keyword becomes a
single byte. The minifier can also un-minify a script to make it readable
again (though most comments are lost).


## Functions

Littlescript supports true functions as an extention to the GOSUB syntax.
Arguments are given in parentheses in key=value format, and a new scope is
created in the called function with these values. The AS keyword can create
a return value:

```
REM examples/add.bas
a = 2
b = 3
GOSUB Add(a = a, b = b) AS c
PRINT "a + b = ", c
END

REM It is recommended to document the function arguments and return value with
REM a comment.
Add: 'Add(a, b) => a + b
        RETURN a + b
```

Because a new scope is created, values are passed by value: if the function
modifies them, the modification only occurs in the new scope. To pass by
value, name the variables accordingly and omit from the GOSUB statement; they
will be located and modified in the next scope up.


## Native functions

(TODO)


## Files

Many LittleScript systems will not have a traditional file storage, but may
want a way to read and write streams. The OPEN, CLOSE, READ, and WRITE keywords
defer to callbacks provided to LittleScript by the implementing system. The
following syntax is proposed for stream descriptors:

```
COMn [speed] [parity] [data]
FILE device /path/to/file
```

## Operators

The following operators are supported, with higher precedence values binding
more tightly:

| **OPER**  | **PREC** | **OPERATION ** |
|:---------:|---------:|----------------|
| `ABS n`   | 13       | Absolute value of n |
| `NOT n`   | 13       | Bitwise inverse of n |
| `-n`      | 13       | Negation of n |
| `x * y`   | 12       | Multiplication of x by y |
| `x / y`   | 12       | Division of x by y |
| `x % y`   | 12       | Remainder of x divided by y |
| `x + y`   | 11       | Addition of x and y |
| `x - y`   | 11       | Subtraction from x of y |
| `x << y`  | 10       | Shift of x left by y bits. If y < 0, shift right. |
| `x >> y`  | 10       | Shift of x right by y bits. If y < 0, shift left. |
| `x < y`   | 9        | Comparison of x < y |
| `x > y`   | 9        | Comparison of x > y |
| `x <= y`  | 9        | Comparison of x ≤ y |
| `x >= y`  | 9        | Comparison of x ≥ y |
| `x = y`   | 8        | Comparison of x = y |
| `x <> y`  | 8        | Comparison of x ≠ y |
| `x and y` | 7        | Bitwise AND of x with y |
| `x xor y` | 7        | Bitwise XOR of x with y |
| `x or y`  | 7        | Bitwise OR of x with y |
| `x eqv y` | 7        | Bitwise equivalency of x and y (`not (x xor y)`) |
| `x imp y` | 7        | Bitwise implication of x and y (`not (x and not y`) |

The operator precedence is based on that of C.

## Statements

(More verbose documentation is provided below, when useful)

| **KEYWORD** 			| **DOES** |
|-------------------------------|-|
| `ASC(s$[, n])`		| Return the byte value of the first or `n`th character of `s$` |
| `BIN s$ n`			| Append the binary representation of `n` to `s$` |
| `CALL fun(...)`		| Call native function `fun` with arguments `...` |
| `CAT s$ t$`			| Concatenate string `t$` onto the end of `s$` |
| `CHR s$ n`			| Append the byte value `n` onto the end of string `s$` |
| `CLOSE [#n[,#n]...]`		| Close file numbers (or all files). Defined by implementation |
| `DATA expr[ expr...]`		| Data for `READ` |
| `DEC s$ n`			| Append the decimal representation of `n` to `s$` |
| `DEF FN name(args...) expr`	| Create a function with the given name |
| `END`				| Stop execution and return to caller |
| `ERASE ident [, ident...]`	| Deallocate any space held by `ident` and delete it. |
| `FN name(args...)`		| Execute the function `name`, returning its value |
| `FOR ident = a TO b [STEP c]`	| Repeat until `NEXT` with `ident` running from `a` to `b` in steps of `c`. |
| `GOSUB f[(args...)] [AS s]`	| Call label `f` as a function, optionally passing and returning data |
| `GOTO label`			| Go directly to `label` |
| `GOTO +n` or `GOTO -n`	| Go directly forward or backward to numbered label `n` |
| `HEX s$ n`			| Append the hexadecimal representation of `n` to `s$` |
| `IF cond THEN ...`		| If `cond` is nonzero, execute statement `...` |
| `IF cond GOTO ...`		| Equivalent to `IF cond THEN GOTO ...` |
| `LEN(s$)`			| Return the length of the string or list |
| `[LET] ident = val`		| Assign the variable. |
| `NEXT`			| Go back to `FOR` (see that keyword). |
| `OCT s$ n`			| Append the octal representation of `n` to `s$` |
| `ON expr GOTO n1, n2, ...`	| Go to the `expr`th line given. |
| `ON expr GOSUB [AS s] s1, s2, ...`	| Call the `expr`th subroutine given. |
| `PACK n... AS s$ [USING fmt]`	| Pack values into binary format |
| `PRINT [TO #n] val... [,]`	| Print values, optionally to file `#n`, optionally without newline (`,`) |
| `RANDOMIZE [seed]`		| Initialize the PRNG. |
| `READ ident...`		| Read from `DATA` statements |
| `REM ...`			| Comment |
| `RESTORE [label]`		| Restore the read pointer, either to null or to the location of `label` |
| `RETURN [expr]`		| Return to the last GOSUB, optionally returning a value |
| `RND`				| Return a random number from 0 to 65535. |
| `SWAP m, n`			| Swap the contents of two variables or list indices |
| `UNPACK n... FROM s$ [USING fmt]`	| Unpack values from binary format |
| `VAL(s$)`			| Returns the numerical value of a string |
| `WEND`			| Go back to the last `WHILE`. Must occur by itself, not inside an IF or similar. |
| `WHILE cond`			| Loop as long as `cond` is nonzero. |
| *Mathematical functions*      | |
| `ABS n`			| Return the absolute value of `n` |
| `ATN n`			| Arctangent of `n` (floating point only) |
| `COS n`			| Cosine of `n` (floating point only) |
| `LOG n`			| Natural logarithm of `n` (floating point only) |
| `SIN n`			| Sine of `n` (floating point only) |
| `SQR n`			| Square root of `n` (floating point only) |
| `TAN n`			| Tangent of `n` (floating point only) |
| `EXP n`			| Exponential of `n` (floating point only) |
| `SGN n`			| Signum of `n` (-1, 0, or 1 depending on sign) |
| `RND`				| Random number between 0 and 65535
| *Type conversions*      | |
| `INT n`			| Convert `n` to integer |
| `FLOAT n`			| Convert `n` to floating point |

(TODO: define file I/O variables. This will probably look nothing like
traditional BASICs)

### Scanning Statements

A Scanning Statement is a statement that scans around in the code rather than
simply executing forward. These are not permitted by some nestable keywords
like IF. Scanning Statments are:

- `FOR` and `NEXT`
- `WHILE` and `WEND`

Statements which simply set the program counter somewhere else are not Scanning
Statements. These include `GOTO`, `GOSUB`, and `RETURN`.

### `ERASE ident [, ident...]`

Note that because there is no garbage collection, you are responsible for
erasing any unused variables.

### `FOR ident = a TO b [STEP c]`

If the iterator does not land directly on `b` because of the step size,
iteration stops just before it passes `b`.

FOR loops, like WHILE loops, do not create a full scope. The loop iterator
can shadow other variables, but all other variables created inside the loop
exist in its enclosing context.

Starting and terminating limits may be any valid integer value. Step is
limited to the range of a signed 16-bit integer, from -32768 to 32767. Loops
that will iterate forever (step = 0) or for a very long time (overflowing and
wrapping around) are not guaranteed to produce errors.

### `GOSUB f[(args...)] [AS s]`

Pushes a new stack frame and jumps to label `f`. Label must be named, not
numeric. `args...` should be formatted as `key1 = value1, key2 = value2, ...`;
each argument will be added to the new scope as a variable. On return, if a
value has been returned it will be stored in `s`, which will be created if
necessary.

### `IF cond [THEN|GOTO] ... [ELSE ...]

If the value of `cond` is true, `THEN` or `GOTO` is executed (`GOTO` is
equivalent to `THEN GOTO`). If it is false, then the interpreter moves ahead
to `ELSE`, or to the end of the line (whichever comes next), and resumes
execution.

IF statmeents may nest. When scanning for ELSE, the nesting level is tracked
and scanning only stops if the nesting level is zero. This translates to the
GW-BASIC equivalent behavior: if the number of ELSE and THEN sections are not
matched, then each ELSE pairs with the nearest unmatched THEN.

It is not permitted to include Scanning Statements inside an IF statement.

### `[LET] ident = val`

This is the default keyword: the `LET` itself may be omitted.

### `ON expr GOTO n1, n2, ...`

`n1` and so on can be anything valid to appear after `GOTO`. If `expr` is
not positive or if it is too high, proceed directly to the next statement.
There is no bound for the number of choices.

### `ON expr GOSUB [AS s] s1, s2, ...`

See also `ON expr GOTO`. If `AS` is given, the return value of the subroutine
will be placed in this variable (if the subroutine returns nothing, the
variable is not created or modified). Each choice may have arguments:

```
ON x GOSUB AS rtn FOO(x), FOO(x + 2), BAR(x)
```

### `PACK` and `UNPACK`

These work a bit like Python's struct.pack/struct.unpack and can be used to
assemble arbirary binary data. TODO: format TBD.

### `RANDOMIZE` and `RND`

If the seed is not specified for `RANDOMIZE`, it will be initialized with the
system's preferred random seed source. Not all systems guarantee good quality
seeds.

### `READ` and `DATA`

Because Littlescript doesn't have access to the entire program at once like
a traditional BASIC, READ/DATA work a bit differently. By default, the read
pointer is uninitialized; on the first `READ`, the interpreter seeks forward
to the first `DATA` statement. Once it _executes_ the `DATA` statement, the
read pointer is reset. Every stack context has its own read pointer. The
intended use is for every group of `READ` statements to have a set of `DATA`
statements just a short while after.

Note that if the input medium is slow, for example a serial EEPROM, the
distance between the beginning of the `READ` statement and the end of its
associated `DATA` statements should be small enough to fit in whatever cache
the implementation uses. Otherwise, READ and DATA operations could be quite
slow.

### `REM`

Comments may also start with an apostrophe (`'`), and apostrophe comments
may appear anywhere in a line.

Note that the minifier strips out all apostrophe comments, but may be
configured to leave alone `REM` comments. It is therefore recommended that
any important comments that must remain in the minified text use `REM`.

## Keywords

TODO: list all valid keywords and their abbreviations