lfe_scan.erl - Lexical Scanner
Purpose: Convert raw LFE source text into a stream of tokens for parsing.
Location: src/lfe_scan.erl
Size: 897 LOC, 35KB
Module Classification: Compiler frontend, lexical analysis
Public API
Primary Scanning Functions:
string(String) -> {ok, Tokens, Line} | {error, ErrorInfo, Line}
string(String, StartLine) -> Result
string(String, StartLine, Options) -> Result
Tokens = [Token]
Token = {Type, Line, Value} | {Type, Line}
Type = symbol | number | string | binary | '(' | ')' | '[' | ']' | ...
Scan a complete string. Located at lfe_scan.erl:66-68.
Incremental Scanning (for REPL):
token(Continuation, Chars) -> {more, Continuation1}
| {done, Result, RestChars}
tokens(Continuation, Chars) -> {more, Continuation1}
| {done, Result, RestChars}
Continuation-based scanning for streaming input. Located at lfe_scan.erl:45-54.
Token Types
Delimiters: '(', ')', '[', ']', '.'
Special Syntactic Markers:
- '\'' - Quote ('expr)
- '`' - Backquote (`expr)
- ',' - Unquote (,expr)
- ',@' - Unquote-splicing (,@expr)
Hash Forms:
- '#(' - Tuple literal: #(a b c)
- '#.' - Eval-at-read: #.(+ 1 2)
- '#B(' - Binary literal: #B(42 (f 32))
- '#M(' - Map literal: #M(a 1 b 2)
- '#\'' - Function reference: #'module:function/arity
Literals:
- symbol - Atoms and identifiers (foo, foo-bar, |complex symbol|)
- number - Integers and floats (42, 3.14, #2r1010, #16rFF)
- string - String literals ("hello", """multi\nline""")
- binary - Binary strings (#"bytes")
Token Structure
Each token is a tuple: {Type, Line, Value} or {Type, Line} for delimiters.
Examples:
{symbol, 1, foo}
{number, 1, 42}
{string, 2, "hello"}
{'(', 1}
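Code that consumes tokens has to handle both tuple shapes. The following is a minimal sketch, assuming only the two shapes described above; the module `token_sketch` and its functions are hypothetical helpers for illustration and are not part of lfe_scan.

```erlang
-module(token_sketch).
-export([type/1, line/1, describe/1]).

%% Hypothetical helpers, not part of lfe_scan.
%% Tokens are assumed to be {Type, Line, Value} or {Type, Line}.

%% Return the token type for either tuple shape.
type({Type, _Line, _Value}) -> Type;
type({Type, _Line}) -> Type.

%% Return the line number for either tuple shape.
line({_Type, Line, _Value}) -> Line;
line({_Type, Line}) -> Line.

%% Render a token as a flat string, e.g. for an error message.
describe({Type, Line, Value}) ->
    lists:flatten(io_lib:format("~p ~p on line ~p", [Type, Value, Line]));
describe({Type, Line}) ->
    lists:flatten(io_lib:format("~p on line ~p", [Type, Line])).
```

Matching on tuple size in the function head keeps callers from having to test `tuple_size/1` themselves.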
Special Features
Triple-Quoted Strings (lfe_scan.erl:394-450):
Supports Python/Elixir-style triple-quoted strings:
"""
Multi-line string
with "quotes" inside
"""
Quoted Symbols (lfe_scan.erl:350-365):
Allows arbitrary characters in symbol names:
|complex-symbol-name!@#$|
|with spaces|
Based Numbers (lfe_scan.erl:473-493):
Supports bases 2-36:
#2r1010 ; Binary
#8r755 ; Octal
#16rDEADBEEF ; Hexadecimal
#36rZZZ ; Base-36
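To make the `#<base>r<digits>` notation concrete, here is a minimal sketch of how such a literal could be converted to an integer. It is not the lfe_scan implementation: the module `based_number_sketch` is hypothetical, and it leans on the standard `list_to_integer/2`, which accepts the same 2-36 base range.

```erlang
-module(based_number_sketch).
-export([parse/1]).

%% Hypothetical helper, not the lfe_scan implementation.
%% Parses text of the form "#<base>r<digits>", e.g. "#16rFF".
parse("#" ++ Rest) ->
    %% Split at the first "r": the base is on the left, the digits on the right.
    case string:split(Rest, "r") of
        [BaseStr, Digits] when Digits =/= [] ->
            try
                Base = list_to_integer(BaseStr),
                %% Reject bases outside 2..36 (badmatch is caught below).
                true = (Base >= 2) andalso (Base =< 36),
                {ok, list_to_integer(Digits, Base)}
            catch
                error:_ -> {error, bad_based_number}
            end;
        _ ->
            {error, bad_based_number}
    end;
parse(_) ->
    {error, bad_based_number}.
```

For example, `based_number_sketch:parse("#16rFF")` returns `{ok, 255}`, while a digit that is invalid for the base, as in `"#2r102"`, returns `{error, bad_based_number}`.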
Character Literals (lfe_scan.erl:552-573):
#\a ; Character 'a'
#\n ; Newline
#\x41 ; Hex character code
Elixir Module Name Hack (lfe_scan.erl:542-549):
Transforms #Emodule → 'Elixir.module' for Elixir interop:
#EEnum ; Becomes 'Elixir.Enum'
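The transformation amounts to prefixing the name with `Elixir.` and producing an atom. A minimal sketch, assuming the input is the raw `#E...` text; the module `elixir_name_sketch` is hypothetical and does not reflect how lfe_scan structures this step.

```erlang
-module(elixir_name_sketch).
-export([to_atom/1]).

%% Hypothetical helper, not the lfe_scan implementation.
%% Maps "#EName" to the atom 'Elixir.Name'.
to_atom("#E" ++ Name) when Name =/= [] ->
    {ok, list_to_atom("Elixir." ++ Name)};
to_atom(_) ->
    {error, not_an_elixir_name}.
```

The resulting atom is the module name Elixir itself uses on the BEAM, which is why the rewritten symbol can be used directly in a remote call.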
Comment Handling
Line Comments (lfe_scan.erl:277-282):
; This is a comment
;; Also a comment
Block Comments (lfe_scan.erl:283-311):
#|
Block comment
can span multiple lines
|#
Nested Block Comments: Supported via counter tracking.
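Counter tracking can be sketched as follows: each `#|` increments a depth counter, each `|#` decrements it, and the comment ends when the depth returns to zero. This is an illustrative sketch under that assumption, not the code in lfe_scan; the module `block_comment_sketch` is hypothetical.

```erlang
-module(block_comment_sketch).
-export([skip/1]).

%% Hypothetical helper, not the lfe_scan implementation.
%% Input must begin with "#|". Returns the text after the matching "|#".
skip("#|" ++ Rest) -> skip(Rest, 1);
skip(_) -> {error, not_a_block_comment}.

%% Depth counts the "#|" markers that are currently open.
skip("#|" ++ Rest, Depth) -> skip(Rest, Depth + 1);   % nested open
skip("|#" ++ Rest, 1) -> {ok, Rest};                  % outermost close
skip("|#" ++ Rest, Depth) -> skip(Rest, Depth - 1);   % nested close
skip([_Char | Rest], Depth) -> skip(Rest, Depth);     % comment body
skip([], _Depth) -> {error, unterminated_block_comment}.
```

With this sketch, `block_comment_sketch:skip("#| a #| b |# c |# rest")` returns `{ok, " rest"}`, and input that ends while a comment is still open returns `{error, unterminated_block_comment}`.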
Internal Structure
Scanner State:
-record(lfe_scan, {}). % Currently unused, reserved for future
The scanner uses functional continuation passing for incremental scanning rather than an explicit state record.
Key Functions:
- scan/3 (line 136): Main dispatch loop
- scan_symbol/4 (line 236): Symbol scanning
- scan_number/4 (line 452): Number parsing
- scan_string/5 (line 375): String literal handling
- scan_comment/3 (line 277): Comment skipping
Dependencies
Erlang stdlib:
- lists - List operations
- string - String manipulation
- unicode - UTF-8 handling
No LFE module dependencies - Scanner is self-contained.
Used By
- lfe_parse - Consumes token stream
- lfe_comp - Via parse (indirectly)
- lfe_shell - For REPL input
- lfe_io:read* functions
Key Algorithms
Continuation-Based Scanning (lfe_scan.erl:45-54):
The scanner supports incremental input for REPL use:
{more, Continuation} % Need more input
{done, {ok, Token, Line}, RestChars} % Token complete
This allows reading from a terminal where input arrives character-by-character.
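The protocol can be illustrated with a toy scanner that returns one whitespace-delimited word per call. This is a sketch of the `{more, ...}` / `{done, ...}` pattern only, not the lfe_scan implementation: the module `cont_scan_sketch`, the `word` token type, and the use of `[]` as the initial continuation are all assumptions made for this example.

```erlang
-module(cont_scan_sketch).
-export([token/2]).

%% Toy scanner, not the lfe_scan implementation.
%% The continuation is either [] (fresh start) or {Acc, Line}, where Acc
%% holds the characters of the current word in reverse order.
token([], Chars) -> scan(Chars, [], 1);
token({Acc, Line}, Chars) -> scan(Chars, Acc, Line).

%% Skip leading whitespace, counting newlines.
scan([$\n | Rest], [], Line) -> scan(Rest, [], Line + 1);
scan([C | Rest], [], Line) when C =:= $\s; C =:= $\t -> scan(Rest, [], Line);
%% Whitespace after at least one character ends the word.
scan([C | _] = Rest, Acc, Line) when C =:= $\s; C =:= $\t; C =:= $\n ->
    {done, {ok, {word, Line, lists:reverse(Acc)}, Line}, Rest};
%% Any other character extends the word.
scan([C | Rest], Acc, Line) -> scan(Rest, [C | Acc], Line);
%% Input ran out before a delimiter was seen: ask the caller for more.
scan([], Acc, Line) -> {more, {Acc, Line}}.
```

Feeding `"fo"` yields `{more, Cont}` because no delimiter has been seen yet; feeding `"o bar"` with that continuation yields `{done, {ok, {word, 1, "foo"}, 1}, " bar"}`. The unconsumed `" bar"` is handed back so the caller can pass it to the next call.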
Symbol Validation (lfe_scan.erl:601-645):
Symbols must:
- Not start with digits (unless quoted)
- Not contain special delimiters (unless quoted)
- Support Unicode characters
- Handle reserved characters in quoted form
Number Parsing (lfe_scan.erl:452-535):
Supports:
- Decimal integers: 42, -17
- Floats: 3.14, 1.5e10, 6.022e23
- Based integers: #16rFF, #2r1010
- Sign handling for all numeric forms
Special Considerations
Performance:
- Scanner is O(n) in input length
- Minimal memory allocation (continuation-based)
- Hot path: symbol scanning (most common token)
Unicode Support:
- Full UTF-8 support in strings and comments
- Symbol names can include Unicode characters
- Proper handling of multi-byte sequences
Error Recovery:
- Errors include line numbers for reporting
- Invalid characters reported with context
- Unterminated strings/comments detected
Compatibility:
- Inherited structure from Erlang's erl_scan
- Extended with LFE-specific features (hash forms, quoted symbols)