Auto LSP
A Rust crate for creating Abstract Syntax Trees (AST) and Language Server Protocol (LSP) servers.
`auto_lsp` is designed to be as language-agnostic as possible, allowing any Tree-sitter grammar to be used.
Defining a simple AST involves two steps: writing the queries and then defining the corresponding AST structures in Rust.
cargo add auto_lsp
Quick example
Let's say you have a toy language with a root node named document containing a list of function nodes, each containing a unique name.
A simple query file to capture the root document and function names:
(document) @document
(function
(name) @name) @function
The corresponding AST definition in Rust:
use auto_lsp::seq;
#[seq(query = "document")]
struct Document {
functions: Vec<Function>
}
#[seq(query = "function")]
struct Function {
name: Name
}
#[seq(query = "name")]
struct Name {}
Now that you have your AST defined, you can:
- Implement the AST traits and create an LSP server (with the `lsp_server` feature).
- Add your own logic for testing purposes, code generation, etc.
Simplicity
`auto-lsp` has only two macros to define an AST: `#[seq]` and `#[choice]`.
All symbols are thread-safe and have their own parse function via blanket implementations. This means any symbol can be used as a root node, allowing you to:
- Create a full AST from any Tree-sitter grammar.
- Derive a subset of the grammar, depending on your needs.
However, this level of flexibility and permissiveness comes with some caveats. It can be more prone to errors and requires careful attention when writing your queries.
To address this, `auto_lsp` provides testing and logging utilities to help you ensure that the AST behaves as intended.
Features
- `deadlock_detection`: Enable `parking_lot`'s deadlock detection (not compatible with `wasm`).
- `log`: Enable logging (uses `stderrlog`).
- `lsp_server`: Enable the LSP server (uses `lsp_server`).
- `rayon`: Enable `rayon` support (not compatible with `wasm`).
- `wasm`: Enable wasm support.
- `html`: Enable the html workspace mock for testing purposes.
- `python`: Enable the python workspace mock for testing purposes.
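As a sketch, enabling features follows the usual Cargo syntax; the version below is a placeholder, and the feature names are the ones listed above:

```toml
[dependencies]
# Placeholder version: pick the latest release from crates.io.
# "lsp_server" pulls in the server runtime, "log" enables stderr logging.
auto_lsp = { version = "*", features = ["lsp_server", "log"] }
```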
Creating an AST
In this chapter, we will see how to create an AST using the `auto_lsp` crate.
Any AST is created in two main steps:
- Write the query to capture the nodes you are interested in.
- Define the corresponding AST structures in Rust.
There is no specific order to follow; you can perform both steps in parallel.
In the next chapter, we will explore how to write the core query to capture the nodes.
Core Query
When defining the main query for creating the AST, it is important to keep in mind that `auto_lsp` captures nodes in the order they appear in the Tree-sitter tree.
The following query works as expected:
(document) @document
(function
(identifier) @name) @function
Duplicate nodes
If you use common nodes like identifier, Tree-sitter will capture them multiple times.
Given the following AST:
use auto_lsp::seq;
#[seq(query = "document")]
struct Document {
functions: Vec<Function>,
}
#[seq(query = "function")]
struct Function {
name: Identifier,
}
#[seq(query = "identifier")]
struct Identifier {}
The core query could be written as:
(document) @document
(function
(identifier) @name) @function
(identifier) @identifier
In this case, identifier will be captured twice, once as a name and once as an identifier — which will result in an unknown symbol error.
You can resolve this in two ways:
1 - Constrain the Capture
Use one of Tree-sitter's operators or predicates to constrain the capture of duplicate nodes.
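For example, assuming a grammar where `function` exposes a `name:` field, a field constraint plus a text predicate can narrow what the bare `identifier` pattern captures (the `name:` field and the `"self"` filter here are illustrative, not tied to a specific grammar):

```
(function
  name: (identifier) @name) @function

; Only capture identifiers whose text differs from "self".
((identifier) @identifier
  (#not-eq? @identifier "self"))
```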
2 - Merge parts of the Query
Remove the name capture, since name is already an identifier:
(document) @document
(function) @function
(identifier) @identifier
Anonymous nodes
Sometimes, Tree-sitter has anonymous nodes that are not visible in the tree or can't be captured via queries.
In this case, you can identify the part where the anonymous rules occur, add a wildcard node, and create a `#seq` node to handle it.
If a field is already defined, this makes it even easier.
(function
body: (_) @body) @function
(identifier) @identifier
use auto_lsp::seq;
#[seq(query = "function")]
struct Function {
body: Body,
}
#[seq(query = "body")]
struct Body {
/* ... */
}
Aliased nodes
When creating a new tree-sitter grammar, be cautious with aliased nodes.
Tree-sitter allows a single node type to represent multiple different syntax structures through aliasing.
However, this creates a problem: you can only write one query definition per node type, and tree-sitter doesn't provide a way to determine if a node is using its primary type or an alias.
This limitation means that if you use aliased nodes, your AST might not accurately represent the different syntactic structures in your code.
Seq macro
`#seq` is used to define a sequence of nodes; it only works with structs.
Fields can be named however you want, since `auto_lsp` relies on query names to build the AST.
A field can either be:
- Another struct (a nested sequence of nodes).
- An enum (a choice between multiple sequences of nodes).
- A `Vec` of structs or enums built with the same macros.
- An `Option` of a struct or enum.
A `Vec` can contain zero or more elements.
Note that tree-sitter already distinguishes `repeat` and `repeat1`: with a `repeat1` rule, the tree-sitter lexer reports an error when the match is empty, so an empty `Vec` can only occur for `repeat` rules.
use auto_lsp::core::ast::*;
use auto_lsp::{seq, choice};
#[seq(query = "document")]
struct Document {
// A simple field
name: Identifier,
// A vec of structs
functions: Vec<Function>,
// A vec of enums
elements: Vec<Element>,
// An optional field
return_type: Option<Type>,
}
#[seq(query = "function")]
struct Function {}
#[choice]
enum Element {
Statement(Statement),
Expression(Expression),
}
#[seq(query = "statement")]
struct Statement {}
#[seq(query = "expression")]
struct Expression {}
#[seq(query = "identifier")]
struct Identifier {}
#[seq(query = "type")]
struct Type {}
Seq Attributes
- `query`: The name of the query used to capture the node.

All other attributes are optional. By default, `#seq` generates an empty implementation of each trait (since Rust does not have stable specialization).
When an attribute is provided, the corresponding trait must be implemented manually.
To activate an attribute, add it to the `#seq` macro's parameters:
use auto_lsp::seq;
use auto_lsp::core::ast::{BuildDocumentSymbols, BuildCodeActions};
use auto_lsp::core::document_symbols_builder::DocumentSymbolsBuilder;
// Here, we tell auto_lsp that document_symbols and code_actions
// will be implemented manually.
// If an attribute is declared but no implementation is provided,
// your code won't compile.
#[seq(query = "function",
document_symbols,
code_actions,
)]
struct Function {}
impl BuildDocumentSymbols for Function {
fn build_document_symbols(&self, doc: &Document, builder: &mut DocumentSymbolsBuilder) {
/* ... */
}
}
impl BuildCodeActions for Function {
fn build_code_actions(&self, doc: &Document, acc: &mut Vec<lsp_types::CodeAction>) {
/* ... */
}
}
LSP traits
- `code_actions`: `BuildCodeActions` trait.
- `code_lenses`: `BuildCodeLenses` trait.
- `completions`: `BuildCompletionItems` trait.
- `declaration`: `GetGoToDeclaration` trait.
- `definition`: `GetGoToDefinition` trait.
- `document_symbols`: `BuildDocumentSymbols` trait.
- `hover`: `GetHover` trait.
- `inlay_hints`: `BuildInlayHints` trait.
- `invoked_completions`: `BuildInvokedCompletionItems` trait.
- `semantic_tokens`: `BuildSemanticTokens` trait.
Special traits
- `comment`: Marks this node as one that can potentially contain a comment. If the comments query is provided in the parser configuration, comments found above the node will be attached to it.
- `check`: `Check` trait.
The `Check` trait is a special trait used to validate a symbol.
When implemented, `auto_lsp` will execute the `check` method to verify the symbol's validity.
`check` returns a `CheckStatus` to indicate whether the validation was successful. To add diagnostics, push them into the provided `diagnostics` vector.
use auto_lsp::seq;
use auto_lsp::core::ast::{Check, CheckStatus};
#[seq(query = "document", check)]
struct Document {}
impl Check for Document {
fn check(
&self,
doc: &Document,
diagnostics: &mut Vec<lsp_types::Diagnostic>,
) -> CheckStatus {
let source = doc.texter.text.as_bytes();
let document_text = self.read().get_text(source);
if document_text.starts_with("Hello, World") {
return CheckStatus::Ok;
} else {
diagnostics.push(lsp_types::Diagnostic {
range: self.get_lsp_range(doc),
severity: Some(lsp_types::DiagnosticSeverity::ERROR),
message: "Document must start with 'Hello, World'".to_string(),
..Default::default()
});
return CheckStatus::Fail;
}
}
}
Choice Macro
`#choice` is used to define a choice between multiple sequences of nodes; it only works with enums.
Unlike `#seq`, `#choice` does not have any attributes.
Instead, `#choice` tries to find the correct variant at runtime by testing the query name of each variant.
The matched variant then has the corresponding trait methods called, if implemented.
Variants behave similarly to `#seq` fields: they can be named however you want; only the inner value matters.
use auto_lsp::{seq, choice};
#[choice]
enum Element {
AStatement(Statement),
SimpleExpression(Expression),
}
#[seq(query = "statement")]
struct Statement {}
#[seq(query = "expression")]
struct Expression {}
Pattern Matching
The #[choice]
attribute generates standard Rust enums that fully support pattern matching. This makes it easy to work with nested AST structures.
For example, consider an expression that can contain nested types:
#[choice]
pub enum Expression {
PrimaryExpression(PrimaryExpression),
Identifier(Identifier),
}
#[choice]
pub enum PrimaryExpression {
Integer(Integer),
Bool(Bool)
}
You can pattern match through multiple layers using standard Rust match expressions:
impl Expression {
pub fn is_integer(&self) -> bool {
matches!(self, Expression::PrimaryExpression(PrimaryExpression::Integer(_)))
}
pub fn is_bool(&self) -> bool {
matches!(self, Expression::PrimaryExpression(PrimaryExpression::Bool(_)))
}
}
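Stripped of the macros, the generated enums behave like ordinary Rust enums, so the helpers above can be exercised with plain types. A self-contained sketch, where the `Integer`, `Bool`, and `Identifier` payloads are stand-ins for the real AST nodes:

```rust
// Stand-in payload types for the AST nodes.
#[derive(Debug)]
pub struct Integer;
#[derive(Debug)]
pub struct Bool;
#[derive(Debug)]
pub struct Identifier;

#[derive(Debug)]
pub enum PrimaryExpression {
    Integer(Integer),
    Bool(Bool),
}

#[derive(Debug)]
pub enum Expression {
    PrimaryExpression(PrimaryExpression),
    Identifier(Identifier),
}

impl Expression {
    // A nested pattern reaches through both enum layers at once.
    pub fn is_integer(&self) -> bool {
        matches!(
            self,
            Expression::PrimaryExpression(PrimaryExpression::Integer(_))
        )
    }
}

fn main() {
    let expr = Expression::PrimaryExpression(PrimaryExpression::Integer(Integer));
    assert!(expr.is_integer());
    assert!(!Expression::Identifier(Identifier).is_integer());
    println!("nested pattern matching works");
}
```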
The AST tree
The AST Tree is a linked list of strongly typed nodes.
Each node is a `Symbol<T>` where `T` is a type implementing `AstSymbol`.
When using either the `#seq` or `#choice` macro, `auto_lsp` generates two types of symbols:
- The symbol itself, with thread-safe fields.
- The builder associated with the symbol.
For example, a struct named Module with an optional Function field:
use auto_lsp::seq;
#[seq(query = "module")]
struct Module {
function: Option<Function>,
}
#[seq(query = "function")]
struct Function {}
This would generate:
#[derive(Clone)]
pub struct Module {
pub function: Option<Symbol<Function>>,
}
#[derive(Clone)]
pub struct ModuleBuilder {
function: MaybePendingSymbol,
}
Interacting with the tree
Static and Dynamic Symbols
The AST tree is composed of 3 types of symbols:
- Static symbols: `Symbol<T>` where `T` implements `AstSymbol`.
- Dynamic symbols: `DynSymbol`, a trait object that wraps a `Symbol<T>`.
- Weak symbols: `WeakSymbol`, a weak reference to a `DynSymbol`.

Dynamic symbols implement downcasting thanks to the `downcast_rs` crate.
Weak symbols can be upgraded to a dynamic symbol using the `WeakSymbol::to_dyn` method.
Static symbols offer better performance due to static dispatch and type safety, while dynamic symbols are useful for referencing symbols anywhere in the tree or performing method calls without needing to worry about the type.
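The strong/weak split mirrors Rust's standard reference-counting types. A minimal std-only sketch of the same upgrade pattern (the type aliases and function here are illustrative, not `auto_lsp` APIs):

```rust
use std::sync::{Arc, RwLock, Weak};

// A strongly typed node, analogous to Symbol<T>: shared and thread-safe.
type Strong<T> = Arc<RwLock<T>>;
// A weak handle, analogous to WeakSymbol: does not keep the node alive.
type WeakHandle<T> = Weak<RwLock<T>>;

// Returns whether a weak handle can still be upgraded
// after its owner has been dropped (it cannot).
fn upgradable_after_drop() -> bool {
    let node: Strong<String> = Arc::new(RwLock::new("function".into()));
    let weak: WeakHandle<String> = Arc::downgrade(&node);
    // While the strong handle is alive, upgrading works
    // (analogous to WeakSymbol::to_dyn succeeding).
    assert!(weak.upgrade().is_some());
    drop(node);
    // Once the owner is gone, upgrade yields None instead of dangling.
    weak.upgrade().is_some()
}

fn main() {
    assert!(!upgradable_after_drop());
    println!("weak handles cannot outlive their owner");
}
```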
Walking the tree
While the tree does not implement iterators, it still provides methods to locate a node or walk within it:
- `descendant_at`: Finds the lowest node in the tree at the given offset.
- `descendant_at_and_collect`: Finds the lowest node in the tree at the given offset and clones all nodes matching the closure's condition.
- `traverse_and_collect`: Traverses the tree and clones all nodes that match the closure's condition.
All methods that walk the tree return a `DynSymbol` that can be downcast to the desired type.
In addition, all symbols have a `get_parent` method to retrieve the parent symbol.
Since the parent might be dropped, which could invalidate the child nodes, a `WeakSymbol` is returned and must be upgraded to a `DynSymbol` when used.
It is strongly discouraged to store symbols or manually edit them, as this may lead to memory leaks or an inconsistent tree.
Lexer
The tree-sitter lexer handles syntax analysis and reports:
- Syntax errors
- Missing nodes
- Invalid token sequences
`auto_lsp` requires a valid Concrete Syntax Tree (CST) from tree-sitter to generate an AST.
Automatic errors
During AST construction, `auto_lsp` automatically detects and reports errors.
There are two types of errors:
- Missing fields: Occurs when required fields in an AST node aren't matched by the query.
- Query mismatch: Happens when query captures don't align with the AST structure.
Workspace and Document
To test or run your queries and your AST, you need to create a `Workspace`.
The `Workspace` in `auto-lsp` is not related to a Visual Studio Code workspace.
Instead, it refers to an internal structure that contains the AST and related information, such as diagnostics and parsers.
The document and tree-sitter parts are stored in the `Document` struct.
Workspace struct
`Workspace` contains the following fields:
- `url`: The URL of the document associated with the workspace.
- `parsers`: A static `Parsers` struct that contains all the necessary tools to generate an AST.
- `diagnostics`: A list of diagnostics kept in sync with the AST.
- `ast`: The AST (if any).
- `unsolved_checks`: A list of symbols that still need to be resolved.
- `unsolved_references`: A list of references that still need to be resolved.
- `changes`: A list of the last changes made to the AST.
Additionally, a `Workspace` has a few useful methods:
- `parse`: Parses a given `Document` and generates the AST. It's preferable to use `from_utf8` or `from_texter` to create a workspace; see Creating a workspace.
- `get_tree_sitter_errors`: Gets all errors from the tree-sitter parser and converts them to diagnostics (acts as a lexer).
- `set_comments`: Finds all comments in the document and attempts to attach them to the corresponding nodes.
- `find_all_with_regex`: Finds all slices that match a given regex.
Document struct
The `Document` struct has the following fields:
- `texter`: A `texter` struct that stores the document's text.
- `tree`: The tree-sitter tree.
Configuring Parsers
In order to create a workspace, you need to configure the parsers that will be used to create an AST.
To simplify parser configuration, you can use the `configure_parsers!` macro to create a list of parsers.
`configure_parsers!` takes the name of the list as its first argument; each subsequent entry is a parser configuration, where the key specifies the language the parser will be used for and the value is a struct with the configuration fields.
A parser requires the following information:
- The tree-sitter language fn.
- The node types.
- The AST root node (often Module, Document, SourceFile nodes ...).
- The core query associated with the AST.
Optional fields include:
- The comment query.
- The fold query.
- The highlights query.
The `fold` query is used to define code folding regions, while the `highlights` query can be used to specify syntax highlighting rules.
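As a sketch, these optional queries typically pair node patterns with conventional tree-sitter capture names; the exact capture names `auto_lsp` expects are not specified here, so treat the names below as placeholders:

```
; Folding: capture the regions to fold (placeholder capture name).
(function_definition) @fold

; Highlighting: conventional tree-sitter highlight captures.
(function_definition
  name: (identifier) @function)
(comment) @comment
```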
Example
The following example demonstrates how to configure a parser for the Python language:
use auto_lsp::seq;
use auto_lsp::configure_parsers;
use auto_lsp::core::ast::*;
static CORE_QUERY: &'static str = "
(module) @module
(function_definition
name: (identifier) @function.name) @function
";
static COMMENT_QUERY: &'static str = "
(comment) @comment
";
#[seq(query = "module")]
struct Module {}
configure_parsers!(
MY_PARSER_LIST,
"python" => {
language: tree_sitter_python::LANGUAGE,
node_types: tree_sitter_python::NODE_TYPES,
ast_root: Module,
core: CORE_QUERY,
comment: Some(COMMENT_QUERY),
fold: None,
highlights: None
}
);
Creating a workspace
Once you are done configuring the parsers, you can create a workspace using the `Workspace` struct.
`Workspace` will create both the AST and the virtual document, returned as a tuple of (`Workspace`, `Document`).
Use the `from_utf8` method when you want to create a workspace from raw source code as a string.
If you have a `Texter` instance, use `from_texter` instead.
use auto_lsp::core::ast::*;
use auto_lsp::core::workspace::Workspace;
use lsp_types::Url;
let source_code = r#"def foo(): pass"#;
// From a string
let (workspace, document) = Workspace::from_utf8(
&MY_PARSER_LIST.get("python").unwrap(),
Url::parse("file://test").unwrap(),
source_code.into(),
).unwrap();
// From Texter
use auto_lsp::texter::core::text::Texter;
let texter = Texter::new(source_code);
let (workspace, document) = Workspace::from_texter(
&MY_PARSER_LIST.get("python").unwrap(),
Url::parse("file://test").unwrap(),
texter,
).unwrap();
Updating a document
Use `document.update()` to process document changes.
`update` takes two parameters:
- The tree-sitter parser instance.
- A list of `lsp_types::TextDocumentContentChangeEvent` changes.
let change = lsp_types::TextDocumentContentChangeEvent {
    range: Some(lsp_types::Range {
        start: lsp_types::Position { line: 0, character: 0 },
        end: lsp_types::Position { line: 0, character: 0 },
    }),
    range_length: Some(26),
    text: "<div></div>".into(),
};
// Apply changes and get edits;
// this list can then be passed to a Workspace.
let edits = document
    .update(
        &mut workspace.parsers.tree_sitter.parser.write(),
        &vec![change],
    )
    .unwrap();
Updating a workspace
After document changes, update the workspace using the parse
method.
workspace.parse(Some(&edits), &document);
Configuring Document Links
Document links are declared outside the AST.
`auto-lsp` enables finding document links by running a regular expression over the comments.
Example
// Create a document or use an existing one
let (workspace, document) = Workspace::from_utf8(
&PARSER_LIST.get("HTML").unwrap(),
Url::parse("file://index.html").unwrap(),
r#"<!DOCTYPE html>
<!-- source:file1.txt:52 -->
<div>
<!-- source:file2.txt:25 -->
</div>"#
.into()
).unwrap();
let regex = Regex::new(r" source:(\w+\.\w+):(\d+)").unwrap();
let results = workspace.find_all_with_regex(&document, &regex);
assert_eq!(results.len(), 2);
Configuring Semantic Tokens
To configure semantic tokens, you need to use the `define_semantic_token_types` and `define_semantic_token_modifiers` macros.
Token Types
use auto_lsp::define_semantic_token_types;
define_semantic_token_types![
standard {
"namespace" => NAMESPACE,
"type" => TYPE,
"function" => FUNCTION,
}
custom {
"custom" => CUSTOM,
}
];
This macro generates the following components to streamline working with semantic token types:
- Constants: Creates a constant for each standard and custom token type.
- Supported token types: Generates a slice (`SUPPORTED_TYPES`) containing all supported token types, which can be reused to inform the LSP client about available tokens.
Token Modifiers
use auto_lsp::define_semantic_token_modifiers;
define_semantic_token_modifiers![
standard {
DOCUMENTATION,
DECLARATION,
}
custom {
(READONLY, "readonly"),
(STATIC, "static"),
}
];
This generates:
- Constants for standard (`DOCUMENTATION`, `DECLARATION`) and custom (`READONLY`, `STATIC`) modifiers.
- A `SUPPORTED_MODIFIERS` slice that includes both standard and custom modifiers.
Example in AST
use auto_lsp::semantic_tokens::{SemanticToken, SemanticTokenType, SemanticTokenModifier};
define_semantic_token_types![
standard {
FUNCTION,
}
custom {}
];
define_semantic_token_modifiers![
standard {
DECLARATION,
}
custom {}
];
impl BuildSemanticTokens for MyType {
fn build_semantic_tokens(&self, doc: &Document, builder: &mut SemanticTokensBuilder) {
builder.push(
self.name.read().get_lsp_range(doc),
SUPPORTED_TYPES.iter().position(|x| *x == FUNCTION).unwrap() as u32,
SUPPORTED_MODIFIERS.iter().position(|x| *x == DECLARATION).unwrap() as u32,
);
}
}
LSP Server Initialization
To inform the LSP client about the supported token types and modifiers, you need to pass the `SemanticTokensList` to the `LspOptions` struct.
fn main() -> Result<(), Box<dyn Error + Sync + Send>> {
let mut session = Session::create(InitOptions {
parsers: &PYTHON_PARSERS,
lsp_options: LspOptions {
semantic_tokens: Some(SemanticTokensList {
semantic_token_types: SUPPORTED_TYPES,
semantic_token_modifiers: SUPPORTED_MODIFIERS,
}),
..Default::default()
},
})?;
Ok(())
}
Configuring a server
Starting a server
`auto-lsp` uses `lsp_server` from rust-analyzer and `crossbeam` to launch the server.
To configure the `lsp_server`, you need to use the `create` method from the `Session` struct, which takes two arguments:
- `Parsers`: A list of parsers (previously defined with the `configure_parsers!` macro).
- `LspOptions`: Options to configure the LSP server; see LSP Options.
To start a session, you need to provide the InitOptions struct.
use std::error::Error;
use auto_lsp::server::{InitOptions, LspOptions, Session};
use auto_lsp::python::PYTHON_PARSERS;
fn main() -> Result<(), Box<dyn Error + Sync + Send>> {
let mut session = Session::create(InitOptions {
parsers: &PYTHON_PARSERS,
lsp_options: LspOptions {
document_symbols: true,
diagnostics: true,
..Default::default()
},
})?;
// Run the server and wait for the two threads to end
// (typically triggered by the LSP Exit event).
session.main_loop()?;
session.io_threads.join()?;
// Shut down gracefully.
eprintln!("Shutting down server");
Ok(())
}
LSP Options
The `LspOptions` struct contains various settings to enable or disable different LSP features, like diagnostics, document symbols, and more.
Depending on how your AST is structured, all requests are fulfilled automatically.
Only two options require specific implementations:
Document Links
Configuring Document Links requires a RegexToDocumentLink
struct.
use auto_lsp::server::{RegexToDocumentLink, Session};
use auto_lsp::core::document::Document;
use auto_lsp::core::workspace::Workspace;
use auto_lsp::lsp_types::{DocumentLink, Url};
use auto_lsp::regex::Regex;
let regex = Regex::new(r"(\w+):(\d+)").unwrap();
fn to_document_link(m: regex::Match, line: usize, document: &Document, workspace: &Workspace, acc: &mut Vec<DocumentLink>) -> lsp_types::DocumentLink {
lsp_types::DocumentLink {
data: None,
tooltip: Some(m.as_str().to_string()),
target: None,
range: lsp_types::Range {
start: lsp_types::Position {
line: line as u32,
character: m.start() as u32,
},
end: lsp_types::Position {
line: line as u32,
character: m.end() as u32,
},
},
}
}
RegexToDocumentLink {
regex,
to_document_link,
};
Semantic Tokens
Semantic tokens that were defined previously with the `define_semantic_token_types!` and `define_semantic_token_modifiers!` macros must be provided to the LSP server.
use auto_lsp::lsp_types::SemanticTokenType;
use auto_lsp::define_semantic_token_types;
use phf::phf_map;
define_semantic_token_types! {
standard {
"namespace" => NAMESPACE,
"type" => TYPE,
"function" => FUNCTION,
}
}
define_semantic_token_modifiers![standard {
"declaration" => DECLARATION,
"readonly" => READONLY,
}];
let lsp_options = LspOptions {
semantic_tokens: Some(SemanticTokensList {
token_types: &TOKEN_TYPES,
token_modifiers: &TOKEN_MODIFIERS,
}),
..Default::default()
};
Configuring a client
Acknowledgement
Thanks to the `texter` crate, the client can send text in any encoding to the server.
`texter` also provides an efficient way to update documents incrementally.
File extensions
The LSP server must know how each file extension is associated with a parser.
The client is responsible for sending this information to the server.
Using the VSCode LSP client, this is done by providing a `perFileParser` object in the `initializationOptions` of `LanguageClientOptions`.
import { LanguageClient, LanguageClientOptions, ServerOptions, RequestType } from 'vscode-languageclient/node';
// We tell the server that .py files are associated with the python parser defined via the configure_parsers! macro.
const initializationOptions = {
perFileParser: {
"py": "python"
}
}
const clientOptions: LanguageClientOptions = {
documentSelector: [{ language: 'python' }],
synchronize: {
fileEvents: workspace.createFileSystemWatcher('**/*.py')
},
outputChannel: channel,
uriConverters: createUriConverters(),
initializationOptions
};
You can find an example in the `vscode-python-wasi-lsp` folder of the `auto-lsp` repository.
Logging
To ensure compatibility with WebAssembly, all logs use stderr output.
Logs are displayed with a timestamp and log level.
Logs show in real time:
- All nodes received from the core query, including unknown nodes.
- The number of unsolved checks and references.
Reports
Each symbol defined in the AST can be tested independently using the `test_parse` method.
use auto_lsp::core::ast::*;
use auto_lsp::{seq, choice};
#[seq(query = "document")]
struct Document {
functions: Vec<Function>,
}
#[seq(query = "function")]
struct Function {
name: Identifier,
}
#[seq(query = "identifier")]
struct Identifier {}
// Test Function independently from Document
#[test]
fn function() -> TestParseResult {
Function::test_parse(
r#"function foo()"#)
}
If `test_parse` fails, an Ariadne report is generated.
This report contains both the error locations and the AST tree, making it easier to diagnose parsing issues.
Example with a type error in Python
#[test]
fn function() -> TestParseResult {
Function::test_parse(
r#"
def foo(param1, param2: int = "5"):
pass
"#,
&PYTHON_PARSERS.get("python").unwrap(),
)
}
This code will return the following error when running tests:

Targets
`auto-lsp` has been tested on both Windows and Linux targets.
If you plan to use WebAssembly, you can use the VSCode WASI LSP, which runs on the wasm32-wasip1-threads target.
You'll also need to enable the `wasm` feature.
Note that some functionalities, such as deadlock detection, are not available on WebAssembly.
You can find an example in the `vscode-python-wasi-lsp` folder of the `auto-lsp` repository.