About this guide
This guide is meant to help document how rustc, the Rust compiler, works, as well as to help new contributors get involved in rustc development.
This guide is split into six parts:
- Building, debugging, and contributing to rustc: contains information that should be useful no matter how you are contributing, such as the general procedure for contributing, how to build the compiler, etc.
- High-level architecture of the compiler: discusses the high-level architecture of the compiler and the stages of the compilation process.
- Source code representation: describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily.
- Analysis: discusses the analyses that the compiler uses to check various properties of the code and inform later stages of the compilation process (e.g., type checking).
- From MIR to binaries: how linked executable machine code is generated.
- Appendices: a number of useful reference materials, including a glossary.
The guide itself is of course open source as well, and the sources can be found in the GitHub repository. If you find any mistakes in the guide, please open an issue about it, or even better, open a PR with a correction!
Other places to find information
You might also find the following sites useful:
- rustc API docs -- rustdoc documentation for the compiler
- Forge -- contains documentation about rust infrastructure, team procedures, and more
- compiler-team -- the home base for the rust compiler team, with a description of the team's procedures, active working groups, and the team calendar
Part 1: Building, debugging, and contributing to rustc
This part of the guide contains knowledge that should be useful no matter which part of the compiler you are working on. This includes both technical information and tips (e.g., how to compile and debug the compiler) and information about the workflows of the Rust project (e.g., stability and information about the compiler team).
About the compiler team
rustc is maintained by the Rust compiler team. The people who belong to this team collectively work to track regressions and implement new features. Members of the Rust compiler team are people who have made significant contributions to rustc and its design.
Discussion
Currently the compiler team chats in 2 places:
- The t-compiler stream on the Zulip instance
- The compiler channel on the rust-lang discord
Expert map
If you're interested in figuring out who can answer questions about a particular part of the compiler, or you'd just like to know who works on what, check out our experts directory. It contains a listing of the various parts of the compiler and a list of people who are experts on each one.
Rust compiler meeting
The compiler team has a weekly meeting where we do triage and try to generally stay on top of new bugs, regressions, and other things. They are held on Zulip. It works roughly as follows:
- Review P-high bugs: P-high bugs are those that are sufficiently important for us to actively track progress. P-high bugs should ideally always have an assignee.
- Look over new regressions: we then look for new cases where the compiler broke previously working code in the wild. Regressions are almost always marked as P-high; the major exception would be bug fixes (though even there we often aim to give warnings first).
- Check I-nominated issues: These are issues where feedback from the team is desired.
- Check for beta nominations: These are nominations of things to backport to beta.
- Possibly WG checking: A WG may give an update at this point, if there is time.
The meeting currently takes place on Thursdays at 10am Boston time (UTC-4 typically, but daylight savings time sometimes makes things complicated).
The meeting is held over a "chat medium", currently on Zulip.
Team membership
Membership in the Rust team is typically offered when someone has been making significant contributions to the compiler for some time. Membership is both a recognition but also an obligation: compiler team members are generally expected to help with upkeep as well as doing reviews and other work.
If you are interested in becoming a compiler team member, the first thing to do is to start fixing some bugs, or get involved in a working group. One good way to find bugs is to look for open issues tagged with E-easy or E-mentor.
r+ rights
Once you have made a number of individual PRs to rustc, we will often offer r+ privileges. This means that you have the right to instruct "bors" (the robot that manages which PRs get landed into rustc) to merge a PR (here are some instructions for how to talk to bors).
The guidelines for reviewers are as follows:
- You are always welcome to review any PR, regardless of who it is assigned to. However, do not r+ PRs unless:
  - You are confident in that part of the code.
  - You are confident that nobody else wants to review it first.
    - For example, sometimes people will express a desire to review a PR before it lands, perhaps because it touches a particularly sensitive part of the code.
- Always be polite when reviewing: you are a representative of the Rust project, so it is expected that you will go above and beyond when it comes to the Code of Conduct.
high-five
Once you have r+ rights, you can also be added to the high-five rotation. high-five is the bot that assigns incoming PRs to reviewers. If you are added, you will be randomly selected to review PRs. If you find you are assigned a PR that you don't feel comfortable reviewing, you can also leave a comment like r? @so-and-so to assign it to someone else. If you don't know who to request, just write r? @nikomatsakis for reassignment and @nikomatsakis will pick someone for you.
Getting on the high-five list is much appreciated as it lowers the review burden for all of us! However, if you don't have time to give people timely feedback on their PRs, it may be better that you don't get on the list.
Full team membership
Full team membership is typically extended once someone has made many contributions to the Rust compiler over time, ideally (but not necessarily) to multiple areas. Sometimes this might be implementing a new feature, but it is also important — perhaps more important! — to have time and willingness to help out with general upkeep such as bugfixes, tracking regressions, and other less glamorous work.
How to build and run the compiler
The compiler is built using a tool called x.py. You will need Python installed to run it. But before we get to that, if you're going to be hacking on rustc, you'll want to tweak the compiler's configuration, since the default configuration is geared towards users rather than developers.
Create a config.toml
To start, copy config.toml.example to config.toml:
> cd $RUST_CHECKOUT
> cp config.toml.example config.toml
Then you will want to open up the file and change the following settings (and possibly others, such as llvm.ccache, depending on your needs):
[llvm]
# Enables LLVM assertions, which will check that the LLVM bitcode generated
# by the compiler is internally consistent. These are particularly helpful
# if you edit `codegen`.
assertions = true
[rust]
# This will make your build more parallel; it costs a bit of runtime
# performance perhaps (less inlining) but it's worth it.
codegen-units = 0
# This enables full debuginfo and debug assertions. The line debuginfo is also
# enabled by `debuginfo-level = 1`. Full debuginfo is also enabled by
# `debuginfo-level = 2`. Debug assertions can also be enabled with
# `debug-assertions = true`. Note that `debug = true` will make your build
# slower, so you may want to try individually enabling debuginfo and assertions
# or enable only line debuginfo which is basically free.
debug = true
If you have already built rustc, then you may have to execute rm -rf build for subsequent configuration changes to take effect. Note that ./x.py clean will not cause a rebuild of LLVM, so if your configuration change affects LLVM, you will need to manually rm -rf build/ before rebuilding.
What is x.py?
x.py is the script used to orchestrate the tooling in the rustc repository. It is the script that can build docs, run tests, and compile rustc. It is now the preferred way to build rustc and it replaces the old makefiles from before. Below are the different ways to utilize x.py in order to effectively deal with various common tasks.
This chapter focuses on the basics needed to be productive, but if you want to learn more about x.py, read its README.md here.
Bootstrapping
One thing to keep in mind is that rustc is a bootstrapping compiler. That is, since rustc is written in Rust, we need to use an older version of the compiler to compile the newer version. In particular, the newer version of the compiler and some of the artifacts needed to build it, such as libstd and other tooling, may use some unstable features internally, requiring a specific version of the compiler that understands these unstable features.
The result is that compiling rustc is done in stages:
- Stage 0: the stage0 compiler is usually the current beta rustc compiler and its associated dynamic libraries (you can also configure x.py to use a different compiler). This stage0 compiler is then used only to compile rustbuild, std, and rustc. When compiling rustc, the stage0 compiler uses the freshly compiled std. There are two concepts at play here: a compiler (with its set of dependencies) and its "target" or "object" libraries (std and rustc). Both are staged, but in a staggered manner.
- Stage 1: the code in your clone is then compiled with the stage0 compiler to produce the stage1 compiler. However, it was built with an older compiler (stage0), so to get an optimized stage1 compiler we go to the next stage.
  - In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory. This means that the symbol names used in the compiler source may not match the symbol names that would have been generated by the stage1 compiler, which matters when using dynamic linking (e.g., code that uses derives). Sometimes this means that some tests don't work when run with the stage1 compiler.
- Stage 2: we rebuild the stage1 compiler with itself to produce the stage2 compiler, which has all the latest optimizations. (By default, we copy the stage1 libraries for use by the stage2 compiler, since they ought to be identical.)
- (Optional) Stage 3: to sanity check our new compiler, we can use the stage2 compiler to build the libraries. The result ought to be identical to before, unless something has broken.
To learn more about the bootstrapping process, read this chapter.
Building the compiler
To build the compiler completely, run ./x.py build. This will do the whole bootstrapping process described above, producing a usable compiler toolchain from the sources in your checkout. This takes a long time, so it is usually not what you actually want to run (more on this later).
There are many flags you can pass to the build command of x.py that can help cut down compile times or fit other things you might need to change. They are:
Options:
-v, --verbose use verbose output (-vv for very verbose)
-i, --incremental use incremental compilation
--config FILE TOML configuration file for build
--build BUILD build target of the stage0 compiler
--host HOST host targets to build
--target TARGET target targets to build
--on-fail CMD command to run on failure
--stage N stage to build
--keep-stage N stage to keep without recompiling
--src DIR path to the root of the rust checkout
-j, --jobs JOBS number of jobs to run in parallel
-h, --help print this help message
For hacking, often building the stage 1 compiler is enough, but for final testing and release, the stage 2 compiler is used.
./x.py check is a quick way to build the rust compiler. It is particularly useful when you're doing some kind of "type-based refactoring", like renaming a method or changing the signature of some function.
Once you've created a config.toml, you are ready to run x.py. There are a lot of options here, but let's start with what is probably the best "go to" command for building a local rust:
./x.py build -i --stage 1 src/libstd
This may look like it only builds libstd, but that is not the case. What this command does is the following:
- Build libstd using the stage0 compiler (using incremental)
- Build librustc using the stage0 compiler (using incremental)
  - This produces the stage1 compiler
- Build libstd using the stage1 compiler (cannot use incremental)
This final product (the stage1 compiler plus the libraries built with it) is what you need to build other rust programs (unless you use #![no_std] or #![no_core]).
The command includes the -i switch, which enables incremental compilation. This is used to speed up the first two steps of the process: in particular, if you make a small change, we ought to be able to use the results of your previous compilation to produce the stage1 compiler faster.
Unfortunately, incremental cannot be used to speed up building the stage1 libraries. That is because incremental only works when you run the same compiler twice in a row. In this case, we are building a new stage1 compiler every time, so the old incremental results may not apply. You may therefore find that building the stage1 libstd is a bottleneck for you, but fear not, there is a (hacky) workaround: see the "suggested workflows" section below.
Note that this whole command just gives you a subset of the full rustc build. The full rustc build (what you get with ./x.py build) has quite a few more steps:
- Build librustc and rustc with the stage1 compiler.
  - The resulting compiler here is called the stage2 compiler.
- Build libstd with the stage2 compiler.
- Build librustdoc and a bunch of other things with the stage2 compiler.
Building specific components
Build only the libcore library
./x.py build src/libcore
Build the libcore and libproc_macro libraries only
./x.py build src/libcore src/libproc_macro
Build only libcore up to Stage 1
./x.py build src/libcore --stage 1
Sometimes you might just want to test if the part you're working on can compile. Using these commands, you can check that it compiles before doing a bigger build to make sure it works with the compiler. As shown before, you can also pass flags at the end, such as --stage.
Creating a rustup toolchain
Once you have successfully built rustc, you will have created a bunch of files in your build directory. In order to actually run the resulting rustc, we recommend creating two rustup toolchains. The first will run the stage1 compiler (the result of the build above). The second will execute the stage2 compiler (which we did not build, but which you will likely need to build at some point; for example, if you want to run the entire test suite).
rustup toolchain link stage1 build/<host-triple>/stage1
rustup toolchain link stage2 build/<host-triple>/stage2
The <host-triple> would typically be one of the following:
- Linux: x86_64-unknown-linux-gnu
- Mac: x86_64-apple-darwin
- Windows: x86_64-pc-windows-msvc
Now you can run the rustc you built. If you run it with -vV, you should see a version number ending in -dev, indicating a build from your local environment:
$ rustc +stage1 -vV
rustc 1.25.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.25.0-dev
LLVM version: 4.0
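Once linked, the toolchain can be used like any other rustup toolchain. As a quick smoke test (hello.rs here is just a scratch file you create yourself):
$ echo 'fn main() { println!("hello"); }' > hello.rs
$ rustc +stage1 hello.rs
$ ./hello
hello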
Other x.py commands
Here are a few other useful x.py commands. We'll cover some of them in detail in other chapters:
- Building things:
  - ./x.py clean – clean up the build directory (rm -rf build also works, but then you have to rebuild LLVM)
  - ./x.py build --stage 1 – builds everything using the stage 1 compiler, not just up to libstd
  - ./x.py build – builds the stage2 compiler
- Running tests (see the chapter on running tests):
  - ./x.py test --stage 1 src/libstd – runs the #[test] tests from libstd
  - ./x.py test --stage 1 src/test/ui – runs the ui test suite
  - ./x.py test --stage 1 src/test/ui/const-generics – runs the tests in the const-generics/ subdirectory of the ui test suite
  - ./x.py test --stage 1 src/test/ui/const-generics/const-types.rs – runs the single test const-types.rs from the ui test suite
Cleaning out build directories
Sometimes you need to start fresh, but this is normally not the case. If you feel the need to do this, it is more likely that rustbuild is not acting right, and you should file a bug to let us know what is going wrong. If you do need to clean everything up, you only need to run one command!
./x.py clean
Suggested workflows
The full bootstrapping process takes quite a while. Here are three suggestions to make your life easier.
Check, check, and check again
The first workflow, which is useful when doing simple refactorings, is to run ./x.py check continuously. Here you are just checking that the compiler can build, but often that is all you need (e.g., when renaming a method). You can then run ./x.py build when you actually need to run tests.
In fact, it is sometimes useful to put off tests even when you are not 100% sure the code will work. You can then keep building up refactoring commits and only run the tests at some later time. You can then use git bisect to track down precisely which commit caused the problem. A nice side effect of this style is that you leave a fairly fine-grained set of commits at the end, all of which build and pass tests. This often helps reviewing.
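A minimal sketch of this loop might look as follows (the commit message and the build target are just placeholders for whatever you are working on):
$ # edit some compiler source; then type-check only:
$ ./x.py check
$ git commit -am "refactor step 1"
$ # ...repeat the edit/check/commit cycle...
$ # when you finally want to run tests, do a real build:
$ ./x.py build -i --stage 1 src/libstd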
Building with --keep-stage
Sometimes just checking whether the compiler builds is not enough. A common example is that you need to add a debug! statement to inspect the value of some state or to better understand the problem. In that case, you really need a full build. By leveraging incremental compilation, though, you can often get these builds to complete very quickly (e.g., around 30 seconds). The only catch is that this requires a bit of fudging and may produce compilers that don't work (but that is easily detected and fixed).
The sequence of commands you want is as follows:
- Initial build: ./x.py build -i --stage 1 src/libstd
  - As documented above, this will run all the stage0 commands, including building a stage1 compiler and a libstd compatible with it, as well as the first few "stage 1 actions", up through "stage1 (sysroot stage1) builds libstd".
- Subsequent builds: ./x.py build -i --stage 1 src/libstd --keep-stage 1
  - Note that we added the --keep-stage 1 flag here
As mentioned, the effect of --keep-stage 1 is that we just assume that the old standard library can be reused. If you are editing the compiler, this is almost always true: after all, you haven't changed the standard library. But sometimes it isn't: for example, if you are editing the "metadata" part of the compiler, which controls how the compiler encodes types and other states into rlib files, or if you are editing things that wind up in the metadata (such as the definition of the MIR).
The TL;DR is that you might get weird behavior from a compile when using --keep-stage 1, such as strange ICEs or other panics. In that case, simply remove --keep-stage 1 from the command and rebuild. That ought to fix the problem.
Building with system LLVM
By default, LLVM is built from source, and that can take a significant amount of time. An alternative is to use the LLVM that is already installed on your machine.
This is specified in the target section of config.toml:
[target.x86_64-unknown-linux-gnu]
llvm-config = "/path/to/llvm/llvm-7.0.1/bin/llvm-config"
We have observed the following paths before; they may be different on your system:
- /usr/bin/llvm-config-8
- /usr/lib/llvm-8/bin/llvm-config
Note that you need to have the LLVM FileCheck tool installed, which is used for codegen tests. This tool is normally built with LLVM, but if you use your own preinstalled LLVM, you will need to provide FileCheck in some other way. On Debian-based systems, you can install the llvm-N-tools package (where N is the LLVM version number, e.g. llvm-8-tools). Alternatively, you can specify the path to FileCheck with the llvm-filecheck config item in config.toml, or you can disable the codegen tests with the codegen-tests item in config.toml.
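For instance, a config.toml along these lines covers both options (the paths are illustrative and assume a Debian-style LLVM 8 install; adjust them for your system):
[target.x86_64-unknown-linux-gnu]
llvm-config = "/usr/lib/llvm-8/bin/llvm-config"
# Point the codegen tests at the matching FileCheck binary...
llvm-filecheck = "/usr/lib/llvm-8/bin/FileCheck"

[rust]
# ...or skip the codegen tests entirely.
codegen-tests = false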
Bootstrapping the compiler
This subsection is concerned with the details of the bootstrapping process.
When you run x.py, you will see output like the following:
Building stage0 std artifacts
Copying stage0 std from stage0
Building stage0 compiler artifacts
Copying stage0 rustc from stage0
Building LLVM for x86_64-apple-darwin
Building stage0 codegen artifacts
Assembling stage1 compiler
Building stage1 std artifacts
Copying stage1 std from stage1
Building stage1 compiler artifacts
Copying stage1 rustc from stage1
Building stage1 codegen artifacts
Assembling stage2 compiler
Uplifting stage1 std
Copying stage2 std from stage1
Generating unstable book md files
Building stage0 tool unstable-book-gen
Building stage0 tool rustbook
Documenting standalone
Building rustdoc for stage2
Documenting book redirect pages
Documenting stage2 std
Building rustdoc for stage1
Documenting stage2 whitelisted compiler
Documenting stage2 compiler
Documenting stage2 rustdoc
Documenting error index
Uplifting stage1 rustc
Copying stage2 rustc from stage1
Building stage2 tool error_index_generator
The original guide includes a diagram here giving a deeper look into the stages of x.py. Keep in mind that the diagram is a simplification: rustdoc can be built at different stages, and the process is a bit different when passing flags such as --keep-stage or when building for targets other than the host.
The following tables list the outputs of the various stage actions:
Stage 0 action | Output |
---|---|
beta extracted | build/HOST/stage0 |
stage0 builds bootstrap | build/bootstrap |
stage0 builds libstd | build/HOST/stage0-std/TARGET |
copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
stage0 builds rustc with stage0-sysroot | build/HOST/stage0-rustc/HOST |
copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
build llvm | build/HOST/llvm |
stage0 builds codegen with stage0-sysroot | build/HOST/stage0-codegen/HOST |
stage0 builds rustdoc with stage0-sysroot | build/HOST/stage0-tools/HOST |
--stage=0 stops here.
Stage 1 action | Output |
---|---|
copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin |
copy (uplift) stage0-codegen to stage1 | build/HOST/stage1/lib |
copy (uplift) stage0-sysroot to stage1 | build/HOST/stage1/lib |
stage1 builds libstd | build/HOST/stage1-std/TARGET |
copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds rustc | build/HOST/stage1-rustc/HOST |
copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST |
stage1 builds codegen | build/HOST/stage1-codegen/HOST |
--stage=1 stops here.
Stage 2 action | Output |
---|---|
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin |
copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST |
stage2 builds libstd (except HOST?) | build/HOST/stage2-std/TARGET |
copy stage2-std (non-HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
stage2 builds rustdoc | build/HOST/stage2-tools/HOST |
copy rustdoc | build/HOST/stage2/bin |
--stage=2 stops here.
Note that the convention x.py uses is that:
- A "stage N artifact" is an artifact that is produced by the stage N compiler.
- The "stage (N+1) compiler" is assembled from "stage N artifacts".
- A --stage N flag means build with stage N.
In short, stage 0 uses the stage0 compiler to create stage0 artifacts, which will later be uplifted to stage1.
Every time any of the main artifacts (std and rustc) are compiled, two steps are performed. When std is compiled by a stage N compiler, that std will be linked into programs built by the stage N compiler (including the rustc built later on). It will also be used by the stage (N+1) compiler to link against itself. This is somewhat intuitive if one thinks of the stage (N+1) compiler as "just" another program we are building with the stage N compiler. In some ways, rustc (the binary, not the rustbuild step) could be thought of as one of the few no_core binaries out there.
So "stage0 std artifacts" are in fact the output of the downloaded stage0 compiler, and will be used for anything built by the stage0 compiler: e.g. the rustc artifacts. When it announces that it is "building stage1 std artifacts", it has moved on to the next bootstrapping phase. This pattern continues in later stages.
Also note that building host std and target std differs based on the stage (e.g. see in the table how stage2 only builds non-host std targets). This is because during stage2, the host std is uplifted from the stage1 std: specifically, when "Building stage 1 artifacts" is announced, it is later copied into stage2 as well (into both the compiler's libdir and the sysroot).
This std is pretty much necessary for any useful work with the compiler. Specifically, it is used as the std for programs compiled by the newly compiled compiler (so when you compile fn main() {}, it is linked to the last std compiled with x.py build --stage 1 src/libstd).
The rustc generated by the stage0 compiler is linked to the freshly built libstd, which means that for the most part only std needs to be cfg-gated, so that rustc can immediately use features added to std, without waiting for them to reach the downloaded beta. The libstd built by the stage1/bin/rustc compiler, also known as the "stage1 std artifacts", is not necessarily ABI-compatible with that compiler. That is, the rustc binary most likely could not use this std itself. It is, however, ABI-compatible with any programs that the stage1/bin/rustc binary builds (including itself), so in that sense they are paired.
This is also where --keep-stage 1 src/libstd comes into play. Since most changes to the compiler don't actually change the ABI, once you have produced a libstd in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, and there is no need to spend the time recompiling that std. --keep-stage simply assumes the previous compile is fine, copies those artifacts into the appropriate place, and skips the cargo invocation.
The reason we first build std and then rustc is essentially that we want to minimize the amount of cfg(stage0) in the code for rustc. Currently, rustc is always linked against a "new" std, so it doesn't have to be concerned with differences in std; it can assume that std is as fresh as possible.
The reason we need to build it twice is because of ABI compatibility. The beta compiler has its own ABI, while the stage1/bin/rustc compiler will produce programs and libraries with the new ABI. We used to build three times, but because we assume that the ABI is constant within a codebase, we presume that the libraries produced by the "stage2" compiler (which are produced by the stage1/bin/rustc compiler) are ABI-compatible with the libraries produced by the stage1/bin/rustc compiler. This means we can skip that final compilation and simply use the same libraries that the stage2/bin/rustc compiler itself links against.
This stage2/bin/rustc compiler is shipped to end users, together with the stage1 {std, rustc} artifacts.
Environment variables
During bootstrapping, a bunch of compiler-internal environment variables are used. If you are trying to run an intermediate version of rustc, you may sometimes need to set some of these environment variables manually. Otherwise, you will get an error like the following:
thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', src/libcore/result.rs:1165:5
If ./stageN/bin/rustc gives an error about environment variables, that usually means something is quite wrong, or you are trying to compile e.g. librustc or libstd or something else that depends on environment variables. In the rare case that you actually need to invoke rustc in such a situation, you can find the environment variable values by adding the following flag to your x.py command: --on-fail=print-env.
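For example, a failing build step can be re-run with its environment printed like so (the build target here is just whatever you were already compiling); you can then export the relevant RUSTC_* variables in your shell before invoking the intermediate rustc by hand:
./x.py build -i --stage 1 src/libstd --on-fail=print-env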
Build distribution artifacts
You might want to build and package up the compiler for distribution. You’ll want to run this command to do it:
./x.py dist
Install distribution artifacts
If you’ve built a distribution artifact you might want to install it and test that it works on your target system. You’ll want to run this command:
./x.py install
Note: If you are testing out a modification to a compiler, you might want to use it to compile some project. Usually, you do not want to use ./x.py install for testing. Rather, you should create a toolchain as discussed here.
For example, if the toolchain you created is called foo, you
would then invoke it with rustc +foo ...
(where ... represents
the rest of the arguments).
Documenting rustc
You might want to build documentation of the various components available, like the standard library. There are two ways to go about this. You can run rustdoc directly on the file to make sure the HTML is correct, which is fast. Alternatively, you can build the documentation as part of the build process through x.py. Both are viable methods since documentation is more about the content.
Document everything
./x.py doc
If you want to avoid the whole Stage 2 build
./x.py doc --stage 1
First the compiler and rustdoc get built to make sure everything is okay and then it documents the files.
Document specific components
./x.py doc src/doc/book
./x.py doc src/doc/nomicon
./x.py doc src/doc/book src/libstd
Much like individual tests or building certain components you can build only the documentation you want.
Document internal rustc items
Compiler documentation is not built by default. To enable it, modify config.toml:
[build]
compiler-docs = true
Note that when enabled, documentation for internal compiler items will also be built.
Compiler Documentation
The documentation for the rust components is found at rustc doc.
ctags
One of the challenges with rustc is that the RLS can't handle it, since it's a bootstrapping compiler. This makes code navigation difficult. One solution is to use ctags.
ctags
has a long history and several variants. Exuberant Ctags seems to be
quite commonly distributed but it does not have out-of-box Rust support. Some
distributions seem to use Universal Ctags, which is a maintained fork
and does have built-in Rust support.
The following script can be used to set up Exuberant Ctags: https://github.com/nikomatsakis/rust-etags.
ctags
integrates into emacs and vim quite easily. The following can then be
used to build and generate tags:
$ rust-ctags src/lib* && ./x.py build <something>
This allows you to do "jump-to-def" with whatever functions were around when you last built, which is ridiculously useful.
The compiler testing framework
The Rust project runs a wide variety of different tests, orchestrated by
the build system (x.py test
). The main test harness for testing the
compiler itself is a tool called compiletest (located in the
src/tools/compiletest
directory). This section gives a brief
overview of how the testing framework is set up, and then gets into some
of the details on how to run tests as well as how to
add new tests.
Compiletest test suites
The compiletest tests are located in the tree in the src/test
directory. Immediately within you will see a series of subdirectories
(e.g. ui
, run-make
, and so forth). Each of those directories is
called a test suite – they house a group of tests that are run in
a distinct mode.
Here is a brief summary of the test suites and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.
- ui – tests that check the exact stdout/stderr from compilation and/or running the test
- run-pass-valgrind – tests that ought to run with valgrind
- run-fail – tests that are expected to compile but then panic during execution
- compile-fail – tests that are expected to fail compilation
- parse-fail – tests that are expected to fail to parse
- pretty – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
- debuginfo – tests that run in gdb or lldb and query the debug info
- codegen – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect. See LLVM docs for how to write such tests.
- assembly – similar to codegen tests, but verifies assembly output to make sure the LLVM target backend can handle provided code
- mir-opt – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect
- incremental – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations
- run-make – tests that basically just execute a Makefile; the ultimate in flexibility but quite annoying to write
- rustdoc – tests for rustdoc, making sure that the generated files contain the expected documentation
- *-fulldeps – same as above, but indicates that the test depends on things other than libstd (and hence those things must be built)
Other Tests
The Rust build system handles running tests for various other things, including:
- Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.
  Example: ./x.py test tidy
- Formatting – Rustfmt is integrated with the build system to enforce uniform style across the compiler. In the CI, we check that the formatting is correct. The formatting check is also automatically run by the Tidy tool mentioned above.
  Example: ./x.py test fmt --check checks formatting and exits with an error if formatting is needed.
  Example: ./x.py fmt runs rustfmt on the codebase.
  Example: ./x.py test tidy --bless does formatting before doing other tidy checks.
- Unit tests – The Rust standard library and many of the Rust packages include typical Rust #[test] unit tests. Under the hood, x.py will run cargo test on each package to run all the tests.
  Example: ./x.py test src/libstd
- Doc tests – Example code embedded within Rust documentation is executed via rustdoc --test. Examples:
  ./x.py test src/doc – Runs rustdoc --test for all documentation in src/doc.
  ./x.py test --doc src/libstd – Runs rustdoc --test on the standard library.
- Link checker – A small tool for verifying href links within documentation.
  Example: ./x.py test src/tools/linkchecker
- Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.
  Example: ./x.py test distcheck
- Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running cargo test within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.
- Cargo test – This is a small tool which runs cargo test on a few significant projects (such as servo, ripgrep, tokei, etc.) just to ensure there aren't any significant regressions.
  Example: ./x.py test src/tools/cargotest
Testing infrastructure
When a Pull Request is opened on GitHub, Azure Pipelines will automatically launch a build that will run all tests on some configurations (x86_64-gnu-llvm-6.0 linux, x86_64-gnu-tools linux, mingw-check linux). In
essence, it runs ./x.py test
after building for each of them.
The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using Azure Pipelines (currently over 50 different configurations). Most platforms only run the build steps, some run a restricted set of tests, only a subset run the full suite of tests (see Rust's platform tiers).
Testing with Docker images
The Rust tree includes Docker image definitions for the platforms used on Azure Pipelines in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.
TODO: What is a typical workflow for testing/debugging on a platform that you don't have easy access to? Do people build Docker images and enter them to test things out?
Testing on emulators
Some platforms are tested via an emulator for architectures that aren't
readily available. There is a set of tools for orchestrating running the
tests within the emulator. Platforms such as arm-android
and
arm-unknown-linux-gnueabihf
are set up to automatically run the tests under
emulation on Travis. The following will take a look at how a target's tests
are run under emulation.
The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU
architecture. Included in the Rust tree are the tools remote-test-client
and remote-test-server which are programs for sending test programs and
libraries to the emulator, and running the tests within the emulator, and
reading the results. The Docker image is set up to launch
remote-test-server
and the build tools use remote-test-client
to
communicate with the server to coordinate running tests (see
src/bootstrap/test.rs).
TODO: What are the steps for manually running tests within an emulator?
- ./src/ci/docker/run.sh armhf-gnu will do everything, but takes hours to run and doesn't offer much help with interacting within the emulator.
- Is there any support for emulating other (non-Android) platforms, such as running on an iOS emulator?
- Is there anything else interesting that can be said here about running tests remotely on real hardware?
- It's also unclear to me how the wasm or asm.js tests are run.
Crater
Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes, and for ensuring a lack of breakage by running beta vs stable compiler versions.
When to run Crater
You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.
Requesting Crater Runs
The rust team maintains a few machines that can be used for running crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative (if you're not sure) option is to go for the build-and-test run. If making changes that will only have an effect at compile-time (e.g., implementing a new trait) then you only need a check run.
Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs take around 3-4 days, with the other two taking 5-6 days on average.
While crater is really useful, it is also important to be aware of a few caveats:
- Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magically green light that there will be no breakage; you still need to be careful.
- Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.
- Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. used old nightly features), has broken or flaky tests, requires network access, or other reasons.
- Before crater can be run, @bors try needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.
Perf runs
A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.
The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).
You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.
Further reading
The following blog posts may also be of interest:
- brson's classic "How Rust is tested"
Running tests
You can run the tests using x.py
. The most basic command – which
you will almost never want to use! – is as follows:
./x.py test
This will build the full stage 2 compiler and then run the whole test suite. You probably don't want to do this very often, because it takes a very long time, and anyway bors / travis will do it for you. (Often, I will run this command in the background after opening a PR that I think is done, but rarely otherwise. -nmatsakis)
The test results are cached and previously successful tests are ignored during testing. The stdout/stderr contents as well as a
timestamp file for every test can be found under build/ARCH/test/
.
To force-rerun a test (e.g. in case the test runner fails to notice
a change) you can simply remove the timestamp file.
Note that some tests require a Python-enabled gdb. You can test if
your gdb install supports Python by using the python
command from
within gdb. Once invoked you can type some Python code (e.g.
print("hi")
) followed by return and then CTRL+D
to execute it.
If you are building gdb from source, you will need to configure with
--with-python=<path-to-python-binary>
.
Running a subset of the test suites
When working on a specific PR, you will usually want to run a smaller set of tests, and with a stage 1 build. For example, a good "smoke test" that can be used after modifying rustc to see if things are generally working correctly would be the following:
./x.py test --stage 1 src/test/{ui,compile-fail}
This will run the ui
and compile-fail
test suites,
and only with the stage 1 build. Of course, the choice of test suites
is somewhat arbitrary, and may not suit the task you are doing. For
example, if you are hacking on debuginfo, you may be better off with
the debuginfo test suite:
./x.py test --stage 1 src/test/debuginfo
If you only need to test a specific subdirectory of tests for any
given test suite, you can pass that directory to x.py test
:
./x.py test --stage 1 src/test/ui/const-generics
Likewise, you can test a single file by passing its path:
./x.py test --stage 1 src/test/ui/const-generics/const-test.rs
Run only the tidy script
./x.py test tidy
Run tests on the standard library
./x.py test src/libstd
Run the tidy script and tests on the standard library
./x.py test tidy src/libstd
Run tests on the standard library using a stage 1 compiler
./x.py test src/libstd --stage 1
By listing which test suites you want to run you avoid having to run tests for components you did not change at all.
Warning: Note that bors only runs the tests with the full stage 2 build; therefore, while the tests usually work fine with stage 1, there are some limitations.
Running an individual test
Another common thing that people want to do is to run an individual
test, often the test they are trying to fix. As mentioned earlier,
you may pass the full file path to achieve this, or alternatively one
may invoke x.py
with the --test-args
option:
./x.py test --stage 1 src/test/ui --test-args issue-1234
Under the hood, the test runner invokes the standard rust test runner
(the same one you get with #[test]
), so this command would wind up
filtering for tests that include "issue-1234" in the name. (Thus
--test-args
is a good way to run a collection of related tests.)
Editing and updating the reference files
If you have changed the compiler's output intentionally, or you are
making a new test, you can pass --bless
to the test subcommand. E.g.
if some tests in src/test/ui
are failing, you can run
./x.py test --stage 1 src/test/ui --bless
to automatically adjust the .stderr
, .stdout
or .fixed
files of
all tests. Of course you can also target just specific tests with the
--test-args your_test_name
flag, just like when running the tests.
Passing --pass $mode
UI tests now have three modes, check-pass
, build-pass
and
run-pass
. When --pass $mode
is passed, these tests will be forced
to run under the given $mode
unless the directive // ignore-pass
exists in the test file. For example, you can run all the tests in
src/test/ui
as check-pass
:
./x.py test --stage 1 src/test/ui --pass check
By passing --pass $mode
, you can reduce the testing time. For each
mode, please see here.
Using incremental compilation
You can further enable the --incremental
flag to save additional
time in subsequent rebuilds:
./x.py test --stage 1 src/test/ui --incremental --test-args issue-1234
If you don't want to include the flag with every command, you can
enable it in the config.toml
, too:
# Whether to always use incremental compilation when building rustc
incremental = true
Note that incremental compilation will use more disk space than usual.
If disk space is a concern for you, you might want to check the size
of the build
directory from time to time.
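One way to do that on a Unix-like system (the size shown here is just illustrative):
$ du -sh build
40G     build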
Running tests manually
Sometimes it's easier and faster to just run the test by hand. Most tests are
just .rs
files, so you can do something like
rustc +stage1 src/test/ui/issue-1234.rs
This is much faster, but doesn't always work. For example, some tests include directives that specify specific compiler flags, or which rely on other crates, and they may not run the same without those options.
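For instance, if the test you are debugging declares // compile-flags: -Coverflow-checks=off in its header, you would mirror those flags manually (the file name here is illustrative):
rustc +stage1 -Coverflow-checks=off src/test/ui/issue-1234.rs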
Adding new tests
In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind. This test should fail in master but pass after the PR. These tests are really useful for preventing us from repeating the mistakes of the past.
To add a new test, the first thing you generally do is to create a file, typically a Rust source file. Test files have a particular structure:
- They should have some kind of comment explaining what the test is about;
- next, they can have one or more header commands, which are special comments that the test interpreter knows how to interpret.
- finally, they have the Rust source. This may have various error annotations which indicate expected compilation errors or warnings.
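Putting these three parts together, a minimal UI test might look like the following sketch (the issue number and error are invented for illustration):
// Regression test for #12345 (illustrative): a mismatched type in a
// let binding must be reported.

fn main() {
    let x: u32 = "hello"; //~ ERROR mismatched types
}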
Depending on the test suite, there may be some other details to be aware of:
- For the ui test suite, you need to generate reference output files.
What kind of test should I add?
It can be difficult to know what kind of test to use. Here are some rough heuristics:
- Some tests have specialized needs:
  - need to run gdb or lldb? use the debuginfo test suite
  - need to inspect LLVM IR or MIR IR? use the codegen or mir-opt test suites
  - need to run rustdoc? Prefer a rustdoc test
  - need to inspect the resulting binary in some way? Then use run-make
- For most other things, a ui (or ui-fulldeps) test is to be preferred:
  - ui tests subsume both run-pass, compile-fail, and parse-fail tests
  - in the case of warnings or errors, ui tests capture the full output, which makes it easier to review but also helps prevent "hidden" regressions in the output
Naming your test
We have not traditionally had a lot of structure in the names of
tests. Moreover, for a long time, the rustc test runner did not
support subdirectories (it now does), so test suites like
src/test/ui
have a huge mess of files in them. This is not
considered an ideal setup.
For regression tests – basically, some random snippet of code that
came in from the internet – we often name the test after the issue
plus a short description. Ideally, the test should be added to a
directory that helps identify what piece of code is being tested here
(e.g., src/test/ui/borrowck/issue-54597-reject-move-out-of-borrow-via-pat.rs
)
If you've tried and cannot find a more relevant place,
the test may be added to src/test/ui/issues/
.
Still, do include the issue number somewhere.
When writing a new feature, create a subdirectory to store your
tests. For example, if you are implementing RFC 1234 ("Widgets"),
then it might make sense to put the tests in a directory like
src/test/ui/rfc1234-widgets/
.
In other cases, there may already be a suitable directory. (The proper directory structure to use is actually an area of active debate.)
Comment explaining what the test is about
When you create a test file, include a comment summarizing the point of the test at the start of the file. This should highlight which parts of the test are more important, and what the bug was that the test is fixing. Citing an issue number is often very helpful.
This comment doesn't have to be super extensive. Just something like "Regression test for #18060: match arms were matching in the wrong order." might already be enough.
These comments are very useful to others later on when your test breaks, since they often can highlight what the problem is. They are also useful if for some reason the tests need to be refactored, since they let others know which parts of the test were important (often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what it was meant to test exactly).
Header commands: configuring rustc
Header commands are special comments that the test runner knows how to
interpret. They must appear before the Rust source in the test. They
are normally put after the short comment that explains the point of
this test. For example, this test uses the // compile-flags
command
to specify a custom flag to give to rustc when the test is compiled:
// Test the behavior of `0 - 1` when overflow checks are disabled.
// compile-flags: -Coverflow-checks=off
fn main() {
let x = 0 - 1;
...
}
Ignoring tests
These are used to ignore the test in some situations, which means the test won't be compiled or run.
- ignore-X where X is a target detail or stage will ignore the test accordingly (see below)
- only-X is like ignore-X, but will only run the test on that target or stage
- ignore-pretty will not compile the pretty-printed test (this is done to test the pretty-printer, but might not always work)
- ignore-test always ignores the test
- ignore-lldb and ignore-gdb will skip a debuginfo test on that debugger
- ignore-gdb-version can be used to ignore the test when certain gdb versions are used
Some examples of X in ignore-X:
- Architecture: aarch64, arm, asmjs, mips, wasm32, x86_64, x86, ...
- OS: android, emscripten, freebsd, ios, linux, macos, windows, ...
- Environment (fourth word of the target triple): gnu, msvc, musl.
- Pointer width: 32bit, 64bit.
- Stage: stage0, stage1, stage2.
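As a sketch, a test that should run only on 64-bit x86 targets and never on Windows would start with headers like these:
// only-x86_64
// ignore-windows

fn main() {}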
Other Header Commands
Here is a list of other header commands. This list is not exhaustive. Header commands can generally be found by browsing the TestProps structure found in header.rs from the compiletest source.
- run-rustfix for UI tests, indicates that the test produces structured suggestions. The test writer should create a .fixed file, which contains the source with the suggestions applied. When the test is run, compiletest first checks that the correct lint/warning is generated. Then, it applies the suggestion and compares against .fixed (they must match). Finally, the fixed source is compiled, and this compilation is required to succeed. The .fixed file can also be generated automatically with the --bless option, described in this section.
- min-gdb-version specifies the minimum gdb version required for this test; see also ignore-gdb-version
- min-lldb-version specifies the minimum lldb version required for this test
- rust-lldb causes the lldb part of the test to only be run if the lldb in use contains the Rust plugin
- no-system-llvm causes the test to be ignored if the system llvm is used
- min-llvm-version specifies the minimum llvm version required for this test
- min-system-llvm-version specifies the minimum system llvm version required for this test; the test is ignored if the system llvm is in use and it doesn't meet the minimum version. This is useful when an llvm feature has been backported to rust-llvm
- ignore-llvm-version can be used to skip the test when certain LLVM versions are used. This takes one or two arguments; the first argument is the first version to ignore. If no second argument is given, all subsequent versions are ignored; otherwise, the second argument is the last version to ignore.
- build-pass for UI tests, indicates that the test is supposed to successfully compile and link, as opposed to the default where the test is supposed to error out.
- compile-flags passes extra command-line args to the compiler, e.g. compile-flags -g, which forces debuginfo to be enabled.
- should-fail indicates that the test should fail; used for "meta testing", where we test the compiletest program itself to check that it will generate errors in appropriate scenarios. This header is ignored for pretty-printer tests.
- gate-test-X, where X is a feature, marks the test as a "gate test" for feature X. Such tests are supposed to ensure that the compiler errors when usage of a gated feature is attempted without the proper #![feature(X)] tag. Each unstable lang feature is required to have a gate test.
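For example, a gate test for the (then-unstable) box_syntax feature would use the gated syntax without enabling the feature and expect the gating error. This is only a sketch, and the exact diagnostic wording varies between compiler versions:
// gate-test-box_syntax

fn main() {
    let b = box 3; //~ ERROR box expression syntax is experimental
}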
Error annotations
Error annotations specify the errors that the compiler is expected to emit. They are "attached" to the line in source where the error is located.
- ~: Associates the following error level and message with the current line
- ~|: Associates the following error level and message with the same line as the previous comment
- ~^: Associates the following error level and message with the previous line. Each caret (^) that you add adds a line to this, so ~^^^^^^^ is seven lines up.
The error levels that you can have are:
- ERROR
- WARNING
- NOTE
- HELP and SUGGESTION*
* Note: SUGGESTION must follow immediately after HELP.
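As a small invented example, this attaches one error to its own line and a second error to the line above it:
fn main() {
    let x: u32 = "hello"; //~ ERROR mismatched types
    let y: bool = 42;
    //~^ ERROR mismatched types
}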
Revisions
Certain classes of tests support "revisions" (as of the time of this writing, this includes compile-fail, run-fail, and incremental, though incremental tests are somewhat different). Revisions allow a single test file to be used for multiple tests. This is done by adding a special header at the top of the file:
// revisions: foo bar baz
This will result in the test being compiled (and tested) three times,
once with --cfg foo
, once with --cfg bar
, and once with --cfg baz
. You can therefore use #[cfg(foo)]
etc within the test to tweak
each of these results.
You can also customize headers and expected error messages to a particular
revision. To do this, add [foo]
(or bar
, baz
, etc) after the //
comment, like so:
// A flag to pass in only for cfg `foo`:
//[foo]compile-flags: -Z verbose

#[cfg(foo)]
fn test_foo() {
    let x: usize = 32_u32; //[foo]~ ERROR mismatched types
}
Note that not all headers have meaning when customized to a revision.
For example, the ignore-test
header (and all "ignore" headers)
currently only apply to the test as a whole, not to particular
revisions. The only headers that are intended to really work when
customized to a revision are error patterns and compiler flags.
Guide to the UI tests
The UI tests are intended to capture the compiler's complete output,
so that we can test all aspects of the presentation. They work by
compiling a file (e.g., ui/hello_world/main.rs
),
capturing the output, and then applying some normalization (see
below). This normalized result is then compared against reference
files named ui/hello_world/main.stderr
and
ui/hello_world/main.stdout
. If either of those files doesn't exist,
the output must be empty (that is actually the case for
this particular test). If the test run fails, we will print out
the current output, but it is also saved in
build/<target-triple>/test/ui/hello_world/main.stdout
(this path is
printed as part of the test failure message), so you can run diff
and so forth.
Tests that do not result in compile errors
By default, a UI test is expected not to compile (in which case,
it should contain at least one //~ ERROR
annotation). However, you
can also make UI tests where compilation is expected to succeed, and
you can even run the resulting program. Just add one of the following
header commands:
- // check-pass – compilation should succeed but skip codegen (which is expensive and isn't supposed to fail in most cases)
- // build-pass – compilation and linking should succeed but do not run the resulting binary
- // run-pass – compilation should succeed and we should run the resulting binary
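For instance, a minimal (invented) test that must compile, link, run, and exit successfully:
// run-pass
fn main() {
    assert_eq!(2 + 2, 4);
}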
Normalization
The normalization applied is aimed at eliminating output difference between platforms, mainly about filenames:
- the test directory is replaced with $DIR
- all backslashes (\) are converted to forward slashes (/) (for Windows)
- all CR LF newlines are converted to LF
Sometimes these built-in normalizations are not enough. In such cases, you may provide custom normalization rules using the header commands, e.g.
// normalize-stdout-test: "foo" -> "bar"
// normalize-stderr-32bit: "fn\(\) \(32 bits\)" -> "fn\(\) \($$PTR bits\)"
// normalize-stderr-64bit: "fn\(\) \(64 bits\)" -> "fn\(\) \($$PTR bits\)"
This tells the test, on 32-bit platforms, whenever the compiler writes
fn() (32 bits)
to stderr, it should be normalized to read fn() ($PTR bits)
instead. Similarly for 64-bit. The replacement is performed by regexes, using the default regex flavor provided by the regex crate.
The corresponding reference file will use the normalized output to test both 32-bit and 64-bit platforms:
...
|
= note: source type: fn() ($PTR bits)
= note: target type: u16 (16 bits)
...
Please see ui/transmute/main.rs
and main.stderr
for a
concrete usage example.
Besides normalize-stderr-32bit
and -64bit
, one may use any target
information or stage supported by ignore-X
here as well (e.g.
normalize-stderr-windows
or simply normalize-stderr-test
for unconditional
replacement).
compiletest
Introduction
compiletest
is the main test harness of the Rust test suite. It allows
test authors to organize large numbers of tests (the Rust compiler has many
thousands), efficient test execution (parallel execution is supported), and
allows the test author to configure behavior and expected results of both
individual and groups of tests.
compiletest
tests may check test code for success, for failure or in some
cases, even failure to compile. Tests are typically organized as a Rust source
file with annotations in comments before and/or within the test code, which
serve to direct compiletest
on if or how to run the test, what behavior to
expect, and more. If you are unfamiliar with the compiler testing framework,
see this chapter for additional background.
The tests themselves are typically (but not always) organized into
"suites" – for example, run-fail
,
a folder holding tests that should compile successfully,
but return a failure (non-zero status), compile-fail
, a folder holding tests
that should fail to compile, and many more. The various suites are defined in
src/tools/compiletest/src/common.rs in the pub enum Mode
declaration. And a very good introduction to the different suites of compiler
tests along with details about them can be found in Adding new
tests.
Adding a new test file
Briefly, simply create your new test in the appropriate location under
src/test. No registration of test files is necessary as compiletest
will scan the src/test subfolder recursively, and will execute any Rust
source files it finds as tests. See Adding new tests
for a complete guide on how to add new tests.
Header Commands
Source file annotations which appear in comments near the top of the source
file before any test code are known as header commands. These commands can
instruct compiletest
to ignore this test, set expectations on whether it is
expected to succeed at compiling, or what the test's return code is expected to
be. Header commands (and their inline counterparts, Error Info commands) are
described more fully
here.
Adding a new header command
Header commands are defined in the TestProps
struct in
src/tools/compiletest/src/header.rs. At a high level, there are
dozens of test properties defined here, all set to default values in the
TestProp
struct's impl
block. Any test can override this default value by
specifying the property in question as header command as a comment (//
) in
the test source file, before any source code.
Using a header command
Here is an example, specifying the must-compile-successfully
header command,
which takes no arguments, followed by the failure-status
header command,
which takes a single argument (which, in this case is a value of 1).
failure-status
is instructing compiletest
to expect a failure status of 1
(rather than the current Rust default of 101). The header command and
the argument list (if present) are typically separated by a colon:
// must-compile-successfully
// failure-status: 1
#![feature(termination_trait)]
use std::io::{Error, ErrorKind};
fn main() -> Result<(), Box<Error>> {
Err(Box::new(Error::new(ErrorKind::Other, "returned Box<Error> from main()")))
}
Adding a new header command property
One would add a new header command if there is a need to define some test property or behavior on an individual, test-by-test basis. A header command property serves as the header command's backing store (holds the command's current value) at runtime.
To add a new header command property:
1. Look for the pub struct TestProps
declaration in
src/tools/compiletest/src/header.rs and add the new public
property to the end of the declaration.
2. Look for the impl TestProps
implementation block immediately following
the struct declaration and initialize the new property to its default
value.
Adding a new header command parser
When compiletest
encounters a test file, it parses the file a line at a time
by calling every parser defined in the Config
struct's implementation block,
also in src/tools/compiletest/src/header.rs (note the Config
struct's declaration block is found in
src/tools/compiletest/src/common.rs). TestProps
's load_from()
method will try passing the current line of text to each parser, which, in turn
typically checks to see if the line begins with a particular commented (//
)
header command such as // must-compile-successfully
or // failure-status
.
Whitespace after the comment marker is optional.
Parsers will override a given header command property's default value merely by being specified in the test file as a header command or by having a parameter value specified in the test file, depending on the header command.
Parsers defined in impl Config
are typically named parse_<header_command>
(note kebab-case <header-command>
transformed to snake-case
<header_command>
). impl Config
also defines several 'low-level' parsers
which make it simple to parse common patterns like simple presence or not
(parse_name_directive()
), header-command:parameter(s)
(parse_name_value_directive()
), optional parsing only if a particular cfg
attribute is defined (has_cfg_prefix()
) and many more. The low-level parsers
are found near the end of the impl Config
block; be sure to look through them
and their associated parsers immediately above to see how they are used to
avoid writing additional parsing code unnecessarily.
As a concrete example, here is the implementation for the
parse_failure_status()
parser, in
src/tools/compiletest/src/header.rs:
@@ -232,6 +232,7 @@ pub struct TestProps {
// customized normalization rules
pub normalize_stdout: Vec<(String, String)>,
pub normalize_stderr: Vec<(String, String)>,
+ pub failure_status: i32,
}
impl TestProps {
@@ -260,6 +261,7 @@ impl TestProps {
run_pass: false,
normalize_stdout: vec![],
normalize_stderr: vec![],
+ failure_status: 101,
}
}
@@ -383,6 +385,10 @@ impl TestProps {
if let Some(rule) = config.parse_custom_normalization(ln, "normalize-stderr") {
self.normalize_stderr.push(rule);
}
+
+ if let Some(code) = config.parse_failure_status(ln) {
+ self.failure_status = code;
+ }
});
for key in &["RUST_TEST_NOCAPTURE", "RUST_TEST_THREADS"] {
@@ -488,6 +494,13 @@ impl Config {
self.parse_name_directive(line, "pretty-compare-only")
}
+ fn parse_failure_status(&self, line: &str) -> Option<i32> {
+ match self.parse_name_value_directive(line, "failure-status") {
+ Some(code) => code.trim().parse::<i32>().ok(),
+ _ => None,
+ }
+ }
Implementing the behavior change
When a test invokes a particular header command, it is expected that some
behavior will change as a result. What behavior, obviously, will depend on the
purpose of the header command. In the case of failure-status
, the behavior
that changes is that compiletest
expects the failure code defined by the
header command invoked in the test, rather than the default value.
Although specific to failure-status
(as every header command will have a
different implementation in order to invoke behavior change) perhaps it is
helpful to see the behavior change implementation of one case, simply as an
example. To implement failure-status
, the check_correct_failure_status()
function found in the TestCx
implementation block, located in
src/tools/compiletest/src/runtest.rs,
was modified as per below:
@@ -295,11 +295,14 @@ impl<'test> TestCx<'test> {
}
fn check_correct_failure_status(&self, proc_res: &ProcRes) {
- // The value the rust runtime returns on failure
- const RUST_ERR: i32 = 101;
- if proc_res.status.code() != Some(RUST_ERR) {
+ let expected_status = Some(self.props.failure_status);
+ let received_status = proc_res.status.code();
+
+ if expected_status != received_status {
self.fatal_proc_rec(
- &format!("failure produced the wrong error: {}", proc_res.status),
+ &format!("Error: expected failure status ({:?}) but received status {:?}.",
+ expected_status,
+ received_status),
proc_res,
);
}
@@ -320,7 +323,6 @@ impl<'test> TestCx<'test> {
);
let proc_res = self.exec_compiled_test();
-
if !proc_res.status.success() {
self.fatal_proc_rec("test run failed!", &proc_res);
}
@@ -499,7 +501,6 @@ impl<'test> TestCx<'test> {
expected,
actual
);
- panic!();
}
}
Note the use of self.props.failure_status
to access the header command
property. In tests which do not specify the failure status header command,
self.props.failure_status
will evaluate to the default value of 101 at the
time of this writing. But for a test which specifies a header command of, for
example, // failure-status: 1
, self.props.failure_status
will evaluate to
1, as parse_failure_status()
will have overridden the TestProps
default
value, for that test specifically.
Walkthrough: a typical contribution
There are a lot of ways to contribute to the rust compiler, including fixing bugs, improving performance, helping design features, providing feedback on existing features, etc. This chapter does not try to cover them all. Instead, it walks through the design and implementation of a new feature. Not all of the steps and processes described here are needed for every contribution, and I will try to point those out as they arise.
In general, if you are interested in making a contribution and aren't sure where to start, please feel free to ask!
Overview
The feature I will discuss in this chapter is the ?
Kleene operator for
macros. Basically, we want to be able to write something like this:
macro_rules! foo {
($arg:ident $(, $optional_arg:ident)?) => {
println!("{}", $arg);
$(
println!("{}", $optional_arg);
)?
}
}
fn main() {
let x = 0;
foo!(x); // ok! prints "0"
foo!(x, x); // ok! prints "0 0"
}
So basically, the $(pat)?
matcher in the macro means "this pattern can occur
0 or 1 times", similar to other regex syntaxes.
There were a number of steps to go from an idea to a stable rust feature. Here is a quick list. We will go through each of these in order below. As I mentioned before, not all of these are needed for every type of contribution.
- Idea discussion/Pre-RFC A Pre-RFC is an early draft or design discussion of a feature. This stage is intended to flesh out the design space a bit and get a grasp on the different merits and problems with an idea. It's a great way to get early feedback on your idea before presenting it the wider audience. You can find the original discussion here.
- RFC This is when you formally present your idea to the community for consideration. You can find the RFC here.
- Implementation Implement your idea unstably in the compiler. You can find the original implementation here.
- Possibly iterate/refine As the community gets experience with your feature on the nightly compiler and in libstd, there may be additional feedback about design choices that might be adjusted. This particular feature went through a number of iterations.
- Stabilization When your feature has baked enough, a rust team member may propose to stabilize it. If there is consensus, this is done.
- Relax Your feature is now a stable rust feature!
Pre-RFC and RFC
NOTE: In general, if you are not proposing a new feature or substantial change to rust or the ecosystem, you don't need to follow the RFC process. Instead, you can just jump to implementation.
You can find the official guidelines for when to open an RFC here.
An RFC is a document that describes the feature or change you are proposing in detail. Anyone can write an RFC; the process is the same for everyone, including rust team members.
To open an RFC, open a PR on the rust-lang/rfcs repo on GitHub. You can find detailed instructions in the README.
Before opening an RFC, you should do the research to "flesh out" your idea. Hastily-proposed RFCs tend not to be accepted. You should generally have a good description of the motivation, impact, disadvantages, and potential interactions with other features.
If that sounds like a lot of work, it's because it is. But no fear! Even if you're not a compiler hacker, you can get great feedback by doing a pre-RFC. This is an informal discussion of the idea. The best place to do this is internals.rust-lang.org. Your post doesn't have to follow any particular structure. It doesn't even need to be a cohesive idea. Generally, you will get tons of feedback that you can integrate back to produce a good RFC.
(Another pro-tip: try searching the RFCs repo and internals for prior related ideas. A lot of times an idea has already been considered and was either rejected or postponed to be tried again later. This can save you and everybody else some time)
In the case of our example, a participant in the pre-RFC thread pointed out a syntax ambiguity and a potential resolution. Also, the overall feedback seemed positive. In this case, the discussion converged pretty quickly, but for some ideas, a lot more discussion can happen (e.g. see this RFC which received a whopping 684 comments!). If that happens, don't be discouraged; it means the community is interested in your idea, but it perhaps needs some adjustments.
The RFC for our ? macro feature did receive some discussion on the RFC thread too. As with most RFCs, there were a few questions that we couldn't answer by
too. As with most RFCs, there were a few questions that we couldn't answer by
discussion: we needed experience using the feature to decide. Such questions
are listed in the "Unresolved Questions" section of the RFC. Also, over the
course of the RFC discussion, you will probably want to update the RFC document
itself to reflect the course of the discussion (e.g. new alternatives or prior
work may be added or you may decide to change parts of the proposal itself).
In the end, when the discussion seems to reach a consensus and die down a bit, a rust team member may propose to move to "final comment period" (FCP) with one of three possible dispositions. This means that they want the other members of the appropriate teams to review and comment on the RFC. More discussion may ensue, which may result in more changes or unresolved questions being added. At some point, when everyone is satisfied, the RFC enters the FCP, which is the last chance for people to bring up objections. When the FCP is over, the disposition is adopted. Here are the three possible dispositions:
- Merge: accept the feature. Here is the proposal to merge for our ? macro feature.
- Close: this feature in its current form is not a good fit for rust. Don't be discouraged if this happens to your RFC, and don't take it personally. This is not a reflection on you, but rather a community decision that rust will go a different direction.
- Postpone: there is interest in going this direction but not at the moment. This happens most often because the appropriate rust team doesn't have the bandwidth to shepherd the feature through the process to stabilization. Often this is the case when the feature doesn't fit into the team's roadmap. Postponed ideas may be revisited later.
When an RFC is merged, the PR is merged into the RFCs repo. A new tracking
issue is created in the rust-lang/rust repo to track progress on the feature
and discuss unresolved questions, implementation progress and blockers, etc.
Here is the tracking issue for our ? macro feature.
Implementation
To make a change to the compiler, open a PR against the rust-lang/rust repo.
Depending on the feature/change/bug fix/improvement, implementation may be relatively-straightforward or it may be a major undertaking. You can always ask for help or mentorship from more experienced compiler devs. Also, you don't have to be the one to implement your feature; but keep in mind that if you don't it might be a while before someone else does.
For the ? macro feature, I needed to go understand the relevant parts of macro expansion in the compiler. Personally, I find that improving the comments in the code is a helpful way of making sure I understand it, but you don't have to do that if you don't want to.
I then implemented the original feature, as described in the RFC. When
a new feature is implemented, it goes behind a feature gate, which means that
you have to use #![feature(my_feature_name)] to use the feature. The feature gate is removed when the feature is stabilized.
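For illustration, using the unstable feature on a nightly compiler looked something like this (the gate name macro_at_most_once_rep is my recollection of this feature's gate; treat the name as an assumption):

#![feature(macro_at_most_once_rep)] // only accepted by a nightly compiler

macro_rules! foo {
    // The ? Kleene operator is usable once the gate is enabled.
    ($arg:ident $(, $optional_arg:ident)?) => { /* ... */ };
}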
Most bug fixes and improvements don't require a feature gate. You can just make your changes/improvements.
When you open a PR on the rust-lang/rust repo, a bot will assign your PR to a reviewer. If there is a particular rust team member you are working with, you can request that reviewer by leaving a comment on the thread with r? @reviewer-github-id (e.g. r? @eddyb). If you don't know who to request, don't request anyone; the bot will assign someone automatically.
The reviewer may request changes before they approve your PR. Feel free to ask questions or discuss things you don't understand or disagree with. However, recognize that the PR won't be merged unless someone on the rust team approves it.
When your reviewer approves the PR, it will go into a queue for yet another bot called @bors. @bors manages the CI build/merge queue. When your PR reaches the head of the @bors queue, @bors will test out the merge by running all tests against your PR on Travis CI. This takes a lot of time to finish. If all tests pass, the PR is merged and becomes part of the next nightly compiler!
There are a couple of things that may happen for some PRs during the review process:
- If the change is substantial enough, the reviewer may request an FCP on the PR. This gives all members of the appropriate team a chance to review the changes.
- If the change may cause breakage, the reviewer may request a crater run. This compiles the compiler with your changes and then attempts to compile all crates on crates.io with your modified compiler. This is a great smoke test to check if you introduced a change to compiler behavior that affects a large portion of the ecosystem.
- If the diff of your PR is large or the reviewer is busy, your PR may have some merge conflicts with other PRs that happen to get merged first. You should fix these merge conflicts using the normal git procedures.
If you are not doing a new feature or something like that (e.g. if you are fixing a bug), then that's it! Thanks for your contribution :)
Refining your implementation
As people get experience with your new feature on nightly, slight changes may
be proposed and unresolved questions may become resolved. Updates/changes go
through the same process for implementing any other changes, as described
above (i.e. submit a PR, go through review, wait for @bors, etc).
Some changes may be major enough to require an FCP and some review by rust team members.
For the ? macro feature, we went through a few different iterations after the original implementation: 1, 2, 3.
Along the way, we decided that ? should not take a separator, which was previously an unresolved question listed in the RFC. We also changed the disambiguation strategy: we decided to remove the ability to use ? as a separator token for other repetition operators (e.g. + or *). However, since this was a breaking change, we decided to do it over an edition boundary. Thus, the new feature can be enabled only in edition 2018. These deviations from the original RFC required another FCP.
Stabilization
Finally, after the feature had baked for a while on nightly, a language team member moved to stabilize it.
A stabilization report needs to be written that includes
- a brief description of the behavior and any deviations from the RFC
- which edition(s) are affected and how
- links to a few tests to show the interesting aspects
The stabilization report for our feature is here.
After this, a PR is made to remove the feature gate, enabling the feature by default (on the 2018 edition). A note is added to the Release notes about the feature.
Steps to stabilize the feature can be found at Stabilizing Features.
Rustc Bug Fix Procedure
This page defines the best practices procedure for making bug fixes or soundness corrections in the compiler that can cause existing code to stop compiling. This text is based on RFC 1589.
Motivation
From time to time, we encounter the need to make a bug fix, soundness correction, or other change in the compiler which will cause existing code to stop compiling. When this happens, it is important that we handle the change in a way that gives users of Rust a smooth transition. What we want to avoid is that existing programs suddenly stop compiling with opaque error messages: we would prefer to have a gradual period of warnings, with clear guidance as to what the problem is, how to fix it, and why the change was made. This RFC describes the procedure that we have been developing for handling breaking changes that aims to achieve that kind of smooth transition.
One of the key points of this policy is that (a) warnings should be issued initially rather than hard errors if at all possible and (b) every change that causes existing code to stop compiling will have an associated tracking issue. This issue provides a point to collect feedback on the results of that change. Sometimes changes have unexpectedly large consequences or there may be a way to avoid the change that was not considered. In those cases, we may decide to change course and roll back the change, or find another solution (if warnings are being used, this is particularly easy to do).
What qualifies as a bug fix?
Note that this RFC does not try to define when a breaking change is permitted. That is already covered under RFC 1122. This document assumes that the change being made is in accordance with those policies. Here is a summary of the conditions from RFC 1122:
- Soundness changes: Fixes to holes uncovered in the type system.
- Compiler bugs: Places where the compiler is not implementing the specified semantics found in an RFC or lang-team decision.
- Underspecified language semantics: Clarifications to grey areas where the compiler behaves inconsistently and no formal behavior had been previously decided.
Please see the RFC for full details!
Detailed design
The procedure for making a breaking change is as follows (each of these steps is described in more detail below):
- Do a crater run to assess the impact of the change.
- Make a special tracking issue dedicated to the change.
- Do not report an error right away. Instead, issue forwards-compatibility
lint warnings.
- Sometimes this is not straightforward. See the text below for suggestions on different techniques we have employed in the past.
- For cases where warnings are infeasible:
- Report errors, but make every effort to give a targeted error message that directs users to the tracking issue
- Submit PRs to all known affected crates that fix the issue
- or, at minimum, alert the owners of those crates to the problem and direct them to the tracking issue
- Once the change has been in the wild for at least one cycle, we can stabilize the change, converting those warnings into errors.
Finally, for changes to librustc_ast that will affect plugins, the general policy is to batch these changes. That is discussed below in more detail.
Tracking issue
Every breaking change should be accompanied by a dedicated tracking issue for that change. The main text of this issue should describe the change being made, with a focus on what users must do to fix their code. The issue should be approachable and practical; it may make sense to direct users to an RFC or some other issue for the full details. The issue also serves as a place where users can comment with questions or other concerns.
A template for these breaking-change tracking issues can be found below. An example of how such an issue should look can be found here.
The issue should be tagged with (at least) B-unstable and T-compiler.
Tracking issue template
This is a template to use for tracking issues:
This is the **summary issue** for the `YOUR_LINT_NAME_HERE`
future-compatibility warning and other related errors. The goal of
this page is to describe why this change was made and how you can fix
code that is affected by it. It also provides a place to ask questions
or register a complaint if you feel the change should not be made. For
more information on the policy around future-compatibility warnings,
see our [breaking change policy guidelines][guidelines].
[guidelines]: LINK_TO_THIS_RFC
#### What is the warning for?
*Describe the conditions that trigger the warning and how they can be
fixed. Also explain why the change was made.*
#### When will this warning become a hard error?
At the beginning of each 6-week release cycle, the Rust compiler team
will review the set of outstanding future compatibility warnings and
nominate some of them for **Final Comment Period**. Toward the end of
the cycle, we will review any comments and make a final determination
whether to convert the warning into a hard error or remove it
entirely.
Issuing future compatibility warnings
The best way to handle a breaking change is to begin by issuing future-compatibility warnings. These are a special category of lint warning. Adding a new future-compatibility warning can be done as follows.
// 1. Define the lint in `src/librustc/lint/builtin.rs`:
declare_lint! {
    pub YOUR_ERROR_HERE,
    Warn,
    "illegal use of foo bar baz"
}

// 2. Add to the list of HardwiredLints in the same file:
impl LintPass for HardwiredLints {
    fn get_lints(&self) -> LintArray {
        lint_array!(
            ..,
            YOUR_ERROR_HERE
        )
    }
}

// 3. Register the lint in `src/librustc_lint/lib.rs`:
store.register_future_incompatible(sess, vec![
    ...,
    FutureIncompatibleInfo {
        id: LintId::of(YOUR_ERROR_HERE),
        reference: "issue #1234", // your tracking issue here!
    },
]);

// 4. Report the lint:
tcx.lint_node(
    lint::builtin::YOUR_ERROR_HERE,
    path_id,
    binding.span,
    format!("some helper message here"));
Helpful techniques
It can often be challenging to filter out new warnings from older, pre-existing errors. One technique that has been used in the past is to run the older code unchanged and collect the errors it would have reported. You can then issue warnings for any errors you would give which do not appear in that original set. Another option is to abort compilation after the original code completes if errors are reported: then you know that your new code will only execute when there were no errors before.
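As a loose sketch of that first technique (every name here is hypothetical, not a real rustc API), the code has roughly this shape:

// Run the old check and remember what it would have rejected.
let old_errors: Vec<Diagnostic> = run_old_check(tcx);
// Only report warnings for problems the old code did not already flag.
for err in run_new_check(tcx) {
    if !old_errors.contains(&err) {
        report_future_compat_warning(tcx, &err);
    }
}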
Crater and crates.io
We should always do a crater run to assess impact. It is polite and considerate to at least notify the authors of affected crates of the breaking change. If we can submit PRs to fix the problem, so much the better.
Is it ever acceptable to go directly to issuing errors?
Changes that are believed to have negligible impact can go directly to issuing an error. One rule of thumb would be to check against crates.io: if fewer than 10 total affected projects are found (not root errors), we can move straight to an error. In such cases, we should still make the "breaking change" page as before, and we should ensure that the error directs users to this page. In other words, everything should be the same except that users are getting an error, and not a warning. Moreover, we should submit PRs to the affected projects (ideally before the PR implementing the change lands in rustc).
If the impact is not believed to be negligible (e.g., more than 10 crates are affected), then warnings are required (unless the compiler team agrees to grant a special exemption in some particular case). If implementing warnings is not feasible, then we should pursue an aggressive strategy of migrating crates before we land the change so as to lower the number of affected crates. Here are some techniques for approaching this scenario:
- Issue warnings for subparts of the problem, and reserve the new errors for the smallest set of cases you can.
- Try to give a very precise error message that suggests how to fix the problem and directs users to the tracking issue.
- It may also make sense to layer the fix:
- First, add warnings where possible and let those land before proceeding to issue errors.
- Work with authors of affected crates to ensure that corrected versions are available before the fix lands, so that downstream users can use them.
Stabilization
After a change is made, we will stabilize the change using the same process that we use for unstable features:
- After a new release is made, we will go through the outstanding tracking issues corresponding to breaking changes and nominate some of them for final comment period (FCP).
- The FCP for such issues lasts for one cycle. In the final week or two of the cycle, we will review comments and make a final determination:
- Convert to error: the change should be made into a hard error.
- Revert: we should remove the warning and continue to allow the older code to compile.
- Defer: can't decide yet, wait longer, or try other strategies.
Ideally, breaking changes should have landed on the stable branch of the compiler before they are finalized.
Removing a lint
Once we have decided to make a "future warning" into a hard error, we need a PR that removes the custom lint. As an example, here are the steps required to remove the overlapping_inherent_impls compatibility lint. First, convert the name of the lint to uppercase (OVERLAPPING_INHERENT_IMPLS) and ripgrep through the source for that string. We will basically be converting each place where this lint name is mentioned (in the compiler, we use the upper-case name, and a macro automatically generates the lower-case string; so searching for overlapping_inherent_impls would not find much).
NOTE: these exact files don't exist anymore, but the procedure is still the same.
Remove the lint.
The first reference you will likely find is the lint definition in librustc/lint/builtin.rs that resembles this:
declare_lint! {
    pub OVERLAPPING_INHERENT_IMPLS,
    Deny, // this may also say Warning
    "two overlapping inherent impls define an item with the same name were erroneously allowed"
}
This declare_lint! macro creates the relevant data structures. Remove it. You will also find that there is a mention of OVERLAPPING_INHERENT_IMPLS later in the file as part of a lint_array!; remove it too.
Next, you will see a reference to OVERLAPPING_INHERENT_IMPLS in librustc_lint/lib.rs. This defines the lint as a "future compatibility lint":
FutureIncompatibleInfo {
    id: LintId::of(OVERLAPPING_INHERENT_IMPLS),
    reference: "issue #36889 <https://github.com/rust-lang/rust/issues/36889>",
},
Remove this too.
Add the lint to the list of removed lints.
In src/librustc_lint/lib.rs there is a list of "renamed and removed lints". You can add this lint to the list:
store.register_removed("overlapping_inherent_impls",
                       "converted into hard error, see #36889");
where #36889 is the tracking issue for your lint.
Update the places that issue the lint
Finally, the last class of references you will see are the places that actually
trigger the lint itself (i.e., what causes the warnings to appear). These
you do not want to delete. Instead, you want to convert them into errors. In
this case, the add_lint call looks like this:
self.tcx.sess.add_lint(lint::builtin::OVERLAPPING_INHERENT_IMPLS,
                       node_id,
                       self.tcx.span_of_impl(item1).unwrap(),
                       msg);
We want to convert this into an error. In some cases, there may be an existing error for this scenario. In others, we will need to allocate a fresh diagnostic code. Instructions for allocating a fresh diagnostic code can be found here. You may want to mention in the extended description that the compiler behavior changed on this point, and include a reference to the tracking issue for the change.
Let's say that we've adopted E0592 as our code. Then we can change the add_lint() call above to something like:
struct_span_err!(self.tcx.sess,
                 self.tcx.span_of_impl(item1).unwrap(),
                 E0592,
                 msg)
    .emit();
Update tests
Finally, run the test suite. There should be some tests that used to reference the overlapping_inherent_impls lint; those will need to be updated. In general, if the test used to have #[deny(overlapping_inherent_impls)], that can just be removed.
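For example (a hypothetical test, shown only to illustrate the shape of the change):

// before: the test opted into the lint explicitly
#![deny(overlapping_inherent_impls)]
// ... test body ...

// after: the attribute is simply deleted; the overlapping
// impls are now a hard error with no opt-in needed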
./x.py test
All done!
Open a PR. =)
Implement New Feature
When you want to implement a new significant feature in the compiler, you need to go through this process to make sure everything goes smoothly.
The @rfcbot (p)FCP process
When the change is small and uncontroversial, then it can be done with just writing a PR and getting r+ from someone who knows that part of the code. However, if the change is potentially controversial, it would be a bad idea to push it without consensus from the rest of the team (both in the "distributed system" sense to make sure you don't break anything you don't know about, and in the social sense to avoid PR fights).
If such a change seems to be too small to require a full formal RFC process (e.g. a big refactoring of the code, or a "technically-breaking" change, or a "big bugfix" that basically amounts to a small feature) but is still too controversial or big to get by with a single r+, you can start a pFCP (or, if you don't have r+ rights, ask someone who has them to start one - and unless they have a concern themselves, they should).
Again, the pFCP process is only needed if you need consensus - if you don't think anyone would have a problem with your change, it's ok to get by with only an r+. For example, it is OK to add or modify unstable command-line flags or attributes without a pFCP for compiler development or standard library use, as long as you don't expect them to be in wide use in the nightly ecosystem.
You don't need to have the implementation fully ready for r+ to ask for a pFCP, but it is generally a good idea to have at least a proof of concept so that people can see what you are talking about.
That starts a "proposed final comment period" (pFCP), which requires all members of the team to sign off the FCP. After they all do so, there's a 10 day long "final comment period" where everybody can comment, and if no new concerns are raised, the PR/issue gets FCP approval.
The logistics of writing features
There are a few "logistic" hoops you might need to go through in order to implement a feature in a working way.
Warning Cycles
In some cases, a feature or bugfix might break some existing programs in some edge cases. In that case, you might want to do a crater run to assess the impact and possibly add a future-compatibility lint, similar to those used for edition-gated lints.
Stability
We value the stability of Rust. Code that works and runs on stable should (mostly) not break. Because of that, we don't want to release a feature to the world with only team consensus and code review - we want to gain real-world experience on using that feature on nightly, and we might want to change the feature based on that experience.
To allow for that, we must make sure users don't accidentally depend on that new feature - otherwise, especially if experimentation takes time or is delayed and the feature takes the trains to stable, it would end up de facto stable and we'll not be able to make changes in it without breaking people's code.
The way we do that is that we make sure all new features are feature gated - they can't be used without enabling a feature gate (#![feature(foo)]), which can't be done in a stable/beta compiler.
See the stability in code section for the technical details.
Eventually, after we gain enough experience using the feature, make the necessary changes, and are satisfied, we expose it to the world using the stabilization process described here. Until then, the feature is not set in stone: every part of the feature can be changed, or the feature might be completely rewritten or removed. Features are not supposed to gain tenure by being unstable and unchanged for a year.
Tracking Issues
To keep track of the status of an unstable feature, the experience we get while using it on nightly, and of the concerns that block its stabilization, every feature-gate needs a tracking issue.
General discussions about the feature should be done on the tracking issue.
For features that have an RFC, you should use the RFC's tracking issue for the feature.
For other features, you'll have to make a tracking issue for that feature. The issue title should be "Tracking issue for YOUR FEATURE".
For tracking issues for features (as opposed to future-compat warnings), I don't think the description has to contain anything specific. Generally we put the list of items required for stabilization in a checklist, e.g.,
**Steps:**
- [ ] Implement the RFC. (CC @rust-lang/compiler -- can anyone write
up mentoring instructions?)
- [ ] Adjust the documentation. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#documentation-prs))
- [ ] Stabilize the feature. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#stabilization-pr))
Stability in code
The following steps need to be followed in order to implement a new unstable feature:
- Open a tracking issue - if you have an RFC, you can use the tracking issue for the RFC. The tracking issue should be labeled with at least C-tracking-issue. For a language feature, a label F-feature_name should be added as well.
- Pick a name for the feature gate (for RFCs, use the name in the RFC).
- Add a feature gate declaration to librustc_feature/active.rs in the active declare_features block:
  /// description of feature
  (active, $feature_name, "$current_nightly_version", Some($tracking_issue_number), $edition)
  where $edition has the type Option<Edition>, and is typically just None. For example:
  /// Allows defining identifiers beyond ASCII.
  (active, non_ascii_idents, "1.0.0", Some(55467), None),
  When added, the current version should be the one for the current nightly. Once the feature is moved to accepted.rs, the version is changed to that nightly version.
- Prevent usage of the new feature unless the feature gate is set. You can check it in most places in the compiler using the expression tcx.features().$feature_name (or sess.features_untracked().$feature_name if the tcx is unavailable). If the feature gate is not set, you should either maintain the pre-feature behavior or raise an error, depending on what makes sense; a sketch follows this list. For features introducing new syntax, pre-expansion gating should be used instead. To do so, extend the GatedSpans struct, add spans to it during parsing, and then finally feature-gate all the spans in rustc_ast_passes::feature_gate::check_crate.
- Add a test to ensure the feature cannot be used without a feature gate, by creating feature-gate-$feature_name.rs and feature-gate-$feature_name.stderr files under the directory where the other tests for your feature reside.
- Add a section to the unstable book, in src/doc/unstable-book/src/language-features/$feature_name.md.
- Write a lot of tests for the new feature. PRs without tests will not be accepted!
- Get your PR reviewed and land it. You have now successfully implemented a feature in Rust!
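As a minimal sketch of that gating check (my_feature, old_behavior, and new_behavior are hypothetical names):

// Somewhere in the compiler code implementing the new behavior:
if !tcx.features().my_feature {
    // Gate not enabled: keep the pre-feature behavior
    // (or emit a feature-gate error, if that makes more sense here).
    return old_behavior();
}
new_behavior()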
Stability attributes
This section is about the stability attributes and schemes that allow stable APIs to use unstable APIs internally in the rustc standard library.
For instructions on stabilizing a language feature see Stabilizing Features.
unstable
The #[unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] attribute explicitly marks an item as unstable. Items that are marked as "unstable" cannot be used without a corresponding #![feature] attribute on the crate, even on a nightly compiler. This restriction only applies across crate boundaries; unstable items may be used within the crate that defines them.
The issue field specifies the associated GitHub issue number. This field is required and all unstable features should have an associated tracking issue. In rare cases where there is no sensible value, issue = "none" is used.
The unstable attribute infects all sub-items, where the attribute doesn't have to be reapplied. So if you apply this to a module, all items in the module will be unstable.
You can make specific sub-items stable by using the #[stable] attribute on them. The stability scheme works similarly to how pub works. You can have public functions of nonpublic modules and you can have stable functions in unstable modules or vice versa.
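For instance, a stable function nested in an unstable module might look like this (the feature names and items are illustrative):

#[unstable(feature = "foo_module", issue = "1234", reason = "new API surface")]
pub mod foo {
    // Usable from stable code even though the parent module is unstable.
    #[stable(feature = "foo_bar", since = "1.0.0")]
    pub fn bar() {}
}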
Note, however, that due to a rustc bug, stable items inside unstable modules are available to stable code in that location! So, for example, stable code can import core::intrinsics::transmute even though intrinsics is an unstable module. Thus, this kind of nesting should be avoided when possible.
The unstable attribute may also have the soft value, which makes it a future-incompatible deny-by-default lint instead of a hard error. This is used by the bench attribute which was accidentally accepted in the past. This prevents breaking dependencies by leveraging Cargo's lint capping.
stable
The #[stable(feature = "foo", since = "1.420.69")] attribute explicitly marks an item as stabilized. To do this, follow the instructions in Stabilizing Features.
Note that stable functions may use unstable things in their body.
rustc_const_unstable
The #[rustc_const_unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] attribute has the same interface as the unstable attribute. It is used to mark const fns as having their constness be unstable. This allows you to make a function stable without stabilizing its constness, or even to mark an existing stable function as const fn without instantly stabilizing its const fn-ness.
Furthermore, this attribute is needed to mark an intrinsic as const fn, because there's no way to add const to functions in extern blocks for now.
rustc_const_stable
The #[rustc_const_stable(feature = "foo", since = "1.420.69")] attribute explicitly marks a const fn as having its constness be stable. This attribute can make sense even on an unstable function, if that function is called from another rustc_const_stable function.
Furthermore, this attribute is needed to mark an intrinsic as callable from rustc_const_stable functions.
allow_internal_unstable
Macros, compiler desugarings and const fns expose their bodies to the call site. To work around not being able to use unstable things in the standard library's macros, there's the #[allow_internal_unstable(feature1, feature2)] attribute that whitelists the given features for usage in stable macros or const fns.
Note that const fns are even more special in this regard. You can't just whitelist any feature; the features need an implementation in qualify_min_const_fn.rs. For example, the const_fn_union feature gate allows accessing fields of unions inside stable const fns. The rules for when it's ok to use such a feature gate are that the behavior matches the runtime behavior of the same code (see also this blog post). This means that you may not create a const fn that e.g. transmutes a memory address to an integer, because the addresses of things are nondeterministic and often unknown at compile-time.
Always ping @oli-obk, @RalfJung, and @Centril if you are adding more allow_internal_unstable attributes to any const fn.
staged_api
Any crate that uses the stable, unstable, or rustc_deprecated attributes must include the #![feature(staged_api)] attribute on the crate.
rustc_deprecated
The deprecation system shares the same infrastructure as the stable/unstable attributes. The rustc_deprecated attribute is similar to the deprecated attribute. It was previously called deprecated, but was split off when deprecated was stabilized. The deprecated attribute cannot be used in a staged_api crate; rustc_deprecated must be used instead. The deprecated item must also have a stable or unstable attribute.
rustc_deprecated has the following form:
#[rustc_deprecated(
since = "1.38.0",
reason = "explanation for deprecation",
suggestion = "other_function"
)]
The suggestion field is optional. If given, it should be a string that can be used as a machine-applicable suggestion to correct the warning. This is typically used when the identifier is renamed, but no other significant changes are necessary.
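Putting it together, a fully-annotated deprecated item might look like this (the function names are hypothetical):

#[stable(feature = "old_function", since = "1.10.0")]
#[rustc_deprecated(
    since = "1.38.0",
    reason = "replaced by a more general API",
    suggestion = "new_function"
)]
pub fn old_function() {}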
Another difference from the deprecated attribute is that the since field is actually checked against the current version of rustc. If since is in a future version, then the deprecated_in_future lint is triggered, which is default allow, but most of the standard library raises it to a warning with #![warn(deprecated_in_future)].
-Zforce-unstable-if-unmarked
The -Zforce-unstable-if-unmarked flag has a variety of purposes to help enforce that the correct crates are marked as unstable. It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on crates.io which do not themselves use staged_api. rustc also relies on this flag to mark all of its crates as unstable with the rustc_private feature so that each crate does not need to be carefully marked with unstable.
This flag is automatically applied to all of rustc and the standard library by the bootstrap scripts. This is needed because the compiler and all of its dependencies are shipped in the sysroot to all users.
This flag has the following effects:
- Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
- Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Zforce-unstable-if-unmarked.
Code which does not use -Zforce-unstable-if-unmarked should include the #![feature(rustc_private)] crate attribute to access these force-unstable crates. This is needed for things that link rustc, such as miri, rls, or clippy.
Request for stabilization
Once an unstable feature has been well-tested with no outstanding concern, anyone may push for its stabilization. It involves the following steps.
- Documentation PRs
- Write a stabilization report
- FCP
- Stabilization PR
Documentation PRs
If any documentation for this feature exists, it should be in the Unstable Book, located at src/doc/unstable-book.
If it exists, the page for the feature gate should be removed.
If there was documentation there, integrating it into the existing documentation is needed.
If there wasn't documentation there, it needs to be added.
Places that may need updated documentation:
- The Reference: This must be updated, in full detail.
- The Book: This may or may not need updating, depends. If you're not sure, please open an issue on this repository and it can be discussed.
- standard library documentation: As needed. Language features often don't need this, but if it's a feature that changes how good examples are written, such as when ? was added to the language, updating examples is important.
- Rust by Example: As needed.
Prepare PRs to update documentation involving this new feature for repositories mentioned above. Maintainers of these repositories will keep these PRs open until the whole stabilization process has completed. Meanwhile, we can proceed to the next step.
Write a stabilization report
Find the tracking issue of the feature, and create a short stabilization report. Essentially this would be a brief summary of the feature plus some links to test cases showing it works as expected, along with a list of edge cases that came up and were considered. This is a minimal "due diligence" that we do before stabilizing.
The report should contain:
- A summary, showing examples (e.g. code snippets) what is enabled by this feature.
- Links to test cases in our test suite regarding this feature, and a description of the feature's behavior on encountering edge cases.
- Links to the documentation (the PRs we have made in the previous steps).
- Any other relevant information. (Examples of such reports can be found in rust-lang/rust#44494 and rust-lang/rust#28237.)
- The resolutions of any unresolved questions if the stabilization is for an RFC.
FCP
If any member of the team responsible for tracking this feature agrees with stabilizing this feature, they will start the FCP (final-comment-period) process by commenting
@rfcbot fcp merge
The rest of the team members will review the proposal. If the final decision is to stabilize, we proceed to do the actual code modification.
Stabilization PR
Once we have decided to stabilize a feature, we need to have a PR that actually makes that stabilization happen. These kinds of PRs are a great way to get involved in Rust, as they take you on a little tour through the source code.
Here is a general guide to how to stabilize a feature -- every feature is different, of course, so some features may require steps beyond what this guide talks about.
Note: Before we stabilize any feature, it's the rule that it should appear in the documentation.
Updating the feature-gate listing
There is a central listing of feature-gates in src/librustc_feature. Search for the declare_features! macro. There should be an entry for the feature you are aiming to stabilize, something like this (the example is taken from rust-lang/rust#32409):
// pub(restricted) visibilities (RFC 1422)
(active, pub_restricted, "1.9.0", Some(32409)),
The above line should be moved down to the area for "accepted" features, declared below in a separate call to declare_features!.
When it is done, it should look like:
// pub(restricted) visibilities (RFC 1422)
(accepted, pub_restricted, "1.31.0", Some(32409)),
// note that we changed this
Note that the version number is updated to be the version number of the stable release where this feature will appear. This can be found by consulting the forge, which will tell you the next stable release number. You want to add 1 to that, because the code that lands today will go into beta on that date and then become stable one release after that. So, at the time of this writing, the next stable release (i.e. what was then in beta) was 1.30.0, hence I wrote 1.31.0 above.
Removing existing uses of the feature-gate
Next, search for the feature string (in this case, pub_restricted) in the codebase to find where it appears. Change uses of #![feature(XXX)] from the libstd and any rustc crates to be #![cfg_attr(bootstrap, feature(XXX))]. This includes the feature-gate only for stage0, which is built using the current beta (this is needed because the feature is still unstable in the current beta).
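Concretely, the change in each affected crate looks like this:

// before
#![feature(pub_restricted)]

// after: the gate is only enabled for stage0, which is built
// with the current beta where the feature is still unstable
#![cfg_attr(bootstrap, feature(pub_restricted))]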
Also, remove those strings from any tests. If there are tests specifically targeting the feature-gate (i.e., testing that the feature-gate is required to use the feature, but nothing else), simply remove the test.
Do not require the feature-gate to use the feature
Most importantly, remove the code which flags an error if the feature-gate is not present (since the feature is now considered stable). If the feature can be detected because it employs some new syntax, then a common place for that code to be is in src/librustc_ast_passes/feature_gate.rs.
For example, you might see code like this:
gate_feature_post!(&self, pub_restricted, span,
"`pub(restricted)` syntax is experimental");
This gate_feature_post! macro prints an error if the pub_restricted feature is not enabled. It is not needed now that the feature is stable.
For more subtle features, you may find code like this:
if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
This pub_restricted field (obviously named after the feature) would ordinarily be false if the feature flag is not present and true if it is. So transform the code to assume that the field is true. In this case, that would mean removing the if and leaving just the /* XXX */.
if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
becomes
/* XXX */
if self.tcx.sess.features.borrow().pub_restricted && something { /* XXX */ }
becomes
if something { /* XXX */ }
Debugging the compiler
This chapter contains a few tips to debug the compiler. These tips aim to be useful no matter what you are working on. Some of the other chapters have advice about specific parts of the compiler (e.g. the Queries Debugging and Testing chapter or the LLVM Debugging chapter).
-Z flags
The compiler has a bunch of -Z flags. These are unstable flags that are only enabled on nightly. Many of them are useful for debugging. To get a full listing of -Z flags, use -Z help.
One useful flag is -Z verbose, which generally enables printing more info that could be useful for debugging.
Getting a backtrace
When you have an ICE (panic in the compiler), you can set RUST_BACKTRACE=1 to get the stack trace of the panic! like in normal Rust programs. IIRC backtraces don't work on MinGW, sorry. If you have trouble or the backtraces are full of unknown, you might want to find some way to use Linux, Mac, or MSVC on Windows.
In the default configuration, you don't have line numbers enabled, so the backtrace looks like this:
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::sys_common::backtrace::_print
2: std::panicking::default_hook::{{closure}}
3: std::panicking::default_hook
4: std::panicking::rust_panic_with_hook
5: std::panicking::begin_panic
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
32: rustc_typeck::check_crate
33: <std::thread::local::LocalKey<T>>::with
34: <std::thread::local::LocalKey<T>>::with
35: rustc::ty::context::TyCtxt::create_and_enter
36: rustc_driver::driver::compile_input
37: rustc_driver::run_compiler
If you want line numbers for the stack trace, you can enable debug = true in your config.toml and rebuild the compiler (debuginfo-level = 1 will also add line numbers, but debug = true gives full debuginfo). Then the backtrace will look like this:
stack backtrace:
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
at /home/user/rust/src/librustc_typeck/check/cast.rs:110
7: rustc_typeck::check::cast::CastCheck::check
at /home/user/rust/src/librustc_typeck/check/cast.rs:572
at /home/user/rust/src/librustc_typeck/check/cast.rs:460
at /home/user/rust/src/librustc_typeck/check/cast.rs:370
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
33: rustc_driver::driver::compile_input
at /home/user/rust/src/librustc_driver/driver.rs:1010
at /home/user/rust/src/librustc_driver/driver.rs:212
34: rustc_driver::run_compiler
at /home/user/rust/src/librustc_driver/lib.rs:253
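For reference, a minimal sketch of the relevant config.toml settings (assuming they live under the [rust] table, as with the rest of the build configuration):

[rust]
# Full debuginfo: slower to build, but backtraces get line numbers.
debug = true
# Alternatively, line-numbers-only debuginfo:
# debuginfo-level = 1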
Getting a backtrace for errors
If you want to get a backtrace to the point where the compiler emits an error message, you can pass -Z treat-err-as-bug=n, which will make the compiler skip n errors or delay_span_bug calls and then panic on the next one. If you leave off =n, the compiler will assume 0 for n and thus panic on the first error it encounters.
This can also help when debugging delay_span_bug calls - it will make the first delay_span_bug call panic, which will give you a useful backtrace.
For example:
$ cat error.rs
fn main() {
1 + ();
}
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc error.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> error.rs:2:7
|
2 | 1 + ();
| ^ no implementation for `{integer} + ()`
|
= help: the trait `std::ops::Add<()>` is not implemented for `{integer}`
error: aborting due to previous error
$ # Now, where does the error above come from?
$ RUST_BACKTRACE=1 \
./build/x86_64-unknown-linux-gnu/stage1/bin/rustc \
error.rs \
-Z treat-err-as-bug
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> error.rs:2:7
|
2 | 1 + ();
| ^ no implementation for `{integer} + ()`
|
= help: the trait `std::ops::Add<()>` is not implemented for `{integer}`
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu
note: run with `RUST_BACKTRACE=1` for a backtrace
thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/src/librustc_errors/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
(~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
::report_selection_error
at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:823
8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
::report_fulfillment_errors
at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:160
at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:112
9: rustc_typeck::check::FnCtxt::select_obligations_where_possible
at /home/user/rust/src/librustc_typeck/check/mod.rs:2192
(~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
36: rustc_driver::run_compiler
at /home/user/rust/src/librustc_driver/lib.rs:253
$ # Cool, now I have a backtrace for the error
Getting logging output
These crates are used in the compiler for logging:
- log
- env-logger: check the link to see the full RUSTC_LOG syntax
The compiler has a lot of debug!
calls, which print out logging information
at many points. These are very useful to at least narrow down the location of
a bug if not to find it entirely, or just to orient yourself as to why the
compiler is doing a particular thing.
To see the logs, you need to set the RUSTC_LOG environment variable to your log filter, e.g. to get the logs for a specific module, you can run the compiler as RUSTC_LOG=module::path rustc my-file.rs. All debug! output will then appear in standard error.
Note that unless you use a very strict filter, the logger will emit a lot of output, so use the most specific module(s) you can (comma-separated if multiple). It's typically a good idea to pipe standard error to a file and look at the log output with a text editor.
So, to put it together:
# This puts the output of all debug calls in `librustc_middle/traits` into
# standard error, which might fill your console backscroll.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs
# This puts the output of all debug calls in `librustc_middle/traits` in
# `traits-log`, so you can then see it with a text editor.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs 2>traits-log
# Not recommended. This will show the output of all `debug!` calls
# in the Rust compiler, and there are a *lot* of them, so it will be
# hard to find anything.
$ RUSTC_LOG=debug rustc +local my-file.rs 2>all-log
# This will show the output of all `info!` calls in `rustc_trans`.
#
# There's an `info!` statement in `trans_instance` that outputs
# every function that is translated. This is useful to find out
# which function triggers an LLVM assertion, and this is an `info!`
# log rather than a `debug!` log so it will work on the official
# compilers.
$ RUSTC_LOG=rustc_trans=info rustc +local my-file.rs
How to keep or remove debug! and trace! calls from the resulting binary
While calls to error!, warn! and info! are included in every build of the compiler, calls to debug! and trace! are only included in the program if debug-assertions=yes is turned on in config.toml (it is turned off by default), so if you don't see DEBUG logs, especially if you run the compiler with RUSTC_LOG=rustc rustc some.rs and only see INFO logs, make sure that debug-assertions=yes is turned on in your config.toml.
I also think that in some cases just setting it will not trigger a rebuild, so if you changed it and you already have a compiler built, you might want to call x.py clean to force one.
Logging etiquette and conventions
Because calls to debug! are removed by default, in most cases, don't worry about adding "unnecessary" calls to debug! and leaving them in code you commit - they won't slow down the performance of what we ship, and if they helped you pin down a bug, they will probably help someone else with a different one.
A loosely followed convention is to use debug!("foo(...)") at the start of a function foo and debug!("foo: ...") within the function. Another loosely followed convention is to use the {:?} format specifier for debug logs.
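A short sketch of that convention (the function and its fields are hypothetical):

fn check_item(&self, item: &Item) {
    // Convention: log the entry point with its arguments...
    debug!("check_item(item={:?})", item);
    let result = self.compute(item);
    // ...and prefix logs inside the function with its name.
    debug!("check_item: result={:?}", result);
}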
One thing to be careful of is expensive operations in logs.
If in the module rustc::foo you have a statement:
debug!("{:?}", random_operation(tcx));
Then if someone runs a debug rustc with RUSTC_LOG=rustc::bar, then random_operation() will run.
This means that you should not put anything too expensive or likely to crash there - that would annoy anyone who wants to use logging for their own module. No-one will know it until someone tries to use logging to find another bug.
Formatting Graphviz output (.dot files)
Some compiler options for debugging specific features yield graphviz graphs - e.g. the #[rustc_mir(borrowck_graphviz_postflow="suffix.dot")] attribute dumps various borrow-checker dataflow graphs. These all produce .dot files. To view these files, install graphviz (e.g. apt-get install graphviz) and then run the following commands:
$ dot -T pdf maybe_init_suffix.dot > maybe_init_suffix.pdf
$ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer
Narrowing (Bisecting) Regressions
The cargo-bisect-rustc tool can be used as a quick and easy way to find exactly which PR caused a change in rustc behavior. It automatically downloads rustc PR artifacts and tests them against a project you provide until it finds the regression. You can then look at the PR to get more context on why it was changed. See this tutorial on how to use it.
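A typical invocation might look like this (the date range is made up; check the tool's own documentation for the current flags):

$ cargo install cargo-bisect-rustc
$ # run inside the project that exhibits the regression
$ cargo bisect-rustc --start 2019-01-01 --end 2019-02-01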
Downloading Artifacts from Rust's CI
The rustup-toolchain-install-master tool by kennytm can be used to download the artifacts produced by Rust's CI for a specific SHA1 -- this basically corresponds to the successful landing of some PR -- and then sets them up for your local use. This also works for artifacts produced by @bors try. This is helpful when you want to examine the resulting build of a PR without doing the build yourself.
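Usage is roughly as follows (the SHA1 is a placeholder):

$ cargo install rustup-toolchain-install-master
$ rustup-toolchain-install-master <SHA1>   # installs a toolchain named after the commit
$ rustc +<SHA1> --version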
Profiling the compiler
This section talks about how to profile the compiler and find out where it spends its time.
Depending on what you're trying to measure, there are several different approaches:
- If you want to see if a PR improves or regresses compiler performance:
  - The rustc-perf project makes this easy and can be triggered to run on a PR via the @rustc-perf bot.
- If you want a medium-to-high level overview of where rustc is spending its time:
  - The -Zself-profile flag and measureme tools offer a query-based approach to profiling. See their docs for more information.
- If you want function level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as perf.
Profiling with perf
This is a guide for how to profile rustc with perf.
Initial steps
- Get a clean checkout of rust-lang/master, or whatever it is you want to profile.
- Set the following settings in your config.toml:
  - debuginfo-level = 1 - enables line debuginfo
  - jemalloc = false - lets you do memory use profiling with valgrind
  - leave everything else the defaults
- Run ./x.py build to get a full build
- Make a rustup toolchain pointing to that result (e.g. as sketched below)
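Linking the build as a toolchain might look like this (the toolchain name and target triple are examples; use the stage directory you actually built):

$ rustup toolchain link mytoolchain build/x86_64-unknown-linux-gnu/stage2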
Gathering a perf profile
perf is an excellent tool on linux that can be used to gather and analyze all kinds of information. Mostly it is used to figure out where a program spends its time. It can also be used for other sorts of events, though, like cache misses and so forth.
The basics
The basic perf command is this:
perf record -F99 --call-graph dwarf XXX
The -F99 tells perf to sample at 99 Hz, which avoids generating too much data for longer runs (why 99 Hz you ask? It is often chosen because it is unlikely to be in lockstep with other periodic activity). The --call-graph dwarf tells perf to get call-graph information from debuginfo, which is accurate. The XXX is the command you want to profile. So, for example, you might do:
perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
to run cargo -- here <toolchain> should be the name of the toolchain you made in the beginning. But there are some things to be aware of:
- You probably don't want to profile the time spent building dependencies. So something like cargo build; cargo clean -p $C may be helpful (where $C is the crate name)
  - Though usually I just do touch src/lib.rs and rebuild instead. =)
- You probably don't want incremental messing about with your profile. So something like CARGO_INCREMENTAL=0 can be helpful.
Gathering a perf profile from a perf.rust-lang.org test
Often we want to analyze a specific test from perf.rust-lang.org. To do that, the first step is to clone the rustc-perf repository:
git clone https://github.com/rust-lang/rustc-perf
Doing it the easy way
Once you've cloned the repo, you can use the collector executable to do profiling for you! You can find instructions in the rustc-perf readme.
For example, to measure the clap-rs test, you might do:
./target/release/collector \
    --output-repo /path/to/place/output \
    profile perf-record \
    --rustc /path/to/rustc/executable/from/your/build/directory \
    --cargo `which cargo` \
    --filter clap-rs \
    --builds Check
You can also use that same command to use cachegrind or other profiling tools.
Doing it the hard way
If you prefer to run things manually, that is also possible. You first need to find the source for the test you want. Sources for the tests are found in the collector/benchmarks directory. So let's go into the directory of a specific test; we'll use clap-rs as an example:
cd collector/benchmarks/clap-rs
In this case, let's say we want to profile the cargo check performance. In that case, I would first run some basic commands to build the dependencies:
# Setup: first clean out any old results and build the dependencies:
cargo +<toolchain> clean
CARGO_INCREMENTAL=0 cargo +<toolchain> check
(Again, <toolchain> should be replaced with the name of the toolchain we made in the first step.)
Next: we want to record the execution time for just the clap-rs crate, running cargo check. I tend to use cargo rustc for this, since it also allows me to add explicit flags, which we'll do later on.
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib
Note that final command: it's a doozy! It uses the cargo rustc command, which executes rustc with (potentially) additional options; the --profile check and --lib options specify that we are doing a cargo check execution, and that this is a library (not a binary).
At this point, we can use perf tooling to analyze the results. For example:
perf report
will open up an interactive TUI program. In simple cases, that can be helpful. For more detailed examination, the perf-focus tool can be helpful; it is covered below.
A note of caution. Each of the rustc-perf tests is its own special snowflake. In particular, some of them are not libraries, in which case you would want to do touch src/main.rs and avoid passing --lib. I'm not sure how best to tell which test is which to be honest.
Gathering NLL data
If you want to profile an NLL run, you can just pass extra options to the cargo rustc command, like so:
touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir
Analyzing a perf profile with perf focus
Once you've gathered a perf profile, we want to get some information about it. For this, I personally use perf focus. It's a simple but useful tool that lets you answer queries like:
- "how much time was spent in function F" (no matter where it was called from)
- "how much time was spent in function F when it was called from G"
- "how much time was spent in function F excluding time spent in G"
- "what functions does F call and how much time does it spend in them"
To understand how it works, you have to know just a bit about perf. Basically, perf works by sampling your process on a regular basis (or whenever some event occurs). For each sample, perf gathers a backtrace. `perf focus` lets you write a regular expression that tests which functions appear in that backtrace, and then tells you which percentage of samples had a backtrace that met the regular expression. It's probably easiest to explain by walking through how I would analyze NLL performance.
Installing perf-focus
You can install perf-focus using `cargo install`:
cargo install perf-focus
Example: How much time is spent in MIR borrowck?
Let's say we've gathered the NLL data for a test. We'd like to know how much time it is spending in the MIR borrow-checker. The "main" function of the MIR borrowck is called `do_mir_borrowck`, so we can do this command:
$ perf focus '{do_mir_borrowck}'
Matcher : {do_mir_borrowck}
Matches : 228
Not Matches: 542
Percentage : 29%
The `'{do_mir_borrowck}'` argument is called the matcher. It specifies the test to be applied on the backtrace. In this case, the `{X}` indicates that there must be some function on the backtrace that meets the regular expression `X`. In this case, that regex is just the name of the function we want (in fact, it's a subset of the name; the full name includes a bunch of other stuff, like the module path). In this mode, perf-focus just prints out the percentage of samples where `do_mir_borrowck` was on the stack: in this case, 29%.
A note about c++filt. To get the data from `perf`, `perf focus` currently executes `perf script` (perhaps there is a better way...). I've sometimes found that `perf script` outputs C++ mangled names. This is annoying. You can tell by running `perf script | head` yourself -- if you see names like `5rustc6middle` instead of `rustc::middle`, then you have the same problem. You can solve this by doing:
perf script | c++filt | perf focus --from-stdin ...
This will pipe the output from `perf script` through `c++filt` and should mostly convert those names into a more friendly format. The `--from-stdin` flag to `perf focus` tells it to get its data from stdin, rather than executing `perf script` itself. We should make this more convenient (at worst, maybe add a `c++filt` option to `perf focus`, or just always use it -- it's pretty harmless).
Example: How much time does MIR borrowck spend solving traits?
Perhaps we'd like to know how much time MIR borrowck spends in the trait checker. We can ask this using a more complex regex:
$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher : {do_mir_borrowck},..{^rustc::traits}
Matches : 12
Not Matches: 1311
Percentage : 0%
Here we used the `..` operator to ask "how often do we have `do_mir_borrowck` on the stack and then, later, some function whose name begins with `rustc::traits`?" (basically, code in that module). It turns out the answer is "almost never" -- only 12 samples fit that description (if you ever see no samples, that often indicates your query is messed up).
If you're curious, you can find out exactly which samples matched by using the `--print-match` option. This will print out the full backtrace for each sample. The `|` at the front of the line indicates the part that the regular expression matched.
Example: Where does MIR borrowck spend its time?
Often we want to do more "explorational" queries. Like, we know that MIR borrowck is 29% of the time, but where does that time get spent? For that, the `--tree-callees` option is often the best tool. You usually also want to give `--tree-min-percent` or `--tree-max-depth`. The result looks like this:
$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher : {do_mir_borrowck}
Matches : 577
Not Matches: 746
Percentage : 43%
Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
What happens with `--tree-callees` is that:
- we find each sample matching the regular expression
- we look at the code that occurs after the regex match and try to build up a call tree
The `--tree-min-percent 3` option says "only show me things that take more than 3% of the time". Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. `--tree-max-depth` can be useful too; it just limits how many levels we print.
For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in just that function and not some callee of that function (self). Usually "total" is the more interesting number, but not always.
Relative percentages
By default, all percentages in perf-focus are relative to the total program execution. This is useful to help you keep perspective -- often as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages between different queries are easily compared against one another.
That said, sometimes it's useful to get relative percentages, so `perf focus` offers a `--relative` option. In this case, the percentages are listed only for samples that match (vs all samples). So for example we could get our percentages relative to the borrowck itself, like so:
$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher : {do_mir_borrowck}
Matches : 577
Not Matches: 746
Percentage : 100%
Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]
Here you see that `compute_regions` came up as "47% total" -- that means that 47% of `do_mir_borrowck` is spent in that function. Before, we saw 20% -- that's because `do_mir_borrowck` itself is only 43% of the total time (and .47 * .43 = .20).
Coding conventions
This chapter offers some tips on the coding conventions for rustc. It covers formatting, coding for correctness, using crates from crates.io, and some tips on structuring your PR for easy review.
Formatting and the tidy script
rustc is slowly moving towards the Rust standard coding style; at the moment, however, it follows a rather more chaotic style. We do have some mandatory formatting conventions, which are automatically enforced by a script we affectionately call the "tidy" script. The tidy script runs automatically when you do `./x.py test` and can be run in isolation with `./x.py test tidy`.
Copyright notice
In the past, files began with a copyright and license notice. Please omit this notice for new files licensed under the standard terms (dual MIT/Apache-2.0).
All of the copyright notices should be gone by now, but if you come across one in the rust-lang/rust repo, feel free to open a PR to remove it.
Line length
Lines should be at most 100 characters. It's even better if you can keep things to 80.
Ignoring the line length limit. Sometimes – in particular for tests – it can be necessary to exempt yourself from this limit. In that case, you can add a comment towards the top of the file like so:
// ignore-tidy-linelength
Tabs vs spaces
Prefer 4-space indent.
Coding for correctness
Beyond formatting, there are a few other tips that are worth following.
Prefer exhaustive matches
Using `_` in a match is convenient, but it means that when new variants are added to the enum, they may not get handled correctly. Ask yourself: if a new variant were added to this enum, what's the chance that it would want to use the `_` code, versus having some other treatment? Unless the answer is "low", then prefer an exhaustive match. (The same advice applies to `if let` and `while let`, which are effectively tests for a single variant.)
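To make this concrete, here is a small illustrative sketch (the enum and its variants are invented for the example, not taken from the compiler):
enum Mode { Strict, Lenient, Silent } // hypothetical enum, for illustration only

fn describe(mode: &Mode) -> &'static str {
    // An exhaustive match: if a new variant is added to `Mode`, this match
    // stops compiling, forcing a conscious decision about how to handle it.
    match mode {
        Mode::Strict => "strict",
        Mode::Lenient => "lenient",
        Mode::Silent => "silent",
        // A `_ => "unknown"` arm would compile today, but it would also
        // silently swallow any variant added tomorrow.
    }
}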
Use "TODO" comments for things you don't want to forget
As a useful tool for yourself, you can insert a `// TODO` comment for something that you want to get back to before you land your PR:
fn do_something() {
    if something_else {
        unimplemented!(); // TODO write this
    }
}
The tidy script will report an error for a `// TODO` comment, so this code would not be able to land until the TODO is fixed (or removed).
This can also be useful in a PR as a way to signal from one commit that you are leaving a bug that a later commit will fix:
if foo {
    return true; // TODO wrong, but will be fixed in a later commit
}
Using crates from crates.io
It is allowed to use crates from crates.io, though external dependencies should not be added gratuitously. All such crates must have a suitably permissive license. There is an automatic check which inspects the Cargo metadata to ensure this.
How to structure your PR
How you prepare the commits in your PR can make a big difference for the reviewer. Here are some tips.
Isolate "pure refactorings" into their own commit. For example, if you rename a method, then put that rename into its own commit, along with the renames of all the uses.
More commits is usually better. If you are doing a large change, it's almost always better to break it up into smaller steps that can be independently understood. The one thing to be aware of is that if you introduce some code following one strategy, then change it dramatically (versus adding to it) in a later commit, that 'back-and-forth' can be confusing.
Only run rustfmt on new content. One day, we might enforce formatting for the rust-lang/rust repo. Meanwhile, we prefer that rustfmt not be run on existing code as that will generate large diffs and will make git blame harder to sift through. However, running `rustfmt` on new content, e.g. a new file or a largely new part of a file, is ok. Small formatting adjustments to nearby code you are already changing for other purposes are also ok.
No merges. We do not allow merge commits into our history, other than those by bors. If you get a merge conflict, rebase instead via a command like `git rebase -i rust-lang/master` (presuming you use the name `rust-lang` for your remote).
Individual commits do not have to build (but it's nice). We do not require that every intermediate commit successfully builds – we only expect to be able to bisect at a PR level. However, if you can make individual commits build, that is always helpful.
Naming conventions
Apart from normal Rust style/naming conventions, there are also some specific to the compiler.
- `cx` tends to be short for "context" and is often used as a suffix. For example, `tcx` is a common name for the Typing Context.
- `'tcx` is used as the lifetime name for the Typing Context.
- Because `crate` is a keyword, if you need a variable to represent something crate-related, often the spelling is changed to `krate`.
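Put together, a function signature in the compiler often reads like this (a made-up example, just to show the conventions in one place):
// `tcx` is the typing context, `'tcx` is its lifetime, and `krate`
// stands in for the keyword `crate`.
fn check_crate<'tcx>(tcx: TyCtxt<'tcx>, krate: &Crate) {
    // ...
}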
crates.io Dependencies
The Rust compiler supports building with some dependencies from crates.io. For example, `log` and `env_logger` come from crates.io.
In general, you should avoid adding dependencies to the compiler for several reasons:
- The dependency may not be high quality or well-maintained, whereas we want the compiler to be high-quality.
- The dependency may not be using a compatible license.
- The dependency may have transitive dependencies that have one of the above problems.
TODO: what is the vetting process?
Whitelist
The `tidy` tool has a whitelist of crates that are allowed. To add a dependency that is not already in the compiler, you will need to add it to this whitelist.
Emitting Errors and other Diagnostics
A lot of effort has been put into making `rustc` have great error messages. This chapter is about how to emit compile errors and lints from the compiler.
Span
`Span` is the primary data structure in `rustc` used to represent a location in the code being compiled. `Span`s are attached to most constructs in HIR and MIR, allowing for more informative error reporting.
A `Span` can be looked up in a `SourceMap` to get a "snippet" useful for displaying errors, with `span_to_snippet` and other similar methods on the `SourceMap`.
Error messages
The `rustc_errors` crate defines most of the utilities used for reporting errors.
`Session` and `ParseSess` have methods (or fields with methods) that allow reporting errors. These methods usually have names like `span_err` or `struct_span_err` or `span_warn`, etc. There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc.
In general, there are two classes of such methods: ones that emit an error directly and ones that allow finer control over what to emit. For example, `span_err` emits the given error message at the given `Span`, but `struct_span_err` instead returns a `DiagnosticBuilder`.
`DiagnosticBuilder` allows you to add related notes and suggestions to an error before emitting it by calling the `emit` method. (Failing to either emit or cancel a `DiagnosticBuilder` will result in an ICE.) See the docs for more info on what you can do.
// Get a DiagnosticBuilder. This does _not_ emit an error yet.
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

// In some cases, you might need to check if `sp` is generated by a macro to
// avoid printing weird errors about macro-generated code.
if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Use the snippet to generate a suggested fix
    err.span_suggestion(suggestion_sp, "try using a qux here", format!("qux {}", snippet));
} else {
    // If we weren't able to generate a snippet, then emit a "help" message
    // instead of a concrete "suggestion". In practice this is unlikely to be
    // reached.
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

// emit the error
err.emit();
Suggestions
In addition to telling the user exactly why their code is wrong, it's oftentimes furthermore possible to tell them how to fix it. To this end, `DiagnosticBuilder` offers a structured suggestions API, which formats code suggestions pleasingly in the terminal, or (when the `--error-format json` flag is passed) as JSON for consumption by tools, most notably the Rust Language Server and `rustfix`.
Not all suggestions should be applied mechanically. Use the `span_suggestion` method of `DiagnosticBuilder` to make a suggestion. The last argument provides a hint to tools whether the suggestion is mechanically applicable or not.
For example, to make our `qux` suggestion machine-applicable, we would do:
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    err.span_suggestion(
        suggestion_sp,
        "try using a qux here",
        format!("qux {}", snippet),
        Applicability::MachineApplicable,
    );
} else {
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

err.emit();
This might emit an error like
$ rustc mycode.rs
error[E0999]: oh no! this is an error!
--> mycode.rs:3:5
|
3 | sad()
| ^ help: try using a qux here: `qux sad()`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0999`.
In some cases, like when the suggestion spans multiple lines or when there are multiple suggestions, the suggestions are displayed on their own:
error[E0999]: oh no! this is an error!
--> mycode.rs:3:5
|
3 | sad()
| ^
help: try using a qux here:
|
3 | qux sad()
| ^^^
error: aborting due to previous error
For more information about this error, try `rustc --explain E0999`.
The possible values of `Applicability` are:
- `MachineApplicable`: Can be applied mechanically.
- `HasPlaceholders`: Cannot be applied mechanically because it has placeholder text in the suggestions. For example: "Try adding a type: `let x: <type>`".
- `MaybeIncorrect`: Cannot be applied mechanically because the suggestion may or may not be a good one.
- `Unspecified`: Cannot be applied mechanically because we don't know which of the above cases it falls into.
Lints
The compiler linting infrastructure is defined in the `rustc::lint` module.
Declaring a lint
The built-in compiler lints are defined in the `rustc_lint` crate.
Every lint is implemented via a `struct` that implements the `LintPass` trait (you also implement one of the more specific lint pass traits, either `EarlyLintPass` or `LateLintPass`). The trait implementation allows you to check certain syntactic constructs as the linter walks the source code. You can then choose to emit lints in a very similar way to compile errors.
You also declare the metadata of a particular lint via the `declare_lint!` macro. This includes the name, the default level, a short description, and some more details.
Note that the lint and the lint pass must be registered with the compiler.
For example, the following lint checks for uses of `while true { ... }` and suggests using `loop { ... }` instead.
// Declare a lint called `WHILE_TRUE`
declare_lint! {
    WHILE_TRUE,
    // warn-by-default
    Warn,
    // This string is the lint description
    "suggest using `loop { }` instead of `while true { }`"
}

// Define a struct and `impl LintPass` for it.
#[derive(Copy, Clone)]
pub struct WhileTrue;

// This declares a lint pass, providing a list of associated lints. The
// compiler currently doesn't use the associated lints directly (e.g., to not
// run the pass or otherwise check that the pass emits the appropriate set of
// lints). However, it's good to be accurate here as it's possible that we're
// going to register the lints via the get_lints method on our lint pass (that
// this macro generates).
impl_lint_pass!(
    WhileTrue => [WHILE_TRUE],
);

// LateLintPass has lots of methods. We only override the definition of
// `check_expr` for this lint because that's all we need, but you could
// override other methods for your own lint. See the rustc docs for a full
// list of methods.
impl<'a, 'tcx> LateLintPass<'a, 'tcx> for WhileTrue {
    fn check_expr(&mut self, cx: &LateContext, e: &hir::Expr) {
        if let hir::ExprWhile(ref cond, ..) = e.node {
            if let hir::ExprLit(ref lit) = cond.node {
                if let ast::LitKind::Bool(true) = lit.node {
                    if lit.span.ctxt() == SyntaxContext::empty() {
                        let msg = "denote infinite loops with `loop { ... }`";
                        let condition_span = cx.tcx.sess.source_map().def_span(e.span);
                        let mut err = cx.struct_span_lint(WHILE_TRUE, condition_span, msg);
                        err.span_suggestion_short(condition_span, "use `loop`", "loop".to_owned());
                        err.emit();
                    }
                }
            }
        }
    }
}
Edition-gated Lints
Sometimes we want to change the behavior of a lint in a new edition. To do this, we just add the transition to our invocation of `declare_lint!`:
declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    Edition::Edition2018 => Warn,
}
This makes the `ANONYMOUS_PARAMETERS` lint allow-by-default in the 2015 edition but warn-by-default in the 2018 edition.
A future-incompatible lint should be declared with the `@future_incompatible` additional "field":
declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    @future_incompatible = FutureIncompatibleInfo {
        reference: "issue #41686 <https://github.com/rust-lang/rust/issues/41686>",
        edition: Some(Edition::Edition2018),
    };
}
If you need a combination of options that's not supported by the `declare_lint!` macro, you can always define your own static with a type of `&Lint`, but this is currently linted against in the compiler tree.
Guidelines for creating a future incompatibility lint
- Create a lint defaulting to warn as normal, with ideally the same error message you would normally give.
- Add a suitable reference, typically an RFC or tracking issue. Go ahead and include the full URL, sorting items in ascending order of issue numbers.
- Later, change lint to error.
- Eventually, remove lint.
Lint Groups
Lints can be turned on in groups. These groups are declared in the `register_builtins` function in `rustc_lint::lib`. The `add_lint_group!` macro is used to declare a new group.
For example,
add_lint_group!(sess,
                "nonstandard_style",
                NON_CAMEL_CASE_TYPES,
                NON_SNAKE_CASE,
                NON_UPPER_CASE_GLOBALS);
This defines the `nonstandard_style` group which turns on the listed lints. A user can turn on these lints with a `#![warn(nonstandard_style)]` attribute in the source code, or by passing `-W nonstandard-style` on the command line.
Linting early in the compiler
On occasion, you may need to define a lint that runs before the linting system has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all.
To solve this problem, we buffer the lints until the linting system is processed. `Session` and `ParseSess` both have `buffer_lint` methods that allow you to buffer a lint for later. The linting system automatically takes care of handling buffered lints later.
Thus, to define a lint that runs early in the compilation, one defines a lint like normal but invokes the lint with `buffer_lint`.
Linting even earlier in the compiler
The parser (`librustc_ast`) is interesting in that it cannot have dependencies on any of the other `librustc*` crates. In particular, it cannot depend on `librustc_middle::lint` or `librustc_lint`, where all of the compiler linting infrastructure is defined. That's troublesome!
To solve this, `librustc_ast` defines its own buffered lint type, which `ParseSess::buffer_lint` uses. After macro expansion, these buffered lints are then dumped into the `Session::buffered_lints` used by the rest of the compiler.
JSON diagnostic output
The compiler accepts an `--error-format json` flag to output diagnostics as JSON objects (for the benefit of tools such as `cargo fix` or the RLS). It looks like this:
$ rustc json_error_demo.rs --error-format json
{"message":"cannot add `&str` to `{integer}`","code":{"code":"E0277","explanation":"\nYou tried to use a type which doesn't implement some trait in a place which\nexpected that trait. Erroneous code example:\n\n```compile_fail,E0277\n// here we declare the Foo trait with a bar method\ntrait Foo {\n fn bar(&self);\n}\n\n// we now declare a function which takes an object implementing the Foo trait\nfn some_func<T: Foo>(foo: T) {\n foo.bar();\n}\n\nfn main() {\n // we now call the method with the i32 type, which doesn't implement\n // the Foo trait\n some_func(5i32); // error: the trait bound `i32 : Foo` is not satisfied\n}\n```\n\nIn order to fix this error, verify that the type you're using does implement\nthe trait. Example:\n\n```\ntrait Foo {\n fn bar(&self);\n}\n\nfn some_func<T: Foo>(foo: T) {\n foo.bar(); // we can now use this method since i32 implements the\n // Foo trait\n}\n\n// we implement the trait on the i32 type\nimpl Foo for i32 {\n fn bar(&self) {}\n}\n\nfn main() {\n some_func(5i32); // ok!\n}\n```\n\nOr in a generic context, an erroneous code example would look like:\n\n```compile_fail,E0277\nfn some_func<T>(foo: T) {\n println!(\"{:?}\", foo); // error: the trait `core::fmt::Debug` is not\n // implemented for the type `T`\n}\n\nfn main() {\n // We now call the method with the i32 type,\n // which *does* implement the Debug trait.\n some_func(5i32);\n}\n```\n\nNote that the error here is in the definition of the generic function: Although\nwe only call it with a parameter that does implement `Debug`, the compiler\nstill rejects the function: It must work with all possible input types. In\norder to make this example compile, we need to restrict the generic type we're\naccepting:\n\n```\nuse std::fmt;\n\n// Restrict the input type to types that implement Debug.\nfn some_func<T: fmt::Debug>(foo: T) {\n println!(\"{:?}\", foo);\n}\n\nfn main() {\n // Calling the method is still fine, as i32 implements Debug.\n some_func(5i32);\n\n // This would fail to compile now:\n // struct WithoutDebug;\n // some_func(WithoutDebug);\n}\n```\n\nRust only looks at the signature of the called function, as such it must\nalready specify all requirements that will be used for every type parameter.\n"},"level":"error","spans":[{"file_name":"json_error_demo.rs","byte_start":50,"byte_end":51,"line_start":4,"line_end":4,"column_start":7,"column_end":8,"is_primary":true,"text":[{"text":" a + b","highlight_start":7,"highlight_end":8}],"label":"no implementation for `{integer} + &str`","suggested_replacement":null,"suggestion_applicability":null,"expansion":null}],"children":[{"message":"the trait `std::ops::Add<&str>` is not implemented for `{integer}`","code":null,"level":"help","spans":[],"children":[],"rendered":null}],"rendered":"error[E0277]: cannot add `&str` to `{integer}`\n --> json_error_demo.rs:4:7\n |\n4 | a + b\n | ^ no implementation for `{integer} + &str`\n |\n = help: the trait `std::ops::Add<&str>` is not implemented for `{integer}`\n\n"}
{"message":"aborting due to previous error","code":null,"level":"error","spans":[],"children":[],"rendered":"error: aborting due to previous error\n\n"}
{"message":"For more information about this error, try `rustc --explain E0277`.","code":null,"level":"","spans":[],"children":[],"rendered":"For more information about this error, try `rustc --explain E0277`.\n"}
Note that the output is a series of lines, each of which is a JSON object, but the series of lines taken together is, unfortunately, not valid JSON, thwarting tools and tricks (such as piping to `python3 -m json.tool`) that require such. (One speculates that this was intentional for LSP performance purposes, so that each line/object can be sent to RLS as it is flushed?)
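Tools therefore usually parse the output one line at a time. A sketch of a consumer in Rust, assuming the `serde_json` crate and diagnostics arriving on stdin:
use std::io::{self, BufRead};

fn main() {
    // Each line is an independent JSON object, so parse line-by-line rather
    // than handing the whole stream to a JSON parser.
    for line in io::stdin().lock().lines() {
        let line = line.unwrap();
        let diag: serde_json::Value = serde_json::from_str(&line).unwrap();
        if let Some(message) = diag.get("message") {
            println!("diagnostic: {}", message);
        }
    }
}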
Also note the "rendered" field, which contains the "human" output as a string; this was introduced so that UI tests could both make use of the structured JSON and see the "human" output (well, sans colors) without having to compile everything twice.
The "human" readable and the JSON format emitters can be found under librustc_errors; both were moved from the librustc_ast crate to the librustc_errors crate.
The JSON emitter defines its own `Diagnostic` struct (and sub-structs) for the JSON serialization. Don't confuse this with `errors::Diagnostic`!
#[rustc_on_unimplemented(...)]
The `#[rustc_on_unimplemented]` attribute allows trait definitions to add specialized notes to error messages when an implementation was expected but not found. You can refer to the trait's generic arguments by name and to the resolved type using `Self`.
For example:
#![feature(rustc_attrs)]

#[rustc_on_unimplemented="an iterator over elements of type `{A}` \
    cannot be built from a collection of type `{Self}`"]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}

fn iterate_chars<I: MyIterator<char>>(i: I) {
    // ...
}

fn main() {
    iterate_chars(&[1, 2, 3][..]);
}
When the user compiles this, they will see the following:
error[E0277]: the trait bound `&[{integer}]: MyIterator<char>` is not satisfied
--> <anon>:14:5
|
14 | iterate_chars(&[1, 2, 3][..]);
| ^^^^^^^^^^^^^ an iterator over elements of type `char` cannot be built from a collection of type `&[{integer}]`
|
= help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
= note: required by `iterate_chars`
`rustc_on_unimplemented` also supports advanced filtering for better targeting of messages, as well as modifying specific parts of the error message. You can target the text of:
- the main error message (`message`)
- the label (`label`)
- an extra note (`note`)
For example, the following attribute
#[rustc_on_unimplemented(
    message="message",
    label="label",
    note="note"
)]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}
Would generate the following output:
error[E0277]: message
--> <anon>:14:5
|
14 | iterate_chars(&[1, 2, 3][..]);
| ^^^^^^^^^^^^^ label
|
= note: note
= help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
= note: required by `iterate_chars`
To allow more targeted error messages, it is possible to filter the application of these fields based on a variety of attributes when using `on`:
- `crate_local`: whether the code causing the trait bound to not be fulfilled is part of the user's crate. This is used to avoid suggesting code changes that would require modifying a dependency.
- Any of the generic arguments that can be substituted in the text can be referred to by name as well for filtering, like `Rhs="i32"`, except for `Self`.
- `_Self`: to filter only on a particular calculated trait resolution, like `Self="std::iter::Iterator<char>"`. This is needed because `Self` is a keyword which cannot appear in attributes.
- `direct`: user-specified rather than derived obligation.
- `from_method`: usable both as boolean (whether the flag is present, like `crate_local`) or matching against a particular method. Currently used for `try`.
- `from_desugaring`: usable both as boolean (whether the flag is present) or matching against a particular desugaring. The desugaring is identified with its variant name in the `DesugaringKind` enum.
For example, the `Iterator` trait can be annotated in the following way:
#[rustc_on_unimplemented(
    on(
        _Self="&str",
        note="call `.chars()` or `.as_bytes()` on `{Self}`"
    ),
    message="`{Self}` is not an iterator",
    label="`{Self}` is not an iterator",
    note="maybe try calling `.iter()` or a similar method"
)]
pub trait Iterator {}
Which would produce the following outputs:
error[E0277]: `Foo` is not an iterator
--> src/main.rs:4:16
|
4 | for foo in Foo {}
| ^^^ `Foo` is not an iterator
|
= note: maybe try calling `.iter()` or a similar method
= help: the trait `std::iter::Iterator` is not implemented for `Foo`
= note: required by `std::iter::IntoIterator::into_iter`
error[E0277]: `&str` is not an iterator
--> src/main.rs:5:16
|
5 | for foo in "" {}
| ^^ `&str` is not an iterator
|
= note: call `.chars()` or `.as_bytes()` on `&str`
= help: the trait `std::iter::Iterator` is not implemented for `&str`
= note: required by `std::iter::IntoIterator::into_iter`
If you need to filter on multiple attributes, you can use `all`, `any` or `not` in the following way:
#[rustc_on_unimplemented(
    on(
        all(_Self="&str", T="std::string::String"),
        note="you can coerce a `{T}` into a `{Self}` by writing `&*variable`"
    )
)]
pub trait From<T>: Sized { /* ... */ }
Lints
This page documents some of the machinery around lint registration and how we run lints in the compiler.
The `LintStore` is the central piece of infrastructure, around which everything rotates. It's not available during the early parts of compilation (i.e., before the `TyCtxt` is created) in most code, as we need to fill it in with all of the lints, which can only happen after plugin registration.
Lints vs. lint passes
There are two parts to the linting mechanism within the compiler: lints and lint passes. Unfortunately, a lot of the documentation we have refers to both of these as just "lints."
First, we have the lint declarations themselves: this is where the name and default lint level and other metadata come from. These are normally defined by way of the `declare_lint!` macro, which boils down to a static with type `&rustc::lint::Lint`. Today we lint against direct declarations without the use of the macro (though this may change in the future, as the macro is somewhat unwieldy to add new fields to, like all macros by example).
Lint declarations don't carry any "state" - they are merely global identifiers and descriptions of lints. We assert at runtime that they are not registered twice (by lint name).
Lint passes are the meat of any lint. Notably, there is not a one-to-one relationship between lints and lint passes; a lint might not have any lint pass that emits it, it could have many, or just one -- the compiler doesn't track whether a pass is in any way associated with a particular lint, and frequently lints are emitted as part of other work (e.g., type checking, etc.).
Registration
High-level overview
The lint store is created and all lints are registered during plugin registration, in `rustc_interface::register_plugins`. There are three 'sources' of lints: the internal lints, plugin lints, and the `rustc_interface::Config` `register_lints` callback. All are registered here, in `register_plugins`.
Once the registration is complete, we "freeze" the lint store by placing it in an `Lrc`. Later in the driver, it's passed into the `GlobalCtxt` constructor, where it lives in an immutable form from then on.
Lints are registered via the `LintStore::register_lint` function. This should happen just once for any lint, or an ICE will occur.
Lint passes are registered separately into one of the categories (pre-expansion, early, late, late module). Passes are registered as a closure -- i.e., `impl Fn() -> Box<dyn X>`, where `dyn X` is either an early or late lint pass trait object. When we run the lint passes, we run the closure and then invoke the lint pass methods, which take `&mut self` -- lint passes can keep track of state internally.
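For instance, registering the `WHILE_TRUE` lint and its pass from the earlier example would look roughly like this (a sketch; the exact `LintStore` method names have shifted over time):
// Register the lint itself, then register the pass as a constructor closure.
lint_store.register_lints(&[&WHILE_TRUE]);
lint_store.register_late_pass(|| Box::new(WhileTrue));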
Internal lints
Note that these include both rustc-internal lints and the traditional lints, like, for example, the dead code lint.
These are primarily described in two places: `rustc::lint::builtin` and `rustc_lint::builtin`. The first provides the definitions for the lints themselves, and the latter provides the lint pass definitions (and implementations).
The internal lint registration happens in the `rustc_lint::register_builtins` function, along with the `rustc_lint::register_internals` function. More generally, the LintStore "constructor" function, which is the way to get a `LintStore` in the compiler (you should not construct it directly), is `rustc_lint::new_lint_store`; it calls the registration functions.
Plugin lints
This is one of the primary use cases remaining for plugins/drivers. Plugins are given access to the mutable `LintStore` during registration, to call any functions they need on the `LintStore`, just like rustc code. Plugins are intended to declare lints with the `plugin` field set to true (e.g., by way of the `declare_tool_lint!` macro), but this is purely for diagnostics and help text; otherwise plugin lints are mostly just as first class as rustc builtin lints.
Driver lints
These are the lints provided by drivers via the `rustc_interface::Config` `register_lints` field, which is a callback. Drivers should, if they find it already set, call the function currently set within the callback they add. The best way for drivers to get access to this is by overriding the `Callbacks::config` function, which gives them direct access to the `Config` structure.
Compiler lint passes are combined into one pass
Within the compiler, for performance reasons, we usually do not register dozens of lint passes. Instead, we have a single lint pass of each variety (e.g. `BuiltinCombinedModuleLateLintPass`) which will internally call all of the individual lint passes; this is because we then get the benefits of static over dynamic dispatch for each of the (often empty) trait methods.
Ideally, we'd not have to do this, since it certainly adds to the complexity of understanding the code. However, with the current type-erased lint store approach, it is beneficial to do so for performance reasons.
New lints being added likely want to join one of the existing declarations like `late_lint_mod_passes` in `librustc_lint/lib.rs`, which would then auto-propagate into the other.
Diagnostic Codes
We generally try to assign each error message a unique code like `E0123`. These codes are defined in the compiler in the `diagnostics.rs` files found in each crate, which basically consist of macros. The codes come in two varieties: those that have an extended write-up, and those that do not. Whenever possible, if you are making a new code, you should write an extended write-up.
Allocating a fresh code
If you want to create a new error, you first need to find the next available code. This is a bit tricky since the codes are defined in various crates. To do it, run this obscure command:
./x.py test --stage 0 tidy
This will invoke the tidy script, which generally checks that your code obeys our coding conventions. One of those jobs is to check that diagnostic codes are indeed unique. Once it is finished with that, tidy will print out the lowest unused code:
...
tidy check (x86_64-apple-darwin)
* 470 error codes
* highest error code: E0591
...
Here we see the highest error code in use is `E0591`, so we probably want `E0592`. To be sure, run `rg E0592` and check; you should see no references.
Next, open `src/{crate}/diagnostics.rs` within the crate where you wish to issue the error (e.g., `src/librustc_typeck/diagnostics.rs`). Ideally, you will add the code (in its proper numerical order) into the `register_long_diagnostics!` macro, sort of like this:
register_long_diagnostics! {
    ...
    E0592: r##"
Your extended error text goes here!
"##,
}
But you can also add it without an extended description:
register_diagnostics! {
    ...
    E0592, // put a description here
}
To actually issue the error, you can use the `struct_span_err!` macro:
struct_span_err!(self.tcx.sess, // some path to the session here
                 span,          // whatever span in the source you want
                 E0592,         // your new error code
                 &format!("text of the error"))
    .emit() // actually issue the error
If you want to add notes or other snippets, you can invoke methods before you call `.emit()`:
struct_span_err!(...)
    .span_label(another_span, "something to label in the source")
    .span_note(another_span, "some separate note, probably avoid these")
    .emit()
ICE-breakers
The ICE-breaker groups are an easy way to help out with rustc in a "piece-meal" fashion, without committing to a larger project. ICE-breaker groups are easy to join (just submit a PR!) and joining does not entail any particular commitment.
Once you join an ICE-breaker group, you will be added to a list that receives pings on github whenever a new issue is found that fits the ICE-breaker group's criteria. If you are interested, you can then claim the issue and start working on it.
Of course, you don't have to wait for new issues to be tagged! If you prefer, you can use the Github label for an ICE-breaker group to search for existing issues that haven't been claimed yet.
What issues are a good fit for ICE-breaker groups?
"ICE-breaker issues" are intended to be isolated bugs of middle priority:
- By isolated, we mean that we do not expect large-scale refactoring to be required to fix the bug.
- By middle priority, we mean that we'd like to see the bug fixed, but it's not such a burning problem that we are dropping everything else to fix it. The danger with such bugs, of course, is that they can accumulate over time, and the role of the ICE-breaker groups is to try and stop that from happening!
Joining an ICE-breaker group
To join an ICE-breaker group, you just have to open a PR adding your Github username to the appropriate file in the Rust team repository. See the "example PRs" below to get a precise idea and to identify the file to edit.
Also, if you are not already a member of a Rust team then -- in addition to adding your name to the file -- you have to check out the repository and run the following command:
cargo run add-person $your_user_name
Example PRs:
- Example of adding yourself to the LLVM ICE-breakers.
- Example of adding yourself to the Cleanup Crew ICE-breakers.
Tagging an issue for an ICE-breaker group
To tag an issue as appropriate for an ICE-breaker group, you give rustbot a `ping` command with the name of the ICE-breakers team. For example:
@rustbot ping icebreakers-llvm
@rustbot ping icebreakers-cleanup-crew
To make these commands shorter and easier to remember, there are aliases, defined in the `triagebot.toml` file. For example:
@rustbot ping llvm
@rustbot ping cleanup
Keep in mind that these aliases are meant to make humans' life easier. They might be subject to change. If you need to ensure that a command will always be valid, prefer the full invocations over the aliases.
Note though that this should only be done by compiler team members or contributors, and is typically done as part of compiler team triage.
Cleanup Crew
Github Label: ICEBreaker-Cleanup-Crew
The "Cleanup Crew" are focused on improving bug reports. Specifically, the goal is to try to ensure that every bug report has all the information that will be needed for someone to fix it:
- a minimal, standalone example that shows the problem
- links to duplicates or related bugs
- if the bug is a regression (something that used to work, but no longer does), then a bisection to the PR or nightly that caused the regression
This kind of cleanup is invaluable in getting bugs fixed. Better still, it can be done by anybody who knows Rust, without any particularly deep knowledge of the compiler.
Let's look a bit at the workflow for doing "cleanup crew" actions.
Finding a minimal, standalone example
Here the ultimate goal is to produce an example that reproduces the same problem but without relying on any external crates. Such a test ought to contain as little code as possible, as well. This will make it much easier to isolate the problem.
However, even if the "ultimate minimal test" cannot be achieved, it's still useful to post incremental minimizations. For example, if you can eliminate some of the external dependencies, that is helpful, and so forth.
It's particularly useful to reduce to an example that works in the Rust playground, rather than requiring people to check out a cargo build.
There are many resources for how to produce minimized test cases. Here are a few:
- The rust-reduce tool can try to reduce code automatically.
- The C-reduce tool also works on Rust code, though it requires that you start from a single file. (XXX link to some post explaining how to do it?)
- pnkfelix's Rust Bug Minimization Patterns blog post
- This post focuses on "heavy bore" techniques, where you are starting with a large, complex cargo project that you wish to narrow down to something standalone.
Links to duplicate or related bugs
If you are on the "Cleanup Crew", you will sometimes see multiple bug reports that seem very similar. You can link one to the other just by mentioning the other bug number in a Github comment. Sometimes it is useful to close duplicate bugs. But if you do so, you should always copy any test case from the bug you are closing to the other bug that remains open, as sometimes duplicate-looking bugs will expose different facets of the same problem.
Bisecting regressions
For regressions (something that used to work, but no longer does), it is super useful if we can figure out precisely when the code stopped working. The gold standard is to be able to identify the precise PR that broke the code, so we can ping the author, but even narrowing it down to a nightly build is helpful, especially as that then gives us a range of PRs. (One other challenge is that we sometimes land "rollup" PRs, which combine multiple PRs into one.)
cargo-bisect-rustc
To help in figuring out the cause of a regression we have a tool called cargo-bisect-rustc. It will automatically download and test various builds of rustc. For recent regressions, it is even able to use the builds from our CI to track down the regression to a specific PR; for older regressions, it will simply identify a nightly.
To learn to use cargo-bisect-rustc, check out this blog
post, which gives a quick introduction to how it works. You
can also ask questions at the Zulip stream
#t-compiler/cargo-bisect-rustc
, or help in improving the tool.
Identifying the range of PRs in a nightly
If the regression occurred more than 90 days ago, then cargo-bisect-rustc will not be able to identify the particular PR that caused the regression, just the nightly build. In that case, we can identify the set of PRs that this corresponds to by using the git history.
The command `rustc +nightly -vV` will cause rustc to output a number of useful bits of version info, including the `commit-hash`. Given the commit-hash of two nightly versions, you can find all of the PRs that have landed in between by taking the following steps:
- Go to an updated checkout of the rust-lang/rust repository
- Execute the command `git log --author=bors --format=oneline SHA1..SHA2`
- This will list out all of the commits by bors, which is our merge bot
- Each commit corresponds to one PR, and information about the PR should be in the description
- Copy and paste that information into the bug report
Often, just eye-balling the PR descriptions (which are included in the commit messages) will give you a good idea which one likely caused the problem. But if you're unsure, feel free to just ping the compiler team (`@rust-lang/compiler`) or else to ping the authors of the PR themselves.
LLVM ICE-breakers
Github Label: ICEBreaker-LLVM
The "LLVM ICE-breakers" are focused on bugs that center around LLVM. These bugs often arise because of LLVM optimizations gone awry, or as the result of an LLVM upgrade. The goal here is:
- to determine whether the bug is a result of us generating invalid LLVM IR, or LLVM misoptimizing;
- if the former, to fix our IR;
- if the latter, to try and file a bug on LLVM (or identify an existing bug).
Helpful tips and options
The "Debugging LLVM" section of the rustc-dev-guide gives a step-by-step process for how to help debug bugs caused by LLVM. In particular, it discusses how to emit LLVM IR, run the LLVM IR optimization pipelines, and so forth. You may also find it useful to look at the various codegen options listed under `-Chelp` and the internal options under `-Zhelp` -- there are a number that pertain to LLVM (just search for LLVM).
If you do narrow to an LLVM bug
The "Debugging LLVM" section also describes what to do once you've identified the bug.
rust-lang/rust Licenses
The `rustc` compiler source and standard library are dual licensed under the Apache License v2.0 and the MIT License, unless otherwise specified.
Detailed licensing information is available in the COPYRIGHT document of the rust-lang/rust repository.
Part 2: High-level Compiler Architecture
The remaining parts of this guide discuss how the compiler works. They go through everything from the high-level structure of the compiler to how each stage of compilation works. These chapters should be friendly both to readers interested in the end-to-end process of compilation and to readers who want to learn about the specific system they wish to contribute to. If anything is unclear, feel free to open an issue on the rustc-dev-guide repository or contact the compiler team, as described in the corresponding chapter of Part 1.
In this part, we look specifically at the high-level architecture of the compiler. In particular, we examine the query system, incremental compilation, and interning. These are three overarching design choices that affect the whole compiler.
Overview of the Compiler
Coming soon! Work is in progress on this chapter. See https://github.com/rust-lang/rustc-dev-guide/pull/633 for the source and the project README for local build instructions.
High-level overview of the compiler source
Crate structure
The main Rust repository consists of a `src` directory, under which there live many crates. These crates contain the source for the standard library and the compiler. This document, of course, focuses mainly on the latter.
Rustc consists of a number of crates, including `rustc_ast`, `rustc`, `rustc_target`, `rustc_codegen`, `rustc_driver`, and many more. The source for each crate can be found in a directory like `src/libXXX`, where `XXX` is the crate name.
(N.B. The names and divisions of these crates are not set in stone and may change over time. For the time being, we tend towards a finer-grained division to help with compilation time, though as incremental compilation improves, that may change.)
The dependency structure of these crates is roughly diamond-shaped:
rustc_driver
/ | \
/ | \
/ | \
/ v \
rustc_codegen rustc_borrowck ... rustc_metadata
\ | /
\ | /
\ | /
\ v /
rustc
|
v
rustc_ast
/ \
/ \
rustc_span rustc_builtin_macros
The `rustc_driver` crate, at the top of this lattice, is effectively the "main" function for the Rust compiler. It doesn't have much "real code", but instead ties together all of the code defined in the other crates and defines the overall flow of execution. (However, as we transition more and more to the query model, the "flow" of compilation is becoming less centrally defined.)
At the other extreme, the `rustc` crate defines the common and pervasive data structures that all the rest of the compiler uses (e.g. how to represent types, traits, and the program itself). It also contains some amount of the compiler itself, although that is relatively limited.
Finally, all the crates in the bulge in the middle define the bulk of the compiler -- they all depend on `rustc`, so that they can make use of the various types defined there, and they export public routines that `rustc_driver` will invoke as needed (increasingly, what these crates export are "query definitions", but those are covered later on).
Below `rustc` lie the various crates that make up the parser and the error reporting mechanism. They are also an internal part of the compiler (although they do get used by some other crates as well; we would like to gradually phase that out).
The main stages of compilation
The Rust compiler is in a bit of a transitional period. It used to be a purely "pass-based" compiler, where we ran a number of passes over the entire program, each of which performed a particular transformation. We are gradually replacing this pass-based code with an alternative setup based on on-demand queries. In the query model, we work backwards from the end result: we execute a query that expresses our ultimate goal (e.g. "compile this crate"). This query in turn may make other queries (e.g. "get me a list of all modules in the crate"). Those queries make other queries that eventually bottom out in basic operations, like parsing the input, running the type checker, and so forth. This on-demand model permits us to do exciting things like only do the minimal amount of work needed to type-check a single function. It also helps with incremental compilation. (For details on defining queries, check out the query model.)
Regardless of whether they are done via passes or queries, the basic operations the compiler must perform are the same. The only thing that changes is whether these operations are invoked front-to-back or on demand. In order to compile a Rust crate, these are the general steps that we take:
- Parsing input
  - this processes the `.rs` files and produces the AST ("abstract syntax tree")
  - the AST is defined in `src/librustc_ast/ast.rs`. It is intended to match the lexical syntax of the Rust language quite closely.
- Name resolution, macro expansion, and configuration
  - once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also handles `#[cfg]` nodes, and hence may strip things out of the AST as well.
- Lowering to HIR
  - once name resolution completes, we convert the AST into the HIR, or "high-level intermediate representation". The HIR is defined in `src/librustc_middle/hir/`; that module also includes the lowering code.
  - the HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for the analyses that follow. It is not required to match the syntax of the Rust language.
  - as a simple example: in the AST, we preserve the parentheses that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse into distinct trees, even though they are equivalent. In the HIR, however, parenthesis nodes are removed, and those two expressions are represented in the same way.
- Type-checking and subsequent analyses
  - an important step in processing the HIR is to perform type checking. This process assigns types to every HIR expression, and is also responsible for resolving some "type-dependent" paths, such as field accesses (`x.f` -- we can't know what field `f` is being accessed until we know the type of `x`) and associated types (`T::Item` -- we can't know what type `Item` is until we know what `T` is).
  - type checking creates "side-tables" (`TypeckTables`) that include the types of expressions, the way methods were resolved, and so forth.
  - after type-checking is complete, we can do other analyses, such as privacy checking.
- Lowering to MIR and post-processing
  - once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust, well suited to borrow checking and also certain high-level optimizations.
- Translation to LLVM and LLVM optimizations
  - from MIR, we can produce LLVM IR.
  - LLVM then runs its various optimizations, which produces a number of `.o` files (one for each "codegen unit").
- Linking
  - finally, those `.o` files are linked together.
Queries: demand-driven compilation
As described in the high-level overview of the compiler, the Rust compiler is currently transitioning from a traditional "pass-based" setup to a "demand-driven" system. The compiler query system is the key to our new demand-driven organization.
The idea behind it is pretty simple. You have various queries that compute things about the input -- for example, there is a query called `type_of(def_id)` that, given the def-id of some item, will compute the type of that item and return it to you.
Query execution is memoized -- so the first time you invoke a query, it will do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you invoke a query, the result may be returned to you by loading stored data from disk (but that's a separate topic we won't discuss further here).
The overall vision is that, eventually, the entire compiler control-flow will be query driven. There will effectively be one top-level query ("compile") that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:
- this "compile" query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM);
- but computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source;
- that query in turn would invoke something asking for the HIR;
- this keeps going further and further back until we wind up doing the actual parsing.
However, that vision is not fully realized yet. Still, big chunks of the compiler (for example, generating MIR) work exactly like this.
Incremental compilation in detail
The incremental compilation in detail chapter gives an in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read.
Invoking queries
Invoking a query is simple. The tcx ("type context") offers a method for each defined query. So, for example, to invoke the `type_of` query, you would just do this:
let ty = tcx.type_of(some_def_id);
How the compiler executes a query
So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache -- if your query has already been executed, then we will simply clone the return value out of the cache and return it (therefore, you should try to ensure that the return types of queries are cheaply cloneable; insert an `Rc` if necessary).
Providers
If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere, and which contains the code to compute the result of the query.
Providers are defined per crate. The compiler maintains, internally and at least conceptually, a table of providers for every crate. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and the providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key. For example, when you invoke `tcx.type_of(def_id)`, that could be a local query or an external query, depending on what crate the `def_id` is referring to (see the `self::keys::Key` trait for more information on how that works).
Providers always have the same signature:
fn provider<'tcx>(
    tcx: TyCtxt<'tcx>,
    key: QUERY_KEY,
) -> QUERY_RESULT {
    ...
}
Providers take two arguments: the `tcx` and the query key. They return the result of the query.
How providers are set up
When the tcx is created, its creator supplies the providers using a `Providers` struct. This struct is generated by macros, but it is basically a big list of function pointers:
struct Providers {
    type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
    ...
}
At present, we have one copy of the struct for local crates and one for all external crates, though the plan is that we may eventually have one per crate.
These `Providers` structs are ultimately created and populated by `librustc_driver`, but it does this by distributing the work throughout the other `rustc_*` crates. This is done by invoking various `provide` functions. These functions look something like this:
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
That is, they take an `&mut Providers` and mutate it in place. Usually we use the formulation above just because it looks nice, but you could as well do `providers.type_of = type_of`, which would be equivalent. (Here, `type_of` would be a top-level function, defined as we saw before.)
So, if we want to add a provider for some other query, let's call it `fubar`, in the crate above, we might modify the `provide()` function like so:
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
N.B. Most of the `rustc_*` crates only provide local providers. Almost all extern providers wind up going through the `rustc_metadata` crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a `provide` and a `provide_extern` function that `rustc_driver` can invoke.
Adding a new kind of query
So suppose you want to add a new kind of query, how do you go about doing so? Well, defining a query takes place in two steps:
1. first, you have to specify the query name and arguments; and then,
2. you have to supply query providers where needed.
To specify the query name and arguments, you simply add an entry to the big macro invocation in `src/librustc_middle/query/mod.rs`, which looks something like:
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            cache { key.is_local() }
        }
    }

    ...
}
Queries are grouped into categories (`Other`, `Codegen`, `TypeChecking`, etc.). Each group contains one or more queries. Each query definition is broken up like this:
query type_of(key: DefId) -> Ty<'tcx> { ... }
^^    ^^^^^^^      ^^^^^     ^^^^^^^^   ^^^
|     |            |         |          |
|     |            |         |          query modifiers
|     |            |         result type of query
|     |            query key type
|     name of query
query keyword
Let's go over them one by one:
- Query keyword: indicates the start of a query definition.
- Name of query: the name of the query method (`tcx.type_of(..)`). Also used as the name of a struct (`ty::queries::type_of`) that will be generated to represent this query.
- Query key type: the type of the argument to this query. This type must implement the `ty::query::keys::Key` trait, which defines (for example) how to map it to a crate, and so forth.
- Result type of query: the type produced by this query. This type should (a) not use `RefCell` or other interior mutability and (b) be cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for non-trivial data types.
  - The one exception to those rules is the `ty::steal::Steal` type, which is used to cheaply modify MIR in place. See the definition of `Steal` for more details. New uses of `Steal` should not be added without alerting `@rust-lang/compiler`.
- Query modifiers: various flags and options that customize how the query is processed.
So, to add a query:
- Add an entry to `rustc_queries!` using the format above.
- Link the provider by modifying the appropriate `provide` method; or add a new one if needed, and ensure that `rustc_driver` is invoking it.
Query structs and descriptions
For each kind of query, the `rustc_queries` macro will generate a "query struct" named after the query. This struct is a kind of placeholder describing the query. Each such struct implements the `self::config::QueryConfig` trait, which has associated types for the key/value of that particular query. Basically the generated code looks something like this:
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;

    const NAME: QueryName = QueryName::type_of;
    const CATEGORY: ProfileCategory = ProfileCategory::Other;
}
There is an additional trait that you may wish to implement, called `self::config::QueryDescription`. It is used during cycle errors to give a "human readable" name for the query, so that we can summarize what was happening when the cycle occurred. Implementing this trait is optional if the query key is `DefId`, but if you don't implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.def_path_str(key))
    }
}
Another option is to add a `desc` modifier:
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
        }
    }
}
The `rustc_queries` macro will generate an appropriate `impl` automatically.
The query evaluation model in detail
This chapter provides a deeper dive into the abstract model queries are built on. It does not go into implementation details but tries to explain the underlying logic. The examples here, therefore, have been stripped down and simplified and don't directly reflect the compiler's internal APIs.
What is a query?
Abstractly, we view the compiler's knowledge about a given crate as a "database", and queries are the way of asking the compiler questions about it, i.e. we "query" the compiler's "database" for facts.
However, there's something special about this compiler database: it starts out empty and is filled on demand when queries are executed. Consequently, a query must know how to compute its result if the database does not contain it yet. For doing so, it can access other queries and certain input values that the database is pre-filled with on creation.
A query thus consists of the following things:
- a name that identifies the query,
- a "key" that specifies what we want to look up,
- a result type that specifies what kind of result it yields, and
- a "provider", which is a function that specifies how the result is to be computed if it isn't already present in the database.
As an example, the name of the `type_of` query is `type_of`, its query key is a `DefId` identifying the item we want to know the type of, the result type is `Ty<'tcx>`, and the provider is a function that, given the query key and access to the rest of the database, can compute the type of the item identified by the key.
So in some sense a query is just a function that maps the query key to the corresponding result. However, we have to apply some restrictions in order for this to be sound:
- The key and result must be immutable values.
- The provider function must be a pure function, in the sense that for the same key it must always yield the same result.
- The only parameters a provider function takes are the key and a reference to the "query context" (which provides access to the rest of the "database").
The database is built up lazily by invoking queries. The providers will invoke other queries, the results of which either are already cached or are computed by calling another provider. These provider invocations conceptually form a directed acyclic graph (DAG), at the leaves of which are the input values that are already known when the query context is created.
Caching/Memoization
Results of query invocations are "memoized", which means that the query context will cache the result in an internal table and, when the query is invoked with the same query key again, will return the result from the cache instead of running the provider again.
This caching is crucial for making the query engine efficient. Without memoization the system would still be sound (that is, it would yield the same results), but the same computations would be done over and over again.
Memoization is one of the main reasons why query providers have to be pure functions. If calling a provider function could yield different results for each invocation (because it accesses some global mutable state), then we could not memoize the result.
Input data
When the query context is created, it is still empty: no queries have been executed, and no results are cached. But the context already provides access to "input" data, i.e. pieces of immutable data that were computed before the context was created and that queries can access to do their computations. Currently this input data consists mainly of the HIR map, upstream crate metadata, and the command-line options the compiler was invoked with. In the future, inputs will just consist of command-line options and a list of source files -- the HIR map will itself be provided by a query which processes those source files.
Without inputs, queries would live in a void without anything to compute their results from (remember, query providers only have access to other queries and the context, but not to any other outside state or information).
For a query provider, input data and results of other queries look exactly the same: it just tells the context "give me the value of X". Because input data is immutable, the provider can rely on it being the same across different query invocations, just as is the case for query results.
An example execution trace of some queries
How does this DAG of query invocations come into existence? At some point the compiler driver will create the, as yet empty, query context. It will then, from outside of the query system, invoke the queries it needs to perform its task. This looks something like the following:
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
The `type_check_crate` query provider would look something like this:
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}
We see that the `type_check_crate` query accesses input data (`tcx.hir_map.list_of_items()`) and invokes other queries (`type_check_item`). The `type_check_item` invocations will themselves access input data and/or invoke other queries, so that in the end the DAG of query invocations is built up backwards from the node that was initially executed:
(2) (1)
list_of_all_hir_items <----------------------------- type_check_crate()
|
(5) (4) (3) |
Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
| |
+-----------------+ |
| |
(7) v (6) (8) |
Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
// (x) denotes invocation order
我们还看到通常可以从缓存中读取查询结果:type_check_item(foo)
调用时已经计算出了type_of(bar)
,
因此当type_check_item(bar)
需要它时,它已经在缓存中了。
只要上下文存在,查询结果就会保留在查询上下文中。 因此,如果编译器驱动程序稍后调用另一个查询,则上面的图将仍然存在,并且已经执行的查询将不必重新执行。
环
前面我们曾说过,查询调用构成了DAG。 但是,类似如下查询的provider很容易导致形成有环图:
fn cyclic_query_provider(tcx, key) -> u32 {
// Invoke the same query with the same key again
tcx.cyclic_query(key)
}
Since query providers are regular functions, this would behave much as expected: evaluation would get stuck in an infinite recursion. A query like the one above would not be very useful either. However, sometimes certain kinds of invalid user input can cause queries to be invoked in a cyclic way. The query engine includes a check for cyclic invocations, and, because cycles are an irrecoverable error, will abort execution with a "cycle error" message that tries to be as human readable as possible.
"窃取" 查询
一些查询的结果包装在Steal<T>
结构中。
这些查询的行为与常规查询完全相同,但有一个例外:它们的结果有时是从缓存中“窃取”来的,这意味着程序的其他部分正在拥有该所有权,并且该结果无法再访问。
这种窃取机制纯粹是作为性能优化而存在的,因为某些结果值的克隆成本太高(例如,函数的MIR)。 结果窃取似乎违反了查询结果必须是不可变的条件(毕竟,我们将结果值移出了缓存),但是只要无法观察到该突变就可以。这可以通过两件事来实现:
- 在结果被窃取之前,我们确保eager地运行所有可能需要读取该结果的查询。必须通过手动调用这些查询来完成此操作。
- 每当查询尝试访问被窃取的结果时,我们都会使编译器ICE,以使这种情况不会被忽略。
由于需要手动干预,因此这不是理想的工作方式,因此应谨慎使用它,并且仅在众所周知哪些查询可以访问给定结果的情况下使用。 然而,实际上,窃取并没有成为很大的维护负担。
总结一下:“窃取查询”以受控方式破坏了一些规则。但是有检查确保不会悄悄地出错。
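The following is a simplified, single-threaded sketch of what such a `Steal`-like wrapper could look like (the compiler's real version lives in `rustc_data_structures` and is thread-safe; the details here are illustrative only):
use std::cell::{Ref, RefCell};

struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    // Read access; panics (an ICE in the compiler) if already stolen.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read from stolen value")
        })
    }

    // Take ownership of the value; any later `borrow` will fail loudly
    // instead of silently observing the mutation.
    fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }
}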
Parallel query execution
The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much effort:
- All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
- Query results are required to be immutable, so different threads can safely use them concurrently.
The nightly compiler already implements parallel query evaluation as follows: when a query `foo` is evaluated, the cache table for `foo` is locked.
- If there already is a result, we can clone it, release the lock, and we are done.
- If there is no cache entry and no other active query invocation computing the same result, we mark the key as "in progress", release the lock, and start evaluating.
- If there is another query invocation for the same key in progress, we release the lock and just block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG. Some thread will always be able to make progress.
A schematic sketch of these cache states follows.
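All names below are hypothetical; the real synchronization code is considerably more involved:
// Each key in a query's cache table is in one of these states while the
// table's lock is held:
enum QueryCacheEntry<V> {
    // A finished result: clone it, release the lock, done.
    Done(V),
    // Some other thread is computing this key: release the lock and
    // block until that thread publishes the result.
    InProgress,
}
// A missing entry means: insert `InProgress` for the key, release the
// lock, and start evaluating the provider on this thread.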
Incremental compilation
The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the "basic algorithm" – and then describe some possible improvements.
The basic algorithm
The basic algorithm is called the red-green algorithm [1]. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).
On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:
- If a query is colored red, that means that its result during this compilation has changed from the previous compilation.
- If a query is colored green, that means that its result is the same as the previous compilation.
There are two key insights here:
- First, if all the inputs to query Q are colored green, then the query Q must result in the same value as last time and hence need not be re-executed (or else the compiler is not deterministic).
- Second, even if some inputs to a query change, it may be that it still produces the same result as the previous compilation. In particular, the query may only use part of its input.
- Therefore, after executing a query, we always check whether it produced the same result as the previous time. If it did, we can still mark the query as green, and hence avoid re-executing dependent queries (see the sketch below).
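This final check can be sketched as follows, with hypothetical names; the compiler actually compares 128-bit fingerprints of the query results, described in a later chapter:
#[derive(Clone, Copy, PartialEq)]
enum Color { Red, Green }

// After re-executing query Q, compare the new result's hash with the
// hash saved from the previous compilation.
fn color_after_reexecution(prev_hash: u128, new_hash: u128) -> Color {
    if prev_hash == new_hash { Color::Green } else { Color::Red }
}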
The try-mark-green algorithm
At the core of incremental compilation is an algorithm called "try-mark-green". It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q's color may involve re-executing Q so that we can compare its output, but if all of Q's inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).
Try-mark-green works as follows:
- First check if the query Q was executed during the previous compilation.
  - If not, we can just re-execute the query as normal, and assign it the color of red.
  - If yes, then load the 'dependent queries' of Q.
- If there is a saved result, then we load the `reads(Q)` vector from the query DAG. The "reads" is the set of queries that Q executed during its execution.
  - For each query R in `reads(Q)`, we recursively demand the color of R using try-mark-green.
    - Note: it is important that we visit each node in `reads(Q)` in the same order as they occurred in the original compilation. See the section on the query DAG below.
    - If any of the nodes in `reads(Q)` wind up colored red, then Q is dirty.
      - We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation.
      - If the hash has not changed, we can mark Q as green and return.
    - Otherwise, all of the nodes in `reads(Q)` must be green. In that case, we can color Q as green and return.
The query DAG
The query DAG code is stored in `src/librustc_middle/dep_graph`. Construction of the DAG is done by instrumenting the query execution.
One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:
fn main_query(tcx) {
if tcx.subquery1() {
tcx.subquery2()
} else {
tcx.subquery3()
}
}
Now imagine that in the first compilation, `main_query` starts by executing `subquery1`, and this returns true. In that case, the next query `main_query` executes will be `subquery2`, and `subquery3` will not be executed at all.
But now imagine that in the next compilation, the input has changed such that `subquery1` returns false. In this case, `subquery2` would never execute. If try-mark-green were to visit `reads(main_query)` out of order, however, it might visit `subquery2` before `subquery1`, and hence execute it. This can lead to ICEs and other problems in the compiler.
Improvements to the basic algorithm
In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. In practice, what we would do is to save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.
This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so:
- Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result.
- Otherwise, execute Q.
  - We can then compare the hash of the result and color Q as green if it did not change (sketched below).
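In the pseudocode style used elsewhere in this chapter (all helper names here are made up), that flow looks roughly like:
fn query_result(tcx, q) -> Value {
    if let Some(saved) = tcx.on_disk_cache.saved_result(q) {
        // A saved result exists: compute the color of `q` first.
        if tcx.try_mark_green(q) {
            return saved.deserialize();
        }
    }
    // No saved result, or `q` could not be marked green: execute it.
    let result = tcx.run_provider(q);
    // Compare hashes so dependents of `q` may still be marked green.
    if tcx.previous_result_hash(q) == Some(tcx.hash_result(&result)) {
        tcx.mark_green(q);
    }
    result
}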
Resources
The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md, which expands on the memoization details and provides a more high-level overview of and motivation for this system.
Footnotes
[1] I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
Incremental Compilation In Detail
The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. It relies on the fact that:
- queries are pure functions -- given the same inputs, a query will always yield the same result, and
- the query model structures compilation in an acyclic graph that makes dependencies between individual computations explicit.
This chapter will explain how we can use these properties for making things incremental and then goes on to discuss various implementation issues.
A Basic Algorithm For Incremental Query Evaluation
As explained in the query evaluation model primer, query invocations form a directed-acyclic graph. Here's the example from the previous chapter again:
list_of_all_hir_items <----------------------------- type_check_crate()
|
|
Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
| |
+-----------------+ |
| |
v |
Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
Since every access from one query to another has to go through the query context, we can record these accesses and thus actually build this dependency graph in memory. With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph).
Now suppose we change the source code of our program so that the HIR of `bar` looks different than before. Our goal is to only recompute those queries that are actually affected by the change while re-using the cached results of all the other queries. Given the dependency graph we can do exactly that. For a given query invocation, the graph tells us exactly what data has gone into computing its results; we just have to follow the edges until we reach something that has changed. If we don't encounter anything that has changed, we know that the query still would evaluate to the same result we already have in our cache.
Taking the `type_of(foo)` invocation from above as an example, we can check whether the cached result is still valid by following the edges to its inputs. The only edge leads to `Hir(foo)`, an input that has not been affected by the change. So we know that the cached result for `type_of(foo)` is still valid.
The story is a bit different for `type_check_item(foo)`: we again walk the edges and already know that `type_of(foo)` is fine. Then we get to `type_of(bar)` which we have not checked yet, so we walk the edges of `type_of(bar)` and encounter `Hir(bar)` which has changed. Consequently the result of `type_of(bar)` might yield a different result than what we have in the cache and, transitively, the result of `type_check_item(foo)` might have changed too. We thus re-run `type_check_item(foo)`, which in turn will re-run `type_of(bar)`, which will yield an up-to-date result because it reads the up-to-date version of `Hir(bar)`.
The Problem With The Basic Algorithm: False Positives
If you read the previous paragraph carefully you'll notice that it says that `type_of(bar)` might have changed because one of its inputs has changed. There's also the possibility that it might still yield exactly the same result even though its input has changed. Consider an example with a simple query that just computes the sign of an integer:
IntValue(x) <---- sign_of(x) <--- some_other_query(x)
Let's say that `IntValue(x)` starts out as `1000` and then is set to `2000`. Even though `IntValue(x)` is different in the two cases, `sign_of(x)` yields the result `+` in both cases.
If we follow the basic algorithm, however, `some_other_query(x)` would have to (unnecessarily) be re-evaluated because it transitively depends on a changed input. Change detection yields a "false positive" in this case because it has to conservatively assume that `some_other_query(x)` might be affected by that changed input.
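Written as a provider in the style of the earlier examples (pseudocode; the `sign_of` query is hypothetical), the point is that the provider is a pure function whose output happens not to change here:
fn sign_of_provider(tcx, x) -> char {
    // Reads the input `IntValue(x)`.
    let value = tcx.int_value(x);
    if value >= 0 { '+' } else { '-' }
}
// `IntValue(x)` changing from 1000 to 2000 forces `sign_of(x)` to be
// re-evaluated, but the result is '+' both times. The basic algorithm
// nevertheless re-runs `some_other_query(x)`, because it only sees that
// a transitive input changed.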
Unfortunately it turns out that the actual queries in the compiler are full of examples like this and small changes to the input often potentially affect very large parts of the output binaries. As a consequence, we had to make the change detection system smarter and more accurate.
Improving Accuracy: The red-green Algorithm
The "false positives" problem can be solved by interleaving change detection and query re-evaluation. Instead of walking the graph all the way to the inputs when trying to find out if some cached result is still valid, we can check if a result has actually changed after we were forced to re-evaluate it.
We call this algorithm the red-green algorithm because nodes in the dependency graph are assigned the color green if we were able to prove that their cached result is still valid, and the color red if the result has turned out to be different after re-evaluating it.
The meat of red-green change tracking is implemented in the try-mark-green algorithm, which, you guessed it, tries to mark a given node as green:
fn try_mark_green(tcx, current_node) -> bool {
// Fetch the inputs to `current_node`, i.e. get the nodes that the direct
// edges from `node` lead to.
let dependencies = tcx.dep_graph.get_dependencies_of(current_node);
// Now check all the inputs for changes
for dependency in dependencies {
match tcx.dep_graph.get_node_color(dependency) {
Green => {
// This input has already been checked before and it has not
// changed; so we can go on to check the next one
}
Red => {
// We found an input that has changed. We cannot mark
// `current_node` as green without re-running the
// corresponding query.
return false
}
Unknown => {
// This is the first time we look at this node. Let's try
// to mark it green by calling try_mark_green() recursively.
if try_mark_green(tcx, dependency) {
// We successfully marked the input as green, on to the
// next.
} else {
// We could *not* mark the input as green. This means we
// don't know if its value has changed. In order to find
// out, we re-run the corresponding query now!
tcx.run_query_for(dependency);
// Fetch and check the node color again. Running the query
// has forced it to either red (if it yielded a different
// result than we have in the cache) or green (if it
// yielded the same result).
match tcx.dep_graph.get_node_color(dependency) {
Red => {
// The input turned out to be red, so we cannot
// mark `current_node` as green.
return false
}
Green => {
// Re-running the query paid off! The result is the
// same as before, so this particular input does
// not invalidate `current_node`.
}
Unknown => {
// There is no way a node has no color after
// re-running the query.
panic!("unreachable")
}
}
}
}
}
}
// If we have gotten through the entire loop, it means that all inputs
// have turned out to be green. If all inputs are unchanged, it means
// that the query result corresponding to `current_node` cannot have
// changed either.
tcx.dep_graph.mark_green(current_node);
true
}
// Note: The actual implementation can be found in
// src/librustc_middle/dep_graph/graph.rs
By using red-green marking we can avoid the devastating cumulative effect of having false positives during change detection. Whenever a query is executed in incremental mode, we first check if it's already green. If not, we run `try_mark_green()` on it. If it still isn't green after that, then we actually invoke the query provider to re-compute the result.
The Real World: How Persistence Makes Everything Complicated
The sections above described the underlying algorithm for incremental compilation, but because the compiler process exits after it has finished, taking the query context and its result cache with it into oblivion, we have to persist data to disk so that the next compilation session can make use of it. This comes with a whole new set of implementation challenges:
- The query result cache is stored to disk, so cached results are not readily available for change comparison.
- A subsequent compilation session will start off with a new version of the code that has arbitrary changes applied to it. All kinds of IDs and indices that are generated from a global, sequential counter (e.g. `NodeId`, `DefId`, etc.) might have shifted, making the persisted results on disk not immediately usable anymore, because the same numeric IDs and indices might refer to completely new things in the new compilation session.
- Persisting things to disk comes at a cost, so not every tiny piece of information should actually be cached in between compilation sessions. Fixed-sized, plain-old-data is preferred to complex things that need to run through an expensive (de-)serialization step.
The following sections describe how the compiler currently solves these issues.
A Question Of Stability: Bridging The Gap Between Compilation Sessions
As noted before, various IDs (like `DefId`) are generated by the compiler in a way that depends on the contents of the source code being compiled. ID assignment is usually deterministic, that is, if the exact same code is compiled twice, the same things will end up with the same IDs. However, if something changes, e.g. a function is added in the middle of a file, there is no guarantee that anything will have the same ID as it had before.
As a consequence we cannot represent the data in our on-disk cache the same way it is represented in memory. For example, if we just stored a piece of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do in memory) and then the contained `DefId` points to a different function in a new compilation session, we'd be in trouble.
The solution to this problem is to find "stable" forms for IDs which remain valid in between compilation sessions. For the most important case, `DefId`s, these are the so-called `DefPath`s. Each `DefId` has a corresponding `DefPath`, but in place of a numeric ID, a `DefPath` is based on the path to the identified item, e.g. `std::collections::HashMap`. The advantage of an ID like this is that it is not affected by unrelated changes. For example, one can add a new function to `std::collections` but `std::collections::HashMap` would still be `std::collections::HashMap`. A `DefPath` is "stable" across changes made to the source code while a `DefId` isn't.
There is also the `DefPathHash`, which is just a 128-bit hash value of the `DefPath`. The two contain the same information and we mostly use the `DefPathHash` because it is simpler to handle, being `Copy` and self-contained.
This principle of stable identifiers is used to make the data in the on-disk cache resilient to source code changes. Instead of storing a `DefId`, we store the `DefPathHash`, and when we deserialize something from the cache, we map the `DefPathHash` to the corresponding `DefId` in the current compilation session (which is just a simple hash table lookup).
The `HirId`, used for identifying HIR components that don't have their own `DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath` and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`) locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around, the `LocalId`s within it are still the same.
Checking Query Results For Changes: HashStable And Fingerprints
In order to do red-green-marking we often need to check if the result of a query has changed compared to the result it had during the previous compilation session. There are two performance problems with this though:
- We'd like to avoid having to load the previous result from disk just for doing the comparison. We already computed the new result and will use that. Also loading a result from disk will "pollute" the interners with data that is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For example, it would be wasted effort to persist things to disk that are already available in upstream crates.
The compiler avoids these problems by using so-called `Fingerprint`s. Each time a new query result is computed, the query engine will compute a 128 bit hash value of the result. We call this hash value "the `Fingerprint` of the query result". The hashing is (and has to be) done "in a stable way". This means that whenever something is hashed that might change in between compilation sessions (e.g. a `DefId`), we instead hash its stable equivalent (e.g. the corresponding `DefPath`). That's what the whole `HashStable` infrastructure is for. This way `Fingerprint`s computed in two different compilation sessions are still comparable.
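For instance, a sketch of how a `DefId` might be fed to the stable hasher (the method names here are assumptions for illustration, though `def_path_hash`-style lookups do exist in the compiler):
// Instead of hashing the unstable numeric `DefId`, hash its stable
// equivalent, so fingerprints from different sessions stay comparable.
fn hash_def_id_stable(hasher: &mut StableHasher, tcx, def_id: DefId) {
    let stable: DefPathHash = tcx.def_path_hash(def_id);
    hasher.write_u128(stable.0);
}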
The next step is to store these fingerprints along with the dependency graph. This is cheap since fingerprints are just bytes to be copied. It's also cheap to load the entire set of fingerprints together with the dependency graph.
Now, when red-green-marking reaches the point where it needs to check if a result has changed, it can just compare the (already loaded) previous fingerprint to the fingerprint of the new result.
This approach works rather well but it's not without flaws:
- There is a small possibility of hash collisions. That is, two different results could have the same fingerprint and the system would erroneously assume that the result hasn't changed, leading to a missed update. We mitigate this risk by using a high-quality hash function and a 128 bit wide hash value. Due to these measures the practical risk of a hash collision is negligible.
- Computing fingerprints is quite costly. It is the main reason why incremental compilation can be slower than non-incremental compilation. We are forced to use a good and thus expensive hash function, and we have to map things to their stable equivalents while doing the hashing.
A Tale Of Two DepGraphs: The Old And The New
The initial description of dependency tracking glosses over a few details that quickly become a head scratcher when actually trying to implement things. In particular it's easy to overlook that we are actually dealing with two dependency graphs: The one we built during the previous compilation session and the one that we are building for the current compilation session.
When a compilation session starts, the compiler loads the previous dependency graph into memory as an immutable piece of data. Then, when a query is invoked, it will first try to mark the corresponding node in the graph as green. This really means that we are trying to mark the node in the previous dep-graph as green that corresponds to the query key in the current session. How do we do this mapping between current query key and previous `DepNode`? The answer is again `Fingerprint`s: nodes in the dependency graph are identified by a fingerprint of the query key. Since fingerprints are stable across compilation sessions, computing one in the current session allows us to find a node in the dependency graph from the previous session. If we don't find a node with the given fingerprint, it means that the query key refers to something that did not yet exist in the previous session.
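A sketch of that lookup, with hypothetical names:
// The previous graph keeps an index from key fingerprints to its nodes.
fn previous_dep_node(prev_graph: &PreviousDepGraph,
                     fingerprint: Fingerprint) -> Option<DepNodeIndex> {
    // `None` means the query key did not exist in the previous session.
    prev_graph.index_by_fingerprint.get(&fingerprint).copied()
}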
So, having found the dep-node in the previous dependency graph, we can look up its dependencies (i.e. also dep-nodes in the previous graph) and continue with the rest of the try-mark-green algorithm. The next interesting thing happens when we successfully mark the node as green. At that point we copy the node and the edges to its dependencies from the old graph into the new graph. We have to do this because the new dep-graph cannot acquire the node and edges via the regular dependency tracking. The tracking system can only record edges while actually running a query -- but running the query, although we have the result already cached, is exactly what we want to avoid.
Once the compilation session has finished, all the unchanged parts have been copied over from the old into the new dependency graph, while the changed parts have been added to the new graph by the tracking system. At this point, the new graph is serialized out to disk, alongside the query result cache, and can act as the previous dep-graph in a subsequent compilation session.
Didn't You Forget Something?: Cache Promotion
The system described so far has a somewhat subtle property: If all inputs of a dep-node are green then the dep-node itself can be marked as green without computing or loading the corresponding query result. Applying this property transitively often leads to the situation that some intermediate results are never actually loaded from disk, as in the following example:
input(A) <-- intermediate_query(B) <-- leaf_query(C)
The compiler might need the value of leaf_query(C)
in order to generate some
output artifact. If it can mark leaf_query(C)
as green, it will load the
result from the on-disk cache. The result of intermediate_query(B)
is never
loaded though. As a consequence, when the compiler persists the new result
cache by writing all in-memory query results to disk, intermediate_query(B)
will not be in memory and thus will be missing from the new result cache.
If there subsequently is another compilation session that actually needs the
result of intermediate_query(B)
it will have to be re-computed even though we
had a perfectly valid result for it in the cache just before.
In order to prevent this from happening, the compiler does something called "cache promotion": Before emitting the new result cache it will walk all green dep-nodes and make sure that their query result is loaded into memory. That way the result cache doesn't unnecessarily shrink again.
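In pseudocode (the helper names are made up), cache promotion is a final pass over the green nodes:
fn promote_cache(tcx) {
    // Before serializing the new result cache, make sure every green
    // node's result is actually in memory, loading it from the old
    // on-disk cache if necessary.
    for node in tcx.dep_graph.green_nodes() {
        tcx.ensure_result_loaded(node);
    }
    // Now writing out all in-memory results cannot lose still-valid
    // entries like `intermediate_query(B)` above.
}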
Incremental Compilation and the Compiler Backend
The compiler backend, the part involving LLVM, uses the query system but is not implemented in terms of queries itself. As a consequence it does not automatically partake in dependency tracking. However, the manual integration with the tracking system is pretty straightforward. The compiler simply tracks what queries get invoked when generating the initial LLVM version of each codegen unit, which results in a dep-node for each of them. In subsequent compilation sessions it then tries to mark the dep-node for a CGU as green. If it succeeds, it knows that the corresponding object and bitcode files on disk are still valid. If it doesn't succeed, the entire codegen unit has to be recompiled.
This is the same approach that is used for regular queries. The main differences are:
- that we cannot easily compute a fingerprint for LLVM modules (because they are opaque C++ objects),
- that the logic for dealing with cached values is rather different from regular queries, because here we have bitcode and object files instead of serialized Rust values in the common result cache file, and
- that the operations around LLVM are so expensive in terms of computation time and memory consumption that we need tight control over what is executed when and what stays in memory for how long.
The query system could probably be extended with general purpose mechanisms to deal with all of the above but so far that seemed like more trouble than it would save.
Shortcomings of the Current System
There are many things that still can be improved.
Incrementality of on-disk data structures
The current system is not able to update on-disk caches and the dependency graph in-place. Instead it has to rewrite each file entirely in each compilation session. The overhead of doing so is a few percent of total compilation time.
Unnecessary data dependencies
Data structures used as query results could be factored in a way that removes edges from the dependency graph. Especially "span" information is very volatile, so including it in a query result increases the chance that the result won't be reusable. See https://github.com/rust-lang/rust/issues/47389 for more information.
Debugging and Testing Dependencies
Testing the dependency graph
There are various ways to write tests against the dependency graph.
The simplest mechanisms are the `#[rustc_if_this_changed]` and `#[rustc_then_this_would_need]` annotations. These are used in compile-fail tests to test whether the expected set of paths exist in the dependency graph. As an example, see `src/test/compile-fail/dep-graph-caller-callee.rs`.
The idea is that you can annotate a test like:
#[rustc_if_this_changed]
fn foo() { }
#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }
#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }
This will check whether there is a path in the dependency graph from `Hir(foo)` to `TypeckTables(bar)`. An error is reported for each `#[rustc_then_this_would_need]` annotation that indicates whether a path exists. `//~ ERROR` annotations can then be used to test if a path is found (as demonstrated above).
Debugging the dependency graph
Dumping the graph
The compiler is also capable of dumping the dependency graph for your debugging pleasure. To do so, pass the `-Z dump-dep-graph` flag. The graph will be dumped to `dep_graph.{txt,dot}` in the current directory. You can override the filename with the `RUST_DEP_GRAPH` environment variable.
Frequently, though, the full dep graph is quite overwhelming and not particularly helpful. Therefore, the compiler also allows you to filter the graph. You can filter in three ways:
- All edges originating in a particular set of nodes (usually a single node).
- All edges reaching a particular set of nodes.
- All edges that lie between given start and end nodes.
To filter, use the `RUST_DEP_GRAPH_FILTER` environment variable, which should look like one of the following:
source_filter // nodes originating from source_filter
-> target_filter // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter
`source_filter` and `target_filter` are a `&`-separated list of strings. A node is considered to match a filter if all of those strings appear in its label. So, for example:
RUST_DEP_GRAPH_FILTER='-> TypeckTables'
would select the predecessors of all `TypeckTables` nodes. Usually though you want the `TypeckTables` node for some particular fn, so you might write:
RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'
This will select only the predecessors of `TypeckTables` nodes for functions with `bar` in their name.
Perhaps you are finding that when you change `foo` you need to re-type-check `bar`, but you don't think you should have to. In that case, you might do:
RUST_DEP_GRAPH_FILTER='Hir & foo -> TypeckTables & bar'
This will dump out all the nodes that lead from `Hir(foo)` to `TypeckTables(bar)`, from which you can (hopefully) see the source of the erroneous edge.
Tracking down incorrect edges
Sometimes, after you dump the dependency graph, you will find some path that should not exist, but you will not be quite sure how it came to be. When the compiler is built with debug assertions, it can help you track that down. Simply set the `RUST_FORBID_DEP_GRAPH_EDGE` environment variable to a filter. Every edge created in the dep-graph will be tested against that filter – if it matches, a `bug!` is reported, so you can easily see the backtrace (`RUST_BACKTRACE=1`).
The syntax for these filters is the same as described in the previous section. However, note that this filter is applied to every edge and doesn't handle longer paths in the graph, unlike the previous section.
Example:
You find that there is a path from the `Hir` of `foo` to the type check of `bar` and you don't think there should be. You dump the dep-graph as described in the previous section and open `dep-graph.txt` to see something like:
Hir(foo) -> Collect(bar)
Collect(bar) -> TypeckTables(bar)
That first edge looks suspicious to you. So you set `RUST_FORBID_DEP_GRAPH_EDGE` to `Hir&foo -> Collect&bar`, re-run, and then observe the backtrace. Voila, bug fixed!
Profiling Queries
In an effort to support incremental compilation, the latest design of the Rust compiler consists of a query-based model.
The details of this model are (currently) outside the scope of this document, however, we explain some background of this model, in an effort to explain how we profile its performance. We intend this profiling effort to address issue 42678.
Quick Start
0. Enable debug assertions
./configure --enable-debug-assertions
1. Compile rustc
Compile the compiler, up to at least stage 1:
python x.py build --stage 1
2. Run `rustc`, with flags
Run the compiler on a source file, supplying two additional debugging flags with `-Z`:
rustc -Z profile-queries -Z incremental=cache foo.rs
Regarding the two additional parameters:
- `-Z profile-queries` tells the compiler to run a separate thread that profiles the queries made by the main compiler thread(s).
- `-Z incremental=cache` tells the compiler to "cache" various files that describe the compilation dependencies, in the subdirectory `cache`.
This command will generate the following files:
- `profile_queries.html` consists of an HTML-based representation of the trace of queries.
- `profile_queries.counts.txt` consists of a histogram, where each histogram "bucket" is a query provider.
3. Run rustc
, with -Z time-passes
:
- This additional flag will add all timed passes to the output files mentioned above, in step 2. As described below, these passes appear visually distinct from the queries in the HTML output (they currently appear as green boxes, via CSS).
4. Inspect the output
- 4(a). Open the HTML file (`profile_queries.html`) with a browser. See this section for an explanation of this file.
- 4(b). Open the data file (`profile_queries.counts.txt`) with a text editor, or spreadsheet. See this section for an explanation of this file.
Interpret the HTML Output
Example 0
The following image gives some example output, from tracing the queries of `hello_world.rs` (a single `main` function, that prints `"hello world"` via the macro `println!`). This image only shows a short prefix of the total output; the actual output is much longer.
View full HTML output. Note: it could take up to a second to properly render depending on your browser.
Here is the corresponding [text output](./example-0.counts.txt).
Example 0 explanation
The trace of the queries has a formal structure; see Trace of Queries for details.
We style this formal structure as follows:
- Timed passes: Green boxes, when present (via `-Z time-passes`), represent timed passes in the compiler. In future versions, these passes may be replaced by queries, explained below.
- Labels: Some green and red boxes are labeled with text. Where they are present, the labels give the following information:
  - The query's provider, sans its key and its result, which are often too long to include in these labels.
  - The duration of the provider, as a fraction of the total time (for the entire trace). This fraction includes the query's entire extent (that is, the sum total of all of its sub-queries).
- Query hits: Blue dots represent query hits. They consist of leaves in the trace's tree. (CSS class: `hit`).
- Query misses: Red boxes represent query misses. They consist of internal nodes in the trace's tree. (CSS class: `miss`).
- Nesting structure: Many red boxes contain nested boxes and dots. This nesting structure reflects that some providers depend on results from other providers, which consist of their nested children.
- Some red boxes are labeled with text, and have highlighted borders (light red, and bolded). (See heuristics for details).
Heuristics
Heuristics-based CSS Classes:
- `important` -- Trace nodes are `important` if they have an extent of 6 (or more), or they have a duration fraction of one percent (or more). These numbers are simple heuristics (currently hard-coded, but easy to modify). Important nodes are styled with textual labels, and highlighted borders (light red, and bolded).
- `frac-50`, `-40`, ... -- Trace nodes whose total duration (self and children) take a large fraction of the total duration, at or above 50%, 40%, and so on. We style these nodes with larger font and padding.
Interpret the Data Output
The file `profile_queries.counts.txt` contains a table of information about the queries, organized around their providers. For each provider (or timed pass, when `-Z time-passes` is present), we produce:
- A total count --- the total number of times this provider was queried
- A total duration --- the total number of seconds spent running this provider, including all providers it may depend on. To get a sense of this dependency structure, and inspect a more fine-grained view of these durations, see this section.
These rows are sorted by total duration, in descending order.
Counts: Example 0
The following example `profile_queries.counts.txt` file results from running on a hello world program (a single `main` function that uses `println` to print `"hello world"`). As explained above, the columns consist of `provider/pass`, `count`, `duration`:
translation,1,0.891
symbol_name,2658,0.733
def_symbol_name,2556,0.268
item_attrs,5566,0.162
type_of,6922,0.117
generics_of,8020,0.084
serialize dep graph,1,0.079
relevant_trait_impls_for,50,0.063
def_span,24875,0.061
expansion,1,0.059
const checking,1,0.055
adt_def,1141,0.048
trait_impls_of,32,0.045
is_copy_raw,47,0.045
is_foreign_item,2638,0.042
fn_sig,2172,0.033
adt_dtorck_constraint,2,0.023
impl_trait_ref,2434,0.023
typeck_tables_of,29,0.022
item-bodies checking,1,0.017
typeck_item_bodies,1,0.017
is_default_impl,2320,0.017
borrow checking,1,0.014
borrowck,4,0.014
mir_validated,4,0.013
adt_destructor,10,0.012
layout_raw,258,0.010
load_dep_graph,1,0.007
item-types checking,1,0.005
mir_const,2,0.005
name resolution,1,0.004
is_object_safe,35,0.003
is_sized_raw,89,0.003
parsing,1,0.003
is_freeze_raw,11,0.001
privacy checking,1,0.001
privacy_access_levels,5,0.001
resolving dependency formats,1,0.001
adt_sized_constraint,9,0.001
wf checking,1,0.001
liveness checking,1,0.001
compute_incremental_hashes_map,1,0.001
match checking,1,0.001
type collecting,1,0.001
param_env,31,0.000
effect checking,1,0.000
trait_def,140,0.000
lowering ast -> hir,1,0.000
predicates_of,70,0.000
extern_crate,319,0.000
lifetime resolution,1,0.000
is_const_fn,6,0.000
intrinsic checking,1,0.000
translation item collection,1,0.000
impl_polarity,15,0.000
creating allocators,1,0.000
language item collection,1,0.000
crate injection,1,0.000
early lint checks,1,0.000
indexing hir,1,0.000
maybe creating a macro crate,1,0.000
coherence checking,1,0.000
optimized_mir,6,0.000
is_panic_runtime,33,0.000
associated_item_def_ids,7,0.000
needs_drop_raw,10,0.000
lint checking,1,0.000
complete gated feature checking,1,0.000
stability index,1,0.000
region_maps,11,0.000
super_predicates_of,8,0.000
coherent_trait,2,0.000
AST validation,1,0.000
loop checking,1,0.000
static item recursion checking,1,0.000
variances_of,11,0.000
associated_item,5,0.000
plugin loading,1,0.000
looking for plugin registrar,1,0.000
stability checking,1,0.000
describe_def,15,0.000
variance testing,1,0.000
codegen unit partitioning,1,0.000
looking for entry point,1,0.000
checking for inline asm in case the target doesn't support it,1,0.000
inherent_impls,1,0.000
crate_inherent_impls,1,0.000
trait_of_item,7,0.000
crate_inherent_impls_overlap_check,1,0.000
attribute checking,1,0.000
internalize symbols,1,0.000
impl wf inference,1,0.000
death checking,1,0.000
reachability checking,1,0.000
reachable_set,1,0.000
is_exported_symbol,3,0.000
is_mir_available,2,0.000
unused lib feature checking,1,0.000
maybe building test harness,1,0.000
recursion limit,1,0.000
write allocator module,1,0.000
assert dep graph,1,0.000
plugin registration,1,0.000
write metadata,1,0.000
Background
We give some background about the query model of the Rust compiler.
Def IDs
In the query model, many queries have a key that consists of a Def ID. The Rust compiler uses Def IDs to distinguish definitions in the input Rust program.
From the compiler source code (`src/librustc_middle/hir/def_id.rs`):
/// A DefId identifies a particular *definition*, by combining a crate
/// index and a def index.
#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)]
pub struct DefId {
pub krate: CrateNum,
pub index: DefIndex,
}
Queries
A query relates a key to a result, either by invoking a provider that computes this result, or by reusing a cached result that was provided earlier. We explain each term in more detail:
- Query Provider: Each kind of query has a pre-defined provider, which refers to the compiler behavior that provides an answer to the query. These providers may nest; see trace of queries for more information about this nesting structure. Example providers:
  - `typeck_tables_of` -- Typecheck a Def ID; produce "tables" of type information.
  - `borrowck` -- Borrow-check a Def ID.
  - `optimized_mir` -- Generate an optimized MIR for a Def ID; produce MIR.
  - For more examples, see Example 0.
- Query Key: The input/arguments to the provider. Often, this consists of a particular Def ID.
- Query Result: The output of the provider.
Trace of Queries
Formally, a trace of the queries consists of a tree, where sub-trees represent sub-traces. In particular, the nesting structure of the trace of queries describes how the queries depend on one another.
Even more precisely, this tree represents a directed acyclic graph (DAG), where shared sub-graphs consist of tree nodes that occur multiple times in the tree, first as "cache misses" and later as "cache hits".
Cache hits and misses. The trace is a tree with the following possible tree nodes:
- Query, with cache miss: The query's result is unknown, and its provider runs to compute it. In this case, the dynamic extent of the query's trace consists of the traced behavior of its provider.
- Query, with cache hit: The query's result is known, and is reused; its provider does not rerun. These nodes are leaves in the trace, since they have no dynamic extent. These leaves also represent where the tree, represented as a DAG, would share a sub-graph (namely, the sub-graph of the query that was reused from the cache).
Tree node metrics. To help determine how to style this tree, we define the following tree node metrics:
- Depth: The number of ancestors of the node in its path from the tree root.
- Extent: The number of immediate children of the node.
Intuitively, a dependency tree is "good" for incremental caching when the depth and extent of each node are relatively small. It is pathological when either of these metrics grows too large. For instance, a tree node whose extent consists of 1M immediate children means that if and when this node is re-computed, all 1M children must be re-queried, at the very least (and some may require recomputation, too).
External Links
Related design ideas, and tracking issues:
- Design document: On-demand Rustc incremental design doc
- Tracking Issue: "Red/Green" dependency tracking in compiler
More discussion and issues:
How Salsa works
This chapter is based on the explanation given by Niko Matsakis in this video about Salsa.
Salsa is not used directly in rustc, but it is used extensively for rust-analyzer and may be integrated into the compiler in the future.
What is Salsa?
Salsa is a library for incremental recomputation. This means it allows reusing computations that were already done in the past to increase the efficiency of future computations.
The objectives of Salsa are:
- Provide that functionality in an automatic way, so reusing old computations is done automatically by the library
- Doing so in a "sound", or "correct", way, therefore leading to the same results as if it had been done from scratch
Salsa's actual model is much richer, allowing many kinds of inputs and many different outputs. For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (`Cargo.toml`), entire source files (`foo.rs`), snippets and so on; the outputs of such an integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see its type), completions, etc.
How does it work?
The first thing that Salsa has to do is identify the "base inputs" [1].
Then Salsa also has to identify intermediate, "derived" values, which are something that the library produces; for each derived value there is a "pure" function that computes it.
For example, there might be a function `ast(x: Path) -> AST`. The produced `AST` isn't a final value, it's an intermediate value that the library would use for the computation.
This means that when you try to compute with the library, Salsa is going to compute various derived values, and eventually read the input and produce the result for the asked computation.
In the course of computing, Salsa tracks which inputs were accessed and which values are derived. This information is used to determine what's going to happen when the inputs change: are the derived values still valid?
This doesn't necessarily mean that each computation downstream from the input is going to be checked, which could be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed. At that point, it won't check other derived computations since they wouldn't need to change.
It is helpful to think about this as a graph with nodes. Each derived value has a dependency on other values, which could themselves be either base or derived. Base values don't have a dependency.
I <- A <- C ...
|
J <- B <--+
When an input `I` changes, the derived value `A` could change. The derived value `B`, which does not depend on `I`, `A`, or any value derived from `A` or `I`, is not subject to change. Therefore, Salsa can reuse the computation done for `B` in the past, without having to compute it again.
The computation could also terminate early. Keeping the same graph as before, say that input `I` has changed in some way (and input `J` hasn't) but, when computing `A` again, it's found that `A` hasn't changed from the previous computation. This leads to an "early termination", because there's no need to check if `C` needs to change, since both of `C`'s direct inputs, `A` and `B`, haven't changed.
Key Salsa concepts
Query
A query is some value that Salsa can access in the course of computation. Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. 0-key queries are called "input" queries.
Database
The database is basically the context for the entire computation, it's meant to store Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. The database must know all the queries that the library is going to do before it can be built, but they don't need to be specified in the same place.
After the database is formed, it can be accessed with queries that are very similar to functions. Since each query's result is stored in the database, when a query is invoked N times, it will return N cloned results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation).
For each input query (0-key), a "set" method is generated, allowing the user to change the output of such query, and trigger previous memoized values to be potentially invalidated.
Query Groups
A query group is a set of queries which have been defined together as a unit. The database is formed by combining query groups. Query groups are akin to "Salsa modules" [2].
A set of queries in a query group are just a set of methods in a trait. To create a query group, a trait annotated with a specific attribute (`#[salsa::query_group(...)]`) has to be created. An argument must also be provided to said attribute, as it will be used by Salsa to create a struct to be used later when the database is created.
Example input query group:
/// This attribute will process this tree, produce this tree as output, and produce
/// a bunch of intermediate stuff that Salsa also uses. One of these things is a
/// "StorageStruct", whose name we have specified in the attribute.
///
/// This query group is a bunch of **input** queries, that do not rely on any
/// derived input.
#[salsa::query_group(InputsStorage)]
pub trait Inputs {
/// This attribute (`#[salsa::input]`) indicates that this query is a base
/// input, therefore `set_manifest` is going to be auto-generated
#[salsa::input]
fn manifest(&self) -> Manifest;
#[salsa::input]
fn source_text(&self, name: String) -> String;
}
To create a derived query group, one must specify which other query groups this one depends on by specifying them as supertraits, as seen in the following example:
/// This query group is going to contain queries that depend on derived values. A
/// query group can access another query group's queries by specifying the
/// dependency as a supertrait. Query groups can be stacked as much as needed
/// using that pattern.
#[salsa::query_group(ParserStorage)]
pub trait Parser: Inputs {
/// This query `ast` is not an input query, it's a derived query. This means
/// that a definition is necessary.
fn ast(&self, name: String) -> String;
}
When creating a derived query, the implementation of said query must be defined outside the trait. The definition must take a database parameter as an `impl Trait` (or `dyn Trait`), where `Trait` is the query group that the definition belongs to, in addition to the other keys.
// This is going to be the definition of the `ast` query in the `Parser` trait.
// So, when the query `ast` is invoked, and it needs to be recomputed, Salsa is
// going to call this function and give it the database as `impl Parser`.
// The function doesn't need to be aware of all the queries of all the query groups.
fn ast(db: &impl Parser, name: String) -> String {
    // Note: `impl Parser` is used here but `dyn Parser` works just as well.
    /* code */
    // Because the database is passed as `impl Parser`, the queries of the
    // `Inputs` supertrait are accessible too:
    let source_text = db.source_text(name);
    /* do the actual parsing */
    return ast;
}
Eventually, after all the query groups have been defined, the database can be created by declaring a struct.
To specify which query groups are going to be part of the database, an attribute (`#[salsa::database(...)]`) must be added. The argument of said attribute is a list of identifiers specifying the query groups' storages.
/// This attribute specifies which query groups are going to be in the database
#[salsa::database(InputsStorage, ParserStorage)]
#[derive(Default)] // optional!
struct MyDatabase {
    /// You also need this one field
    runtime: salsa::Runtime<MyDatabase>,
}
/// And this trait has to be implemented
impl salsa::Database for MyDatabase {
    fn salsa_runtime(&self) -> &salsa::Runtime<MyDatabase> {
        &self.runtime
    }
}
Example usage:
fn main() {
    let mut db = MyDatabase::default();
    db.set_manifest(...);
    db.set_source_text(...);
    loop {
        db.ast(...); // will reuse results
        db.set_source_text(...);
    }
}
"They are not something that you inaubible but something that you kinda get inaudible from the outside 3:23.
What is a Salsa module?
Memory Management in Rustc
Rustc tries to be pretty careful how it manages memory. The compiler allocates a lot of data structures throughout compilation, and if we are not careful, it will take a lot of time and space to do so.
One of the main ways the compiler manages this is by using arenas and interning.
Arenas and Interning
We create a LOT of data structures during compilation. For performance reasons, we allocate them from a global memory pool; they are each allocated once from a long-lived arena. This is called arena allocation. This system reduces allocations/deallocations of memory. It also allows for easy comparison of types for equality: for each interned type `X`, we implemented `PartialEq for X`, so we can just compare pointers. The `CtxtInterners` type contains a bunch of maps of interned types and the arena itself.
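A toy sketch of interning (this is not `CtxtInterners`; leaking here stands in for arena allocation) shows why equality becomes a pointer comparison:
use std::collections::HashSet;

struct Interner {
    set: HashSet<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&interned) = self.set.get(s) {
            // Already interned: hand out the same pointer as before.
            return interned;
        }
        // First sighting: allocate once (here: leak; in rustc: arena).
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.set.insert(leaked);
        leaked
    }
}

// Two interned values are equal iff their pointers are equal:
// std::ptr::eq(interner.intern("Ty"), interner.intern("Ty")) == true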
Example: ty::TyS
Taking the example of `ty::TyS`, which represents a type in the compiler (you can read more here): each time we want to construct a type, the compiler doesn't naively allocate from the buffer. Instead, we check if that type was already constructed. If it was, we just get the same pointer we had before, otherwise we make a fresh pointer. With this scheme, if we want to know if two types are the same, all we need to do is compare the pointers, which is efficient. `TyS` is carefully set up so you never construct them on the stack. You always allocate them from this arena and you always intern them so they are unique.
At the beginning of the compilation we make a buffer, and each time we need to allocate a type we use some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related to that buffer is freed and our `'tcx` references would be invalid.
In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:
- `Substs`, allocated with `mk_substs` – this will intern a slice of types, often used to specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- `TraitRef`, typically passed by value – a trait reference consists of a reference to a trait along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is defined and discussed in depth in the `AdtDef and DefId` section.
- `Predicate` defines something the trait system has to prove (see the `traits` module).
The tcx and how it uses lifetimes
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared context:
tcx: TyCtxt<'tcx>
// ----
// |
// arena lifetime
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as the arenas, anyhow).
A Note On Lifetimes
The Rust compiler is a fairly large program containing lots of big data structures (e.g. the AST, HIR, and the type system) and as such, arenas and references are heavily relied upon to minimize unnecessary memory use. This manifests itself in the way people can plug into the compiler (i.e. the driver), preferring a "push"-style API (callbacks) instead of the more Rust-ic "pull" style (think the `Iterator` trait).
Thread-local storage and interning are used a lot throughout the compiler to reduce duplication while also preventing a lot of the ergonomic issues due to many pervasive lifetimes. The `rustc::ty::tls` module is used to access these thread-locals, although you should rarely need to touch it.
Part 3: Source Code Representations
This part describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily. These are called intermediate representations.
This process begins with the compiler understanding what the user has asked for: parsing the command line arguments given and determining what it is to compile.
The Rustc Driver and Interface
The `rustc_driver` is essentially `rustc`'s `main()` function. It acts as the glue for running the various phases of the compiler in the correct order, using the interface defined in the `rustc_interface` crate.
The `rustc_interface` crate provides external users with an (unstable) API for running code at particular times during the compilation process, allowing third parties to effectively use `rustc`'s internals as a library for analysing a crate or emulating the compiler in-process (e.g. the RLS or rustdoc).
For those using `rustc` as a library, the `rustc_interface::run_compiler()` function is the main entrypoint to the compiler. It takes a configuration for the compiler and a closure that takes a `Compiler`. `run_compiler` creates a `Compiler` from the configuration and passes it to the closure. Inside the closure, you can use the `Compiler` to drive queries to compile a crate and get the results. This is what the `rustc_driver` does too.
You can see a minimal example of how to use `rustc_interface` here. You can see what queries are currently available through the rustdocs for `Compiler`.
You can see an example of how to use the queries by looking at the `rustc_driver` implementation, specifically the `rustc_driver::run_compiler` function (not to be confused with `rustc_interface::run_compiler`). The `rustc_driver::run_compiler` function takes a bunch of command-line args and some other configurations and drives the compilation to completion.
`rustc_driver::run_compiler` also takes a `Callbacks`, a trait that allows for custom compiler configuration, as well as allowing some custom code to run after different phases of the compilation.
Warning: By their very nature, the internal compiler APIs are always going to be unstable. That said, we do try not to break things unnecessarily.
The walking tour of rustdoc
Rustdoc actually uses the rustc internals directly. It lives in-tree with the compiler and standard library. This chapter is about how it works.
Rustdoc is implemented entirely within the crate `librustdoc`. It runs the compiler up to the point where we have an internal representation of a crate (HIR) and the ability to run some queries about the types of items. HIR and queries are discussed in the linked chapters.
`librustdoc` performs two major steps after that to render a set of documentation:
- "Clean" the AST into a form that's more suited to creating documentation (and slightly more resistant to churn in the compiler).
- Use this cleaned AST to render a crate's documentation, one page at a time.
Naturally, there's more than just this, and those descriptions simplify out lots of details, but that's the high-level overview.
(Side note: `librustdoc` is a library crate! The `rustdoc` binary is created using the project in `src/tools/rustdoc`. Note that literally all that does is call the `main()` that's in this crate's `lib.rs`, though.)
Cheat sheet
- Use `./x.py build --stage 1 src/libstd src/tools/rustdoc` to make a usable rustdoc you can run on other projects.
  - Add `src/libtest` to be able to use `rustdoc --test`.
  - If you've used `rustup toolchain link local /path/to/build/$TARGET/stage1` previously, then after the previous build command, `cargo +local doc` will Just Work.
- Use `./x.py doc --stage 1 src/libstd` to use this rustdoc to generate the standard library docs.
  - The completed docs will be available in `build/$TARGET/doc/std`, though the bundle is meant to be used as though you would copy out the `doc` folder to a web server, since that's where the CSS/JS and landing page are.
- Most of the HTML printing code is in `html/format.rs` and `html/render.rs`. It's in a bunch of `fmt::Display` implementations and supplementary functions.
- The types that got `Display` impls above are defined in `clean/mod.rs`, right next to the custom `Clean` trait used to process them out of the rustc HIR.
- The bits specific to using rustdoc as a test harness are in `test.rs`.
- The Markdown renderer is loaded up in `html/markdown.rs`, including functions for extracting doctests from a given block of Markdown.
- The tests on rustdoc output are located in `src/test/rustdoc`, where they're handled by the test runner of rustbuild and the supplementary script `src/etc/htmldocck.py`.
- Tests on search index generation are located in `src/test/rustdoc-js`, as a series of JavaScript files that encode queries on the standard library search index and expected results.
From crate to clean
In `core.rs` are two central items: the `DocContext` struct and the `run_core`
function. The latter is where rustdoc calls out to rustc to compile a crate to
the point where rustdoc can take over. The former is a state container used
when crawling through a crate to gather its documentation.

The main process of crate crawling is done in `clean/mod.rs` through several
implementations of the `Clean` trait defined within. This is a conversion
trait, which defines one method:
pub trait Clean<T> {
fn clean(&self, cx: &DocContext) -> T;
}
`clean/mod.rs` also defines the types for the "cleaned" AST used later on to
render documentation pages. Each usually accompanies an implementation of
`Clean` that takes some AST or HIR type from rustc and converts it into the
appropriate "cleaned" type. "Big" items like modules or associated items may
have some extra processing in their `Clean` implementation, but for the most part
these impls are straightforward conversions. The "entry point" to this module
is `impl Clean<Crate> for visit_ast::RustdocVisitor`, which is called by
`run_core` above.
You see, I actually lied a little earlier: there's another AST transformation
that happens before the events in `clean/mod.rs`. In `visit_ast.rs` is the
type `RustdocVisitor`, which actually crawls a `rustc_hir::Crate` to get the first
intermediate representation, defined in `doctree.rs`. This pass is mainly to
get a few intermediate wrappers around the HIR types and to process visibility
and inlining. This is where `#[doc(inline)]`, `#[doc(no_inline)]`, and
`#[doc(hidden)]` are processed, as well as the logic for whether a `pub use`
should get the full page or a "Reexport" line in the module page.
The other major thing that happens in `clean/mod.rs` is the collection of doc
comments and `#[doc=""]` attributes into a separate field of the `Attributes`
struct, present on anything that gets hand-written documentation. This makes it
easier to collect this documentation later in the process.

The primary output of this process is a `clean::Crate` with a tree of `Item`s
which describe the publicly-documentable items in the target crate.
Hot potato
Before moving on to the next major step, a few important "passes" occur over
the documentation. These do things like combine the separate "attributes" into
a single string and strip leading whitespace to make the document easier on the
markdown parser, or drop items that are not public or are deliberately hidden with
`#[doc(hidden)]`. These are all implemented in the `passes/` directory, one
file per pass. By default, all of these passes are run on a crate, but the ones
regarding dropping private/hidden items can be bypassed by passing
`--document-private-items` to rustdoc. Note that unlike the previous set of AST
transformations, the passes happen on the cleaned crate.
(Strictly speaking, you can fine-tune the passes run and even add your own, but we're trying to deprecate that. If you need finer-grain control over these passes, please let us know!)
Here is the current (as of this writing) list of passes:

- `propagate-doc-cfg` - propagates `#[doc(cfg(...))]` to child items.
- `collapse-docs` concatenates all document attributes into one document attribute. This is necessary because each line of a doc comment is given as a separate doc attribute, and this will combine them into a single string with line breaks between each attribute.
- `unindent-comments` removes excess indentation on comments in order for markdown to like it. This is necessary because the convention for writing documentation is to provide a space between the `///` or `//!` marker and the text, and stripping that leading space will make the text easier to parse by the Markdown parser. (In the past, the markdown parser used was not CommonMark-compliant, which caused annoyances with extra whitespace, but this seems to be less of an issue today.)
- `strip-priv-imports` strips all private import statements (`use`, `extern crate`) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation.
- `strip-hidden` and `strip-private` strip all `doc(hidden)` and private items from the output. `strip-private` implies `strip-priv-imports`. Basically, the goal is to remove items that are not relevant for public documentation.
From clean to crate
This is where the "second phase" in rustdoc begins. This phase primarily lives
in the `html/` folder, and it all starts with `run()` in `html/render.rs`. This
code is responsible for setting up the `Context`, `SharedContext`, and `Cache`
which are used during rendering, copying out the static files which live in
every rendered set of documentation (things like the fonts, CSS, and JavaScript
that live in `html/static/`), creating the search index, and printing out the
source code rendering, before beginning the process of rendering all the
documentation for the crate.
Several functions implemented directly on `Context` take the `clean::Crate` and
set up some state between rendering items or recursing on a module's child
items. From here the "page rendering" begins, via an enormous `write!()` call
in `html/layout.rs`. The parts that actually generate HTML from the items and
documentation occur within a series of `std::fmt::Display` implementations and
functions that pass around a `&mut std::fmt::Formatter`. The top-level
implementation that writes out the page body is the `impl<'a> fmt::Display for Item<'a>`
in `html/render.rs`, which switches out to one of several `item_*`
functions based on the kind of `Item` being rendered.
Depending on what kind of rendering code you're looking for, you'll probably
find it either in `html/render.rs` for major items like "what sections should I
print for a struct page" or `html/format.rs` for smaller component pieces like
"how should I print a where clause as part of some other item".
Whenever rustdoc comes across an item that should print hand-written
documentation alongside, it calls out to `html/markdown.rs`, which interfaces
with the Markdown parser. This is exposed as a series of types that wrap a
string of Markdown and implement `fmt::Display` to emit HTML text. It takes
special care to enable certain features like footnotes and tables and add
syntax highlighting to Rust code blocks (via `html/highlight.rs`) before
running the Markdown parser. There's also a function in here
(`find_testable_code`) that specifically scans for Rust code blocks so the
test-runner code can find all the doctests in the crate.
From soup to nuts
(alternate title: "An unbroken thread that stretches from those first `Cell`s
to us")

It's important to note that the AST cleaning can ask the compiler for
information (crucially, `DocContext` contains a `TyCtxt`), but page rendering
cannot. The `clean::Crate` created within `run_core` is passed outside the
compiler context before being handed to `html::render::run`. This means that a
lot of the "supplementary data" that isn't immediately available inside an
item's definition, like which trait is the `Deref` trait used by the language,
needs to be collected during cleaning, stored in the `DocContext`, and passed
along to the `SharedContext` during HTML rendering. This manifests as a bunch
of shared state, context variables, and `RefCell`s.

Also of note is that some items that come from "asking the compiler" don't go
directly into the `DocContext` - for example, when loading items from a foreign
crate, rustdoc will ask about trait implementations and generate new `Item`s
for the impls based on that information. This goes directly into the returned
`Crate` rather than roundabout through the `DocContext`. This way, these
implementations can be collected alongside the others, right before rendering
the HTML.
Other tricks up its sleeve
All this describes the process for generating HTML documentation from a Rust
crate, but there are a couple of other major modes that rustdoc runs in. It can also
be run on a standalone Markdown file, or it can run doctests on Rust code or
standalone Markdown files. For the former, it shortcuts straight to
`html/markdown.rs`, optionally including a mode which inserts a Table of
Contents to the output HTML.

For the latter, rustdoc runs a similar partial-compilation to get relevant
documentation in `test.rs`, but instead of going through the full clean and
render process, it runs a much simpler crate walk to grab just the
hand-written documentation. Combined with the aforementioned
`find_testable_code` in `html/markdown.rs`, it builds up a collection of
tests to run before handing them off to the libtest test runner. One notable
location in `test.rs` is the function `make_test`, which is where hand-written
doctests get transformed into something that can be executed.

Some extra reading about `make_test` can be found here.
Dotting i's and crossing t's
So that's rustdoc's code in a nutshell, but there are more things in the repo
that deal with it. Since we have the full `compiletest` suite at hand, there's
a set of tests in `src/test/rustdoc` that make sure the final HTML is what we
expect in various situations. These tests also use a supplementary script,
`src/etc/htmldocck.py`, that allows them to look through the final HTML using
XPath notation to get a precise look at the output. The full description of all
the commands available to rustdoc tests is in `htmldocck.py`.

In addition, there are separate tests for the search index and rustdoc's
ability to query it. The files in `src/test/rustdoc-js` each contain a
different search query and the expected results, broken out by search tab.
These files are processed by a script in `src/tools/rustdoc-js` and the Node.js
runtime. These tests don't have as thorough of a writeup, but a broad example
that features results in all tabs can be found in `basic.js`. The basic idea is
that you match a given `QUERY` with a set of `EXPECTED` results, complete with
the full item path of each item.
Example: Type checking through rustc_interface
`rustc_interface` allows you to interact with Rust code at various stages of compilation.
Getting the type of an expression
NOTE: For the example to compile, you will need to first run the following:

```
rustup component add rustc-dev
```

To get the type of an expression, use the `global_ctxt` to get a `TyCtxt`:
```rust
// In this example, config specifies the rust program:
// fn main() { let message = "Hello, world!"; println!("{}", message); }
// Our goal is to get the type of the string literal "Hello, world!".
//
// See https://github.com/rust-lang/rustc-dev-guide/blob/master/examples/rustc-driver-example.rs
// for a complete example of configuring rustc_interface
rustc_interface::run_compiler(config, |compiler| {
    compiler.enter(|queries| {
        // Analyze the crate and inspect the types under the cursor.
        queries.global_ctxt().unwrap().take().enter(|tcx| {
            // Every compilation contains a single crate.
            let krate = tcx.hir().krate();
            // Iterate over the top-level items in the crate, looking for the main function.
            for (_, item) in &krate.items {
                // Use pattern-matching to find a specific node inside the main function.
                if let rustc_hir::ItemKind::Fn(_, _, body_id) = item.kind {
                    let expr = &tcx.hir().body(body_id).value;
                    if let rustc_hir::ExprKind::Block(block, _) = expr.kind {
                        if let rustc_hir::StmtKind::Local(local) = block.stmts[0].kind {
                            if let Some(expr) = local.init {
                                let hir_id = expr.hir_id; // hir_id identifies the string "Hello, world!"
                                let def_id = tcx.hir().local_def_id(item.hir_id); // def_id identifies the main function
                                let ty = tcx.typeck_tables_of(def_id).node_type(hir_id);
                                println!("{:?}: {:?}", expr, ty);
                                // prints: expr(HirId { owner: DefIndex(3), local_id: 4 }: "Hello, world!"): &'static str
                            }
                        }
                    }
                }
            }
        })
    });
});
```
Syntax and the AST

Working directly with source code is very inconvenient and error-prone. Thus, before we do anything else, we convert raw source code into an Abstract Syntax Tree (AST).

It turns out that doing this involves a lot of work, including lexing, parsing, macro expansion, name resolution, conditional compilation, feature-gate checking, and validation of the AST. In this chapter, we take a look at all of these steps.
Lexing and Parsing

The lexer and parser are currently undergoing significant refactoring, so parts of this chapter may be out of date.

The very first thing the compiler does is take the program (a bunch of Unicode characters) and turn it into a form that is more convenient for the compiler to work with than strings. This happens in two stages: lexing and parsing.

Lexing takes strings and turns them into streams of tokens. For example, `a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`. The lexer lives in `librustc_lexer`.
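As a rough illustration, the low-level lexer can be driven on its own. This sketch assumes the `rustc_lexer` crate's `tokenize` entry point (which, in this era, yields tokens carrying a `kind` and a byte `len`):

```rust
// Minimal sketch: run the low-level lexer over a snippet and print each
// token's kind and text. Assumes rustc_lexer's `tokenize` entry point.
fn main() {
    let src = "a.b + c";
    let mut pos = 0;
    for token in rustc_lexer::tokenize(src) {
        let text = &src[pos..pos + token.len];
        println!("{:?}: {:?}", token.kind, text);
        pos += token.len;
    }
}
```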
Parsing then takes streams of tokens and turns them into a structured form which is easier for the compiler to work with, usually called an Abstract Syntax Tree (AST). The AST mirrors the structure of a Rust program in memory, using a `Span` to link a particular AST node back to its source text.

The AST is defined in `librustc_ast`, along with some definitions for tokens and token streams, data structures/traits for mutating ASTs, and shared definitions for other AST-related parts of the compiler (like the lexer and macro expansion).

The parser is defined in `librustc_parse`; this crate also contains a high-level interface to the lexer and some validation routines that run after macro expansion. In particular, `rustc_parse::parser` contains the parser implementation.

The main entrypoint to the parser is via the various `parse_*` functions in the parser. They let you do things like turn a `SourceFile` (e.g. the source in a single file) into a token stream, create a parser from the token stream, and then execute the parser to get a `Crate` (the root AST node).

To minimize the amount of copying that is done, both the `StringReader` and `Parser` have lifetimes which bind them to the parent `ParseSess`. This contains all the information needed while parsing, as well as the `SourceMap` itself.
More on lexical analysis

Code for lexical analysis is split between two crates:

- The `rustc_lexer` crate is responsible for breaking a `&str` into chunks constituting tokens. Although it is popular to implement lexers as generated finite state machines, the lexer in `rustc_lexer` is hand-written.
- `StringReader` from `librustc_ast` integrates `rustc_lexer` with data structures specific to `rustc`. Specifically, it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers.
The `#[test]` attribute

Today, Rust programmers rely on a built-in attribute called `#[test]`. All
you have to do is mark a function as a test and include some asserts like so:
#[test]
fn my_test() {
assert!(2+2 == 4);
}
When this program is compiled using `rustc --test` or `cargo test`, it will
produce an executable that can run this, and any other test function. This
method of testing allows tests to live alongside code in an organic way. You
can even put tests inside private modules:
```rust
mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    #[test]
    fn test_priv_func() {
        assert!(my_priv_func());
    }
}
```
Private items can thus be easily tested without worrying about how to expose
them to any sort of external testing apparatus. This is key to the
ergonomics of testing in Rust. Semantically, however, it's rather odd.
How does any sort of `main` function invoke these tests if they're not visible?
What exactly is `rustc --test` doing?
`#[test]` is implemented as a syntactic transformation inside the compiler's
`librustc_ast` crate. Essentially, it's a fancy macro that
rewrites the crate in 3 steps:
Step 1: Re-Exporting
As mentioned earlier, tests can exist inside private modules, so we need a
way of exposing them to the main function, without breaking any existing
code. To that end, `librustc_ast` will create local modules called
`__test_reexports` that recursively reexport tests. This expansion translates
the above example into:
```rust
mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    pub fn test_priv_func() {
        assert!(my_priv_func());
    }

    pub mod __test_reexports {
        pub use super::test_priv_func;
    }
}
```
Now, our test can be accessed as
`my_priv_mod::__test_reexports::test_priv_func`. For deeper module
structures, `__test_reexports` will reexport modules that contain tests, so a
test at `a::b::my_test` becomes
`a::__test_reexports::b::__test_reexports::my_test`. While this process seems
pretty safe, what happens if there is an existing `__test_reexports` module?
The answer: nothing.
To explain, we need to understand how the AST represents
identifiers. The name of every function, variable, module, etc. is
not stored as a string, but rather as an opaque Symbol which is
essentially an ID number for each identifier. The compiler keeps a separate
hashtable that allows us to recover the human-readable name of a Symbol when
necessary (such as when printing a syntax error). When the compiler generates
the `__test_reexports` module, it generates a new Symbol for the identifier,
so while the compiler-generated `__test_reexports` may share a name with your
hand-written one, it will not share a Symbol. This technique prevents name
collision during code generation and is the foundation of Rust's macro
hygiene.
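As a toy illustration of this property (this sketch is illustrative only, not the compiler's actual interner), two Symbols can share a human-readable name while remaining distinct:

```rust
// Toy model of symbol interning: each identifier gets an opaque ID, and a
// side table recovers the human-readable name. Two distinct Symbols may
// share a name without colliding, which is the property the generated
// `__test_reexports` module relies on.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    names: Vec<String>,
}

impl Interner {
    // Always mints a fresh Symbol, even for an already-seen name: this is
    // the "gensym"-like behaviour described above.
    fn fresh(&mut self, name: &str) -> Symbol {
        self.names.push(name.to_string());
        Symbol(self.names.len() as u32 - 1)
    }

    fn as_str(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let user_written = interner.fresh("__test_reexports");
    let compiler_generated = interner.fresh("__test_reexports");
    // Same name, different Symbols: no collision.
    assert_eq!(interner.as_str(user_written), interner.as_str(compiler_generated));
    assert_ne!(user_written, compiler_generated);
}
```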
Step 2: Harness Generation
Now that our tests are accessible from the root of our crate, we need to do
something with them. `librustc_ast` generates a module like so:
#[main]
pub fn main() {
extern crate test;
test::test_main_static(&[&path::to::test1, /*...*/]);
}
where `path::to::test1` is a constant of type `test::TestDescAndFn`.

While this transformation is simple, it gives us a lot of insight into how
tests are actually run. The tests are aggregated into an array and passed to
a test runner called `test_main_static`. We'll come back to exactly what
`TestDescAndFn` is, but for now, the key takeaway is that there is a crate
called `test` that is part of Rust core and implements all of the
runtime for testing. `test`'s interface is unstable, so the only stable way
to interact with it is through the `#[test]` macro.
Step 3: Test Object Generation
If you've written tests in Rust before, you may be familiar with some of the
optional attributes available on test functions. For example, a test can be
annotated with `#[should_panic]` if we expect the test to cause a panic. It
looks something like this:
#[test]
#[should_panic]
fn foo() {
panic!("intentional");
}
This means our tests are more than just simple functions; they have
configuration information as well. `test` encodes this configuration data
into a struct called `TestDesc`. For each test function in a
crate, `librustc_ast` will parse its attributes and generate a `TestDesc`
instance. It then combines the `TestDesc` and test function into the
predictably named `TestDescAndFn` struct, which `test_main_static` operates
on. For a given test, the generated `TestDescAndFn` instance looks like so:
self::test::TestDescAndFn{
desc: self::test::TestDesc{
name: self::test::StaticTestName("foo"),
ignore: false,
should_panic: self::test::ShouldPanic::Yes,
allow_fail: false,
},
testfn: self::test::StaticTestFn(||
self::test::assert_test_result(::crate::__test_reexports::foo())),
}
Once we've constructed an array of these test objects, they're passed to the test runner via the harness generated in step 2.
Inspecting the generated code
On nightly Rust, there's an unstable flag called `unpretty` that you can use
to print out the module source after macro expansion:
$ rustc my_mod.rs -Z unpretty=hir
Panicking in rust
Step 1: Invocation of the `panic!` macro.
There are actually two panic macros - one defined in `libcore`, and one defined in `libstd`.
This is due to the fact that code in `libcore` can panic. `libcore` is built before `libstd`,
but we want panics to use the same machinery at runtime, whether they originate in `libcore`
or `libstd`.
libcore definition of panic!

The `libcore` `panic!` macro eventually makes the following call (in `src/libcore/panicking.rs`):
```rust
// NOTE This function never crosses the FFI boundary; it's a Rust-to-Rust call
extern "Rust" {
    #[lang = "panic_impl"]
    fn panic_impl(pi: &PanicInfo<'_>) -> !;
}

let pi = PanicInfo::internal_constructor(Some(&fmt), location);
unsafe { panic_impl(&pi) }
```
Actually resolving this goes through several layers of indirection:
- In `src/librustc_middle/middle/weak_lang_items.rs`, `panic_impl` is declared as a 'weak lang item', with the symbol `rust_begin_unwind`. This is used in `librustc_typeck/collect.rs` to set the actual symbol name to `rust_begin_unwind`.

  Note that `panic_impl` is declared in an `extern "Rust"` block, which means that libcore will attempt to call a foreign symbol called `rust_begin_unwind` (to be resolved at link time).

- In `src/libstd/panicking.rs`, we have this definition:
```rust
/// Entry point of panic from the libcore crate.
#[cfg(not(test))]
#[panic_handler]
#[unwind(allowed)]
pub fn begin_panic_handler(info: &PanicInfo<'_>) -> ! {
    ...
}
```
The special `panic_handler` attribute is resolved via `src/librustc_middle/middle/lang_items`.
The `extract` function converts the `panic_handler` attribute to a `panic_impl` lang item.

Now, we have a matching `panic_handler` lang item in `libstd`. This function goes
through the same process as the `extern { fn panic_impl }` definition in `libcore`, ending
up with a symbol name of `rust_begin_unwind`. At link time, the symbol reference in `libcore`
will be resolved to the definition in `libstd` (the function called `begin_panic_handler` in the
Rust source).

Thus, control flow will pass from libcore to std at runtime. This allows panics from `libcore`
to go through the same infrastructure that other panics use (panic hooks, unwinding, etc).
libstd implementation of panic!

This is where the actual panic-related logic begins. In `src/libstd/panicking.rs`,
control passes to `rust_panic_with_hook`. This method is responsible
for invoking the global panic hook and checking for double panics. Finally,
we call `__rust_start_panic`, which is provided by the panic runtime.

The call to `__rust_start_panic` is very weird - it is passed a `*mut &mut dyn BoxMeUp`,
converted to a `usize`. Let's break this type down:
- `BoxMeUp` is an internal trait. It is implemented for `PanicPayload` (a wrapper around the user-supplied payload type), and has a method `fn box_me_up(&mut self) -> *mut (dyn Any + Send)`. This method takes the user-provided payload (`T: Any + Send`), boxes it, and converts the box to a raw pointer.
- When we call `__rust_start_panic`, we have an `&mut dyn BoxMeUp`. However, this is a fat pointer (twice the size of a `usize`). To pass this to the panic runtime across an FFI boundary, we take a mutable reference to this mutable reference (`&mut &mut dyn BoxMeUp`), and convert it to a raw pointer (`*mut &mut dyn BoxMeUp`). The outer raw pointer is a thin pointer, since it points to a `Sized` type (a mutable reference). Therefore, we can convert this thin pointer into a `usize`, which is suitable for passing across an FFI boundary.
Finally, we call `__rust_start_panic` with this `usize`. We have now entered the panic runtime.
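To make those conversions concrete, here is a minimal, self-contained sketch of the round trip described above; the `BoxMeUp` trait and payload type here are simplified stand-ins for the internal ones in libstd:

```rust
use std::any::Any;

// Simplified stand-in for libstd's internal BoxMeUp trait.
trait BoxMeUp {
    fn box_me_up(&mut self) -> *mut (dyn Any + Send);
}

struct Payload(Option<&'static str>);

impl BoxMeUp for Payload {
    fn box_me_up(&mut self) -> *mut (dyn Any + Send) {
        // Box the user-provided payload and turn the box into a raw pointer.
        Box::into_raw(Box::new(self.0.take().unwrap()) as Box<dyn Any + Send>)
    }
}

fn main() {
    let mut payload = Payload(Some("intentional"));
    // Fat pointer: data pointer + vtable pointer (two usizes wide).
    let mut obj: &mut dyn BoxMeUp = &mut payload;
    // A pointer *to* the fat reference is thin, so it fits in one usize.
    let thin: *mut &mut dyn BoxMeUp = &mut obj;
    let across_ffi = thin as usize;
    // "On the other side" (as the panic runtime would), recover the payload.
    let recovered: &mut &mut dyn BoxMeUp =
        unsafe { &mut *(across_ffi as *mut &mut dyn BoxMeUp) };
    let raw = recovered.box_me_up();
    let boxed: Box<dyn Any + Send> = unsafe { Box::from_raw(raw) };
    println!("{:?}", boxed.downcast_ref::<&str>()); // Some("intentional")
}
```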
Step 2: The panic runtime
Rust provides two panic runtimes: `libpanic_abort` and `libpanic_unwind`. The user chooses
between them at build time via their `Cargo.toml`.

`libpanic_abort` is extremely simple: its implementation of `__rust_start_panic` just aborts,
as you would expect.

`libpanic_unwind` is the more interesting case.
In its implementation of `__rust_start_panic`, we take the `usize`, convert
it back to a `*mut &mut dyn BoxMeUp`, dereference it, and call `box_me_up`
on the `&mut dyn BoxMeUp`. At this point, we have a raw pointer to the payload
itself (a `*mut (dyn Send + Any)`): that is, a raw pointer to the actual value
provided by the user who called `panic!`.
At this point, the platform-independent code ends. We now call into
platform-specific unwinding logic (e.g. `libunwind`). This code is
responsible for unwinding the stack, running any 'landing pads' associated
with each frame (currently, running destructors), and transferring control
to the `catch_unwind` frame.
Note that all panics either abort the process or get caught by some call to `catch_unwind`:
in `src/libstd/rt.rs`, the call to the user-provided `main` function is wrapped in `catch_unwind`.
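The same machinery is what user code reaches through `std::panic::catch_unwind`; for example, the following catches a panic and inspects the boxed payload described above:

```rust
use std::panic;

fn main() {
    // catch_unwind returns Err(payload) if the closure panicked; the payload
    // is the Box<dyn Any + Send> produced by the machinery described above.
    let result = panic::catch_unwind(|| {
        panic!("intentional");
    });
    match result {
        Ok(()) => println!("no panic"),
        Err(payload) => {
            // A string-literal panic carries a &'static str payload.
            if let Some(msg) = payload.downcast_ref::<&str>() {
                println!("caught panic: {}", msg);
            }
        }
    }
}
```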
Macro expansion
`librustc_ast`, `librustc_expand`, and `librustc_builtin_macros` are all undergoing refactoring, so some of the links in this chapter may be broken.
Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
before name resolution, macros are expanded using these portions of the code.
The macro parser, in turn, may call the normal Rust parser when it needs to
bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
`src/librustc_expand/mbe/`. This chapter aims to explain how macro
expansion works.
Example
It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:
macro_rules! printer {
(print $mvar:ident) => {
println!("{}", $mvar);
};
(print twice $mvar:ident) => {
println!("{}", $mvar);
println!("{}", $mvar);
};
}
`$mvar` is called a metavariable. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds at compile time to
a tree of tokens. A token is a single "unit" of the grammar, such as an
identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`); they include the open and close delimiters and all the tokens
in between (we do require that parentheses-like characters be balanced). Having
macro expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The macro expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about what without
worrying about where. For more information about tokens, see the
Parsing chapter of this book.
Whenever we refer to the "example invocation", we mean the following snippet:
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called macro expansion, and it is the topic of this chapter.
The macro parser
There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.
Basically, the macro parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the Earley parsing algorithm. The macro parser is
defined in `src/librustc_expand/mbe/macro_parser.rs`.
The interface of the macro parser is as follows (this is slightly simplified):
fn parse_tt(
parser: &mut Cow<Parser>,
ms: &[TokenTree],
) -> NamedParseResult
The macro parser makes use of the following items (in the simplified signature above, the session and the token stream are both reached through the `parser` argument):

- `sess` is a "parsing session", which keeps track of some metadata. Most notably, this is used to keep track of errors that are generated so they can be reported to the user.
- `tts` is a stream of tokens. The macro parser's job is to consume the raw stream of tokens and output a binding of metavariables to corresponding token trees.
- `ms` is a matcher. This is a sequence of token trees that we want to match `tts` against.
In the analogy of a regex parser, `tts` is the input and we are matching it
against the pattern `ms`. Using our examples, `tts` could be the stream of
tokens containing the inside of the example invocation `print foo`, while `ms`
might be the sequence of token (trees) `print $mvar:ident`.
The output of the parser is a `NamedParseResult`, which indicates which of
three cases has occurred:

- Success: `tts` matches the given matcher `ms`, and we have produced a binding from metavariables to the corresponding token trees.
- Failure: `tts` does not match `ms`. This results in an error message such as "No rule expected token blah".
- Error: some fatal error has occurred in the parser. For example, this happens if there is more than one pattern match, since that indicates the macro is ambiguous.
The full interface is defined here.
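Schematically, the three-way result could be modeled like the sketch below; the real `NamedParseResult` in `macro_parser.rs` carries richer data (spans and the actual metavariable bindings):

```rust
// Sketch only: toy stand-ins for the three outcomes described above.
#![allow(dead_code)]
use std::collections::HashMap;

// A binding from metavariable names to the token trees they matched,
// modeled here as plain strings for illustration.
type Bindings = HashMap<String, String>;

enum NamedParseResult {
    // `tts` matched `ms`; carry the metavariable bindings.
    Success(Bindings),
    // `tts` did not match `ms`: "no rule expected token ..." message.
    Failure(String),
    // Fatal error, e.g. an ambiguous parse with multiple matching rules.
    Error(String),
}
```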
The macro parser does pretty much exactly the same as a normal regex parser, with
one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
normal Rust parser.
As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro definitions is in
`src/librustc_expand/mbe/macro_rules.rs`. It defines the pattern for
matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro using the macro parser itself.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and
`{ println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.
When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
definition. Using our example, we would try to match the token stream `print foo`
from the invocation against the matchers `print $mvar:ident` and
`print twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a non-terminal (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation; if there is more than one match,
the parse is ambiguous, while if there are no matches at all, there is a syntax
error.
For more information about the macro parser's implementation, see the comments
in `src/librustc_expand/mbe/macro_parser.rs`.
Hygiene
If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code:
#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};
// Then, somewhere else
struct Bar {
...
};
DEFINE_FOO
Most people avoid writing C like this - and for good reason: it doesn't
compile. The `struct Bar` defined by the macro clashes names with the
`struct Bar` defined in the code. Consider also the following example:
#define DO_FOO(x) {\
int y = 0;\
foo(x, y);\
}
// Then elsewhere
int y = 22;
DO_FOO(y);
Do you see the problem? We wanted to generate a call `foo(22, 0)`, but instead
we got `foo(0, 0)` because the macro defined its own `y`!
These are both examples of macro hygiene issues. Hygiene relates to how to handle names defined within a macro. In particular, a hygienic macro system prevents errors due to names introduced within a macro. Rust macros are hygienic in that they do not allow one to write the sorts of bugs above.
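For contrast, here is a Rust analogue of the `DO_FOO` example above (this snippet is illustrative, not compiler code); thanks to hygiene, it compiles and does what the caller intends:

```rust
macro_rules! do_foo {
    ($x:expr) => {{
        let y = 0; // this `y` lives in the macro's syntax context...
        ($x, y)
    }};
}

fn main() {
    let y = 22;
    // ...so the caller's `y` is not captured: this prints "(22, 0)".
    println!("{:?}", do_foo!(y));
}
```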
At a high level, hygiene within the rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then disambiguate names based on that context. Future iterations of the macro system will allow greater control to the macro author to use that context. For example, a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro).
In rustc, this "context" is tracked via `Span`s.
TODO: what is call-site hygiene? what is def-site hygiene?
TODO
Procedural Macros
TODO
Custom Derive
TODO
TODO: maybe something about macros 2.0?
Discussion about hygiene
The rest of this chapter is a dump of a discussion between `mark-i-m` and
`petrochenkov` about Macro Expansion and Hygiene. I am pasting it here so that
it never gets lost until we can make it into a proper chapter.
mark-i-m: @Vadim Petrochenkov Hi :wave:
I was wondering if you would have a chance sometime in the next month or so to
just have a zulip discussion where you tell us (WG-learning) everything you
know about macros/expansion/hygiene. We were thinking this could be less formal
(and less work for you) than compiler lecture series lecture... thoughts?
mark-i-m: The goal is to fill out that long-standing gap in the rustc-dev-guide
Vadim Petrochenkov: Ok, I'm at UTC+03:00 and generally available in the
evenings (or weekends).
mark-i-m: @Vadim Petrochenkov Either of those works for me (your evenings are
about lunch time for me :) ) Is there a particular date that would work best
for you?
mark-i-m: @WG-learning Does anyone else have a preferred date?
Vadim Petrochenkov:
Is there a particular date that would work best for you?
Nah, not much difference. (If something changes for a specific day, I'll
notify.)
Santiago Pastorino: week days are better, but I'd say let's wait for @Vadim
Petrochenkov to say when they are ready for it and we can set a date
Santiago Pastorino: also, we should record this so ... I guess it doesn't
matter that much when :)
mark-i-m:
also, we should record this so ... I guess it doesn't matter that much when
:)
@Santiago Pastorino My thinking was to just use zulip, so we would have the log
mark-i-m: @Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24
at 5pm UTC time (if I did the math right, that should be evening for Vadim)
Amanjeev Sethi: i can try and do this but I am starting a new job that week so
cannot promise.
Santiago Pastorino:
Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 at 5pm
UTC time (if I did the math right, that should be evening for Vadim)
works perfect for me
Santiago Pastorino: @mark-i-m I have access to the compiler calendar so I can
add something there
Santiago Pastorino: let me know if you want to add an event to the calendar, I
can do that
Santiago Pastorino: how long it would be?
mark-i-m:
let me know if you want to add an event to the calendar, I can do that
mark-i-m: That could be good :+1:
mark-i-m:
how long it would be?
Let's start with 30 minutes, and if we need to schedule another we can
Vadim Petrochenkov:
5pm UTC
1-2 hours later would be better, 5pm UTC is not evening enough.
Vadim Petrochenkov: How exactly do you plan the meeting to go (aka how much do
I need to prepare)?
Santiago Pastorino:
5pm UTC
1-2 hours later would be better, 5pm UTC is not evening enough.
Scheduled for 7pm UTC then
Santiago Pastorino:
How exactly do you plan the meeting to go (aka how much do I need to
prepare)?
/cc @mark-i-m
mark-i-m: @Vadim Petrochenkov
How exactly do you plan the meeting to go (aka how much do I need to
prepare)?
My hope was that this could be less formal than for a compiler lecture series,
but it would be nice if you could have in your mind a tour of the design and
the code
That is, imagine that a new person was joining the compiler team and needed to
get up to speed about macros/expansion/hygiene. What would you tell such a
person?
mark-i-m: @Vadim Petrochenkov Are we still on for tomorrow at 7pm UTC?
Vadim Petrochenkov: Yes.
Santiago Pastorino: @Vadim Petrochenkov @mark-i-m I've added an event on rust
compiler team calendar
mark-i-m: @WG-learning @Vadim Petrochenkov Hello!
mark-i-m: We will be starting in ~7 minutes
mark-i-m: :wave:
Vadim Petrochenkov: I'm here.
mark-i-m: Cool :)
Santiago Pastorino: hello @Vadim Petrochenkov
mark-i-m: Shall we start?
mark-i-m: First off, @Vadim Petrochenkov Thanks for doing this!
Vadim Petrochenkov: Here's some preliminary data I prepared.
Vadim Petrochenkov: Below I'll assume #62771 and #62086 have landed.
Vadim Petrochenkov: Where to find the code:

- librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context)
- librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs
- librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness
- librustc_ast/config.rs - implementation of cfg/cfg_attr (they are treated specially compared to other macros), should probably be moved into librustc_ast/ext
- librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams
- librustc_ast/ext - various expansion-related stuff
- librustc_ast/ext/base.rs - basic structures used by expansion
- librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST
- librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST"; basically, a "placeholder" is a temporary AST node replaced with macro expansion result nodes
- librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days
- librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI
- librustc_ast/ext/tt - implementation of macro_rules, turns the macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens; @mark-i-m knows more about this
- librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors
- librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths"
Primary structures:

- HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context
- ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context)
- ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data
- SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds)
- SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways
- Span - a code location + SyntaxContext
- Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data
- TokenStream - a collection of TokenTrees
- TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{})
- SyntaxExtension - a lowered macro representation, contains its expander function transforming a tokenstream or AST into tokenstream or AST + some additional data like stability, or a list of unstable features allowed inside the macro
- SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc), this is an enum that lists them
- ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent)
- trait Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast)
- ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work
- AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, like e.g. a list of items)
- Annotatable - a piece of AST that can be an attribute target, almost the same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: merge into AstFragment)
- trait MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern etc.)
- Invocation/InvocationKind - a structure describing a macro call; these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc.

Primary algorithms / actions: TODO
mark-i-m: Very useful :+1:
mark-i-m: @Vadim Petrochenkov Zulip doesn't have an indication of typing, so
I'm not sure if you are waiting for me or not
Vadim Petrochenkov: The TODO part should be about how a crate transitions from
the state "macros exist as written in source" to "all macros are expanded", but
I didn't write it yet.
Vadim Petrochenkov: (That should probably better happen off-line.)
Vadim Petrochenkov: Now, if you have any questions?
mark-i-m: Thanks :)
mark-i-m: /me is still reading :P
mark-i-m: Ok
mark-i-m: So I guess my first question is about hygiene, since that remains the
most mysterious to me... My understanding is that the parser outputs AST nodes,
where each node has a Span
mark-i-m: In the absence of macros and desugaring, what does the syntax context
of an AST node look like?
mark-i-m: @Vadim Petrochenkov
Vadim Petrochenkov: Not each node, but many of them. When a node is not
macro-expanded, its context is 0.
Vadim Petrochenkov: aka SyntaxContext::empty()
Vadim Petrochenkov: it's a chain that consists of one expansion - expansion 0
aka ExpnId::root.
mark-i-m: Do all expansions start at root?
Vadim Petrochenkov: Also, SyntaxContext::empty() is its own father.
mark-i-m: Is this actually stored somewhere or is it a logical value?
Vadim Petrochenkov: All expansion hierarchies (there are several of them) start
at ExpnId::root.
Vadim Petrochenkov: Vectors in HygieneData have entries for both ctxt == 0 and
expn_id == 0.
Vadim Petrochenkov: I don't think anyone looks into them much though.
mark-i-m: Ok
Vadim Petrochenkov: Speaking of multiple hierarchies...
mark-i-m: Go ahead :)
Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ...
Vadim Petrochenkov: This is the order in which macros are expanded.
Vadim Petrochenkov: Well.
Vadim Petrochenkov: When we are expanding one macro another macro is revealed
in its output.
Vadim Petrochenkov: That's the parent-child relation in this hierarchy.
Vadim Petrochenkov: InternalExpnData::parent is the child->parent link.
mark-i-m: So in the above chain expn_id1 is the child?
Vadim Petrochenkov: Yes.
Vadim Petrochenkov: The second one is parent (SyntaxContext1) ->
parent(SyntaxContext2) -> ...
Vadim Petrochenkov: This is about nested macro definitions. When we are
expanding one macro another macro definition is revealed in its output.
Vadim Petrochenkov: SyntaxContextData::parent is the child->parent link here.
Vadim Petrochenkov: So, SyntaxContext is the whole chain in this hierarchy, and
outer_expns are individual elements in the chain.
mark-i-m: So for example, suppose I have the following:
macro_rules! foo { () => { println!(); } }
fn main() { foo!(); }
Then AST nodes that are finally generated would have parent(expn_id_println) ->
parent(expn_id_foo), right?
Vadim Petrochenkov: Pretty common construction (at least it was, before
refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means...
Vadim Petrochenkov:
Then AST nodes that are finally generated would have
parent(expn_id_println) -> parent(expn_id_foo), right?
Yes.
mark-i-m:
and outer_expns are individual elements in the chain.
Sorry, what is outer_expns?
Vadim Petrochenkov: SyntaxContextData::outer_expn
mark-i-m: Thanks :) Please continue
Vadim Petrochenkov: ...which means a token produced by a built-in macro (which
is defined in the root effectively).
mark-i-m: Where does the expn_id come from?
Vadim Petrochenkov: Or a stable proc macro, which are always considered to be
defined in the root because they are always cross-crate, and we don't have the
cross-crate hygiene implemented, ha-ha.
Vadim Petrochenkov:
Where does the expn_id come from?
Vadim Petrochenkov: ID of the built-in macro call like line!().
Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover
new macro calls.
mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in
macros receive continuous IDs?
Vadim Petrochenkov: So, the second hierarchy has a catch - the context
transplantation hack -
https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732.
Vadim Petrochenkov:
Do you mean that only built-in macros receive continuous IDs?
Vadim Petrochenkov: No, all macro calls receive ID.
Vadim Petrochenkov: Built-ins have the typical pattern
SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by
them.
mark-i-m: I see, but this pattern is only used for built-ins, right?
Vadim Petrochenkov: And also all stable proc macros, see the comments above.
mark-i-m: Got it
Vadim Petrochenkov: The third hierarchy is call-site hierarchy.
Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident
Vadim Petrochenkov: then hierarchy 1 is root -> foo -> bar -> ident
Vadim Petrochenkov: but hierarchy 3 is root -> ident
Vadim Petrochenkov: ExpnInfo::call_site is the child-parent link in this case.
mark-i-m: When we expand, do we expand foo first or bar? Why is there a
hierarchy 1 here? Is that foo expands first and it expands to something that
contains bar!(ident)?
Vadim Petrochenkov: Ah, yes, let's assume both foo and bar are identity macros.
Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand
-> ident
Vadim Petrochenkov: If bar were expanded first, that would be eager expansion -
https://github.com/rust-lang/rfcs/pull/2320.
mark-i-m: And after we expand only foo! presumably whatever intermediate state
has hierarchy 1 of root->foo->(bar_ident), right?
Vadim Petrochenkov: (We have it hacked into some built-in macros, but not
generally.)
Vadim Petrochenkov:
And after we expand only foo! presumably whatever intermediate state has
hierarchy 1 of root->foo->(bar_ident), right?
Vadim Petrochenkov: Yes.
mark-i-m: Got it :)
mark-i-m: It looks like we have ~5 minutes left. This has been very helpful
already, but I also have more questions. Shall we try to schedule another
meeting in the future?
Vadim Petrochenkov: Sure, why not.
Vadim Petrochenkov: A thread for offline questions-answers would be good too.
mark-i-m:
A thread for offline questions-answers would be good too.
I don't mind using this thread, since it already has a lot of info in it. We
also plan to summarize the info from this thread into the rustc-dev-guide.
Sure, why not.
Unfortunately, I'm unavailable for a few weeks. Would August 21-ish work for
you (and @WG-learning )?
mark-i-m: @Vadim Petrochenkov Thanks very much for your time and knowledge!
mark-i-m: One last question: are there more hierarchies?
Vadim Petrochenkov: Not that I know of. Three + the context transplantation
hack is already more complex than I'd like.
mark-i-m: Yes, one wonders what it would be like if one also had to think about
eager expansion...
Santiago Pastorino: sorry but I couldn't follow that much today, will read it
when I have some time later
Santiago Pastorino: btw https://github.com/rust-lang/rustc-dev-guide/issues/398
mark-i-m: @Vadim Petrochenkov Would 7pm UTC on August 21 work for a followup?
Vadim Petrochenkov: Tentatively yes.
mark-i-m: @Vadim Petrochenkov @WG-learning Does this still work for everyone?
Vadim Petrochenkov: August 21 is still ok.
mark-i-m: @WG-learning @Vadim Petrochenkov We will start in ~30min
Vadim Petrochenkov: Oh. Thanks for the reminder, I forgot about this entirely.
mark-i-m: Hello!
Vadim Petrochenkov: (I'll be here in a couple of minutes.)
Vadim Petrochenkov: Ok, I'm here.
mark-i-m: Hi :)
Vadim Petrochenkov: Hi.
mark-i-m: so last time, we talked about the 3 context hierarchies
Vadim Petrochenkov: Right.
mark-i-m: Was there anything you wanted to add to that? If not, I think it
would be good to get a big-picture... Given some piece of rust code, how do we
get to the point where things are expanded and hygiene context is computed?
mark-i-m: (I'm assuming that hygiene info is computed as we expand stuff, since
I don't think you can discover it beforehand)
Vadim Petrochenkov: Ok, let's move from hygiene to expansion.
Vadim Petrochenkov: Especially given that I don't remember the specific hygiene
algorithms like adjust in detail.
Vadim Petrochenkov:
Given some piece of rust code, how do we get to the point where things are
expanded
So, first of all, the "some piece of rust code" is the whole crate.
mark-i-m: Just to confirm, the algorithms are well-encapsulated, right? Like a
function or a struct as opposed to a bunch of conventions distributed across
the codebase?
Vadim Petrochenkov: We run fully_expand_fragment in it.
Vadim Petrochenkov:
Just to confirm, the algorithms are well-encapsulated, right?
Yes, the algorithmic parts are entirely inside hygiene.rs.
Vadim Petrochenkov: Ok, some are in fn resolve_crate_root, but those are hacks.
Vadim Petrochenkov: (Continuing about expansion.) If fully_expand_fragment is
run not on a whole crate, it means that we are performing eager expansion.
Vadim Petrochenkov: Eager expansion is done for arguments of some built-in
macros that expect literals.
Vadim Petrochenkov: It generally performs a subset of actions performed by the
non-eager expansion.
Vadim Petrochenkov: So, I'll talk about non-eager expansion for now.
mark-i-m: Eager expansion is not exposed as a language feature, right? i.e. it
is not possible for me to write an eager macro?
Vadim Petrochenkov:
https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 (vvv The
link is explained below vvv )
Vadim Petrochenkov:
Eager expansion is not exposed as a language feature, right? i.e. it is not
possible for me to write an eager macro?
Yes, it's entirely an ability of some built-in macros.
Vadim Petrochenkov: Not exposed for general use.
Vadim Petrochenkov: fully_expand_fragment works in iterations.
Vadim Petrochenkov: Iterations look roughly like this:
- Resolve imports in our partially built crate as much as possible.
- Collect as many macro invocations as possible from our partially built crate
(fn-like, attributes, derives) from the crate and add them to the queue.
Vadim Petrochenkov: Take a macro from the queue, and attempt to resolve it.
Vadim Petrochenkov: If it's resolved - run its expander function that
consumes tokens or AST and produces tokens or AST (depending on the macro
kind).
Vadim Petrochenkov: (If it's not resolved, then put it back into the
queue.)
Vadim Petrochenkov: ^^^ That's where we fill in the hygiene data associated
with ExpnIds.
mark-i-m: When we put it back in the queue?
mark-i-m: or do you mean the collect step in general?
Vadim Petrochenkov: Once we resolved the macro call to the macro definition we
know everything about the macro and can call set_expn_data to fill in its
properties in the global data.
Vadim Petrochenkov: I mean, immediately after successful resolution.
Vadim Petrochenkov: That's the first part of hygiene data, the second one is
associated with SyntaxContext rather than with ExpnId, it's filled in later
during expansion.
Vadim Petrochenkov: So, after we run the macro's expander function and got a
piece of AST (or got tokens and parsed them into a piece of AST) we need to
integrate that piece of AST into the big existing partially built AST.
Vadim Petrochenkov: This integration is a really important step where the next
things happen:
- NodeIds are assigned.
Vadim Petrochenkov: "def paths"s and their IDs (DefIds) are created
Vadim Petrochenkov: Names are put into modules from the resolver point of
view.
Vadim Petrochenkov: So, we are basically turning some vague token-like mass
into a proper, set-in-stone hierarchical AST and side tables.
Vadim Petrochenkov: Where exactly this happens - NodeIds are assigned by
InvocationCollector (which also collects new macro calls from this new AST
piece and adds them to the queue), DefIds are created by DefCollector, and
modules are filled by BuildReducedGraphVisitor.
Vadim Petrochenkov: These three passes run one after another on every AST
fragment freshly expanded from a macro.
Vadim Petrochenkov: After expanding a single macro and integrating its output
we again try to resolve all imports in the crate, and then return to the big
queue processing loop and pick up the next macro.
Vadim Petrochenkov: Repeat until there's no more macros. Vadim Petrochenkov:
mark-i-m: The integration step is where we would get parser errors too right?
mark-i-m: Also, when do we know definitively that resolution has failed for
particular ident?
Vadim Petrochenkov:
The integration step is where we would get parser errors too right?
Yes, if the macro produced tokens (rather than AST directly) and we had to
parse them.
Vadim Petrochenkov:
when do we know definitively that resolution has failed for particular
ident?
So, ident is looked up in a number of scopes during resolution. From closest
like the current block or module, to far away like preludes or built-in types.
Vadim Petrochenkov: If lookup is certainly failed in all of the scopes, then
it's certainly failed.
mark-i-m: This is after all expansions and integrations are done, right?
Vadim Petrochenkov: "Certainly" is determined differently for different scopes,
e.g. for a module scope it means no unexpanded macros and no unresolved glob
imports in that module.
Vadim Petrochenkov:
This is after all expansions and integrations are done, right?
For macro and import names this happens during expansions and integrations.
mark-i-m: Makes sense
Vadim Petrochenkov: For all other names we certainly know whether a name is
resolved successfully or not on the first attempt, because no new names can
appear.
Vadim Petrochenkov: (They are resolved in a later pass, see
librustc_resolve/late.rs.)
mark-i-m: And if at the end of the iteration, there are still things in the
queue that can't be resolve, this represents an error, right?
mark-i-m: i.e. an undefined macro?
Vadim Petrochenkov: Yes, if we make no progress during an iteration, then we
are stuck and that state represents an error.
Vadim Petrochenkov: We attempt to recover though, using dummies expanding into
nothing or ExprKind::Err or something like that for unresolved macros.
mark-i-m: This is for the purposes of diagnostics, though, right?
Vadim Petrochenkov: But if we are going through recovery, then compilation must
result in an error anyway.
Vadim Petrochenkov: Yes, that's for diagnostics; without recovery we would get
stuck at the first unresolved macro or import.
Vadim Petrochenkov: So, about the SyntaxContext hygiene...
Vadim Petrochenkov: New syntax contexts are created during macro expansion.
Vadim Petrochenkov: If the token had context X before being produced by a
macro, e.g. here ident has context SyntaxContext::root():
macro m() { ident }
Vadim Petrochenkov: , then after being produced by the macro it has context X
-> macro_id.
Vadim Petrochenkov: I.e. our ident has context ROOT -> id(m) after it's
produced by m.
Vadim Petrochenkov: The "chaining operator" -> is apply_mark in compiler code.
Vadim Petrochenkov:
macro m() { macro n() { ident } }
Vadim Petrochenkov: In this example the ident has context ROOT originally, then
ROOT -> id(m), then ROOT -> id(m) -> id(n).
Vadim Petrochenkov: Note that these chains are not entirely determined by their
last element, in other words ExpnId is not isomorphic to SyntaxCtxt.
Vadim Petrochenkov: Counterexample:
macro m($i: ident) { macro n() { ($i, bar) } }
m!(foo);
Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT ->
id(m) -> id(n) after all the expansions.
mark-i-m: Cool :)
mark-i-m: It looks like we are out of time
mark-i-m: Is there anything you wanted to add?
mark-i-m: We can schedule another meeting if you would like
Vadim Petrochenkov: Yep, 23.06 already. No, I think this is an ok point to
stop.
mark-i-m: :+1:
mark-i-m: Thanks @Vadim Petrochenkov ! This was very helpful
Vadim Petrochenkov: Yeah, we can schedule another one. So far it's been like 1
hour of meetings per month? Certainly not a big burden.
Name resolution
Basics
In our programs we can refer to variables, types, functions, etc, by giving them a name. These names are not always unique. For example, take this valid Rust program:
type x = u32;
let x: x = 1;
let y: x = 2;
How do we know on line 3 whether x
is a type (u32) or a value (1)? These
conflicts are resolved during name resolution. In this specific case, name
resolution defines that type names and variable names live in separate
namespaces and therefore can co-exist.
The name resolution in Rust is a two-phase process. In the first phase, which runs
during macro expansion, we build a tree of modules and resolve imports. Macro
expansion and name resolution communicate with each other via the
Resolver
trait.
The input to the second phase is the syntax tree, produced by parsing input files and expanding macros. This phase produces links from all the names in the source to relevant places where the name was introduced. It also generates helpful error messages, like typo suggestions, traits to import or lints about unused items.
A successful run of the second phase (Resolver::resolve_crate) creates a kind
of index that the rest of the compilation may use to ask about the present names
(through the hir::lowering::Resolver interface).
The name resolution lives in the librustc_resolve
crate, with the meat in
lib.rs
and some helpers or symbol-type specific logic in the other modules.
Namespaces
Different kinds of symbols live in different namespaces ‒ e.g. types don't clash with variables. This usually doesn't happen in practice, because variables start with a lower-case letter while types start with an upper-case one, but this is only a convention. This is legal Rust code that'll compile (with warnings):
type x = u32;
let x: x = 1;
let y: x = 2; // See? x is still a type here.
To cope with this, and with slightly different scoping rules for these namespaces, the resolver keeps them separated and builds separate structures for them.
In other words, when the code talks about namespaces, it doesn't mean the module hierarchy, it's types vs. values vs. macros.
Scopes and ribs
A name is visible only in certain area in the source code. This forms a hierarchical structure, but not necessarily a simple one ‒ if one scope is part of another, it doesn't mean the name visible in the outer one is also visible in the inner one, or that it refers to the same thing.
To cope with that, the compiler introduces the concept of Ribs. A rib is an abstraction of a scope. Every time the set of visible names potentially changes, a new rib is pushed onto a stack. The places where this can happen include for example:
- The obvious places ‒ curly braces enclosing a block, function boundaries, modules.
- Introducing a let binding ‒ this can shadow another binding with the same name.
- Macro expansion border ‒ to cope with macro hygiene.
When searching for a name, the stack of ribs is traversed from the innermost outwards. This helps to find the closest meaning of the name (the one not shadowed by anything else). The transition to outer rib may also change the rules what names are usable ‒ if there are nested functions (not closures), the inner one can't access parameters and local bindings of the outer one, even though they should be visible by ordinary scoping rules. An example:
fn do_something<T: Default>(val: T) { // <- New rib in both types and values (1)
    // `val` is accessible, as is the helper function
    // `T` is accessible
    let helper = || { // New rib on `helper` (2) and another on the block (3)
        // `val` is accessible here
    }; // End of (3)
    // `val` is accessible, `helper` variable shadows `helper` function
    fn helper() { // <- New rib in both types and values (4)
        // `val` is not accessible here, (4) is not transparent for locals
        // `T` is not accessible here
    } // End of (4)
    let val = T::default(); // New rib (5)
    // `val` is the variable, not the parameter here
} // End of (5), (2) and (1)
Because the rules for different namespaces are a bit different, each namespace has its own independent rib stack that is constructed in parallel to the others. In addition, there's also a rib stack for local labels (e.g. names of loops or blocks), which isn't a full namespace in its own right.
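To make the search order concrete, here is a minimal, self-contained sketch of innermost-outwards lookup over a rib stack. The Rib type here is invented for illustration and is much simpler than the real one in librustc_resolve:

use std::collections::HashMap;

// One rib per scope; maps a name to some resolution (a plain u32 here).
struct Rib<'a> {
    bindings: HashMap<&'a str, u32>,
}

// Walk the stack from the innermost rib outwards; the first hit wins,
// which is exactly how shadowing falls out of the search order.
fn resolve(ribs: &[Rib<'_>], name: &str) -> Option<u32> {
    ribs.iter().rev().find_map(|rib| rib.bindings.get(name).copied())
}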
Overall strategy
To perform the name resolution of the whole crate, the syntax tree is traversed top-down and every encountered name is resolved. This works for most kinds of names, because at the point of use of a name it is already introduced in the Rib hierarchy.
There are some exceptions to this. Items are a bit tricky, because they can be used even before they are encountered ‒ therefore every block needs to be first scanned for items to fill in its Rib.
Other, even more problematic ones, are imports, which need recursive fixed-point resolution, and macros, which need to be resolved and expanded before the rest of the code can be processed.
Therefore, the resolution is performed in multiple stages.
TODO:
This is a result of the first pass of learning the code. It is definitely incomplete and not detailed enough. It also might be inaccurate in places. Still, it probably provides a useful first guidepost to what happens in there.
- What exactly does it link to and how is that published and consumed by following stages of compilation?
- Who calls it and how it is actually used.
- Is it a pass and then the result is only used, or can it be computed incrementally (e.g. for RLS)?
- The overall strategy description is a bit vague.
- Where does the name Rib come from?
- Does this thing have its own tests, or is it tested only as part of some e2e testing?
AST Validation
AST validation is the process of checking various correctness properties about the AST after macro expansion.
TODO: write this chapter.
Feature Gate Checking
TODO: this chapter
HIR
The HIR – "High-Level Intermediate Representation" – is the primary IR used in most of rustc.
It is a compiler-friendly representation of the abstract syntax tree (AST) that is generated after parsing, macro expansion, and name resolution (see Lowering for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away.
For example, for loops are converted into a loop, so for does not appear in the HIR. This makes HIR more amenable to analysis than a normal AST.
This chapter covers the main concepts of the HIR.
You can view the HIR representation of your code by passing the -Zunpretty=hir-tree flag to rustc:
cargo rustc -- -Zunpretty=hir-tree
Out-of-band storage and the Crate type
The top-level data structure in the HIR is the Crate, which stores the contents of the crate currently being compiled (we only ever construct HIR for the current crate).
Whereas in the AST the crate data structure basically just contains the root module, the HIR Crate structure contains a number of maps and other things that serve to organize the content of the crate for easier access.
For example, the contents of individual items (e.g. modules, functions, traits, impls, etc) in the HIR are not immediately accessible in the parents.
So, for example, if there is a module item foo containing a function bar():

mod foo {
    fn bar() { }
}

then in the HIR the representation of module foo (the Mod struct) would only have the ItemId I of bar().
To get the details of the function bar(), we would look up I in the items map.
One nice result of this representation is that one can iterate over all items in the crate by iterating over the key/value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).
使用这种表示形式的另一个原因是为了更好地与增量编译集成。
这样,如果您想要访问&rustc_hir::Item
(例如modfoo
),则不能立即访问函数bar()
的内容。
相反,您只能访问bar()
的id,并且必须调用要求id作为参数的某些函数来查找bar
的内容。 这使编译器有机会观察到您访问了bar()
的数据,然后记录依赖。
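As a rough mental model of the layout described above, consider the following conceptual sketch only; the real definitions in rustc_hir have many more maps and richer types:

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32); // stand-in for the real id type
struct Item;        // stand-in for the full item data

struct Crate {
    // Item contents are stored out-of-band, keyed by id.
    items: std::collections::BTreeMap<ItemId, Item>,
    // ...plus similar maps for trait items, impl items, bodies, etc.
}

struct Mod {
    // `mod foo` records only the ids of its items, e.g. the ItemId of `bar()`.
    item_ids: Vec<ItemId>,
}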
Identifiers in the HIR
Most of the code that has to deal with things in HIR tends not to carry around references to the HIR, but rather to carry around identifier numbers (or "ids"). Right now, you will find four sorts of identifiers in active use:
- DefId, which primarily names "definitions" or top-level items.
- HirId, which combines the index of a particular item with an offset within that item.
- BodyId, which refers to the actual contents of some item in the crate (the definition of a function or constant). It is currently effectively a "newtype'd" HirId.
- NodeId, which is an absolute id that identifies a single node in the HIR tree.
  - While these are still in common use, they are being slowly phased out.
  - Since they are absolute within the crate, adding a new node anywhere in the tree causes the NodeIds of all subsequent code in the crate to change. As you can imagine, this is terrible for incremental compilation.
We also have an internal map to go from a DefId to what's called a "Def path". A "Def path" is like a module path, but a bit richer. For example, it may be crate::foo::MyStruct, which uniquely identifies this definition.
It's a bit different than a module path because it might include a type parameter T, e.g. crate::foo::MyStruct::T, which you can't write in normal Rust.
These are used in incremental compilation.
The HIR Map
Most of the time when you are working with the HIR, you will do so via the HIR Map, accessible in the tcx via tcx.hir_map (and defined in the hir::map module).
The HIR map contains a number of methods to convert between IDs of various kinds and to look up data associated with an HIR node.
For example, if you have a DefId, and you would like to convert it to a NodeId, you can use tcx.hir.as_local_node_id(def_id).
This returns an Option<NodeId> – it will be None if the def-id refers to something outside of the current crate (since such things have no HIR node);
otherwise, it returns Some(n), where n is the node-id of the definition.
Similarly, you can use tcx.hir.find(n) to look up the node for a NodeId.
This returns an Option<Node<'tcx>>, where Node is an enum defined in the map.
By matching on this, you can find out what sort of node the node-id referred to and also get a pointer to the data itself.
Often, you know what sort of node n is – e.g. if you know that n must be some HIR expression,
you can do tcx.hir.expect_expr(n), which will extract and return the &hir::Expr, panicking if n is not in fact an expression.
Finally, you can use the HIR map to find the parent of a node, via calls like tcx.hir.get_parent_node(n).
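Putting those methods together, a typical lookup looks roughly like this (a sketch using the method names given above; exact paths and signatures drift between compiler versions):

fn inspect(tcx: TyCtxt<'_>, def_id: DefId) {
    // DefId -> NodeId; `None` means the def-id is from another crate.
    if let Some(node_id) = tcx.hir.as_local_node_id(def_id) {
        // NodeId -> Node: match on the Node enum to see what this is.
        if let Some(node) = tcx.hir.find(node_id) {
            // ... match on `node` ...
        }
        // If we already know it's an expression, this panics otherwise:
        // let expr = tcx.hir.expect_expr(node_id);
        let _parent = tcx.hir.get_parent_node(node_id);
    }
}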
HIR Bodies
A rustc_hir::Body represents some kind of executable code, such as the body of a function/closure or the definition of a constant.
Bodies are associated with an owner, which is typically some kind of item (e.g. an fn() or const), but could also be a closure expression (e.g. |x, y| x + y).
You can use the HIR map to find the body associated with a given def-id (maybe_body_owned_by) or to find the owner of a body (body_owner_def_id).
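For example (method names as given above; call shapes approximate):

// From an id to its body, and from the body back to its owner.
if let Some(body_id) = tcx.hir.maybe_body_owned_by(id) {
    let owner_def_id = tcx.hir.body_owner_def_id(body_id);
}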
Lowering
The lowering step converts the AST to HIR. This means many structures are removed if they are irrelevant for type analysis or similar syntax-agnostic analyses. Examples of such structures include but are not limited to
- Parentheses
  - Removed without replacement, the tree structure itself makes the order of operations explicit
- for loops and while (let) loops
  - Converted to loop + match and some let bindings
- if let
  - Converted to match
- Universal impl Trait
  - Converted to generic arguments (but with a flag to mark that these arguments were not written by the user)
- Existential impl Trait
  - Converted to a virtual existential type declaration
Lowering needs to uphold several invariants in order to not trigger the sanity checks in src/librustc_middle/hir/map/hir_id_validator.rs:
- A HirId must be used if created. So if you use lower_node_id, you must use the resulting NodeId or HirId (either is fine, since any NodeIds in the HIR are checked for existing HirIds)
- Lowering a HirId must be done in the scope of the owning item. This means you need to use with_hir_id_owner if you are creating parts of an item other than the one being currently lowered. This happens for example during the lowering of existential impl Trait.
- A NodeId that will be placed into a HIR structure must be lowered, even if its HirId is unused. Calling let _ = self.lower_node_id(node_id); is perfectly legitimate.
- If you are creating new nodes that didn't exist in the AST, you must create new ids for them. This is done by calling the next_id method, which produces both a new NodeId and automatically lowers it for you so you also get the HirId.
If you are creating new DefIds, since each DefId needs to have a corresponding NodeId, it is advisable to add these NodeIds to the AST so you don't have to generate new DefIds during lowering.
This has the advantage of creating a way to find the DefId of something via its NodeId.
If lowering needs this DefId in multiple places, you can't generate a new NodeId in all those places, because you'd also get a new DefId then.
With a NodeId from the AST this is not an issue.
Having the NodeId also allows the DefCollector to generate the DefIds instead of lowering having to do it on the fly. Centralizing the DefId generation in one place makes it easier to refactor and to reason about.
HIR Debugging
The -Zunpretty=hir-tree
flag will dump out the HIR.
If you are trying to correlate NodeId
s or DefId
s with source code, the
--pretty expanded,identified
flag may be useful.
TODO: anything else?
MIR (Mid-level IR)
MIR is Rust's Mid-level Intermediate Representation. It was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation. If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.
Introduction to MIR
MIR is defined in the src/librustc_middle/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.
Some of the key characteristics of MIR are:
- It is based on a control-flow graph.
- It does not have nested expressions.
- All types in MIR are fully explicit.
Key MIR vocabulary
This section introduces the key concepts of MIR, summarized here:
- Basic blocks: units of the control-flow graph, consisting of:
  - statements: actions with one successor
  - terminators: actions with potentially multiple successors; always at the end of a block
  - (if you're not familiar with the term basic block, see the background chapter)
- Locals: memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries.
  These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
- Places: expressions that identify a location in memory, like _1 or _1.f.
- Rvalues: expressions that produce a value. The "R" means these expressions generally appear only on the "right-hand side" of an assignment.
  - Operands: the arguments to an rvalue, which can either be a constant (like 22) or a place (like _1).
You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty-printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into play (or clicking on this link), and then clicking the "MIR" button on the top:
fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}
You should see something like:
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
...
}
This is the MIR format for the main function.
Variable declarations. If we drill in a bit, we'll see that the function begins with a bunch of variable declarations. They look like this:
let mut _0: (); // return place
let mut _1: std::vec::Vec<i32>; // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;
You see that variables in MIR don't have names, they have indices, like _0 or _1.
We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3).
You can still tell apart user-defined variables, though, because they have debuginfo associated with them (see below).
User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:
scope 1 {
debug vec => _1; // in scope 1 at src/main.rs:2:9: 2:16
}
Each debug <Name> => <Place>; annotation describes a named user variable and where (i.e. the place) a debugger can find the data of that variable.
Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place.
Additionally, closure captures are described using the same system, and so they are complicated even without optimizations, e.g.: debug x => (*((*_1).0: &T));.
The "scope" blocks (e.g., scope 1 {..}) describe the lexical structure of the source program (which names were in scope when),
so any part of the program annotated with // in scope 0 would have no vec in scope, as you can observe in a debugger while stepping through the code.
Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am ignoring some of the comments):
bb0: {
StorageLive(_1);
_1 = const <std::vec::Vec<T>>::new() -> bb2;
}
A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:
StorageLive(_1);
This statement indicates that the variable _1 is "live", meaning that it may be used later – this will persist until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used.
These "storage statements" are used by LLVM to allocate stack space.
The terminator of the block bb0 is the call to Vec::new:
_1 = const <std::vec::Vec<T>>::new() -> bb2;
Terminators are different from statements because they can have more than one successor – that is, control may flow to different places.
Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that unwinding is in fact not possible, and hence we list only one successor block, bb2.
If we look ahead to bb2, we will see it looks like this:
bb2: {
StorageLive(_3);
_3 = &mut _1;
_2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}
Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:
_3 = &mut _1;
Assignments in general have the form:
<Place> = <Rvalue>
A place is an expression like _3, _3.f or *_3 – it denotes a location in memory.
An Rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>.
So we can kind of define a grammar for rvalues like so:
<Rvalue> = & (mut)? <Place>
| <Operand> + <Operand>
| <Operand> - <Operand>
| ...
<Operand> = Constant
| copy Place
| move Place
As you can see from this grammar, rvalues cannot be nested – they can only reference places and constants.
Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type).
So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:
TMP1 = a + b
x = TMP1 + c
(Try it and see, though you may want to do release mode to skip over the overflow checks.)
MIR data types
The MIR data types are defined in the src/librustc_middle/mir/ module.
Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.
The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", but you can read about those below).
- Basic blocks: the basic blocks are stored in the basic_blocks field; this is a vector of BasicBlockData structures. We never reference a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
- Statements are represented by the type Statement.
- Terminators are represented by the Terminator.
- Locals are represented by a newtype'd index type Local. The data for a local variable is found in the Mir (the local_decls vector). There is also a special constant RETURN_PLACE identifying the special "local" representing the return value.
- Places are identified by the enum Place. There are a few variants:
  - Local variables like _1
  - Static variables like FOO
  - Projections, which are fields or other things that "project out" from a base place.
    For example, _1.f is a projection from _1. *_1 is also a projection, represented by the ProjectionElem::Deref element.
- Rvalues are represented by the enum Rvalue.
- Operands are represented by the enum Operand.
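As a small illustration of how these types fit together, here is a sketch that counts the statements in a function's MIR. Field names follow the descriptions above; the exact accessors differ between compiler versions:

fn count_statements(mir: &Mir<'_>) -> usize {
    // `basic_blocks` is a vector of BasicBlockData, indexed by BasicBlock.
    mir.basic_blocks
        .iter()
        .map(|block| block.statements.len())
        .sum()
}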
Representing constants
to be written
Promoted constants
to be written
HAIR and MIR construction
The lowering of HIR to MIR occurs for the following (probably incomplete) list of items:
- Function and closure bodies
- Initializers of static and const items
- Initializers of enum discriminants
- Glue and shims of any kind
- Tuple struct initializer functions
- Drop code (the Drop::drop function is not called directly)
- Drop implementations of types without an explicit Drop implementation
The lowering is triggered by calling the mir_built query.
There is an intermediate representation between HIR and MIR called the HAIR that is only used during lowering.
The HAIR's most important feature is that the various adjustments (which happen without explicit syntax) like coercions, autoderef, autoref and overloaded method calls have become explicit casts, deref operations, reference expressions and concrete function calls.
The HAIR has datatypes that mirror the HIR datatypes, but instead of -x being a
hair::ExprKind::Neg(hair::Expr), it is a
hair::ExprKind::Neg(hir::Expr).
This shallowness enables the HAIR to represent all datatypes that HIR has, but without having to create an in-memory copy of the entire HIR.
MIR lowering will first convert the topmost expression from HIR to HAIR (in rustc_mir_build::hair::cx::expr) and then process the HAIR expressions recursively.
The lowering creates local variables for every argument as specified in the signature.
Next, it creates local variables for every binding specified (e.g., (a, b): (i32, String)) produces 3 bindings, one for the argument, and two for the bindings.
Next, it generates field accesses that read the fields from the argument and write their values to the binding variables.
With this initialization out of the way, the lowering recursively generates MIR for the body of the function (a Block expression) and writes the result into RETURN_PLACE.
unpack! all the things
Functions that generate MIR tend to fall into one of two patterns. First, if the function generates only statements, then it will take a basic block as argument onto which those statements should be appended. It can then return a result as normal:
fn generate_some_mir(&mut self, block: BasicBlock) -> ResultType {
...
}
But there are other functions that produce new basic blocks as well.
For example, lowering an expression like if foo { 22 } else { 44 } requires generating a small "diamond-shaped graph".
In this case, the function takes a basic block where its code starts and returns a (potentially) new basic block where the code generation ends.
The BlockAnd type is used to represent this:
fn generate_more_mir(&mut self, block: BasicBlock) -> BlockAnd<ResultType> {
...
}
When you invoke these functions, it is common to have a local variable block that is effectively a "cursor". It represents the point at which we are adding new MIR.
When you invoke generate_more_mir, you want to update this cursor.
You can do this manually, but it's tedious:
let mut block;
let v = match self.generate_more_mir(..) {
BlockAnd { block: new_block, value: v } => {
block = new_block;
v
}
};
For this reason, we offer a macro that lets you write
let v = unpack!(block = self.generate_more_mir(...)).
It simply extracts the new block and overwrites the variable block that you named in the unpack!.
Lowering expressions into MIR
An expression can essentially be lowered in one of four ways:
- A Place refers to a (or part of a) preexisting memory location (local, static, or promoted)
- An Rvalue is something that can be assigned to a Place
- An Operand is an argument to e.g. a + operation or a function call
- A temporary variable containing a copy of a value
The following image depicts a general overview of the interactions between the representations:
We start out by lowering the function body to an Rvalue so we can create an assignment to RETURN_PLACE.
This Rvalue lowering will in turn trigger lowering of its arguments to Operands (if any).
Lowering an Operand either produces a const operand, or moves/copies out of a Place, thus triggering a Place lowering.
An expression being lowered to a Place can trigger the creation of a temporary variable if the expression being lowered contains operations.
This is where the snake bites its own tail, and we need to trigger an Rvalue lowering for the expression to be written into the local.
Operator lowering
Operators on builtin types are not lowered to function calls (which would end up being infinite recursion, because the trait impls contain the operation itself). Instead, there are Rvalues for binary and unary operators and index operations.
These Rvalues later get codegened to llvm primitive operations or llvm intrinsics.
Operators on all other types get lowered to a function call to the impl of the operator's corresponding trait.
Regardless of the lowering kind, the arguments to the operator are lowered to Operands.
This means all arguments are either constants, or refer to an already existing value somewhere in a local or static.
Method call lowering
Method calls are lowered to the same TerminatorKind that function calls are.
In MIR there is no difference between method calls and function calls anymore.
Conditions
if conditions and match statements over enums without variants with fields are lowered to TerminatorKind::SwitchInt.
Each possible value (so 0 and 1 for if conditions) has a corresponding BasicBlock to which the code continues.
The argument being branched on is the Operand representing the if condition's value.
Pattern matching
match statements over enums with variants that have fields are lowered to TerminatorKind::SwitchInt, too, but the Operand is a Place where the discriminant of the value can be found.
This often involves reading the discriminant into a new temporary variable.
Aggregate construction
Aggregate values of any kind (e.g. structs or tuples) are built via Rvalue::Aggregate.
All fields are lowered to Operands.
This is essentially equivalent to one assignment statement per aggregate field, plus an assignment to the discriminant in the case of enums.
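For a tiny illustration, lowering let p = Point { x: a, y: b }; where a and b live in _1 and _2 ends up as a single aggregate assignment (pretty-printed form approximate):

_3 = Point { x: move _1, y: move _2 };  // Rvalue::Aggregate, fields as Operands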
The MIR Visitor
The MIR visitor is a convenient tool for traversing the MIR and either looking for things or making changes to it.
The visitor traits are defined in the rustc::mir::visit module
– there are two of them, generated via a single macro: Visitor (which operates on a &Mir and gives back shared references) and MutVisitor (which operates on a &mut Mir and gives back mutable references).
To implement a visitor, you have to create a type that represents your visitor. Typically, this type wants to "hang on" to whatever state you will need while processing MIR:
struct MyVisitor<...> {
tcx: TyCtxt<'tcx>,
...
}
You then implement the Visitor or MutVisitor trait for that type:
impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
fn visit_foo(&mut self, ...) {
...
self.super_foo(...);
}
}
As shown above, within the impl, you can override any of the visit_foo methods (e.g., visit_terminator) in order to write some code that will execute whenever a foo is found.
If you want to recursively walk the contents of the foo, you then invoke the super_foo method.
(NB. You should never override super_foo.)
A very simple example of a visitor can be found in NoLandingPads. That visitor doesn't even require any state: it just visits all terminators and removes their unwind successors.
Traversal
In addition to the visitor, the rustc::mir::traversal module
contains useful functions for walking the MIR CFG in
different standard orders (e.g. pre-order, reverse
post-order, and so forth).
MIR passes
If you would like to get the MIR for a function (or constant, etc), you can use the optimized_mir(def_id) query.
This will give you back the final, optimized MIR. For foreign def-ids, we simply read the MIR from the other crate's metadata.
But for local def-ids, the query will construct the MIR and then iteratively optimize it by applying a series of passes. This section describes how those passes work and how you can extend them.
To produce the optimized_mir(D) for a given def-id D, the MIR passes through several suites of optimizations, each represented by a query.
Each suite consists of multiple optimizations and transformations.
These suites represent useful intermediate points where we want to access the MIR for type checking or other purposes:
- mir_build(D) – not a query, but this constructs the initial MIR
- mir_const(D) – applies some simple transformations to make MIR ready for constant evaluation;
- mir_validated(D) – applies some more transformations, making MIR ready for borrow checking;
- optimized_mir(D) – the final state, after all optimizations have been performed.
Implementing and registering a pass
A MirPass is some bit of code that processes the MIR, typically – but not always – transforming it along the way somehow. For example, it might perform an optimization.
The MirPass trait itself is found in the rustc_mir::transform module, and it basically consists of one method, run_pass,
that simply gets an &mut Mir (along with the tcx and some information about where it came from).
The MIR is therefore modified in place (which helps to keep things efficient).
A good example of a basic MIR pass is NoLandingPads, which walks the MIR and removes all edges that are due to unwinding – this is used when configured with panic=abort, which never unwinds.
As you can see from its source, a MIR pass is defined by first defining a dummy type, a struct with no fields, something like:

struct MyPass;
You can then implement the MirPass trait for it, as sketched below.
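The following is a sketch only; the exact run_pass signature has changed across compiler versions, so check rustc_mir::transform for the current one:

impl<'tcx> MirPass<'tcx> for MyPass {
    fn run_pass(&self, tcx: TyCtxt<'tcx>, src: MirSource<'tcx>, mir: &mut Mir<'tcx>) {
        // Transform `mir` in place here, e.g. by walking it with a MutVisitor.
    }
}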
You can then insert this pass into the appropriate list of passes found in a query like optimized_mir, mir_validated, etc.
(If this is an optimization, it should go into the optimized_mir list.)
If you are writing a pass, there's a good chance that you are going to want to use a MIR visitor. MIR visitors are a handy way to walk all the parts of the MIR, either to search for something or to make small edits.
Stealing
The intermediate queries mir_const() and mir_validated() yield up a &'tcx Steal<Mir<'tcx>>, allocated using tcx.alloc_steal_mir().
This indicates that the result may be stolen by the next suite of optimizations – this is an optimization to avoid cloning the MIR.
Attempting to use a stolen result will cause a panic in the compiler.
Therefore, it is important that you do not read directly from these intermediate queries except as part of the MIR processing pipeline.
Because of this stealing mechanism, some care must also be taken to ensure that, before the MIR at a particular phase in the processing pipeline is stolen, anyone who may want to read from it has already done so.
Concretely, this means that if you have some query foo(D) that wants to access the result of mir_const(D) or mir_validated(D),
you need to have the successor pass "force" foo(D) using ty::queries::foo::force(...).
This will force a query to execute even though you don't directly require its result.
As an example, consider MIR const qualification.
It wants to read the result produced by the mir_const() suite.
However, that result will be stolen by the mir_validated() suite.
If nothing was done, then mir_const_qualif(D) would succeed if it came before mir_validated(D), and fail otherwise.
Therefore, mir_validated(D) will force mir_const_qualif before it actually steals,
thus ensuring that the reads have already happened (remember that queries are memoized, so executing a query twice simply loads from a cache the second time):
mir_const(D) --read-by--> mir_const_qualif(D)
| ^
stolen-by |
| (forces)
v |
mir_validated(D) ------------+
This mechanism is a bit dodgy. There is a discussion of more elegant alternatives in rust-lang/rust#41710.
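In code, the forcing pattern described above looks roughly like this (the query plumbing is approximate and version-dependent):

// Inside the `mir_validated(D)` suite, before stealing:
ty::queries::mir_const_qualif::force(tcx, DUMMY_SP, def_id); // ensure the read happened
let mir = tcx.mir_const(def_id).steal();                     // now it is safe to steal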
Closure Expansion in rustc
This section describes how rustc handles closures. Closures in Rust are
effectively "desugared" into structs that contain the values they use (or
references to the values they use) from their creator's stack frame. rustc has
the job of figuring out which values a closure uses and how, so it can decide
whether to capture a given variable by shared reference, mutable reference, or
by move. rustc also has to figure out which of the closure traits (Fn
,
FnMut
, or FnOnce
) a closure is capable of
implementing.
Let's start with a few examples:
Example 1
To start, let's take a look at how the closure in the following example is desugared:
fn closure(f: impl Fn()) {
    f();
}

fn main() {
    let x: i32 = 10;
    closure(|| println!("Hi {}", x));  // The closure just reads x.
    println!("Value of x after return {}", x);
}
Let's say the above is the content of a file called immut.rs. If we compile
immut.rs using the following command, the -Zdump-mir=all flag will cause
rustc to generate and dump the MIR to a directory called mir_dump.
> rustc +stage1 immut.rs -Zdump-mir=all
After we run this command, we will see a newly generated directory in our
current working directory called mir_dump
, which will contain several files.
If we look at file rustc.main.-------.mir_map.0.mir
, we will find, among
other things, it also contains this line:
_4 = &_1;
_3 = [closure@immut.rs:7:13: 7:36] { x: move _4 };
Note that in the MIR examples in this chapter, _1
is x
.
Here, in the first line _4 = &_1;, the mir_dump tells us that x was borrowed
as an immutable reference. This is what we would hope, as our closure just
reads x.
Example 2
Here is another example:
fn closure(mut f: impl FnMut()) {
    f();
}

fn main() {
    let mut x: i32 = 10;
    closure(|| {
        x += 10;  // The closure mutates the value of x
        println!("Hi {}", x)
    });
    println!("Value of x after return {}", x);
}
_4 = &mut _1;
_3 = [closure@mut.rs:7:13: 10:6] { x: move _4 };
This time, in the line _4 = &mut _1;, we see that the borrow has changed to a
mutable borrow. Fair enough! The closure increments x by 10.
Example 3
One more example:
fn closure(f: impl FnOnce()) {
    f();
}

fn main() {
    let x = vec![21];
    closure(|| {
        drop(x);  // Makes x unusable after the fact.
    });
    // println!("Value of x after return {:?}", x);
}
_6 = [closure@move.rs:7:13: 9:6] { x: move _1 }; // bb16[3]: scope 1 at move.rs:7:13: 9:6
Here, x
is directly moved into the closure and the access to it will not be permitted after the
closure.
Inferences in the compiler
Now let's dive into rustc code and see how all these inferences are done by the compiler.
Let's start with defining a term that we will be using quite a bit in the rest of the discussion -
upvar. An upvar is a variable that is local to the function where the closure is defined. So,
in the above examples, x will be an upvar to the closure. They are also sometimes referred to as
the free variables meaning they are not bound to the context of the closure.
src/librustc_middle/ty/query/mod.rs
defines a query called upvars for this purpose.
Other than lazy invocation, one other thing that distinguishes a closure from a
normal function is that it can use the upvars. It borrows these upvars from its surrounding
context; therefore the compiler has to determine the upvar's borrow type. The compiler starts with
assigning an immutable borrow type and lowers the restriction (that is, changes it from
immutable to mutable to move) as needed, based on the usage. In the Example 1 above, the
closure only uses the variable for printing but does not modify it in any way and therefore, in the
mir_dump
, we find the borrow type for the upvar x
to be immutable. In example 2, however, the
closure modifies x
and increments it by some value. Because of this mutation, the compiler, which
started off assigning x
as an immutable reference type, has to adjust it as a mutable reference.
Likewise in the third example, the closure drops the vector and therefore this requires the variable
x
to be moved into the closure. Depending on the borrow kind, the closure has to implement the
appropriate trait: Fn
trait for immutable borrow, FnMut
for mutable borrow,
and FnOnce
for move semantics.
Most of the code related to the closure is in the
src/librustc_typeck/check/upvar.rs
file and the data structures are
declared in the file src/librustc_middle/ty/mod.rs
.
Before we go any further, let's discuss how we can examine the flow of control through the rustc
codebase. For closures specifically, set the RUST_LOG
env variable as below and collect the
output in a file:
> RUST_LOG=rustc_typeck::check::upvar rustc +stage1 -Zdump-mir=all \
<.rs file to compile> 2> <file where the output will be dumped>
This uses the stage1 compiler and enables debug!
logging for the
rustc_typeck::check::upvar
module.
The other option is to step through the code using lldb or gdb.
rust-lldb build/x86_64-apple-darwin/stage1/bin/rustc test.rs
- In lldb:
b upvar.rs:134   // Set a breakpoint on a certain line in the upvar.rs file
r                // Run the program until it hits the breakpoint
Let's start with upvar.rs
. This file has something called
the euv::ExprUseVisitor
which walks the source of the closure and
invokes a callback for each upvar that is borrowed, mutated, or moved.
fn main() {
    let mut x = vec![21];
    let _cl = || {
        let y = x[0];  // 1.
        x[0] += 1;     // 2.
    };
}
In the above example, our visitor will be called twice, for the lines marked 1 and 2, once for a shared borrow and another one for a mutable borrow. It will also tell us what was borrowed.
The callbacks are defined by implementing the Delegate
trait. The
InferBorrowKind
type implements Delegate
and keeps a map that
records for each upvar which mode of borrow was required. The modes of borrow
can be ByValue
(moved) or ByRef
(borrowed). For ByRef
borrows, it can be
shared
, shallow
, unique
or mut
as defined in the
src/librustc_middle/mir/mod.rs
.
Delegate defines a few different methods (the different callbacks):
consume for a move of a variable, borrow for a borrow of some kind
(shared or mutable), and mutate when we see an assignment of something.
All of these callbacks have a common argument cmt which stands for Category,
Mutability and Type and is defined in
src/librustc_middle/middle/mem_categorization.rs. Borrowing from the code
comments, "cmt is a complete categorization of a value indicating where it
originated and how it is located, as well as the mutability of the memory in
which the value is stored". Based on the callback (consume, borrow etc.), we
will call the relevant adjust_upvar_borrow_kind_for_cmt method.
Once the borrow type is adjusted, we store it in the table, which
basically says what borrows were made for each closure.
self.tables
.borrow_mut()
.upvar_capture_map
.extend(delegate.adjust_upvar_captures);
Part 4: Analysis
This part discusses the many analyses that the compiler uses to check various properties of the code and to inform later stages. Typically, this is what people mean when they talk about "Rust's type system". This includes the representation, inference, and checking of types, the trait system, and the borrow checker. These analyses do not happen as one big pass or set of contiguous passes. Rather, they are spread out throughout various parts of the compilation process and use different intermediate representations. For example, type checking happens on the HIR, while borrow checking happens on the MIR. Regardless, for the sake of presentation, we will discuss all of these analyses in this part of the guide.
The ty module: representing types
The ty module defines how the Rust compiler represents types internally. It also defines the typing context (tcx or TyCtxt), which is the central data structure in the compiler.
ty::Ty
When we talk about how rustc represents types, we usually refer to a type called Ty. There are quite a few modules and types for Ty in the compiler (Ty documentation).
The Ty we are referring to is rustc::ty::Ty (and not rustc_hir::Ty). The distinction is important, so we will discuss it first before going into the details of ty::Ty.
rustc_hir::Ty vs ty::Ty
The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less the AST (see this chapter), as it represents the syntax that the user wrote, and is obtained after parsing and some desugaring. It has a representation of types, but in reality it reflects more of what the user wrote, that is, what they wrote so as to represent that type.
In contrast, ty::Ty represents the semantics of a type, that is, the meaning of what the user wrote.
For example, rustc_hir::Ty would record the fact that a user used the name u32 twice in their program, but ty::Ty would record the fact that both usages refer to the same type.
Example: fn foo(x: u32) → u32 { }. In this function we see that u32 appears twice.
We know that that is the same type, i.e. the function takes an argument and returns an argument of the same type, but from the point of view of the HIR there would be two distinct type instances because these occur in two different places in the program.
That is, they have two different Spans (locations).
Example: fn foo(x: &u32) -> &u32. In addition, HIR might have information left out.
The type &u32 is incomplete, since in the full Rust type there is actually a lifetime, but we didn't need to write those lifetimes.
There are also some elision rules that insert information.
The result may look like fn foo<'a>(x: &'a u32) -> &'a u32.
In the HIR level, these things are not spelled out.
However, at the ty::Ty level, these details are added.
Moreover, we will have exactly one ty::Ty for a given type, like u32, and that ty::Ty is used for all u32s in the whole program, not for one specific usage, unlike rustc_hir::Ty.
Here is a summary:

rustc_hir::Ty | ty::Ty |
---|---|
Describes the syntax of a type: what the user wrote (with some desugaring). | Describes the semantics of a type: the meaning of what the user wrote. |
Each rustc_hir::Ty has its own spans corresponding to the appropriate place in the program. | Doesn't correspond to a single place in the user's program. |
rustc_hir::Ty has generics and lifetimes; however, some of those lifetimes are special markers like LifetimeName::Implicit. | ty::Ty has the full type, including generics and lifetimes, even if the user left them out |
fn foo(x: u32) → u32 { } – Two rustc_hir::Ty representing each usage of u32. Each has its own Spans, etc. – rustc_hir::Ty doesn't tell us that both are the same type | One ty::Ty for all instances of u32 throughout the program. – ty::Ty tells us that both usages of u32 mean the same type. |
fn foo(x: &u32) -> &u32 – Two rustc_hir::Ty again. – The lifetimes for the references are represented in the rustc_hir::Tys using the special marker LifetimeName::Implicit. | fn foo(x: &u32) -> &u32 – A single ty::Ty. – The ty::Ty has the hidden lifetime param |
Order. HIR is built directly from the AST, so it happens before any ty::Ty is produced.
After HIR is built, some basic type inference and type checking is done.
During type inference, we figure out what the ty::Ty of everything is, and we also check if the type of something is ambiguous.
The ty::Ty is then used for type checking, making sure everything has the expected type.
The astconv module is where the code responsible for converting a rustc_hir::Ty into a ty::Ty is located.
This occurs during the type-checking phase, but also in other parts of the compiler that want to ask questions like "what argument types does this function expect?"
How semantics drive the two instances of Ty. You can think of HIR as the perspective of the type information that assumes the least.
We assume two things are distinct until they are proven to be the same thing.
In other words, we know less about them, so we should assume less about them.
Syntactically, the "u32" at line N column 20 and the "u32" at line N column 35 are two strings. We don't know yet whether they are the same.
So, in the HIR, we treat them as if they are different.
Later, we determine that they semantically are the same type, and that's where we use ty::Ty.
Consider another example: fn foo<T>(x: T) -> u32.
Suppose that someone invokes foo::<u32>(0).
This means that T and u32 (in this invocation) actually turn out to be the same type, so we would eventually end up with the same ty::Ty, but we have distinct rustc_hir::Ty.
(This is a bit over-simplified, though, since during type checking we check the function generically and would still have a T distinct from u32.
Later, when doing code generation, we always handle "monomorphized" (fully substituted) versions of each function, and hence we know what T represents (and specifically that it is u32).)
Here is one more example:
mod a {
    type X = u32;
    pub fn foo(x: X) -> i32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}
Here the type X will vary depending on context, clearly. If you look at the rustc_hir::Ty, you will see that X is an alias in both cases (though it will be mapped via name resolution to distinct aliases).
But if you look at the function signatures in ty::Ty, they will be fn(u32) -> i32 and fn(i32) -> i32 (with the type aliases fully expanded).
ty::Ty implementation
rustc::ty::Ty is actually a type alias to &TyS (more on that later).
TyS (Type Structure) is where the main functionality is located.
You can generally ignore the TyS struct; you will basically never access it explicitly. We always pass it by reference using the Ty alias.
The only exception is to define inherent methods on types.
In particular, TyS has a kind field of type TyKind, which represents the key type information.
TyKind is a big enum representing different kinds of types (e.g. primitives, references, abstract data types, generics, lifetimes, etc).
TyS also has 2 more fields, flags and outer_exclusive_binder.
They are convenient hacks for efficiency and summarize information about the type that we may want to know, but they don't come into the picture as much here.
Finally, ty::TySs are interned, so that the ty::Ty can be a thin pointer-like type. This allows us to do cheap comparisons for equality, along with the other benefits of interning.
Allocating and working with types
To allocate a new type, you can use the various mk_ methods defined on the tcx. These have names that correspond mostly to the various kinds of types. For example:

let array_ty = tcx.mk_array(elem_ty, len * 2);

These methods all return a Ty<'tcx> – note that the lifetime you get back is the lifetime that this tcx has access to. Types are always canonicalized and interned (so we never allocate exactly the same type twice).

N.B. Because types are interned, it is possible to compare them for equality efficiently using == – however, this is almost never what you want to do unless you happen to be hashing and looking for duplicates. This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. If you are going to be testing for type equality, you probably need to start looking into the inference code to do it right.
You can also find various common types in the tcx itself by accessing tcx.types.bool, tcx.types.char, etc. (see CommonTypes for more).
ty::TyKind variants
Note: TyKind is NOT the functional programming concept of Kind.
Whenever working with a Ty in the compiler, it is common to match on the kind of type:

fn foo(x: Ty<'tcx>) {
    match x.kind {
        ...
    }
}

The kind field is of type TyKind<'tcx>, which is an enum defining all of the different kinds of types in the compiler.

N.B. inspecting the kind field on types during type inference can be risky, as there may be inference variables and other things to consider, or sometimes a type isn't known yet and will become known later.

There are a lot of related types, and we'll cover them in time (e.g. regions/lifetimes, "substitutions", etc).
There are many variants on the TyKind enum, which you can see by looking through the rustdocs. Here is a sampling:
Algebraic Data Types (ADTs). An algebraic data type is a struct, enum or union.
Under the hood, struct, enum and union are actually implemented the same way: they are all ty::TyKind::Adt.
It's basically a user defined type. We will talk more about these later.
Foreign corresponds to extern type T.
Str is the type str. When the user writes &str, Str is how we represent the str part of that type.
Slice corresponds to [T].
Array corresponds to [T; n].
RawPtr corresponds to *mut T or *const T.
Ref stands for safe references, &'a mut T or &'a T.
Ref has some associated parts: Ty<'tcx> is the type that the reference references, Region<'tcx> is the lifetime or region of the reference, and Mutability is the mutability of the reference.
Param represents a type parameter (e.g. the T in Vec<T>).
Error represents a type error somewhere so that we can print better diagnostics. We will discuss this more later.
Import conventions
Although there is no hard and fast rule, the ty module tends to be used like so:

use ty::{self, Ty, TyCtxt};

In particular, since they are so common, the Ty and TyCtxt types are imported directly.
Other types are often referenced with an explicit ty:: prefix (e.g. ty::TraitRef<'tcx>).
But some modules choose to import a larger or smaller set of names explicitly.
ADTs representation
Let's consider the example of a type like MyStruct<u32>, where MyStruct is defined like so:

struct MyStruct<T> { x: u32, y: T }

The type MyStruct<u32> would be an instance of TyKind::Adt:

Adt(&'tcx AdtDef, SubstsRef<'tcx>)
// ------------  ---------------
//  (1)            (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or "substitutions" / generic arguments

There are two parts:
- The AdtDef references the struct/enum/union, but without the values for its type parameters. In our example, this is the MyStruct part without the argument u32.
  - Note that in the HIR, structs, enums and unions are represented differently, but in ty::Ty, they are all represented using TyKind::Adt.
- The SubstsRef is an interned list of values that are to be substituted for the generic parameters.
  In our example of MyStruct<u32>, we would end up with a list like [u32]. We'll dig more into generics and substitutions in a little bit.
AdtDef and DefId
For every type defined in the source code, there is a unique DefId (see this chapter).
This includes ADTs and generics. In the MyStruct<T> definition we gave above, there are two DefIds: one for MyStruct and one for T.
Notice that the code above does not generate a new DefId for u32, because that code does not define u32 (it only references it).
AdtDef is more or less a wrapper around DefId with lots of useful helper methods.
There is essentially a one-to-one relationship between AdtDef and DefId.
You can get the AdtDef for a DefId with the tcx.adt_def(def_id) query. AdtDefs are all interned (as you can see by the 'tcx lifetime on them).
Type errors
There is a TyKind::Error that is produced when the user makes a type error.
The idea is that we would propagate this type and suppress other errors that come up due to it, so as not to overwhelm the user with cascading compiler error messages.
There is an important invariant for TyKind::Error.
You should never return the 'error type' unless you know that an error has already been reported to the user.
This is usually because (a) you just reported it right there, or (b) you are propagating an existing Error type (in which case the error should have been reported when that error type was produced).
This invariant is important because the whole point of the Error type is to suppress other errors – i.e., we don't report them.
If we were to produce an Error type without actually emitting an error to the user, then this could cause later errors to be suppressed, and the compilation might inadvertently succeed!
Sometimes there is a third case.
You believe that an error has been reported, but you believe it would have been reported earlier in the compilation, not here.
In that case, you can invoke delay_span_bug, which says: compilation should yield an error – if compilation unexpectedly succeeds, then we will trigger a compiler bug report.
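That call is ordinary session plumbing, roughly (call shape approximate):

// "I believe an error was already emitted for this span; if compilation
// nonetheless succeeds, ICE so we notice."
tcx.sess.delay_span_bug(span, "error should have been reported earlier");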
Question: Why not substitute "inside" the AdtDef?
Recall that we represent a generic struct with (AdtDef, substs). So why bother with this scheme?
Well, the alternative would be to always create a new, fully-substituted form of the AdtDef where all the types are already substituted. This seems more convenient. However, the (AdtDef, substs) scheme has some advantages over this.
First, (AdtDef, substs) scheme has an efficiency win:

struct MyStruct<T> {
  ... 100s of fields ...
}
// Want to do: MyStruct<A> ==> MyStruct<B>

In an example like this, we can substitute from MyStruct<A> to MyStruct<B> (and so on) very cheaply, by just replacing the one reference to A with B.
But if we eagerly substituted all the fields, that could be a lot more work: we might have to go through all of the fields in the AdtDef and update all of their types.
A bit more deeply, this corresponds to structs in Rust being nominal types – which means that they are defined by their name (and that their contents are then indexed from the definition of that name, and not carried along "within" the type itself).
Generics and substitutions
Given a generic type MyType<A, B, ...>, we may want to swap out the generics A, B, ... for some other types (possibly other generics or concrete types).
We do this a lot while doing type inference, type checking, and trait solving.
Conceptually, during these routines, we may find out that one type is equal to another type and want to swap one out for the other, and then swap that out for another type, and so on, until we eventually get some concrete types (or an error).
In rustc this is done using the SubstsRef that we mentioned above ("substs" = "substitutions").
Conceptually, you can think of SubstsRef as a list of types that are to be substituted for the generic type parameters of the ADT.
SubstsRef is a type alias of List<GenericArg<'tcx>> (see List rustdocs).
GenericArg is essentially a space-efficient wrapper around GenericArgKind, which is an enum indicating what kind of generic the parameter is (type, lifetime, or const).
Thus, SubstsRef is conceptually like a &'tcx [GenericArgKind<'tcx>] slice (but it is actually a List).
So why do we use this List type instead of making it really a slice?
It has the length "inline", so &List is only 32 bits.
As a consequence, it cannot be "subsliced" (that only works if the length is out of line).
This also implies that you can check two Lists for equality via == (which would not be possible for ordinary slices).
This is precisely because they never represent a "sub-list", only the complete List, which has been hashed and interned.
So pulling it all together, let's go back to our example above:

struct MyStruct<T>

- There would be an AdtDef (and corresponding DefId) for MyStruct.
- There would be a TyKind::Param (and corresponding DefId) for T (more on that later).
- There would be a SubstsRef containing the list [GenericArgKind::Type(Ty(T))]
  - The Ty(T) here is a shorthand for the whole other ty::Ty that has TyKind::Param, which we mentioned in the previous point.
- This is one TyKind::Adt containing the AdtDef of MyStruct with the SubstsRef above.
Finally, we will quickly mention the Generics type. It is used to give information about the type parameters of a type.
Unsubstituted generics
So, recall that in our example the MyStruct struct had a generic type T.
When we are (for example) type checking functions that use MyStruct, we will need to be able to refer to this type T without actually knowing what it is.
In general, this is true inside all generic definitions: we need to be able to work with unknown types.
This is done via TyKind::Param (which we mentioned in the example above).
Each TyKind::Param contains two things: the name and the index.
In general, the index fully defines the parameter and is used by most of the code.
The name is included for debug print-outs.
There are two reasons for this.
First, the index is convenient: it allows you to index into the list of generic arguments when substituting.
Second, the index is more robust.
For example, you could in principle have two distinct type parameters that use the same name, e.g. impl<A> Foo<A> { fn bar<A>() { .. } }, although the rules against shadowing make this difficult (but those language rules could change in the future).
The index of the type parameter is an integer indicating its order in the list of type parameters. Moreover, we consider the list to include all of the type parameters from outer scopes. Consider the following example:
struct Foo<A, B> {
// A would have index 0
// B would have index 1
.. // some fields
}
impl<X, Y> Foo<X, Y> {
fn method<Z>() {
// inside here, X, Y and Z are all in scope
// X has index 0
// Y has index 1
// Z has index 2
}
}
When we are working inside the generic definition, we will use TyKind::Param just like any other TyKind; it is just a type after all.
However, if we want to use the generic type somewhere, then we will need to do substitutions.
For example, suppose that the Foo<A, B> type from the previous example has a field that is a Vec<A>.
Observe that Vec is also a generic type.
We want to tell the compiler that the type parameter of Vec should be replaced with the A type parameter of Foo<A, B>. We do that with substitutions:
struct Foo<A, B> { // Adt(Foo, &[Param(0), Param(1)])
x: Vec<A>, // Adt(Vec, &[Param(0)])
..
}
fn bar(foo: Foo<u32, f32>) { // Adt(Foo, &[u32, f32])
let y = foo.x; // Vec<Param(0)> => Vec<u32>
}
There are a couple of different substitutions going on here:
- In the definition of Foo, in the type of the field x, the Vec type parameter is replaced with Param(0), the first parameter of Foo<A, B>, so that the type of x is Vec<A>.
- In the function bar, we specify that we want a Foo<u32, f32>. This means that we will substitute Param(0) and Param(1) with u32 and f32.
- In the body of bar, we access foo.x, which has type Vec<Param(0)>, but Param(0) has been substituted for u32, so foo.x has type Vec<u32>.
Let's look a bit more closely at that last substitution to see why we use indexes.
If we want to find the type of foo.x, we can get the generic type of x, which is Vec<Param(0)>.
Now we can take the index 0 and use it to find the right type substitution: looking at Foo's SubstsRef, we have the list [u32, f32],
and since we are substituting index 0, we take the 0-th index of this list, which is u32. Voila!
You may have a couple of followup questions…
type_of How do we get the "generic type of x"?
You can get the type of pretty much anything with the tcx.type_of(def_id) query; in this case, we would pass the DefId of the field x.
The type_of query always returns the definition with the generics that are in scope of the definition.
For example, tcx.type_of(def_id_of_my_struct) would return the "self-view" of MyStruct: Adt(Foo, &[Param(0), Param(1)]).
subst How do we actually do the substitutions? There is a function for that too!
You use subst to replace a SubstRef with another list of types.
Here is an example of actually using subst in the compiler.
The exact details are not too important, but in this piece of code, we happen to be converting from the rustc_hir::Ty to a real ty::Ty.
You can see that we first get some substitutions (substs).
Then we call type_of to get a type and call ty.subst(substs) to get a new version of ty with the substitutions made.
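Schematically, the two queries combine like this (a sketch, not exact compiler code; mk_substs and the conversions shown are approximate):

// Generic view of the field: Vec<Param(0)>
let field_ty = tcx.type_of(field_def_id);
// Substitutions for Foo<u32, f32>: [u32, f32]
let substs = tcx.mk_substs([u32_ty, f32_ty].iter().map(|&t| t.into()));
// Apply them: Vec<Param(0)> becomes Vec<u32>
let concrete_ty = field_ty.subst(tcx, substs);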
A note on indices: it is possible for the indices in Param to not match with what we expect.
For example, the index could be out of bounds, or it could be the index of a lifetime when we were expecting a type.
These sorts of errors would be caught when converting from rustc_hir::Ty to ty::Ty, or earlier.
If they occur later, that is a compiler bug.
TypeFoldable
and TypeFolder
How is this subst
query actually implemented? As you can imagine, we might want to do
substitutions on a lot of different things. For example, we might want to do a substitution directly
on a type like we did with Vec
above. But we might also have a more complex type with other types
nested inside that also need substitutions.
The answer is a couple of traits:
TypeFoldable
and
TypeFolder
.
TypeFoldable
is implemented by types that embed type information. It allows you to recursively process the contents of theTypeFoldable
and do stuff to them.TypeFolder
defines what you want to do with the types you encounter while processing theTypeFoldable
.
For example, the TypeFolder
trait has a method
fold_ty
that takes a type as input and returns a new type as a result. TypeFoldable
invokes the
TypeFolder
fold_foo
methods on itself, giving the TypeFolder
access to its contents (the
types, regions, etc that are contained within).
You can think of it with this analogy to the iterator combinators we have come to love in rust:
vec.iter().map(|e1| foo(e1)).collect()
//         ^^^^^^^^^^^^^^^^ analogous to `TypeFolder`
// ^^^ analogous to `TypeFoldable`
So to reiterate:
TypeFolder
is a trait that defines a “map” operation.TypeFoldable
is a trait that is implemented by things that embed types.
In the case of subst
, we can see that it is implemented as a TypeFolder
:
SubstFolder
.
Looking at its implementation, we see where the actual substitutions are happening.
However, you might also notice that the implementation calls this super_fold_with
method. What is
that? It is a method of TypeFoldable
. Consider the following TypeFoldable
type MyFoldable
:
struct MyFoldable<'tcx> {
def_id: DefId,
ty: Ty<'tcx>,
}
The TypeFolder
can call super_fold_with
on MyFoldable
if it just wants to replace some of the
fields of MyFoldable
with new values. If it instead wants to replace the whole MyFoldable
with a
different one, it would call fold_with
instead (a different method on TypeFoldable
).
In almost all cases, we don’t want to replace the whole struct; we only want to replace ty::Ty
s in
the struct, so usually we call super_fold_with
. A typical implementation that MyFoldable
could
have might do something like this:
my_foldable: MyFoldable<'tcx>
my_foldable.subst(..., subst)
impl TypeFoldable for MyFoldable {
fn super_fold_with(&self, folder: &mut impl TypeFolder<'tcx>) -> MyFoldable {
MyFoldable {
def_id: self.def_id.fold_with(folder),
ty: self.ty.fold_with(folder),
}
}
fn super_visit_with(..) { }
}
Notice that here, we implement super_fold_with
to go over the fields of MyFoldable
and call
fold_with
on them. That is, a folder may replace def_id
and ty
, but not the whole
MyFoldable
struct.
Here is another example to put things together: suppose we have a type like Vec<Vec<X>>
. The
ty::Ty
would look like: Adt(Vec, &[Adt(Vec, &[Param(X)])])
. If we want to do subst(X => u32)
,
then we would first look at the overall type. We would see that there are no substitutions to be
made at the outer level, so we would descend one level and look at Adt(Vec, &[Param(X)])
. There
are still no substitutions to be made here, so we would descend again. Now we are looking at
Param(X)
, which can be substituted, so we replace it with u32
. We can’t descend any more, so we
are done, and the overall result is Adt(Vec, &[Adt(Vec, &[u32])])
.
One last thing to mention: often when folding over a TypeFoldable
, we don’t want to change most
things. We only want to do something when we reach a type. That means there may be a lot of
TypeFoldable
types whose implementations basically just forward to their fields’ TypeFoldable
implementations. Such implementations of TypeFoldable
tend to be pretty tedious to write by hand.
For this reason, there is a derive
macro that allows you to #![derive(TypeFoldable)]
. It is
defined
here.
subst
In the case of substitutions, the actual
folder
is going to be doing the indexing we've already mentioned. There we define a Folder and call
fold_with
on the TypeFoldable to process itself. Then
fold_ty, the method that processes each type, looks for a ty::Param and, for those, replaces it with
something from the list of substitutions; otherwise it recursively processes the type. To replace it,
it calls
ty_for_param,
and all that does is index into the list of substitutions with the index of the Param.
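The heart of that folder, then, is roughly the following (simplified from the real SubstFolder; ty_for_param is the helper named above):

fn fold_ty(&mut self, t: Ty<'tcx>) -> Ty<'tcx> {
    match t.kind {
        // A type parameter: index into the substitution list.
        ty::Param(ref p) => self.ty_for_param(p, t),
        // Anything else: keep folding the types nested inside `t`.
        _ => t.super_fold_with(self),
    }
}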
Generic arguments
A ty::subst::GenericArg<'tcx>
represents some entity in the type system: a type
(Ty<'tcx>
), lifetime (ty::Region<'tcx>
) or constant (ty::Const<'tcx>
).
GenericArg
is used to perform substitutions of generic parameters for concrete
arguments, such as when calling a function with generic parameters explicitly
with type arguments. Substitutions are represented using the
Subst
type as described below.
Subst
ty::subst::Subst<'tcx>
is intuitively simply a slice of GenericArg<'tcx>
s,
acting as an ordered list of substitutions from generic parameters to
concrete arguments (such as types, lifetimes and consts).
For example, given a HashMap<K, V>
with two type parameters, K
and V
, an
instantiation of the parameters, for example HashMap<i32, u32>
, would be
represented by the substitution &'tcx [tcx.types.i32, tcx.types.u32]
.
Subst
provides various convenience methods to instantiate substitutions
given item definitions, which should generally be used rather than explicitly
constructing such substitution slices.
GenericArg
The actual GenericArg
struct is optimised for space, storing the type, lifetime or
const as an interned pointer containing a tag identifying its kind (in the
lowest 2 bits). Unless you are working with the Subst
implementation
specifically, you should generally not have to deal with GenericArg
and instead
make use of the safe GenericArgKind
abstraction.
GenericArgKind
As GenericArg
itself is not type-safe, the GenericArgKind
enum provides a more
convenient and safe interface for dealing with generic arguments. A
GenericArgKind
can be converted to a raw GenericArg
using GenericArg::from()
(or simply .into()
when the context is clear). As mentioned earlier, substitution
lists store raw GenericArg
s, so before dealing with them, it is preferable to
convert them to GenericArgKind
s first. This is done by calling the .unpack()
method.
// An example of unpacking and packing a generic argument.
fn deal_with_generic_arg<'tcx>(generic_arg: GenericArg<'tcx>) -> GenericArg<'tcx> {
// Unpack a raw `GenericArg` to deal with it safely.
let new_generic_arg: GenericArgKind<'tcx> = match generic_arg.unpack() {
GenericArgKind::Type(ty) => { /* ... */ }
GenericArgKind::Lifetime(lt) => { /* ... */ }
GenericArgKind::Const(ct) => { /* ... */ }
};
// Pack the `GenericArgKind` to store it in a substitution list.
new_generic_arg.into()
}
Type inference
Type inference is the process of automatic detection of the type of an expression.
It is what allows Rust to work with fewer or no type annotations, making things easier for users:
fn main() { let mut things = vec![]; things.push("thing"); }
Here, the type of things
is inferred to be Vec<&str>
because of the value
we push into things
.
The type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.
A note on terminology
We use the notation ?T
to refer to inference variables, also called
existential variables.
We use the terms "region" and "lifetime" interchangeably. Both refer to
the 'a
in &'a T
.
The term "bound region" refers to a region that is bound in a function
signature, such as the 'a
in for<'a> fn(&'a u32)
. A region is
"free" if it is not bound.
Creating an inference context
You create and "enter" an inference context by doing something like the following:
tcx.infer_ctxt().enter(|infcx| {
// Use the inference context `infcx` here.
})
Within the closure, infcx
has the type InferCtxt<'cx, 'tcx>
for some
fresh 'cx
, while 'tcx
is the same as outside the inference context.
(Again, see the ty
chapter for more details on this setup.)
The tcx.infer_ctxt
method actually returns a builder, which means
there are some kinds of configuration you can do before the infcx
is
created. See InferCtxtBuilder
for more information.
Inference variables
The main purpose of the inference context is to house a bunch of inference variables – these represent types or regions whose precise value is not yet known, but will be uncovered as we perform type-checking.
If you're familiar with the basic ideas of unification from H-M type systems, or logic languages like Prolog, this is the same concept. If you're not, you might want to read a tutorial on how H-M type inference works, or perhaps this blog post on unification in the Chalk project.
All told, the inference context stores four kinds of inference variables as of this writing:
- Type variables, which come in three varieties:
- General type variables (the most common). These can be unified with any type.
- Integral type variables, which can only be unified with an integral type,
and arise from an integer literal expression like
22
. - Float type variables, which can only be unified with a float type, and
arise from a float literal expression like
22.0
.
- Region variables, which represent lifetimes, and arise all over the place.
All the type variables work in much the same way: you can create a new
type variable, and what you get is Ty<'tcx>
representing an
unresolved type ?T
. Then later you can apply the various operations
that the inferencer supports, such as equality or subtyping, and it
will possibly instantiate (or bind) that ?T
to a specific
value as a result.
The region variables work somewhat differently, and are described below in a separate section.
Enforcing equality / subtyping
The most basic operation you can perform in the type inferencer is
equality, which forces two types T
and U
to be the same. The
recommended way to add an equality constraint is to use the at
method, roughly like so:
infcx.at(...).eq(t, u);
The first at()
call provides a bit of context, i.e. why you are
doing this unification, and in what environment, and the eq
method
performs the actual equality constraint.
When you equate things, you force them to be precisely equal. Equating
returns an InferResult
– if it returns Err(err)
, then equating
failed, and the enclosing TypeError
will tell you what went wrong.
The success case is perhaps more interesting. The "primary" return
type of eq
is ()
– that is, when it succeeds, it doesn't return a
value of any particular interest. Rather, it is executed for its
side-effects of constraining type variables and so forth. However, the
actual return type is not ()
, but rather InferOk<()>
. The
InferOk
type is used to carry extra trait obligations – your job is
to ensure that these are fulfilled (typically by enrolling them in a
fulfillment context). See the trait chapter for more background on that.
You can similarly enforce subtyping through infcx.at(..).sub(..)
. The same
basic concepts as above apply.
"Trying" equality
Sometimes you would like to know if it is possible to equate two
types without error. You can test that with infcx.can_eq
(or
infcx.can_sub
for subtyping). If this returns Ok
, then equality
is possible – but in all cases, any side-effects are reversed.
Be aware, though, that the success or failure of these methods is always
modulo regions. That is, two types &'a u32
and &'b u32
will
return Ok
for can_eq
, even if 'a != 'b
. This falls out from the
"two-phase" nature of how we solve region constraints.
Snapshots
As described in the previous section on can_eq
, often it is useful
to be able to do a series of operations and then roll back their
side-effects. This is done for various reasons: one of them is to be
able to backtrack, trying out multiple possibilities before settling
on which path to take. Another is in order to ensure that a series of
smaller changes take place atomically or not at all.
To allow for this, the inference context supports a snapshot
method.
When you call it, it will start recording changes that occur from the
operations you perform. When you are done, you can either invoke
rollback_to
, which will undo those changes, or else confirm
, which
will make them permanent. Snapshots can be nested as long as you follow
a stack-like discipline.
Rather than use snapshots directly, it is often helpful to use the
methods like commit_if_ok
or probe
that encapsulate higher-level
patterns.
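For illustration, the raw snapshot discipline described above looks roughly like this (method names as given in this section; exact signatures vary):

let snapshot = infcx.snapshot();
if do_some_unification(infcx).is_ok() {
    infcx.confirm(snapshot);     // keep the recorded side-effects
} else {
    infcx.rollback_to(snapshot); // undo everything since `snapshot`
}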
Subtyping obligations
One thing worth discussing is subtyping obligations. When you force
two types to be a subtype, like ?T <: i32
, we can often convert those
into equality constraints. This follows from Rust's rather limited notion
of subtyping: so, in the above case, ?T <: i32
is equivalent to ?T = i32
.
However, in some cases we have to be more careful. For example, when
regions are involved. So if you have ?T <: &'a i32
, what we would do
is to first "generalize" &'a i32
into a type with a region variable:
&'?b i32
, and then unify ?T
with that (?T = &'?b i32
). We then
relate this new variable with the original bound:
&'?b i32 <: &'a i32
This will result in a region constraint (see below) of '?b: 'a
.
One final interesting case is relating two unbound type variables,
like ?T <: ?U
. In that case, we can't make progress, so we enqueue
an obligation Subtype(?T, ?U)
and return it via the InferOk
mechanism. You'll have to try again when more details about ?T
or
?U
are known.
Region constraints
Regions are inferred somewhat differently from types. Rather than eagerly unifying things, we simply collect constraints as we go, but make (almost) no attempt to solve regions. These constraints have the form of an "outlives" constraint:
'a: 'b
Actually the code tends to view them as a subregion relation, but it's the same idea:
'b <= 'a
(There are various other kinds of constraints, such as "verifys"; see
the region_constraints
module for details.)
There is one case where we do some amount of eager unification. If you have an equality constraint between two regions
'a = 'b
we will record that fact in a unification table. You can then use
opportunistic_resolve_var
to convert 'b
to 'a
(or vice
versa). This is sometimes needed to ensure termination of fixed-point
algorithms.
Extracting region constraints
Ultimately, region constraints are only solved at the very end of type-checking, once all other constraints are known. There are two ways to solve region constraints right now: lexical and non-lexical. Eventually there will only be one.
To solve lexical region constraints, you invoke
resolve_regions_and_report_errors
. This "closes" the region
constraint process and invokes the lexical_region_resolve
code. Once
this is done, any further attempt to equate or create a subtyping
relationship will yield an ICE.
Non-lexical region constraints are not handled within the inference
context. Instead, the NLL solver (actually, the MIR type-checker)
invokes take_and_reset_region_constraints
periodically. This
extracts all of the outlives constraints from the region solver, but
leaves the set of variables intact. This is used to get just the
region constraints that resulted from some particular point in the
program, since the NLL solver needs to know not just what regions
were subregions, but also where. Finally, the NLL solver invokes
take_region_var_origins
, which "closes" the region constraint
process in the same way as normal solving.
Lexical region resolution
Lexical region resolution is done by initially assigning each region variable to an empty value. We then process each outlives constraint repeatedly, growing region variables until a fixed-point is reached. Region variables can be grown using a least-upper-bound relation on the region lattice in a fairly straightforward fashion.
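In pseudocode, that fixed-point iteration looks something like this, assuming a values table per region variable and a least_upper_bound operation on the region lattice (both hypothetical names; the real code lives in lexical_region_resolve):

// `values[r]` starts out empty for every region variable `r`.
let mut changed = true;
while changed {
    changed = false;
    for &(longer, shorter) in &outlives_constraints { // each `'longer: 'shorter`
        let lub = least_upper_bound(values[longer], values[shorter]);
        if values[longer] != lub {
            values[longer] = lub; // grow `'longer` until it covers `'shorter`
            changed = true;
        }
    }
}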
Trait resolution (old-style)
This chapter describes the general process of trait resolution and points out some non-obvious things.
Note: This chapter (and its subchapters) describe how the trait solver currently works. However, we are in the process of designing a new trait solver. If you'd prefer to read about that, see this traits chapter.
Major concepts
Trait resolution is the process of pairing up an impl with each reference to a trait. So, for example, if there is a generic function like:
fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { ... }
and then a call to that function:
let v: Vec<isize> = clone_slice(&[1, 2, 3])
it is the job of trait resolution to figure out whether there exists an impl of
(in this case) isize : Clone
.
Note that in some cases, like generic functions, we may not be able to
find a specific impl, but we can figure out that the caller must
provide an impl. For example, consider the body of clone_slice
:
fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> {
    let mut v = Vec::new();
    for e in x {
        v.push((*e).clone()); // (*)
    }
    v
}
The line marked (*)
is only legal if T
(the type of *e
)
implements the Clone
trait. Naturally, since we don't know what T
is, we can't find the specific impl; but based on the bound T:Clone
,
we can say that there exists an impl which the caller must provide.
We use the term obligation to refer to a trait reference in need of an impl. Basically, the trait resolution system resolves an obligation by proving that an appropriate impl does exist.
During type checking, we do not store the results of trait selection. We simply wish to verify that trait selection will succeed. Then later, at trans time, when we have all concrete types available, we can repeat the trait selection to choose an actual implementation, which will then be generated in the output binary.
Overview
Trait resolution consists of three major parts:
-
Selection: Deciding how to resolve a specific obligation. For example, selection might decide that a specific obligation can be resolved by employing an impl which matches the
Self
type, or by using a parameter bound (e.g.T: Trait
). In the case of an impl, selecting one obligation can create nested obligations because of where clauses on the impl itself. It may also require evaluating those nested obligations to resolve ambiguities. -
Fulfillment: The fulfillment code is what tracks that obligations are completely fulfilled. Basically it is a worklist of obligations to be selected: once selection is successful, the obligation is removed from the worklist and any nested obligations are enqueued.
-
Coherence: The coherence checks are intended to ensure that there are never overlapping impls, where two impls could be used with equal precedence.
Selection
Selection is the process of deciding whether an obligation can be
resolved and, if so, how it is to be resolved (via impl, where clause, etc).
The main interface is the select()
function, which takes an obligation
and returns a SelectionResult
. There are three possible outcomes:
-
Ok(Some(selection))
– yes, the obligation can be resolved, andselection
indicates how. If the impl was resolved via an impl, thenselection
may also indicate nested obligations that are required by the impl. -
Ok(None)
– we are not yet sure whether the obligation can be resolved or not. This happens most commonly when the obligation contains unbound type variables. -
Err(err)
– the obligation definitely cannot be resolved due to a type error or because there are no impls that could possibly apply.
The basic algorithm for selection is broken into two big phases: candidate assembly and confirmation.
Note that because of how lifetime inference works, it is not possible to give back immediate feedback as to whether a unification or subtype relationship between lifetimes holds or not. Therefore, lifetime matching is not considered during selection. This is reflected in the fact that subregion assignment is infallible. This may yield lifetime constraints that will later be found to be in error (in contrast, the non-lifetime-constraints have already been checked during selection and can never cause an error, though naturally they may lead to other errors downstream).
Candidate assembly
Searches for impls/where-clauses/etc that might possibly be used to satisfy the obligation. Each of those is called a candidate. To avoid ambiguity, we want to find exactly one candidate that is definitively applicable. In some cases, we may not know whether an impl/where-clause applies or not – this occurs when the obligation contains unbound inference variables.
The subroutines that decide whether a particular impl/where-clause/etc
applies to a particular obligation are collectively referred to as the
process of matching. At the moment, this amounts to
unifying the Self
types, but in the future we may also recursively
consider some of the nested obligations, in the case of an impl.
TODO: what does "unifying the Self
types" mean? The Self
of the
obligation with that of an impl?
The basic idea for candidate assembly is to do a first pass in which we identify all possible candidates. During this pass, all that we do is try and unify the type parameters. (In particular, we ignore any nested where clauses.) Presuming that this unification succeeds, the impl is added as a candidate.
Once this first pass is done, we can examine the set of candidates. If it is a singleton set, then we are done: this is the only impl in scope that could possibly apply. Otherwise, we can winnow down the set of candidates by using where clauses and other conditions. If this reduced set yields a single, unambiguous entry, we're good to go, otherwise the result is considered ambiguous.
The basic process: Inferring based on the impls we see
This process is easier if we work through some examples. Consider the following trait:
trait Convert<Target> {
fn convert(&self) -> Target;
}
This trait just has one method. It's about as simple as it gets. It converts from the (implicit) `Self` type to the `Target` type. If we wanted to permit conversion between `isize` and `usize`, we might implement `Convert` like so:
impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize
Now imagine there is some code like the following:
let x: isize = ...;
let y = x.convert();
The call to convert will generate a trait reference `Convert<$Y> for isize`, where `$Y` is the type variable representing the type of `y`. Of the two impls we can see, the only one that matches is `Convert<usize> for isize`. Therefore, we can select this impl, which will cause the type of `$Y` to be unified to `usize`. (Note that while assembling candidates, we do the initial unifications in a transaction, so that they don't affect one another.)
TODO: The example says we can "select" the impl, but this section is
talking specifically about candidate assembly. Does this mean we can sometimes
skip confirmation? Or is this poor wording?
TODO: Is the unification of $Y
part of trait resolution or type
inference? Or is this not the same type of "inference variable" as in type
inference?
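For reference, here is a runnable version of the example above; the method bodies (the casts) are our own, filled in purely for illustration:

```rust
trait Convert<Target> {
    fn convert(&self) -> Target;
}

impl Convert<usize> for isize {
    fn convert(&self) -> usize { *self as usize } // isize -> usize
}

impl Convert<isize> for usize {
    fn convert(&self) -> isize { *self as isize } // usize -> isize
}

fn main() {
    let x: isize = 42;
    let y = x.convert(); // selection picks `Convert<usize> for isize`,
    let _: usize = y;    // so `$Y` is unified to `usize`
}
```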
Winnowing: Resolving ambiguities
But what happens if there are multiple impls where all the types unify? Consider this example:
trait Get {
    fn get(&self) -> Self;
}

impl<T: Copy> Get for T {
    fn get(&self) -> T { *self }
}

impl<T: Get> Get for Box<T> {
    fn get(&self) -> Box<T> { Box::new(get_it(&**self)) }
}

// `get_it` is not defined in the original text; a plausible helper would be:
fn get_it<T: Get>(t: &T) -> T {
    t.get()
}
What happens when we invoke `get_it(&Box::new(1_u16))`, for example? In this case, the `Self` type is `Box<u16>` – that unifies with both impls, because the first applies to all types `T`, and the second to all `Box<T>`. In order for this to be unambiguous, the compiler does a winnowing pass that considers `where` clauses and attempts to remove candidates. In this case, the first impl only applies if `Box<u16> : Copy`, which doesn't hold. After winnowing, then, we are left with just one candidate, so we can proceed.
`where` clauses
Besides an impl, the other major way to resolve an obligation is via a where clause. The selection process is always given a parameter environment which contains a list of where clauses, which are basically obligations that we can assume are satisfiable. We will iterate over that list and check whether our current obligation can be found in that list. If so, it is considered satisfied. More precisely, we want to check whether there is a where-clause obligation that is for the same trait (or some subtrait) and which can match against the obligation.
Consider this simple example:
trait A1 {
fn do_a1(&self);
}
trait A2 : A1 { ... }
trait B {
fn do_b(&self);
}
fn foo<X:A2+B>(x: X) {
x.do_a1(); // (*)
x.do_b(); // (#)
}
In the body of `foo`, clearly we can use methods of `A1`, `A2`, or `B` on variable `x`. The line marked `(*)` will incur an obligation `X: A1`, while the line marked `(#)` will incur an obligation `X: B`. Meanwhile, the parameter environment will contain two where-clauses: `X : A2` and `X : B`.
For each obligation, then, we search this list of where-clauses. The obligation `X: B` trivially matches against the where-clause `X: B`. To resolve an obligation `X: A1`, we would note that `X: A2` implies that `X: A1`.
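As a runnable sketch of the same situation (with trivial default method bodies added by us so the example compiles):

```rust
trait A1 {
    fn do_a1(&self) { println!("a1"); }
}
trait A2: A1 {}

trait B {
    fn do_b(&self) { println!("b"); }
}

fn foo<X: A2 + B>(x: X) {
    x.do_a1(); // obligation `X: A1`: satisfied because `X: A2` implies `X: A1`
    x.do_b();  // obligation `X: B`: matches the where-clause `X: B` directly
}
```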
Confirmation
Confirmation unifies the output type parameters of the trait with the values found in the obligation, possibly yielding a type error.
Suppose we have the following variation of the `Convert` example in the previous section:
trait Convert<Target> {
fn convert(&self) -> Target;
}
impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize
let x: isize = ...;
let y: char = x.convert(); // NOTE: `y: char` now!
Confirmation is where an error would be reported because the impl specified that `Target` would be `usize`, but the obligation reported `char`. Hence the result of selection would be an error.
Note that the candidate impl is chosen based on the `Self` type, but confirmation is done based on (in this case) the `Target` type parameter.
Selection during translation
As mentioned above, during type checking, we do not store the results of trait selection. At trans time, we repeat the trait selection to choose a particular impl for each method call. In this second selection, we do not consider any where-clauses to be in scope because we know that each resolution will resolve to a particular impl.
One interesting twist has to do with nested obligations. In general, in trans, we only need to do a "shallow" selection for an obligation. That is, we wish to identify which impl applies, but we do not (yet) need to decide how to select any nested obligations. Nonetheless, we do currently do a complete resolution, and that is because it can sometimes inform the results of type inference. That is, we do not have the full substitutions in terms of the type variables of the impl available to us, so we must run trait selection to figure everything out.
TODO: is this still talking about trans?
Here is an example:
trait Foo { ... }
impl<U, T:Bar<U>> Foo for Vec<T> { ... }
impl Bar<usize> for isize { ... }
After one shallow round of selection for an obligation like `Vec<isize> : Foo`, we would know which impl we want, and we would know that `T=isize`, but we do not know the type of `U`. We must select the nested obligation `isize : Bar<U>` to find out that `U=usize`.
It would be good to only do just as much nested resolution as necessary. Currently, though, we just do a full resolution.
Higher-ranked trait bounds
One of the more subtle concepts in trait resolution is higher-ranked trait
bounds. An example of such a bound is for<'a> MyTrait<&'a isize>
.
Let's walk through how selection on higher-ranked trait references
works.
Basic matching and placeholder leaks
Suppose we have a trait `Foo`:
trait Foo<X> {
    fn foo(&self, x: X) { }
}
Let's say we have a function `want_hrtb` that wants a type which implements `Foo<&'a isize>` for any `'a`:
fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }
Now we have a struct `AnyInt` that implements `Foo<&'a isize>` for any `'a`:
struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }
And the question is, does `AnyInt : for<'a> Foo<&'a isize>`? We want the answer to be yes. The algorithm for figuring it out is closely related to the subtyping for higher-ranked types (which is described here and also in a paper by SPJ; if you wish to understand higher-ranked subtyping, we recommend you read the paper). There are a few parts:
- Replace bound regions in the obligation with placeholders.
- Match the impl against the placeholder obligation.
- Check for placeholder leaks.
So let's work through our example.
- The first thing we would do is to replace the bound region in the obligation with a placeholder, yielding `AnyInt : Foo<&'0 isize>` (here `'0` represents placeholder region #0). Note that we now have no quantifiers; in terms of the compiler type, this changes from a `ty::PolyTraitRef` to a `TraitRef`. We would then create the `TraitRef` from the impl, using fresh variables for its bound regions (and thus getting `Foo<&'$a isize>`, where `'$a` is the inference variable for `'a`).
- Next we relate the two trait refs, yielding a graph with the constraint that `'0 == '$a`.
- Finally, we check for placeholder "leaks" – a leak is basically any attempt to relate a placeholder region to another placeholder region, or to any region that pre-existed the impl match. The leak check is done by searching from the placeholder region to find the set of regions that it is related to in any way. This is called the "taint" set. To pass the check, that set must consist solely of itself and region variables from the impl. If the taint set includes any other region, then the match is a failure. In this case, the taint set for `'0` is `{'0, '$a}`, and hence the check will succeed.
Let's consider a failure case. Imagine we also have a struct
struct StaticInt;
impl Foo<&'static isize> for StaticInt;
We want the obligation `StaticInt : for<'a> Foo<&'a isize>` to be considered unsatisfied. The check begins just as before. `'a` is replaced with a placeholder `'0` and the impl trait reference is instantiated to `Foo<&'static isize>`. When we relate those two, we get a constraint like `'static == '0`. This means that the taint set for `'0` is `{'0, 'static}`, which fails the leak check.
TODO: This is because `'static` is not a region variable but is in the taint set, right?
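Both cases can be reproduced with ordinary Rust (note that `impl Foo<&'static isize> for StaticInt;` above is shorthand; a real impl needs a body):

```rust
trait Foo<X> {
    fn foo(&self, _: X) { }
}

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt {}

struct StaticInt;
impl Foo<&'static isize> for StaticInt {}

fn want_hrtb<T: for<'a> Foo<&'a isize>>() {}

fn main() {
    want_hrtb::<AnyInt>();       // OK: the impl applies for any 'a
    // want_hrtb::<StaticInt>(); // ERROR: the impl only applies for 'static
}
```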
Higher-ranked trait obligations
Once the basic matching is done, we get to another interesting topic:
how to deal with impl obligations. I'll work through a simple example
here. Imagine we have the traits Foo
and Bar
and an associated impl:
trait Foo<X> {
    fn foo(&self, x: X) { }
}

trait Bar<X> {
    fn bar(&self, x: X) { }
}

impl<X, F> Foo<X> for F where F: Bar<X> { }
Now let's say we have an obligation `Baz: for<'a> Foo<&'a isize>` and we match this impl. What obligation is generated as a result? We want to get `Baz: for<'a> Bar<&'a isize>`, but how does that happen?
After the matching, we are in a position where we have a placeholder substitution like `X => &'0 isize`. If we apply this substitution to the impl obligations, we get `F : Bar<&'0 isize>`. Obviously this is not directly usable because the placeholder region `'0` cannot leak out of our computation.
What we do is to create an inverse mapping from the taint set of `'0` back to the original bound region (`'a`, here) that `'0` resulted from. (This is done in `higher_ranked::plug_leaks`). We know that the leak check passed, so this taint set consists solely of the placeholder region itself plus various intermediate region variables. We then walk the trait-reference and convert every region in that taint set back to a late-bound region, so in this case we'd wind up with `Baz: for<'a> Bar<&'a isize>`.
Caching and subtle considerations therewith
In general, we attempt to cache the results of trait selection. This is a somewhat complex process. Part of the reason for this is that we want to be able to cache results even when all the types in the trait reference are not fully known. In that case, it may happen that the trait selection process is also influencing type variables, so we have to be able to not only cache the result of the selection process, but replay its effects on the type variables.
An example
The high-level idea of how the cache works is that we first replace all unbound inference variables with placeholder versions. Therefore, if we had a trait reference `usize : Foo<$t>`, where `$t` is an unbound inference variable, we might replace it with `usize : Foo<$0>`, where `$0` is a placeholder type. We would then look this up in the cache. If we found a hit, the hit would tell us the immediate next step to take in the selection process (e.g. apply impl #22, or apply where clause `X : Foo<Y>`).
On the other hand, if there is no hit, we need to go through the selection process from scratch. Suppose, we come to the conclusion that the only possible impl is this one, with def-id 22:
impl Foo<isize> for usize { ... } // Impl #22
We would then record in the cache `usize : Foo<$0> => ImplCandidate(22)`. Next we would confirm `ImplCandidate(22)`, which would (as a side-effect) unify `$t` with `isize`.
Now, at some later time, we might come along and see a `usize : Foo<$u>`. When replaced with a placeholder, this would yield `usize : Foo<$0>`, just as before, and hence the cache lookup would succeed, yielding `ImplCandidate(22)`. We would confirm `ImplCandidate(22)` which would (as a side-effect) unify `$u` with `isize`.
Where clauses and the local vs global cache
One subtle interaction is that the results of trait lookup will vary depending on what where clauses are in scope. Therefore, we actually have two caches, a local and a global cache. The local cache is attached to the `ParamEnv`, and the global cache attached to the `tcx`. We use the local cache whenever the result might depend on the where clauses that are in scope. The determination of which cache to use is done by the method `pick_candidate_cache` in `select.rs`. At the moment, we use a very simple, conservative rule: if there are any where-clauses in scope, then we use the local cache. We used to try and draw finer-grained distinctions, but that led to a series of annoying and weird bugs like #22019 and #18290. This simple rule seems to be pretty clearly safe and also still retains a very high hit rate (~95% when compiling rustc).
TODO: it looks like `pick_candidate_cache` no longer exists. In general, is this section still accurate at all?
Specialization
TODO: where does Chalk fit in? Should we mention/discuss it here?
Defined in the `specialize` module.
The basic strategy is to build up a specialization graph during coherence checking (recall that coherence checking looks for overlapping impls). Insertion into the graph locates the right place to put an impl in the specialization hierarchy; if there is no right place (due to partial overlap but no containment), you get an overlap error. Specialization is consulted when selecting an impl (of course), and the graph is consulted when propagating defaults down the specialization hierarchy.
You might expect that the specialization graph would be used during selection – i.e. when actually performing specialization. This is not done for two reasons:
- It's merely an optimization: given a set of candidates that apply, we can determine the most specialized one by comparing them directly for specialization, rather than consulting the graph. Given that we also cache the results of selection, the benefit of this optimization is questionable.
- To build the specialization graph in the first place, we need to use selection (because we need to determine whether one impl specializes another). Dealing with this reentrancy would require some additional mode switch for selection. Given that there seems to be no strong reason to use the graph anyway, we stick with a simpler approach in selection, and use the graph only for propagating default implementations.
Trait impl selection can succeed even when multiple impls can apply, as long as they are part of the same specialization family. In that case, it returns a single impl on success – this is the most specialized impl known to apply. However, if there are any inference variables in play, the returned impl may not be the actual impl we will use at trans time. Thus, we take special care to avoid projecting associated types unless either (1) the associated type does not use `default` and thus cannot be overridden or (2) all input types are known concretely.
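As a nightly-only sketch of the "specialization family" idea (the trait and impls are our own invented example; `specialization` is an unstable feature):

```rust
#![feature(specialization)]

trait Greet {
    fn greet(&self) -> &'static str;
}

// The blanket impl: applies to every type.
impl<T> Greet for T {
    default fn greet(&self) -> &'static str { "hello" }
}

// A more specialized impl: both impls apply to `u32`, but they form a
// specialization family, so selection returns this one.
impl Greet for u32 {
    fn greet(&self) -> &'static str { "hello, u32" }
}

fn main() {
    assert_eq!(0_u32.greet(), "hello, u32");
    assert_eq!("x".greet(), "hello");
}
```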
Additional Resources
This talk by @sunjay may be useful. Keep in mind that the talk only gives a broad overview of the problem and the solution (it was presented about halfway through @sunjay's work). Also, it was given in June 2018, and some things may have changed by the time you watch it.
Trait solving (new-style)
🚧 This chapter describes "new-style" trait solving. This is still in the process of being implemented; this chapter serves as a kind of in-progress design document. If you would prefer to read about how the current trait solver works, check out this other chapter. 🚧
By the way, if you would like to help in hacking on the new solver, you will find instructions for getting involved in the Traits Working Group tracking issue!
The new-style trait solver is based on the work done in chalk. Chalk recasts Rust's trait system explicitly in terms of logic programming. It does this by "lowering" Rust code into a kind of logic program we can then execute queries against.
You can read more about chalk itself in the Overview of Chalk section.
Trait solving in rustc is based around a few key ideas:
- Lowering to logic, which expresses Rust traits in terms of standard logical terms.
- The goals and clauses chapter describes the precise form of rules we use, and lowering rules gives the complete set of lowering rules in a more reference-like form.
- Lazy normalization, which is the technique we use to accommodate associated types when figuring out whether types are equal.
- Region constraints, which are accumulated during trait solving but mostly ignored. This means that trait solving effectively ignores the precise regions involved, always – but we still remember the constraints on them so that those constraints can be checked by the type checker.
- Canonical queries, which allow us to solve trait problems (like "is `Foo` implemented for the type `Bar`?") once, and then apply that same result independently in many different inference contexts.
This is not a complete list of topics. See the sidebar for more.
Ongoing work
The design of the new-style trait solving currently happens in two places:
chalk. The chalk repository is where we experiment with new ideas and designs for the trait system. It primarily consists of two parts:
- a unit testing framework for the correctness and feasibility of the logical rules defining the new-style trait system.
- the `chalk_engine` crate, which defines the new-style trait solver used both in the unit testing framework and in rustc.
rustc. Once we are happy with the logical rules, we proceed to implementing them in rustc. This mainly happens in `librustc_traits`.
Lowering to logic
The key observation here is that the Rust trait system is basically a kind of logic, and it can be mapped onto standard logical inference rules. We can then look for solutions to those inference rules in a very similar fashion to how e.g. a Prolog solver works. It turns out that we can't quite use Prolog rules (also called Horn clauses) but rather need a somewhat more expressive variant.
Rust traits and logic
One of the first observations is that the Rust trait system is basically a kind of logic. As such, we can map our struct, trait, and impl declarations into logical inference rules. For the most part, these are basically Horn clauses, though we'll see that to capture the full richness of Rust – and in particular to support generic programming – we have to go a bit further than standard Horn clauses.
To see how this mapping works, let's start with an example. Imagine we declare a trait and a few impls, like so:
trait Clone { }

impl Clone for usize { }

impl<T> Clone for Vec<T> where T: Clone { }
We could map these declarations to some Horn clauses, written in a Prolog-like notation, as follows:
Clone(usize).
Clone(Vec<?T>) :- Clone(?T).
// The notation `A :- B` means "A is true if B is true".
// Or, put another way, B implies A.
In Prolog terms, we might say that `Clone(Foo)` – where `Foo` is some Rust type – is a predicate that represents the idea that the type `Foo` implements `Clone`. These rules are program clauses; they state the conditions under which that predicate can be proven (i.e., considered true). So the first rule just says "Clone is implemented for `usize`". The next rule says "for any type `?T`, Clone is implemented for `Vec<?T>` if clone is implemented for `?T`". So e.g. if we wanted to prove that `Clone(Vec<Vec<usize>>)`, we would do so by applying the rules recursively:
- `Clone(Vec<Vec<usize>>)` is provable if:
  - `Clone(Vec<usize>)` is provable if:
    - `Clone(usize)` is provable. (Which it is, so we're all good.)
But now suppose we tried to prove that `Clone(Vec<Bar>)`. This would fail (after all, I didn't give an impl of `Clone` for `Bar`):
- `Clone(Vec<Bar>)` is provable if:
  - `Clone(Bar)` is provable. (But it is not, as there are no applicable rules.)
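The same two derivations can be checked against the real `Clone` trait (our `Bar` here is a stand-in type that deliberately does not implement `Clone`):

```rust
struct Bar; // no `Clone` impl, on purpose

fn assert_clone<T: Clone>() {}

fn main() {
    assert_clone::<Vec<Vec<usize>>>(); // provable by recursive rule application
    // assert_clone::<Vec<Bar>>();     // ERROR: `Clone(Bar)` is not provable
}
```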
We can easily extend the example above to cover generic traits with more than one input type. So imagine the `Eq<T>` trait, which declares that `Self` is equatable with a value of type `T`:
trait Eq<T> { ... }
impl Eq<usize> for usize { }
impl<T: Eq<U>> Eq<Vec<U>> for Vec<T> { }
That could be mapped as follows:
Eq(usize, usize).
Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).
So far so good.
Type-checking normal functions
OK, now that we have defined some logical rules that are able to express when traits are implemented and to handle associated types, let's turn our focus a bit towards type-checking. Type-checking is interesting because it is what gives us the goals that we need to prove. That is, everything we've seen so far has been about how we derive the rules by which we can prove goals from the traits and impls in the program; but we are also interested in how to derive the goals that we need to prove, and those come from type-checking.
Consider type-checking the function `foo()` here:
fn foo() { bar::<usize>() }
fn bar<U: Eq<U>>() { }
This function is very simple, of course: all it does is to call `bar::<usize>()`. Now, looking at the definition of `bar()`, we can see that it has one where-clause `U: Eq<U>`. So, that means that `foo()` will have to prove that `usize: Eq<usize>` in order to show that it can call `bar()` with `usize` as the type argument.
If we wanted, we could write a Prolog predicate that defines the conditions under which `bar()` can be called. We'll say that those conditions are called being "well-formed":
barWellFormed(?U) :- Eq(?U, ?U).
Then we can say that `foo()` type-checks if the reference to `bar::<usize>` (that is, `bar()` applied to the type `usize`) is well-formed:
fooTypeChecks :- barWellFormed(usize).
If we try to prove the goal `fooTypeChecks`, it will succeed:
- `fooTypeChecks` is provable if:
  - `barWellFormed(usize)`, which is provable if:
    - `Eq(usize, usize)`, which is provable because of an impl.
Ok, so far so good. Let's move on to type-checking a more complex function.
Type-checking generic functions: beyond Horn clauses
In the last section, we used standard Prolog Horn clauses (augmented with Rust's notion of type equality) to type-check some simple Rust functions. But that only works when we are type-checking non-generic functions. If we want to type-check a generic function, it turns out we need a stronger notion of goal than what Prolog can provide. To see what I'm talking about, let's revamp our previous example to make `foo` generic:
fn foo<T: Eq<T>>() { bar::<T>() }
fn bar<U: Eq<U>>() { }
To type-check the body of `foo`, we need to be able to hold the type `T` "abstract". That is, we need to check that the body of `foo` is type-safe for all types `T`, not just for some specific type. We might express this like so:
fooTypeChecks :-
// for all types T...
forall<T> {
// ...if we assume that Eq(T, T) is provable...
if (Eq(T, T)) {
// ...then we can prove that `barWellFormed(T)` holds.
barWellFormed(T)
}
}.
This notation I'm using here is the notation I've been using in my prototype implementation; it's similar to standard mathematical notation but a bit Rustified. Anyway, the problem is that standard Horn clauses don't allow universal quantification (`forall`) or implication (`if`) in goals (though many Prolog engines do support them, as an extension). For this reason, we need to accept something called "first-order hereditary harrop" (FOHH) clauses – this long name basically means "standard Horn clauses with `forall` and `if` in the body". But it's nice to know the proper name, because there is a lot of work describing how to efficiently handle FOHH clauses; see for example Gopalan Nadathur's excellent "A Proof Procedure for the Logic of Hereditary Harrop Formulas" in the bibliography.
It turns out that supporting FOHH is not really all that hard. And once we are able to do that, we can easily describe the type-checking rule for generic functions like `foo` in our logic.
Source
This page is a lightly adapted version of a blog post by Nicholas Matsakis.
Goals and clauses
In logic programming terms, a goal is something that you must prove and a clause is something that you know is true. As described in the lowering to logic chapter, Rust's trait solver is based on an extension of hereditary harrop (HH) clauses, which extend traditional Prolog Horn clauses with a few new superpowers.
Goals and clauses meta structure
In Rust's solver, goals and clauses have the following forms (note that the two definitions reference one another):
Goal = DomainGoal // defined in the section below
| Goal && Goal
| Goal || Goal
| exists<K> { Goal } // existential quantification
| forall<K> { Goal } // universal quantification
| if (Clause) { Goal } // implication
| true // something that's trivially true
| ambiguous // something that's never provable
Clause = DomainGoal
| Clause :- Goal // if can prove Goal, then Clause is true
| Clause && Clause
| forall<K> { Clause }
K = <type> // a "kind"
| <lifetime>
The proof procedure for these sorts of goals is actually quite straightforward. Essentially, it's a form of depth-first search. The paper "A Proof Procedure for the Logic of Hereditary Harrop Formulas" gives the details.
In terms of code, these types are defined in `librustc_middle/traits/mod.rs` in rustc, and in `chalk-ir/src/lib.rs` in chalk.
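To make the grammar concrete, here is a simplified sketch, in our own words rather than the compiler's actual definitions, of how the two mutually recursive definitions could be modeled in Rust:

```rust
// A "kind" K: the thing a quantifier ranges over.
enum Kind {
    Type,
    Lifetime,
}

enum Goal {
    Domain(DomainGoal),
    And(Box<Goal>, Box<Goal>),       // Goal && Goal
    Or(Box<Goal>, Box<Goal>),        // Goal || Goal
    Exists(Kind, Box<Goal>),         // exists<K> { Goal }
    Forall(Kind, Box<Goal>),         // forall<K> { Goal }
    Implies(Box<Clause>, Box<Goal>), // if (Clause) { Goal }
    True,
    Ambiguous,
}

enum Clause {
    Domain(DomainGoal),
    Implied(Box<Clause>, Box<Goal>), // Clause :- Goal
    And(Box<Clause>, Box<Clause>),
    Forall(Kind, Box<Clause>),
}

// Stand-in for the domain goals defined in the next section.
struct DomainGoal;
```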
Domain goals
Domain goals are the atoms of the trait logic. As can be seen in the definitions given above, general goals basically consist in a combination of domain goals.
Moreover, flattening a bit the definition of clauses given previously, one can see that clauses are always of the form:
forall<K1, ..., Kn> { DomainGoal :- Goal }
hence domain goals are in fact clauses' LHS. That is, at the most granular level, domain goals are what the trait solver will end up trying to prove.
To define the set of domain goals in our system, we need to first introduce a few simple formulations. A trait reference consists of the name of a trait along with a suitable set of inputs P0..Pn:
TraitRef = P0: TraitName<P1..Pn>
So, for example, `u32: Display` is a trait reference, as is `Vec<T>: IntoIterator`. Note that Rust surface syntax also permits some extra things, like associated type bindings (`Vec<T>: IntoIterator<Item = T>`), that are not part of a trait reference.
A projection consists of an associated item reference along with its inputs P0..Pm:
Projection = <P0 as TraitName<P1..Pn>>::AssocItem<Pn+1..Pm>
Given these, we can define a `DomainGoal` as follows:
DomainGoal = Holds(WhereClause)
| FromEnv(TraitRef)
| FromEnv(Type)
| WellFormed(TraitRef)
| WellFormed(Type)
| Normalize(Projection -> Type)
WhereClause = Implemented(TraitRef)
| ProjectionEq(Projection = Type)
| Outlives(Type: Region)
| Outlives(Region: Region)
`WhereClause` refers to a `where` clause that a Rust user would actually be able to write in a Rust program. This abstraction exists only as a convenience as we sometimes want to only deal with domain goals that are effectively writable in Rust.
Let's break down each one of these, one-by-one.
Implemented(TraitRef)
e.g. Implemented(i32: Copy)
True if the given trait is implemented for the given input types and lifetimes.
ProjectionEq(Projection = Type)
e.g. `ProjectionEq(<T as Iterator>::Item = u8)`
The given associated type `Projection` is equal to `Type`; this can be proved with either normalization or using placeholder associated types. See the section on associated types.
Normalize(Projection -> Type)
e.g. `Normalize(<T as Iterator>::Item -> u8)`
The given associated type `Projection` can be normalized to `Type`.
As discussed in the section on associated types, `Normalize` implies `ProjectionEq`, but not vice versa. In general, proving `Normalize(<T as Trait>::Item -> U)` also requires proving `Implemented(T: Trait)`.
FromEnv(TraitRef)
e.g. FromEnv(Self: Add<i32>)
True if the inner `TraitRef` is assumed to be true, that is, if it can be derived from the in-scope where clauses.
For example, given the following function:
fn loud_clone<T: Clone>(stuff: &T) -> T {
    println!("cloning!");
    stuff.clone()
}
Inside the body of our function, we would have `FromEnv(T: Clone)`. In-scope where clauses nest, so a function body inside an impl body inherits the impl body's where clauses, too.
This and the next rule are used to implement implied bounds. As we'll see in the section on lowering, `FromEnv(TraitRef)` implies `Implemented(TraitRef)`, but not vice versa. This distinction is crucial to implied bounds.
FromEnv(Type)
e.g. FromEnv(HashSet<K>)
True if the inner `Type` is assumed to be well-formed, that is, if it is an input type of a function or an impl.
For example, given the following code:
struct HashSet<K> where K: Hash { ... }
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
println!("inserting!");
set.insert(item);
}
`HashSet<K>` is an input type of the `loud_insert` function. Hence, we assume it to be well-formed, so we would have `FromEnv(HashSet<K>)` inside the body of our function. As we'll see in the section on lowering, `FromEnv(HashSet<K>)` implies `Implemented(K: Hash)` because the `HashSet` declaration was written with a `K: Hash` where clause. Hence, we don't need to repeat that bound on the `loud_insert` function: we rather automatically assume that it is true.
WellFormed(Item)
These goals imply that the given item is well-formed.
We can talk about different types of items being well-formed:
- Types, like `WellFormed(Vec<i32>)`, which is true in Rust, or `WellFormed(Vec<str>)`, which is not (because `str` is not `Sized`).
- TraitRefs, like `WellFormed(Vec<i32>: Clone)`.
Well-formedness is important to implied bounds. In particular, the reason it is okay to assume `FromEnv(T: Clone)` in the `loud_clone` example is that we also verify `WellFormed(T: Clone)` for each call site of `loud_clone`. Similarly, it is okay to assume `FromEnv(HashSet<K>)` in the `loud_insert` example because we will verify `WellFormed(HashSet<K>)` for each call site of `loud_insert`.
Outlives(Type: Region), Outlives(Region: Region)
e.g. Outlives(&'a str: 'b)
, Outlives('a: 'static)
True if the given type or region on the left outlives the right-hand region.
Coinductive goals
Most goals in our system are "inductive". In an inductive goal, circular reasoning is disallowed. Consider this example clause:
Implemented(Foo: Bar) :-
Implemented(Foo: Bar).
Considered inductively, this clause is useless: if we are trying to prove `Implemented(Foo: Bar)`, we would then recursively have to prove `Implemented(Foo: Bar)`, and that cycle would continue ad infinitum (the trait solver will terminate here, it would just consider that `Implemented(Foo: Bar)` is not known to be true).
However, some goals are co-inductive. Simply put, this means that cycles are OK. So, if `Bar` were a co-inductive trait, then the rule above would be perfectly valid, and it would indicate that `Implemented(Foo: Bar)` is true.
Auto traits are one example in Rust where co-inductive goals are used. Consider the `Send` trait, and imagine that we have this struct:
struct Foo {
    next: Option<Box<Foo>>
}
The default rules for auto traits say that `Foo` is `Send` if the types of its fields are `Send`. Therefore, we would have a rule like
Implemented(Foo: Send) :-
Implemented(Option<Box<Foo>>: Send).
As you can probably imagine, proving that `Option<Box<Foo>>: Send` is going to wind up circularly requiring us to prove that `Foo: Send` again. So this would be an example where we wind up in a cycle – but that's ok, we do consider `Foo: Send` to hold, even though it references itself.
In general, co-inductive traits are used in Rust trait solving when we want to enumerate a fixed set of possibilities. In the case of auto traits, we are enumerating the set of reachable types from a given starting point (i.e., `Foo` can reach values of type `Option<Box<Foo>>`, which implies it can reach values of type `Box<Foo>`, and then of type `Foo`, and then the cycle is complete).
In addition to auto traits, `WellFormed` predicates are co-inductive. These are used to achieve a similar "enumerate all the cases" pattern, as described in the section on implied bounds.
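This co-inductive acceptance of cycles is observable in ordinary Rust:

```rust
struct Foo {
    next: Option<Box<Foo>>,
}

fn assert_send<T: Send>() {}

fn main() {
    // Proving `Foo: Send` cycles back through `Option<Box<Foo>>: Send`
    // to `Foo: Send` again; because auto-trait goals are co-inductive,
    // the cycle is accepted and this compiles.
    assert_send::<Foo>();
}
```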
Incomplete chapter
Some topics yet to be written:
- Elaborate on the proof procedure
- SLG solving – introduce negative reasoning
Equality and associated types
This section covers how the trait system handles equality between associated types. The full system consists of several moving parts, which we will introduce one by one:
- Projection and the `Normalize` predicate
- Placeholder associated type projections
- The `ProjectionEq` predicate
- Integration with unification
Associated type projection and normalization
When a trait defines an associated type (e.g., the `Item` type in the `IntoIterator` trait), that type can be referenced by the user using an associated type projection like `<Option<u32> as IntoIterator>::Item`.
Often, people will use the shorthand syntax `T::Item`. Presently, that syntax is expanded during "type collection" into the explicit form, though that is something we may want to change in the future.
In some cases, associated type projections can be normalized – that is, simplified – based on the types given in an impl. So, to continue with our example, the impl of `IntoIterator` for `Option<T>` declares (among other things) that `Item = T`:
impl<T> IntoIterator for Option<T> {
type Item = T;
...
}
This means we can normalize the projection `<Option<u32> as IntoIterator>::Item` to just `u32`.
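This normalization is visible in plain Rust: the fully qualified projection can be written wherever a type is expected, and the compiler normalizes it away:

```rust
fn main() {
    // `<Option<u32> as IntoIterator>::Item` normalizes to `u32`,
    // so this annotation is just a roundabout way of writing `u32`.
    let x: <Option<u32> as IntoIterator>::Item = 5;
    let _ = x;
}
```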
In this case, the projection was a "monomorphic" one – that is, it did not have any type parameters. Monomorphic projections are special because they can always be fully normalized.
Often, we can normalize other associated type projections as well. For example, `<Option<?T> as IntoIterator>::Item`, where `?T` is an inference variable, can be normalized to just `?T`.
In our logic, normalization is defined by a predicate `Normalize`. The `Normalize` clauses arise only from impls. For example, the impl of `IntoIterator` for `Option<T>` that we saw above would be lowered to a program clause like so:
forall<T> {
Normalize(<Option<T> as IntoIterator>::Item -> T) :-
Implemented(Option<T>: IntoIterator)
}
where in this case, the one `Implemented` condition is always true.
Since we do not permit quantification over traits, this is really more like a family of program clauses, one for each associated type.
We could apply that rule to normalize either of the examples that we've seen so far.
Placeholder associated types
Sometimes however we want to work with associated types that cannot be normalized. For example, consider this function:
fn foo<T: IntoIterator>(...) { ... }
In this context, how would we normalize the type `T::Item`? Without knowing what `T` is, we can't really do so. To represent this case, we introduce a type called a placeholder associated type projection. This is written like so: `(IntoIterator::Item)<T>`.
You may note that it looks a lot like a regular type (e.g., `Option<T>`), except that the "name" of the type is `(IntoIterator::Item)`. This is not an accident: placeholder associated type projections work just like ordinary types like `Vec<T>` when it comes to unification. That is, they are only considered equal if (a) they are both references to the same associated type, like `IntoIterator::Item`, and (b) their type arguments are equal.
Placeholder associated types are never written directly by the user. They are used internally by the trait system only, as we will see shortly.
In rustc, they correspond to the `TyKind::UnnormalizedProjectionTy` enum variant, declared in `librustc_middle/ty/sty.rs`. In chalk, we use an `ApplicationTy` with a name living in a special namespace dedicated to placeholder associated types (see the `TypeName` enum declared in `chalk-ir/src/lib.rs`).
Projection equality
So far we have seen two ways to answer the question of "When can we consider an associated type projection equal to another type?":
- the `Normalize` predicate could be used to transform projections when we knew which impl applied;
- placeholder associated types can be used when we don't. This is also known as lazy normalization.
We now introduce the `ProjectionEq` predicate to bring those two cases together. The `ProjectionEq` predicate looks like so:
ProjectionEq(<T as IntoIterator>::Item = U)
and we will see that it can be proven either via normalization or via the placeholder type. As part of lowering an associated type declaration from some trait, we create two program clauses for `ProjectionEq`:
forall<T, U> {
ProjectionEq(<T as IntoIterator>::Item = U) :-
Normalize(<T as IntoIterator>::Item -> U)
}
forall<T> {
ProjectionEq(<T as IntoIterator>::Item = (IntoIterator::Item)<T>)
}
These are the only two `ProjectionEq` program clauses we ever make for any given associated item.
Integration with unification
Now we are ready to discuss how associated type equality integrates with unification. As described in the type inference section, unification is basically a procedure with a signature like this:
Unify(A, B) = Result<(Subgoals, RegionConstraints), NoSolution>
In other words, we try to unify two things A and B. That procedure might just fail, in which case we get back `Err(NoSolution)`. This would happen, for example, if we tried to unify `u32` and `i32`.
The key point is that, on success, unification can also give back to us a set of subgoals that still remain to be proven. (It can also give back region constraints, but those are not relevant here).
Whenever unification encounters a non-placeholder associated type projection P being equated with some other type T, it always succeeds, but it produces a subgoal `ProjectionEq(P = T)` that is propagated back up. Thus it falls to the ordinary workings of the trait system to process that constraint.
If we unify two projections P1 and P2, then unification produces a variable X and asks us to prove that `ProjectionEq(P1 = X)` and `ProjectionEq(P2 = X)`. (That used to be needed in an older system to prevent cycles; I rather doubt it still is. -nmatsakis)
Implied Bounds
Implied bounds remove the need to repeat where clauses written on a type declaration or a trait declaration. For example, say we have the following type declaration:
struct HashSet<K: Hash> {
...
}
then everywhere we use `HashSet<K>` as an "input" type, that is appearing in the receiver type of an `impl` or in the arguments of a function, we don't want to have to repeat the `where K: Hash` bound, as in:
// I don't want to have to repeat `where K: Hash` here.
impl<K> HashSet<K> {
...
}
// Same here.
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
println!("inserting!");
set.insert(item);
}
Note that in the `loud_insert` example, `HashSet<K>` is not the type of the `set` argument of `loud_insert`, it only appears in the argument type `&mut HashSet<K>`: we care about every type appearing in the function's header (the header is the signature without the return type), not only the types of the function's arguments.
The rationale for applying implied bounds to input types is that, for example, in order to call the `loud_insert` function above, the programmer must have produced the type `HashSet<K>` already, hence the compiler already verified that `HashSet<K>` was well-formed, i.e. that `K` effectively implemented `Hash`, as in the following example:
fn main() {
    // I am producing a value of type `HashSet<i32>`.
    // If `i32` was not `Hash`, the compiler would report an error here.
    let mut set: HashSet<i32> = HashSet::new();
    loud_insert(&mut set, 5);
}
Hence, we don't want to repeat where clauses for input types because that would sort of duplicate the work of the programmer, having to verify that their types are well-formed both when calling the function and when using them in the arguments of their function. The same reasoning applies when using an `impl`.
Similarly, given the following trait declaration:
trait Copy where Self: Clone { // desugared version of `Copy: Clone`
...
}
then everywhere we bound over `SomeType: Copy`, we would like to be able to use the fact that `SomeType: Clone` without having to write it explicitly, as in:
fn loud_clone<T: Clone>(x: T) {
println!("cloning!");
x.clone();
}
fn fun_with_copy<T: Copy>(x: T) {
println!("will clone a `Copy` type soon...");
// I'm using `loud_clone<T: Clone>` with `T: Copy`, I know this
// implies `T: Clone` so I don't want to have to write it explicitly.
loud_clone(x);
}
The rationale for implied bounds for traits is that if a type implements `Copy`, that is, if there exists an `impl Copy` for that type, there ought to exist an `impl Clone` for that type, otherwise the compiler would have reported an error in the first place. So again, if we were forced to repeat the additional `where SomeType: Clone` everywhere whereas we already know that `SomeType: Copy` holds, we would kind of duplicate the verification work.
Implied bounds are not yet completely enforced in rustc; at the moment they only work for outlives requirements, super trait bounds, and bounds on associated types. The full RFC can be found here. We'll give here a brief view of how implied bounds work and why we chose to implement it that way. The complete set of lowering rules can be found in the corresponding chapter.
Implied bounds and lowering rules
Now we need to express implied bounds in terms of logical rules. We will start with exposing a naive way to do it. Suppose that we have the following traits:
trait Foo {
...
}
trait Bar where Self: Foo { // desugared version of `Bar: Foo`
...
}
So we would like to say that if a type implements `Bar`, then necessarily it must also implement `Foo`. We might think that a clause like this would work:
forall<Type> {
Implemented(Type: Foo) :- Implemented(Type: Bar).
}
Now suppose that we just write this impl:
struct X;
impl Bar for X { }
Clearly this should not be allowed: indeed, we wrote a `Bar` impl for `X`, but the `Bar` trait requires that we also implement `Foo` for `X`, which we never did. In terms of what the compiler does, this would look like this:
struct X;
impl Bar for X {
// We are in a `Bar` impl for the type `X`.
// There is a `where Self: Foo` bound on the `Bar` trait declaration.
// Hence I need to prove that `X` also implements `Foo` for that impl
// to be legal.
}
So the compiler would try to prove `Implemented(X: Foo)`. Of course it will not find any `impl Foo for X` since we did not write any. However, it will see our implied bound clause:
forall<Type> {
Implemented(Type: Foo) :- Implemented(Type: Bar).
}
so that it may be able to prove `Implemented(X: Foo)` if `Implemented(X: Bar)` holds. And it turns out that `Implemented(X: Bar)` does hold since we wrote a `Bar` impl for `X`! Hence the compiler will accept the `Bar` impl while it should not.
Implied bounds coming from the environment
So the naive approach does not work. What we need to do is to somehow decouple implied bounds from impls. Suppose we know that a type `SomeType<...>` implements `Bar` and we want to deduce that `SomeType<...>` must also implement `Foo`.
There are two possibilities: first, we have enough information about `SomeType<...>` to see that there exists a `Bar` impl in the program which covers `SomeType<...>`, for example a plain `impl<...> Bar for SomeType<...>`. Then if the compiler has done its job correctly, there must exist a `Foo` impl which covers `SomeType<...>`, e.g. another plain `impl<...> Foo for SomeType<...>`. In that case then, we can just use this impl and we do not need implied bounds at all.
Second possibility: we do not know enough about `SomeType<...>` in order to find a `Bar` impl which covers it, for example if `SomeType<...>` is just a type parameter in a function:
fn foo<T: Bar>() {
// We'd like to deduce `Implemented(T: Foo)`.
}
That is, the information that `T` implements `Bar` here comes from the environment. The environment is the set of things that we assume to be true when we type check some Rust declaration. In that case, what we assume is that `T: Bar`. Then at that point, we might authorize ourselves to have some kind of "local" implied bound reasoning which would say `Implemented(T: Foo) :- Implemented(T: Bar)`. This reasoning would only be done within our `foo` function in order to avoid the earlier problem where we had a global clause.
We can apply these local reasonings everywhere we can have an environment -- i.e. when we can write where clauses -- that is, inside impls, trait declarations, and type declarations.
Computing implied bounds with FromEnv
The previous subsection showed that it was only useful to compute implied bounds for facts coming from the environment. We talked about "local" rules, but there are multiple possible strategies to indeed implement the locality of implied bounds.
In rustc, the current strategy is to elaborate bounds: that is, each time we have a fact in the environment, we recursively derive all the other things that are implied by this fact until we reach a fixed point. For example, if we have the following declarations:
trait A { }
trait B where Self: A { }
trait C where Self: B { }
fn foo<T: C>() {
...
}
then inside the `foo` function, we start with an environment containing only `Implemented(T: C)`. Then because of implied bounds for the `C` trait, we elaborate `Implemented(T: B)` and add it to our environment. Because of implied bounds for the `B` trait, we elaborate `Implemented(T: A)` and add it to our environment as well. We cannot elaborate anything else, so we conclude that our final environment consists of `Implemented(T: A + B + C)`.
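Elaboration of supertrait bounds is observable in today's Rust; only `T: C` is written, yet methods from the elaborated bounds are usable (the trivial method body is ours):

```rust
trait A {
    fn a(&self) { println!("a"); }
}
trait B: A {}
trait C: B {}

fn foo<T: C>(x: T) {
    // The environment starts with `Implemented(T: C)` and is elaborated
    // to include `Implemented(T: B)` and `Implemented(T: A)`,
    // so calling an `A` method through a `T: C` bound works:
    x.a();
}
```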
In the new-style trait system, we like to encode as many things as possible with logical rules. So rather than "elaborating", we have a set of global program clauses defined like so:
forall<T> { Implemented(T: A) :- FromEnv(T: A). }
forall<T> { Implemented(T: B) :- FromEnv(T: B). }
forall<T> { FromEnv(T: A) :- FromEnv(T: B). }
forall<T> { Implemented(T: C) :- FromEnv(T: C). }
forall<T> { FromEnv(T: B) :- FromEnv(T: C). }
So these clauses are defined globally (that is, they are available from everywhere in the program) but they cannot be used because the hypothesis is always of the form `FromEnv(...)`, which is a bit special. Indeed, as indicated by the name, `FromEnv(...)` facts can only come from the environment.
How it works is that in the `foo` function, instead of having an environment containing `Implemented(T: C)`, we replace this environment with `FromEnv(T: C)`. From here and thanks to the above clauses, we see that we are able to reach any of `Implemented(T: A)`, `Implemented(T: B)` or `Implemented(T: C)`, which is what we wanted.
Implied bounds and well-formedness checking
Implied bounds are tightly related with well-formedness checking. Well-formedness checking is the process of checking that the impls the programmer wrote are legal, what we referred to earlier as "the compiler doing its job correctly".
We already saw examples of illegal and legal impls:
trait Foo { }
trait Bar where Self: Foo { }
struct X;
struct Y;
impl Bar for X {
// This impl is not legal: the `Bar` trait requires that we also
// implement `Foo`, and we didn't.
}
impl Foo for Y {
// This impl is legal: there is nothing to check as there are no where
// clauses on the `Foo` trait.
}
impl Bar for Y {
// This impl is legal: we have a `Foo` impl for `Y`.
}
We must define what "legal" and "illegal" mean. For this, we introduce another
predicate: WellFormed(Type: Trait)
. We say that the trait reference
Type: Trait
is well-formed if Type
meets the bounds written on the
Trait
declaration. For each impl we write, assuming that the where clauses
declared on the impl hold, the compiler tries to prove that the corresponding
trait reference is well-formed. The impl is legal if the compiler manages to do
so.
Coming to the definition of WellFormed(Type: Trait)
, it would be tempting
to define it as:
trait Trait where WC1, WC2, ..., WCn {
...
}
forall<Type> {
WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn.
}
and indeed this was basically what was done in rustc until it was noticed that this mixed badly with implied bounds. The key thing is that implied bounds allow one to derive all bounds implied by a fact in the environment, and this transitively, as we've seen with the `A + B + C` traits example. However, the `WellFormed` predicate as defined above only checks that the direct superbounds hold. That is, if we come back to our `A + B + C` example:
trait A { }
// No where clauses, always well-formed.
// forall<Type> { WellFormed(Type: A). }
trait B where Self: A { }
// We only check the direct superbound `Self: A`.
// forall<Type> { WellFormed(Type: B) :- Implemented(Type: A). }
trait C where Self: B { }
// We only check the direct superbound `Self: B`. We do not check
// the `Self: A` implied bound coming from the `Self: B` superbound.
// forall<Type> { WellFormed(Type: C) :- Implemented(Type: B). }
There is an asymmetry between the recursive power of implied bounds and the shallow checking of `WellFormed`. It turns out that this asymmetry can be exploited. Indeed, suppose that we define the following traits:
trait Partial where Self: Copy { }
// WellFormed(Self: Partial) :- Implemented(Self: Copy).
trait Complete where Self: Partial { }
// WellFormed(Self: Complete) :- Implemented(Self: Partial).
impl<T> Partial for T where T: Complete { }
impl<T> Complete for T { }
For the `Partial` impl, what the compiler must prove is:
forall<T> {
if (T: Complete) { // assume that the where clauses hold
WellFormed(T: Partial) // show that the trait reference is well-formed
}
}
Proving `WellFormed(T: Partial)` amounts to proving `Implemented(T: Copy)`. However, we have `Implemented(T: Complete)` in our environment: thanks to implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds one level deeper, we can deduce `Implemented(T: Copy)`. Finally, the `Partial` impl is legal.
For the `Complete` impl, what the compiler must prove is:
forall<T> {
WellFormed(T: Complete) // show that the trait reference is well-formed
}
Proving `WellFormed(T: Complete)` amounts to proving `Implemented(T: Partial)`. We see that the `impl Partial for T` applies if we can prove `Implemented(T: Complete)`, and it turns out we can prove this fact since our `impl<T> Complete for T` is a blanket impl without any where clauses.
So both impls are legal and the compiler accepts the program. Moreover, thanks to the `Complete` blanket impl, all types implement `Complete`. So we could now use this impl like so:
fn eat<T>(x: T) { }
fn copy_everything<T: Complete>(x: T) {
eat(x);
eat(x);
}
fn main() {
let not_copiable = vec![1, 2, 3, 4];
copy_everything(not_copiable);
}
In this program, we use the fact that `Vec<i32>` implements `Complete`, as does any other type. Hence we can call `copy_everything` with an argument of type `Vec<i32>`. Inside the `copy_everything` function, we have the `Implemented(T: Complete)` bound in our environment. Thanks to implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds again, we deduce `Implemented(T: Copy)`, and we can indeed call the `eat` function, which moves the argument twice since its argument is `Copy`. Problem: the `T` type was in fact `Vec<i32>`, which is not `Copy` at all; hence we will double-free the underlying vec storage, so we have a memory unsoundness in safe Rust.
Of course, disregarding the asymmetry between WellFormed
and implied bounds,
this bug was possible only because we had some kind of self-referencing impls.
But self-referencing impls are very useful in practice and are not the real
culprits in this affair.
Co-inductiveness of WellFormed
So the solution is to fix this asymmetry between `WellFormed` and implied bounds. For that, we need the `WellFormed` predicate to not only require that the direct superbounds hold, but also all the bounds transitively implied by the superbounds. What we can do is to have the following rules for the `WellFormed` predicate:
trait A { }
// WellFormed(Self: A) :- Implemented(Self: A).
trait B where Self: A { }
// WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A).
trait C where Self: B { }
// WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B).
Notice that we are now also requiring `Implemented(Self: Trait)` for `WellFormed(Self: Trait)` to be true: this is to simplify the process of traversing all the implied bounds transitively. This does not change anything when checking whether impls are legal, because since we assume that the where clauses hold inside the impl, we know that the corresponding trait reference does hold. Thanks to this setup, you can see that we are indeed required to prove the set of all bounds transitively implied by the where clauses.
However there is still a catch. Suppose that we have the following trait definition:
trait Foo where <Self as Foo>::Item: Foo {
type Item;
}
so this definition is a bit more involved than the ones we've seen already because it defines an associated item. However, the well-formedness rule would not be more complicated:
WellFormed(Self: Foo) :-
Implemented(Self: Foo) &&
WellFormed(<Self as Foo>::Item: Foo).
Now we would like to write the following impl:
impl Foo for i32 {
type Item = i32;
}
The `Foo` trait definition and the `impl Foo for i32` are perfectly valid Rust: we're kind of recursively using our `Foo` impl in order to show that the associated value indeed implements `Foo`, but that's ok. But if we translate this to our well-formedness setting, the compiler proof process inside the `Foo` impl is the following: it starts with proving that the well-formedness goal `WellFormed(i32: Foo)` is true. In order to do that, it must prove the following goals: `Implemented(i32: Foo)` and `WellFormed(<i32 as Foo>::Item: Foo)`. `Implemented(i32: Foo)` holds because there is our impl and there are no where clauses on it, so it's always true. However, because of the associated type value we used, `WellFormed(<i32 as Foo>::Item: Foo)` simplifies to just `WellFormed(i32: Foo)`. So in order to prove its original goal `WellFormed(i32: Foo)`, the compiler needs to prove `WellFormed(i32: Foo)`: this clearly is a cycle, and cycles are usually rejected by the trait solver, unless the `WellFormed` predicate is made co-inductive.
A co-inductive predicate, as discussed in the chapter on goals and clauses, is a predicate for which the trait solver accepts cycles. In our setting, this would be a valid thing to do: indeed, the `WellFormed` predicate just serves as a way of enumerating all the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow the set of implied bounds until there is nothing more to add. Here, a cycle in the chain of `WellFormed` predicates just means that there are no more bounds to add in that direction, so we can just accept this cycle and focus on other directions. It's easy to prove that under these co-inductive semantics, we are effectively visiting all the transitive implied bounds, and only these.
Implied bounds on types
We mainly talked about implied bounds for traits because this was the most subtle regarding implementation. Implied bounds on types are simpler, especially because if we assume that a type is well-formed, we don't use that fact to deduce that other types are well-formed, we only use it to deduce that e.g. some trait bounds hold.
For types, we just use rules like these ones:
struct Type<...> where WC1, ..., WCn {
...
}
forall<...> {
WellFormed(Type<...>) :- WC1, ..., WCn.
}
forall<...> {
FromEnv(WC1) :- FromEnv(Type<...>).
...
FromEnv(WCn) :- FromEnv(Type<...>).
}
We can see that we have this asymmetry between the well-formedness check, which only verifies that the direct superbounds hold, and implied bounds, which give access to all bounds transitively implied by the where clauses. In that case this is ok because, as we said, we don't use `FromEnv(Type<...>)` to deduce other `FromEnv(OtherType<...>)` things, nor do we use `FromEnv(Type: Trait)` to deduce `FromEnv(OtherType<...>)` things. So in that sense type definitions are "less recursive" than traits, and we saw in a previous subsection that it was the combination of asymmetry and recursive traits / impls that led to unsoundness. As such, the `WellFormed(Type<...>)` predicate does not need to be co-inductive.
This asymmetry optimization is useful because in a real Rust program, we have to check the well-formedness of types very often (e.g. for each type which appears in the body of a function).
Region constraints
To be written.
Chalk does not have the concept of region constraints, and as of this writing, work on rustc was not far enough to worry about them.
In the meantime, you can read about region constraints in the type inference section.
The lowering module in rustc
The program clauses described in the lowering rules section are actually created in the `rustc_traits::lowering` module.
The `program_clauses_for` query
The main entry point is the `program_clauses_for` query, which – given a `DefId` – produces a set of Chalk program clauses. The query is invoked on a `DefId` that identifies something like a trait, an impl, or an associated item definition. It then produces and returns a vector of program clauses.
Unit tests
Note: We've removed the Chalk unit tests in rust-lang/rust#69247. They will come back once we're ready to integrate next Chalk into rustc.
Here's a good example test. At the time of this writing, it looked like this:
#![feature(rustc_attrs)]
trait Foo { }
#[rustc_dump_program_clauses] //~ ERROR program clause dump
impl<T: 'static> Foo for T where T: Iterator<Item = i32> { }
fn main() {
println!("hello");
}
The `#[rustc_dump_program_clauses]` annotation can be attached to anything with a `DefId` (it requires the `rustc_attrs` feature). The compiler will then invoke the `program_clauses_for` query on that item, and emit compiler errors that dump the clauses produced. These errors just exist for unit-testing. The stderr will be:
error: program clause dump
--> $DIR/lower_impl.rs:5:1
|
LL | #[rustc_dump_program_clauses]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: forall<T> { Implemented(T: Foo) :- ProjectionEq(<T as std::iter::Iterator>::Item == i32), TypeOutlives(T: 'static), Implemented(T: std::iter::Iterator), Implemented(T: std::marker::Sized). }
Lowering rules
This section gives the complete lowering rules for Rust traits into program clauses. It is a kind of reference. These rules reference the domain goals defined in an earlier section.
Notation
The nonterminal Pi
is used to mean some generic parameter, either a
named lifetime like 'a
or a type parameter like A
.
The nonterminal Ai
is used to mean some generic argument, which
might be a lifetime like 'a
or a type like Vec<A>
.
When defining the lowering rules, we will give goals and clauses in
the notation given in this section.
We sometimes insert "macros" like LowerWhereClause!
into these
definitions; these macros reference other sections within this chapter.
Rule names and cross-references
Each of these lowering rules is given a name, documented with a comment like so:
// Rule Foo-Bar-Baz
The reference implementation of these rules is to be found in
chalk/chalk-solve/src/clauses.rs
. They are also ported in
rustc in the librustc_traits
crate.
Lowering where clauses
When used in a goal position, where clauses can be mapped directly to
the Holds
variant of domain goals, as follows:
- A0: Foo<A1..An> maps to Implemented(A0: Foo<A1..An>)
- T: 'r maps to Outlives(T, 'r)
- 'a: 'b maps to Outlives('a, 'b)
- A0: Foo<A1..An, Item = T> is a bit special and expands to two distinct goals, namely Implemented(A0: Foo<A1..An>) and ProjectionEq(<A0 as Foo<A1..An>>::Item = T)
In the rules below, we will use WC
to indicate where clauses that
appear in Rust syntax; we will then use the same WC
to indicate
where those where clauses appear as goals in the program clauses that
we are producing. In that case, the mapping above is used to convert
from the Rust syntax into goals.
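As a concrete instance of this mapping, here is a sketch using the standard Iterator trait (the expansion is given in the comment):

```rust
// The where clause
//
//     where T: Iterator<Item = u32>
//
// appearing in goal position maps to the conjunction of two domain goals:
//
//     Implemented(T: Iterator) &&
//     ProjectionEq(<T as Iterator>::Item = u32)
fn sum_all<T>(iter: T) -> u32
where
    T: Iterator<Item = u32>,
{
    iter.sum()
}
```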
Transforming the lowered where clauses
In addition, in the rules below, we sometimes do some transformations on the lowered where clauses, as defined here:
- FromEnv(WC) – this indicates that:
  - Implemented(TraitRef) becomes FromEnv(TraitRef)
  - other where-clauses are left intact
- WellFormed(WC) – this indicates that:
  - Implemented(TraitRef) becomes WellFormed(TraitRef)
  - other where-clauses are left intact
TODO: I suspect that we want to alter the outlives relations too, but Chalk isn't modeling those right now.
Lowering traits
Given a trait definition
trait Trait<P1..Pn> // P0 == Self
where WC
{
// trait items
}
we will produce a number of declarations. This section is focused on
the program clauses for the trait header (i.e., the stuff outside the
{}
); the section on trait items covers the stuff
inside the {}
.
Trait header
From the trait itself we mostly make "meta" rules that setup the
relationships between different kinds of domain goals. The first such
rule from the trait header creates the mapping between the FromEnv
and Implemented
predicates:
// Rule Implemented-From-Env
forall<Self, P1..Pn> {
Implemented(Self: Trait<P1..Pn>) :- FromEnv(Self: Trait<P1..Pn>)
}
Implied bounds
The next few clauses have to do with implied bounds (see also RFC 2089 and the implied bounds chapter for more in-depth coverage). For each trait, we produce two clauses:
// Rule Implied-Bound-From-Trait
//
// For each where clause WC:
forall<Self, P1..Pn> {
FromEnv(WC) :- FromEnv(Self: Trait<P1..Pn>)
}
This clause says that if we are assuming that the trait holds, then we can also assume that its where-clauses hold. It's perhaps useful to see an example:
trait Eq: PartialEq { ... }
In this case, the PartialEq
supertrait is equivalent to a where Self: PartialEq
where clause, in our simplified model. The program
clause above therefore states that if we can prove FromEnv(T: Eq)
–
e.g., if we are in some function with T: Eq
in its where clauses –
then we also know that FromEnv(T: PartialEq)
. Thus the set of things
that follow from the environment are not only the direct where
clauses but also things that follow from them.
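This rule is what makes the following function compile in Rust today, even though it never mentions PartialEq explicitly:

```rust
fn same<T: Eq>(a: T, b: T) -> bool {
    // `==` requires `T: PartialEq`, which we never wrote. It follows from
    // the environment: FromEnv(T: Eq) implies FromEnv(T: PartialEq),
    // because PartialEq is a supertrait (where clause) of Eq.
    a == b
}
```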
The next rule is related; it defines what it means for a trait reference to be well-formed:
// Rule WellFormed-TraitRef
forall<Self, P1..Pn> {
WellFormed(Self: Trait<P1..Pn>) :- Implemented(Self: Trait<P1..Pn>) && WellFormed(WC)
}
This WellFormed
rule states that T: Trait
is well-formed if (a)
T: Trait
is implemented and (b) all the where-clauses declared on
Trait
are well-formed (and hence they are implemented). Remember
that the WellFormed
predicate is
coinductive; in this
case, it is serving as a kind of "carrier" that allows us to enumerate
all the where clauses that are transitively implied by T: Trait
.
An example:
trait Foo: A + Bar { }
trait Bar: B + Foo { }
trait A { }
trait B { }
Here, the transitive set of implications for T: Foo
are T: A
, T: Bar
, and
T: B
. And indeed if we were to try to prove WellFormed(T: Foo)
, we would
have to prove each one of those:
WellFormed(T: Foo)
    Implemented(T: Foo)
    WellFormed(T: A)
        Implemented(T: A)
    WellFormed(T: Bar)
        Implemented(T: Bar)
        WellFormed(T: B)
            Implemented(T: B)
        WellFormed(T: Foo) -- cycle, true coinductively
This WellFormed
predicate is only used when proving that impls are
well-formed – basically, for each impl of some trait ref TraitRef
,
we must show that WellFormed(TraitRef)
. This in turn justifies the
implied bounds rules that allow us to extend the set of FromEnv
items.
Lowering type definitions
We also want to have some rules which define when a type is well-formed. For example, given this type:
struct Set<K> where K: Hash { ... }
then Set<i32>
is well-formed because i32
implements Hash
, but
Set<NotHash>
would not be well-formed. Basically, a type is well-formed
if its parameters verify the where clauses written on the type definition.
Hence, for every type definition:
struct Type<P1..Pn> where WC { ... }
we produce the following rule:
// Rule WellFormed-Type
forall<P1..Pn> {
WellFormed(Type<P1..Pn>) :- WC
}
Note that we use struct
for defining a type, but this should be understood
as a general type definition (it could be e.g. a generic enum
).
Conversely, we define rules which say that if we assume that a type is well-formed, we can also assume that its where clauses hold. That is, we produce the following family of rules:
// Rule Implied-Bound-From-Type
//
// For each where clause `WC`
forall<P1..Pn> {
FromEnv(WC) :- FromEnv(Type<P1..Pn>)
}
As per the implied bounds RFC, functions will assume that their arguments are well-formed. For example, suppose we have the following bit of code:
trait Hash: Eq { }
struct Set<K: Hash> { ... }
fn foo<K>(collection: Set<K>, x: K, y: K) {
// `x` and `y` can be equalized even if we did not explicitly write
// `where K: Eq`
if x == y {
...
}
}
In the foo
function, we assume that Set<K>
is well-formed, i.e. we have
FromEnv(Set<K>)
in our environment. Because of the previous rule, we get
FromEnv(K: Hash)
without needing an explicit where clause. And because
of the Hash
trait definition, there also exists a rule which says:
forall<K> {
FromEnv(K: Eq) :- FromEnv(K: Hash)
}
which means that we finally get FromEnv(K: Eq)
and then can compare x
and y
without needing an explicit where clause.
Lowering trait items
Associated type declarations
Given a trait that declares a (possibly generic) associated type:
trait Trait<P1..Pn> // P0 == Self
where WC
{
type AssocType<Pn+1..Pm>: Bounds where WC1;
}
We will produce a number of program clauses. The first two define
the rules by which ProjectionEq
can succeed; these two clauses are discussed
in detail in the section on associated types,
but reproduced here for reference:
// Rule ProjectionEq-Normalize
//
// ProjectionEq can succeed by normalizing:
forall<Self, P1..Pn, Pn+1..Pm, U> {
ProjectionEq(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> = U) :-
Normalize(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> U)
}
// Rule ProjectionEq-Placeholder
//
// ProjectionEq can succeed through the placeholder associated type,
// see "associated type" chapter for more:
forall<Self, P1..Pn, Pn+1..Pm> {
ProjectionEq(
<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> =
(Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>
)
}
The next rule covers implied bounds for the projection. In particular,
the Bounds
declared on the associated type must have been proven to hold
to show that the impl is well-formed, and hence we can rely on them
elsewhere.
// Rule Implied-Bound-From-AssocTy
//
// For each `Bound` in `Bounds`:
forall<Self, P1..Pn, Pn+1..Pm> {
FromEnv(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm>: Bound) :-
FromEnv(Self: Trait<P1..Pn>) && WC1
}
Next, we define the requirements for an instantiation of our associated type to be well-formed...
// Rule WellFormed-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
WellFormed((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>) :-
Implemented(Self: Trait<P1..Pn>) && WC1
}
...along with the reverse implications, when we can assume that it is well-formed.
// Rule Implied-WC-From-AssocTy
//
// For each where clause WC1:
forall<Self, P1..Pn, Pn+1..Pm> {
FromEnv(WC1) :- FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}
// Rule Implied-Trait-From-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
FromEnv(Self: Trait<P1..Pn>) :-
FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}
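To see the rules side by side, here is a sketch of what they produce for a small trait with a single, non-generic associated type (the clauses in the comments instantiate the rules above, with WC1 empty):

```rust
trait Sequence {
    type Elem: Clone;
}

// Generated clauses (sketch):
//
// forall<Self, U> {
//     ProjectionEq(<Self as Sequence>::Elem = U) :-
//         Normalize(<Self as Sequence>::Elem -> U)
// }
// forall<Self> {
//     ProjectionEq(<Self as Sequence>::Elem = (Sequence::Elem)<Self>)
// }
// forall<Self> {
//     FromEnv(<Self as Sequence>::Elem: Clone) :- FromEnv(Self: Sequence)
// }
// forall<Self> {
//     WellFormed((Sequence::Elem)<Self>) :- Implemented(Self: Sequence)
// }
// forall<Self> {
//     FromEnv(Self: Sequence) :- FromEnv((Sequence::Elem)<Self>)
// }
```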
Lowering function and constant declarations
Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. See the section on function/constant values below for more details.
Lowering impls
Given an impl of a trait:
impl<P0..Pn> Trait<A1..An> for A0
where WC
{
// zero or more impl items
}
Let TraitRef
be the trait reference A0: Trait<A1..An>
. Then we
will create the following rules:
// Rule Implemented-From-Impl
forall<P0..Pn> {
Implemented(TraitRef) :- WC
}
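For instance (a self-contained sketch with local stand-ins for Vec and Clone):

```rust
trait MyClone { }

struct MyVec<T> {
    elems: Vec<T>,
}

impl<T> MyClone for MyVec<T> where T: MyClone { }

// Per the Implemented-From-Impl rule, this impl lowers to:
//
// forall<T> {
//     Implemented(MyVec<T>: MyClone) :- Implemented(T: MyClone)
// }
```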
In addition, we will lower all of the impl items.
Lowering impl items
Associated type values
Given an impl that contains:
impl<P0..Pn> Trait<P1..Pn> for P0
where WC_impl
{
type AssocType<Pn+1..Pm> = T;
}
and our where clause WC1
on the trait associated type from above, we
produce the following rule:
// Rule Normalize-From-Impl
forall<P0..Pn> {
    forall<Pn+1..Pm> {
        Normalize(<P0 as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> T) :-
            Implemented(P0: Trait<P1..Pn>) && WC1
    }
}
Note that WC_impl and WC1 both encode where-clauses that the impl can rely on. (WC_impl is not used here, because it is implied by Implemented(P0: Trait<P1..Pn>).)
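A concrete instance of the Normalize-From-Impl rule, for a hypothetical Counter iterator:

```rust
struct Counter {
    count: u32,
}

impl Iterator for Counter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.count += 1;
        Some(self.count)
    }
}

// Sketch of the generated clause (the trait has no extra generics and the
// associated type declares no where clauses, so WC1 is empty):
//
// Normalize(<Counter as Iterator>::Item -> u32) :-
//     Implemented(Counter: Iterator)
```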
Function and constant values
Chalk didn't model functions and constants, but I would eventually
like to treat them exactly like normalization. This presumably
involves adding a new kind of parameter (constant), and then having a
NormalizeValue
domain goal. This is to be written because the
details are a bit up in the air.
Well-formedness checking
WF checking has the job of checking that the various declarations in a Rust program are well-formed. This is the basis for implied bounds, and partly for that reason, this checking can be surprisingly subtle! For example, we have to be sure that each impl proves the WF conditions declared on the trait.
For each declaration in a Rust program, we will generate a logical goal and try to prove it using the lowered rules we described in the lowering rules chapter. If we are able to prove it, we say that the construct is well-formed. If not, we report an error to the user.
Well-formedness checking happens in the chalk/chalk-solve/src/wf.rs
module in chalk. After you have read this chapter, you may find it useful to see
an extended set of examples in the chalk/tests/test/wf_lowering.rs
submodule.
The new-style WF checking has not been implemented in rustc yet.
We give here a complete reference of the generated goals for each Rust declaration.
In addition to the notations introduced in the chapter about
lowering rules, we'll introduce another notation: when checking WF of a
declaration, we'll often have to prove that all types that appear are
well-formed, except type parameters that we always assume to be WF. Hence,
we'll use the following notation: for a type SomeType<...>
, we define
InputTypes(SomeType<...>)
to be the set of all non-parameter types appearing
in SomeType<...>
, including SomeType<...>
itself.
Examples:
- InputTypes((u32, f32)) = [u32, f32, (u32, f32)]
- InputTypes(Box<T>) = [Box<T>] (assuming that T is a type parameter)
- InputTypes(Box<Box<T>>) = [Box<T>, Box<Box<T>>]
We also extend the InputTypes
notation to where clauses in the natural way.
So, for example InputTypes(A0: Trait<A1,...,An>)
is the union of
InputTypes(A0)
, InputTypes(A1)
, ..., InputTypes(An)
.
Type definitions
Given a general type definition:
struct Type<P...> where WC_type {
field1: A1,
...
fieldn: An,
}
we generate the following goal, which represents its well-formedness condition:
forall<P...> {
if (FromEnv(WC_type)) {
WellFormed(InputTypes(WC_type)) &&
WellFormed(InputTypes(A1)) &&
...
WellFormed(InputTypes(An))
}
}
which in English states: assuming that the where clauses defined on the type hold, prove that every type appearing in the type definition is well-formed.
Some examples:
struct OnlyClone<T> where T: Clone {
clonable: T,
}
// The only types appearing are type parameters: we have nothing to check,
// the type definition is well-formed.
struct Foo<T> where T: Clone {
foo: OnlyClone<T>,
}
// The only non-parameter type which appears in this definition is
// `OnlyClone<T>`. The generated goal is the following:
// ```
// forall<T> {
// if (FromEnv(T: Clone)) {
// WellFormed(OnlyClone<T>)
// }
// }
// ```
// which is provable.
struct Bar<T> where <T as Iterator>::Item: Debug {
bar: i32,
}
// The only non-parameter types which appear in this definition are
// `<T as Iterator>::Item` and `i32`. The generated goal is the following:
// ```
// forall<T> {
// if (FromEnv(<T as Iterator>::Item: Debug)) {
// WellFormed(<T as Iterator>::Item) &&
// WellFormed(i32)
// }
// }
// ```
// which is not provable since `WellFormed(<T as Iterator>::Item)` requires
// proving `Implemented(T: Iterator)`, and we are unable to prove that for an
// unknown `T`.
//
// Hence, this type definition is considered illegal. An additional
// `where T: Iterator` would make it legal.
Trait definitions
Given a general trait definition:
trait Trait<P1...> where WC_trait {
type Assoc<P2...>: Bounds_assoc where WC_assoc;
}
we generate the following goal:
forall<P1...> {
if (FromEnv(WC_trait)) {
WellFormed(InputTypes(WC_trait)) &&
forall<P2...> {
if (FromEnv(WC_assoc)) {
WellFormed(InputTypes(Bounds_assoc)) &&
WellFormed(InputTypes(WC_assoc))
}
}
}
}
There is not much to verify in a trait definition. We just want to prove that the types appearing in the trait definition are well-formed, under the assumption that the different where clauses hold.
Some examples:
trait Foo<T> where T: Iterator, <T as Iterator>::Item: Debug {
...
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
// if (FromEnv(T: Iterator), FromEnv(<T as Iterator>::Item: Debug)) {
// WellFormed(<T as Iterator>::Item)
// }
// }
// ```
// which is provable thanks to the `FromEnv(T: Iterator)` assumption.
trait Bar {
type Assoc<T>: From<<T as Iterator>::Item>;
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
// WellFormed(<T as Iterator>::Item)
// }
// ```
// which is not provable, hence the trait definition is considered illegal.
trait Baz {
type Assoc<T>: From<<T as Iterator>::Item> where T: Iterator;
}
// The generated goal is now:
// ```
// forall<T> {
// if (FromEnv(T: Iterator)) {
// WellFormed(<T as Iterator>::Item)
// }
// }
// ```
// which is now provable.
Impls
Now we give ourselves a general impl for the trait defined above:
impl<P1...> Trait<A1...> for SomeType<A2...> where WC_impl {
type Assoc<P2...> = SomeValue<A3...> where WC_assoc;
}
Note that here, WC_assoc
are the same where clauses as those defined on the
associated type definition in the trait declaration, except that type
parameters from the trait are substituted with values provided by the impl
(see example below). You cannot add new where clauses. You may omit the where clauses entirely if you want to emphasize that you are not relying on them.
Some examples to illustrate that:
trait Foo<T> {
type Assoc where T: Clone;
}
struct OnlyClone<T: Clone> { ... }
impl<U> Foo<Option<U>> for () {
// We substitute type parameters from the trait by the ones provided
// by the impl, that is instead of having a `T: Clone` where clause,
// we have an `Option<U>: Clone` one.
type Assoc = OnlyClone<Option<U>> where Option<U>: Clone;
}
impl<T> Foo<T> for i32 {
// I'm not using the `T: Clone` where clause from the trait, so I can
// omit it.
type Assoc = u32;
}
impl<T> Foo<T> for f32 {
type Assoc = OnlyClone<Option<T>> where Option<T>: Clone;
// ^^^^^^^^^^^^^^^^^^^^^^
// this where clause does not exist
// on the original trait decl: illegal
}
So in Rust, where clauses on associated types work exactly like where clauses on trait methods: in an impl, we must substitute the parameters from the trait with the values provided by the impl; we may omit them if we don't need them, but we cannot add new where clauses.
Now let's see the generated goal for this general impl:
forall<P1...> {
// Well-formedness of types appearing in the impl
if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
WellFormed(InputTypes(WC_impl)) &&
forall<P2...> {
if (FromEnv(WC_assoc)) {
WellFormed(InputTypes(SomeValue<A3...>))
}
}
}
// Implied bounds checking
if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
WellFormed(SomeType<A2...>: Trait<A1...>) &&
forall<P2...> {
if (FromEnv(WC_assoc)) {
WellFormed(SomeValue<A3...>: Bounds_assoc)
}
}
}
}
Here is the most complex goal. As always, first, assuming that
the various where clauses hold, we prove that every type appearing in the impl
is well-formed, except types appearing in the impl header
SomeType<A2...>: Trait<A1...>
. Instead, we assume that those types are
well-formed
(hence the if (FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>)))
conditions). This is
part of the implied bounds proposal, so that we can rely on the bounds
written on the definition of e.g. the SomeType<A2...>
type (and that we don't
need to repeat those bounds).
Note that we don't need to check well-formedness of types appearing in WC_assoc because we already did that in the trait declaration (they are just repeated with some substitutions of values which we already assume to be well-formed).
Next, still assuming that the where clauses on the impl WC_impl hold and that the input types of SomeType<A2...> are well-formed, we prove that WellFormed(SomeType<A2...>: Trait<A1...>) holds. That is, we want to prove that SomeType<A2...> verifies all the where clauses that might transitively be required by the Trait definition (see this subsection).
Lastly, assuming in addition that the where clauses on the associated type WC_assoc hold, we prove that WellFormed(SomeValue<A3...>: Bounds_assoc) holds. Again, we are
not only proving Implemented(SomeValue<A3...>: Bounds_assoc)
, but also
all the facts that might transitively come from Bounds_assoc
. We must do this
because we allow the use of implied bounds on associated types: if we have
FromEnv(SomeType: Trait)
in our environment, the lowering rules
chapter indicates that we are able to deduce
FromEnv(<SomeType as Trait>::Assoc: Bounds_assoc)
without knowing what the
precise value of <SomeType as Trait>::Assoc
is.
Some examples for the generated goal:
// Trait Program Clauses
// These are program clauses that come from the trait definitions below
// and that the trait solver can use for its reasonings. I'm just restating
// them here so that we have them in mind.
trait Copy { }
// `WellFormed(Self: Copy) :- Implemented(Self: Copy).`
trait Partial where Self: Copy { }
// ```
// WellFormed(Self: Partial) :-
// Implemented(Self: Partial) &&
// WellFormed(Self: Copy).
// ```
trait Complete where Self: Partial { }
// ```
// WellFormed(Self: Complete) :-
// Implemented(Self: Complete) &&
// WellFormed(Self: Partial).
// ```
// Impl WF Goals
impl<T> Partial for T where T: Complete { }
// The generated goal is:
// ```
// forall<T> {
// if (FromEnv(T: Complete)) {
// WellFormed(T: Partial)
// }
// }
// ```
// Then proving `WellFormed(T: Partial)` amounts to proving
// `Implemented(T: Partial)` and `Implemented(T: Copy)`.
// Both those facts can be deduced from the `FromEnv(T: Complete)` in our
// environment: this impl is legal.
impl<T> Complete for T { }
// The generated goal is:
// ```
// forall<T> {
// WellFormed(T: Complete)
// }
// ```
// Then proving `WellFormed(T: Complete)` amounts to proving
// `Implemented(T: Complete)`, `Implemented(T: Partial)` and
// `Implemented(T: Copy)`.
//
// `Implemented(T: Complete)` can be proved thanks to the
// `impl<T> Complete for T` blanket impl.
//
// `Implemented(T: Partial)` can be proved thanks to the
// `impl<T> Partial for T where T: Complete` impl and because we know
// `T: Complete` holds.
// However, `Implemented(T: Copy)` cannot be proved: the impl is illegal.
// An additional `where T: Copy` bound would be sufficient to make that impl
// legal.
trait Bar { }
impl<T> Bar for T where <T as Iterator>::Item: Bar { }
// We have a non-parameter type appearing in the where clauses:
// `<T as Iterator>::Item`. The generated goal is:
// ```
// forall<T> {
// if (FromEnv(<T as Iterator>::Item: Bar)) {
// WellFormed(T: Bar) &&
// WellFormed(<T as Iterator>::Item: Bar)
// }
// }
// ```
// And `WellFormed(<T as Iterator>::Item: Bar)` is not provable: we'd need
// an additional `where T: Iterator` for example.
trait Foo { }
trait Bar {
type Item: Foo;
}
struct Stuff<T> { }
impl<T> Bar for Stuff<T> where T: Foo {
type Item = T;
}
// The generated goal is:
// ```
// forall<T> {
// if (FromEnv(T: Foo)) {
//     WellFormed(T: Foo)
// }
// }
// ```
// which is provable.
trait Debug { ... }
// `WellFormed(Self: Debug) :- Implemented(Self: Debug).`
struct Box<T> { ... }
impl<T> Debug for Box<T> where T: Debug { ... }
trait PointerFamily {
type Pointer<T>: Debug where T: Debug;
}
// `WellFormed(Self: PointerFamily) :- Implemented(Self: PointerFamily).`
struct BoxFamily;
impl PointerFamily for BoxFamily {
type Pointer<T> = Box<T> where T: Debug;
}
// The generated goal is:
// ```
// forall<T> {
// WellFormed(BoxFamily: PointerFamily) &&
//
// if (FromEnv(T: Debug)) {
// WellFormed(Box<T>: Debug) &&
// WellFormed(Box<T>)
// }
// }
// ```
// `WellFormed(BoxFamily: PointerFamily)` amounts to proving
// `Implemented(BoxFamily: PointerFamily)`, which is ok thanks to our impl.
//
// `WellFormed(Box<T>)` is always true (there are no where clauses on the
// `Box` type definition).
//
// Moreover, we have an `impl<T: Debug> Debug for Box<T>`, hence
// we can prove `WellFormed(Box<T>: Debug)` and the impl is indeed legal.
trait Foo {
type Assoc<T>;
}
struct OnlyClone<T: Clone> { ... }
impl Foo for i32 {
type Assoc<T> = OnlyClone<T>;
}
// The generated goal is:
// ```
// forall<T> {
// WellFormed(i32: Foo) &&
// WellFormed(OnlyClone<T>)
// }
// ```
// however `WellFormed(OnlyClone<T>)` is not provable because it requires
// `Implemented(T: Clone)`. It would be tempting to just add a `where T: Clone`
// bound inside the `impl Foo for i32` block, however we saw that it was
// illegal to add where clauses that didn't come from the trait definition.
Canonical queries
The "start" of the trait system is the canonical query (these are
both queries in the more general sense of the word – something you
would like to know the answer to – and in the
rustc-specific sense). The idea is that the type
checker or other parts of the system, may in the course of doing their
thing want to know whether some trait is implemented for some type
(e.g., is u32: Debug
true?). Or they may want to
normalize some associated type.
This section covers queries at a fairly high level of abstraction. The subsections look a bit more closely at how these ideas are implemented in rustc.
The traditional, interactive Prolog query
In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So given something like this:
?- Vec<i32>: AsRef<?U>
The solver might answer:
Vec<i32>: AsRef<[i32]>
continue? (y/n)
This continue
bit is interesting. The idea in Prolog is that the
solver is finding all possible instantiations of your query that
are true. In this case, if we instantiate ?U = [i32]
, then the query
is true (note that a traditional Prolog interface does not, directly,
tell us a value for ?U
, but we can infer one by unifying the
response with our original query – Rust's solver gives back a
substitution instead). If we were to hit y
, the solver might then
give us another possible answer:
Vec<i32>: AsRef<Vec<i32>>
continue? (y/n)
This answer derives from the fact that there is a reflexive impl
(impl<T> AsRef<T> for T
) for AsRef
. If we were to hit y
again,
then we might get back a negative response:
no
Naturally, in some cases, there may be no possible answers, and hence
the solver will just give me back no
right away:
?- Box<i32>: Copy
no
In some cases, there might be an infinite number of responses. So for
example if I gave this query, and I kept hitting y
, then the solver
would never stop giving me back answers:
?- Vec<?U>: Clone
Vec<i32>: Clone
continue? (y/n)
Vec<Box<i32>>: Clone
continue? (y/n)
Vec<Box<Box<i32>>>: Clone
continue? (y/n)
Vec<Box<Box<Box<i32>>>>: Clone
continue? (y/n)
As you can imagine, the solver will gleefully keep adding another
layer of Box
until we ask it to stop, or it runs out of memory.
Another interesting thing is that queries might still have variables in them. For example:
?- Rc<?T>: Clone
might produce the answer:
Rc<?T>: Clone
continue? (y/n)
After all, Rc<?T>: Clone is true no matter what type ?T is.
A trait query in rustc
The trait queries in rustc work somewhat differently. Instead of trying to enumerate all possible answers for you, they are looking for an unambiguous answer. In particular, when they tell you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable. (Internally within the solver, though, they can potentially enumerate all possible answers. See the description of the SLG solver for details.)
The response to a trait query in rustc is typically a
Result<QueryResult<T>, NoSolution>
(where the T
will vary a bit
depending on the query itself). The Err(NoSolution)
case indicates
that the query was false and had no answers (e.g., Box<i32>: Copy
).
Otherwise, the QueryResult
gives back information about the possible answer(s)
we did find. It consists of four parts:
- Certainty: tells you how sure we are of this answer. It can have two values:
  - Proven means that the result is known to be true.
    - This might be the result for trying to prove Vec<i32>: Clone, say, or Rc<?T>: Clone.
  - Ambiguous means that there were things we could not yet prove to be either true or false, typically because more type information was needed. (We'll see an example shortly.)
    - This might be the result for trying to prove Vec<?T>: Clone.
- Var values: Values for each of the unbound inference variables (like ?T) that appeared in your original query. (Remember that in Prolog, we had to infer these.)
  - As we'll see in the example below, we can get back var values even for Ambiguous cases.
- Region constraints: these are relations that must hold between the lifetimes that you supplied as inputs. We'll ignore these here, but see the section on handling regions in traits for more details.
- Value: The query result also comes with a value of type T. For some specialized queries – like normalizing associated types – this is used to carry back an extra result, but it's often just ().
Examples
Let's work through an example query to see what all the parts mean.
Consider the Borrow
trait. This trait has a number of
impls; among them, there are these two (for clarity, I've written the
Sized
bounds explicitly):
impl<T> Borrow<T> for T where T: ?Sized
impl<T> Borrow<[T]> for Vec<T> where T: Sized
Example 1. Imagine we are type-checking this (rather artificial) bit of code:
fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }
fn main() {
let mut t: Vec<_> = vec![]; // Type: Vec<?T>
let mut u: Option<_> = None; // Type: Option<?U>
foo(t, u); // Example 1: requires `Vec<?T>: Borrow<?U>`
...
}
As the comments indicate, we first create two variables t
and u
;
t
is an empty vector and u
is a None
option. Both of these
variables have unbound inference variables in their type: ?T
represents the elements in the vector t
and ?U
represents the
value stored in the option u
. Next, we invoke foo
; comparing the
signature of foo
to its arguments, we wind up with A = Vec<?T>
and
B = ?U
. Therefore, the where clause on foo
requires that Vec<?T>: Borrow<?U>
. This is thus our first example trait query.
There are many possible solutions to the query Vec<?T>: Borrow<?U>; for example:

- ?U = Vec<?T>,
- ?U = [?T],
- ?T = u32, ?U = [u32],
- and so forth.
Therefore, the result we get back would be as follows (I'm going to ignore region constraints and the "value"):
- Certainty: Ambiguous – we're not sure yet if this holds
- Var values: [?T = ?T, ?U = ?U] – we learned nothing about the values of the variables
In short, the query result says that it is too soon to say much about
whether this trait is proven. During type-checking, this is not an
immediate error: instead, the type checker would hold on to this
requirement (Vec<?T>: Borrow<?U>
) and wait. As we'll see in the next
example, it may happen that ?T
and ?U
wind up constrained from
other sources, in which case we can try the trait query again.
Example 2. We can now extend our previous example a bit,
and assign a value to u
:
fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }
fn main() {
// What we saw before:
let mut t: Vec<_> = vec![]; // Type: Vec<?T>
let mut u: Option<_> = None; // Type: Option<?U>
foo(t, u); // `Vec<?T>: Borrow<?U>` => ambiguous
// New stuff:
u = Some(vec![]); // ?U = Vec<?V>
}
As a result of this assignment, the type of u
is forced to be
Option<Vec<?V>>
, where ?V
represents the element type of the
vector. This in turn implies that ?U
is unified to Vec<?V>
.
Let's suppose that the type checker decides to revisit the
"as-yet-unproven" trait obligation we saw before, Vec<?T>: Borrow<?U>
. ?U
is no longer an unbound inference variable; it now
has a value, Vec<?V>
. So, if we "refresh" the query with that value, we get:
Vec<?T>: Borrow<Vec<?V>>
This time, there is only one impl that applies, the reflexive impl:
impl<T> Borrow<T> for T where T: ?Sized
Therefore, the trait checker will answer:
- Certainty: Proven
- Var values: [?T = ?T, ?V = ?T]
Here, it is saying that we have indeed proven that the obligation
holds, and we also know that ?T
and ?V
are the same type (but we
don't know what that type is yet!).
(In fact, as the function ends here, the type checker would give an
error at this point, since the element types of t
and u
are still
not yet known, even though they are known to be the same.)
Canonicalization
Canonicalization is the process of isolating an inference value from its context. It is a key part of implementing canonical queries, and you may wish to read the parent chapter to get more context.
Canonicalization is really based on a very simple concept: every inference variable is always in one of two states: either it is unbound, in which case we don't know yet what type it is, or it is bound, in which case we do. So to isolate some data-structure T that contains types/regions from its environment, we just walk down and find the unbound variables that appear in T; those variables get replaced with "canonical variables", starting from zero and numbered in a fixed order (left to right, for the most part, but really it doesn't matter as long as it is consistent).
So, for example, if we have the type X = (?T, ?U)
, where ?T
and
?U
are distinct, unbound inference variables, then the canonical
form of X
would be (?0, ?1)
, where ?0
and ?1
represent these
canonical placeholders. Note that the type Y = (?U, ?T)
also
canonicalizes to (?0, ?1)
. But the type Z = (?T, ?T)
would
canonicalize to (?0, ?0)
(as would (?U, ?U)
). In other words, the
exact identity of the inference variables is not important – unless
they are repeated.
We use this to improve caching as well as to detect cycles and other
things during trait resolution. Roughly speaking, the idea is that if
two trait queries have the same canonical form, then they will get
the same answer. That answer will be expressed in terms of the
canonical variables (?0
, ?1
), which we can then map back to the
original variables (?T
, ?U
).
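Because the scheme is so simple, it can be sketched directly. The following toy canonicalizer (over an assumed miniature type grammar, not rustc's real Ty) numbers each distinct unbound variable by order of first appearance:

```rust
use std::collections::HashMap;

// A toy type grammar (assumed, for illustration only).
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Infer(u32),     // an unbound inference variable like ?T
    Tuple(Vec<Ty>), // (A, B, ...)
}

// Replace each distinct inference variable with a canonical variable
// ?0, ?1, ... numbered by order of first appearance.
fn canonicalize(ty: &Ty, map: &mut HashMap<u32, u32>) -> Ty {
    match ty {
        Ty::Infer(v) => {
            let next = map.len() as u32;
            let canonical = *map.entry(*v).or_insert(next);
            Ty::Infer(canonical)
        }
        Ty::Tuple(tys) => {
            Ty::Tuple(tys.iter().map(|t| canonicalize(t, map)).collect())
        }
    }
}

fn main() {
    let t = 0; // ?T
    let u = 1; // ?U

    // X = (?T, ?U) canonicalizes to (?0, ?1)
    let mut map = HashMap::new();
    let x = canonicalize(&Ty::Tuple(vec![Ty::Infer(t), Ty::Infer(u)]), &mut map);
    assert_eq!(x, Ty::Tuple(vec![Ty::Infer(0), Ty::Infer(1)]));

    // Z = (?T, ?T) canonicalizes to (?0, ?0): repetition is preserved
    let mut map = HashMap::new();
    let z = canonicalize(&Ty::Tuple(vec![Ty::Infer(t), Ty::Infer(t)]), &mut map);
    assert_eq!(z, Ty::Tuple(vec![Ty::Infer(0), Ty::Infer(0)]));
}
```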
Canonicalizing the query
To see how it works, imagine that we are asking to solve the following
trait query: ?A: Foo<'static, ?B>
, where ?A
and ?B
are unbound.
This query contains two unbound variables, but it also contains the
lifetime 'static
. The trait system generally ignores all lifetimes
and treats them equally, so when canonicalizing, we will also
replace any free lifetime with a
canonical variable (Note that 'static
is actually a free lifetime
variable here. We are not considering it in the typing context of the whole
program but only in the context of this trait reference. Mathematically, we
are not quantifying over the whole program, but only this obligation).
Therefore, we get the following result:
?0: Foo<'?1, ?2>
Sometimes we write this differently, like so:
for<T,L,T> { ?0: Foo<'?1, ?2> }
This for<>
gives some information about each of the canonical
variables within. In this case, each T
indicates a type variable,
so ?0
and ?2
are types; the L
indicates a lifetime variable, so
?1
is a lifetime. The canonicalize
method also gives back a
CanonicalVarValues
array OV with the "original values" for each
canonicalized variable:
[?A, 'static, ?B]
We'll need this vector OV later, when we process the query response.
Executing the query
Once we've constructed the canonical query, we can try to solve it. To do so, we will wind up creating a fresh inference context and instantiating the canonical query in that context. The idea is that we create a substitution S from the canonical form containing a fresh inference variable (of suitable kind) for each canonical variable. So, for our example query:
for<T,L,T> { ?0: Foo<'?1, ?2> }
the substitution S might be:
S = [?A, '?B, ?C]
We can then replace the bound canonical variables (?0
, etc) with
these inference variables, yielding the following fully instantiated
query:
?A: Foo<'?B, ?C>
Remember that substitution S though! We're going to need it later.
OK, now that we have a fresh inference context and an instantiated
query, we can go ahead and try to solve it. The trait solver itself is
explained in more detail in another section, but
suffice to say that it will compute a certainty value (Proven
or
Ambiguous
) and have side-effects on the inference variables we've
created. For example, if there were only one impl of Foo
, like so:
impl<'a, X> Foo<'a, X> for Vec<X>
where X: 'a
{ ... }
then we might wind up with a certainty value of Proven
, as well as
creating fresh inference variables '?D
and ?E
(to represent the
parameters on the impl) and unifying as follows:
'?B = '?D
?A = Vec<?E>
?C = ?E
We would also accumulate the region constraint ?E: '?D
, due to the
where clause.
In order to create our final query result, we have to "lift" these values out of the query's inference context and into something that can be reapplied in our original inference context. We do that by re-applying canonicalization, but to the query result.
Canonicalizing the query result
As discussed in the parent section, most trait queries wind up
with a result that brings together a "certainty value" certainty
, a
result substitution var_values
, and some region constraints. To
create this, we wind up re-using the substitution S that we created
when first instantiating our query. To refresh your memory, we had a query
for<T,L,T> { ?0: Foo<'?1, ?2> }
for which we made a substitution S:
S = [?A, '?B, ?C]
We then did some work which unified some of those variables with other things. If we "refresh" S with the latest results, we get:
S = [Vec<?E>, '?D, ?E]
These are precisely the new values for the three input variables from
our original query. Note though that they include some new variables
(like ?E
). We can make those go away by canonicalizing again! We don't
just canonicalize S, though, we canonicalize the whole query response QR:
QR = {
certainty: Proven, // or whatever
var_values: [Vec<?E>, '?D, ?E] // this is S
region_constraints: [?E: '?D], // from the impl
value: (), // for our purposes, just (), but
// in some cases this might have
// a type or other info
}
The result would be as follows:
Canonical(QR) = for<T, L> {
certainty: Proven,
var_values: [Vec<?0>, '?1, ?0]
region_constraints: [?0: '?1],
value: (),
}
(One subtle point: when we canonicalize the query result, we do not
use any special treatment for free lifetimes. Note that both
references to '?D
, for example, were converted into the same
canonical variable (?1
). This is in contrast to the original query,
where we canonicalized every free lifetime into a fresh canonical
variable.)
Now, this result must be reapplied in each context where needed.
Processing the canonicalized query result
In the previous section we produced a canonical query result. We now have to apply that result in our original context. If you recall, way back in the beginning, we were trying to prove this query:
?A: Foo<'static, ?B>
We canonicalized that into this:
for<T,L,T> { ?0: Foo<'?1, ?2> }
and now we got back a canonical response:
for<T, L> {
certainty: Proven,
var_values: [Vec<?0>, '?1, ?0]
region_constraints: [?0: '?1],
value: (),
}
We now want to apply that response to our context. Conceptually, how we do that is to (a) instantiate each of the canonical variables in the result with a fresh inference variable, (b) unify the values in the result with the original values, and then (c) record the region constraints for later. Doing step (a) would yield a result of
{
certainty: Proven,
var_values: [Vec<?C>, '?D, ?C]
^^ ^^^ fresh inference variables
region_constraints: [?C: '?D],
value: (),
}
Step (b) would then unify:
?A with Vec<?C>
'static with '?D
?B with ?C
And finally the region constraint of ?C: 'static
would be recorded
for later verification.
(What we actually do is a mildly optimized variant of that: Rather
than eagerly instantiating all of the canonical values in the result
with variables, we instead walk the vector of values, looking for
cases where the value is just a canonical variable. In our example,
values[2]
is ?C
, so that means we can deduce that ?C := ?B
and
'?D := 'static
. This gives us a partial set of values. Anything for
which we do not find a value, we create an inference variable.)
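Here is a minimal sketch of that optimized walk, under assumed toy representations (real rustc works over its own value and substitution types, and the unification step is elided):

```rust
use std::collections::HashMap;

// Toy value: either a canonical variable ?i or something more structured.
#[derive(Clone, Debug, PartialEq)]
enum Val {
    Canonical(usize),
    Concrete(&'static str),
}

// Optimized application (sketch): walk the result's var_values; wherever the
// value is *just* a canonical variable, bind it directly to the corresponding
// original value instead of creating a fresh inference variable.
fn apply_result(
    original_values: &[Val],
    response_values: &[Val],
) -> HashMap<usize, Val> {
    let mut bindings = HashMap::new();
    for (i, value) in response_values.iter().enumerate() {
        if let Val::Canonical(c) = value {
            // e.g. values[2] = ?0 with original value ?B gives ?0 := ?B
            bindings.entry(*c).or_insert_with(|| original_values[i].clone());
        }
        // Anything else would be instantiated with fresh inference
        // variables and unified with original_values[i] (elided here).
    }
    bindings
}

fn main() {
    // From the example: originals [?A, 'static, ?B],
    // response var_values [Vec<?0>, '?1, ?0].
    let original = [
        Val::Concrete("?A"),
        Val::Concrete("'static"),
        Val::Concrete("?B"),
    ];
    let response = [
        Val::Concrete("Vec<?0>"), // not a bare canonical var: needs unification
        Val::Canonical(1),        // deduces '?1 := 'static
        Val::Canonical(0),        // deduces ?0 := ?B
    ];
    println!("{:?}", apply_result(&original, &response));
}
```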
The On-Demand SLG solver
Given a set of program clauses (provided by our lowering rules) and a query, we need to return the result of the query and the value of any type variables we can determine. This is the job of the solver.
For example, exists<T> { Vec<T>: FromIterator<u32> }
has one solution, so
its result is Unique; substitution [?T := u32]
. A solution also comes with
a set of region constraints, which we'll ignore in this introduction.
Goals of the Solver
On demand
There are often many, or even infinitely many, solutions to a query. For
example, say we want to prove that exists<T> { Vec<T>: Debug }
for some
type ?T
. Our solver should be capable of yielding one answer at a time, say
?T = u32
, then ?T = i32
, and so on, rather than iterating over every type
in the type system. If we need more answers, we can request more until we are
done. This is similar to how Prolog works.
See also: The traditional, interactive Prolog query
Breadth-first
Vec<?T>: Debug
is true if ?T: Debug
. This leads to a cycle: [Vec<u32>, Vec<Vec<u32>>, Vec<Vec<Vec<u32>>>]
, and so on all implement Debug
. Our
solver ought to be breadth first and consider answers like [Vec<u32>: Debug, Vec<i32>: Debug, ...]
before it recurses, or we may never find the answer
we're looking for.
Cachable
To speed up compilation, we need to cache results, including partial results left over from past solver queries.
Description of how it works
The basis of the solver is the Forest
type. A forest stores a
collection of tables as well as a stack. Each table represents
the stored results of a particular query that is being performed, as
well as the various strands, which are basically suspended
computations that may be used to find more answers. Tables are
interdependent: solving one query may require solving others.
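In outline, these pieces relate as follows (a simplified sketch; the real definitions in chalk_engine carry considerably more state, and the field names here are reduced for exposition):

```rust
use std::collections::{HashMap, VecDeque};

// Simplified sketch of the solver's core types (fields assumed/reduced).
struct Forest {
    // one table per u-canonicalized goal
    tables: HashMap<CanonicalGoal, Table>,
    // stack of tables currently being solved (used for cycle handling)
    stack: Vec<CanonicalGoal>,
}

struct Table {
    // answers found so far, in order (A0, A1, ...)
    answers: Vec<Answer>,
    // suspended computations that may produce more answers
    strands: VecDeque<Strand>,
}

struct Strand {
    // the X-clause `G :- L`: the goal plus its remaining subgoals
    ex_clause: ExClause,
    // the subgoal currently being pursued, if any
    selected_subgoal: Option<usize>,
}

// Placeholder types for the purposes of this sketch:
#[derive(PartialEq, Eq, Hash)]
struct CanonicalGoal(String);
struct Answer;
struct ExClause;
```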
Walkthrough
Perhaps the easiest way to explain how the solver works is to walk through an example. Let's imagine that we have the following program:
trait Debug { }
struct u32 { }
impl Debug for u32 { }
struct Rc<T> { }
impl<T: Debug> Debug for Rc<T> { }
struct Vec<T> { }
impl<T: Debug> Debug for Vec<T> { }
Now imagine that we want to find answers for the query exists<T> { Rc<T>: Debug }
. The first step would be to u-canonicalize this query; this is the
act of giving canonical names to all the unbound inference variables based on
the order of their left-most appearance, as well as canonicalizing the
universes of any universally bound names (e.g., the T
in forall<T> { ... }
). In this case, there are no universally bound names, but the canonical
form Q of the query might look something like:
Rc<?0>: Debug
where ?0
is a variable in the root universe U0. We would then go and
look for a table with this canonical query as the key: since the forest is
empty, this lookup will fail, and we will create a new table T0,
corresponding to the u-canonical goal Q.
Ignoring negative reasoning and regions. To start, we'll ignore
the possibility of negative goals like not { Foo }
. We'll phase them
in later, as they bring several complications.
Creating a table. When we first create a table, we also initialize
it with a set of initial strands. A "strand" is kind of like a
"thread" for the solver: it contains a particular way to produce an
answer. The initial set of strands for a goal like Rc<?0>: Debug
(i.e., a "domain goal") is determined by looking for clauses in the
environment. In Rust, these clauses derive from impls, but also from
where-clauses that are in scope. In the case of our example, there
would be three clauses, each coming from the program. Using a
Prolog-like notation, these look like:
(u32: Debug).
(Rc<T>: Debug) :- (T: Debug).
(Vec<T>: Debug) :- (T: Debug).
To create our initial strands, then, we will try to apply each of
these clauses to our goal of Rc<?0>: Debug
. The first and third
clauses are inapplicable because u32
and Vec<?0>
cannot be unified
with Rc<?0>
. The second clause, however, will work.
What is a strand? Let's talk a bit more about what a strand is. In the code, a strand
is the combination of an inference table, an X-clause, and (possibly)
a selected subgoal from that X-clause. But what is an X-clause
(ExClause
, in the code)? An X-clause pulls together a few things:
- The current state of the goal we are trying to prove;
- A set of subgoals that have yet to be proven;
- There are also a few things we're ignoring for now:
- delayed literals, region constraints
The general form of an X-clause is written much like a Prolog clause, but with somewhat different semantics. Since we're ignoring delayed literals and region constraints, an X-clause just looks like this:
G :- L
where G is a goal and L is a set of subgoals that must be proven. (The L stands for literal -- when we address negative reasoning, a literal will be either a positive or negative subgoal.) The idea is that if we are able to prove L then the goal G can be considered true.
In the case of our example, we would wind up creating one strand, with an X-clause like so:
(Rc<?T>: Debug) :- (?T: Debug)
Here, the ?T
refers to one of the inference variables created in the
inference table that accompanies the strand. (I'll use named variables
to refer to inference variables, and numbered variables like ?0
to
refer to variables in a canonicalized goal; in the code, however, they
are both represented with an index.)
For each strand, we also optionally store a selected subgoal. This
is the subgoal after the turnstile (:-
) that we are currently trying
to prove in this strand. Initially, when a strand is first created,
there is no selected subgoal.
Activating a strand. Now that we have created the table T0 and
initialized it with strands, we have to actually try and produce an answer.
We do this by invoking the ensure_root_answer
operation on the table:
specifically, we say ensure_root_answer(T0, A0)
, meaning "ensure that there
is a 0th answer A0 to query T0".
Remember that tables store not only strands, but also a vector of cached
answers. The first thing that ensure_root_answer
does is to check whether
answer A0 is in this vector. If so, we can just return immediately. In this
case, the vector will be empty, and hence that does not apply (this becomes
important for cyclic checks later on).
When there is no cached answer, ensure_root_answer
will try to produce one.
It does this by selecting a strand from the set of active strands -- the
strands are stored in a VecDeque
and hence processed in a round-robin
fashion. Right now, we have only one strand, storing the following X-clause
with no selected subgoal:
(Rc<?T>: Debug) :- (?T: Debug)
When we activate the strand, we see that we have no selected subgoal,
and so we first pick one of the subgoals to process. Here, there is only
one (?T: Debug
), so that becomes the selected subgoal, changing
the state of the strand to:
(Rc<?T>: Debug) :- selected(?T: Debug, A0)
Here, we write selected(L, An)
to indicate that (a) the literal L
is the selected subgoal and (b) which answer An
we are looking for. We
start out looking for A0
.
Processing the selected subgoal. Next, we have to try and find an
answer to this selected goal. To do that, we will u-canonicalize it
and try to find an associated table. In this case, the u-canonical
form of the subgoal is ?0: Debug
: we don't have a table yet for
that, so we can create a new one, T1. As before, we'll initialize T1
with strands. In this case, there will be three strands, because all
the program clauses are potentially applicable. Those three strands
will be:
- (u32: Debug) :-, derived from the program clause (u32: Debug).
  - Note: This strand has no subgoals.
- (Vec<?U>: Debug) :- (?U: Debug), derived from the Vec impl.
- (Rc<?V>: Debug) :- (?V: Debug), derived from the Rc impl.
We can thus summarize the state of the whole forest at this point as follows:
Table T0 [Rc<?0>: Debug]
Strands:
(Rc<?T>: Debug) :- selected(?T: Debug, A0)
Table T1 [?0: Debug]
Strands:
(u32: Debug) :-
(Vec<?U>: Debug) :- (?U: Debug)
(Rc<?V>: Debug) :- (?V: Debug)
Delegation between tables. Now that the active strand from T0 has
created the table T1, it can try to extract an answer. It does this
via that same ensure_answer
operation we saw before. In this case,
the strand would invoke ensure_answer(T1, A0)
, since we will start
with the first answer. This will cause T1 to activate its first
strand, u32: Debug :-
.
This strand is somewhat special: it has no subgoals at all. This means
that the goal is proven. We can therefore add u32: Debug
to the set
of answers for our table, calling it answer A0 (it is the first
answer). The strand is then removed from the list of strands.
The state of table T1 is therefore:
Table T1 [?0: Debug]
Answers:
A0 = [?0 = u32]
Strands:
(Vec<?U>: Debug) :- (?U: Debug)
(Rc<?V>: Debug) :- (?V: Debug)
Note that I am writing out the answer A0 as a substitution that can be applied to the table goal; actually, in the code, the goals for each X-clause are also represented as substitutions, but in this exposition I've chosen to write them as full goals, following NFTD.
Since we now have an answer, ensure_answer(T1, A0)
will return Ok
to the table T0, indicating that answer A0 is available. T0 now has
the job of incorporating that result into its active strand. It does
this in two ways. First, it creates a new strand that is looking for
the next possible answer of T1. Next, it incorporates the answer from
A0 and removes the subgoal. The resulting state of table T0 is:
Table T0 [Rc<?0>: Debug]
Strands:
(Rc<?T>: Debug) :- selected(?T: Debug, A1)
(Rc<u32>: Debug) :-
We then immediately activate the strand that incorporated the answer
(the Rc<u32>: Debug
one). In this case, that strand has no further
subgoals, so it becomes an answer to the table T0. This answer can
then be returned up to our caller, and the whole forest goes quiescent
at this point (remember, we only do enough work to generate one
answer). The ending state of the forest at this point will be:
Table T0 [Rc<?0>: Debug]
Answer:
A0 = [?0 = u32]
Strands:
(Rc<?T>: Debug) :- selected(?T: Debug, A1)
Table T1 [?0: Debug]
Answers:
A0 = [?0 = u32]
Strands:
(Vec<?U>: Debug) :- (?U: Debug)
(Rc<?V>: Debug) :- (?V: Debug)
Here you can see how the forest captures both the answers we have created thus far and the strands that will let us try to produce more answers later on.
See also
- chalk_solve README, which contains links to papers used and acronyms referenced in the code
- This section is a lightly adapted version of the blog post An on-demand SLG solver for chalk
- Negative Reasoning in Chalk explains the need for negative reasoning, but not how the SLG solver does it
An Overview of Chalk
Chalk is under heavy development, so if any of these links are broken or if any of the information is inconsistent with the code or outdated, please open an issue so we can fix it. If you are able to fix the issue yourself, we would love your contribution!
Chalk recasts Rust's trait system explicitly in terms of logic programming by "lowering" Rust code into a kind of logic program we can then execute queries against (see Lowering to Logic and Lowering Rules). Its goal is to be an executable, highly readable specification of the Rust trait system.
There are many expected benefits from this work. It will consolidate our existing, somewhat ad-hoc implementation into something far more principled and expressive, which should behave better in corner cases, and be much easier to extend.
Chalk Structure
Chalk has two main "products". The first of these is the
chalk_engine
crate, which defines the core SLG
solver. This is the part rustc uses.
The rest of chalk can be considered an elaborate testing harness. Chalk is capable of parsing Rust-like "programs", lowering them to logic, and performing queries on them.
Here's a sample session in the chalk repl, chalki. After feeding it our program, we perform some queries on it.
?- program
Enter a program; press Ctrl-D when finished
| struct Foo { }
| struct Bar { }
| struct Vec<T> { }
| trait Clone { }
| impl<T> Clone for Vec<T> where T: Clone { }
| impl Clone for Foo { }
?- Vec<Foo>: Clone
Unique; substitution [], lifetime constraints []
?- Vec<Bar>: Clone
No possible solution.
?- exists<T> { Vec<T>: Clone }
Ambiguous; no inference guidance
You can see more examples of programs and queries in the unit tests.
Next we'll go through each stage required to produce the output above.
Parsing (chalk_parse)
Chalk is designed to be incorporated with the Rust compiler, so the syntax and concepts it deals with heavily borrow from Rust. It is convenient for the sake of testing to be able to run chalk on its own, so chalk includes a parser for a Rust-like syntax. This syntax is orthogonal to the Rust AST and grammar. It is not intended to look exactly like it or support the exact same syntax.
The parser takes that syntax and produces an Abstract Syntax Tree (AST). You can find the complete definition of the AST in the source code.
The syntax contains things from Rust that we know and love, for example: traits, impls, and struct definitions. Parsing is often the first "phase" of transformation that a program goes through in order to become a format that chalk can understand.
Rust Intermediate Representation (chalk_rust_ir)
After getting the AST we convert it to a more convenient intermediate
representation called chalk_rust_ir
. This is sort of
analogous to the HIR in Rust. The process of converting to IR is called
lowering.
The chalk::program::Program
struct contains some "rust things"
but indexed and accessible in a different way. For example, if you have a
type like Foo<Bar>
, we would represent Foo
as a string in the AST but in
chalk::program::Program
, we use numeric indices (ItemId
).
The IR source code contains the complete definition.
Chalk Intermediate Representation (chalk_ir)
Once we have Rust IR it is time to convert it to "program clauses". A
ProgramClause
is essentially one of the following:
- A clause of the form consequence :- conditions, where :- is read as "if" and conditions = cond1 && cond2 && ...
- A universally quantified clause of the form forall<T> { consequence :- conditions }
  - forall<T> { ... } is used to represent universal quantification. See the section on Lowering to logic for more information.
  - A key thing to note about forall is that we don't allow you to "quantify" over traits, only types and regions (lifetimes). That is, you can't make a rule like forall<Trait> { u32: Trait } which would say "u32 implements all traits". You can however say forall<T> { T: Trait } meaning "Trait is implemented by all types".
  - forall<T> { ... } is represented in the code using the Binders<T> struct.
This is where we encode the rules of the trait system into logic. For example, if we have the following Rust:
impl<T: Clone> Clone for Vec<T> {}
We generate the following program clause:
forall<T> { (Vec<T>: Clone) :- (T: Clone) }
This rule dictates that Vec<T>: Clone
is only satisfied if T: Clone
is also
satisfied (i.e. "provable").
Similar to chalk::program::Program
which has "rust-like
things", chalk_ir defines ProgramEnvironment
which is "pure logic".
The main field in that struct is program_clauses
, which contains the
ProgramClause
s generated by the rules module.
Rules (chalk_solve)
The chalk_solve
crate (source code) defines the logic rules we
use for each item in the Rust IR. It works by iterating over every trait, impl,
etc. and emitting the rules that come from each one.
See also: Lowering Rules
Well-formedness checks
As part of lowering to logic, we also do some "well formedness" checks. See
the chalk_solve::wf
source code for where those are done.
See also: Well-formedness checking
Coherence
The method CoherenceSolver::specialization_priorities
in the coherence
module
(source code) checks "coherence", which means that it
ensures that two impls of the same trait for the same type cannot exist.
Solver (chalk_solve)
Finally, when we've collected all the program clauses we care about, we want to perform queries on it. The component that finds the answer to these queries is called the solver.
See also: The SLG Solver
Crates
Chalk's functionality is broken up into the following crates:

- chalk_engine: Defines the core SLG solver.
- chalk_rust_ir: Contains the "HIR-like" form of the AST.
- chalk_ir: Defines chalk's internal representation of types, lifetimes, and goals.
- chalk_solve: Combines chalk_ir and chalk_engine, effectively; implements the logic rules converting chalk_rust_ir to chalk_ir.
  - Defines the coherence module, which implements coherence rules.
  - chalk_engine::context provides the necessary hooks.
- chalk_parse: Defines the raw AST and a parser.
- chalk: Brings everything together. Defines the following modules:
  - chalk::lowering, which converts AST to chalk_rust_ir.
Testing
chalk has a test framework for lowering programs to logic, checking the lowered logic, and performing queries on it. This is how we test the implementation of chalk itself, and the viability of the lowering rules.
The main kind of tests in chalk are goal tests. They contain a program, which is expected to lower to logic successfully, and a set of queries (goals) along with the expected output. Here's an example. Since chalk's output can be quite long, goal tests support specifying only a prefix of the output.
Lowering tests check the stages that occur before we can issue queries to the solver: the lowering to chalk_rust_ir, and the well-formedness checks that occur after that.
Testing internals
Goal tests use a `test!` macro that takes chalk's Rust-like syntax and runs
it through the full pipeline described above. The macro ultimately calls the
`solve_goal` function.

Likewise, lowering tests use the `lowering_success!` and `lowering_error!`
macros.
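To give a flavor of the framework, here is a hedged sketch of what a goal
test looks like, modeled on the `test!` macro described above (the exact
syntax and the expected-output strings vary across chalk versions):

```rust
test! {
    program {
        struct Foo { }
        trait Clone { }
        impl Clone for Foo { }
    }

    goal {
        Foo: Clone
    } yields {
        // Only a prefix of the solver's output needs to be specified.
        "Unique"
    }
}
```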
More Resources
Blog Posts
- Lowering Rust traits to logic
- Unification in Chalk, part 1
- Unification in Chalk, part 2
- Negative reasoning in Chalk
- Query structure in chalk
- Cyclic queries in chalk
- An on-demand SLG solver for chalk
Bibliography
If you'd like to read more background material, here are some recommended texts and papers:
Programming with Higher-order Logic, by Dale Miller and Gopalan Nadathur, covers the key concepts of Lambda prolog. Although it's a slim little volume, it's the kind of book where you learn something new every time you open it.
"A proof procedure for the logic of Hereditary Harrop formulas", by Gopalan Nadathur. This paper covers the basics of universes, environments, and Lambda Prolog-style proof search. Quite readable.
"A new formulation of tabled resolution with delay", by Theresa Swift. This paper gives a kind of abstract treatment of the SLG formulation that is the basis for our on-demand solver.
Type checking
The `rustc_typeck` crate contains the source for "type collection" and "type
checking", as well as a few other bits of related functionality. (It draws
heavily on type inference and trait solving.)

Type collection

Type "collection" is the process of converting the types found in the HIR
(`hir::Ty`), which represent the syntactic things that the user wrote, into
the internal representation used by the compiler (`Ty<'tcx>`) – we also do
similar conversions for where-clauses and other bits of the function
signature.
To try and get a sense for the difference, consider this function:
struct Foo { }
fn foo(x: Foo, y: self::Foo) { ... }
// ^^^ ^^^^^^^^^
Those two parameters `x` and `y` each have the same type: but they will have
distinct `hir::Ty` nodes. Those nodes will have different spans, and of course
they encode the path somewhat differently. But once they are "collected" into
`Ty<'tcx>` nodes, they will be represented by the exact same internal type.

Collection is defined as a bundle of queries for computing information about
the various functions, traits, and other items in the crate being compiled.
Note that each of these queries is concerned with interprocedural things –
for example, for a function definition, collection will figure out the type
and signature of the function, but it will not visit the body of the function
in any way, nor examine type annotations on local variables (that's the job
of type checking).

For more details, see the `collect` module.
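To give a flavor of how other parts of the compiler consume collection's
results, here is a hedged sketch (not a complete program; it assumes a
`tcx: TyCtxt<'tcx>` and the `DefId` of an item are already in scope):

```rust
// Hedged sketch: collection results are exposed as queries on the `tcx`,
// so downstream code asks for the collected information rather than
// re-reading the HIR. (`tcx` and `def_id` are assumed to be in scope.)
let ty = tcx.type_of(def_id); // the `Ty<'tcx>` for the item
let sig = tcx.fn_sig(def_id); // the collected function signature
let predicates = tcx.predicates_of(def_id); // where-clauses and friends
```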
TODO: actually talk about type checking...
Method lookup
Method lookup can be rather complex due to the interaction of a number of factors, such as self types, autoderef, trait lookup, etc. This file provides an overview of the process. More detailed notes are in the code itself, naturally.
One way to think of method lookup is that we convert an expression of the form:
receiver.method(...)
into a more explicit UFCS form:
Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call
Here `ADJ` is some kind of adjustment, which is typically a series of
autoderefs and then possibly an autoref (e.g., `&**receiver`). However we
sometimes do other adjustments and coercions along the way, in particular
unsizing (e.g., converting from `[T; n]` to `[T]`).
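As a concrete illustration of my own (not from the compiler sources), here is
a method call written both ways; for `len` on a `Vec<i32>` receiver, the
adjustment is a deref to `[i32]` followed by an autoref:

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Method-call form; the compiler computes ADJ(receiver) itself:
    let n1 = v.len();

    // Explicit UFCS form. `len` is defined on `[T]`, so the receiver is
    // adjusted from `Vec<i32>` to `&[i32]` (deref + autoref); writing
    // `&v` works because deref coercion finishes the job.
    let n2 = <[i32]>::len(&v);

    assert_eq!(n1, n2);
}
```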
Method lookup is divided into two major phases:
- Probing (`probe.rs`). The probe phase is when we decide what method to call
  and how to adjust the receiver.
- Confirmation (`confirm.rs`). The confirmation phase "applies" this
  selection, updating the side-tables, unifying type variables, and otherwise
  doing side-effectful things.

One reason for this division is to be more amenable to caching. The probe
phase produces a "pick" (`probe::Pick`), which is designed to be cacheable
across method-call sites. Therefore, it does not include inference variables
or other information.
The Probe phase
Steps
The first thing that the probe phase does is to create a series of
steps. This is done by progressively dereferencing the receiver type
until it cannot be deref'd anymore, as well as applying an optional
"unsize" step. So if the receiver has type `Rc<Box<[T; 3]>>`, this
might yield:
Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]
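These steps can be observed in ordinary code. A small illustration of my own
(the comments describe what the probe does behind the scenes):

```rust
use std::rc::Rc;

fn main() {
    let r: Rc<Box<[i32; 3]>> = Rc::new(Box::new([1, 2, 3]));

    // `len` is an inherent method on `[T]`. The probe walks the steps
    // Rc<Box<[i32; 3]>> -> Box<[i32; 3]> -> [i32; 3] -> [i32] (unsize)
    // and finds the candidate at the `[i32]` step, autoref'ing the
    // receiver to `&[i32]`.
    assert_eq!(r.len(), 3);
}
```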
Candidate assembly
We then search along those steps to create a list of candidates. A
`Candidate` is a method item that might plausibly be the method being
invoked. For each candidate, we'll derive a "transformed self type" that
takes into account explicit self.

Candidates are grouped into two kinds, inherent and extension.

Inherent candidates are those that are derived from the type of the receiver
itself. So, if you have a receiver of some nominal type `Foo` (e.g., a
struct), any methods defined within an impl like `impl Foo` are inherent
methods. Nothing needs to be imported to use an inherent method: they are
associated with the type itself (note that inherent impls can only be defined
in the same crate as the type itself).
FIXME: Inherent candidates are not always derived from impls. If you have a
trait object, such as a value of type `Box<ToString>`, then the trait methods
(`to_string()`, in this case) are inherently associated with it. Another case
is type parameters, in which case the methods of their bounds are inherent.
However, this part of the rules is subject to change: when DST's "impl Trait
for Trait" is complete, trait object dispatch could be subsumed into trait
matching, and the type parameter behavior should be reconsidered in light of
where clauses.

TODO: Is this FIXME still accurate?
Extension candidates are derived from imported traits. If I have the trait
`ToString` imported, and I call `to_string()` on a value of type `T`, then we
will go off to find out whether there is an impl of `ToString` for `T`. These
kinds of method calls are called "extension methods". They can be defined in
any module, not only the one that defined `T`. Furthermore, you must import
the trait to call such a method.
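The import requirement is easy to see in practice. A small hedged example of
my own (the names `Foo` and `Ext` are made up for illustration):

```rust
mod names {
    pub struct Foo;

    impl Foo {
        // Inherent method: usable anywhere the type is visible.
        pub fn inherent(&self) {}
    }

    pub trait Ext {
        // Extension method, provided via a default body.
        fn extension(&self) {}
    }

    impl Ext for Foo {}
}

fn without_import(f: &names::Foo) {
    f.inherent();
    // f.extension(); // ERROR: no method named `extension`
    //                // (the trait `names::Ext` is not in scope here)
}

fn with_import(f: &names::Foo) {
    use names::Ext; // importing the trait adds its methods as candidates
    f.extension(); // OK
}

fn main() {
    let f = names::Foo;
    without_import(&f);
    with_import(&f);
}
```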
So, let's continue our example. Imagine that we were calling a method `foo`
with the receiver `Rc<Box<[T; 3]>>` and there is a trait `Foo` that defines
it with `&self` for the type `Rc<U>` as well as a method on the type `Box`
that defines `Foo` but with `&mut self`. Then we might have two candidates:

&Rc<Box<[T; 3]>> from the impl of `Foo` for `Rc<U>` where `U=Box<[T; 3]>`
&mut Box<[T; 3]> from the inherent impl on `Box<U>` where `U=[T; 3]`
Candidate search
Finally, to actually pick the method, we will search down the steps, trying to match the receiver type against the candidate types. At each step, we also consider an auto-ref and auto-mut-ref to see whether that makes any of the candidates match. We pick the first step where we find a match.
In the case of our example, the first step is `Rc<Box<[T; 3]>>`, which does
not itself match any candidate. But when we autoref it, we get the type
`&Rc<Box<[T; 3]>>` which does match. We would then recursively consider all
where-clauses that appear on the impl: if those match (or we cannot rule out
that they do), then this is the method we would pick. Otherwise, we would
continue down the series of steps.
Variance of type and lifetime parameters
For a more general background on variance, see the background appendix.
During type checking we must infer the variance of type and lifetime parameters. The algorithm is taken from Section 4 of the paper "Taming the Wildcards: Combining Definition- and Use-Site Variance" published in PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.
This inference is explicitly designed not to consider the uses of types
within code. To determine the variance of type parameters defined on type
`X`, we only consider the definition of the type `X` and the definitions of
any types it references.

We only infer variance for type parameters found on data types like structs
and enums. In these cases, there is a fairly straightforward explanation for
what variance means. The variance of the type or lifetime parameters defines
whether `T<A>` is a subtype of `T<B>` (resp. `T<'a>` and `T<'b>`) based on
the relationship of `A` and `B` (resp. `'a` and `'b`).
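For instance, `Vec<T>`'s parameter is inferred to be covariant, which is what
makes the following type-check (an illustration of my own, not from the
compiler sources):

```rust
// `Vec<T>` is covariant in `T`: because `&'long u32` is a subtype of
// `&'short u32` whenever `'long: 'short`, `Vec<&'long u32>` is likewise
// a subtype of `Vec<&'short u32>`, so this function compiles.
fn shorten<'long: 'short, 'short>(v: Vec<&'long u32>) -> Vec<&'short u32> {
    v
}

fn main() {
    let x = 22;
    let v: Vec<&u32> = shorten(vec![&x]);
    assert_eq!(*v[0], 22);
}
```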
We do not infer variance for type parameters found on traits, functions, or impls. Variance on trait parameters can indeed make sense (and we used to compute it) but it is actually rather subtle in meaning and not that useful in practice, so we removed it. See the addendum for some details. Variances on function/impl parameters, on the other hand, don't make sense because these parameters are instantiated and then forgotten; they don't persist in types or compiled byproducts.
Notation
We use the notation of The Paper throughout this chapter:
- `+` is covariance.
- `-` is contravariance.
- `*` is bivariance.
- `o` is invariance.
The algorithm
The basic idea is quite straightforward. We iterate over the types defined
and, for each use of a type parameter `X`, accumulate a constraint indicating
that the variance of `X` must be valid for the variance of that use site. We
then iteratively refine the variance of `X` until all constraints are met.
There is always a solution, because at the limit we can declare all type
parameters to be invariant and all constraints will be satisfied.
As a simple example, consider:
enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(|B|), None }
enum OptionalMap<C> { Some(|C| -> C), None }
Here, we will generate the constraints:
1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -
These indicate that (1) the variance of A must be at most covariant; (2) the variance of B must be at most contravariant; and (3, 4) the variance of C must be at most covariant and contravariant. All of these results are based on a variance lattice defined as follows:
      * Top (bivariant)
   -     +
      o Bottom (invariant)

Based on this lattice, the solution `V(A)=+`, `V(B)=-`, `V(C)=o` is the
optimal solution. Note that there is always a naive solution which just
declares all variables to be invariant.
You may be wondering why fixed-point iteration is required. The reason is that the variance of a use site may itself be a function of the variance of other type parameters. In full generality, our constraints take the form:
V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term
Here the notation `V(X)` indicates the variance of a type/region parameter
`X` with respect to its defining class. `Term x Term` represents the
"variance transform" as defined in the paper:

If the variance of a type variable `X` in type expression `E` is `V2` and
the definition-site variance of the corresponding type parameter of a class
`C` is `V1`, then the variance of `X` in the type expression `C<E>` is
`V3 = V1.xform(V2)`.
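To make the transform concrete, here is a small worked instance of my own
(the `Wrapper` type is made up for illustration):

```rust
// In the type `Vec<fn(A)>`:
//   V1 = variance of `Vec`'s parameter           = + (covariant)
//   V2 = variance of `fn(A)`'s argument position = - (contravariant)
// so the variance of `A` here is V3 = (+).xform(-) = -, and `Wrapper`
// ends up contravariant in `A`.
struct Wrapper<A> {
    field: Vec<fn(A)>,
}
```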
Constraints
If I have a struct or enum with where clauses:
struct Foo<T: Bar> { ... }
you might wonder whether the variance of `T` with respect to `Bar` affects
the variance of `T` with respect to `Foo`. I claim no. The reason: assume
that `T` is invariant with respect to `Bar` but covariant with respect to
`Foo`. And then we have a `Foo<X>` that is upcast to `Foo<Y>`, where
`X <: Y`. However, while `X : Bar` holds, `Y : Bar` does not. In that case,
the upcast will be illegal, but not because of a variance failure, but
rather because the target type `Foo<Y>` is itself just not well-formed.
Basically we get to assume well-formedness of all types involved before
considering variance.
Dependency graph management
Because variance is a whole-crate inference, its dependency graph can become quite muddled if we are not careful. To resolve this, we refactor into two queries:
- `crate_variances` computes the variance for all items in the current crate.
- `variances_of` accesses the variance for an individual item; it works by
  requesting `crate_variances` and extracting the relevant data.

If you limit yourself to reading `variances_of`, your code will then depend
only on the inference of that particular item.
Ultimately, this setup relies on the red-green algorithm. In particular,
every variance query effectively depends on all type definitions in the
entire crate (through `crate_variances`), but since most changes will not
result in a change to the actual results from variance inference, the
`variances_of` query will wind up being considered green after it is
re-evaluated.
Addendum: Variance on traits
As mentioned above, we used to permit variance on traits. This was computed
based on the appearance of trait type parameters in method signatures and
was used to represent the compatibility of vtables in trait objects (and
also "virtual" vtables or dictionary in trait bounds). One complication was
that variance for associated types is less obvious, since they can be
projected out and put to myriad uses, so it's not clear when it is safe to
allow `X<A>::Bar` to vary (or indeed just what that means). Moreover (as
covered below) all inputs on any trait with an associated type had to be
invariant, limiting the applicability. Finally, the annotations
(`MarkerTrait`, `PhantomFn`) needed to ensure that all trait type parameters
had a variance were confusing and annoying for little benefit.
Just for historical reference, I am going to preserve some text indicating how one could interpret variance and trait matching.
Variance and object types
Just as with structs and enums, we can decide the subtyping relationship
between two object types `&Trait<A>` and `&Trait<B>` based on the
relationship of `A` and `B`. Note that for object types we ignore the `Self`
type parameter – it is unknown, and the nature of dynamic dispatch ensures
that we will always call a function that is expecting the appropriate `Self`
type. However, we must be careful with the other type parameters, or else we
could end up calling a function that is expecting one type but provided
another.
To see what I mean, consider a trait like so:
trait ConvertTo<A> {
    fn convertTo(&self) -> A;
}
Intuitively, if we had one object `O=&ConvertTo<Object>` and another
`S=&ConvertTo<String>`, then `S <: O` because `String <: Object` (presuming
Java-like "string" and "object" types, my go-to examples for subtyping). The
actual algorithm would be to compare the (explicit) type parameters pairwise
respecting their variance: here, the type parameter `A` is covariant (it
appears only in a return position), and hence we require that
`String <: Object`.

You'll note though that we did not consider the binding for the (implicit)
`Self` type parameter: in fact, it is unknown, so that's good. The reason we
can ignore that parameter is precisely because we don't need to know its
value until a call occurs, and at that time (as you said) the dynamic nature
of virtual dispatch means the code we run will be correct for whatever value
`Self` happens to be bound to for the particular object whose method we
called. `Self` is thus different from `A`, because the caller requires that
`A` be known in order to know the return type of the method `convertTo()`.
(As an aside, we have rules preventing methods where `Self` appears outside
of the receiver position from being called via an object.)
Trait variance and vtable resolution
But traits aren't only used with objects. They're also used when deciding whether a given impl satisfies a given trait bound. To set the scene here, imagine I had a function:
fn convertAll<A,T:ConvertTo<A>>(v: &[T]) { ... }
Now imagine that I have an implementation of `ConvertTo` for `Object`:
impl ConvertTo<i32> for Object { ... }
And I want to call `convertAll` on an array of strings. Suppose further that
for whatever reason I specifically supply the value of `String` for the type
parameter `T`:

let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);

Is this legal? To put it another way, can we apply the `impl` for `Object`
to the type `String`? The answer is yes, but to see why we have to expand
out what will happen:
- `convertAll` will create a pointer to one of the entries in the vector,
  which will have type `&String`.
- It will then call the impl of `convertTo()` that is intended for use with
  objects. This has the type `fn(self: &Object) -> i32`. It is OK to provide
  a value for `self` of type `&String` because `&String <: &Object`.

OK, so intuitively we want this to be legal, so let's bring this back to
variance and see whether we are computing the correct result. We must first
figure out how to phrase the question "is an impl for `Object,i32` usable
where an impl for `String,i32` is expected?"
Maybe it's helpful to think of a dictionary-passing implementation of type
classes. In that case, `convertAll()` takes an implicit parameter
representing the impl. In short, we have an impl of type:
V_O = ConvertTo<i32> for Object
and the function prototype expects an impl of type:
V_S = ConvertTo<i32> for String
As with any argument, this is legal if the type of the value given (`V_O`)
is a subtype of the type expected (`V_S`). So is `V_O <: V_S`?

The answer will depend on the variance of the various parameters. In this
case, because the `Self` parameter is contravariant and `A` is covariant, it
means that:

V_O <: V_S iff
i32 <: i32
String <: Object
These conditions are satisfied and so we are happy.
Variance and associated types
Traits with associated types – or at minimum projection expressions – must be invariant with respect to all of their inputs. To see why this makes sense, consider what subtyping for a trait reference means:
<T as Trait> <: <U as Trait>
means that if I know that `T as Trait`, I also know that `U as Trait`.
Moreover, if you think of it as dictionary passing style, it means that a
dictionary for `<T as Trait>` is safe to use where a dictionary for
`<U as Trait>` is expected.

The problem is that when you can project types out from `<T as Trait>`, the
relationship to types projected out of `<U as Trait>` is completely unknown
unless `T==U` (see #21726 for more details). Making `Trait` invariant
ensures that this is true.
Another related reason is that if we didn't make traits with associated types invariant, then projection is no longer a function with a single result. Consider:
trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }
Now if I have `<&'static () as Identity>::Out`, this can be validly derived
as `&'a ()` for any `'a`:

<&'a () as Identity> <: <&'static () as Identity>
if &'static () <: &'a () -- Identity is contravariant in Self
if 'static : 'a -- Subtyping rules for relations

This change, on the other hand, means that `<&'static () as Identity>::Out`
is always `&'static ()` (which might then be upcast to `&'a ()`,
separately). This was helpful in solving #21750.
Opaque types (type alias `impl Trait`)
Opaque types are syntax to declare an opaque type alias that only exposes a specific set of traits as their interface; the concrete type in the background is inferred from a certain set of use sites of the opaque type.
This is expressed by using impl Trait
within type aliases, for example:
type Foo = impl Bar;
This declares an opaque type named `Foo`, of which the only information is
that it implements `Bar`. Therefore, any of `Bar`'s interface can be used on
a `Foo`, but nothing else (regardless of whether it implements any other
traits).
Since there needs to be a concrete background type, you can currently express that type by using the opaque type in a "defining use site".
struct Struct;
impl Bar for Struct { /* stuff */ }
fn foo() -> Foo {
Struct
}
Any other "defining use site" needs to produce the exact same type.
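Putting the pieces together, here is a hedged, self-contained version of the
example (my own assembly; it assumes the nightly `type_alias_impl_trait`
feature, which this syntax required at the time of writing):

```rust
#![feature(type_alias_impl_trait)] // nightly-only, an assumption here

trait Bar {
    fn bar(&self);
}

type Foo = impl Bar;

struct Struct;

impl Bar for Struct {
    fn bar(&self) {}
}

// Defining use site: the return type mentions `Foo`, so the concrete
// (hidden) type behind `Foo` is inferred here to be `Struct`.
fn foo() -> Foo {
    Struct
}

fn main() {
    // Callers only know that `foo()` returns some type implementing `Bar`.
    foo().bar();
}
```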
Defining use site(s)
Currently only the return value of a function can be a defining use site of an opaque type (and only if the return type of that function contains the opaque type).
The defining use of an opaque type can be any code within the parent of the opaque type definition. This includes any siblings of the opaque type and all children of the siblings.
The initiative for "not causing fatal brain damage to developers due to accidentally running infinite loops in their brain while trying to comprehend what the type system is doing" has decided to disallow children of opaque types to be defining use sites.
Associated opaque types
Associated opaque types can be defined by any other associated item on the
same trait `impl` or a child of these associated items. For instance:
trait Baz {
type Foo;
fn foo() -> Self::Foo;
}
struct Quux;
impl Baz for Quux {
type Foo = impl Bar;
fn foo() -> Self::Foo { ... }
}
Pattern and Exhaustiveness Checking
In Rust, pattern matching and bindings have a few very helpful properties. The compiler will check that bindings are irrefutable when made and that match arms are exhaustive.
TODO: write this chapter.
MIR borrow check
The borrow check is Rust's "secret sauce" – it is tasked with enforcing a number of properties:
- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through the reference).
- That you can't mutate a place while it is immutably borrowed.
- etc
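For instance, here are two of these rules in action (a small illustration of
my own; uncommenting the marked lines produces the corresponding
borrow-check errors):

```rust
fn main() {
    // Rule: you can't mutate a place while it is immutably borrowed.
    let mut v = vec![1, 2, 3];
    let r = &v[0];
    // v.push(4); // ERROR: cannot borrow `v` as mutable while `r` is live
    println!("{}", r);

    // Rule: you can't use a value after moving it.
    let s = String::from("hi");
    let t = s; // `s` is moved into `t` here
    // println!("{}", s); // ERROR: borrow of moved value: `s`
    println!("{}", t);
}
```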
The borrow checker operates on the MIR. An older implementation operated on the HIR. Doing borrow checking on MIR has several advantages:
- The MIR is far less complex than the HIR; the radical desugaring helps prevent bugs in the borrow checker. (If you're curious, you can see a list of bugs that the MIR-based borrow checker fixes here.)
- Even more importantly, using the MIR enables "non-lexical lifetimes", which are regions derived from the control-flow graph.
Major phases of the borrow checker
The borrow checker source is found in the `rustc_mir::borrow_check` module.
The main entry point is the `mir_borrowck` query.

- We first create a local copy of the MIR. In the coming steps, we will
  modify this copy in place to modify the types and things to include
  references to the new regions that we are computing.
- We then invoke `replace_regions_in_mir` to modify our local MIR. Among
  other things, this function will replace all of the regions in the MIR
  with fresh inference variables.
- Next, we perform a number of dataflow analyses that compute what data is
  moved and when.
- We then do a second type check across the MIR: the purpose of this type
  check is to determine all of the constraints between different regions.
- Next, we do region inference, which computes the values of each region —
  basically, the points in the control-flow graph where each lifetime must
  be valid according to the constraints we collected.
- At this point, we can compute the "borrows in scope" at each point.
- Finally, we do a second walk over the MIR, looking at the actions it does
  and reporting errors. For example, if we see a statement like `*a + 1`,
  then we would check that the variable `a` is initialized and that it is
  not mutably borrowed, as either of those would require an error to be
  reported. Doing this check requires the results of all the previous
  analyses.
Tracking moves and initialization
Part of the borrow checker's job is to track which variables are "initialized" at any given point in time -- this also requires figuring out where moves occur and tracking those.
Initialization and moves
From a user's perspective, initialization -- giving a variable some value -- and moves -- transferring ownership to another place -- might seem like distinct topics. Indeed, our borrow checker error messages often talk about them differently. But within the borrow checker, they are not nearly as separate. Roughly speaking, the borrow checker tracks the set of "initialized places" at any point in the source code. Assigning to a previously uninitialized local variable adds it to that set; moving from a local variable removes it from that set.
Consider this example:
fn foo() {
let a: Vec<u32>;
// a is not initialized yet
a = vec![22];
// a is initialized here
std::mem::drop(a); // a is moved here
// a is no longer initialized here
let l = a.len(); //~ ERROR
}
Here you can see that `a` starts off as uninitialized; once it is assigned,
it becomes initialized. But when `drop(a)` is called, that moves `a` into
the call, and hence it becomes uninitialized again.
Subsections
To make it easier to peruse, this section is broken into a number of subsections:
- Move paths: the move path concept that we use to track which local
  variables (or parts of local variables, in some cases) are initialized.
- TODO: Rest not yet written =)
Move paths
In reality, it's not enough to track initialization at the granularity of local variables. Rust also allows us to do moves and initialization at the field granularity:
fn foo() {
let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
// a.0 and a.1 are both initialized
let b = a.0; // moves a.0
// a.0 is not initialized, but a.1 still is
let c = a.0; // ERROR
let d = a.1; // OK
}
To handle this, we track initialization at the granularity of a move path.
A `MovePath` represents some location that the user can initialize, move,
etc. So e.g. there is a move-path representing the local variable `a`, and
there is a move-path representing `a.0`. Move paths roughly correspond to
the concept of a `Place` from MIR, but they are indexed in ways that enable
us to do move analysis more efficiently.

Move path indices

Although there is a `MovePath` data structure, they are never referenced
directly. Instead, all the code passes around indices of type
`MovePathIndex`. If you need to get information about a move path, you use
this index with the `move_paths` field of the `MoveData`. For example, to
convert a `MovePathIndex` `mpi` into a MIR `Place`, you might access the
`MovePath::place` field like so:

move_data.move_paths[mpi].place
Building move paths
One of the first things we do in the MIR borrow check is to construct the
set of move paths. This is done as part of the `MoveData::gather_moves`
function. This function uses a MIR visitor called `Gatherer` to walk the MIR
and look at how each `Place` within is accessed. For each such `Place`, it
constructs a corresponding `MovePathIndex`. It also records when/where that
particular move path is moved/initialized, but we'll get to that in a later
section.

Illegal move paths

We don't actually create a move-path for every `Place` that gets used. In
particular, if it is illegal to move from a `Place`, then there is no need
for a `MovePathIndex`. Some examples:

- You cannot move from a static variable, so we do not create a
  `MovePathIndex` for static variables.
- You cannot move an individual element of an array, so if we have e.g.
  `foo: [String; 3]`, there would be no move-path for `foo[1]`.
- You cannot move from inside of a borrowed reference, so if we have e.g.
  `foo: &String`, there would be no move-path for `*foo`.

These rules are enforced by the `move_path_for` function, which converts a
`Place` into a `MovePathIndex` -- in error cases like those just discussed,
the function returns an `Err`. This in turn means we don't have to bother
tracking whether those places are initialized (which lowers overhead).
Looking up a move-path
If you have a `Place` and you would like to convert it to a `MovePathIndex`,
you can do that using the `MovePathLookup` structure found in the
`rev_lookup` field of `MoveData`. There are two different methods:

- `find_local`, which takes a `mir::Local` representing a local variable.
  This is the easier method, because we always create a `MovePathIndex` for
  every local variable.
- `find`, which takes an arbitrary `Place`. This method is a bit more
  annoying to use, precisely because we don't have a `MovePathIndex` for
  every `Place` (as we just discussed in the "illegal move paths" section).
  Therefore, `find` returns a `LookupResult` indicating the closest path it
  was able to find that exists (e.g., for `foo[1]`, it might return just the
  path for `foo`).
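To make that concrete, here is a hedged sketch of how calling code might use
these lookups. It is not exact compiler code: it assumes `move_data`,
`local`, and `place` are in scope, and the precise signatures (e.g. whether
`find` takes a `Place` or a `PlaceRef`) have shifted between versions:

```rust
// Locals always have a move path, so this lookup cannot fail:
let local_mpi = move_data.rev_lookup.find_local(local);

// Arbitrary places might not, so `find` returns a `LookupResult`:
match move_data.rev_lookup.find(place.as_ref()) {
    // A move path exists for exactly this place.
    LookupResult::Exact(mpi) => { /* ... */ }

    // No path for the place itself (e.g. `foo[1]`), but this is the
    // closest enclosing path that does exist (e.g. the one for `foo`).
    LookupResult::Parent(Some(parent_mpi)) => { /* ... */ }

    // No relevant move path at all (e.g. a static).
    LookupResult::Parent(None) => { /* ... */ }
}
```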
Cross-references
As we noted above, move-paths are stored in a big vector and referenced via
their `MovePathIndex`. However, within this vector, they are also structured
into a tree. So for example if you have the `MovePathIndex` for `a.b.c`, you
can go to its parent move-path `a.b`. You can also iterate over all children
paths: so, from `a.b`, you might iterate to find the path `a.b.c` (here you
are iterating just over the paths that are actually referenced in the
source, not all possible paths that could have been referenced). These
references are used for example in the
`find_in_move_path_or_its_descendants` function, which determines whether a
move-path (e.g., `a.b`) or any child of that move-path (e.g., `a.b.c`)
matches a given predicate.
The MIR type-check
A key component of the borrow check is the MIR type-check. This check walks the MIR and does a complete "type check" -- the same kind you might find in any other language. In the process of doing this type-check, we also uncover the region constraints that apply to the program.
TODO -- elaborate further? Maybe? :)
Region inference (NLL)
The MIR-based region checking code is located in the
`rustc_mir::borrow_check` module.

The MIR-based region analysis consists of two major functions:

- `replace_regions_in_mir`, invoked first, has two jobs:
  - First, it finds the set of regions that appear within the signature of
    the function (e.g., `'a` in `fn foo<'a>(&'a u32) { ... }`). These are
    called the "universal" or "free" regions – in particular, they are the
    regions that appear free in the function body.
  - Second, it replaces all the regions from the function body with fresh
    inference variables. This is because (presently) those regions are the
    results of lexical region inference and hence are not of much interest.
    The intention is that – eventually – they will be "erased regions"
    (i.e., no information at all), since we won't be doing lexical region
    inference at all.
- `compute_regions`, invoked second: this is given as argument the results
  of move analysis. It has the job of computing values for all the inference
  variables that `replace_regions_in_mir` introduced.
  - To do that, it first runs the MIR type checker. This is basically a
    normal type-checker but specialized to MIR, which is much simpler than
    full Rust, of course. Running the MIR type checker will however create
    various constraints between region variables, indicating their potential
    values and relationships to one another.
  - After this, we perform constraint propagation by creating a
    `RegionInferenceContext` and invoking its `solve` method.
  - The NLL RFC also includes fairly thorough (and hopefully readable)
    coverage.
Universal regions
The `UniversalRegions` type represents a collection of universal regions
corresponding to some MIR `DefId`. It is constructed in
`replace_regions_in_mir` when we replace all regions with fresh inference
variables. `UniversalRegions` contains indices for all the free regions in
the given MIR along with any relationships that are known to hold between
them (e.g. implied bounds, where clauses, etc.).

For example, given the MIR for the following function:

fn foo<'a>(x: &'a u32) {
    // ...
}

we would create a universal region for `'a` and one for `'static`. There may
also be some complications for handling closures, but we will ignore those
for the moment.
TODO: write about how these regions are computed.
Region variables
The value of a region can be thought of as a set. This set contains all
points in the MIR where the region is valid along with any regions that are
outlived by this region (e.g. if `'a: 'b`, then `end('b)` is in the set for
`'a`); we call the domain of this set a `RegionElement`. In the code, the
value for all regions is maintained in the
`rustc_mir::borrow_check::nll::region_infer` module. For each region we
maintain a set storing what elements are present in its value (to make this
efficient, we give each kind of element an index, the `RegionElementIndex`,
and use sparse bitsets).
The kinds of region elements are as follows:
- Each location in the MIR control-flow graph: a location is just the pair
  of a basic block and an index. This identifies the point on entry to the
  statement with that index (or the terminator, if the index is equal to
  `statements.len()`).
- There is an element `end('a)` for each universal region `'a`,
  corresponding to some portion of the caller's (or caller's caller, etc.)
  control-flow graph.
- Similarly, there is an element denoted `end('static)` corresponding to the
  remainder of program execution after this function returns.
- There is an element `!1` for each placeholder region `!1`. This
  corresponds (intuitively) to some unknown set of other elements – for
  details on placeholders, see the section placeholders and universes.
Constraints
Before we can infer the value of regions, we need to collect constraints on the regions. The full set of constraints is described in the section on constraint propagation, but the two most common sorts of constraints are:
- Outlives constraints. These are constraints that one region outlives
  another (e.g. `'a: 'b`). Outlives constraints are generated by the MIR
  type checker.
- Liveness constraints. Each region needs to be live at points where it can
  be used. These constraints are collected by `generate_constraints`.
Inference Overview
So how do we compute the contents of a region? This process is called region inference. The high-level idea is pretty simple, but there are some details we need to take care of.
Here is the high-level idea: we start off each region with the MIR locations
we know must be in it from the liveness constraints. From there, we use all
of the outlives constraints computed from the type checker to propagate the
constraints: for each region `'a`, if `'a: 'b`, then we add all elements of
`'b` to `'a`, including `end('b)`. This all happens in
`propagate_constraints`.

Then, we will check for errors. We first check that type tests are satisfied
by calling `check_type_tests`. This checks constraints like `T: 'a`. Second,
we check that universal regions are not "too big". This is done by calling
`check_universal_regions`. This checks, for each region `'a`, that if `'a`
contains the element `end('b)`, then we must already know that `'a: 'b`
holds (e.g. from a where clause). If we don't already know this, that is an
error... well, almost. There is some special handling for closures that we
will discuss later.
Example
Consider the following example:
fn foo<'a, 'b>(x: &'a usize) -> &'b usize {
x
}
Clearly, this should not compile because we don't know if `'a` outlives `'b`
(if it doesn't then the return value could be a dangling reference).

Let's back up a bit. We need to introduce some free inference variables (as
is done in `replace_regions_in_mir`). This example doesn't use the exact
regions produced, but it (hopefully) is enough to get the idea across.

fn foo<'a, 'b>(x: &'a /* '#1 */ usize) -> &'b /* '#3 */ usize {
    x // '#2, location L1
}

Some notation: `'#1`, `'#3`, and `'#2` represent the universal regions for
the argument, return value, and the expression `x`, respectively.
Additionally, I will call the location of the expression `x` `L1`.
So now we can use the liveness constraints to get the following starting points:
| Region | Contents |
|--------|----------|
| '#1    |          |
| '#2    | L1       |
| '#3    | L1       |

Now we use the outlives constraints to expand each region. Specifically, we
know that `'#2: '#3`...

| Region | Contents                                         |
|--------|--------------------------------------------------|
| '#1    | L1                                               |
| '#2    | L1, end('#3) // add contents of '#3 and end('#3) |
| '#3    | L1                                               |

... and `'#1: '#2`, so ...

| Region | Contents                                                    |
|--------|-------------------------------------------------------------|
| '#1    | L1, end('#2), end('#3) // add contents of '#2 and end('#2) |
| '#2    | L1, end('#3)                                                |
| '#3    | L1                                                          |

Now, we need to check that no regions were too big (we don't have any type
tests to check in this case). Notice that `'#1` now contains `end('#3)`, but
we have no `where` clause or implied bound to say that `'a: 'b`... that's an
error!
Some details
The `RegionInferenceContext` type contains all of the information needed to
do inference, including the universal regions from `replace_regions_in_mir`
and the constraints computed for each region. It is constructed just after
we compute the liveness constraints.

Here are some of the fields of the struct:

- `constraints`: contains all the outlives constraints.
- `liveness_constraints`: contains all the liveness constraints.
- `universal_regions`: contains the `UniversalRegions` returned by
  `replace_regions_in_mir`.
- `universal_region_relations`: contains relations known to be true about
  universal regions. For example, if we have a where clause that `'a: 'b`,
  that relation is assumed to be true while borrow checking the
  implementation (it is checked at the caller), so
  `universal_region_relations` would contain `'a: 'b`.
- `type_tests`: contains some constraints on types that we must check after
  inference (e.g. `T: 'a`).
- `closure_bounds_mapping`: used for propagating region constraints from
  closures back out to the creator of the closure.

TODO: should we discuss any of the other fields? What about the SCCs?

Ok, now that we have constructed a `RegionInferenceContext`, we can do
inference. This is done by calling the `solve` method on the context. This
is where we call `propagate_constraints` and then check the resulting type
tests and universal regions, as discussed above.
Constraint propagation
The main work of the region inference is constraint propagation, which is
done in the `propagate_constraints` function. There are three sorts of
constraints that are used in NLL, and we'll explain how
`propagate_constraints` works by "layering" those sorts of constraints on
one at a time (each of them is fairly independent from the others):

- liveness constraints (`R live at E`), which arise from liveness;
- outlives constraints (`R1: R2`), which arise from subtyping;
- member constraints (`member R_m of [R_c...]`), which arise from impl Trait.
In this chapter, we'll explain the "heart" of constraint propagation, covering both liveness and outlives constraints.
Notation and high-level concepts
Conceptually, region inference is a "fixed-point" computation. It is given
some set of constraints `{C}` and it computes a set of values
`Values: R -> {E}` that maps each region `R` to a set of elements `{E}` (see
here for more notes on region elements):

- Initially, each region is mapped to an empty set, so `Values(R) = {}` for
  all regions `R`.
- Next, we process the constraints repeatedly until a fixed-point is
  reached:
  - For each constraint C:
    - Update `Values` as needed to satisfy the constraint

As a simple example, if we have a liveness constraint `R live at E`, then we
can apply `Values(R) = Values(R) union {E}` to make the constraint be
satisfied. Similarly, if we have an outlives constraint `R1: R2`, we can
apply `Values(R1) = Values(R1) union Values(R2)`.
(Member constraints are more complex and we discuss them in this section.)
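To pin the idea down, here is a toy model of this naive fixed-point loop (my
own sketch; the real implementation is organized quite differently, as
described next):

```rust
use std::collections::BTreeSet;

// Regions and elements are just small integers in this toy model.
// `liveness` holds "R live at E" facts; `outlives` holds "R1: R2".
type Region = usize;
type Element = usize;

fn solve(
    num_regions: usize,
    liveness: &[(Region, Element)],
    outlives: &[(Region, Region)],
) -> Vec<BTreeSet<Element>> {
    // Initially, Values(R) = {} for all regions.
    let mut values = vec![BTreeSet::new(); num_regions];
    // Apply each liveness constraint: Values(R) = Values(R) ∪ {E}.
    for &(r, e) in liveness {
        values[r].insert(e);
    }
    // Apply outlives constraints until a fixed point is reached:
    // Values(R1) = Values(R1) ∪ Values(R2) for each "R1: R2".
    let mut changed = true;
    while changed {
        changed = false;
        for &(r1, r2) in outlives {
            let missing: Vec<Element> =
                values[r2].difference(&values[r1]).copied().collect();
            if !missing.is_empty() {
                values[r1].extend(missing);
                changed = true;
            }
        }
    }
    values
}

fn main() {
    // Region 1 is live at element 7, and region 0 outlives region 1,
    // so element 7 must end up in region 0's value as well.
    let values = solve(2, &[(1, 7)], &[(0, 1)]);
    assert_eq!(values[0], values[1]);
}
```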
In practice, however, we are a bit more clever. Instead of applying the constraints in a loop, we can analyze the constraints and figure out the correct order to apply them, so that we only have to apply each constraint once in order to find the final result.
Similarly, in the implementation, the `Values` set is stored in the
`scc_values` field, but they are indexed not by a region but by a strongly
connected component (SCC). SCCs are an optimization that avoids a lot of
redundant storage and computation. They are explained in the section on
outlives constraints.
Liveness constraints
A liveness constraint arises when some variable whose type includes a region R is live at some point P. This simply means that the value of R must include the point P. Liveness constraints are computed by the MIR type checker.
A liveness constraint `R live at E` is satisfied if `E` is a member of
`Values(R)`. So to "apply" such a constraint to `Values`, we just have to
compute `Values(R) = Values(R) union {E}`.

The liveness values are computed in the type-check and passed to the region
inference upon creation in the `liveness_constraints` argument. These are
not represented as individual constraints like `R live at E` though;
instead, we store a (sparse) bitset per region variable (of type
`LivenessValues`). This way we only need a single bit for each liveness
constraint.
One thing that is worth mentioning: All lifetime parameters are always considered to be live over the entire function body. This is because they correspond to some portion of the caller's execution, and that execution clearly includes the time spent in this function, since the caller is waiting for us to return.
Outlives constraints
An outlives constraint `'a: 'b` indicates that the value of `'a` must be a
superset of the value of `'b`. That is, an outlives constraint `R1: R2` is
satisfied if `Values(R1)` is a superset of `Values(R2)`. So to "apply" such
a constraint to `Values`, we just have to compute
`Values(R1) = Values(R1) union Values(R2)`.

One observation that follows from this is that if you have `R1: R2` and
`R2: R1`, then `R1 = R2` must be true. Similarly, if you have:

R1: R2
R2: R3
R3: R4
R4: R1

then `R1 = R2 = R3 = R4` follows. We take advantage of this to make things
much faster, as described shortly.

In the code, the set of outlives constraints is given to the region
inference context on creation in a parameter of type
`OutlivesConstraintSet`. The constraint set is basically just a list of
`'a: 'b` constraints.
The outlives constraint graph and SCCs
In order to work more efficiently with outlives constraints, they are
converted into the form of a graph, where the nodes of the graph are region
variables (`'a`, `'b`) and each constraint `'a: 'b` induces an edge
`'a -> 'b`. This conversion happens in the `RegionInferenceContext::new`
function that creates the inference context.
When using a graph representation, we can detect regions that must be equal by looking for cycles. That is, if you have a constraint like
'a: 'b
'b: 'c
'c: 'd
'd: 'a
then this will correspond to a cycle in the graph containing the elements
`'a...'d`.

Therefore, one of the first things that we do in propagating region values
is to compute the strongly connected components (SCCs) in the constraint
graph. The result is stored in the `constraint_sccs` field. You can then
easily find the SCC that a region `r` is a part of by invoking
`constraint_sccs.scc(r)`.

Working in terms of SCCs allows us to be more efficient: if we have a set of
regions `'a...'d` that are part of a single SCC, we don't have to
compute/store their values separately. We can just store one value for the
SCC, since they must all be equal.

If you look over the region inference code, you will see that a number of
fields are defined in terms of SCCs. For example, the `scc_values` field
stores the values of each SCC. To get the value of a specific region `'a`
then, we first figure out the SCC that the region is a part of, and then
find the value of that SCC.
When we compute SCCs, we not only figure out which regions are a member of each SCC, we also figure out the edges between them. So for example consider this set of outlives constraints:
'a: 'b
'b: 'a
'a: 'c
'c: 'd
'd: 'c
Here we have two SCCs: S0 contains `'a` and `'b`, and S1 contains `'c` and
`'d`. But these SCCs are not independent: because `'a: 'c`, that means that
`S0: S1` as well. That is -- the value of `S0` must be a superset of the
value of `S1`. One crucial thing is that this graph of SCCs is always a
DAG -- that is, it never has cycles. This is because all the cycles have
been removed to form the SCCs themselves.
Applying liveness constraints to SCCs
The liveness constraints that come in from the type-checker are
expressed in terms of regions -- that is, we have a map like
Liveness: R -> {E}
. But we want our final result to be expressed
in terms of SCCs -- we can integrate these liveness constraints very
easily just by taking the union:
for each region R:
let S be the SCC that contains R
Values(S) = Values(S) union Liveness(R)
In the region inferencer, this step is done in `RegionInferenceContext::new`.
Applying outlives constraints
Once we have computed the DAG of SCCs, we use that to structure our entire
computation. If we have an edge `S1 -> S2` between two SCCs, that means that
`Values(S1) >= Values(S2)` must hold. So, to compute the value of `S1`, we
first compute the values of each successor `S2`. Then we simply union all of
those values together. To use a quasi-iterator-like notation:
Values(S1) =
s1.successors()
.map(|s2| Values(s2))
.union()
In the code, this work starts in the `propagate_constraints` function, which
iterates over all the SCCs. For each SCC `S1`, we compute its value by first
computing the value of its successors. Since SCCs form a DAG, we don't have
to be concerned about cycles, though we do need to keep a set around to
track whether we have already processed a given SCC or not. For each
successor `S2`, once we have computed `S2`'s value, we can union those
elements into the value for `S1`. (Although we have to be careful in this
process to properly handle higher-ranked placeholders.) Note that the value
for `S1` already contains the liveness constraints, since they were added in
`RegionInferenceContext::new`.

Once that process is done, we now have the "minimal value" for `S1`, taking
into account all of the liveness and outlives constraints. However, in order
to complete the process, we must also consider member constraints, which are
described in a later section.
Universal regions
"Universal regions" is the name that the code uses to refer to "named
lifetimes" -- e.g., lifetime parameters and 'static
. The name
derives from the fact that such lifetimes are "universally quantified"
(i.e., we must make sure the code is true for all values of those
lifetimes). It is worth spending a bit of discussing how lifetime
parameters are handled during region inference. Consider this example:
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
x
}
This example is intended not to compile, because we are returning `x`, which
has type `&'a u32`, but our signature promises that we will return a
`&'b u32` value. But how are lifetimes like `'a` and `'b` integrated into
region inference, and how does this error wind up being detected?
Universal regions and their relationships to one another
Early on in region inference, one of the first things we do is to construct
a `UniversalRegions` struct. This struct tracks the various universal
regions in scope on a particular function. We also create a
`UniversalRegionRelations` struct, which tracks their relationships to one
another. So if you have e.g. `where 'a: 'b`, then the
`UniversalRegionRelations` struct would track that `'a: 'b` is known to hold
(which could be tested with the `outlives` function).
Everything is a region variable
One important aspect of how NLL region inference works is that all lifetimes
are represented as numbered variables. This means that the only variant of
`ty::RegionKind` that we use is the `ReVar` variant. These region variables
are broken into two major categories, based on their index:
- 0..N: universal regions -- the ones we are discussing here. In this case, the code must be correct with respect to any value of those variables that meets the declared relationships.
- N..M: existential regions -- inference variables where the region inferencer is tasked with finding some suitable value.
In fact, the universal regions can be further subdivided based on where they
were brought into scope (see the `RegionClassification` type). These
subdivisions are not important for the topics discussed here, but become
important when we consider closure constraint propagation, so we discuss
them there.
Universal lifetimes as the elements of a region's value
As noted previously, the value that we infer for each region is a set `{E}`.
The elements of this set can be points in the control-flow graph, but they
can also be an element `end('a)` corresponding to each universal lifetime
`'a`. If the value for some region `R0` includes `end('a)`, then this
implies that `R0` must extend until the end of `'a` in the caller.
The "value" of a universal region
During region inference, we compute a value for each universal region in the same way as we compute values for other regions. This value represents, effectively, the lower bound on that universal region -- the things that it must outlive. We now describe how we use this value to check for errors.
Liveness and universal regions
All universal regions have an initial liveness constraint that includes the
entire function body. This is because lifetime parameters are defined in the
caller and must include the entirety of the function call that invokes this
particular function. In addition, each universal region `'a` includes itself
(that is, `end('a)`) in its liveness constraint (i.e., `'a` must extend
until the end of itself). In the code, these liveness constraints are set up
in `init_free_and_bound_regions`.
Propagating outlives constraints for universal regions
So, consider the first example of this section:
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
x
}
Here, returning `x` requires that `&'a u32 <: &'b u32`, which gives rise to
an outlives constraint `'a: 'b`. Combined with our default liveness
constraints we get:
'a live at {B, end('a)} // B represents the "function body"
'b live at {B, end('b)}
'a: 'b
When we process the `'a: 'b` constraint, therefore, we will add `end('b)`
into the value for `'a`, resulting in a final value of
`{B, end('a), end('b)}`.
Detecting errors
Once we have finished constraint propagation, we then enforce a constraint
that if some universal region `'a` includes an element `end('b)`, then
`'a: 'b` must be declared in the function's bounds. If not, as in our
example, that is an error. This check is done in the
`check_universal_regions` function, which simply iterates over all universal
regions, inspects their final value, and tests against the declared
`UniversalRegionRelations`.
Member constraints
A member constraint `'m member of ['c_1..'c_N]` expresses that the region
`'m` must be equal to one of the choice regions `'c_i` (for some `i`). These
constraints cannot be expressed by users, but they arise from `impl Trait`
due to its lifetime capture rules. Consider a function such as the
following:
fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }
Here, the true return type (often called the "hidden type") is only
permitted to capture the lifetimes `'a` or `'b`. You can kind of see this
more clearly by desugaring that `impl Trait` return type into its more
explicit form:

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }

Here, the idea is that the hidden type must be some type that could have
been written in place of the `impl Trait<'x, 'y>` -- but clearly such a type
can only reference the regions `'x` or `'y` (or `'static`!), as those are
the only names in scope. This limitation is then translated into a
restriction to only access `'a` or `'b` because we are returning
`MakeReturn<'a, 'b>`, where `'x` and `'y` have been replaced with `'a` and
`'b` respectively.
Detailed example
To help us explain member constraints in more detail, let's spell out the
`make` example in a bit more detail. First off, let's assume that you have
some dummy trait:

trait Trait<'a, 'b> { }
impl<T> Trait<'_, '_> for T { }

and this is the `make` function (in desugared form):

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> {
    (a, b)
}

What happens in this case is that the return type will be
`(&'0 u32, &'1 u32)`, where `'0` and `'1` are fresh region variables. We
will have the following region constraints:
'0 live at {L}
'1 live at {L}
'a: '0
'b: '1
'0 member of ['a, 'b, 'static]
'1 member of ['a, 'b, 'static]
Here the "liveness set" {L}
corresponds to that subset of the body
where '0
and '1
are live -- basically the point from where the
return tuple is constructed to where it is returned (in fact, '0
and
'1
might have slightly different liveness sets, but that's not very
interesting to the point we are illustrating here).
The 'a: '0
and 'b: '1
constraints arise from subtyping. When we
construct the (a, b)
value, it will be assigned type (&'0 u32, &'1 u32)
-- the region variables reflect that the lifetimes of these
references could be made smaller. For this value to be created from
a
and b
, however, we do require that:
(&'a u32, &'b u32) <: (&'0 u32, &'1 u32)
which means in turn that &'a u32 <: &'0 u32
and hence that 'a: '0
(and similarly that &'b u32 <: &'1 u32
, 'b: '1
).
Note that if we ignore member constraints, the value of `'0` would be
inferred to some subset of the function body (from the liveness constraints,
which we did not write explicitly). It would never become `'a`, because
there is no need for it to -- we have a constraint that `'a: '0`, but that
just puts a "cap" on how large `'0` can grow to become. Since we compute the
minimal value that we can, we are happy to leave `'0` as being just equal to
the liveness set. This is where member constraints come in.
Choices are always lifetime parameters
At present, the "choice" regions from a member constraint are always
lifetime parameters from the current function. This falls out from the
placement of impl Trait, though in the future it may not be the case.
We take some advantage of this fact, as it simplifies the current
code. In particular, we don't have to consider a case like '0 member of ['1, 'static]
, in which the value of both '0
and '1
are being
inferred and hence changing. See rust-lang/rust#61773 for more
information.
Applying member constraints
Member constraints are a bit more complex than other forms of constraints.
This is because they have an "or" quality to them -- that is, they describe
multiple choices that we must select from. E.g., in our example constraint
`'0 member of ['a, 'b, 'static]`, it might be that `'0` is equal to `'a`,
`'b`, or `'static`. How can we pick the correct one? What we currently do is
to look for a minimal choice -- if we find one, then we will grow `'0` to be
equal to that minimal choice. To find that minimal choice, we take two
factors into consideration: lower and upper bounds.
Lower bounds
The lower bounds are those lifetimes that `'0` must outlive -- i.e., that
`'0` must be larger than. In fact, when it comes time to apply member
constraints, we've already computed the lower bounds of `'0` because we
computed its minimal value (or at least, the lower bounds considering
everything but member constraints).

Let `LB` be the current value of `'0`. We know then that `'0: LB` must hold,
whatever the final value of `'0` is. Therefore, we can rule out any choice
`'choice` where `'choice: LB` does not hold.

Unfortunately, in our example, this is not very helpful. The lower bound for
`'0` will just be the liveness set `{L}`, and we know that all the lifetime
parameters outlive that set. So we are left with the same set of choices
here. (But in other examples, particularly those with different variance,
lower bound constraints may be relevant.)
Upper bounds
The upper bounds are those lifetimes that must outlive `'0` -- i.e., that
`'0` must be smaller than. In our example, this would be `'a`, because we
have the constraint that `'a: '0`. In more complex examples, the chain may
be more indirect.

We can use upper bounds to rule out members in a very similar way to lower
bounds. If `UB` is some upper bound, then we know that `UB: '0` must hold,
so we can rule out any choice `'choice` where `UB: 'choice` does not hold.

In our example, we would be able to reduce our choice set from
`['a, 'b, 'static]` to just `['a]`. This is because `'0` has an upper bound
of `'a`, and neither `'a: 'b` nor `'a: 'static` is known to hold.
(For notes on how we collect upper bounds in the implementation, see the section below.)
Minimal choice
After applying lower and upper bounds, we can still sometimes have multiple
possibilities. For example, imagine a variant of our example using types
with the opposite variance. In that case, we would have the constraint
`'0: 'a` instead of `'a: '0`. Hence the current value of `'0` would be
`{L, 'a}`. Using this as a lower bound, we would be able to narrow down the
member choices to `['a, 'static]` because `'b: 'a` is not known to hold (but
`'a: 'a` and `'static: 'a` do hold). We would not have any upper bounds, so
that would be our final set of choices.

In that case, we apply the minimal choice rule -- basically, if one of our
choices is smaller than the others, we can use that. In this case, we would
opt for `'a` (and not `'static`).
This choice is consistent with the general 'flow' of region propagation, which always aims to compute a minimal value for the region being inferred. However, it is somewhat arbitrary.
Collecting upper bounds in the implementation
In practice, computing upper bounds is a bit inconvenient, because our
data structures are setup for the opposite. What we do is to compute
the reverse SCC graph (we do this lazily and cache the result) --
that is, a graph where 'a: 'b
induces an edge SCC('b) -> SCC('a)
. Like the normal SCC graph, this is a DAG. We can then do a
depth-first search starting from SCC('0)
in this graph. This will
take us to all the SCCs that must outlive '0
.
One wrinkle is that, as we walk the "upper bound" SCCs, their values
will not yet have been fully computed. However, we have already
applied their liveness constraints, so we have some information about
their value. In particular, for any regions representing lifetime
parameters, their value will contain themselves (i.e., the initial
value for 'a
includes 'a
and the value for 'b
contains 'b
). So
we can collect all of the lifetime parameters that are reachable,
which is precisely what we are interested in.
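As a rough illustration of that walk, here is a small self-contained sketch using a plain adjacency list. The representation is hypothetical; the real implementation operates on SCC indices and caches the reverse graph:

```rust
/// Collect every node reachable from `start` in the reverse graph,
/// i.e. all SCCs that must outlive the starting SCC.
fn upper_bounds(reverse_edges: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut visited = vec![false; reverse_edges.len()];
    let mut stack = vec![start];
    let mut reachable = Vec::new();
    while let Some(node) = stack.pop() {
        if !visited[node] {
            visited[node] = true;
            reachable.push(node);
            stack.extend(&reverse_edges[node]);
        }
    }
    reachable
}

fn main() {
    // 'a: 'b becomes a reverse edge SCC('b) -> SCC('a), and so on.
    let reverse = vec![vec![1], vec![2], vec![]];
    assert_eq!(upper_bounds(&reverse, 0), vec![0, 1, 2]);
}
```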
Placeholders and universes
From time to time we have to reason about regions that we can't concretely know. For example, consider this program:
// A function that needs a static reference
fn foo(x: &'static u32) { }
fn bar(f: for<'a> fn(&'a u32)) {
// ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference
let x = 22;
f(&x);
}
fn main() {
bar(foo);
}
This program ought not to type-check: foo
needs a static reference
for its argument, and bar
wants to be given a function that
accepts any reference (so it can call it with something on its
stack, for example). But how do we reject it and why?
Subtyping and Placeholders
When we type-check main
, and in particular the call bar(foo)
, we
are going to wind up with a subtyping relationship like this one:
fn(&'static u32) <: for<'a> fn(&'a u32)
----------------    -------------------
the type of `foo`   the type `bar` expects
We handle this sort of subtyping by taking the variables that are
bound in the supertype and replacing them with
universally quantified
representatives, denoted like !1
here. We call these regions "placeholder
regions" – they represent, basically, "some unknown region".
Once we've done that replacement, we have the following relation:
fn(&'static u32) <: fn(&'!1 u32)
The key idea here is that this unknown region '!1
is not related to
any other regions. So if we can prove that the subtyping relationship
is true for '!1
, then it ought to be true for any region, which is
what we wanted.
So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship (fn arguments are contravariant, so we swap the left and right here):
&'!1 u32 <: &'static u32
According to the basic subtyping rules for a reference, this will be
true if '!1: 'static
. That is – if "some unknown region !1
" lives
outlives 'static
. Now, this might be true – after all, '!1
could be 'static
– but we don't know that it's true. So this
should yield up an error (eventually).
What is a universe?
In the previous section, we introduced the idea of a placeholder
region, and we denoted it !1
. We call this number 1
the universe
index. The idea of a "universe" is that it is a set of names that
are in scope within some type or at some point. Universes are formed
into a tree, where each child extends its parents with some new names.
So the root universe conceptually contains global names, such as
the lifetime 'static
or the type i32
. In the compiler, we also
put generic type parameters into this root universe (in this sense,
there is not just one root universe, but one per item). So consider
this function bar
:
struct Foo { }
fn bar<'a, T>(t: &'a T) {
...
}
Here, the root universe would consist of the lifetimes 'static
and
'a
. In fact, although we're focused on lifetimes here, we can apply
the same concept to types, in which case the types Foo
and T
would
be in the root universe (along with other global types, like i32
).
Basically, the root universe contains all the names that
appear free in the body of bar
.
Now let's extend bar
a bit by adding a variable x
:
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
}
Here, the name 'b
is not part of the root universe. Instead, when we
"enter" into this for<'b>
(e.g., by replacing it with a placeholder), we will create
a child universe of the root, let's call it U1:
U0 (root universe)
│
└─ U1 (child universe)
The idea is that this child universe U1 extends the root universe U0
with a new name, which we are identifying by its universe number:
!1
.
Now let's extend bar
a bit by adding one more variable, y
:
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
let y: for<'c> fn(&'c u32) = ...;
}
When we enter this type, we will again create a new universe, which
we'll call U2
. Its parent will be the root universe, and U1 will be
its sibling:
U0 (root universe)
│
├─ U1 (child universe)
│
└─ U2 (child universe)
This implies that, while in U2, we can name things from U0 or U2, but not U1.
Giving existential variables a universe. Now that we have this
notion of universes, we can use it to extend our type-checker and
things to prevent illegal names from leaking out. The idea is that we
give each inference (existential) variable – whether it be a type or
a lifetime – a universe. That variable's value can then only
reference names visible from that universe. So for example if a
lifetime variable is created in U0, then it cannot be assigned a value
of !1
or !2
, because those names are not visible from the universe
U0.
Representing universes with just a counter. You might be surprised to see that the compiler doesn't keep track of a full tree of universes. Instead, it just keeps a counter – and, to determine if one universe can see another one, it just checks if the index is greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see U2, because 0 >= 2 is false.
How can we get away with this? Doesn't this mean that we would allow U2 to also see U1? The answer is that, yes, we would, if that question ever arose. But because of the structure of our type checker etc, there is no way for that to happen. In order for something happening in the universe U1 to "communicate" with something happening in U2, they would have to have a shared inference variable X in common. And because everything in U1 is scoped to just U1 and its children, that inference variable X would have to be in U0. And since X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest to see by using a kind of generic "logic" example:
exists<X> {
forall<Y> { ... /* Y is in U1 ... */ }
forall<Z> { ... /* Z is in U2 ... */ }
}
Here, the only way for the two foralls to interact would be through X, but neither Y nor Z are in scope when X is declared, so its value cannot reference either of them.
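As a minimal sketch of the counter representation described above (the names here are illustrative, not the exact compiler definitions):

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
struct UniverseIndex(u32);

impl UniverseIndex {
    const ROOT: UniverseIndex = UniverseIndex(0);

    /// Creating a new universe is just bumping the counter.
    fn next(self) -> UniverseIndex {
        UniverseIndex(self.0 + 1)
    }

    /// "self can see other" is just an index comparison.
    fn can_see(self, other: UniverseIndex) -> bool {
        other.0 <= self.0
    }
}

fn main() {
    let u0 = UniverseIndex::ROOT;
    let u1 = u0.next();
    let u2 = u1.next(); // conceptually a sibling of u1; indices just grow
    assert!(u2.can_see(u0));
    assert!(!u0.can_see(u2));
    // The tree is not tracked, so u2 "can see" u1 as well; as argued
    // above, the type checker never actually asks that question.
    assert!(u2.can_see(u1));
}
```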
Universes and placeholder region elements
But where does that error come from? The way it happens is like this.
When we are constructing the region inference context, we can tell
from the type inference context how many placeholder variables exist
(the InferCtxt
has an internal counter). For each of those, we
create a corresponding universal region variable !n
and a "region
element" placeholder(n)
. This corresponds to "some unknown set of other
elements". The value of !n
is {placeholder(n)}
.
At the same time, we also give each existential variable a
universe (also taken from the InferCtxt
). This universe
determines which placeholder elements may appear in its value: For
example, a variable in universe U3 may name placeholder(1)
, placeholder(2)
, and
placeholder(3)
, but not placeholder(4)
. Note that the universe of an inference
variable controls what region elements can appear in its value; it
does not say region elements will appear.
Placeholders and outlives constraints
In the region inference engine, outlives constraints have the form:
V1: V2 @ P
where V1
and V2
are region indices, and hence map to some region
variable (which may be universally or existentially quantified). The
P
here is a "point" in the control-flow graph; it's not important
for this section. This variable will have a universe, so let's call
those universes U(V1)
and U(V2)
respectively. (Actually, the only
one we are going to care about is U(V1)
.)
When we encounter this constraint, the ordinary procedure is to start
a DFS from P
. We keep walking so long as the nodes we are walking
are present in value(V2)
and we add those nodes to value(V1)
. If
we reach a return point, we add in any end(X)
elements. That part
remains unchanged.
But then, after that, we want to iterate over the placeholder(x)
elements in V2 (each of those must be visible to U(V2), but we can
just assume that is true without checking it). We have to ensure that
value(V1) outlives each of those placeholder elements.
Now there are two ways that could happen. First, if U(V1)
can see
the universe x
(i.e., x <= U(V1)
), then we can just add placeholder(x)
to value(V1)
and be done. But if not, then we have to approximate:
we may not know what set of elements placeholder(x)
represents, but we
should be able to compute some sort of upper bound B for it –
some region B that outlives placeholder(x)
. For now, we'll just use
'static
for that (since it outlives everything) – in the future, we
can sometimes be smarter here (and in fact we have code for doing this
already in other contexts). Moreover, since 'static
is in the root
universe U0, we know that all variables can see it – so basically if
we find that value(V2)
contains placeholder(x)
for some universe x
that V1
can't see, then we force V1
to 'static
.
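A hedged, self-contained sketch of that rule, using plain integers for universes and a toy element type (all names here are hypothetical):

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Element {
    Placeholder(u32), // placeholder(x), created in universe x
    Static,           // stand-in for forcing the region to 'static
}

/// Copy placeholder elements from value(V2) into value(V1),
/// approximating with 'static when U(V1) cannot see them.
fn apply_placeholders(
    u_v1: u32, // the universe U(V1)
    value_v2: &HashSet<Element>,
    value_v1: &mut HashSet<Element>,
) {
    for &elem in value_v2 {
        if let Element::Placeholder(x) = elem {
            if x <= u_v1 {
                // V1 can name placeholder(x) directly.
                value_v1.insert(elem);
            } else {
                // V1 cannot see universe x: use the upper bound 'static.
                value_v1.insert(Element::Static);
            }
        }
    }
}

fn main() {
    let value_v2: HashSet<_> = [Element::Placeholder(1)].into_iter().collect();
    let mut value_v1 = HashSet::new();
    // V1 lives in universe 0 and cannot see placeholder(1):
    apply_placeholders(0, &value_v2, &mut value_v1);
    assert!(value_v1.contains(&Element::Static));
}
```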
Extending the "universal regions" check
After all constraints have been propagated, the NLL region inference
has one final check, where it goes over the values that wound up being
computed for each universal region and checks that they did not get
'too large'. In our case, we will go through each placeholder region
and check that it contains only the placeholder(u)
element it is known to
outlive. (Later, we might be able to know that there are relationships
between two placeholder regions and take those into account, as we do
for universal regions from the fn signature.)
Put another way, the "universal regions" check can be considered to be checking constraints like:
{placeholder(1)}: V1
where {placeholder(1)}
is like a constant set, and V1 is the variable we
made to represent the !1
region.
Back to our example
OK, so far so good. Now let's walk through what would happen with our first example:
fn(&'static u32) <: fn(&'!1 u32) @ P // this point P is not imp't here
The region inference engine will create a region element domain like this:
{ CFG; end('static); placeholder(1) }
  ---  ------------  -------------- from the universe `!1`
   |   'static is always in scope
   all points in the CFG; not especially relevant here
It will always create two universal variables, one representing
'static
and one representing '!1
. Let's call them Vs and V1. They
will have initial values like so:
Vs = { CFG; end('static) } // it is in U0, so can't name anything else
V1 = { placeholder(1) }
From the subtyping constraint above, we would have an outlives constraint like
'!1: 'static @ P
To process this, we would grow the value of V1 to include all of Vs:
Vs = { CFG; end('static) }
V1 = { CFG; end('static), placeholder(1) }
At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large.
In this case, V1
did grow too large – it is not known to outlive
end('static)
, nor any of the CFG – so we would report an error.
Another example
What about this subtyping relationship?
for<'a> fn(&'a u32, &'a u32)
<:
for<'b, 'c> fn(&'b u32, &'c u32)
Here we would replace the bound region in the supertype with a placeholder, as before, yielding:
for<'a> fn(&'a u32, &'a u32)
<:
fn(&'!1 u32, &'!2 u32)
then we instantiate the variable on the left-hand side with an
existential in universe U2, yielding the following (?n
is a notation
for an existential variable):
fn(&'?3 u32, &'?3 u32)
<:
fn(&'!1 u32, &'!2 u32)
Then we break this down further:
&'!1 u32 <: &'?3 u32
&'!2 u32 <: &'?3 u32
and even further, yield up our region constraints:
'!1: '?3
'!2: '?3
Note that, in this case, both '!1
and '!2
have to outlive the
variable '?3
, but the variable '?3
is not forced to outlive
anything else. Therefore, it simply starts and ends as the empty set
of elements, and hence the type-check succeeds here.
(This should surprise you a little. It surprised me when I first realized it.
We are saying that if we are a fn that needs both of its arguments to have
the same region, we can accept being called with arguments with two
distinct regions. That seems intuitively unsound. But in fact, it's fine, as
I tried to explain in this issue on the Rust issue
tracker long ago. The reason is that even if we get called with arguments of
two distinct lifetimes, those two lifetimes have some intersection (the call
itself), and that intersection can be our value of 'a
that we use as the
common lifetime of our arguments. -nmatsakis)
Final example
Let's look at one last example. We'll extend the previous one to have a return type:
for<'a> fn(&'a u32, &'a u32) -> &'a u32
<:
for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32
Despite seeming very similar to the previous example, this case is going to get an error. That's good: the problem is that we've gone from a fn that promises to return one of its two arguments, to a fn that is promising to return the first one. That is unsound. Let's see how it plays out.
First, we replace the bound region in the supertype with a placeholder:
for<'a> fn(&'a u32, &'a u32) -> &'a u32
<:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32
Then we instantiate the subtype with existentials (in U2):
fn(&'?3 u32, &'?3 u32) -> &'?3 u32
<:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32
And now we create the subtyping relationships:
&'!1 u32 <: &'?3 u32 // arg 1
&'!2 u32 <: &'?3 u32 // arg 2
&'?3 u32 <: &'!1 u32 // return type
And finally the outlives relationships. Here, let V1, V2, and V3 be the
variables we assign to !1
, !2
, and ?3
respectively:
V1: V3
V2: V3
V3: V1
Those variables will have these initial values:
V1 in U1 = {placeholder(1)}
V2 in U2 = {placeholder(2)}
V3 in U2 = {}
Now because of the V3: V1
constraint, we have to add placeholder(1)
into V3
(and
indeed it is visible from V3
), so we get:
V3 in U2 = {placeholder(1)}
then we have this constraint V2: V3
, so we wind up having to enlarge
V2
to include placeholder(1)
(which it can also see):
V2 in U2 = {placeholder(1), placeholder(2)}
Now constraint propagation is done, but when we check the outlives
relationships, we find that V2
includes this new element placeholder(1)
,
so we report an error.
Propagating closure constraints
When we are checking the type tests and universal regions, we may come across a constraint that we can't prove yet if we are in a closure body! However, the necessary constraints may actually hold (we just don't know it yet). Thus, if we are inside a closure, we just collect all the constraints we can't prove yet and return them. Later, when we borrow check the MIR node that created the closure, we can also check that these constraints hold. At that time, if we can't prove they hold, we report an error.
Reporting region errors
TODO: we should discuss how to generate errors from the results of these analyses.
Two-phase borrows
Two-phase borrows are a more permissive version of mutable borrows that allow
nested method calls such as vec.push(vec.len())
. Such borrows first act as
shared borrows in a "reservation" phase and can later be "activated" into a
full mutable borrow.
Only certain implicit mutable borrows can be two-phase; any &mut
or ref mut
in the source code is never a two-phase borrow. The cases where we generate a
two-phase borrow are:
- The autoref borrow when calling a method with a mutable reference receiver.
- A mutable reborrow in function arguments.
- The implicit mutable borrow in an overloaded compound assignment operator.
To give some examples:
// In the source code

// Case 1:
let mut v = Vec::new();
v.push(v.len());
let r = &mut Vec::new();
r.push(r.len());

// Case 2:
std::mem::replace(r, vec![1, r.len()]);

// Case 3:
let mut x = std::num::Wrapping(2);
x += x;
Expanding these enough to show the two-phase borrows:
// Case 1:
let mut v = Vec::new();
let temp1 = &two_phase v;
let temp2 = v.len();
Vec::push(temp1, temp2);
let r = &mut Vec::new();
let temp3 = &two_phase *r;
let temp4 = r.len();
Vec::push(temp3, temp4);
// Case 2:
let temp5 = &two_phase *r;
let temp6 = vec![1, r.len()];
std::mem::replace(temp5, temp6);
// Case 3:
let mut x = std::num::Wrapping(2);
let temp7 = &two_phase x;
let temp8 = x;
std::ops::AddAssign::add_assign(temp7, temp8);
Whether a borrow can be two-phase is tracked by a flag on the AutoBorrow
after type checking, which is then converted to a BorrowKind
during MIR
construction.
Each two-phase borrow is assigned to a temporary that is only used once. As such we can define:
- The point where the temporary is assigned to is called the reservation point of the two-phase borrow.
- The point where the temporary is used, which is effectively always a function call, is called the activation point.
The activation points are found using the GatherBorrows
visitor. The
BorrowData
then holds both the reservation and activation points for the
borrow.
Checking two-phase borrows
Two-phase borrows are treated as if they were mutable borrows with the following exceptions:
- At every location in the MIR we check if any two-phase borrows are activated at this location. If a live two phase borrow is activated at a location, then we check that there are no borrows that conflict with the two-phase borrow.
- At the reservation point we error if there are conflicting live mutable borrows, and we lint if there are any conflicting shared borrows.
- Between the reservation and the activation point, the two-phase borrow acts as a shared borrow. We determine (in is_active) if we're at such a point by using the Dominators for the MIR graph.
- After the activation point, the two-phase borrow acts as a mutable borrow.
Parameter Environment
When working with associated and/or generic items (types, constants,
functions/methods) it is often relevant to have more information about the
Self
or generic parameters. Trait bounds and similar information is encoded in
the ParamEnv
. Often this is not enough information to obtain things like the
type's Layout
, but you can do all kinds of other checks on it (e.g. whether a
type implements Copy
) or you can evaluate an associated constant whose value
does not depend on anything from the parameter environment.
For example if you have a function
fn foo<T: Copy>(t: T) { ... }
the parameter environment for that function is [T: Copy]
. This means any
evaluation within this function will, when accessing the type T
, know about
its Copy
bound via the parameter environment.
You can get the parameter environment for a def_id
using the
param_env
query. However, this ParamEnv
can be too generic for
your use case. Using the ParamEnv
from the surrounding context can allow you
to evaluate more things. For example, suppose we had something like the following:
trait Foo {
    type Assoc;
}

trait Bar { }

trait Baz {
    fn stuff() -> bool;
}

fn foo<T>(t: T)
where
    T: Foo,
    <T as Foo>::Assoc: Bar,
{
    bar::<T::Assoc>()
}

fn bar<T: Baz>() {
    if T::stuff() { mep() } else { mop() }
}
We may know some things inside bar
that we wouldn't know if we just fetched
bar
's param env because of the <T as Foo>::Assoc: Bar
bound in foo
. This
is a contrived example that makes no sense in our existing analyses, but we may
run into similar cases when doing analyses with associated constants on generic
traits or traits with assoc types.
Bundling
Another great thing about ParamEnv
is that you can use it to bundle a value that
depends on generic parameters (e.g. a Ty
) by calling the and
method. This will produce a ParamEnvAnd<Ty>
, making clear that you
should probably not be using the inner value without taking care to also use
the ParamEnv
.
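For illustration, a hedged sketch of what that looks like, assuming we are somewhere in the compiler with access to a `tcx` and a `def_id`:

```rust
// `tcx` and `def_id` come from the surrounding context:
let param_env = tcx.param_env(def_id);
let ty = tcx.type_of(def_id);

// `and` pairs the environment with the value that depends on it ...
let bundled = param_env.and(ty);

// ... and downstream code recovers both parts together, which makes it
// harder to accidentally use `ty` with the wrong environment.
let (param_env, ty) = bundled.into_parts();
```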
From MIR to Binaries
All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.
So far, we've shown how the compiler can take raw source code in text format and transform it into MIR. We have also shown how the compiler does various analyses on the code to detect things like type or lifetime errors. Now, we will finally take the MIR and produce some executable machine code.
NOTE: This part of a compiler is often called the backend. The term is a bit overloaded because in the compiler source, it usually refers to the "codegen backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend" in this part, we are referring to the "codegen backend".
So what do we need to do?
- First, we need to collect the set of things to generate code for. In particular, we need to find out which concrete types to substitute for generic ones, since we need to generate code for the concrete types. Generating code for the concrete types (i.e. emitting a copy of the code for each concrete type) is called monomorphization, so the process of collecting all the concrete types is called monomorphization collection.
- Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR) for each concrete type we collected.
- Finally, we need to invoke LLVM or Cranelift, which runs a bunch of optimization passes, generates executable code, and links together an executable binary.
The code for codegen is actually a bit complex due to a few factors:
- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much backend code between them as possible, so a lot of it is generic over the codegen implementation. This means that there are often a lot of layers of abstraction.
- Codegen happens asynchronously in another thread for performance.
- The actual codegen is done by a third-party library (either LLVM or Cranelift).
Generally, the rustc_codegen_ssa
crate contains backend-agnostic code
(i.e. independent of LLVM or Cranelift), while the rustc_codegen_llvm
crate contains code specific to LLVM codegen.
At a very high level, the entry point is
rustc_codegen_ssa::base::codegen_crate
. This function starts the
process discussed in the rest of this chapter.
MIR (Mid-level IR)
MIR is Rust's Mid-level Intermediate Representation. It was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation. If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.
Introduction to MIR
MIR is defined in the src/librustc_middle/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.
Some of the key characteristics of MIR are:
- It is based on a control-flow graph.
- It does not have nested expressions.
- All types in MIR are fully explicit.
Key MIR vocabulary
This section introduces the key concepts of MIR, summarized here:
- Basic blocks: units of the control-flow graph, consisting of:
  - statements: actions with one successor
  - terminators: actions with potentially multiple successors; always at the end of a block
  - (if you're not familiar with the term basic block, see the background chapter)
- Locals: memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
- Places: expressions that identify a location in memory, like _1 or _1.f.
- Rvalues: expressions that produce a value; the "R" is because these generally appear only on the right-hand side of an assignment.
  - Operands: the arguments to an rvalue, which can either be a constant (such as 22) or a place (such as _1).
You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty-printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try running this program (or clicking on this link), and then clicking the "MIR" button at the top:
fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}
You should see something like:
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
...
}
This is the MIR format for the main function.
Variable declarations. If we drill in a bit, we'll see that the function begins with a bunch of variable declarations. They look like this:
let mut _0: (); // return place
let mut _1: std::vec::Vec<i32>; // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;
You can see that variables in MIR don't have names; they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can tell apart user-defined variables, though, because they have debuginfo associated with them (see below).
User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:
scope 1 {
debug vec => _1; // in scope 1 at src/main.rs:2:9: 2:16
}
Each debug <Name> => <Place>; annotation describes a named user variable along with where (i.e., the place) a debugger can find the data of that variable. Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place. Additionally, closure captures are described using the same system, and so they are complicated even without optimizations, e.g.: debug x => (*((*_1).0: &T));.
The "scope" blocks (e.g., scope 1 {..}) describe the lexical structure of the source program (which names were in scope when), so any part of the program annotated with // in scope 0 would be unable to see vec, as you would notice if you were stepping through the code in a debugger, for example.
Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am omitting some of the comments):
bb0: {
StorageLive(_1);
_1 = const <std::vec::Vec<T>>::new() -> bb2;
}
A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:
StorageLive(_1);
This statement indicates that the variable _1 is "live", meaning that it may be used later – this will stay true until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These "storage statements" are used by LLVM to allocate stack space.
The terminator of the block bb0 is the call to Vec::new:
_1 = const <std::vec::Vec<T>>::new() -> bb2;
Terminators are different from statements because they can have more than one successor – that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that indeed unwinding is not possible, and hence we list only one successor block, bb2.
If we look ahead to bb2, we will see it looks like this:
bb2: {
StorageLive(_3);
_3 = &mut _1;
_2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}
Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:
_3 = &mut _1;
Assignments in general have the form:
<Place> = <Rvalue>
A place is an expression like _3, _3.f or *_3 – it denotes a location in memory. An rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>.
So you can kind of define a grammar for rvalues like so:
<Rvalue> = & (mut)? <Place>
| <Operand> + <Operand>
| <Operand> - <Operand>
| ...
<Operand> = Constant
| copy Place
| move Place
As you can see from this grammar, rvalues cannot be nested – they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type).
So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:
TMP1 = a + b
x = TMP1 + c
(Try it and see! You may want to compile in release mode to skip over the overflow checks.)
MIR data types
The MIR data types are defined in the src/librustc_middle/mir/ module.
Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.
The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", which you can read about below).
- Basic blocks: the basic blocks are stored in the basic_blocks field; this is a vector of BasicBlockData structures. We never reference a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
- Statements are represented by the type Statement.
- Terminators are represented by the Terminator type.
- Locals are represented by the type Local (a newtype'd index). The data for a local is stored in the local_decls field of the Mir. There is also a special constant RETURN_PLACE identifying the special "local" that represents the return value.
- Places are represented by the enum Place. There are a few variants:
  - Local variables like _1
  - Static variables like FOO
  - Projections, which are fields or other things that "project out" from a base place. For example, _1.f is a projection from _1. *_1 is also a projection, represented by the ProjectionElem::Deref variant.
- Rvalues are represented by the enum Rvalue.
- Operands are represented by the enum Operand.
Representing constants
to be written
Promoted constants
to be written
MIR optimizations
MIR optimizations are optimizations run on the MIR to produce better MIR before codegen. This is important for two reasons: first, it makes the final generated executable code better, and second, it means that LLVM has less work to do, so compilation is faster. Note that since MIR is generic (not monomorphized yet), these optimizations are particularly effective; we can optimize the generic version, so all of the monomorphizations are cheaper!
MIR optimizations run after borrow checking. We run a series of optimization
passes over the MIR to improve it. Some passes are required to run on all code,
some passes don't actually do optimizations but only check stuff, and some
passes are only turned on in release
mode.
The optimized_mir
query is called to produce the optimized MIR
for a given DefId
. This query makes sure that the borrow checker has
run and that some validation has occurred. Then, it steals the MIR,
optimizes it, and returns the improved MIR.
Defining optimization passes
The list of passes run and the order in which they are run is defined by the
run_optimization_passes
function. It contains an array of passes to
run. Each pass in the array is a struct that implements the MirPass
trait.
The array is an array of &dyn MirPass
trait objects. Typically, a pass is
implemented in its own submodule of the rustc_mir::transform
module.
Some examples of passes are:
CleanupNonCodegenStatements
: remove some of the info that is only needed for analyses, rather than codegen.ConstProp
: Does constant propagation
You can see the "Implementors" section of the MirPass
rustdocs for more examples.
MIR Debugging
The -Zdump-mir
flag can be used to dump a text representation of the MIR. The
-Zdump-mir-graphviz
flag can be used to dump a .dot
file that represents
MIR as a control-flow graph.
-Zdump-mir=F
is a handy compiler option that will let you view the MIR for
each function at each stage of compilation. -Zdump-mir
takes a filter F
which allows you to control which functions and which passes you are
interested in. For example:
> rustc -Zdump-mir=foo ...
This will dump the MIR for any function whose name contains foo
; it
will dump the MIR both before and after every pass. Those files will
be created in the mir_dump
directory. There will likely be quite a
lot of them!
> cat > foo.rs
fn main() {
println!("Hello, world!");
}
^D
> rustc -Zdump-mir=main foo.rs
> ls mir_dump/* | wc -l
161
The files have names like rustc.main.000-000.CleanEndRegions.after.mir
. These
names have a number of parts:
rustc.main.000-000.CleanEndRegions.after.mir
      ---- --- --- --------------- ----- either before or after
      |    |   |   name of the pass
      |    |   index of dump within the pass (usually 0, but some passes dump intermediate states)
      |    index of the pass
      def-path to the function etc being dumped
You can also make more selective filters. For example, main & CleanEndRegions
will select for things that reference both main
and the pass
CleanEndRegions
:
> rustc -Zdump-mir='main & CleanEndRegions' foo.rs
> ls mir_dump
rustc.main.000-000.CleanEndRegions.after.mir rustc.main.000-000.CleanEndRegions.before.mir
Filters can also have |
parts to combine multiple sets of
&
-filters. For example main & CleanEndRegions | main & NoLandingPads
will select either main
and CleanEndRegions
or
main
and NoLandingPads
:
> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs
> ls mir_dump
rustc.main-promoted[0].002-000.NoLandingPads.after.mir
rustc.main-promoted[0].002-000.NoLandingPads.before.mir
rustc.main-promoted[0].002-006.NoLandingPads.after.mir
rustc.main-promoted[0].002-006.NoLandingPads.before.mir
rustc.main-promoted[1].002-000.NoLandingPads.after.mir
rustc.main-promoted[1].002-000.NoLandingPads.before.mir
rustc.main-promoted[1].002-006.NoLandingPads.after.mir
rustc.main-promoted[1].002-006.NoLandingPads.before.mir
rustc.main.000-000.CleanEndRegions.after.mir
rustc.main.000-000.CleanEndRegions.before.mir
rustc.main.002-000.NoLandingPads.after.mir
rustc.main.002-000.NoLandingPads.before.mir
rustc.main.002-006.NoLandingPads.after.mir
rustc.main.002-006.NoLandingPads.before.mir
(Here, the main-promoted[0]
files refer to the MIR for "promoted constants"
that appeared within the main
function.)
TODO: anything else?
Constant Evaluation
Constant evaluation is the process of computing values at compile time. For a specific item (constant/static/array length) this happens after the MIR for the item is borrow-checked and optimized. In many cases trying to const evaluate an item will trigger the computation of its MIR for the first time.
Prominent examples are
- The initializer of a
static
- Array length
- needs to be known to reserve stack or heap space
- Enum variant discriminants
- needs to be known to prevent two variants from having the same discriminant
- Patterns
- need to be known to check for overlapping patterns
Additionally, constant evaluation can be used to reduce the workload and binary size at runtime by precomputing complex operations at compile time and only storing the result.
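For example (a trivial illustration, not taken from the compiler), the table below is computed entirely at compile time and only the resulting bytes are stored in the binary:

```rust
// Evaluated once by the compiler; the runtime only sees the results.
const MASKS: [u32; 4] = [1 << 0, 1 << 8, 1 << 16, 1 << 24];

fn main() {
    // No shifts are performed at runtime here.
    println!("{:?}", MASKS);
}
```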
Constant evaluation can be done by calling the const_eval
query of TyCtxt
.
The const_eval
query takes a ParamEnv
of the environment in
which the constant is evaluated (e.g. the function within which the constant is
used) and a GlobalId
. The GlobalId
is made up of an
Instance
referring to a constant or static or of an
Instance
of a function and an index into the function's Promoted
table.
Constant evaluation returns a Result
with either the error, or the simplest
representation of the constant. "simplest" meaning if it is representable as an
integer or fat pointer, it will directly yield the value (via ConstValue::Scalar
or
ConstValue::ScalarPair
), instead of referring to the miri
virtual
memory allocation (via ConstValue::ByRef
). This means that the const_eval
function cannot be used to create miri-pointers to the evaluated constant or
static. If you need that, you need to directly work with the functions in
src/librustc_mir/const_eval.rs.
Miri
Miri (MIR Interpreter) is a virtual machine for executing MIR without
compiling to machine code. It is usually invoked via tcx.const_eval
.
If you start out with a constant
const FOO: usize = 1 << 12;
rustc doesn't actually invoke anything until the constant is either used or placed into metadata.
Once you have a use-site like
type Foo = [u8; FOO - 42];
The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...).
To obtain the (in this case empty) parameter environment, one can call
let param_env = tcx.param_env(length_def_id);
. The GlobalId
needed is
let gid = GlobalId {
promoted: None,
instance: Instance::mono(length_def_id),
};
Invoking tcx.const_eval(param_env.and(gid))
will now trigger the creation of
the MIR of the array length expression. The MIR will look something like this:
const Foo::{{initializer}}: usize = {
let mut _0: usize; // return pointer
let mut _1: (usize, bool);
bb0: {
_1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
}
bb1: {
_0 = (_1.0: usize);
return;
}
}
Before the evaluation, a virtual memory location (in this case essentially a
vec![u8; 4]
or vec![u8; 8]
) is created for storing the evaluation result.
At the start of the evaluation, _0
and _1
are
Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))
. This is quite
a mouthful: Operand
can represent either data stored somewhere in the
interpreter memory (Operand::Indirect
), or (as an optimization)
immediate data stored in-line. And Immediate
can either be a single
(potentially uninitialized) scalar value (integer or thin pointer),
or a pair of two of them. In our case, the single scalar value is not (yet)
initialized.
When the initialization of _1
is invoked, the value of the FOO
constant is
required, and triggers another call to tcx.const_eval
, which will not be shown
here. If the evaluation of FOO is successful, 42
will be subtracted from its
value 4096
and the result stored in _1
as
Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. }))
. The first part of the pair is the computed value,
the second part is a bool that's true if an overflow happened. A Scalar::Raw
also stores the size (in bytes) of this scalar value; we are eliding that here.
The next statement asserts that said boolean is 0
. In case the assertion
fails, its error message is used for reporting a compile-time error.
Since it does not fail, Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. }))
is stored in the virtual memory that was allocated before the
evaluation. _0
always refers to that location directly.
After the evaluation is done, the return value is converted from Operand
to
ConstValue
by op_to_const
: the former representation is geared towards
what is needed during const evaluation, while ConstValue
is shaped by the
needs of the remaining parts of the compiler that consume the results of const
evaluation. As part of this conversion, for types with scalar values, even if
the resulting Operand
is Indirect
, it will return an immediate
ConstValue::Scalar(computed_value)
(instead of the usual ConstValue::ByRef
).
This makes using the result much more efficient and also more convenient, as no
further queries need to be executed in order to get at something as simple as a
usize
.
Future evaluations of the same constants will not actually invoke Miri, but just use the cached result.
Datastructures
Miri's outside-facing datastructures can be found in
librustc_middle/mir/interpret.
This is mainly the error enum and the ConstValue
and Scalar
types. A
ConstValue
can be either Scalar
(a single Scalar
, i.e., integer or thin
pointer), Slice
(to represent byte slices and strings, as needed for pattern
matching) or ByRef
, which is used for anything else and refers to a virtual
allocation. These allocations can be accessed via the methods on
tcx.interpret_interner
. A Scalar
is either some Raw
integer or a pointer;
see the next section for more on that.
If you are expecting a numeric result, you can use eval_usize
(panics on
anything that can't be represented as a u64
) or try_eval_usize
which results
in an Option<u64>
yielding the Scalar
if possible.
Memory
To support any kind of pointers, Miri needs to have a "virtual memory" that the
pointers can point to. This is implemented in the Memory
type. In the
simplest model, every global variable, stack variable and every dynamic
allocation corresponds to an Allocation
in that memory. (Actually using an
allocation for every MIR stack variable would be very inefficient; that's why we
have Operand::Immediate
for stack variables that are both small and never have
their address taken. But that is purely an optimization.)
Such an Allocation
is basically just a sequence of u8
storing the value of
each byte in this allocation. (Plus some extra data, see below.) Every
Allocation
has a globally unique AllocId
assigned in Memory
. With that, a
Pointer
consists of a pair of an AllocId
(indicating the allocation) and
an offset into the allocation (indicating which byte of the allocation the
pointer points to). It may seem odd that a Pointer
is not just an integer
address, but remember that during const evaluation, we cannot know at which
actual integer address the allocation will end up -- so we use AllocId
as
symbolic base addresses, which means we need a separate offset. (As an aside,
it turns out that pointers at run-time are
more than just integers, too.)
These allocations exist so that references and raw pointers have something to
point to. There is no global linear heap in which things are allocated, but each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size. So if you have a
pointer to an allocation for a local variable a
, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to a different local variable b
.
Pointer arithmetic on a
will only ever change its offset; the AllocId
stays the same.
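Schematically, the shape is something like the following (a simplified sketch, not the exact rustc definitions, which carry more information):

```rust
/// Identifies one allocation; acts as a symbolic base address.
struct AllocId(u64);

/// A pointer is an allocation plus a byte offset into it. Pointer
/// arithmetic only ever changes `offset`; `alloc_id` never changes.
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}
```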
This, however, causes a problem when we want to store a Pointer
into an
Allocation
: we cannot turn it into a sequence of u8
of the right length!
AllocId
and offset together are twice as big as a pointer "seems" to be. This
is what the relocation
field of Allocation
is for: the byte offset of the
Pointer
gets stored as a bunch of u8
, while its AllocId
gets stored
out-of-band. The two are reassembled when the Pointer
is read from memory.
The other bit of extra data an Allocation
needs is undef_mask
for keeping
track of which of its bytes are initialized.
Global memory and exotic allocations
Memory
exists only during the Miri evaluation; it gets destroyed when the
final value of the constant is computed. In case that constant contains any
pointers, those get "interned" and moved to a global "const eval memory" that is
part of TyCtxt
. These allocations stay around for the remaining computation
and get serialized into the final output (so that dependent crates can use
them).
Moreover, to also support function pointers, the global memory in TyCtxt
can
also contain "virtual allocations": instead of an Allocation
, these contain an
Instance
. That allows a Pointer
to point to either normal data or a
function, which is needed to be able to evaluate casts from function pointers to
raw pointers.
Finally, the GlobalAlloc
type used in the global memory also contains a
variant Static
that points to a particular const
or static
item. This is
needed to support circular statics, where we need to have a Pointer
to a
static
for which we cannot yet have an Allocation
as we do not know the
bytes of its value.
Pointer values vs Pointer types
One common cause of confusion in Miri is that being a pointer value and having
a pointer type are entirely independent properties. By "pointer value", we
refer to a Scalar::Ptr
containing a Pointer
and thus pointing somewhere into
Miri's virtual memory. This is in contrast to Scalar::Raw
, which is just some
concrete integer.
However, a variable of pointer or reference type, such as *const T
or &T
,
does not have to have a pointer value: it could be obtained by casting or
transmuting an integer to a pointer (currently that is hard to do in const eval,
but eventually transmute
will be stable as a const fn
). And similarly, when
casting or transmuting a reference to some actual allocation to an integer, we
end up with a pointer value (Scalar::Ptr
) at integer type (usize
). This
is a problem because we cannot meaningfully perform integer operations such as
division on pointer values.
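To illustrate the distinction with ordinary Rust (the comments describe how Miri would classify each value):

```rust
fn main() {
    // Pointer *type*, but integer ("raw") *value*: no allocation in
    // Miri's virtual memory backs this address.
    let p = 0x1234usize as *const u8;

    let x = 42u8;
    // Integer *type*, but pointer *value*: in Miri's terms this is a
    // `Scalar::Ptr` into the allocation for `x`, at type `usize`.
    let q = &x as *const u8 as usize;

    let _ = (p, q);
}
```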
Interpretation
Although the main entry point to constant evaluation is the tcx.const_eval
query, there are additional functions in
librustc_mir/const_eval.rs
that allow accessing the fields of a ConstValue
(ByRef
or otherwise). You should
never have to access an Allocation
directly except for translating it to the
compilation target (at the moment just LLVM).
Miri starts by creating a virtual stack frame for the current constant that is being evaluated. There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide.
A stack frame is defined by the Frame
type in
librustc_mir/interpret/eval_context.rs
and contains all the local
variables memory (None
at the start of evaluation). Each frame refers to the
evaluation of either the root constant or subsequent calls to const fn
. The
evaluation of another constant simply calls tcx.const_eval
, which produces an
entirely new and independent stack frame.
The frames are just a Vec<Frame>
, there's no way to actually refer to a
Frame
's memory even if horrible shenanigans are done via unsafe code. The only
memory that can be referred to are Allocation
s.
Miri now calls the step
method (in
librustc_mir/interpret/step.rs
) until it either returns an error or has no further statements to execute. Each
statement will now initialize or modify the locals or the virtual memory
referred to by a local. This might require evaluating other constants or
statics, which just recursively invokes tcx.const_eval
.
Monomorphization
As you probably know, rust has a very expressive type system that has extensive support for generic types. But of course, assembly is not generic, so we need to figure out the concrete types of all the generics before the code can execute.
Different languages handle this problem differently. For example, in some languages, such as Java, we may not know the most precise type of value until runtime. In the case of Java, this is ok because (almost) all variables are reference values anyway (i.e. pointers to a stack allocated object). This flexibility comes at the cost of performance, since all accesses to an object must dereference a pointer.
Rust takes a different approach: it monomorphizes all generic types. This
means that the compiler stamps out a different copy of the code of a generic
function for each concrete type needed. For example, if I use a Vec<u64>
and
a Vec<String>
in my code, then the generated binary will have two copies of
the generated code for Vec
: one for Vec<u64>
and another for Vec<String>
.
The result is fast programs, but it comes at the cost of compile time (creating
all those copies can take a while) and binary size (all those copies might take
a lot of space).
Monomorphization is the first step in the backend of the rust compiler.
Collection
First, we need to figure out what concrete types we need for all the generic things in our program. This is called collection, and the code that does this is called the monomorphization collector.
Take this example:
fn banana() {
    peach::<u64>();
}

fn main() {
    banana();
}
The monomorphization collector will give you a list of [main, banana, peach::<u64>]
. These are the functions that will have machine code generated
for them. The collector will also add things like statics to that list.
See the collector rustdocs for more info.
The monomorphization collector is run just before MIR lowering and codegen.
rustc_codegen_ssa::base::codegen_crate
calls the
collect_and_partition_mono_items
query, which does monomorphization
collection and then partitions them into codegen
units.
Polymorphization
As mentioned above, monomorphization produces fast code, but it comes at the cost of compile time and binary size. MIR optimizations can help a bit with this. Another optimization currently under development is called polymorphization.
The general idea is that often we can share some code between monomorphized copies of code. More precisely, if a MIR block is not dependent on a type parameter, it may not need to be monomorphized into many copies. Consider the following example:
pub fn f() {
    g::<bool>();
    g::<usize>();
}

fn g<T>() -> usize {
    let n = 1;
    let closure = || n;
    closure()
}
In this case, we would currently collect [f, g::<bool>, g::<usize>, g::<bool>::{{closure}}, g::<usize>::{{closure}}]
, but notice that the two
closures would be identical -- they don't depend on the type parameter T
of
function g
. So we only need to emit one copy of the closure.
For more information, see this thread on github.
Lowering MIR to a Codegen IR
Now that we have a list of symbols to generate from the collector, we need to generate some sort of codegen IR. In this chapter, we will assume LLVM IR, since that's what rustc usually uses. The actual monomorphization is performed as we go, while we do the translation.
Recall that the backend is started by
rustc_codegen_ssa::base::codegen_crate
. Eventually, this reaches
rustc_codegen_ssa::mir::codegen_mir
, which does the lowering from
MIR to LLVM IR.
The code is split into modules which handle particular MIR primitives:
librustc_codegen_ssa::mir::block
will deal with translating blocks and their terminators. The most complicated and also the most interesting thing this module does is generating code for function calls, including the necessary unwinding handling IR.librustc_codegen_ssa::mir::statement
translates MIR statements.librustc_codegen_ssa::mir::operand
translates MIR operands.librustc_codegen_ssa::mir::place
translates MIR place references.librustc_codegen_ssa::mir::rvalue
translates MIR r-values.
Before a function is translated a number of simple and primitive analysis
passes will run to help us generate simpler and more efficient LLVM IR. An
example of such an analysis pass would be figuring out which variables are
SSA-like, so that we can translate them to SSA directly rather than relying on
LLVM's mem2reg
for those variables. The analysis can be found in
rustc_codegen_ssa::mir::analyze
.
Usually a single MIR basic block will map to a LLVM basic block, with very few
exceptions: intrinsic or function calls and less basic MIR statements like
assert
can result in multiple basic blocks. This is a perfect lede into the
non-portable LLVM-specific part of the code generation. Intrinsic generation is
fairly easy to understand as it involves very few abstraction levels in between
and can be found in rustc_codegen_llvm::intrinsic
.
Everything else will use the builder interface. This is the code that gets
called in the librustc_codegen_ssa::mir::*
modules discussed above.
TODO: discuss how constants are generated
Code generation
Code generation or "codegen" is the part of the compiler that actually
generates an executable binary. Usually, rustc uses LLVM for code generation;
there is also support for Cranelift. The key is that rustc doesn't implement
codegen itself. It's worth noting, though, that in the rust source code, many
parts of the backend have codegen
in their names (there are no hard
boundaries).
NOTE: If you are looking for hints on how to debug code generation bugs, please see this section of the debugging chapter.
What is LLVM?
LLVM is "a collection of modular and reusable compiler and
toolchain technologies". In particular, the LLVM project contains a pluggable
compiler backend (also called "LLVM"), which is used by many compiler projects,
including the clang
C compiler and our beloved rustc
.
LLVM takes input in the form of LLVM IR. It is basically assembly code with additional low-level types and annotations added. These annotations are helpful for doing optimizations on the LLVM IR and outputted machine code. The end result of all this is (at long last) something executable (e.g. an ELF object, an EXE, or wasm).
There are a few benefits to using LLVM:
- We don't have to write a whole compiler backend. This reduces implementation and maintenance burden.
- We benefit from the large suite of advanced optimizations that the LLVM project has been collecting.
- We can automatically compile Rust to any of the platforms for which LLVM has support. For example, as soon as LLVM added support for wasm, voila! rustc, clang, and a bunch of other languages were able to compile to wasm! (Well, there was some extra stuff to be done, but we were 90% there anyway).
- We and other compiler projects benefit from each other. For example, when the Spectre and Meltdown security vulnerabilities were discovered, only LLVM needed to be patched.
Running LLVM, linking, and metadata generation
Once LLVM IR for all of the functions and statics, etc. is built, it is time to start running LLVM and its optimization passes. LLVM IR is grouped into "modules". Multiple "modules" can be codegened at the same time to aid in multi-core utilization. These "modules" are what we refer to as codegen units. These units were established way back during the monomorphization collection phase.
Once LLVM produces objects from these modules, these objects are passed to the linker along with, optionally, the metadata object and an archive or an executable is produced.
It is not necessarily the codegen phase described above that runs the optimizations. With certain kinds of LTO, the optimization might happen at the linking time instead. It is also possible for some optimizations to happen before objects are passed on to the linker and some to happen during the linking.
This all happens towards the very end of compilation. The code for this can be
found in librustc_codegen_ssa::back
and
librustc_codegen_llvm::back
. Sadly, this piece of code is not
really well-separated from LLVM-dependent code; the rustc_codegen_ssa crate
contains a fair amount of code specific to the LLVM backend.
Once these components are done with their work you end up with a number of files in your filesystem corresponding to the outputs you have requested.
Updating LLVM
The Rust compiler uses LLVM as its primary codegen backend today, and naturally we want to at least occasionally update this dependency! Currently we do not have a strict policy about when to update LLVM or what it can be updated to, but a few guidelines are applied:
- We try to always support the latest released version of LLVM
- We try to support the "last few" versions of LLVM (how many is changing over time)
- We allow moving to arbitrary commits during development.
- Strongly prefer to upstream all patches to LLVM before including them in rustc.
This policy may change over time (or may actually start to exist as a formal policy!), but for now these are rough guidelines!
Why update LLVM?
There are a few reasons nowadays that we want to update LLVM in one way or another:
-
A bug could have been fixed! Often we find bugs in the compiler and fix them upstream in LLVM. We'll want to pull fixes back to the compiler itself as they're merged upstream.
-
A new feature may be available in LLVM that we want to use in rustc, but we don't want to wait for a full LLVM release to test it out.
-
LLVM itself may have a new release and we'd like to update to this LLVM release.
Each of these reasons has a different strategy for updating LLVM, and we'll go over them in detail here.
Bugfix Updates
For updates of LLVM that are to fix a small bug, we cherry-pick the bugfix to the branch we're already using. The steps for this are:
- Make sure the bugfix is in upstream LLVM.
- Identify the branch that rustc is currently using. The
src/llvm-project
submodule is always pinned to a branch of the rust-lang/llvm-project repository. - Fork the rust-lang/llvm-project repository
- Check out the appropriate branch (typically named
rustc/a.b-yyyy-mm-dd
) - Cherry-pick the upstream commit onto the branch
- Push this branch to your fork
- Send a Pull Request to rust-lang/llvm-project to the same branch as before. Be sure to reference the Rust and/or LLVM issue that you're fixing in the PR description.
- Wait for the PR to be merged
- Send a PR to rust-lang/rust updating the
src/llvm-project
submodule with your bugfix. This can be done locally withgit submodule update --remote src/llvm-project
typically. - Wait for PR to be merged
The tl;dr is that we can cherry-pick bugfixes at any time and pull them back into the rust-lang/llvm-project branch that we're using, and getting it into the compiler is just updating the submodule via a PR!
Example PRs look like: #59089
Feature updates
Note that this is all information as it applies to the current day and age. This process for updating LLVM changes with practically every LLVM update, so this may be out of date!
Unlike bugfixes, updating to pick up a new feature of LLVM typically requires a lot more work. This is where we can't reasonably cherry-pick commits backwards so we need to do a full update. There's a lot of stuff to do here, so let's go through each in detail.
-
Create a new branch in the rust-lang/llvm-project repository. This branch should be named
rustc/a.b-yyyy-mm-dd
wherea.b
is the current version number of LLVM in-tree at the time of the branch and the remaining part is today's date. Move this branch to the commit in LLVM that you'd like, which for this is probably the current LLVM HEAD. -
Apply Rust-specific patches to the llvm-project repository. All features and bugfixes are upstream, but there's often some weird build-related patches that don't make sense to upstream which we have on our repositories. These patches are around the latest patches in the rust-lang/llvm-project branch that rustc is currently using.
-
Build the new LLVM in the
rust
repository. To do this you'll want to update thesrc/llvm-project
repository to your branch and the revision you've created. It's also typically a good idea to update.gitmodules
with the new branch name of the LLVM submodule. Make sure you've committed changes tosrc/llvm-project
to ensure submodule updates aren't reverted. Some commands you should execute are:./x.py build src/llvm
- test that LLVM still builds./x.py build src/tools/lld
- same for LLD./x.py build
- build the rest of rustc
You'll likely need to update
src/rustllvm/*.cpp
to compile with updated LLVM bindings. Note that you should use#ifdef
and such to ensure that the bindings still compile on older LLVM versions. -
Test for regressions across other platforms. LLVM often has at least one bug for non-tier-1 architectures, so it's good to do some more testing before sending this to bors! If you're low on resources you can send the PR as-is now to bors, though, and it'll get tested anyway.
Ideally, build LLVM and test it on a few platforms:
- Linux
- OSX
- Windows
and afterwards run some docker containers that CI also does:
./src/ci/docker/run.sh wasm32-unknown
./src/ci/docker/run.sh arm-android
./src/ci/docker/run.sh dist-various-1
./src/ci/docker/run.sh dist-various-2
./src/ci/docker/run.sh armhf-gnu
-
Prepare a PR to
rust-lang/rust
. Work with maintainers ofrust-lang/llvm-project
to get your commit in a branch of that repository, and then you can send a PR torust-lang/rust
. You'll change at leastsrc/llvm-project
and will likely also changesrc/rustllvm/*
as well.
For prior art, previous LLVM updates look like
#55835
#47828
#62474
#62592. Note that sometimes it's
easiest to land src/rustllvm/*
compatibility as a PR before actually updating
src/llvm-project
. This way while you're working through LLVM issues others
interested in trying out the new LLVM can benefit from work you've done to
update the C++ bindings.
Caveats and gotchas
Ideally the above instructions are pretty smooth, but here's some caveats to keep in mind while going through them:
- LLVM bugs are hard to find, don't hesitate to ask for help! Bisection is definitely your friend here (yes LLVM takes forever to build, yet bisection is still your friend)
- If you've got general questions, @alexcrichton can help you out.
- Creating branches is a privileged operation on GitHub, so you'll need someone with write access to create the branches for you most likely.
New LLVM Release Updates
Updating to a new release of LLVM is very similar to the "feature updates" section above. The release process for LLVM is often months-long though and we like to ensure compatibility ASAP. The main tweaks to the "feature updates" section above is generally around branch naming. The sequence of events typically looks like:
1. LLVM announces that its latest release version has branched. This will show up as a branch in https://github.com/llvm/llvm-project typically named `release/$N.x`, where `$N` is the version of LLVM that's being released.

2. We then follow the "feature updates" section above to create a new branch of LLVM in our rust-lang/llvm-project repository. This follows the same naming convention of branches as usual, except that `a.b` is the new version. This update is eventually landed in the rust-lang/rust repository.

3. Over the next few months, LLVM will continually push commits to its `release/a.b` branch. Often those are bug fixes we'd like to have as well. The merge process for that is to use `git merge` itself to merge LLVM's `release/a.b` branch with the branch created in step 2. This is typically done multiple times when necessary while LLVM's release branch is baking.

4. LLVM then announces the release of version `a.b`.

5. After LLVM's official release, we follow the "feature updates" section again to create a new branch in the rust-lang/llvm-project repository, this time with a new date. The commit history should look much cleaner, as it is just a few Rust-specific commits stacked on top of stock LLVM's release branch.
Debugging LLVM
NOTE: If you are looking for info about code generation, please see this chapter instead.
This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:
As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to

1. create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)
2. minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate
3. further minimize the issue by making the code shorter (there are tools that help with this like `creduce`)
The official compilers (including nightlies) have LLVM assertions disabled, which means that LLVM assertion failures can show up as compiler crashes (not ICEs but "real" crashes) and other sorts of weird behavior. If you are encountering these, it is a good idea to try using a compiler with LLVM assertions enabled - either an "alt" nightly or a compiler you build yourself by setting `[llvm] assertions=true` in your `config.toml` - and see whether anything turns up.
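If you take the build-it-yourself route, the relevant `config.toml` snippet should be just:

```toml
[llvm]
# Build the in-tree LLVM with internal consistency checks enabled.
assertions = true
```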
The rustc build process builds the LLVM tools into `./build/<host-triple>/llvm/bin`. They can be called directly.
The default rustc compilation pipeline has multiple codegen units, which is hard to replicate manually and means that LLVM is called multiple times in parallel. If you can get away with it (i.e. if it doesn't make your bug disappear), passing `-C codegen-units=1` to rustc will make debugging easier.
For rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If you are building via cargo, use the `RUSTFLAGS` environment variable (e.g. `RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the target directory.
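For example, in a cargo project, something like the following should work (a sketch; the emitted file names include extra metadata hashes):

```
RUSTFLAGS='--emit=llvm-ir' cargo build
ls target/debug/deps/*.ll   # the emitted LLVM IR ends up here
```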
`cargo llvm-ir [options] path` spits out the LLVM IR for a particular function at `path`. (`cargo install cargo-asm` installs `cargo asm` and `cargo llvm-ir`). `--build-type=debug` emits code for debug builds. There are also other useful options. Also, debug info in LLVM IR can clutter the output a lot: `RUSTFLAGS="-C debuginfo=0"` is really useful.
`RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (not the same as IR) at different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to `.ll` files using `llvm-dis`, which should be in the target local compilation of rustc.
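For instance (a sketch; `save-temps` drops several stages of bitcode per codegen unit into the current directory, and the exact file names vary):

```
rustc -C save-temps my-file.rs
for f in *.bc; do ./build/$TRIPLE/llvm/bin/llvm-dis "$f"; done   # writes a .ll next to each .bc
```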
If you want to play with the optimization pipeline, you can use the `opt` tool from `./build/<host-triple>/llvm/bin/` with the LLVM IR emitted by rustc. Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should:
```
$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my
```
If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag `-C llvm-args=-print-after-all`, and possibly add `-C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'` (e.g. `-C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE'`).
That produces a lot of output into standard error, so you'll want to pipe that to some file. Also, if you are using neither `-filter-print-funcs` nor `-C codegen-units=1`, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything.
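Concretely, an invocation might look like this (a sketch):

```
rustc -O -C codegen-units=1 -C llvm-args=-print-after-all my-file.rs 2> passes.log
```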
If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize correctly), you can use `llvm-extract`, e.g.
```
$ ./build/$TRIPLE/llvm/bin/llvm-extract \
    -func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
    -S \
    < unextracted.ll \
    > extracted.ll
```
Getting help and asking questions
If you have some questions, head over to the rust-lang Zulip and specifically the `#t-compiler/wg-llvm` stream.
Compiler options to know and love
The `-Chelp` and `-Zhelp` compiler switches will list out a variety of interesting options you may find useful. Here are a few of the most common that pertain to LLVM development (some of them are employed in the tutorial above):
- The `--emit llvm-ir` option emits a `<filename>.ll` file with LLVM IR in textual format
- The `--emit llvm-bc` option emits in bytecode format (`<filename>.bc`)
- Passing `-Cllvm-args=<foo>` allows passing pretty much all the options that tools like llc and opt would accept; e.g. `-Cllvm-args=-print-before-all` to print IR before every LLVM pass.
- The `-Cno-prepopulate-passes` option will avoid pre-populating the LLVM pass manager with a list of passes. This will allow you to view the LLVM IR that rustc generates, not the LLVM IR after optimizations.
- The `-Cpasses=val` option allows you to supply a (space separated) list of extra LLVM passes to run
- The `-Csave-temps` option saves all temporary output files during compilation
- The `-Zprint-llvm-passes` option will print out LLVM optimization passes being run
- The `-Ztime-llvm-passes` option measures the time of each LLVM pass
- The `-Zverify-llvm-ir` option will verify the LLVM IR for correctness
- The `-Zno-parallel-llvm` option will disable parallel compilation of distinct compilation units
- The `-Zllvm-time-trace` option will output a Chrome-profiler-compatible JSON file which contains details and timings for LLVM passes.
Filing LLVM bug reports
When filing an LLVM bug report, you will probably want some sort of minimal working example that demonstrates the problem. The Godbolt compiler explorer is really helpful for this.
1. Once you have some LLVM IR for the problematic code (see above), you can create a minimal working example with Godbolt. Go to gcc.godbolt.org.

2. Choose `LLVM-IR` as programming language.

3. Use `llc` to compile the IR to a particular target as is:
   - There are some useful flags: `-mattr` enables target features, `-march=` selects the target, `-mcpu=` selects the CPU, etc.
   - Commands like `llc -march=help` output all architectures available, which is useful because sometimes the Rust arch names and the LLVM names do not match.
   - If you have compiled rustc yourself somewhere, in the target directory you have binaries for `llc`, `opt`, etc.

4. If you want to optimize the LLVM-IR, you can use `opt` to see how the LLVM optimizations transform it.

5. Once you have a godbolt link demonstrating the issue, it is pretty easy to file an LLVM bug. Just visit bugs.llvm.org.
Porting bug fixes from LLVM
Once you've identified the bug as an LLVM bug, you will sometimes find that it has already been reported and fixed in LLVM, but we haven't gotten the fix yet (or perhaps you are familiar enough with LLVM to fix it yourself).
In that case, we can sometimes opt to port the fix for the bug directly to our own LLVM fork, so that rustc can use it more easily. Our fork of LLVM is maintained in rust-lang/llvm-project. Once you've landed the fix there, you'll also need to land a PR modifying our submodule commits -- ask around on Zulip for help.
Backend Agnostic Codegen
In the future, it would be nice to allow other codegen backends (e.g. Cranelift). To this end, `librustc_codegen_ssa` provides an abstract interface for all backends to implement.
The following is a copy/paste of a README from the rust-lang/rust repo. Please submit a PR if it needs updating.
Refactoring of rustc_codegen_llvm
by Denis Merigoux, October 23rd 2018
State of the code before the refactoring
All the code related to the compilation of MIR into LLVM IR was contained inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most important elements:
- the `back` folder (7,800 LOC) implements the mechanisms for creating the different object files and archive through LLVM, but also the communication mechanisms for parallel code generation;
- the `debuginfo` folder (3,200 LOC) contains all code that passes debug information down to LLVM;
- the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with LLVM using the C++ API;
- the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM IR;
- the `base.rs` file (1,300 LOC) contains some helper functions but also the high-level code that launches the code generation and distributes the work;
- the `builder.rs` file (1,200 LOC) contains all the functions generating individual LLVM IR instructions inside a basic block;
- the `common.rs` file (450 LOC) contains various helper functions and all the functions generating LLVM static values;
- the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.
The goal of this refactoring is to separate, inside this crate, code that is specific to LLVM from code that can be reused for other rustc backends. For instance, the `mir` folder is almost entirely backend-specific but it relies heavily on other parts of the crate. The separation of the code must not affect the logic of the code nor its performance.
For these reasons, the separation process involves two transformations that have to be done at the same time for the resulting code to compile:
- replace all the LLVM-specific types by generics inside function signatures and structure definitions;
- encapsulate all functions calling the LLVM FFI inside a set of traits that will define the interface between backend-agnostic code and the backend.
While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new traits and backend-agnostic code will be moved in `rustc_codegen_ssa` (name suggestion by @eddyb).
Generic types and structures
@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This work has been extended to all structures inside the `mir` folder and elsewhere, as well as for LLVM's `BasicBlock` and `Type` types.
The two most important structures for the LLVM codegen are `CodegenCx` and `Builder`. They are parametrized by multiple lifetime parameters and the type for `Value`.
```rust
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```
`CodegenCx` is used to compile one codegen unit that can contain multiple functions, whereas `Builder` is created to compile one basic block. The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime parameters, which correspond to the following:
- `'tcx` is the longest lifetime, corresponding to the original `TyCtxt` containing the program's information;
- `'a` is a short-lived reference to a `CodegenCx` or another object inside a struct;
- `'ll` is the lifetime of references to LLVM objects such as `Value` or `Type`.
Although there are already many lifetime parameters in the code, making it generic uncovered situations where the borrow-checker was passing only due to the special nature of the LLVM objects manipulated (they are extern pointers). For instance, an additional lifetime parameter had to be added to `LocalAnalyzer` in `analyze.rs`, leading to the definition:
```rust
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```
However, the two most important structures, `CodegenCx` and `Builder`, are not defined in the backend-agnostic code. Indeed, their content is highly specific to the backend and it makes more sense to leave their definition to the backend implementor than to allow just a narrow spot via a generic field for the backend's context.
Traits and interface
Because they have to be defined by the backend, `CodegenCx` and `Builder` will be the structures implementing all the traits defining the backend's interface. These traits are defined in the folder `rustc_codegen_ssa/traits` and all the backend-agnostic code is parametrized by them. For instance, let us explain how a function in `base.rs` is parametrized:
```rust
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```
In this signature, we have the two lifetime parameters explained earlier and the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait defines an associated type `Bx::CodegenCx` that itself satisfies the `CodegenMethods` traits implemented by the struct `CodegenCx`.
On the trait side, here is an example with part of the definition of `BuilderMethods` in `traits/builder.rs`:
```rust
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```
Finally, a master structure implementing the `ExtraBackendMethods` trait is used for high-level codegen-driving functions like `codegen_crate` in `base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. `ExtraBackendMethods` should be implemented by the same structure that implements the `CodegenBackend` defined in `rustc_codegen_utils/codegen_backend.rs`.
During the traitification process, certain functions have been converted from methods of a local structure to methods of `CodegenCx` or `Builder`, and a corresponding `self` parameter has been added. Indeed, LLVM stores information internally that it can access when called through its API. This information does not show up in a Rust data structure carried around when these methods are called. However, when implementing a Rust backend for `rustc`, these methods will need information from `CodegenCx`, hence the additional parameter (unused in the LLVM implementation of the trait).
State of the code after the refactoring
The traits offer an API which is very similar to the API of LLVM. This is not the best solution, since LLVM has a very special way of doing things: when adding another backend, the trait definitions might be changed in order to offer more flexibility.
However, the current separation between backend-agnostic and LLVM-specific code has allowed the reuse of a significant part of the old `rustc_codegen_llvm`. Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the most important elements:

- `back` folder: 3,800 (BA) vs 4,100 (LLVM);
- `mir` folder: 4,400 (BA) vs 0 (LLVM);
- `base.rs`: 1,100 (BA) vs 250 (LLVM);
- `builder.rs`: 1,400 (BA) vs 0 (LLVM);
- `common.rs`: 350 (BA) vs 350 (LLVM);
The `debuginfo` folder has been left almost untouched by the splitting and is specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the 27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new 18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized `rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of approximately 10,000 LOC that would otherwise have had to be duplicated between the multiple backends of `rustc`.
The refactored version of `rustc`'s backend introduced no regressions in the test suite or in performance benchmarks, which is consistent with the nature of the refactoring: it used only compile-time parametricity (no trait objects).
Implicit Caller Location
Approved in RFC 2091, this feature enables the accurate reporting of caller location during panics initiated from functions like `Option::unwrap`, `Result::expect`, and `Index::index`. This feature adds the `#[track_caller]` attribute for functions, the `caller_location` intrinsic, and the stabilization-friendly `core::panic::Location::caller` wrapper.
Motivating Example
Take this example program:
```rust
fn main() {
    let foo: Option<()> = None;
    foo.unwrap(); // this should produce a useful panic message!
}
```
Prior to Rust 1.42, panics like this `unwrap()` printed a location in libcore:

```
$ rustc +1.41.0 example.rs; example.exe
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value',...core\macros\mod.rs:15:40
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
```
As of 1.42, we get a much more helpful message:

```
$ rustc +1.42.0 example.rs; example.exe
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', example.rs:3:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
These error messages are achieved through a combination of changes to `panic!` internals to make use of `core::panic::Location::caller` and a number of `#[track_caller]` annotations in the standard library which propagate caller information.
Reading Caller Location
Previously, `panic!` made use of the `file!()`, `line!()`, and `column!()` macros to construct a `Location` pointing to where the panic occurred. These macros couldn't be given an overridden location, so functions which intentionally invoked `panic!` couldn't provide their own location, hiding the actual source of error.
Internally, `panic!()` now calls `core::panic::Location::caller()` to find out where it was expanded. This function is itself annotated with `#[track_caller]` and wraps the `caller_location` compiler intrinsic implemented by rustc. This intrinsic is easiest explained in terms of how it works in a `const` context.
Caller Location in const
There are two main phases to returning the caller location in a const context: walking up the stack to find the right location and allocating a const value to return.
Finding the right Location
In a const context we "walk up the stack" from where the intrinsic is invoked, stopping when we reach the first function call in the stack which does not have the attribute. This walk is in `InterpCx::find_closest_untracked_caller_location()`.

Starting at the bottom, we iterate up over stack `Frame`s in the `InterpCx::stack`, calling `InstanceDef::requires_caller_location` on the `Instance`s from each `Frame`. We stop once we find one that returns `false` and return the span of the previous frame, which was the "topmost" tracked function.
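As a rough illustration, here is a runnable toy model of that walk (a sketch only; `Frame` and the string spans stand in for the real interpreter types):

```rust
struct Frame {
    span: &'static str, // stand-in for the Span of the statement being executed
    tracked: bool,      // what InstanceDef::requires_caller_location would return
}

// Walk from the innermost frame outward. The first frame that is *not*
// tracked is the one making the call into the tracked chain, so its
// current span is the location we report.
fn find_closest_untracked_caller_location(stack: &[Frame]) -> &'static str {
    stack
        .iter()
        .rev() // innermost frame first
        .find(|frame| !frame.tracked)
        .expect("`main` is never tracked")
        .span
}

fn main() {
    // Outermost first: main -> Option::unwrap -> panic machinery.
    let stack = [
        Frame { span: "example.rs:3:5", tracked: false },
        Frame { span: "option.rs (unwrap)", tracked: true },
        Frame { span: "panicking.rs", tracked: true },
    ];
    assert_eq!(find_closest_untracked_caller_location(&stack), "example.rs:3:5");
}
```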
Allocating a static Location
Once we have a `Span`, we need to allocate static memory for the `Location`, which is performed by the `TyCtxt::const_caller_location()` query. Internally this calls `InterpCx::alloc_caller_location()` and results in a unique memory kind (`MemoryKind::CallerLocation`). The SSA codegen backend is able to emit code for these same values, and we use this code there as well.

Once our `Location` has been allocated in static memory, our intrinsic returns a reference to it.
Generating code for `#[track_caller]` callees
To generate efficient code for a tracked function and its callers, we need to provide the same behavior from the intrinsic's point of view without having a stack to walk up at runtime. We invert the approach: as we grow the stack down we pass an additional argument to calls of tracked functions rather than walking up the stack when the intrinsic is called. That additional argument can be returned wherever the caller location is queried.
The argument we append is of type `&'static core::panic::Location<'static>`. A reference was chosen to avoid unnecessary copying, because a pointer is a third the size of `std::mem::size_of::<core::panic::Location>() == 24` at time of writing.
When generating a call to a function which is tracked, we pass the location argument the value of `FunctionCx::get_caller_location`.

If the calling function is tracked, `get_caller_location` returns the local in `FunctionCx::caller_location` which was populated by the current caller's caller. In these cases the intrinsic "returns" a reference which was actually provided in an argument to its caller.

If the calling function is not tracked, `get_caller_location` allocates a `Location` static from the current `Span` and returns a reference to that.
We more efficiently achieve the same behavior as a loop starting from the bottom by passing a single `&Location` value through the `caller_location` fields of multiple `FunctionCx`s as we grow the stack downward.
Codegen examples
What does this transformation look like in practice? Take this example which uses the new feature:
```rust
#![feature(track_caller)]
use std::panic::Location;

#[track_caller]
fn print_caller() {
    println!("called from {}", Location::caller());
}

fn main() {
    print_caller();
}
```
Here `print_caller()` appears to take no arguments, but we compile it to something like this:
```rust
#![feature(panic_internals)]
use std::panic::Location;

fn print_caller(caller: &Location) {
    println!("called from {}", caller);
}

fn main() {
    print_caller(&Location::internal_constructor(file!(), line!(), column!()));
}
```
Dynamic Dispatch
In codegen contexts we have to modify the callee ABI to pass this information down the stack, but the attribute expressly does not modify the type of the function. The ABI change must be transparent to type checking and remain sound in all uses.
Direct calls to tracked functions will always know the full codegen flags for the callee and can generate appropriate code. Indirect callers won't have this information, and it's not encoded in the type of the function pointer they call, so we generate a `ReifyShim` around the function whenever taking a pointer to it. This shim isn't able to report the actual location of the indirect call (the function's definition site is reported instead), but it prevents miscompilation and is probably the best we can do without modifying fully-stabilized type signatures.
Note: We always emit a `ReifyShim` when taking a pointer to a tracked function. While the constraint here is imposed by codegen contexts, we don't know during MIR construction of the shim whether we'll be called in a const context (safe to ignore shim) or in a codegen context (unsafe to ignore shim). Even if we did know, the results from const and codegen contexts must agree.
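A short sketch of the observable effect (the locations printed in the comments are illustrative):

```rust
#![feature(track_caller)]
use std::panic::Location;

#[track_caller]
fn whereami() {
    println!("called from {}", Location::caller());
}

fn main() {
    whereami(); // direct call: prints this call site
    let f: fn() = whereami; // coercing to a fn pointer goes through a ReifyShim
    f(); // indirect call: prints the definition site of `whereami` instead
}
```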
The Attribute
The `#[track_caller]` attribute is checked alongside other codegen attributes to ensure the function:

- has the `"Rust"` ABI (as opposed to e.g., `"C"`)
- is not a foreign import (e.g., in an `extern {...}` block)
- is not a closure
- is not `#[naked]`

If the use is valid, we set `CodegenFnAttrsFlags::TRACK_CALLER`. This flag influences the return value of `InstanceDef::requires_caller_location`, which is in turn used in both const and codegen contexts to ensure correct propagation.
Traits
When applied to trait method implementations, the attribute works as it does for regular functions.
When applied to a trait method prototype, the attribute applies to all implementations of the method. When applied to a default trait method implementation, the attribute takes effect on that implementation and any overrides.
Examples:
```rust
#![feature(track_caller)]

macro_rules! assert_tracked {
    () => {{
        let location = std::panic::Location::caller();
        assert_eq!(location.file(), file!());
        assert_ne!(location.line(), line!(), "line should be outside this fn");
        println!("called at {}", location);
    }};
}

trait TrackedFourWays {
    /// All implementations inherit `#[track_caller]`.
    #[track_caller]
    fn blanket_tracked();

    /// Implementors can annotate themselves.
    fn local_tracked();

    /// This implementation is tracked (overrides are too).
    #[track_caller]
    fn default_tracked() {
        assert_tracked!();
    }

    /// Overrides of this implementation are tracked (it is too).
    #[track_caller]
    fn default_tracked_to_override() {
        assert_tracked!();
    }
}

/// This impl uses the default impl for `default_tracked` and provides its own for
/// `default_tracked_to_override`.
impl TrackedFourWays for () {
    fn blanket_tracked() {
        assert_tracked!();
    }

    #[track_caller]
    fn local_tracked() {
        assert_tracked!();
    }

    fn default_tracked_to_override() {
        assert_tracked!();
    }
}

fn main() {
    <() as TrackedFourWays>::blanket_tracked();
    <() as TrackedFourWays>::default_tracked();
    <() as TrackedFourWays>::default_tracked_to_override();
    <() as TrackedFourWays>::local_tracked();
}
```
Background/History
Broadly speaking, this feature's goal is to improve common Rust error messages without breaking stability guarantees, requiring modifications to end-user source, relying on platform-specific debug-info, or preventing user-defined types from having the same error-reporting benefits.
Improving the output of these panics has been a goal of proposals since at least mid-2016 (see non-viable alternatives in the approved RFC for details). It took two more years until RFC 2091 was approved, much of its rationale for this feature's design having been discovered through the discussion around several earlier proposals.
The design in the original RFC limited itself to implementations that could be done inside the compiler at the time without significant refactoring. However in the year and a half between the approval of the RFC and the actual implementation work, a revised design was proposed and written up on the tracking issue. During the course of implementing that, it was also discovered that an implementation was possible without modifying the number of arguments in a function's MIR, which would simplify later stages and unlock use in traits.
Because the RFC's implementation strategy could not readily support traits, the semantics were not originally specified. They have since been implemented following the path which seemed most correct to the author and reviewers.
Profile Guided Optimization
`rustc` supports doing profile-guided optimization (PGO). This chapter describes what PGO is and how the support for it is implemented in `rustc`.
What Is Profile-Guided Optimization?
The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.
There are different ways of collecting data about a program's execution. One is to run the program inside a profiler (such as `perf`) and another is to create an instrumented binary, that is, a binary that has data collection built into it, and run that. The latter usually provides more accurate data.
How is PGO implemented in `rustc`?
`rustc`'s current PGO implementation relies entirely on LLVM. LLVM actually supports multiple forms of PGO:

- Sampling-based PGO, where an external profiling tool like `perf` is used to collect data about a program's execution.
- GCOV-based profiling, where code coverage infrastructure is used to collect profiling information.
- Front-end based instrumentation, where the compiler front-end (e.g. Clang) inserts instrumentation intrinsics into the LLVM IR it generates.
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics itself during optimization passes.

`rustc` supports only the last approach, IR-level instrumentation, mainly because it is almost exclusively implemented in LLVM and needs little maintenance on the Rust side. Fortunately, it is also the most modern approach, yielding the best results.
So, we are dealing with an instrumentation-based approach, i.e. profiling data is generated by a specially instrumented version of the program that's being optimized. Instrumentation-based PGO has two components: a compile-time component and a run-time component, and one needs to understand the overall workflow to see how they interact.
Overall Workflow
Generating a PGO-optimized program involves the following four steps:
1. Compile the program with instrumentation enabled (e.g. `rustc -Cprofile-generate main.rs`)
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool.
4. Compile the program again, this time making use of the profiling data (e.g. `rustc -Cprofile-use=merged.profdata main.rs`)
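Put together, a minimal end-to-end session might look like this (a sketch; it assumes an `llvm-profdata` binary, e.g. the one in `./build/<host-triple>/llvm/bin`, is on your `PATH`):

```
rustc -O -Cprofile-generate=/tmp/pgo-data main.rs
./main                             # writes /tmp/pgo-data/default-<id>.profraw
llvm-profdata merge -o merged.profdata /tmp/pgo-data/*.profraw
rustc -O -Cprofile-use=merged.profdata main.rs
```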
Compile-Time Aspects
Depending on which step in the above workflow we are in, two different things can happen at compile time:
Create Binaries with Instrumentation
As mentioned above, the profiling instrumentation is added by LLVM. `rustc` instructs LLVM to do so by setting the appropriate flags when creating LLVM `PassManager`s:

```cpp
// `PMBR` is an `LLVMPassManagerBuilderRef`
unwrap(PMBR)->EnablePGOInstrGen = true;
// Instrumented binaries have a default output path for the `.profraw` file
// hard-coded into them:
unwrap(PMBR)->PGOInstrGen = PGOGenPath;
```
`rustc` also has to make sure that some of the symbols from LLVM's profiling runtime are not removed, by marking them with the right export level.
Compile Binaries Where Optimizations Make Use Of Profiling Data
In the final step of the workflow described above, the program is compiled again, with the compiler using the gathered profiling data in order to drive optimization decisions. `rustc` again leaves most of the work to LLVM here, basically just telling the LLVM `PassManagerBuilder` where the profiling data can be found:

```cpp
unwrap(PMBR)->PGOInstrUse = PGOUsePath;
```
LLVM does the rest (e.g. setting branch weights, marking functions with `cold` or `inlinehint`, etc).
Runtime Aspects
Instrumentation-based approaches always also have a runtime component, i.e. once we have an instrumented program, that program needs to be run in order to generate profiling data, and collecting and persisting this profiling data needs some infrastructure in place.
In the case of LLVM, these runtime components are implemented in compiler-rt and statically linked into any instrumented binaries. The `rustc` version of this can be found in `src/libprofiler_builtins`, which basically packs the C code from `compiler-rt` into a Rust crate.
In order for `libprofiler_builtins` to be built, `profiler = true` must be set in `rustc`'s `config.toml`.
Testing PGO
Since the PGO workflow spans multiple compiler invocations, most testing happens in run-make tests (the relevant tests have `pgo` in their name). There is also a codegen test that checks that some expected instrumentation artifacts show up in LLVM IR.
Additional Information
Clang's documentation contains a good overview on PGO in LLVM here: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
Sanitizers Support
The rustc compiler contains support for the following sanitizers:
- AddressSanitizer, a fast memory error detector. It can detect out-of-bounds accesses to the heap, stack, and globals, use after free, use after return, double free, invalid free, and memory leaks.
- LeakSanitizer, a run-time memory leak detector.
- MemorySanitizer, a detector of uninitialized reads.
- ThreadSanitizer, a fast data race detector.
How to use the sanitizers?
To enable a sanitizer, compile with the `-Zsanitizer=...` option, where value is one of `address`, `leak`, `memory` or `thread`. For more details on how to use sanitizers, please refer to the unstable book.
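For instance, on a nightly toolchain something like the following should work (a sketch; sanitizer support varies by target):

```
rustc +nightly -Zsanitizer=address -g example.rs
./example   # memory errors now abort with an AddressSanitizer report
```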
How are sanitizers implemented in rustc?
The implementation of sanitizers relies almost entirely on LLVM. rustc is the integration point for LLVM's compile-time instrumentation passes and runtime libraries. Highlights of the most important aspects of the implementation:
- The sanitizer runtime libraries are part of the compiler-rt project, and will be built on supported targets when enabled in `config.toml`:

  ```toml
  [build]
  sanitizers = true
  ```

  The runtimes are placed into the target libdir.
- During LLVM code generation, the functions intended for instrumentation are marked with the appropriate LLVM attribute: `SanitizeAddress`, `SanitizeMemory`, or `SanitizeThread`. By default all functions are instrumented, but this behaviour can be changed with `#[no_sanitize(...)]`.
- The decision whether to perform instrumentation or not is possible only at a function granularity. In cases where those decisions differ between functions, it might be necessary to inhibit inlining, both at the MIR level and the LLVM level.
- The LLVM IR generated by rustc is instrumented by dedicated LLVM passes, different for each sanitizer. Instrumentation passes are invoked after optimization passes.
- When producing an executable, the sanitizer-specific runtime library is linked in. The libraries are searched for in the target libdir, relative to the default system root, so that this process is not affected by sysroot overrides used for example by cargo's `-Zbuild-std` functionality.
Additional Information
- Sanitizers project page
- AddressSanitizer in Clang
- LeakSanitizer in Clang
- MemorySanitizer in Clang
- ThreadSanitizer in Clang
Debugging support in the Rust compiler
This document explains the state of debugging tools support in the Rust compiler (rustc). It gives an overview of debugging tools like GDB and LLDB and the infrastructure around the Rust compiler to debug Rust code. If you want to learn how to debug the Rust compiler itself, see the Debugging the Compiler page.
The material is gathered from the YouTube video Tom Tromey discusses debugging support in rustc.
Preliminaries
Debuggers
According to Wikipedia
A debugger or debugging tool is a computer program that is used to test and debug other programs (the "target" program).
Writing a debugger from scratch for a language requires a lot of work, especially if debuggers have to be supported on various platforms. GDB and LLDB, however, can be extended to support debugging a language. This is the path that Rust has chosen. This document's main goal is to document the support for these debuggers in the Rust compiler.
DWARF
According to the DWARF standard website
DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.
A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output. This program may live in the compiler itself. DWARF uses a data structure called Debugging Information Entry (DIE), which stores the information as "tags" to denote functions, variables etc., e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram` etc. You can also invent your own tags and attributes.
Supported debuggers
GDB
We have our own fork of GDB - https://github.com/rust-dev-tools/gdb
Rust expression parser
To be able to show debug output, we need an expression parser. This (GDB) expression parser is written in Bison and can parse only a subset of Rust expressions. The GDB parser was written from scratch and has no relation to any other parser; for example, it is not related to rustc's parser.
GDB has Rust-like value and type output. It can print values and types in a way that looks like Rust syntax. When you print a type via `ptype` in GDB, it also looks like Rust source code. Check out the documentation in the manual for GDB/Rust.
Parser extensions
The expression parser has a couple of extensions in it to facilitate features that you cannot do with Rust. Some limitations are listed in the manual for GDB/Rust. There is some special code in the DWARF reader in GDB to support the extensions.
A couple of examples of DWARF reader support needed are as follows:

- Enum: Needed for support for enum types. rustc writes the information about the enum into DWARF, and GDB reads the DWARF to understand where the tag field is, whether there is a tag field at all, or whether the tag slot is shared with a non-zero optimization, etc.
- Dissect trait objects: a DWARF extension where the trait object's description in the DWARF also points to a stub description of the corresponding vtable, which in turn points to the concrete type for which this trait object exists. This means that you can do a `print *object` for that trait object, and GDB will understand how to find the correct type of the payload in the trait object.
TODO: Figure out if the following should be mentioned in the GDB-Rust document rather than this guide page so there is no duplication. This is regarding the following comments:
gdb's Rust extensions and limitations are documented in the gdb manual: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser implements the gdb @ extension.
@tromey do you think we should mention this part in the GDB-Rust document rather than this document so there is no duplication etc.?
Developer notes
- This work is now upstream. Bugs can be reported in GDB Bugzilla.
LLDB
We have our own fork of LLDB - https://github.com/rust-lang/lldb
Fork of LLVM project - https://github.com/rust-lang/llvm-project
LLDB currently only works on macOS because of a dependency issue. This issue was easier to solve for macOS as compared to Linux. However, Tom has a possible solution which can enable us to ship LLDB everywhere.
Rust expression parser
This expression parser is written in C++. It is a recursive descent parser, implementing slightly less of the Rust language than GDB. LLDB has Rust-like value and type output.
Parser extensions
There is some special code in the DWARF reader in LLDB to support the extensions. A couple of examples of DWARF reader support needed are as follows:
- Enum: Needed for support for enum types. rustc writes the information about the enum into DWARF, and LLDB reads the DWARF to understand where the tag field is, whether there is a tag field at all, or whether the tag slot is shared with a non-zero optimization, etc. In other words, it has enum support as well.
Developer notes
- None of the LLDB work is upstream. This rust-lang/lldb wiki page explains a few details.
- The reason for forking LLDB is that LLDB recently removed all the other language plugins due to lack of maintenance.
- LLDB has a plugin architecture but that does not work for language support.
- LLDB is available via the Rust build (`rustup`).
- GDB generally works better on Linux.
DWARF and Rustc
DWARF is the standard way compilers generate debugging information that debuggers read. It is the debugging format on macOS and Linux. It is a multi-language, extensible format and is mostly good enough for Rust's purposes. Hence, the current implementation reuses DWARF's concepts. This is true even if some of the concepts in DWARF do not align with Rust semantically because generally there can be some kind of mapping between the two.
We have some DWARF extensions that the Rust compiler emits and the debuggers understand that are not in the DWARF standard.
- The Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb repository:

  ```
  <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
     <1aa>   DW_AT_containing_type: <0x1b4>
     <1ae>   DW_AT_name        : (indirect string, offset: 0x23d): vtable
     <1b2>   DW_AT_byte_size   : 0
     <1b3>   DW_AT_alignment   : 8
  ```
- The other extension is that the Rust compiler can emit a tagless discriminated union. See the DWARF feature request for this item.
Current limitations of DWARF
- Traits: require a bigger change than normal to DWARF, on how to represent Traits in DWARF.
- DWARF provides no way to differentiate between Structs and Tuples. The Rust compiler emits fields with `__0` and debuggers look for a sequence of such names to overcome this limitation. For example, in this case the debugger would look at a field via `x.__0` instead of `x.0`. This is resolved via the Rust parser in the debugger, so now you can do `x.0`.
DWARF relies on debuggers to know some information about platform ABI. Rust does not do that all the time.
Developer notes
This section is from the talk about certain aspects of development.
What is missing
Shipping GDB in Rustup
Tracking issue: https://github.com/rust-lang/rust/issues/34457
Shipping GDB requires changes to the Rustup delivery system. To manage Rustup build size and times, we need to build GDB separately, on its own, and somehow provide the produced artifacts for inclusion in the final build. However, if we can ship GDB with rustup, it will simplify the development process by letting the compiler emit new debug info which can be readily consumed.
The main issue in achieving this is setting up dependencies. One such dependency is Python. That is why we have our own fork of GDB: one of the drivers is patched on Rust's side to check for the correct version of Python (Python 2.7 in this case. Note: Python 3 is not chosen for this purpose because Python's stable ABI is limited and is not sufficient for GDB's needs. See https://docs.python.org/3/c-api/stable.html).
This is to keep updates to the debugger as fast as possible as we make changes to the debugging symbols; in essence, to ship the debugger as soon as new debugging info is added. GDB only releases every six months or so. However, changes that are not related to Rust itself should ideally be merged upstream first.
Code signing for LLDB debug server on macOS
According to Wikipedia, System Integrity Protection is
System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned files and directories against modifications by processes without a specific "entitlement", even when executed by the root user or a user with root privileges (sudo).
It prevents processes from using the `ptrace` syscall. If a process wants to use `ptrace`, it has to be code signed. The certificate that signs it has to be trusted on your machine.
See Apple developer documentation for System Integrity Protection.
We may need to sign up with Apple and get the keys to do this signing. Tom has looked into whether Mozilla could do this, but Mozilla is already at the maximum number of keys it is allowed to sign with. Tom does not know if Mozilla could get more keys.
Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple. This problem is not technical in nature. If we had such a key we could sign GDB as well and ship that.
DWARF and Traits
Rust traits are not emitted into DWARF at all. The impact of this is that calling a method `x.method()` does not work as-is. The reason is that the method is implemented by a trait, as opposed to a type, and that information is not present, so finding trait methods is not possible.
DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this interface type as traits.
DWARF only deals with concrete names, not the reference types. So, a given implementation of a trait for a type would be one of these interfaces (`DW_tag_interface` type). Also, the type for which it is implemented would describe all the interfaces this type implements. This requires a DWARF extension.
Issue on Github: https://github.com/rust-lang/rust/issues/33014
Typical process for a Debug Info change (LLVM)
LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into. This is why we need to change LLVM first because that is emitted first and not DWARF directly. This is a kind of metadata that you construct and hand-off to LLVM. For the Rustc/LLVM hand-off some LLVM DI builder methods are called to construct representation of a type.
The steps of this process are as follows -
1. LLVM needs changing. LLVM does not emit interface types at all, so this needs to be implemented in LLVM first. Get sign-off from the LLVM maintainers that this is a good idea.
2. Change the DWARF extension.
3. Update the debuggers: update DWARF readers, expression evaluators.
4. Update the Rust compiler: change it to emit this new information.
Procedural macro stepping
A deeply profound question is: how do you actually debug a procedural macro? What is the location you emit for a macro expansion? Consider some of the following cases:
- You can emit location of the invocation of the macro.
- You can emit the location of the definition of the macro.
- You can emit locations of the content of the macro.
RFC: https://github.com/rust-lang/rfcs/pull/2117
Focus is to let macros decide what to do. This can be achieved by having some kind of attribute that lets the macro tell the compiler where the line marker should be. This affects where you set the breakpoints and what happens when you step it.
Source file checksums in debug info
Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that contributed to the associated binary.
The cryptographic hash can be used by a debugger to verify that the source file matches the executable. If the source file does not match, the debugger can provide a warning to the user.
The hash can also be used to prove that a given source file has not been modified since it was used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities, using SHA256 is recommended for this application.
The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.
A default hashing algorithm is set in the target specification. This allows the target to specify the best hash available, since not all targets support all hash algorithms.
The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=` command-line option.
DWARF 5
DWARF version 5 supports embedding an MD5 hash to validate the source file version in use. See DWARF 5, Section 6.2.4.1, opcode `DW_LNCT_MD5`.
LLVM
LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.
Microsoft Visual C++ Compiler /ZH option
The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH` compiler option.
Clang
Clang always embeds an MD5 checksum, though this does not appear in documentation.
Future work
Name mangling changes
- New demangler in `libiberty` (gcc source tree).
- New demangler in LLVM or LLDB.
TODO: Check the location of the demangler source. Question on Github.
Reuse Rust compiler for expressions
This is an important idea because debuggers by and large do not try to implement type inference. You need to be much more explicit when you type into the debugger than in your actual source code. So, you cannot just copy and paste an expression from your source code into the debugger and expect the same answer, but this would be nice. This could be helped by using the compiler.
It is certainly doable but it is a large project. You certainly need a bridge to the debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang) have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC.
Both debuggers' expression evaluators implement both a superset and a subset of Rust. They implement just the expression language, but they also add some extensions, like GDB's convenience variables. Therefore, if you are taking this route, you not only need to build this bridge but may also have to add some mode to let the compiler understand some of the extensions.
Appendix B: Background topics
This section covers a number of common compiler terms that arise in this guide. We try to give the general definition while providing some Rust-specific context.
What is a control-flow graph?
A control-flow graph is a common term from compilers. If you've ever used a flow-chart, then the concept of a control-flow graph will be pretty familiar to you. It's a representation of your program that exposes the underlying control flow in a very clear way.
A control-flow graph is structured as a set of basic blocks connected by edges. The key idea of a basic block is that it is a set of statements that execute "together" – that is, whenever you branch to a basic block, you start at the first statement and then execute all the remainder. Only at the end of the block is there the possibility of branching to more than one place (in MIR, we call that final statement the terminator):
```
bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}
```
Many expressions that you are used to in Rust compile down to multiple basic blocks. For example, consider an if statement:
```
a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;
```
This would compile into four basic blocks:
```
BB0: {
    a = 1;
    if some_variable { goto BB1 } else { goto BB2 }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...;
}
```
When using a control-flow graph, a loop simply appears as a cycle in the graph, and the `break` keyword translates into a path out of that cycle.
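For instance, here is a sketch of what a simple `while` loop might look like as a graph (the block numbering is illustrative, not what rustc actually produces):

```
BB0: {
    if cond { goto BB1 } else { goto BB2 }
}

BB1: {
    body;
    goto BB0;   // the back edge that forms the cycle
}

BB2: {
    after_the_loop;
}
```

A `break` inside the body would simply add another edge from `BB1` straight to `BB2`.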
What is a dataflow analysis?
Static Program Analysis by Anders Møller and Michael I. Schwartzbach is an incredible resource!
to be written
What is "universally quantified"? What about "existentially quantified"?
to be written
What is co- and contra-variance?
Check out the subtyping chapter from the Rust Nomicon.
See the variance chapter of this guide for more info on how the type checker handles variance.
What is a "free region" or a "free variable"? What about "bound region"?
Let's describe the concepts of free vs bound in terms of program variables, since that's the thing we're most familiar with.
- Consider this expression, which creates a closure: `|a, b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments that the closure will be given when it is called. We say that the `a` and `b` there are bound to the closure, and that the closure signature `|a, b|` is a binder for the names `a` and `b` (because any references to `a` or `b` within refer to the variables that it introduces).
- Consider this expression: `a + b`. In this expression, `a` and `b` refer to local variables that are defined outside of the expression. We say that those variables appear free in the expression (i.e., they are free, not bound (tied up)).
So there you have it: a variable "appears free" in some expression/statement/whatever if it refers to something defined outside of that expressions/statement/whatever. Equivalently, we can then refer to the "free variables" of an expression – which is just the set of variables that "appear free".
So what does this have to do with regions? Well, we can apply the analogous concept to types and regions. For example, in the type `&'a u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it does not.
Further Reading About Compilers
Thanks to `mem`, `scottmcm`, and `Levi` on the official Discord for the recommendations, and to `tinaun` for posting a link to a twitter thread from Graydon Hoare which had some more recommendations!

Other sources: https://gcc.gnu.org/wiki/ListOfCompilerBooks
If you have other suggestions, please feel free to open an issue or PR.
Books
- Types and Programming Languages
- Programming Language Pragmatics
- Practical Foundations for Programming Languages
- Compilers: Principles, Techniques, and Tools, 2nd Edition
- Garbage Collection: Algorithms for Automatic Dynamic Memory Management
- Linkers and Loaders
- Advanced Compiler Design and Implementation
- Building an Optimizing Compiler
- Crafting Interpreters
Courses
Wikis
Misc Papers and Blog Posts
Appendix C: Glossary
The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better.
Term | Meaning |
---|---|
arena/arena allocation | An arena is a large memory buffer from which other memory allocations are made. This style of allocation is called arena allocation. See this chapter for more info. |
AST | The abstract syntax tree produced by the rustc_ast crate; reflects user syntax very closely. |
binder | A "binder" is a place where a variable or type is declared; for example, the <T> is a binder for the generic type parameter T in fn foo<T>(..) , and \|a\| ... is a binder for the parameter a . See the background chapter for more. |
BodyId | An identifier that refers to a specific body (definition of a function or constant) in the crate. See the HIR chapter for more. |
bound variable | A "bound variable" is one that is declared within an expression/term. For example, the variable a is bound within the closure expression \|a\| a * 2 . See the background chapter for more. |
codegen | The code to translate MIR into LLVM IR. |
codegen unit | When we produce LLVM IR, we group the Rust code into a number of codegen units (sometimes abbreviated as CGUs). Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. (see more) |
completeness | A technical term in type theory, it means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness"). |
control-flow graph | A representation of the control-flow of a program; see the background chapter for more |
CTFE | Short for Compile-Time Function Evaluation, this is the ability of the compiler to evaluate const fn s at compile time. This is part of the compiler's constant evaluation system. (see more) |
cx | We tend to use "cx" as an abbreviation for context. See also tcx , infcx , etc. |
DAG | A directed acyclic graph is used during compilation to keep track of dependencies between queries. (see more) |
data-flow analysis | A static analysis that figures out what properties are true at each point in the control-flow of a program; see the background chapter for more. |
DefId | An index identifying a definition (see librustc_middle/hir/def_id.rs ). Uniquely identifies a DefPath . See the HIR chapter for more. |
Double pointer | A pointer with additional metadata. See "fat pointer" for more. |
drop glue | (internal) compiler-generated instructions that handle calling the destructors (Drop ) for data types. |
DST | Short for Dynamically-Sized Type, this is a type for which the compiler cannot statically know the size in memory (e.g. str or [u8] ). Such types don't implement Sized and cannot be allocated on the stack. They can only occur as the last field in a struct. They can only be used behind a pointer (e.g. &str or &[u8] ). |
early-bound lifetime | A lifetime region that is substituted at its definition site. Bound in an item's Generics and substituted using a Substs . Contrast with late-bound lifetime. (see more) |
empty type | see "uninhabited type". |
Fat pointer | A two word value carrying the address of some value, along with some further information necessary to put the value to use. Rust includes two kinds of "fat pointers": references to slices, and trait objects. A reference to a slice carries the starting address of the slice and its length. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value. "Fat pointers" are also known as "wide pointers", and "double pointers". |
free variable | A "free variable" is one that is not bound within an expression or term; see the background chapter for more |
generics | The set of generic type parameters defined on a type or item. |
HIR | The High-level IR, created by lowering and desugaring the AST. (see more) |
HirId | Identifies a particular node in the HIR by combining a def-id with an "intra-definition offset". See the HIR chapter for more. |
HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers. |
ICE | Short for internal compiler error, this is when the compiler crashes. |
ICH | Short for incremental compilation hash, these are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled. |
infcx | The inference context (see librustc_middle/infer).
inference variable | When doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type. See the inference sketch after this table.
intern | Interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a Symbol) rather than the data itself, to reduce memory usage and the number of allocations. See this chapter for more info, and the toy interner sketch after this table.
IR | Short for Intermediate Representation, a general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. |
IRLO | IRLO or irlo is sometimes used as an abbreviation for internals.rust-lang.org. |
item | A kind of "definition" in the language, such as a static, const, use statement, module, struct, etc. Concretely, this corresponds to the Item type. |
lang item | Items that represent concepts intrinsic to the language itself, such as special built-in traits like Sync and Send; or traits representing operations such as Add; or functions that are called by the compiler. (see more)
late-bound lifetime | A lifetime region that is substituted at its call site. Bound in a HRTB and substituted by specific functions in the compiler, such as liberate_late_bound_regions. Contrast with early-bound lifetime. (see more)
local crate | The crate currently being compiled. |
LTO | Short for Link-Time Optimizations, this is a set of optimizations offered by LLVM that occur just before the final binary is linked. These include optimizations like removing functions that are never used in the final program, for example. ThinLTO is a variant of LTO that aims to be a bit more scalable and efficient, but possibly sacrifices some optimizations. You may also read issues in the Rust repo about "FatLTO", which is the loving nickname given to non-Thin LTO. LLVM documentation: here and here. |
LLVM | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports. |
memoization | The process of storing the results of (pure) computations (such as pure function calls) to avoid having to repeat them in the future. This is typically a trade-off between execution speed and memory usage. See the memoization sketch after this table.
MIR | The Mid-level IR that is created after type-checking for use by borrowck and codegen. (see more) |
miri | An interpreter for MIR used for constant evaluation. (see more) |
monomorphization | The process of taking generic implementations of types and functions and instantiating them with concrete types. For example, in the code we might have Vec<T>, but in the final executable, we will have a copy of the Vec code for every concrete type used in the program (e.g. a copy for Vec<usize>, a copy for Vec<MyStruct>, etc). See the monomorphization sketch after this table.
normalize | A general term for converting to a more canonical form, but in the case of rustc typically refers to associated type normalization. |
newtype | A wrapper around some other type (e.g., struct Foo(T) is a "newtype" for T ). This is commonly used in Rust to give a stronger type for indices. |
NLL | Short for non-lexical lifetimes, this is an extension to Rust's borrowing system to make it based on the control-flow graph.
node-id or NodeId | An index identifying a particular node in the AST or HIR; gradually being phased out and replaced with HirId. See the HIR chapter for more.
obligation | Something that must be proven by the trait system. (see more) |
placeholder | A way of handling subtyping around "for-all" types (e.g., for<'a> fn(&'a u32)) as well as solving higher-ranked trait bounds (e.g., for<'a> T: Trait<'a>). NOTE: the term "skolemization" is deprecated in favor of "placeholder". See the chapter on placeholders and universes for more details.
point | Used in the NLL analysis to refer to some particular location in the MIR; typically used to refer to a node in the control-flow graph. |
polymorphize | An optimization that avoids unnecessary monomorphization. (see more)
projection | A general term for a "relative path", e.g. x.f is a "field projection", and T::Item is an "associated type projection". |
promoted constants | Constants extracted from a function and lifted to static scope; see this section for more details. |
provider | The function that executes a query. (see more) |
quantified | In math or logic, existential and universal quantification are used to ask questions like "is there any type T for which this property is true?" or "is this true for all types T?"; see the background chapter for more.
query | A sub-computation during compilation; the compiler memoizes query results, which is the basis of incremental compilation. (see more)
region | Another term for "lifetime" often used in the literature and in the borrow checker. |
rib | A data structure in the name resolver that keeps track of a single scope for names. (see more) |
sess | The compiler session, which stores global data used throughout compilation.
side tables | Because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node. |
sigil | Like a keyword but composed entirely of non-alphanumeric tokens. For example, & is a sigil for references. |
soundness | A technical term in type theory. Roughly, if a type system is sound, then a program that type-checks is type-safe; i.e. you can never (in safe Rust) force a value into a variable of the wrong type. (see "completeness").
span | A location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more. |
substs | The substitutions for a given generic type or item (e.g. the i32, u32 in HashMap<i32, u32>).
tcx | The "typing context", main data structure of the compiler. (see more) |
'tcx | The lifetime of the allocation arena. (see more) |
token | The smallest unit of parsing. Tokens are produced after lexing (see more). |
TLS | Thread-Local Storage. Variables may be defined so that each thread has its own copy (rather than all threads sharing the variable). This has some interactions with LLVM. Not all platforms support TLS. |
trait reference | The name of a trait along with a suitable set of input types/lifetimes. (see more)
trans | The code to translate MIR into LLVM IR. Renamed to codegen. |
ty | The internal representation of a type. (see more) |
UFCS | Short for Universal Function Call Syntax, this is an unambiguous syntax for calling a method. (see more) |
uninhabited type | A type which has no values. This is not the same as a ZST, which has exactly 1 value. An example of an uninhabited type is enum Foo {}, which has no variants and so can never be created. The compiler can treat code that deals with uninhabited types as dead code, since there is no such value to be manipulated. ! (the never type) is an uninhabited type. Uninhabited types are also called "empty types". See the sketch on pointer and type sizes after this table.
upvar | A variable captured by a closure from outside the closure. |
variance | Determines how changes to a generic type/lifetime parameter affect subtyping; for example, if T is a subtype of U, then Vec<T> is a subtype of Vec<U> because Vec is covariant in its generic parameter. See the background chapter for a more general explanation, and the variance chapter for how type checking handles variance; a short sketch also follows this table.
Wide pointer | A pointer with additional metadata. See "fat pointer" for more. |
ZST | Zero-Sized Type. A type whose values have size 0 bytes. Since 2^0 = 1, such types can have exactly one value. For example, () (unit) is a ZST. struct Foo; is also a ZST. The compiler can do some nice optimizations around ZSTs. See the sketch on pointer and type sizes after this table.
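Several of the entries above ("DST", "fat pointer", "uninhabited type", and "ZST") can be observed concretely with std::mem::size_of. The following is a minimal sketch; the types Never and Marker are invented for illustration:

```rust
use std::mem::size_of;

// An uninhabited ("empty") type: no variants, so no value of it can
// ever be constructed. It is also zero-sized.
enum Never {}

// A zero-sized type with exactly one value.
struct Marker;

fn main() {
    // A thin pointer is one word...
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    // ...while fat/wide pointers are two: a slice reference carries an
    // address plus a length, and a trait object carries an address plus
    // a pointer to a vtable.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&dyn std::fmt::Debug>(), 2 * size_of::<usize>());

    // ZSTs (and uninhabited types) occupy no memory at all.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(size_of::<Marker>(), 0);
    assert_eq!(size_of::<Never>(), 0);

    // str is a DST: its size is not statically known, so it can only
    // be used behind a pointer such as &str.
    let s: &str = "hello";
    assert_eq!(s.len(), 5);
}
```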
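The "inference variable" entry can also be seen from the user's side: until some use pins a type down, the compiler tracks it as an unknown to solve for. A small illustration:

```rust
fn main() {
    // Here the element type of `v` is not yet known; internally the
    // compiler represents it with an inference variable (think of the
    // type as Vec<?T>, with ?T the unknown to solve for).
    let mut v = Vec::new();

    // This call constrains the unknown: ?T = u32, so `v: Vec<u32>`.
    v.push(1u32);
    assert_eq!(v, vec![1u32]);
}
```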
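To make the "intern" entry concrete, here is a toy string interner. This is only a sketch of the idea (store each distinct string once, hand out a cheap copyable index), not the compiler's actual Symbol machinery:

```rust
use std::collections::HashMap;

// A cheap, copyable identifier standing in for an interned string.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing symbol for `s`, or stores `s` and mints a
    // new one. Each distinct string is stored exactly once.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    // Resolves a symbol back to the string data.
    fn get(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("foo");
    let b = interner.intern("foo");
    assert_eq!(a, b); // the same string yields the same symbol
    assert_eq!(interner.get(a), "foo");
}
```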
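Memoization, as defined above, is also easy to sketch. Threading an explicit cache parameter through the recursion is purely illustrative:

```rust
use std::collections::HashMap;

// A memoized Fibonacci: results of the pure computation are cached so
// each input is computed at most once, trading memory for speed.
fn fib(n: u64, cache: &mut HashMap<u64, u64>) -> u64 {
    if let Some(&v) = cache.get(&n) {
        return v; // reuse a previously computed result
    }
    let v = if n < 2 { n } else { fib(n - 1, cache) + fib(n - 2, cache) };
    cache.insert(n, v);
    v
}

fn main() {
    let mut cache = HashMap::new();
    // Without memoization this naive recursion would take ~2^40 steps.
    assert_eq!(fib(40, &mut cache), 102_334_155);
}
```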
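And a sketch of monomorphization as described above: one generic definition, from which the compiler emits a specialized copy per concrete type it is used with. The function largest is invented for the example:

```rust
// A single generic definition (assumes a non-empty slice)...
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // ...is instantiated once per concrete type used: roughly speaking,
    // the final binary contains one copy of `largest` specialized for
    // i32 and one for f64.
    assert_eq!(largest(&[1, 5, 3]), 5);
    assert_eq!(largest(&[1.0, 0.5]), 1.0);
}
```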
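Finally, a short variance sketch. In Rust, subtyping arises between lifetimes ('static is a subtype of any 'a), so the covariance of Vec in its parameter can be demonstrated directly; the function shorten is a made-up name:

```rust
// &'static str is a subtype of &'a str, and Vec<T> is covariant in T,
// so Vec<&'static str> is a subtype of Vec<&'a str> and the return
// below is accepted. If Vec were invariant, this would not compile.
fn shorten<'a>(v: Vec<&'static str>) -> Vec<&'a str> {
    v
}

fn main() {
    let names = shorten(vec!["alpha", "beta"]);
    assert_eq!(names.len(), 2);
}
```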
Appendix D: Code Index
rustc has a lot of important data structures. This is an attempt to give some guidance on where to learn more about some of the key data structures of the compiler.
Item | Kind | Short description | Chapter | Declaration |
---|---|---|---|---|
BodyId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/hir.rs |
Compiler | struct | Represents a compiler session and can be used to drive a compilation. | The Rustc Driver and Interface | src/librustc_interface/interface.rs |
ast::Crate | struct | A syntax-level representation of a parsed crate | The parser | src/librustc_ast/ast.rs |
rustc_hir::Crate | struct | A more abstract, compiler-friendly form of a crate's AST | The HIR | src/librustc_hir/hir.rs |
DefId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/def_id.rs |
DiagnosticBuilder | struct | A struct for building up compiler diagnostics, such as errors or lints | Emitting Diagnostics | src/librustc_errors/diagnostic_builder.rs |
DocContext | struct | A state container used by rustdoc when crawling through a crate to gather its documentation | Rustdoc | src/librustdoc/core.rs |
HirId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/hir_id.rs |
NodeId | struct | One of four types of HIR node identifiers. Being phased out | Identifiers in the HIR | src/librustc_ast/ast.rs |
P | struct | An owned immutable smart pointer. By contrast, &T is not owned, and Box<T> is not immutable. | None | src/librustc_ast/ptr.rs |
ParamEnv | struct | Information about generic parameters or Self , useful for working with associated or generic items | Parameter Environment | src/librustc_middle/ty/mod.rs |
ParseSess | struct | This struct contains information about a parsing session | The parser | src/librustc_session/parse.rs |
Query | struct | Represents the result of a query to the Compiler interface, and allows stealing, borrowing, and returning the results of compiler passes. | The Rustc Driver and Interface | src/librustc_interface/queries.rs |
Rib | struct | Represents a single scope of names | Name resolution | src/librustc_resolve/lib.rs |
Session | struct | The data associated with a compilation session | The parser, The Rustc Driver and Interface | src/librustc_session/session.rs |
SourceFile | struct | Part of the SourceMap . Maps AST nodes to their source code for a single source file. Was previously called FileMap | The parser | src/librustc_span/lib.rs |
SourceMap | struct | Maps AST nodes to their source code. It is composed of SourceFile s. Was previously called CodeMap | The parser | src/librustc_span/source_map.rs |
Span | struct | A location in the user's source code, used for error reporting primarily | Emitting Diagnostics | src/librustc_span/span_encoding.rs |
StringReader | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | The parser | src/librustc_parse/lexer/mod.rs |
rustc_ast::token_stream::TokenStream | struct | An abstract sequence of tokens, organized into TokenTree s | The parser, Macro expansion | src/librustc_ast/tokenstream.rs |
TraitDef | struct | This struct contains a trait's definition with type information | The ty modules | src/librustc_middle/ty/trait_def.rs |
TraitRef | struct | The combination of a trait and its input types (e.g. P0: Trait<P1...Pn>) | Trait Solving: Goals and Clauses, Trait Solving: Lowering impls | src/librustc_middle/ty/sty.rs |
Ty<'tcx> | struct | This is the internal representation of a type used for type checking | Type checking | src/librustc_middle/ty/mod.rs |
TyCtxt<'tcx> | struct | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries | The ty modules | src/librustc_middle/ty/context.rs |
Compiler Lecture Series
These are videos where various experts explain different parts of the compiler:
- Tom Tromey discusses debugging support in rustc
- How Salsa Works (2019.01)
- Salsa In More Depth (2019.01)
- RLS 2.0, Salsa, and Name Resolution
- Cranelift
- Rust analyzer guide
- Rust analyzer syntax trees
- rust-analyzer type-checker overview by flodiebold
- oli-obk on miri and constant evaluation
- Polonius-rustc walkthrough
- rustc-chalk integration overview
- Coherence in Chalk by Sunjay Varma - Bay Area Rust Meetup
- How the chalk-engine crate works
- How the chalk-engine crate works 2
- RFC #2229 Disjoint Field Capture plan
- closures and upvar capture
- blitzerr closure upvar tys
- Convert Closure Upvar Representation to Tuples with blitzerr
- async-await implementation plans
- async-await region inferencer
- Universes and Lifetimes
- Representing types in rustc
- Polonius WG: Initialization and move tracking
Rust Bibliography
This is a reading list of material relevant to Rust. It includes prior research that has - at one time or another - influenced the design of Rust, as well as publications about Rust.
Type system
- Region based memory management in Cyclone
- Safe manual memory management in Cyclone
- Making ad-hoc polymorphism less ad hoc
- Macros that work together
- Traits: composable units of behavior
- Alias burying - We tried something similar and abandoned it.
- External uniqueness is unique enough
- Uniqueness and Reference Immutability for Safe Parallelism
- Region Based Memory Management
Concurrency
- Singularity: rethinking the software stack
- Language support for fast and reliable message passing in singularity OS
- Scheduling multithreaded computations by work stealing
- Thread scheduling for multiprogramming multiprocessors
- The data locality of work stealing
- Dynamic circular work stealing deque - The Chase/Lev deque
- Work-first and help-first scheduling policies for async-finish task parallelism - More general than fully-strict work stealing
- A Java fork/join calamity - critique of Java's fork/join library, particularly its application of work stealing to non-strict computation
- Scheduling techniques for concurrent systems
- Contention aware scheduling
- Balanced work stealing for time-sharing multicores
- Three layer cake for shared-memory programming
- Non-blocking steal-half work queues
- Reagents: expressing and composing fine-grained concurrency
- Algorithms for scalable synchronization of shared-memory multiprocessors
- Epoch-based reclamation.
Others
- Crash-only software
- Composing High-Performance Memory Allocators
- Reconsidering Custom Memory Allocation
Papers about Rust
- GPU Programming in Rust: Implementing High Level Abstractions in a Systems Level Language. Early GPU work by Eric Holk.
- Parallel closures: a new twist on an old idea - not exactly about Rust, but by nmatsakis
- Patina: A Formalization of the Rust Programming Language. Early formalization of a subset of the type system, by Eric Reed.
- Experience Report: Developing the Servo Web Browser Engine using Rust. By Lars Bergstrom.
- Implementing a Generic Radix Trie in Rust. Undergrad paper by Michael Sproul.
- Reenix: Implementing a Unix-Like Operating System in Rust. Undergrad paper by Alex Light.
- Evaluation of performance and productivity metrics of potential programming languages in the HPC environment. Bachelor's thesis by Florian Wilkens. Compares C, Go and Rust.
- Nom, a byte oriented, streaming, zero copy, parser combinators library in Rust. By Geoffroy Couprie, research for VLC.
- Graph-Based Higher-Order Intermediate Representation. An experimental IR implemented in Impala, a Rust-like language.
- Code Refinement of Stencil Codes. Another paper using Impala.
- Parallelization in Rust with fork-join and friends. Linus Farnstrand's master's thesis.
- Session Types for Rust. Philip Munksgaard's master's thesis. Research for Servo.
- Ownership is Theft: Experiences Building an Embedded OS in Rust - Amit Levy, et al.
- You can't spell trust without Rust. Alexis Beingessner's master's thesis.
- Rust-Bio: a fast and safe bioinformatics library. By Johannes Köster.
- Safe, Correct, and Fast Low-Level Networking. Robert Clipsham's master's thesis.
- Formalizing Rust traits. Jonatan Milewski's master's thesis.
- Rust as a Language for High Performance GC Implementation
- Simple Verification of Rust Programs via Functional Purification. Sebastian Ullrich's master's thesis.
- Writing parsers like it is 2017. Pierre Chifflier and Geoffroy Couprie, for the Langsec Workshop.
- The Case for Writing a Kernel in Rust
- RustBelt: Securing the Foundations of the Rust Programming Language
Humor in Rust
What's a project without a sense of humor? And frankly, some of these are enlightening!