About this guide

This guide is meant to help document how rustc – the Rust compiler – works, and to help new contributors get involved in rustc development.

This guide is split into six parts:

  1. Building, debugging, and contributing to rustc: contains information that should be useful no matter how you plan to contribute, such as the general contribution process and how to build the compiler.
  2. High-level compiler architecture: discusses the high-level architecture of the compiler and the stages of the compilation process.
  3. Source code representation: describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily.
  4. Analysis: discusses the analyses that the compiler uses to check various properties of the code and inform later stages of the compilation process (e.g., type checking).
  5. From MIR to binaries: how linked executable machine code is generated.
  6. Appendices: a series of useful reference information, including a glossary.

The guide itself is, of course, also open source, and the sources for this book can be found in the GitHub repository. If you find any mistakes in the book, please open an issue, or, better yet, open a PR with a correction!

Other information

The following sites may also be useful to you:

  • rustc API docs -- the rustdoc documentation for the compiler
  • Forge -- contains documentation about rust infrastructure, team procedures, and more
  • compiler-team -- the home base of the rust compiler team, with a description of the team's procedures, active working groups, and the team calendar.

Part 1: Building, debugging, and contributing to rustc

This part of the rustc-dev-guide contains knowledge that is useful regardless of which part of the compiler you are working on. This includes both technical information and tips (e.g., how to compile and debug the compiler) and information about procedures in the Rust project (e.g., stability and information about the compiler team).

About the compiler team

rustc is maintained by the Rust compiler team. The people who belong to this team collectively work to track regressions and implement new features. Members of the Rust compiler team are people who have made significant contributions to rustc and its design.

Discussion

Currently the compiler team chats in 2 places:

Expert map

If you're interested in figuring out who can answer questions about a particular part of the compiler, or you'd just like to know who works on what, check out our experts directory. It contains a listing of the various parts of the compiler and a list of people who are experts on each one.

Rust compiler meeting

The compiler team has a weekly meeting where we do triage and try to generally stay on top of new bugs, regressions, and other things. They are held on Zulip. It works roughly as follows:

  • Review P-high bugs: P-high bugs are those that are sufficiently important for us to actively track progress. P-high bugs should ideally always have an assignee.
  • Look over new regressions: we then look for new cases where the compiler broke previously working code in the wild. Regressions are almost always marked as P-high; the major exception would be bug fixes (though even there we often aim to give warnings first).
  • Check I-nominated issues: These are issues where feedback from the team is desired.
  • Check for beta nominations: These are nominations of things to backport to beta.
  • Possibly WG checking: A WG may give an update at this point, if there is time.

The meeting currently takes place on Thursdays at 10am Boston time (UTC-4 typically, but daylight savings time sometimes makes things complicated).

The meeting is held over a "chat medium", currently on Zulip.

Team membership

Membership in the Rust team is typically offered when someone has been making significant contributions to the compiler for some time. Membership is both a recognition and an obligation: compiler team members are generally expected to help with upkeep as well as doing reviews and other work.

If you are interested in becoming a compiler team member, the first thing to do is to start fixing some bugs, or get involved in a working group. One good way to find bugs is to look for open issues tagged with E-easy or E-mentor.

r+ rights

Once you have made a number of individual PRs to rustc, we will often offer r+ privileges. This means that you have the right to instruct "bors" (the robot that manages which PRs get landed into rustc) to merge a PR (here are some instructions for how to talk to bors).

The guidelines for reviewers are as follows:

  • You are always welcome to review any PR, regardless of who it is assigned to. However, do not r+ PRs unless:
    • You are confident in that part of the code.
    • You are confident that nobody else wants to review it first.
      • For example, sometimes people will express a desire to review a PR before it lands, perhaps because it touches a particularly sensitive part of the code.
  • Always be polite when reviewing: you are a representative of the Rust project, so it is expected that you will go above and beyond when it comes to the Code of Conduct.

high-five

Once you have r+ rights, you can also be added to the high-five rotation. high-five is the bot that assigns incoming PRs to reviewers. If you are added, you will be randomly selected to review PRs. If you find you are assigned a PR that you don't feel comfortable reviewing, you can also leave a comment like r? @so-and-so to assign to someone else — if you don't know who to request, just write r? @nikomatsakis for reassignment and @nikomatsakis will pick someone for you.

Getting on the high-five list is much appreciated as it lowers the review burden for all of us! However, if you don't have time to give people timely feedback on their PRs, it may be better that you don't get on the list.

Full team membership

Full team membership is typically extended once someone has made many contributions to the Rust compiler over time, ideally (but not necessarily) to multiple areas. Sometimes this might be implementing a new feature, but it is also important — perhaps more important! — to have time and willingness to help out with general upkeep such as bugfixes, tracking regressions, and other less glamorous work.

How to build and run the compiler

The compiler is built using a tool called x.py. You will need to have Python installed to run it. Before that, though, if you intend to modify rustc's code, you should tweak the compiler's configuration, because the default configuration is geared towards users rather than developers.

Create a config.toml

To start, copy config.toml.example to config.toml:

> cd $RUST_CHECKOUT
> cp config.toml.example config.toml

Then you will want to open up the file and change the following settings (and possibly others, such as llvm.ccache, depending on your needs):

[llvm]
# Enables LLVM assertions, which will check that the LLVM bitcode generated
# by the compiler is internally consistent. These are particularly helpful
# if you edit `codegen`.
assertions = true

[rust]
# This will make your build more parallel; it costs a bit of runtime
# performance perhaps (less inlining) but it's worth it.
codegen-units = 0

# This enables full debuginfo and debug assertions. The line debuginfo is also
# enabled by `debuginfo-level = 1`. Full debuginfo is also enabled by
# `debuginfo-level = 2`. Debug assertions can also be enabled with
# `debug-assertions = true`. Note that `debug = true` will make your build
# slower, so you may want to try individually enabling debuginfo and assertions
# or enable only line debuginfo which is basically free.
debug = true

If you have already built rustc, you may have to run rm -rf build for the configuration changes to take effect. Note that ./x.py clean does not cause a rebuild of LLVM, so if your configuration change affects LLVM, you will need to manually rm -rf build/ before rebuilding.

What is x.py?

x.py is the script used to orchestrate the various builds in the rustc repository. It can build docs, run tests, and compile rustc. It replaces the old makefiles and is now the preferred way to build rustc. Below are the common ways to use x.py to get things done effectively.

This chapter focuses on the basics needed to be productive, but if you want to learn more about x.py, read its README.md here.

Bootstrapping

One thing to keep in mind is that rustc is a bootstrapping compiler. That is, since rustc is written in Rust, we need to use an older version of the compiler to compile the newer version. In particular, the newer version of the compiler, together with some of the components needed to build it such as libstd and other tooling, may internally use some unstable features, and so they require a specific version of the compiler that can use those unstable features.

As a result, compiling rustc is done in stages:

  • Stage 0: the stage0 compiler is usually the current beta rustc compiler and its associated dynamic libraries (you can also configure x.py to use a different compiler). This stage0 compiler is then used only to compile rustbuild, std, and rustc.

    When compiling rustc, this stage0 compiler uses the freshly compiled std.

    There are two concepts at play here: a compiler (along with its set of dependencies) and its "target" or "object" libraries (std and rustc). Both are present at this stage, but in an interleaved fashion.

  • Stage 1: the code in your checkout is then compiled with the stage0 compiler to produce the new stage1 compiler.

    However, it was built with an older compiler (stage0), so to get an optimized stage1 compiler we need to move on to the next stage.

    • In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences.

      In particular, the compiler used in stage1 was itself built by the stage0 compiler, not by the sources in your working directory.

      This means that the symbol names used in the compiler's source code may not match the symbol names that the stage1 compiler would generate.

      This matters when using dynamic linking (e.g., code using derives). It sometimes means that some tests don't work when run with the stage1 compiler.

  • Stage 2: we rebuild the compiler with the stage1 compiler from the previous stage to produce the stage2 compiler, with all the latest optimizations. (By default, we copy the stage1 libraries for the stage2 compiler to use, since they should be identical.)

  • (Optional) Stage 3: to sanity check our new compiler, we can use the stage2 compiler to build the libraries. The result should be identical to before, unless something has broken. To learn more about the bootstrapping process, read this chapter.

Building the compiler

To build the compiler in full, run ./x.py build. This will go through the whole bootstrapping process described above and produce a usable compiler toolchain from your sources. This takes a long time, so it is not usually what you actually want to run (more on this later).

There are many flags you can pass to the build command of x.py that can reduce compile times or fit other things you might need to change. They are:

Options:
    -v, --verbose       use verbose output (-vv for very verbose)
    -i, --incremental   use incremental compilation
        --config FILE   TOML configuration file for build
        --build BUILD   build target of the stage0 compiler
        --host HOST     host targets to build
        --target TARGET target targets to build
        --on-fail CMD   command to run on failure
        --stage N       stage to build
        --keep-stage N  stage to keep without recompiling
        --src DIR       path to the root of the rust checkout
    -j, --jobs JOBS     number of jobs to run in parallel
    -h, --help          print this help message

For hacking, often building the stage 1 compiler is enough, but for final testing and release, the stage 2 compiler is used.

./x.py check is a fast way to build the rust compiler. It is particularly useful when you are doing some kind of "type-based refactoring", like renaming a method or changing the signature of some function.

Once you've created a config.toml, you are now ready to run x.py. There are a lot of options here, but let's start with what is probably the best "go-to" command for building a local rust:

./x.py build -i --stage 1 src/libstd

This may look like it only builds libstd, but that is not the case. What this command does is the following:

  • Build libstd with the stage0 compiler (using incremental)
  • Build librustc with the stage0 compiler (using incremental)
    • This produces the stage1 compiler
  • Build libstd with the stage1 compiler (cannot use incremental)

This final product (the stage1 compiler plus the libs built with that compiler) is what you need to build other rust programs (unless you use #![no_std] or #![no_core]).

The command includes the -i switch, which enables incremental compilation. This will be used to speed up the first two steps of the process: in particular, if you make a small change, we ought to be able to use your old results to make producing the stage1 compiler faster.

Unfortunately, incremental cannot be used to speed up making the stage1 libraries. That is because incremental only works when you run the same compiler twice in a row. In this case, we are building a new stage1 compiler every time, so the old incremental results may not apply. You may find that building the stage1 libstd is a bottleneck for you -- but fear not, there is a (hacky) workaround. See the section on "Recommended workflows" below.

Note that this whole command just gives you a subset of the full rustc build. The full rustc build (what you get if you run ./x.py build) has quite a few more steps:

  • Build librustc and rustc with the stage1 compiler.
    • The resulting compiler here is the stage2 compiler.
  • Build libstd with the stage2 compiler.
  • Build librustdoc and a bunch of other things with the stage2 compiler.

Build specific components

Build only the libcore library

./x.py build src/libcore

Build the libcore and libproc_macro libraries only

./x.py build src/libcore src/libproc_macro

Build only libcore up to Stage 1

./x.py build src/libcore --stage 1

Sometimes you might just want to test if the part you're working on can compile. Using these commands, you can test that it compiles before doing a bigger build to make sure it works with the compiler. As shown before, you can also pass flags at the end such as --stage.

Creating a rustup toolchain

Once you have successfully built rustc, you will have created a bunch of files in your build directory. In order to actually run the resulting rustc, we recommend creating two rustup toolchains. The first will run the stage1 compiler (the result of the build above). The second will execute the stage2 compiler (which we did not build, but which you will likely need to build at some point; for example, if you want to run the entire test suite).

rustup toolchain link stage1 build/<host-triple>/stage1
rustup toolchain link stage2 build/<host-triple>/stage2

The <host-triple> would typically be one of the following:

  • Linux: x86_64-unknown-linux-gnu
  • Mac: x86_64-apple-darwin
  • Windows: x86_64-pc-windows-msvc

Now you can run the rustc you built. If you run it with -vV, you should see a version number ending in -dev, indicating a build from your local environment:

$ rustc +stage1 -vV
rustc 1.25.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.25.0-dev
LLVM version: 4.0

Other x.py commands

Here are a few other useful x.py commands. We'll cover some of them in detail in other sections:

  • Building things:
    • ./x.py clean – cleans up the build directory (rm -rf build works too, but then you have to rebuild LLVM)
    • ./x.py build --stage 1 – builds everything using the stage 1 compiler, not just up through libstd
    • ./x.py build – builds the stage2 compiler
  • Running tests (see the section on running tests):
    • ./x.py test --stage 1 src/libstd – runs the #[test] tests from libstd

    • ./x.py test --stage 1 src/test/ui – runs the ui test suite

    • ./x.py test --stage 1 src/test/ui/const-generics - runs the tests in the const-generics/ subdirectory of the ui test suite

    • ./x.py test --stage 1 src/test/ui/const-generics/const-types.rs

      • runs the single test src/test/ui/const-generics/const-types.rs

Cleaning out build directories

Sometimes you need to start fresh, but this is normally not the case. If you feel the need to do this, rustbuild is most likely not acting right, and you should file a bug as to what is going wrong. If you do need to clean everything up, you only need to run one command!

./x.py clean

Recommended workflows

The full bootstrapping process takes quite a while. Here are three suggestions to make your life easier.

Check, check, and check again

The first workflow, which is useful when doing simple refactorings, is to run ./x.py check continuously. Here you are just checking that the compiler can build, but often that is all you need (e.g., when renaming a method). You can then run ./x.py build when you actually need to run tests.

In fact, it is sometimes useful to put off tests even when you are not 100% sure the code will work. You can then keep building up refactoring commits and only run the tests at some later time. You can then use git bisect to track down precisely which commit caused the problem. A nice side-effect of this style is that you are left with a fairly fine-grained set of commits at the end, all of which build and pass tests. This often helps reviewing.

Incremental builds with --keep-stage

Sometimes just checking whether the compiler builds is not enough. A common example is that you need to add a debug! statement to inspect the value of some state or better understand the problem. In that case, you really do need a full build. By leveraging incremental, though, you can often get these builds to complete very quickly (e.g., around 30 seconds). The only catch is that this requires a bit of fudging and may produce compilers that don't work (but that is easily detected and fixed).

The sequence of commands you want is as follows:

  • Initial build: ./x.py build -i --stage 1 src/libstd
    • As documented above, this will run all the stage0 commands, including building a stage1 compiler and the libstd compatible with it, as well as the first few steps of the "stage 1 actions", up to "stage1 (sysroot stage1) builds libstd".
  • Subsequent builds: ./x.py build -i --stage 1 src/libstd --keep-stage 1
    • Note that we added the --keep-stage 1 flag here

As mentioned, the effect of --keep-stage 1 is that we just assume that the old standard library can be re-used. If you are editing the compiler, this is almost always true: you haven't changed the standard library, after all. But sometimes it's not true: for example, if you are editing the "metadata" part of the compiler, which controls how the compiler encodes types and other states into the rlib files, or if you are editing things that wind up in the metadata (such as the definition of the MIR).

The TL;DR is that you might get weird behavior from a compile when using --keep-stage 1 -- for example, strange ICEs or other panics. In that case, simply remove --keep-stage 1 from the command and rebuild. That ought to fix the problem.

Building with a system LLVM

By default, LLVM is built from source, and that can take a significant amount of time. An alternative is to use LLVM already installed on your computer.

config.toml中的 target一节进行配置:

[target.x86_64-unknown-linux-gnu]
llvm-config = "/path/to/llvm/llvm-7.0.1/bin/llvm-config"

We have observed the following paths before; they may be different from what you see on your system:

  • /usr/bin/llvm-config-8
  • /usr/lib/llvm-8/bin/llvm-config

Note that you need to have the LLVM FileCheck tool installed, which is used for codegen tests. This tool is normally built with LLVM, but if you use your own preinstalled LLVM, you will need to provide FileCheck in some other way. On Debian-based systems, you can install the llvm-N-tools package (where N is the LLVM version number, e.g. llvm-8-tools). Alternately, you can specify the path to FileCheck with the llvm-filecheck config item in config.toml, or you can disable codegen tests with the codegen-tests item in config.toml.
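
For context, a codegen test is an ordinary Rust file whose comments carry FileCheck directives that are matched against the LLVM IR the compiler emits. The following is a hypothetical sketch, not an actual test from the tree; the exact IR patterns are illustrative:

// compile-flags: -O
#![crate_type = "lib"]

// FileCheck matches the CHECK comments below against the LLVM IR
// that rustc emits for this file.
// CHECK-LABEL: @add_one
#[no_mangle]
pub fn add_one(x: i32) -> i32 {
    // CHECK: add
    x + 1
}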

Bootstrapping the compiler

This subchapter is about the bootstrapping process.

When running x.py you will see output such as:

Building stage0 std artifacts
Copying stage0 std from stage0
Building stage0 compiler artifacts
Copying stage0 rustc from stage0
Building LLVM for x86_64-apple-darwin
Building stage0 codegen artifacts
Assembling stage1 compiler
Building stage1 std artifacts
Copying stage1 std from stage1
Building stage1 compiler artifacts
Copying stage1 rustc from stage1
Building stage1 codegen artifacts
Assembling stage2 compiler
Uplifting stage1 std
Copying stage2 std from stage1
Generating unstable book md files
Building stage0 tool unstable-book-gen
Building stage0 tool rustbook
Documenting standalone
Building rustdoc for stage2
Documenting book redirect pages
Documenting stage2 std
Building rustdoc for stage1
Documenting stage2 whitelisted compiler
Documenting stage2 compiler
Documenting stage2 rustdoc
Documenting error index
Uplifting stage1 rustc
Copying stage2 rustc from stage1
Building stage2 tool error_index_generator

A deeper look into the phases of x.py can be seen here:

A diagram of the rustc compilation phases

Keep in mind this diagram is a simplification: rustdoc can be built at different stages, and the process is a bit different when passing flags such as --keep-stage, or if there are non-host targets.

The following tables show the outputs of the various stage actions:

Stage 0 action                              | Output
beta extracted                              | build/HOST/stage0
stage0 builds bootstrap                     | build/bootstrap
stage0 builds libstd                        | build/HOST/stage0-std/TARGET
copy stage0-std (HOST only)                 | build/HOST/stage0-sysroot/lib/rustlib/HOST
stage0 builds rustc with stage0-sysroot     | build/HOST/stage0-rustc/HOST
copy stage0-rustc (except executable)       | build/HOST/stage0-sysroot/lib/rustlib/HOST
build llvm                                  | build/HOST/llvm
stage0 builds codegen with stage0-sysroot   | build/HOST/stage0-codegen/HOST
stage0 builds rustdoc with stage0-sysroot   | build/HOST/stage0-tools/HOST

--stage=0 stops here.

Stage 1 action                                  | Output
copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin
copy (uplift) stage0-codegen to stage1          | build/HOST/stage1/lib
copy (uplift) stage0-sysroot to stage1          | build/HOST/stage1/lib
stage1 builds libstd                            | build/HOST/stage1-std/TARGET
copy stage1-std (HOST only)                     | build/HOST/stage1/lib/rustlib/HOST
stage1 builds rustc                             | build/HOST/stage1-rustc/HOST
copy stage1-rustc (except executable)           | build/HOST/stage1/lib/rustlib/HOST
stage1 builds codegen                           | build/HOST/stage1-codegen/HOST

--stage=1 stops here.

Stage 2 action                        | Output
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin
copy (uplift) stage1-sysroot          | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST
stage2 builds libstd (except HOST?)   | build/HOST/stage2-std/TARGET
copy stage2-std (non-HOST targets)    | build/HOST/stage2/lib/rustlib/TARGET
stage2 builds rustdoc                 | build/HOST/stage2-tools/HOST
copy rustdoc                          | build/HOST/stage2/bin

--stage=2 stops here.

Note the convention that x.py uses:

  • A "stage N artifact" is an artifact that is produced by the stage N compiler.
  • The "stage (N+1) compiler" is assembled from "stage N artifacts".
  • A --stage N flag means build with stage N.

In short, stage 0 uses the stage0 compiler to create stage0 artifacts, which will later be uplifted to become the stage1 compiler.

Every time any of the main artifacts (std and rustc) are compiled, two steps are performed. When std is compiled by a stage N compiler, that std will be linked to programs built by the stage N compiler (including the rustc built later on). It will also be used by the stage (N+1) compiler to link against itself. This is somewhat intuitive if one thinks of the stage (N+1) compiler as "just" another program we are building with the stage N compiler. In some ways, rustc (the binary, not the rustbuild step) could be thought of as one of the few no_core binaries out there.

So the "stage0 std artifacts" are in fact the output of the downloaded stage0 compiler, and will be used for anything built by the stage0 compiler: e.g. the rustc artifacts. When it announces that it is "building stage1 std artifacts", it has moved on to the next bootstrapping phase. This pattern continues in latter stages.

Also note that building host std and target std are different based on the stage (e.g. see in the table how stage2 only builds non-host std targets). This is because during stage2, the host std is uplifted from the stage 1 std -- in particular, when "Building stage 1 artifacts" is announced, it is later copied into stage2 as well (both the compiler's libdir and the sysroot).

This std is pretty much necessary for any useful work with the compiler. Specifically, it's used as the std for programs compiled by the newly compiled compiler (so when you compile fn main() {}, it is linked to the last std compiled with x.py build --stage 1 src/libstd).

The rustc generated by the stage0 compiler is linked to the freshly-built libstd, which means that for the most part only std needs to be cfg-gated, so that rustc can use features added to std immediately after they are added, without waiting for them to reach the downloaded beta. The libstd built by the stage1/bin/rustc compiler, also known as the "stage1 std artifacts", is not necessarily ABI-compatible with that compiler. That is, the rustc binary most likely could not use this std itself. It is, however, ABI-compatible with any programs that the stage1/bin/rustc binary builds (including itself), so in that sense they're paired.

This is also where --keep-stage 1 src/libstd comes into play. Since most changes to the compiler don't actually change the ABI, once you've produced a libstd in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, and there is no need to spend time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.

The reason we first build std, and then rustc, is largely just because we want to minimize cfg(stage0) in the code for rustc. Currently rustc is always linked against a "new" std, so it doesn't ever need to be concerned with differences in std; it can assume that std is as fresh as possible.

The reason we need to build it twice is because of ABI compatibility. The beta compiler has its own ABI, and then the stage1/bin/rustc compiler will produce programs/libraries with the new ABI. We used to build three times, but because we assume that the ABI is constant within a codebase, we presume that the libraries produced by the "stage2" compiler (produced by the stage1/bin/rustc compiler) are ABI-compatible with the libraries produced by the stage1/bin/rustc compiler. This means we can skip the final compilation -- and simply use the same libraries as the stage2/bin/rustc compiler uses itself.

This stage2/bin/rustc compiler is shipped to end users, along with the stage 1 {std, rustc} artifacts.

Environment variables

During bootstrapping, many compiler-internal environment variables are used. If you are trying to run an intermediate version of rustc, you will sometimes need to set some of these environment variables manually. Otherwise, you will get an error such as the following:

thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', src/libcore/result.rs:1165:5

If ./stageN/bin/rustc gives an error about environment variables, that usually means something is quite wrong -- or you are trying to compile e.g. librustc or libstd or something that depends on environment variables. In the rare case that you actually need to invoke rustc in such a situation, you can find the environment variable values by adding the following flag to your x.py command: --on-fail=print-env.
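
For intuition, the panic shown above is what you get when a program reads a required variable that x.py normally sets. A minimal sketch of the pattern, not the actual rustc source:

use std::env;

fn main() {
    // x.py sets variables like RUSTC_STAGE for the compiler shim; if you
    // launch the binary directly they are missing, and this panics with
    // a message like `RUSTC_STAGE was not set: NotPresent`.
    let stage = env::var("RUSTC_STAGE").expect("RUSTC_STAGE was not set");
    println!("running stage {}", stage);
}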

Build distribution artifacts

You might want to build and package up the compiler for distribution. You’ll want to run this command to do it:

./x.py dist

Install distribution artifacts

If you’ve built a distribution artifact you might want to install it and test that it works on your target system. You’ll want to run this command:

./x.py install

Note: If you are testing out a modification to the compiler, you might want to use it to compile some project. Usually, you do not want to use ./x.py install for testing. Rather, you should create a toolchain as discussed here.

For example, if the toolchain you created is called foo, you would then invoke it with rustc +foo ... (where ... represents the rest of the arguments).

Documenting rustc

You might want to build documentation of the various components available, like the standard library. There are two ways to go about this. You can run rustdoc directly on the file to make sure the HTML is correct, which is fast. Alternatively, you can build the documentation as part of the build process through x.py. Both are viable methods since documentation is more about the content.

Document everything

./x.py doc

If you want to avoid the whole Stage 2 build

./x.py doc --stage 1

First the compiler and rustdoc get built to make sure everything is okay and then it documents the files.

Document specific components

./x.py doc src/doc/book
./x.py doc src/doc/nomicon
./x.py doc src/doc/book src/libstd

Much like individual tests or building certain components you can build only the documentation you want.

Document internal rustc items

Compiler documentation is not built by default. To enable it, modify config.toml:

[build]
compiler-docs = true

Note that when enabled, documentation for internal compiler items will also be built.

Compiler Documentation

The documentation for the rust components is found at rustc doc.

ctags

One of the challenges with rustc is that the RLS can't handle it, since it's a bootstrapping compiler. This makes code navigation difficult. One solution is to use ctags.

ctags has a long history and several variants. Exuberant Ctags seems to be quite commonly distributed but it does not have out-of-box Rust support. Some distributions seem to use Universal Ctags, which is a maintained fork and does have built-in Rust support.

The following script can be used to set up Exuberant Ctags: https://github.com/nikomatsakis/rust-etags.

ctags integrates into emacs and vim quite easily. The following can then be used to build and generate tags:

$ rust-ctags src/lib* && ./x.py build <something>

This allows you to do "jump-to-def" with whatever functions were around when you last built, which is ridiculously useful.

The compiler testing framework

The Rust project runs a wide variety of different tests, orchestrated by the build system (x.py test). The main test harness for testing the compiler itself is a tool called compiletest (located in the src/tools/compiletest directory). This section gives a brief overview of how the testing framework is set up, and then gets into some of the details on how to run tests as well as how to add new tests.

Compiletest test suites

The compiletest tests are located in the tree in the src/test directory. Immediately within you will see a series of subdirectories (e.g. ui, run-make, and so forth). Each of those directories is called a test suite – they house a group of tests that are run in a distinct mode.

Here is a brief summary of the test suites and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.

  • ui – tests that check the exact stdout/stderr from compilation and/or running the test
  • run-pass-valgrind – tests that ought to run with valgrind
  • run-fail – tests that are expected to compile but then panic during execution
  • compile-fail – tests that are expected to fail compilation.
  • parse-fail – tests that are expected to fail to parse
  • pretty – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
  • debuginfo – tests that run in gdb or lldb and query the debug info
  • codegen – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect. See LLVM docs for how to write such tests.
  • assembly – similar to codegen tests, but verifies assembly output to make sure LLVM target backend can handle provided code.
  • mir-opt – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect.
  • incremental – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations.
  • run-make – tests that basically just execute a Makefile; the ultimate in flexibility but quite annoying to write.
  • rustdoc – tests for rustdoc, making sure that the generated files contain the expected documentation.
  • *-fulldeps – same as above, but indicates that the test depends on things other than libstd (and hence those things must be built)

Other Tests

The Rust build system handles running tests for various other things, including:

  • Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.

    Example: ./x.py test tidy

  • Formatting – Rustfmt is integrated with the build system to enforce uniform style across the compiler. In the CI, we check that the formatting is correct. The formatting check is also automatically run by the Tidy tool mentioned above.

    Example: ./x.py fmt --check checks formatting and exits with an error if formatting is needed.

    Example: ./x.py fmt runs rustfmt on the codebase.

    Example: ./x.py test tidy --bless does formatting before doing other tidy checks.

  • Unit tests – The Rust standard library and many of the Rust packages include typical Rust #[test] unit tests. Under the hood, x.py will run cargo test on each package to run all the tests (a minimal example follows this list).

    Example: ./x.py test src/libstd

  • Doc tests – Example code embedded within Rust documentation is executed via rustdoc --test. Examples:

    ./x.py test src/doc – Runs rustdoc --test for all documentation in src/doc.

    ./x.py test --doc src/libstd – Runs rustdoc --test on the standard library.

  • Link checker – A small tool for verifying href links within documentation.

    Example: ./x.py test src/tools/linkchecker

  • Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.

    Example: ./x.py test distcheck

  • Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running cargo test within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.

  • Cargo test – This is a small tool which runs cargo test on a few significant projects (such as servo, ripgrep, tokei, etc.) just to ensure there aren't any significant regressions.

    Example: ./x.py test src/tools/cargotest
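
As a reminder of what the unit tests mentioned above look like, here is a minimal, self-contained example of the #[test] style used throughout the tree (illustrative, not an actual libstd test):

// Run with `cargo test` (or, in-tree, via `./x.py test` on the
// containing package).
#[cfg(test)]
mod tests {
    #[test]
    fn checked_add_returns_none_on_overflow() {
        assert_eq!(255u8.checked_add(1), None);
        assert_eq!(254u8.checked_add(1), Some(255));
    }
}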

Testing infrastructure

When a Pull Request is opened on Github, Azure Pipelines will automatically launch a build that will run all tests on some configurations (x86_64-gnu-llvm-6.0 linux, x86_64-gnu-tools linux, mingw-check linux). In essence, it runs ./x.py test after building for each of them.

The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using Azure Pipelines (currently over 50 different configurations). Most platforms only run the build steps, some run a restricted set of tests, only a subset run the full suite of tests (see Rust's platform tiers).

Testing with Docker images

The Rust tree includes Docker image definitions for the platforms used on Azure Pipelines in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.

TODO: What is a typical workflow for testing/debugging on a platform that you don't have easy access to? Do people build Docker images and enter them to test things out?

Testing on emulators

Some platforms are tested via an emulator for architectures that aren't readily available. There is a set of tools for orchestrating running the tests within the emulator. Platforms such as arm-android and arm-unknown-linux-gnueabihf are set up to automatically run the tests under emulation on Travis. The following will take a look at how a target's tests are run under emulation.

The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU architecture. Included in the Rust tree are the tools remote-test-client and remote-test-server, which are programs for sending test programs and libraries to the emulator, running the tests within the emulator, and reading the results. The Docker image is set up to launch remote-test-server, and the build tools use remote-test-client to communicate with the server to coordinate running tests (see src/bootstrap/test.rs).

TODO: What are the steps for manually running tests within an emulator? ./src/ci/docker/run.sh armhf-gnu will do everything, but takes hours to run and doesn't offer much help with interacting within the emulator.

Is there any support for emulating other (non-Android) platforms, such as running on an iOS emulator?

Is there anything else interesting that can be said here about running tests remotely on real hardware?

It's also unclear to me how the wasm or asm.js tests are run.

Crater

Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes, and for ensuring there is no breakage by running the beta vs the stable compiler versions.

When to run Crater

You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.

Requesting Crater Runs

The rust team maintains a few machines that can be used for running crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative (if you're not sure) option is to go for the build-and-test run. If making changes that will only have an effect at compile-time (e.g., implementing a new trait) then you only need a check run.

Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs will take around 3-4 days, with the other two taking 5-6 days on average.

While crater is really useful, it is also important to be aware of a few caveats:

  • Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magically green light that there will be no breakage; you still need to be careful.

  • Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.

  • Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. used old nightly features), has broken or flaky tests, requires network access, or other reasons.

  • Before crater can be run, @bors try needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.

Perf runs

A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.

The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).

You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.

Further reading

The following blog posts may also be of interest:

Running tests

You can run the tests using x.py. The most basic command – which you will almost never want to use! – is as follows:

./x.py test

This will build the full stage 2 compiler and then run the whole test suite. You probably don't want to do this very often, because it takes a very long time, and anyway bors / travis will do it for you. (Often, I will run this command in the background after opening a PR that I think is done, but rarely otherwise. -nmatsakis)

The test results are cached and previously successful tests are ignored during testing. The stdout/stderr contents as well as a timestamp file for every test can be found under build/ARCH/test/. To force-rerun a test (e.g. in case the test runner fails to notice a change) you can simply remove the timestamp file.

Note that some tests require a Python-enabled gdb. You can test if your gdb install supports Python by using the python command from within gdb. Once invoked you can type some Python code (e.g. print("hi")) followed by return and then CTRL+D to execute it. If you are building gdb from source, you will need to configure with --with-python=<path-to-python-binary>.

Running a subset of the test suites

When working on a specific PR, you will usually want to run a smaller set of tests, and with a stage 1 build. For example, a good "smoke test" that can be used after modifying rustc to see if things are generally working correctly would be the following:

./x.py test --stage 1 src/test/{ui,compile-fail}

This will run the ui and compile-fail test suites, and only with the stage 1 build. Of course, the choice of test suites is somewhat arbitrary, and may not suit the task you are doing. For example, if you are hacking on debuginfo, you may be better off with the debuginfo test suite:

./x.py test --stage 1 src/test/debuginfo

If you only need to test a specific subdirectory of tests for any given test suite, you can pass that directory to x.py test:

./x.py test --stage 1 src/test/ui/const-generics

Likewise, you can test a single file by passing its path:

./x.py test --stage 1 src/test/ui/const-generics/const-test.rs

Run only the tidy script

./x.py test tidy

Run tests on the standard library

./x.py test src/libstd

Run the tidy script and tests on the standard library

./x.py test tidy src/libstd

Run tests on the standard library using a stage 1 compiler

./x.py test src/libstd --stage 1

By listing which test suites you want to run you avoid having to run tests for components you did not change at all.

Warning: Note that bors only runs the tests with the full stage 2 build; therefore, while the tests usually work fine with stage 1, there are some limitations.

Running an individual test

Another common thing that people want to do is to run an individual test, often the test they are trying to fix. As mentioned earlier, you may pass the full file path to achieve this, or alternatively one may invoke x.py with the --test-args option:

./x.py test --stage 1 src/test/ui --test-args issue-1234

Under the hood, the test runner invokes the standard rust test runner (the same one you get with #[test]), so this command would wind up filtering for tests that include "issue-1234" in the name. (Thus --test-args is a good way to run a collection of related tests.)

Editing and updating the reference files

If you have changed the compiler's output intentionally, or you are making a new test, you can pass --bless to the test subcommand. E.g. if some tests in src/test/ui are failing, you can run

./x.py test --stage 1 src/test/ui --bless

to automatically adjust the .stderr, .stdout or .fixed files of all tests. Of course you can also target just specific tests with the --test-args your_test_name flag, just like when running the tests.

Passing --pass $mode

UI tests now have three modes, check-pass, build-pass and run-pass. When --pass $mode is passed, these tests will be forced to run under the given $mode unless the directive // ignore-pass exists in the test file. For example, you can run all the tests in src/test/ui as check-pass:

./x.py test --stage 1 src/test/ui --pass check

By passing --pass $mode, you can reduce the testing time. For each mode, please see here.
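
For example, a test that only makes sense when the binary is actually executed can opt out of the forced mode with the // ignore-pass directive (hypothetical test file):

// run-pass
// ignore-pass

// The assertion below is only exercised at runtime, so this test opts
// out of being downgraded by `--pass check`.
fn main() {
    assert_eq!(1 + 1, 2);
}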

Using incremental compilation

You can further enable the --incremental flag to save additional time in subsequent rebuilds:

./x.py test --stage 1 src/test/ui --incremental --test-args issue-1234

If you don't want to include the flag with every command, you can enable it in the config.toml, too:

# Whether to always use incremental compilation when building rustc
incremental = true

Note that incremental compilation will use more disk space than usual. If disk space is a concern for you, you might want to check the size of the build directory from time to time.

Running tests manually

Sometimes it's easier and faster to just run the test by hand. Most tests are just .rs files, so you can do something like

rustc +stage1 src/test/ui/issue-1234.rs

This is much faster, but doesn't always work. For example, some tests include directives that specify specific compiler flags, or which rely on other crates, and they may not run the same without those options.

Adding new tests

In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind. This test should fail in master but pass after the PR. These tests are really useful for preventing us from repeating the mistakes of the past.

To add a new test, the first thing you generally do is to create a file, typically a Rust source file. Test files have a particular structure:

Depending on the test suite, there may be some other details to be aware of:

What kind of test should I add?

It can be difficult to know what kind of test to use. Here are some rough heuristics:

  • Some tests have specialized needs:
    • need to run gdb or lldb? use the debuginfo test suite
    • need to inspect LLVM IR or MIR IR? use the codegen or mir-opt test suites
    • need to run rustdoc? Prefer a rustdoc test
    • need to inspect the resulting binary in some way? Then use run-make
  • For most other things, a ui (or ui-fulldeps) test is to be preferred:
    • ui tests subsume run-pass, compile-fail, and parse-fail tests
    • in the case of warnings or errors, ui tests capture the full output, which makes it easier to review but also helps prevent "hidden" regressions in the output

Naming your test

We have not traditionally had a lot of structure in the names of tests. Moreover, for a long time, the rustc test runner did not support subdirectories (it now does), so test suites like src/test/ui have a huge mess of files in them. This is not considered an ideal setup.

For regression tests – basically, some random snippet of code that came in from the internet – we often name the test after the issue plus a short description. Ideally, the test should be added to a directory that helps identify what piece of code is being tested here (e.g., src/test/ui/borrowck/issue-54597-reject-move-out-of-borrow-via-pat.rs). If you've tried and cannot find a more relevant place, the test may be added to src/test/ui/issues/. Still, do include the issue number somewhere.

When writing a new feature, create a subdirectory to store your tests. For example, if you are implementing RFC 1234 ("Widgets"), then it might make sense to put the tests in a directory like src/test/ui/rfc1234-widgets/.

In other cases, there may already be a suitable directory. (The proper directory structure to use is actually an area of active debate.)

Comment explaining what the test is about

When you create a test file, include a comment summarizing the point of the test at the start of the file. This should highlight which parts of the test are more important, and what the bug was that the test is fixing. Citing an issue number is often very helpful.

This comment doesn't have to be super extensive. Just something like "Regression test for #18060: match arms were matching in the wrong order." might already be enough.

These comments are very useful to others later on when your test breaks, since they often can highlight what the problem is. They are also useful if for some reason the tests need to be refactored, since they let others know which parts of the test were important (often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what it was meant to test exactly).

Header commands: configuring rustc

Header commands are special comments that the test runner knows how to interpret. They must appear before the Rust source in the test. They are normally put after the short comment that explains the point of this test. For example, this test uses the // compile-flags command to specify a custom flag to give to rustc when the test is compiled:

// Test the behavior of `0 - 1` when overflow checks are disabled.

// compile-flags: -Coverflow-checks=off

fn main() {
    let x = 0 - 1;
    ...
}

Ignoring tests

These are used to ignore the test in some situations, which means the test won't be compiled or run.

  • ignore-X where X is a target detail or stage will ignore the test accordingly (see below)
  • only-X is like ignore-X, but will only run the test on that target or stage
  • ignore-pretty will not compile the pretty-printed test (this is done to test the pretty-printer, but might not always work)
  • ignore-test always ignores the test
  • ignore-lldb and ignore-gdb will skip a debuginfo test on that debugger.
  • ignore-gdb-version can be used to ignore the test when certain gdb versions are used

Some examples of X in ignore-X:

  • Architecture: aarch64, arm, asmjs, mips, wasm32, x86_64, x86, ...
  • OS: android, emscripten, freebsd, ios, linux, macos, windows, ...
  • Environment (fourth word of the target triple): gnu, msvc, musl.
  • Pointer width: 32bit, 64bit.
  • Stage: stage0, stage1, stage2.

Other Header Commands

Here is a list of other header commands. This list is not exhaustive. Header commands can generally be found by browsing the TestProps structure found in header.rs from the compiletest source.

  • run-rustfix for UI tests, indicates that the test produces structured suggestions. The test writer should create a .fixed file, which contains the source with the suggestions applied. When the test is run, compiletest first checks that the correct lint/warning is generated. Then, it applies the suggestion and compares against .fixed (they must match). Finally, the fixed source is compiled, and this compilation is required to succeed. The .fixed file can also be generated automatically with the --bless option, described in this section (an illustrative sketch follows this list).
  • min-gdb-version specifies the minimum gdb version required for this test; see also ignore-gdb-version
  • min-lldb-version specifies the minimum lldb version required for this test
  • rust-lldb causes the lldb part of the test to only be run if the lldb in use contains the Rust plugin
  • no-system-llvm causes the test to be ignored if the system llvm is used
  • min-llvm-version specifies the minimum llvm version required for this test
  • min-system-llvm-version specifies the minimum system llvm version required for this test; the test is ignored if the system llvm is in use and it doesn't meet the minimum version. This is useful when an llvm feature has been backported to rust-llvm
  • ignore-llvm-version can be used to skip the test when certain LLVM versions are used. This takes one or two arguments; the first argument is the first version to ignore. If no second argument is given, all subsequent versions are ignored; otherwise, the second argument is the last version to ignore.
  • build-pass for UI tests, indicates that the test is supposed to successfully compile and link, as opposed to the default where the test is supposed to error out.
  • compile-flags passes extra command-line args to the compiler, e.g. compile-flags -g which forces debuginfo to be enabled.
  • should-fail indicates that the test should fail; used for "meta testing", where we test the compiletest program itself to check that it will generate errors in appropriate scenarios. This header is ignored for pretty-printer tests.
  • gate-test-X where X is a feature marks the test as "gate test" for feature X. Such tests are supposed to ensure that the compiler errors when usage of a gated feature is attempted without the proper #![feature(X)] tag. Each unstable lang feature is required to have a gate test.
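
As an illustration of run-rustfix, a test might look roughly like this (hypothetical test; the lint and its message text are illustrative), with a sibling .fixed file containing let _x = 1 + 2;:

// run-rustfix

fn main() {
    // The compiler emits a machine-applicable suggestion to drop the
    // parentheses; compiletest applies it and compares the result
    // against the sibling `.fixed` file.
    let _x = (1 + 2); //~ WARNING unnecessary parentheses
}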

Error annotations

Error annotations specify the errors that the compiler is expected to emit. They are "attached" to the line in source where the error is located.

  • ~: Associates the following error level and message with the current line
  • ~|: Associates the following error level and message with the same line as the previous comment
  • ~^: Associates the following error level and message with the previous line. Each caret (^) that you add adds a line to this, so ~^^^^^^^ is seven lines up.

The error levels that you can have are:

  1. ERROR
  2. WARNING
  3. NOTE
  4. HELP and SUGGESTION*

* Note: SUGGESTION must follow immediately after HELP.
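
Putting these together, a hypothetical UI test using error annotations might look like this (the exact message texts are illustrative):

fn main() {
    // The annotations attach the expected level and message to a line.
    let _x: u32 = "hello"; //~ ERROR mismatched types
    //~| NOTE expected `u32`, found `&str`
}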

Revisions

Certain classes of tests support "revisions" (as of the time of this writing, this includes compile-fail, run-fail, and incremental, though incremental tests are somewhat different). Revisions allow a single test file to be used for multiple tests. This is done by adding a special header at the top of the file:


// revisions: foo bar baz

This will result in the test being compiled (and tested) three times, once with --cfg foo, once with --cfg bar, and once with --cfg baz. You can therefore use #[cfg(foo)] etc within the test to tweak each of these results.
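
For instance, a revisions test can use those cfgs to vary the code under test (hypothetical example):

// revisions: foo bar

// Each revision is compiled with its own --cfg flag, so the test can
// select different code per revision.
#[cfg(foo)]
fn value() -> u32 { 0 }
#[cfg(bar)]
fn value() -> u32 { 1 }

fn main() {
    // Compiles once per revision; each run sees exactly one `value`.
    let v = value();
    assert!(v == 0 || v == 1);
}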

You can also customize headers and expected error messages to a particular revision. To do this, add [foo] (or bar, baz, etc) after the // comment, like so:


// A flag to pass in only for cfg `foo`:
//[foo]compile-flags: -Z verbose

#[cfg(foo)]
fn test_foo() {
    let x: usize = 32_u32; //[foo]~ ERROR mismatched types
}

Note that not all headers have meaning when customized to a revision. For example, the ignore-test header (and all "ignore" headers) currently only apply to the test as a whole, not to particular revisions. The only headers that are intended to really work when customized to a revision are error patterns and compiler flags.

Guide to the UI tests

The UI tests are intended to capture the compiler's complete output, so that we can test all aspects of the presentation. They work by compiling a file (e.g., ui/hello_world/main.rs), capturing the output, and then applying some normalization (see below). This normalized result is then compared against reference files named ui/hello_world/main.stderr and ui/hello_world/main.stdout. If either of those files doesn't exist, the output must be empty (that is actually the case for this particular test). If the test run fails, we will print out the current output, but it is also saved in build/<target-triple>/test/ui/hello_world/main.stdout (this path is printed as part of the test failure message), so you can run diff and so forth.

Tests that do not result in compile errors

By default, a UI test is expected not to compile (in which case, it should contain at least one //~ ERROR annotation). However, you can also make UI tests where compilation is expected to succeed, and you can even run the resulting program. Just add one of the following header commands:

  • // check-pass - compilation should succeed but skip codegen (which is expensive and isn't supposed to fail in most cases)
  • // build-pass – compilation and linking should succeed but do not run the resulting binary
  • // run-pass – compilation should succeed and we should run the resulting binary
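
For example, a minimal UI test that is expected to compile could look like this (hypothetical file; the header command is the important part):

// check-pass

// Nothing here should produce an error; we only care that the file
// type-checks, and codegen is skipped.
fn main() {
    let _x: Option<u32> = Some(1);
}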

Normalization

The normalization applied is aimed at eliminating output difference between platforms, mainly about filenames:

  • the test directory is replaced with $DIR
  • all backslashes (\) are converted to forward slashes (/) (for Windows)
  • all CR LF newlines are converted to LF

Sometimes these built-in normalizations are not enough. In such cases, you may provide custom normalization rules using the header commands, e.g.


// normalize-stdout-test: "foo" -> "bar"
// normalize-stderr-32bit: "fn\(\) \(32 bits\)" -> "fn\(\) \($$PTR bits\)"
// normalize-stderr-64bit: "fn\(\) \(64 bits\)" -> "fn\(\) \($$PTR bits\)"

This tells the test that, on 32-bit platforms, whenever the compiler writes fn() (32 bits) to stderr, it should be normalized to read fn() ($PTR bits) instead. Similarly for 64-bit. The replacement is performed by regexes, using the default regex flavor provided by the regex crate.

The corresponding reference file will use the normalized output to test both 32-bit and 64-bit platforms:

...
   |
   = note: source type: fn() ($PTR bits)
   = note: target type: u16 (16 bits)
...

Please see ui/transmute/main.rs and main.stderr for a concrete usage example.

Besides normalize-stderr-32bit and -64bit, one may use any target information or stage supported by ignore-X here as well (e.g. normalize-stderr-windows or simply normalize-stderr-test for unconditional replacement).
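
Since the replacement uses the regex crate's default flavor, the effect of the 32-bit rule above can be pictured with plain regex code. This is a sketch assuming the regex crate as a dependency, not compiletest's actual implementation; note that $$ is how the regex crate escapes a literal $ in the replacement, which is why the directives above write $$PTR:

use regex::Regex;

fn main() {
    // Same rule as `normalize-stderr-32bit` above: rewrite the
    // platform-specific width so one reference file covers both.
    let rule = Regex::new(r"fn\(\) \(32 bits\)").unwrap();
    let stderr = "= note: source type: fn() (32 bits)";
    let normalized = rule.replace_all(stderr, "fn() ($$PTR bits)");
    assert_eq!(normalized, "= note: source type: fn() ($PTR bits)");
}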

compiletest

Introduction

compiletest is the main test harness of the Rust test suite. It allows test authors to organize large numbers of tests (the Rust compiler has many thousands), supports efficient test execution (parallel execution is supported), and lets the test author configure behavior and expected results of both individual tests and groups of tests.

compiletest tests may check test code for success, for runtime failure, or, in some cases, even for failure to compile. Tests are typically organized as a Rust source file with annotations in comments before and/or within the test code, which serve to direct compiletest on if or how to run the test, what behavior to expect, and more. If you are unfamiliar with the compiler testing framework, see this chapter for additional background.

The tests themselves are typically (but not always) organized into "suites" – for example, run-fail, a folder holding tests that should compile successfully but return a failure (non-zero status); compile-fail, a folder holding tests that should fail to compile; and many more. The various suites are defined in src/tools/compiletest/src/common.rs in the pub enum Mode declaration. A very good introduction to the different suites of compiler tests, along with details about them, can be found in Adding new tests.

Adding a new test file

Simply create your new test in the appropriate location under src/test. No registration of test files is necessary, as compiletest will scan the src/test subfolder recursively and will execute any Rust source files it finds as tests. See Adding new tests for a complete guide on adding new tests.

Header Commands

Source file annotations which appear in comments near the top of the source file before any test code are known as header commands. These commands can instruct compiletest to ignore this test, set expectations on whether it is expected to succeed at compiling, or what the test's return code is expected to be. Header commands (and their inline counterparts, Error Info commands) are described more fully here.

Adding a new header command

Header commands are defined in the TestProps struct in src/tools/compiletest/src/header.rs. At a high level, there are dozens of test properties defined here, all set to default values in the TestProps struct's impl block. Any test can override this default value by specifying the property in question as a header command in a comment (//) in the test source file, before any source code.

Using a header command

Here is an example, specifying the must-compile-successfully header command, which takes no arguments, followed by the failure-status header command, which takes a single argument (in this case, a value of 1). failure-status instructs compiletest to expect a failure status of 1 (rather than the current Rust default of 101). The header command and the argument list (if present) are typically separated by a colon:

// must-compile-successfully
// failure-status: 1

#![feature(termination_trait)]

use std::io::{Error, ErrorKind};

fn main() -> Result<(), Box<Error>> {
    Err(Box::new(Error::new(ErrorKind::Other, "returned Box<Error> from main()")))
}

Adding a new header command property

One would add a new header command if there is a need to define some test property or behavior on an individual, test-by-test basis. A header command property serves as the header command's backing store (holds the command's current value) at runtime.

To add a new header command property:

  1. Look for the pub struct TestProps declaration in src/tools/compiletest/src/header.rs and add the new public property to the end of the declaration.
  2. Look for the impl TestProps implementation block immediately following the struct declaration and initialize the new property to its default value.

Adding a new header command parser

When compiletest encounters a test file, it parses the file a line at a time by calling every parser defined in the Config struct's implementation block, also in src/tools/compiletest/src/header.rs (note that the Config struct's declaration block is found in src/tools/compiletest/src/common.rs). TestProps's load_from() method will try passing the current line of text to each parser, which, in turn, typically checks to see if the line begins with a particular commented (//) header command such as // must-compile-successfully or // failure-status. Whitespace after the comment marker is optional.

Parsers will override a given header command property's default value merely by being specified in the test file as a header command or by having a parameter value specified in the test file, depending on the header command.

Parsers defined in impl Config are typically named parse_<header_command> (note kebab-case <header-command> transformed to snake-case <header_command>). impl Config also defines several 'low-level' parsers which make it simple to parse common patterns like simple presence or not (parse_name_directive()), header-command:parameter(s) (parse_name_value_directive()), optional parsing only if a particular cfg attribute is defined (has_cfg_prefix()) and many more. The low-level parsers are found near the end of the impl Config block; be sure to look through them and their associated parsers immediately above to see how they are used to avoid writing additional parsing code unnecessarily.

As a concrete example, here is the implementation for the parse_failure_status() parser, in src/tools/compiletest/src/header.rs:

@@ -232,6 +232,7 @@ pub struct TestProps {
     // customized normalization rules
     pub normalize_stdout: Vec<(String, String)>,
     pub normalize_stderr: Vec<(String, String)>,
+    pub failure_status: i32,
 }

 impl TestProps {
@@ -260,6 +261,7 @@ impl TestProps {
             run_pass: false,
             normalize_stdout: vec![],
             normalize_stderr: vec![],
+            failure_status: 101,
         }
     }

@@ -383,6 +385,10 @@ impl TestProps {
             if let Some(rule) = config.parse_custom_normalization(ln, "normalize-stderr") {
                 self.normalize_stderr.push(rule);
             }
+
+            if let Some(code) = config.parse_failure_status(ln) {
+                self.failure_status = code;
+            }
         });

         for key in &["RUST_TEST_NOCAPTURE", "RUST_TEST_THREADS"] {
@@ -488,6 +494,13 @@ impl Config {
         self.parse_name_directive(line, "pretty-compare-only")
     }

+    fn parse_failure_status(&self, line: &str) -> Option<i32> {
+        match self.parse_name_value_directive(line, "failure-status") {
+            Some(code) => code.trim().parse::<i32>().ok(),
+            _ => None,
+        }
+    }

Implementing the behavior change

When a test invokes a particular header command, it is expected that some behavior will change as a result. What behavior, obviously, will depend on the purpose of the header command. In the case of failure-status, the behavior that changes is that compiletest expects the failure code defined by the header command invoked in the test, rather than the default value.

Although this is specific to failure-status (as every header command will have a different implementation in order to invoke its behavior change), perhaps it is helpful to see the behavior-change implementation of one case, simply as an example. To implement failure-status, the check_correct_failure_status() function found in the TestCx implementation block, located in src/tools/compiletest/src/runtest.rs, was modified as per below:

@@ -295,11 +295,14 @@ impl<'test> TestCx<'test> {
     }

     fn check_correct_failure_status(&self, proc_res: &ProcRes) {
-        // The value the rust runtime returns on failure
-        const RUST_ERR: i32 = 101;
-        if proc_res.status.code() != Some(RUST_ERR) {
+        let expected_status = Some(self.props.failure_status);
+        let received_status = proc_res.status.code();
+
+        if expected_status != received_status {
             self.fatal_proc_rec(
-                &format!("failure produced the wrong error: {}", proc_res.status),
+                &format!("Error: expected failure status ({:?}) but received status {:?}.",
+                         expected_status,
+                         received_status),
                 proc_res,
             );
         }
@@ -320,7 +323,6 @@ impl<'test> TestCx<'test> {
         );

         let proc_res = self.exec_compiled_test();
-
         if !proc_res.status.success() {
             self.fatal_proc_rec("test run failed!", &proc_res);
         }
@@ -499,7 +501,6 @@ impl<'test> TestCx<'test> {
                 expected,
                 actual
             );
-            panic!();
         }
     }

Note the use of self.props.failure_status to access the header command property. In tests which do not specify the failure status header command, self.props.failure_status will evaluate to the default value of 101 at the time of this writing. But for a test which specifies a header command of, for example, // failure-status: 1, self.props.failure_status will evaluate to 1, as parse_failure_status() will have overridden the TestProps default value, for that test specifically.

Walkthrough: a typical contribution

There are a lot of ways to contribute to the rust compiler, including fixing bugs, improving performance, helping design features, providing feedback on existing features, etc. This chapter does not claim to scratch the surface. Instead, it walks through the design and implementation of a new feature. Not all of the steps and processes described here are needed for every contribution, and I will try to point those out as they arise.

In general, if you are interested in making a contribution and aren't sure where to start, please feel free to ask!

Overview

The feature I will discuss in this chapter is the ? Kleene operator for macros. Basically, we want to be able to write something like this:

macro_rules! foo {
    ($arg:ident $(, $optional_arg:ident)?) => {
        println!("{}", $arg);

        $(
            println!("{}", $optional_arg);
        )?
    }
}

fn main() {
    let x = 0;
    foo!(x); // ok! prints "0"
    foo!(x, x); // ok! prints "0 0"
}

So basically, the $(pat)? matcher in the macro means "this pattern can occur 0 or 1 times", similar to other regex syntaxes.

There were a number of steps to go from an idea to a stable Rust feature. Here is a quick list. We will go through each of these in order below. As I mentioned before, not all of these are needed for every type of contribution.

  • Idea discussion/Pre-RFC A Pre-RFC is an early draft or design discussion of a feature. This stage is intended to flesh out the design space a bit and get a grasp on the different merits and problems with an idea. It's a great way to get early feedback on your idea before presenting it to the wider audience. You can find the original discussion here.
  • RFC This is when you formally present your idea to the community for consideration. You can find the RFC here.
  • Implementation Implement your idea unstably in the compiler. You can find the original implementation here.
  • Possibly iterate/refine As the community gets experience with your feature on the nightly compiler and in libstd, there may be additional feedback about design choices that might need to be adjusted. This particular feature went through a number of iterations.
  • Stabilization When your feature has baked enough, a rust team member may propose to stabilize it. If there is consensus, this is done.
  • Relax Your feature is now a stable rust feature!

Pre-RFC and RFC

NOTE: In general, if you are not proposing a new feature or substantial change to rust or the ecosystem, you don't need to follow the RFC process. Instead, you can just jump to implementation.

You can find the official guidelines for when to open an RFC here.

An RFC is a document that describes the feature or change you are proposing in detail. Anyone can write an RFC; the process is the same for everyone, including rust team members.

To open an RFC, open a PR on the rust-lang/rfcs repo on GitHub. You can find detailed instructions in the README.

Before opening an RFC, you should do the research to "flesh out" your idea. Hastily-proposed RFCs tend not to be accepted. You should generally have a good description of the motivation, impact, disadvantages, and potential interactions with other features.

If that sounds like a lot of work, it's because it is. But no fear! Even if you're not a compiler hacker, you can get great feedback by doing a pre-RFC. This is an informal discussion of the idea. The best place to do this is internals.rust-lang.org. Your post doesn't have to follow any particular structure. It doesn't even need to be a cohesive idea. Generally, you will get tons of feedback that you can integrate back to produce a good RFC.

(Another pro-tip: try searching the RFCs repo and internals for prior related ideas. A lot of times an idea has already been considered and was either rejected or postponed to be tried again later. This can save you and everybody else some time)

In the case of our example, a participant in the pre-RFC thread pointed out a syntax ambiguity and a potential resolution. Also, the overall feedback seemed positive. In this case, the discussion converged pretty quickly, but for some ideas, a lot more discussion can happen (e.g. see this RFC which received a whopping 684 comments!). If that happens, don't be discouraged; it means the community is interested in your idea, but it perhaps needs some adjustments.

The RFC for our ? macro feature did receive some discussion on the RFC thread too. As with most RFCs, there were a few questions that we couldn't answer by discussion: we needed experience using the feature to decide. Such questions are listed in the "Unresolved Questions" section of the RFC. Also, over the course of the RFC discussion, you will probably want to update the RFC document itself to reflect the course of the discussion (e.g. new alternatives or prior work may be added or you may decide to change parts of the proposal itself).

In the end, when the discussion seems to reach a consensus and die down a bit, a rust team member may propose to move to "final comment period" (FCP) with one of three possible dispositions. This means that they want the other members of the appropriate teams to review and comment on the RFC. More discussion may ensue, which may result in more changes or unresolved questions being added. At some point, when everyone is satisfied, the RFC enters the FCP, which is the last chance for people to bring up objections. When the FCP is over, the disposition is adopted. Here are the three possible dispositions:

  • Merge: accept the feature. Here is the proposal to merge for our ? macro feature.
  • Close: this feature in its current form is not a good fit for rust. Don't be discouraged if this happens to your RFC, and don't take it personally. This is not a reflection on you, but rather a community decision that rust will go a different direction.
  • Postpone: there is interest in going this direction but not at the moment. This happens most often because the appropriate rust team doesn't have the bandwidth to shepherd the feature through the process to stabilization. Often this is the case when the feature doesn't fit into the team's roadmap. Postponed ideas may be revisited later.

When an RFC is merged, the PR is merged into the RFCs repo. A new tracking issue is created in the rust-lang/rust repo to track progress on the feature and discuss unresolved questions, implementation progress and blockers, etc. Here is the tracking issue for our ? macro feature.

Implementation

To make a change to the compiler, open a PR against the rust-lang/rust repo.

Depending on the feature/change/bug fix/improvement, implementation may be relatively-straightforward or it may be a major undertaking. You can always ask for help or mentorship from more experienced compiler devs. Also, you don't have to be the one to implement your feature; but keep in mind that if you don't it might be a while before someone else does.

For the ? macro feature, I needed to go understand the relevant parts of macro expansion in the compiler. Personally, I find that improving the comments in the code is a helpful way of making sure I understand it, but you don't have to do that if you don't want to.

I then implemented the original feature, as described in the RFC. When a new feature is implemented, it goes behind a feature gate, which means that you have to use #![feature(my_feature_name)] to use the feature. The feature gate is removed when the feature is stabilized.
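
For example (a sketch using the placeholder gate name from above; real gates use the feature's actual name):

#![feature(my_feature_name)] // nightly-only: opts this crate into the unstable feature

fn main() {
    // ... code using the unstable feature ...
}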

Most bug fixes and improvements don't require a feature gate. You can just make your changes/improvements.

When you open a PR on rust-lang/rust, a bot will assign your PR to a reviewer. If there is a particular rust team member you are working with, you can request that reviewer by leaving a comment on the thread with r? @reviewer-github-id (e.g. r? @eddyb). If you don't know who to request, don't request anyone; the bot will assign someone automatically.

The reviewer may request changes before they approve your PR. Feel free to ask questions or discuss things you don't understand or disagree with. However, recognize that the PR won't be merged unless someone on the rust team approves it.

When your reviewer approves the PR, it will go into a queue for yet another bot called @bors. @bors manages the CI build/merge queue. When your PR reaches the head of the @bors queue, @bors will test out the merge by running all tests against your PR on Travis CI. This takes a lot of time to finish. If all tests pass, the PR is merged and becomes part of the next nightly compiler!

There are a couple of things that may happen for some PRs during the review process:

  • If the change is substantial enough, the reviewer may request an FCP on the PR. This gives all members of the appropriate team a chance to review the changes.
  • If the change may cause breakage, the reviewer may request a crater run. This compiles the compiler with your changes and then attempts to compile all crates on crates.io with your modified compiler. This is a great smoke test to check if you introduced a change to compiler behavior that affects a large portion of the ecosystem.
  • If the diff of your PR is large or the reviewer is busy, your PR may have some merge conflicts with other PRs that happen to get merged first. You should fix these merge conflicts using the normal git procedures.

If you are not doing a new feature or something like that (e.g. if you are fixing a bug), then that's it! Thanks for your contribution :)

Refining your implementation

As people get experience with your new feature on nightly, slight changes may be proposed and unresolved questions may become resolved. Updates/changes go through the same process for implementing any other changes, as described above (i.e. submit a PR, go through review, wait for @bors, etc).

Some changes may be major enough to require an FCP and some review by rust team members.

For the ? macro feature, we went through a few different iterations after the original implementation: 1, 2, 3.

Along the way, we decided that ? should not take a separator, which was previously an unresolved question listed in the RFC. We also changed the disambiguation strategy: we decided to remove the ability to use ? as a separator token for other repetition operators (e.g. + or *). However, since this was a breaking change, we decided to do it over an edition boundary. Thus, the new feature can be enabled only in edition 2018. These deviations from the original RFC required another FCP.

Stabilization

Finally, after the feature had baked for a while on nightly, a language team member moved to stabilize it.

A stabilization report needs to be written that includes

  • brief description of the behavior and any deviations from the RFC
  • which edition(s) are affected and how
  • links to a few tests to show the interesting aspects

The stabilization report for our feature is here.

After this, a PR is made to remove the feature gate, enabling the feature by default (on the 2018 edition). A note is added to the Release notes about the feature.

Steps to stabilize the feature can be found at Stabilizing Features.

Rustc Bug Fix Procedure

This page defines the best practices procedure for making bug fixes or soundness corrections in the compiler that can cause existing code to stop compiling. This text is based on RFC 1589.

Motivation

From time to time, we encounter the need to make a bug fix, soundness correction, or other change in the compiler which will cause existing code to stop compiling. When this happens, it is important that we handle the change in a way that gives users of Rust a smooth transition. What we want to avoid is that existing programs suddenly stop compiling with opaque error messages: we would prefer to have a gradual period of warnings, with clear guidance as to what the problem is, how to fix it, and why the change was made. This page describes the procedure that we have been developing for handling breaking changes that aims to achieve that kind of smooth transition.

One of the key points of this policy is that (a) warnings should be issued initially rather than hard errors if at all possible and (b) every change that causes existing code to stop compiling will have an associated tracking issue. This issue provides a point to collect feedback on the results of that change. Sometimes changes have unexpectedly large consequences or there may be a way to avoid the change that was not considered. In those cases, we may decide to change course and roll back the change, or find another solution (if warnings are being used, this is particularly easy to do).

What qualifies as a bug fix?

Note that this RFC does not try to define when a breaking change is permitted. That is already covered under RFC 1122. This document assumes that the change being made is in accordance with those policies. Here is a summary of the conditions from RFC 1122:

  • Soundness changes: Fixes to holes uncovered in the type system.
  • Compiler bugs: Places where the compiler is not implementing the specified semantics found in an RFC or lang-team decision.
  • Underspecified language semantics: Clarifications to grey areas where the compiler behaves inconsistently and no formal behavior had been previously decided.

Please see the RFC for full details!

Detailed design

The procedure for making a breaking change is as follows (each of these steps is described in more detail below):

  1. Do a crater run to assess the impact of the change.
  2. Make a special tracking issue dedicated to the change.
  3. Do not report an error right away. Instead, issue forwards-compatibility lint warnings.
    • Sometimes this is not straightforward. See the text below for suggestions on different techniques we have employed in the past.
    • For cases where warnings are infeasible:
      • Report errors, but make every effort to give a targeted error message that directs users to the tracking issue
      • Submit PRs to all known affected crates that fix the issue
        • or, at minimum, alert the owners of those crates to the problem and direct them to the tracking issue
  4. Once the change has been in the wild for at least one cycle, we can stabilize the change, converting those warnings into errors.

Finally, for changes to librustc_ast that will affect plugins, the general policy is to batch these changes. That is discussed below in more detail.

Tracking issue

Every breaking change should be accompanied by a dedicated tracking issue for that change. The main text of this issue should describe the change being made, with a focus on what users must do to fix their code. The issue should be approachable and practical; it may make sense to direct users to an RFC or some other issue for the full details. The issue also serves as a place where users can comment with questions or other concerns.

A template for these breaking-change tracking issues can be found below. An example of how such an issue should look can be found here.

The issue should be tagged with (at least) B-unstable and T-compiler.

Tracking issue template

This is a template to use for tracking issues:

This is the **summary issue** for the `YOUR_LINT_NAME_HERE`
future-compatibility warning and other related errors. The goal of
this page is describe why this change was made and how you can fix
code that is affected by it. It also provides a place to ask questions
or register a complaint if you feel the change should not be made. For
more information on the policy around future-compatibility warnings,
see our [breaking change policy guidelines][guidelines].

[guidelines]: LINK_TO_THIS_RFC

#### What is the warning for?

*Describe the conditions that trigger the warning and how they can be
fixed. Also explain why the change was made.**

#### When will this warning become a hard error?

At the beginning of each 6-week release cycle, the Rust compiler team
will review the set of outstanding future compatibility warnings and
nominate some of them for **Final Comment Period**. Toward the end of
the cycle, we will review any comments and make a final determination
whether to convert the warning into a hard error or remove it
entirely.

Issuing future compatibility warnings

The best way to handle a breaking change is to begin by issuing future-compatibility warnings. These are a special category of lint warning. Adding a new future-compatibility warning can be done as follows.


// 1. Define the lint in `src/librustc/lint/builtin.rs`:
declare_lint! {
    pub YOUR_ERROR_HERE,
    Warn,
    "illegal use of foo bar baz"
}

// 2. Add to the list of HardwiredLints in the same file:
impl LintPass for HardwiredLints {
    fn get_lints(&self) -> LintArray {
        lint_array!(
            ..,
            YOUR_ERROR_HERE
        )
    }
}

// 3. Register the lint in `src/librustc_lint/lib.rs`:
store.register_future_incompatible(sess, vec![
    ...,
    FutureIncompatibleInfo {
        id: LintId::of(YOUR_ERROR_HERE),
        reference: "issue #1234", // your tracking issue here!
    },
]);

// 4. Report the lint:
tcx.lint_node(
    lint::builtin::YOUR_ERROR_HERE,
    path_id,
    binding.span,
    format!("some helper message here"));

Helpful techniques

It can often be challenging to filter out new warnings from older, pre-existing errors. One technique that has been used in the past is to run the older code unchanged and collect the errors it would have reported. You can then issue warnings for any errors you would give which do not appear in that original set. Another option is to abort compilation after the original code completes if errors are reported: then you know that your new code will only execute when there were no errors before.
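
Here is a minimal, self-contained sketch of that first technique (the string-based error representation is a hypothetical stand-in for real diagnostics): collect the errors the old code would have reported, then emit warnings only for diagnostics that are new.

use std::collections::HashSet;

// Report only diagnostics that the old check did not already produce.
fn new_warnings(old_errors: &HashSet<String>, new_check: &[String]) -> Vec<String> {
    new_check
        .iter()
        .filter(|e| !old_errors.contains(*e))
        .cloned()
        .collect()
}

fn main() {
    let mut old_errors = HashSet::new();
    old_errors.insert("overlap at foo.rs:3".to_string());

    let new_check = vec![
        "overlap at foo.rs:3".to_string(), // pre-existing: stays an error
        "overlap at bar.rs:7".to_string(), // new: downgraded to a warning
    ];
    for warning in new_warnings(&old_errors, &new_check) {
        println!("warning (future compatibility): {}", warning);
    }
}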

Crater and crates.io

We should always do a crater run to assess impact. It is polite and considerate to at least notify the authors of affected crates of the breaking change. If we can submit PRs to fix the problem, so much the better.

Is it ever acceptable to go directly to issuing errors?

Changes that are believed to have negligible impact can go directly to issuing an error. One rule of thumb would be to check against crates.io: if fewer than 10 total affected projects are found (not root errors), we can move straight to an error. In such cases, we should still make the "breaking change" page as before, and we should ensure that the error directs users to this page. In other words, everything should be the same except that users are getting an error, and not a warning. Moreover, we should submit PRs to the affected projects (ideally before the PR implementing the change lands in rustc).

If the impact is not believed to be negligible (e.g., more than 10 crates are affected), then warnings are required (unless the compiler team agrees to grant a special exemption in some particular case). If implementing warnings is not feasible, then we should make an aggressive strategy of migrating crates before we land the change so as to lower the number of affected crates. Here are some techniques for approaching this scenario:

  1. Issue warnings for subparts of the problem, and reserve the new errors for the smallest set of cases you can.
  2. Try to give a very precise error message that suggests how to fix the problem and directs users to the tracking issue.
  3. It may also make sense to layer the fix:
    • First, add warnings where possible and let those land before proceeding to issue errors.
    • Work with authors of affected crates to ensure that corrected versions are available before the fix lands, so that downstream users can use them.

Stabilization

After a change is made, we will stabilize the change using the same process that we use for unstable features:

  • After a new release is made, we will go through the outstanding tracking issues corresponding to breaking changes and nominate some of them for final comment period (FCP).

  • The FCP for such issues lasts for one cycle. In the final week or two of the cycle, we will review comments and make a final determination:

    • Convert to error: the change should be made into a hard error.
    • Revert: we should remove the warning and continue to allow the older code to compile.
    • Defer: can't decide yet, wait longer, or try other strategies.

Ideally, breaking changes should have landed on the stable branch of the compiler before they are finalized.

Removing a lint

Once we have decided to make a "future warning" into a hard error, we need a PR that removes the custom lint. As an example, here are the steps required to remove the overlapping_inherent_impls compatibility lint. First, convert the name of the lint to uppercase (OVERLAPPING_INHERENT_IMPLS) and ripgrep through the source for that string. We will basically work by converting each place where this lint name is mentioned (in the compiler, we use the upper-case name, and a macro automatically generates the lower-case string; so searching for overlapping_inherent_impls would not find much).

NOTE: these exact files don't exist anymore, but the procedure is still the same.

Remove the lint.

The first reference you will likely find is the lint definition in librustc/lint/builtin.rs that resembles this:


declare_lint! {
    pub OVERLAPPING_INHERENT_IMPLS,
    Deny, // this may also say Warning
    "two overlapping inherent impls define an item with the same name were erroneously allowed"
}

This declare_lint! macro creates the relevant data structures. Remove it. You will also find that there is a mention of OVERLAPPING_INHERENT_IMPLS later in the file as part of a lint_array!; remove it too.

Next, you will see a reference to OVERLAPPING_INHERENT_IMPLS in librustc_lint/lib.rs. This defines the lint as a "future compatibility lint":


FutureIncompatibleInfo {
    id: LintId::of(OVERLAPPING_INHERENT_IMPLS),
    reference: "issue #36889 <https://github.com/rust-lang/rust/issues/36889>",
},

Remove this too.

Add the lint to the list of removed lints.

In src/librustc_lint/lib.rs there is a list of "renamed and removed lints". You can add this lint to the list:


store.register_removed("overlapping_inherent_impls", "converted into hard error, see #36889");

where #36889 is the tracking issue for your lint.

Update the places that issue the lint

Finally, the last class of references you will see are the places that actually trigger the lint itself (i.e., what causes the warnings to appear). These you do not want to delete. Instead, you want to convert them into errors. In this case, the add_lint call looks like this:


self.tcx.sess.add_lint(lint::builtin::OVERLAPPING_INHERENT_IMPLS,
                       node_id,
                       self.tcx.span_of_impl(item1).unwrap(),
                       msg);

We want to convert this into an error. In some cases, there may be an existing error for this scenario. In others, we will need to allocate a fresh diagnostic code. Instructions for allocating a fresh diagnostic code can be found here. You may want to mention in the extended description that the compiler behavior changed on this point, and include a reference to the tracking issue for the change.

Let's say that we've adopted E0592 as our code. Then we can change the add_lint() call above to something like:


struct_span_err!(self.tcx.sess, self.tcx.span_of_impl(item1).unwrap(), E0592, msg)
    .emit();

Update tests

Finally, run the test suite. There will likely be some tests that used to reference the overlapping_inherent_impls lint; those will need to be updated. In general, if the test used to have #[deny(overlapping_inherent_impls)], that can just be removed.

./x.py test

All done!

Open a PR. =)

Implement New Feature

When you want to implement a new significant feature in the compiler, you need to go through this process to make sure everything goes smoothly.

The @rfcbot (p)FCP process

When the change is small and uncontroversial, then it can be done with just writing a PR and getting r+ from someone who knows that part of the code. However, if the change is potentially controversial, it would be a bad idea to push it without consensus from the rest of the team (both in the "distributed system" sense to make sure you don't break anything you don't know about, and in the social sense to avoid PR fights).

If such a change seems to be too small to require a full formal RFC process (e.g. a big refactoring of the code, or a "technically-breaking" change, or a "big bugfix" that basically amounts to a small feature) but is still too controversial or big to get by with a single r+, you can start a pFCP (or, if you don't have r+ rights, ask someone who has them to start one - and unless they have a concern themselves, they should).

Again, the pFCP process is only needed if you need consensus - if you don't think anyone would have a problem with your change, it's ok to get by with only an r+. For example, it is OK to add or modify unstable command-line flags or attributes without a pFCP for compiler development or standard library use, as long as you don't expect them to be in wide use in the nightly ecosystem.

You don't need to have the implementation fully ready for r+ to ask for a pFCP, but it is generally a good idea to have at least a proof of concept so that people can see what you are talking about.

That starts a "proposed final comment period" (pFCP), which requires all members of the team to sign off on the FCP. After they all do so, there's a 10 day long "final comment period" where everybody can comment, and if no new concerns are raised, the PR/issue gets FCP approval.

The logistics of writing features

There are a few "logistic" hoops you might need to go through in order to implement a feature in a working way.

Warning Cycles

In some cases, a feature or bugfix might break some existing programs in some edge cases. In that case, you might want to do a crater run to assess the impact and possibly add a future-compatibility lint, similar to those used for edition-gated lints.

Stability

We value the stability of Rust. Code that works and runs on stable should (mostly) not break. Because of that, we don't want to release a feature to the world with only team consensus and code review - we want to gain real-world experience on using that feature on nightly, and we might want to change the feature based on that experience.

To allow for that, we must make sure users don't accidentally depend on that new feature - otherwise, especially if experimentation takes time or is delayed and the feature takes the trains to stable, it would end up de facto stable and we'll not be able to make changes in it without breaking people's code.

The way we do that is that we make sure all new features are feature gated - they can't be used without enabling a feature gate (#[feature(foo)]), which can't be done in a stable/beta compiler. See the stability in code section for the technical details.

Eventually, after we gain enough experience using the feature, make the necessary changes, and are satisfied, we expose it to the world using the stabilization process described here. Until then, the feature is not set in stone: every part of the feature can be changed, or the feature might be completely rewritten or removed. Features are not supposed to gain tenure by being unstable and unchanged for a year.

Tracking Issues

To keep track of the status of an unstable feature, the experience we get while using it on nightly, and of the concerns that block its stabilization, every feature-gate needs a tracking issue.

General discussions about the feature should be done on the tracking issue.

For features that have an RFC, you should use the RFC's tracking issue for the feature.

For other features, you'll have to make a tracking issue for that feature. The issue title should be "Tracking issue for YOUR FEATURE".

For tracking issues for features (as opposed to future-compat warnings), I don't think the description has to contain anything specific. Generally we put the list of items required for stabilization in a checklist, e.g.,

**Steps:**

- [ ] Implement the RFC. (CC @rust-lang/compiler -- can anyone write
      up mentoring instructions?)
- [ ] Adjust the documentation. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#documentation-prs))
- [ ] Stabilize the feature. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#stabilization-pr))

Stability in code

The following steps need to be followed in order to implement a new unstable feature:

  1. Open a tracking issue - if you have an RFC, you can use the tracking issue for the RFC.

    The tracking issue should be labeled with at least C-tracking-issue. For a language feature, a label F-feature_name should be added as well.

  2. Pick a name for the feature gate (for RFCs, use the name in the RFC).

  3. Add a feature gate declaration to librustc_feature/active.rs in the active declare_features block:

    /// description of feature
    (active, $feature_name, "$current_nightly_version", Some($tracking_issue_number), $edition)
    

    where $edition has the type Option<Edition>, and is typically just None.

    For example:

    /// Allows defining identifiers beyond ASCII.
    (active, non_ascii_idents, "1.0.0", Some(55467), None),
    

    When added, the current version should be the one for the current nightly. Once the feature is moved to accepted.rs, the version is changed to that nightly version.

  4. Prevent usage of the new feature unless the feature gate is set. You can check it in most places in the compiler using the expression tcx.features().$feature_name (or sess.features_untracked().$feature_name if the tcx is unavailable)

    If the feature gate is not set, you should either maintain the pre-feature behavior or raise an error, depending on what makes sense.

    For features introducing new syntax, pre-expansion gating should be used instead. To do so, extend the GatedSpans struct, add spans to it during parsing, and then finally feature-gate all the spans in rustc_ast_passes::feature_gate::check_crate.

  5. Add a test to ensure the feature cannot be used without a feature gate, by creating feature-gate-$feature_name.rs and feature-gate-$feature_name.stderr files under the directory where the other tests for your feature reside (see the sketch after this list).

  6. Add a section to the unstable book, in src/doc/unstable-book/src/language-features/$feature_name.md.

  7. Write lots of tests for the new feature. PRs without tests will not be accepted!

  8. Get your PR reviewed and land it. You have now successfully implemented a feature in Rust!
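
As promised in step 5, here is a sketch of what a feature-gate test might look like, using the non_ascii_idents example from step 3 (the exact error text is illustrative; the real message lives in the checked-in .stderr file):

// feature-gate-non_ascii_idents.rs
// The gate is deliberately *not* enabled, so the use below must error.

fn main() {
    let löwe = 1; //~ ERROR non-ascii idents are not fully supported
    let _ = löwe;
}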

Stability attributes

This section is about the stability attributes and schemes that allow stable APIs to use unstable APIs internally in the rustc standard library.

For instructions on stabilizing a language feature see Stabilizing Features.

unstable

The #[unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] attribute explicitly marks an item as unstable. Items that are marked as "unstable" cannot be used without a corresponding #![feature] attribute on the crate, even on a nightly compiler. This restriction only applies across crate boundaries: unstable items may be used within the crate that defines them.

The issue field specifies the associated GitHub issue number. This field is required and all unstable features should have an associated tracking issue. In rare cases where there is no sensible value, issue = "none" is used.

The unstable attribute infects all sub-items, so the attribute doesn't have to be reapplied to them. If you apply this to a module, all items in the module will be unstable.

You can make specific sub-items stable by using the #[stable] attribute on them. The stability scheme works similarly to how pub works. You can have public functions of nonpublic modules and you can have stable functions in unstable modules or vice versa.
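
For example, in a staged_api crate (a sketch; the feature names are hypothetical):

#[unstable(feature = "foo_mod", issue = "1234", reason = "recently added")]
pub mod foo {
    // A stable function nested inside an unstable module, analogous to
    // a `pub` function inside a non-`pub` module.
    #[stable(feature = "foo_fn", since = "1.0.0")]
    pub fn stable_function() {}
}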

Note, however, that due to a rustc bug, stable items inside unstable modules are available to stable code in that location! So, for example, stable code can import core::intrinsics::transmute even though intrinsics is an unstable module. Thus, this kind of nesting should be avoided when possible.

The unstable attribute may also have the soft value, which makes it a future-incompatible deny-by-default lint instead of a hard error. This is used by the bench attribute which was accidentally accepted in the past. This prevents breaking dependencies by leveraging Cargo's lint capping.

stable

The #[stable(feature = "foo", since = "1.420.69")] attribute explicitly marks an item as stabilized. To do this, follow the instructions in Stabilizing Features.

Note that stable functions may use unstable things in their body.

rustc_const_unstable

The #[rustc_const_unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] has the same interface as the unstable attribute. It is used to mark const fn as having their constness be unstable. This allows you to make a function stable without stabilizing its constness or even just marking an existing stable function as const fn without instantly stabilizing the const fnness.

Furthermore this attribute is needed to mark an intrinsic as const fn, because there's no way to add const to functions in extern blocks for now.

rustc_const_stable

The #[rustc_const_stable(feature = "foo", since = "1.420.69")] attribute explicitly marks a const fn as having its constness be stable. This attribute can make sense even on an unstable function, if that function is called from another rustc_const_stable function.

Furthermore this attribute is needed to mark an intrinsic as callable from rustc_const_stable functions.

allow_internal_unstable

Macros, compiler desugarings and const fns expose their bodies to the call site. To work around not being able to use unstable things in the standard library's macros, there's the #[allow_internal_unstable(feature1, feature2)] attribute that whitelists the given features for usage in stable macros or const fns.
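
A sketch of what this looks like on a macro (the feature, macro, and helper names are hypothetical):

// A stable macro whose expansion calls an unstable function. Callers do
// not need #![feature(some_unstable_feature)]; the attribute whitelists
// it for this macro's expansion.
#[allow_internal_unstable(some_unstable_feature)]
#[macro_export]
macro_rules! do_thing {
    () => {
        $crate::internal::unstable_helper()
    };
}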

Note that const fns are even more special in this regard. You can't just whitelist any feature; the features need an implementation in qualify_min_const_fn.rs. For example, the const_fn_union feature gate allows accessing fields of unions inside stable const fns. The rule for when it's ok to use such a feature gate is that the behavior must match the runtime behavior of the same code (see also this blog post). This means that you may not create a const fn that e.g. transmutes a memory address to an integer, because the addresses of things are nondeterministic and often unknown at compile-time.

Always ping @oli-obk, @RalfJung, and @Centril if you are adding more allow_internal_unstable attributes to any const fn.

staged_api

Any crate that uses the stable, unstable, or rustc_deprecated attributes must include the #![feature(staged_api)] attribute on the crate.

rustc_deprecated

The deprecation system shares the same infrastructure as the stable/unstable attributes. The rustc_deprecated attribute is similar to the deprecated attribute. It was previously called deprecated, but was split off when deprecated was stabilized. The deprecated attribute cannot be used in a staged_api crate, rustc_deprecated must be used instead. The deprecated item must also have a stable or unstable attribute.

rustc_deprecated has the following form:

#[rustc_deprecated(
    since = "1.38.0",
    reason = "explanation for deprecation",
    suggestion = "other_function"
)]

The suggestion field is optional. If given, it should be a string that can be used as a machine-applicable suggestion to correct the warning. This is typically used when the identifier is renamed, but no other significant changes are necessary.

Another difference from the deprecated attribute is that the since field is actually checked against the current version of rustc. If since is in a future version, then the deprecated_in_future lint is triggered which is default allow, but most of the standard library raises it to a warning with #![warn(deprecated_in_future)].
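
Putting the pieces together, a deprecated item in a staged_api crate might look like this (a sketch; the crate, feature, and function names are hypothetical):

#![feature(staged_api)]
#![stable(feature = "example_crate", since = "1.0.0")]

#[stable(feature = "example_crate", since = "1.0.0")]
#[rustc_deprecated(
    since = "1.38.0",
    reason = "renamed to `new_function`",
    suggestion = "new_function"
)]
pub fn old_function() {}

// The replacement must itself carry a stability attribute.
#[stable(feature = "example_rename", since = "1.38.0")]
pub fn new_function() {}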

-Zforce-unstable-if-unmarked

The -Zforce-unstable-if-unmarked flag has a variety of purposes to help enforce that the correct crates are marked as unstable. It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on crates.io which do not themselves use staged_api. rustc also relies on this flag to mark all of its crates as unstable with the rustc_private feature so that each crate does not need to be carefully marked with unstable.

This flag is automatically applied to all of rustc and the standard library by the bootstrap scripts. This is needed because the compiler and all of its dependencies are shipped in the sysroot to all users.

This flag has the following effects:

  • Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
  • Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Zforce-unstable-if-unmarked.

Code which does not use -Zforce-unstable-if-unmarked should include the #![feature(rustc_private)] crate attribute to access these force-unstable crates. This is needed for things that link rustc, such as miri, rls, or clippy.
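
For example, the top of a tool that links rustc might look like this (a sketch):

#![feature(rustc_private)]

// The rustc crates are force-unstable, so an out-of-tree tool must opt
// in with the rustc_private gate before it can link them.
extern crate rustc_driver;

fn main() {
    // ... drive the compiler via rustc_driver ...
}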

Request for stabilization

Once an unstable feature has been well-tested with no outstanding concerns, anyone may push for its stabilization. It involves the following steps:

  • Documentation PRs
  • Write a stabilization report
  • FCP
  • Stabilization PR

Documentation PRs

If any documentation for this feature exists, it should be in the Unstable Book, located at src/doc/unstable-book. If it exists, the page for the feature gate should be removed.

If there was documentation there, integrating it into the existing documentation is needed.

If there wasn't documentation there, it needs to be added.

Places that may need updated documentation:

  • The Reference: This must be updated, in full detail.
  • The Book: This may or may not need updating, depending on the feature. If you're not sure, please open an issue on this repository and it can be discussed.
  • standard library documentation: As needed. Language features often don't need this, but if it's a feature that changes how good examples are written, such as when ? was added to the language, updating examples is important.
  • Rust by Example: As needed.

Prepare PRs to update documentation involving this new feature for repositories mentioned above. Maintainers of these repositories will keep these PRs open until the whole stabilization process has completed. Meanwhile, we can proceed to the next step.

Write a stabilization report

Find the tracking issue of the feature, and create a short stabilization report. Essentially this would be a brief summary of the feature plus some links to test cases showing it works as expected, along with a list of edge cases that came up and were considered. This is a minimal "due diligence" that we do before stabilizing.

The report should contain:

  • A summary, showing examples (e.g. code snippets) what is enabled by this feature.
  • Links to test cases in our test suite regarding this feature, describing the feature's behavior on encountering edge cases.
  • Links to the documentations (the PRs we have made in the previous steps).
  • Any other relevant information. (Examples of such reports can be found in rust-lang/rust#44494 and rust-lang/rust#28237.)
  • The resolutions of any unresolved questions if the stabilization is for an RFC.

FCP

If any member of the team responsible for tracking this feature agrees with stabilizing this feature, they will start the FCP (final-comment-period) process by commenting

@rfcbot fcp merge

The rest of the team members will review the proposal. If the final decision is to stabilize, we proceed to do the actual code modification.

Stabilization PR

Once we have decided to stabilize a feature, we need to have a PR that actually makes that stabilization happen. These kinds of PRs are a great way to get involved in Rust, as they take you on a little tour through the source code.

Here is a general guide to how to stabilize a feature -- every feature is different, of course, so some features may require steps beyond what this guide talks about.

Note: Before we stabilize any feature, it's the rule that it should appear in the documentation.

Updating the feature-gate listing

There is a central listing of feature-gates in src/librustc_feature. Search for the declare_features! macro. There should be an entry for the feature you are aiming to stabilize, something like this (example taken from rust-lang/rust#32409):

// pub(restricted) visibilities (RFC 1422)
(active, pub_restricted, "1.9.0", Some(32409)),

The above line should be moved down to the area for "accepted" features, declared below in a separate call to declare_features!. When it is done, it should look like:

// pub(restricted) visibilities (RFC 1422)
(accepted, pub_restricted, "1.31.0", Some(32409)),
// note that we changed this

Note that the version number is updated to be the version number of the stable release where this feature will appear. This can be found by consulting the forge, which will tell you the next stable release number. You want to add 1 to that, because the code that lands today will go into beta on that date, and then become stable after that. So, at the time of this writing, the next stable release (i.e. what was then beta) was 1.30.0, hence I wrote 1.31.0 above.

Removing existing uses of the feature-gate

Next search for the feature string (in this case, pub_restricted) in the codebase to find where it appears. Change uses of #![feature(XXX)] in libstd and any rustc crates to #![cfg_attr(bootstrap, feature(XXX))]. This includes the feature-gate only for stage0, which is built using the current beta (this is needed because the feature is still unstable in the current beta).

Also, remove those strings from any tests. If there are tests specifically targeting the feature-gate (i.e., testing that the feature-gate is required to use the feature, but nothing else), simply remove the test.

Do not require the feature-gate to use the feature

Most importantly, remove the code which flags an error if the feature-gate is not present (since the feature is now considered stable). If the feature can be detected because it employs some new syntax, then a common place for that code to be is in src/librustc_ast_passes/feature_gate.rs. For example, you might see code like this:

gate_feature_post!(&self, pub_restricted, span,
 "`pub(restricted)` syntax is experimental");

This gate_feature_post! macro prints an error if the pub_restricted feature is not enabled. It is not needed now that pub(restricted) is stable.

For more subtle features, you may find code like this:

if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }

This pub_restricted field (obviously named after the feature) would ordinarily be false if the feature flag is not present and true if it is. So transform the code to assume that the field is true. In this case, that would mean removing the if and leaving just the /* XXX */.

if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
becomes
/* XXX */

if self.tcx.sess.features.borrow().pub_restricted && something { /* XXX */ }
becomes
if something { /* XXX */ }

Debugging the compiler

This chapter contains a few tips to debug the compiler. These tips aim to be useful no matter what you are working on. Some of the other chapters have advice about specific parts of the compiler (e.g. the Queries Debugging and Testing chapter or the LLVM Debugging chapter).

-Z flags

The compiler has a bunch of -Z flags. These are unstable flags that are only enabled on nightly. Many of them are useful for debugging. To get a full listing of -Z flags, use -Z help.

One useful flag is -Z verbose, which generally enables printing more info that could be useful for debugging.

Getting a backtrace

When you have an ICE (panic in the compiler), you can set RUST_BACKTRACE=1 to get the stack trace of the panic! like in normal Rust programs. IIRC backtraces don't work on MinGW, sorry. If you have trouble or the backtraces are full of unknown, you might want to find some way to use Linux, Mac, or MSVC on Windows.

In the default configuration, you don't have line numbers enabled, so the backtrace looks like this:

stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::begin_panic
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  32: rustc_typeck::check_crate
  33: <std::thread::local::LocalKey<T>>::with
  34: <std::thread::local::LocalKey<T>>::with
  35: rustc::ty::context::TyCtxt::create_and_enter
  36: rustc_driver::driver::compile_input
  37: rustc_driver::run_compiler

If you want line numbers for the stack trace, you can enable debug = true in your config.toml and rebuild the compiler (debuginfo-level = 1 will also add line numbers, but debug = true gives full debuginfo). Then the backtrace will look like this:

stack backtrace:
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
             at /home/user/rust/src/librustc_typeck/check/cast.rs:110
   7: rustc_typeck::check::cast::CastCheck::check
             at /home/user/rust/src/librustc_typeck/check/cast.rs:572
             at /home/user/rust/src/librustc_typeck/check/cast.rs:460
             at /home/user/rust/src/librustc_typeck/check/cast.rs:370
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  33: rustc_driver::driver::compile_input
             at /home/user/rust/src/librustc_driver/driver.rs:1010
             at /home/user/rust/src/librustc_driver/driver.rs:212
  34: rustc_driver::run_compiler
             at /home/user/rust/src/librustc_driver/lib.rs:253

Getting a backtrace for errors

If you want to get a backtrace to the point where the compiler emits an error message, you can pass -Z treat-err-as-bug=n, which will make the compiler skip n errors or delay_span_bug calls and then panic on the next one. If you leave off =n, the compiler will assume 0 for n and thus panic on the first error it encounters.

This can also help when debugging delay_span_bug calls - it will make the first delay_span_bug call panic, which will give you a useful backtrace.

For example:

$ cat error.rs
fn main() {
    1 + ();
}
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc error.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: aborting due to previous error

$ # Now, where does the error above come from?
$ RUST_BACKTRACE=1 \
    ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc \
    error.rs \
    -Z treat-err-as-bug
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu

note: run with `RUST_BACKTRACE=1` for a backtrace

thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/src/librustc_errors/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
   7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
             ::report_selection_error
             at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:823
   8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
             ::report_fulfillment_errors
             at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:160
             at /home/user/rust/src/librustc_middle/traits/error_reporting.rs:112
   9: rustc_typeck::check::FnCtxt::select_obligations_where_possible
             at /home/user/rust/src/librustc_typeck/check/mod.rs:2192
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
  36: rustc_driver::run_compiler
             at /home/user/rust/src/librustc_driver/lib.rs:253
$ # Cool, now I have a backtrace for the error

Getting logging output

These crates are used in the compiler for logging:

  • log
  • env-logger: check the link to see the full RUSTC_LOG syntax

The compiler has a lot of debug! calls, which print out logging information at many points. These are very useful to at least narrow down the location of a bug if not to find it entirely, or just to orient yourself as to why the compiler is doing a particular thing.

To see the logs, you need to set the RUSTC_LOG environment variable to your log filter, e.g. to get the logs for a specific module, you can run the compiler as RUSTC_LOG=module::path rustc my-file.rs. All debug! output will then appear in standard error.

Note that unless you use a very strict filter, the logger will emit a lot of output, so use the most specific module(s) you can (comma-separated if multiple). It's typically a good idea to pipe standard error to a file and look at the log output with a text editor.

So, to put it together:

# This puts the output of all debug calls in `librustc_middle/traits` into
# standard error, which might fill your console backscroll.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs

# This puts the output of all debug calls in `librustc_middle/traits` in
# `traits-log`, so you can then see it with a text editor.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs 2>traits-log

# Not recommended. This will show the output of all `debug!` calls
# in the Rust compiler, and there are a *lot* of them, so it will be
# hard to find anything.
$ RUSTC_LOG=debug rustc +local my-file.rs 2>all-log

# This will show the output of all `info!` calls in `rustc_trans`.
#
# There's an `info!` statement in `trans_instance` that outputs
# every function that is translated. This is useful to find out
# which function triggers an LLVM assertion, and this is an `info!`
# log rather than a `debug!` log so it will work on the official
# compilers.
$ RUSTC_LOG=rustc_trans=info rustc +local my-file.rs

How to keep or remove debug! and trace! calls from the resulting binary

While calls to error!, warn! and info! are included in every build of the compiler, calls to debug! and trace! are only included in the program if debug-assertions=yes is turned on in config.toml (it is turned off by default). So if you don't see DEBUG logs, especially if you run the compiler with RUSTC_LOG=rustc rustc some.rs and only see INFO logs, make sure that debug-assertions=yes is turned on in your config.toml.

I also think that in some cases just setting it will not trigger a rebuild, so if you changed it and you already have a compiler built, you might want to call x.py clean to force one.

Logging etiquette and conventions

Because calls to debug! are removed by default, in most cases, don't worry about adding "unnecessary" calls to debug! and leaving them in code you commit - they won't slow down the performance of what we ship, and if they helped you pin down a bug, they will probably help someone else with a different one.

A loosely followed convention is to use debug!("foo(...)") at the start of a function foo and debug!("foo: ...") within the function. Another loosely followed convention is to use the {:?} format specifier for debug logs.
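
Here is a small, self-contained sketch of those conventions using the log and env-logger crates mentioned above (the function itself is hypothetical):

use log::debug;

// Entry log uses the `foo(...)` style; logs within use the `foo: ...`
// style, and both use the `{:?}` format specifier.
fn process_item(item: u32) -> u32 {
    debug!("process_item(item={:?})", item);
    let result = item * 2;
    debug!("process_item: result={:?}", result);
    result
}

fn main() {
    env_logger::init();
    process_item(21);
}

Running the program with RUST_LOG=debug prints both messages; without it, neither appears.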

One thing to be careful of is expensive operations in logs.

If in the module rustc::foo you have a statement

debug!("{:?}", random_operation(tcx));

Then if someone runs a debug rustc with RUSTC_LOG=rustc::bar, random_operation() will still run - the log record for rustc::foo gets filtered out, but only after its arguments have been evaluated.

This means that you should not put anything too expensive or likely to crash there - that would annoy anyone who wants to use logging for their own module. No-one will know it until someone tries to use logging to find another bug.
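
If you genuinely need an expensive value in a log statement, one mitigation (a sketch using the log crate's log_enabled! macro; the computation is a hypothetical stand-in) is to guard the call so the argument is only evaluated when debug logging is enabled at all:

use log::{debug, log_enabled, Level};

// Stand-in for a computation too expensive to run for every logging user.
fn expensive_summary() -> String {
    (0..1_000).map(|i| i.to_string()).collect::<Vec<_>>().join(",")
}

fn main() {
    env_logger::init();
    // The guard skips the expensive call unless debug logging is enabled.
    if log_enabled!(Level::Debug) {
        debug!("summary: {}", expensive_summary());
    }
}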

Formatting Graphviz output (.dot files)

Some compiler options for debugging specific features yield graphviz graphs - e.g. the #[rustc_mir(borrowck_graphviz_postflow="suffix.dot")] attribute dumps various borrow-checker dataflow graphs.

These all produce .dot files. To view these files, install graphviz (e.g. apt-get install graphviz) and then run the following commands:

$ dot -T pdf maybe_init_suffix.dot > maybe_init_suffix.pdf
$ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer

Narrowing (Bisecting) Regressions

The cargo-bisect-rustc tool can be used as a quick and easy way to find exactly which PR caused a change in rustc behavior. It automatically downloads rustc PR artifacts and tests them against a project you provide until it finds the regression. You can then look at the PR to get more context on why it was changed. See this tutorial on how to use it.

Downloading Artifacts from Rust's CI

The rustup-toolchain-install-master tool by kennytm can be used to download the artifacts produced by Rust's CI for a specific SHA1 -- this basically corresponds to the successful landing of some PR -- and then sets them up for your local use. This also works for artifacts produced by @bors try. This is helpful when you want to examine the resulting build of a PR without doing the build yourself.

Profiling the compiler

This section talks about how to profile the compiler and find out where it spends its time.

Depending on what you're trying to measure, there are several different approaches:

  • If you want to see if a PR improves or regresses compiler performance:

    • The rustc-perf project makes this easy and can be triggered to run on a PR via the @rustc-perf bot.
  • If you want a medium-to-high level overview of where rustc is spending its time:

    • The -Zself-profile flag and measureme tools offer a query-based approach to profiling. See their docs for more information.
  • If you want function level performance data or even just more details than the above approaches:

    • Consider using a native code profiler such as perf.

Profiling with perf

This is a guide for how to profile rustc with perf.

Initial steps

  • Get a clean checkout of rust-lang/rust master, or whatever it is you want to profile.
  • Set the following settings in your config.toml:
    • debuginfo-level = 1 - enables line debuginfo
    • jemalloc = false - lets you do memory use profiling with valgrind
    • leave everything else the defaults
  • Run ./x.py build to get a full build
  • Make a rustup toolchain pointing to that result

Gathering a perf profile

perf is an excellent tool on linux that can be used to gather and analyze all kinds of information. Mostly it is used to figure out where a program spends its time. It can also be used for other sorts of events, though, like cache misses and so forth.

The basics

The basic perf command is this:

perf record -F99 --call-graph dwarf XXX

The -F99 tells perf to sample at 99 Hz, which avoids generating too much data for longer runs (why 99 Hz you ask? It is often chosen because it is unlikely to be in lockstep with other periodic activity). The --call-graph dwarf tells perf to get call-graph information from debuginfo, which is accurate. The XXX is the command you want to profile. So, for example, you might do:

perf record -F99 --call-graph dwarf cargo +<toolchain> rustc

to run cargo -- here <toolchain> should be the name of the toolchain you made in the beginning. But there are some things to be aware of:

  • You probably don't want to profile the time spent building dependencies. So something like cargo build; cargo clean -p $C may be helpful (where $C is the crate name)
    • Though usually I just do touch src/lib.rs and rebuild instead. =)
  • You probably don't want incremental messing about with your profile. So something like CARGO_INCREMENTAL=0 can be helpful.

Gathering a perf profile from a perf.rust-lang.org test

Often we want to analyze a specific test from perf.rust-lang.org. To do that, the first step is to clone the rustc-perf repository:

git clone https://github.com/rust-lang/rustc-perf

Doing it the easy way

Once you've cloned the repo, you can use the collector executable to do profiling for you! You can find instructions in the rustc-perf readme.

For example, to measure the clap-rs test, you might do:

./target/release/collector                                      \
  --output-repo /path/to/place/output                           \
  profile perf-record                                           \
  --rustc /path/to/rustc/executable/from/your/build/directory   \
  --cargo `which cargo`                                         \
  --filter clap-rs                                              \
  --builds Check

You can also use that same command to use cachegrind or other profiling tools.

Doing it the hard way

If you prefer to run things manually, that is also possible. You first need to find the source for the test you want. Sources for the tests are found in the collector/benchmarks directory. So let's go into the directory of a specific test; we'll use clap-rs as an example:

cd collector/benchmarks/clap-rs

In this case, let's say we want to profile the cargo check performance. In that case, I would first run some basic commands to build the dependencies:

# Setup: first clean out any old results and build the dependencies:
cargo +<toolchain> clean
CARGO_INCREMENTAL=0 cargo +<toolchain> check

(Again, <toolchain> should be replaced with the name of the toolchain we made in the first step.)

Next: we want to record the execution time for just the clap-rs crate, running cargo check. I tend to use cargo rustc for this, since it also allows me to add explicit flags, which we'll do later on.

touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib

Note that final command: it's a doozy! It uses the cargo rustc command, which executes rustc with (potentially) additional options; the --profile check and --lib options specify that we are doing a cargo check execution, and that this is a library (not a binary).

At this point, we can use perf tooling to analyze the results. For example:

perf report

will open up an interactive TUI program. In simple cases, that can be helpful. For more detailed examination, the perf-focus tool can be helpful; it is covered below.

A note of caution. Each of the rustc-perf tests is its own special snowflake. In particular, some of them are not libraries, in which case you would want to do touch src/main.rs and avoid passing --lib. I'm not sure how best to tell which test is which to be honest.

Gathering NLL data

If you want to profile an NLL run, you can just pass extra options to the cargo rustc command, like so:

touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir

Analyzing a perf profile with perf focus

Once you've gathered a perf profile, we want to get some information about it. For this, I personally use perf focus. It's a kind of simple but useful tool that lets you answer queries like:

  • "how much time was spent in function F" (no matter where it was called from)
  • "how much time was spent in function F when it was called from G"
  • "how much time was spent in function F excluding time spent in G"
  • "what functions does F call and how much time does it spend in them"

To understand how it works, you have to know just a bit about perf. Basically, perf works by sampling your process on a regular basis (or whenever some event occurs). For each sample, perf gathers a backtrace. perf focus lets you write a regular expression that tests which functions appear in that backtrace, and then tells you which percentage of samples had a backtrace that met the regular expression. It's probably easiest to explain by walking through how I would analyze NLL performance.

Installing perf-focus

You can install perf-focus using cargo install:

cargo install perf-focus

Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know how much time it is spending in the MIR borrow-checker. The "main" function of the MIR borrowck is called do_mir_borrowck, so we can do this command:

$ perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%

The '{do_mir_borrowck}' argument is called the matcher. It specifies the test to be applied on the backtrace. In this case, the {X} indicates that there must be some function on the backtrace that meets the regular expression X. In this case, that regex is just the name of the function we want (in fact, it's a subset of the name; the full name includes a bunch of other stuff, like the module path). In this mode, perf-focus just prints out the percentage of samples where do_mir_borrowck was on the stack: in this case, 29%.

A note about c++filt. To get the data from perf, perf focus currently executes perf script (perhaps there is a better way...). I've sometimes found that perf script outputs C++ mangled names. This is annoying. You can tell by running perf script | head yourself — if you see names like 5rustc6middle instead of rustc::middle, then you have the same problem. You can solve this by doing:

perf script | c++filt | perf focus --from-stdin ...

This will pipe the output from perf script through c++filt and should mostly convert those names into a more friendly format. The --from-stdin flag to perf focus tells it to get its data from stdin, rather than executing perf script itself. We should make this more convenient (at worst, maybe add a c++filt option to perf focus, or just always use it -- it's pretty harmless).

Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the trait checker. We can ask this using a more complex regex:

$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%

Here we used the .. operator to ask "how often do we have do_mir_borrowck on the stack and then, later, some function whose name begins with rustc::traits?" (basically, code in that module). It turns out the answer is "almost never" -- only 12 samples fit that description (if you ever see no samples, that often indicates your query is messed up).

If you're curious, you can find out exactly which samples by using the --print-match option. This will print out the full backtrace for each sample. The | at the front of the line indicates the part that the regular expression matched.

Example: Where does MIR borrowck spend its time?

Often we want to do more "explorational" queries. Like, we know that MIR borrowck is 29% of the time, but where does that time get spent? For that, the --tree-callees option is often the best tool. You usually also want to give --tree-min-percent or --tree-max-depth. The result looks like this:

$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)

What happens with --tree-callees is that

  • we find each sample matching the regular expression
  • we look at the code that occurs after the regex match and try to build up a call tree

The --tree-min-percent 3 option says "only show me things that take more than 3% of the time". Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. --tree-max-depth can be useful too; it just limits how many levels we print.

For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in just that function and not some callee of that function (self). Usually "total" is the more interesting number, but not always.

Relative percentages

By default, all percentages in perf-focus are relative to the total program execution. This is useful to help you keep perspective -- often as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages between different queries are easily compared against one another.

That said, sometimes it's useful to get relative percentages, so perf focus offers a --relative option. In this case, the percentages are listed only for samples that match (vs all samples). So for example we could get our percentages relative to the borrowck itself like so:

$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]

Here you see that compute_regions came up as "47% total" — that means that 47% of do_mir_borrowck is spent in that function. Before, we saw 20% — that's because do_mir_borrowck itself is only 43% of the total time (and .47 * .43 = .20).

Coding conventions

This chapter offers some tips on the coding conventions for rustc. It covers formatting, coding for correctness, using crates from crates.io, and some tips on structuring your PR for easy review.

Formatting and the tidy script

rustc is slowly moving towards the Rust standard coding style; at the moment, however, it follows a rather more chaotic style. We do have some mandatory formatting conventions, which are automatically enforced by a script we affectionately call the "tidy" script. The tidy script runs automatically when you do ./x.py test and can be run in isolation with ./x.py test tidy.

Copyright notice

In the past, files began with a copyright and license notice. Please omit this notice for new files licensed under the standard terms (dual MIT/Apache-2.0).

All of the copyright notices should be gone by now, but if you come across one in the rust-lang/rust repo, feel free to open a PR to remove it.

Line length

Lines should be at most 100 characters. It's even better if you can keep things to 80.

Ignoring the line length limit. Sometimes – in particular for tests – it can be necessary to exempt yourself from this limit. In that case, you can add a comment towards the top of the file like so:


// ignore-tidy-linelength

Tabs vs spaces

Prefer 4-space indent.

Coding for correctness

Beyond formatting, there are a few other tips that are worth following.

Prefer exhaustive matches

Using _ in a match is convenient, but it means that when new variants are added to the enum, they may not get handled correctly. Ask yourself: if a new variant were added to this enum, what's the chance that it would want to use the _ code, versus having some other treatment? Unless the answer is "low", then prefer an exhaustive match. (The same advice applies to if let and while let, which are effectively tests for a single variant.)
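
For instance, here is a made-up sketch of the difference (the enum and functions are illustrative, not compiler code):

enum TerminatorKind {
    Goto,
    Return,
    Call,
}

// With a wildcard arm, a newly added variant (say, `Yield`) silently falls
// into the `_` case, which may well be the wrong treatment for it:
fn is_block_exit_wildcard(kind: &TerminatorKind) -> bool {
    match kind {
        TerminatorKind::Return => true,
        _ => false,
    }
}

// With an exhaustive match, adding a new variant is a compile error here,
// forcing the author to decide how the new case should be handled:
fn is_block_exit_exhaustive(kind: &TerminatorKind) -> bool {
    match kind {
        TerminatorKind::Return => true,
        TerminatorKind::Goto | TerminatorKind::Call => false,
    }
}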

Use "TODO" comments for things you don't want to forget

As a useful tool to yourself, you can insert a // TODO comment for something that you want to get back to before you land your PR:

fn do_something() {
    if something_else {
        unimplemented!(); // TODO write this
    }
}

The tidy script will report an error for a // TODO comment, so this code would not be able to land until the TODO is fixed (or removed).

This can also be useful in a PR as a way to signal from one commit that you are leaving a bug that a later commit will fix:

if foo {
    return true; // TODO wrong, but will be fixed in a later commit
}

Using crates from crates.io

It is allowed to use crates from crates.io, though external dependencies should not be added gratuitously. All such crates must have a suitably permissive license. There is an automatic check which inspects the Cargo metadata to ensure this.

How to structure your PR

How you prepare the commits in your PR can make a big difference for the reviewer. Here are some tips.

Isolate "pure refactorings" into their own commit. For example, if you rename a method, then put that rename into its own commit, along with the renames of all the uses.

More commits is usually better. If you are doing a large change, it's almost always better to break it up into smaller steps that can be independently understood. The one thing to be aware of is that if you introduce some code following one strategy, then change it dramatically (versus adding to it) in a later commit, that 'back-and-forth' can be confusing.

Only run rustfmt on new content. One day, we might enforce formatting for the rust-lang/rust repo. Meanwhile, we prefer that rustfmt not be run on existing code as that will generate large diffs and will make git blame harder to sift through. However, running rustfmt on new content, e.g. a new file or a largely new part of a file, is ok. Small formatting adjustments to nearby code that you are already changing for other purposes are also ok.

No merges. We do not allow merge commits into our history, other than those by bors. If you get a merge conflict, rebase instead via a command like git rebase -i rust-lang/master (presuming you use the name rust-lang for your remote).

Individual commits do not have to build (but it's nice). We do not require that every intermediate commit successfully builds – we only expect to be able to bisect at a PR level. However, if you can make individual commits build, that is always helpful.

Naming conventions

Apart from normal Rust style/naming conventions, there are also some specific to the compiler.

  • cx tends to be short for "context" and is often used as a suffix. For example, tcx is a common name for the Typing Context.

  • 'tcx is used as the lifetime name for the Typing Context.

  • Because crate is a keyword, if you need a variable to represent something crate-related, often the spelling is changed to krate.
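
As a tiny, made-up illustration of that last convention:

struct Crate {
    name: String,
}

// `crate` is a keyword, so compiler code conventionally writes `krate`
// for crate-valued variables and parameters:
fn crate_name(krate: &Crate) -> &str {
    &krate.name
}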

crates.io Dependencies

The rust compiler supports building with some dependencies from crates.io. For example, log and env_logger come from crates.io.

In general, you should avoid adding dependencies to the compiler for several reasons:

  • The dependency may not be high quality or well-maintained, whereas we want the compiler to be high-quality.
  • The dependency may not be using a compatible license.
  • The dependency may have transitive dependencies that have one of the above problems.

TODO: what is the vetting process?

Whitelist

The tidy tool has a whitelist of crates that are allowed. To add a dependency that is not already in the compiler, you will need to add it to this whitelist.

Emitting Errors and other Diagnostics

A lot of effort has been put into making rustc have great error messages. This chapter is about how to emit compile errors and lints from the compiler.

Span

Span is the primary data structure in rustc used to represent a location in the code being compiled. Spans are attached to most constructs in HIR and MIR, allowing for more informative error reporting.

A Span can be looked up in a SourceMap to get a "snippet" useful for displaying errors with span_to_snippet and other similar methods on the SourceMap.

Error messages

The rustc_errors crate defines most of the utilities used for reporting errors.

Session and ParseSess have methods (or fields with methods) that allow reporting errors. These methods usually have names like span_err or struct_span_err or span_warn, etc... There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc.

In general, there are two classes of such methods: ones that emit an error directly and ones that allow finer control over what to emit. For example, span_err emits the given error message at the given Span, but struct_span_err instead returns a DiagnosticBuilder.

DiagnosticBuilder allows you to add related notes and suggestions to an error before emitting it by calling the emit method. (Failing to either emit or cancel a DiagnosticBuilder will result in an ICE.) See the docs for more info on what you can do.

// Get a DiagnosticBuilder. This does _not_ emit an error yet.
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

// In some cases, you might need to check if `sp` is generated by a macro to
// avoid printing weird errors about macro-generated code.

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Use the snippet to generate a suggested fix
    err.span_suggestion(suggestion_sp, "try using a qux here", format!("qux {}", snippet));
} else {
    // If we weren't able to generate a snippet, then emit a "help" message
    // instead of a concrete "suggestion". In practice this is unlikely to be
    // reached.
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

// emit the error
err.emit();

Suggestions

In addition to telling the user exactly why their code is wrong, it's oftentimes furthermore possible to tell them how to fix it. To this end, DiagnosticBuilder offers a structured suggestions API, which formats code suggestions pleasingly in the terminal, or (when the --error-format json flag is passed) as JSON for consumption by tools, most notably the Rust Language Server and rustfix.

Not all suggestions should be applied mechanically. Use the span_suggestion method of DiagnosticBuilder to make a suggestion. The last argument provides a hint to tools whether the suggestion is mechanically applicable or not.

For example, to make our qux suggestion machine-applicable, we would do:

let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    err.span_suggestion(
        suggestion_sp,
        "try using a qux here",
        format!("qux {}", snippet),
        Applicability::MachineApplicable,
    );
} else {
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

err.emit();

This might emit an error like

$ rustc mycode.rs
error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^ help: try using a qux here: `qux sad()`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

In some cases, like when the suggestion spans multiple lines or when there are multiple suggestions, the suggestions are displayed on their own:

error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^
help: try using a qux here:
  |
3 |     qux sad()
  |     ^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

The possible values of Applicability are:

  • MachineApplicable: Can be applied mechanically.
  • HasPlaceholders: Cannot be applied mechanically because it has placeholder text in the suggestions. For example, "Try adding a type: `let x: <type>`".
  • MaybeIncorrect: Cannot be applied mechanically because the suggestion may or may not be a good one.
  • Unspecified: Cannot be applied mechanically because we don't know which of the above cases it falls into.
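
For instance, a suggestion with placeholder text might be emitted like the following sketch (the spans and messages are made up, in the style of the qux example above):

// `<type>` must be filled in by the user, so tools like rustfix must not
// apply this suggestion mechanically:
err.span_suggestion(
    suggestion_sp,
    "try adding a type",
    String::from("let x: <type> = ..."),
    Applicability::HasPlaceholders,
);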

Lints

The compiler linting infrastructure is defined in the rustc::lint module.

Declaring a lint

The built-in compiler lints are defined in the rustc_lint crate.

Every lint is implemented via a struct that implements the LintPass trait (you also implement one of the more specific lint pass traits, either EarlyLintPass or LateLintPass). The trait implementation allows you to check certain syntactic constructs as the linter walks the source code. You can then choose to emit lints in a very similar way to compile errors.

You also declare the metadata of a particular lint via the declare_lint! macro. This includes the name, the default level, a short description, and some more details.

Note that the lint and the lint pass must be registered with the compiler.

For example, the following lint checks for uses of while true { ... } and suggests using loop { ... } instead.

// Declare a lint called `WHILE_TRUE`
declare_lint! {
    WHILE_TRUE,

    // warn-by-default
    Warn,

    // This string is the lint description
    "suggest using `loop { }` instead of `while true { }`"
}

// Define a struct and `impl LintPass` for it.
#[derive(Copy, Clone)]
pub struct WhileTrue;

// This declares a lint pass, providing a list of associated lints.  The
// compiler currently doesn't use the associated lints directly (e.g., to not
// run the pass or otherwise check that the pass emits the appropriate set of
// lints). However, it's good to be accurate here as it's possible that we're
// going to register the lints via the get_lints method on our lint pass (that
// this macro generates).
impl_lint_pass!(
    WhileTrue => [WHILE_TRUE],
);

// LateLintPass has lots of methods. We only override the definition of
// `check_expr` for this lint because that's all we need, but you could
// override other methods for your own lint. See the rustc docs for a full
// list of methods.
impl<'a, 'tcx> LateLintPass<'a, 'tcx> for WhileTrue {
    fn check_expr(&mut self, cx: &LateContext, e: &hir::Expr) {
        if let hir::ExprWhile(ref cond, ..) = e.node {
            if let hir::ExprLit(ref lit) = cond.node {
                if let ast::LitKind::Bool(true) = lit.node {
                    if lit.span.ctxt() == SyntaxContext::empty() {
                        let msg = "denote infinite loops with `loop { ... }`";
                        let condition_span = cx.tcx.sess.source_map().def_span(e.span);
                        let mut err = cx.struct_span_lint(WHILE_TRUE, condition_span, msg);
                        err.span_suggestion_short(condition_span, "use `loop`", "loop".to_owned());
                        err.emit();
                    }
                }
            }
        }
    }
}

Edition-gated Lints

Sometimes we want to change the behavior of a lint in a new edition. To do this, we just add the transition to our invocation of declare_lint!:

declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    Edition::Edition2018 => Warn,
}

This makes the ANONYMOUS_PARAMETERS lint allow-by-default in the 2015 edition but warn-by-default in the 2018 edition.

A future-incompatible lint should be declared with the @future_incompatible additional "field":

declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    @future_incompatible = FutureIncompatibleInfo {
        reference: "issue #41686 <https://github.com/rust-lang/rust/issues/41686>",
        edition: Some(Edition::Edition2018),
    };
}

If you need a combination of options that's not supported by the declare_lint! macro, you can always define your own static with a type of &Lint but this is currently linted against in the compiler tree.

Guidelines for creating a future incompatibility lint

  • Create a lint defaulting to warn as normal, with ideally the same error message you would normally give.
  • Add a suitable reference, typically an RFC or tracking issue. Go ahead and include the full URL, sort items in ascending order of issue numbers.
  • Later, change lint to error.
  • Eventually, remove lint.

Lint Groups

Lints can be turned on in groups. These groups are declared in the register_builtins function in rustc_lint::lib. The add_lint_group! macro is used to declare a new group.

For example,

add_lint_group!(sess,
    "nonstandard_style",
    NON_CAMEL_CASE_TYPES,
    NON_SNAKE_CASE,
    NON_UPPER_CASE_GLOBALS);

This defines the nonstandard_style group which turns on the listed lints. A user can turn on these lints with a #![warn(nonstandard_style)] attribute in the source code, or by passing -W nonstandard-style on the command line.

Linting early in the compiler

On occasion, you may need to define a lint that runs before the linting system has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all.

To solve this problem, we buffer the lints until the linting system has been initialized. Session and ParseSess both have buffer_lint methods that allow you to buffer a lint for later. The linting system automatically takes care of handling buffered lints later.

Thus, to define a lint that runs early in the compilation, one defines a lint like normal but invokes the lint with buffer_lint.
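
In sketch form, that can look like the following (the lint and the exact buffer_lint signature here are illustrative; check the Session/ParseSess docs for the current form):

// During parsing we don't yet know lint levels, so instead of emitting the
// diagnostic directly we buffer it against the offending node; the linting
// system will emit or suppress it once lint levels have been computed.
sess.buffer_lint(
    lint::builtin::BARE_TRAIT_OBJECTS, // the lint being reported
    ast_node_id,                       // the node the lint is attached to
    span,                              // where to point the diagnostic
    "trait objects without an explicit `dyn` are deprecated",
);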

Linting even earlier in the compiler

The parser (librustc_ast) is interesting in that it cannot have dependencies on any of the other librustc* crates. In particular, it cannot depend on librustc_middle::lint or librustc_lint, where all of the compiler linting infrastructure is defined. That's troublesome!

To solve this, librustc_ast defines its own buffered lint type, which ParseSess::buffer_lint uses. After macro expansion, these buffered lints are then dumped into the Session::buffered_lints used by the rest of the compiler.

JSON diagnostic output

The compiler accepts an --error-format json flag to output diagnostics as JSON objects (for the benefit of tools such as cargo fix or the RLS). It looks like this—

$ rustc json_error_demo.rs --error-format json
{"message":"cannot add `&str` to `{integer}`","code":{"code":"E0277","explanation":"\nYou tried to use a type which doesn't implement some trait in a place which\nexpected that trait. Erroneous code example:\n\n```compile_fail,E0277\n// here we declare the Foo trait with a bar method\ntrait Foo {\n    fn bar(&self);\n}\n\n// we now declare a function which takes an object implementing the Foo trait\nfn some_func<T: Foo>(foo: T) {\n    foo.bar();\n}\n\nfn main() {\n    // we now call the method with the i32 type, which doesn't implement\n    // the Foo trait\n    some_func(5i32); // error: the trait bound `i32 : Foo` is not satisfied\n}\n```\n\nIn order to fix this error, verify that the type you're using does implement\nthe trait. Example:\n\n```\ntrait Foo {\n    fn bar(&self);\n}\n\nfn some_func<T: Foo>(foo: T) {\n    foo.bar(); // we can now use this method since i32 implements the\n               // Foo trait\n}\n\n// we implement the trait on the i32 type\nimpl Foo for i32 {\n    fn bar(&self) {}\n}\n\nfn main() {\n    some_func(5i32); // ok!\n}\n```\n\nOr in a generic context, an erroneous code example would look like:\n\n```compile_fail,E0277\nfn some_func<T>(foo: T) {\n    println!(\"{:?}\", foo); // error: the trait `core::fmt::Debug` is not\n                           //        implemented for the type `T`\n}\n\nfn main() {\n    // We now call the method with the i32 type,\n    // which *does* implement the Debug trait.\n    some_func(5i32);\n}\n```\n\nNote that the error here is in the definition of the generic function: Although\nwe only call it with a parameter that does implement `Debug`, the compiler\nstill rejects the function: It must work with all possible input types. In\norder to make this example compile, we need to restrict the generic type we're\naccepting:\n\n```\nuse std::fmt;\n\n// Restrict the input type to types that implement Debug.\nfn some_func<T: fmt::Debug>(foo: T) {\n    println!(\"{:?}\", foo);\n}\n\nfn main() {\n    // Calling the method is still fine, as i32 implements Debug.\n    some_func(5i32);\n\n    // This would fail to compile now:\n    // struct WithoutDebug;\n    // some_func(WithoutDebug);\n}\n```\n\nRust only looks at the signature of the called function, as such it must\nalready specify all requirements that will be used for every type parameter.\n"},"level":"error","spans":[{"file_name":"json_error_demo.rs","byte_start":50,"byte_end":51,"line_start":4,"line_end":4,"column_start":7,"column_end":8,"is_primary":true,"text":[{"text":"    a + b","highlight_start":7,"highlight_end":8}],"label":"no implementation for `{integer} + &str`","suggested_replacement":null,"suggestion_applicability":null,"expansion":null}],"children":[{"message":"the trait `std::ops::Add<&str>` is not implemented for `{integer}`","code":null,"level":"help","spans":[],"children":[],"rendered":null}],"rendered":"error[E0277]: cannot add `&str` to `{integer}`\n --> json_error_demo.rs:4:7\n  |\n4 |     a + b\n  |       ^ no implementation for `{integer} + &str`\n  |\n  = help: the trait `std::ops::Add<&str>` is not implemented for `{integer}`\n\n"}
{"message":"aborting due to previous error","code":null,"level":"error","spans":[],"children":[],"rendered":"error: aborting due to previous error\n\n"}
{"message":"For more information about this error, try `rustc --explain E0277`.","code":null,"level":"","spans":[],"children":[],"rendered":"For more information about this error, try `rustc --explain E0277`.\n"}

Note that the output is a series of lines, each of which is a JSON object, but the series of lines taken together is, unfortunately, not valid JSON, thwarting tools and tricks (such as piping to python3 -m json.tool) that require such. (One speculates that this was intentional for LSP performance purposes, so that each line/object can be sent to RLS as it is flushed?)

Also note the "rendered" field, which contains the "human" output as a string; this was introduced so that UI tests could both make use of the structured JSON and see the "human" output (well, sans colors) without having to compile everything twice.

The "human" readable and the json format emitter can be found under librustc_errors, both were moved from the librustc_ast crate to the librustc_errors crate.

The JSON emitter defines its own Diagnostic struct (and sub-structs) for the JSON serialization. Don't confuse this with errors::Diagnostic!

#[rustc_on_unimplemented(...)]

The #[rustc_on_unimplemented] attribute allows trait definitions to add specialized notes to error messages when an implementation was expected but not found. You can refer to the trait's generic arguments by name and to the resolved type using Self.

For example:

#![feature(rustc_attrs)]

#[rustc_on_unimplemented="an iterator over elements of type `{A}` \
    cannot be built from a collection of type `{Self}`"]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}

fn iterate_chars<I: MyIterator<char>>(i: I) {
    // ...
}

fn main() {
    iterate_chars(&[1, 2, 3][..]);
}

When the user compiles this, they will see the following:

error[E0277]: the trait bound `&[{integer}]: MyIterator<char>` is not satisfied
  --> <anon>:14:5
   |
14 |     iterate_chars(&[1, 2, 3][..]);
   |     ^^^^^^^^^^^^^ an iterator over elements of type `char` cannot be built from a collection of type `&[{integer}]`
   |
   = help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
   = note: required by `iterate_chars`

rustc_on_unimplemented also supports advanced filtering for better targeting of messages, as well as modifying specific parts of the error message. You target the text of:

  • the main error message (message)
  • the label (label)
  • an extra note (note)

For example, the following attribute

#[rustc_on_unimplemented(
    message="message",
    label="label",
    note="note"
)]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}

would generate the following output:

error[E0277]: message
  --> <anon>:14:5
   |
14 |     iterate_chars(&[1, 2, 3][..]);
   |     ^^^^^^^^^^^^^ label
   |
   = note: note
   = help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
   = note: required by `iterate_chars`

To allow more targeted error messages, it is possible to filter the application of these fields based on a variety of attributes when using on:

  • crate_local: whether the code causing the trait bound to not be fulfilled is part of the user's crate. This is used to avoid suggesting code changes that would require modifying a dependency.
  • Any of the generic arguments that can be substituted in the text can be referred by name as well for filtering, like Rhs="i32", except for Self.
  • _Self: to filter only on a particular calculated trait resolution, like Self="std::iter::Iterator<char>". This is needed because Self is a keyword which cannot appear in attributes.
  • direct: user-specified rather than derived obligation.
  • from_method: usable both as boolean (whether the flag is present, like crate_local) or matching against a particular method. Currently used for try.
  • from_desugaring: usable both as boolean (whether the flag is present) or matching against a particular desugaring. The desugaring is identified with its variant name in the DesugaringKind enum.

For example, the Iterator trait can be annotated in the following way:

#[rustc_on_unimplemented(
    on(
        _Self="&str",
        note="call `.chars()` or `.as_bytes()` on `{Self}`"
    ),
    message="`{Self}` is not an iterator",
    label="`{Self}` is not an iterator",
    note="maybe try calling `.iter()` or a similar method"
)]
pub trait Iterator {}

Which would produce the following outputs:

error[E0277]: `Foo` is not an iterator
 --> src/main.rs:4:16
  |
4 |     for foo in Foo {}
  |                ^^^ `Foo` is not an iterator
  |
  = note: maybe try calling `.iter()` or a similar method
  = help: the trait `std::iter::Iterator` is not implemented for `Foo`
  = note: required by `std::iter::IntoIterator::into_iter`

error[E0277]: `&str` is not an iterator
 --> src/main.rs:5:16
  |
5 |     for foo in "" {}
  |                ^^ `&str` is not an iterator
  |
  = note: call `.chars()` or `.as_bytes()` on `&str`
  = help: the trait `std::iter::Iterator` is not implemented for `&str`
  = note: required by `std::iter::IntoIterator::into_iter`

If you need to filter on multiple attributes, you can use all, any or not in the following way:

#[rustc_on_unimplemented(
    on(
        all(_Self="&str", T="std::string::String"),
        note="you can coerce a `{T}` into a `{Self}` by writing `&*variable`"
    )
)]
pub trait From<T>: Sized { /* ... */ }

Lints

This page documents some of the machinery around lint registration and how we run lints in the compiler.

The LintStore is the central piece of infrastructure, around which everything rotates. It's not available during the early parts of compilation (i.e., before TyCtxt) in most code, as we need to fill it in with all of the lints, which can only happen after plugin registration.

Lints vs. lint passes

There are two parts to the linting mechanism within the compiler: lints and lint passes. Unfortunately, a lot of the documentation we have refers to both of these as just "lints."

First, we have the lint declarations themselves: this is where the name and default lint level and other metadata come from. These are normally defined by way of the declare_lint! macro, which boils down to a static with type &rustc::lint::Lint. We lint against direct declarations without the use of the macro today (though this may change in the future, as the macro is somewhat unwieldy to add new fields to, like all macros by example).

Lint declarations don't carry any "state" - they are merely global identifiers and descriptions of lints. We assert at runtime that they are not registered twice (by lint name).

Lint passes are the meat of any lint. Notably, there is not a one-to-one relationship between lints and lint passes; a lint might not have any lint pass that emits it, it could have many, or just one -- the compiler doesn't track whether a pass is in any way associated with a particular lint, and frequently lints are emitted as part of other work (e.g., type checking, etc.).

Registration

High-level overview

The lint store is created and all lints are registered during plugin registration, in rustc_interface::register_plugins. There are three 'sources' of lint: the internal lints, plugin lints, and rustc_interface::Config register_lints. All are registered here, in register_plugins.

Once the registration is complete, we "freeze" the lint store by placing it in an Lrc. Later in the driver, it's passed into the GlobalCtxt constructor where it lives in an immutable form from then on.

Lints are registered via the LintStore::register_lint function. This should happen just once for any lint, or an ICE will occur.

Lint passes are registered separately into one of the categories (pre-expansion, early, late, late module). Passes are registered as a closure -- i.e., impl Fn() -> Box<dyn X>, where dyn X is either an early or late lint pass trait object. When we run the lint passes, we run the closure and then invoke the lint pass methods, which take &mut self -- lint passes can keep track of state internally.
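
In sketch form (MY_LINT and MyLintPass are placeholders; the real registration calls live on LintStore in rustc_lint):

// Lint declarations are registered once, by their static definitions...
lint_store.register_lints(&[&MY_LINT]);

// ...while passes are registered as constructor closures. Each run of the
// lint machinery calls the closure to get a fresh pass object, and the
// pass methods take `&mut self`, so a pass can keep internal state:
lint_store.register_early_pass(|| Box::new(MyLintPass::default()));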

Internal lints

Note that these include both rustc-internal lints and the traditional lints, such as the dead code lint.

These are primarily described in two places: rustc::lint::builtin and rustc_lint::builtin. The first provides the definitions for the lints themselves, and the latter provides the lint pass definitions (and implementations).

The internal lint registration happens in the rustc_lint::register_builtins function, along with the rustc_lint::register_internals function. More generally, the LintStore "constructor" function which is the way to get a LintStore in the compiler (you should not construct it directly) is rustc_lint::new_lint_store; it calls the registration functions.

Plugin lints

This is one of the primary use cases remaining for plugins/drivers. Plugins are given access to the mutable LintStore during registration to call any functions they need on the LintStore, just like rustc code. Plugins are intended to declare lints with the plugin field set to true (e.g., by way of the declare_tool_lint! macro), but this is purely for diagnostics and help text; otherwise plugin lints are mostly just as first class as rustc builtin lints.

Driver lints

These are the lints provided by drivers via the rustc_interface::Config register_lints field, which is a callback. Drivers should, if finding it already set, call the function currently set within the callback they add. The best way for drivers to get access to this is by overriding the Callbacks::config function which gives them direct access to the Config structure.
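
A sketch of that pattern (assuming the rustc_interface::Config shape described above; the important part is chaining to the previously installed callback):

struct MyCallbacks;

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::interface::Config) {
        // Take whatever callback was installed before us (if any) and
        // call it first, so lints registered by others are not lost.
        let previous = config.register_lints.take();
        config.register_lints = Some(Box::new(move |sess, lint_store| {
            if let Some(previous) = &previous {
                (previous)(sess, lint_store);
            }
            // ... register our own lints and passes on `lint_store` ...
        }));
    }
}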

Compiler lint passes are combined into one pass

Within the compiler, for performance reasons, we usually do not register dozens of lint passes. Instead, we have a single lint pass of each variety (e.g. BuiltinCombinedModuleLateLintPass) which will internally call all of the individual lint passes; this is because then we get the benefits of static over dynamic dispatch for each of the (often empty) trait methods.

Ideally, we'd not have to do this, since it certainly adds to the complexity of understanding the code. However, with the current type-erased lint store approach, it is beneficial to do so for performance reasons.

New lints being added likely want to join one of the existing declarations like late_lint_mod_passes in librustc_lint/lib.rs, which would then auto-propagate into the other.

Diagnostic Codes

We generally try to assign each error message a unique code like E0123. These codes are defined in the compiler in the diagnostics.rs files found in each crate, which basically consist of macros. The codes come in two varieties: those that have an extended write-up, and those that do not. Whenever possible, if you are making a new code, you should write an extended write-up.

Allocating a fresh code

If you want to create a new error, you first need to find the next available code. This is a bit tricky since the codes are defined in various crates. To do it, run this obscure command:

./x.py test --stage 0 tidy

This will invoke the tidy script, which generally checks that your code obeys our coding conventions. One of those jobs is to check that diagnostic codes are indeed unique. Once it is finished with that, tidy will print out the lowest unused code:

...
tidy check (x86_64-apple-darwin)
* 470 error codes
* highest error code: E0591
...

Here we see the highest error code in use is E0591, so we probably want E0592. To be sure, run rg E0592 and check; you should see no references.

Next, open src/{crate}/diagnostics.rs within the crate where you wish to issue the error (e.g., src/librustc_typeck/diagnostics.rs). Ideally, you will add the code (in its proper numerical order) into the register_long_diagnostics! macro, sort of like this:


register_long_diagnostics! {
    ...
    E0592: r##"
Your extended error text goes here!
"##,
}

But you can also add it without an extended description:


register_diagnostics! {
    ...
    E0592, // put a description here
}

To actually issue the error, you can use the struct_span_err! macro:


struct_span_err!(self.tcx.sess, // some path to the session here
                 span, // whatever span in the source you want
                 E0592, // your new error code
                 &format!("text of the error"))
    .emit() // actually issue the error

If you want to add notes or other snippets, you can invoke methods before you call .emit():


struct_span_err!(...)
    .span_label(another_span, "something to label in the source")
    .span_note(another_span, "some separate note, probably avoid these")
    .emit()

ICE-breakers

The ICE-breaker groups are an easy way to help out with rustc in a "piece-meal" fashion, without committing to a larger project. ICE-breaker groups are easy to join (just submit a PR!) and joining does not entail any particular commitment.

Once you join an ICE-breaker group, you will be added to a list that receives pings on github whenever a new issue is found that fits the ICE-breaker group's criteria. If you are interested, you can then claim the issue and start working on it.

Of course, you don't have to wait for new issues to be tagged! If you prefer, you can use the Github label for an ICE-breaker group to search for existing issues that haven't been claimed yet.

What issues are a good fit for ICE-breaker groups?

"ICE-breaker issues" are intended to be isolated bugs of middle priority:

  • By isolated, we mean that we do not expect large-scale refactoring to be required to fix the bug.
  • By middle priority, we mean that we'd like to see the bug fixed, but it's not such a burning problem that we are dropping everything else to fix it. The danger with such bugs, of course, is that they can accumulate over time, and the role of the ICE-breaker groups is to try and stop that from happening!

Joining an ICE-breaker group

To join an ICE-breaker group, you just have to open a PR adding your Github username to the appropriate file in the Rust team repository. See the "example PRs" below to get a precise idea and to identify the file to edit.

Also, if you are not already a member of a Rust team then -- in addition to adding your name to the file -- you have to check out the repository and run the following command:

cargo run add-person $your_user_name

Example PRs:

Tagging an issue for an ICE-breaker group

To tag an issue as appropriate for an ICE-breaker group, you give rustbot a ping command with the name of the ICE-breakers team. For example:

@rustbot ping icebreakers-llvm
@rustbot ping icebreakers-cleanup-crew

To make these commands shorter and easier to remember, there are aliases, defined in the triagebot.toml file. For example:

@rustbot ping llvm
@rustbot ping cleanup

Keep in mind that these aliases are meant to make humans' life easier. They might be subject to change. If you need to ensure that a command will always be valid, prefer the full invocations over the aliases.

Note though that this should only be done by compiler team members or contributors, and is typically done as part of compiler team triage.

Cleanup Crew

Github Label: ICEBreaker-Cleanup-Crew

The "Cleanup Crew" are focused on improving bug reports. Specifically, the goal is to try to ensure that every bug report has all the information that will be needed for someone to fix it:

  • a minimal, standalone example that shows the problem
  • links to duplicates or related bugs
  • if the bug is a regression (something that used to work, but no longer does), then a bisection to the PR or nightly that caused the regression

This kind of cleanup is invaluable in getting bugs fixed. Better still, it can be done by anybody who knows Rust, without any particularly deep knowledge of the compiler.

Let's look a bit at the workflow for doing "cleanup crew" actions.

Finding a minimal, standalone example

Here the ultimate goal is to produce an example that reproduces the same problem but without relying on any external crates. Such a test ought to contain as little code as possible, as well. This will make it much easier to isolate the problem.

However, even if the "ultimate minimal test" cannot be achieved, it's still useful to post incremental minimizations. For example, if you can eliminate some of the external dependencies, that is helpful, and so forth.

It's particularly useful to reduce to an example that works in the Rust playground, rather than requiring people to check out a cargo build.

There are many resources for how to produce minimized test cases. Here are a few:

  • The rust-reduce tool can try to reduce code automatically.
    • The C-reduce tool also works on Rust code, though it requires that you start from a single file. (XXX link to some post explaining how to do it?)
  • pnkfelix's Rust Bug Minimization Patterns blog post
    • This post focuses on "heavy bore" techniques, where you are starting with a large, complex cargo project that you wish to narrow down to something standalone.

Links to duplicate or related bugs

If you are on the "Cleanup Crew", you will sometimes see multiple bug reports that seem very similar. You can link one to the other just by mentioning the other bug number in a Github comment. Sometimes it is useful to close duplicate bugs. But if you do so, you should always copy any test case from the bug you are closing to the other bug that remains open, as sometimes duplicate-looking bugs will expose different facets of the same problem.

Bisecting regressions

For regressions (something that used to work, but no longer does), it is super useful if we can figure out precisely when the code stopped working. The gold standard is to be able to identify the precise PR that broke the code, so we can ping the author, but even narrowing it down to a nightly build is helpful, especially as that then gives us a range of PRs. (One other challenge is that we sometimes land "rollup" PRs, which combine multiple PRs into one.)

cargo-bisect-rustc

To help in figuring out the cause of a regression we have a tool called cargo-bisect-rustc. It will automatically download and test various builds of rustc. For recent regressions, it is even able to use the builds from our CI to track down the regression to a specific PR; for older regressions, it will simply identify a nightly.

To learn to use cargo-bisect-rustc, check out this blog post, which gives a quick introduction to how it works. You can also ask questions at the Zulip stream #t-compiler/cargo-bisect-rustc, or help in improving the tool.

identifying the range of PRs in a nightly

If the regression occurred more than 90 days ago, then cargo-bisect-rustc will not be able to identify the particular PR that caused the regression, just the nightly build. In that case, we can identify the set of PRs that this corresponds to by using the git history.

The command rustc +nightly -vV will cause rustc to output a number of useful bits of version info, including the commit-hash. Given the commit-hashes of two nightly versions, you can find all of the PRs that have landed in between by taking the following steps:

  1. Go to an up-to-date checkout of the rust-lang/rust repository
  2. Execute the command git log --author=bors --format=oneline SHA1..SHA2
  • This will list out all of the commits by bors, which is our merge bot
  • Each commit corresponds to one PR, and information about the PR should be in the description
  3. Copy and paste that information into the bug report

Often, just eye-balling the PR descriptions (which are included in the commit messages) will give you a good idea which one likely caused the problem. But if you're unsure feel free to just ping the compiler team (@rust-lang/compiler) or else to ping the authors of the PR themselves.

LLVM ICE-breakers

Github Label: ICEBreaker-LLVM

The "LLVM ICE-breakers" are focused on bugs that center around LLVM. These bugs often arise because of LLVM optimizations gone awry, or as the result of an LLVM upgrade. The goal here is:

  • to determine whether the bug is a result of us generating invalid LLVM IR, or LLVM misoptimizing;
  • if the former, to fix our IR;
  • if the latter, to try and file a bug on LLVM (or identify an existing bug).

Helpful tips and options

The "Debugging LLVM" section of the rustc-dev-guide gives a step-by-step process for how to help debug bugs caused by LLVM. In particular, it discusses how to emit LLVM IR, run the LLVM IR optimization pipelines, and so forth. You may also find it useful to look at the various codegen options listed under -Chelp and the internal options under -Zhelp -- there are a number that pertain to LLVM (just search for LLVM).

If you do narrow to an LLVM bug

The "Debugging LLVM" section also describes what to do once you've identified the bug.

rust-lang/rust Licenses

The rustc compiler source and standard library are dual licensed under the Apache License v2.0 and the MIT License unless otherwise specified.

Detailed licensing information is available in the COPYRIGHT document of the rust-lang/rust repository.

Part 2: High-level Compiler Architecture

The remaining parts of this guide discuss how the compiler works. They go through everything from the high-level structure of the compiler to how each stage of compilation works. These chapters should be friendly both to readers interested in the end-to-end process of compilation and to readers who want to learn about the specific system they wish to contribute to. If anything is unclear, feel free to open an issue on the rustc-dev-guide repository or contact the compiler team, as described in the relevant chapter of Part 1.

In this part, we will look specifically at the high-level architecture of the compiler: in particular, the query system, incremental compilation, and interning. These are three overarching design choices that affect the whole compiler.

Overview of the Compiler

Coming soon! Work is in progress on this chapter. See https://github.com/rust-lang/rustc-dev-guide/pull/633 for the source and the project README for local build instructions.

High-level overview of the compiler source

Crate structure

The main Rust repository consists of a src directory, under which there live many crates. These crates contain the sources for the standard library and the compiler. This document, of course, focuses on the latter.

Rustc consists of a number of crates, including rustc_ast, rustc, rustc_target, rustc_codegen, rustc_driver, and many more. The source for each crate can be found in a directory like src/libXXX, where XXX is the crate name.

(Note: the names and divisions of these crates are not set in stone and may change over time. For the time being, we tend toward a finer-grained division to help with compilation time, though as incremental compilation improves, that may change.)

The dependency structure of these crates is roughly a diamond:

                  rustc_driver
                /      |       \
              /        |         \
            /          |           \
          /            v             \
rustc_codegen  rustc_borrowck   ...  rustc_metadata
          \            |            /
            \          |          /
              \        |        /
                \      v      /
                    rustc
                       |
                       v
                   rustc_ast
                    /    \
                  /       \
           rustc_span  rustc_builtin_macros

The rustc_driver crate, at the top of this lattice, is effectively the "main" function of the Rust compiler. It doesn't have much "real code", but instead ties together all of the code defined in the other crates and defines the overall flow of execution. (However, as we transition more and more to the query model, the "flow" of compilation is becoming less centrally defined.)

At the other extreme, the rustc crate defines the common and pervasive data structures that all the rest of the compiler uses (e.g. how to represent types, traits, and the program itself). It also contains some amount of the compiler itself, although that is relatively limited.

Finally, all the crates in the bulge in the middle define the bulk of the compiler -- they all depend on rustc, so that they can make use of the various types defined there, and they export public routines that rustc_driver will invoke as needed (more and more, what these crates export are "query definitions", but those are covered later on).

Below rustc lie the various crates that make up the parser and error reporting mechanism. They are also an internal part of the compiler (although they are in fact used by some other crates as well; we hope to phase that practice out over time).

The main stages of compilation

The Rust compiler is currently in a period of transition. It used to be a purely "pass-based" compiler, where we ran a number of passes over the entire program, each of which performed a particular check or transformation. We are gradually replacing this pass-based code with an alternative setup based on on-demand queries. In the query model, we work backwards from the end result: we execute a query that expresses our ultimate goal (e.g. "compile this crate"). This query in turn may make other queries (e.g. "get me a list of all modules in the crate"). Those queries make other queries that ultimately bottom out in the base operations, like parsing the input, running the type checker, and so forth. This on-demand model permits us to do exciting things like only do the minimal amount of work needed to type-check a single function. It also helps with incremental compilation. (For details on defining queries, check out the query model.)

Regardless of whether the compiler is organized as passes or queries, the basic operations it must perform are the same. The only thing that changes is whether these operations are invoked front-to-back or on demand. In order to compile a Rust crate, these are the general steps that we take:

  1. Parsing input

    • this processes the .rs files and produces the AST ("abstract syntax tree")
    • the AST is defined in src/librustc_ast/ast.rs. It is intended to match the lexical syntax of the Rust language quite closely.
  2. Name resolution, macro expansion, and configuration

    • once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also handles #[cfg] nodes, and thus may strip things out of the AST as well.
  3. Lowering to HIR

    • once name resolution completes, we convert the AST into HIR, or "high-level intermediate representation". The HIR is defined in src/librustc_middle/hir/; that module also includes the lowering code.
    • the HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for subsequent analyses. It is not required to match the syntax of the Rust language.
    • as a simple example, in the AST we preserve the parentheses that the user wrote, so ((1 + 2) + 3) and 1 + 2 + 3 parse into distinct trees, even though they are equivalent. In the HIR, however, parenthesized nodes are removed, and those two expressions are represented in the same way.
  4. Type-checking and subsequent analyses

    • an important step in processing the HIR is performing type checking. This process assigns types to every HIR expression, and is also responsible for resolving some "type-dependent" paths, such as field accesses (x.f -- we can't know what field f is being accessed until we know the type of x) and associated type references (T::Item -- we can't know what type Item is until we know what T is).
    • type checking creates "side tables" (TypeckTables) that include the types of expressions, the way methods were resolved, and so forth.
    • after type-checking, we can do other analyses, such as privacy checking.
  5. Lowering to MIR and post-processing

    • once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust, well suited to borrow checking and also to certain high-level optimizations.
  6. Translation to LLVM and LLVM optimization

    • from MIR, we can produce LLVM IR.
    • LLVM then runs its various optimizations, which produces a number of .o files (one for each "codegen unit").
  7. Linking

    • finally, those .o files are linked together.

Queries: demand-driven compilation

As described in the high-level overview of the compiler, the Rust compiler is currently transitioning from a traditional "pass-based" setup to a "demand-driven" system. The compiler query system is the key to our new demand-driven organization. The idea is pretty simple. You have various queries that compute things about the input -- for example, there is a query called type_of(def_id) that, given the def-id of some item, will compute the type of that item and return it to you.

Query execution is "memoized" -- so the first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into "incremental computation"; the idea is roughly that, when you execute a query, the result may be returned to you by loading stored data from disk (but that's a separate topic we won't discuss further here).

The overall vision is that, eventually, the entire compiler control-flow will be query driven. There will effectively be one top-level query ("compile") that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:

  • this "compile" query might demand to get a list of codegen units (i.e. modules that need to be compiled by LLVM).
  • but computing that list of codegen units would invoke some subquery that returns the list of all modules defined in the Rust source.
  • that query in turn would invoke something asking for the HIR.
  • this keeps going further and further back until we wind up doing the actual parsing.

However, that vision is not fully realized yet. Still, big chunks of the compiler (for example, generating MIR) work exactly like this.

Incremental compilation in detail

The incremental compilation in detail chapter gives an in-depth description of what queries are and how they work. If you intend to write a query of your own, that chapter is a good read.

Invoking queries

Invoking a query is simple. The tcx ("type context") provides a method for each defined query. So, for example, to invoke the type_of query, you would just do this:

let ty = tcx.type_of(some_def_id);

How the compiler executes a query

So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache -- if your query has already been executed, then we simply clone the return value out of the cache and return it (therefore you should try to ensure that the return types of queries are cheaply cloneable; insert an Rc if necessary).
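
As an illustration of the "cheaply cloneable" advice, here is a toy example (not compiler code):

use std::rc::Rc;

// Suppose a query computes a large result. Returning it behind an `Rc`
// means handing out cached copies only bumps a reference count instead
// of deep-copying the data on every invocation.
struct BigAnalysisResult {
    facts: Vec<String>,
}

type QueryResult = Rc<BigAnalysisResult>;

fn clone_from_cache(cached: &QueryResult) -> QueryResult {
    Rc::clone(cached) // cheap: no deep copy of `facts`
}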

Providers

If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere, and that contains the code to compute the result of the query.

Providers are defined per crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and the providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key: for example, when you invoke tcx.type_of(def_id), that could be a local query or an external query, depending on what crate def_id is referring to (see the self::keys::Key trait for more information on how that works).

Providers always have the same signature:

fn provider<'tcx>(
    tcx: TyCtxt<'tcx>,
    key: QUERY_KEY,
) -> QUERY_RESULT {
    ...
}

Providers take two arguments: the tcx and the query key. They return the result of the query.

How providers are set up

When the tcx is created, its creator supplies it with the providers using the Providers struct. This struct is generated by macros, but it is basically a big list of function pointers:

struct Providers {
    type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
    ...
}

At present, we have one copy of the struct for the local crate and one for all external crates, though the plan is that we may eventually have one per crate.

These Providers structs are ultimately created and populated by librustc_driver, but it does this by distributing the work throughout the other rustc_* crates. This is done by invoking various provide functions. These functions look something like this:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}

That is, they take an &mut Providers and mutate it in place. Usually we use the formulation above just because it looks nice, but you could as well do providers.type_of = type_of, which would be equivalent. (Here, type_of would be a top-level function, defined as we saw before.) So, if we want to add a provider for some other query -- let's call it fubar -- in the crate above, we might modify the provide() function like so:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }

Note that most of the rustc_* crates only provide local providers. Almost all extern providers wind up going through the rustc_metadata crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a provide and a provide_extern function that rustc_driver can invoke.

Adding a new kind of query

So suppose you want to add a new kind of query, how do you go about doing so? Defining a query takes place in two steps:

  1. first, you have to specify the query name and arguments; and then,
  2. you have to supply query providers where needed.

To specify the query name and arguments, you simply add an entry to the big macro invocation in src/librustc_middle/query/mod.rs, which looks something like:

rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            cache { key.is_local() }
        }
    }

    ...
}

Queries are grouped into categories (Other, Codegen, TypeChecking, etc.). Each group contains one or more queries. Each query definition is broken up like this:

query type_of(key: DefId) -> Ty<'tcx> { ... }
^^    ^^^^^^^      ^^^^^     ^^^^^^^^   ^^^
|     |            |         |          |
|     |            |         |          query modifiers
|     |            |         result type of query
|     |            query key type
|     name of query
query keyword

Let's go over these elements one by one:

  • Query keyword: indicates the start of a query definition.
  • Name of query: the name of the query method (tcx.type_of(..)). Also used as the name of the struct (ty::queries::type_of) that will be generated to represent this query.
  • Query key type: the type of the argument to this query. This type must implement the ty::query::keys::Key trait, which defines (for example) how to map it to a crate, and so forth.
  • Result type of query: the type produced by this query. This type should (a) not use RefCell or other interior mutability and (b) be cheaply cloneable. Interning or using Rc or Arc is recommended for non-trivial data types.
    • The one exception to those rules is the ty::steal::Steal type, which is used to cheaply modify MIR in place. See the definition of Steal for more details. New uses of Steal should not be added without alerting @rust-lang/compiler.
  • Query modifiers: various flags and options that customize how the query is processed.

So, to add a query:

  • Add an entry to rustc_queries! using the format above.
  • Link the provider by modifying the appropriate provide method; or add a new one if needed and ensure that rustc_driver is invoking it.

Query structs and descriptions

For each kind of query, the rustc_queries macro will generate a "query struct" named after the query. This struct is a kind of placeholder describing the query. Each such struct implements the self::config::QueryConfig trait, which has associated types for the key/value of that particular query. Basically the code generated looks something like this:

// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
  type Key = DefId;
  type Value = Ty<'tcx>;

  const NAME: QueryName = QueryName::type_of;
  const CATEGORY: ProfileCategory = ProfileCategory::Other;
}

There is an additional trait that you may wish to implement, called self::config::QueryDescription. This trait is used during cycle errors to give a "human readable" name for the query, so that we can summarize what was happening when the cycle occurred. Implementing this trait is optional if the query key is a DefId, but if you don't implement it, you get a pretty generic error ("processing foo..."). You can put new impls into the config module. They look something like this:

impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.def_path_str(key))
    }
}

Another option is to add a desc modifier:

rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
        }
    }
}

The rustc_queries macro will generate an appropriate impl automatically.

The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built on. It does not go into implementation details but tries to explain the underlying logic. The examples here, therefore, have been stripped down and simplified and don't directly reflect the compiler's internal APIs.

What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database", and queries are the way of asking the compiler questions about it, i.e. we "query" the compiler's "database" for facts.

However, there is something special to this compiler database: it starts out empty and is filled on-demand when queries are executed. Consequently, a query must know how to compute its result if the database does not contain it yet. For doing so, it can access other queries and certain input values that the database is pre-filled with on creation.

A query thus consists of the following things:

  • A name that identifies the query
  • A "key" that specifies what we want to look up
  • A result type that specifies what kind of result it yields
  • A "provider", which is a function that specifies how the result is to be computed if it isn't already present in the database.

As an example, the name of the type_of query is type_of, its query key is a DefId identifying the item whose type we want to know, its result type is Ty<'tcx>, and its provider is a function that, given the query key and access to the rest of the database, can compute the type of the item identified by the key.

So in some sense a query is just a function that maps the query key to the corresponding result. However, we have to apply some restrictions in order for this to be sound:

  • The key and result must be immutable values.
  • The provider function must be a pure function in the sense that for the same key it must always yield the same result.
  • The only parameters a provider function takes are the key and a reference to the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. Providers will invoke other queries, the results of which are either already cached or will be computed by calling yet another provider. These provider invocations conceptually form a directed acyclic graph (DAG), at the leaves of which are the input values that were already known when the query context was created.

Caching/Memoization

Results of query invocations are "memoized", which means that the query context will cache the result in an internal table and, when the query is invoked with the same query key again, will return the result from the cache instead of running the provider again.

This caching is crucial for making the query engine efficient. Without memoization the system would still be sound (that is, it would yield the same results), but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers must be pure functions. If calling a provider function could yield different results for each invocation (because it accesses some global mutable state), then we could not memoize the result.

Input Data

When the query context is created, it is still empty: no queries have been executed, no results are cached. But the context already provides access to "input" data, i.e. pieces of immutable data that were computed before the context was created and that queries can access to do their computations. Currently this input data consists mainly of the HIR map, upstream crate metadata, and the command-line options the compiler was invoked with. In the future, inputs will just consist of command-line options and a list of source files -- the HIR map will itself be provided by a query that processes these source files.

Without inputs, queries would not be good for anything, as there would be nothing to compute their results from (remember, query providers only have access to other queries and the context, but not to any other outside state or information).

For a query provider, input data and results of other queries look exactly the same: it just tells the context "give me the value of X". Because input data is immutable, the provider can rely on it being the same across different query invocations, just as is the case for query results.

An Example Execution Trace of Some Queries

How does this DAG of query invocations come into existence? At some point the compiler driver will create the, as yet empty, query context. It will then, from outside of the query system, invoke the queries it needs to perform its task. This looks something like the following:

fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}

The type_check_crate query provider would look something like this:

fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}

We can see that the type_check_crate query accesses input data (tcx.hir_map.list_of_items()) and invokes other queries (type_check_item). The type_check_item invocations will themselves access input data and/or invoke other queries, so that in the end the DAG of query invocations is built up backwards from the node that was initially executed:

         (2)                                                 (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
    (5)             (4)                  (3)                   |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
    (7)             v  (6)                  (8)                |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

// (x) denotes invocation order

We also see that often a query result can be read from the cache: type_of(bar) was computed as part of type_check_item(foo), so when type_check_item(bar) needs it, it is already in the cache.

Query results stay in the query context as long as the context lives. So if the compiler driver invokes another query later on, the above graph will still be there and none of the queries already executed have to be re-done.

Earlier we stated that query invocations form a DAG. However, it would be easy to form a cyclic graph with a query provider like the following:

fn cyclic_query_provider(tcx, key) -> u32 {
  // Invoke the same query with the same key again
  tcx.cyclic_query(key)
}

Since query providers are regular functions, this would behave much as expected: evaluation would get stuck in an infinite recursion. A query like this would not be very useful either. However, sometimes certain kinds of invalid user input can result in queries being called in a cyclic way. The query engine includes a check for cyclic invocations, and, because cycles are an irrecoverable error, will abort execution with a "cycle error" message that tries to be as human-readable as possible.
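The check can be pictured with a small, self-contained sketch: the engine remembers which queries are currently being evaluated and reports an error instead of recursing when a key re-enters that set. All names here are illustrative, not the compiler's API:

use std::collections::HashSet;

// Hypothetical query engine tracking the queries currently on the "stack".
struct QueryEngine {
    in_progress: HashSet<String>,
}

impl QueryEngine {
    fn run_query(&mut self, key: &str, provider: impl FnOnce(&mut Self) -> u32) -> u32 {
        if !self.in_progress.insert(key.to_string()) {
            // The key is already being evaluated: a cycle. Report an error
            // instead of recursing forever.
            panic!("cycle error when computing query for key `{key}`");
        }
        let result = provider(self);
        self.in_progress.remove(key);
        result
    }
}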

"窃取" 查询

一些查询的结果包装在Steal<T>结构中。 这些查询的行为与常规查询完全相同,但有一个例外:它们的结果有时是从缓存中“窃取”来的,这意味着程序的其他部分正在拥有该所有权,并且该结果无法再访问。

This stealing mechanism exists purely as a performance optimization, because some result values are too costly to clone (e.g. the MIR of a function). It may seem like result stealing violates the condition that query results must be immutable (after all, we are moving the result value out of the cache), but it is OK as long as the mutation is not observable. This is achieved by two things:

  • Before a result is stolen, we make sure to eagerly run all queries that might ever need to read that result. This has to be done manually by calling those queries.
  • Whenever a query tries to access a stolen result, we make the compiler ICE, so that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention it requires, so it should be used sparingly, and only when it is well known which queries might access a given result. In practice, however, stealing has not turned out to be much of a maintenance burden.

To summarize: "steal queries" break some of the rules in a controlled way, and there are checks in place that make sure that nothing can go silently wrong.
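The following is a hypothetical sketch of how such a wrapper can be built on interior mutability. The compiler's real Steal type differs in details (for instance, it is synchronized for parallel use), but the essential behavior is the same: reading after stealing is a hard error:

use std::cell::{Ref, RefCell};

// Hypothetical single-threaded stand-in for the compiler's `Steal` type.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    // Read access: panics (the equivalent of an ICE) if already stolen.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read a stolen value")
        })
    }

    // Move the value out of the cache; this may happen only once.
    fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }
}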

Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much effort:

  • All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
  • Query results are required to be immutable, so they can safely be used by different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query foo is evaluated, the cache table for foo is locked.

  • If there already is a result, we can clone it, release the lock, and we are done.
  • If there is no cache entry and no other active query invocation computing the same result, we mark the key as being "in progress", release the lock, and start evaluating.
  • If there is another query invocation for the same key in progress, we release the lock, and just block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG. Some thread will always be able to make progress.
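A much-simplified sketch of this scheme, with a single global lock and a condition variable standing in for the compiler's sharded tables and query-job machinery (all names are illustrative):

use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Condvar, Mutex};

enum QueryState<V> {
    InProgress,   // some thread is computing this key
    Done(Arc<V>), // the memoized result
}

struct QueryTable<K, V> {
    table: Mutex<HashMap<K, QueryState<V>>>,
    cond: Condvar,
}

impl<K: Hash + Eq + Clone, V> QueryTable<K, V> {
    fn get_or_compute(&self, key: K, provider: impl FnOnce() -> V) -> Arc<V> {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key) {
                // There already is a result: clone it, release the lock, done.
                Some(QueryState::Done(v)) => return Arc::clone(v),
                // Another invocation is computing this key: wait below.
                Some(QueryState::InProgress) => {}
                // No entry and nobody computing it: we get to evaluate it.
                None => break,
            }
            // Block until the other invocation has produced the result,
            // then re-check the table.
            table = self.cond.wait(table).unwrap();
        }
        // Mark the key as "in progress" and release the lock while evaluating.
        table.insert(key.clone(), QueryState::InProgress);
        drop(table);

        let result = Arc::new(provider());

        // Publish the result and wake up any waiting threads.
        self.table.lock().unwrap().insert(key, QueryState::Done(Arc::clone(&result)));
        self.cond.notify_all();
        result
    }
}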

Incremental compilation

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the "basic algorithm" – and then describe some possible improvements.

The basic algorithm

The basic algorithm is called the red-green algorithm1. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).

On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:

  • If a query is colored red, that means that its result during this compilation has changed from the previous compilation.
  • If a query is colored green, that means that its result is the same as the previous compilation.

There are two key insights here:

  • First, if all the inputs to query Q are colored green, then the query Q must result in the same value as last time and hence need not be re-executed (or else the compiler is not deterministic).
  • Second, even if some inputs to a query changes, it may be that it still produces the same result as the previous compilation. In particular, the query may only use part of its input.
    • Therefore, after executing a query, we always check whether it produced the same result as the previous time. If it did, we can still mark the query as green, and hence avoid re-executing dependent queries.

The try-mark-green algorithm

At the core of incremental compilation is an algorithm called "try-mark-green". It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q's color may involve re-executing Q so that we can compare its output, but if all of Q's inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).

Try-mark-green works as follows:

  • First check if the query Q was executed during the previous compilation.
    • If not, we can just re-execute the query as normal, and assign it the color of red.
  • If yes, then load the 'dependent queries' of Q.
  • If there is a saved result, then we load the reads(Q) vector from the query DAG. The "reads" is the set of queries that Q executed during its execution.
    • For each query R in reads(Q), we recursively demand the color of R using try-mark-green.
      • Note: it is important that we visit each node in reads(Q) in same order as they occurred in the original compilation. See the section on the query DAG below.
      • If any of the nodes in reads(Q) wind up colored red, then Q is dirty.
        • We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation.
        • If the hash has not changed, we can mark Q as green and return.
      • Otherwise, all of the nodes in reads(Q) must be green. In that case, we can color Q as green and return.

The query DAG

The query DAG code is stored in src/librustc_middle/dep_graph. Construction of the DAG is done by instrumenting the query execution.

One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:

fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

Now imagine that in the first compilation, main_query starts by executing subquery1, and this returns true. In that case, the next query main_query executes will be subquery2, and subquery3 will not be executed at all.

But now imagine that in the next compilation, the input has changed such that subquery1 returns false. In this case, subquery2 would never execute. If try-mark-green were to visit reads(main_query) out of order, however, it might visit subquery2 before subquery1, and hence execute it. This can lead to ICEs and other problems in the compiler.

Improvements to the basic algorithm

In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. In practice, what we would do is to save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.

This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so:

  • Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result.
  • Otherwise, execute Q.
    • We can then compare the hash of the result and color Q as green if it did not change.

Resources

The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md, which expands on the memoization details, provides more high-level overview and motivation for this system.

Footnotes

1

I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis

Incremental Compilation In Detail

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. It relies on the fact that:

  1. queries are pure functions -- given the same inputs, a query will always yield the same result, and
  2. the query model structures compilation in an acyclic graph that makes dependencies between individual computations explicit.

This chapter will explain how we can use these properties for making things incremental and then goes on to discuss implementation issues.

A Basic Algorithm For Incremental Query Evaluation

As explained in the query evaluation model primer, query invocations form a directed-acyclic graph. Here's the example from the previous chapter again:

  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
                    v                                          |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

Since every access from one query to another has to go through the query context, we can record these accesses and thus actually build this dependency graph in memory. With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph).

Now suppose we change the source code of our program so that HIR of bar looks different than before. Our goal is to only recompute those queries that are actually affected by the change while re-using the cached results of all the other queries. Given the dependency graph we can do exactly that. For a given query invocation, the graph tells us exactly what data has gone into computing its results, we just have to follow the edges until we reach something that has changed. If we don't encounter anything that has changed, we know that the query still would evaluate to the same result we already have in our cache.

Taking the type_of(foo) invocation from above as an example, we can check whether the cached result is still valid by following the edges to its inputs. The only edge leads to Hir(foo), an input that has not been affected by the change. So we know that the cached result for type_of(foo) is still valid.

The story is a bit different for type_check_item(foo): We again walk the edges and already know that type_of(foo) is fine. Then we get to type_of(bar) which we have not checked yet, so we walk the edges of type_of(bar) and encounter Hir(bar) which has changed. Consequently the result of type_of(bar) might yield a different result than what we have in the cache and, transitively, the result of type_check_item(foo) might have changed too. We thus re-run type_check_item(foo), which in turn will re-run type_of(bar), which will yield an up-to-date result because it reads the up-to-date version of Hir(bar).

The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully you'll notice that it says that type_of(bar) might have changed because one of its inputs has changed. There's also the possibility that it might still yield exactly the same result even though its input has changed. Consider an example with a simple query that just computes the sign of an integer:

  IntValue(x) <---- sign_of(x) <--- some_other_query(x)

Let's say that IntValue(x) starts out as 1000 and then is set to 2000. Even though IntValue(x) is different in the two cases, sign_of(x) yields the result + in both cases.

If we follow the basic algorithm, however, some_other_query(x) would have to (unnecessarily) be re-evaluated because it transitively depends on a changed input. Change detection yields a "false positive" in this case because it has to conservatively assume that some_other_query(x) might be affected by that changed input.

Unfortunately it turns out that the actual queries in the compiler are full of examples like this and small changes to the input often potentially affect very large parts of the output binaries. As a consequence, we had to make the change detection system smarter and more accurate.

Improving Accuracy: The red-green Algorithm

The "false positives" problem can be solved by interleaving change detection and query re-evaluation. Instead of walking the graph all the way to the inputs when trying to find out if some cached result is still valid, we can check if a result has actually changed after we were forced to re-evaluate it.

We call this algorithm the red-green algorithm because nodes in the dependency graph are assigned the color green if we were able to prove that its cached result is still valid and the color red if the result has turned out to be different after re-evaluating it.

The meat of red-green change tracking is implemented in the try-mark-green algorithm, that, you've guessed it, tries to mark a given node as green:

fn try_mark_green(tcx, current_node) -> bool {

    // Fetch the inputs to `current_node`, i.e. get the nodes that the direct
    // edges from `node` lead to.
    let dependencies = tcx.dep_graph.get_dependencies_of(current_node);

    // Now check all the inputs for changes
    for dependency in dependencies {

        match tcx.dep_graph.get_node_color(dependency) {
            Green => {
                // This input has already been checked before and it has not
                // changed; so we can go on to check the next one
            }
            Red => {
                // We found an input that has changed. We cannot mark
                // `current_node` as green without re-running the
                // corresponding query.
                return false
            }
            Unknown => {
                // This is the first time we look at this node. Let's try
                // to mark it green by calling try_mark_green() recursively.
                if try_mark_green(tcx, dependency) {
                    // We successfully marked the input as green, on to the
                    // next.
                } else {
                    // We could *not* mark the input as green. This means we
                    // don't know if its value has changed. In order to find
                    // out, we re-run the corresponding query now!
                    tcx.run_query_for(dependency);

                    // Fetch and check the node color again. Running the query
                    // has forced it to either red (if it yielded a different
                    // result than we have in the cache) or green (if it
                    // yielded the same result).
                    match tcx.dep_graph.get_node_color(dependency) {
                        Red => {
                            // The input turned out to be red, so we cannot
                            // mark `current_node` as green.
                            return false
                        }
                        Green => {
                            // Re-running the query paid off! The result is the
                            // same as before, so this particular input does
                            // not invalidate `current_node`.
                        }
                        Unknown => {
                            // There is no way a node has no color after
                            // re-running the query.
                            panic!("unreachable")
                        }
                    }
                }
            }
        }
    }

    // If we have gotten through the entire loop, it means that all inputs
    // have turned out to be green. If all inputs are unchanged, it means
    // that the query result corresponding to `current_node` cannot have
    // changed either.
    tcx.dep_graph.mark_green(current_node);

    true
}

// Note: The actual implementation can be found in
//       src/librustc_middle/dep_graph/graph.rs

By using red-green marking we can avoid the devastating cumulative effect of having false positives during change detection. Whenever a query is executed in incremental mode, we first check if its already green. If not, we run try_mark_green() on it. If it still isn't green after that, then we actually invoke the query provider to re-compute the result.
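In the same illustrative pseudocode style as the try_mark_green() listing above, that overall flow might be sketched like this (load_cached_result is an illustrative name; the real logic lives in the query engine):

fn execute_query_incrementally(tcx, current_node) -> QueryResult {
    if tcx.dep_graph.get_node_color(current_node) == Green
        || try_mark_green(tcx, current_node)
    {
        // Green: the result from the previous session is still valid,
        // so we can load it instead of recomputing it.
        tcx.load_cached_result(current_node)
    } else {
        // Red (or previously unknown): actually run the query provider.
        tcx.run_query_for(current_node)
    }
}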

The Real World: How Persistence Makes Everything Complicated

The sections above described the underlying algorithm for incremental compilation but because the compiler process exits after being finished and takes the query context with its result cache with it into oblivion, we have to persist data to disk, so the next compilation session can make use of it. This comes with a whole new set of implementation challenges:

  • The query result cache is stored to disk, so cached results are not readily available for change comparison.
  • A subsequent compilation session will start off with new version of the code that has arbitrary changes applied to it. All kinds of IDs and indices that are generated from a global, sequential counter (e.g. NodeId, DefId, etc) might have shifted, making the persisted results on disk not immediately usable anymore because the same numeric IDs and indices might refer to completely new things in the new compilation session.
  • Persisting things to disk comes at a cost, so not every tiny piece of information should be actually cached in between compilation sessions. Fixed-sized, plain-old-data is preferred to complex things that need to run through an expensive (de-)serialization step.

The following sections describe how the compiler currently solves these issues.

A Question Of Stability: Bridging The Gap Between Compilation Sessions

As noted before, various IDs (like DefId) are generated by the compiler in a way that depends on the contents of the source code being compiled. ID assignment is usually deterministic, that is, if the exact same code is compiled twice, the same things will end up with the same IDs. However, if something changes, e.g. a function is added in the middle of a file, there is no guarantee that anything will have the same ID as it had before.

As a consequence we cannot represent the data in our on-disk cache the same way it is represented in memory. For example, if we just stored a piece of type information like TyKind::FnDef(DefId, &'tcx Substs<'tcx>) (as we do in memory) and then the contained DefId points to a different function in a new compilation session we'd be in trouble.

The solution to this problem is to find "stable" forms for IDs which remain valid in between compilation sessions. For the most important case, DefIds, these are the so-called DefPaths. Each DefId has a corresponding DefPath but in place of a numeric ID, a DefPath is based on the path to the identified item, e.g. std::collections::HashMap. The advantage of an ID like this is that it is not affected by unrelated changes. For example, one can add a new function to std::collections but std::collections::HashMap would still be std::collections::HashMap. A DefPath is "stable" across changes made to the source code while a DefId isn't.

There is also the DefPathHash, which is just a 128-bit hash value of the DefPath. The two contain the same information, and we mostly use the DefPathHash because it is simpler to handle, being Copy and self-contained.

This principle of stable identifiers is used to make the data in the on-disk cache resilient to source code changes. Instead of storing a DefId, we store the DefPathHash and when we deserialize something from the cache, we map the DefPathHash to the corresponding DefId in the current compilation session (which is just a simple hash table lookup).
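A small sketch of this translation step, with illustrative stand-in types (the real tables live in the compiler's on-disk cache implementation):

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DefPathHash(u128); // stable across compilation sessions

#[derive(Clone, Copy)]
struct DefId(u32); // only valid within one compilation session

// On disk we store the stable DefPathHash, never the session-local DefId.
// When decoding, a simple hash table lookup recovers the current DefId.
fn decode_def_id(
    stored: DefPathHash,
    hash_to_def_id: &HashMap<DefPathHash, DefId>, // built for this session
) -> Option<DefId> {
    // `None` means the definition no longer exists in the current session.
    hash_to_def_id.get(&stored).copied()
}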

The HirId, used for identifying HIR components that don't have their own DefId, is another such stable ID. It is (conceptually) a pair of a DefPath and a LocalId, where the LocalId identifies something (e.g. a hir::Expr) locally within its "owner" (e.g. a hir::Item). If the owner is moved around, the LocalIds within it are still the same.
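Conceptually, such a stable HIR identifier can be pictured like this (reusing the DefPathHash stand-in from the sketch above; the fields are illustrative, not the compiler's exact definitions):

struct StableHirId {
    owner: DefPathHash, // stable identity of the owning item (its DefPath)
    local_id: u32,      // position of the node within that owner
}

Because local_id is relative to the owner, moving the owner around in the file leaves all the LocalIds inside it unchanged.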

Checking Query Results For Changes: HashStable And Fingerprints

In order to do red-green-marking we often need to check if the result of a query has changed compared to the result it had during the previous compilation session. There are two performance problems with this though:

  • We'd like to avoid having to load the previous result from disk just for doing the comparison. We already computed the new result and will use that. Also loading a result from disk will "pollute" the interners with data that is unlikely to ever be used.
  • We don't want to store each and every result in the on-disk cache. For example, it would be wasted effort to persist things to disk that are already available in upstream crates.

The compiler avoids these problems by using so-called Fingerprints. Each time a new query result is computed, the query engine will compute a 128 bit hash value of the result. We call this hash value "the Fingerprint of the query result". The hashing is (and has to be) done "in a stable way". This means that whenever something is hashed that might change in between compilation sessions (e.g. a DefId), we instead hash its stable equivalent (e.g. the corresponding DefPath). That's what the whole HashStable infrastructure is for. This way Fingerprints computed in two different compilation sessions are still comparable.

The next step is to store these fingerprints along with the dependency graph. This is cheap since fingerprints are just bytes to be copied. It's also cheap to load the entire set of fingerprints together with the dependency graph.

Now, when red-green-marking reaches the point where it needs to check if a result has changed, it can just compare the (already loaded) previous fingerprint to the fingerprint of the new result.
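The comparison step itself then amounts to something like the following sketch (again with illustrative types):

#[derive(Clone, Copy, PartialEq, Eq)]
struct Fingerprint(u128); // stable 128-bit hash of a query result

enum Color { Red, Green }

// After re-executing a query, its color is decided by comparing the
// fingerprint of the new result with the one saved alongside the
// previous session's dependency graph.
fn color_after_reexecution(prev: Option<Fingerprint>, new: Fingerprint) -> Color {
    match prev {
        Some(prev) if prev == new => Color::Green, // result unchanged
        _ => Color::Red, // changed, or the node did not exist before
    }
}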

This approach works rather well but it's not without flaws:

  • There is a small possibility of hash collisions. That is, two different results could have the same fingerprint and the system would erroneously assume that the result hasn't changed, leading to a missed update.

    We mitigate this risk by using a high-quality hash function and a 128 bit wide hash value. Due to these measures the practical risk of a hash collision is negligible.

  • Computing fingerprints is quite costly. It is the main reason why incremental compilation can be slower than non-incremental compilation. We are forced to use a good and thus expensive hash function, and we have to map things to their stable equivalents while doing the hashing.

A Tale Of Two DepGraphs: The Old And The New

The initial description of dependency tracking glosses over a few details that quickly become a head scratcher when actually trying to implement things. In particular it's easy to overlook that we are actually dealing with two dependency graphs: The one we built during the previous compilation session and the one that we are building for the current compilation session.

When a compilation session starts, the compiler loads the previous dependency graph into memory as an immutable piece of data. Then, when a query is invoked, it will first try to mark the corresponding node in the graph as green. This means really that we are trying to mark the node in the previous dep-graph as green that corresponds to the query key in the current session. How do we do this mapping between current query key and previous DepNode? The answer is again Fingerprints: Nodes in the dependency graph are identified by a fingerprint of the query key. Since fingerprints are stable across compilation sessions, computing one in the current session allows us to find a node in the dependency graph from the previous session. If we don't find a node with the given fingerprint, it means that the query key refers to something that did not yet exist in the previous session.

So, having found the dep-node in the previous dependency graph, we can look up its dependencies (i.e. also dep-nodes in the previous graph) and continue with the rest of the try-mark-green algorithm. The next interesting thing happens when we successfully mark the node as green. At that point we copy the node and the edges to its dependencies from the old graph into the new graph. We have to do this because the new dep-graph cannot acquire the node and edges via the regular dependency tracking. The tracking system can only record edges while actually running a query -- but running the query, although we have the result already cached, is exactly what we want to avoid.

Once the compilation session has finished, all the unchanged parts have been copied over from the old into the new dependency graph, while the changed parts have been added to the new graph by the tracking system. At this point, the new graph is serialized out to disk, alongside the query result cache, and can act as the previous dep-graph in a subsequent compilation session.

Didn't You Forget Something?: Cache Promotion

The system described so far has a somewhat subtle property: If all inputs of a dep-node are green then the dep-node itself can be marked as green without computing or loading the corresponding query result. Applying this property transitively often leads to the situation that some intermediate results are never actually loaded from disk, as in the following example:

   input(A) <-- intermediate_query(B) <-- leaf_query(C)

The compiler might need the value of leaf_query(C) in order to generate some output artifact. If it can mark leaf_query(C) as green, it will load the result from the on-disk cache. The result of intermediate_query(B) is never loaded though. As a consequence, when the compiler persists the new result cache by writing all in-memory query results to disk, intermediate_query(B) will not be in memory and thus will be missing from the new result cache.

If there subsequently is another compilation session that actually needs the result of intermediate_query(B) it will have to be re-computed even though we had a perfectly valid result for it in the cache just before.

In order to prevent this from happening, the compiler does something called "cache promotion": Before emitting the new result cache it will walk all green dep-nodes and make sure that their query result is loaded into memory. That way the result cache doesn't unnecessarily shrink again.

Incremental Compilation and the Compiler Backend

The compiler backend, the part involving LLVM, is using the query system but it is not implemented in terms of queries itself. As a consequence it does not automatically partake in dependency tracking. However, the manual integration with the tracking system is pretty straight-forward. The compiler simply tracks what queries get invoked when generating the initial LLVM version of each codegen unit, which results in a dep-node for each of them. In subsequent compilation sessions it then tries to mark the dep-node for a CGU as green. If it succeeds it knows that the corresponding object and bitcode files on disk are still valid. If it doesn't succeed, the entire codegen unit has to be recompiled.

This is the same approach that is used for regular queries. The main differences are:

  • that we cannot easily compute a fingerprint for LLVM modules (because they are opaque C++ objects),

  • that the logic for dealing with cached values is rather different from regular queries because here we have bitcode and object files instead of serialized Rust values in the common result cache file, and

  • the operations around LLVM are so expensive in terms of computation time and memory consumption that we need to have tight control over what is executed when and what stays in memory for how long.

The query system could probably be extended with general purpose mechanisms to deal with all of the above but so far that seemed like more trouble than it would save.

Shortcomings of the Current System

There are many things that still can be improved.

Incrementality of on-disk data structures

The current system is not able to update on-disk caches and the dependency graph in-place. Instead it has to rewrite each file entirely in each compilation session. The overhead of doing so is a few percent of total compilation time.

Unnecessary data dependencies

Data structures used as query results could be factored in a way that removes edges from the dependency graph. Especially "span" information is very volatile, so including it in a query result will increase the chance that that result won't be reusable. See https://github.com/rust-lang/rust/issues/47389 for more information.

Debugging and Testing Dependencies

Testing the dependency graph

There are various ways to write tests against the dependency graph. The simplest mechanisms are the #[rustc_if_this_changed] and #[rustc_then_this_would_need] annotations. These are used in compile-fail tests to test whether the expected set of paths exist in the dependency graph. As an example, see src/test/compile-fail/dep-graph-caller-callee.rs.

The idea is that you can annotate a test like:

#[rustc_if_this_changed]
fn foo() { }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }

This will check whether there is a path in the dependency graph from Hir(foo) to TypeckTables(bar). An error is reported for each #[rustc_then_this_would_need] annotation that indicates whether a path exists. //~ ERROR annotations can then be used to test if a path is found (as demonstrated above).

Debugging the dependency graph

Dumping the graph

The compiler is also capable of dumping the dependency graph for your debugging pleasure. To do so, pass the -Z dump-dep-graph flag. The graph will be dumped to dep_graph.{txt,dot} in the current directory. You can override the filename with the RUST_DEP_GRAPH environment variable.

Frequently, though, the full dep graph is quite overwhelming and not particularly helpful. Therefore, the compiler also allows you to filter the graph. You can filter in three ways:

  1. All edges originating in a particular set of nodes (usually a single node).
  2. All edges reaching a particular set of nodes.
  3. All edges that lie between given start and end nodes.

To filter, use the RUST_DEP_GRAPH_FILTER environment variable, which should look like one of the following:

source_filter     // nodes originating from source_filter
-> target_filter  // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter

source_filter and target_filter are a &-separated list of strings. A node is considered to match a filter if all of those strings appear in its label. So, for example:

RUST_DEP_GRAPH_FILTER='-> TypeckTables'

would select the predecessors of all TypeckTables nodes. Usually though you want the TypeckTables node for some particular fn, so you might write:

RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'

This will select only the predecessors of TypeckTables nodes for functions with bar in their name.

Perhaps you are finding that when you change foo you need to re-type-check bar, but you don't think you should have to. In that case, you might do:

RUST_DEP_GRAPH_FILTER='Hir & foo -> TypeckTables & bar'

This will dump out all the nodes that lead from Hir(foo) to TypeckTables(bar), from which you can (hopefully) see the source of the erroneous edge.

Tracking down incorrect edges

Sometimes, after you dump the dependency graph, you will find some path that should not exist, but you will not be quite sure how it came to be. When the compiler is built with debug assertions, it can help you track that down. Simply set the RUST_FORBID_DEP_GRAPH_EDGE environment variable to a filter. Every edge created in the dep-graph will be tested against that filter – if it matches, a bug! is reported, so you can easily see the backtrace (RUST_BACKTRACE=1).

The syntax for these filters is the same as described in the previous section. However, note that this filter is applied to every edge and doesn't handle longer paths in the graph, unlike the previous section.

Example:

You find that there is a path from the Hir of foo to the type check of bar and you don't think there should be. You dump the dep-graph as described in the previous section and open dep-graph.txt to see something like:

Hir(foo) -> Collect(bar)
Collect(bar) -> TypeckTables(bar)

That first edge looks suspicious to you. So you set RUST_FORBID_DEP_GRAPH_EDGE to Hir&foo -> Collect&bar, re-run, and then observe the backtrace. Voila, bug fixed!

Profiling Queries

In an effort to support incremental compilation, the latest design of the Rust compiler consists of a query-based model.

The details of this model are (currently) outside the scope of this document, however, we explain some background of this model, in an effort to explain how we profile its performance. We intend this profiling effort to address issue 42678.

Quick Start

0. Enable debug assertions

./configure --enable-debug-assertions

1. Compile rustc

Compile the compiler, up to at least stage 1:

python x.py build --stage 1

2. Run rustc, with flags

Run the compiler on a source file, supplying two additional debugging flags with -Z:

rustc -Z profile-queries -Z incremental=cache foo.rs

Regarding the two additional parameters:

  • -Z profile-queries tells the compiler to run a separate thread that profiles the queries made by the main compiler thread(s).
  • -Z incremental=cache tells the compiler to "cache" various files that describe the compilation dependencies, in the subdirectory cache.

This command will generate the following files:

  • profile_queries.html consists of an HTML-based representation of the trace of queries.
  • profile_queries.counts.txt consists of a histogram, where each histogram "bucket" is a query provider.

3. Run rustc, with -Z time-passes:

  • This additional flag will add all timed passes to the output files mentioned above, in step 2. As described below, these passes appear visually distinct from the queries in the HTML output (they currently appear as green boxes, via CSS).

4. Inspect the output

  • 4(a). Open the HTML file (profile_queries.html) with a browser. See this section for an explanation of this file.
  • 4(b). Open the data file (profile_queries.counts.txt) with a text editor, or spreadsheet. See this section for an explanation of this file.

Interpret the HTML Output

Example 0

The following image gives some example output, from tracing the queries of hello_world.rs (a single main function, that prints "hello world" via the macro println!). This image only shows a short prefix of the total output; the actual output is much longer.

Example HTML output (image): view the full HTML output. Note: it could take up to a second to render properly, depending on your browser.

Here is the corresponding text output (./example-0.counts.txt).

Example 0 explanation

The trace of the queries has a formal structure; see Trace of Queries for details.

We style this formal structure as follows:

  • Timed passes: Green boxes, when present (via -Z time-passes), represent timed passes in the compiler. In future versions, these passes may be replaced by queries, explained below.
  • Labels: Some green and red boxes are labeled with text. Where they are present, the labels give the following information:
    • The query's provider, sans its key and its result, which are often too long to include in these labels.
    • The duration of the provider, as a fraction of the total time (for the entire trace). This fraction includes the query's entire extent (that is, the sum total of all of its sub-queries).
  • Query hits: Blue dots represent query hits. They consist of leaves in the trace's tree. (CSS class: hit).
  • Query misses: Red boxes represent query misses. They consist of internal nodes in the trace's tree. (CSS class: miss).
  • Nesting structure: Many red boxes contain nested boxes and dots. This nesting structure reflects that some providers depend on results from other providers, which consist of their nested children.
  • Some red boxes are labeled with text, and have highlighted borders (light red, and bolded). (See heuristics for details).

Heuristics

Heuristics-based CSS Classes:

  • important -- Trace nodes are important if they have an extent of 6 (or more), or they have a duration fraction of one percent (or more). These numbers are simple heuristics (currently hard-coded, but easy to modify). Important nodes are styled with textual labels, and highlighted borders (light red, and bolded).

  • frac-50, -40, ... -- Trace nodes whose total duration (self and children) takes a large fraction of the total duration, at or above 50%, 40%, and so on. We style these nodes with a larger font and padding.

Interpret the Data Output

The file profile_queries.counts.txt contains a table of information about the queries, organized around their providers.

For each provider (or timed pass, when -Z time-passes is present), we produce:

  • A total count --- the total number of times this provider was queried

  • A total duration --- the total number of seconds spent running this provider, including all providers it may depend on. To get a sense of this dependency structure, and inspect a more fine-grained view of these durations, see this section.

These rows are sorted by total duration, in descending order.

Counts: Example 0

The following example profile_queries.counts.txt file results from running on a hello world program (a single main function that uses println! to print "hello world").

As explained above, the columns consist of provider/pass, count, duration:

translation,1,0.891
symbol_name,2658,0.733
def_symbol_name,2556,0.268
item_attrs,5566,0.162
type_of,6922,0.117
generics_of,8020,0.084
serialize dep graph,1,0.079
relevant_trait_impls_for,50,0.063
def_span,24875,0.061
expansion,1,0.059
const checking,1,0.055
adt_def,1141,0.048
trait_impls_of,32,0.045
is_copy_raw,47,0.045
is_foreign_item,2638,0.042
fn_sig,2172,0.033
adt_dtorck_constraint,2,0.023
impl_trait_ref,2434,0.023
typeck_tables_of,29,0.022
item-bodies checking,1,0.017
typeck_item_bodies,1,0.017
is_default_impl,2320,0.017
borrow checking,1,0.014
borrowck,4,0.014
mir_validated,4,0.013
adt_destructor,10,0.012
layout_raw,258,0.010
load_dep_graph,1,0.007
item-types checking,1,0.005
mir_const,2,0.005
name resolution,1,0.004
is_object_safe,35,0.003
is_sized_raw,89,0.003
parsing,1,0.003
is_freeze_raw,11,0.001
privacy checking,1,0.001
privacy_access_levels,5,0.001
resolving dependency formats,1,0.001
adt_sized_constraint,9,0.001
wf checking,1,0.001
liveness checking,1,0.001
compute_incremental_hashes_map,1,0.001
match checking,1,0.001
type collecting,1,0.001
param_env,31,0.000
effect checking,1,0.000
trait_def,140,0.000
lowering ast -> hir,1,0.000
predicates_of,70,0.000
extern_crate,319,0.000
lifetime resolution,1,0.000
is_const_fn,6,0.000
intrinsic checking,1,0.000
translation item collection,1,0.000
impl_polarity,15,0.000
creating allocators,1,0.000
language item collection,1,0.000
crate injection,1,0.000
early lint checks,1,0.000
indexing hir,1,0.000
maybe creating a macro crate,1,0.000
coherence checking,1,0.000
optimized_mir,6,0.000
is_panic_runtime,33,0.000
associated_item_def_ids,7,0.000
needs_drop_raw,10,0.000
lint checking,1,0.000
complete gated feature checking,1,0.000
stability index,1,0.000
region_maps,11,0.000
super_predicates_of,8,0.000
coherent_trait,2,0.000
AST validation,1,0.000
loop checking,1,0.000
static item recursion checking,1,0.000
variances_of,11,0.000
associated_item,5,0.000
plugin loading,1,0.000
looking for plugin registrar,1,0.000
stability checking,1,0.000
describe_def,15,0.000
variance testing,1,0.000
codegen unit partitioning,1,0.000
looking for entry point,1,0.000
checking for inline asm in case the target doesn't support it,1,0.000
inherent_impls,1,0.000
crate_inherent_impls,1,0.000
trait_of_item,7,0.000
crate_inherent_impls_overlap_check,1,0.000
attribute checking,1,0.000
internalize symbols,1,0.000
impl wf inference,1,0.000
death checking,1,0.000
reachability checking,1,0.000
reachable_set,1,0.000
is_exported_symbol,3,0.000
is_mir_available,2,0.000
unused lib feature checking,1,0.000
maybe building test harness,1,0.000
recursion limit,1,0.000
write allocator module,1,0.000
assert dep graph,1,0.000
plugin registration,1,0.000
write metadata,1,0.000

Background

We give some background about the query model of the Rust compiler.

Def IDs

In the query model, many queries have a key that consists of a Def ID. The Rust compiler uses Def IDs to distinguish definitions in the input Rust program.

From the compiler source code (src/librustc_middle/hir/def_id.rs):

/// A DefId identifies a particular *definition*, by combining a crate
/// index and a def index.
#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

Queries

A query relates a key to a result, either by invoking a provider that computes this result, or by reusing a cached result that was provided earlier. We explain each term in more detail:

  • Query Provider: Each kind of query has a pre-defined provider, which refers to the compiler behavior that provides an answer to the query. These providers may nest; see trace of queries for more information about this nesting structure. Example providers:
    • typeck_tables_of -- Typecheck a Def ID; produce "tables" of type information.
    • borrowck -- Borrow-check a Def ID.
    • optimized_mir -- Generate an optimized MIR for a Def ID; produce MIR.
    • For more examples, see Example 0.
  • Query Key: The input/arguments to the provider. Often, this consists of a particular Def ID.
  • Query Result: The output of the provider.

Trace of Queries

Formally, a trace of the queries consists of a tree, where sub-trees represent sub-traces. In particular, the nesting structure of the trace of queries describes how the queries depend on one another.

Even more precisely, this tree represents a directed acyclic graph (DAG), where shared sub-graphs consist of tree nodes that occur multiple times in the tree, first as "cache misses" and later as "cache hits".

Cache hits and misses. The trace is a tree with the following possible tree nodes:

  • Query, with cache miss: The query's result is unknown, and its provider runs to compute it. In this case, the dynamic extent of the query's trace consists of the traced behavior of its provider.
  • Query, with cache hit: The query's result is known, and is reused; its provider does not rerun. These nodes are leaves in the trace, since they have no dynamic extent. These leaves also represent where the tree, represented as a DAG, would share a sub-graph (namely, the sub-graph of the query that was reused from the cache).

Tree node metrics. To help determine how to style this tree, we define the following tree node metrics:

  • Depth: The number of ancestors of the node in its path from the tree root.
  • Extent: The number of immediate children of the node.

Intuitively, a dependency tree is "good" for incremental caching when the depth and extent of each node is relatively small. It is pathological when either of these metrics grows too large. For instance, a tree node whose extent consists of 1M immediate children means that if and when this node is re-computed, all 1M children must be re-queried, at the very least (some may also require recomputation, too).

External Links

Related design ideas, and tracking issues:

More discussion and issues:

How Salsa works

This chapter is based on the explanation given by Niko Matsakis in this video about Salsa.

Salsa is not used directly in rustc, but it is used extensively for rust-analyzer and may be integrated into the compiler in the future.

What is Salsa?

Salsa is a library for incremental recomputation. This means it allows reusing computations that were already done in the past to increase the efficiency of future computations.

The objectives of Salsa are:

  • Provide that functionality in an automatic way, so reusing old computations is done automatically by the library
  • Doing so in a "sound", or "correct", way, therefore leading to the same results as if it had been done from scratch

Salsa's actual model is much richer, allowing many kinds of inputs and many different outputs. For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (Cargo.toml), entire source files (foo.rs), snippets and so on; the outputs of such an integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see its type), completions, etc.

How does it work?

The first thing that Salsa has to do is identify the "base inputs" 1.

Then Salsa has to identify the intermediate, "derived" values, which are values that the library produces; for each derived value there is a "pure" function that computes it.

For example, there might be a function ast(x: Path) -> AST. The produced AST isn't a final value; it's an intermediate value that the library uses for the rest of the computation.

This means that when you ask the library for a computation, Salsa is going to compute various derived values, eventually reading the inputs, and produce the result for the requested computation.

In the course of computing, Salsa tracks which inputs were accessed and which values are derived. This information is used to determine what's going to happen when the inputs change: are the derived values still valid?

This doesn't necessarily mean that each computation downstream from the input is going to be checked, which could be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed. At that point, it won't check other derived computations since they wouldn't need to change.

It is helpful to think about this as a graph with nodes. Each derived value has a dependency on other values, which could themselves be either base or derived. Base values don't have dependencies.

I <- A <- C ...
          |
J <- B <--+

When an input I changes, the derived value A could change. The derived value B, which does not depend on I, A, or any value derived from A or I, is not subject to change. Therefore, Salsa can reuse the computation done for B in the past, without having to compute it again.

The computation could also terminate early. Keeping the same graph as before, say that input I has changed in some way (and input J hasn't) but, when computing A again, it's found that A hasn't changed from the previous computation. This leads to an "early termination", because there's no need to check if C needs to change, since both of C's direct inputs, A and B, haven't changed.

Key Salsa concepts

Query

A query is some value that Salsa can access in the course of computation. Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. 0-key queries are called "input" queries.

Database

The database is basically the context for the entire computation, it's meant to store Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. The database must know all the queries that the library is going to do before it can be built, but they don't need to be specified in the same place.

After the database is formed, it can be accessed with queries that are very similar to functions. Since each query's result is stored in the database, when a query is invoked N times, it will return N cloned results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation).

For each input query (0-key), a "set" method is generated, allowing the user to change the output of such query, and trigger previous memoized values to be potentially invalidated.

Query Groups

A query group is a set of queries which have been defined together as a unit. The database is formed by combining query groups. Query groups are akin to "Salsa modules" 2.

The queries in a query group are just a set of methods in a trait.

To create a query group, a trait annotated with a specific attribute (#[salsa::query_group(...)]) has to be created.

An argument must also be provided to said attribute, as it will be used by Salsa to create a struct to be used later when the database is created.

Example input query group:

/// This attribute will process this tree, produce this tree as output, and produce
/// a bunch of intermediate stuff that Salsa also uses.  One of these things is a
/// "StorageStruct", whose name we have specified in the attribute.
///
/// This query group is a bunch of **input** queries, that do not rely on any
/// derived input.
#[salsa::query_group(InputsStorage)]
pub trait Inputs {
    /// This attribute (`#[salsa::input]`) indicates that this query is a base
    /// input, therefore `set_manifest` is going to be auto-generated
    #[salsa::input]
    fn manifest(&self) -> Manifest;

    #[salsa::input]
    fn source_text(&self, name: String) -> String;
}

To create a derived query group, one must specify which other query groups this one depends on by specifying them as supertraits, as seen in the following example:

/// This query group is going to contain queries that depend on derived values.
/// A query group can access another query group's queries by specifying the
/// dependency as a supertrait. Query groups can be stacked as much as needed
/// using that pattern.
#[salsa::query_group(ParserStorage)]
pub trait Parser: Inputs {
    /// This query `ast` is not an input query; it's a derived query, which
    /// means that a definition is necessary.
    fn ast(&self, name: String) -> String;
}

When creating a derived query, the implementation of said query must be defined outside the trait. The definition must take a database parameter as an impl Trait (or dyn Trait), where Trait is the query group that the definition belongs to, in addition to the other keys.

/// This is the definition of the `ast` query in the `Parser` trait.
/// So, when the query `ast` is invoked, and it needs to be recomputed, Salsa
/// is going to call this function, giving it the database as `impl Parser`.
/// The function doesn't need to be aware of all the queries of all the query
/// groups.
fn ast(db: &impl Parser, name: String) -> String {
    // Note: `impl Parser` is used here, but `dyn Parser` works just as well.
    // Because we were given an `impl Parser`, we can also reach the `Inputs`
    // queries, such as `source_text`:
    let source_text = db.source_text(name);
    /* do the actual parsing */
    return ast;
}

Eventually, after all the query groups have been defined, the database can be created by declaring a struct.

To specify which query groups are going to be part of the database, an attribute (#[salsa::database(...)]) must be added. The argument of said attribute is a list of identifiers specifying the query groups' storage structs.

/// This attribute specifies which query groups are going to be in the database
#[salsa::database(InputsStorage, ParserStorage)]
#[derive(Default)] // optional!
struct MyDatabase {
    /// You also need this one field
    runtime: salsa::Runtime<MyDatabase>,
}
/// And this trait has to be implemented
impl salsa::Database for MyDatabase {
    fn salsa_runtime(&self) -> &salsa::Runtime<MyDatabase> {
        &self.runtime
    }
}

Example usage:

fn main() {
    let mut db = MyDatabase::default();
    db.set_manifest(...);
    db.set_source_text(...);
    loop {
        db.ast(...); // will reuse results
        db.set_source_text(...);
    }
}
1

"They are not something that you inaubible but something that you kinda get inaudible from the outside 3:23.

2

What is a Salsa module?

Memory Management in Rustc

Rustc tries to be pretty careful how it manages memory. The compiler allocates a lot of data structures throughout compilation, and if we are not careful, it will take a lot of time and space to do so.

One of the main ways the compiler manages this is by using arenas and interning.

Arenas and Interning

We create a LOT of data structures during compilation. For performance reasons, we allocate them from a global memory pool; they are each allocated once from a long-lived arena. This is called arena allocation. This system reduces allocations/deallocations of memory. It also allows for easy comparison of types for equality: for each interned type X, we implement PartialEq for X, so we can just compare pointers. The CtxtInterners type contains a bunch of maps of interned types and the arena itself.

Example: ty::TyS

Take the example of ty::TyS, which represents a type in the compiler (you can read more here). Each time we want to construct a type, the compiler doesn't naively allocate from the buffer. Instead, we check if that type was already constructed. If it was, we just get the same pointer we had before; otherwise we make a fresh pointer. With this scheme, if we want to know if two types are the same, all we need to do is compare the pointers, which is efficient. TyS is carefully set up so you never construct it on the stack. You always allocate it from this arena and you always intern it so it is unique.

At the beginning of the compilation we make a buffer and each time we need to allocate a type we use some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer is 'tcx. Our types are tied to that lifetime, so when compilation finishes all the memory related to that buffer is freed and our 'tcx references would be invalid.
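
To make the idea concrete, here is a minimal, self-contained sketch of interning. The names Interner and Ty below are invented for illustration; the real implementation lives in CtxtInterners and uses arena allocation (tied to 'tcx) where this sketch uses Box::leak as a stand-in:

use std::collections::HashMap;

// `Ty` stands in for rustc's `TyS`; `Interner` stands in for `CtxtInterners`.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Ty {
    Bool,
    Ref(&'static Ty), // simplified: rustc ties this to 'tcx, not 'static
}

struct Interner {
    map: HashMap<Ty, &'static Ty>,
}

impl Interner {
    fn intern(&mut self, ty: Ty) -> &'static Ty {
        if let Some(&interned) = self.map.get(&ty) {
            return interned; // already constructed: hand out the same pointer
        }
        // Box::leak stands in for arena allocation: the value lives "forever".
        let interned: &'static Ty = Box::leak(Box::new(ty.clone()));
        self.map.insert(ty, interned);
        interned
    }
}

fn main() {
    let mut interner = Interner { map: HashMap::new() };
    let a = interner.intern(Ty::Bool);
    let b = interner.intern(Ty::Bool);
    // Because each type is constructed only once, equality is just a
    // pointer comparison.
    assert!(std::ptr::eq(a, b));
    let r = interner.intern(Ty::Ref(a));
    println!("{:?}, {:?}", a, r);
}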

In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:

  • Substs, allocated with mk_substs – this will intern a slice of types, often used to specify the values to be substituted for generics (e.g. HashMap<i32, u32> would be represented as a slice &'tcx [tcx.types.i32, tcx.types.u32]).
  • TraitRef, typically passed by value – a trait reference consists of a reference to a trait along with its various type parameters (including Self), like i32: Display (here, the def-id would reference the Display trait, and the substs would contain i32). Note that def-id is defined and discussed in depth in the AdtDef and DefId section.
  • Predicate defines something the trait system has to prove (see traits module).

The tcx and how it uses lifetimes

The tcx ("typing context") is the central data structure in the compiler. It is the context that you use to perform all manner of queries. The struct TyCtxt defines a reference to this shared context:

tcx: TyCtxt<'tcx>
//          ----
//          |
//          arena lifetime

As you can see, the TyCtxt type takes a lifetime parameter. When you see a reference with a lifetime like 'tcx, you know that it refers to arena-allocated data (or data that lives as long as the arenas, anyhow).

A Note On Lifetimes

The Rust compiler is a fairly large program containing lots of big data structures (e.g. the AST, HIR, and the type system) and as such, arenas and references are heavily relied upon to minimize unnecessary memory use. This manifests itself in the way people can plug into the compiler (i.e. the driver), preferring a "push"-style API (callbacks) instead of the more Rust-ic "pull" style (think the Iterator trait).

Thread-local storage and interning are used a lot through the compiler to reduce duplication while also preventing a lot of the ergonomic issues due to many pervasive lifetimes. The rustc::ty::tls module is used to access these thread-locals, although you should rarely need to touch it.

Part 3: Source Code Representations

This part describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily. These are called intermediate representations.

This process begins with the compiler understanding what the user has asked for: parsing the command line arguments given and determining what it is to compile.

The Rustc Driver and Interface

The rustc_driver is essentially rustc's main() function. It acts as the glue for running the various phases of the compiler in the correct order, using the interface defined in the rustc_interface crate.

The rustc_interface crate provides external users with an (unstable) API for running code at particular times during the compilation process, allowing third parties to effectively use rustc's internals as a library for analysing a crate or emulating the compiler in-process (e.g. the RLS or rustdoc).

For those using rustc as a library, the rustc_interface::run_compiler() function is the main entrypoint to the compiler. It takes a configuration for the compiler and a closure that takes a Compiler. run_compiler creates a Compiler from the configuration and passes it to the closure. Inside the closure, you can use the Compiler to drive queries to compile a crate and get the results. This is what the rustc_driver does too. You can see a minimal example of how to use rustc_interface here.

You can see what queries are currently available through the rustdocs for Compiler. You can see an example of how to use them by looking at the rustc_driver implementation, specifically the rustc_driver::run_compiler function (not to be confused with rustc_interface::run_compiler). The rustc_driver::run_compiler function takes a bunch of command-line args and some other configurations and drives the compilation to completion.

rustc_driver::run_compiler also takes a Callbacks, a trait that allows for custom compiler configuration, as well as allowing some custom code to run after different phases of the compilation.
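
As a rough sketch of what that looks like (this API is unstable and its exact signatures drift between nightlies, so treat the method names and types below as assumptions rather than a definitive interface; see the warning below):

#![feature(rustc_private)]
extern crate rustc_driver;
extern crate rustc_interface;

use rustc_driver::{Callbacks, Compilation};
use rustc_interface::interface::Compiler;
use rustc_interface::Queries;

struct MyCallbacks;

impl Callbacks for MyCallbacks {
    // Run custom code after type checking; returning Stop ends compilation
    // early (no codegen).
    fn after_analysis<'tcx>(
        &mut self,
        _compiler: &Compiler,
        _queries: &'tcx Queries<'tcx>,
    ) -> Compilation {
        Compilation::Stop
    }
}

fn main() {
    // Forward the process's own command-line args straight to the driver.
    let args: Vec<String> = std::env::args().collect();
    rustc_driver::run_compiler(&args, &mut MyCallbacks, None, None).unwrap();
}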

Warning: By its very nature, the internal compiler API is always going to be unstable. That said, we do try not to break things unnecessarily.

The walking tour of rustdoc

Rustdoc actually uses the rustc internals directly. It lives in-tree with the compiler and standard library. This chapter is about how it works.

Rustdoc is implemented entirely within the crate librustdoc. It runs the compiler up to the point where we have an internal representation of a crate (HIR) and the ability to run some queries about the types of items. HIR and queries are discussed in the linked chapters.

librustdoc performs two major steps after that to render a set of documentation:

  • "Clean" the AST into a form that's more suited to creating documentation (and slightly more resistant to churn in the compiler).
  • Use this cleaned AST to render a crate's documentation, one page at a time.

Naturally, there's more than just this, and those descriptions simplify out lots of details, but that's the high-level overview.

(Side note: librustdoc is a library crate! The rustdoc binary is created using the project in src/tools/rustdoc. Note that literally all that does is call the main() that's in this crate's lib.rs, though.)

Cheat sheet

  • Use ./x.py build --stage 1 src/libstd src/tools/rustdoc to make a usable rustdoc you can run on other projects.
    • Add src/libtest to be able to use rustdoc --test.
    • If you've used rustup toolchain link local /path/to/build/$TARGET/stage1 previously, then after the previous build command, cargo +local doc will Just Work.
  • Use ./x.py doc --stage 1 src/libstd to use this rustdoc to generate the standard library docs.
    • The completed docs will be available in build/$TARGET/doc/std, though the bundle is meant to be used as though you would copy out the doc folder to a web server, since that's where the CSS/JS and landing page are.
  • Most of the HTML printing code is in html/format.rs and html/render.rs. It's in a bunch of fmt::Display implementations and supplementary functions.
  • The types that got Display impls above are defined in clean/mod.rs, right next to the custom Clean trait used to process them out of the rustc HIR.
  • The bits specific to using rustdoc as a test harness are in test.rs.
  • The Markdown renderer is loaded up in html/markdown.rs, including functions for extracting doctests from a given block of Markdown.
  • The tests on rustdoc output are located in src/test/rustdoc, where they're handled by the test runner of rustbuild and the supplementary script src/etc/htmldocck.py.
  • Tests on search index generation are located in src/test/rustdoc-js, as a series of JavaScript files that encode queries on the standard library search index and expected results.

From crate to clean

In core.rs are two central items: the DocContext struct, and the run_core function. The latter is where rustdoc calls out to rustc to compile a crate to the point where rustdoc can take over. The former is a state container used when crawling through a crate to gather its documentation.

The main process of crate crawling is done in clean/mod.rs through several implementations of the Clean trait defined within. This is a conversion trait, which defines one method:

pub trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

clean/mod.rs also defines the types for the "cleaned" AST used later on to render documentation pages. Each usually accompanies an implementation of Clean that takes some AST or HIR type from rustc and converts it into the appropriate "cleaned" type. "Big" items like modules or associated items may have some extra processing in their Clean implementations, but for the most part these impls are straightforward conversions. The "entry point" to this module is the impl Clean<Crate> for visit_ast::RustdocVisitor, which is called by run_core above.
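
As a toy illustration of the conversion pattern (all of the types below are invented stand-ins for this sketch, not rustdoc's actual types):

struct DocContext; // stands in for rustdoc's DocContext

trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

// "Compiler-side" representation (think: a HIR function).
struct HirFunction {
    name: String,
    is_unsafe: bool,
}

// "Cleaned" representation, more convenient for rendering documentation.
struct Function {
    name: String,
    header: String,
}

impl Clean<Function> for HirFunction {
    fn clean(&self, _cx: &DocContext) -> Function {
        Function {
            name: self.name.clone(),
            header: if self.is_unsafe { "unsafe fn" } else { "fn" }.to_string(),
        }
    }
}

fn main() {
    let hir = HirFunction { name: "frobnicate".to_string(), is_unsafe: true };
    let cleaned = hir.clean(&DocContext);
    println!("{} {}", cleaned.header, cleaned.name);
}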

You see, I actually lied a little earlier: There's another AST transformation that happens before the events in clean/mod.rs. In visit_ast.rs is the type RustdocVisitor, which actually crawls a rustc_hir::Crate to get the first intermediate representation, defined in doctree.rs. This pass is mainly to get a few intermediate wrappers around the HIR types and to process visibility and inlining. This is where #[doc(inline)], #[doc(no_inline)], and #[doc(hidden)] are processed, as well as the logic for whether a pub use should get the full page or a "Reexport" line in the module page.

The other major thing that happens in clean/mod.rs is the collection of doc comments and #[doc=""] attributes into a separate field of the Attributes struct, present on anything that gets hand-written documentation. This makes it easier to collect this documentation later in the process.

The primary output of this process is a clean::Crate with a tree of Items which describe the publicly-documentable items in the target crate.

Hot potato

Before moving on to the next major step, a few important "passes" occur over the documentation. These do things like combine the separate "attributes" into a single string and strip leading whitespace to make the document easier on the markdown parser, or drop items that are not public or deliberately hidden with #[doc(hidden)]. These are all implemented in the passes/ directory, one file per pass. By default, all of these passes are run on a crate, but the ones regarding dropping private/hidden items can be bypassed by passing --document-private-items to rustdoc. Note that unlike the previous set of AST transformations, the passes happen on the cleaned crate.

(Strictly speaking, you can fine-tune the passes run and even add your own, but we're trying to deprecate that. If you need finer-grained control over these passes, please let us know!)

Here is the current (as of this writing) list of passes:

  • propagate-doc-cfg - propagates #[doc(cfg(...))] to child items.
  • collapse-docs concatenates all document attributes into one document attribute. This is necessary because each line of a doc comment is given as a separate doc attribute, and this will combine them into a single string with line breaks between each attribute.
  • unindent-comments removes excess indentation on comments in order for markdown to like it. This is necessary because the convention for writing documentation is to provide a space between the /// or //! marker and the text, and stripping that leading space will make the text easier to parse by the Markdown parser. (In the past, the markdown parser used was not CommonMark-compliant, which caused annoyances with extra whitespace, but this seems to be less of an issue today.)
  • strip-priv-imports strips all private import statements (use, extern crate) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation.
  • strip-hidden and strip-private strip all doc(hidden) and private items from the output. strip-private implies strip-priv-imports. Basically, the goal is to remove items that are not relevant for public documentation.

From clean to crate

This is where the "second phase" in rustdoc begins. This phase primarily lives in the html/ folder, and it all starts with run() in html/render.rs. This code is responsible for setting up the Context, SharedContext, and Cache which are used during rendering, copying out the static files which live in every rendered set of documentation (things like the fonts, CSS, and JavaScript that live in html/static/), creating the search index, and printing out the source code rendering, before beginning the process of rendering all the documentation for the crate.

Several functions implemented directly on Context take the clean::Crate and set up some state between rendering items or recursing on a module's child items. From here the "page rendering" begins, via an enormous write!() call in html/layout.rs. The parts that actually generate HTML from the items and documentation occur within a series of std::fmt::Display implementations and functions that pass around a &mut std::fmt::Formatter. The top-level implementation that writes out the page body is the impl<'a> fmt::Display for Item<'a> in html/render.rs, which switches out to one of several item_* functions based on the kind of Item being rendered.

Depending on what kind of rendering code you're looking for, you'll probably find it either in html/render.rs for major items like "what sections should I print for a struct page" or html/format.rs for smaller component pieces like "how should I print a where clause as part of some other item".

Whenever rustdoc comes across an item that should print hand-written documentation alongside, it calls out to html/markdown.rs which interfaces with the Markdown parser. This is exposed as a series of types that wrap a string of Markdown, and implement fmt::Display to emit HTML text. It takes special care to enable certain features like footnotes and tables and add syntax highlighting to Rust code blocks (via html/highlight.rs) before running the Markdown parser. There's also a function in here (find_testable_code) that specifically scans for Rust code blocks so the test-runner code can find all the doctests in the crate.
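
The wrapper pattern itself is simple; here is a toy stand-in (real rustdoc runs a CommonMark parser with extra features inside fmt, where this sketch only does a trivial substitution):

use std::fmt;

// Wrap a string of Markdown and emit HTML from `fmt`.
struct Markdown<'a>(&'a str);

impl fmt::Display for Markdown<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Real rustdoc parses the Markdown here; this sketch just wraps the
        // text in a paragraph tag to show the interface shape.
        write!(f, "<p>{}</p>", self.0)
    }
}

fn main() {
    // Rendering code can then simply `write!` the wrapper into the page.
    println!("{}", Markdown("Hello, *world*!"));
}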

From soup to nuts

(alternate title: "An unbroken thread that stretches from those first Cells to us")

It's important to note that the AST cleaning can ask the compiler for information (crucially, DocContext contains a TyCtxt), but page rendering cannot. The clean::Crate created within run_core is passed outside the compiler context before being handed to html::render::run. This means that a lot of the "supplementary data" that isn't immediately available inside an item's definition, like which trait is the Deref trait used by the language, needs to be collected during cleaning, stored in the DocContext, and passed along to the SharedContext during HTML rendering. This manifests as a bunch of shared state, context variables, and RefCells.

Also of note is that some items that come from "asking the compiler" don't go directly into the DocContext - for example, when loading items from a foreign crate, rustdoc will ask about trait implementations and generate new Items for the impls based on that information. This goes directly into the returned Crate rather than roundabout through the DocContext. This way, these implementations can be collected alongside the others, right before rendering the HTML.

Other tricks up its sleeve

All this describes the process for generating HTML documentation from a Rust crate, but there are a couple of other major modes that rustdoc runs in. It can also be run on a standalone Markdown file, or it can run doctests on Rust code or standalone Markdown files. For the former, it shortcuts straight to html/markdown.rs, optionally including a mode which inserts a Table of Contents to the output HTML.

For the latter, rustdoc runs a similar partial-compilation to get relevant documentation in test.rs, but instead of going through the full clean and render process, it runs a much simpler crate walk to grab just the hand-written documentation. Combined with the aforementioned "find_testable_code" in html/markdown.rs, it builds up a collection of tests to run before handing them off to the libtest test runner. One notable location in test.rs is the function make_test, which is where hand-written doctests get transformed into something that can be executed.

Some extra reading about make_test can be found here.

Dotting i's and crossing t's

So that's rustdoc's code in a nutshell, but there's more things in the repo that deal with it. Since we have the full compiletest suite at hand, there's a set of tests in src/test/rustdoc that make sure the final HTML is what we expect in various situations. These tests also use a supplementary script, src/etc/htmldocck.py, that allows it to look through the final HTML using XPath notation to get a precise look at the output. The full description of all the commands available to rustdoc tests is in htmldocck.py.

In addition, there are separate tests for the search index and rustdoc's ability to query it. The files in src/test/rustdoc-js each contain a different search query and the expected results, broken out by search tab. These files are processed by a script in src/tools/rustdoc-js and the Node.js runtime. These tests don't have as thorough of a writeup, but a broad example that features results in all tabs can be found in basic.js. The basic idea is that you match a given QUERY with a set of EXPECTED results, complete with the full item path of each item.

Example: Type checking through rustc_interface

rustc_interface allows you to interact with Rust code at various stages of compilation.

Getting the type of an expression

NOTE: For the example to compile, you will need to first run the following:

rustup component add rustc-dev

To get the type of an expression, use the global_ctxt to get a TyCtxt:

// In this example, config specifies the rust program:
//   fn main() { let message = "Hello, world!"; println!("{}", message); }
// Our goal is to get the type of the string literal "Hello, world!".
//
// See https://github.com/rust-lang/rustc-dev-guide/blob/master/examples/rustc-driver-example.rs for a complete example of configuring rustc_interface
rustc_interface::run_compiler(config, |compiler| {
    compiler.enter(|queries| {
        // Analyze the crate and inspect the types under the cursor.
        queries.global_ctxt().unwrap().take().enter(|tcx| {
            // Every compilation contains a single crate.
            let krate = tcx.hir().krate();
            // Iterate over the top-level items in the crate, looking for the main function.
            for (_, item) in &krate.items {
                // Use pattern-matching to find a specific node inside the main function.
                if let rustc_hir::ItemKind::Fn(_, _, body_id) = item.kind {
                    let expr = &tcx.hir().body(body_id).value;
                    if let rustc_hir::ExprKind::Block(block, _) = expr.kind {
                        if let rustc_hir::StmtKind::Local(local) = block.stmts[0].kind {
                            if let Some(expr) = local.init {
                                let hir_id = expr.hir_id; // hir_id identifies the string "Hello, world!"
                                let def_id = tcx.hir().local_def_id(item.hir_id); // def_id identifies the main function
                                let ty = tcx.typeck_tables_of(def_id).node_type(hir_id);
                                println!("{:?}: {:?}", expr, ty); // prints expr(HirId { owner: DefIndex(3), local_id: 4 }: "Hello, world!"): &'static str
                            }
                        }
                    }
                }
            }
        })
    });
});

Syntax and the AST

Working directly with the source code is very inconvenient and error-prone. Thus, before we do anything else, we convert raw source code into an AST.

It turns out that doing even this involves a lot of work, including lexing, parsing, macro expansion, name resolution, conditional compilation, feature-gate checking, and validation of the AST.

In this chapter, we take a look at all of these steps.

Lexing and Parsing

The lexer and parser are currently undergoing significant refactoring, so parts of this chapter may be out of date.

The very first thing the compiler does is take the program (a bunch of Unicode characters) and turn it into a format that is more convenient for the compiler to work with than strings. This happens in two stages: lexing and parsing.

Lexing takes strings and turns them into streams of tokens. For example, a.b + c would be turned into the tokens a, ., b, +, and c. The lexer lives in librustc_lexer.
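
For a feel of the interface, here is a small sketch that assumes the rustc_lexer crate (or its auto-published copy on crates.io) is available as a dependency; tokenize yields a Token { kind, len } for each lexeme, including whitespace:

fn main() {
    let source = "a.b + c";
    let mut rest = source;
    // Each token records its kind and its length; the text itself is
    // recovered by slicing the original input.
    for token in rustc_lexer::tokenize(source) {
        let (text, remainder) = rest.split_at(token.len as usize);
        rest = remainder;
        println!("{:?}: {:?}", token.kind, text);
    }
}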

Parsing then takes streams of tokens and turns them into a structured form which is easier for the compiler to work with, usually called an Abstract Syntax Tree (AST). The AST mirrors the structure of a Rust program in memory, using a Span to link a particular AST node back to its source text.

The AST is defined in librustc_ast, along with some definitions for tokens and token streams, data structures/traits for mutating ASTs, and shared definitions for other AST-related parts of the compiler (like the lexer and macro-expansion).

The parser is defined in librustc_parse, along with a high-level interface to the lexer and some validation routines that run after macro expansion. In particular, rustc_parse::parser contains the parser implementation.

The main entrypoint to the parser is via the various parse_* functions in the parser. They let you do things like turn a SourceFile (e.g. the source in a single file) into a token stream, create a parser from the token stream, and then execute the parser to get a Crate (the root AST node).

To minimize the amount of copying that is done, both the StringReader and the Parser have lifetimes which bind them to the parent ParseSess. This contains all the information needed while parsing, as well as the SourceMap itself.

More about lexical analysis

Code for lexical analysis is split between two crates:

  • The rustc_lexer crate is responsible for breaking a &str into chunks constituting tokens. Although it is popular to implement lexers as generated finite state machines, the lexer in rustc_lexer is hand-written.
  • StringReader from librustc_ast integrates rustc_lexer with data structures specific to rustc. Specifically, it adds Span information to tokens returned by rustc_lexer and interns identifiers.

The #[test] attribute

Today, Rust programmers rely on a built-in attribute called #[test]. All you have to do is mark a function as a test and include some asserts like so:

#[test]
fn my_test() {
    assert!(2+2 == 4);
}

When this program is compiled using rustc --test or cargo test, it will produce an executable that can run this, and any other test function. This method of testing allows tests to live alongside code in an organic way. You can even put tests inside private modules:

mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    #[test]
    fn test_priv_func() {
        assert!(my_priv_func());
    }
}

Private items can thus be easily tested without worrying about how to expose them to any sort of external testing apparatus. This is key to the ergonomics of testing in Rust. Semantically, however, it's rather odd. How does any sort of main function invoke these tests if they're not visible? What exactly is rustc --test doing?

#[test] is implemented as a syntactic transformation inside the compiler's librustc_ast crate. Essentially, it's a fancy macro that rewrites the crate in 3 steps:

Step 1: Re-Exporting

As mentioned earlier, tests can exist inside private modules, so we need a way of exposing them to the main function, without breaking any existing code. To that end, librustc_ast will create local modules called __test_reexports that recursively reexport tests. This expansion translates the above example into:

mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    pub fn test_priv_func() {
        assert!(my_priv_func());
    }

    pub mod __test_reexports {
        pub use super::test_priv_func;
    }
}

Now, our test can be accessed as my_priv_mod::__test_reexports::test_priv_func. For deeper module structures, __test_reexports will reexport modules that contain tests, so a test at a::b::my_test becomes a::__test_reexports::b::__test_reexports::my_test. While this process seems pretty safe, what happens if there is an existing __test_reexports module? The answer: nothing.

To explain, we need to understand how the AST represents identifiers. The name of every function, variable, module, etc. is not stored as a string, but rather as an opaque Symbol which is essentially an ID number for each identifier. The compiler keeps a separate hashtable that allows us to recover the human-readable name of a Symbol when necessary (such as when printing a syntax error). When the compiler generates the __test_reexports module, it generates a new Symbol for the identifier, so while the compiler-generated __test_reexports may share a name with your hand-written one, it will not share a Symbol. This technique prevents name collision during code generation and is the foundation of Rust's macro hygiene.

Step 2: Harness Generation

Now that our tests are accessible from the root of our crate, we need to do something with them. librustc_ast generates a module like so:

#[main]
pub fn main() {
    extern crate test;
    test::test_main_static(&[&path::to::test1, /*...*/]);
}

where path::to::test1 is a constant of type test::TestDescAndFn.

While this transformation is simple, it gives us a lot of insight into how tests are actually run. The tests are aggregated into an array and passed to a test runner called test_main_static. We'll come back to exactly what TestDescAndFn is, but for now, the key takeaway is that there is a crate called test that is part of Rust core and implements all of the runtime for testing. test's interface is unstable, so the only stable way to interact with it is through the #[test] macro.

Step 3: Test Object Generation

If you've written tests in Rust before, you may be familiar with some of the optional attributes available on test functions. For example, a test can be annotated with #[should_panic] if we expect the test to cause a panic. It looks something like this:

#[test]
#[should_panic]
fn foo() {
    panic!("intentional");
}

This means our tests are more than just simple functions; they have configuration information as well. test encodes this configuration data into a struct called TestDesc. For each test function in a crate, librustc_ast will parse its attributes and generate a TestDesc instance. It then combines the TestDesc and test function into the predictably named TestDescAndFn struct, which test_main_static operates on. For a given test, the generated TestDescAndFn instance looks like so:

self::test::TestDescAndFn{
  desc: self::test::TestDesc{
    name: self::test::StaticTestName("foo"),
    ignore: false,
    should_panic: self::test::ShouldPanic::Yes,
    allow_fail: false,
  },
  testfn: self::test::StaticTestFn(||
    self::test::assert_test_result(::crate::__test_reexports::foo())),
}

Once we've constructed an array of these test objects, they're passed to the test runner via the harness generated in step 2.

Inspecting the generated code

On nightly Rust, there's an unstable flag called unpretty that you can use to print out the module source after macro expansion:

$ rustc my_mod.rs -Z unpretty=hir

Panicking in Rust

Step 1: Invocation of the panic! macro.

There are actually two panic macros - one defined in libcore, and one defined in libstd. This is due to the fact that code in libcore can panic. libcore is built before libstd, but we want panics to use the same machinery at runtime, whether they originate in libcore or libstd.

libcore definition of panic!

The libcore panic! macro eventually makes the following call (in src/libcore/panicking.rs):


// NOTE This function never crosses the FFI boundary; it's a Rust-to-Rust call
extern "Rust" {
    #[lang = "panic_impl"]
    fn panic_impl(pi: &PanicInfo<'_>) -> !;
}

let pi = PanicInfo::internal_constructor(Some(&fmt), location);
unsafe { panic_impl(&pi) }

Actually resolving this goes through several layers of indirection:

  1. In src/librustc_middle/middle/weak_lang_items.rs, panic_impl is declared as a 'weak lang item', with the symbol rust_begin_unwind. This is used in librustc_typeck/collect.rs to set the actual symbol name to rust_begin_unwind.

    Note that panic_impl is declared in an extern "Rust" block, which means that libcore will attempt to call a foreign symbol called rust_begin_unwind (to be resolved at link time).

  2. In src/libstd/panicking.rs, we have this definition:


/// Entry point of panic from the libcore crate.
#[cfg(not(test))]
#[panic_handler]
#[unwind(allowed)]
pub fn begin_panic_handler(info: &PanicInfo<'_>) -> ! {
    ...
}

The special panic_handler attribute is resolved via src/librustc_middle/middle/lang_items. The extract function converts the panic_handler attribute to a panic_impl lang item.

Now, we have a matching panic_handler lang item in libstd. This function goes through the same process as the extern { fn panic_impl } definition in libcore, ending up with a symbol name of rust_begin_unwind. At link time, the symbol reference in libcore will be resolved to the definition in libstd (the function called begin_panic_handler in the Rust source).

Thus, control flow will pass from libcore to libstd at runtime. This allows panics from libcore to go through the same infrastructure that other panics use (panic hooks, unwinding, etc.).

libstd implementation of panic!

This is where the actual panic-related logic begins. In src/libstd/panicking.rs, control passes to rust_panic_with_hook. This method is responsible for invoking the global panic hook, and checking for double panics. Finally, we call __rust_start_panic, which is provided by the panic runtime.

The call to __rust_start_panic is very weird - it is passed a *mut &mut dyn BoxMeUp, converted to a usize. Let's break this type down:

  1. BoxMeUp is an internal trait. It is implemented for PanicPayload (a wrapper around the user-supplied payload type), and has a method fn box_me_up(&mut self) -> *mut (dyn Any + Send). This method takes the user-provided payload (T: Any + Send), boxes it, and converts the box to a raw pointer.

  2. When we call __rust_start_panic, we have an &mut dyn BoxMeUp. However, this is a fat pointer (twice the size of a usize). To pass this to the panic runtime across an FFI boundary, we take a mutable reference to this mutable reference (&mut &mut dyn BoxMeUp), and convert it to a raw pointer (*mut &mut dyn BoxMeUp). The outer raw pointer is a thin pointer, since it points to a Sized type (a mutable reference). Therefore, we can convert this thin pointer into a usize, which is suitable for passing across an FFI boundary.

Finally, we call __rust_start_panic with this usize. We have now entered the panic runtime.
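
Here is a self-contained sketch of that fat-to-thin pointer round trip, using a stand-in trait and payload type (the real BoxMeUp and PanicPayload are internal to the standard library, so all names here are illustrative):

use std::any::Any;

// Stand-in for the internal `BoxMeUp` trait described above.
trait BoxMeUp {
    fn box_me_up(&mut self) -> *mut (dyn Any + Send);
}

// Stand-in for `PanicPayload`: wraps the user-supplied payload.
struct Payload(Option<String>);

impl BoxMeUp for Payload {
    fn box_me_up(&mut self) -> *mut (dyn Any + Send) {
        // Box the payload and hand out a raw pointer to it.
        let boxed: Box<dyn Any + Send> = Box::new(self.0.take().unwrap());
        Box::into_raw(boxed)
    }
}

fn main() {
    let mut payload = Payload(Some("oops".to_string()));
    // `&mut dyn BoxMeUp` is a fat pointer (data + vtable)...
    let mut obj: &mut dyn BoxMeUp = &mut payload;
    // ...but a pointer *to* that reference is thin, so it fits in a usize
    // and can cross an FFI boundary.
    let addr = &mut obj as *mut &mut dyn BoxMeUp as usize;

    // On the other side of the "FFI boundary", reverse the conversion:
    let obj: &mut &mut dyn BoxMeUp =
        unsafe { &mut *(addr as *mut &mut dyn BoxMeUp) };
    let raw = obj.box_me_up();
    let any: Box<dyn Any + Send> = unsafe { Box::from_raw(raw) };
    println!("{}", any.downcast::<String>().unwrap());
}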

Step 2: The panic runtime

Rust provides two panic runtimes: libpanic_abort and libpanic_unwind. The user chooses between them at build time via their Cargo.toml.

libpanic_abort is extremely simple: its implementation of __rust_start_panic just aborts, as you would expect.

libpanic_unwind is the more interesting case.

In its implementation of __rust_start_panic, we take the usize, convert it back to a *mut &mut dyn BoxMeUp, dereference it, and call box_me_up on the &mut dyn BoxMeUp. At this point, we have a raw pointer to the payload itself (a *mut (dyn Send + Any)): that is, a raw pointer to the actual value provided by the user who called panic!.

At this point, the platform-independent code ends. We now call into platform-specific unwinding logic (e.g. libunwind). This code is responsible for unwinding the stack, running any 'landing pads' associated with each frame (currently, running destructors), and transferring control to the catch_unwind frame.

Note that all panics either abort the process or get caught by some call to catch_unwind: in src/libstd/rt.rs, the call to the user-provided main function is wrapped in catch_unwind.
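
The same machinery is exposed to user code through std::panic::catch_unwind, which is a quick way to see this behavior:

fn main() {
    // Much like the wrapper around `main` in src/libstd/rt.rs, catch_unwind
    // converts an unwinding panic in the closure into an Err value.
    let result = std::panic::catch_unwind(|| {
        panic!("boom");
    });
    assert!(result.is_err());
    println!("recovered from the panic");
}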

Macro expansion

librustc_ast, librustc_expand, and librustc_builtin_macros are all undergoing refactoring, so some of the links in this chapter may be broken.

Macro expansion happens during parsing. rustc has two parsers, in fact: the normal Rust parser, and the macro parser. During the parsing phase, the normal Rust parser will set aside the contents of macros and their invocations. Later, before name resolution, macros are expanded using these portions of the code. The macro parser, in turn, may call the normal Rust parser when it needs to bind a metavariable (e.g. $my_expr) while parsing the contents of a macro invocation. The code for macro expansion is in src/librustc_expand/mbe/. This chapter aims to explain how macro expansion works.

Example

It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:

macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}

$mvar is called a metavariable. Unlike normal variables, rather than binding to a value in a computation, a metavariable binds at compile time to a tree of tokens. A token is a single "unit" of the grammar, such as an identifier (e.g. foo) or punctuation (e.g. =>). There are also other special tokens, such as EOF, which indicates that there are no more tokens. Token trees result from paired parentheses-like characters ((...), [...], and {...}) – they include the open and close delimiters and all the tokens in between (we do require that parentheses-like characters be balanced). Having macro expansion operate on token streams rather than the raw bytes of a source file abstracts away a lot of complexity. The macro expander (and much of the rest of the compiler) doesn't really care that much about the exact line and column of some syntactic construct in the code; it cares about what constructs are used in the code. Using tokens allows us to care about what without worrying about where. For more information about tokens, see the Parsing chapter of this book.

Whenever we refer to the "example invocation", we mean the following snippet:

printer!(print foo); // Assume `foo` is a variable defined somewhere else...

The process of expanding the macro invocation into the syntax tree println!("{}", foo) and then expanding that into a call to Display::fmt is called macro expansion, and it is the topic of this chapter.

The macro parser

There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.

Basically, the macro parser is like an NFA-based regex parser. It uses an algorithm similar in spirit to the Earley parsing algorithm. The macro parser is defined in src/librustc_expand/mbe/macro_parser.rs.

The interface of the macro parser is as follows (this is slightly simplified):

fn parse_tt(
    parser: &mut Cow<Parser>, 
    ms: &[TokenTree],
) -> NamedParseResult

In this simplified signature, the parser wraps the state the macro parser needs. Conceptually, the macro parser works with these items:

  • sess, a "parsing session", which keeps track of some metadata. Most notably, this is used to keep track of errors that are generated so they can be reported to the user.
  • tts, a stream of tokens. The macro parser's job is to consume the raw stream of tokens and output a binding of metavariables to corresponding token trees.
  • ms, a matcher. This is a sequence of token trees that we want to match tts against.

In the analogy of a regex parser, tts is the input and we are matching it against the pattern ms. Using our examples, tts could be the stream of tokens containing the inside of the example invocation print foo, while ms might be the sequence of token (trees) print $mvar:ident.

The output of the parser is a NamedParseResult, which indicates which of three cases has occurred:

  • Success: tts matches the given matcher ms, and we have produced a binding from metavariables to the corresponding token trees.
  • Failure: tts does not match ms. This results in an error message such as "No rule expected token blah".
  • Error: some fatal error has occurred in the parser. For example, this happens if more than one pattern matches, since that indicates the macro is ambiguous.

The full interface is defined here.

The macro parser does pretty much exactly the same as a normal regex parser with one exception: in order to parse different types of metavariables, such as ident, block, expr, etc., the macro parser must sometimes call back to the normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using the macro parser. This is extremely non-intuitive and self-referential. The code to parse macro definitions is in src/librustc_expand/mbe/macro_rules.rs. It defines the pattern for matching a macro definition as $( $lhs:tt => $rhs:tt );+. In other words, a macro_rules definition should have in its body at least one occurrence of a token tree followed by => followed by another token tree. When the compiler comes to a macro_rules definition, it uses this pattern to match the two token trees per rule in the definition of the macro using the macro parser itself. In our example definition, the metavariable $lhs would match the patterns of both arms: (print $mvar:ident) and (print twice $mvar:ident). And $rhs would match the bodies of both arms: { println!("{}", $mvar); } and { println!("{}", $mvar); println!("{}", $mvar); }. The parser would keep this knowledge around for when it needs to expand a macro invocation.

When the compiler comes to a macro invocation, it parses that invocation using the same NFA-based macro parser that is described above. However, the matcher used is the first token tree ($lhs) extracted from the arms of the macro definition. Using our example, we would try to match the token stream print foo from the invocation against the matchers print $mvar:ident and print twice $mvar:ident that we previously extracted from the definition. The algorithm is exactly the same, but when the macro parser comes to a place in the current matcher where it needs to match a non-terminal (e.g. $mvar:ident), it calls back to the normal Rust parser to get the contents of that non-terminal. In this case, the Rust parser would look for an ident token, which it finds (foo) and returns to the macro parser. Then, the macro parser proceeds in parsing as normal. Also, note that exactly one of the matchers from the various arms should match the invocation; if there is more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax error.

For more information about the macro parser's implementation, see the comments in src/librustc_expand/mbe/macro_parser.rs.

Hygiene

If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code:

#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO

Most people avoid writing C like this – and for good reason: it doesn't compile. The struct Bar defined by the macro clashes with the struct Bar defined in the code. Consider also the following example:

#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);

Do you see the problem? We wanted to generate a call foo(22, 0), but instead we got foo(0, 0) because the macro defined its own y!

These are both examples of macro hygiene issues. Hygiene relates to how to handle names defined within a macro. In particular, a hygienic macro system prevents errors due to names introduced within a macro. Rust macros are hygienic in that they do not allow one to write the sorts of bugs above.
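
For contrast with the C example above, the equivalent Rust macro does the right thing: the y defined inside the macro is a different name from the caller's y, so the call below prints foo(22, 0) rather than foo(0, 0):

macro_rules! do_foo {
    ($x:expr) => {{
        // Hygienic: this `y` cannot shadow or capture the caller's `y`.
        let y = 0;
        foo($x, y);
    }};
}

fn foo(a: i32, b: i32) {
    println!("foo({}, {})", a, b);
}

fn main() {
    let y = 22;
    do_foo!(y); // prints "foo(22, 0)", not "foo(0, 0)"
}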

At a high level, hygiene within the rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then disambiguate names based on that context. Future iterations of the macro system will allow greater control to the macro author to use that context. For example, a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro).

In rustc, this "context" is tracked via Spans.

TODO: what is call-site hygiene? what is def-site hygiene?

TODO

Procedural Macros

TODO

Custom Derive

TODO

TODO: maybe something about macros 2.0?

Discussion about hygiene

The rest of this chapter is a dump of a discussion between mark-i-m and petrochenkov about Macro Expansion and Hygiene. I am pasting it here so that it never gets lost until we can make it into a proper chapter.

mark-i-m: @Vadim Petrochenkov Hi :wave:
I was wondering if you would have a chance sometime in the next month or so to
just have a zulip discussion where you tell us (WG-learning) everything you
know about macros/expansion/hygiene. We were thinking this could be less formal
(and less work for you) than compiler lecture series lecture... thoughts?

mark-i-m: The goal is to fill out that long-standing gap in the rustc-dev-guide

Vadim Petrochenkov: Ok, I'm at UTC+03:00 and generally available in the
evenings (or weekends).

mark-i-m: @Vadim Petrochenkov Either of those works for me (your evenings are
about lunch time for me :) ) Is there a particular date that would work best
for you?

mark-i-m: @WG-learning Does anyone else have a preferred date?

    Vadim Petrochenkov:

    Is there a particular date that would work best for you?

Nah, not much difference.  (If something changes for a specific day, I'll
notify.)

Santiago Pastorino: week days are better, but I'd say let's wait for @Vadim
Petrochenkov to say when they are ready for it and we can set a date

Santiago Pastorino: also, we should record this so ... I guess it doesn't
matter that much when :)

    mark-i-m:

    also, we should record this so ... I guess it doesn't matter that much when
    :)

@Santiago Pastorino My thinking was to just use zulip, so we would have the log

mark-i-m: @Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24
at 5pm UTC time (if I did the math right, that should be evening for Vadim)

Amanjeev Sethi: i can try and do this but I am starting a new job that week so
cannot promise.

    Santiago Pastorino:

    Vadim Petrochenkov @WG-learning How about 2 weeks from now: July 24 at 5pm
    UTC time (if I did the math right, that should be evening for Vadim)

works perfect for me

Santiago Pastorino: @mark-i-m I have access to the compiler calendar so I can
add something there

Santiago Pastorino: let me know if you want to add an event to the calendar, I
can do that

Santiago Pastorino: how long it would be?

    mark-i-m:

    let me know if you want to add an event to the calendar, I can do that

mark-i-m: That could be good :+1:

    mark-i-m:

    how long it would be?

Let's start with 30 minutes, and if we need to schedule another we can

    Vadim Petrochenkov:

    5pm UTC

1-2 hours later would be better, 5pm UTC is not evening enough.

Vadim Petrochenkov: How exactly do you plan the meeting to go (aka how much do
I need to prepare)?

    Santiago Pastorino:

        5pm UTC

    1-2 hours later would be better, 5pm UTC is not evening enough.

Scheduled for 7pm UTC then

    Santiago Pastorino:

    How exactly do you plan the meeting to go (aka how much do I need to
    prepare)?

/cc @mark-i-m

mark-i-m: @Vadim Petrochenkov

    How exactly do you plan the meeting to go (aka how much do I need to
    prepare)?

My hope was that this could be less formal than for a compiler lecture series,
but it would be nice if you could have in your mind a tour of the design and
the code

That is, imagine that a new person was joining the compiler team and needed to
get up to speed about macros/expansion/hygiene. What would you tell such a
person?

mark-i-m: @Vadim Petrochenkov Are we still on for tomorrow at 7pm UTC?

Vadim Petrochenkov: Yes.

Santiago Pastorino: @Vadim Petrochenkov @mark-i-m I've added an event on rust
compiler team calendar

mark-i-m: @WG-learning @Vadim Petrochenkov Hello!

mark-i-m: We will be starting in ~7 minutes

mark-i-m: :wave:

Vadim Petrochenkov: I'm here.

mark-i-m: Cool :)

Santiago Pastorino: hello @Vadim Petrochenkov

mark-i-m: Shall we start?

mark-i-m: First off, @Vadim Petrochenkov Thanks for doing this!

Vadim Petrochenkov: Here's some preliminary data I prepared.

Vadim Petrochenkov: Below I'll assume #62771 and #62086 have landed.

Vadim Petrochenkov: Where to find the code:
  • librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context)
  • librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs
  • librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness
  • librustc_ast/config.rs - implementation of cfg/cfg_attr (they are treated specially compared to other macros), should probably be moved into librustc_ast/ext
  • librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams
  • librustc_ast/ext - various expansion-related stuff
  • librustc_ast/ext/base.rs - basic structures used by expansion
  • librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code: collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST
  • librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST"; basically, a "placeholder" is a temporary AST node replaced with macro expansion result nodes
  • librustc_ast/ext/build.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days
  • librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through the C ABI
  • librustc_ast/ext/tt - implementation of macro_rules; turns the macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this
  • librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors
  • librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like the module hierarchy or "definition paths"

Primary structures:
  • HygieneData - global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context
  • ExpnId - ID of a macro call or desugaring (and also expansion of that call/desugaring, depending on context)
  • ExpnInfo/InternalExpnData - a subset of properties from both macro definition and macro call available through global data
  • SyntaxContext - ID of a chain of nested macro definitions (identified by ExpnIds)
  • SyntaxContextData - data associated with the given SyntaxContext, mostly a cache for results of filtering that chain in different ways
  • Span - a code location + SyntaxContext
  • Ident - interned string (Symbol) + Span, i.e. a string with attached hygiene data
  • TokenStream - a collection of TokenTrees
  • TokenTree - a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{})
  • SyntaxExtension - a lowered macro representation; contains its expander function transforming a tokenstream or AST into tokenstream or AST, plus some additional data like stability, or a list of unstable features allowed inside the macro
  • SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc); this is an enum that lists them
  • ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander signatures (TODO: change and rename the signatures into something more consistent)
  • trait Resolver - a trait used to break crate dependencies (so resolver services can be used in librustc_ast, despite librustc_resolve and pretty much everything else depending on librustc_ast)
  • ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infra in the process of its work
  • AstFragment - a piece of AST that can be produced by a macro (may include multiple homogeneous AST nodes, like e.g. a list of items)
  • Annotatable - a piece of AST that can be an attribute target; almost the same thing as AstFragment except for types and patterns that can be produced by macros but cannot be annotated with attributes (TODO: merge into AstFragment)
  • trait MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its context (aka AstFragmentKind - item, or expression, or pattern, etc.)
  • Invocation/InvocationKind - a structure describing a macro call; these structures are collected by the expansion infra (InvocationCollector), queued, resolved, expanded when resolved, etc.

Primary algorithms / actions: TODO

mark-i-m: Very useful :+1:

mark-i-m: @Vadim Petrochenkov Zulip doesn't have an indication of typing, so
I'm not sure if you are waiting for me or not

Vadim Petrochenkov: The TODO part should be about how a crate transitions from
the state "macros exist as written in source" to "all macros are expanded", but
I didn't write it yet.

Vadim Petrochenkov: (That should probably better happen off-line.)

Vadim Petrochenkov: Now, if you have any questions?

mark-i-m: Thanks :)

mark-i-m: /me is still reading :P

mark-i-m: Ok

mark-i-m: So I guess my first question is about hygiene, since that remains the
most mysterious to me... My understanding is that the parser outputs AST nodes,
where each node has a Span

mark-i-m: In the absence of macros and desugaring, what does the syntax context
of an AST node look like?

mark-i-m: @Vadim Petrochenkov

Vadim Petrochenkov: Not each node, but many of them.  When a node is not
macro-expanded, its context is 0.

Vadim Petrochenkov: aka SyntaxContext::empty()

Vadim Petrochenkov: it's a chain that consists of one expansion - expansion 0
aka ExpnId::root.

mark-i-m: Do all expansions start at root?

Vadim Petrochenkov: Also, SyntaxContext::empty() is its own parent.

mark-i-m: Is this actually stored somewhere or is it a logical value?

Vadim Petrochenkov: All expansion hierarchies (there are several of them) start
at ExpnId::root.

Vadim Petrochenkov: Vectors in HygieneData have entries for both ctxt == 0 and
expn_id == 0.

Vadim Petrochenkov: I don't think anyone looks into them much though.

mark-i-m: Ok

Vadim Petrochenkov: Speaking of multiple hierarchies...

mark-i-m: Go ahead :)

Vadim Petrochenkov: One is parent (expn_id1) -> parent(expn_id2) -> ...

Vadim Petrochenkov: This is the order in which macros are expanded.

Vadim Petrochenkov: Well.

Vadim Petrochenkov: When we are expanding one macro another macro is revealed
in its output.

Vadim Petrochenkov: That's the parent-child relation in this hierarchy.

Vadim Petrochenkov: InternalExpnData::parent is the child->parent link.

mark-i-m: So in the above chain expn_id1 is the child?

Vadim Petrochenkov: Yes.

Vadim Petrochenkov: The second one is parent (SyntaxContext1) ->
parent(SyntaxContext2) -> ...

Vadim Petrochenkov: This is about nested macro definitions.  When we are
expanding one macro another macro definition is revealed in its output.

Vadim Petrochenkov: SyntaxContextData::parent is the child->parent link here.

Vadim Petrochenkov: So, SyntaxContext is the whole chain in this hierarchy, and
outer_expns are individual elements in the chain.

mark-i-m: So for example, suppose I have the following:

macro_rules! foo { () => { println!(); } }

fn main() { foo!(); }

Then AST nodes that are finally generated would have parent(expn_id_println) ->
parent(expn_id_foo), right?

Vadim Petrochenkov: Pretty common construction (at least it was, before
refactorings) is SyntaxContext::empty().apply_mark(expn_id), which means...

    Vadim Petrochenkov:

    Then AST nodes that are finally generated would have
    parent(expn_id_println) -> parent(expn_id_foo), right?

Yes.

    mark-i-m:

    and outer_expns are individual elements in the chain.

Sorry, what is outer_expns?

Vadim Petrochenkov: SyntaxContextData::outer_expn

mark-i-m: Thanks :) Please continue

Vadim Petrochenkov: ...which means a token produced by a built-in macro (which
is defined in the root effectively).

mark-i-m: Where does the expn_id come from?

Vadim Petrochenkov: Or a stable proc macro, which are always considered to be
defined in the root because they are always cross-crate, and we don't have the
cross-crate hygiene implemented, ha-ha.

    Vadim Petrochenkov:

    Where does the expn_id come from?

Vadim Petrochenkov: ID of the built-in macro call like line!().

Vadim Petrochenkov: Assigned continuously from 0 to N as soon as we discover
new macro calls.

mark-i-m: Sorry, I didn't quite understand. Do you mean that only built-in
macros receive continuous IDs?

Vadim Petrochenkov: So, the second hierarchy has a catch - the context
transplantation hack -
https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732.

    Vadim Petrochenkov:

    Do you mean that only built-in macros receive continuous IDs?

Vadim Petrochenkov: No, all macro calls receive ID.

Vadim Petrochenkov: Built-ins have the typical pattern
SyntaxContext::empty().apply_mark(expn_id) for syntax contexts produced by
them.

mark-i-m: I see, but this pattern is only used for built-ins, right?

Vadim Petrochenkov: And also all stable proc macros, see the comments above.

mark-i-m: Got it

Vadim Petrochenkov: The third hierarchy is call-site hierarchy.

Vadim Petrochenkov: If foo!(bar!(ident)) expands into ident

Vadim Petrochenkov: then hierarchy 1 is root -> foo -> bar -> ident

Vadim Petrochenkov: but hierarchy 3 is root -> ident

Vadim Petrochenkov: ExpnInfo::call_site is the child-parent link in this case.

mark-i-m: When we expand, do we expand foo first or bar? Why is there a
hierarchy 1 here? Is that foo expands first and it expands to something that
contains bar!(ident)?

Vadim Petrochenkov: Ah, yes, let's assume both foo and bar are identity macros.

Vadim Petrochenkov: Then foo!(bar!(ident)) -> expand -> bar!(ident) -> expand
-> ident

Vadim Petrochenkov: If bar were expanded first, that would be eager expansion -
https://github.com/rust-lang/rfcs/pull/2320.

mark-i-m: And after we expand only foo! presumably whatever intermediate state
has hierarchy 1 of root->foo->(bar_ident), right?

Vadim Petrochenkov: (We have it hacked into some built-in macros, but not
generally.)

    Vadim Petrochenkov:

    And after we expand only foo! presumably whatever intermediate state has
    hierarchy 1 of root->foo->(bar_ident), right?

Vadim Petrochenkov: Yes.

mark-i-m: Got it :)

mark-i-m: It looks like we have ~5 minutes left. This has been very helpful
already, but I also have more questions. Shall we try to schedule another
meeting in the future?

Vadim Petrochenkov: Sure, why not.

Vadim Petrochenkov: A thread for offline questions-answers would be good too.

    mark-i-m:

    A thread for offline questions-answers would be good too.

I don't mind using this thread, since it already has a lot of info in it. We
also plan to summarize the info from this thread into the rustc-dev-guide.

    Sure, why not.

Unfortunately, I'm unavailable for a few weeks. Would August 21-ish work for
you (and @WG-learning )?

mark-i-m: @Vadim Petrochenkov Thanks very much for your time and knowledge!

mark-i-m: One last question: are there more hierarchies?

Vadim Petrochenkov: Not that I know of.  Three + the context transplantation
hack is already more complex than I'd like.

mark-i-m: Yes, one wonders what it would be like if one also had to think about
eager expansion...

Santiago Pastorino: sorry but I couldn't follow that much today, will read it
when I have some time later

Santiago Pastorino: btw https://github.com/rust-lang/rustc-dev-guide/issues/398

mark-i-m: @Vadim Petrochenkov Would 7pm UTC on August 21 work for a followup?

Vadim Petrochenkov: Tentatively yes.

mark-i-m: @Vadim Petrochenkov @WG-learning Does this still work for everyone?

Vadim Petrochenkov: August 21 is still ok.

mark-i-m: @WG-learning @Vadim Petrochenkov We will start in ~30min

Vadim Petrochenkov: Oh.  Thanks for the reminder, I forgot about this entirely.

mark-i-m: Hello!

Vadim Petrochenkov: (I'll be here in a couple of minutes.)

Vadim Petrochenkov: Ok, I'm here.

mark-i-m: Hi :)

Vadim Petrochenkov: Hi.

mark-i-m: so last time, we talked about the 3 context hierarchies

Vadim Petrochenkov: Right.

mark-i-m: Was there anything you wanted to add to that? If not, I think it
would be good to get a big-picture... Given some piece of rust code, how do we
get to the point where things are expanded and hygiene context is computed?

mark-i-m: (I'm assuming that hygiene info is computed as we expand stuff, since
I don't think you can discover it beforehand)

Vadim Petrochenkov: Ok, let's move from hygiene to expansion.

Vadim Petrochenkov: Especially given that I don't remember the specific hygiene
algorithms like adjust in detail.

    Vadim Petrochenkov:

    Given some piece of rust code, how do we get to the point where things are
    expanded

So, first of all, the "some piece of rust code" is the whole crate.

mark-i-m: Just to confirm, the algorithms are well-encapsulated, right? Like a
function or a struct as opposed to a bunch of conventions distributed across
the codebase?

Vadim Petrochenkov: We run fully_expand_fragment in it.

    Vadim Petrochenkov:

    Just to confirm, the algorithms are well-encapsulated, right?

Yes, the algorithmic parts are entirely inside hygiene.rs.

Vadim Petrochenkov: Ok, some are in fn resolve_crate_root, but those are hacks.

Vadim Petrochenkov: (Continuing about expansion.) If fully_expand_fragment is
run not on a whole crate, it means that we are performing eager expansion.

Vadim Petrochenkov: Eager expansion is done for arguments of some built-in
macros that expect literals.

Vadim Petrochenkov: It generally performs a subset of actions performed by the
non-eager expansion.

Vadim Petrochenkov: So, I'll talk about non-eager expansion for now.

mark-i-m: Eager expansion is not exposed as a language feature, right? i.e. it
is not possible for me to write an eager macro?

Vadim Petrochenkov:
https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 (vvv The
link is explained below vvv )

    Vadim Petrochenkov:

    Eager expansion is not exposed as a language feature, right? i.e. it is not
    possible for me to write an eager macro?

Yes, it's entirely an ability of some built-in macros.

Vadim Petrochenkov: Not exposed for general use.

Vadim Petrochenkov: fully_expand_fragment works in iterations.

Vadim Petrochenkov: Iterations look roughly like this:
- Resolve imports in our partially built crate as much as possible.
- Collect as many macro invocations as possible (fn-like, attributes,
  derives) from our partially built crate and add them to the queue.

    Vadim Petrochenkov: Take a macro from the queue, and attempt to resolve it.

    Vadim Petrochenkov: If it's resolved - run its expander function that
    consumes tokens or AST and produces tokens or AST (depending on the macro
    kind).

    Vadim Petrochenkov: (If it's not resolved, then put it back into the
    queue.)

Vadim Petrochenkov: ^^^ That's where we fill in the hygiene data associated
with ExpnIds.

mark-i-m: When we put it back in the queue?

mark-i-m: or do you mean the collect step in general?

Vadim Petrochenkov: Once we resolved the macro call to the macro definition we
know everything about the macro and can call set_expn_data to fill in its
properties in the global data.

Vadim Petrochenkov: I mean, immediately after successful resolution.

Vadim Petrochenkov: That's the first part of hygiene data, the second one is
associated with SyntaxContext rather than with ExpnId, it's filled in later
during expansion.

Vadim Petrochenkov: So, after we run the macro's expander function and got a
piece of AST (or got tokens and parsed them into a piece of AST) we need to
integrate that piece of AST into the big existing partially built AST.

Vadim Petrochenkov: This integration is a really important step where the next
things happen:
- NodeIds are assigned.

    Vadim Petrochenkov: "def paths"s and their IDs (DefIds) are created

    Vadim Petrochenkov: Names are put into modules from the resolver point of
    view.

Vadim Petrochenkov: So, we are basically turning some vague token-like mass
into a proper, set-in-stone hierarchical AST with side tables.

Vadim Petrochenkov: Where exactly this happens - NodeIds are assigned by
InvocationCollector (which also collects new macro calls from this new AST
piece and adds them to the queue), DefIds are created by DefCollector, and
modules are filled by BuildReducedGraphVisitor.

Vadim Petrochenkov: These three passes run one after another on every AST
fragment freshly expanded from a macro.

Vadim Petrochenkov: After expanding a single macro and integrating its output
we again try to resolve all imports in the crate, and then return to the big
queue processing loop and pick up the next macro.

Vadim Petrochenkov: Repeat until there are no more macros.

mark-i-m: The integration step is where we would get parser errors too right?

mark-i-m: Also, when do we know definitively that resolution has failed for
particular ident?

    Vadim Petrochenkov:

    The integration step is where we would get parser errors too right?

Yes, if the macro produced tokens (rather than AST directly) and we had to
parse them.

    Vadim Petrochenkov:

    when do we know definitively that resolution has failed for particular
    ident?

So, ident is looked up in a number of scopes during resolution.  From closest
like the current block or module, to far away like preludes or built-in types.

Vadim Petrochenkov: If lookup has certainly failed in all of the scopes, then
it has certainly failed.

mark-i-m: This is after all expansions and integrations are done, right?

Vadim Petrochenkov: "Certainly" is determined differently for different scopes,
e.g. for a module scope it means no unexpanded macros and no unresolved glob
imports in that module.

    Vadim Petrochenkov:

    This is after all expansions and integrations are done, right?

For macro and import names this happens during expansions and integrations.

mark-i-m: Makes sense

Vadim Petrochenkov: For all other names we certainly know whether a name is
resolved successfully or not on the first attempt, because no new names can
appear.

Vadim Petrochenkov: (They are resolved in a later pass, see
librustc_resolve/late.rs.)

mark-i-m: And if at the end of the iteration, there are still things in the
queue that can't be resolve, this represents an error, right?

mark-i-m: i.e. an undefined macro?

Vadim Petrochenkov: Yes, if we make no progress during an iteration, then we
are stuck, and that state represents an error.

Vadim Petrochenkov: We attempt to recover though, using dummies expanding into
nothing or ExprKind::Err or something like that for unresolved macros.

mark-i-m: This is for the purposes of diagnostics, though, right?

Vadim Petrochenkov: But if we are going through recovery, then compilation must
result in an error anyway.

Vadim Petrochenkov: Yes, that's for diagnostics; without recovery we would get
stuck at the first unresolved macro or import.

Vadim Petrochenkov: So, about the SyntaxContext hygiene...

Vadim Petrochenkov: New syntax contexts are created during macro expansion.

Vadim Petrochenkov: If the token had context X before being produced by a
macro, e.g. here ident has context SyntaxContext::root():

macro m() { ident }

Vadim Petrochenkov: , then after being produced by the macro it has context X
-> macro_id.

Vadim Petrochenkov: I.e. our ident has context ROOT -> id(m) after it's
produced by m.

Vadim Petrochenkov: The "chaining operator" -> is apply_mark in compiler code.
Vadim Petrochenkov:

macro m() { macro n() { ident } }

Vadim Petrochenkov: In this example the ident has context ROOT originally, then
ROOT -> id(m), then ROOT -> id(m) -> id(n).

Vadim Petrochenkov: Note that these chains are not entirely determined by their
last element, in other words ExpnId is not isomorphic to SyntaxCtxt.

Vadim Petrochenkov: Counterexample:

macro m($i: ident) { macro n() { ($i, bar) } }

m!(foo);

Vadim Petrochenkov: foo has context ROOT -> id(n) and bar has context ROOT ->
id(m) -> id(n) after all the expansions.

mark-i-m: Cool :)

mark-i-m: It looks like we are out of time

mark-i-m: Is there anything you wanted to add?

mark-i-m: We can schedule another meeting if you would like

Vadim Petrochenkov: Yep, 23.06 already.  No, I think this is an ok point to
stop.

mark-i-m: :+1:

mark-i-m: Thanks @Vadim Petrochenkov ! This was very helpful

Vadim Petrochenkov: Yeah, we can schedule another one.  So far it's been like 1
hour of meetings per month? Certainly not a big burden.

Name resolution

Basics

In our programs we can refer to variables, types, functions, etc, by giving them a name. These names are not always unique. For example, take this valid Rust program:


type x = u32;
let x: x = 1;
let y: x = 2;

How do we know on line 3 whether x is a type (u32) or a value (1)? These conflicts are resolved during name resolution. In this specific case, name resolution defines that type names and variable names live in separate namespaces and therefore can co-exist.

Name resolution in Rust is a two-phase process. In the first phase, which runs during macro expansion, we build a tree of modules and resolve imports. Macro expansion and name resolution communicate with each other via the Resolver trait.

The input to the second phase is the syntax tree, produced by parsing input files and expanding macros. This phase produces links from all the names in the source to relevant places where the name was introduced. It also generates helpful error messages, like typo suggestions, traits to import or lints about unused items.

A successful run of the second phase (Resolver::resolve_crate) creates a kind of index that the rest of the compilation may use to ask about the present names (through the hir::lowering::Resolver interface).

The name resolution lives in the librustc_resolve crate, with the meat in lib.rs and some helpers or symbol-type specific logic in the other modules.

Namespaces

Different kinds of symbols live in different namespaces ‒ e.g. types don't clash with variables. This usually doesn't cause problems, because variables start with a lower-case letter while types start with an upper-case one, but this is only a convention. This is legal Rust code that will compile (with warnings):


type x = u32;
let x: x = 1;
let y: x = 2; // See? x is still a type here.

To cope with this, and with slightly different scoping rules for these namespaces, the resolver keeps them separated and builds separate structures for them.

In other words, when the code talks about namespaces, it doesn't mean the module hierarchy, it's types vs. values vs. macros.

Scopes and ribs

A name is visible only in a certain area of the source code. This forms a hierarchical structure, but not necessarily a simple one ‒ if one scope is part of another, it doesn't mean that a name visible in the outer one is also visible in the inner one, or that it refers to the same thing.

To cope with that, the compiler introduces the concept of Ribs. This is an abstraction of a scope. Every time the set of visible names potentially changes, a new rib is pushed onto a stack. The places where this can happen include, for example:

  • The obvious places ‒ curly braces enclosing a block, function boundaries, modules.
  • Introducing a let binding ‒ this can shadow another binding with the same name.
  • Macro expansion border ‒ to cope with macro hygiene.

When searching for a name, the stack of ribs is traversed from the innermost outwards. This helps to find the closest meaning of the name (the one not shadowed by anything else). The transition to an outer rib may also affect what names are usable ‒ if there are nested functions (not closures), the inner one can't access parameters and local bindings of the outer one, even though they should be visible by ordinary scoping rules. An example:


fn do_something<T: Default>(val: T) { // <- New rib in both types and values (1)
    // `val` is accessible, as is the helper function
    // `T` is accessible
    let helper = || { // New rib on `helper` (2) and another on the block (3)
        // `val` is accessible here
    }; // End of (3)
    // `val` is accessible, `helper` variable shadows `helper` function
    fn helper() { // <- New rib in both types and values (4)
        // `val` is not accessible here, (4) is not transparent for locals)
        // `T` is not accessible here
    } // End of (4)
    let val = T::default(); // New rib (5)
    // `val` is the variable, not the parameter here
} // End of (5), (2) and (1)

Because the rules for different namespaces are a bit different, each namespace has its own independent rib stack that is constructed in parallel to the others. In addition, there's also a rib stack for local labels (e.g. names of loops or blocks), which isn't a full namespace in its own right.
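
To make the rib mechanics concrete, here is a self-contained toy model of a rib stack and the innermost-outwards search (all names are illustrative; this is not the compiler's code):

use std::collections::HashMap;

struct Rib {
    bindings: HashMap<String, u32>, // name -> id of the definition
    is_fn_boundary: bool,           // nested `fn` items hide outer locals
}

fn resolve_local(ribs: &[Rib], name: &str) -> Option<u32> {
    // Walk from the innermost rib outwards; the first hit wins,
    // which is exactly how shadowing works.
    for rib in ribs.iter().rev() {
        if let Some(&def) = rib.bindings.get(name) {
            return Some(def);
        }
        if rib.is_fn_boundary {
            // Crossing into an outer fn: its locals are not usable here.
            return None;
        }
    }
    None
}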

Overall strategy

To perform the name resolution of the whole crate, the syntax tree is traversed top-down and every encountered name is resolved. This works for most kinds of names, because at the point of use of a name it is already introduced in the Rib hierarchy.

There are some exceptions to this. Items are a bit tricky, because they can be used even before they are encountered ‒ therefore every block needs to be first scanned for items, to fill in its Rib.

Other, even more problematic, cases are imports, which need recursive fixed-point resolution, and macros, which need to be resolved and expanded before the rest of the code can be processed.

Therefore, the resolution is performed in multiple stages.

TODO:

This is a result of the first pass of learning the code. It is definitely incomplete and not detailed enough. It also might be inaccurate in places. Still, it probably provides a useful first guidepost to what happens in there.

  • What exactly does it link to and how is that published and consumed by following stages of compilation?
  • Who calls it and how is it actually used.
  • Is it a pass and then the result is only used, or can it be computed incrementally (e.g. for RLS)?
  • The overall strategy description is a bit vague.
  • Where does the name Rib come from?
  • Does this thing have its own tests, or is it tested only as part of some e2e testing?

AST Validation

AST validation is the process of checking various correctness properties about the AST after macro expansion.

TODO: write this chapter.

Feature Gate Checking

TODO: this chapter

HIR

The HIR – "High-Level Intermediate Representation" – is the primary IR used in most of rustc. It is a compiler-friendly representation of the abstract syntax tree (AST) that is generated after parsing, macro expansion, and name resolution (see Lowering for how the HIR is created). Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away. For example, for loops are converted into a loop and do not appear in the HIR. This makes HIR more amenable to analysis than a normal AST.

This chapter covers the main concepts of the HIR.

You can view the HIR representation of your code by passing the -Zunpretty=hir-tree flag to rustc:

cargo rustc -- -Zunpretty=hir-tree

Out-of-band storage and the Crate type

The top-level data-structure in the HIR is the Crate, which stores the contents of the crate currently being compiled (we only ever construct HIR for the current crate). Whereas in the AST the crate data structure basically just contains the root module, the HIR Crate structure contains a number of maps and other things that serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g. modules, functions, traits, impls, etc) in the HIR are not immediately accessible in the parents. So, for example, if there is a module item foo containing a function bar():


mod foo {
    fn bar() { }
}

then in the HIR the representation of module foo (the Mod struct) would only have the ItemId I of bar(). To get the details of the function bar(), we would look up I in the items map.

One nice result of this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).

The other reason for this setup is to make things work better with incremental compilation. This way, if you gain access to an &rustc_hir::Item (e.g. for the mod foo), you do not immediately gain access to the contents of the function bar(). Instead, you only gain access to the id for bar(), and you must invoke some function to look up the contents of bar() given its id; this gives the compiler a chance to observe that you accessed the data for bar(), and then record the dependency.
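
A minimal sketch of what this id-based iteration looks like in practice, assuming a tcx is in scope (the exact field and method names have shifted between compiler versions):

let krate = tcx.hir().krate();
for (item_id, item) in &krate.items {
    // `item` is a `hir::Item`; anything nested inside it is reachable
    // only through further id lookups, never by direct ownership.
    println!("{:?} => {:?}", item_id, item.ident);
}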

Identifiers in the HIR

Most of the code that has to deal with things in HIR tends not to carry around references into the HIR, but rather to carry around identifier numbers (or "ids"). Right now, you will find four sorts of identifiers in active use:

  • DefId, which primarily names "definitions" or top-level items.
    • You can think of a DefId as being shorthand for a very explicit and complete path, like std::collections::HashMap. However, these paths are able to name things that are not nameable in normal Rust (e.g. impls), and they also include extra information about the crate (such as its version number, as two versions of the same crate can co-exist).
    • A DefId really consists of two parts, a CrateNum (which identifies the crate) and a DefIndex (which indexes into a list of items that is maintained per crate).
  • HirId, which combines the index of a particular item with an offset within that item.
    • The key point of a HirId is that it is relative to some item (which is named via a DefId).
  • BodyId, which identifies the body of some item in the crate (e.g. the definition of a function or constant). It is currently effectively a "newtype'd" HirId.
  • NodeId, which is an absolute id that identifies a single node in the HIR tree.
    • While these are still in common use, they are being slowly phased out.
    • Since they are absolute within the crate, adding a new node anywhere in the tree causes the NodeIds of all subsequent code in the crate to change. As you can imagine, this is terrible for incremental compilation.

We also have an internal map from DefId to what's called a "def path". A "def path" is like a module path, but a bit richer. For example, it may be crate::foo::MyStruct, uniquely identifying this definition. It's a bit different than a module path because it might include a type parameter T, as in crate::foo::MyStruct::T, which you can't write in normal Rust. These are used in incremental compilation.
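
As a quick illustration of the two components just described, the DefId type is conceptually shaped like this (a sketch; the real definition lives in the compiler and carries more machinery):

struct DefId {
    krate: CrateNum,  // which crate the definition lives in
    index: DefIndex,  // index into that crate's list of definitions
}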

The HIR Map

Most of the time when you are working with the HIR, you will do so via the HIR Map, accessible in the tcx via tcx.hir_map (and defined in the hir::map module). The HIR map contains a number of methods to convert between IDs of various kinds and to look up data associated with a HIR node.

For example, if you have a DefId, and you would like to convert it to a NodeId, you can use tcx.hir.as_local_node_id(def_id). This returns an Option<NodeId> – this will be None if the def-id refers to something outside of the current crate (since then it has no HIR node), but otherwise it returns Some(n) where n is the node-id of the definition.

Similarly, you can use tcx.hir.find(n) to look up the node for a NodeId. This returns an Option<Node<'tcx>>, where Node is an enum defined in the map.

By matching on this you can find out what sort of node the node-id referred to and also get a pointer to the data itself. Often, you know what sort of node n is – e.g. if you know that n must be some HIR expression, you can do tcx.hir.expect_expr(n), which will extract and return the &hir::Expr, panicking if n is not in fact an expression.

Finally, you can use the HIR map to find the parents of nodes, via calls like tcx.hir.get_parent_node(n).
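
Putting those methods together, a typical lookup might read like the following sketch (only the methods named above are used; assume tcx and def_id are in scope):

if let Some(n) = tcx.hir.as_local_node_id(def_id) {
    // We happen to know this id refers to an expression, so we can skip
    // the `find(n)` + match dance and extract it directly:
    let expr = tcx.hir.expect_expr(n);
    // Walk one step up the HIR tree from the expression:
    let parent = tcx.hir.get_parent_node(n);
    // ...
}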

HIR Bodies

A rustc_hir::Body represents some kind of executable code, such as the body of a function/closure or the definition of a constant. Bodies are associated with an owner, which is typically some kind of item (e.g. an fn() or const), but could also be a closure expression (e.g. |x, y| x + y). You can use the HIR map to find the body associated with a given def-id (maybe_body_owned_by) or to find the owner of a body (body_owner_def_id).

Lowering

The lowering step converts the AST to HIR. This means many structures are removed if they are irrelevant for type analysis or similar syntax-agnostic analyses. Examples of such structures include but are not limited to

  • Parenthesized expressions
    • These are removed without replacement; the tree structure itself makes the order of operations explicit
  • for loops and while (let) loops
    • Converted to loop + match and some let bindings
  • if let
    • Converted to match
  • Universal impl Trait
    • Converted to generic parameters (but with a flag added to mark that the user didn't write them)
  • Existential impl Trait
    • Converted to a virtual existential type declaration

Lowering has to uphold several invariants in order to not trigger the sanity checks in src/librustc_middle/hir/map/hir_id_validator.rs:

  1. A HirId must be used if created. So if you use lower_node_id, you must use the resulting NodeId or HirId (either is fine, since checking the NodeIds in the HIR also checks whether the corresponding HirIds exist)
  2. Lowering of a HirId must be done in the scope of the item that owns it. This means that if you are creating parts of an item other than the one being currently lowered, you need to use with_hir_id_owner. This happens, for example, during the lowering of existential impl Trait.
  3. A NodeId that will be placed into a HIR structure must be lowered, even if its HirId is unused. Calling let _ = self.lower_node_id(node_id); is perfectly legitimate in this case.
  4. If you are creating new nodes that didn't exist in the AST, you must create new ids for them. This is done by calling the next_id method, which produces a new NodeId and automatically lowers it for you so you also get the HirId. A short sketch follows this section.

If you are creating new DefIds, then, since each DefId needs to have a corresponding NodeId, it is advisable to add those NodeIds to the AST so you don't have to generate new ones during lowering. This has the advantage of creating a way to find the DefId of something via its NodeId. If lowering needs this DefId in multiple places, you can't generate a new NodeId in all those places, because you'd also get a new DefId then. With a NodeId from the AST this is not an issue.

Having the NodeId also allows the DefCollector to generate the DefIds instead of lowering having to do it on the fly. Centralizing the DefId generation in one place makes it easier to refactor and reason about.
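
Here is a hedged sketch of invariants 3 and 4, using only the lowering-context methods named above (the precise return types have varied across compiler versions):

// Invariant 4: a node synthesized during lowering gets fresh ids via
// `next_id`, which also lowers the new NodeId so that a HirId exists.
let fresh_id = self.next_id();

// Invariant 3: a NodeId placed into a HIR structure must be lowered
// even if the resulting HirId is never used directly.
let _ = self.lower_node_id(node_id);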

HIR Debugging

The -Zunpretty=hir-tree flag will dump out the HIR.

If you are trying to correlate NodeIds or DefIds with source code, the --pretty expanded,identified flag may be useful.

TODO: anything else?

MIR (Mid-level IR)

MIR is Rust's Mid-level Intermediate Representation. It was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation. If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.

Introduction to MIR

MIR is defined in the src/librustc_middle/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

Key MIR vocabulary

This section introduces the key concepts of MIR, summarized here:

  • Basic blocks: units of the control-flow graph, consisting of:
    • statements: actions with one successor
    • terminators: actions with potentially multiple successors; always at the end of a block
    • (if you're not familiar with the term basic block, see the background chapter)
  • Locals: memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
  • Places: expressions that identify a location in memory, like _1 or _1.f.
  • Rvalues: expressions that produce a value. The "R" stands for the fact that these generally only appear on the right-hand side of an assignment.
    • Operands: the arguments to an rvalue, which can either be a constant (like 22) or a place (like _1).

You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty-printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try running this program (or clicking on this link), and then clicking the "MIR" button on the top:

fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}

You should see something like:

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
    ...
}

This is the MIR format for the main function.

Variable declarations. If we drill in a bit, we'll see that the function begins with a bunch of variable declarations. They look like this:

let mut _0: ();                      // return place
let mut _1: std::vec::Vec<i32>;      // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;

You can see that variables in MIR don't have names; they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can still tell apart the user-defined variables, though, because they have debuginfo associated with them (see below).

User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:

scope 1 {
    debug vec => _1;                 // in scope 1 at src/main.rs:2:9: 2:16
}

Each debug <Name> => <Place>; annotation describes a named user variable and where (i.e. the place) a debugger can find the data of that variable. Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place. Additionally, closure captures are described using the same system, and so they are complex even without optimizations, e.g.: debug x => (*((*_1).0: &T));

The "scope" blocks (e.g., scope 1 {..}) describe the lexical structure of the source program (which names were in scope when), so any part of the program annotated with // in scope 0 would have no vec in scope, just as when you step through the code in a debugger.

Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and some of the comments are omitted here):

bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}

A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:

StorageLive(_1);

This statement indicates that the variable _1 is "live", meaning that it may be used later – this will stay true until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These "storage statements" are used by LLVM to allocate stack space.

The terminator of the block bb0 is the call to Vec::new:

_1 = const <std::vec::Vec<T>>::new() -> bb2;

Terminators are different from statements because they can have more than one successor – that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we can see that unwinding is in fact not possible, and hence we list only one successor block, bb2.

If we look ahead to bb2, we will see it looks like this:

bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}

Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:

_3 = &mut _1;

Assignments in general have the form:

<Place> = <Rvalue>

A place is an expression like _3, _3.f or *_3 – it denotes a location in memory. An rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>. So we can define a grammar for rvalues like so:

<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place

As you can see from this grammar, rvalues cannot be nested – they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type). So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:

TMP1 = a + b
x = TMP1 + c

(Try it and see! You may want to do this in release mode to skip over the overflow checks.)

MIR data types

The MIR data types are defined in the src/librustc_middle/mir/ module. Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.

The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", which you can read about below).

  • Basic blocks: the basic blocks are stored in the basic_blocks field; this is a vector of BasicBlockData structures. We never reference a basic block directly; instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
  • Statements are represented by the type Statement.
  • Terminators are represented by the type Terminator.
  • Locals are represented by the type Local (a newtype'd index). The actual data for a local variable is found in the local_decls field of the Mir. There is also a special constant RETURN_PLACE identifying the special "local" representing the return value.
  • Places are represented by the enum Place. There are a few variants:
    • Local variables like _1
    • Static variables like FOO
    • Projections, which are fields or other things that "project out" from a base place. For example, _1.f is a projection from _1. *_1 is also a projection, represented by the ProjectionElem::Deref element.
  • Rvalues are represented by the enum Rvalue.
  • Operands are represented by the enum Operand.

Representing constants

to be written

Promoted constants

to be written

HAIR and MIR construction

The lowering of HIR to MIR occurs for the following (probably incomplete) list of items:

  • Function and closure bodies
  • Initializers of static and const items
  • Initializers of enum discriminants
  • Glue and shims of any kind
    • Tuple struct initializer functions
    • Drop code (the Drop::drop function is not called directly)
    • Drop glue for types without an explicit Drop implementation

The lowering is triggered by calling the mir_built query. There is an intermediate representation between HIR and MIR called HAIR that is only used during lowering. The HAIR's most important feature is that the various adjustments (which happen without explicit syntax), such as coercions, autoderef, autoref and overloaded method calls, have become explicit casts, deref operations, reference expressions and concrete function calls.

The HAIR has datatypes that mirror the HIR datatypes, but instead of e.g. -x being a hair::ExprKind::Neg(hair::Expr) it is a hair::ExprKind::Neg(hir::Expr). This shallowness enables the HAIR to represent all datatypes that HIR has, but without having to create an in-memory copy of the entire HIR. MIR lowering will first convert the topmost expression from HIR to HAIR (in rustc_mir_build::hair::cx::expr) and then process the HAIR expressions recursively.

The lowering creates local variables for every argument as specified in the function signature. Next, it creates local variables for every binding specified; e.g. (a, b): (i32, String) produces 3 locals: one for the argument and two for the bindings. Next, it generates field accesses that read the fields from the argument and write their values into the binding variables.

With this initialization out of the way, the lowering recursively generates MIR for the function body (a Block expression) and writes the result into RETURN_PLACE.

unpack! all the things

Functions that generate MIR tend to fall into one of two patterns. First, if the function generates only statements, then it will take a basic block as an argument, and the statements are appended to that block. It can then return a result as normal:

fn generate_some_mir(&mut self, block: BasicBlock) -> ResultType {
   ...
}

But there are other functions that may generate new basic blocks as well. For example, lowering an expression like if foo { 22 } else { 44 } requires generating a small "diamond-shaped graph". In this case, the function takes a basic block where its code starts, and returns a (potentially) new basic block where the code generation ends. The BlockAnd type is used to represent this:

fn generate_more_mir(&mut self, block: BasicBlock) -> BlockAnd<ResultType> {
    ...
}

When you invoke these functions, it is common to have a local variable block that is effectively a "cursor". It represents the point at which we are adding new MIR. When you invoke generate_more_mir, you want to update this cursor. You can do this manually, but it's tedious:

let mut block;
let v = match self.generate_more_mir(..) {
    BlockAnd { block: new_block, value: v } => {
        block = new_block;
        v
    }
};

For this reason, we offer a macro that lets you write let v = unpack!(block = self.generate_more_mir(...)). It simply extracts the new block and overwrites the variable block that you named in the unpack!.


Lowering expressions into MIR

There are essentially four kinds of representations one might want of an expression:

  • Place refers to a (or part of a) preexisting memory location (local, static, promoted)
  • Rvalue is something that can be assigned to a Place
  • Operand is an argument to e.g. a + operation or a function call
  • a temporary variable containing a copy of the value

The following image depicts a general overview of the interactions between the representations:

Click here for a more detailed view

We start out by lowering the function body to an Rvalue so we can create an assignment to RETURN_PLACE. This Rvalue lowering will in turn trigger lowering to Operand for its arguments (if any). Operand lowering either produces a const operand, or moves/copies out of a Place, thus triggering a Place lowering. An expression being lowered to a Place can trigger the creation of a temporary variable if the expression being lowered contains operations. This is where the snake bites its own tail, and we need to trigger an Rvalue lowering for the expression to be written into the local.

Operator lowering

Operators on builtin types are not lowered to function calls (which would end up being infinite recursion, because the trait impls contain the operation itself again). Instead there are Rvalues for binary and unary operators and index operations. These Rvalues are later codegened to LLVM primitive operations or LLVM intrinsics.

All other operators are lowered to a function call to the impl of the operator's corresponding trait.

Regardless of the lowering kind, the arguments to the operator are lowered to Operands. This means all arguments are either constants or refer to an already existing value somewhere in a local or static.

Method call lowering

Method calls are lowered to the same TerminatorKind that function calls are. In MIR there is no difference between method calls and function calls anymore.

Conditions

if conditions and match statements over enums without variants with fields are lowered to TerminatorKind::SwitchInt. Each possible value (so 0 and 1 for if conditions) has a corresponding BasicBlock to which the code continues. The argument being branched on is the Operand representing the if condition's value.

Pattern matching

match statements over enums that have variants with fields are lowered to TerminatorKind::SwitchInt, too, but the Operand is a Place where the discriminant of the value can be found. This often involves reading the discriminant into a new temporary variable.

Aggregate construction

Aggregate values of any kind (e.g. structs or tuples) are built via Rvalue::Aggregate. All fields are lowered to Operands. This is essentially equivalent to one assignment statement per aggregate field, plus an assignment to the discriminant in the case of enums.

The MIR visitor

The MIR visitor is a convenient tool for traversing the MIR and either looking for things or making changes to it. The visitor traits are defined in the rustc::mir::visit module – there are two of them, generated via a single macro: Visitor (which operates on a &Mir and gives back shared references) and MutVisitor (which operates on a &mut Mir and gives back mutable references).

To implement a visitor, you have to create a type that represents your visitor. Typically, this type wants to "hang on" to whatever state you will need while processing the MIR:

struct MyVisitor<...> {
    tcx: TyCtxt<'tcx>,
    ...
}

and you then implement the Visitor or MutVisitor trait for that type:

impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
    fn visit_foo(&mut self, ...) {
        ...
        self.super_foo(...);
    }
}

As shown above, within the impl, you can override any of the visit_foo methods (e.g., visit_terminator) in order to write some code that will execute whenever a foo is found. If you want to recursively walk the contents of the foo, you then invoke the super_foo method. (NB. You never want to override super_foo.)

A very simple example of a visitor can be found in NoLandingPads. That visitor doesn't even require any state: it just visits all terminators and removes their unwind successors.

Traversal

In addition to the visitor, the rustc::mir::traversal module contains useful functions for walking the MIR CFG in different standard orders (e.g. pre-order, reverse post-order, and so forth).
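
For example, a reverse post-order walk might look like the following sketch (only the module named above is used; mir and the visit_block helper are assumed to be in scope):

use rustc::mir::traversal;

for (bb, data) in traversal::reverse_postorder(mir) {
    // In reverse post-order, every predecessor of `bb` (ignoring back
    // edges) has already been visited by the time we get here.
    visit_block(bb, data); // `visit_block` is a hypothetical helper
}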

MIR passes

If you would like to get the MIR for a function (or constant, etc), you can use the optimized_mir(def_id) query. This will give you back the final, optimized MIR. For foreign def-ids, we simply read the MIR from the other crate's metadata. But for local def-ids, the query will construct the MIR and then iteratively optimize it by applying a series of passes. This section describes how those passes work and how you can extend them.

To produce the optimized_mir(D) for a given def-id D, the MIR passes through several suites of optimizations, each represented by a query. Each suite consists of multiple optimizations and transformations. These suites represent useful intermediate points where we may want to access the MIR for type checking or other purposes:

  • mir_build(D) – not a query, but this constructs the initial MIR
  • mir_const(D) – applies some simple transformations to make MIR ready for constant evaluation;
  • mir_validated(D) – applies some more transformations, making MIR ready for borrow checking;
  • optimized_mir(D) – the final state, after all optimizations have been performed.

Implementing and registering a pass

A MirPass is some bit of code that processes the MIR, typically – but not always – transforming it along the way somehow. For example, it might perform an optimization. The MirPass trait itself is found in the rustc_mir::transform module, and it basically consists of one method, run_pass, that simply gets an &mut Mir (along with the tcx and some information about where it came from). The MIR is therefore modified in place (which helps to keep things efficient).

A good example of a basic MIR pass is NoLandingPads, which walks the MIR and removes all edges that are due to unwinding – this is used when configured with panic=abort, in which case unwinding never occurs. As you can see from its source, a MIR pass is defined by first defining a dummy type, a struct with no fields, something like:


struct MyPass;

for which you then implement the MirPass trait. You can then insert this pass into the appropriate list of passes found in a query like optimized_mir, mir_validated, etc. (If this is an optimization, it should go into the optimized_mir list.)

If you are writing a pass, there's a good chance that you are going to want to use a MIR visitor. MIR visitors are a handy way to walk all the parts of the MIR, either to search for something or to make small edits.
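
The shape of such a pass is roughly the following sketch (the exact run_pass signature has changed across compiler versions, so treat the parameter list as an assumption):

impl<'tcx> MirPass<'tcx> for MyPass {
    fn run_pass(&self, tcx: TyCtxt<'tcx>, src: MirSource<'tcx>, mir: &mut Mir<'tcx>) {
        // Transform `mir` in place; a real pass would typically walk the
        // basic blocks here, e.g. with a `MutVisitor`.
        let _ = (tcx, src);
    }
}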

Stealing

The intermediate queries mir_const() and mir_validated() yield a &'tcx Steal<Mir<'tcx>>, allocated using tcx.alloc_steal_mir(). This indicates that the result may be stolen by the next suite of optimizations – this is an optimization to avoid cloning the MIR. Attempting to use a stolen result will cause a panic in the compiler. Therefore, it is important that you do not read directly from these intermediate queries except as part of the MIR processing pipeline.

Because of this stealing mechanism, some care must be taken to ensure that, before the MIR at a particular phase in the processing pipeline is stolen, anyone who may want to read it has already done so. Concretely, this means that if you have some query foo(D) that wants to access the result of mir_const(D) or mir_validated(D), you need to have the successor pass "force" foo(D) using ty::queries::foo::force(...). This will force a query to execute even though you don't directly require its result.

As an example, consider MIR const qualification. It wants to read the result produced by the mir_const() suite. However, that result will be stolen by the mir_validated() suite. If nothing were done, then mir_const_qualif(D) would succeed if it came before mir_validated(D), and fail otherwise. Therefore, mir_validated(D) will force mir_const_qualif before it actually steals, thus ensuring that the reads have already happened (remember that queries are memoized, so executing a query a second time simply loads from a cache):

mir_const(D) --read-by--> mir_const_qualif(D)
     |                       ^
  stolen-by                  |
     |                    (forces)
     v                       |
mir_validated(D) ------------+

This mechanism is a bit dodgy. There is a discussion of more elegant alternatives in rust-lang/rust#41710.

Closure Expansion in rustc

This section describes how rustc handles closures. Closures in Rust are effectively "desugared" into structs that contain the values they use (or references to the values they use) from their creator's stack frame. rustc has the job of figuring out which values a closure uses and how, so it can decide whether to capture a given variable by shared reference, mutable reference, or by move. rustc also has to figure out which of the closure traits (Fn, FnMut, or FnOnce) a closure is capable of implementing.

Let's start with a few examples:

Example 1

To start, let's take a look at how the closure in the following example is desugared:

fn closure(f: impl Fn()) {
    f();
}

fn main() {
    let x: i32 = 10;
    closure(|| println!("Hi {}", x));  // The closure just reads x.
    println!("Value of x after return {}", x);
}

Let's say the above is the content of a file called immut.rs. If we compile immut.rs using the following command, the -Zdump-mir=all flag will cause rustc to generate and dump the MIR to a directory called mir_dump:

> rustc +stage1 immut.rs -Zdump-mir=all

After we run this command, we will see a newly generated directory in our current working directory called mir_dump, which will contain several files. If we look at the file rustc.main.-------.mir_map.0.mir, we will find, among other things, that it contains these lines:

_4 = &_1;
_3 = [closure@immut.rs:7:13: 7:36] { x: move _4 };

Note that in the MIR examples in this chapter, _1 is x.

Here, in the first line _4 = &_1;, the mir_dump tells us that x was borrowed as an immutable reference. This is what we would hope for, as our closure just reads x.

Example 2

Here is another example:

fn closure(mut f: impl FnMut()) {
    f();
}

fn main() {
    let mut x: i32 = 10;
    closure(|| {
        x += 10;  // The closure mutates the value of x
        println!("Hi {}", x)
    });
    println!("Value of x after return {}", x);
}
If we compile and dump the MIR as before (this time from a file called mut.rs), we find:

_4 = &mut _1;
_3 = [closure@mut.rs:7:13: 10:6] { x: move _4 };

This time, in the line _4 = &mut _1;, we see that the borrow has changed to a mutable borrow. Fair enough! The closure increments x by 10.

Example 3

One more example:

fn closure(f: impl FnOnce()) {
    f();
}

fn main() {
    let x = vec![21];
    closure(|| {
        drop(x);  // Makes x unusable after the fact.
    });
    // println!("Value of x after return {:?}", x);
}
Dumping the MIR as before, we find:

_6 = [closure@move.rs:7:13: 9:6] { x: move _1 }; // bb16[3]: scope 1 at move.rs:7:13: 9:6

Here, x is directly moved into the closure, and access to it will not be permitted after the closure.

Inferences in the compiler

Now let's dive into rustc code and see how all these inferences are done by the compiler.

Let's start with defining a term that we will be using quite a bit in the rest of the discussion ‒ upvar. An upvar is a variable that is local to the function where the closure is defined. So, in the above examples, x will be an upvar to the closure. Upvars are also sometimes referred to as free variables, meaning they are not bound to the context of the closure. src/librustc_middle/ty/query/mod.rs defines a query called upvars for this purpose.

Other than lazy invocation, one other thing that distinguishes a closure from a normal function is that it can use the upvars. It borrows these upvars from its surrounding context; therefore the compiler has to determine the upvar's borrow type. The compiler starts with assigning an immutable borrow type and lowers the restriction (that is, changes it from immutable to mutable to move) as needed, based on the usage. In Example 1 above, the closure only uses the variable for printing but does not modify it in any way and therefore, in the mir_dump, we find the borrow type for the upvar x to be immutable. In Example 2, however, the closure modifies x and increments it by some value. Because of this mutation, the compiler, which started off assigning x as an immutable reference type, has to adjust it to a mutable reference. Likewise in the third example, the closure drops the vector and therefore this requires the variable x to be moved into the closure. Depending on the borrow kind, the closure has to implement the appropriate trait: the Fn trait for immutable borrow, FnMut for mutable borrow, and FnOnce for move semantics.

Most of the code related to the closure is in the src/librustc_typeck/check/upvar.rs file and the data structures are declared in the file src/librustc_middle/ty/mod.rs.

Before we go any further, let's discuss how we can examine the flow of control through the rustc codebase. For closures specifically, set the RUST_LOG env variable as below and collect the output in a file:

> RUST_LOG=rustc_typeck::check::upvar rustc +stage1 -Zdump-mir=all \
    <.rs file to compile> 2> <file where the output will be dumped>

This uses the stage1 compiler and enables debug! logging for the rustc_typeck::check::upvar module.

The other option is to step through the code using lldb or gdb.

  1. rust-lldb build/x86_64-apple-darwin/stage1/bin/rustc test.rs
  2. In lldb:
    1. b upvar.rs:134 // Setting the breakpoint on a certain line in the upvar.rs file
    2. r // Run the program until it hits the breakpoint

Let's start with upvar.rs. This file has something called the euv::ExprUseVisitor, which walks the source of the closure and invokes a callback for each upvar that is borrowed, mutated, or moved.

fn main() {
    let mut x = vec![21];
    let _cl = || {
        let y = x[0];  // 1.
        x[0] += 1;  // 2.
    };
}

In the above example, our visitor will be called twice, for the lines marked 1 and 2, once for a shared borrow and another one for a mutable borrow. It will also tell us what was borrowed.

The callbacks are defined by implementing the Delegate trait. The InferBorrowKind type implements Delegate and keeps a map that records for each upvar which mode of borrow was required. The modes of borrow can be ByValue (moved) or ByRef (borrowed). For ByRef borrows, it can be shared, shallow, unique or mut as defined in the src/librustc_middle/mir/mod.rs.

Delegate defines a few different methods (the different callbacks): consume: for move of a variable, borrow for a borrow of some kind (shared or mutable), and mutate when we see an assignment of something.
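
To make the callback shape concrete, here is a self-contained toy stand-in for that Delegate trait (the real compiler trait passes richer arguments, such as the cmt discussed next; everything below is illustrative only):

trait Delegate {
    fn consume(&mut self, what: &str);               // a move of a value
    fn borrow(&mut self, what: &str, mutable: bool); // a shared or mutable borrow
    fn mutate(&mut self, what: &str);                // an assignment
}

struct CaptureLogger;

impl Delegate for CaptureLogger {
    fn consume(&mut self, what: &str) {
        println!("moved {}", what); // e.g. `drop(x)` in Example 3
    }
    fn borrow(&mut self, what: &str, mutable: bool) {
        let kind = if mutable { "mutable" } else { "shared" };
        println!("{} borrow of {}", kind, what); // e.g. the `x[0]` read
    }
    fn mutate(&mut self, what: &str) {
        println!("assigned to {}", what); // e.g. `x[0] += 1`
    }
}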

All of these callbacks have a common argument cmt which stands for Category, Mutability and Type and is defined in src/librustc_middle/middle/mem_categorization.rs. Borrowing from the code comments, "cmt is a complete categorization of a value indicating where it originated and how it is located, as well as the mutability of the memory in which the value is stored". Based on the callback (consume, borrow etc.), we will call the relevant adjust_upvar_borrow_kind_for_ and pass the cmt along. Once the borrow type is adjusted, we store it in the table, which basically says what borrows were made for each closure.

self.tables
    .borrow_mut()
    .upvar_capture_map
    .extend(delegate.adjust_upvar_captures);

Part 4: Analysis

This part discusses the many analyses that the compiler uses to check various properties of the code and to inform later stages of compilation. Typically, this is what people mean when they talk about "Rust's type system". This includes the representation, inference, and checking of types, the trait system, and the borrow checker. These analyses do not happen as one big pass or a set of contiguous passes. Rather, they are spread out throughout various parts of the compilation process and use different intermediate representations. For example, type checking happens on the HIR, while borrow checking happens on the MIR. Nevertheless, for the sake of presentation, we will discuss all of these analyses in this part of the guide.

The ty module: representing types

The ty module defines how the Rust compiler represents types internally. It also defines the typing context (tcx or TyCtxt), which is the central data structure in the compiler.

ty::Ty

When we talk about how rustc represents types, we usually refer to a type called Ty. There are quite a few modules and types for Ty in the compiler (Ty documentation).

The specific Ty we are referring to is rustc::ty::Ty (and not rustc_hir::Ty). The distinction is important, so we will discuss it first before going into the details of ty::Ty.

rustc_hir::Ty vs ty::Ty

The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less the AST (see this chapter), as it represents the syntax that the user wrote, and is obtained after parsing and some desugaring. It has a representation of types, but in reality it reflects more of what the user wrote, that is, what they wrote so as to represent that type.

In contrast, ty::Ty represents the semantics of a type, that is, the meaning of what the user wrote. For example, rustc_hir::Ty would record the fact that a user used the name u32 twice in their program, but ty::Ty would record the fact that both usages refer to the same type.

Example: fn foo(x: u32) -> u32 { }. In this function, we see that u32 appears twice. We know that that is the same type, i.e. the function takes an argument and returns an argument of the same type, but from the point of view of the HIR there would be two distinct type instances, because these occur in two different places in the program. That is, they have two different Spans (locations).

Example: fn foo(x: &u32) -> &u32. In addition, HIR might have information left out. The type &u32 is incomplete, since in the full Rust type there is actually a lifetime, but we didn't need to write those lifetimes. There are also some elision rules that insert information. The result may look like fn foo<'a>(x: &'a u32) -> &'a u32.

At the HIR level, these things are not spelled out. However, at the ty::Ty level, these details are added. Moreover, we will have exactly one ty::Ty for a given type, like u32, and that ty::Ty is used for all u32s in the whole program, not for one specific usage, unlike rustc_hir::Ty.

Here is a summary comparing rustc_hir::Ty with ty::Ty:

  • rustc_hir::Ty describes the syntax of a type: what the user wrote (with some desugaring). ty::Ty describes the semantics of a type: the meaning of what the user wrote.
  • Each rustc_hir::Ty has its own spans corresponding to the appropriate place in the program. ty::Ty doesn't correspond to a single place in the user's program.
  • rustc_hir::Ty has generics and lifetimes; however, some of those lifetimes are special markers like LifetimeName::Implicit. ty::Ty has the full type, including generics and lifetimes, even if the user left them out.
  • fn foo(x: u32) -> u32 { }: two rustc_hir::Tys represent the two usages of u32, each with its own Span, and rustc_hir::Ty doesn't tell us that both are the same type. All u32s in the whole program are one ty::Ty, and it tells us that both usages of u32 mean the same type.
  • fn foo(x: &u32) -> &u32: two rustc_hir::Tys again, with the lifetimes of the references represented by the special marker LifetimeName::Implicit. A single ty::Ty, with the hidden lifetime parameter made explicit.

Order. HIR is built directly from the AST, so this happens before any ty::Ty is produced. After the HIR is built, some basic type inference and type checking is done. During type inference, we figure out what the ty::Ty of everything is, and we also check whether the type of something is ambiguous. The ty::Ty is then used for type checking, making sure everything has the expected type. The astconv module is where the code responsible for converting a rustc_hir::Ty into a ty::Ty is located. This occurs during the type-checking phase, but also in other parts of the compiler that want to ask questions like "what argument types does this function expect?".

How semantics drive the two instances of Ty. You can think of HIR as the perspective on the type information that assumes the least. We assume two things are distinct until they are proven to be the same thing. In other words, we know less about them, so we should assume less about them.

Syntactically, the "u32" at line N column 20 and the "u32" at line N column 35 are two strings. We don't yet know whether they are the same. So, in the HIR, we treat them as if they were different. Later, we determine that they semantically are the same type, and that's where ty::Ty comes in.

Consider another example: fn foo<T>(x: T) -> u32. Suppose that someone invokes foo::<u32>(0).

This means that T and u32 (in this invocation) actually turn out to be the same type, so we would eventually end up with the same ty::Ty, but we have distinct rustc_hir::Tys.

(This is a bit over-simplified, though, since during type checking we check the function generically and would still have a T distinct from u32. Later, when doing code generation, we always handle "monomorphized" (fully substituted) versions of each function, and hence we know what T represents (and specifically that it is u32).)

Here is one more example:


mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}

Here the type X will clearly vary depending on context. If you look at the rustc_hir::Ty, you will find that X is an alias in both cases (although it will be mapped via name resolution to different aliases). But if you look at the ty::Ty signatures, they will be fn(u32) -> u32 and fn(i32) -> i32 (with the type aliases fully expanded).

ty::Ty implementation

rustc::ty::Ty is actually a type alias for &TyS (more on that later). TyS (Type Structure) is where the main functionality is located. You can generally ignore the TyS struct; you will basically never access it explicitly. We always pass it by reference using the Ty alias. The only exception is defining inherent methods on types. In particular, TyS has a kind field of type TyKind, which represents the key type information. TyKind is a big enum representing the different kinds of types (e.g. primitives, references, abstract data types, generics, lifetimes, etc). TyS also has 2 more fields, flags and outer_exclusive_binder. They are convenience hacks for efficiency, summarizing information about the type that we may want to know, but they don't come into this discussion much. Finally, ty::TySs are interned, so that ty::Ty can be a thin pointer-like type. This allows us to do cheap comparisons for equality, along with the other benefits of interning.

Allocating and working with types

To allocate a new type, you can use the various mk_ methods defined on the tcx. These have names that mostly correspond to the various kinds of types. For example:

let array_ty = tcx.mk_array(elem_ty, len * 2);

These methods all return a Ty<'tcx> – note that the lifetime you get back is the lifetime that this tcx has access to. Types are always canonicalized and interned (so we never allocate exactly the same type twice).

N.B. Because types are interned, it is possible to compare them for equality efficiently using == – however, this is almost never what you want to do unless you happen to be hashing and looking for duplicates. This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. If you are going to be testing for type equality, you probably need to start looking into the inference code to do it right.

You can also find various common types in the tcx itself by accessing tcx.types.bool, tcx.types.char, etc (see CommonTypes for more).

The ty::TyKind variants

Note: TyKind is NOT the functional programming concept of Kind.

Whenever working with a Ty in the compiler, it is common to match on the kind of type:

fn foo(x: Ty<'tcx>) {
  match x.kind {
    ...
  }
}

The kind field is of type TyKind<'tcx>, which is an enum defining all of the different kinds of types in the compiler.

N.B. Inspecting the kind field on types during type inference can be risky, as there may be inference variables and other things to consider, or sometimes a type is not yet known and will become known later.

There are a lot of related types, and we'll cover them in time (e.g. regions/lifetimes, "substitutions", etc).

There are many variants of the TyKind enum, which you can see by looking at its rustdocs. Here is a sampling:

Algebraic Data Types (ADTs). An algebraic data type is a struct, enum or union. Under the hood, struct, enum and union are actually implemented the same way: they are all of type ty::TyKind::Adt. It's basically a user-defined type. We will talk more about these later.

Foreign corresponds to extern type T.

Str is the type str. When the user writes &str, Str is how we represent the str part of that type.

Slice corresponds to [T].

Array corresponds to [T; n].

RawPtr corresponds to *mut T or *const T.

Ref represents safe references, &'a mut T or &'a T. Ref has some associated parts: Ty<'tcx> is the type that the reference refers to, Region<'tcx> is the lifetime or region of the reference, and Mutability is the mutability of the reference.

Param represents a type parameter, like the T in Vec<T>.

Error represents a type error somewhere, so that we can print better diagnostics. We will discuss this more later.

And many more...

Import conventions

Although there is no hard and fast rule, the ty module tends to be used like so:

use ty::{self, Ty, TyCtxt};

In particular, since they are so common, the Ty and TyCtxt types are imported directly. Other types are often referenced with an explicit ty:: prefix (e.g. ty::TraitRef<'tcx>). But some modules choose to import a larger or smaller set of names explicitly.

ADTs representation

Let's consider the example of a type like MyStruct<u32>, where MyStruct is defined like so:

struct MyStruct<T> { x: u32, y: T }

The type MyStruct<u32> would be an instance of TyKind::Adt:

Adt(&'tcx AdtDef, SubstsRef<'tcx>)
//  ------------  ---------------
//  (1)            (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or the "substitutions" / generic arguments

There are two parts:

  • The AdtDef references the struct/enum/union, but without the values for its type parameters. In our example, this is the MyStruct part without the argument u32.
    • Note that in the HIR, structs, enums and unions are represented differently, but in ty::Ty, they are all represented using TyKind::Adt.
  • The SubstsRef is an interned list of values that are to be substituted for the generic parameters. In our example of MyStruct<u32>, we would end up with a list like [u32]. We'll dig more into generics and substitutions a little later.

AdtDef and DefId

For every type defined in the source code, there is a unique DefId (see this chapter). This includes ADTs and generics. In the MyStruct<T> definition we gave above, there are two DefIds: one for MyStruct and one for T. Notice that the code above does not generate a new DefId for u32, because that code does not define u32 (it only references it).

AdtDef is more or less a wrapper around DefId with lots of useful helper methods. There is essentially a one-to-one relationship between AdtDef and DefId. You can get the AdtDef for a DefId with the tcx.adt_def(def_id) query. AdtDefs are all cached (as you can see from the 'tcx lifetime on them).
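
A brief sketch of going from a DefId to its AdtDef via the query named above (field and method details vary by compiler version; substs is assumed to be in scope):

let adt_def = tcx.adt_def(def_id);
for variant in &adt_def.variants {
    for field in &variant.fields {
        // Substitutions are applied on demand, e.g. when asking for a
        // field's type with the ADT's `substs`:
        let _field_ty = field.ty(tcx, substs);
    }
}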

Type errors

There is a TyKind::Error that is produced when the user makes a type error. The idea is that we would propagate this type and suppress other errors that come up due to it, so as not to overwhelm the user with cascading compiler error messages.

There is an important invariant for TyKind::Error. You should never return the "error type" unless you know that an error has already been reported to the user. This is usually because (a) you just reported it right there or (b) you are propagating an existing Error type (in which case the error should have been reported when that error type was produced).

It's important to maintain this invariant because the whole point of the Error type is to suppress other errors – i.e. we don't report them. If we were to produce an Error type without actually emitting an error to the user, this could lead to later errors being suppressed, and compilation might inadvertently succeed!

Sometimes there is a third case: you believe that an error has been reported, but you believe it would have been reported earlier in the compilation, not now. In that case, you can invoke delay_span_bug, which says that compilation is expected to have produced an error – and that if compilation unexpectedly succeeds, a compiler bug report will be triggered.

Question: Why not substitute "inside" the AdtDef?

Recall that we represent a generic struct with (AdtDef, substs). So why bother with this scheme?

Well, the alternate way we could have represented such types would be to always create a new, fully-substituted form of the AdtDef where all the types have already been substituted in. This seems like less of a hassle. However, the (AdtDef, substs) scheme has some advantages over this.

First, (AdtDef, substs) is more efficient:

struct MyStruct<T> {
  ... 100s of fields ...
}

// Want to do: MyStruct<A> ==> MyStruct<B>

In an example like this, we can substitute MyStruct<A> to MyStruct<B> (and so on) very cheaply, by just replacing the one reference to A with B. But if we eagerly substituted all the fields, that could be a lot more work: we might have to go through all of the fields in the AdtDef and update all of their types.

A bit more deeply, this corresponds to structs in Rust being nominal types – which means that they are defined by their name (and their contents are then indexed from the definition of that name, rather than being carried along "within" the type itself).

Generics and substitutions

Given a generic type MyType<A, B, …>, we may want to swap out the generics A, B, … for some other types (possibly other generics or concrete types). We do this a lot while doing type inference, type checking, and trait solving. Conceptually, during these routines, we may find out that one type is equal to another type and want to swap one out for the other, and then swap that one out for another type, and so on until we eventually get some concrete types (or an error).

In rustc this is done using the SubstsRef that we mentioned above ("substs" = "substitutions"). Conceptually, you can think of SubstsRef as a list of types that are to be substituted for the generic type parameters of the ADT.

SubstsRef is a type alias of List<GenericArg<'tcx>> (see List in the rustdocs). GenericArg is essentially a space-efficient wrapper around GenericArgKind, which is an enum indicating what kind of generic the parameter is (type, lifetime, or const). Thus, SubstsRef is conceptually like a &'tcx [GenericArgKind<'tcx>] slice (but it is actually a List).

So why do we use this List type instead of making it really a slice? It has the length "inline", so &List is only 32 bits. As a consequence, it cannot be "subsliced" (that only works if the length is out of band).

This also implies that you can check two Lists for equality via == (which would not be possible for ordinary slices). This is precisely because they never represent a "sub-list", only the complete List, which has been hashed and interned.

So pulling it all together, let's go back to our example above:

struct MyStruct<T>
  • There would be an AdtDef (and corresponding DefId) for MyStruct.
  • There would be a TyKind::Param (and corresponding DefId) for T (more on that later).
  • There would be a SubstsRef containing the list [GenericArgKind::Type(Ty(T))]
    • The Ty(T) here is shorthand for a ty::Ty with TyKind::Param, which we mentioned in the previous point.
  • This is one TyKind::Adt containing the AdtDef of MyStruct with the SubstsRef above.

Finally, we will quickly mention the Generics type. It is used to give information about the type parameters of a type.

Unsubstituted generics

So recall that, in our example, the MyStruct struct had a generic type T. When we are (for example) type-checking functions that use MyStruct, we will need to be able to refer to this type T without actually knowing what it is. In general, this is true inside all generic definitions: we need to be able to work with unknown types. This is done via TyKind::Param (which we mentioned in the example above).

Each TyKind::Param contains two things: the name and the index. In general, the index fully defines the parameter and is used by most of the code. The name is included for debug print-outs.

There are two reasons for this. First, the index is convenient: it allows you to include the parameter in the list of generic arguments when substituting. Second, the index is more robust. For example, you could in principle have two distinct type parameters that use the same name, e.g. impl<A> Foo<A> { fn bar<A>() { .. } }, although the rules against shadowing make this difficult (but those language rules could change in the future).

The index of the type parameter is an integer indicating its order in the list of type parameters. Moreover, we consider the list to include all of the type parameters from outer scopes. Consider the following example:

struct Foo<A, B> {
  // A would have index 0
  // B would have index 1

  .. // some fields
}
impl<X, Y> Foo<X, Y> {
  fn method<Z>() {
    // inside here, X, Y and Z are all in scope
    // X has index 0
    // Y has index 1
    // Z has index 2
  }
}

When we are working inside a generic definition, we will use TyKind::Param just like any other TyKind; it is just a type after all. However, if we want to use the generic type somewhere, then we will need to do substitutions.

For example, suppose that the Foo<A, B> type from the previous example has a field that is a Vec<A>. Observe that Vec is also a generic type. We want to tell the compiler that the type parameter of Vec should be replaced with the A type parameter of Foo<A, B>. We do that with substitutions:

struct Foo<A, B> { // Adt(Foo, &[Param(0), Param(1)])
  x: Vec<A>, // Adt(Vec, &[Param(0)])
  ..
}

fn bar(foo: Foo<u32, f32>) { // Adt(Foo, &[u32, f32])
  let y = foo.x; // Vec<Param(0)> => Vec<u32>
}

There are a couple of different substitutions going on here:

  • In the definition of Foo, in the type of the field x, Vec's type parameter is replaced with Param(0), the first parameter of Foo<A, B>, so that the type of x is Vec<A>.
  • In the function bar, we specify that we want a Foo<u32, f32>. This means that we will substitute Param(0) and Param(1) with u32 and f32.
  • In the body of bar, we access foo.x, which has type Vec<Param(0)>, but Param(0) has been substituted with u32, so foo.x has type Vec<u32>.

Let's look a bit more closely at that last substitution to see why we use indexes. If we want to find the type of foo.x, we can get the generic type of x, which is Vec<Param(0)>. Now we can take the index 0 and use it to find the right type substitution: looking at Foo's SubstsRef, we have the list [u32, f32]; since we want to replace index 0, we take the 0-th element of this list, which is u32. Voila!

You may have a couple of follow-up questions…

type_of. How do we get the "generic type of x"? You can get the type of pretty much anything with the tcx.type_of(def_id) query; in this case, we would pass the DefId of the field x. The type_of query always returns the definition with the generics that are in scope of the definition. For example, tcx.type_of(def_id_of_my_struct) would return the "self-view" of MyStruct: Adt(Foo, &[Param(0), Param(1)]).

subst. How do we actually do the substitutions? There is a function for that too! You use subst to replace a SubstsRef with another list of types.

Here is an example of actually using subst in the compiler. The exact details are not too important, but in this piece of code, we happen to be converting from a rustc_hir::Ty to a real ty::Ty. You can see that we first get some substitutions (substs). Then we call type_of to get a type and call ty.subst(substs) to get a new version of ty with the substitutions made.
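
In sketch form, the two steps just named look like this (assuming tcx, def_id, and substs are in scope):

let generic_ty = tcx.type_of(def_id);          // e.g. Vec<Param(0)>
let actual_ty = generic_ty.subst(tcx, substs); // e.g. Vec<u32>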

A note on indices: it is possible for the indices in Param to not match with what we expect. For example, the index could be out of bounds, or it could be the index of a lifetime when we were expecting a type. These sorts of errors are caught when converting from a rustc_hir::Ty to a ty::Ty, or earlier. If they occur later, that is a compiler bug.

TypeFoldable and TypeFolder

How is this subst query actually implemented? As you can imagine, we might want to do substitutions on a lot of different things. For example, we might want to do a substitution directly on a type like we did with Vec above. But we might also have a more complex type with other types nested inside that also need substitutions.

The answer is a couple of traits: TypeFoldable and TypeFolder.

  • TypeFoldable is implemented by types that embed type information. It allows you to recursively process the contents of the TypeFoldable and do stuff to them.
  • TypeFolder defines what you want to do with the types you encounter while processing the TypeFoldable.

For example, the TypeFolder trait has a method fold_ty that takes a type as input and returns a new type as a result. TypeFoldable invokes the TypeFolder fold_foo methods on itself, giving the TypeFolder access to its contents (the types, regions, etc that are contained within).

You can think of it with this analogy to the iterator combinators we have come to love in rust:

vec.iter().map(|e| foo(e)).collect()
//             ^^^^^^^^^^^^ analogous to `TypeFolder`
//         ^^^ analogous to `TypeFoldable`

So to reiterate:

  • TypeFolder is a trait that defines a “map” operation.
  • TypeFoldable is a trait that is implemented by things that embed types.

In the case of subst, we can see that it is implemented as a TypeFolder: SubstFolder. Looking at its implementation, we see where the actual substitutions are happening.

However, you might also notice that the implementation calls this super_fold_with method. What is that? It is a method of TypeFoldable. Consider the following TypeFoldable type MyFoldable:

struct MyFoldable<'tcx> {
  def_id: DefId,
  ty: Ty<'tcx>,
}

The TypeFolder can call super_fold_with on MyFoldable if it just wants to replace some of the fields of MyFoldable with new values. If it instead wants to replace the whole MyFoldable with a different one, it would call fold_with instead (a different method on TypeFoldable).

In almost all cases, we don’t want to replace the whole struct; we only want to replace ty::Tys in the struct, so usually we call super_fold_with. A typical implementation that MyFoldable could have might do something like this:

my_foldable: MyFoldable<'tcx>
my_foldable.subst(..., subst)

impl TypeFoldable for MyFoldable {
  fn super_fold_with(&self, folder: &mut impl TypeFolder<'tcx>) -> MyFoldable {
    MyFoldable {
      def_id: self.def_id.fold_with(folder),
      ty: self.ty.fold_with(folder),
    }
  }

  fn super_visit_with(..) { }
}

Notice that here, we implement super_fold_with to go over the fields of MyFoldable and call fold_with on them. That is, a folder may replace def_id and ty, but not the whole MyFoldable struct.

Here is another example to put things together: suppose we have a type like Vec<Vec<X>>. The ty::Ty would look like: Adt(Vec, &[Adt(Vec, &[Param(X)])]). If we want to do subst(X => u32), then we would first look at the overall type. We would see that there are no substitutions to be made at the outer level, so we would descend one level and look at Adt(Vec, &[Param(X)]). There are still no substitutions to be made here, so we would descend again. Now we are looking at Param(X), which can be substituted, so we replace it with u32. We can’t descend any more, so we are done, and the overall result is Adt(Vec, &[Adt(Vec, &[u32])]).

One last thing to mention: often when folding over a TypeFoldable, we don’t want to change most things. We only want to do something when we reach a type. That means there may be a lot of TypeFoldable types whose implementations basically just forward to their fields’ TypeFoldable implementations. Such implementations of TypeFoldable tend to be pretty tedious to write by hand. For this reason, there is a derive macro that allows you to #[derive(TypeFoldable)]. It is defined here.

subst. In the case of substitutions, the actual folder does the indexing we’ve already mentioned. We define a SubstFolder and call fold_with on the TypeFoldable, which processes itself. The fold_ty method, which processes each type, looks for a ty::Param; when it finds one, it replaces it with the corresponding entry from the list of substitutions, and otherwise it recursively processes the type. To do the replacement, it calls ty_for_param, and all that does is index into the list of substitutions with the index of the Param.

Generic arguments

A ty::subst::GenericArg<'tcx> represents some entity in the type system: a type (Ty<'tcx>), lifetime (ty::Region<'tcx>) or constant (ty::Const<'tcx>). GenericArg is used to perform substitutions of generic parameters for concrete arguments, such as when calling a function with generic parameters explicitly with type arguments. Substitutions are represented using the Subst type as described below.

Subst

ty::subst::Subst<'tcx> is intuitively simply a slice of GenericArg<'tcx>s, acting as an ordered list of substitutions from generic parameters to concrete arguments (such as types, lifetimes and consts).

For example, given a HashMap<K, V> with two type parameters, K and V, an instantiation of the parameters, for example HashMap<i32, u32>, would be represented by the substitution &'tcx [tcx.types.i32, tcx.types.u32].

Subst provides various convenience methods to instantiate substitutions given item definitions, which should generally be used rather than explicitly constructing such substitution slices.

GenericArg

The actual GenericArg struct is optimised for space, storing the type, lifetime or const as an interned pointer containing a tag identifying its kind (in the lowest 2 bits). Unless you are working with the Subst implementation specifically, you should generally not have to deal with GenericArg and instead make use of the safe GenericArgKind abstraction.

GenericArgKind

As GenericArg itself is not type-safe, the GenericArgKind enum provides a more convenient and safe interface for dealing with generic arguments. A GenericArgKind can be converted to a raw GenericArg using GenericArg::from() (or simply .into() when the context is clear). As mentioned earlier, substitution lists store raw GenericArgs, so before dealing with them, it is preferable to convert them to GenericArgKinds first. This is done by calling the .unpack() method.

// An example of unpacking and packing a generic argument.
fn deal_with_generic_arg<'tcx>(generic_arg: GenericArg<'tcx>) -> GenericArg<'tcx> {
    // Unpack a raw `GenericArg` to deal with it safely.
    let new_generic_arg: GenericArgKind<'tcx> = match generic_arg.unpack() {
        GenericArgKind::Type(ty) => { /* ...do something with `ty`... */ GenericArgKind::Type(ty) }
        GenericArgKind::Lifetime(lt) => { /* ... */ GenericArgKind::Lifetime(lt) }
        GenericArgKind::Const(ct) => { /* ... */ GenericArgKind::Const(ct) }
    };
    // Pack the `GenericArgKind` to store it in a substitution list.
    new_generic_arg.into()
}

Type inference

Type inference is the process of automatic detection of the type of an expression.

It is what allows Rust to work with fewer or no type annotations, making things easier for users:

fn main() {
    let mut things = vec![];
    things.push("thing");
}

Here, the type of things is inferred to be Vec<&str> because of the value we push into things.

The type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.

A note on terminology

We use the notation ?T to refer to inference variables, also called existential variables.

We use the terms "region" and "lifetime" interchangeably. Both refer to the 'a in &'a T.

The term "bound region" refers to a region that is bound in a function signature, such as the 'a in for<'a> fn(&'a u32). A region is "free" if it is not bound.

Creating an inference context

You create and "enter" an inference context by doing something like the following:

tcx.infer_ctxt().enter(|infcx| {
    // Use the inference context `infcx` here.
})

Within the closure, infcx has the type InferCtxt<'cx, 'tcx> for some fresh 'cx, while 'tcx is the same as outside the inference context. (Again, see the ty chapter for more details on this setup.)

The tcx.infer_ctxt method actually returns a builder, which means there are some kinds of configuration you can do before the infcx is created. See InferCtxtBuilder for more information.

Inference variables

The main purpose of the inference context is to house a bunch of inference variables – these represent types or regions whose precise value is not yet known, but will be uncovered as we perform type-checking.

If you're familiar with the basic ideas of unification from H-M type systems, or logic languages like Prolog, this is the same concept. If you're not, you might want to read a tutorial on how H-M type inference works, or perhaps this blog post on unification in the Chalk project.

All told, the inference context stores four kinds of inference variables as of this writing:

  • Type variables, which come in three varieties:
    • General type variables (the most common). These can be unified with any type.
    • Integral type variables, which can only be unified with an integral type, and arise from an integer literal expression like 22.
    • Float type variables, which can only be unified with a float type, and arise from a float literal expression like 22.0.
  • Region variables, which represent lifetimes, and arise all over the place.

All the type variables work in much the same way: you can create a new type variable, and what you get is a Ty<'tcx> representing an unresolved type ?T. Then later you can apply the various operations that the inferencer supports, such as equality or subtyping, and it will possibly instantiate (or bind) that ?T to a specific value as a result.

The region variables work somewhat differently, and are described below in a separate section.

Enforcing equality / subtyping

The most basic operation you can perform in the type inferencer is equality, which forces two types T and U to be the same. The recommended way to add an equality constraint is to use the at method, roughly like so:

infcx.at(...).eq(t, u);

The first at() call provides a bit of context, i.e. why you are doing this unification, and in what environment, and the eq method performs the actual equality constraint.

When you equate things, you force them to be precisely equal. Equating returns an InferResult – if it returns Err(err), then equating failed, and the enclosed TypeError will tell you what went wrong.

The success case is perhaps more interesting. The "primary" return type of eq is () – that is, when it succeeds, it doesn't return a value of any particular interest. Rather, it is executed for its side-effects of constraining type variables and so forth. However, the actual return type is not (), but rather InferOk<()>. The InferOk type is used to carry extra trait obligations – your job is to ensure that these are fulfilled (typically by enrolling them in a fulfillment context). See the trait chapter for more background on that.

You can similarly enforce subtyping through infcx.at(..).sub(..). The same basic concepts as above apply.

"Trying" equality

Sometimes you would like to know if it is possible to equate two types without error. You can test that with infcx.can_eq (or infcx.can_sub for subtyping). If this returns Ok, then equality is possible – but in all cases, any side-effects are reversed.

Be aware, though, that the success or failure of these methods is always modulo regions. That is, two types &'a u32 and &'b u32 will return Ok for can_eq, even if 'a != 'b. This falls out from the "two-phase" nature of how we solve region constraints.

Snapshots

As described in the previous section on can_eq, often it is useful to be able to do a series of operations and then roll back their side-effects. This is done for various reasons: one of them is to be able to backtrack, trying out multiple possibilities before settling on which path to take. Another is in order to ensure that a series of smaller changes take place atomically or not at all.

To allow for this, the inference context supports a snapshot method. When you call it, it will start recording changes that occur from the operations you perform. When you are done, you can either invoke rollback_to, which will undo those changes, or else confirm, which will make them permanent. Snapshots can be nested as long as you follow a stack-like discipline.

Rather than use snapshots directly, it is often helpful to use the methods like commit_if_ok or probe that encapsulate higher-level patterns.
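
As a sketch of those helpers (cause, param_env, t, and u are assumed to be in scope; they are not defined here):

// `probe` runs the closure inside a snapshot and always rolls back, which
// is how "trying" operations like `can_eq` are built:
let could_eq = infcx.probe(|_snapshot| infcx.at(&cause, param_env).eq(t, u).is_ok());

// `commit_if_ok` keeps the side-effects only if the closure returns Ok:
let result = infcx.commit_if_ok(|_snapshot| infcx.at(&cause, param_env).eq(t, u));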

Subtyping obligations

One thing worth discussing is subtyping obligations. When you force two types to be a subtype, like ?T <: i32, we can often convert those into equality constraints. This follows from Rust's rather limited notion of subtyping: so, in the above case, ?T <: i32 is equivalent to ?T = i32.

However, in some cases we have to be more careful. For example, when regions are involved. So if you have ?T <: &'a i32, what we would do is to first "generalize" &'a i32 into a type with a region variable: &'?b i32, and then unify ?T with that (?T = &'?b i32). We then relate this new variable with the original bound:

&'?b i32 <: &'a i32

This will result in a region constraint (see below) of '?b: 'a.

One final interesting case is relating two unbound type variables, like ?T <: ?U. In that case, we can't make progress, so we enqueue an obligation Subtype(?T, ?U) and return it via the InferOk mechanism. You'll have to try again when more details about ?T or ?U are known.

Region constraints

Regions are inferred somewhat differently from types. Rather than eagerly unifying things, we simply collect constraints as we go, but make (almost) no attempt to solve regions. These constraints have the form of an "outlives" constraint:

'a: 'b

Actually the code tends to view them as a subregion relation, but it's the same idea:

'b <= 'a

(There are various other kinds of constraints, such as "verifys"; see the region_constraints module for details.)

There is one case where we do some amount of eager unification. If you have an equality constraint between two regions

'a = 'b

we will record that fact in a unification table. You can then use opportunistic_resolve_var to convert 'b to 'a (or vice versa). This is sometimes needed to ensure termination of fixed-point algorithms.
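
The unification table is essentially a union-find structure. A minimal, self-contained sketch of the idea follows (illustrative only: rustc actually uses the generic unification tables from the ena crate, with richer keys and values than plain indices):

// Regions are represented here as plain indices.
struct RegionTable {
    parent: Vec<usize>, // parent[r] == r means `r` is a root
}

impl RegionTable {
    fn find(&mut self, r: usize) -> usize {
        if self.parent[r] != r {
            let root = self.find(self.parent[r]);
            self.parent[r] = root; // path compression
        }
        self.parent[r]
    }

    // Record an equality constraint `'a = 'b`.
    fn unify(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }

    // Like `opportunistic_resolve_var`: replace a region with its
    // current representative, if one is known.
    fn resolve(&mut self, r: usize) -> usize {
        self.find(r)
    }
}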

Extracting region constraints

Ultimately, region constraints are only solved at the very end of type-checking, once all other constraints are known. There are two ways to solve region constraints right now: lexical and non-lexical. Eventually there will only be one.

To solve lexical region constraints, you invoke resolve_regions_and_report_errors. This "closes" the region constraint process and invokes the lexical_region_resolve code. Once this is done, any further attempt to equate or create a subtyping relationship will yield an ICE.

Non-lexical region constraints are not handled within the inference context. Instead, the NLL solver (actually, the MIR type-checker) invokes take_and_reset_region_constraints periodically. This extracts all of the outlives constraints from the region solver, but leaves the set of variables intact. This is used to get just the region constraints that resulted from some particular point in the program, since the NLL solver needs to know not just what regions were subregions, but also where. Finally, the NLL solver invokes take_region_var_origins, which "closes" the region constraint process in the same way as normal solving.

Lexical region resolution

Lexical region resolution is done by initially assigning each region variable to an empty value. We then process each outlives constraint repeatedly, growing region variables until a fixed-point is reached. Region variables can be grown using a least-upper-bound relation on the region lattice in a fairly straightforward fashion.
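
In pseudo-Rust, the fixed-point loop looks roughly like this (a sketch only: RegionValue and lub stand in for rustc's actual region lattice and its least-upper-bound operation):

#[derive(Clone, Copy, PartialEq)]
struct RegionValue(u32); // stand-in for a point on the region lattice

fn lub(a: RegionValue, b: RegionValue) -> RegionValue {
    RegionValue(a.0.max(b.0))
}

// Each constraint `(sub, sup)` says the value of `sup` must grow
// until it covers the value of `sub`.
fn resolve(constraints: &[(usize, usize)], values: &mut [RegionValue]) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(sub, sup) in constraints {
            let grown = lub(values[sup], values[sub]);
            if grown != values[sup] {
                values[sup] = grown;
                changed = true;
            }
        }
    }
}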

Trait resolution (old-style)

This chapter describes the general process of trait resolution and points out some non-obvious things.

Note: This chapter (and its subchapters) describe how the trait solver currently works. However, we are in the process of designing a new trait solver. If you'd prefer to read about that, see this traits chapter.

Major concepts

Trait resolution is the process of pairing up an impl with each reference to a trait. So, for example, if there is a generic function like:

fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { ... }

and then a call to that function:

let v: Vec<isize> = clone_slice(&[1, 2, 3]);

it is the job of trait resolution to figure out whether there exists an impl of (in this case) isize : Clone.

Note that in some cases, like generic functions, we may not be able to find a specific impl, but we can figure out that the caller must provide an impl. For example, consider the body of clone_slice:

fn clone_slice<T: Clone>(x: &[T]) -> Vec<T> {
    let mut v = Vec::new();
    for e in x {
        v.push((*e).clone()); // (*)
    }
    v
}

The line marked (*) is only legal if T (the type of *e) implements the Clone trait. Naturally, since we don't know what T is, we can't find the specific impl; but based on the bound T:Clone, we can say that there exists an impl which the caller must provide.

We use the term obligation to refer to a trait reference in need of an impl. Basically, the trait resolution system resolves an obligation by proving that an appropriate impl does exist.

During type checking, we do not store the results of trait selection. We simply wish to verify that trait selection will succeed. Then later, at trans time, when we have all concrete types available, we can repeat the trait selection to choose an actual implementation, which will then be generated in the output binary.

Overview

Trait resolution consists of three major parts:

  • Selection: Deciding how to resolve a specific obligation. For example, selection might decide that a specific obligation can be resolved by employing an impl which matches the Self type, or by using a parameter bound (e.g. T: Trait). In the case of an impl, selecting one obligation can create nested obligations because of where clauses on the impl itself. It may also require evaluating those nested obligations to resolve ambiguities.

  • Fulfillment: The fulfillment code is what tracks that obligations are completely fulfilled. Basically it is a worklist of obligations to be selected: once selection is successful, the obligation is removed from the worklist and any nested obligations are enqueued.

  • Coherence: The coherence checks are intended to ensure that there are never overlapping impls, where two impls could be used with equal precedence.

Selection

Selection is the process of deciding whether an obligation can be resolved and, if so, how it is to be resolved (via impl, where clause, etc). The main interface is the select() function, which takes an obligation and returns a SelectionResult. There are three possible outcomes:

  • Ok(Some(selection)) – yes, the obligation can be resolved, and selection indicates how. If the obligation was resolved via an impl, then selection may also indicate nested obligations that are required by the impl.

  • Ok(None) – we are not yet sure whether the obligation can be resolved or not. This happens most commonly when the obligation contains unbound type variables.

  • Err(err) – the obligation definitely cannot be resolved due to a type error or because there are no impls that could possibly apply.
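
In code, this is just a nested Result/Option. A hedged, simplified rendering (the real alias in rustc is SelectionResult<'tcx, Selection<'tcx>>, with lifetimes and a richer error type):

struct Selection;      // placeholder: records how the obligation is resolved
struct SelectionError; // placeholder: why resolution can never succeed

type SelectionResult = Result<Option<Selection>, SelectionError>;

fn handle(result: SelectionResult) {
    match result {
        Ok(Some(_selection)) => { /* resolved; may carry nested obligations */ }
        Ok(None) => { /* ambiguous for now: retry when more types are known */ }
        Err(_error) => { /* definite failure: report a type error */ }
    }
}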

The basic algorithm for selection is broken into two big phases: candidate assembly and confirmation.

Note that because of how lifetime inference works, it is not possible to give back immediate feedback as to whether a unification or subtype relationship between lifetimes holds or not. Therefore, lifetime matching is not considered during selection. This is reflected in the fact that subregion assignment is infallible. This may yield lifetime constraints that will later be found to be in error (in contrast, the non-lifetime constraints have already been checked during selection and can never cause an error, though naturally they may lead to other errors downstream).

Candidate assembly

Searches for impls/where-clauses/etc that might possibly be used to satisfy the obligation. Each of those is called a candidate. To avoid ambiguity, we want to find exactly one candidate that is definitively applicable. In some cases, we may not know whether an impl/where-clause applies or not – this occurs when the obligation contains unbound inference variables.

The subroutines that decide whether a particular impl/where-clause/etc applies to a particular obligation are collectively referred to as the process of matching. At the moment, this amounts to unifying the Self types, but in the future we may also recursively consider some of the nested obligations, in the case of an impl.

TODO: what does "unifying the Self types" mean? The Self of the obligation with that of an impl?

The basic idea for candidate assembly is to do a first pass in which we identify all possible candidates. During this pass, all that we do is try and unify the type parameters. (In particular, we ignore any nested where clauses.) Presuming that this unification succeeds, the impl is added as a candidate.

Once this first pass is done, we can examine the set of candidates. If it is a singleton set, then we are done: this is the only impl in scope that could possibly apply. Otherwise, we can winnow down the set of candidates by using where clauses and other conditions. If this reduced set yields a single, unambiguous entry, we're good to go, otherwise the result is considered ambiguous.

The basic process: Inferring based on the impls we see

This process is easier if we work through some examples. Consider the following trait:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

This trait just has one method. It's about as simple as it gets. It converts from the (implicit) Self type to the Target type. If we wanted to permit conversion between isize and usize, we might implement Convert like so:

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

Now imagine there is some code like the following:

let x: isize = ...;
let y = x.convert();

The call to convert will generate a trait reference Convert<$Y> for isize, where $Y is the type variable representing the type of y. Of the two impls we can see, the only one that matches is Convert<usize> for isize. Therefore, we can select this impl, which will cause the type of $Y to be unified to usize. (Note that while assembling candidates, we do the initial unifications in a transaction, so that they don't affect one another.)

TODO: The example says we can "select" the impl, but this section is talking specifically about candidate assembly. Does this mean we can sometimes skip confirmation? Or is this poor wording? TODO: Is the unification of $Y part of trait resolution or type inference? Or is this not the same type of "inference variable" as in type inference?

Winnowing: Resolving ambiguities

But what happens if there are multiple impls where all the types unify? Consider this example:

trait Get {
    fn get(&self) -> Self;
}

impl<T: Copy> Get for T {
    fn get(&self) -> T { *self }
}

impl<T: Get> Get for Box<T> {
    fn get(&self) -> Box<T> { Box::new(get_it(&**self)) }
}

// Not defined in the original text: a helper along these lines is
// presumably what `get_it` refers to below.
fn get_it<T: Get>(t: &T) -> T {
    t.get()
}

What happens when we invoke get_it(&Box::new(1_u16)), for example? In this case, the Self type is Box<u16> – that unifies with both impls, because the first applies to all types T, and the second to all Box<T>. In order for this to be unambiguous, the compiler does a winnowing pass that considers where clauses and attempts to remove candidates. In this case, the first impl only applies if Box<u16> : Copy, which doesn't hold. After winnowing, then, we are left with just one candidate, so we can proceed.

where clauses

Besides an impl, the other major way to resolve an obligation is via a where clause. The selection process is always given a parameter environment which contains a list of where clauses, which are basically obligations that we can assume are satisfiable. We will iterate over that list and check whether our current obligation can be found in that list. If so, it is considered satisfied. More precisely, we want to check whether there is a where-clause obligation that is for the same trait (or some subtrait) and which can match against the obligation.

Consider this simple example:

trait A1 {
    fn do_a1(&self);
}
trait A2 : A1 { ... }

trait B {
    fn do_b(&self);
}

fn foo<X:A2+B>(x: X) {
    x.do_a1(); // (*)
    x.do_b();  // (#)
}

In the body of foo, clearly we can use methods of A1, A2, or B on variable x. The line marked (*) will incur an obligation X: A1, while the line marked (#) will incur an obligation X: B. Meanwhile, the parameter environment will contain two where-clauses: X : A2 and X : B. For each obligation, then, we search this list of where-clauses. The obligation X: B trivially matches against the where-clause X: B. To resolve an obligation X:A1, we would note that X:A2 implies that X:A1.

Confirmation

Confirmation unifies the output type parameters of the trait with the values found in the obligation, possibly yielding a type error.

Suppose we have the following variation of the Convert example in the previous section:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

let x: isize = ...;
let y: char = x.convert(); // NOTE: `y: char` now!

Confirmation is where an error would be reported because the impl specified that Target would be usize, but the obligation reported char. Hence the result of selection would be an error.

Note that the candidate impl is chosen based on the Self type, but confirmation is done based on (in this case) the Target type parameter.

Selection during translation

As mentioned above, during type checking, we do not store the results of trait selection. At trans time, we repeat the trait selection to choose a particular impl for each method call. In this second selection, we do not consider any where-clauses to be in scope because we know that each resolution will resolve to a particular impl.

One interesting twist has to do with nested obligations. In general, in trans, we only need to do a "shallow" selection for an obligation. That is, we wish to identify which impl applies, but we do not (yet) need to decide how to select any nested obligations. Nonetheless, we do currently do a complete resolution, and that is because it can sometimes inform the results of type inference. That is, we do not have the full substitutions in terms of the type variables of the impl available to us, so we must run trait selection to figure everything out.

TODO: is this still talking about trans?

Here is an example:

trait Foo { ... }
impl<U, T:Bar<U>> Foo for Vec<T> { ... }

impl Bar<usize> for isize { ... }

After one shallow round of selection for an obligation like Vec<isize> : Foo, we would know which impl we want, and we would know that T=isize, but we do not know the type of U. We must select the nested obligation isize : Bar<U> to find out that U=usize.

It would be good to only do just as much nested resolution as necessary. Currently, though, we just do a full resolution.

Higher-ranked trait bounds

One of the more subtle concepts in trait resolution is higher-ranked trait bounds. An example of such a bound is for<'a> MyTrait<&'a isize>. Let's walk through how selection on higher-ranked trait references works.

Basic matching and placeholder leaks

Suppose we have a trait Foo:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

Let's say we have a function want_hrtb that wants a type which implements Foo<&'a isize> for any 'a:

fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }

Now we have a struct AnyInt that implements Foo<&'a isize> for any 'a:

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }

And the question is, does AnyInt : for<'a> Foo<&'a isize>? We want the answer to be yes. The algorithm for figuring it out is closely related to the subtyping for higher-ranked types (which is described here and also in a paper by SPJ; if you wish to understand higher-ranked subtyping, we recommend you read the paper). There are a few parts:

  1. Replace bound regions in the obligation with placeholders.
  2. Match the impl against the placeholder obligation.
  3. Check for placeholder leaks.

So let's work through our example.

  1. The first thing we would do is to replace the bound region in the obligation with a placeholder, yielding AnyInt : Foo<&'0 isize> (here '0 represents placeholder region #0). Note that we now have no quantifiers; in terms of the compiler type, this changes from a ty::PolyTraitRef to a TraitRef. We would then create the TraitRef from the impl, using fresh variables for its bound regions (and thus getting Foo<&'$a isize>, where '$a is the inference variable for 'a).

  2. Next we relate the two trait refs, yielding a graph with the constraint that '0 == '$a.

  3. Finally, we check for placeholder "leaks" – a leak is basically any attempt to relate a placeholder region to another placeholder region, or to any region that pre-existed the impl match. The leak check is done by searching from the placeholder region to find the set of regions that it is related to in any way. This is called the "taint" set. To pass the check, that set must consist solely of itself and region variables from the impl. If the taint set includes any other region, then the match is a failure. In this case, the taint set for '0 is {'0, '$a}, and hence the check will succeed.

Let's consider a failure case. Imagine we also have a struct

struct StaticInt;
impl Foo<&'static isize> for StaticInt { }

We want the obligation StaticInt : for<'a> Foo<&'a isize> to be considered unsatisfied. The check begins just as before. 'a is replaced with a placeholder '0 and the impl trait reference is instantiated to Foo<&'static isize>. When we relate those two, we get a constraint like 'static == '0. This means that the taint set for '0 is {'0, 'static}, which fails the leak check.

TODO: This is because 'static is not a region variable but is in the taint set, right?

Higher-ranked trait obligations

Once the basic matching is done, we get to another interesting topic: how to deal with impl obligations. I'll work through a simple example here. Imagine we have the traits Foo and Bar and an associated impl:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

trait Bar<X> {
    fn bar(&self, x: X) { }
}

impl<X,F> Foo<X> for F
    where F : Bar<X>
{
}

Now let's say we have an obligation Baz: for<'a> Foo<&'a isize> and we match this impl. What obligation is generated as a result? We want to get Baz: for<'a> Bar<&'a isize>, but how does that happen?

After the matching, we are in a position where we have a placeholder substitution like X => &'0 isize. If we apply this substitution to the impl obligations, we get F : Bar<&'0 isize>. Obviously this is not directly usable because the placeholder region '0 cannot leak out of our computation.

What we do is to create an inverse mapping from the taint set of '0 back to the original bound region ('a, here) that '0 resulted from. (This is done in higher_ranked::plug_leaks). We know that the leak check passed, so this taint set consists solely of the placeholder region itself plus various intermediate region variables. We then walk the trait-reference and convert every region in that taint set back to a late-bound region, so in this case we'd wind up with Baz: for<'a> Bar<&'a isize>.

Caching and subtle considerations therewith

In general, we attempt to cache the results of trait selection. This is a somewhat complex process. Part of the reason for this is that we want to be able to cache results even when all the types in the trait reference are not fully known. In that case, it may happen that the trait selection process is also influencing type variables, so we have to be able to not only cache the result of the selection process, but replay its effects on the type variables.

An example

The high-level idea of how the cache works is that we first replace all unbound inference variables with placeholder versions. Therefore, if we had a trait reference usize : Foo<$t>, where $t is an unbound inference variable, we might replace it with usize : Foo<$0>, where $0 is a placeholder type. We would then look this up in the cache.

If we found a hit, the hit would tell us the immediate next step to take in the selection process (e.g. apply impl #22, or apply where clause X : Foo<Y>).

On the other hand, if there is no hit, we need to go through the selection process from scratch. Suppose, we come to the conclusion that the only possible impl is this one, with def-id 22:

impl Foo<isize> for usize { ... } // Impl #22

We would then record in the cache usize : Foo<$0> => ImplCandidate(22). Next we would confirm ImplCandidate(22), which would (as a side-effect) unify $t with isize.

Now, at some later time, we might come along and see a usize : Foo<$u>. When replaced with a placeholder, this would yield usize : Foo<$0>, just as before, and hence the cache lookup would succeed, yielding ImplCandidate(22). We would confirm ImplCandidate(22) which would (as a side-effect) unify $u with isize.

Where clauses and the local vs global cache

One subtle interaction is that the results of trait lookup will vary depending on what where clauses are in scope. Therefore, we actually have two caches, a local and a global cache. The local cache is attached to the ParamEnv, and the global cache is attached to the tcx. We use the local cache whenever the result might depend on the where clauses that are in scope. The determination of which cache to use is done by the method pick_candidate_cache in select.rs. At the moment, we use a very simple, conservative rule: if there are any where-clauses in scope, then we use the local cache. We used to try to draw finer-grained distinctions, but that led to a series of annoying and weird bugs like #22019 and #18290. This simple rule seems to be pretty clearly safe and also still retains a very high hit rate (~95% when compiling rustc).
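
The conservative rule amounts to something like the following sketch (hypothetical types; as the TODO below notes, pick_candidate_cache no longer exists in this form in today's sources):

struct Cache; // placeholder for a selection cache

// If any where-clauses are in scope (non-empty caller bounds), the result
// may depend on them, so use the `ParamEnv`-local cache; otherwise the
// global, tcx-wide cache is safe to use.
fn pick_cache<'a>(caller_bounds_is_empty: bool, local: &'a Cache, global: &'a Cache) -> &'a Cache {
    if caller_bounds_is_empty { global } else { local }
}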

TODO: it looks like pick_candidate_cache no longer exists. In general, is this section still accurate at all?

Specialization

TODO: where does Chalk fit in? Should we mention/discuss it here?

Defined in the specialize module.

The basic strategy is to build up a specialization graph during coherence checking (recall that coherence checking looks for overlapping impls). Insertion into the graph locates the right place to put an impl in the specialization hierarchy; if there is no right place (due to partial overlap but no containment), you get an overlap error. Specialization is consulted when selecting an impl (of course), and the graph is consulted when propagating defaults down the specialization hierarchy.

You might expect that the specialization graph would be used during selection – i.e. when actually performing specialization. This is not done for two reasons:

  • It's merely an optimization: given a set of candidates that apply, we can determine the most specialized one by comparing them directly for specialization, rather than consulting the graph. Given that we also cache the results of selection, the benefit of this optimization is questionable.

  • To build the specialization graph in the first place, we need to use selection (because we need to determine whether one impl specializes another). Dealing with this reentrancy would require some additional mode switch for selection. Given that there seems to be no strong reason to use the graph anyway, we stick with a simpler approach in selection, and use the graph only for propagating default implementations.

Trait impl selection can succeed even when multiple impls can apply, as long as they are part of the same specialization family. In that case, it returns a single impl on success – this is the most specialized impl known to apply. However, if there are any inference variables in play, the returned impl may not be the actual impl we will use at trans time. Thus, we take special care to avoid projecting associated types unless either (1) the associated type does not use default and thus cannot be overridden or (2) all input types are known concretely.

Additional Resources

This talk by @sunjay may be useful. Keep in mind that the talk only gives a broad overview of the problem and the solution (it was presented about halfway through @sunjay's work). Also, it was given in June 2018, and some things may have changed by the time you watch it.

Trait solving (new-style)

🚧 This chapter describes "new-style" trait solving. This is still in the process of being implemented; this chapter serves as a kind of in-progress design document. If you would prefer to read about how the current trait solver works, check out this other chapter. 🚧

By the way, if you would like to help in hacking on the new solver, you will find instructions for getting involved in the Traits Working Group tracking issue!

The new-style trait solver is based on the work done in chalk. Chalk recasts Rust's trait system explicitly in terms of logic programming. It does this by "lowering" Rust code into a kind of logic program we can then execute queries against.

You can read more about chalk itself in the Overview of Chalk section.

Trait solving in rustc is based around a few key ideas:

  • Lowering to logic, which expresses Rust traits in terms of standard logical terms.
    • The goals and clauses chapter describes the precise form of rules we use, and lowering rules gives the complete set of lowering rules in a more reference-like form.
    • Lazy normalization, which is the technique we use to accommodate associated types when figuring out whether types are equal.
    • Region constraints, which are accumulated during trait solving but mostly ignored. This means that trait solving effectively ignores the precise regions involved, always – but we still remember the constraints on them so that those constraints can be checked by the type checker.
  • Canonical queries, which allow us to solve trait problems (like "is Foo implemented for the type Bar?") once, and then apply that same result independently in many different inference contexts.

This is not a complete list of topics. See the sidebar for more.

Ongoing work

The design of the new-style trait solving currently happens in two places:

chalk. The chalk repository is where we experiment with new ideas and designs for the trait system. It primarily consists of two parts:

  • a unit testing framework for the correctness and feasibility of the logical rules defining the new-style trait system.
  • the chalk_engine crate, which defines the new-style trait solver used both in the unit testing framework and in rustc.

rustc. Once we are happy with the logical rules, we proceed to implementing them in rustc. This mainly happens in librustc_traits.

Lowering to logic

The key observation here is that the Rust trait system is basically a kind of logic, and it can be mapped onto standard logical inference rules. We can then look for solutions to those inference rules in a very similar fashion to how e.g. a Prolog solver works. It turns out that we can't quite use Prolog rules (also called Horn clauses) but rather need a somewhat more expressive variant.

Rust traits and logic

One of the first observations is that the Rust trait system is basically a kind of logic. As such, we can map our struct, trait, and impl declarations into logical inference rules. For the most part, these are basically Horn clauses, though we'll see that to capture the full richness of Rust – and in particular to support generic programming – we have to go a bit further than standard Horn clauses.

To see how this mapping works, let's start with an example. Imagine we declare a trait and a few impls, like so:


trait Clone { }
impl Clone for usize { }
impl<T> Clone for Vec<T> where T: Clone { }

We could map these declarations to some Horn clauses, written in a Prolog-like notation, as follows:

Clone(usize).
Clone(Vec<?T>) :- Clone(?T).

// The notation `A :- B` means "A is true if B is true".
// Or, put another way, B implies A.

In Prolog terms, we might say that Clone(Foo) – where Foo is some Rust type – is a predicate that represents the idea that the type Foo implements Clone. These rules are program clauses; they state the conditions under which that predicate can be proven (i.e., considered true). So the first rule just says "Clone is implemented for usize". The next rule says "for any type ?T, Clone is implemented for Vec<?T> if Clone is implemented for ?T". So e.g. if we wanted to prove that Clone(Vec<Vec<usize>>), we would do so by applying the rules recursively:

  • Clone(Vec<Vec<usize>>) is provable if:
    • Clone(Vec<usize>) is provable if:
      • Clone(usize) is provable. (Which it is, so we're all good.)

But now suppose we tried to prove that Clone(Vec<Bar>). This would fail (after all, I didn't give an impl of Clone for Bar):

  • Clone(Vec<Bar>) is provable if:
    • Clone(Bar) is provable. (But it is not, as there are no applicable rules.)

We can easily extend the example above to cover generic traits with more than one input type. So imagine the Eq<T> trait, which declares that Self is equatable with a value of type T:

trait Eq<T> { ... }
impl Eq<usize> for usize { }
impl<T: Eq<U>> Eq<Vec<U>> for Vec<T> { }

That could be mapped as follows:

Eq(usize, usize).
Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).

So far so good.

Type-checking normal functions

OK, now that we have defined some logical rules that are able to express when traits are implemented and to handle associated types, let's turn our focus a bit towards type-checking. Type-checking is interesting because it is what gives us the goals that we need to prove. That is, everything we've seen so far has been about how we derive the rules by which we can prove goals from the traits and impls in the program; but we are also interested in how to derive the goals that we need to prove, and those come from type-checking.

Consider type-checking the function foo() here:

fn foo() { bar::<usize>() }
fn bar<U: Eq<U>>() { }

This function is very simple, of course: all it does is to call bar::<usize>(). Now, looking at the definition of bar(), we can see that it has one where-clause U: Eq<U>. So, that means that foo() will have to prove that usize: Eq<usize> in order to show that it can call bar() with usize as the type argument.

If we wanted, we could write a Prolog predicate that defines the conditions under which bar() can be called. We'll say that those conditions are called being "well-formed":

barWellFormed(?U) :- Eq(?U, ?U).

Then we can say that foo() type-checks if the reference to bar::<usize> (that is, bar() applied to the type usize) is well-formed:

fooTypeChecks :- barWellFormed(usize).

If we try to prove the goal fooTypeChecks, it will succeed:

  • fooTypeChecks is provable if:
    • barWellFormed(usize), which is provable if:
      • Eq(usize, usize), which is provable because of an impl.

Ok, so far so good. Let's move on to type-checking a more complex function.

Type-checking generic functions: beyond Horn clauses

In the last section, we used standard Prolog Horn clauses (augmented with Rust's notion of type equality) to type-check some simple Rust functions. But that only works when we are type-checking non-generic functions. If we want to type-check a generic function, it turns out we need a stronger notion of goal than what Prolog can provide. To see what I'm talking about, let's revamp our previous example to make foo generic:

fn foo<T: Eq<T>>() { bar::<T>() }
fn bar<U: Eq<U>>() { }

To type-check the body of foo, we need to be able to hold the type T "abstract". That is, we need to check that the body of foo is type-safe for all types T, not just for some specific type. We might express this like so:

fooTypeChecks :-
  // for all types T...
  forall<T> {
    // ...if we assume that Eq(T, T) is provable...
    if (Eq(T, T)) {
      // ...then we can prove that `barWellFormed(T)` holds.
      barWellFormed(T)
    }
  }.

This notation I'm using here is the notation I've been using in my prototype implementation; it's similar to standard mathematical notation but a bit Rustified. Anyway, the problem is that standard Horn clauses don't allow universal quantification (forall) or implication (if) in goals (though many Prolog engines do support them, as an extension). For this reason, we need to accept something called "first-order hereditary harrop" (FOHH) clauses – this long name basically means "standard Horn clauses with forall and if in the body". But it's nice to know the proper name, because there is a lot of work describing how to efficiently handle FOHH clauses; see for example Gopalan Nadathur's excellent "A Proof Procedure for the Logic of Hereditary Harrop Formulas" in the bibliography.

It turns out that supporting FOHH is not really all that hard. And once we are able to do that, we can easily describe the type-checking rule for generic functions like foo in our logic.

Source

This page is a lightly adapted version of a blog post by Nicholas Matsakis.

Goals and clauses

In logic programming terms, a goal is something that you must prove and a clause is something that you know is true. As described in the lowering to logic chapter, Rust's trait solver is based on an extension of hereditary harrop (HH) clauses, which extend traditional Prolog Horn clauses with a few new superpowers.

Goals and clauses meta structure

In Rust's solver, goals and clauses have the following forms (note that the two definitions reference one another):

Goal = DomainGoal           // defined in the section below
        | Goal && Goal
        | Goal || Goal
        | exists<K> { Goal }   // existential quantification
        | forall<K> { Goal }   // universal quantification
        | if (Clause) { Goal } // implication
        | true                 // something that's trivially true
        | ambiguous            // something that's never provable

Clause = DomainGoal
        | Clause :- Goal     // if can prove Goal, then Clause is true
        | Clause && Clause
        | forall<K> { Clause }

K = <type>     // a "kind"
    | <lifetime>

The proof procedure for these sorts of goals is actually quite straightforward. Essentially, it's a form of depth-first search. The paper "A Proof Procedure for the Logic of Hereditary Harrop Formulas" gives the details.

In terms of code, these types are defined in librustc_middle/traits/mod.rs in rustc, and in chalk-ir/src/lib.rs in chalk.
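
As a rough guide, a simplified Rust rendering of the grammar might look like this (placeholder types; the real definitions in those files are more elaborate):

enum Kind { Type, Lifetime }

struct DomainGoal; // see the next section

enum Goal {
    Domain(DomainGoal),
    And(Box<Goal>, Box<Goal>),
    Or(Box<Goal>, Box<Goal>),
    Exists(Kind, Box<Goal>),         // exists<K> { Goal }
    Forall(Kind, Box<Goal>),         // forall<K> { Goal }
    Implies(Vec<Clause>, Box<Goal>), // if (Clause) { Goal }
    True,
    Ambiguous,
}

enum Clause {
    Domain(DomainGoal),
    Implies(Box<Clause>, Goal),      // Clause :- Goal
    And(Box<Clause>, Box<Clause>),
    Forall(Kind, Box<Clause>),
}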

Domain goals

Domain goals are the atoms of the trait logic. As can be seen in the definitions given above, general goals basically consist of combinations of domain goals.

Moreover, if we flatten the definition of clauses given previously a bit, we can see that clauses are always of the form:

forall<K1, ..., Kn> { DomainGoal :- Goal }

hence domain goals are in fact clauses' LHS. That is, at the most granular level, domain goals are what the trait solver will end up trying to prove.

To define the set of domain goals in our system, we need to first introduce a few simple formulations. A trait reference consists of the name of a trait along with a suitable set of inputs P0..Pn:

TraitRef = P0: TraitName<P1..Pn>

So, for example, u32: Display is a trait reference, as is Vec<T>: IntoIterator. Note that Rust surface syntax also permits some extra things, like associated type bindings (Vec<T>: IntoIterator<Item = T>), that are not part of a trait reference.

A projection consists of an associated item reference along with its inputs P0..Pm:

Projection = <P0 as TraitName<P1..Pn>>::AssocItem<Pn+1..Pm>

Given these, we can define a DomainGoal as follows:

DomainGoal = Holds(WhereClause)
            | FromEnv(TraitRef)
            | FromEnv(Type)
            | WellFormed(TraitRef)
            | WellFormed(Type)
            | Normalize(Projection -> Type)

WhereClause = Implemented(TraitRef)
            | ProjectionEq(Projection = Type)
            | Outlives(Type: Region)
            | Outlives(Region: Region)

WhereClause refers to a where clause that a Rust user would actually be able to write in a Rust program. This abstraction exists only as a convenience as we sometimes want to only deal with domain goals that are effectively writable in Rust.

Let's break down each one of these, one-by-one.

Implemented(TraitRef)

e.g. Implemented(i32: Copy)

True if the given trait is implemented for the given input types and lifetimes.

ProjectionEq(Projection = Type)

e.g. ProjectionEq(<T as Iterator>::Item = u8)

The given associated type Projection is equal to Type; this can be proved with either normalization or using placeholder associated types. See the section on associated types.

Normalize(Projection -> Type)

e.g. Normalize(<T as Iterator>::Item -> u8)

The given associated type Projection can be normalized to Type.

As discussed in the section on associated types, Normalize implies ProjectionEq, but not vice versa. In general, proving Normalize(<T as Trait>::Item -> U) also requires proving Implemented(T: Trait).

FromEnv(TraitRef)

e.g. FromEnv(Self: Add<i32>)

True if the inner TraitRef is assumed to be true, that is, if it can be derived from the in-scope where clauses.

For example, given the following function:


fn loud_clone<T: Clone>(stuff: &T) -> T {
    println!("cloning!");
    stuff.clone()
}

Inside the body of our function, we would have FromEnv(T: Clone). In-scope where clauses nest, so a function body inside an impl body inherits the impl body's where clauses, too.

This and the next rule are used to implement implied bounds. As we'll see in the section on lowering, FromEnv(TraitRef) implies Implemented(TraitRef), but not vice versa. This distinction is crucial to implied bounds.

FromEnv(Type)

e.g. FromEnv(HashSet<K>)

True if the inner Type is assumed to be well-formed, that is, if it is an input type of a function or an impl.

For example, given the following code:

struct HashSet<K> where K: Hash { ... }

fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
}

HashSet<K> is an input type of the loud_insert function. Hence, we assume it to be well-formed, so we would have FromEnv(HashSet<K>) inside the body of our function. As we'll see in the section on lowering, FromEnv(HashSet<K>) implies Implemented(K: Hash) because the HashSet declaration was written with a K: Hash where clause. Hence, we don't need to repeat that bound on the loud_insert function: we rather automatically assume that it is true.

WellFormed(Item)

These goals imply that the given item is well-formed.

We can talk about different types of items being well-formed:

  • Types, like WellFormed(Vec<i32>), which is true in Rust, or WellFormed(Vec<str>), which is not (because str is not Sized).

  • TraitRefs, like WellFormed(Vec<i32>: Clone).

Well-formedness is important to implied bounds. In particular, the reason it is okay to assume FromEnv(T: Clone) in the loud_clone example is that we also verify WellFormed(T: Clone) for each call site of loud_clone. Similarly, it is okay to assume FromEnv(HashSet<K>) in the loud_insert example because we will verify WellFormed(HashSet<K>) for each call site of loud_insert.

Outlives(Type: Region), Outlives(Region: Region)

e.g. Outlives(&'a str: 'b), Outlives('a: 'static)

True if the given type or region on the left outlives the right-hand region.

Coinductive goals

Most goals in our system are "inductive". In an inductive goal, circular reasoning is disallowed. Consider this example clause:

    Implemented(Foo: Bar) :-
        Implemented(Foo: Bar).

Considered inductively, this clause is useless: if we are trying to prove Implemented(Foo: Bar), we would then recursively have to prove Implemented(Foo: Bar), and that cycle would continue ad infinitum. (The trait solver will terminate here; it simply considers that Implemented(Foo: Bar) is not known to be true.)

However, some goals are co-inductive. Simply put, this means that cycles are OK. So, if Bar were a co-inductive trait, then the rule above would be perfectly valid, and it would indicate that Implemented(Foo: Bar) is true.

Auto traits are one example in Rust where co-inductive goals are used. Consider the Send trait, and imagine that we have this struct:


struct Foo {
    next: Option<Box<Foo>>
}

The default rules for auto traits say that Foo is Send if the types of its fields are Send. Therefore, we would have a rule like

Implemented(Foo: Send) :-
    Implemented(Option<Box<Foo>>: Send).

As you can probably imagine, proving that Option<Box<Foo>>: Send is going to wind up circularly requiring us to prove that Foo: Send again. So this would be an example where we wind up in a cycle – but that's OK: we do consider Foo: Send to hold, even though it references itself.

In general, co-inductive traits are used in Rust trait solving when we want to enumerate a fixed set of possibilities. In the case of auto traits, we are enumerating the set of reachable types from a given starting point (i.e., Foo can reach values of type Option<Box<Foo>>, which implies it can reach values of type Box<Foo>, and then of type Foo, and then the cycle is complete).

In addition to auto traits, WellFormed predicates are co-inductive. These are used to achieve a similar "enumerate all the cases" pattern, as described in the section on implied bounds.

Incomplete chapter

Some topics yet to be written:

  • Elaborate on the proof procedure
  • SLG solving – introduce negative reasoning

Equality and associated types

This section covers how the trait system handles equality between associated types. The full system consists of several moving parts, which we will introduce one by one:

  • Projection and the Normalize predicate
  • Placeholder associated type projections
  • The ProjectionEq predicate
  • Integration with unification

Associated type projection and normalization

When a trait defines an associated type (e.g., the Item type in the IntoIterator trait), that type can be referenced by the user using an associated type projection like <Option<u32> as IntoIterator>::Item.

Often, people will use the shorthand syntax T::Item. Presently, that syntax is expanded during "type collection" into the explicit form, though that is something we may want to change in the future.

In some cases, associated type projections can be normalized – that is, simplified – based on the types given in an impl. So, to continue with our example, the impl of IntoIterator for Option<T> declares (among other things) that Item = T:

impl<T> IntoIterator for Option<T> {
  type Item = T;
  ...
}

This means we can normalize the projection <Option<u32> as IntoIterator>::Item to just u32.

In this case, the projection was a "monomorphic" one – that is, it did not have any type parameters. Monomorphic projections are special because they can always be fully normalized.

Often, we can normalize other associated type projections as well. For example, <Option<?T> as IntoIterator>::Item, where ?T is an inference variable, can be normalized to just ?T.

In our logic, normalization is defined by a predicate Normalize. The Normalize clauses arise only from impls. For example, the impl of IntoIterator for Option<T> that we saw above would be lowered to a program clause like so:

forall<T> {
    Normalize(<Option<T> as IntoIterator>::Item -> T) :-
        Implemented(Option<T>: IntoIterator)
}

where in this case, the one Implemented condition is always true.

Since we do not permit quantification over traits, this is really more like a family of program clauses, one for each associated type.

We could apply that rule to normalize either of the examples that we've seen so far.
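
For example, instantiating that clause at T = u32 lets us normalize the monomorphic projection from before:

Normalize(<Option<u32> as IntoIterator>::Item -> u32) :-
    Implemented(Option<u32>: IntoIterator)  // holds, thanks to the impl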

Placeholder associated types

Sometimes, however, we want to work with associated types that cannot be normalized. For example, consider this function:

fn foo<T: IntoIterator>(...) { ... }

In this context, how would we normalize the type T::Item?

Without knowing what T is, we can't really do so. To represent this case, we introduce a type called a placeholder associated type projection. This is written like so: (IntoIterator::Item)<T>.

You may note that it looks a lot like a regular type (e.g., Option<T>), except that the "name" of the type is (IntoIterator::Item). This is not an accident: placeholder associated type projections work just like ordinary types like Vec<T> when it comes to unification. That is, they are only considered equal if (a) they are both references to the same associated type, like IntoIterator::Item and (b) their type arguments are equal.

Placeholder associated types are never written directly by the user. They are used internally by the trait system only, as we will see shortly.

In rustc, they correspond to the TyKind::UnnormalizedProjectionTy enum variant, declared in librustc_middle/ty/sty.rs. In chalk, we use an ApplicationTy with a name living in a special namespace dedicated to placeholder associated types (see the TypeName enum declared in chalk-ir/src/lib.rs).

Projection equality

So far we have seen two ways to answer the question of "When can we consider an associated type projection equal to another type?":

  • the Normalize predicate could be used to transform projections when we knew which impl applied;
  • placeholder associated types can be used when we don't. This is also known as lazy normalization.

We now introduce the ProjectionEq predicate to bring those two cases together. The ProjectionEq predicate looks like so:

ProjectionEq(<T as IntoIterator>::Item = U)

and we will see that it can be proven either via normalization or via the placeholder type. As part of lowering an associated type declaration from some trait, we create two program clauses for ProjectionEq:

forall<T, U> {
    ProjectionEq(<T as IntoIterator>::Item = U) :-
        Normalize(<T as IntoIterator>::Item -> U)
}

forall<T> {
    ProjectionEq(<T as IntoIterator>::Item = (IntoIterator::Item)<T>)
}

These are the only two ProjectionEq program clauses we ever make for any given associated item.

Integration with unification

Now we are ready to discuss how associated type equality integrates with unification. As described in the type inference section, unification is basically a procedure with a signature like this:

Unify(A, B) = Result<(Subgoals, RegionConstraints), NoSolution>

In other words, we try to unify two things A and B. That procedure might just fail, in which case we get back Err(NoSolution). This would happen, for example, if we tried to unify u32 and i32.

The key point is that, on success, unification can also give back to us a set of subgoals that still remain to be proven. (It can also give back region constraints, but those are not relevant here).

Whenever unification encounters a non-placeholder associated type projection P being equated with some other type T, it always succeeds, but it produces a subgoal ProjectionEq(P = T) that is propagated back up. Thus it falls to the ordinary workings of the trait system to process that constraint.
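
For example, unifying Vec<?X> against Vec<<T as IntoIterator>::Item> succeeds by equating the two element types, yielding a subgoal rather than failing outright:

Unify(Vec<?X>, Vec<<T as IntoIterator>::Item>)
    = Ok(([ProjectionEq(<T as IntoIterator>::Item = ?X)], no region constraints))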

If we unify two projections P1 and P2, then unification produces a variable X and asks us to prove that ProjectionEq(P1 = X) and ProjectionEq(P2 = X). (That used to be needed in an older system to prevent cycles; I rather doubt it still is. -nmatsakis)

Implied Bounds

Implied bounds remove the need to repeat where clauses written on a type declaration or a trait declaration. For example, say we have the following type declaration:

struct HashSet<K: Hash> {
    ...
}

then everywhere we use HashSet<K> as an "input" type, that is, appearing in the receiver type of an impl or in the arguments of a function, we don't want to have to repeat the where K: Hash bound, as in:

// I don't want to have to repeat `where K: Hash` here.
impl<K> HashSet<K> {
    ...
}

// Same here.
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
}

Note that in the loud_insert example, HashSet<K> is not the type of the set argument of loud_insert, it only appears in the argument type &mut HashSet<K>: we care about every type appearing in the function's header (the header is the signature without the return type), not only types of the function's arguments.

The rationale for applying implied bounds to input types is that, for example, in order to call the loud_insert function above, the programmer must have produced the type HashSet<K> already, hence the compiler already verified that HashSet<K> was well-formed, i.e. that K effectively implemented Hash, as in the following example:

fn main() {
    // I am producing a value of type `HashSet<i32>`.
    // If `i32` was not `Hash`, the compiler would report an error here.
    let mut set: HashSet<i32> = HashSet::new();
    loud_insert(&mut set, 5);
}

Hence, we don't want to repeat where clauses for input types because that would sort of duplicate the work of the programmer, having to verify that their types are well-formed both when calling the function and when using them in the arguments of their function. The same reasoning applies when using an impl.

Similarly, given the following trait declaration:

trait Copy where Self: Clone { // desugared version of `Copy: Clone`
    ...
}

then everywhere we bound over SomeType: Copy, we would like to be able to use the fact that SomeType: Clone without having to write it explicitly, as in:

fn loud_clone<T: Clone>(x: T) {
    println!("cloning!");
    x.clone();
}

fn fun_with_copy<T: Copy>(x: T) {
    println!("will clone a `Copy` type soon...");

    // I'm using `loud_clone<T: Clone>` with `T: Copy`, I know this
    // implies `T: Clone` so I don't want to have to write it explicitly.
    loud_clone(x);
}

The rationale for implied bounds for traits is that if a type implements Copy, that is, if there exists an impl Copy for that type, there ought to exist an impl Clone for that type, otherwise the compiler would have reported an error in the first place. So again, if we were forced to repeat the additional where SomeType: Clone everywhere even though we already know that SomeType: Copy holds, we would kind of duplicate the verification work.

Implied bounds are not yet completely enforced in rustc; at the moment they only work for outlives requirements, super trait bounds, and bounds on associated types. The full RFC can be found here. We'll give a brief view here of how implied bounds work and why we chose to implement them that way. The complete set of lowering rules can be found in the corresponding chapter.

Implied bounds and lowering rules

Now we need to express implied bounds in terms of logical rules. We will start by exposing a naive way to do it. Suppose that we have the following traits:

trait Foo {
    ...
}

trait Bar where Self: Foo {
    ...
}

So we would like to say that if a type implements Bar, then necessarily it must also implement Foo. We might think that a clause like this would work:

forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}

Now suppose that we just write this impl:

struct X;

impl Bar for X { }

Clearly this should not be allowed: indeed, we wrote a Bar impl for X, but the Bar trait requires that we also implement Foo for X, which we never did. In terms of what the compiler does, this would look like this:

struct X;

impl Bar for X {
    // We are in a `Bar` impl for the type `X`.
    // There is a `where Self: Foo` bound on the `Bar` trait declaration.
    // Hence I need to prove that `X` also implements `Foo` for that impl
    // to be legal.
}

So the compiler would try to prove Implemented(X: Foo). Of course it will not find any impl Foo for X since we did not write any. However, it will see our implied bound clause:

forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}

so that it may be able to prove Implemented(X: Foo) if Implemented(X: Bar) holds. And it turns out that Implemented(X: Bar) does hold since we wrote a Bar impl for X! Hence the compiler will accept the Bar impl while it should not.

Implied bounds coming from the environment

So the naive approach does not work. What we need to do is to somehow decouple implied bounds from impls. Suppose we know that a type SomeType<...> implements Bar and we want to deduce that SomeType<...> must also implement Foo.

There are two possibilities: first, we have enough information about SomeType<...> to see that there exists a Bar impl in the program which covers SomeType<...>, for example a plain impl<...> Bar for SomeType<...>. Then if the compiler has done its job correctly, there must exist a Foo impl which covers SomeType<...>, e.g. another plain impl<...> Foo for SomeType<...>. In that case then, we can just use this impl and we do not need implied bounds at all.

Second possibility: we do not know enough about SomeType<...> in order to find a Bar impl which covers it, for example if SomeType<...> is just a type parameter in a function:

fn foo<T: Bar>() {
    // We'd like to deduce `Implemented(T: Foo)`.
}

That is, the information that T implements Bar here comes from the environment. The environment is the set of things that we assume to be true when we type check some Rust declaration. In that case, what we assume is that T: Bar. Then at that point, we might authorize ourselves to have some kind of "local" implied bound reasoning which would say Implemented(T: Foo) :- Implemented(T: Bar). This reasoning would only be done within our foo function in order to avoid the earlier problem where we had a global clause.

We can apply this local reasoning everywhere we can have an environment -- i.e. when we can write where clauses -- that is, inside impls, trait declarations, and type declarations.

Computing implied bounds with FromEnv

The previous subsection showed that it is only useful to compute implied bounds for facts coming from the environment. We talked about "local" rules, but there are multiple possible strategies for implementing the locality of implied bounds.

In rustc, the current strategy is to elaborate bounds: that is, each time we have a fact in the environment, we recursively derive all the other things that are implied by this fact until we reach a fixed point. For example, if we have the following declarations:

trait A { }
trait B where Self: A { }
trait C where Self: B { }

fn foo<T: C>() {
    ...
}

then inside the foo function, we start with an environment containing only Implemented(T: C). Then because of implied bounds for the C trait, we elaborate Implemented(T: B) and add it to our environment. Because of implied bounds for the B trait, we elaborate Implemented(T: A) and add it to our environment as well. We cannot elaborate anything else, so we conclude that our final environment consists of Implemented(T: A + B + C).
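
A self-contained sketch of this elaboration loop (hypothetical representation: trait bounds as strings and a map from each trait to its direct superbounds; rustc's real elaborator works on predicates):

use std::collections::{HashMap, HashSet};

fn elaborate(start: &str, supers: &HashMap<&str, Vec<&str>>) -> HashSet<String> {
    let mut env = HashSet::new();
    let mut stack = vec![start.to_string()];
    // Keep deriving direct superbounds until nothing new is added.
    while let Some(tr) = stack.pop() {
        if env.insert(tr.clone()) {
            for s in supers.get(tr.as_str()).into_iter().flatten() {
                stack.push(s.to_string());
            }
        }
    }
    env
}

// With supers = {"C": ["B"], "B": ["A"], "A": []}, elaborate("C", &supers)
// yields {"A", "B", "C"}, i.e. Implemented(T: A + B + C).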

In the new-style trait system, we like to encode as many things as possible with logical rules. So rather than "elaborating", we have a set of global program clauses defined like so:

forall<T> { Implemented(T: A) :- FromEnv(T: A). }

forall<T> { Implemented(T: B) :- FromEnv(T: B). }
forall<T> { FromEnv(T: A) :- FromEnv(T: B). }

forall<T> { Implemented(T: C) :- FromEnv(T: C). }
forall<T> { FromEnv(T: B) :- FromEnv(T: C). }

So these clauses are defined globally (that is, they are available from everywhere in the program) but they cannot be used because the hypothesis is always of the form FromEnv(...) which is a bit special. Indeed, as indicated by the name, FromEnv(...) facts can only come from the environment. How it works is that in the foo function, instead of having an environment containing Implemented(T: C), we replace this environment with FromEnv(T: C). From here and thanks to the above clauses, we see that we are able to reach any of Implemented(T: A), Implemented(T: B) or Implemented(T: C), which is what we wanted.

Implied bounds and well-formedness checking

Implied bounds are tightly related with well-formedness checking. Well-formedness checking is the process of checking that the impls the programmer wrote are legal, what we referred to earlier as "the compiler doing its job correctly".

We already saw examples of illegal and legal impls:

trait Foo { }
trait Bar where Self: Foo { }

struct X;
struct Y;

impl Bar for X {
    // This impl is not legal: the `Bar` trait requires that we also
    // implement `Foo`, and we didn't.
}

impl Foo for Y {
    // This impl is legal: there is nothing to check as there are no where
    // clauses on the `Foo` trait.
}

impl Bar for Y {
    // This impl is legal: we have a `Foo` impl for `Y`.
}

We must define what "legal" and "illegal" mean. For this, we introduce another predicate: WellFormed(Type: Trait). We say that the trait reference Type: Trait is well-formed if Type meets the bounds written on the Trait declaration. For each impl we write, assuming that the where clauses declared on the impl hold, the compiler tries to prove that the corresponding trait reference is well-formed. The impl is legal if the compiler manages to do so.

Coming to the definition of WellFormed(Type: Trait), it would be tempting to define it as:

trait Trait where WC1, WC2, ..., WCn {
    ...
}
forall<Type> {
    WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn.
}

and indeed this was basically what was done in rustc until it was noticed that this mixed badly with implied bounds. The key thing is that implied bounds allow one to derive, transitively, all bounds implied by a fact in the environment, as we've seen with the A + B + C traits example. However, the WellFormed predicate as defined above only checks that the direct superbounds hold. That is, if we come back to our A + B + C example:

trait A { }
// No where clauses, always well-formed.
// forall<Type> { WellFormed(Type: A). }

trait B where Self: A { }
// We only check the direct superbound `Self: A`.
// forall<Type> { WellFormed(Type: B) :- Implemented(Type: A). }

trait C where Self: B { }
// We only check the direct superbound `Self: B`. We do not check
// the `Self: A` implied bound coming from the `Self: B` superbound.
// forall<Type> { WellFormed(Type: C) :- Implemented(Type: B). }

There is an asymmetry between the recursive power of implied bounds and the shallow checking of WellFormed. It turns out that this asymmetry can be exploited. Indeed, suppose that we define the following traits:

trait Partial where Self: Copy { }
// WellFormed(Self: Partial) :- Implemented(Self: Copy).

trait Complete where Self: Partial { }
// WellFormed(Self: Complete) :- Implemented(Self: Partial).

impl<T> Partial for T where T: Complete { }

impl<T> Complete for T { }

For the Partial impl, what the compiler must prove is:

forall<T> {
    if (T: Complete) { // assume that the where clauses hold
        WellFormed(T: Partial) // show that the trait reference is well-formed
    }
}

Proving WellFormed(T: Partial) amounts to proving Implemented(T: Copy). However, we have Implemented(T: Complete) in our environment: thanks to implied bounds, we can deduce Implemented(T: Partial). Using implied bounds one level deeper, we can deduce Implemented(T: Copy). Finally, the Partial impl is legal.

For the Complete impl, what the compiler must prove is:

forall<T> {
    WellFormed(T: Complete) // show that the trait reference is well-formed
}

Proving WellFormed(T: Complete) amounts to proving Implemented(T: Partial). We see that the impl Partial for T applies if we can prove Implemented(T: Complete), and it turns out we can prove this fact since our impl<T> Complete for T is a blanket impl without any where clauses.

So both impls are legal and the compiler accepts the program. Moreover, thanks to the Complete blanket impl, all types implement Complete. So we could now use this impl like so:

fn eat<T>(x: T) { }

fn copy_everything<T: Complete>(x: T) {
    eat(x);
    eat(x);
}

fn main() {
    let not_copiable = vec![1, 2, 3, 4];
    copy_everything(not_copiable);
}

In this program, we use the fact that Vec<i32> implements Complete, as does any other type. Hence we can call copy_everything with an argument of type Vec<i32>. Inside the copy_everything function, we have the Implemented(T: Complete) bound in our environment. Thanks to implied bounds, we can deduce Implemented(T: Partial). Using implied bounds again, we deduce Implemented(T: Copy), so we can indeed call the eat function twice, moving the argument twice, since it is supposedly Copy. Problem: the T type was in fact Vec<i32>, which is not Copy at all; hence we will double-free the underlying vec storage and we have memory unsoundness in safe Rust.

Of course, disregarding the asymmetry between WellFormed and implied bounds, this bug was possible only because we had some kind of self-referencing impls. But self-referencing impls are very useful in practice and are not the real culprits in this affair.

Co-inductiveness of WellFormed

So the solution is to fix this asymmetry between WellFormed and implied bounds. For that, we need the WellFormed predicate to require not only that the direct superbounds hold, but also that all the bounds transitively implied by the superbounds hold. What we can do is to have the following rules for the WellFormed predicate:

trait A { }
// WellFormed(Self: A) :- Implemented(Self: A).

trait B where Self: A { }
// WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A).

trait C where Self: B { }
// WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B).

Notice that we are now also requiring Implemented(Self: Trait) for WellFormed(Self: Trait) to be true: this is to simplify the process of traversing all the implied bounds transitively. This does not change anything when checking whether impls are legal, because since we assume that the where clauses hold inside the impl, we know that the corresponding trait references do hold. Thanks to this setup, you can see that we indeed require proving the set of all bounds transitively implied by the where clauses.

However there is still a catch. Suppose that we have the following trait definition:

trait Foo where <Self as Foo>::Item: Foo {
    type Item;
}

so this definition is a bit more involved than the ones we've seen already because it defines an associated item. However, the well-formedness rule would not be more complicated:

WellFormed(Self: Foo) :-
    Implemented(Self: Foo) &&
    WellFormed(<Self as Foo>::Item: Foo).

Now we would like to write the following impl:

impl Foo for i32 {
    type Item = i32;
}

The Foo trait definition and the impl Foo for i32 are perfectly valid Rust: we're kind of recursively using our Foo impl in order to show that the associated value indeed implements Foo, but that's ok. But if we translate this to our well-formedness setting, the compiler proof process inside the Foo impl is the following: it starts with proving that the well-formedness goal WellFormed(i32: Foo) is true. In order to do that, it must prove the following goals: Implemented(i32: Foo) and WellFormed(<i32 as Foo>::Item: Foo). Implemented(i32: Foo) holds because there is our impl and there are no where clauses on it so it's always true. However, because of the associated type value we used, WellFormed(<i32 as Foo>::Item: Foo) simplifies to just WellFormed(i32: Foo). So in order to prove its original goal WellFormed(i32: Foo), the compiler needs to prove WellFormed(i32: Foo): this clearly is a cycle, and cycles are usually rejected by the trait solver, unless the WellFormed predicate is made co-inductive.

A co-inductive predicate, as discussed in the chapter on goals and clauses, is a predicate for which the trait solver accepts cycles. In our setting, this would be a valid thing to do: indeed, the WellFormed predicate just serves as a way of enumerating all the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow the set of implied bounds until there is nothing more to add. Here, a cycle in the chain of WellFormed predicates just means that there are no more bounds to add in that direction, so we can just accept this cycle and focus on other directions. It's easy to prove that under these co-inductive semantics, we are effectively visiting all the transitive implied bounds, and only those.
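Since WellFormed effectively just enumerates the implied bounds, one way to picture these co-inductive semantics is as a worklist computation of the transitive superbounds, where hitting an already-visited trait (a cycle) simply adds nothing new. The following is only an illustrative sketch over traits-as-strings, not how the trait solver is implemented:

use std::collections::{HashMap, HashSet};

/// Grow the set of bounds transitively implied by `root`, given each
/// trait's *direct* superbounds, until nothing more can be added.
fn implied_bounds(direct: &HashMap<&str, Vec<&str>>, root: &str) -> HashSet<String> {
    let mut reached = HashSet::new();
    let mut worklist = vec![root];
    while let Some(t) = worklist.pop() {
        // An already-visited trait is not re-added: this is the harmless
        // "cycle" that co-induction accepts.
        if reached.insert(t.to_string()) {
            if let Some(supers) = direct.get(t) {
                worklist.extend(supers.iter().copied());
            }
        }
    }
    reached
}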

Implied bounds on types

We mainly talked about implied bounds for traits because this was the most subtle regarding implementation. Implied bounds on types are simpler, especially because if we assume that a type is well-formed, we don't use that fact to deduce that other types are well-formed, we only use it to deduce that e.g. some trait bounds hold.

For types, we just use rules like these ones:

struct Type<...> where WC1, ..., WCn {
    ...
}
forall<...> {
    WellFormed(Type<...>) :- WC1, ..., WCn.
}

forall<...> {
    FromEnv(WC1) :- FromEnv(Type<...>).
    ...
    FromEnv(WCn) :- FromEnv(Type<...>).
}

We can see that we have this asymmetry between well-formedness check, which only verifies that the direct superbounds hold, and implied bounds which gives access to all bounds transitively implied by the where clauses. In that case this is ok because as we said, we don't use FromEnv(Type<...>) to deduce other FromEnv(OtherType<...>) things, nor do we use FromEnv(Type: Trait) to deduce FromEnv(OtherType<...>) things. So in that sense type definitions are "less recursive" than traits, and we saw in a previous subsection that it was the combination of asymmetry and recursive trait / impls that led to unsoundness. As such, the WellFormed(Type<...>) predicate does not need to be co-inductive.

This asymmetry optimization is useful because in a real Rust program, we have to check the well-formedness of types very often (e.g. for each type which appears in the body of a function).

Region constraints

To be written.

Chalk does not have the concept of region constraints, and as of this writing, work on rustc had not progressed far enough to worry about them.

In the meantime, you can read about region constraints in the type inference section.

The lowering module in rustc

The program clauses described in the lowering rules section are actually created in the rustc_traits::lowering module.

The program_clauses_for query

The main entry point is the program_clauses_for query, which – given a DefId – produces a set of Chalk program clauses. The query is invoked on a DefId that identifies something like a trait, an impl, or an associated item definition. It then produces and returns a vector of program clauses.

Unit tests

Note: We've removed the Chalk unit tests in rust-lang/rust#69247. They will come back once we're ready to integrate next Chalk into rustc.

Here's a good example test. At the time of this writing, it looked like this:

#![feature(rustc_attrs)]

trait Foo { }

#[rustc_dump_program_clauses] //~ ERROR program clause dump
impl<T: 'static> Foo for T where T: Iterator<Item = i32> { }

fn main() {
    println!("hello");
}

The #[rustc_dump_program_clauses] annotation can be attached to anything with a DefId (It requires the rustc_attrs feature). The compiler will then invoke the program_clauses_for query on that item, and emit compiler errors that dump the clauses produced. These errors just exist for unit-testing. The stderr will be:

error: program clause dump
  --> $DIR/lower_impl.rs:5:1
   |
LL | #[rustc_dump_program_clauses]
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: forall<T> { Implemented(T: Foo) :- ProjectionEq(<T as std::iter::Iterator>::Item == i32), TypeOutlives(T: 'static), Implemented(T: std::iter::Iterator), Implemented(T: std::marker::Sized). }

Lowering rules

This section gives the complete lowering rules for Rust traits into program clauses. It is a kind of reference. These rules reference the domain goals defined in an earlier section.

Notation

The nonterminal Pi is used to mean some generic parameter, either a named lifetime like 'a or a type parameter like A.

The nonterminal Ai is used to mean some generic argument, which might be a lifetime like 'a or a type like Vec<A>.

When defining the lowering rules, we will give goals and clauses in the notation given in this section. We sometimes insert "macros" like LowerWhereClause! into these definitions; these macros reference other sections within this chapter.

Rule names and cross-references

Each of these lowering rules is given a name, documented with a comment like so:

// Rule Foo-Bar-Baz

The reference implementation of these rules is to be found in chalk/chalk-solve/src/clauses.rs. They have also been ported to rustc, in the librustc_traits crate.

Lowering where clauses

When used in a goal position, where clauses can be mapped directly to the Holds variant of domain goals, as follows:

  • A0: Foo<A1..An> maps to Implemented(A0: Foo<A1..An>)
  • T: 'r maps to Outlives(T, 'r)
  • 'a: 'b maps to Outlives('a, 'b)
  • A0: Foo<A1..An, Item = T> is a bit special and expands to two distinct goals, namely Implemented(A0: Foo<A1..An>) and ProjectionEq(<A0 as Foo<A1..An>>::Item = T)

In the rules below, we will use WC to indicate where clauses that appear in Rust syntax; we will then use the same WC to indicate where those where clauses appear as goals in the program clauses that we are producing. In that case, the mapping above is used to convert from the Rust syntax into goals.

Transforming the lowered where clauses

In addition, in the rules below, we sometimes do some transformations on the lowered where clauses, as defined here:

  • FromEnv(WC) – this indicates that:
    • Implemented(TraitRef) becomes FromEnv(TraitRef)
    • other where-clauses are left intact
  • WellFormed(WC) – this indicates that:
    • Implemented(TraitRef) becomes WellFormed(TraitRef)
    • other where-clauses are left intact

TODO: I suspect that we want to alter the outlives relations too, but Chalk isn't modeling those right now.

Lowering traits

Given a trait definition

trait Trait<P1..Pn> // P0 == Self
where WC
{
    // trait items
}

we will produce a number of declarations. This section is focused on the program clauses for the trait header (i.e., the stuff outside the {}); the section on trait items covers the stuff inside the {}.

Trait header

From the trait itself we mostly make "meta" rules that set up the relationships between different kinds of domain goals. The first such rule from the trait header creates the mapping between the FromEnv and Implemented predicates:

// Rule Implemented-From-Env
forall<Self, P1..Pn> {
  Implemented(Self: Trait<P1..Pn>) :- FromEnv(Self: Trait<P1..Pn>)
}

Implied bounds

The next few clauses have to do with implied bounds (see also RFC 2089 and the implied bounds chapter for more in-depth coverage). For each trait, we produce two clauses:

// Rule Implied-Bound-From-Trait
//
// For each where clause WC:
forall<Self, P1..Pn> {
  FromEnv(WC) :- FromEnv(Self: Trait<P1..Pn>)
}

This clause says that if we are assuming that the trait holds, then we can also assume that its where-clauses hold. It's perhaps useful to see an example:

trait Eq: PartialEq { ... }

In this case, the PartialEq supertrait is equivalent to a where Self: PartialEq where clause, in our simplified model. The program clause above therefore states that if we can prove FromEnv(T: Eq) – e.g., if we are in some function with T: Eq in its where clauses – then we also know that FromEnv(T: PartialEq). Thus the set of things that follow from the environment are not only the direct where clauses but also things that follow from them.
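Concretely, for the Eq example above (where the only where clause is Self: PartialEq), the rule instantiates to:

forall<Self> {
  FromEnv(Self: PartialEq) :- FromEnv(Self: Eq)
}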

The next rule is related; it defines what it means for a trait reference to be well-formed:

// Rule WellFormed-TraitRef
forall<Self, P1..Pn> {
  WellFormed(Self: Trait<P1..Pn>) :- Implemented(Self: Trait<P1..Pn>) && WellFormed(WC)
}

This WellFormed rule states that T: Trait is well-formed if (a) T: Trait is implemented and (b) all the where-clauses declared on Trait are well-formed (and hence they are implemented). Remember that the WellFormed predicate is coinductive; in this case, it is serving as a kind of "carrier" that allows us to enumerate all the where clauses that are transitively implied by T: Trait.

An example:

trait Foo: A + Bar { }
trait Bar: B + Foo { }
trait A { }
trait B { }

Here, the transitive set of implications for T: Foo are T: A, T: Bar, and T: B. And indeed if we were to try to prove WellFormed(T: Foo), we would have to prove each one of those:

  • WellFormed(T: Foo)
    • Implemented(T: Foo)
    • WellFormed(T: A)
      • Implemented(T: A)
    • WellFormed(T: Bar)
      • Implemented(T: Bar)
      • WellFormed(T: B)
        • Implemented(T: B)
      • WellFormed(T: Foo) -- cycle, true coinductively

This WellFormed predicate is only used when proving that impls are well-formed – basically, for each impl of some trait ref TraitRef, we must show that WellFormed(TraitRef). This in turn justifies the implied bounds rules that allow us to extend the set of FromEnv items.

Lowering type definitions

We also want to have some rules which define when a type is well-formed. For example, given this type:

struct Set<K> where K: Hash { ... }

then Set<i32> is well-formed because i32 implements Hash, but Set<NotHash> would not be well-formed. Basically, a type is well-formed if its parameters verify the where clauses written on the type definition.

Hence, for every type definition:

struct Type<P1..Pn> where WC { ... }

we produce the following rule:

// Rule WellFormed-Type
forall<P1..Pn> {
  WellFormed(Type<P1..Pn>) :- WC
}

Note that we use struct for defining a type, but this should be understood as a general type definition (it could be e.g. a generic enum).

Conversely, we define rules which say that if we assume that a type is well-formed, we can also assume that its where clauses hold. That is, we produce the following family of rules:

// Rule Implied-Bound-From-Type
//
// For each where clause `WC`
forall<P1..Pn> {
  FromEnv(WC) :- FromEnv(Type<P1..Pn>)
}

As per the implied bounds RFC, functions will assume that their arguments are well-formed. For example, suppose we have the following bit of code:

trait Hash: Eq { }
struct Set<K: Hash> { ... }

fn foo<K>(collection: Set<K>, x: K, y: K) {
    // `x` and `y` can be equalized even if we did not explicitly write
    // `where K: Eq`
    if x == y {
        ...
    }
}

In the foo function, we assume that Set<K> is well-formed, i.e. we have FromEnv(Set<K>) in our environment. Because of the previous rule, we get FromEnv(K: Hash) without needing an explicit where clause. And because of the Hash trait definition, there also exists a rule which says:

forall<K> {
  FromEnv(K: Eq) :- FromEnv(K: Hash)
}

which means that we finally get FromEnv(K: Eq) and then can compare x and y without needing an explicit where clause.

Lowering trait items

Associated type declarations

Given a trait that declares a (possibly generic) associated type:

trait Trait<P1..Pn> // P0 == Self
where WC
{
    type AssocType<Pn+1..Pm>: Bounds where WC1;
}

We will produce a number of program clauses. The first two define the rules by which ProjectionEq can succeed; these two clauses are discussed in detail in the section on associated types, but reproduced here for reference:

// Rule ProjectionEq-Normalize
//
// ProjectionEq can succeed by normalizing:
forall<Self, P1..Pn, Pn+1..Pm, U> {
  ProjectionEq(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> = U) :-
      Normalize(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> U)
}
// Rule ProjectionEq-Placeholder
//
// ProjectionEq can succeed through the placeholder associated type,
// see "associated type" chapter for more:
forall<Self, P1..Pn, Pn+1..Pm> {
  ProjectionEq(
    <Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> =
    (Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>
  )
}

The next rule covers implied bounds for the projection. In particular, the Bounds declared on the associated type must have been proven to hold to show that the impl is well-formed, and hence we can rely on them elsewhere.

// Rule Implied-Bound-From-AssocTy
//
// For each `Bound` in `Bounds`:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm>: Bound) :-
      FromEnv(Self: Trait<P1..Pn>) && WC1
}

Next, we define the requirements for an instantiation of our associated type to be well-formed...

// Rule WellFormed-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    WellFormed((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>) :-
      Implemented(Self: Trait<P1..Pn>) && WC1
}

...along with the reverse implications, when we can assume that it is well-formed.

// Rule Implied-WC-From-AssocTy
//
// For each where clause WC1:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(WC1) :- FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}
// Rule Implied-Trait-From-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(Self: Trait<P1..Pn>) :-
      FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}

Lowering function and constant declarations

Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. See the section on function/constant values below for more details.

Lowering impls

Given an impl of a trait:

impl<P0..Pn> Trait<A1..An> for A0
where WC
{
    // zero or more impl items
}

Let TraitRef be the trait reference A0: Trait<A1..An>. Then we will create the following rules:

// Rule Implemented-From-Impl
forall<P0..Pn> {
  Implemented(TraitRef) :- WC
}

In addition, we will lower all of the impl items.

Lowering impl items

Associated type values

Given an impl that contains:

impl<P0..Pn> Trait<P1..Pn> for P0
where WC_impl
{
    type AssocType<Pn+1..Pm> = T;
}

and our where clause WC1 on the trait associated type from above, we produce the following rule:

// Rule Normalize-From-Impl
forall<P0..Pm> {
  forall<Pn+1..Pm> {
    Normalize(<P0 as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> T) :-
      Implemented(P0 as Trait) && WC1
  }
}

Note that WC_impl and WC1 both encode where-clauses that the impl can rely on. (WC_impl is not used here, because it is implied by Implemented(P0 as Trait).)

Function and constant values

Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. This presumably involves adding a new kind of parameter (constant), and then having a NormalizeValue domain goal. This is to be written because the details are a bit up in the air.

Well-formedness checking

WF checking has the job of checking that the various declarations in a Rust program are well-formed. This is the basis for implied bounds, and partly for that reason, this checking can be surprisingly subtle! For example, we have to be sure that each impl proves the WF conditions declared on the trait.

For each declaration in a Rust program, we will generate a logical goal and try to prove it using the lowered rules we described in the lowering rules chapter. If we are able to prove it, we say that the construct is well-formed. If not, we report an error to the user.

Well-formedness checking happens in the chalk/chalk-solve/src/wf.rs module in chalk. After you have read this chapter, you may find it useful to see an extended set of examples in the chalk/tests/test/wf_lowering.rs submodule.

The new-style WF checking has not been implemented in rustc yet.

We give here a complete reference of the generated goals for each Rust declaration.

In addition to the notations introduced in the chapter about lowering rules, we'll introduce another notation: when checking WF of a declaration, we'll often have to prove that all types that appear are well-formed, except type parameters that we always assume to be WF. Hence, we'll use the following notation: for a type SomeType<...>, we define InputTypes(SomeType<...>) to be the set of all non-parameter types appearing in SomeType<...>, including SomeType<...> itself.

Examples:

  • InputTypes((u32, f32)) = [u32, f32, (u32, f32)]
  • InputTypes(Box<T>) = [Box<T>] (assuming that T is a type parameter)
  • InputTypes(Box<Box<T>>) = [Box<T>, Box<Box<T>>]

We also extend the InputTypes notation to where clauses in the natural way. So, for example InputTypes(A0: Trait<A1,...,An>) is the union of InputTypes(A0), InputTypes(A1), ..., InputTypes(An).
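Purely to illustrate this notation, here is a toy version of InputTypes over a drastically simplified type representation (this models the chapter's notation only, not rustc's actual Ty type):

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Param(String),           // a type parameter like `T`
    Apply(String, Vec<Ty>),  // an applied type like `Box<T>` or `(u32, f32)`
}

/// Collect every non-parameter type appearing in `ty`, including `ty`
/// itself. The result is a set, so the order in which we push is irrelevant.
fn input_types(ty: &Ty, out: &mut Vec<Ty>) {
    match ty {
        // Type parameters are always assumed to be well-formed: skip them.
        Ty::Param(_) => { }
        Ty::Apply(_, args) => {
            out.push(ty.clone());
            for arg in args {
                input_types(arg, out);
            }
        }
    }
}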

Type definitions

Given a general type definition:

struct Type<P...> where WC_type {
    field1: A1,
    ...
    fieldn: An,
}

we generate the following goal, which represents its well-formedness condition:

forall<P...> {
    if (FromEnv(WC_type)) {
        WellFormed(InputTypes(WC_type)) &&
            WellFormed(InputTypes(A1)) &&
            ...
            WellFormed(InputTypes(An))
    }
}

which in English states: assuming that the where clauses defined on the type hold, prove that every type appearing in the type definition is well-formed.

Some examples:

struct OnlyClone<T> where T: Clone {
    clonable: T,
}
// The only types appearing are type parameters: we have nothing to check,
// the type definition is well-formed.

struct Foo<T> where T: Clone {
    foo: OnlyClone<T>,
}
// The only non-parameter type which appears in this definition is
// `OnlyClone<T>`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Clone)) {
//          WellFormed(OnlyClone<T>)
//     }
// }
// ```
// which is provable.

struct Bar<T> where <T as Iterator>::Item: Debug {
    bar: i32,
}
// The only non-parameter types which appear in this definition are
// `<T as Iterator>::Item` and `i32`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Debug)) {
//          WellFormed(<T as Iterator>::Item) &&
//               WellFormed(i32)
//     }
// }
// ```
// which is not provable since `WellFormed(<T as Iterator>::Item)` requires
// proving `Implemented(T: Iterator)`, and we are unable to prove that for an
// unknown `T`.
//
// Hence, this type definition is considered illegal. An additional
// `where T: Iterator` would make it legal.

Trait definitions

Given a general trait definition:

trait Trait<P1...> where WC_trait {
    type Assoc<P2...>: Bounds_assoc where WC_assoc;
}

we generate the following goal:

forall<P1...> {
    if (FromEnv(WC_trait)) {
        WellFormed(InputTypes(WC_trait)) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                    WellFormed(InputTypes(Bounds_assoc)) &&
                        WellFormed(InputTypes(WC_assoc))
                }
            }
    }
}

There is not much to verify in a trait definition. We just want to prove that the types appearing in the trait definition are well-formed, under the assumption that the different where clauses hold.

Some examples:

trait Foo<T> where T: Iterator, <T as Iterator>::Item: Debug {
    ...
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator), FromEnv(<T as Iterator>::Item: Debug)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is provable thanks to the `FromEnv(T: Iterator)` assumption.

trait Bar {
    type Assoc<T>: From<<T as Iterator>::Item>;
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     WellFormed(<T as Iterator>::Item)
// }
// ```
// which is not provable, hence the trait definition is considered illegal.

trait Baz {
    type Assoc<T>: From<<T as Iterator>::Item> where T: Iterator;
}
// The generated goal is now:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is now provable.

Impls

Now we give ourselves a general impl for the trait defined above:

impl<P1...> Trait<A1...> for SomeType<A2...> where WC_impl {
    type Assoc<P2...> = SomeValue<A3...> where WC_assoc;
}

Note that here, WC_assoc are the same where clauses as those defined on the associated type definition in the trait declaration, except that type parameters from the trait are substituted with values provided by the impl (see the example below). You cannot add new where clauses. You may omit the where clauses entirely if you want to emphasize that you are not relying on them.

Some examples to illustrate that:

trait Foo<T> {
    type Assoc where T: Clone;
}

struct OnlyClone<T: Clone> { ... }

impl<U> Foo<Option<U>> for () {
    // We substitute type parameters from the trait by the ones provided
    // by the impl, that is instead of having a `T: Clone` where clause,
    // we have an `Option<U>: Clone` one.
    type Assoc = OnlyClone<Option<U>> where Option<U>: Clone;
}

impl<T> Foo<T> for i32 {
    // I'm not using the `T: Clone` where clause from the trait, so I can
    // omit it.
    type Assoc = u32;
}

impl<T> Foo<T> for f32 {
    type Assoc = OnlyClone<Option<T>> where Option<T>: Clone;
    //                                ^^^^^^^^^^^^^^^^^^^^^^
    //                                this where clause does not exist
    //                                on the original trait decl: illegal
}

So in Rust, where clauses on associated types work exactly like where clauses on trait methods: in an impl, we must substitute the parameters from the traits with values provided by the impl, we may omit them if we don't need them, but we cannot add new where clauses.

Now let's see the generated goal for this general impl:

forall<P1...> {
    // Well-formedness of types appearing in the impl
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(InputTypes(WC_impl)) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                        WellFormed(InputTypes(SomeValue<A3...>))
                }
            }
    }

    // Implied bounds checking
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(SomeType<A2...>: Trait<A1...>) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                    WellFormed(SomeValue<A3...>: Bounds_assoc)
                }
            }
    }
}

This is the most complex goal. As always, we first assume that the various where clauses hold and prove that every type appearing in the impl is well-formed, except for types appearing in the impl header SomeType<A2...>: Trait<A1...>. Instead, we assume that those types are well-formed (hence the if (FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) conditions). This is part of the implied bounds proposal, so that we can rely on the bounds written on the definition of e.g. the SomeType<A2...> type (and don't need to repeat those bounds).

Note that we don't need to check well-formedness of types appearing in WC_assoc because we already did that in the trait declaration (they are just repeated with some substitutions of values which we already assume to be well-formed).

Next, still assuming that the where clauses on the impl WC_impl hold and that the input types of SomeType<A2...> are well-formed, we prove that WellFormed(SomeType<A2...>: Trait<A1...>) holds. That is, we want to prove that SomeType<A2...> verifies all the where clauses that might transitively be required by the Trait definition (see this subsection).

Lastly, assuming in addition that the where clauses on the associated type WC_assoc hold, we prove that WellFormed(SomeValue<A3...>: Bounds_assoc) holds. Again, we are not only proving Implemented(SomeValue<A3...>: Bounds_assoc), but also all the facts that might transitively come from Bounds_assoc. We must do this because we allow the use of implied bounds on associated types: if we have FromEnv(SomeType: Trait) in our environment, the lowering rules chapter indicates that we are able to deduce FromEnv(<SomeType as Trait>::Assoc: Bounds_assoc) without knowing what the precise value of <SomeType as Trait>::Assoc is.

Some examples for the generated goal:

// Trait Program Clauses

// These are program clauses that come from the trait definitions below
// and that the trait solver can use for its reasonings. I'm just restating
// them here so that we have them in mind.

trait Copy { }
// `WellFormed(Self: Copy) :- Implemented(Self: Copy).`

trait Partial where Self: Copy { }
// ```
// WellFormed(Self: Partial) :-
//     Implemented(Self: Partial) &&
//     WellFormed(Self: Copy).
// ```

trait Complete where Self: Partial { }
// ```
// WellFormed(Self: Complete) :-
//     Implemented(Self: Complete) &&
//     WellFormed(Self: Partial).
// ```

// Impl WF Goals

impl<T> Partial for T where T: Complete { }
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Complete)) {
//         WellFormed(T: Partial)
//     }
// }
// ```
// Then proving `WellFormed(T: Partial)` amounts to proving
// `Implemented(T: Partial)` and `Implemented(T: Copy)`.
// Both those facts can be deduced from the `FromEnv(T: Complete)` in our
// environment: this impl is legal.

impl<T> Complete for T { }
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(T: Complete)
// }
// ```
// Then proving `WellFormed(T: Complete)` amounts to proving
// `Implemented(T: Complete)`, `Implemented(T: Partial)` and
// `Implemented(T: Copy)`.
//
// `Implemented(T: Complete)` can be proved thanks to the
// `impl<T> Complete for T` blanket impl.
//
// `Implemented(T: Partial)` can be proved thanks to the
// `impl<T> Partial for T where T: Complete` impl and because we know
// `T: Complete` holds.
//
// However, `Implemented(T: Copy)` cannot be proved: the impl is illegal.
// An additional `where T: Copy` bound would be sufficient to make that impl
// legal.

trait Bar { }

impl<T> Bar for T where <T as Iterator>::Item: Bar { }
// We have a non-parameter type appearing in the where clauses:
// `<T as Iterator>::Item`. The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Bar)) {
//         WellFormed(T: Bar) &&
//             WellFormed(<T as Iterator>::Item: Bar)
//     }
// }
// ```
// And `WellFormed(<T as Iterator>::Item: Bar)` is not provable: we'd need
// an additional `where T: Iterator` for example.

trait Foo { }

trait Bar {
    type Item: Foo;
}

struct Stuff<T> { }

impl<T> Bar for Stuff<T> where T: Foo {
    type Item = T;
}
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Foo)) {
//         WellFormed(T: Foo).
//     }
// }
// ```
// which is provable.

trait Debug { ... }
// `WellFormed(Self: Debug) :- Implemented(Self: Debug).`

struct Box<T> { ... }
impl<T> Debug for Box<T> where T: Debug { ... }

trait PointerFamily {
    type Pointer<T>: Debug where T: Debug;
}
// `WellFormed(Self: PointerFamily) :- Implemented(Self: PointerFamily).`

struct BoxFamily;

impl PointerFamily for BoxFamily {
    type Pointer<T> = Box<T> where T: Debug;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(BoxFamily: PointerFamily) &&
//
//     if (FromEnv(T: Debug)) {
//         WellFormed(Box<T>: Debug) &&
//             WellFormed(Box<T>)
//     }
// }
// ```
// `WellFormed(BoxFamily: PointerFamily)` amounts to proving
// `Implemented(BoxFamily: PointerFamily)`, which is ok thanks to our impl.
//
// `WellFormed(Box<T>)` is always true (there are no where clauses on the
// `Box` type definition).
//
// Moreover, we have an `impl<T: Debug> Debug for Box<T>`, hence
// we can prove `WellFormed(Box<T>: Debug)` and the impl is indeed legal.

trait Foo {
    type Assoc<T>;
}

struct OnlyClone<T: Clone> { ... }

impl Foo for i32 {
    type Assoc<T> = OnlyClone<T>;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(i32: Foo) &&
//        WellFormed(OnlyClone<T>)
// }
// ```
// however `WellFormed(OnlyClone<T>)` is not provable because it requires
// `Implemented(T: Clone)`. It would be tempting to just add a `where T: Clone`
// bound inside the `impl Foo for i32` block, however we saw that it was
// illegal to add where clauses that didn't come from the trait definition.

Canonical queries

The "start" of the trait system is the canonical query (these are both queries in the more general sense of the word – something you would like to know the answer to – and in the rustc-specific sense). The idea is that the type checker or other parts of the system, may in the course of doing their thing want to know whether some trait is implemented for some type (e.g., is u32: Debug true?). Or they may want to normalize some associated type.

This section covers queries at a fairly high level of abstraction. The subsections look a bit more closely at how these ideas are implemented in rustc.

The traditional, interactive Prolog query

In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So given something like this:

?- Vec<i32>: AsRef<?U>

The solver might answer:

Vec<i32>: AsRef<[i32]>
    continue? (y/n)

This continue bit is interesting. The idea in Prolog is that the solver is finding all possible instantiations of your query that are true. In this case, if we instantiate ?U = [i32], then the query is true (note that a traditional Prolog interface does not, directly, tell us a value for ?U, but we can infer one by unifying the response with our original query – Rust's solver gives back a substitution instead). If we were to hit y, the solver might then give us another possible answer:

Vec<i32>: AsRef<Vec<i32>>
    continue? (y/n)

This answer derives from the fact that there is a reflexive impl (impl<T> AsRef<T> for T) for AsRef. If we were to hit y again, then we might get back a negative response:

no

Naturally, in some cases, there may be no possible answers, and hence the solver will just give me back no right away:

?- Box<i32>: Copy
    no

In some cases, there might be an infinite number of responses. So for example if I gave this query, and I kept hitting y, then the solver would never stop giving me back answers:

?- Vec<?U>: Clone
    Vec<i32>: Clone
        continue? (y/n)
    Vec<Box<i32>>: Clone
        continue? (y/n)
    Vec<Box<Box<i32>>>: Clone
        continue? (y/n)
    Vec<Box<Box<Box<i32>>>>: Clone
        continue? (y/n)

As you can imagine, the solver will gleefully keep adding another layer of Box until we ask it to stop, or it runs out of memory.

Another interesting thing is that queries might still have variables in them. For example:

?- Rc<?T>: Clone

might produce the answer:

Rc<?T>: Clone
    continue? (y/n)

After all, Rc<?T>: Clone is true no matter what type ?T is.

A trait query in rustc

The trait queries in rustc work somewhat differently. Instead of trying to enumerate all possible answers for you, they are looking for an unambiguous answer. In particular, when they tell you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable. (Internally within the solver, though, they can potentially enumerate all possible answers. See the description of the SLG solver for details.)

The response to a trait query in rustc is typically a Result<QueryResult<T>, NoSolution> (where the T will vary a bit depending on the query itself). The Err(NoSolution) case indicates that the query was false and had no answers (e.g., Box<i32>: Copy). Otherwise, the QueryResult gives back information about the possible answer(s) we did find. It consists of four parts (a sketch of the overall shape follows the list):

  • Certainty: tells you how sure we are of this answer. It can have two values:
    • Proven means that the result is known to be true.
      • This might be the result for trying to prove Vec<i32>: Clone, say, or Rc<?T>: Clone.
    • Ambiguous means that there were things we could not yet prove to be either true or false, typically because more type information was needed. (We'll see an example shortly.)
      • This might be the result for trying to prove Vec<?T>: Clone.
  • Var values: Values for each of the unbound inference variables (like ?T) that appeared in your original query. (Remember that in Prolog, we had to infer these.)
    • As we'll see in the example below, we can get back var values even for Ambiguous cases.
  • Region constraints: these are relations that must hold between the lifetimes that you supplied as inputs. We'll ignore these here, but see the section on handling regions in traits for more details.
  • Value: The query result also comes with a value of type T. For some specialized queries – like normalizing associated types – this is used to carry back an extra result, but it's often just ().
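To summarize the shape just described, here is a hedged sketch in Rust (the names are simplified stand-ins for exposition, not rustc's exact definitions):

// Placeholder types, for this sketch only.
type VarValue = String;
type RegionConstraint = String;

struct NoSolution;

enum Certainty {
    Proven,
    Ambiguous,
}

/// The overall response is a `Result<QueryResult<T>, NoSolution>`.
struct QueryResult<T> {
    certainty: Certainty,
    /// One entry per unbound inference variable (like `?T`) in the query.
    var_values: Vec<VarValue>,
    /// Lifetime relations that must hold between the input regions.
    region_constraints: Vec<RegionConstraint>,
    /// Extra payload; for many queries this is just `()`.
    value: T,
}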

Examples

Let's work through an example query to see what all the parts mean. Consider the Borrow trait. This trait has a number of impls; among them, there are these two (for clarity, I've written the Sized bounds explicitly):

impl<T> Borrow<T> for T where T: ?Sized
impl<T> Borrow<[T]> for Vec<T> where T: Sized

Example 1. Imagine we are type-checking this (rather artificial) bit of code:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // Example 1: requires `Vec<?T>: Borrow<?U>`
    ...
}

As the comments indicate, we first create two variables t and u; t is an empty vector and u is a None option. Both of these variables have unbound inference variables in their type: ?T represents the elements in the vector t and ?U represents the value stored in the option u. Next, we invoke foo; comparing the signature of foo to its arguments, we wind up with A = Vec<?T> and B = ?U. Therefore, the where clause on foo requires that Vec<?T>: Borrow<?U>. This is thus our first example trait query.

There are many possible solutions to the query Vec<?T>: Borrow<?U>; for example:

  • ?U = Vec<?T>,
  • ?U = [?T],
  • ?T = u32, ?U = [u32]
  • and so forth.

Therefore, the result we get back would be as follows (I'm going to ignore region constraints and the "value"):

  • Certainty: Ambiguous – we're not sure yet if this holds
  • Var values: [?T = ?T, ?U = ?U] – we learned nothing about the values of the variables

In short, the query result says that it is too soon to say much about whether this trait is proven. During type-checking, this is not an immediate error: instead, the type checker would hold on to this requirement (Vec<?T>: Borrow<?U>) and wait. As we'll see in the next example, it may happen that ?T and ?U wind up constrained from other sources, in which case we can try the trait query again.

Example 2. We can now extend our previous example a bit, and assign a value to u:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    // What we saw before:
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // `Vec<?T>: Borrow<?U>` => ambiguous

    // New stuff:
    u = Some(vec![]); // ?U = Vec<?V>
}

As a result of this assignment, the type of u is forced to be Option<Vec<?V>>, where ?V represents the element type of the vector. This in turn implies that ?U is unified to Vec<?V>.

Let's suppose that the type checker decides to revisit the "as-yet-unproven" trait obligation we saw before, Vec<?T>: Borrow<?U>. ?U is no longer an unbound inference variable; it now has a value, Vec<?V>. So, if we "refresh" the query with that value, we get:

Vec<?T>: Borrow<Vec<?V>>

This time, there is only one impl that applies, the reflexive impl:

impl<T> Borrow<T> for T where T: ?Sized

Therefore, the trait checker will answer:

  • Certainty: Proven
  • Var values: [?T = ?T, ?V = ?T]

Here, it is saying that we have indeed proven that the obligation holds, and we also know that ?T and ?V are the same type (but we don't know what that type is yet!).

(In fact, as the function ends here, the type checker would give an error at this point, since the element types of t and u are still not yet known, even though they are known to be the same.)

Canonicalization

Canonicalization is the process of isolating an inference value from its context. It is a key part of implementing canonical queries, and you may wish to read the parent chapter to get more context.

Canonicalization is really based on a very simple concept: every inference variable is always in one of two states: either it is unbound, in which case we don't know yet what type it is, or it is bound, in which case we do. So to isolate some data-structure T that contains types/regions from its environment, we just walk down and find the unbound variables that appear in T; those variables get replaced with "canonical variables", starting from zero and numbered in a fixed order (left to right, for the most part, but really it doesn't matter as long as it is consistent).

So, for example, if we have the type X = (?T, ?U), where ?T and ?U are distinct, unbound inference variables, then the canonical form of X would be (?0, ?1), where ?0 and ?1 represent these canonical placeholders. Note that the type Y = (?U, ?T) also canonicalizes to (?0, ?1). But the type Z = (?T, ?T) would canonicalize to (?0, ?0) (as would (?U, ?U)). In other words, the exact identity of the inference variables is not important – unless they are repeated.

We use this to improve caching as well as to detect cycles and other things during trait resolution. Roughly speaking, the idea is that if two trait queries have the same canonical form, then they will get the same answer. That answer will be expressed in terms of the canonical variables (?0, ?1), which we can then map back to the original variables (?T, ?U).
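A toy version of this variable-numbering walk, under the simplifying assumption that a type is just a tree with inference variables at its leaves (this is not rustc's actual canonicalizer):

use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Infer(u32),      // an unbound inference variable like `?T`
    Canonical(u32),  // a canonical variable like `?0`
    Tuple(Vec<Ty>),
}

/// Replace each distinct inference variable with `?0`, `?1`, ... in order
/// of first appearance, so `(?T, ?U)` and `(?U, ?T)` both canonicalize to
/// `(?0, ?1)`, while `(?T, ?T)` canonicalizes to `(?0, ?0)`.
fn canonicalize(ty: &Ty, map: &mut HashMap<u32, u32>) -> Ty {
    match ty {
        Ty::Infer(v) => {
            let next = map.len() as u32;
            Ty::Canonical(*map.entry(*v).or_insert(next))
        }
        Ty::Canonical(c) => Ty::Canonical(*c),
        Ty::Tuple(tys) => Ty::Tuple(tys.iter().map(|t| canonicalize(t, map)).collect()),
    }
}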

Canonicalizing the query

To see how it works, imagine that we are asking to solve the following trait query: ?A: Foo<'static, ?B>, where ?A and ?B are unbound. This query contains two unbound variables, but it also contains the lifetime 'static. The trait system generally ignores all lifetimes and treats them equally, so when canonicalizing, we will also replace any free lifetime with a canonical variable (Note that 'static is actually a free lifetime variable here. We are not considering it in the typing context of the whole program but only in the context of this trait reference. Mathematically, we are not quantifying over the whole program, but only this obligation). Therefore, we get the following result:

?0: Foo<'?1, ?2>

Sometimes we write this differently, like so:

for<T,L,T> { ?0: Foo<'?1, ?2> }

This for<> gives some information about each of the canonical variables within. In this case, each T indicates a type variable, so ?0 and ?2 are types; the L indicates a lifetime variable, so ?1 is a lifetime. The canonicalize method also gives back a CanonicalVarValues array OV with the "original values" for each canonicalized variable:

[?A, 'static, ?B]

We'll need this vector OV later, when we process the query response.

Executing the query

Once we've constructed the canonical query, we can try to solve it. To do so, we will wind up creating a fresh inference context and instantiating the canonical query in that context. The idea is that we create a substitution S from the canonical form containing a fresh inference variable (of suitable kind) for each canonical variable. So, for our example query:

for<T,L,T> { ?0: Foo<'?1, ?2> }

the substitution S might be:

S = [?A, '?B, ?C]

We can then replace the bound canonical variables (?0, etc) with these inference variables, yielding the following fully instantiated query:

?A: Foo<'?B, ?C>

Remember that substitution S though! We're going to need it later.

OK, now that we have a fresh inference context and an instantiated query, we can go ahead and try to solve it. The trait solver itself is explained in more detail in another section, but suffice to say that it will compute a certainty value (Proven or Ambiguous) and have side-effects on the inference variables we've created. For example, if there were only one impl of Foo, like so:

impl<'a, X> Foo<'a, X> for Vec<X>
where X: 'a
{ ... }

then we might wind up with a certainty value of Proven, as well as creating fresh inference variables '?D and ?E (to represent the parameters on the impl) and unifying as follows:

  • '?B = '?D
  • ?A = Vec<?E>
  • ?C = ?E

We would also accumulate the region constraint ?E: '?D, due to the where clause.

In order to create our final query result, we have to "lift" these values out of the query's inference context and into something that can be reapplied in our original inference context. We do that by re-applying canonicalization, but to the query result.

Canonicalizing the query result

As discussed in the parent section, most trait queries wind up with a result that brings together a "certainty value" certainty, a result substitution var_values, and some region constraints. To create this, we wind up re-using the substitution S that we created when first instantiating our query. To refresh your memory, we had a query

for<T,L,T> { ?0: Foo<'?1, ?2> }

for which we made a substitution S:

S = [?A, '?B, ?C]

We then did some work which unified some of those variables with other things. If we "refresh" S with the latest results, we get:

S = [Vec<?E>, '?D, ?E]

These are precisely the new values for the three input variables from our original query. Note though that they include some new variables (like ?E). We can make those go away by canonicalizing again! We don't just canonicalize S, though, we canonicalize the whole query response QR:

QR = {
    certainty: Proven,             // or whatever
    var_values: [Vec<?E>, '?D, ?E] // this is S
    region_constraints: [?E: '?D], // from the impl
    value: (),                     // for our purposes, just (), but
                                   // in some cases this might have
                                   // a type or other info
}

The result would be as follows:

Canonical(QR) = for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0]
    region_constraints: [?0: '?1],
    value: (),
}

(One subtle point: when we canonicalize the query result, we do not use any special treatment for free lifetimes. Note that both references to '?D, for example, were converted into the same canonical variable (?1). This is in contrast to the original query, where we canonicalized every free lifetime into a fresh canonical variable.)

Now, this result must be reapplied in each context where needed.

Processing the canonicalized query result

In the previous section we produced a canonical query result. We now have to apply that result in our original context. If you recall, way back in the beginning, we were trying to prove this query:

?A: Foo<'static, ?B>

We canonicalized that into this:

for<T,L,T> { ?0: Foo<'?1, ?2> }

and now we got back a canonical response:

for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0]
    region_constraints: [?0: '?1],
    value: (),
}

We now want to apply that response to our context. Conceptually, how we do that is to (a) instantiate each of the canonical variables in the result with a fresh inference variable, (b) unify the values in the result with the original values, and then (c) record the region constraints for later. Doing step (a) would yield a result of

{
      certainty: Proven,
      var_values: [Vec<?C>, '?D, ?C]
                       ^^   ^^^ fresh inference variables
      region_constraints: [?C: '?D],
      value: (),
}

Step (b) would then unify:

?A with Vec<?C>
'static with '?D
?B with ?C

And finally the region constraint of ?C: 'static would be recorded for later verification.

(What we actually do is a mildly optimized variant of that: Rather than eagerly instantiating all of the canonical values in the result with variables, we instead walk the vector of values, looking for cases where the value is just a canonical variable. In our example, values[2] is ?C, so that means we can deduce that ?C := ?B and '?D := 'static. This gives us a partial set of values. Anything for which we do not find a value, we create an inference variable.)
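Sketching that walk with invented names (this only illustrates the shortcut described in the parenthesis above, nothing more): each response value that is just a bare canonical variable yields a direct binding; everything else still needs fresh variables and unification.

/// Toy representation of a single entry of the response's `var_values`.
enum Value {
    CanonicalVar(usize),  // a bare canonical variable like `?0`
    Other(String),        // anything structured, like `Vec<?0>`
}

/// For each bare canonical variable in the response, bind it directly to
/// the corresponding original input value. In the example above,
/// `values[2]` is `?0` and the original input at position 2 was `?B`,
/// so we learn `?C := ?B`; likewise `'?D := 'static` from position 1.
fn quick_bindings(original: &[&str], response: &[Value]) -> Vec<Option<String>> {
    debug_assert_eq!(original.len(), response.len()); // parallel vectors
    let mut bindings: Vec<Option<String>> = vec![None; response.len()];
    for (i, value) in response.iter().enumerate() {
        if let Value::CanonicalVar(v) = value {
            bindings[*v] = Some(original[i].to_string());
        }
    }
    // Entries still `None` (like the one for `Vec<?0>`) get fresh
    // inference variables, followed by ordinary unification.
    bindings
}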

The On-Demand SLG solver

Given a set of program clauses (provided by our lowering rules) and a query, we need to return the result of the query and the value of any type variables we can determine. This is the job of the solver.

For example, exists<T> { Vec<T>: FromIterator<u32> } has one solution, so its result is Unique; substitution [?T := u32]. A solution also comes with a set of region constraints, which we'll ignore in this introduction.

Goals of the Solver

On demand

There are often many, or even infinitely many, solutions to a query. For example, say we want to prove exists<T> { Vec<T>: Debug } for some type ?T. Our solver should be capable of yielding one answer at a time, say ?T = u32, then ?T = i32, and so on, rather than iterating over every type in the type system. If we need more answers, we can request more until we are done. This is similar to how Prolog works.

See also: The traditional, interactive Prolog query

Breadth-first

Vec<?T>: Debug is true if ?T: Debug. This leads to an infinite regress: Vec<u32>, Vec<Vec<u32>>, Vec<Vec<Vec<u32>>>, and so on all implement Debug. Our solver ought to be breadth-first and consider answers like [Vec<u32>: Debug, Vec<i32>: Debug, ...] before it recurses, or we may never find the answer we're looking for.
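As a toy illustration of the difference (this is not how the solver enumerates answers), a breadth-first enumeration of the types T with Vec<T>: Debug, given base impls for u32 and i32, yields all shallow answers before deeper ones, whereas a depth-first stack would bury Vec<i32> under ever-deeper Vec<Vec<...>> nesting:

use std::collections::VecDeque;

/// First `n` answers to `exists<T> { Vec<T>: Debug }`, given
/// `u32: Debug`, `i32: Debug` and `Vec<T>: Debug :- T: Debug`.
fn first_debug_vecs(n: usize) -> Vec<String> {
    let mut queue: VecDeque<String> =
        ["u32", "i32"].iter().map(|s| s.to_string()).collect();
    let mut answers = Vec::new();
    while answers.len() < n {
        let t = queue.pop_front().expect("queue never empties");
        // `Vec<t>: Debug` holds; it can also feed the recursive clause.
        answers.push(format!("Vec<{}>", t));
        queue.push_back(format!("Vec<{}>", t));
    }
    answers
}

// first_debug_vecs(4) == ["Vec<u32>", "Vec<i32>", "Vec<Vec<u32>>", "Vec<Vec<i32>>"]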

Cachable

To speed up compilation, we need to cache results, including partial results left over from past solver queries.

Description of how it works

The basis of the solver is the Forest type. A forest stores a collection of tables as well as a stack. Each table represents the stored results of a particular query that is being performed, as well as the various strands, which are basically suspended computations that may be used to find more answers. Tables are interdependent: solving one query may require solving others.

Walkthrough

Perhaps the easiest way to explain how the solver works is to walk through an example. Let's imagine that we have the following program:

trait Debug { }

struct u32 { }
impl Debug for u32 { }

struct Rc<T> { }
impl<T: Debug> Debug for Rc<T> { }

struct Vec<T> { }
impl<T: Debug> Debug for Vec<T> { }

Now imagine that we want to find answers for the query exists<T> { Rc<T>: Debug }. The first step would be to u-canonicalize this query; this is the act of giving canonical names to all the unbound inference variables based on the order of their left-most appearance, as well as canonicalizing the universes of any universally bound names (e.g., the T in forall<T> { ... }). In this case, there are no universally bound names, but the canonical form Q of the query might look something like:

Rc<?0>: Debug

where ?0 is a variable in the root universe U0. We would then go and look for a table with this canonical query as the key: since the forest is empty, this lookup will fail, and we will create a new table T0, corresponding to the u-canonical goal Q.

Ignoring negative reasoning and regions. To start, we'll ignore the possibility of negative goals like not { Foo }. We'll phase them in later, as they bring several complications.

Creating a table. When we first create a table, we also initialize it with a set of initial strands. A "strand" is kind of like a "thread" for the solver: it contains a particular way to produce an answer. The initial set of strands for a goal like Rc<?0>: Debug (i.e., a "domain goal") is determined by looking for clauses in the environment. In Rust, these clauses derive from impls, but also from where-clauses that are in scope. In the case of our example, there would be three clauses, each coming from the program. Using a Prolog-like notation, these look like:

(u32: Debug).
(Rc<T>: Debug) :- (T: Debug).
(Vec<T>: Debug) :- (T: Debug).

To create our initial strands, then, we will try to apply each of these clauses to our goal of Rc<?0>: Debug. The first and third clauses are inapplicable because u32 and Vec<?0> cannot be unified with Rc<?0>. The second clause, however, will work.

What is a strand? Let's talk a bit more about what a strand is. In the code, a strand is the combination of an inference table, an X-clause, and (possibly) a selected subgoal from that X-clause. But what is an X-clause (ExClause, in the code)? An X-clause pulls together a few things:

  • The current state of the goal we are trying to prove;
  • A set of subgoals that have yet to be proven;
  • There are also a few things we're ignoring for now:
    • delayed literals, region constraints

The general form of an X-clause is written much like a Prolog clause, but with somewhat different semantics. Since we're ignoring delayed literals and region constraints, an X-clause just looks like this:

G :- L

where G is a goal and L is a set of subgoals that must be proven. (The L stands for literal -- when we address negative reasoning, a literal will be either a positive or negative subgoal.) The idea is that if we are able to prove L then the goal G can be considered true.

In the case of our example, we would wind up creating one strand, with an X-clause like so:

(Rc<?T>: Debug) :- (?T: Debug)

Here, the ?T refers to one of the inference variables created in the inference table that accompanies the strand. (I'll use named variables to refer to inference variables, and numbered variables like ?0 to refer to variables in a canonicalized goal; in the code, however, they are both represented with an index.)

For each strand, we also optionally store a selected subgoal. This is the subgoal after the turnstile (:-) that we are currently trying to prove in this strand. Initially, when a strand is first created, there is no selected subgoal.
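Pulling together the pieces described so far, here is an extremely simplified sketch of how these structures relate (these are not chalk's actual definitions; among other things, the per-strand inference table is omitted):

use std::collections::VecDeque;

type Goal = String;    // placeholder for a canonicalized goal
type Answer = String;  // placeholder for an answer substitution

/// An X-clause: the goal `head` is true if all of `subgoals` are proven.
struct ExClause {
    head: Goal,
    subgoals: Vec<Goal>,
}

/// A suspended computation that may produce more answers for its table.
struct Strand {
    ex_clause: ExClause,
    /// Index of the selected subgoal, if any (`None` when first created).
    selected_subgoal: Option<usize>,
}

/// The stored results of one canonical query, plus pending work.
struct Table {
    goal: Goal,
    answers: Vec<Answer>,
    strands: VecDeque<Strand>,  // processed in round-robin order
}

/// The forest: all tables, plus a stack of tables currently being solved.
struct Forest {
    tables: Vec<Table>,
    stack: Vec<usize>,
}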

Activating a strand. Now that we have created the table T0 and initialized it with strands, we have to actually try and produce an answer. We do this by invoking the ensure_root_answer operation on the table: specifically, we say ensure_root_answer(T0, A0), meaning "ensure that there is a 0th answer A0 to query T0".

Remember that tables store not only strands, but also a vector of cached answers. The first thing that ensure_root_answer does is to check whether answer A0 is in this vector. If so, we can just return immediately. In this case, the vector will be empty, and hence that does not apply (this becomes important for cyclic checks later on).

When there is no cached answer, ensure_root_answer will try to produce one. It does this by selecting a strand from the set of active strands -- the strands are stored in a VecDeque and hence processed in a round-robin fashion. Right now, we have only one strand, storing the following X-clause with no selected subgoal:

(Rc<?T>: Debug) :- (?T: Debug)

When we activate the strand, we see that we have no selected subgoal, and so we first pick one of the subgoals to process. Here, there is only one (?T: Debug), so that becomes the selected subgoal, changing the state of the strand to:

(Rc<?T>: Debug) :- selected(?T: Debug, A0)

Here, we write selected(L, An) to indicate that (a) the literal L is the selected subgoal and (b) which answer An we are looking for. We start out looking for A0.

Processing the selected subgoal. Next, we have to try and find an answer to this selected goal. To do that, we will u-canonicalize it and try to find an associated table. In this case, the u-canonical form of the subgoal is ?0: Debug: we don't have a table yet for that, so we can create a new one, T1. As before, we'll initialize T1 with strands. In this case, there will be three strands, because all the program clauses are potentially applicable. Those three strands will be:

  • (u32: Debug) :-, derived from the program clause (u32: Debug).
    • Note: This strand has no subgoals.
  • (Vec<?U>: Debug) :- (?U: Debug), derived from the Vec impl.
  • (Rc<?U>: Debug) :- (?U: Debug), derived from the Rc impl.

We can thus summarize the state of the whole forest at this point as follows:

Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A0)

Table T1 [?0: Debug]
  Strands:
    (u32: Debug) :-
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Delegation between tables. Now that the active strand from T0 has created the table T1, it can try to extract an answer. It does this via that same ensure_answer operation we saw before. In this case, the strand would invoke ensure_answer(T1, A0), since we will start with the first answer. This will cause T1 to activate its first strand, u32: Debug :-.

This strand is somewhat special: it has no subgoals at all. This means that the goal is proven. We can therefore add u32: Debug to the set of answers for our table, calling it answer A0 (it is the first answer). The strand is then removed from the list of strands.

The state of table T1 is therefore:

Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Note that I am writing out the answer A0 as a substitution that can be applied to the table goal; actually, in the code, the goals for each X-clause are also represented as substitutions, but in this exposition I've chosen to write them as full goals, following NFTD.

Since we now have an answer, ensure_answer(T1, A0) will return Ok to the table T0, indicating that answer A0 is available. T0 now has the job of incorporating that result into its active strand. It does this in two ways. First, it creates a new strand that is looking for the next possible answer of T1. Next, it incorporates the answer from A0 and removes the subgoal. The resulting state of table T0 is:

Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)
    (Rc<u32>: Debug) :-

We then immediately activate the strand that incorporated the answer (the Rc<u32>: Debug one). In this case, that strand has no further subgoals, so it becomes an answer to the table T0. This answer can then be returned up to our caller, and the whole forest goes quiescent at this point (remember, we only do enough work to generate one answer). The ending state of the forest at this point will be:

Table T0 [Rc<?0>: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)

Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Here you can see how the forest captures both the answers we have created thus far and the strands that will let us try to produce more answers later on.
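Continuing the earlier sketch (and reusing its Goal and Strand types), the forest bookkeeping might be modeled like this; again, the names are illustrative rather than chalk's exact API:

use std::collections::VecDeque;

// An answer is a substitution for the table's canonical goal,
// e.g. `[?0 = u32]`, represented here as an opaque string.
struct Answer(String);

// A table caches the answers found so far and keeps the strands
// that can produce more; strands are processed round-robin.
struct Table {
    goal: Goal,           // canonical goal, e.g. `?0: Debug`
    answers: Vec<Answer>, // indexed A0, A1, ...
    strands: VecDeque<Strand>,
}

// The forest is just the collection of tables created so far.
struct Forest {
    tables: Vec<Table>, // indexed T0, T1, ...
}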

See also

An Overview of Chalk

Chalk is under heavy development, so if any of these links are broken or if any of the information is inconsistent with the code or outdated, please open an issue so we can fix it. If you are able to fix the issue yourself, we would love your contribution!

Chalk recasts Rust's trait system explicitly in terms of logic programming by "lowering" Rust code into a kind of logic program we can then execute queries against (see Lowering to Logic and Lowering Rules). Its goal is to be an executable, highly readable specification of the Rust trait system.

There are many expected benefits from this work. It will consolidate our existing, somewhat ad-hoc implementation into something far more principled and expressive, which should behave better in corner cases, and be much easier to extend.

Chalk Structure

Chalk has two main "products". The first of these is the chalk_engine crate, which defines the core SLG solver. This is the part rustc uses.

The rest of chalk can be considered an elaborate testing harness. Chalk is capable of parsing Rust-like "programs", lowering them to logic, and performing queries on them.

Here's a sample session in the chalk repl, chalki. After feeding it our program, we perform some queries on it.

?- program
Enter a program; press Ctrl-D when finished
| struct Foo { }
| struct Bar { }
| struct Vec<T> { }
| trait Clone { }
| impl<T> Clone for Vec<T> where T: Clone { }
| impl Clone for Foo { }

?- Vec<Foo>: Clone
Unique; substitution [], lifetime constraints []

?- Vec<Bar>: Clone
No possible solution.

?- exists<T> { Vec<T>: Clone }
Ambiguous; no inference guidance

You can see more examples of programs and queries in the unit tests.

Next we'll go through each stage required to produce the output above.

Parsing (chalk_parse)

Chalk is designed to be incorporated with the Rust compiler, so the syntax and concepts it deals with heavily borrow from Rust. It is convenient for the sake of testing to be able to run chalk on its own, so chalk includes a parser for a Rust-like syntax. This syntax is orthogonal to the Rust AST and grammar. It is not intended to look exactly like it or support the exact same syntax.

The parser takes that syntax and produces an Abstract Syntax Tree (AST). You can find the complete definition of the AST in the source code.

The syntax contains things from Rust that we know and love, for example: traits, impls, and struct definitions. Parsing is often the first "phase" of transformation that a program goes through in order to become a format that chalk can understand.

Rust Intermediate Representation (chalk_rust_ir)

After getting the AST we convert it to a more convenient intermediate representation called chalk_rust_ir. This is sort of analogous to the HIR in Rust. The process of converting to IR is called lowering.

The chalk::program::Program struct contains some "rust things" but indexed and accessible in a different way. For example, if you have a type like Foo<Bar>, we would represent Foo as a string in the AST but in chalk::program::Program, we use numeric indices (ItemId).

The IR source code contains the complete definition.

Chalk Intermediate Representation (chalk_ir)

Once we have Rust IR it is time to convert it to "program clauses". A ProgramClause is essentially one of the following (a code sketch follows this list):

  • A clause of the form consequence :- conditions where :- is read as "if" and conditions = cond1 && cond2 && ...
  • A universally quantified clause of the form forall<T> { consequence :- conditions }
    • forall<T> { ... } is used to represent universal quantification. See the section on Lowering to logic for more information.
    • A key thing to note about forall is that we don't allow you to "quantify" over traits, only types and regions (lifetimes). That is, you can't make a rule like forall<Trait> { u32: Trait } which would say "u32 implements all traits". You can however say forall<T> { T: Trait } meaning "Trait is implemented by all types".
    • forall<T> { ... } is represented in the code using the Binders<T> struct.
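As a rough, self-contained sketch of the two clause forms above (the names approximate chalk_ir, but the real types carry more detail):

// Placeholder goal type for the sketch, e.g. `Vec<T>: Clone`.
struct Goal(String);

// `forall<T> { ... }`: how many variables are bound, and the value
// they are bound within.
struct Binders<T> {
    num_binders: usize,
    value: T,
}

// `consequence :- conditions`.
struct Implication {
    consequence: Goal,
    conditions: Vec<Goal>,
}

enum ProgramClause {
    // A clause with no quantifier.
    Implies(Implication),
    // A universally quantified clause.
    ForAll(Binders<Implication>),
}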

See also: Goals and Clauses

This is where we encode the rules of the trait system into logic. For example, if we have the following Rust:

impl<T: Clone> Clone for Vec<T> {}

We generate the following program clause:

forall<T> { (Vec<T>: Clone) :- (T: Clone) }

This rule dictates that Vec<T>: Clone is only satisfied if T: Clone is also satisfied (i.e. "provable").

Similar to chalk::program::Program which has "rust-like things", chalk_ir defines ProgramEnvironment which is "pure logic". The main field in that struct is program_clauses, which contains the ProgramClauses generated by the rules module.

Rules (chalk_solve)

The chalk_solve crate (source code) defines the logic rules we use for each item in the Rust IR. It works by iterating over every trait, impl, etc. and emitting the rules that come from each one.

See also: Lowering Rules

Well-formedness checks

As part of lowering to logic, we also do some "well formedness" checks. See the chalk_solve::wf source code for where those are done.

See also: Well-formedness checking

Coherence

The method CoherenceSolver::specialization_priorities in the coherence module (source code) checks "coherence", which means that it ensures that two impls of the same trait for the same type cannot exist.

Solver (chalk_solve)

Finally, when we've collected all the program clauses we care about, we want to perform queries on it. The component that finds the answer to these queries is called the solver.

See also: The SLG Solver

Crates

Chalk's functionality is broken up into the following crates:

  • chalk_engine: Defines the core SLG solver.
  • chalk_rust_ir: Contains the "HIR-like" form of the AST.
  • chalk_ir: Defines chalk's internal representation of types, lifetimes, and goals.
  • chalk_solve: Combines chalk_ir and chalk_engine, implementing the logic rules that convert chalk_rust_ir to chalk_ir.
    • Defines the coherence module, which implements coherence rules
    • chalk_engine::context provides the necessary hooks.
  • chalk_parse: Defines the raw AST and a parser.
  • chalk: Brings everything together. Defines the following modules:
    • chalk::lowering, which converts AST to chalk_rust_ir
    • Also includes chalki, chalk's REPL.

Browse source code on GitHub

Testing

chalk has a test framework for lowering programs to logic, checking the lowered logic, and performing queries on it. This is how we test the implementation of chalk itself, and the viability of the lowering rules.

The main kind of tests in chalk are goal tests. They contain a program, which is expected to lower to logic successfully, and a set of queries (goals) along with the expected output. Here's an example. Since chalk's output can be quite long, goal tests support specifying only a prefix of the output.

Lowering tests check the stages that occur before we can issue queries to the solver: the lowering to chalk_rust_ir, and the well-formedness checks that occur after that.

Testing internals

Goal tests use a test! macro that takes chalk's Rust-like syntax and runs it through the full pipeline described above. The macro ultimately calls the solve_goal function.

Likewise, lowering tests use the lowering_success! and lowering_error! macros.

More Resources

Bibliography

If you'd like to read more background material, here are some recommended texts and papers:

Programming with Higher-order Logic, by Dale Miller and Gopalan Nadathur, covers the key concepts of Lambda Prolog. Although it's a slim little volume, it's the kind of book where you learn something new every time you open it.

"A proof procedure for the logic of Hereditary Harrop formulas", by Gopalan Nadathur. This paper covers the basics of universes, environments, and Lambda Prolog-style proof search. Quite readable.

"A new formulation of tabled resolution with delay", by Theresa Swift. This paper gives a kind of abstract treatment of the SLG formulation that is the basis for our on-demand solver.

Type checking

The rustc_typeck crate contains the source for "type collection" and "type checking", as well as a few other bits of related functionality. (It draws heavily on type inference and trait solving.)

Type collection

Type "collection" is the process of converting the types found in the HIR (hir::Ty), which represent the syntactic things that the user wrote, into the internal representation used by the compiler (Ty<'tcx>) – we also do similar conversions for where-clauses and other bits of the function signature.

To try and get a sense for the difference, consider this function:

struct Foo { }
fn foo(x: Foo, y: self::Foo) { ... }
//        ^^^     ^^^^^^^^^

Those two parameters x and y each have the same type, but they will have distinct hir::Ty nodes. Those nodes will have different spans, and of course they encode the path somewhat differently. But once they are "collected" into Ty<'tcx> nodes, they will be represented by the exact same internal type.

Collection is defined as a bundle of queries for computing information about the various functions, traits, and other items in the crate being compiled. Note that each of these queries is concerned with interprocedural things – for example, for a function definition, collection will figure out the type and signature of the function, but it will not visit the body of the function in any way, nor examine type annotations on local variables (that's the job of type checking).

For more details, see the collect module.
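To make "a bundle of queries" concrete, here is a hedged sketch of how collected information is consumed elsewhere in the compiler. It assumes a tcx: TyCtxt<'tcx> and the DefId of a function item are already in scope, and exact paths and signatures vary across compiler versions:

// Both results come from collection; neither query visits the body.
let ty = tcx.type_of(def_id);   // the item's internal Ty<'tcx>
let sig = tcx.fn_sig(def_id);   // the function's collected signature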

TODO: actually talk about type checking...

Method lookup

Method lookup can be rather complex due to the interaction of a number of factors, such as self types, autoderef, trait lookup, etc. This chapter provides an overview of the process. More detailed notes are in the code itself, naturally.

One way to think of method lookup is that we convert an expression of the form:

receiver.method(...)

into a more explicit UFCS form:

Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call

Here ADJ is some kind of adjustment, which is typically a series of autoderefs and then possibly an autoref (e.g., &**receiver). However we sometimes do other adjustments and coercions along the way, in particular unsizing (e.g., converting from [T; n] to [T]).

Method lookup is divided into two major phases:

  1. Probing (probe.rs). The probe phase is when we decide what method to call and how to adjust the receiver.
  2. Confirmation (confirm.rs). The confirmation phase "applies" this selection, updating the side-tables, unifying type variables, and otherwise doing side-effectful things.

One reason for this division is to be more amenable to caching. The probe phase produces a "pick" (probe::Pick), which is designed to be cacheable across method-call sites. Therefore, it does not include inference variables or other information.

The Probe phase

Steps

The first thing that the probe phase does is to create a series of steps. This is done by progressively dereferencing the receiver type until it cannot be deref'd anymore, as well as applying an optional "unsize" step. So if the receiver has type Rc<Box<[T; 3]>>, this might yield:

Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]

Candidate assembly

We then search along those steps to create a list of candidates. A Candidate is a method item that might plausibly be the method being invoked. For each candidate, we'll derive a "transformed self type" that takes into account explicit self.

Candidates are grouped into two kinds, inherent and extension.

Inherent candidates are those that are derived from the type of the receiver itself. So, if you have a receiver of some nominal type Foo (e.g., a struct), any methods defined within an impl like impl Foo are inherent methods. Nothing needs to be imported to use an inherent method; they are associated with the type itself (note that inherent impls can only be defined in the same crate as the type itself).

FIXME: Inherent candidates are not always derived from impls. If you have a trait object, such as a value of type Box<ToString>, then the trait methods (to_string(), in this case) are inherently associated with it. Another case is type parameters, in which case the methods of their bounds are inherent. However, this part of the rules is subject to change: when DST's "impl Trait for Trait" is complete, trait object dispatch could be subsumed into trait matching, and the type parameter behavior should be reconsidered in light of where clauses.

TODO: Is this FIXME still accurate?

Extension candidates are derived from imported traits. If I have the trait ToString imported, and I call to_string() on a value of type T, then we will go off to find out whether there is an impl of ToString for T. These kinds of method calls are called "extension methods". They can be defined in any module, not only the one that defined T. Furthermore, you must import the trait to call such a method.

So, let's continue our example. Imagine that we were calling a method foo with the receiver Rc<Box<[T; 3]>>, and that there is a trait Foo that defines it with &self for the type Rc<U>, as well as an inherent method on Box<U> that defines it with &mut self. Then we might have two candidates:

&Rc<Box<[T; 3]>> from the impl of `Foo` for `Rc<U>` where `U=Box<[T; 3]>`
&mut Box<[T; 3]> from the inherent impl on `Box<U>` where `U=[T; 3]`

Candidate search

Finally, to actually pick the method, we will search down the steps, trying to match the receiver type against the candidate types. At each step, we also consider an auto-ref and auto-mut-ref to see whether that makes any of the candidates match. We pick the first step where we find a match.

In the case of our example, the first step is Rc<Box<[T; 3]>>, which does not itself match any candidate. But when we autoref it, we get the type &Rc<Box<[T; 3]>> which does match. We would then recursively consider all where-clauses that appear on the impl: if those match (or we cannot rule out that they do), then this is the method we would pick. Otherwise, we would continue down the series of steps.
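To make the search order concrete, here is a self-contained toy model of the pick loop. It is not rustc's probe code -- the real thing also consults where-clauses, handles unsizing, and much more -- just the step/adjustment ordering described above:

// Toy model only -- names and behavior are simplified.
#[derive(Clone, PartialEq, Debug)]
enum Ty {
    Named(String),
    Ref(Box<Ty>),    // &T (autoref)
    RefMut(Box<Ty>), // &mut T (auto-mut-ref)
}

struct Candidate {
    self_ty: Ty, // the "transformed self type"
    method: String,
}

// Walk the autoderef steps outermost-first; at each step, try the type
// as-is, then autoref'd, then auto-mut-ref'd. The first match wins.
fn pick(steps: &[Ty], candidates: &[Candidate]) -> Option<String> {
    for step in steps {
        let adjusted = [
            step.clone(),
            Ty::Ref(Box::new(step.clone())),
            Ty::RefMut(Box::new(step.clone())),
        ];
        for ty in &adjusted {
            if let Some(c) = candidates.iter().find(|c| &c.self_ty == ty) {
                return Some(c.method.clone());
            }
        }
    }
    None
}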

Variance of type and lifetime parameters

For a more general background on variance, see the background appendix.

During type checking we must infer the variance of type and lifetime parameters. The algorithm is taken from Section 4 of the paper "Taming the Wildcards: Combining Definition- and Use-Site Variance" published in PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.

This inference is explicitly designed not to consider the uses of types within code. To determine the variance of type parameters defined on type X, we only consider the definition of the type X and the definitions of any types it references.

We only infer variance for type parameters found on data types like structs and enums. In these cases, there is a fairly straightforward explanation for what variance means. The variance of the type or lifetime parameters defines whether T<A> is a subtype of T<B> (resp. T<'a> and T<'b>) based on the relationship of A and B (resp. 'a and 'b).

We do not infer variance for type parameters found on traits, functions, or impls. Variance on trait parameters can indeed make sense (and we used to compute it), but it is actually rather subtle in meaning and not that useful in practice, so we removed it. See the addendum for some details. Variance on function/impl parameters, on the other hand, doesn't make sense because these parameters are instantiated and then forgotten; they don't persist in types or compiled byproducts.

Notation

We use the notation of The Paper throughout this chapter:

  • + is covariance.
  • - is contravariance.
  • * is bivariance.
  • o is invariance.

The algorithm

The basic idea is quite straightforward. We iterate over the types defined and, for each use of a type parameter X, accumulate a constraint indicating that the variance of X must be valid for the variance of that use site. We then iteratively refine the variance of X until all constraints are met. There is always a solution, because at the limit we can declare all type parameters to be invariant and all constraints will be satisfied.

As a simple example, consider:

enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(|B|), None }
enum OptionalMap<C> { Some(|C| -> C), None }

Here, we will generate the constraints:

1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -

These indicate that (1) the variance of A must be at most covariant; (2) the variance of B must be at most contravariant; and (3, 4) the variance of C must be at most covariant and contravariant. All of these results are based on a variance lattice defined as follows:

   *      Top (bivariant)
-     +
   o      Bottom (invariant)

Based on this lattice, the solution V(A)=+, V(B)=-, V(C)=o is the optimal solution. Note that there is always a naive solution which just declares all variables to be invariant.

You may be wondering why fixed-point iteration is required. The reason is that the variance of a use site may itself be a function of the variance of other type parameters. In full generality, our constraints take the form:

V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term

Here the notation V(X) indicates the variance of a type/region parameter X with respect to its defining class. Term x Term represents the "variance transform" as defined in the paper:

If the variance of a type variable X in type expression E is V2 and the definition-site variance of the corresponding type parameter of a class C is V1, then the variance of X in the type expression C<E> is V3 = V1.xform(V2).
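As a sketch, the variance transform can be written as a small operation over the lattice above; this mirrors the table implemented by ty::Variance::xform in the compiler:

#[derive(Copy, Clone, PartialEq, Debug)]
enum Variance {
    Bivariant,     // *
    Covariant,     // +
    Contravariant, // -
    Invariant,     // o
}

impl Variance {
    // V3 = V1.xform(V2): the variance of X in C<E>, given the
    // definition-site variance V1 of C's parameter and the variance
    // V2 of X within E.
    fn xform(self, v: Variance) -> Variance {
        use Variance::*;
        match (self, v) {
            (Invariant, _) => Invariant, // invariance is absorbing
            (Bivariant, _) => Bivariant, // so is bivariance
            (Covariant, v2) => v2,       // covariance is the identity
            (Contravariant, Covariant) => Contravariant,
            (Contravariant, Contravariant) => Covariant,
            (Contravariant, other) => other, // o and * are unchanged
        }
    }
}

For example, for T in Vec<fn(T)>: Vec's parameter is covariant and T appears contravariantly in fn(T), so Covariant.xform(Contravariant) yields Contravariant.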

Constraints

If I have a struct or enum with where clauses:

struct Foo<T: Bar> { ... }

you might wonder whether the variance of T with respect to Bar affects the variance of T with respect to Foo. I claim no. The reason: assume that T is invariant with respect to Bar but covariant with respect to Foo. And then we have a Foo<X> that is upcast to Foo<Y>, where X <: Y. However, while X: Bar holds, Y: Bar does not. In that case, the upcast is illegal, not because of a variance failure, but rather because the target type Foo<Y> is itself just not well-formed. Basically we get to assume well-formedness of all types involved before considering variance.

Dependency graph management

Because variance is a whole-crate inference, its dependency graph can become quite muddled if we are not careful. To resolve this, we refactor into two queries:

  • crate_variances computes the variance for all items in the current crate.
  • variances_of accesses the variance for an individual item; it works by requesting crate_variances and extracting the relevant data.

If you limit yourself to reading variances_of, your code will then depend only on the inference of that particular item.

Ultimately, this setup relies on the red-green algorithm. In particular, every variance query effectively depends on all type definitions in the entire crate (through crate_variances), but since most changes will not result in a change to the actual results from variance inference, the variances_of query will wind up being considered green after it is re-evaluated.

Addendum: Variance on traits

As mentioned above, we used to permit variance on traits. This was computed based on the appearance of trait type parameters in method signatures and was used to represent the compatibility of vtables in trait objects (and also "virtual" vtables or dictionary in trait bounds). One complication was that variance for associated types is less obvious, since they can be projected out and put to myriad uses, so it's not clear when it is safe to allow X<A>::Bar to vary (or indeed just what that means). Moreover (as covered below) all inputs on any trait with an associated type had to be invariant, limiting the applicability. Finally, the annotations (MarkerTrait, PhantomFn) needed to ensure that all trait type parameters had a variance were confusing and annoying for little benefit.

Just for historical reference, I am going to preserve some text indicating how one could interpret variance and trait matching.

Variance and object types

Just as with structs and enums, we can decide the subtyping relationship between two object types &Trait<A> and &Trait<B> based on the relationship of A and B. Note that for object types we ignore the Self type parameter – it is unknown, and the nature of dynamic dispatch ensures that we will always call a function that is expecting the appropriate Self type. However, we must be careful with the other type parameters, or else we could end up calling a function that is expecting one type but provided another.

To see what I mean, consider a trait like so:


trait ConvertTo<A> {
    fn convertTo(&self) -> A;
}

Intuitively, if we had one object O=&ConvertTo<Object> and another S=&ConvertTo<String>, then S <: O because String <: Object (presuming Java-like "string" and "object" types, my go-to examples for subtyping). The actual algorithm would be to compare the (explicit) type parameters pairwise respecting their variance: here, the type parameter A is covariant (it appears only in a return position), and hence we require that String <: Object.

You'll note though that we did not consider the binding for the (implicit) Self type parameter: in fact, it is unknown, so that's good. The reason we can ignore that parameter is precisely because we don't need to know its value until a call occurs, and at that time the dynamic nature of virtual dispatch means the code we run will be correct for whatever value Self happens to be bound to for the particular object whose method we called. Self is thus different from A, because the caller requires that A be known in order to know the return type of the method convertTo(). (As an aside, we have rules preventing methods where Self appears outside of the receiver position from being called via an object.)

Trait variance and vtable resolution

But traits aren't only used with objects. They're also used when deciding whether a given impl satisfies a given trait bound. To set the scene here, imagine I had a function:

fn convertAll<A,T:ConvertTo<A>>(v: &[T]) { ... }

Now imagine that I have an implementation of ConvertTo for Object:

impl ConvertTo<i32> for Object { ... }

And I want to call convertAll on an array of strings. Suppose further that for whatever reason I specifically supply the value of String for the type parameter T:

let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);

Is this legal? To put it another way, can we apply the impl for Object to the type String? The answer is yes, but to see why we have to expand out what will happen:

  • convertAll will create a pointer to one of the entries in the vector, which will have type &String

  • It will then call the impl of convertTo() that is intended for use with objects. This has the type fn(self: &Object) -> i32.

    It is OK to provide a value for self of type &String because &String <: &Object.

OK, so intuitively we want this to be legal, so let's bring this back to variance and see whether we are computing the correct result. We must first figure out how to phrase the question "is an impl for Object,i32 usable where an impl for String,i32 is expected?"

Maybe it's helpful to think of a dictionary-passing implementation of type classes. In that case, convertAll() takes an implicit parameter representing the impl. In short, we have an impl of type:

V_O = ConvertTo<i32> for Object

and the function prototype expects an impl of type:

V_S = ConvertTo<i32> for String

As with any argument, this is legal if the type of the value given (V_O) is a subtype of the type expected (V_S). So is V_O <: V_S? The answer will depend on the variance of the various parameters. In this case, because the Self parameter is contravariant and A is covariant, it means that:

V_O <: V_S iff
    i32 <: i32
    String <: Object

These conditions are satisfied and so we are happy.

Variance and associated types

Traits with associated types – or at minimum projection expressions – must be invariant with respect to all of their inputs. To see why this makes sense, consider what subtyping for a trait reference means:

<T as Trait> <: <U as Trait>

means that if I know that T as Trait, I also know that U as Trait. Moreover, if you think of it as dictionary passing style, it means that a dictionary for <T as Trait> is safe to use where a dictionary for <U as Trait> is expected.

The problem is that when you can project types out from <T as Trait>, the relationship to types projected out of <U as Trait> is completely unknown unless T==U (see #21726 for more details). Making Trait invariant ensures that this is true.

Another related reason is that if we didn't make traits with associated types invariant, then projection is no longer a function with a single result. Consider:

trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }

Now if I have <&'static () as Identity>::Out, this can be validly derived as &'a () for any 'a:

<&'a () as Identity> <: <&'static () as Identity>
if &'static () <: &'a ()    -- Identity is contravariant in Self
if 'static : 'a             -- Subtyping rules for relations

This change, on the other hand, means that <&'static () as Identity>::Out is always &'static () (which might then be upcast to &'a (), separately). This was helpful in solving #21750.

Opaque types (type alias impl Trait)

Opaque types are syntax to declare an opaque type alias that only exposes a specific set of traits as their interface; the concrete type in the background is inferred from a certain set of use sites of the opaque type.

This is expressed by using impl Trait within type aliases, for example:

type Foo = impl Bar;

This declares an opaque type named Foo, of which the only information is that it implements Bar. Therefore, any of Bar's interface can be used on a Foo, but nothing else (regardless of whether it implements any other traits).

Since there needs to be a concrete background type, you can currently express that type by using the opaque type in a "defining use site".

struct Struct;
impl Bar for Struct { /* stuff */ }
fn foo() -> Foo {
    Struct
}

Any other "defining use site" needs to produce the exact same type.
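For instance, under the assumptions above (the trait Bar, the alias type Foo = impl Bar, and the defining use in foo), a second defining use that infers a different concrete type is rejected; the error comment below is paraphrased:

struct Other;
impl Bar for Other { /* stuff */ }
fn bar() -> Foo {
    Other // ERROR: concrete type differs from the one inferred by `foo`
}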

Defining use site(s)

Currently only the return value of a function can be a defining use site of an opaque type (and only if the return type of that function contains the opaque type).

The defining use of an opaque type can be any code within the parent of the opaque type definition. This includes any siblings of the opaque type and all children of the siblings.

The initiative for "not causing fatal brain damage to developers due to accidentally running infinite loops in their brain while trying to comprehend what the type system is doing" has decided to disallow children of opaque types from being defining use sites.

Associated opaque types

Associated opaque types can be defined by any other associated item on the same trait impl or a child of these associated items. For instance:

trait Baz {
    type Foo;
    fn foo() -> Self::Foo;
}

struct Quux;

impl Baz for Quux {
    type Foo = impl Bar;
    fn foo() -> Self::Foo { ... }
}

Pattern and Exhaustiveness Checking

In Rust, pattern matching and bindings have a few very helpful properties. The compiler will check that bindings are irrefutable when made and that match arms are exhaustive.

TODO: write this chapter.

MIR borrow check

The borrow check is Rust's "secret sauce" – it is tasked with enforcing a number of properties:

  • That all variables are initialized before they are used.
  • That you can't move the same value twice.
  • That you can't move a value while it is borrowed.
  • That you can't access a place while it is mutably borrowed (except through the reference).
  • That you can't mutate a place while it is immutably borrowed.
  • etc

The borrow checker operates on the MIR. An older implementation operated on the HIR. Doing borrow checking on MIR has several advantages: the MIR is far less complex than the HIR, and its desugared, control-flow-graph form enables the flow-sensitive analysis that features like non-lexical lifetimes require.

Major phases of the borrow checker

The borrow checker source is found in the rustc_mir::borrow_check module. The main entry point is the mir_borrowck query.

  • We first create a local copy of the MIR. In the coming steps, we will modify this copy in place to modify the types and things to include references to the new regions that we are computing.
  • We then invoke replace_regions_in_mir to modify our local MIR. Among other things, this function will replace all of the regions in the MIR with fresh inference variables.
  • Next, we perform a number of dataflow analyses that compute what data is moved and when.
  • We then do a second type check across the MIR: the purpose of this type check is to determine all of the constraints between different regions.
  • Next, we do region inference, which computes the values of each region — basically, the points in the control-flow graph where each lifetime must be valid according to the constraints we collected.
  • At this point, we can compute the "borrows in scope" at each point.
  • Finally, we do a second walk over the MIR, looking at the actions it does and reporting errors. For example, if we see a statement like *a + 1, then we would check that the variable a is initialized and that it is not mutably borrowed, as either of those would require an error to be reported. Doing this check requires the results of all the previous analyses.

Tracking moves and initialization

Part of the borrow checker's job is to track which variables are "initialized" at any given point in time -- this also requires figuring out where moves occur and tracking those.

Initialization and moves

From a user's perspective, initialization -- giving a variable some value -- and moves -- transferring ownership to another place -- might seem like distinct topics. Indeed, our borrow checker error messages often talk about them differently. But within the borrow checker, they are not nearly as separate. Roughly speaking, the borrow checker tracks the set of "initialized places" at any point in the source code. Assigning to a previously uninitialized local variable adds it to that set; moving from a local variable removes it from that set.

Consider this example:

fn foo() {
    let a: Vec<u32>;
    
    // a is not initialized yet
    
    a = vec![22];
    
    // a is initialized here
    
    std::mem::drop(a); // a is moved here
    
    // a is no longer initialized here

    let l = a.len(); //~ ERROR
}

Here you can see that a starts off as uninitialized; once it is assigned, it becomes initialized. But when drop(a) is called, that moves a into the call, and hence it becomes uninitialized again.

Subsections

To make it easier to peruse, this section is broken into a number of subsections:

  • Move paths: the move path concept that we use to track which local variables (or parts of local variables, in some cases) are initialized.
  • TODO Rest not yet written =)

Move paths

In reality, it's not enough to track initialization at the granularity of local variables. Rust also allows us to do moves and initialization at the field granularity:

fn foo() {
    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
    
    // a.0 and a.1 are both initialized
    
    let b = a.0; // moves a.0
    
    // a.0 is not initialized, but a.1 still is

    let c = a.0; // ERROR
    let d = a.1; // OK
}

To handle this, we track initialization at the granularity of a move path. A MovePath represents some location that the user can initialize, move, etc. So e.g. there is a move-path representing the local variable a, and there is a move-path representing a.0. Move paths roughly correspond to the concept of a Place from MIR, but they are indexed in ways that enable us to do move analysis more efficiently.

Move path indices

Although there is a MovePath data structure, they are never referenced directly. Instead, all the code passes around indices of type MovePathIndex. If you need to get information about a move path, you use this index with the move_paths field of the MoveData. For example, to convert a MovePathIndex mpi into a MIR Place, you might access the MovePath::place field like so:

move_data.move_paths[mpi].place

Building move paths

One of the first things we do in the MIR borrow check is to construct the set of move paths. This is done as part of the MoveData::gather_moves function. This function uses a MIR visitor called Gatherer to walk the MIR and look at how each Place within is accessed. For each such Place, it constructs a corresponding MovePathIndex. It also records when/where that particular move path is moved/initialized, but we'll get to that in a later section.

Illegal move paths

We don't actually create a move-path for every Place that gets used. In particular, if it is illegal to move from a Place, then there is no need for a MovePathIndex. Some examples:

  • You cannot move from a static variable, so we do not create a MovePathIndex for static variables.
  • You cannot move an individual element of an array, so if we have e.g. foo: [String; 3], there would be no move-path for foo[1].
  • You cannot move from inside of a borrowed reference, so if we have e.g. foo: &String, there would be no move-path for *foo.

These rules are enforced by the move_path_for function, which converts a Place into a MovePathIndex -- in error cases like those just discussed, the function returns an Err. This in turn means we don't have to bother tracking whether those places are initialized (which lowers overhead).

Looking up a move-path

If you have a Place and you would like to convert it to a MovePathIndex, you can do that using the MovePathLookup structure found in the rev_lookup field of MoveData. There are two different methods, illustrated in the sketch after this list:

  • find_local, which takes a mir::Local representing a local variable. This is the easier method, because we always create a MovePathIndex for every local variable.
  • find, which takes an arbitrary Place. This method is a bit more annoying to use, precisely because we don't have a MovePathIndex for every Place (as we just discussed in the "illegal move paths" section). Therefore, find returns a LookupResult indicating the closest path it was able to find that exists (e.g., for foo[1], it might return just the path for foo).
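A hedged sketch of both lookups in use; it assumes a move_data: MoveData, a local: Local, and a place: Place are in scope, and exact signatures vary by compiler version:

// Locals always have a move path, so this lookup cannot fail.
let mpi = move_data.rev_lookup.find_local(local);

// Arbitrary places may not: `find` tells us how close we got.
match move_data.rev_lookup.find(place.as_ref()) {
    LookupResult::Exact(mpi) => {
        // `place` has a move path of its own.
    }
    LookupResult::Parent(opt_mpi) => {
        // No exact path; `opt_mpi` is the closest enclosing one, if any.
    }
}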

Cross-references

As we noted above, move-paths are stored in a big vector and referenced via their MovePathIndex. However, within this vector, they are also structured into a tree. So for example if you have the MovePathIndex for a.b.c, you can go to its parent move-path a.b. You can also iterate over all children paths: so, from a.b, you might iterate to find the path a.b.c (here you are iterating just over the paths that are actually referenced in the source, not all possible paths that could have been referenced). These references are used for example in the find_in_move_path_or_its_descendants function, which determines whether a move-path (e.g., a.b) or any child of that move-path (e.g., a.b.c) matches a given predicate.

The MIR type-check

A key component of the borrow check is the MIR type-check. This check walks the MIR and does a complete "type check" -- the same kind you might find in any other language. In the process of doing this type-check, we also uncover the region constraints that apply to the program.

TODO -- elaborate further? Maybe? :)

Region inference (NLL)

The MIR-based region checking code is located in the rustc_mir::borrow_check module.

The MIR-based region analysis consists of two major functions:

  • replace_regions_in_mir, invoked first, has two jobs:
    • First, it finds the set of regions that appear within the signature of the function (e.g., 'a in fn foo<'a>(&'a u32) { ... }). These are called the "universal" or "free" regions – in particular, they are the regions that appear free in the function body.
    • Second, it replaces all the regions from the function body with fresh inference variables. This is because (presently) those regions are the results of lexical region inference and hence are not of much interest. The intention is that – eventually – they will be "erased regions" (i.e., no information at all), since we won't be doing lexical region inference at all.
  • compute_regions, invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variables that replace_regions_in_mir introduced.
    • To do that, it first runs the MIR type checker. This is basically a normal type-checker but specialized to MIR, which is much simpler than full Rust, of course. Running the MIR type checker will however create various constraints between region variables, indicating their potential values and relationships to one another.
    • After this, we perform constraint propagation by creating a RegionInferenceContext and invoking its solve method.
    • The NLL RFC also includes fairly thorough (and hopefully readable) coverage.

Universal regions

The UniversalRegions type represents a collection of universal regions corresponding to some MIR DefId. It is constructed in replace_regions_in_mir when we replace all regions with fresh inference variables. UniversalRegions contains indices for all the free regions in the given MIR along with any relationships that are known to hold between them (e.g. implied bounds, where clauses, etc.).

For example, given the MIR for the following function:


fn foo<'a>(x: &'a u32) {
    // ...
}

we would create a universal region for 'a and one for 'static. There may also be some complications for handling closures, but we will ignore those for the moment.

TODO: write about how these regions are computed.

Region variables

The value of a region can be thought of as a set. This set contains all points in the MIR where the region is valid along with any regions that are outlived by this region (e.g. if 'a: 'b, then end('b) is in the set for 'a); we call the domain of this set a RegionElement. In the code, the value for all regions is maintained in the rustc_mir::borrow_check::nll::region_infer module. For each region we maintain a set storing what elements are present in its value (to make this efficient, we give each kind of element an index, the RegionElementIndex, and use sparse bitsets).

The kinds of region elements are as follows:

  • Each location in the MIR control-flow graph: a location is just the pair of a basic block and an index. This identifies the point on entry to the statement with that index (or the terminator, if the index is equal to statements.len()).
  • There is an element end('a) for each universal region 'a, corresponding to some portion of the caller's (or caller's caller, etc) control-flow graph.
  • Similarly, there is an element denoted end('static) corresponding to the remainder of program execution after this function returns.
  • There is an element !1 for each placeholder region !1. This corresponds (intuitively) to some unknown set of other elements – for details on placeholders, see the section placeholders and universes.

Constraints

Before we can infer the value of regions, we need to collect constraints on the regions. The full set of constraints is described in the section on constraint propagation, but the two most common sorts of constraints are:

  1. Outlives constraints. These are constraints that one region outlives another (e.g. 'a: 'b). Outlives constraints are generated by the MIR type checker.
  2. Liveness constraints. Each region needs to be live at points where it can be used. These constraints are collected by generate_constraints.

Inference Overview

So how do we compute the contents of a region? This process is called region inference. The high-level idea is pretty simple, but there are some details we need to take care of.

Here is the high-level idea: we start off each region with the MIR locations we know must be in it from the liveness constraints. From there, we use all of the outlives constraints computed from the type checker to propagate the constraints: for each region 'a, if 'a: 'b, then we add all elements of 'b to 'a, including end('b). This all happens in propagate_constraints.

Then, we will check for errors. We first check that type tests are satisfied by calling check_type_tests. This checks constraints like T: 'a. Second, we check that universal regions are not "too big". This is done by calling check_universal_regions. This checks that for each region 'a if 'a contains the element end('b), then we must already know that 'a: 'b holds (e.g. from a where clause). If we don't already know this, that is an error... well, almost. There is some special handling for closures that we will discuss later.

Example

Consider the following example:

fn foo<'a, 'b>(x: &'a usize) -> &'b usize {
    x
}

Clearly, this should not compile because we don't know if 'a outlives 'b (if it doesn't then the return value could be a dangling reference).

Let's back up a bit. We need to introduce some free inference variables (as is done in replace_regions_in_mir). This example doesn't use the exact regions produced, but it (hopefully) is enough to get the idea across.

fn foo<'a, 'b>(x: &'a /* '#1 */ usize) -> &'b /* '#3 */ usize {
    x // '#2, location L1
}

Some notation: '#1, '#3, and '#2 represent the universal regions for the argument, return value, and the expression x, respectively. Additionally, I will call the location of the expression x L1.

So now we can use the liveness constraints to get the following starting points:

Region    Contents
'#1
'#2       L1
'#3       L1

Now we use the outlives constraints to expand each region. Specifically, we know that '#2: '#3 ...

Region    Contents
'#1       L1
'#2       L1, end('#3)    // add contents of '#3 and end('#3)
'#3       L1

... and '#1: '#2, so ...

Region    Contents
'#1       L1, end('#2), end('#3)    // add contents of '#2 and end('#2)
'#2       L1, end('#3)
'#3       L1

Now, we need to check that no regions were too big (we don't have any type tests to check in this case). Notice that '#1 now contains end('#3), but we have no where clause or implied bound to say that 'a: 'b... that's an error!

Some details

The RegionInferenceContext type contains all of the information needed to do inference, including the universal regions from replace_regions_in_mir and the constraints computed for each region. It is constructed just after we compute the liveness constraints.

Here are some of the fields of the struct:

  • constraints: contains all the outlives constraints.
  • liveness_constraints: contains all the liveness constraints.
  • universal_regions: contains the UniversalRegions returned by replace_regions_in_mir.
  • universal_region_relations: contains relations known to be true about universal regions. For example, if we have a where clause that 'a: 'b, that relation is assumed to be true while borrow checking the implementation (it is checked at the caller), so universal_region_relations would contain 'a: 'b.
  • type_tests: contains some constraints on types that we must check after inference (e.g. T: 'a).
  • closure_bounds_mapping: used for propagating region constraints from closures back out to the creator of the closure.

TODO: should we discuss any of the other fields? What about the SCCs?

Ok, now that we have constructed a RegionInferenceContext, we can do inference. This is done by calling the solve method on the context. This is where we call propagate_constraints and then check the resulting type tests and universal regions, as discussed above.

Constraint propagation

The main work of the region inference is constraint propagation, which is done in the propagate_constraints function. There are three sorts of constraints that are used in NLL, and we'll explain how propagate_constraints works by "layering" those sorts of constraints on one at a time (each of them is fairly independent from the others):

  • liveness constraints (R live at E), which arise from liveness;
  • outlives constraints (R1: R2), which arise from subtyping;
  • member constraints (member R_m of [R_c...]), which arise from impl Trait.

In this chapter, we'll explain the "heart" of constraint propagation, covering both liveness and outlives constraints.

Notation and high-level concepts

Conceptually, region inference is a "fixed-point" computation. It is given some set of constraints {C} and it computes a set of values Values: R -> {E} that maps each region R to a set of elements {E} (see here for more notes on region elements):

  • Initially, each region is mapped to an empty set, so Values(R) = {} for all regions R.
  • Next, we process the constraints repeatedly until a fixed-point is reached:
    • For each constraint C:
      • Update Values as needed to satisfy the constraint

As a simple example, if we have a liveness constraint R live at E, then we can apply Values(R) = Values(R) union {E} to make the constraint be satisfied. Similarly, if we have an outlives constraints R1: R2, we can apply Values(R1) = Values(R1) union Values(R2). (Member constraints are more complex and we discuss them in this section.)
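Spelled out as code, that naive fixed-point loop might look like this (a self-contained model of the description above, not the compiler's implementation):

use std::collections::{BTreeSet, HashMap};

type Region = u32;
type Element = u32;

enum Constraint {
    LiveAt(Region, Element),  // "R live at E"
    Outlives(Region, Region), // "R1: R2"
}

// Apply every constraint repeatedly until nothing changes.
fn solve(constraints: &[Constraint]) -> HashMap<Region, BTreeSet<Element>> {
    let mut values: HashMap<Region, BTreeSet<Element>> = HashMap::new();
    let mut changed = true;
    while changed {
        changed = false;
        for c in constraints {
            match c {
                // Values(R) = Values(R) union {E}
                Constraint::LiveAt(r, e) => {
                    changed |= values.entry(*r).or_default().insert(*e);
                }
                // Values(R1) = Values(R1) union Values(R2)
                Constraint::Outlives(r1, r2) => {
                    let rhs = values.get(r2).cloned().unwrap_or_default();
                    let lhs = values.entry(*r1).or_default();
                    for e in rhs {
                        changed |= lhs.insert(e);
                    }
                }
            }
        }
    }
    values
}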

In practice, however, we are a bit more clever. Instead of applying the constraints in a loop, we can analyze the constraints and figure out the correct order to apply them, so that we only have to apply each constraint once in order to find the final result.

Similarly, in the implementation, the Values set is stored in the scc_values field, but they are indexed not by a region but by a strongly connected component (SCC). SCCs are an optimization that avoids a lot of redundant storage and computation. They are explained in the section on outlives constraints.

Liveness constraints

A liveness constraint arises when some variable whose type includes a region R is live at some point P. This simply means that the value of R must include the point P. Liveness constraints are computed by the MIR type checker.

A liveness constraint R live at E is satisfied if E is a member of Values(R). So to "apply" such a constraint to Values, we just have to compute Values(R) = Values(R) union {E}.

The liveness values are computed in the type-check and passed to the region inference upon creation in the liveness_constraints argument. These are not represented as individual constraints like R live at E though; instead, we store a (sparse) bitset per region variable (of type LivenessValues). This way we only need a single bit for each liveness constraint.
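That storage might be modeled as follows (a self-contained sketch; the compiler uses a dedicated sparse bitset type rather than Vec<u64>):

// One row of bits per region variable, one bit per CFG point.
struct LivenessValues {
    rows: Vec<Vec<u64>>, // rows[region][word]: 64 points per word
}

impl LivenessValues {
    fn new(num_regions: usize, num_points: usize) -> Self {
        let words = (num_points + 63) / 64;
        LivenessValues { rows: vec![vec![0; words]; num_regions] }
    }

    // Record "region is live at point" by setting a single bit.
    fn add_element(&mut self, region: usize, point: usize) {
        self.rows[region][point / 64] |= 1u64 << (point % 64);
    }

    fn contains(&self, region: usize, point: usize) -> bool {
        self.rows[region][point / 64] & (1u64 << (point % 64)) != 0
    }
}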

One thing that is worth mentioning: All lifetime parameters are always considered to be live over the entire function body. This is because they correspond to some portion of the caller's execution, and that execution clearly includes the time spent in this function, since the caller is waiting for us to return.

Outlives constraints

An outlives constraint 'a: 'b indicates that the value of 'a must be a superset of the value of 'b. That is, an outlives constraint R1: R2 is satisfied if Values(R1) is a superset of Values(R2). So to "apply" such a constraint to Values, we just have to compute Values(R1) = Values(R1) union Values(R2).

One observation that follows from this is that if you have R1: R2 and R2: R1, then R1 = R2 must be true. Similarly, if you have:

R1: R2
R2: R3
R3: R4
R4: R1

then R1 = R2 = R3 = R4 follows. We take advantage of this to make things much faster, as described shortly.

In the code, the set of outlives constraints is given to the region inference context on creation in a parameter of type OutlivesConstraintSet. The constraint set is basically just a list of 'a: 'b constraints.

The outlives constraint graph and SCCs

In order to work more efficiently with outlives constraints, they are converted into the form of a graph, where the nodes of the graph are region variables ('a, 'b) and each constraint 'a: 'b induces an edge 'a -> 'b. This conversion happens in the RegionInferenceContext::new function that creates the inference context.

When using a graph representation, we can detect regions that must be equal by looking for cycles. That is, if you have a constraint like

'a: 'b
'b: 'c
'c: 'd
'd: 'a

then this will correspond to a cycle in the graph containing the elements 'a...'d.

Therefore, one of the first things that we do in propagating region values is to compute the strongly connected components (SCCs) in the constraint graph. The result is stored in the constraint_sccs field. You can then easily find the SCC that a region r is a part of by invoking constraint_sccs.scc(r).

Working in terms of SCCs allows us to be more efficient: if we have a set of regions 'a...'d that are part of a single SCC, we don't have to compute/store their values separately. We can just store one value for the SCC, since they must all be equal.

If you look over the region inference code, you will see that a number of fields are defined in terms of SCCs. For example, the scc_values field stores the values of each SCC. To get the value of a specific region 'a then, we first figure out the SCC that the region is a part of, and then find the value of that SCC.

When we compute SCCs, we not only figure out which regions are a member of each SCC, we also figure out the edges between them. So for example consider this set of outlives constraints:

'a: 'b
'b: 'a

'a: 'c

'c: 'd
'd: 'c

Here we have two SCCs: S0 contains 'a and 'b, and S1 contains 'c and 'd. But these SCCs are not independent: because 'a: 'c, that means that S0: S1 as well. That is -- the value of S0 must be a superset of the value of S1. One crucial thing is that this graph of SCCs is always a DAG -- that is, it never has cycles. This is because all the cycles have been removed to form the SCCs themselves.

Applying liveness constraints to SCCs

The liveness constraints that come in from the type-checker are expressed in terms of regions -- that is, we have a map like Liveness: R -> {E}. But we want our final result to be expressed in terms of SCCs -- we can integrate these liveness constraints very easily just by taking the union:

for each region R:
  let S be the SCC that contains R
  Values(S) = Values(S) union Liveness(R)

In the region inferencer, this step is done in RegionInferenceContext::new.

Applying outlives constraints

Once we have computed the DAG of SCCs, we use that to structure our entire computation. If we have an edge S1 -> S2 between two SCCs, that means that Values(S1) >= Values(S2) must hold. So, to compute the value of S1, we first compute the values of each successor S2. Then we simply union all of those values together. To use a quasi-iterator-like notation:

Values(S1) =
  s1.successors()
    .map(|s2| Values(s2))
    .union()

In the code, this work starts in the propagate_constraints function, which iterates over all the SCCs. For each SCC S1, we compute its value by first computing the value of its successors. Since SCCs form a DAG, we don't have to be concerned about cycles, though we do need to keep a set around to track whether we have already processed a given SCC or not. For each successor S2, once we have computed S2's value, we can union those elements into the value for S1. (Although we have to be careful in this process to properly handle higher-ranked placeholders.) Note that the value for S1 already contains the liveness constraints, since they were added in RegionInferenceContext::new.

Once that process is done, we now have the "minimal value" for S1, taking into account all of the liveness and outlives constraints. However, in order to complete the process, we must also consider member constraints, which are described in a later section.
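Here is a small self-contained model of that DAG walk (illustrative names; the real code in propagate_constraints is organized differently):

use std::collections::{BTreeSet, HashMap};

type Scc = usize;
type Element = u32;

// Compute each SCC's value as its own liveness elements unioned with
// its successors' values, memoizing SCCs we have already processed.
// Assumes the successor graph is a DAG, as described above.
fn value_of(
    scc: Scc,
    successors: &HashMap<Scc, Vec<Scc>>,
    liveness: &HashMap<Scc, BTreeSet<Element>>,
    memo: &mut HashMap<Scc, BTreeSet<Element>>,
) -> BTreeSet<Element> {
    if let Some(v) = memo.get(&scc) {
        return v.clone();
    }
    // Start from the liveness elements added in RegionInferenceContext::new.
    let mut value = liveness.get(&scc).cloned().unwrap_or_default();
    for &succ in successors.get(&scc).into_iter().flatten() {
        value.extend(value_of(succ, successors, liveness, memo));
    }
    memo.insert(scc, value.clone());
    value
}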

Universal regions

"Universal regions" is the name that the code uses to refer to "named lifetimes" -- e.g., lifetime parameters and 'static. The name derives from the fact that such lifetimes are "universally quantified" (i.e., we must make sure the code is true for all values of those lifetimes). It is worth spending a bit of discussing how lifetime parameters are handled during region inference. Consider this example:

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
  x
}

This example is intended not to compile, because we are returning x, which has type &'a u32, but our signature promises that we will return a &'b u32 value. But how are lifetimes like 'a and 'b integrated into region inference, and how does this error wind up being detected?

Universal regions and their relationships to one another

Early on in region inference, one of the first things we do is to construct a UniversalRegions struct. This struct tracks the various universal regions in scope on a particular function. We also create a UniversalRegionRelations struct, which tracks their relationships to one another. So if you have e.g. where 'a: 'b, then the UniversalRegionRelations struct would track that 'a: 'b is known to hold (which can be tested with the outlives function).

Everything is a region variable

One important aspect of how NLL region inference works is that all lifetimes are represented as numbered variables. This means that the only variant of ty::RegionKind that we use is the ReVar variant. These region variables are broken into two major categories, based on their index:

  • 0..N: universal regions -- the ones we are discussing here. In this case, the code must be correct with respect to any value of those variables that meets the declared relationships.
  • N..M: existential regions -- inference variables where the region inferencer is tasked with finding some suitable value.

In fact, the universal regions can be further subdivided based on where they were brought into scope (see the RegionClassification type). These subdivisions are not important for the topics discussed here, but become important when we consider closure constraint propagation, so we discuss them there.

Universal lifetimes as the elements of a region's value

As noted previously, the value that we infer for each region is a set {E}. The elements of this set can be points in the control-flow graph, but they can also be an element end('a) corresponding to each universal lifetime 'a. If the value for some region R0 includes end('a), then this implies that R0 must extend until the end of 'a in the caller.
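As a rough sketch, one can picture the element type like this (names are illustrative; the compiler's actual representation differs):

/// Illustrative only: the kinds of elements that can appear in a
/// region's inferred value.
enum RegionElement {
    /// A point in the control-flow graph.
    Location(usize),
    /// end('a): the region must extend to the end of the universal
    /// region 'a in the caller.
    EndOfUniversalRegion(usize),
    /// placeholder(n): used for higher-ranked regions (see the section
    /// on placeholders and universes below).
    Placeholder(usize),
}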

The "value" of a universal region

During region inference, we compute a value for each universal region in the same way as we compute values for other regions. This value represents, effectively, the lower bound on that universal region -- the things that it must outlive. We now describe how we use this value to check for errors.

Liveness and universal regions

All universal regions have an initial liveness constraint that includes the entire function body. This is because lifetime parameters are defined in the caller and must include the entirety of the function call that invokes this particular function. In addition, each universal region 'a includes itself (that is, end('a)) in its liveness constraint (i.e., 'a must extend until the end of itself). In the code, these liveness constraints are set up in init_free_and_bound_regions.

Propagating outlives constraints for universal regions

So, consider the first example of this section:

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
  x
}

Here, returning x requires that &'a u32 <: &'b u32, which gives rise to an outlives constraint 'a: 'b. Combined with our default liveness constraints we get:

'a live at {B, end('a)} // B represents the "function body"
'b live at {B, end('b)}
'a: 'b

When we process the 'a: 'b constraint, therefore, we will add end('b) into the value for 'a, resulting in a final value of {B, end('a), end('b)}.

Detecting errors

Once we have finished constraint propagation, we then enforce a constraint that if some universal region 'a includes an element end('b), then 'a: 'b must be declared in the function's bounds. If not, as in our example, that is an error. This check is done in the check_universal_regions function, which simply iterates over all universal regions, inspects their final value, and tests against the declared UniversalRegionRelations.
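A hedged sketch of that final check, with illustrative types and helpers standing in for the real ones:

type RegionVid = usize;

/// Sketch: for each universal region 'a, every end('b) element in its
/// final value must be justified by a declared 'a: 'b relationship.
fn check_universal_regions(
    universal_regions: &[RegionVid],
    final_end_elements: impl Fn(RegionVid) -> Vec<RegionVid>,
    declared_outlives: impl Fn(RegionVid, RegionVid) -> bool,
) -> Result<(), (RegionVid, RegionVid)> {
    for &r in universal_regions {
        for b in final_end_elements(r) {
            if !declared_outlives(r, b) {
                return Err((r, b)); // report an error: `r` must outlive `b`
            }
        }
    }
    Ok(())
}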

Member constraints

A member constraint 'm member of ['c_1..'c_N] expresses that the region 'm must be equal to one of the choice regions 'c_i (for some i). These constraints cannot be expressed by users, but they arise from impl Trait due to its lifetime capture rules. Consider a function such as the following:

fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }

Here, the true return type (often called the "hidden type") is only permitted to capture the lifetimes 'a or 'b. You can kind of see this more clearly by desugaring that impl Trait return type into its more explicit form:

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }

Here, the idea is that the hidden type must be some type that could have been written in place of the impl Trait<'x, 'y> -- but clearly such a type can only reference the regions 'x or 'y (or 'static!), as those are the only names in scope. This limitation is then translated into a restriction to only access 'a or 'b because we are returning MakeReturn<'a, 'b>, where 'x and 'y have been replaced with 'a and 'b respectively.

Detailed example

To help us explain member constraints in more detail, let's spell out the make example in a bit more detail. First off, let's assume that you have some dummy trait:

trait Trait<'a, 'b> { }
impl<T> Trait<'_, '_> for T { }

and this is the make function (in desugared form):

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> {
  (a, b)
}

What happens in this case is that the return type will be (&'0 u32, &'1 u32), where '0 and '1 are fresh region variables. We will have the following region constraints:

'0 live at {L}
'1 live at {L}
'a: '0
'b: '1
'0 member of ['a, 'b, 'static]
'1 member of ['a, 'b, 'static]

Here the "liveness set" {L} corresponds to that subset of the body where '0 and '1 are live -- basically the point from where the return tuple is constructed to where it is returned (in fact, '0 and '1 might have slightly different liveness sets, but that's not very interesting to the point we are illustrating here).

The 'a: '0 and 'b: '1 constraints arise from subtyping. When we construct the (a, b) value, it will be assigned type (&'0 u32, &'1 u32) -- the region variables reflect that the lifetimes of these references could be made smaller. For this value to be created from a and b, however, we do require that:

(&'a u32, &'b u32) <: (&'0 u32, &'1 u32)

which means in turn that &'a u32 <: &'0 u32 and hence that 'a: '0 (and similarly that &'b u32 <: &'1 u32, 'b: '1).

Note that if we ignore member constraints, the value of '0 would be inferred to some subset of the function body (from the liveness constraints, which we did not write explicitly). It would never become 'a, because there is no need for it to -- we have a constraint that 'a: '0, but that just puts a "cap" on how large '0 can grow to become. Since we compute the minimal value that we can, we are happy to leave '0 as being just equal to the liveness set. This is where member constraints come in.

Choices are always lifetime parameters

At present, the "choice" regions from a member constraint are always lifetime parameters from the current function. This falls out from the placement of impl Trait, though in the future it may not be the case. We take some advantage of this fact, as it simplifies the current code. In particular, we don't have to consider a case like '0 member of ['1, 'static], in which the value of both '0 and '1 are being inferred and hence changing. See rust-lang/rust#61773 for more information.

Applying member constraints

Member constraints are a bit more complex than other forms of constraints. This is because they have an "or" quality to them -- that is, they describe multiple choices that we must select from. E.g., in our example constraint '0 member of ['a, 'b, 'static], it might be that '0 is equal to 'a, 'b, or 'static. How can we pick the correct one? What we currently do is to look for a minimal choice -- if we find one, then we will grow '0 to be equal to that minimal choice. To find that minimal choice, we take two factors into consideration: lower and upper bounds.

Lower bounds

The lower bounds are those lifetimes that '0 must outlive -- i.e., that '0 must be larger than. In fact, when it comes time to apply member constraints, we've already computed the lower bounds of '0 because we computed its minimal value (or at least, the lower bounds considering everything but member constraints).

Let LB be the current value of '0. We know then that '0: LB must hold, whatever the final value of '0 is. Therefore, we can rule out any choice 'choice where 'choice: LB does not hold.

Unfortunately, in our example, this is not very helpful. The lower bound for '0 will just be the liveness set {L}, and we know that all the lifetime parameters outlive that set. So we are left with the same set of choices here. (But in other examples, particularly those with different variance, lower bound constraints may be relevant.)

Upper bounds

The upper bounds are those lifetimes that must outlive '0 -- i.e., that '0 must be smaller than. In our example, this would be 'a, because we have the constraint that 'a: '0. In more complex examples, the chain may be more indirect.

We can use upper bounds to rule out members in a very similar way to lower bounds. If UB is some upper bound, then we know that UB: '0 must hold, so we can rule out any choice 'choice where UB: 'choice does not hold.

In our example, we would be able to reduce our choice set from ['a, 'b, 'static] to just ['a]. This is because '0 has an upper bound of 'a, and neither 'a: 'b nor 'a: 'static is known to hold.

(For notes on how we collect upper bounds in the implementation, see the section below.)

Minimal choice

After applying lower and upper bounds, we can still sometimes have multiple possibilities. For example, imagine a variant of our example using types with the opposite variance. In that case, we would have the constraint '0: 'a instead of 'a: '0. Hence the current value of '0 would be {L, 'a}. Using this as a lower bound, we would be able to narrow down the member choices to ['a, 'static] because 'b: 'a is not known to hold (but 'a: 'a and 'static: 'a do hold). We would not have any upper bounds, so that would be our final set of choices.

In that case, we apply the minimal choice rule -- basically, if one of our choices is smaller than the others, we can use that. In this case, we would opt for 'a (and not 'static).

This choice is consistent with the general 'flow' of region propagation, which always aims to compute a minimal value for the region being inferred. However, it is somewhat arbitrary.
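Putting the pieces together, here is a minimal sketch of applying a member constraint, with outlives standing in for the "known to outlive" test against declared relationships (illustrative types only):

type Region = usize;

/// Apply a member constraint `'m member of choices`, given the lower
/// bounds (the current value of 'm) and any upper bounds collected from
/// the constraint graph.
fn apply_member_constraint(
    choices: &[Region],
    lower_bounds: &[Region],
    upper_bounds: &[Region],
    outlives: impl Fn(Region, Region) -> bool,
) -> Option<Region> {
    let candidates: Vec<Region> = choices
        .iter()
        .copied()
        // rule out choices that do not outlive every lower bound
        .filter(|&c| lower_bounds.iter().all(|&lb| outlives(c, lb)))
        // rule out choices that are not outlived by every upper bound
        .filter(|&c| upper_bounds.iter().all(|&ub| outlives(ub, c)))
        .collect();
    // the minimal choice is one that every remaining candidate outlives
    candidates
        .iter()
        .copied()
        .find(|&c| candidates.iter().all(|&other| outlives(other, c)))
}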

Collecting upper bounds in the implementation

In practice, computing upper bounds is a bit inconvenient, because our data structures are set up for the opposite. What we do is to compute the reverse SCC graph (we do this lazily and cache the result) -- that is, a graph where 'a: 'b induces an edge SCC('b) -> SCC('a). Like the normal SCC graph, this is a DAG. We can then do a depth-first search starting from SCC('0) in this graph. This will take us to all the SCCs that must outlive '0.

One wrinkle is that, as we walk the "upper bound" SCCs, their values will not yet have been fully computed. However, we have already applied their liveness constraints, so we have some information about their value. In particular, for any regions representing lifetime parameters, their value will contain themselves (i.e., the initial value for 'a includes 'a and the value for 'b contains 'b). So we can collect all of the lifetime parameters that are reachable, which is precisely what we are interested in.
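A hedged sketch of that reverse-graph walk, with hypothetical names (the real implementation caches the reverse SCC graph and works over interned region values):

use std::collections::{HashMap, HashSet};

type Scc = usize;

/// Collect the universal lifetimes that must outlive `start` (its upper
/// bounds) by walking the *reverse* SCC graph: an edge `'a: 'b` induces
/// SCC('b) -> SCC('a), so everything reachable from SCC('0) outlives '0.
fn upper_bounds(
    start: Scc,
    reverse_edges: &HashMap<Scc, Vec<Scc>>,
    // which universal lifetimes each SCC's (liveness-only) value contains
    universal_members: &HashMap<Scc, Vec<usize>>,
) -> Vec<usize> {
    let mut visited = HashSet::new();
    let mut stack = vec![start];
    let mut bounds = Vec::new();
    while let Some(scc) = stack.pop() {
        if !visited.insert(scc) {
            continue; // already walked this SCC
        }
        bounds.extend(universal_members.get(&scc).into_iter().flatten().copied());
        stack.extend(reverse_edges.get(&scc).into_iter().flatten().copied());
    }
    bounds
}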

Placeholders and universes

From time to time we have to reason about regions that we can't concretely know. For example, consider this program:

// A function that needs a static reference
fn foo(x: &'static u32) { }

fn bar(f: for<'a> fn(&'a u32)) {
       // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference
    let x = 22;
    f(&x);
}

fn main() {
    bar(foo);
}

This program ought not to type-check: foo needs a static reference for its argument, and bar wants to be given a function that accepts any reference (so it can call it with something on its stack, for example). But how do we reject it and why?

Subtyping and Placeholders

When we type-check main, and in particular the call bar(foo), we are going to wind up with a subtyping relationship like this one:

fn(&'static u32) <: for<'a> fn(&'a u32)
----------------    -------------------
the type of `foo`   the type `bar` expects

We handle this sort of subtyping by taking the variables that are bound in the supertype and replacing them with universally quantified representatives, denoted like !1 here. We call these regions "placeholder regions" – they represent, basically, "some unknown region".

Once we've done that replacement, we have the following relation:

fn(&'static u32) <: fn(&'!1 u32)

The key idea here is that this unknown region '!1 is not related to any other regions. So if we can prove that the subtyping relationship is true for '!1, then it ought to be true for any region, which is what we wanted.

So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship (fn arguments are contravariant, so we swap the left and right here):

&'!1 u32 <: &'static u32

According to the basic subtyping rules for a reference, this will be true if '!1: 'static. That is – if "some unknown region !1" outlives 'static. Now, this might be true – after all, '!1 could be 'static – but we don't know that it's true. So this should yield up an error (eventually).

What is a universe?

In the previous section, we introduced the idea of a placeholder region, and we denoted it !1. We call this number 1 the universe index. The idea of a "universe" is that it is a set of names that are in scope within some type or at some point. Universes are formed into a tree, where each child extends its parents with some new names. So the root universe conceptually contains global names, such as the lifetime 'static or the type i32. In the compiler, we also put generic type parameters into this root universe (in this sense, there is not just one root universe, but one per item). So consider this function bar:

struct Foo { }

fn bar<'a, T>(t: &'a T) {
    ...
}

Here, the root universe would consist of the lifetimes 'static and 'a. In fact, although we're focused on lifetimes here, we can apply the same concept to types, in which case the types Foo and T would be in the root universe (along with other global types, like i32). Basically, the root universe contains all the names that appear free in the body of bar.

Now let's extend bar a bit by adding a variable x:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
}

Here, the name 'b is not part of the root universe. Instead, when we "enter" into this for<'b> (e.g., by replacing it with a placeholder), we will create a child universe of the root, let's call it U1:

U0 (root universe)
│
└─ U1 (child universe)

The idea is that this child universe U1 extends the root universe U0 with a new name, which we are identifying by its universe number: !1.

Now let's extend bar a bit by adding one more variable, y:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
    let y: for<'c> fn(&'c u32) = ...;
}

When we enter this type, we will again create a new universe, which we'll call U2. Its parent will be the root universe, and U1 will be its sibling:

U0 (root universe)
│
├─ U1 (child universe)
│
└─ U2 (child universe)

This implies that, while in U2, we can name things from U0 or U2, but not U1.

Giving existential variables a universe. Now that we have this notion of universes, we can use it to extend our type-checker and things to prevent illegal names from leaking out. The idea is that we give each inference (existential) variable – whether it be a type or a lifetime – a universe. That variable's value can then only reference names visible from that universe. So for example if a lifetime variable is created in U0, then it cannot be assigned a value of !1 or !2, because those names are not visible from the universe U0.

Representing universes with just a counter. You might be surprised to see that the compiler doesn't keep track of a full tree of universes. Instead, it just keeps a counter – and, to determine if one universe can see another one, it just checks if the index is greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see U2, because 0 >= 2 is false.
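A minimal sketch of that scheme (the real compiler's UniverseIndex is similar in spirit, though the details differ):

/// Sketch of how universes reduce to a counter.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct UniverseIndex(u32);

impl UniverseIndex {
    const ROOT: UniverseIndex = UniverseIndex(0);

    /// `self` can see (name things from) `other` if it is a descendant
    /// of it, which with the counter scheme is just a comparison.
    fn can_name(self, other: UniverseIndex) -> bool {
        self.0 >= other.0
    }
}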

How can we get away with this? Doesn't this mean that we would allow U2 to also see U1? The answer is that, yes, we would, if that question ever arose. But because of the structure of our type checker etc, there is no way for that to happen. In order for something happening in the universe U1 to "communicate" with something happening in U2, they would have to have a shared inference variable X in common. And because everything in U1 is scoped to just U1 and its children, that inference variable X would have to be in U0. And since X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest to see by using a kind of generic "logic" example:

exists<X> {
   forall<Y> { ... /* Y is in U1 ... */ }
   forall<Z> { ... /* Z is in U2 ... */ }
}

Here, the only way for the two foralls to interact would be through X, but neither Y nor Z are in scope when X is declared, so its value cannot reference either of them.

Universes and placeholder region elements

But where does that error come from? The way it happens is like this. When we are constructing the region inference context, we can tell from the type inference context how many placeholder variables exist (the InferCtxt has an internal counter). For each of those, we create a corresponding universal region variable !n and a "region element" placeholder(n). This corresponds to "some unknown set of other elements". The value of !n is {placeholder(n)}.

At the same time, we also give each existential variable a universe (also taken from the InferCtxt). This universe determines which placeholder elements may appear in its value: For example, a variable in universe U3 may name placeholder(1), placeholder(2), and placeholder(3), but not placeholder(4). Note that the universe of an inference variable controls what region elements can appear in its value; it does not say region elements will appear.

Placeholders and outlives constraints

In the region inference engine, outlives constraints have the form:

V1: V2 @ P

where V1 and V2 are region indices, and hence map to some region variable (which may be universally or existentially quantified). The P here is a "point" in the control-flow graph; it's not important for this section. Each of these variables will have a universe, so let's call those universes U(V1) and U(V2) respectively. (Actually, the only one we are going to care about is U(V1).)

When we encounter this constraint, the ordinary procedure is to start a DFS from P. We keep walking so long as the nodes we are walking are present in value(V2) and we add those nodes to value(V1). If we reach a return point, we add in any end(X) elements. That part remains unchanged.

But then after that we want to iterate over the placeholder placeholder(x) elements in V2 (each of those must be visible to U(V2), but we should be able to just assume that is true, we don't have to check it). We have to ensure that value(V1) outlives each of those placeholder elements.

Now there are two ways that could happen. First, if U(V1) can see the universe x (i.e., x <= U(V1)), then we can just add placeholder(x) to value(V1) and be done. But if not, then we have to approximate: we may not know what set of elements placeholder(x) represents, but we should be able to compute some sort of upper bound B for it – some region B that outlives placeholder(x). For now, we'll just use 'static for that (since it outlives everything) – in the future, we can sometimes be smarter here (and in fact we have code for doing this already in other contexts). Moreover, since 'static is in the root universe U0, we know that all variables can see it – so basically if we find that value(V2) contains placeholder(x) for some universe x that V1 can't see, then we force V1 to 'static.
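As a sketch, the decision when a placeholder element flows into V1 might look like this (names and callbacks are illustrative, not the compiler's API):

type RegionVid = usize;
type UniverseIndex = u32;

/// Sketch of handling a placeholder(x) element found in value(V2) while
/// processing `V1: V2 @ P`.
fn propagate_placeholder(
    v1: RegionVid,
    placeholder_universe: UniverseIndex,
    u_v1: UniverseIndex,
    add_placeholder_to: &mut dyn FnMut(RegionVid, UniverseIndex),
    force_to_static: &mut dyn FnMut(RegionVid),
) {
    if placeholder_universe <= u_v1 {
        // U(V1) can see the placeholder's universe: just add the element.
        add_placeholder_to(v1, placeholder_universe);
    } else {
        // Otherwise approximate with an upper bound; for now, 'static.
        force_to_static(v1);
    }
}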

Extending the "universal regions" check

After all constraints have been propagated, the NLL region inference has one final check, where it goes over the values that wound up being computed for each universal region and checks that they did not get 'too large'. In our case, we will go through each placeholder region and check that it contains only the placeholder(u) element it is known to outlive. (Later, we might be able to know that there are relationships between two placeholder regions and take those into account, as we do for universal regions from the fn signature.)

Put another way, the "universal regions" check can be considered to be checking constraints like:

{placeholder(1)}: V1

where {placeholder(1)} is like a constant set, and V1 is the variable we made to represent the !1 region.

Back to our example

OK, so far so good. Now let's walk through what would happen with our first example:

fn(&'static u32) <: fn(&'!1 u32) @ P  // this point P is not imp't here

The region inference engine will create a region element domain like this:

{ CFG; end('static); placeholder(1) }
  ---  ------------  ------- from the universe `!1`
  |    'static is always in scope
  all points in the CFG; not especially relevant here

It will always create two universal variables, one representing 'static and one representing '!1. Let's call them Vs and V1. They will have initial values like so:

Vs = { CFG; end('static) } // it is in U0, so can't name anything else
V1 = { placeholder(1) }

From the subtyping constraint above, we would have an outlives constraint like

'!1: 'static @ P

To process this, we would grow the value of V1 to include all of Vs:

Vs = { CFG; end('static) }
V1 = { CFG; end('static), placeholder(1) }

At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large.

In this case, V1 did grow too large – it is not known to outlive end('static), nor any of the CFG – so we would report an error.

Another example

What about this subtyping relationship?

for<'a> fn(&'a u32, &'a u32)
    <:
for<'b, 'c> fn(&'b u32, &'c u32)

Here we would replace the bound region in the supertype with a placeholder, as before, yielding:

for<'a> fn(&'a u32, &'a u32)
    <:
fn(&'!1 u32, &'!2 u32)

then we instantiate the variable on the left-hand side with an existential in universe U2, yielding the following (?n is a notation for an existential variable):

fn(&'?3 u32, &'?3 u32)
    <:
fn(&'!1 u32, &'!2 u32)

Then we break this down further:

&'!1 u32 <: &'?3 u32
&'!2 u32 <: &'?3 u32

and even further, yield up our region constraints:

'!1: '?3
'!2: '?3

Note that, in this case, both '!1 and '!2 have to outlive the variable '?3, but the variable '?3 is not forced to outlive anything else. Therefore, it simply starts and ends as the empty set of elements, and hence the type-check succeeds here.

(This should surprise you a little. It surprised me when I first realized it. We are saying that if we are a fn that needs both of its arguments to have the same region, we can accept being called with arguments with two distinct regions. That seems intuitively unsound. But in fact, it's fine, as I tried to explain in this issue on the Rust issue tracker long ago. The reason is that even if we get called with arguments of two distinct lifetimes, those two lifetimes have some intersection (the call itself), and that intersection can be our value of 'a that we use as the common lifetime of our arguments. -nmatsakis)

Final example

Let's look at one last example. We'll extend the previous one to have a return type:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32

Despite seeming very similar to the previous example, this case is going to get an error. That's good: the problem is that we've gone from a fn that promises to return one of its two arguments, to a fn that is promising to return the first one. That is unsound. Let's see how it plays out.

First, we replace the bound region in the supertype with a placeholder:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

Then we instantiate the subtype with existentials (in U2):

fn(&'?3 u32, &'?3 u32) -> &'?3 u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

And now we create the subtyping relationships:

&'!1 u32 <: &'?3 u32 // arg 1
&'!2 u32 <: &'?3 u32 // arg 2
&'?3 u32 <: &'!1 u32 // return type

And finally the outlives relationships. Here, let V1, V2, and V3 be the variables we assign to !1, !2, and ?3 respectively:

V1: V3
V2: V3
V3: V1

Those variables will have these initial values:

V1 in U1 = {placeholder(1)}
V2 in U2 = {placeholder(2)}
V3 in U2 = {}

Now because of the V3: V1 constraint, we have to add placeholder(1) into V3 (and indeed it is visible from V3), so we get:

V3 in U2 = {placeholder(1)}

then we have this constraint V2: V3, so we wind up having to enlarge V2 to include placeholder(1) (which it can also see):

V2 in U2 = {placeholder(1), placeholder(2)}

Now constraint propagation is done, but when we check the outlives relationships, we find that V2 includes this new element placeholder(1), so we report an error.

Propagating closure constraints

When we are checking the type tests and universal regions, we may come across a constraint that we can't prove yet if we are in a closure body! However, the necessary constraints may actually hold (we just don't know it yet). Thus, if we are inside a closure, we just collect all the constraints we can't prove yet and return them. Later, when we are borrow checking the MIR node that created the closure, we can also check that these constraints hold. At that time, if we can't prove they hold, we report an error.

Reporting region errors

TODO: we should discuss how to generate errors from the results of these analyses.

Two-phase borrows

Two-phase borrows are a more permissive version of mutable borrows that allow nested method calls such as vec.push(vec.len()). Such borrows first act as shared borrows in a "reservation" phase and can later be "activated" into a full mutable borrow.

Only certain implicit mutable borrows can be two-phase; any &mut or ref mut in the source code is never a two-phase borrow. The cases where we generate a two-phase borrow are:

  1. The autoref borrow when calling a method with a mutable reference receiver.
  2. A mutable reborrow in function arguments.
  3. The implicit mutable borrow in an overloaded compound assignment operator.

To give some examples:


// In the source code

// Case 1:
let mut v = Vec::new();
v.push(v.len());
let r = &mut Vec::new();
r.push(r.len());

// Case 2:
std::mem::replace(r, vec![1, r.len()]);

// Case 3:
let mut x = std::num::Wrapping(2);
x += x;

Expanding these enough to show the two-phase borrows:

// Case 1:
let mut v = Vec::new();
let temp1 = &two_phase v;
let temp2 = v.len();
Vec::push(temp1, temp2);
let r = &mut Vec::new();
let temp3 = &two_phase *r;
let temp4 = r.len();
Vec::push(temp3, temp4);

// Case 2:
let temp5 = &two_phase *r;
let temp6 = vec![1, r.len()];
std::mem::replace(temp5, temp6);

// Case 3:
let mut x = std::num::Wrapping(2);
let temp7 = &two_phase x;
let temp8 = x;
std::ops::AddAssign::add_assign(temp7, temp8);

Whether a borrow can be two-phase is tracked by a flag on the AutoBorrow after type checking, which is then converted to a BorrowKind during MIR construction.
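Roughly, the flag is threaded through like this (a sketch; the exact enums in the typeck and MIR layers differ in detail):

/// Sketch of how the two-phase flag is carried along (illustrative).
enum AllowTwoPhase {
    Yes,
    No,
}

enum BorrowKind {
    Shared,
    /// A mutable borrow; two-phase only for the implicit cases above.
    Mut { allow_two_phase_borrow: bool },
}

fn to_borrow_kind(allow: AllowTwoPhase) -> BorrowKind {
    BorrowKind::Mut {
        allow_two_phase_borrow: matches!(allow, AllowTwoPhase::Yes),
    }
}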

Each two-phase borrow is assigned to a temporary that is only used once. As such we can define:

  • The point where the temporary is assigned to is called the reservation point of the two-phase borrow.
  • The point where the temporary is used, which is effectively always a function call, is called the activation point.

The activation points are found using the GatherBorrows visitor. The BorrowData then holds both the reservation and activation points for the borrow.

Checking two-phase borrows

Two-phase borrows are treated as if they were mutable borrows with the following exceptions:

  1. At every location in the MIR we check if any two-phase borrows are activated at this location. If a live two-phase borrow is activated at a location, then we check that there are no borrows that conflict with the two-phase borrow.
  2. At the reservation point we error if there are conflicting live mutable borrows, and we lint if there are any conflicting shared borrows.
  3. Between the reservation and the activation point, the two-phase borrow acts as a shared borrow. We determine (in is_active) if we're at such a point by using the Dominators for the MIR graph.
  4. After the activation point, the two-phase borrow acts as a mutable borrow.

Parameter Environment

When working with associated and/or generic items (types, constants, functions/methods), it is often relevant to have more information about the Self or generic parameters. Trait bounds and similar information is encoded in the ParamEnv. Often this is not enough information to obtain things like the type's Layout, but you can do all kinds of other checks on it (e.g. whether a type implements Copy) or you can evaluate an associated constant whose value does not depend on anything from the parameter environment.

For example if you have a function


fn foo<T: Copy>(t: T) { ... }

the parameter environment for that function is [T: Copy]. This means any evaluation within this function will, when accessing the type T, know about its Copy bound via the parameter environment.

You can get the parameter environment for a def_id using the param_env query. However, this ParamEnv can be too generic for your use case. Using the ParamEnv from the surrounding context can allow you to evaluate more things. For example, suppose we had something like the following:


trait Foo {
    type Assoc;
}

trait Bar { }

trait Baz {
    fn stuff() -> bool;
}

fn foo<T>(t: T)
where
    T: Foo,
    <T as Foo>::Assoc: Bar
{
   bar::<T::Assoc>()
}

fn bar<T: Baz>() {
    if T::stuff() { mep() } else { mop() }
}

We may know some things inside bar that we wouldn't know if we just fetched bar's param env because of the <T as Foo>::Assoc: Bar bound in foo. This is a contrived example that makes no sense in our existing analyses, but we may run into similar cases when doing analyses with associated constants on generic traits or traits with assoc types.

Bundling

Another great thing about ParamEnv is that you can use it to bundle a value that depends on generic parameters (e.g. a Ty) by calling the and method. This will produce a ParamEnvAnd<Ty>, making clear that you should probably not be using the inner value without taking care to also use the ParamEnv.
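A minimal sketch of the bundling pattern, with illustrative stand-ins for the compiler types (the real ParamEnv::and behaves like this, but the surrounding types are far richer):

/// Illustrative stand-ins for the real compiler types.
#[derive(Clone, Copy)]
struct ParamEnv;
#[derive(Clone, Copy)]
struct Ty;
struct ParamEnvAnd<T> {
    param_env: ParamEnv,
    value: T,
}

impl ParamEnv {
    /// Bundle a value with this parameter environment, mirroring
    /// `ParamEnv::and` in the compiler.
    fn and<T>(self, value: T) -> ParamEnvAnd<T> {
        ParamEnvAnd { param_env: self, value }
    }
}

fn main() {
    let param_env = ParamEnv; // e.g. obtained via the param_env query
    let ty = Ty;
    let query_input = param_env.and(ty);
    // A consumer (e.g. a layout query) would unpack both halves:
    let ParamEnvAnd { param_env: _env, value: _ty } = query_input;
}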

From MIR to Binaries

All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.

So far, we've shown how the compiler can take raw source code in text format and transform it into MIR. We have also shown how the compiler does various analyses on the code to detect things like type or lifetime errors. Now, we will finally take the MIR and produce some executable machine code.

NOTE: This part of a compiler is often called the backend. The term is a bit overloaded because in the compiler source, it usually refers to the "codegen backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend" in this part, we are referring to the "codegen backend".

So what do we need to do?

  1. First, we need to collect the set of things to generate code for. In particular, we need to find out which concrete types to substitute for generic ones, since we need to generate code for the concrete types. Generating code for the concrete types (i.e. emitting a copy of the code for each concrete type) is called monomorphization, so the process of collecting all the concrete types is called monomorphization collection.
  2. Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR) for each concrete type we collected.
  3. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of optimization passes, generates executable code, and links together an executable binary.

The code for codegen is actually a bit complex due to a few factors:

  • Support for multiple codegen backends (LLVM and Cranelift). We try to share as much backend code between them as possible, so a lot of it is generic over the codegen implementation. This means that there are often a lot of layers of abstraction.
  • Codegen happens asynchronously in another thread for performance.
  • The actual codegen is done by a third-party library (either LLVM or Cranelift).

Generally, the rustc_codegen_ssa crate contains backend-agnostic code (i.e. independent of LLVM or Cranelift), while the rustc_codegen_llvm crate contains code specific to LLVM codegen.

At a very high level, the entry point is rustc_codegen_ssa::base::codegen_crate. This function starts the process discussed in the rest of this chapter.

MIR (Mid-level IR)

MIR is Rust's Mid-level Intermediate Representation. It was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks -- notably the borrow checker! -- and also for optimization and code generation. If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (like control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.

Introduction to MIR

MIR is defined in the src/librustc_middle/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

Key MIR vocabulary

This section introduces the key concepts of MIR, summarized as follows:

  • Basic blocks: units of the control-flow graph, consisting of:
    • statements: actions with one successor
    • terminators: actions with potentially multiple successors; always at the end of a block
    • (if you're not familiar with the term basic block, see the background material)
  • Locals: memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
  • Places: expressions that identify a location in memory, like _1 or _1.f.
  • Rvalues: expressions that produce a value. The "R" reflects the fact that these generally appear only on the right-hand side of an assignment.
    • Operands: the arguments to an rvalue, which can either be a constant (like 22) or a place (like _1).

You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty-printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try running this program (or clicking on this link), and then clicking the "MIR" button on the top:

fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}

You should see something like:

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
    ...
}

This is the MIR format for the main function.

Variable declarations. If we drill in a bit, we'll see the function begins with a bunch of variable declarations. They look like this:

let mut _0: ();                      // return place
let mut _1: std::vec::Vec<i32>;      // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;

You can see that variables in MIR don't have names; they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can still tell apart user-defined variables, though, because they have debuginfo associated with them (see below).

User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:

scope 1 {
    debug vec => _1;                 // in scope 1 at src/main.rs:2:9: 2:16
}

Each debug <Name> => <Place>; annotation describes a named user-defined variable and where (i.e., the place) a debugger can find its data. Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place. Additionally, closure captures are described using the same system, and in that case things are already complicated even without optimizations, e.g.: debug x => (*((*_1).0: &T));

The "scope" blocks (e.g., scope 1 {..}) describe the lexical structure of the source program (which names were in scope when), so any part of the program annotated with // in scope 0 would not see vec, as you'd find out if you stepped through the code in a debugger.

Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I have omitted some of the comments):

bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}

A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:

StorageLive(_1);

This statement indicates that the variable _1 is "live", meaning that it may be used later -- this will persist until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These "storage statements" are used by LLVM to allocate stack space.

The terminator of the block bb0 is the call to Vec::new:

_1 = const <std::vec::Vec<T>>::new() -> bb2;

Terminators are different from statements because they can have more than one successor -- that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that unwinding is in fact not possible, and hence we list only one successor block, bb2.

If we look ahead to bb2, we will see it looks like this:

bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}

Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:

_3 = &mut _1;

Assignments in general have the form:

<Place> = <Rvalue>

A place is an expression like _3, _3.f, or *_3 -- it denotes a location in memory. An rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>. So we can define a grammar for rvalues like so:

<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place

As you can see from this grammar, rvalues cannot be nested -- they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type). So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:

TMP1 = a + b
x = TMP1 + c

(Try it and see; you may want to compile in release mode to skip over the overflow checks.)

MIR data types

The MIR data types are defined in the src/librustc_middle/mir/ module. Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.

The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", which you can read about below). A rough sketch of how these containers fit together follows after the list.

  • Basic blocks: the basic blocks are stored in the basic_blocks field; this is a vector of BasicBlockData. We never reference a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
  • Statements are represented by the type Statement.
  • Terminators are represented by the type Terminator.
  • Locals are represented by the type Local (a newtype'd index). The actual data for a local variable is stored in local_decls in the Mir. There is also a special constant RETURN_PLACE identifying the special "local" that represents the return value.
  • Places are represented by the enum Place. There are a few variants:
    • Local variables like _1
    • Static variables like FOO
    • Projections, which are fields or other things that are "projected out" from a base place. For example, _1.f is a projection from _1. *_1 is also a projection, represented by ProjectionElem::Deref.
  • Rvalues are represented by the enum Rvalue.
  • Operands are represented by the enum Operand.
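As a rough sketch, the relationships between these containers can be modeled like this (illustrative stand-ins only; the real types use newtype'd indices and carry much more information):

/// Illustrative mirror of the MIR containers described above.
struct Body {
    basic_blocks: Vec<BasicBlockData>, // indexed by a newtype'd BasicBlock
    local_decls: Vec<LocalDecl>,       // indexed by a newtype'd Local
}

struct BasicBlockData {
    statements: Vec<Statement>, // e.g. assignments, StorageLive/StorageDead
    terminator: Terminator,     // e.g. calls, gotos, returns
}

struct LocalDecl; // type, mutability, debuginfo, ...
struct Statement;
struct Terminator;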

Representing constants

to be written

Promoted constants

to be written

MIR optimizations

MIR optimizations are optimizations run on the MIR to produce better MIR before codegen. This is important for two reasons: first, it makes the final generated executable code better, and second, it means that LLVM has less work to do, so compilation is faster. Note that since MIR is generic (not monomorphized yet), these optimizations are particularly effective; we can optimize the generic version, so all of the monomorphizations are cheaper!

MIR optimizations run after borrow checking. We run a series of optimization passes over the MIR to improve it. Some passes are required to run on all code, some passes don't actually do optimizations but only check stuff, and some passes are only turned on in release mode.

The optimized_mir query is called to produce the optimized MIR for a given DefId. This query makes sure that the borrow checker has run and that some validation has occurred. Then, it steals the MIR, optimizes it, and returns the improved MIR.

Defining optimization passes

The list of passes run and the order in which they are run is defined by the run_optimization_passes function. It contains an array of passes to run. Each pass in the array is a struct that implements the MirPass trait. The array is an array of &dyn MirPass trait objects. Typically, a pass is implemented in its own submodule of the rustc_mir::transform module.

Some examples of passes are:

  • CleanupNonCodegenStatements: remove some of the info that is only needed for analyses, rather than codegen.
  • ConstProp: Does constant propagation

You can see the "Implementors" section of the MirPass rustdocs for more examples.

MIR Debugging

The -Zdump-mir flag can be used to dump a text representation of the MIR. The -Zdump-mir-graphviz flag can be used to dump a .dot file that represents MIR as a control-flow graph.

-Zdump-mir=F is a handy compiler option that will let you view the MIR for each function at each stage of compilation. -Zdump-mir takes a filter F which allows you to control which functions and which passes you are interested in. For example:

> rustc -Zdump-mir=foo ...

This will dump the MIR for any function whose name contains foo; it will dump the MIR both before and after every pass. Those files will be created in the mir_dump directory. There will likely be quite a lot of them!

> cat > foo.rs
fn main() {
    println!("Hello, world!");
}
^D
> rustc -Zdump-mir=main foo.rs
> ls mir_dump/* | wc -l
     161

The files have names like rustc.main.000-000.CleanEndRegions.after.mir. These names have a number of parts:

rustc.main.000-000.CleanEndRegions.after.mir
      ---- --- --- --------------- ----- either before or after
      |    |   |   name of the pass
      |    |   index of dump within the pass (usually 0, but some passes dump intermediate states)
      |    index of the pass
      def-path to the function etc being dumped

You can also make more selective filters. For example, main & CleanEndRegions will select for things that reference both main and the pass CleanEndRegions:

> rustc -Zdump-mir='main & CleanEndRegions' foo.rs
> ls mir_dump
rustc.main.000-000.CleanEndRegions.after.mir	rustc.main.000-000.CleanEndRegions.before.mir

Filters can also have | parts to combine multiple sets of &-filters. For example main & CleanEndRegions | main & NoLandingPads will select either main and CleanEndRegions or main and NoLandingPads:

> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs
> ls mir_dump
rustc.main-promoted[0].002-000.NoLandingPads.after.mir
rustc.main-promoted[0].002-000.NoLandingPads.before.mir
rustc.main-promoted[0].002-006.NoLandingPads.after.mir
rustc.main-promoted[0].002-006.NoLandingPads.before.mir
rustc.main-promoted[1].002-000.NoLandingPads.after.mir
rustc.main-promoted[1].002-000.NoLandingPads.before.mir
rustc.main-promoted[1].002-006.NoLandingPads.after.mir
rustc.main-promoted[1].002-006.NoLandingPads.before.mir
rustc.main.000-000.CleanEndRegions.after.mir
rustc.main.000-000.CleanEndRegions.before.mir
rustc.main.002-000.NoLandingPads.after.mir
rustc.main.002-000.NoLandingPads.before.mir
rustc.main.002-006.NoLandingPads.after.mir
rustc.main.002-006.NoLandingPads.before.mir

(Here, the main-promoted[0] files refer to the MIR for "promoted constants" that appeared within the main function.)

TODO: anything else?

Constant Evaluation

Constant evaluation is the process of computing values at compile time. For a specific item (constant/static/array length) this happens after the MIR for the item is borrow-checked and optimized. In many cases trying to const evaluate an item will trigger the computation of its MIR for the first time.

Prominent examples are

  • The initializer of a static
  • Array length
    • needs to be known to reserve stack or heap space
  • Enum variant discriminants
    • needs to be known to prevent two variants from having the same discriminant
  • Patterns
    • need to be known to check for overlapping patterns

Additionally, constant evaluation can be used to reduce the workload or binary size at runtime by precomputing complex operations at compile time and only storing the result.

Constant evaluation can be done by calling the const_eval query of TyCtxt.

The const_eval query takes a ParamEnv for the environment in which the constant is evaluated (e.g. the function within which the constant is used) and a GlobalId. The GlobalId is made up of an Instance referring to a constant or static, or of an Instance of a function together with an index into the function's Promoted table.

Constant evaluation returns a Result with either the error, or the simplest representation of the constant. "simplest" meaning if it is representable as an integer or fat pointer, it will directly yield the value (via ConstValue::Scalar or ConstValue::ScalarPair), instead of referring to the miri virtual memory allocation (via ConstValue::ByRef). This means that the const_eval function cannot be used to create miri-pointers to the evaluated constant or static. If you need that, you need to directly work with the functions in src/librustc_mir/const_eval.rs.

Miri

Miri (MIR Interpreter) is a virtual machine for executing MIR without compiling to machine code. It is usually invoked via tcx.const_eval.

If you start out with a constant


const FOO: usize = 1 << 12;

rustc doesn't actually invoke anything until the constant is either used or placed into metadata.

Once you have a use-site like

type Foo = [u8; FOO - 42];

The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...).

To obtain the (in this case empty) parameter environment, one can call let param_env = tcx.param_env(length_def_id);. The GlobalId needed is

let gid = GlobalId {
    promoted: None,
    instance: Instance::mono(length_def_id),
};

Invoking tcx.const_eval(param_env.and(gid)) will now trigger the creation of the MIR of the array length expression. The MIR will look something like this:

const Foo::{{initializer}}: usize = {
    let mut _0: usize;                   // return pointer
    let mut _1: (usize, bool);

    bb0: {
        _1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
        assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
    }

    bb1: {
        _0 = (_1.0: usize);
        return;
    }
}

Before the evaluation, a virtual memory location (in this case essentially a vec![u8; 4] or vec![u8; 8]) is created for storing the evaluation result.

At the start of the evaluation, _0 and _1 are Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef)). This is quite a mouthful: Operand can represent either data stored somewhere in the interpreter memory (Operand::Indirect), or (as an optimization) immediate data stored in-line. And Immediate can either be a single (potentially uninitialized) scalar value (integer or thin pointer), or a pair of two of them. In our case, the single scalar value is not (yet) initialized.

When the initialization of _1 is invoked, the value of the FOO constant is required, and triggers another call to tcx.const_eval, which will not be shown here. If the evaluation of FOO is successful, 42 will be subtracted from its value 4096 and the result stored in _1 as Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. })). The first part of the pair is the computed value, the second part is a bool that's true if an overflow happened. A Scalar::Raw also stores the size (in bytes) of this scalar value; we are eliding that here.

The next statement asserts that said boolean is 0. In case the assertion fails, its error message is used for reporting a compile-time error.

Since it does not fail, Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. })) is stored in the virtual memory that was allocated before the evaluation. _0 always refers to that location directly.

After the evaluation is done, the return value is converted from Operand to ConstValue by op_to_const: the former representation is geared towards what is needed during const evaluation, while ConstValue is shaped by the needs of the remaining parts of the compiler that consume the results of const evaluation. As part of this conversion, for types with scalar values, even if the resulting Operand is Indirect, it will return an immediate ConstValue::Scalar(computed_value) (instead of the usual ConstValue::ByRef). This makes using the result much more efficient and also more convenient, as no further queries need to be executed in order to get at something as simple as a usize.

Future evaluations of the same constants will not actually invoke Miri, but just use the cached result.

Datastructures

Miri's outside-facing datastructures can be found in librustc_middle/mir/interpret. This is mainly the error enum and the ConstValue and Scalar types. A ConstValue can be either Scalar (a single Scalar, i.e., integer or thin pointer), Slice (to represent byte slices and strings, as needed for pattern matching) or ByRef, which is used for anything else and refers to a virtual allocation. These allocations can be accessed via the methods on tcx.interpret_interner. A Scalar is either some Raw integer or a pointer; see the next section for more on that.

If you are expecting a numeric result, you can use eval_usize (which panics on anything that can't be represented as a u64) or try_eval_usize, which results in an Option<u64>, yielding the value if possible.

Memory

To support any kind of pointers, Miri needs to have a "virtual memory" that the pointers can point to. This is implemented in the Memory type. In the simplest model, every global variable, stack variable and every dynamic allocation corresponds to an Allocation in that memory. (Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have Operand::Immediate for stack variables that are both small and never have their address taken. But that is purely an optimization.)

Such an Allocation is basically just a sequence of u8 storing the value of each byte in this allocation. (Plus some extra data, see below.) Every Allocation has a globally unique AllocId assigned in Memory. With that, a Pointer consists of a pair of an AllocId (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the pointer points to). It may seem odd that a Pointer is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use AllocId as symbolic base addresses, which means we need a separate offset. (As an aside, it turns out that pointers at run-time are more than just integers, too.)

These allocations exist so that references and raw pointers have something to point to. There is no global linear heap in which things are allocated, but each allocation (be it for a local variable, a static or a (future) heap allocation) gets its own little memory with exactly the required size. So if you have a pointer to an allocation for a local variable a, there is no possible (no matter how unsafe) operation that you can do that would ever change said pointer to a pointer to a different local variable b. Pointer arithmetic on a will only ever change its offset; the AllocId stays the same.

This, however, causes a problem when we want to store a Pointer into an Allocation: we cannot turn it into a sequence of u8 of the right length! AllocId and offset together are twice as big as a pointer "seems" to be. This is what the relocation field of Allocation is for: the byte offset of the Pointer gets stored as a bunch of u8, while its AllocId gets stored out-of-band. The two are reassembled when the Pointer is read from memory. The other bit of extra data an Allocation needs is undef_mask for keeping track of which of its bytes are initialized.
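A simplified model of these structures (illustrative only; the real Allocation and Pointer carry more information, such as alignment and mutability):

use std::collections::HashMap;

/// Illustrative model of Miri's memory.
type AllocId = u64;

struct Pointer {
    alloc_id: AllocId, // symbolic base address
    offset: u64,       // byte offset within that allocation
}

struct Allocation {
    bytes: Vec<u8>,
    /// Out-of-band part of stored pointers: maps the offset where a
    /// pointer's bytes begin to the AllocId it refers to ("relocations").
    relocations: HashMap<u64, AllocId>,
    /// Which bytes have been initialized (the "undef mask").
    initialized: Vec<bool>,
}

struct Memory {
    allocations: HashMap<AllocId, Allocation>,
}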

Global memory and exotic allocations

Memory exists only during the Miri evaluation; it gets destroyed when the final value of the constant is computed. In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is part of TyCtxt. These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them).

Moreover, to also support function pointers, the global memory in TyCtxt can also contain "virtual allocations": instead of an Allocation, these contain an Instance. That allows a Pointer to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers.

Finally, the GlobalAlloc type used in the global memory also contains a variant Static that points to a particular const or static item. This is needed to support circular statics, where we need to have a Pointer to a static for which we cannot yet have an Allocation as we do not know the bytes of its value.

Pointer values vs Pointer types

One common cause of confusion in Miri is that being a pointer value and having a pointer type are entirely independent properties. By "pointer value", we refer to a Scalar::Ptr containing a Pointer and thus pointing somewhere into Miri's virtual memory. This is in contrast to Scalar::Raw, which is just some concrete integer.

However, a variable of pointer or reference type, such as *const T or &T, does not have to have a pointer value: it could be obtained by casting or transmuting an integer to a pointer (currently that is hard to do in const eval, but eventually transmute will be stable as a const fn). And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer value (Scalar::Ptr) at integer type (usize). This is a problem because we cannot meaningfully perform integer operations such as division on pointer values.

Interpretation

Although the main entry point to constant evaluation is the tcx.const_eval query, there are additional functions in librustc_mir/const_eval.rs that allow accessing the fields of a ConstValue (ByRef or otherwise). You should never have to access an Allocation directly except for translating it to the compilation target (at the moment just LLVM).

Miri starts by creating a virtual stack frame for the current constant that is being evaluated. There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide.

A stack frame is defined by the Frame type in librustc_mir/interpret/eval_context.rs and contains all the local variables memory (None at the start of evaluation). Each frame refers to the evaluation of either the root constant or subsequent calls to const fn. The evaluation of another constant simply calls tcx.const_eval, which produces an entirely new and independent stack frame.

The frames are just a Vec<Frame>; there's no way to actually refer to a Frame's memory, even if horrible shenanigans are done via unsafe code. The only memory that can be referred to are Allocations.

Miri now calls the step method (in librustc_mir/interpret/step.rs ) until it either returns an error or has no further statements to execute. Each statement will now initialize or modify the locals or the virtual memory referred to by a local. This might require evaluating other constants or statics, which just recursively invokes tcx.const_eval.

Monomorphization

As you probably know, Rust has a very expressive type system with extensive support for generic types. But of course, assembly is not generic, so we need to figure out the concrete types of all the generics before the code can execute.

Different languages handle this problem differently. For example, in some languages, such as Java, we may not know the most precise type of value until runtime. In the case of Java, this is ok because (almost) all variables are reference values anyway (i.e. pointers to a stack allocated object). This flexibility comes at the cost of performance, since all accesses to an object must dereference a pointer.

Rust takes a different approach: it monomorphizes all generic types. This means that the compiler stamps out a different copy of the code of a generic function for each concrete type needed. For example, if I use a Vec<u64> and a Vec<String> in my code, then the generated binary will have two copies of the generated code for Vec: one for Vec<u64> and another for Vec<String>. The result is fast programs, but it comes at the cost of compile time (creating all those copies can take a while) and binary size (all those copies might take a lot of space).
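For example (illustrative; the specialized copies exist only in the generated binary, not in the source):

// The compiler conceptually generates one copy of `first` per
// instantiation used in the program.
fn first<T>(v: &Vec<T>) -> &T {
    &v[0]
}

// first::<u64>    -> a copy of `first` specialized to T = u64
// first::<String> -> a second, independent copy for T = String

fn main() {
    let a: Vec<u64> = vec![1, 2];
    let b: Vec<String> = vec!["x".into()];
    let _ = first(&a); // uses the u64 copy
    let _ = first(&b); // uses the String copy
}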

Monomorphization is the first step in the backend of the rust compiler.

Collection

First, we need to figure out what concrete types we need for all the generic things in our program. This is called collection, and the code that does this is called the monomorphization collector.

Take this example:

fn banana() {
   peach::<u64>();
}

fn main() {
    banana();
}

The monomorphization collector will give you a list of [main, banana, peach::<u64>]. These are the functions that will have machine code generated for them. The collector will also add things like statics to that list.

See the collector rustdocs for more info.

The monomorphization collector is run just before MIR lowering and codegen. rustc_codegen_ssa::base::codegen_crate calls the collect_and_partition_mono_items query, which does monomorphization collection and then partitions them into codegen units.

Polymorphization

As mentioned above, monomorphization produces fast code, but it comes at the cost of compile time and binary size. MIR optimizations can help a bit with this. Another optimization currently under development is called polymorphization.

The general idea is that often we can share some code between monomorphized copies of code. More precisely, if a MIR block is not dependent on a type parameter, it may not need to be monomorphized into many copies. Consider the following example:


pub fn f() {
    g::<bool>();
    g::<usize>();
}

fn g<T>() -> usize {
    let n = 1;
    let closure = || n;
    closure()
}

In this case, we would currently collect [f, g::<bool>, g::<usize>, g::<bool>::{{closure}}, g::<usize>::{{closure}}], but notice that the two closures would be identical -- they don't depend on the type parameter T of function g. So we only need to emit one copy of the closure.

For more information, see this thread on github.

Lowering MIR to a Codegen IR

Now that we have a list of symbols to generate from the collector, we need to generate some sort of codegen IR. In this chapter, we will assume LLVM IR, since that's what rustc usually uses. The actual monomorphization is performed as we go, while we do the translation.

Recall that the backend is started by rustc_codegen_ssa::base::codegen_crate. Eventually, this reaches rustc_codegen_ssa::mir::codegen_mir, which does the lowering from MIR to LLVM IR.

The code is split into modules, each of which handles particular MIR primitives.

Before a function is translated, a number of simple and primitive analysis passes will run to help us generate simpler and more efficient LLVM IR. An example of such an analysis pass would be figuring out which variables are SSA-like, so that we can translate them to SSA directly rather than relying on LLVM's mem2reg for those variables. The analysis can be found in rustc_codegen_ssa::mir::analyze.

Usually a single MIR basic block will map to an LLVM basic block, with very few exceptions: intrinsic or function calls and less basic MIR statements like assert can result in multiple basic blocks. This is a perfect lede into the non-portable LLVM-specific part of the code generation. Intrinsic generation is fairly easy to understand as it involves very few abstraction levels in between and can be found in rustc_codegen_llvm::intrinsic.

Everything else will use the builder interface. This is the code that gets called in the librustc_codegen_ssa::mir::* modules discussed above.

TODO: discuss how constants are generated

Code generation

Code generation or "codegen" is the part of the compiler that actually generates an executable binary. Usually, rustc uses LLVM for code generation; there is also support for Cranelift. The key is that rustc doesn't implement codegen itself. It's worth noting, though, that in the Rust source code, many parts of the backend have codegen in their names (there are no hard boundaries).

NOTE: If you are looking for hints on how to debug code generation bugs, please see this section of the debugging chapter.

What is LLVM?

LLVM is "a collection of modular and reusable compiler and toolchain technologies". In particular, the LLVM project contains a pluggable compiler backend (also called "LLVM"), which is used by many compiler projects, including the clang C compiler and our beloved rustc.

LLVM takes input in the form of LLVM IR. It is basically assembly code with additional low-level types and annotations added. These annotations are helpful for doing optimizations on the LLVM IR and on the output machine code. The end result of all this is (at long last) something executable (e.g. an ELF object, an EXE, or wasm).

There are a few benefits to using LLVM:

  • We don't have to write a whole compiler backend. This reduces implementation and maintenance burden.
  • We benefit from the large suite of advanced optimizations that the LLVM project has been collecting.
  • We can automatically compile Rust to any of the platforms for which LLVM has support. For example, as soon as LLVM added support for wasm, voila! rustc, clang, and a bunch of other languages were able to compile to wasm! (Well, there was some extra stuff to be done, but we were 90% there anyway).
  • We and other compiler projects benefit from each other. For example, when the Spectre and Meltdown security vulnerabilities were discovered, only LLVM needed to be patched.

Running LLVM, linking, and metadata generation

Once the LLVM IR for all of the functions and statics, etc. is built, it is time to start running LLVM and its optimization passes. LLVM IR is grouped into "modules". Multiple "modules" can be codegened at the same time to aid in multi-core utilization. These "modules" are what we refer to as codegen units. These units were established way back during the monomorphization collection phase.

Once LLVM produces objects from these modules, these objects are passed to the linker along with, optionally, the metadata object, and an archive or an executable is produced.

It is not necessarily the codegen phase described above that runs the optimizations. With certain kinds of LTO, the optimization might happen at link time instead. It is also possible for some optimizations to happen before objects are passed on to the linker and some to happen during linking.

This all happens towards the very end of compilation. The code for this can be found in librustc_codegen_ssa::back and librustc_codegen_llvm::back. Sadly, this piece of code is not very well separated from LLVM-dependent code; rustc_codegen_ssa contains a fair amount of code specific to the LLVM backend.

Once these components are done with their work you end up with a number of files in your filesystem corresponding to the outputs you have requested.

Updating LLVM

The Rust compiler uses LLVM as its primary codegen backend today, and naturally we want to at least occasionally update this dependency! Currently we do not have a strict policy about when to update LLVM or what it can be updated to, but a few guidelines are applied:

  • We try to always support the latest released version of LLVM
  • We try to support the "last few" versions of LLVM (how many is changing over time)
  • We allow moving to arbitrary commits during development.
  • Strongly prefer to upstream all patches to LLVM before including them in rustc.

This policy may change over time (or may actually start to exist as a formal policy!), but for now these are rough guidelines!

Why update LLVM?

There are a few reasons nowadays that we want to update LLVM in one way or another:

  • A bug could have been fixed! Often we find bugs in the compiler and fix them upstream in LLVM. We'll want to pull fixes back to the compiler itself as they're merged upstream.

  • A new feature may be available in LLVM that we want to use in rustc, but we don't want to wait for a full LLVM release to test it out.

  • LLVM itself may have a new release and we'd like to update to this LLVM release.

Each of these reasons has a different strategy for updating LLVM, and we'll go over them in detail here.

Bugfix Updates

For updates of LLVM that are to fix a small bug, we cherry-pick the bugfix to the branch we're already using. The steps for this are:

  1. Make sure the bugfix is in upstream LLVM.
  2. Identify the branch that rustc is currently using. The src/llvm-project submodule is always pinned to a branch of the rust-lang/llvm-project repository.
  3. Fork the rust-lang/llvm-project repository
  4. Check out the appropriate branch (typically named rustc/a.b-yyyy-mm-dd)
  5. Cherry-pick the upstream commit onto the branch
  6. Push this branch to your fork
  7. Send a Pull Request to rust-lang/llvm-project to the same branch as before. Be sure to reference the Rust and/or LLVM issue that you're fixing in the PR description.
  8. Wait for the PR to be merged
  9. Send a PR to rust-lang/rust updating the src/llvm-project submodule with your bugfix. This can be done locally with git submodule update --remote src/llvm-project typically.
  10. Wait for PR to be merged

The tl;dr is that we can cherry-pick bugfixes at any time and pull them back into the rust-lang/llvm-project branch that we're using, and getting them into the compiler is just a matter of updating the submodule via a PR!

Example PRs look like: #59089

Feature updates

Note that this is all information as it applies to the current day and age. This process for updating LLVM changes with practically all LLVM updates, so this may be out of date!

Unlike bugfixes, updating to pick up a new feature of LLVM typically requires a lot more work. This is where we can't reasonably cherry-pick commits backwards so we need to do a full update. There's a lot of stuff to do here, so let's go through each in detail.

  1. Create a new branch in the rust-lang/llvm-project repository. This branch should be named rustc/a.b-yyyy-mm-dd where a.b is the current version number of LLVM in-tree at the time of the branch and the remaining part is today's date. Move this branch to the commit in LLVM that you'd like, which for this is probably the current LLVM HEAD.

  2. Apply Rust-specific patches to the llvm-project repository. All features and bugfixes are upstream, but there's often some weird build-related patches that don't make sense to upstream, which we have on our repositories. These patches are typically the latest patches on the rust-lang/llvm-project branch that rustc is currently using.

  3. Build the new LLVM in the rust repository. To do this you'll want to update the src/llvm-project repository to your branch and the revision you've created. It's also typically a good idea to update .gitmodules with the new branch name of the LLVM submodule. Make sure you've committed changes to src/llvm-project to ensure submodule updates aren't reverted. Some commands you should execute are:

    • ./x.py build src/llvm - test that LLVM still builds
    • ./x.py build src/tools/lld - same for LLD
    • ./x.py build - build the rest of rustc

    You'll likely need to update src/rustllvm/*.cpp to compile with updated LLVM bindings. Note that you should use #ifdef and such to ensure that the bindings still compile on older LLVM versions.

  4. Test for regressions across other platforms. LLVM often has at least one bug for non-tier-1 architectures, so it's good to do some more testing before sending this to bors! If you're low on resources you can send the PR as-is now to bors, though, and it'll get tested anyway.

    Ideally, build LLVM and test it on a few platforms:

    • Linux
    • OSX
    • Windows

    and afterwards run some of the Docker containers that CI also runs:

    • ./src/ci/docker/run.sh wasm32-unknown
    • ./src/ci/docker/run.sh arm-android
    • ./src/ci/docker/run.sh dist-various-1
    • ./src/ci/docker/run.sh dist-various-2
    • ./src/ci/docker/run.sh armhf-gnu
  5. Prepare a PR to rust-lang/rust. Work with maintainers of rust-lang/llvm-project to get your commit in a branch of that repository, and then you can send a PR to rust-lang/rust. You'll change at least src/llvm-project and will likely also change src/rustllvm/* as well.

For prior art, previous LLVM updates look like #55835 #47828 #62474 #62592. Note that sometimes it's easiest to land src/rustllvm/* compatibility as a PR before actually updating src/llvm-project. This way while you're working through LLVM issues others interested in trying out the new LLVM can benefit from work you've done to update the C++ bindings.

Caveats and gotchas

Ideally the above instructions are pretty smooth, but here's some caveats to keep in mind while going through them:

  • LLVM bugs are hard to find, don't hesitate to ask for help! Bisection is definitely your friend here (yes LLVM takes forever to build, yet bisection is still your friend)
  • If you've got general questions, @alexcrichton can help you out.
  • Creating branches is a privileged operation on GitHub, so you'll need someone with write access to create the branches for you most likely.

New LLVM Release Updates

Updating to a new release of LLVM is very similar to the "feature updates" section above. The release process for LLVM is often months-long though and we like to ensure compatibility ASAP. The main tweaks to the "feature updates" section above are generally around branch naming. The sequence of events typically looks like:

  1. LLVM announces that its latest release version has branched. This will show up as a branch in https://github.com/llvm/llvm-project typically named release/$N.x where $N is the version of LLVM that's being released.

  2. We then follow the "feature updates" section above to create a new branch of LLVM in our rust-lang/llvm-project repository. This follows the same naming convention of branches as usual, except that a.b is the new version. This update is eventually landed in the rust-lang/rust repository.

  3. Over the next few months, LLVM will continually push commits to its release/a.b branch. Often those are bug fixes we'd like to have as well. The merge process for that is to use git merge itself to merge LLVM's release/a.b branch with the branch created in step 2. This is typically done multiple times when necessary while LLVM's release branch is baking.

  4. LLVM then announces the release of version a.b.

  5. After LLVM's official release, we follow the "feature update" section again to create a new branch in the rust-lang/llvm-project repository, this time with a new date. The commit history should look much cleaner as just a few Rust-specific commits stacked on top of stock LLVM's release branch.

Debugging LLVM

NOTE: If you are looking for info about code generation, please see this chapter instead.

This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:

As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to

  1. create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)

  2. minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate

  3. further minimize the issue by making the code shorter (there are tools that help with this like creduce)

The official compilers (including nightlies) have LLVM assertions disabled, which means that LLVM assertion failures can show up as compiler crashes (not ICEs but "real" crashes) and other sorts of weird behavior. If you are encountering these, it is a good idea to try using a compiler with LLVM assertions enabled - either an "alt" nightly or a compiler you build yourself by setting [llvm] assertions=true in your config.toml - and see whether anything turns up.
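
For the self-built option, that means something like the following in config.toml (a minimal sketch; other settings are unaffected):

[llvm]
# Build rustc's LLVM with assertions enabled so that assertion failures
# surface as explicit aborts rather than silent misbehavior.
assertions = true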

The rustc build process builds the LLVM tools into ./build/<host-triple>/llvm/bin. They can be called directly.

The default rustc compilation pipeline has multiple codegen units, which is hard to replicate manually and means that LLVM is called multiple times in parallel. If you can get away with it (i.e. if it doesn't make your bug disappear), passing -C codegen-units=1 to rustc will make debugging easier.

For rustc to generate LLVM IR, you need to pass the --emit=llvm-ir flag. If you are building via cargo, use the RUSTFLAGS environment variable (e.g. RUSTFLAGS='--emit=llvm-ir'). This causes rustc to spit out LLVM IR into the target directory.

cargo llvm-ir [options] path spits out the LLVM IR for a particular function at path. (cargo install cargo-asm installs cargo asm and cargo llvm-ir). --build-type=debug emits code for debug builds. There are also other useful options. Also, debug info in LLVM IR can clutter the output a lot: RUSTFLAGS="-C debuginfo=0" is really useful.

RUSTFLAGS="-C save-temps" outputs LLVM bitcode (not the same as IR) at different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to .ll files using llvm-dis which should be in the target local compilation of rustc.

If you want to play with the optimization pipeline, you can use the opt tool from ./build/<host-triple>/llvm/bin/ with the LLVM IR emitted by rustc. Note that rustc emits different IR depending on whether -O is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should:

$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my

If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag -C llvm-args=-print-after-all, and possibly add -C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME' (e.g. -C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE').

That produces a lot of output into standard error, so you'll want to pipe that to some file. Also, if you are using neither -filter-print-funcs nor -C codegen-units=1, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything.

If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize correctly), you can use llvm-extract, e.g.

$ ./build/$TRIPLE/llvm/bin/llvm-extract \
    -func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
    -S \
    < unextracted.ll \
    > extracted.ll

Getting help and asking questions

If you have some questions, head over to the rust-lang Zulip and specifically the #t-compiler/wg-llvm stream.

Compiler options to know and love

The -Chelp and -Zhelp compiler switches will list out a variety of interesting options you may find useful. Here are a few of the most common that pertain to LLVM development (some of them are employed in the tutorial above):

  • The --emit llvm-ir option emits a <filename>.ll file with LLVM IR in textual format
    • The --emit llvm-bc option emits in bytecode format (<filename>.bc)
  • Passing -Cllvm-args=<foo> allows passing pretty much all the options that tools like llc and opt would accept; e.g. -Cllvm-args=-print-before-all to print IR before every LLVM pass.
  • The -Cno-prepopulate-passes option will avoid pre-populating the LLVM pass manager with a list of passes. This will allow you to view the LLVM IR that rustc generates, not the LLVM IR after optimizations.
  • The -Cpasses=val option allows you to supply a (space-separated) list of extra LLVM passes to run
  • The -Csave-temps option saves all temporary output files during compilation
  • The -Zprint-llvm-passes option will print out LLVM optimization passes being run
  • The -Ztime-llvm-passes option measures the time of each LLVM pass
  • The -Zverify-llvm-ir option will verify the LLVM IR for correctness
  • The -Zno-parallel-llvm option will disable parallel compilation of distinct compilation units
  • The -Zllvm-time-trace option will output a Chrome profiler compatible JSON file which contains details and timings for LLVM passes.

Filing LLVM bug reports

When filing an LLVM bug report, you will probably want some sort of minimal working example that demonstrates the problem. The Godbolt compiler explorer is really helpful for this.

  1. Once you have some LLVM IR for the problematic code (see above), you can create a minimal working example with Godbolt. Go to gcc.godbolt.org.

  2. Choose LLVM-IR as programming language.

  3. Use llc to compile the IR to a particular target as is:

    • There are some useful flags: -mattr enables target features, -march= selects the target, -mcpu= selects the CPU, etc.
    • Commands like llc -march=help output all architectures available, which is useful because sometimes the Rust arch names and the LLVM names do not match.
    • If you have compiled rustc yourself somewhere, in the target directory you have binaries for llc, opt, etc.
  4. If you want to optimize the LLVM-IR, you can use opt to see how the LLVM optimizations transform it.

  5. Once you have a godbolt link demonstrating the issue, it is pretty easy to file an LLVM bug. Just visit bugs.llvm.org.

Porting bug fixes from LLVM

Once you've identified the bug as an LLVM bug, you will sometimes find that it has already been reported and fixed in LLVM, but we haven't gotten the fix yet (or perhaps you are familiar enough with LLVM to fix it yourself).

In that case, we can sometimes opt to port the fix for the bug directly to our own LLVM fork, so that rustc can use it more easily. Our fork of LLVM is maintained in rust-lang/llvm-project. Once you've landed the fix there, you'll also need to land a PR modifying our submodule commits -- ask around on Zulip for help.

Backend Agnostic Codegen

In the future, it would be nice to allow other codegen backends (e.g. Cranelift). To this end, librustc_codegen_ssa provides an abstract interface for all backends to implement.

The following is a copy/paste of a README from the rust-lang/rust repo. Please submit a PR if it needs updating.

Refactoring of rustc_codegen_llvm

by Denis Merigoux, October 23rd 2018

State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained inside the rustc_codegen_llvm crate. Here is the breakdown of the most important elements:

  • the back folder (7,800 LOC) implements the mechanisms for creating the different object files and archives through LLVM, but also the communication mechanisms for parallel code generation;
  • the debuginfo (3,200 LOC) folder contains all code that passes debug information down to LLVM;
  • the llvm (2,200 LOC) folder defines the FFI necessary to communicate with LLVM using the C++ API;
  • the mir (4,300 LOC) folder implements the actual lowering from MIR to LLVM IR;
  • the base.rs (1,300 LOC) file contains some helper functions but also the high-level code that launches the code generation and distributes the work.
  • the builder.rs (1,200 LOC) file contains all the functions generating individual LLVM IR instructions inside a basic block;
  • the common.rs (450 LOC) contains various helper functions and all the functions generating LLVM static values;
  • the type_.rs (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate inside this crate the code that is specific to LLVM from the code that can be reused for other rustc backends. For instance, the mir folder is almost entirely backend-agnostic, but it relies heavily on other parts of the crate. The separation of the code must not affect the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that have to be done at the same time for the resulting code to compile:

  1. replace all the LLVM-specific types by generics inside function signatures and structure definitions;
  2. encapsulate all functions calling the LLVM FFI inside a set of traits that will define the interface between backend-agnostic code and the backend.

While the LLVM-specific code will be left in rustc_codegen_llvm, all the new traits and backend-agnostic code will be moved into rustc_codegen_ssa (name suggestion by @eddyb).

Generic types and structures

@irinagpopa started to parametrize the types of rustc_codegen_llvm by a generic Value type, implemented in LLVM by a reference &'ll Value. This work has been extended to all structures inside the mir folder and elsewhere, as well as for LLVM's BasicBlock and Type types.

The two most important structures for the LLVM codegen are CodegenCx and Builder. They are parametrized by multiple lifetime parameters and the type for Value.

struct CodegenCx<'ll, 'tcx> {
  /* ... */
}

struct Builder<'a, 'll, 'tcx> {
  cx: &'a CodegenCx<'ll, 'tcx>,
  /* ... */
}

CodegenCx is used to compile one codegen-unit that can contain multiple functions, whereas Builder is created to compile one basic block.

The code in rustc_codegen_llvm has to deal with multiple explicit lifetime parameters, that correspond to the following:

  • 'tcx is the longest lifetime, that corresponds to the original TyCtxt containing the program's information;
  • 'a is a short-lived reference of a CodegenCx or another object inside a struct;
  • 'll is the lifetime of references to LLVM objects such as Value or Type.

Although there are already many lifetime parameters in the code, making it generic uncovered situations where the borrow-checker was passing only due to the special nature of the LLVM objects manipulated (they are extern pointers). For instance, an additional lifetime parameter had to be added to LocalAnalyzer in analyze.rs, leading to the definition:

struct LocalAnalyzer<'mir, 'a, 'tcx> {
  /* ... */
}

However, the two most important structures, CodegenCx and Builder, are not defined in the backend-agnostic code. Indeed, their content is highly specific to the backend, and it makes more sense to leave their definition to the backend implementor than to allow just a narrow spot via a generic field for the backend's context.

Traits and interface

Because they have to be defined by the backend, CodegenCx and Builder will be the structures implementing all the traits defining the backend's interface. These traits are defined in the folder rustc_codegen_ssa/traits and all the backend-agnostic code is parametrized by them. For instance, let us explain how a function in base.rs is parametrized:

pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}

In this signature, we have the two lifetime parameters explained earlier and the master type Bx, which satisfies the trait BuilderMethods corresponding to the interface satisfied by the Builder struct. The BuilderMethods trait defines an associated type Bx::CodegenCx that itself satisfies the CodegenMethods traits implemented by the struct CodegenCx.

On the trait side, here is an example with part of the definition of BuilderMethods in traits/builder.rs:

pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}

Finally, a master structure implementing the ExtraBackendMethods trait is used for high-level codegen-driving functions like codegen_crate in base.rs. For LLVM, it is the empty LlvmCodegenBackend. ExtraBackendMethods should be implemented by the same structure that implements the CodegenBackend defined in rustc_codegen_utils/codegen_backend.rs.

During the traitification process, certain functions have been converted from methods of a local structure to methods of CodegenCx or Builder and a corresponding self parameter has been added. Indeed, LLVM stores information internally that it can access when called through its API. This information does not show up in a Rust data structure carried around when these methods are called. However, when implementing a Rust backend for rustc, these methods will need information from CodegenCx, hence the additional parameter (unused in the LLVM implementation of the trait).

State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not the best solution since LLVM has a very special way of doing things: when adding another backend, the trait definitions might be changed in order to offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code has allowed the reuse of a significant part of the old rustc_codegen_llvm. Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the most important elements:

  • back folder: 3,800 (BA) vs 4,100 (LLVM);
  • mir folder: 4,400 (BA) vs 0 (LLVM);
  • base.rs: 1,100 (BA) vs 250 (LLVM);
  • builder.rs: 1,400 (BA) vs 0 (LLVM);
  • common.rs: 350 (BA) vs 350 (LLVM);

The debuginfo folder has been left almost untouched by the splitting and is specific to LLVM. Only its high-level features have been traitified.

The new traits folder has 1,500 LOC only for trait definitions. Overall, the 27,000 LOC-sized old rustc_codegen_llvm code has been split into the new 18,500 LOC-sized rustc_codegen_llvm and the 12,000 LOC-sized rustc_codegen_ssa. We can say that this refactoring allowed the reuse of approximately 10,000 LOC that would otherwise have had to be duplicated between the multiple backends of rustc.

The refactored version of rustc's backend introduced no regressions in the test suite nor in performance benchmarks, which is consistent with the nature of the refactoring: it used only compile-time parametricity (no trait objects).

Implicit Caller Location

Approved in RFC 2091, this feature enables the accurate reporting of caller location during panics initiated from functions like Option::unwrap, Result::expect, and Index::index. This feature adds the #[track_caller] attribute for functions, the caller_location intrinsic, and the stabilization-friendly core::panic::Location::caller wrapper.

Motivating Example

Take this example program:

fn main() {
    let foo: Option<()> = None;
    foo.unwrap(); // this should produce a useful panic message!
}

Prior to Rust 1.42, panics like this unwrap() printed a location in libcore:

$ rustc +1.41.0 example.rs; example.exe
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value',...core\macros\mod.rs:15:40
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

As of 1.42, we get a much more helpful message:

$ rustc +1.42.0 example.rs; example.exe 
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', example.rs:3:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

These error messages are achieved through a combination of changes to panic! internals to make use of core::panic::Location::caller and a number of #[track_caller] annotations in the standard library which propagate caller information.

Reading Caller Location

Previously, panic! made use of the file!(), line!(), and column!() macros to construct a Location pointing to where the panic occurred. These macros couldn't be given an overridden location, so functions which intentionally invoked panic! couldn't provide their own location, hiding the actual source of error.

Internally, panic!() now calls core::panic::Location::caller() to find out where it was expanded. This function is itself annotated with #[track_caller] and wraps the caller_location compiler intrinsic implemented by rustc. This intrinsic is easiest explained in terms of how it works in a const context.
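
As a hedged sketch of what this enables (MyOption is a made-up type, not from the standard library), a user-written unwrap-like function can forward its caller's location the same way:

#![feature(track_caller)]

struct MyOption<T>(Option<T>);

impl<T> MyOption<T> {
    #[track_caller]
    fn unwrap(self) -> T {
        match self.0 {
            Some(v) => v,
            // panic! internally calls core::panic::Location::caller();
            // because this method is annotated with #[track_caller],
            // the reported location is our caller's call site, not
            // this line in the method body.
            None => panic!("called `unwrap` on an empty MyOption"),
        }
    }
}

fn main() {
    let x: MyOption<u32> = MyOption(None);
    x.unwrap(); // the panic message points at this line
}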

Caller Location in const

There are two main phases to returning the caller location in a const context: walking up the stack to find the right location and allocating a const value to return.

Finding the right Location

In a const context we "walk up the stack" from where the intrinsic is invoked, stopping when we reach the first function call in the stack which does not have the attribute. This walk is in InterpCx::find_closest_untracked_caller_location().

Starting at the bottom, we iterate up over stack Frames in the InterpCx::stack, calling InstanceDef::requires_caller_location on the Instances from each Frame. We stop once we find one that returns false and return the span of the previous frame which was the "topmost" tracked function.

Allocating a static Location

Once we have a Span, we need to allocate static memory for the Location, which is performed by the TyCtxt::const_caller_location() query. Internally this calls InterpCx::alloc_caller_location() and results in a unique memory kind (MemoryKind::CallerLocation). The SSA codegen backend is able to emit code for these same values, and we use this code there as well.

Once our Location has been allocated in static memory, our intrinsic returns a reference to it.

Generating code for #[track_caller] callees

To generate efficient code for a tracked function and its callers, we need to provide the same behavior from the intrinsic's point of view without having a stack to walk up at runtime. We invert the approach: as we grow the stack down we pass an additional argument to calls of tracked functions rather than walking up the stack when the intrinsic is called. That additional argument can be returned wherever the caller location is queried.

The argument we append is of type &'static core::panic::Location<'static>. A reference was chosen to avoid unnecessary copying: a pointer is a third of the size of the full struct, since std::mem::size_of::<core::panic::Location>() == 24 at the time of writing.

When generating a call to a function which is tracked, we pass as the location argument the value of FunctionCx::get_caller_location.

If the calling function is tracked, get_caller_location returns the local in FunctionCx::caller_location which was populated by the current caller's caller. In these cases the intrinsic "returns" a reference which was actually provided in an argument to its caller.

If the calling function is not tracked, get_caller_location allocates a Location static from the current Span and returns a reference to that.

We more efficiently achieve the same behavior as a loop starting from the bottom by passing a single &Location value through the caller_location fields of multiple FunctionCxs as we grow the stack downward.

Codegen examples

What does this transformation look like in practice? Take this example which uses the new feature:

#![feature(track_caller)]
use std::panic::Location;

#[track_caller]
fn print_caller() {
    println!("called from {}", Location::caller());
}

fn main() {
    print_caller();
}

Here print_caller() appears to take no arguments, but we compile it to something like this:

#![feature(panic_internals)]
use std::panic::Location;

fn print_caller(caller: &Location) {
    println!("called from {}", caller);
}

fn main() {
    print_caller(&Location::internal_constructor(file!(), line!(), column!()));
}

Dynamic Dispatch

In codegen contexts we have to modify the callee ABI to pass this information down the stack, but the attribute expressly does not modify the type of the function. The ABI change must be transparent to type checking and remain sound in all uses.

Direct calls to tracked functions will always know the full codegen flags for the callee and can generate appropriate code. Indirect callers won't have this information and it's not encoded in the type of the function pointer they call, so we generate a ReifyShim around the function whenever taking a pointer to it. This shim isn't able to report the actual location of the indirect call (the function's definition site is reported instead), but it prevents miscompilation and is probably the best we can do without modifying fully-stabilized type signatures.

Note: We always emit a ReifyShim when taking a pointer to a tracked function. While the constraint here is imposed by codegen contexts, we don't know during MIR construction of the shim whether we'll be called in a const context (safe to ignore shim) or in a codegen context (unsafe to ignore shim). Even if we did know, the results from const and codegen contexts must agree.
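
A hedged sketch of the direct vs. indirect distinction described above (who_called_me is a made-up function):

#![feature(track_caller)]
use std::panic::Location;

#[track_caller]
fn who_called_me() {
    println!("tracked caller: {}", Location::caller());
}

fn main() {
    // Direct call: the caller knows the callee is tracked, so the
    // reported location is this call site.
    who_called_me();

    // Indirect call: #[track_caller] is not part of the `fn()` type,
    // so taking the pointer goes through a ReifyShim and the reported
    // location is the function's definition site instead.
    let f: fn() = who_called_me;
    f();
}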

The Attribute

The #[track_caller] attribute is checked alongside other codegen attributes to ensure the function:

  • has the "Rust" ABI (as opposed to e.g., "C")
  • is not a foreign import (e.g., in an extern {...} block)
  • is not a closure
  • is not #[naked]

If the use is valid, we set CodegenFnAttrsFlags::TRACK_CALLER. This flag influences the return value of InstanceDef::requires_caller_location which is in turn used in both const and codegen contexts to ensure correct propagation.

Traits

When applied to trait method implementations, the attribute works as it does for regular functions.

When applied to a trait method prototype, the attribute applies to all implementations of the method. When applied to a default trait method implementation, the attribute takes effect on that implementation and any overrides.

Examples:

#![feature(track_caller)]

macro_rules! assert_tracked {
    () => {{
        let location = std::panic::Location::caller();
        assert_eq!(location.file(), file!());
        assert_ne!(location.line(), line!(), "line should be outside this fn");
        println!("called at {}", location);
    }};
}

trait TrackedFourWays {
    /// All implementations inherit `#[track_caller]`.
    #[track_caller]
    fn blanket_tracked();

    /// Implementors can annotate themselves.
    fn local_tracked();

    /// This implementation is tracked (overrides are too).
    #[track_caller]
    fn default_tracked() {
        assert_tracked!();
    }

    /// Overrides of this implementation are tracked (it is too). 
    #[track_caller]
    fn default_tracked_to_override() {
        assert_tracked!();
    }
}

/// This impl uses the default impl for `default_tracked` and provides its own for 
/// `default_tracked_to_override`.
impl TrackedFourWays for () {
    fn blanket_tracked() {
        assert_tracked!();
    }

    #[track_caller]
    fn local_tracked() {
        assert_tracked!();
    }

    fn default_tracked_to_override() {
        assert_tracked!();
    }
}

fn main() {
    <() as TrackedFourWays>::blanket_tracked();
    <() as TrackedFourWays>::default_tracked();
    <() as TrackedFourWays>::default_tracked_to_override();
    <() as TrackedFourWays>::local_tracked();
}

Background/History

Broadly speaking, this feature's goal is to improve common Rust error messages without breaking stability guarantees, requiring modifications to end-user source, relying on platform-specific debug-info, or preventing user-defined types from having the same error-reporting benefits.

Improving the output of these panics has been a goal of proposals since at least mid-2016 (see non-viable alternatives in the approved RFC for details). It took two more years until RFC 2091 was approved, much of its rationale for this feature's design having been discovered through the discussion around several earlier proposals.

The design in the original RFC limited itself to implementations that could be done inside the compiler at the time without significant refactoring. However in the year and a half between the approval of the RFC and the actual implementation work, a revised design was proposed and written up on the tracking issue. During the course of implementing that, it was also discovered that an implementation was possible without modifying the number of arguments in a function's MIR, which would simplify later stages and unlock use in traits.

Because the RFC's implementation strategy could not readily support traits, the semantics were not originally specified. They have since been implemented following the path which seemed most correct to the author and reviewers.

Profile Guided Optimization

rustc supports doing profile-guided optimization (PGO). This chapter describes what PGO is and how the support for it is implemented in rustc.

What Is Profile-Guided Optimization?

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

There are different ways of collecting data about a program's execution. One is to run the program inside a profiler (such as perf) and another is to create an instrumented binary, that is, a binary that has data collection built into it, and run that. The latter usually provides more accurate data.

How is PGO implemented in rustc?

rustc's current PGO implementation relies entirely on LLVM. LLVM actually supports multiple forms of PGO:

  • Sampling-based PGO where an external profiling tool like perf is used to collect data about a program's execution.
  • GCOV-based profiling, where code coverage infrastructure is used to collect profiling information.
  • Front-end based instrumentation, where the compiler front-end (e.g. Clang) inserts instrumentation intrinsics into the LLVM IR it generates.
  • IR-level instrumentation, where LLVM inserts the instrumentation intrinsics itself during optimization passes.

rustc supports only the last approach, IR-level instrumentation, mainly because it is almost exclusively implemented in LLVM and needs little maintenance on the Rust side. Fortunately, it is also the most modern approach, yielding the best results.

So, we are dealing with an instrumentation-based approach, i.e. profiling data is generated by a specially instrumented version of the program that's being optimized. Instrumentation-based PGO has two components: a compile-time component and a run-time component. One needs to understand the overall workflow to see how they interact.

Overall Workflow

Generating a PGO-optimized program involves the following four steps (a concrete command sketch follows the list):

  1. Compile the program with instrumentation enabled (e.g. rustc -Cprofile-generate main.rs)
  2. Run the instrumented program (e.g. ./main) which generates a default-<id>.profraw file
  3. Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool.
  4. Compile the program again, this time making use of the profiling data (e.g. rustc -Cprofile-use=merged.profdata main.rs)
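
Put together, a run of the four steps might look like this (a hedged sketch; paths and file names are illustrative):

$ rustc -Cprofile-generate=/tmp/pgo-data -O main.rs
$ ./main
$ llvm-profdata merge -o merged.profdata /tmp/pgo-data/default-*.profraw
$ rustc -Cprofile-use=merged.profdata -O main.rs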

Compile-Time Aspects

Depending on which step in the above workflow we are in, two different things can happen at compile time:

Create Binaries with Instrumentation

As mentioned above, the profiling instrumentation is added by LLVM. rustc instructs LLVM to do so by setting the appropriate flags when creating LLVM PassManagers:

    // `PMBR` is an `LLVMPassManagerBuilderRef`
    unwrap(PMBR)->EnablePGOInstrGen = true;
    // Instrumented binaries have a default output path for the `.profraw` file
    // hard-coded into them:
    unwrap(PMBR)->PGOInstrGen = PGOGenPath;

rustc also has to make sure that some of the symbols from LLVM's profiling runtime are not removed, by marking them with the right export level.

Compile Binaries Where Optimizations Make Use Of Profiling Data

In the final step of the workflow described above, the program is compiled again, with the compiler using the gathered profiling data in order to drive optimization decisions. rustc again leaves most of the work to LLVM here, basically just telling the LLVM PassManagerBuilder where the profiling data can be found:

    unwrap(PMBR)->PGOInstrUse = PGOUsePath;

LLVM does the rest (e.g. setting branch weights, marking functions with cold or inlinehint, etc).

Runtime Aspects

Instrumentation-based approaches always also have a runtime component, i.e. once we have an instrumented program, that program needs to be run in order to generate profiling data, and collecting and persisting this profiling data needs some infrastructure in place.

In the case of LLVM, these runtime components are implemented in compiler-rt and statically linked into any instrumented binaries. The rustc version of this can be found in src/libprofiler_builtins which basically packs the C code from compiler-rt into a Rust crate.

In order for libprofiler_builtins to be built, profiler = true must be set in rustc's config.toml.

Testing PGO

Since the PGO workflow spans multiple compiler invocations, most testing happens in run-make tests (the relevant tests have pgo in their name). There is also a codegen test that checks that some expected instrumentation artifacts show up in LLVM IR.

Additional Information

Clang's documentation contains a good overview on PGO in LLVM here: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Sanitizers Support

The rustc compiler contains support for the following sanitizers:

  • AddressSanitizer, a fast memory error detector. It can detect out-of-bounds accesses to the heap, stack, and globals, use after free, use after return, double free, invalid free, and memory leaks.
  • LeakSanitizer, a run-time memory leak detector.
  • MemorySanitizer, a detector of uninitialized reads.
  • ThreadSanitizer, a fast data race detector.

How to use the sanitizers?

To enable a sanitizer, compile with the -Zsanitizer=... option, where the value is one of address, leak, memory, or thread. For more details on how to use sanitizers, please refer to the unstable book.
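
For example (a minimal sketch; the exact report format may vary), compiling this program with a nightly compiler and -Zsanitizer=address makes AddressSanitizer report a heap-buffer-overflow at runtime:

fn main() {
    let xs = vec![0, 1, 2, 3];
    // Deliberate out-of-bounds read, one element past the end of the
    // heap-allocated buffer; AddressSanitizer traps this at runtime.
    let y = unsafe { *xs.as_ptr().add(4) };
    println!("{}", y);
}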

How are sanitizers implemented in rustc?

The implementation of sanitizers relies almost entirely on LLVM; rustc is an integration point for LLVM's compile-time instrumentation passes and runtime libraries. Highlights of the most important aspects of the implementation:

  • The sanitizer runtime libraries are part of the compiler-rt project, and will be built on supported targets when enabled in config.toml:

    [build]
    sanitizers = true
    

    The runtimes are placed into the target libdir.

  • During LLVM code generation, the functions intended for instrumentation are marked with the appropriate LLVM attribute: SanitizeAddress, SanitizeMemory, or SanitizeThread. By default all functions are instrumented, but this behaviour can be changed with #[no_sanitize(...)] (see the sketch after this list).

  • The decision whether to perform instrumentation or not is made at function granularity only. In cases where that decision differs between functions, it might be necessary to inhibit inlining, both at the MIR level and the LLVM level.

  • The LLVM IR generated by rustc is instrumented by dedicated LLVM passes, different for each sanitizer. Instrumentation passes are invoked after optimization passes.

  • When producing an executable, the sanitizer-specific runtime library is linked in. The libraries are searched for in the target libdir, relative to the default system root, so that this process is not affected by sysroot overrides used, for example, by cargo -Zbuild-std functionality.
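
A hedged sketch of the opt-out attribute mentioned in the list above (trusted_hot_path is a made-up function; compile with a nightly compiler and -Zsanitizer=address):

#![feature(no_sanitize)]

// Opt this one function out of AddressSanitizer instrumentation while
// the rest of the crate remains instrumented.
#[no_sanitize(address)]
fn trusted_hot_path(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    println!("{}", trusted_hot_path(&[1, 2, 3]));
}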


Debugging support in the Rust compiler

This document explains the state of debugging tools support in the Rust compiler (rustc). It gives an overview of debugging tools like GDB and LLDB and of the infrastructure around the Rust compiler to debug Rust code. If you want to learn how to debug the Rust compiler itself, see the Debugging the Compiler page instead.

The material is gathered from the YouTube video Tom Tromey discusses debugging support in rustc.

Preliminaries

Debuggers

According to Wikipedia

A debugger or debugging tool is a computer program that is used to test and debug other programs (the "target" program).

Writing a debugger from scratch for a language requires a lot of work, especially if debuggers have to be supported on various platforms. GDB and LLDB, however, can be extended to support debugging a language. This is the path that Rust has chosen. This document's main goal is to document the support for these debuggers in the Rust compiler.

DWARF

According to the DWARF standard website

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output. This program may live in the compiler itself. DWARF uses a data structure called Debugging Information Entry (DIE) which stores the information as "tags" to denote functions, variables, etc., e.g., DW_TAG_variable, DW_TAG_pointer_type, DW_TAG_subprogram. You can also invent your own tags and attributes.

Supported debuggers

GDB

We have our own fork of GDB - https://github.com/rust-dev-tools/gdb

Rust expression parser

To be able to show debug output, we need an expression parser. This (GDB) expression parser is written in Bison and can parse only a subset of Rust expressions. The GDB parser was written from scratch and has no relation to any other parser; for example, it is not related to rustc's parser.

GDB has Rust-like value and type output. It can print values and types in a way that looks like Rust syntax, and when you print a type with ptype in GDB, the output also looks like Rust source code. Check out the documentation in the manual for GDB/Rust.

Parser extensions

The expression parser has a couple of extensions in it to facilitate features that plain Rust expressions cannot express. Some limitations are listed in the manual for GDB/Rust. There is some special code in the DWARF reader in GDB to support the extensions.

A couple of examples of DWARF reader support needed are as follows -

  1. Enum: Needed to support enum types. rustc writes the information about enums into DWARF, and GDB reads the DWARF to understand where the tag field is, whether there is a tag field at all, or whether the tag slot is shared with a non-zero optimization, etc.

  2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF also points to a stub description of the corresponding vtable which in turn points to the concrete type for which this trait object exists. This means that you can do a print *object for that trait object, and GDB will understand how to find the correct type of the payload in the trait object.

TODO: Figure out if the following should be mentioned in the GDB-Rust document rather than this guide page so there is no duplication. This is regarding the following comments:

This comment by Tom

gdb's Rust extensions and limitations are documented in the gdb manual: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser implements the gdb @ extension.

This question by Aman

@tromey do you think we should mention this part in the GDB-Rust document rather than this document so there is no duplication etc.?

Developer notes

  • This work is now upstream. Bugs can be reported in GDB Bugzilla.

LLDB

We have our own fork of LLDB - https://github.com/rust-lang/lldb

Fork of LLVM project - https://github.com/rust-lang/llvm-project

LLDB currently only works on macOS because of a dependency issue. This issue was easier to solve for macOS as compared to Linux. However, Tom has a possible solution which can enable us to ship LLDB everywhere.

Rust expression parser

This expression parser is written in C++. It is a recursive descent parser that implements slightly less of the Rust language than GDB's parser does. LLDB also has Rust-like value and type output.

Parser extensions

There is some special code in the DWARF reader in LLDB to support the extensions. A couple of examples of DWARF reader support needed are as follows -

  1. Enum: Needed to support enum types. rustc writes the information about enums into DWARF, and LLDB reads the DWARF to understand where the tag field is, whether there is a tag field at all, or whether the tag slot is shared with a non-zero optimization, etc. In other words, LLDB has enum support as well.

Developer notes

  • None of the LLDB work is upstream. This rust-lang/lldb wiki page explains a few details.
  • The reason for forking LLDB is that LLDB recently removed all the other language plugins due to lack of maintenance.
  • LLDB has a plugin architecture but that does not work for language support.
  • LLDB is available via Rust build (rustup).
  • GDB generally works better on Linux.

DWARF and Rustc

DWARF is the standard way compilers generate debugging information that debuggers read. It is the debugging format on macOS and Linux. It is a multi-language, extensible format and is mostly good enough for Rust's purposes. Hence, the current implementation reuses DWARF's concepts. This is true even if some of the concepts in DWARF do not align with Rust semantically because generally there can be some kind of mapping between the two.

We have some DWARF extensions that the Rust compiler emits and the debuggers understand that are not in the DWARF standard.

  • The Rust compiler will emit DWARF for a virtual table, and this vtable object will have a DW_AT_containing_type that points to the real type. This lets debuggers dissect a trait object pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb repository:

    <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type)
       <1aa>   DW_AT_containing_type: <0x1b4>
       <1ae>   DW_AT_name        : (indirect string, offset: 0x23d): vtable
       <1b2>   DW_AT_byte_size   : 0
       <1b3>   DW_AT_alignment   : 8
    
  • The other extension is that the Rust compiler can emit a tagless discriminated union. See DWARF feature request for this item.

Current limitations of DWARF

  • Traits - representing traits in DWARF would require a bigger change than a normal DWARF extension.
  • DWARF provides no way to differentiate between structs and tuples. The Rust compiler emits tuple fields with names like __0, and debuggers look for a sequence of such names to overcome this limitation. The debugger's Rust parser then resolves x.0 to the field x.__0, so you can write x.0 as usual (see the sketch below).
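
A hedged illustration of that naming scheme (hand-written, not actual debugger output):

fn main() {
    let x = (1u32, 2u32);
    // In DWARF, rustc emits the fields of `x` under the names `__0` and
    // `__1`. The debugger's Rust parser translates `x.0` into `__0`, so
    // printing `x.0` in a debugger session works as expected.
    println!("{}", x.0);
}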

DWARF relies on debuggers to know some information about platform ABI. Rust does not do that all the time.

Developer notes

This section is from the talk about certain aspects of development.

What is missing

Shipping GDB in Rustup

Tracking issue: https://github.com/rust-lang/rust/issues/34457

Shipping GDB requires changes to the Rustup delivery system. To manage Rustup build size and times we need to build GDB separately, on its own, and somehow provide the artifacts produced to be included in the final build. However, if we can ship GDB with rustup, it will simplify the development process by letting the compiler emit new debug info which can be readily consumed.

The main issue in achieving this is setting up dependencies. One such dependency is Python. That is why we have our own fork of GDB: one of the drivers is patched on Rust's side to check for the correct version of Python (Python 2.7 in this case. Note: Python 3 was not chosen for this purpose because Python's stable ABI is limited and is not sufficient for GDB's needs. See https://docs.python.org/3/c-api/stable.html).

This is to keep updates to the debugger as fast as possible as we make changes to the debugging symbols - in essence, to ship the debugger as soon as new debugging info is added, since GDB only releases every six months or so. However, changes that are not related to Rust itself should ideally be merged upstream first.

Code signing for LLDB debug server on macOS

According to Wikipedia, System Integrity Protection is

System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned files and directories against modifications by processes without a specific "entitlement", even when executed by the root user or a user with root privileges (sudo).

It prevents processes from using the ptrace syscall. If a process wants to use ptrace, it has to be code-signed. The certificate that signs it has to be trusted on your machine.

See Apple developer documentation for System Integrity Protection.

We may need to sign up with Apple and get the keys to do this signing. Tom has looked into whether Mozilla could do this: it is at the maximum number of keys it is allowed to sign, and Tom does not know if Mozilla could get more keys.

Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple. This problem is not technical in nature. If we had such a key we could sign GDB as well and ship that.

DWARF and Traits

Rust traits are not emitted into DWARF at all. The impact of this is that calling a method like x.method() does not work as-is. The reason is that the method is implemented by a trait, as opposed to a type, and that information is not present in the DWARF, so finding trait methods fails.

DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this interface type to represent traits.

DWARF only deals with concrete names, not the reference types. So, a given implementation of a trait for a type would be one of these interfaces (DW_tag_interface type). Also, the type for which it is implemented would describe all the interfaces this type implements. This requires a DWARF extension.

Issue on Github: https://github.com/rust-lang/rust/issues/33014

Typical process for a Debug Info change (LLVM)

LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into; we need to change LLVM first because the debug info metadata is emitted through these builders rather than as DWARF directly. This is a kind of metadata that you construct and hand off to LLVM. For the rustc/LLVM hand-off, some LLVM DI builder methods are called to construct the representation of a type.

The steps of this process are as follows -

  1. LLVM needs changing.

    LLVM does not emit interface types at all, so this needs to be implemented in LLVM first.

    Get sign-off from the LLVM maintainers that this is a good idea.

  2. Change the DWARF extension.

  3. Update the debuggers.

    Update DWARF readers, expression evaluators.

  4. Update Rust compiler.

    Change it to emit this new information.

Procedural macro stepping

A deeply profound question is: how do you actually debug a procedural macro? What location do you emit for a macro expansion? Consider some of the following cases -

  • You can emit location of the invocation of the macro.
  • You can emit the location of the definition of the macro.
  • You can emit locations of the content of the macro.

RFC: https://github.com/rust-lang/rfcs/pull/2117

Focus is to let macros decide what to do. This can be achieved by having some kind of attribute that lets the macro tell the compiler where the line marker should be. This affects where you set the breakpoints and what happens when you step it.

Source file checksums in debug info

Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that contributed to the associated binary.

The cryptographic hash can be used by a debugger to verify that the source file matches the executable. If the source file does not match, the debugger can provide a warning to the user.

The hash can also be used to prove that a given source file has not been modified since it was used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities, using SHA256 is recommended for this application.

The Rust compiler stores the hash for each source file in the corresponding SourceFile in the SourceMap. The hashes of input files to external crates are stored in rlib metadata.

A default hashing algorithm is set in the target specification. This allows the target to specify the best hash available, since not all targets support all hash algorithms.

The hashing algorithm for a target can also be overridden with the -Z source-file-checksum= command-line option.

DWARF 5

DWARF version 5 supports embedding an MD5 hash to validate the source file version in use; see DWARF 5, Section 6.2.4.1 (DW_LNCT_MD5).

LLVM

LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.

LLVM DIFile documentation

Microsoft Visual C++ Compiler /ZH option

The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the /ZH compiler option.

MSVC /ZH documentation

Clang

Clang always embeds an MD5 checksum, though this does not appear to be documented.

Future work

Name mangling changes

  • New demangler in libiberty (gcc source tree).
  • New demangler in LLVM or LLDB.

TODO: Check the location of the demangler source. Question on Github.

Reuse Rust compiler for expressions

This is an important idea because debuggers, by and large, do not try to implement type inference. You need to be much more explicit when you type into the debugger than in your actual source code, so you cannot just copy and paste an expression from your source code into the debugger and expect the same answer, though that would be nice. Reusing the compiler can help with this.

It is certainly doable, but it is a large project. You certainly need a bridge to the debugger, because only the debugger has access to the program's memory. Both GDB (gcc) and LLDB (clang) have this feature: LLDB uses Clang to compile code to JIT, and GDB can do the same with GCC.

Both debuggers' expression evaluators implement both a superset and a subset of Rust: they implement just the expression language, but they also add some extensions (GDB, for example, has convenience variables). Therefore, if you take this route, you not only need the bridge but may also have to add some mode to let the compiler understand those extensions.

Appendix B: Background topics

This section covers a number of common compiler terms that arise in this guide. We try to give the general definition while providing some Rust-specific context.

What is a control-flow graph?

A control-flow graph is a common term from compilers. If you've ever used a flow-chart, then the concept of a control-flow graph will be pretty familiar to you. It's a representation of your program that exposes the underlying control flow in a very clear way.

A control-flow graph is structured as a set of basic blocks connected by edges. The key idea of a basic block is that it is a set of statements that execute "together" – that is, whenever you branch to a basic block, you start at the first statement and then execute all the remainder. Only at the end of the block is there the possibility of branching to more than one place (in MIR, we call that final statement the terminator):

bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}

Many expressions that you are used to in Rust compile down to multiple basic blocks. For example, consider an if statement:

a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;

This would compile into four basic blocks:

BB0: {
    a = 1;
    if some_variable { goto BB1 } else { goto BB2 }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...;
}

When using a control-flow graph, a loop simply appears as a cycle in the graph, and the break keyword translates into a path out of that cycle.
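To make the structure concrete, here is a minimal sketch of a control-flow-graph representation encoding the example above. The types are illustrative only; MIR's real types live in rustc_middle::mir:

// Illustrative types only, not MIR's actual representation.
#[derive(Debug)]
struct BasicBlock {
    statements: Vec<String>,
    terminator: Terminator,
}

#[derive(Debug)]
enum Terminator {
    // Unconditional jump to another block.
    Goto(usize),
    // Two-way branch, as produced by `if`.
    Branch { then_block: usize, else_block: usize },
    Return,
}

#[derive(Debug)]
struct ControlFlowGraph {
    blocks: Vec<BasicBlock>,
}

fn main() {
    // The `if` example above, encoded as four blocks.
    let cfg = ControlFlowGraph {
        blocks: vec![
            BasicBlock {
                statements: vec!["a = 1".into()],
                terminator: Terminator::Branch { then_block: 1, else_block: 2 },
            },
            BasicBlock { statements: vec!["b = 1".into()], terminator: Terminator::Goto(3) },
            BasicBlock { statements: vec!["c = 1".into()], terminator: Terminator::Goto(3) },
            BasicBlock { statements: vec!["d = 1".into()], terminator: Terminator::Return },
        ],
    };
    println!("{cfg:#?}");
}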

What is a dataflow analysis?

Static Program Analysis by Anders Møller and Michael I. Schwartzbach is an incredible resource!

to be written

What is "universally quantified"? What about "existentially quantified"?

to be written

What is co- and contra-variance?

Check out the subtyping chapter from the Rust Nomicon.

See the variance chapter of this guide for more info on how the type checker handles variance.

What is a "free region" or a "free variable"? What about "bound region"?

Let's describe the concepts of free vs bound in terms of program variables, since that's the thing we're most familiar with.

  • Consider this expression, which creates a closure: |a, b| a + b. Here, the a and b in a + b refer to the arguments that the closure will be given when it is called. We say that the a and b there are bound to the closure, and that the closure signature |a, b| is a binder for the names a and b (because any references to a or b within refer to the variables that it introduces).
  • Consider this expression: a + b. In this expression, a and b refer to local variables that are defined outside of the expression. We say that those variables appear free in the expression (i.e., they are free, not bound (tied up)).

So there you have it: a variable "appears free" in some expression/statement/whatever if it refers to something defined outside of that expression/statement/whatever. Equivalently, we can then refer to the "free variables" of an expression, which is just the set of variables that "appear free".

So what does this have to do with regions? Well, we can apply the analogous concept to types and regions. For example, in the type &'a u32, 'a appears free. But in the type for<'a> fn(&'a u32), it does not.
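A small, compilable illustration of the type-level version:

fn first<'a>(xs: &'a [u32]) -> &'a u32 {
    // Within this body, 'a appears free: it was bound by the `<'a>`
    // binder on the function itself.
    &xs[0]
}

fn identity(x: &u32) -> &u32 {
    x
}

fn main() {
    // Here 'a is bound by the `for<'a>` binder inside the type itself:
    let id: for<'a> fn(&'a u32) -> &'a u32 = identity;
    println!("{} {}", id(&1), first(&[2, 3]));
}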

Further Reading About Compilers

Thanks to mem, scottmcm, and Levi on the official Discord for the recommendations, and to tinaun for posting a link to a twitter thread from Graydon Hoare which had some more recommendations!

Other sources: https://gcc.gnu.org/wiki/ListOfCompilerBooks

If you have other suggestions, please feel free to open an issue or PR.

Books

Courses

Wikis

Misc Papers and Blog Posts

Appendix C: Glossary

The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better.

Term / Meaning
arena/arena allocation
An arena is a large memory buffer from which other memory allocations are made. This style of allocation is called arena allocation. See this chapter for more info.
AST
The abstract syntax tree produced by the rustc_ast crate; reflects user syntax very closely.
binder
A "binder" is a place where a variable or type is declared; for example, the <T> is a binder for the generic type parameter T in fn foo<T>(..), and |a| ... is a binder for the parameter a. See the background chapter for more.
BodyId
An identifier that refers to a specific body (definition of a function or constant) in the crate. See the HIR chapter for more.
bound variable
A "bound variable" is one that is declared within an expression/term. For example, the variable a is bound within the closure expression |a| a * 2. See the background chapter for more
codegen
The code to translate MIR into LLVM IR.
codegen unit
When we produce LLVM IR, we group the Rust code into a number of codegen units (sometimes abbreviated as CGUs). Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. (see more)
completeness
A technical term in type theory, it means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness").
control-flow graph
A representation of the control-flow of a program; see the background chapter for more
CTFE
Short for Compile-Time Function Evaluation, this is the ability of the compiler to evaluate const fns at compile time. This is part of the compiler's constant evaluation system. (see more)
cx
We tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc.
DAG
A directed acyclic graph is used during compilation to keep track of dependencies between queries. (see more)
data-flow analysis
A static analysis that figures out what properties are true at each point in the control-flow of a program; see the background chapter for more.
DefId
An index identifying a definition (see librustc_middle/hir/def_id.rs). Uniquely identifies a DefPath. See the HIR chapter for more.
Double pointer
A pointer with additional metadata. See "fat pointer" for more.
drop glue
(internal) compiler-generated instructions that handle calling the destructors (Drop) for data types.
DST
Short for Dynamically-Sized Type, this is a type for which the compiler cannot statically know the size in memory (e.g. str or [u8]). Such types don't implement Sized and cannot be allocated on the stack. They can only occur as the last field in a struct. They can only be used behind a pointer (e.g. &str or &[u8]).
early-bound lifetime
A lifetime region that is substituted at its definition site. Bound in an item's Generics and substituted using a Substs. Contrast with late-bound lifetime. (see more)
empty type
see "uninhabited type".
Fat pointer
A two word value carrying the address of some value, along with some further information necessary to put the value to use. Rust includes two kinds of "fat pointers": references to slices, and trait objects. A reference to a slice carries the starting address of the slice and its length. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value. "Fat pointers" are also known as "wide pointers", and "double pointers".
free variable
A "free variable" is one that is not bound within an expression or term; see the background chapter for more
generics
The set of generic type parameters defined on a type or item.
HIR
The High-level IR, created by lowering and desugaring the AST. (see more)
HirId
Identifies a particular node in the HIR by combining a def-id with an "intra-definition offset". See the HIR chapter for more.
HIR Map
The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers.
ICE
Short for internal compiler error, this is when the compiler crashes.
ICH
Short for incremental compilation hash, these are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
infcx
The inference context (see librustc_middle/infer)
inference variable
When doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type.
intern
Interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a Symbol) rather than the data itself, to reduce memory usage and number of allocations. See this chapter for more info.
IR
Short for Intermediate Representation, a general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it.
IRLO
IRLO or irlo is sometimes used as an abbreviation for internals.rust-lang.org.
item
A kind of "definition" in the language, such as a static, const, use statement, module, struct, etc. Concretely, this corresponds to the Item type.
lang item
Items that represent concepts intrinsic to the language itself, such as special built-in traits like Sync and Send; or traits representing operations such as Add; or functions that are called by the compiler. (see more)
late-bound lifetime
A lifetime region that is substituted at its call site. Bound in a HRTB and substituted by specific functions in the compiler, such as liberate_late_bound_regions. Contrast with early-bound lifetime. (see more)
local crate
The crate currently being compiled.
LTO
Short for Link-Time Optimizations, this is a set of optimizations offered by LLVM that occur just before the final binary is linked. These include optimizations like removing functions that are never used in the final program, for example. ThinLTO is a variant of LTO that aims to be a bit more scalable and efficient, but possibly sacrifices some optimizations. You may also read issues in the Rust repo about "FatLTO", which is the loving nickname given to non-Thin LTO. LLVM documentation: here and here.
LLVM
(actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports.
memoization
The process of storing the results of (pure) computations (such as pure function calls) to avoid having to repeat them in the future. This is typically a trade-off between execution speed and memory usage.
MIR
The Mid-level IR that is created after type-checking for use by borrowck and codegen. (see more)
miri
An interpreter for MIR used for constant evaluation. (see more)
monomorphization
The process of taking generic implementations of types and functions and instantiating them with concrete types. For example, in the code we might have Vec<T>, but in the final executable, we will have a copy of the Vec code for every concrete type used in the program (e.g. a copy for Vec<usize>, a copy for Vec<MyStruct>, etc).
normalize
A general term for converting to a more canonical form, but in the case of rustc typically refers to associated type normalization.
newtype
A wrapper around some other type (e.g., struct Foo(T) is a "newtype" for T). This is commonly used in Rust to give a stronger type for indices.
NLL
Short for non-lexical lifetimes, this is an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId
An index identifying a particular node in the AST or HIR; gradually being phased out and replaced with HirId. See the HIR chapter for more.
obligation
Something that must be proven by the trait system. (see more)
placeholder
A way of handling subtyping around "for-all" types (e.g., for<'a> fn(&'a u32)), as well as solving higher-ranked trait bounds (e.g., for<'a> T: Trait<'a>). Note: "placeholder" replaces the now-deprecated term "skolemization". See the chapter on placeholders and universes for more details.
point
Used in the NLL analysis to refer to some particular location in the MIR; typically used to refer to a node in the control-flow graph.
polymorphize
An optimization that avoids unnecessary monomorphisation. (see more)
projection
A general term for a "relative path", e.g. x.f is a "field projection", and T::Item is an "associated type projection".
promoted constants
Constants extracted from a function and lifted to static scope; see this section for more details.
provider
The function that executes a query. (see more)
quantified
In math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" or "is this true for all types T?"; see the background chapter for more.
query
Perhaps some sub-computation during compilation. (see more)
region
Another term for "lifetime" often used in the literature and in the borrow checker.
rib
A data structure in the name resolver that keeps track of a single scope for names. (see more)
sess
The compiler session, which stores global data used throughout compilation
side tables
Because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil
Like a keyword but composed entirely of non-alphanumeric tokens. For example, & is a sigil for references.
soundness
A technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
span
A location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs
The substitutions for a given generic type or item (e.g. the i32, u32 in HashMap<i32, u32>).
tcx
The "typing context", main data structure of the compiler. (see more)
'tcx
The lifetime of the allocation arena. (see more)
token
The smallest unit of parsing. Tokens are produced after lexing (see more).
TLS
Thread-Local Storage. Variables may be defined so that each thread has its own copy (rather than all threads sharing the variable). This has some interactions with LLVM. Not all platforms support TLS.
trait reference
The name of a trait along with a suitable set of input type/lifetimes. (see more)
trans
The code to translate MIR into LLVM IR. Renamed to codegen.
ty
The internal representation of a type. (see more)
UFCS
Short for Universal Function Call Syntax, this is an unambiguous syntax for calling a method. (see more)
uninhabited type
A type which has no values. This is not the same as a ZST, which has exactly 1 value. An example of an uninhabited type is enum Foo {}, which has no variants, and so, can never be created. The compiler can treat code that deals with uninhabited types as dead code, since there is no such value to be manipulated. ! (the never type) is an uninhabited type. Uninhabited types are also called "empty types".
upvar
A variable captured by a closure from outside the closure.
variance
Determines how changes to a generic type/lifetime parameter affect subtyping; for example, if T is a subtype of U, then Vec<T> is a subtype of Vec<U> because Vec is covariant in its generic parameter. See the background chapter for a more general explanation. See the variance chapter for an explanation of how type checking handles variance.
Wide pointer
A pointer with additional metadata. See "fat pointer" for more.
ZST
Zero-Sized Type. A type whose values have size 0 bytes. Since 2^0 = 1, such types can have exactly one value. For example, () (unit) is a ZST. struct Foo; is also a ZST. The compiler can do some nice optimizations around ZSTs.

Appendix D: Code Index

rustc has a lot of important data structures. This is an attempt to give some guidance on where to learn more about some of the key data structures of the compiler.

Item | Kind | Short description | Chapter | Declaration
BodyId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/hir.rs
Compiler | struct | Represents a compiler session and can be used to drive a compilation | The Rustc Driver and Interface | src/librustc_interface/interface.rs
ast::Crate | struct | A syntax-level representation of a parsed crate | The parser | src/librustc_ast/ast.rs
rustc_hir::Crate | struct | A more abstract, compiler-friendly form of a crate's AST | The Hir | src/librustc_hir/hir.rs
DefId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/def_id.rs
DiagnosticBuilder | struct | A struct for building up compiler diagnostics, such as errors or lints | Emitting Diagnostics | src/librustc_errors/diagnostic_builder.rs
DocContext | struct | A state container used by rustdoc when crawling through a crate to gather its documentation | Rustdoc | src/librustdoc/core.rs
HirId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc_hir/hir_id.rs
NodeId | struct | One of four types of HIR node identifiers; being phased out | Identifiers in the HIR | src/librustc_ast/ast.rs
P | struct | An owned immutable smart pointer. By contrast, &T is not owned, and Box<T> is not immutable | None | src/librustc_ast/ptr.rs
ParamEnv | struct | Information about generic parameters or Self, useful for working with associated or generic items | Parameter Environment | src/librustc_middle/ty/mod.rs
ParseSess | struct | This struct contains information about a parsing session | The parser | src/librustc_session/parse/parse.rs
Query | struct | Represents the result of a query to the Compiler interface and allows stealing, borrowing, and returning the results of compiler passes | The Rustc Driver and Interface | src/librustc_interface/queries.rs
Rib | struct | Represents a single scope of names | Name resolution | src/librustc_resolve/lib.rs
Session | struct | The data associated with a compilation session | The parser, The Rustc Driver and Interface | src/librustc_middle/session/mod.rs
SourceFile | struct | Part of the SourceMap. Maps AST nodes to their source code for a single source file. Was previously called FileMap | The parser | src/librustc_span/lib.rs
SourceMap | struct | Maps AST nodes to their source code. It is composed of SourceFiles. Was previously called CodeMap | The parser | src/librustc_span/source_map.rs
Span | struct | A location in the user's source code, used for error reporting primarily | Emitting Diagnostics | src/librustc_span/span_encoding.rs
StringReader | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | The parser | src/librustc_parse/lexer/mod.rs
rustc_ast::token_stream::TokenStream | struct | An abstract sequence of tokens, organized into TokenTrees | The parser, Macro expansion | src/librustc_ast/tokenstream.rs
TraitDef | struct | This struct contains a trait's definition with type information | The ty modules | src/librustc_middle/ty/trait_def.rs
TraitRef | struct | The combination of a trait and its input types (e.g. P0: Trait<P1...Pn>) | Trait Solving: Goals and Clauses, Trait Solving: Lowering impls | src/librustc_middle/ty/sty.rs
Ty<'tcx> | struct | This is the internal representation of a type used for type checking | Type checking | src/librustc_middle/ty/mod.rs
TyCtxt<'tcx> | struct | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries | The ty modules | src/librustc_middle/ty/context.rs

Compiler Lecture Series

These are videos where various experts explain different parts of the compiler:

Rust Bibliography

This is a reading list of material relevant to Rust. It includes prior research that has - at one time or another - influenced the design of Rust, as well as publications about Rust.

Type system

Concurrency

Others

Papers about Rust

Humor in Rust

What's a project without a sense of humor? And frankly, some of these are enlightening.