A High-Level Overview of TeX

If you’re a beginner who just wants to get LaTeX working, read LaTeX for Beginners instead. I assume you know the basics of LaTeX and have a working installation.

This isn’t mainly about how the TeX programming language itself works. Rather, it’s more about the environment that (La)TeX works in, and why some of its more puzzling behaviors occur.

History

TeX and LaTeX are very old, so some of their design decisions will seem quite archaic.1 If you want to do anything nontrivial (like randomizing hints), I strongly recommend using another language to generate an intermediate .tex file, and then compiling that file into a PDF.2

Just keep this in mind as I go through these concepts. A lot of these design choices may not make sense at first, but they become clearer in the context of “computers used to be really weak.”

Compiler

The compiler does not care what you used to edit the plain text. It just cares what’s in the text file.

By the way, the LaTeX language is not tied to online editors like Overleaf.3 Thinking otherwise is like assuming that because you bought an all-in-one desktop, where the monitor is built into the computer, the monitor and the computer are the same thing, or that you have to buy them together. The worst conclusion to come to would be that all-in-ones are the only kind of desktop.4

There are a ton of text editors, but not that many compilers. The most popular are pdfLaTeX, XeTeX, and LuaTeX, plus ConTeXt, which is strictly a separate macro format that runs on top of LuaTeX rather than a compiler of its own. (It’s mostly pdfLaTeX.) Compilers usually ship with a distribution; more on that below.
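Because the compiler only looks at the text file, the same source can be fed to different engines. As a small sketch, the file below uses the iftex package (included in standard TeX Live and MiKTeX installs, as far as I know) to print which engine compiled it; the sentence itself is just an illustration.

```latex
\documentclass{article}
\usepackage{iftex} % defines \ifPDFTeX, \ifXeTeX and \ifLuaTeX
\begin{document}
This file was compiled with
\ifPDFTeX pdf\TeX\fi
\ifXeTeX Xe\TeX\fi
\ifLuaTeX Lua\TeX\fi.
\end{document}
```

Compile it with pdflatex, xelatex, and lualatex in turn; nothing in the file changes, but the last sentence names a different engine each time.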

TeX Distributions

A TeX distribution bundles compilers with a set of tools that support them. Typically this “set of tools” is a standardized collection of classes and packages, along with the compilers themselves (pdflatex, XeTeX, LuaTeX). If you are using TeX Live, this “standardized set” comes from CTAN, the Comprehensive TeX Archive Network.5 MiKTeX also draws on CTAN, but supposedly updates more slowly since it has a single maintainer. This doesn’t matter unless you want a bleeding-edge TeX install.

It’s important to make the following observation: when you import a document class or package, you’re loading a .cls or .sty file, no matter what name you put in the braces. For instance, \documentclass{article} loads article.cls into your document; you just haven’t ever looked at article.cls before. If you want to find it, run kpsewhich article.cls.6
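One way to see this for yourself is the \listfiles command: put it before \documentclass, and LaTeX writes every class, package, and other file it loaded into a *File List* section near the end of the .log file. The choice of amsmath below is arbitrary; any package works.

```latex
\listfiles              % ask LaTeX to list every file it loads in the .log
\documentclass{article} % loads article.cls, which in turn loads size10.clo
\usepackage{amsmath}    % loads amsmath.sty, which pulls in a few more .sty files
\begin{document}
Hello.
\end{document}
```

After running pdflatex, each name in that list is an ordinary file you can locate with kpsewhich.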

Most software distributions come with a package manager, and TeX Live/MiKTeX are no exception: TeX Live uses tlmgr, while MiKTeX ships its own (the MiKTeX Console). For those of you coming from a Linux background, this terminology will be familiar. Otherwise, here is a quick explanation: a package manager is a standardized interface that installs software from a repository. So instead of running a separate installer for each piece of software, each with its own idiosyncrasies, you run tlmgr install PACKAGE and the command takes care of downloading, extraction, and installation for you.

Build Process

If you want to do something nontrivial with TeX, I weakly recommend you avoid the latexmk tool until you understand why it is helpful.7 Otherwise the rest of this section won’t make sense, because “whee, latexmk compiles exactly the number of times I need it to.”8

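Anyone who’s run pdflatex by hand instead of latexmk will have noticed that one run often isn’t enough: cross-references come out as ??, the table of contents is empty or out of date, and you have to run pdflatex again.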

This is because LaTeX code executes sequentially, period. You can’t just go back and scan every line whenever you have a reference in the code. This isn’t a batch script, people! (And what happens when the user makes an error and the reference is undefined?)

Why is this an issue? Well, let’s consider the following example (fill in the rest of the document in your head):

\tableofcontents
\section{Insert Title Here}

Clearly \tableofcontents must run before \section. But we have a problem: how is \tableofcontents supposed to know what comes after it? It’s not like it can look ahead. Well, you could make it do so, but it would end up being slow, annoying, and buggy.

To work around this, every pdflatex run writes auxiliary files such as .aux and .toc. These files record information from that run, so the next time you run pdflatex, it reads them first and can generate references correctly.
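Here is the earlier example filled out into a complete document, with a cross-reference added. The label name sec:intro is arbitrary; the comments describe the standard LaTeX behavior as I understand it.

```latex
\documentclass{article}
\begin{document}
% Reads \jobname.toc, which was produced by the *previous* run.
\tableofcontents

% \section records a TOC entry (routed through the .aux file into the .toc);
% \label writes the section number to the .aux file.
\section{Insert Title Here}\label{sec:intro}

% \ref looks up sec:intro in the .aux read at \begin{document};
% on the very first run the label is unknown, so this prints ??.
See Section~\ref{sec:intro}.
\end{document}
```

Run pdflatex once and the table of contents is empty while the reference prints as ?? (with a warning in the log). Run it a second time and both are filled in from the files the first run wrote.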

Keep in mind that SyncTeX files (.synctex.gz) are not part of this; they exist only so you can navigate between the PDF and the source code.

TeX Paths

There are seven directories a TeX install writes to; we’ll only explain three of them:

Typically a TeX installation writes to a system-wide directory owned by root (on Unix, TeX Live defaults to /usr/local/texlive). I think this is kind of stupid if you’re the only user running TeX on your machine (or the only user on your machine, actually). If you put the TeX tree in a root-owned directory, then you’ll need root permissions to run tlmgr. But the root user doesn’t know where tlmgr is, because its location is only in your user’s PATH, so you then need to point it to the right place. Bah! Fortunately, it’s very easy to move a TeX install from a root-owned directory into your home directory; you just need to update your PATH afterward.

No matter where you install TeX Live, any personal packages or classes you use should go into TEXMFHOME, period. A quick refresher: TEXMFHOME is ~/texmf by default. You can change it in texmf.cnf (run kpsewhich texmf.cnf to find that file), or you can add export TEXMFHOME= to your .bashrc (or your shell’s equivalent config file).
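As a minimal sketch, assuming the default TEXMFHOME of ~/texmf (kpsewhich -var-value=TEXMFHOME prints yours), a personal package could be saved as ~/texmf/tex/latex/mymacros/mymacros.sty, following the usual tex/latex/ layout of the TeX Directory Structure. The package name mymacros and the \R macro are hypothetical.

```latex
% Hypothetical file: ~/texmf/tex/latex/mymacros/mymacros.sty
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mymacros}[2024/01/01 Personal macros (example)]
\RequirePackage{amssymb}    % provides \mathbb
\newcommand{\R}{\mathbb{R}} % shorthand for the real numbers
\endinput
```

Any document can then load it with \usepackage{mymacros}, from any directory, and kpsewhich mymacros.sty should print the file’s path.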

Concluding Thoughts

Hopefully this explains why TeX behaves the way it does on your system. If you want to learn how TeX works under the hood, Overleaf’s “How TeX macros actually work” is really good. A lot of LaTeX’s implementation details depend on those internals, because TeX does two things in its own distinctive way:

  1. It can typeset stuff nicely. (You probably already knew this.)
  2. It uses category codes (“catcodes”) to interpret TeX documents.
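To make the second point a little more concrete: LaTeX’s internal macros have @ in their names. That works inside .sty and .cls files because LaTeX treats @ as a letter (catcode 11) while reading them, but in your own document @ is “other” (catcode 12). The sketch below uses \makeatletter and \makeatother, the kernel’s standard switches, to define such a macro in a preamble; the names \my@greeting and \greeting are made up for illustration.

```latex
\documentclass{article}
% In a document, @ has catcode 12 ("other"), so without the next line
% \my@greeting would be read as the control word \my followed by "@greeting".
\makeatletter % sets \catcode`\@=11: @ now counts as a letter
\newcommand{\my@greeting}{Hello from a macro with @ in its name}
\newcommand{\greeting}{\my@greeting} % body tokenized while @ is a letter
\makeatother  % sets \catcode`\@=12 again
\begin{document}
\greeting % still works: its body already holds the single token \my@greeting
\end{document}
```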

Overleaf’s article goes into depth on the second point. I have yet to find an accessible explanation of how TeX actually decides how to typeset stuff, but I also haven’t looked very hard.