Personal Site, by Lukas Waymann (feed generated by Jekyll, 2023-11-10)
https://meribold.org/feed.xml

Using the Same Arch Linux Installation for a Decade (2022-08-16)
https://meribold.org/2022/08/16/same-arch-linux-installation-for-a-decade

<p>As of today, I’ve been using the same Arch Linux installation for ten years on my main
computer. I don’t have much to say, but I’m writing this because my experience doesn’t
match the common notion that Arch Linux is unstable.<sup id="fnref:dont-get-me-wrong" role="doc-noteref"><a href="#fn:dont-get-me-wrong" class="footnote" rel="footnote">1</a></sup></p>
<p>I installed Arch Linux in August 2012<sup id="fnref:systemd" role="doc-noteref"><a href="#fn:systemd" class="footnote" rel="footnote">2</a></sup> on a ThinkPad X121e and never saw a need
to reinstall. In 2018, I switched to a ThinkPad X220 by moving my SSD. A few months ago,
I copied my complete installation to a ThinkPad X13 Gen 2 using rsync. The longest I went
without a system upgrade is nine months, but typically I upgrade about once per month.</p>
<p>Now it isn’t the case that nothing ever broke. Most disruptively, X broke twice and
audio broke once. But over ten years, this doesn’t compare too poorly to other operating systems.
With Ubuntu, I would’ve had to upgrade to a new release at least three times during the
same period to end up with a version that’s currently supported,<sup id="fnref:precise-to-bionic" role="doc-noteref"><a href="#fn:precise-to-bionic" class="footnote" rel="footnote">3</a></sup> and
five times to end up with the latest LTS release.<sup id="fnref:precise-to-jammy" role="doc-noteref"><a href="#fn:precise-to-jammy" class="footnote" rel="footnote">4</a></sup> And these release
upgrades don’t always go smoothly either.</p>
<p>I don’t know what you’re using, but I bet you’ve also spent some time fixing problems with
it over the last ten years.</p>
<h2 id="notes">Notes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:dont-get-me-wrong" role="doc-endnote">
<p>I’m not trying to convince anyone to use Arch Linux. <a href="#fnref:dont-get-me-wrong" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:systemd" role="doc-endnote">
<p>slightly before systemd became the default init system <a href="#fnref:systemd" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:precise-to-bionic" role="doc-endnote">
<p>from 12.04 to 14.04 to 16.04 to 18.04 <a href="#fnref:precise-to-bionic" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:precise-to-jammy" role="doc-endnote">
<p>from 12.04 to 14.04 to 16.04 to 18.04 to 20.04 to 22.04 <a href="#fnref:precise-to-jammy" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

Some Highlights From My Dotfiles (2022-06-27, updated 2022-08-04)
https://meribold.org/video/2022/06/27/dotfiles-highlights

<iframe style="position: absolute; top: 0; left: 0; border: 0; width: 100%; height: 100%" src="https://www.youtube.com/embed/CZxo41Ao_Tc?rel=0" allowfullscreen="">
</iframe>

A Critique of the Open Letter Calling for the Removal of RMS (2021-04-07, updated 2023-02-25)
https://meribold.org/2021/04/07/critique-of-rms-open-letter

<p>On March 21, Richard M. Stallman (RMS) announced that he is on the Free Software
Foundation (FSF) board of directors again after having resigned in September 2019. Two
days later, an <a href="https://rms-open-letter.github.io">open letter</a> condemning RMS as misogynist, ableist, transphobic,
intolerant, bigoted, hateful, and dangerous was published. The letter demands Stallman’s
removal from all leadership positions as well as the removal of the entire board of the
FSF for enabling such a person.</p>
<p>I will not make an argument either for or against RMS. Those who know RMS personally are
in a better position to do that. I will only argue that the open letter is misleading,
divisive, and meant to whip up a mob. I think a good argument against RMS <em>can</em> be made,
but the authors of the letter instead chose to incite as much outrage as possible at the
cost of honesty and nuance.</p>
<p>Many of the letter’s signatories tell personal stories of RMS being arrogant, insensitive,
presumptuous, or inappropriate, and have concluded that RMS is unsuitable as a leader or
spokesperson. This is a perfectly justifiable stance to take—especially for those that
experienced or observed misbehavior themselves. Yet, the letter focuses on what RMS has
said rather than done, and makes accusations that are much more grave.</p>
<p>This hardly seems like an accident. The goal is to have as many people as
possible—including those that never interacted with RMS—be outraged and sign the
letter, a strategy that seems to have worked rather well. It surely also animated some
people that dislike this dynamic to come out in support of RMS.</p>
<p>The <a href="https://rms-open-letter.github.io/appendix">letter’s appendix</a> correctly quotes RMS saying “the most
plausible scenario is that [Virginia Giuffre] presented herself to [Marvin Minsky] as
entirely willing”, but then changes “presented” to “being” when referencing this quote
three sentences later. That same sentence contains a direct quotation that is nowhere to
be found in the source we’re given.<sup id="fnref:2021-04-16-update" role="doc-noteref"><a href="#fn:2021-04-16-update" class="footnote" rel="footnote">1</a></sup>
The appendix continues with a parenthetical granting
that “several news reports misrepresented Stallman’s position while discussing allegations
against Minsky”, but then asserts that “Stallman has previously expressed opinions that
were consistent with the inaccurate portrayal.” No source is given this time. On the
other hand, the appendix continues to link one of the misrepresentative news reports.</p>
<p>Let’s consider the evidence for RMS being transphobic. This is thin ice. The primary
author of the open letter, Molly de Blanc, writes on her personal blog:</p>
<blockquote>
<p>There is no space to argue over whether a comment was transphobic—if it hurt a trans
person then it is transphobic and it is unacceptable.</p>
</blockquote>
<p>As an aside, I highly recommend reading <a href="http://deblanc.net/blog/2021/01/12/1028-words-on-free-software/">that blog post</a> in full. Echoes of its world
view are present throughout the open letter.</p>
<p>So what makes RMS transphobic? He <a href="https://www.stallman.org/articles/genderless-pronouns.html">proposes</a> and uses a set of
singular, gender-neutral pronouns other than “they”, “their”, and
“theirs”.<sup id="fnref:2021-05-03-update" role="doc-noteref"><a href="#fn:2021-05-03-update" class="footnote" rel="footnote">2</a></sup></p>
<p>One transgender person <a href="https://libreboot.org/news/rms.html#rms-is-not-transphobic">called</a> the proposal “idiotic”, but
<a href="https://libreboot.org/news/rms.html#rms-is-not-transphobic">considers</a> “[c]alling RMS a transphobe […] an insult to people who suffer from
real transphobia.” I am also <a href="https://twitter.com/mjg59/status/1377406466504544258">told of</a> at least one non-binary
transgender person that does agree with the open letter’s assertion that Stallman’s
proposal is, in fact, “poorly disguised transphobia”. I’m tempted to err on the side of
assuming good intentions here. To me this looks like Stallman’s typical willingness to
die on strange, idiosyncratic hills. You can have your own opinion. Or maybe you can’t.
Remember, there “is no space to argue over whether a comment was transphobic”.</p>
<p>The accusation of ableism seems to primarily hinge on a highly insensitive note RMS added
to his website in October 2016, but completely rewrote a few months later. Presumably
someone told RMS about the problems with his original note. The open letter’s appendix
uses recent web.archive.org links throughout and gives no indication that this particular
link is an outdated capture from 2016. Here’s the <a href="https://stallman.org/archives/2016-sep-dec.html#31_October_2016_(Down's_syndrome)">rewritten note</a>:</p>
<blockquote>
<p>A noninvasive test for Down’s syndrome eliminates the small risk of the old test.
This might lead more women to get tested, and abort fetuses that have Down’s syndrome.</p>
<p>According to Wikipedia, Down’s syndrome is a combination of many kinds of medical
misfortune. Thus, when carrying a fetus that is likely to have Down’s syndrome, I
think the right course of action for the woman is to terminate the pregnancy.</p>
<p>That choice does right by the potential children that would otherwise likely be born
with grave medical problems and disabilities. As humans, they are entitled to the
capacity that is normal for human beings. I don’t advocate making rules about the
matter, but I think that doing right by your children includes not intentionally
starting them out with less than that.</p>
<p>When children with Down’s syndrome are born, that’s a different situation. They are
human beings and I think they deserve the best possible care.</p>
</blockquote>
<p>RMS has (for some reason) tens of thousands of political notes on his website ready to be
cherry-picked, and yet that wasn’t enough for the authors of the open letter and they
turned to an outdated web.archive.org capture from 2016. And then they didn’t bother to
inform the readers of this in any way.</p>
<p>Suggesting that RMS is intolerant is strange. If anything, he seems to be <em>too</em> tolerant.
Implying that RMS is hateful is just ridiculous.</p>
<p>I only looked into some of the open letter’s contents in depth and can’t comment on the
rest. Maybe the rest is completely fair. Martin Tournoij wrote a better and more
thorough <a href="https://www.arp242.net/rms.html">commentary</a> on the open letter than I
can. I do wonder how many people looked into the accusations before signing the letter,
though. One of the original signatories <a href="https://twitter.com/luis_in_brief/status/1377268090258354176">told me that</a>, “from a practical
perspective, transphobe or no, [RMS] needs to go” <a href="https://twitter.com/luis_in_brief/status/1377268518052241413">and that</a> “regardless of
exactly how ableist he is, he can’t be a leader of an organization”. Perhaps that’s true,
but it isn’t fair. Accusations like ableism and transphobia are front and center in the
open letter and likely convinced more than a handful of people to add their signature.</p>
<p>I’ll leave you with three more bits of evidence that things are more complex than the open
letter would have you believe:</p>
<ol>
<li>A competing <a href="https://rms-support-letter.github.io">letter in support of RMS</a> exists and has accumulated considerably more
signatures than the letter condemning RMS.</li>
<li>In response, someone created a Chrome extension that marks repositories owned by a
signatory of this support letter with red text on GitHub.</li>
<li>The <a href="https://twitter.com/aaronbassett/status/1376601712379764737">announcement</a> of this extension was welcomed with 394
likes on Twitter.<sup id="fnref:twitter-response-clarification" role="doc-noteref"><a href="#fn:twitter-response-clarification" class="footnote" rel="footnote">3</a></sup></li>
</ol>
<h2 id="notes">Notes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:2021-04-16-update" role="doc-endnote">
<p>This is still the case as of April 16, but a reader informed me that
the appendix <em>used to</em> provide a different link here. Instead of a 2021
web.archive.org capture of a note from Stallman’s website it previously linked an old
version of the same note via a 2018 capture. The old, since-rewritten note <em>does</em>
contain the phrase “entirely willing”. Presumably the link was <a href="https://github.com/rms-open-letter/rms-open-letter.github.io/commit/f7d04be13369ec1f6933e8de8261d1dcfda4d430">changed</a> by
accident. I’d rather have the appendix rely on outdated notes as little as possible,
but I’ve notified two of the people that worked on the appendix in case they want to
revert the link back to the 2018 capture. <a href="#fnref:2021-04-16-update" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2021-05-03-update" role="doc-endnote">
<p>RMS uses “she” and “he” for transgender persons as preferred. He
refuses to use singular “they”. I added this note on 2021-05-03 because I noticed
there’s some confusion about this. <a href="#fnref:2021-05-03-update" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:twitter-response-clarification" role="doc-endnote">
<p>I am not suggesting that these 394 people are
representative of the ~3000 open letter signatories. <a href="#fnref:twitter-response-clarification" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

Using fzf as a dmenu/Rofi Replacement (2020-10-13)
https://meribold.org/video/2020/10/13/fzfmenu

<iframe style="position: absolute; top: 0; left: 0; border: 0; width: 100%; height: 100%" src="https://www.youtube.com/embed/kw2mnwhptjw?rel=0" allowfullscreen="">
</iframe>

Demo: Newsboat+mpv as a YouTube Client (2020-02-19)
https://meribold.org/video/2020/02/19/newsboat-plus-mpv-youtube-client

<iframe style="position: absolute; top: 0; left: 0; border: 0; width: 100%; height: 100%" src="https://www.youtube.com/embed/U31niad7bHY?rel=0" allowfullscreen="">
</iframe>

Email Workflow Demo (2020-02-15)
https://meribold.org/video/2020/02/15/email-workflow-demo

<iframe style="position: absolute; top: 0; left: 0; border: 0; width: 100%; height: 100%" src="https://www.youtube.com/embed/9a2TJKQeVZc?rel=0" allowfullscreen="">
</iframe>

Virtual Environments Demystified (2018-02-13, updated 2023-02-25)
https://meribold.org/python/2018/02/13/virtual-environments-9487

<p>
<div class="confined-img-aspect-ratio-box" style="padding-top: calc(640 / 1464 * 100% + 640 / 1464 * 15px)">
<picture>
<source type="image/webp" srcset="/assets/virtual-boy-avgn-651w.webp 651w,
/assets/virtual-boy-avgn-976w.webp 976w,
/assets/virtual-boy-avgn-1464w.webp 1464w" sizes="(max-width: 75ch) 100vw, 75ch" />
<img class="aspect-ratio-box-inside" src="/assets/virtual-boy-avgn.jpg" alt="The nerd with his Virtual Boy" />
</picture>
</div>
</p>
<p>Here’s a non-exhaustive list of programs that are all meant to help create or manage
virtual environments in some way:</p>
<blockquote>
<p><a href="https://github.com/ofek/hatch">Hatch</a>,
<a href="https://pypi.python.org/pypi/VirtualEnvManager">VirtualEnvManager</a>,
<a href="https://github.com/kennethreitz/autoenv">autoenv</a>,
<a href="https://github.com/PyAr/fades">fades</a>,
<a href="https://gist.github.com/datagrok/2199506#a-better-activate-inve">inve</a>,
<a href="https://github.com/berdario/pew">pew</a>,
<a href="https://github.com/kennethreitz/pipenv">pipenv</a>,
<a href="https://github.com/pyenv/pyenv-virtualenv">pyenv-virtualenv</a>,
<a href="https://github.com/pyenv/pyenv-virtualenvwrapper">pyenv-virtualenvwrapper</a>,
<a href="https://github.com/pyenv/pyenv">pyenv</a>,
<a href="https://github.com/python/cpython/blob/3.6/Tools/scripts/pyvenv">pyvenv</a>,
<a href="https://github.com/kvbik/rvirtualenv">rvirtualenv</a>,
<a href="https://github.com/tox-dev/tox">tox</a>,
<a href="https://github.com/borntyping/v">v</a>,
<a href="https://docs.python.org/3/library/venv.html" title="The Python Standard Library: venv — Creation of virtual environments">venv</a>,
<a href="https://pypi.python.org/pypi/vex">vex</a>,
<a href="http://peak.telecommunity.com/DevCenter/EasyInstall#creating-a-virtual-python">virtual-python</a>,
<a href="https://github.com/brainsik/virtualenv-burrito">virtualenv-burrito</a>,
<a href="https://github.com/brbsix/virtualenv-mv">virtualenv-mv</a>,
<a href="https://github.com/pypa/virtualenv">virtualenv</a>,
<a href="https://pypi.python.org/pypi/virtualenvwrapper-win">virtualenvwrapper-win</a>,
<a href="https://pypi.python.org/pypi/virtualenvwrapper">virtualenvwrapper</a>,
<a href="https://pypi.python.org/pypi/workingenv.py">workingenv</a></p>
</blockquote>
<p>Clearly, this stuff must be really hard to get right. I also must be a moron, since,
after having written some thousand lines of Python, I don’t even know what problem we are
trying to solve here, and the abundance of relevant programs with subtly different names
has deterred me from reading up on it so far.</p>
<p>So what is a virtual environment? The <a href="https://docs.python.org/3/tutorial/venv.html" title="The Python Tutorial: Virtual Environments and Packages">official docs’ tutorial</a> describes
it as</p>
<blockquote>
<p>a self-contained directory tree that contains a Python installation for a particular
version of Python, plus a number of additional packages.</p>
</blockquote>
<p>A directory with a Python interpreter? Easy enough.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir virtual_env
$ cp /bin/python3 virtual_env/
</code></pre></div></div>
<p>Let’s see. Directory? Check. Contains a Python installation? Check. Contains a number
of additional packages? Zero is a number! (Check.) Particular version? Um…</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd virtual_env/
$ ./python3 --version
Python 3.6.3
</code></pre></div></div>
<p>I think that will do. Is it self-contained, though? It doesn’t contain itself…</p>
<div style="width: 33%; margin: auto">
<picture>
<source type="image/webp" srcset="/assets/russell-200w.webp 200w,
/assets/russell-300w.webp 300w,
/assets/russell-450w.webp 450w,
/assets/russell-675w.webp 675w" sizes="(max-width: 75ch) 33vw, 25ch" />
<img class="normal-img" style="width: 100%" src="/assets/russell.png" alt="Another nerd: Bertrand Russell in 1916" title="Consider the directory containing all directories that don't contain themselves." />
</picture>
</div>
<p>Jokes aside, there are only two things missing to actually make our directory a virtual
environment as specified by <a href="https://www.python.org/dev/peps/pep-0405/" title="PEP 405 -- Python Virtual Environments"><abbr title="Python Enhancement Proposal">PEP</abbr> 405</a>, the proposal that integrated a standard mechanism
for virtual environments with Python.<sup id="fnref:before-405" role="doc-noteref"><a href="#fn:before-405" class="footnote" rel="footnote">1</a></sup></p>
<ol>
<li>A file named <code class="language-plaintext highlighter-rouge">pyvenv.cfg</code> containing the line <code class="language-plaintext highlighter-rouge">home = /usr/bin</code></li>
<li>A <code class="language-plaintext highlighter-rouge">lib/python3.6/site-packages</code> subdirectory</li>
</ol>
<p>(Both paths depend on the <abbr title="Operating system">OS</abbr>, and the second one also on the Python version used.)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo 'home = /usr/bin' > pyvenv.cfg
$ mkdir -p lib/python3.6/site-packages
</code></pre></div></div>
<p>I will also move the Python binary into a <code class="language-plaintext highlighter-rouge">bin</code> subdirectory.<sup id="fnref:why-tho" role="doc-noteref"><a href="#fn:why-tho" class="footnote" rel="footnote">2</a></sup></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir bin && mv python3 bin/
</code></pre></div></div>
<!-- > [T]he internal virtual environment layout mimics the layout of the Python installation
> itself on each platform.
> ---<https://www.python.org/dev/peps/pep-0405/#creating-virtual-environments> -->
<p>Fair. We have a directory that formally qualifies as a virtual environment:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tree --noreport
.
├── bin
│ └── python3
├── lib
│ └── python3.6
│ └── site-packages
└── pyvenv.cfg
</code></pre></div></div>
<p>This leads us to the next question.</p>
<h2 id="whats-the-point">What’s the point?</h2>
<p>When we run our copy of the Python binary, the <code class="language-plaintext highlighter-rouge">pyvenv.cfg</code> file changes what happens
during startup: the presence of the <code class="language-plaintext highlighter-rouge">home</code> key tells Python the binary belongs to a
virtual environment, the key’s value (<code class="language-plaintext highlighter-rouge">/usr/bin</code>) tells it where to find a complete Python
installation that includes the standard library.</p>
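<p>One way to see that the <code class="language-plaintext highlighter-rouge">pyvenv.cfg</code> file is being picked up: per <abbr title="Python Enhancement Proposal">PEP</abbr> 405, an interpreter started from a virtual environment reports the environment’s directory as <code class="language-plaintext highlighter-rouge">sys.prefix</code>, while <code class="language-plaintext highlighter-rouge">sys.base_prefix</code> points at the base installation derived from the <code class="language-plaintext highlighter-rouge">home</code> key. (The first output path below is illustrative; it depends on where you created the directory.)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./bin/python3 -c 'import sys; print(sys.prefix)'
/home/user/virtual_env
$ ./bin/python3 -c 'import sys; print(sys.base_prefix)'
/usr
</code></pre></div></div>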
<p>The bottom line is that <code class="language-plaintext highlighter-rouge">./lib/python3.6/site-packages</code> becomes part of the <a href="https://docs.python.org/3/library/site.html">module search
path</a>. The point is that we can now install packages to that location, in particular,
specific versions that may conflict with the dependencies of another Python program on the
same system.<sup id="fnref:python-level-isolation" role="doc-noteref"><a href="#fn:python-level-isolation" class="footnote" rel="footnote">3</a></sup></p>
<p>For example, if your project needs exactly version 0.0.3 of
<a href="https://pypi.python.org/pypi/left-pad">left-pad</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pip3 install -t lib/python3.6/site-packages/ left-pad==0.0.3
</code></pre></div></div>
<p>Now this will work:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./bin/python3 -c 'import left_pad'
</code></pre></div></div>
<p>While this should raise <a href="https://docs.python.org/3/library/exceptions.html#ModuleNotFoundError"><code class="language-plaintext highlighter-rouge">ModuleNotFoundError</code></a>, as desired:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 -c 'import left_pad'
</code></pre></div></div>
<p>Another project on the same system could have a different version of left-pad in its own
virtual environment, without interfering with this one.</p>
<!--
TODO: talk about isolation from the system-level and user-level site-packages directories?
-->
<h2 id="the-standard-tool-for-creating-virtual-environments">The standard tool for creating virtual environments</h2>
<p>In practice, one does not simply create virtual environments by hand, which brings us back
to the dauntingly long list of tools above. Fortunately, one of them is not like the
others. While it’s predated by most of them, this one ships with Python as part of the
standard library: <a href="https://docs.python.org/3/library/venv.html" title="The Python Standard Library: venv — Creation of virtual environments"><em>venv</em></a>.<sup id="fnref:venv-and-pyvenv" role="doc-noteref"><a href="#fn:venv-and-pyvenv" class="footnote" rel="footnote">4</a></sup></p>
<p>In its simplest form, venv is used to create a virtual environment like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 -m venv virtual_env
</code></pre></div></div>
<p>This creates the <code class="language-plaintext highlighter-rouge">virtual_env</code> directory and also copies or symlinks the Python
interpreter:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd virtual_env
$ find -name python3
./bin/python3
</code></pre></div></div>
<p>It also copies a bunch of other stuff: I get 650 files in 89 subdirectories amounting to
about 10 MiB in total. One of those files is the <code class="language-plaintext highlighter-rouge">pip</code> binary, and we can use it to
install packages into the virtual environment without passing extra command-line
arguments:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./bin/pip install left-pad
</code></pre></div></div>
<p>You can read more about using venv and <em>optional</em> magic like “activate” scripts in the
<a href="https://docs.python.org/3/tutorial/venv.html">Python tutorial</a> or venv’s
<a href="https://docs.python.org/3/library/venv.html">documentation</a>—this post is only meant to
boil down what a virtual environment actually is.</p>
<h2 id="summary">Summary</h2>
<p>A virtual environment is a directory containing a Python interpreter, a special
<code class="language-plaintext highlighter-rouge">pyvenv.cfg</code> file that affects startup of the interpreter, and some third-party Python
packages. Python packages installed into a virtual environment will not interfere with
other Python applications on the same system. The “<a href="https://docs.python.org/3/installing/">standard tool for creating virtual
environments</a>” is venv.</p>
<h2 id="appendix-timeline">Appendix: timeline</h2>
<!-- TODO: When and by whom was the term "virtual environment" coined? -->
<p>I think Ian Bicking’s <a href="https://web.archive.org/web/20051203055434/http://svn.colorstudy.com/home/ianb/non_root_python.py"><code class="language-plaintext highlighter-rouge">non_root_python.py</code></a> qualifies as the first tool for creating
virtual environments. Based on that, <a href="http://peak.telecommunity.com/dist/virtual-python.py"><code class="language-plaintext highlighter-rouge">virtual-python.py</code></a> was
<a href="https://github.com/pypa/setuptools/commit/3df2aabcc056e6d001355d4cec780437387ac4fa">added</a> to <a href="https://en.wikipedia.org/wiki/Setuptools#EasyInstall">EasyInstall</a> in version
<a href="http://peak.telecommunity.com/DevCenter/EasyInstall#release-notes-change-history">0.6a6</a> in October 2005. Here’s a timeline summarizing some
main events.</p>
<dl>
<dt>2005-10-17</dt>
<dd><code class="language-plaintext highlighter-rouge">virtual-python.py</code> is added to EasyInstall.</dd>
<dt>2006-03-08</dt>
<dd>Ian Bicking publishes a blog post about improving <code class="language-plaintext highlighter-rouge">virtual-python.py</code> titled
“<a href="http://www.ianbicking.org/working-env-brainstorm.html">Working Environment Brainstorm</a>”.</dd>
<dt>2006-03-15</dt>
<dd>Ian Bicking <a href="http://www.ianbicking.org/working-env.html">announces</a> <a href="https://web.archive.org/web/20060425105635/http://svn.colorstudy.com/home/ianb/working-env.py"><code class="language-plaintext highlighter-rouge">working-env.py</code></a>.</dd>
<dt>2006-04-26</dt>
<dd>Ian Bicking <a href="http://www.ianbicking.org/workingenv-revisited.html">announces</a> an improved version of
<code class="language-plaintext highlighter-rouge">working-env.py</code> called <a href="https://web.archive.org/web/20060516191525/http://svn.colorstudy.com:80/home/ianb/workingenv">workingenv</a>.
<!--
TODO: did anything important happen here?
--></dd>
<dt>2007-09-14</dt>
<dd><a href="https://github.com/pypa/virtualenv">virtualenv</a>’s <a href="https://github.com/pypa/virtualenv/commit/e02aa46f4f0eb5321c31641e89bde2c9b92547bb">first commit</a></dd>
<dt>2007-10-10</dt>
<dd>Ian Bicking announces virtualenv: “<a href="http://www.ianbicking.org/blog/2007/10/workingenv-is-dead-long-live-virtualenv.html">Workingenv is dead, long live
Virtualenv!</a>”</dd>
<dt>2009-10-24</dt>
<dd><code class="language-plaintext highlighter-rouge">virtual-python.py</code> is <a href="https://github.com/pypa/setuptools/commit/43d34734c801d2d9a72d5fa6e7fc74d80bdc11c1">removed</a> from EasyInstall.
<!--
TODO: did anything important happen here?
--></dd>
<dt>2011-06-13</dt>
<dd><abbr title="Python Enhancement Proposal">PEP</abbr> 405 is created.</dd>
<dt>2012-05-25</dt>
<dd><abbr title="Python Enhancement Proposal">PEP</abbr> 405 is accepted for inclusion in Python 3.3.</dd>
<dt>2012-09-29</dt>
<dd><a href="https://docs.python.org/dev/whatsnew/3.3.html#pep-405-virtual-environments">Python 3.3</a> is released and venv and <a href="https://github.com/python/cpython/blob/3.6/Tools/scripts/pyvenv">pyvenv</a> become part of the
standard library.</dd>
<dt>2014-03-16</dt>
<dd><a href="https://docs.python.org/dev/whatsnew/3.4.html">Python 3.4</a> is released and venv “<a href="https://docs.python.org/3/installing/">defaults to installing <abbr title="pip installs packages">pip</abbr> into all created
virtual environments</a>” now.</dd>
<dt>2015-09-13</dt>
<dd><a href="https://docs.python.org/dev/whatsnew/3.5.html">Python 3.5</a> is released. “<a href="https://docs.python.org/3/installing/">The use of venv is now recommended for creating virtual
environments.</a>”</dd>
<dt>2016-12-23</dt>
<dd><a href="https://docs.python.org/dev/whatsnew/3.6.html#id8">Python 3.6</a> is released; “<a href="https://docs.python.org/3/installing/">pyvenv was the recommended tool for creating virtual
environments for Python 3.3 and 3.4, and is deprecated in Python 3.6.</a>”</dd>
</dl>
<h2 style="display: initial" id="notes">Notes</h2>
<ul>
<li>The “Virtual Boy” image is used with permission from
<a href="https://en.wikipedia.org/wiki/James_Rolfe">James Rolfe</a>.</li>
<li>If you found this article helpful or otherwise worthwhile and want to say thanks, one
way you can do so is by <a href="https://www.buymeacoffee.com/meribold">buying me a coffee</a>.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:before-405" role="doc-endnote">
<p>Before <abbr title="Python Enhancement Proposal">PEP</abbr> 405 was accepted, virtual environments were purely the domain of
third-party tools with no direct support from the language itself. <a href="#fnref:before-405" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:why-tho" role="doc-endnote">
<p>I think this <em>should</em> not be necessary. But, because of what I assume to be a
bug in CPython, it is. A <code class="language-plaintext highlighter-rouge">bin/</code> subdirectory certainly is the conventional location
for the binary, though. <a href="#fnref:why-tho" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:python-level-isolation" role="doc-endnote">
<p>Be aware that we only get <a href="https://web.archive.org/web/20191129151330/https://pythonrants.wordpress.com/2013/12/06/why-i-hate-virtualenv-and-pip/">Python-level isolation</a>. <a href="#fnref:python-level-isolation" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:venv-and-pyvenv" role="doc-endnote">
<p>pyvenv also ships with Python, but was deprecated in version 3.6.
Both venv and pyvenv were added to Python in version 3.3. <a href="#fnref:venv-and-pyvenv" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

A Survey of CPU Caches (2017-10-20, updated 2023-02-27)
https://meribold.org/2017/10/20/survey-of-cpu-caches

<p>CPU caches are the fastest and smallest components of a computer’s memory hierarchy except
for registers. They are part of the CPU and store a subset of the data present in main
memory (<abbr title="random-access memory">RAM</abbr>) that is expected to be needed soon. Their purpose is to reduce the
frequency of main memory access.</p>
<p>Why can’t we just have one uniform type of memory that’s both big and fast? Cost is one
reason, but more fundamentally, since no signal can propagate faster than the speed of
light, every possible storage technology can only reach a finite amount of data within a
desired access latency.</p>
<h2 id="cache-operation-overview">Cache operation overview</h2>
<p>Whenever a program requests a memory address, the CPU checks its caches. If the
location is present, a <em>cache hit</em> occurs. Otherwise, the result is a <em>cache miss</em>, and
the next level of the memory hierarchy, which could be another CPU cache, is accessed.</p>
<p>CPU caches are managed by the CPU directly. They are generally opaque to the operating
system and other software. That is, programmers have no direct control over the contents
of CPU caches. Unless explicitly prevented, the CPU brings all accessed data into cache.
This happens in response to cache misses and will, much more often than not, cause another
cache entry to be evicted and replaced.</p>
<h2 id="types-of-cpu-caches">Types of CPU caches</h2>
<p>Current x86 CPUs generally have three main types of caches: data caches, instruction
caches, and translation lookaside buffers (TLBs). Some caches are used
for data as well as instructions and are called <em>unified</em>. A processor
may have multiple caches of each type, which are organized into numerical <em>levels</em>
starting at 1, the smallest and fastest level, based on their size and speed.</p>
<p>In practice, a currently representative x86 cache hierarchy consists of:</p>
<ul>
<li>Separate level 1 data and instruction caches of 32 to 64 KiB for each core (denoted
L1d and L1i).</li>
<li>A <abbr title="Unified caches are used for data as well as instructions.">unified</abbr> L2 cache of 256 to 512 KiB for each core.</li>
<li>Often a <abbr title="Unified caches are used for data as well as instructions.">unified</abbr> L3 cache of 2 to 16 MiB shared between all cores.</li>
<li>One or more <abbr title="translation lookaside buffers">TLBs</abbr> per core. These cache virtual-to-physical address associations of
memory pages.<sup id="fnref:tangential" role="doc-noteref"><a href="#fn:tangential" class="footnote" rel="footnote">1</a></sup></li>
</ul>
<p>Here’s a table with approximate access latencies:</p>
<table>
<thead>
<tr>
<th> </th>
<th>L1d</th>
<th>L2</th>
<th>L3</th>
<th>Main Memory</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cycles</td>
<td>3–4</td>
<td>10–12</td>
<td>30–70</td>
<td>100–150</td>
</tr>
</tbody>
</table>
<!-- TODO: note about data cache being the biggest target for optimizations? -->
<p>My laptop’s AMD E-450 CPU has cores with an L1d cache of 32 KiB and a <abbr title="Unified caches are used for data as well as instructions.">unified</abbr> L2 cache of
512 KiB each:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lscpu | <span class="nb">grep</span> <span class="s1">'L1d\|L2'</span>
L1d cache: 32K
L2 cache: 512K
</code></pre></div></div>
<p>Let’s verify those sizes and measure the access latencies. The following C
program repeatedly reads elements from an array in random
order.<sup id="fnref:prefetching" role="doc-noteref"><a href="#fn:prefetching" class="footnote" rel="footnote">2</a></sup> To minimize the overhead of picking a random index, the array is
first set up as a circular, singly linked list where every element except the last points
to a random successor. When compiled with <code class="language-plaintext highlighter-rouge">-DBASELINE</code>, only this initialization is done.</p>
<div class="language-c wide-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define N 100000000 // 100 million
</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span> <span class="n">array</span><span class="p">[</span><span class="n">SIZE</span><span class="p">];</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">next</span> <span class="o">=</span> <span class="o">&</span><span class="n">array</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">array</span><span class="p">[</span><span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">].</span><span class="n">next</span> <span class="o">=</span> <span class="n">array</span><span class="p">;</span>
<span class="c1">// Fisher-Yates shuffle the array.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">size_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="p">(</span><span class="n">SIZE</span> <span class="o">-</span> <span class="n">i</span><span class="p">);</span> <span class="c1">// j is in [i, SIZE).</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="c1">// Swap array[i] and array[j].</span>
<span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="n">array</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#ifndef BASELINE
</span> <span class="kt">int64_t</span> <span class="n">dummy</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="o">*</span><span class="n">i</span> <span class="o">=</span> <span class="n">array</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="n">dummy</span> <span class="o">+=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">i</span><span class="p">;</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">i</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dummy</span><span class="p">);</span>
<span class="cp">#endif
</span><span class="p">}</span>
</code></pre></div></div>
<div class="language-c narrow-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define N 100000000 // 100 million
</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span> <span class="n">array</span><span class="p">[</span><span class="n">SIZE</span><span class="p">];</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">next</span> <span class="o">=</span> <span class="o">&</span><span class="n">array</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">array</span><span class="p">[</span><span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">].</span><span class="n">next</span> <span class="o">=</span> <span class="n">array</span><span class="p">;</span>
<span class="c1">// Fisher-Yates shuffle the array.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// j is in [i, SIZE).</span>
<span class="kt">size_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="p">(</span><span class="n">SIZE</span> <span class="o">-</span> <span class="n">i</span><span class="p">);</span>
<span class="c1">// Swap array[i] and array[j].</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="n">array</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#ifndef BASELINE
</span> <span class="kt">int64_t</span> <span class="n">dummy</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">elem</span> <span class="o">*</span><span class="n">i</span> <span class="o">=</span> <span class="n">array</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="n">dummy</span> <span class="o">+=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">i</span><span class="p">;</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">i</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dummy</span><span class="p">);</span>
<span class="cp">#endif
</span><span class="p">}</span>
</code></pre></div></div>
<p>The difference in CPU cycles used by this program when compiled with and without
<code class="language-plaintext highlighter-rouge">-DBASELINE</code> is the number of cycles that <code class="language-plaintext highlighter-rouge">N</code> memory accesses take. Dividing by <code class="language-plaintext highlighter-rouge">N</code>
yields the number of cycles one access takes on average.</p>
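<p>As a sketch of that bookkeeping (the file name and all cycle counts below are made up,
not my actual measurements), one could take the <code>cycles</code> totals reported by
<code>perf stat</code> for both builds and subtract:</p>

```shell
# Build both variants and measure them, e.g.:
#   gcc -O1 -DSIZE=16777216 access-times.c -o full
#   gcc -O1 -DSIZE=16777216 -DBASELINE access-times.c -o base
#   perf stat -e cycles ./full   # say it reports 20,450,000,000 cycles
#   perf stat -e cycles ./base   # say it reports    450,000,000 cycles
FULL=20450000000   # made-up total for the full build
BASE=450000000     # made-up total for the baseline build
N=100000000        # iterations of the access loop
echo $(( (FULL - BASE) / N ))   # average cycles per access
```

<p>With these made-up totals, the result is 200 cycles per access, in the ballpark of the
main-memory numbers in the table above.</p>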
<p>Here are my results for different array sizes (set at compile time with the <code class="language-plaintext highlighter-rouge">SIZE</code> macro):</p>
<div class="chart-wrapper">
<img class="normal-img chart" src="/assets/cache-paper/access-time-plot.svg" alt="Plot of the average number of CPU cycles one access takes vs. the array size; the differences are due to how much of the array fits into which CPU cache" title="There is a table with the exact numerical results further down." />
</div>
<p>Up to 32 KiB, each access takes almost exactly 3 cycles. This is the L1d access time. At
32 KiB (the size of the L1d) the time increases to about 3.5 cycles. This is not
surprising since the cache is shared with other processes and the operating system, so
some of our data gets evicted. The first dramatic increase happens at 64 KiB followed by
smaller increases at 128 and 256 KiB. I suspect we are seeing a mixture of L2 and L1d
accesses, with fewer and fewer L1d hits and an L2 access time of around 25 cycles.</p>
<p>The values from 512 KiB (the size of the L2) to 128 MiB exhibit a similar pattern. As
more and more accesses go to main memory, the average delay for one access approaches 200
cycles.</p>
<table class="funny-table">
<thead>
<tr>
<th>Array Size (KiB)</th>
<th>Cycles / Iteration</th>
<th>Array Size (KiB)</th>
<th>Cycles / Iteration</th>
</tr>
</thead>
<tbody>
<tr>
<td> 1</td>
<td> 3.01</td>
<td> 512</td>
<td> 27.23</td>
</tr>
<tr>
<td> 2</td>
<td> 3.01</td>
<td> 1024</td>
<td>117.28</td>
</tr>
<tr>
<td> 4</td>
<td> 3.01</td>
<td> 2048</td>
<td>157.85</td>
</tr>
<tr>
<td> 8</td>
<td> 3.01</td>
<td> 4096</td>
<td>174.74</td>
</tr>
<tr>
<td> 16</td>
<td> 3.01</td>
<td> 8192</td>
<td>183.54</td>
</tr>
<tr>
<td> 32</td>
<td> 3.46</td>
<td> 16384</td>
<td>188.00</td>
</tr>
<tr>
<td> 64</td>
<td>15.34</td>
<td> 32768</td>
<td>191.39</td>
</tr>
<tr>
<td>128</td>
<td>18.85</td>
<td> 65536</td>
<td>193.95</td>
</tr>
<tr>
<td>256</td>
<td>24.73</td>
<td>131072</td>
<td>194.83</td>
</tr>
</tbody>
</table>
<h2 id="cache-lines">Cache lines</h2>
<p><em>Cache lines</em> or <em>cache blocks</em> are the unit of data transfer between main memory and
cache. They have a fixed size which is typically 64 bytes on x86/x64 CPUs—this means
accessing a single, uncached 4-byte integer entails loading another 60 adjacent bytes.</p>
<p>My E-450 CPU is no exception and both of its data caches have 64-byte cache lines:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ getconf LEVEL1_DCACHE_LINESIZE
64
$ getconf LEVEL2_CACHE_LINESIZE
64
</code></pre></div></div>
<p>We can verify this quite easily. The following program loops over an array with an
increment given at compile time as <code class="language-plaintext highlighter-rouge">STEP</code> and measures the processor time.</p>
<div class="language-c wide-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SIZE 67108864 // 64 * 1024 * 1024
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">int64_t</span><span class="o">*</span> <span class="n">array</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="o">*</span><span class="p">)</span><span class="n">calloc</span><span class="p">(</span><span class="n">SIZE</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int64_t</span><span class="p">));</span> <span class="c1">// 512 MiB</span>
<span class="kt">clock_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="n">STEP</span><span class="p">)</span> <span class="p">{</span>
<span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// Do something (anything).</span>
<span class="p">}</span>
<span class="kt">clock_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%d %f</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">STEP</span><span class="p">,</span> <span class="mi">1000</span><span class="p">.</span> <span class="o">*</span> <span class="p">(</span><span class="n">t1</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span> <span class="o">/</span> <span class="n">CLOCKS_PER_SEC</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-c narrow-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 64 * 1024 * 1024</span>
<span class="cp">#define SIZE 67108864
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// 512 MiB</span>
<span class="kt">int64_t</span><span class="o">*</span> <span class="n">array</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="o">*</span><span class="p">)</span><span class="n">calloc</span><span class="p">(</span>
<span class="n">SIZE</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int64_t</span><span class="p">));</span>
<span class="kt">clock_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SIZE</span><span class="p">;</span>
<span class="n">i</span> <span class="o">+=</span> <span class="n">STEP</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Do something (anything).</span>
<span class="n">array</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">clock_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%d %f</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">STEP</span><span class="p">,</span>
<span class="mi">1000</span><span class="p">.</span> <span class="o">*</span> <span class="p">(</span><span class="n">t1</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span> <span class="o">/</span>
<span class="n">CLOCKS_PER_SEC</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>These are my results for different values of <code class="language-plaintext highlighter-rouge">STEP</code>:</p>
<div class="chart-wrapper">
<img class="normal-img chart" src="/assets/cache-paper/line-size-plot.svg" alt="Plot of the CPU time used to run the program vs. the step size; the CPU time stays nearly constant for step sizes of 1, 2, 4, and 8" title="The CPU time is nearly constant for the first 4 step sizes." />
</div>
<p>As expected, the time roughly halves whenever the step size is doubled—but only from a
step size of 16. For the first 4 step sizes, it is almost constant.</p>
<p>This is because the run times are primarily due to memory access. Up to a step size of
8, every 64-byte line has to be loaded. At 16, the values we modify are 128 bytes
apart,<sup id="fnref:128-bytes" role="doc-noteref"><a href="#fn:128-bytes" class="footnote" rel="footnote">3</a></sup> so every other cache line is skipped. At 32, three out of four cache
lines are skipped, and so on.</p>
<p>Both cache and main memory can be thought of as being
<a href="https://en.wikipedia.org/wiki/Partition_of_a_set">partitioned</a> into cache lines. Data is
not
read or written starting from arbitrary main memory addresses, but only from addresses
that are multiples of the cache line size.</p>
<h2 id="prefetching">Prefetching</h2>
<p>Consider a simplified version of the C program accessing elements of an array at
random that just walks over the array sequentially. It still follows the
pointers to do this, but the array is no longer shuffled. These are my results of
profiling this new program as before:</p>
<div class="chart-wrapper">
<img class="normal-img chart" src="/assets/cache-paper/seq-access-time-plot.svg" alt="Plot of the average number of CPU cycles one access takes vs. the array size when the array is not shuffled" title="A table with the numerical results is further down again." />
</div>
<p>Until the working set size matches that of the L1d, the access times are virtually
unchanged at 3 cycles, but exceeding the L1d and hitting the L2 increases this by no more
than a single cycle. More strikingly, exceeding the L2 has similarly limited effect: the
access time plateaus not much above 6 cycles—about 3% of the maximum we saw for random
reads.</p>
<!-- Here's a table with the numerical results again: -->
<table class="funny-table">
<thead>
<tr>
<th>Array Size (KiB)</th>
<th>Cycles / Iteration</th>
<th>Array Size (KiB)</th>
<th>Cycles / Iteration</th>
</tr>
</thead>
<tbody>
<tr>
<td> 1</td>
<td>3.01</td>
<td> 512</td>
<td>5.15</td>
</tr>
<tr>
<td> 2</td>
<td>3.01</td>
<td> 1024</td>
<td>6.17</td>
</tr>
<tr>
<td> 4</td>
<td>3.01</td>
<td> 2048</td>
<td>6.20</td>
</tr>
<tr>
<td> 8</td>
<td>3.01</td>
<td> 4096</td>
<td>6.16</td>
</tr>
<tr>
<td> 16</td>
<td>3.01</td>
<td> 8192</td>
<td>6.14</td>
</tr>
<tr>
<td> 32</td>
<td>3.05</td>
<td> 16384</td>
<td>6.16</td>
</tr>
<tr>
<td> 64</td>
<td>3.99</td>
<td> 32768</td>
<td>6.13</td>
</tr>
<tr>
<td>128</td>
<td>3.98</td>
<td> 65536</td>
<td>6.13</td>
</tr>
<tr>
<td>256</td>
<td>3.94</td>
<td>131072</td>
<td>6.14</td>
</tr>
</tbody>
</table>
<p>Much of the improved performance can be explained by the more efficient use of cache lines:
the penalty of loading a cache line is distributed among 8 accesses now. This could at
best get us down to 12.5%. The remaining improvement is due to <em>prefetching</em>.</p>
<p>Prefetching is a technique by which CPUs predict access patterns and preemptively push
cache lines up the memory hierarchy before the program needs them. This cannot work
unless cache line access is predictable, though, which basically means
linear.<sup id="fnref:stride-example" role="doc-noteref"><a href="#fn:stride-example" class="footnote" rel="footnote">4</a></sup></p>
<p>Prefetching happens asynchronously to normal program execution and can therefore almost
completely hide the main memory latency. This is not quite what we observed because our
loop does so little work per element that memory bandwidth becomes the bottleneck.
<a href="https://github.com/meribold/cache-seminar-paper/blob/a32597fbb2c37c52d54a9b87194cc17760ffbc11/seq-access-times/access-times.c#L27-L29">Adding</a> some expensive operations like integer divisions
every loop iteration changes that and effectively levels the cycles spent per iteration
across all working set sizes:</p>
<div class="chart-wrapper">
<img class="normal-img chart" src="/assets/cache-paper/cpu-bound-seq-access-time-plot.svg" alt="Plot of the average number of CPU cycles one access takes vs. the array size when the array is not shuffled and the CPU performs some work for every accessed element" />
</div>
<!-- TODO: add captions to the images? `kramdown` doesn't support this directly, but something
like the following may work.
<figure style="width: 110%">
<figcaption style="text-align: center">
TODO
</figcaption>
<img src="/assets/cache-paper/cpu-bound-seq-access-time-plot.svg"
style="width: 100%"/>
</figure> -->
<p>What I described in this section is <em>hardware prefetching</em>. It uses dedicated silicon to
automatically detect access patterns. There is also <em>software prefetching</em>, which is
triggered by special machine instructions that may be inserted by the compiler or manually
by the programmer.<sup id="fnref:drepper" role="doc-noteref"><a href="#fn:drepper" class="footnote" rel="footnote">5</a></sup></p>
<h2 id="locality-of-reference">Locality of reference</h2>
<p>Two properties exhibited by computer code to varying degrees distinctly impact cache
effectiveness. One is <em>temporal locality</em>. The other is <em>spatial locality</em>. Both are
measures of how well the code’s memory access pattern matches certain principles.</p>
<h3 id="temporal-locality">Temporal locality</h3>
<p>One access suggests another. That is, memory locations that have been referenced tend to
be used again within a short time frame. This is really the intrinsic motivation for having a
memory hierarchy in the first place. When a cache line is loaded but not accessed again
before being evicted, the cache provided no benefit.</p>
<h3 id="spatial-locality">Spatial locality</h3>
<p><strong>1.</strong> For each accessed memory location, nearby locations are used as well within a short
time frame. <strong>2.</strong> Memory is accessed sequentially.</p>
<p>We have already seen that caches take advantage of both these principles by design:</p>
<ol style="font-weight: bold">
<li><span style="font-weight: normal">Data is loaded in blocks; subsequent accesses to locations in an already-loaded
cache line are basically free.</span></li>
<li><span style="font-weight: normal">Cache lines from sequential access patterns are prefetched ahead of
time.</span></li>
</ol>
<h3 id="notes">Notes</h3>
<p>Access to instructions inherently has good spatial locality since they are executed
sequentially outside of jumps, and good temporal locality because of loops and function
calls. Programs with good locality are called <em>cache-friendly</em>.</p>
<h2 id="example-stdvector-vs-stdlist">Example: <code class="language-plaintext highlighter-rouge">std::vector</code> vs. <code class="language-plaintext highlighter-rouge">std::list</code></h2>
<p>The following C++ program<sup id="fnref:big-os" role="doc-noteref"><a href="#fn:big-os" class="footnote" rel="footnote">6</a></sup> initializes a number of <abbr title="Standard Template Library">STL</abbr> containers with random
numbers and measures the
processor time needed to sum all of them. I first ran it with <code class="language-plaintext highlighter-rouge">Container</code> being a type
alias for <code class="language-plaintext highlighter-rouge">std::list</code>, then for <code class="language-plaintext highlighter-rouge">std::vector</code>. Either way, the asymptotic
complexity is Θ(N).</p>
<div class="language-cpp wide-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">constexpr</span> <span class="kt">int</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">5000</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">Container</span> <span class="n">containers</span><span class="p">[</span><span class="n">N</span><span class="p">];</span>
<span class="n">std</span><span class="o">::</span><span class="n">srand</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">time</span><span class="p">(</span><span class="nb">nullptr</span><span class="p">));</span>
<span class="c1">// Append an average of 5000 random values to each container.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span> <span class="o">*</span> <span class="mi">5000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">containers</span><span class="p">[</span><span class="n">std</span><span class="o">::</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="n">N</span><span class="p">].</span><span class="n">push_back</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">rand</span><span class="p">());</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="kt">clock_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">clock</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">m</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">m</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">m</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">num</span> <span class="o">:</span> <span class="n">containers</span><span class="p">[</span><span class="n">m</span><span class="p">])</span> <span class="p">{</span>
<span class="n">sum</span> <span class="o">+=</span> <span class="n">num</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="kt">clock_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">clock</span><span class="p">();</span>
<span class="c1">// Also print the sum so the loop doesn't get optimized out.</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">sum</span> <span class="o"><<</span> <span class="sc">'\n'</span> <span class="o"><<</span> <span class="p">(</span><span class="n">t1</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-cpp narrow-listing highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">constexpr</span> <span class="kt">int</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">5000</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">Container</span> <span class="n">containers</span><span class="p">[</span><span class="n">N</span><span class="p">];</span>
<span class="n">std</span><span class="o">::</span><span class="n">srand</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">time</span><span class="p">(</span><span class="nb">nullptr</span><span class="p">));</span>
<span class="c1">// Append an average of 5000 random</span>
<span class="c1">// values to each container.</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span> <span class="o">*</span> <span class="mi">5000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">containers</span><span class="p">[</span><span class="n">std</span><span class="o">::</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="n">N</span><span class="p">]</span>
<span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">rand</span><span class="p">());</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="kt">clock_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">clock</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">m</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">m</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">m</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">num</span> <span class="o">:</span> <span class="n">containers</span><span class="p">[</span><span class="n">m</span><span class="p">])</span> <span class="p">{</span>
<span class="n">sum</span> <span class="o">+=</span> <span class="n">num</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="kt">clock_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">clock</span><span class="p">();</span>
<span class="c1">// Also print the sum so the loop</span>
<span class="c1">// doesn't get optimized out.</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">sum</span> <span class="o"><<</span> <span class="sc">'\n'</span>
<span class="o"><<</span> <span class="p">(</span><span class="n">t1</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In my benchmark, computing the sum completes 158 times faster with
<code class="language-plaintext highlighter-rouge">std::vector</code>.<sup id="fnref:flags" role="doc-noteref"><a href="#fn:flags" class="footnote" rel="footnote">7</a></sup> Some of this difference can be attributed to the space overhead of the
linked list and the added indirection, but the more cache-friendly memory access pattern
of <code class="language-plaintext highlighter-rouge">std::vector</code> is key: using <code class="language-plaintext highlighter-rouge">std::list</code> as in this example means essentially random memory access.</p>
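<p>As a minimal sketch of that contrast (not the paper's benchmark; the function names here are my own), the comparison boils down to summing the same values through a contiguous <code class="language-plaintext highlighter-rouge">std::vector</code> versus a node-based <code class="language-plaintext highlighter-rouge">std::list</code>:</p>

```cpp
#include <cstdint>
#include <list>
#include <numeric>
#include <vector>

// Both functions perform identical arithmetic; only the memory access
// pattern differs. The vector's elements are contiguous, so iteration is
// sequential and prefetcher-friendly; the list's nodes are separate heap
// allocations, so iteration chases pointers scattered across memory.
std::int64_t sum_vector(const std::vector<std::int64_t>& v) {
    return std::accumulate(v.begin(), v.end(), std::int64_t{0});
}

std::int64_t sum_list(const std::list<std::int64_t>& l) {
    return std::accumulate(l.begin(), l.end(), std::int64_t{0});
}
```

<p>Timing these two functions on a few million elements (e.g. with <code class="language-plaintext highlighter-rouge">std::clock</code> as in the listing above) should show a gap of the same flavor, though the exact factor depends on the compiler, flags, and hardware.</p>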
<h3 id="note-true-oo-style">Note: “true” <abbr title="object-oriented">OO</abbr> style</h3>
<p>In <abbr title="object-oriented programming">OOP</abbr>, objects are typically accessed through pointers to a common base class. A
polymorphic container of such pointers allows for dynamic dispatch of virtual functions.
However, it also risks degrading the performance of iterating a sequential data structure
to that of a linked list.</p>
<p>
<picture>
<source srcset="/assets/cache-paper/oo-picture.webp" type="image/webp" />
<img src="/assets/cache-paper/oo-picture.png" alt="Graphic of a contiguous array of pointers with pointees that may be scattered pretty randomly throughout memory" title="The numbered boxes represent pointers that are laid out contiguously in memory. The unlabeled boxes represent the corresponding pointees, which may be scattered across memory pretty randomly." />
</picture>
</p>
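<p>A small, hedged sketch of this layout (the <code class="language-plaintext highlighter-rouge">Shape</code> hierarchy is hypothetical, not from the paper): the pointers themselves sit contiguously in the vector, but every pointee is its own heap allocation, so iterating can hop around memory much like traversing a linked list.</p>

```cpp
#include <memory>
#include <vector>

// A classic polymorphic container: contiguous pointers, scattered pointees.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) {
        // Each iteration dereferences a pointer whose target may be anywhere
        // on the heap, plus a vtable lookup for the virtual call.
        sum += s->area();
    }
    return sum;
}
```

<p>If the objects are all the same concrete type, storing them by value (e.g. <code class="language-plaintext highlighter-rouge">std::vector&lt;Square&gt;</code>) restores the sequential access pattern at the cost of giving up runtime polymorphism.</p>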
<h2 id="conclusion">Conclusion</h2>
<p>The hidden constant separating the time complexities of two reasonable algorithms under
asymptotic analysis can be quite large because of cache effects. Understanding how CPU
caches work helps us make good choices when writing fast programs, and I hope this article
provided some insight. For a more in-depth discussion, you can read Ulrich Drepper’s
paper <a href="https://www.akkadia.org/drepper/cpumemory.pdf"><em>What Every Programmer Should Know About
Memory</em></a>, which also covers virtual memory,
cache associativity, write policies, replacement policies, cache coherence, software
prefetching, instruction caches, <abbr title="translation lookaside buffers">TLBs</abbr>, and more.</p>
<h2 style="display: initial" id="notes-1">Notes</h2>
<ul>
<li>This article is based on <a href="/assets/cache-paper.pdf">a seminar paper</a> in which you can find some more
details and a list of sources. The TeX files, full source code of all utilized
microbenchmarks, and <a href="https://github.com/meribold/cache-seminar-paper/blob/master/makefile">a makefile</a> that automates running them and builds the
PDF are all <a href="https://github.com/meribold/cache-seminar-paper">available on GitHub</a>.</li>
<li>If you found this article helpful or otherwise worthwhile and want to say thanks, one
way you can do so is by <a href="https://www.buymeacoffee.com/meribold">buying me a coffee</a>.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:tangential" role="doc-endnote">
<p>You don’t need to know what that means to understand the rest of this
article. <a href="#fnref:tangential" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:prefetching" role="doc-endnote">
<p>We access random elements because CPUs detect and optimize sequential
access using a technique called <em>prefetching</em>, which would prevent us from
determining access times. More on that later. <a href="#fnref:prefetching" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:128-bytes" role="doc-endnote">
<p>16 <code class="language-plaintext highlighter-rouge">int64_t</code> values of 8 bytes each <a href="#fnref:128-bytes" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:stride-example" role="doc-endnote">
<p>For example, the most complicated stride pattern my laptop’s CPU can detect is
one that skips over at most 3 cache lines (forward or backward) and may alternate
strides (e.g. +1, +2, +1, +2, …). <a href="#fnref:stride-example" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:drepper" role="doc-endnote">
<p>Software prefetching is discussed by Ulrich Drepper in his paper
<a href="https://www.akkadia.org/drepper/cpumemory.pdf"><em>What Every Programmer Should Know About Memory</em></a>.
Drepper also goes into more detail on practically everything touched on in this article. <a href="#fnref:drepper" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:big-os" role="doc-endnote">
<p>Adapted from an article by Sergey
Ignatchenko published in <a href="https://accu.org/journals/overload/24/134/overload134.pdf#page=6">issue 134 of the <em>Overload</em>
magazine</a>. <a href="#fnref:big-os" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:flags" role="doc-endnote">
<p>I used <abbr title="GNU Compiler Collection">GCC</abbr> 6.3.1 with <code class="language-plaintext highlighter-rouge">-O3</code> and <code class="language-plaintext highlighter-rouge">-march=native</code>. <a href="#fnref:flags" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Lukas WaymannCPU caches are the fastest and smallest components of a computer’s memory hierarchy except for registers. They are part of the CPU and store a subset of the data present in main memory (RAM) that is expected to be needed soon. Their purpose is to reduce the frequency of main memory access.Primal UI2015-10-23T00:00:00+02:002021-01-01T16:59:56+01:00https://meribold.org/video/2015/10/23/primal-ui<iframe style="position: absolute; top: 0; left: 0; border: 0; width: 100%; height: 100%" src="https://www.youtube.com/embed/qVEXJF1SYD4?rel=0" allowfullscreen="">
</iframe>Lukas Waymann