Diving from CUDA Error 804 into a bug of libnvidia-container

Several users reported encountering "Error 804: forward compatibility was attempted on non supported HW" when using certain customized PyTorch docker images on our GPU cluster.

At first glance I assumed the culprit was a version mismatch between the driver installed on the host and the driver required by the image. The broken images, as they described, were built targeting CUDA == 11.3 with a corresponding driver version == 465, while some of our hosts ship with driver version 460. As a solution I told them to downgrade the target CUDA version by choosing a base image such as nvidia/cuda:11.2.0-devel-ubuntu18.04, which did indeed solve the problem.

But later I came to doubt that this hypothesis was the real cause. A counterexample I observed was that another line of docker images targeting an even higher CUDA version ran normally on those hosts, for example the latest ghcr.io/pytorch/pytorch:2.0.0-devel built for CUDA == 11.7. This would not be the case if the CUDA version mismatch truly mattered.

Afterwards I did some research on the problem and learned some interesting things, which this post is going to share. In short, the recently released minor version compatibility allows applications built for a newer CUDA to run on machines with certain older drivers, but libnvidia-container doesn't handle it correctly due to a bug, which eventually leads to this error.

For a thorough understanding, this post will first introduce the components of CUDA, then the compatibility policies among those components, and finally unravel the bug and devise a workaround for it. But before diving deep, I'll give two Dockerfile samples to illustrate the problem.

Reproduction Samples

The host reported as problematic has 8x GeForce RTX 3090 with driver version 460.67 and CUDA 11.2. Here is an image with torch == 1.12.1 built for CUDA 11.3 that fails on the host:

# Dockerfile_bad
FROM nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04
RUN apt update -y && apt install -y python3 python3-pip
RUN pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
ENTRYPOINT ["python3", "-c", "import torch; print(torch.rand(2, 3).cuda())"]

By contrast below is an image with torch == 2.0.0 built for CUDA 11.7 and runs normally:

# Dockerfile_good
FROM ghcr.io/pytorch/pytorch:2.0.0-devel
ENTRYPOINT ["python", "-c", "import torch; print(torch.rand(2, 3).cuda())"]

For convenience I also wrote a Makefile combining the build and run steps for each image:

good:
	docker build -t good -< Dockerfile_good
	docker run --gpus='"device=0"' --rm -it good

bad:
	docker build -t bad -< Dockerfile_bad
	docker run --gpus='"device=0"' --rm -it bad

With the Makefile you can run make good or make bad to see respective results:

$ make good
tensor([[0.1245, 0.2403, 0.9967],
[0.5950, 0.1597, 0.1985]], device='cuda:0')
$ make bad
<string>:1: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:68.)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 217, in _lazy_init
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
make: *** [bad] Error 1

Let's start our tour with the components of CUDA.

Components of CUDA

When people talk about "CUDA", two concepts, the "CUDA Toolkit" and the "NVIDIA display driver", are often mixed up. The figure below illustrates their distinction as well as the cascading relationship:

[Figure: Components of CUDA]

The driver at the low level bridges the communication between software and the underlying NVIDIA hardware. The toolkit lies at a higher level and provides conveniences for GPU programming.

If we take a closer look at the driver, we see it decomposes into two secondary components: the user-mode driver, or UMD (libcuda.so), and the kernel-mode driver, or KMD (nvidia.ko). The KMD runs in the OS kernel and has the most intimate contact with the hardware, while the UMD provides an API abstraction for communicating with the kernel driver.

Generally, applications compiled with the CUDA toolkit dynamically locate and link against libcuda.so at startup, which under the hood dispatches user requests to the kernel, as illustrated below:

[Diagram: app binaries -> (link against) libcuda.so [user-mode] -> (talk with) nvidia.ko [kernel-mode]]

So far so good, as long as the toolkit's compiler agrees on the API with the target driver.

Sadly, that is not the norm. In the real world, developers compile programs on one machine and distribute them to run on others, expecting that programs compiled with a specific version of the CUDA toolkit will run on a wide variety of hardware; otherwise users would complain about broken binaries.

To provide this guarantee, several compatibility policies were introduced.

CUDA Compatibility Policies

Before we introduce the policies, we should know about how the components are versioned. The CUDA toolkit and the drivers adopt different version schemes, with the toolkit versioned like 11.2 and drivers like 460.65. Therefore, “driver 460.65” refers to the version of libcuda.so and nvidia.ko; similarly, when somebody says “CUDA 11.2”, it’s the toolkit version being mentioned.

NVIDIA has devised multiple rules to ensure user binaries work on a wide range of driver-hardware combinations, which can be grouped into two categories: toolkit-driver compatibility and UMD-KMD compatibility.

Toolkit-driver compatibility

These policies determine which driver versions can run binaries compiled by a specific CUDA toolkit.

Basically we have "Backward Compatibility". Each CUDA toolkit has a so-called toolkit driver version. Binaries compiled by that toolkit are guaranteed to run on drivers no older than the toolkit driver version. For example, the toolkit driver version of CUDA 11.2 is 460.27.03, which means binaries compiled by CUDA 11.2 should work on any driver >= 460.27.03. This is the most fundamental and long-standing policy.

From CUDA 11 onwards, another policy named "Minor Version Compatibility" was introduced. This policy allows binaries compiled by toolkits with the same major version to share the same driver version requirement. For example, binaries compiled by CUDA 11.0 work on driver >= 450.36.06. Since CUDA 11.2 has the same major version as CUDA 11.0, binaries compiled by CUDA 11.2 can also work on driver >= 450.36.06.

The backward compatibility ensures compiled binaries will work on machines shipped with drivers of a future version, while the minor version compatibility reduces the necessity of upgrading drivers to run newly compiled binaries. Generally, a binary compiled by CUDA toolkit $X.Y$ should work with a driver of version $M$ if either of the following is satisfied:

  1. CUDA toolkit $X.Y$ has toolkit driver version $N$ and $M \geq N$;
  2. $X \geq 11$ and a CUDA toolkit $X.Y_2$ has toolkit driver version $N_2$ and $M \geq N_2$.

However, the above policies only consider the relationship between the CUDA toolkit and the drivers. What if the user-mode and kernel-mode drivers have diverging versions? This is where UMD-KMD compatibility applies.

UMD-KMD compatibility

In the ideal case, the kernel-mode driver should always work with a user-mode driver of the same version. But upgrading the kernel-mode driver is sometimes tricky and troublesome, a risk some users, such as data center admins, cannot take. With this in mind, NVIDIA devised "Forward Compatibility" to allow an older KMD to cooperate with a newer UMD under some circumstances.

Specifically, a kernel-mode driver supports all user-mode drivers released during its lifetime. For instance, driver 418.x reaches end of life (EOL) in March 2022, and driver 460.x was released before that date, so KMD 418.x works with UMD 460.x. This compatibility does not involve anything at a higher level, such as the CUDA toolkit.

It's worth noting that this policy does not apply to all GPUs but only a fraction of them. NVIDIA limits forward compatibility to systems with NVIDIA Data Center GPUs (the Tesla branch) or NGC Server Ready SKUs of RTX cards. If you own a GeForce RTX 3090, as in my scenario, you don't enjoy this.

Summary of Compatibility

Let's make a quick review of the various compatibility policies. If you have a binary compiled by CUDA $X.Y$, and a host with UMD (libcuda.so) version $M$ and KMD (nvidia.ko) version $M'$, they will work together if both of the following conditions hold:

  1. The UMD and KMD are compatible. Specifically, either
    1. the GPU supports forward compatibility (Tesla branch or NGC ready), and driver $M$ was released before the EOL of driver $M'$ (the forward compatibility); or
    2. $M = M'$.
  2. The CUDA toolkit and UMD are compatible. Specifically, either
    1. CUDA toolkit $X.Y$ has toolkit driver version $N$ and $M \geq N$ (the backward compatibility); or
    2. major version $X \geq 11$ and there exists another toolkit $X.Y_2$ with toolkit driver version $N_2$ and $M \geq N_2$ (the minor version compatibility).

Generally, validating the above conditions should help whenever you run into any compatibility problem.
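The summary above can be sketched as a small checker. The Python below is purely illustrative: the function names are my own, and the toolkit-driver table contains only the two entries mentioned in this post.

```python
# Illustrative checker for the compatibility summary above (not an official tool).
def ver(s: str) -> tuple:
    """Parse a version string like '460.27.03' into a comparable tuple (460, 27, 3)."""
    return tuple(int(x) for x in s.split("."))

# Hypothetical table: toolkit version -> its "toolkit driver version".
TOOLKIT_DRIVER_VERSION = {
    "11.0": "450.36.06",
    "11.2": "460.27.03",
}

def umd_kmd_ok(umd: str, kmd: str,
               forward_compat_gpu: bool, umd_before_kmd_eol: bool) -> bool:
    # Condition 1: same version, or forward compatibility on a supported GPU.
    return ver(umd) == ver(kmd) or (forward_compat_gpu and umd_before_kmd_eol)

def toolkit_umd_ok(toolkit: str, umd: str) -> bool:
    # Condition 2.1: backward compatibility against this toolkit's own driver version.
    if ver(umd) >= ver(TOOLKIT_DRIVER_VERSION[toolkit]):
        return True
    # Condition 2.2: minor version compatibility -- any toolkit sharing the same
    # major version (>= 11) whose toolkit driver version is satisfied.
    major = toolkit.split(".")[0]
    if int(major) >= 11:
        return any(ver(umd) >= ver(n)
                   for t, n in TOOLKIT_DRIVER_VERSION.items()
                   if t.split(".")[0] == major)
    return False

# A binary built with CUDA 11.2 on driver 450.80.02 fails rule 2.1
# (450.80.02 < 460.27.03) but is rescued by rule 2.2 via CUDA 11.0.
print(toolkit_umd_ok("11.2", "450.80.02"))  # True
```

Running the checker against the failing host (UMD 465.19.01 vs KMD 460.67, no forward compatibility) makes condition 1 fail, which is exactly the situation analyzed below.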

Back to Our Problem

So, what's wrong with the docker image bad? With the above rules in hand we can perform a simple analysis.

Could it be a toolkit-driver incompatibility? Probably NO. According to Table 1 here, minor version compatibility applies with CUDA 11.x and driver >= 450.80.02, which our driver version 460 satisfies; besides, a binary compiled by CUDA 11.7 works like a charm in the case of docker image good.

It should therefore be a KMD-UMD incompatibility, namely, the versions of libcuda.so and nvidia.ko are incompatible. Since forward compatibility is not applicable to an RTX 3090, we expect condition 1.2 to hold, i.e., libcuda.so and nvidia.ko should have the same version, which obviously was not the case.

How does the NVIDIA driver work with Docker?

A process in a container is technically a special process on the host, and it interacts with the GPU driver through the same model as any other process. Since the KMD runs in the kernel and is not affected by user space, all programs, whether on the host or in containers, communicate with the same KMD.

[Diagram: host program -> nvidia.ko; docker program -> nvidia.ko (both talk to the same kernel-mode driver)]

By contrast, a program can flexibly choose which user-mode driver to link against. It can either link to the UMD installed along with the KMD on the host, or bring its own UMD during packaging and distribution.

[Diagram: host program -> libcuda.so.X -> nvidia.ko; docker program -> libcuda.so.Y -> nvidia.ko]

We can list all the UMDs in a running good container with the command:

$ docker run --gpus='"device=0"' --rm -it --entrypoint= good bash
root@3a19f802a459:/workspace# find / -name 'libcuda.so*' -exec bash -c "echo {} -\> \`readlink {}\`" \; 2>/dev/null
/usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.460.67
/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.460.67 ->

Looks like there is only one copy of libcuda.so, lying in /usr/lib/x86_64-linux-gnu/ with version 460.67. However, this libcuda.so was not packed into the docker image in the first place. The library disappears if you omit the --gpus argument:

$ docker run --rm -it --entrypoint= good bash
root@3a19f802a459:/workspace# find / -name 'libcuda.so*' -exec bash -c "echo {} -\> \`readlink {}\`" \; 2>/dev/null

In fact, the library exists on the host and is injected into the container by the docker runtime during startup. This post demonstrates the injection process by inspecting docker's logs. Mounting libcuda.so from the host maximally ensures that the KMD and UMD versions stay aligned.

Given that the docker runtime injects the host's native UMD, why did the image bad fail?

The internals of image bad

We can likewise check the UMDs in a running bad container as below:

$ docker run --gpus='"device=0"' --rm -it --entrypoint= bad bash
root@15f9b3c915b8:/# find / -name 'libcuda.so*' -exec bash -c "echo {} -\> \`readlink {}\`" \; 2>/dev/null
/usr/lib/x86_64-linux-gnu/libcuda.so.465.19.01 ->
/usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01
/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.460.67 ->
/usr/local/cuda-11.3/compat/libcuda.so.465.19.01 ->
/usr/local/cuda-11.3/compat/libcuda.so.1 -> libcuda.so.465.19.01
/usr/local/cuda-11.3/compat/libcuda.so -> libcuda.so.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs/libcuda.so ->

OOPS!!! Looks like there's a big difference here. We can derive two observations from the result:

  1. There is already a libcuda.so bundled inside the image at /usr/local/cuda-11.3/compat/libcuda.so.465.19.01, with a higher version of 465.19.01.
  2. During startup, both the native libcuda.so.460.67 and the bundled libcuda.so.465.19.01 are symlinked under /usr/lib/x86_64-linux-gnu/, and most importantly, it’s the bundled one being linked as libcuda.so and chosen by the program.

And that is the reason why the docker image bad violates KMD-UMD compatibility!

The bug of libnvidia-container

Such misbehavior is a consequence of a bug in libnvidia-container. But before we talk about it, let's take a step back and see what the directory /usr/local/cuda-X/compat does and why it exists.

Actually the compat directory is part of the CUDA compat package, which, according to the official docs, exists to support forward compatibility. The official base image nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04 has this package built in, containing a newer UMD, libcuda.so.465.19.01, in case an older KMD runs on the host. As aforementioned, applying forward compatibility places requirements on the underlying hardware. When those requirements are not satisfied, as for our RTX 3090 GPUs, the libcuda.so from the compat package should not be linked against.

Unfortunately, the current release of nvidia-docker roughly attempts to apply forward compatibility regardless of whether the GPUs meet the restriction.

The problem was encountered and studied by Gemfield, who explained it in the article PyTorch 的 CUDA 错误：Error 804: forward compatibility was attempted on non supported HW ("PyTorch's CUDA error: Error 804 ..."). Gemfield observed that nvidia-docker symlinks both the native UMD from the host and the compat UMD from the docker image under /usr/lib/x86_64-linux-gnu/, and bluntly chooses the one with the higher version as libcuda.so.1, against which user programs link.
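The observed behavior amounts to a naive "highest version wins" selection. Here is my own reconstruction in Python (illustrative only, not actual libnvidia-container code) showing how such a heuristic picks the bundled compat UMD over the host's native one:

```python
# Reconstruction of the flawed selection (not real libnvidia-container code):
# among all visible libcuda copies, link libcuda.so.1 to the highest version.
def pick_libcuda(candidates: list) -> str:
    def version(name: str) -> tuple:
        # "libcuda.so.465.19.01" -> (465, 19, 1)
        return tuple(int(x) for x in name.split("libcuda.so.")[1].split("."))
    return max(candidates, key=version)

visible = [
    "libcuda.so.460.67",      # the host's native UMD, matching the KMD
    "libcuda.so.465.19.01",   # the compat UMD bundled in the image
]
print(pick_libcuda(visible))  # libcuda.so.465.19.01, mismatching KMD 460.67
```

On a GPU without forward compatibility support, the correct choice would have been the native 460.67 copy; always taking the maximum is what breaks condition 1.2.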

Obviously this behavior is in line with neither forward compatibility nor minor version compatibility. Gemfield opened issue NVIDIA/nvidia-docker#1515 for discussion, where the author guessed it was a bug in libnvidia-container and referred to another issue, NVIDIA/libnvidia-container#138. Neither issue has been resolved as of this writing.

The workaround is simple: if there's no compat package, the compat UMD won't be applied. We can either remove the compat package or simply delete the /usr/local/cuda-X/compat directory:

# Dockerfile_bad
FROM nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04
RUN apt update -y && apt install -y python3 python3-pip
RUN pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
RUN apt purge cuda-compat-11-3 -y
# OR: RUN rm -rfv /usr/local/cuda-11.3/compat/
ENTRYPOINT ["python3", "-c", "import torch; print(torch.rand(2, 3).cuda())"]

$ make bad
tensor([[0.0059, 0.6425, 0.2299],
[0.2306, 0.5954, 0.0226]], device='cuda:0')


This article elaborated the cause of and workaround for CUDA Error 804 when NVIDIA GPUs work with docker. As background, I introduced the constitution of CUDA, the various categories of CUDA compatibility policies, and how the docker runtime deals with the GPU driver. The culprit turned out to be a bug, or deficiency, of libnvidia-container, which mishandles forward compatibility and minor version compatibility and is not yet fixed. As a workaround, one can remove the CUDA compat package inside the image to prevent forward compatibility from being applied and let minor version compatibility kick in.


Modern Cryptography, GPG and Integration with Git(hub)

GPG (the GNU Privacy Guard) is a complete and free implementation of the OpenPGP standard. Built on a selection of mature algorithms, GPG serves as a convenient tool for everyday cryptographic communication.

GPG has two primary functionalities: (1) it encrypts and signs your data for secure transfer and verifiable information integrity, and (2) it features a versatile key management system to construct and promote a web of trust. GPG also has a well-designed command line interface for easy integration with other applications such as git.

This article briefly elaborates some key concepts and usage of GPG, and then demonstrates how to cryptographically sign git commits with the help of GPG.

Modern Cryptography 101

To understand how GPG and other privacy tools work, we should first cover some basic ideas of modern cryptography. Let's start with the two primary problems of secure communication: data encryption and data integrity/authenticity verification.

Data Encryption

Peer-to-peer data encryption aims to prevent messages from being spied on by a potential third party, especially when the two parties communicate over a channel open to the public. Imagine Alice and Bob mailing through pigeons, with the messages unencrypted and clearly written on paper. A third person, Blake, could intercept the pigeon, open the attached mailbox and read the message inside, without Alice and Bob ever knowing of his existence.

Data encryption is introduced to defend against such attacks. For secure data exchange, Alice and Bob should agree on some kind of invertible message processing pipeline. The sender preprocesses (encrypts) the message before attaching it to the pigeon, and the recipient performs the inverse process (decrypts) to read the clear message. In cryptographic terms, such a pipeline is called a cryptographic algorithm, or a cipher.

A cipher usually works with a key (or several keys). With the cipher fixed, a message encrypted with one key should only be decryptable with the same one. Modern ciphers are carefully designed so that it is hard for Blake to decrypt without the key, even if he knows the full details of the cipher. Under this assurance, Alice and Bob only have to choose a specific algorithm from the public list and agree on the key before communicating. This simplifies the negotiation, as they don't have to discuss the sophisticated implementation of the cipher itself.

The currently available ciphers can be roughly categorized into two families, the symmetric ciphers and the public-key ciphers.

Symmetric Ciphers

Symmetric ciphers encrypt and decrypt messages using the same key. They date far back in human history. You might have heard of the Caesar cipher, a famous example of this category, which replaces each plaintext letter with the one a fixed number of places down the alphabet. For the Caesar cipher, the key is the number of positions shifted, e.g., 3 for a transformation of A->D, B->E.
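As a quick illustration, the Caesar cipher fits in a few lines of Python (a toy, of course; it is trivially breakable):

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return ''.join(out)

ciphertext = caesar("ATTACK AT DAWN", 3)   # encrypt with key 3
print(ciphertext)                          # DWWDFN DW GDZQ
plaintext = caesar(ciphertext, -3)         # decrypt with the same key
print(plaintext)                           # ATTACK AT DAWN
```

Note that decryption is just encryption with the negated key, the "same key" property of a symmetric cipher.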

Symmetric ciphers expose several drawbacks in realistic usage. First, they provide no defense against the key being stolen. If Blake somehow learns the key, he can both spy on and forge the messages sent between Alice and Bob. Also, pairwise communication among $n$ persons would require $n(n-1)/2$ keys, increasing the expense of key exchange and the opportunity for leakage.

Public-key Ciphers

By contrast, public-key ciphers mitigate these problems by adopting a pair of keys instead of just one. A message encrypted with one key can only be decrypted with the other, and vice versa.

Practically we name one of them the public key and the other the secret key. The public key is published to whomever we want to communicate with, while the secret key is kept locally and must be known only to ourselves. When Alice sends a message to Bob, she encrypts it with Bob's public key, and Bob decrypts it with his own secret key upon receipt.

Public-key ciphers reduce the adverse impact of a key leak. An attacker with Alice's public key in hand is unable to decrypt messages sent by others to her. Also, only $n$ key pairs are needed for pairwise communication among $n$ persons. These advantages result in a lower key exchange expense and the growing popularity of public-key ciphers in real life.
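To make the asymmetry concrete, here is a toy RSA-style example in Python with tiny hand-picked primes. It is purely illustrative: the numbers are my own choices, and real public-key systems use far larger keys plus padding schemes.

```python
# Toy RSA with tiny primes -- for illustration only, never for real security.
p, q = 61, 53
n = p * q                # modulus, part of both keys
phi = (p - 1) * (q - 1)  # Euler's totient of n
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # secret exponent: modular inverse of e (Python 3.8+)

message = 42                  # a message encoded as a number < n
cipher = pow(message, e, n)   # encrypt with the public key (e, n)
plain = pow(cipher, d, n)     # decrypt with the secret key (d, n)
print(plain)                  # 42 -- the round trip recovers the message
```

Anyone holding (e, n) can encrypt, but only the holder of d can decrypt, which is exactly the public/secret split described above.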

Digital Signatures for Data Integrity

Ciphers solve the problem of data encryption, preventing transferred messages from being spied on by a third party, but they do not guarantee the integrity and authenticity of the data. Bob cannot tell whether a message he received was truly sent by Alice, since his public key is known to the world. For this purpose, the concept of digital signatures is introduced.

Digital signatures employ the idea of hashing. In cryptography, hashing is a technique to generate a digest for a piece of message. The digest must be almost unique; that is, two different messages should ideally have unequal digests. Also, it should be guaranteed that no one can recover the original plaintext from the digest.

Practically, Alice the sender attaches an encrypted digest as a digital signature along with the message, produced by first applying a hash function to the message and then encrypting the digest with her own secret key. Anyone can decrypt the signature with Alice's public key to verify that the message was truly signed by Alice and arrived as-is. Since no one else knows Alice's secret key, the signature cannot be forged, making it a mighty tool to assure authenticity.
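Combining hashing with the public-key idea gives a toy signature scheme. The sketch below is illustrative only, using tiny insecure keys of my own choosing; it is not what GPG does internally:

```python
import hashlib

# Toy RSA key pair (tiny primes; illustration only, not secure).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17                 # public exponent
d = pow(e, -1, phi)    # secret exponent (Python 3.8+)

def digest(msg: bytes) -> int:
    # Hash the message, reduced below the toy-sized modulus.
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    # "Encrypt" the digest with the secret key -- only Alice can do this.
    return pow(digest(msg), d, n)

def verify(msg: bytes, signature: int) -> bool:
    # Decrypt with the public key and compare with a fresh digest.
    return pow(signature, e, n) == digest(msg)

sig = sign(b"hello world")
print(verify(b"hello world", sig))  # True
```

A tampered message yields a different digest, so verification fails (barring collisions in this toy-sized modulus); forging a signature would require knowing d.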


Ciphers and digital signatures form the foundation of modern cryptography, upon which OpenPGP was proposed and GPG built as a high-level structure for convenient daily usage. This post will not explain the full details of GPG, but rather its basic idea and some frequently-used operations, as a tutorial.

Compared with the basic public-key system, OpenPGP adopts a more sophisticated design. It uses the concept of a "user" to distinguish identities. A user is uniquely identified by a real name and email, and can own a primary key pair plus an optional collection of subkey pairs, each with potentially different capabilities such as encryption or signing. This separation of responsibilities enables one to revoke a compromised key without interfering with the validity of the others, leading to more flexible key management.

Key Generation

To create a user and generate the key pair, we can use the gpg --generate-key command

$ gpg --generate-key
gpg (GnuPG) 2.2.19; Copyright (C) 2019 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: FooBar
Email address: foobar@foobar
You selected this USER-ID:
"FooBar <foobar@foobar>"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? O
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: key 417706EE02BA78E3 marked as ultimately trusted
gpg: revocation certificate stored as '/home/hsfzxjy/.gnupg/openpgp-revocs.d/A100A3E7D94F665A2CB5A34D417706EE02BA78E3.rev'
public and secret key created and signed.

pub rsa3072 2023-01-10 [SC] [expires: 2025-01-09]
uid FooBar <foobar@foobar>
sub rsa3072 2023-01-10 [E] [expires: 2025-01-09]

In this example, we've created a user with the real name FooBar and the email foobar@foobar. During the process, the program will prompt a dialog asking for a passphrase, which acts as the main guardian of access to your secret key.

By default GPG generates two keys with different capabilities. The primary key prefixed with pub is for signing (S) and certifying (C), and a sub key prefixed with sub for encrypting (E). With gpg --list-key and gpg --edit-key commands, we can inspect the keys stored in our local database and edit one or more of them.

Basic Document Signing

When posting a document to the public, one would like to claim his issuance and expect that no one can tamper with the content, which can be achieved by digitally signing the document. Let's check an example

$ echo "hello world" > doc
$ gpg --sign -u FooBar doc
$ cat doc.gpg
-- some binary data --

Here we create a file named doc with the string "hello world" as the content. gpg --sign -u FooBar signs the given document with user FooBar's secret key, writing the bundled result to a new file doc.gpg. A person knowing FooBar's public key can verify its integrity with --verify

$ gpg --verify doc.gpg
gpg: Signature made Tue 10 Jan 2023 09:02:03 PM CST
gpg: using RSA key A100A3E7D94F665A2CB5A34D417706EE02BA78E3
gpg: issuer "foobar@foobar"
gpg: Good signature from "FooBar <foobar@foobar>" [ultimate]

or directly decrypt it with --decrypt

$ gpg --decrypt doc.gpg
hello world
gpg: Signature made Tue 10 Jan 2023 09:02:03 PM CST
gpg: using RSA key A100A3E7D94F665A2CB5A34D417706EE02BA78E3
gpg: issuer "foobar@foobar"
gpg: Good signature from "FooBar <foobar@foobar>" [ultimate]

If the content of doc.gpg is tampered with, either of the above operations will fail.

GPG provides several flags to customize the generation of the digital signature. For instance, the flag --clearsign forces the signature to be attached separately after the plain text, which is more convenient for scenarios like sending via e-mail

$ gpg --clearsign -u FooBar -o- doc
Hash: SHA512

hello world


With -o<filename> the output is directed to <filename> instead of the default file name doc.gpg (here -o- directs it to stdout).

Document Encryption and Web of Trust

Documents signed using the above method can be read by a wide audience, as long as they have user FooBar's public key. For a more limited usage where the document should be seen only by a specific recipient, say user BazBaz, we should encrypt it with BazBaz's public key.

The command gpg --export -u BazBaz > bazbaz.gpg will dump all public keys of user BazBaz to file bazbaz.gpg, which can be distributed and imported by other users across the web. As an example, user FooBar imports the file to his local database

(foobar) $ gpg --import bazbaz.gpg
gpg: key 90D332C875527240: public key "BazBaz <bazbaz@bazbaz>" imported
gpg: Total number processed: 2
gpg: imported: 2
gpg: new subkeys: 1
gpg: new signatures: 1
(foobar) $ gpg --list-key
pub rsa3072 2023-01-10 [SC] [expires: 2025-01-09]
uid [ultimate] FooBar <foobar@foobar>
sub rsa3072 2023-01-10 [E] [expires: 2025-01-09]

pub rsa3072 2023-01-10 [SC]
uid [ unknown] BazBaz <bazbaz@bazbaz>
sub rsa3072 2023-01-10 [E]

As we can see, the public key of BazBaz now shows up in the local list, but the uid is labeled [unknown] instead of [ultimate] as FooBar's is.

The label [unknown] indicates that GPG distrusts newly imported keys by default. OpenPGP comes with a multi-level trust model to defend against someone impersonating another's identity, with [unknown] being the least trusted level. GPG prompts us if we attempt to encrypt with an [unknown] key

(foobar) $ gpg --sign --encrypt -u foobar --recipient bazbaz doc
gpg: 9F85CD170E8B1269: There is no assurance this key belongs to the named user

sub rsa3072/9F85CD170E8B1269 2023-01-10 BazBaz <bazbaz@bazbaz>
Primary key fingerprint: EE0B 6575 8BBA 776A 2D05 21B2 90D3 32C8 7552 7240
Subkey fingerprint: 7E53 135B C569 F125 63D1 BEF2 9F85 CD17 0E8B 1269

It is NOT certain that the key belongs to the person named
in the user ID. If you *really* know what you are doing,
you may answer the next question with yes.

Use this key anyway? (y/N)

This mechanism protects us from accidentally sending secret information to forged identity.

To tell GPG that the identity is really trusted, we can sign the public key to increase its trust level. Remember this must be done only after you have actually verified the identity via direct contact with that person. The --sign-key flag serves this purpose

(foobar) $ gpg -u foobar --sign-key bazbaz
pub rsa3072/90D332C875527240
created: 2023-01-10 expires: never usage: SC
trust: unknown validity: unknown
sub rsa3072/9F85CD170E8B1269
created: 2023-01-10 expires: never usage: E
[ unknown] (1). BazBaz <bazbaz@bazbaz>

pub rsa3072/90D332C875527240
created: 2023-01-10 expires: never usage: SC
trust: unknown validity: unknown
Primary key fingerprint: EE0B 6575 8BBA 776A 2D05 21B2 90D3 32C8 7552 7240

BazBaz <bazbaz@bazbaz>

Are you sure that you want to sign this key with your
key "FooBar <foobar@foobar>" (417706EE02BA78E3)

Really sign? (y/N) y
(foobar) $ gpg --list-key
-- omit --
pub rsa3072 2023-01-10 [SC]
uid [ full ] BazBaz <bazbaz@bazbaz>
sub rsa3072 2023-01-10 [E]

Now checking the list again, we can see that the trust level of BazBaz's key has changed from [unknown] to [full].

OpenPGP's trust model allows trust to propagate over the web, which eases the overhead of verifying key identities. In short, if user A trusts user B's identity, and user B has signed the public key of user C, then user A transitively trusts user C's identity. User A thus has no need to individually verify the identity behind every imported key, and therefore enjoys an easier key management scheme.
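The propagation rule can be pictured as reachability in a graph of key signatures. Below is a minimal Python sketch; the names and the purely transitive rule are simplifications of GPG's actual trust computation, which also weighs trust levels and bounds path lengths:

```python
# Who has signed whose key: an edge A -> B means "A vouches for B".
signatures = {
    "A": {"B"},    # A verified and signed B's key
    "B": {"C"},    # B signed C's key
    "C": set(),
}

def trusted_by(root: str) -> set:
    """All identities `root` transitively trusts via chains of signatures."""
    seen, stack = set(), [root]
    while stack:
        user = stack.pop()
        for signee in signatures.get(user, ()):
            if signee not in seen:
                seen.add(signee)
                stack.append(signee)
    return seen

print(trusted_by("A"))  # contains both 'B' and 'C': trust in B propagates to C
```

A here never met C, yet C's key is reachable through B's signature, which is the whole point of the web of trust.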

GPG and Git/Github Integration

GPG can be employed to assert the authenticity of your code by digitally signing your Git commits. Since Git itself uses the email address to identify authors, it's possible to commit under other people's identities. A story described how one could push code to Github under the identity of Linus Torvalds. Such a vulnerability can be exploited to disseminate malicious code or false information over the internet.

Integrate GPG with Git

The Github Docs has a series of posts guiding through commit signing and Github interoperation. To start, we should tell Git about our signing key:

$ gpg --list-secret-keys --keyid-format long
sec rsa3072/417706EE02BA78E3 2023-01-10 [SC] [expires: 2025-01-09]
uid [ultimate] FooBar <foobar@foobar>
ssb rsa3072/25E8CB9C4F68EE16 2023-01-10 [E] [expires: 2025-01-09]
$ git config --global gpg.signingkey 417706EE02BA78E3!

With the ! suffix, the key takes precedence over others and will always be used. We can additionally configure Git to sign commits by default

$ git config --global commit.gpgsign true

As a demonstration, let's switch to the workspace of a git repository and commit code as usual

$ git add . && git commit -m 'signed commit'

Afterwards, we can inspect the history and see a digital signature attached

$ git show --show-signature HEAD
commit 57ac1c20094d0248a4a3e8676050f53f547a6afa (HEAD -> hexo)
gpg: Signature made Wed 11 Jan 2023 04:37:14 PM CST
gpg: using RSA key A100A3E7D94F665A2CB5A34D417706EE02BA78E3
gpg: Good signature from "FooBar <foobar@foobar>" [ultimate]
Author: hsfzxjy <hsfzxjy@gmail.com>
Date: Wed Jan 11 16:37:14 2023 +0800

signed commit
-- omit --

which indicates the commit has been signed with success.

Integrate GPG with Github

GPG-signed commits are highlighted with a Verified label displayed alongside on Github, as showcased in the image below,

from which other people can know and trust the authenticity of the commit. To this end, one should associate his GPG keys with his Github profile. As instructed in "Adding a GPG Key", the GPG public key is first exported from the command line in text form

$ gpg --armor --export foobar
# GPG public key exported

which should be copied to the clipboard with the separators included. Then, in the upper-right corner of any page on Github, click the profile avatar and select Settings -> Access -> SSH and GPG keys -> New GPG key, paste the previously copied content into the box, and confirm with the Add GPG Key button to finish the association.


GPG is a convenient piece of software for cryptography jobs and key management. While its history goes back to the old days and its UX might look weird, it still stands as one of the de-facto standards in the modern world. This article extensively explains the fundamental ideas of modern cryptography on which GPG is based, followed by a demonstration of some everyday GPG usage, and further instructions for integrating it with external tools/services such as Git or Github. Hopefully it will enlighten you about approaches to carrying out secure message exchange in daily life.

Move the Root Partition of Ubuntu

Some days ago, I made the decision to shrink the footprint of the Windows system on my laptop and reallocate the disk space to the Ubuntu system residing next to it. Ubuntu is competent for my daily use of programming and web browsing, so I have hardly launched the OEM-shipped Windows since the laptop was bought. Windows took up a not-so-small portion of my SSD space, which could be better utilized instead of wasted in vain.

| --- Windows C: (256 GB) --- | --- Ubuntu / (256 GB) --- |
| --- Windows C: (120 GB) --- | --- Ubuntu / (392 GB) --- |

As planned in the diagram above, 136 GB of space would be reclaimed from the Windows C: partition and merged into the root partition of Ubuntu. I had experience adjusting the size of disk partitions, but this time the job was a little riskier, since it involved moving the starting point of the Linux root partition. Linux relies on special information in the directory /boot/efi to boot itself, and if that information is not updated accordingly during the move, the entire system becomes unbootable.

To avoid catastrophic consequences, I did some research beforehand and read a detailed guide on AskUbuntu. It turns out the tweak requires two steps. The first is to adjust the partition sizes with the GParted tool, as I used to do for ordinary data partitions. The GParted system has to reside on and be booted from a separate USB device, so that the hard disks in my laptop can be fully unmounted for manipulation. This is the easiest part thanks to the straightforward GUI partition editor provided by GParted, with which I could do the adjustment in a few clicks.

Each disk partition is assigned a UUID or serial number like b424102c-a5a6-489f-b0bd-0ea0fc3be7c3 to uniquely identify itself, which will change as the partition is moved or resized. So the next step is to rebuild the grub configuration to ensure it contains the new serial number of my root partition. But before running grub-install, I had to emulate the directory hierarchy of my Ubuntu system by mounting the relevant partitions to form the root directory and using chroot to start an interactive shell in it

(gparted)$ mkdir /tmp/mydir
(gparted)$ mount /dev/nvme0n1p5 /tmp/mydir
(gparted)$ mount --bind /dev /tmp/mydir/dev
(gparted)$ mount --bind /proc /tmp/mydir/proc
(gparted)$ mount --bind /sys /tmp/mydir/sys
(gparted)$ chroot /tmp/mydir
chroot: failed to run command ‘/bin/bash’: No such file or directory

Following the guidance, however, chroot did not succeed and complained that /bin/bash could not be found. I checked the corresponding directory /tmp/mydir/bin and found it was a broken symbolic link

(gparted)$ ls /tmp/mydir/bin -al
lrwxrwxrwx 1 root root 7 May 28 2020 /tmp/mydir/bin -> usr/bin

It appears that /bin is a symlink to usr/bin, but my /usr directory resides on another partition that was not yet mounted. With the /usr directory mounted, the chroot command worked as desired.

(gparted)$ mount /dev/sda7 /tmp/mydir/usr
(gparted)$ chroot /tmp/mydir

The spawned interactive shell allows commands to run as if they were in my Ubuntu system. Type grub-install /dev/nvme0n1 to write in the new serial number of the root partition. It's worth noting that the argument /dev/nvme0n1 passed to grub-install is the name of the hard disk device to write to, not a partition name like /dev/nvme0n1p1.

(ubuntu)$ grub-install /dev/nvme0n1
Installing for x86_64-efi platform.
grub-install: error: cannot find EFI directory.

Oops, the command failed and something was still going wrong. After some time of inspection, I found the culprit: the directory /boot/efi was empty. It should be a mount point for partition /dev/nvme0n1p1, which was not mounted properly. This can be solved with another mount command

(ubuntu)$ mount /dev/nvme0n1p1 /boot/efi
(ubuntu)$ grub-install /dev/nvme0n1
Installing for x86_64-efi platform.
Installation finished. No error reported.

By now the boot information is finally updated. I rebooted my laptop and everything worked as before.

So the takeaway from this tweak is that some special directories like /usr or /boot/efi reside in partitions outside of the root directory / on my laptop. If you are fixing grub and come across similar error reports, be sure to correctly mount all relevant partitions to form the filesystem hierarchy.

A New Programmer Kicks a Roadblock

The time I composed my first program dates back to my junior high school years. It was the first day of PC lessons, and everybody crowded into the computer classroom. We were told to learn "programming" there. The kids who were talented would be selected and trained for OI (the Olympiad in Informatics). The others would go to an ordinary class and learn something more general.

I was anxious. Before then I had no concept of what "programming" was, nor had I ever gone through a real PC lesson. The PC lessons in my primary school barely taught anything; most of the time the teachers let us play games instead. I could type merely a dozen characters per minute, since I'd never received thorough typing training. I was ignorant of what was inside the metal box. I was a complete computer idiot.

But some of my classmates weren't. They typed swiftly like the wind, they knew how to play with the operating system, and what's more, they were chattering excitedly about things like "C language", "array" or "for-loop", words I'd never heard of.

I sat in front of a monitor and the class began. The teacher said we were going to learn a language named "Pascal", and she instructed us to open the "Free Pascal IDE". I followed a few clicks through a cascaded menu and finally reached the item. A window popped out.


The screenshot was taken on my Ubuntu recently, but at the time it was on Windows 7 and looked slightly different. Not many people these days have heard of the Pascal language, and fewer have seen this antique interface.

It was the weirdest interface I had ever seen. The IDE was like another system trapped in a small unresizable window, with queerly rendered icons and widgets. The menus wouldn't expand on cursor hover. The editor wouldn't scroll when I rolled the mouse wheel. And most importantly, there was English everywhere, which frightened me.

The teacher then showed us our first program to type. It was a simple one that reads an integer from one file, and writes its square to another. The code was like

program program1;
var a: integer;
begin
  assign(input, 'program1.in'); reset(input);
  assign(output, 'program1.out'); rewrite(output);
  readln(a);
  writeln(a * a);
  close(input); close(output);
end.

It took me quite a while to put these lines onto the screen, and more time to "save the code as a file". Before that day I had no idea what a "file" was, and the IDE's file selector was not ergonomic at all. After saving, I noticed an icon titled program1.pas pop up in the Windows file explorer. Then I hit the Compile menu entry. More icons popped out, including one named program1.exe, and that was my program.

The next thing to figure out was how to run the program, which comprised several complicated steps.

The first thing to do was to right-click in the file explorer, select the "New -> Text Document" entry, and rename the file to program1.in. The OS prompted me that I'd changed the file extension, but I clicked "Yes". Then I right-clicked on the created file, selected "Open with…" and chose "Notepad" in the dialog. In the notepad I typed an integer like 3, then saved and closed it.

By now the input was prepared, and I double-clicked the program1.exe file to execute the program. A black window flashed by, and one more icon titled program1.out appeared. I opened it with the same trick as the input file, and there I saw the result number 9.

Woah, that was amazing. Within 40 minutes I'd created something "intelligent", albeit excessively simple, working faithfully on whatever number I fed it.

Along with the flood of joy, however, came frustration. It aroused a feeling that programming is as complicated as ordering a banquet for a serious occasion, with so much doctrinal detail to care about. What upset me the most was that I spent most of my time fighting irrelevant issues, but caught little of the idea of true programming throughout the class hour. The reason is that I lacked a certain understanding of the OS beneath, without which one can go nowhere on the trip towards programming.

And there exists another question: is interacting with a program always so painful? Of course not, but not until several months later did I realize that the assign(...) statements were not a necessity of a complete program, and that there's a so-called "command line interface" where you can type the input easily and immediately get the result. The awkward interaction bridged by files was actually dedicated to OI evaluation, as I learned afterwards. It took me one year to understand the ABC of GUI programs, when I built my first Form-based application with Delphi. My program no longer shipped with an ugly black window! And it unlocked the kind of varied interaction found in everyday applications. After more years, with the broader understanding of programming I gained, I became able to create websites, mobile apps or anything fitting my requirements. But for a 12-year-old kid at the time, the first program was just not appealing and NOT COOL at all.

The class, of course, was not designed for teaching cool things. It was for choosing talented guys for a specific target. But over the years, I kept seeing people who were new to programming and struggled halfway, for one reason or another. This made me consider the root obstacle for a newcomer learning to program.

The way they are taught is no doubt a fundamental factor. Learners should be motivated so as to earn confidence. I used to know some power users who got started with programming smoothly and swiftly. They had clear goals: some to tweak system behavior and others to automate daily work. They learn the minimal knowledge from documentation or blog posts, and then come up with a prototype program which accomplishes the job. The entire process is interesting and fulfilling.

But for elementary learners, the ones who know little or nothing about computers, this does not always apply. Most of them are aimless, having no idea what programming can be used for. What's worse, they are taught to use inappropriate tooling, deteriorating the learning into a boring and painful nightmare.

I remember that in the Programming 101 course of my college around six years ago, we were taught C and used the obsolete Visual C++ 6.0 IDE. The compulsory course was rather like a math one, where most time was spent in the classroom reading slides, and homework was handwritten, figuring out the results of code fragments. The mere four coding tasks were to implement some algorithms and data structures, fairly dull. Some of my classmates had no deep knowledge of computers, or had never even used one before (since mobile devices were popularized). They went through a hard time understanding low-level concepts like pointers, and were desperate in finishing the coding tasks. They learnt for the exams, with little or no interest, and soon forgot everything within one or two semesters.

I am not claiming that low-level languages like C are not suitable as a first language: for those who will major in computer science, they demonstrate well how the machine works. But other learners deserve a more modern language at a higher level, plus a coding environment that hides obscure machine details. The language and tooling should lower the hurdle to creating appealing projects.

The language we do have: Python, for example. It had better be young or carefully designed, so that backward compatibility won't cause too much confusing syntax. It should support the imperative paradigm so as not to blow the learner's brain (unless they are mathematicians), but not be limited to it, for going further. And most importantly, it should conceal the low-level stuff to better illustrate the basic ideas of programming.

But the tooling we don't, at least not yet in perfect form. Much work must be done to create such a layer between the OS and ignorant learners, and it should be done perfectly well, without bugs. I've seen buggy programming environments leak details about the underlying machinery, leaving their users frustrated and frightened.

The needs of domain-specific learners should also be noticed. Some people learn programming to improve productivity in their fields of expertise, e.g., data analysis or financial trading. Like power users, they would be fulfilled if their first few programs could assist their jobs, but from time to time that's not the case: the guides and tools are often poorly crafted, probably because there are few professional programmers in the field.

Over time I have witnessed programming languages and toolchains evolving, which gives skilled people the chance to build faster and safer programs more easily, but the 0-to-1 learning curve has benefited little from the trend. I expect to see that change in the future.

Git-based Dependencies in Dart and Go

Both Dart and Go support decentralized package distribution. One is able to directly adopt an existing git repository as a dependency, easing the effort of distributing packages.

Sometimes we might expect more fine-grained control over what to pull from a git repository. For example, to lock a package's version, we would specify a particular tag, commit or branch name to pull from. Or, if it's a mono-repo, we would choose a sub-directory from the repository root. This post summarizes how to achieve these purposes in both languages.


Dart

dart pub add has several options related to adding a git repository as a dependency. To start with, one should specify the repository's URL with the --git-url argument

dart pub add repo --git-url https://github.com/user/repo.git

This command adds a dependency named repo by pulling from https://github.com/user/repo.git. The --git-path argument can be provided to specify which sub-directory of the repository dart should read from

dart pub add repo --git-url https://github.com/user/repo.git --git-path subdir/

Dart can also read from a specific git commit or branch (but not tags!), for which one should supply the --git-ref argument

dart pub add repo --git-url https://github.com/user/repo.git --git-path subdir/ --git-ref branch_name
# OR
dart pub add repo --git-url https://github.com/user/repo.git --git-path subdir/ --git-ref <commit-hash>


Go

Go modules are natively based on git repositories. We regularly use go get to add a dependency

go get github.com/user/repo

, which pulls a remote repository and parses its content as a Go module. A suffix @vX.Y.Z can be appended to specify a particular git tag, like

go get github.com/user/repo@vX.Y.Z

, which instead pulls from a tag named vX.Y.Z. There's a detailed description of version-tag semantics at Go Modules Reference - Versions. Straightforwardly, we can append its path after the repository's name if a sub-directory is to be used

go get github.com/user/repo/subdir

Things become a little trickier when both sub-directory and tag are wanted. Literally, we might type a command as below

go get github.com/user/repo/subdir@vX.Y.Z

It, however, will fail with the complaint go: github.com/user/repo/subdir@vX.Y.Z: invalid version: unknown revision subdir/vX.Y.Z. What's happening is that, when a sub-directory is involved, Go modules look for a tag name with the pattern subdir/vX.Y.Z, instead of the aforementioned vX.Y.Z. This enables multiple sub-modules in a large mono-repo to individually tag their own versions. We are hence required to rename the tag to subdir/vX.Y.Z, after which the command works as intended.


Reversy Naming

I have always been a dedicated fan of writing naturally readable code. By "naturally readable" I mean one can read a line of code as if it were a sentence of English (or maybe another human language). It's believed that this practice encourages more self-explanatory code, as the code reads more like a human-composed article instead of gibberish only recognizable by machines.

The practice recommends naming functions and variables following the word order of human language; for English, that is, objects come after verbs, and adjectives go before the nouns they modify. The samples below showcase how it guides naming in a program (please hold your opinions about the casing)

  • append_to_list(lst, item). A function that appends an item to a list, which can read as “append to the list (specified by name lst) with the item”.
  • register_service_notifier(func). A function that registers another function as a service notifier, which can read as “register a service notifier with the function func“.
  • UserFollowersListView. The name of a web component which is a list view to display followers for a user.

It plays well and improves my developing experience most of the time, but there is no silver bullet, just like with other practices or guidelines. Sometimes I found the readability even degrades: I kept skimming the lines and just couldn't locate an item efficiently.

For a brief period, I thought this was caused by the verbosely long word sequences, since compared with shorter ones, they took more time to recognize. But after some investigation, I realized it was not.

The true culprit is that such "naturally readable" naming displaces the emphasis from the beginning of the word sequence. The emphasis of a name is its highest-level words, usually the most general ones. For instance, append_to_list emphasizes list, which is however placed at the rear of the name.

As a human, at least for me, a name with its emphasis at the front is more recognizable than one without. When skimming through a screen of code, my sight focuses on token boundaries like whitespace, hopping from one to another. During that time, I glimpse one or two words next to each boundary, usually at the fore of a name, and subconsciously match them against what I am looking for.

The matching process itself speeds up if I first meet the words in the leading position of the name or phrase. My mental model resolves a concept by the general idea first and then the descriptive details. This, however, is the opposite of most human languages as far as I know, whose grammar puts general words after the modifiers.

And thus I came up with a new practice, Reversy Naming, to accord with my mental model: place the emphasis first in a name, and then the lower-level words. To illustrate, I apply the style to the three names above as an example

  • append_to_list -> list_append
  • register_service_notifier -> service_notifier_register
  • UserFollowersListView -> ListViewUserFollowers

Probably weird at first sight, but despite the inverted word order, they are not difficult to read. In fact, there come several additional benefits.

Firstly, it conforms to the qualified syntax in most programming languages, which most people are used to. A programming language with an object-oriented paradigm usually supports a syntax like object.method. In Python I have written things like list.append() for years, which is similar to the aforementioned list_append, and I haven't had any readability problem with it.

The next point is that the names align well if they appear in consecutive lines. Consider many functions operating a service; with "naturally readable" naming, we have

func RegisterServiceNotifier()
func UnregisterServiceNotifier()
func StartService()
func StopService()

It is not evident at a glance that these functions manipulate the same type, although they share a common word "Service" in the middle. But with Reversy Naming, we could have

func ServiceNotifierRegister()
func ServiceNotifierUnregister()
func ServiceStart()
func ServiceStop()

Now they share a common prefix indicating their affiliation, crystal clear and neat. If there's a secondary emphasis, "Service -> Notifier" for instance, they will also align in a good manner.

I've also spotted similar naming rules in the wild, which suggests Reversy Naming is an acceptable and even recommended practice in some scenarios. For example, the Style Guide - Vue.js reads

Order of words in component names

Component names should start with the highest-level (often most general) words and end with descriptive modifying words.

|- SearchButtonClear.vue
|- SearchButtonRun.vue
|- SearchInputQuery.vue
|- SearchInputExcludeGlob.vue
|- SettingsCheckboxTerms.vue
|- SettingsCheckboxLaunchOnStartup.vue

Since editors typically organize files alphabetically, all the important relationships between components are now evident at a glance.




















































Invalid Golang Pointers Can Bite You Even If You Don't Dereference

In Golang, if you coerce a uintptr variable into unsafe.Pointer (or further, into some *T), the linter will warn with the message "possible misuse of unsafe.Pointer". This makes sense because the uintptr variable may contain an address that points to a piece of invalid memory, and dereferencing such a pointer is catastrophic (it usually aborts the program).

I was always aware of the above discipline, but I thought it would be OK to hold such pointers as long as I didn't dereference them. That is true in C/C++, but not in Golang, which I did not realize until recently.

In fact, the program can panic even if you just keep an invalid pointer on the stack!

A strange invalid pointer panic

The story dates back to an attempt at interoperation between Golang and the JVM, when I was working on a Go-written dynamic library which needed to operate a bluetooth socket on Android. Android does not provide any native interfaces for bluetooth, so I had to call into the JVM and invoke Java APIs.

I had learned JNI beforehand, which is an interface designed for interacting with the JVM from native code. Since JNI is provided to programmers as C++ header files, I had to seek a Golang binding. Then I noticed xlab/android-go, which encapsulates the full list of JNI types and functions as utilities. The project had been out of maintenance for a while, but using only the JNI pieces should be fine.

With the help of xlab/android-go, I quickly finished a prototype of my library; so far, so good. I bundled the library into an apk file and ran it on my phone, but unfortunately it crashed with the stack trace

runtime: bad pointer in frame kcore_android/bluetooth.ioWorker.Loop at 0x400018eeb0: 0x1
fatal error: invalid pointer found on stack

runtime stack:
runtime.throw({0x7dbf000df2?, 0x7dbf19b4a0?})
/usr/local/go/src/runtime/panic.go:992 +0x50 fp=0x7d980b6c70 sp=0x7d980b6c40 pc=0x7dbf058ff0
runtime.adjustpointers(0x7d980b7000?, 0x36581?, 0x7dbf164983?, {0x7dbf192338?, 0x7dbf19b4a0?})
/usr/local/go/src/runtime/stack.go:628 +0x1cc fp=0x7d980b6cb0 sp=0x7d980b6c70 pc=0x7dbf0716cc
runtime.adjustframe(0x7d980b7000, 0x7d980b70f8)
/usr/local/go/src/runtime/stack.go:670 +0xa4 fp=0x7d980b6d40 sp=0x7d980b6cb0 pc=0x7dbf0717b4
runtime.gentraceback(0x7d00001000?, 0x7d980b7140?, 0xffffff80ffffffe0?, 0x40001824e0, 0x0, 0x0, 0x7fffffff, 0x7dbf116168, 0x43?, 0x0)
/usr/local/go/src/runtime/traceback.go:330 +0x734 fp=0x7d980b7060 sp=0x7d980b6d40 pc=0x7dbf07b7d4
runtime.copystack(0x40001824e0, 0x1000)
/usr/local/go/src/runtime/stack.go:930 +0x300 fp=0x7d980b7220 sp=0x7d980b7060 pc=0x7dbf071fa0
/usr/local/go/src/runtime/stack.go:1110 +0x37c fp=0x7d980b73d0 sp=0x7d980b7220 pc=0x7dbf0723fc
/usr/local/go/src/runtime/asm_arm64.s:314 +0x70 fp=0x7d980b73d0 sp=0x7d980b73d0 pc=0x7dbf084bc0

goroutine 51 [copystack, locked to thread]:
--- snip ---

I was not frightened, since no code succeeds in one go. But the error report did frustrate me in two respects

  1. It involved one of my stack frames (kcore_android/bluetooth.ioWorker.Loop), but the panic was thrown from some source code that lies out of my codebase (runtime/stack.go).
  2. It was caused by an invalid pointer, whose value was 0x1.

I guessed the pointer was returned from the Java side, and for some unknown reason it had a weird value of 0x1. But what I didn't understand was how it could crash my program. I had carefully avoided dereferencing any non-Go pointer in my code.

Also, the mismatch between the stack frame and the source code made it really difficult for me to locate the problem. For a time I thought goroutine 51 had stopped at the scene where the pointer caused trouble, as its stack trace contained the aforementioned frame bluetooth.ioWorker.Loop, but it hadn't. In fact, the goroutine stopped at another line when I restarted the program! This was annoying.

It took me almost half a day to resolve and understand the problem. I will first explain the origin of the invalid pointer, and then show how it would crash the program.

The origin of 0x1 pointer

In JNI, the C type jobject acts as a handle to a Java object, and is technically an alias of void*. Such handles are created by calling most JNI functions, like JNIEnv->CallObjectMethod.

Although it is a pointer type, a jobject variable is not necessarily a valid pointer. To understand this, one should know that there exist two kinds of object references in JNI: local references and global references. Local references are recycled at the end of a Java frame, while global references survive longer, until you delete them.

They not only differ semantically, but also diverge in their practical values. Local references often hold small values like 0x01 or 0x75, while global references have values like 0x7dbeffc1cf. I guess local references are not actual pointers but indices into some internal object table.

Symmetrically, xlab/android-go defines a Jobject type which is an alias for unsafe.Pointer. So if you receive a local reference from JNI functions, you end up owning an invalid pointer on the Go side.

Go runtime checks invalid pointers during stack growth

What's interesting is that goroutines do not statically allocate their stacks. Instead, they are able to grow or shrink the stack according to need. I will not dive into the details of this mechanism, which you may read about in the article Go: How does the goroutine stack size evolve? if you are interested.

My panic was thrown by the invalid pointer check during stack growth. Why does the Go runtime check for invalid pointers here? Because growing a stack involves memory re-allocation, and the runtime must ensure no pointer is invalidated after the potential move.

To see how a move could invalidate pointers, let's consider an example. Say we have a goroutine whose stack occupies the address range 0x8000 - 0x8800. An integer i int is stored at 0x8000, and a pointer ptr *int referencing that int is stored at 0x8004, holding the value 0x8000. Now we grow the stack by moving it to the address range 0xA000 - 0xB000. If ptr retained its old value, it would no longer point to i, since i has been moved to 0xA000! Therefore, during stack growth, the Go runtime must also check for the existence of such pointers and adjust their values accordingly.

However, the Go runtime does more than check whether a pointer value falls within the old address range. It also checks for and complains about pointers with small values

func adjustpointers(/*...*/) {
    /* --- snip --- */
    if f.valid() && 0 < p && p < minLegalPointer && debug.invalidptr != 0 {
        // Looks like a junk value in a pointer slot.
        // Live analysis wrong?
        getg().m.traceback = 2
        print("runtime: bad pointer in frame ", funcname(f), " at ", pp, ": ", hex(p), "\n")
        throw("invalid pointer found on stack")
    }
    /* --- snip --- */
}

The above snippet can be found in runtime/stack.go. If a pointer value is less than minLegalPointer (which is 4096), the runtime also panics! And that's the culprit in my case.


Now I know the panic came from two aspects. First, I had an invalid pointer due to FFI, although I never meant to dereference it. The Go runtime, however, does more than I thought behind the scenes: it moves goroutine stacks when necessary, during which it checks for and complains about invalid pointers.

This reminds me not to coerce foreign pointer-like values into Go pointers if you won't dereference them on the Go side. The safest practice is to keep them as uintptr. As a solution, I patched and slimmed xlab/android-go into hsfzxjy/android-jni-go, which works like a charm.

I also created a minimal example to reproduce the above problem, for those interested in investigating. In this example, the main goroutine's stack grows during the invocation chain foo() -> bar() -> baz(), during which the Go runtime encounters the crafted pointer ptr and eventually panics.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var a [10]int
	foo(a)
}

func foo(a [10]int) {
	var b [100]int
	// A crafted invalid pointer, small enough to trip the
	// minLegalPointer check during stack growth.
	ptr := unsafe.Pointer(uintptr(1))
	bar(b)
	fmt.Printf("%p\n", ptr)
}

func bar(a [100]int) {
	var b [1000]int
	baz(b)
}

func baz(a [1000]int) {}

Side Project (副业)

Computing folks seem to love writing side projects; in the Chinese community there is a colloquial phrase for it, 「搞副业」 ("running a side business"). If you frequent V2EX, the programming boards of Reddit, or Hacker News, you will see people sharing all kinds of side projects, ranging from handy little tools of a hundred-odd lines of code to frameworks, websites, or even complete quasi-commercial products.

People usually share their creations with great enthusiasm, of a kind you can feel even through the screen, like a seven-year-old boy who has just assembled his first mini 4WD car, or a cooking enthusiast who has pulled off a difficult dish by his own effort. What they share are their treasures, and they hope to make waves among the crowd. In some communities, such as r/rust, people are keen to discuss these shares, offering recognition and meaningful feedback. But things do not always go so well: in other places, such as general-purpose or less-trafficked communities, only a few shares get noticed, while the rest are skipped over until they sink into the information stream. This is usually disheartening.

People share their creations with similar feelings, yet their mindsets differ. I occasionally try to guess, through their words, what different sharers are thinking. Posts in newbie areas carry a youthful innocence. Those shares are not complicated, often practice projects after a first taste of some field. Community members usually judge such projects by a lower standard and offer encouragement, as if cheering for a toddler who has just learned to walk. Those who hastily leave nothing but a link, unless the community discourages commentary (as Hacker News does), are most likely the easygoing kind. What they share are often projects created offhandedly, such as a library abstracted from their work, or a tool that improves quality of life. They have not spent too much energy on them and thus do not care much about others' opinions. If the project helps someone, great; if not, they will not be overly disappointed. But there is yet another kind of people, who pour a great deal of effort into their projects and accordingly hope for a substantial response from the community. They promote their projects vigorously in their writing, and some even unabashedly ask the community for upvotes (such as stars on Github). These are the zealots of sharing creations.

Every profession has a similar phenomenon. Practitioners spontaneously form circles to exchange the knowledge and skills of their trade. Writers read and critique each other's works; chefs hold tea parties to compete over their dishes. But the enthusiasm of computing folks is especially high. Regardless of skill level, and however busy or idle their jobs are, everyone is keen on writing side projects.

Side projects are, first of all, practical, which is determined by the nature of the computing profession. In the information age, computing folks are the forgers of everyday tools, the blacksmiths and carpenters of the digital world. Their creations cover broad and diverse themes, from games to productivity software and even smart hardware, each solving some real problem. Unlike craftsmen in the traditional world, however, completing a whole work does not require many finely divided roles. Once an idea arrives, a single person can act on it immediately and quickly build a usable prototype with the help of thorough documentation. Compared with other industries, computing folks can finish a project with far fewer resources.

Computing practitioners usually advocate lifelong learning, and finishing a side project is a great opportunity to learn. At work you may be responsible for only one small aspect of a project, where the knowledge involved is like a single screw in a giant puppet. In a side project, however, you touch every aspect, from planning and design to implementation and deployment; there is always something you have never dealt with before. Or perhaps the industry has just released a new technology or framework that happens to fit your idea, so you approach it and try it out. Some say such a workload is exhausting and wears people out in body and mind. But for geeks, learning and tinkering with technologies is a joy in itself.

Side projects are also an important source of pride. Creating something by your own power and using it to solve real problems is remarkable in itself. If the creation also becomes known to others, the pride even surpasses the Creator's, since, after all, no one shares that kind of joy with the Creator. Pride is not always easy to come by, yet it is a necessity of life. If you cannot obtain it from your job, you will be more inclined to seek the experience from side projects.

More importantly, the internet and mature code-hosting platforms have simplified the sharing of, and even collaboration on, side projects. This is an advantage over other industries. Through the network, others can conveniently try out a shared work, and the work itself can spread quickly. People can not only experience the works but, much of the time, also learn how they are implemented, thanks to the popularity of open-source culture. Creators share not only the finished product but also, gladly, its source code for others to learn from or improve. Meanwhile, the use of code-hosting platforms has become a consensus among computing practitioners, where people can read a project's details or join hands with the author to complete it. This, too, is a meaningful and joyful thing.

A Flaw of Promoting Complex Trait Bounds in Rust

Days ago, for some reason, I was trying to implement a function that can polymorphize over its return type. The solution is simple, but my brain was jammed at the time, trapped in complicated typing tricks for hours.

During the struggle, I coincidentally ran into something that is, for now, a flaw in the current Rust compiler implementation. In some cases, the compiler is not smart enough to promote known trait bounds, and we have to replicate them again and again. Although the problem afterwards proved to be a useless "X-Y problem", I would still like to share the story.

The Problem

Let’s say we are going to write a function that digests a given &[u8] slice and computes a hash value. The function adopts one of two different algorithms and returns a u64 or u128 integer as the hash result.

Trivially, this can be achieved by splitting it into two functions, get_hash_u64() and get_hash_u128(). But I prefer a single, unified interface. Concretely, I am expecting a function that is polymorphic over its return type, with the following signature

fn get_hash<T>(b: &[u8]) -> T
where /* some bounds on T */
{ todo!() }

Two things need to be filled in for the above snippet:

  1. The where-clause. Some trait bounds have to be placed on the type variable T, and I expect them to be as concise as possible to reduce verbosity at call sites.
  2. The body. The function should behave differently depending on the type variable T.

In order to emulate the effect of choosing different hashing algorithms, we expect a different numeric value to be returned when a different type variable T is supplied. Also, since the argument b: &[u8] is irrelevant to our problem, I will omit it in the following text for brevity. So overall, I would like the following two assertions to hold

assert_eq!(get_hash::<u64>(), 42u64);
assert_eq!(get_hash::<u128>(), 4242u128);

The Simple Answer

Before stepping further, I will place a simple and straightforward solution up front, in case anybody takes the same wrong path.

Specifically, we can define a trait, say HashVal, as the upper bound of all possible return types for get_hash.

trait HashVal: Sized {
    fn digest() -> Self;
}

For each possible type such as u64 or u128, we place the corresponding hashing algorithm in HashVal::digest

impl HashVal for u64 {
    fn digest() -> Self { 42u64 }
}

impl HashVal for u128 {
    fn digest() -> Self { 4242u128 }
}

fn get_hash<T: HashVal>() -> T {
    T::digest()
}

get_hash::<T>() is now polymorphic over the return type T. Users can select the 64-bit hashing algorithm via a call like get_hash::<u64>().

This solution is neat and, most importantly, the prerequisite T: HashVal is concise and self-explanatory, which saves a lot of verbosity in callers’ where-clauses. However, this didn’t come to my mind at the time; I chose a more complicated solution instead.
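Putting the fragments together, here is a self-contained sketch of the simple solution (the constants 42 and 4242 stand in for real hash algorithms, as in the rest of the post):

```rust
// Upper bound of all possible return types for get_hash.
trait HashVal: Sized {
    fn digest() -> Self;
}

impl HashVal for u64 {
    // stands in for a 64-bit hashing algorithm
    fn digest() -> Self { 42u64 }
}

impl HashVal for u128 {
    // stands in for a 128-bit hashing algorithm
    fn digest() -> Self { 4242u128 }
}

// Polymorphic over the return type T; the concise bound T: HashVal suffices.
fn get_hash<T: HashVal>() -> T {
    T::digest()
}

fn main() {
    assert_eq!(get_hash::<u64>(), 42u64);
    assert_eq!(get_hash::<u128>(), 4242u128);
}
```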

The Complicated Answer

In this version, I start with a dummy struct Hasher and a trait HashDispatcher<T>.

struct Hasher;

trait HashDispatcher<T> {
    fn digest() -> T;
}

The struct Hasher implements HashDispatcher<T> for different types T, with the corresponding algorithm filled in the digest() method

impl HashDispatcher<u64> for Hasher {
    fn digest() -> u64 { 42u64 }
}

impl HashDispatcher<u128> for Hasher {
    fn digest() -> u128 { 4242u128 }
}

fn get_hash<T>() -> T
where
    Hasher: HashDispatcher<T>,
{
    <Hasher as HashDispatcher<T>>::digest()
}

The function get_hash<T>() delegates the call to Hasher::digest(), which requires the verbose trait bound Hasher: HashDispatcher<T>. To reduce the boilerplate, I sought to write another trait, also named HashVal, such that for every T that is a HashVal, the bound Hasher: HashDispatcher<T> holds, or formally $$\text{T: HashVal} \Rightarrow \text{Hasher: HashDispatcher<T>}$$. If achieved, the signature of get_hash could be reduced to

fn get_hash<T: HashVal>() -> T;
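Before attempting that reduction, it is worth confirming that the dispatcher-based version with the verbose bound works end to end. A self-contained sketch, using the same placeholder constants:

```rust
struct Hasher;

trait HashDispatcher<T> {
    fn digest() -> T;
}

impl HashDispatcher<u64> for Hasher {
    fn digest() -> u64 { 42u64 }
}

impl HashDispatcher<u128> for Hasher {
    fn digest() -> u128 { 4242u128 }
}

// The verbose bound `Hasher: HashDispatcher<T>` must be spelled out here.
fn get_hash<T>() -> T
where
    Hasher: HashDispatcher<T>,
{
    <Hasher as HashDispatcher<T>>::digest()
}

fn main() {
    assert_eq!(get_hash::<u64>(), 42u64);
    assert_eq!(get_hash::<u128>(), 4242u128);
}
```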

The Incorrect Attempt for HashVal

The first attempt I made was to place the bound in the where-clause of a blanket impl

trait HashVal {}
impl<T> HashVal for T where Hasher: HashDispatcher<T> {}

I mistakenly thought this would fulfill my purpose. The statement should instead be read as “for every T that satisfies Hasher: HashDispatcher<T>, T is a HashVal”, which is the converse of what I expect. Thanks to u/schungx for pointing this out in the Reddit thread. As a counterexample, one can still implement HashVal for other types without ensuring they satisfy my bound

impl HashVal for String {}

The Correct yet Flawed Attempt

u/SkiFire13 mentioned that, to meet my requirement, the trait bound should be placed at the definition of HashVal, like this

trait HashVal where Hasher: HashDispatcher<Self> {}

I had no memory of ever seeing a where-clause in a trait definition before. The syntax is not introduced in “The Book”, but rather mentioned in the where-clause RFC.

A where-clause on a trait is not a new concept. In fact, the “supertrait” bound can be regarded as a specialized form of a where-style bound

trait Foo: Bar {}  // is equivalent to
trait Foo where Self: Bar {}

More generally, the where-clause elaborates the constraints that the type variables (or the special Self) should satisfy. If SomeT: Trait holds, the type SomeT must meet all the requirements in Trait’s where-clause.

As for our case, the where-clause grants an upper bound for HashVal: any type T that implements HashVal must satisfy Hasher: HashDispatcher<T> beforehand, which is precisely our requirement.

With this declaration, however, we still cannot reduce the trait bound of get_hash to T: HashVal, due to a flaw in the current compiler. A long discussion, “where clauses are only elaborated for supertraits, and not other things”, can be found on GitHub, dating back to 2015.

In short, apart from some simple constraints like supertraits, the constraints in a trait’s where-clause are only respected within the trait definition itself (so that type checks inside the trait can pass), and are not promoted anywhere else.

trait HashVal: Sized
where
    Hasher: HashDispatcher<Self>,
{
    fn foo() -> Self {
        // OK: the "where" bound permits this call
        <Hasher as HashDispatcher<Self>>::digest()
    }
}

impl<T: Sized> HashVal for T where Hasher: HashDispatcher<T> {}

// This fails to compile: the bound `Hasher: HashDispatcher<T>`
// is not promoted from HashVal's where-clause.
fn get_hash<T: HashVal>() -> T {
    <Hasher as HashDispatcher<T>>::digest()
}

The flaw is quite annoying: we still have to replicate the verbose trait bounds here and there. Hopefully it will be fixed in the future.
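Until the compiler elaborates such bounds, the practical workaround is simply to restate the verbose bound at every use site alongside T: HashVal. A self-contained sketch (u64 only, with the same placeholder constant):

```rust
struct Hasher;

trait HashDispatcher<T> {
    fn digest() -> T;
}

impl HashDispatcher<u64> for Hasher {
    fn digest() -> u64 { 42u64 }
}

// The where-clause constrains every implementor of HashVal...
trait HashVal: Sized
where
    Hasher: HashDispatcher<Self>,
{
}

impl<T: Sized> HashVal for T where Hasher: HashDispatcher<T> {}

// ...but it is not promoted to callers, so both bounds must be written out;
// `T: HashVal` alone does not compile.
fn get_hash<T: HashVal>() -> T
where
    Hasher: HashDispatcher<T>,
{
    <Hasher as HashDispatcher<T>>::digest()
}

fn main() {
    assert_eq!(get_hash::<u64>(), 42u64);
}
```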