Diving from the CUDA Error 804 into a bug of libnvidia-container

en

Several users reported to encounter "Error 804: forward compatibility was attempted on non supported HW" during the usage of some customized PyTorch docker images on our GPU cluster.

At first glance I recognized the culprit to be a version mismatch between installed driver on the host and required driver in the image. The corrupted images as they described were built targeting CUDA == 11.3 with a corresponding driver version == 465 , while some of our hosts are shipped with driver version 460. As a solution I told them to downgrade the targeting CUDA version by choosing a base image such as nvidia/cuda:11.2.0-devel-ubuntu18.04, which indeed well solved the problem.

But later on I suspected the above hypothesis being the real cause. An observed counterexample was that another line of docker images targeting even higher CUDA version would run normally on those hosts, for example, the latest ghcr.io/pytorch/pytorch:2.0.0-devel built for CUDA == 11.7. This won’t be the case if CUDA version mismatch truly matters.

Afterwards I did a bit of research concerning the problem and learnt some interesting stuff which this post is going to share. In short, the recently released minor version compatibility allows applications built for newer CUDA to run on machines with some older drivers, but libnvidia-container doesn’t correcly handle it due to a bug and eventually leads to such an error.

Towards thorough comprehension, this post will first introduce the constitution of CUDA components, following with the compatibility policy of different components, and finally unravel the bug and devise a workaround for it. But before diving deep, I’ll give two Dockerfile samples to illustrate the problem.

READ MORE

Modern Cryptography, GPG and Integration with Git(hub)

en

GPG (the GNU Privacy Guard) is a complete and free implementation of the OpenPGP standard. Based on various mature algorithms to select from, GPG acts as a convenient tool for daily cryptographic communication.

GPG has two primary functionalities: (1) it encrypts and signs your data for secure transfering and verifiable information integrity, and (2) it features a versatile key management system to construct and promote web of trust. GPG also has a well-designed command line interface for easy integration with other applications such as git.

This article is going to briefly elaborate some key concepts and usage of GPG, and then present demonstration to cryptographically sign git commits with the help of GPG.

READ MORE

Move the Root Partition of Ubuntu

en

Some days ago, I made the decision to shrink the footprint of Windows system on my laptop and reallocate the disk space to the Ubuntu system that resides next to it. Ubuntu is competent for my daily use of programming and web browsing so I hardly launched the OEM-shipped Windows since the laptop was bought. The Windows takes up a not-so-small portion of my SSD space, which can be better utilized instead of wasted in vain.

#gk:immersive
(before)
| --- Windows C: (256 GB) --- | --- Ubuntu / (256 GB) --- |
(after)
| --- Windows C: (120 GB) --- | --- Ubuntu / (392 GB) --- |
READ MORE

A New Programmer Kicks a Roadblock

en

The time I composed my first program can be back to my junior high school age. It was the first day of PC lesson, and everybody crowded to the computer classroom. We were told to learn “programming” there. The kids who were talented would be selected and trained for OI . Others instead would go to an ordinary class and learn something more general.

I was anxious. Before the time I had no concept of what “programming” is, nor had I ever gone through a real PC lesson. The PC lesson in my primary school barely taught anything. Over the time the teachers let us play games instead. I could type merely a dozen of characters per minute, since I’d never received a thorough typing training. I was ignorant of inside the metal box. I was a complete computer idiot.

READ MORE

Reversy Naming

en

I am always a dedicated fan of writing naturally readable code – by “naturally readable” I mean, one can read a line of code as if it were a sentence of English (or maybe other human languages). It’s believed that the practice encourages more self-explainable code, as the code reads more like a human-composed article, instead of some gibberish only recognizable by machine.

The practice recommends to name functions or variables following the word order of human language, for English that is, subjects come after verbs, and adjectives go before nouns that being modified. The samples below showcase how it guides naming in a program (please hold your opinions about the casing)

  • append_to_list(lst, item). A function that appends an item to a list, which can read as “append to the list (specified by name lst) with the item”.
  • register_service_notifier(func). A function that registers another function as a service notifier, which can read as “register a service notifier with the function func“.
  • UserFollowersListView. The name of a web component which is a list view to display followers for a user.

It plays well and improves my developing experience most of the time, but there is no silver bullet, just like other practices or guidelines.Sometimes I found the readability even degrades. I kept skimming the lines and just couldn’t locate an item efficiently.

READ MORE

Invalid Golang Pointers Can Bite You Even If You Don't Dereference

en

In Golang, if you coerce a uintptr variable into unsafe.Pointer (or further, to some *T), the linter will warn with the message "possible miuse of unsafe.Pointer". This makes sense because the uintptr variable may contain an address that points to a piece of invalid memory, and dereferencing such a pointer is catastrophic (usually aborts the program).

I was always aware of the above discipline, but I thought it would be OK to hold the pointers but not dereference them. This is true in C/C++, but not for Golang, which I did not realize until recently.

In fact, the program can panic even if you just keep an invalid pointer on the stack!

READ MORE

A Flaw of Promoting Complex Trait Bounds in Rust

en

Days ago, for some reason, I was trying to implement a function that can polymorphize over its return type. The solution is simple, but my brain was jammed at that time, trapped in some complicated typing tricks for hours.

During the struggling, I coincidently ran into something that is temporarily a flaw in the current Rust compiler implementation. In some cases, the compiler is not smart enough to promote known trait bounds, and we have to replicate them again and again. Although the problem is afterwards proved to be a useless “X-Y Problem”, I would still like to share the story.

READ MORE

Initialize Process Pool Worker with Individual Value

en

There could be scenes when you are using multiprocessing.pool.Pool and you want to perform some initialization for each worker before tasks are scheduled via Pool.map() or something alike. For example, you create a pool of 4 workers, each for one GPU, and expect tasks scheduled on Worker-i to precisely utilize GPU-i. In this case, Worker-i should be initialized with env var CUDA_VISIBLE_DEVICES=<i> set.

To initialize spawned workers, the constructor of Pool provides two arguments concerning the job 1initializer and initargs. initializer is expected to be a callable, and if specified, each worker process will call initializer(*initargs) when it starts.

import multiprocessing as mp
import multiprocessing.pool as mpp

def worker(arg1):
print(arg1)

mpp.Pool(processes=2, initializer=worker, initargs=(42, ))
# 42
# 42

This is, however, slightly away from what we expect. The initializer is called with same arguments in each worker, while in our case, the arguments are expected to be different, like value 1 for Worker-0 and value 1 for Worker-1. There are two approaches to do the tricks.

Use a Queue

Queue and SimpleQueue types in module multiprocessing 2 implement multi-producer, multi-consumer FIFO queues under the multi-processing scenario. We may create and share a queue among parent and worker processes, send individual values from parent processes and read them from workers. Since the sending and receiving operations are synchronized, we won’t run into any race conditions.

def worker(q):
print(q.get())

q = mp.SimpleQueue()
p = mpp.Pool(processes=2, initializer=worker, initargs=(q,))
for i in range(2):
q.put(i)
p.close()
# 0
# 1

Use a Value

Alternatively, we may use a lighter shared object other than a queue. The Value type in module multiprocessing 3 allows sharing simple values across multiple processes. It can also synchronize accesses to values to avoid race conditions if necessary. We can use a Value object to allocate an individual id for each worker process.

def worker(v):
with v.get_lock():
val = v.value
v.value += 1
print(val)

v = mp.Value(ctypes.c_int32, 0, lock=True)
p = mpp.Pool(processes=2, initializer=worker, initargs=(v,))
p.close()
# 0
# 1
READ MORE

Rust - Python FFI From Scratch

en

I was recently working on a side project that involves communication between binaries written in Rust and web interfaces written in Python. Moving a part of my project onto a language like Rust is under several considerations: 1) the logic is all about manipulating byte arrays, where Python has deficit and system language like Rust is superior; 2) the logic happens to be complicated, I need a static type system to ensure the correctness, and also the match expression of Rust is found helpful in getting things concise; 3) I was planning to develop CLI tools with Rust, which calls this fraction of functionality, and I don’t want to rewrite the stuff in the future.

READ MORE

[Extending Hexo For My Site] Part 1 - Better Mathjax Rendering

en

I am a heavy user of Mathjax. Mathjax is a library that renders Tex-compatible syntax into pretty equations in web scenarios. Hence I am always mixing up Markdown and Tex snippets in my writing. The annoying part is Tex snippets have low priority in my Markdown renderer, and are sometimes incorrectly rendered into Markdown elements. For instance, $a_1, a_2$ becomes $a1, a2$, where underscores within $...$ are mistakenly recognized as an emphasis element. A bunch of escaping is required to avoid the situation, which drives me mad. So I got to seek a permanant solution.

READ MORE