A Flaw of Promoting Complex Trait Bounds in Rust

Days ago, for some reason, I was trying to implement a function that can polymorphize over its return type. The solution is simple, but my brain was jammed at that time, trapped in some complicated typing tricks for hours.

During the struggling, I coincidently ran into something that is temporarily a flaw in the current Rust compiler implementation. In some cases, the compiler is not smart enough to promote known trait bounds, and we have to replicate them again and again. Although the problem is afterwards proved to be a useless “X-Y Problem”, I would still like to share the story.

Read More

Initialize Process Pool Worker with Individual Value

There could be scenes when you are using multiprocessing.pool.Pool and you want to perform some initialization for each worker before tasks are scheduled via Pool.map() or something alike. For example, you create a pool of 4 workers, each for one GPU, and expect tasks scheduled on Worker-i to precisely utilize GPU-i. In this case, Worker-i should be initialized with env var CUDA_VISIBLE_DEVICES=<i> set.

To initialize spawned workers, the constructor of Pool provides two arguments concerning the job initializer and initargs. initializer is expected to be a callable, and if specified, each worker process will call initializer(*initargs) when it starts.

import multiprocessing as mp
import multiprocessing.pool as mpp

def worker(arg1):

mpp.Pool(processes=2, initializer=worker, initargs=(42, ))
# 42
# 42

This is, however, slightly away from what we expect. The initializer is called with same arguments in each worker, while in our case, the arguments are expected to be different, like value 1 for Worker-0 and value 1 for Worker-1. There are two approaches to do the tricks.

Use a Queue

Queue and SimpleQueue types in module multiprocessing implement multi-producer, multi-consumer FIFO queues under the multi-processing scenario. We may create and share a queue among parent and worker processes, send individual values from parent processes and read them from workers. Since the sending and receiving operations are synchronized, we won’t run into any race conditions.

def worker(q):

q = mp.SimpleQueue()
p = mpp.Pool(processes=2, initializer=worker, initargs=(q,))
for i in range(2):
# 0
# 1

Use a Value

Alternatively, we may use a lighter shared object other than a queue. The Value type in module multiprocessing allows sharing simple values across multiple processes. It can also synchronize accesses to values to avoid race conditions if necessary. We can use a Value object to allocate an individual id for each worker process.

def worker(v):
with v.get_lock():
val = v.value
v.value += 1

v = mp.Value(ctypes.c_int32, 0, lock=True)
p = mpp.Pool(processes=2, initializer=worker, initargs=(v,))
# 0
# 1

Rust - Python FFI From Scratch

I was recently working on a side project that involves communication between binaries written in Rust and web interfaces written in Python. Moving a part of my project onto a language like Rust is under several considerations: 1) the logic is all about manipulating byte arrays, where Python has deficit and system language like Rust is superior; 2) the logic happens to be complicated, I need a static type system to ensure the correctness, and also the match expression of Rust is found helpful in getting things concise; 3) I was planning to develop CLI tools with Rust, which calls this fraction of functionality, and I don’t want to rewrite the stuff in the future.

Read More

[Extending Hexo For My Site] Part 1 - Better Mathjax Rendering

I am a heavy user of Mathjax. Mathjax is a library that renders Tex-compatible syntax into pretty equations in web scenarios. Hence I am always mixing up Markdown and Tex snippets in my writing. The annoying part is Tex snippets have low priority in my Markdown renderer, and are sometimes incorrectly rendered into Markdown elements. For instance, $a_1, a_2$ becomes $a<em>1, a</em>2$, where underscores within $...$ are mistakenly recognized as an emphasis element. A bunch of escaping is required to avoid the situation, which drives me mad. So I got to seek a permanant solution.

Read More

[Extending Hexo For My Site] Part 0 - Preface

I’ve been struggling to choose a handy tool for blogging about seven or eight years ago. Before the day I’ve tried building my own blog system using Django. It was great proudness and excitement to see the first “Hello World” post appeared in my browser, but soon I realized that was far from a ready-to-use product. The editor on the admin site was less functional than Sublime Text or VSCode, and sometimes buggy. The rendered content would mess up and out of my control from time to time. And most importantly, I had to pay for a VPS (or PaaS, still costly) to run the site. I was in high school at the time, and no much income for the bills. Too much trivia to care about just for a perfect writing experience. So I gave up.

It was then I read about the concept of static site generators. I love the idea that separates writing from post rendering and publishing. One will have enough freedom to pick the most suitable tool in either stage. No more need to endure the shitty web editors and I can embrace my favourite local ones. Also the renderer is highly customizable, plus lots of fabulous themes to choose from.

At this era you may recommend Hugo, but I chose Hexo then, partly because of the Node.js booming at that time. It was hardly said to be perfect initially, but as years passed I’ve made it much more handy, by developing plugins to meet my own requirements. I have bundled them in this repository hexo-enhanced. Some of them are short in source code, but greatly improve my experience during writing. I am going to open up a new series to share the story behind the plugin.

Table of Contents

Debug a 'torch.tensor(1).cuda()' hanging

Today a user of our GPU cluster ran into a problem where executing python -c 'import torch; torch.tensor(1).cuda() would hang forever and could not be killed. The problem occured on a rather old Docker image (with torch == 0.4.0), and would disappear if newer images were used. It was caused by some far less known coincidents, which surprised me and I want to share in this post.

The Problem

The hanging program is spawned by following command:

/usr/bin/docker run --rm -u 1457:1457 \
--gpus '"device='0,1,2,3'"' \
-v /ghome/username:/ghome/username -v /gdata/username:/gdata/username \
-it --ipc=host --shm-size 64G \
-v /gdata1/username:/gdata1/username -v /gdata2/username:/gdata2/username \
-e HOME=/ghome/username \
-m 48G --memory-swap 48G --cpus 5 \
--name username2 \
bit:5000/deepo_9 \
python3 -c 'import torch; torch.tensor(1).cuda()'

the Docker image bit:5000/deepo_9 he used was built with CUDA-9, while the host has multiple 1080Ti GPU cards and CUDA upgraded to 11.4. Looks like there’s some binary incompatibility, considering the fact that the problem would gone with newer images.

Read More



Read More

Retrieve Contents over HTTP without curl or wget

I came across a piece of interesting vulnerable script from a post on V2EX on V2EX. A bash function named __curl inside the file retrieves contents over HTTP as a simple alternative for command curl or wget, in scenarios where no such utilities available.

function __curl() {
read proto server path <<<$(echo ${1//// })
DOC=/${path// //}
[[ x"${HOST}" == x"${PORT}" ]] && PORT=80

exec 3<>/dev/tcp/${HOST}/$PORT
echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
(while read line; do
[[ "$line" == $'\r' ]] && break
done && cat) <&3
exec 3>&-

The function makes use of certain less known features of Linux and the Bash language.

The first one is communicating over TCP through files. Linux employs a design philosophy of “everything are files”. One could find some special device files in directory /dev, through which we can manipulate the underlying devices. Specifically, manipulating a TCP socket connecting ${HOST}:${PORT} could be achieved by accessing device file /dev/tcp/${HOST}/${PORT}. Since HTTP is a text-based protocol over TCP, working with it is no more difficult than reading / writing a text file. Line exec 3<>$FILENAME opens file $FILENAME under read-write mode and binds it to descriptor 3. The next line then manually composes an HTTP payload and writes out to &3, which is in fact requesting the URL http ://${HOST}:${PORT}. By reading the same file descriptor, we retrieve the response content from the service. The trick serves as a primitive workaround for retrieving contents from web.

Another one is parameter substitution in Bash. The expression ${var//PATTERN/REPL} substitutes all occurrences of PATTERN in var into REPL. If REPL omitted, the matched substrings will be deleted. For example, in this script, ${1//// } would replace all slashes / into white spaces in variable $1.


  1. Parameter Substitution

[Unravelling mocona] Part 1 - Verbosity or Anti-Pattern

I was once working as an intern at MSRA around two years ago, at which I joined a research project and started developing upon a large codebase. It’s a practice in ML research fields to adopt an existing code repository as codebase, instead of crafting everything from scratch. Such codebases usually come with convenient “infrastructures” , so researchers would not have to implement them once again, which could be time-wasting and error-prone. All we need is to write our models and losses, and put them into experiments.

The flow works just fine if you are proposing minor improvement on algorithms. The codebase provides an easy approach to prove and iterate your idea. But things would get worse if your work goes beyond it, especially touching the encapsulated infrastructures. Those convenient parts would constraint you and enforce your code into spaghetti.

Read More

[Unravelling mocona] Part 0 - Preface

The early idea of hsfzxjy/mocona was come up with in late April. It was not until July that I figure out a reasonable design for the project. I finished most of my idea and released the first version approaching August. Nevertheless, there’s no chance for me to share the story behind the library. Now another month gone, it’s time to do some writing.

The series, as I planning, would cover the motivation of creating mocona, some technical details and usage, along with some critical thinking on the creative process. For whom interested in CPython internals or would like to extend the language, it is worth to read through.

Table of Contents