[Extending Hexo For My Site] Part 0 - Preface

2022-01-01 | Series

I’ve been struggling to choose a handy tool for blogging about seven or eight years ago. Before the day I’ve tried building my own blog system using Django. It was great proudness and excitement to see the first “Hello World” post appeared in my browser, but soon I realized that was far from a ready-to-use product. The editor on the admin site was less functional than Sublime Text or VSCode, and sometimes buggy. The rendered content would mess up and out of my control from time to time. And most importantly, I had to pay for a VPS (or PaaS, still costly) to run the site. I was in high school at the time, and no much income for the bills. Too much trivia to care about just for a perfect writing experience. So I gave up.

It was then I read about the concept of static site generators. I love the idea that separates writing from post rendering and publishing. One will have enough freedom to pick the most suitable tool in either stage. No more need to endure the shitty web editors and I can embrace my favourite local ones. Also the renderer is highly customizable, plus lots of fabulous themes to choose from.

At this era you may recommend Hugo, but I chose Hexo then, partly because of the Node.js booming at that time. It was hardly said to be perfect initially, but as years passed I’ve made it much more handy, by developing plugins to meet my own requirements. I have bundled them in this repository hexo-enhanced. Some of them are short in source code, but greatly improve my experience during writing. I am going to open up a new series to share the story behind the plugin.

Part 1 - Better Mathjax Rendering

Debug a 'torch.tensor(1).cuda()' hanging

2021-12-14 | Tech

Today a user of our GPU cluster ran into a problem where executing python -c 'import torch; torch.tensor(1).cuda() would hang forever and could not be killed. The problem occured on a rather old Docker image (with torch == 0.4.0), and would disappear if newer images were used. It was caused by some far less known coincidents, which surprised me and I want to share in this post.

The Problem

The hanging program is spawned by following command:

/usr/bin/docker run --rm -u 1457:1457 \
    --gpus '"device='0,1,2,3'"' \
    -v /ghome/username:/ghome/username -v /gdata/username:/gdata/username \
    -it --ipc=host --shm-size 64G \
    -v /gdata1/username:/gdata1/username -v /gdata2/username:/gdata2/username \
    -e HOME=/ghome/username \
    -m 48G --memory-swap 48G --cpus 5 \
    --name username2 \
    bit:5000/deepo_9 \
    python3 -c 'import torch; torch.tensor(1).cuda()'

the Docker image bit:5000/deepo_9 he used was built with CUDA-9, while the host has multiple 1080Ti GPU cards and CUDA upgraded to 11.4. Looks like there’s some binary incompatibility, considering the fact that the problem would gone with newer images.

不自由的互联网

2021-11-23 | Soliloquy

互联网正在变得不自由——这已经成为了一种共识。这种不自由是从获取内容的角度出发，我们依自己的意愿获取知识/资讯的难度在增加。「中文互联网已死」。优质资源被各大互联网巨头所垄断，人们只能从应用内部访问，并时常受到推荐算法的说教。在巨头领域之外的公网则是一片荒漠，充斥着大量低质量、重复的内容，让秉承自主意志前来探索的人心寒。而在世界其他地方，情况可能没有这么严重，但类似的现象如巨头垄断也是存在的。互联网并不像十几年前我们所憧憬的那样。

Retrieve Contents over HTTP without curl or wget

2021-10-30 | Memo

I came across a piece of interesting vulnerable script from a post on V2EX ¹) on V2EX. A bash function named __curl inside the file retrieves contents over HTTP as a simple alternative for command curl or wget, in scenarios where no such utilities available.

#!/bin/bash
function __curl() {
  read proto server path <<<$(echo ${1//// })
  DOC=/${path// //}
  HOST=${server//:*}
  PORT=${server//*:}
  [[ x"${HOST}" == x"${PORT}" ]] && PORT=80

  exec 3<>/dev/tcp/${HOST}/$PORT
  echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
  (while read line; do
   [[ "$line" == $'\r' ]] && break
  done && cat) <&3
  exec 3>&-
}

The function makes use of certain less known features of Linux and the Bash language.

The first one is communicating over TCP through files. Linux employs a design philosophy of “everything are files”. One could find some special device files in directory /dev, through which we can manipulate the underlying devices. Specifically, manipulating a TCP socket connecting ${HOST}:${PORT} could be achieved by accessing device file /dev/tcp/${HOST}/${PORT}. Since HTTP is a text-based protocol over TCP, working with it is no more difficult than reading / writing a text file. Line exec 3<>$FILENAME opens file $FILENAME under read-write mode and binds it to descriptor 3. The next line then manually composes an HTTP payload and writes out to &3, which is in fact requesting the URL http ://${HOST}:${PORT}. By reading the same file descriptor, we retrieve the response content from the service. The trick serves as a primitive workaround for retrieving contents from web.

Another one is parameter substitution in Bash. The expression ${var//PATTERN/REPL} substitutes all occurrences of PATTERN in var into REPL. If REPL omitted, the matched substrings will be deleted. For example, in this script, ${1//// } would replace all slashes / into white spaces in variable $1.

References

Parameter Substitution

[收到条阿里云的告警，看不懂是做什么用的，请教一下 - V2EX](https://www.v2ex.com/t/811424

[Unravelling mocona] Part 1 - Verbosity or Anti-Pattern

2021-09-16 | Tech

I was once working as an intern at MSRA around two years ago, at which I joined a research project and started developing upon a large codebase. It’s a practice in ML research fields to adopt an existing code repository as codebase, instead of crafting everything from scratch. Such codebases usually come with convenient “infrastructures” , so researchers would not have to implement them once again, which could be time-wasting and error-prone. All we need is to write our models and losses, and put them into experiments.

The flow works just fine if you are proposing minor improvement on algorithms. The codebase provides an easy approach to prove and iterate your idea. But things would get worse if your work goes beyond it, especially touching the encapsulated infrastructures. Those convenient parts would constraint you and enforce your code into spaghetti.

[Unravelling mocona] Part 0 - Preface

2021-09-12 | Series

The early idea of hsfzxjy/mocona was come up with in late April. It was not until July that I figure out a reasonable design for the project. I finished most of my idea and released the first version approaching August. Nevertheless, there’s no chance for me to share the story behind the library. Now another month gone, it’s time to do some writing.

The series, as I planning, would cover the motivation of creating mocona, some technical details and usage, along with some critical thinking on the creative process. For whom interested in CPython internals or would like to extend the language, it is worth to read through.

Part 1 - Verbosity or Anti-Pattern

Understanding pickle in Python

2021-08-14 | Tech

The module pickle shipped in Python could be used for generic-purpose object serialization and de-serialization. It’s been widely adopted or recommended as backend in scenarios like persisting states or IPC.

Employed by many famous frameworks, though, the magic behind it still seems to be vague for daily users, especially guys fresh to the language. People come across “unpicklable” errors from time to time, but don’t know the reason; or re-invent state persistence by themselves, even if pickle could be competent. People sometimes write error-prone codes, merely because they are afraid of or unaware of pickle.

This post thus attempts to clarify the usage of pickle module in an easy understanding way, by answering three questions.

Rough Notes on Deploying Vaultwarden & NextCloud Bookmarks

2021-07-26 | Tech

I’ve been struggling for years on two things: synchronize passwords and blog posts I have read across devices. The problem kills me so much since my devices, an Android mobile, an Ubuntu laptop and an iPad, are less supported by big App companies. Aside, I want to gain control for all my data, so there should better exist a self-hosted solution. The problem are partially solved recently by deploying Vaultwarden and NextCloud on VPS. This blog post dictates the setup process and problems I met, in case anyone searching for this topic.

Install Vaultwarden and NextCloud on VPS

The two services are both luckily dockerized. To install there’s nothing more complicated than a command:

语言狂热者与实用主义者

2021-07-08 | Soliloquy

编程语言界有着一场旷日持久的争论。人们会为了一种（或一类）语言，或是自己熟用的，或是自己所欣赏的，与他人吵得不可开交。而争论的起点，可能只是某人一句小小的抱怨。各方各为其主，剑拔弩张，俨然一次声势浩大的圣战。

尽管众语言不可一概而论，我们还是可以粗略地将争论的人群分为两个派别：语言狂热者与实用主义者。这是光谱的两个极端。语言狂热者关注语言本身，或是钟情于新的、现代化的语言特性，并据此评判一种语言；实用主义者则侧重于语言的工程实践，常会以语言的生态、业界使用率反驳他人。当然，也不乏两者兼具的人，对双方的意见各持有一定比例。

Demystify the randomness in CUDA kernels

2021-06-11 | Tech

You might have heard that many CUDA operators contains some kind of non-determinism, and to eliminate the randomness, one must pay for the degradation of performance. The warning occurs many times in blog posts or framework documentation, but few of them give a detailed explanation for the source of randomness. To this end, the post is going to explore the problem.

When talked about GPU computation, one might come up with a notion of some super-fast hardwares. The surprising speed comes from intensive parallelism of the architecture, which allows users to run thousands of routines on parallel (compared to dozens on ordinary CPUs). The routines are called threads, and similar to the concept with the same name in operating systems, they suffer from non-deterministic execution order and data race condition.

Non-deterministic execution order means, if we arrange all instructions of different threads into a sequence, ordered by their occurrence time, the sequence could vary greatly across invocations. If two threads run on parallel, each with a single instruction, we cannot tell which one is executed first. This is the fundamental origin of randomness, and is inevitable.

Data race condition is one of the consequences of non-deterministic execution order. When the threads is manipulating some shared variables, and the manipulation is not atomic, i.e. consists of interruptible instruction sequence, the program might yield undesired results. Programs should be carefully designed to avoid race condition, with the help of locks or atomic operations. To alleviate, CUDA provides atomic arithmetic routines like atomicAdd() or atomicMax() for safe access to shared memory.

By far we have seen that there does exist some kind of randomness inside GPUs, and if not handled properly, our program will give incorrect results when working with shared variables. But one may argue that, we have atomic operations like atomicAdd(). If a program correctly sums up the same collection of numbers, although the order might be messed, it should always returns the same result. Sadly this is wrong, since some arithmetic operations DOES rely on the order of operands! Let’s take the following CUDA program as an example:

PREVNEXT