Cython and Threads

Pure Python sucks in the scene of parallel computing, due to the existence of the Global Interpreter Lock (aka GIL). GIL prevents codes from different threads to access or manipulate the interpreter. The mechanism alleviates the risk of race condition, but makes multi-threading program “single-threaded” as well. Sadly, there’s no way to release the lock from pure Python.

OK. So what about not using pure Python? Shall we make an extension to bypass the mechanism? The answer is yes, and that’s what most of scientific libaries do.

As of writing extensions, Cython is a good choice, less verbose, and more similar to Python syntactically. In Cython, one can release GIL temporarily for a block using the with nogil: syntax. Will it better utilize multi-core CPU? Let’s have a try.

Read More


Obtain a Random Unused TCP Port with Bash

We might sometimes want to randomly choose an unused TCP port on Linux. This occurs from time to time on a server, when the administrator wants to expose an HTTP port for each user. Or, you just need an available port for IPC. Let’s make it happen with bash only.

The function below does the right job:

function unused_port() {
N=${1:-1}
comm -23 \
<(seq "1025" "65535" | sort) \
<(ss -Htan |
awk '{print $4}' |
cut -d':' -f2 |
sort -u) |
shuf |
head -n "$N"
}

We are going to step through the code next.

Read More


Information Theory: KL Divergence

Assume there are two hypotheses $H_1$ and $H_2$, r.v. $X$ ranged in alphabets $a_1,\ldots,a_k$. Under hypothesis $H_i$, $X$ has pdf $p(X=a_j|H_i)=p_i(a_j)$. According to Law of Total Probability, we have:

$$ p(H_i|a_k) = \frac{p(H_i)p_i(a_k)}{p_1(a_k)p(H_1)+p_2(a_k)p(H_2)} $$

The formula can be transformed into:

$$ \log \frac{p_2(a_k)}{p_1(a_k)} = \log \frac{p(H_2|a_k)}{p(H_1|a_k)} - \log \frac{p(H_2)}{p(H_1)} $$

which implies that, $\log \frac{p_2(a_k)}{p_1(a_k)}$ equals the difference of log likelihood ratio before and after conditioning $X=a_k$. We define $\log \frac{p_2(a_k)}{p_1(a_k)}$ be the discrimination information for $H_2$ over $H_1$, when $X=a_k$. The expectation of discrimination information is KL divergence, denoted as:

$$D_{KL}(P_2||P_1) = \sum_k p_2(a_k) \log \frac{p_2(a_k)}{p_1(a_k)} $$

which sometimes denoted as $I(p2,p1;X)$, or simply $I(p2,p1)$ if without ambiguity.

KL Divergence can be interpreted as a measure of expected information for $X$ gained after distribution shifted from $p_1$ to $p_2$, where $p_1$ and $p_2$ regarded as prior and post-prior distributions.

Read More


Information Theory: Entropy and Mutual Information

Given a discrete r.v. $X$, where $X$ ranged in ${a_1, \ldots, a_n}$, $\mathbb{P}(X=a_k)=p_k$. Entropy $H(X)$ is defined as:

$$H(X)= - \sum_k p_k \log p_k$$

When regarded as a function of ${p_k}$, entropy satisfies the following properties:

  1. $H(p_1,\ldots,p_n)$ is continuous, and non-negative;
  2. $H(p_1,\ldots,p_n)$ is convex w.r.t. $(p_1,\ldots,p_n)$;
  3. $H(p_1,\ldots,p_n)$ has a unique maxima $(\frac{1}{n},\ldots,\frac{1}{n})$;
  4. $H(n):=H(\frac{1}{n},\ldots,\frac{1}{n})$ increases along with $n$;
  5. $H(p_1,\ldots,p_n)=H(p_1+\ldots+p_k,p_{k+1},\ldots,p_n)+(p_1+\ldots+p_k)H(p_{k+1}’,\ldots,p_n’)$.

Property 5 is so-called addictivity. That is, if we observe $X$ in two steps, firstly obtaining a value from ${\hat{a},a_{k+1},\ldots,a_n}$ and then another value from ${a_1,\ldots,a_k}$ if $\hat{a}$ selected, the entropy of the whole system should be sum of these two subsystems.

Note that a function satisfying property 1, 4, 5 must have a form of $H(\vec{p})= - C \sum_k p_k \log p_k$, which reveals that entropy function is unique.

Entropy measures the uncertainty of a random value. Intuitively, entropy reaches its maximum $\log n$ when all alphabets occur with same probability, and likewise has a minimum of $0$ if $p_k=1$ for some $k$.

Entropy also represents the smallest average length to encode a message. Say we have a message consisting of alphabets $a_1,\ldots,a_n$, occurring with probability $p_1,\ldots,p_n$. Now we want to assign a code (an $N$-ary string) to each alphabet, with no two codes sharing a same prefix. The length of the codes are denoted as $l_1,\ldots,l_n$. Shannon’s source coding theroem states that the average code length $\sum_k p_k l_k$ could not be less than $H(p_1,\ldots,p_n)$ (taking $N$ as logarithm base).

Read More


铁板烧

最近喜欢上了那家铁板烧。在寸土寸金的北京,那家店占据了食街不大的一角。开放式的厨房几乎就是它的全部。一圈细长的大理石柜台,被椅子围得严严实实。人们就此坐下,点菜,颇有一番日式的味道。隔着不高的玻璃,可以看到三个大厨在柜台另一边忙碌。食材触到铁板的「吱」响,伴随时不时的爆裂声,白色的烟汽袅起,混合着诱人的香味,好不热闹。

入冬后,食街上的人更多了。日落后,室外温度急剧下降,这时来热闹的地方吃点热热的东西,再合适不过了。独自点上一份肉,一份菜,一碟满满的炒饭,吃得满面油光,奢侈而满足。推门而出,白天发生的不愉快,或是接下来要加班的怠惰,多少缓解了一些。但每次吃完,衣服上,头发上总会带上一股味道——铁板烧的味道,油烟的味道。

有人不在乎这味道,觉得这是烟火味,是相关料理的灵魂,是生活的气息。在以前,我是很痛恨这种说法的。吃火锅,或是烧烤时,高温食材逸散出的分子,几乎无孔不入,附在衣服上、头发上,就像被泼了一身油。每一回都要努力地洗澡,再将衣服由里至外换一遍,方可除去这种味道。我不能忍受这样的味道,藉而讨厌相关的料理。但近几年,这种想法似乎在慢慢消失,我开始能接受这种味道。尽管还是接受不了烧烤,但这种铁板烧倒是可以,甚至慢慢喜欢上了。

讨厌烟火味,和睡前一定要洗澡一样,这些观念的背后似乎有着奇妙的机理。人们小时候的经历,或是大人的说教,或是自身的负面遭遇,仿佛在主导这些观念。怕小鸡的以前踩死过小鸡,不敢吃鱼的是被鱼刺卡过。但我的这个观念好像是自发产生的,没有听从任何人表达过这样的观点,而是打心底里就讨厌这种味道。

如此根深蒂固的观念,又是如何被改变的呢?大概是太忙碌,忙得无暇顾及这类事。烟汽熏天的铁板烧,是我不可多得的闲暇时光。生活于生存之上,生存尚不能满足,又谈何更高层次的吹毛求疵?

一个习惯的改变,或者是一种生活方式的丧失。


西郊线

西郊线真的是电车,那种拖着两个辫子,走得不快的电车。沿途的车站也是很小的车站。不到两米宽的站台,甚至比广州的BRT站更窄一些,立了一排凳子,仅此而已。半露天的车站,有遮雨棚,却没有真正意义的墙。透过一人多高的玻璃看,站台外沿长满了狗尾巴草,再往后铺开了大片的草甸和桦林,一直延伸到远处的香山,和仿佛水洗后的天空。午前的秋阳给眼前的一切都染上了愉悦的暖色。

Read More


Proof of the Gumbel Max Trick

Statement

Assume that $\alpha_1, \alpha_2, \ldots, \alpha_n$ satisify $\sum_k\alpha_k=1$. Define

$$Z=\arg\max_k\{\log\alpha_k+G_k\}$$

where $G_1,\ldots,G_n \text{ i.i.d.}\sim Gumbel(0,1)$, whose PDF and CDF are defined as

$$\begin{align} f(x)&=e^{-(x+e^{-x})} \\ F(x)&=e^{-e^{-x}}\end{align}$$

. Then $\mathbb{P}(Z=k)=\alpha_k$.

Proof

Set $u_k=\log{\alpha_k}+G_k$. We prove by direct calculations.

$$\begin{align} \mathbb{P}(Z=k)&=\mathbb{P}(u_k \geq u_j,\forall j \neq k) \\ &=\int_{-\infty}^\infty \mathbb{P}(u_k \geq u_j, \forall j \neq k|u_k)\mathbb{P}(u_k) du_k \\ &=\int_{-\infty}^\infty \prod_{j\neq k}\mathbb{P}(u_k \geq u_j|u_k)\mathbb{P}(u_k) du_k \\ &=\int_{-\infty}^\infty \prod_{j\neq k}e^{-e^{-u_k+\log \alpha_j}} e^{-(u_k-\log\alpha_k+e^{-(u_k-\log\alpha_k)})} du_k \\ &=\int_{-\infty}^\infty e^{-\sum_{j\neq k}\alpha_je^{-u_k}} \alpha_k e^{-(u_k+\alpha_k e^{-u_k})} du_k \\ &=\alpha_k \int_{-\infty}^\infty e^{-u_k-(\alpha_k+\sum_{j\neq k}\alpha_j)e^{-u_k}} du_k \\ &= \alpha_k \end{align}$$.

Application

The trick is commonly used in DL to make sampling over a discrete distribution differentiable.

References


Option::as_ref

Let’s consider the following function:

use std::ptr::NonNull;

fn transform<T>(option: &Option<NonNull<T>>) -> Option<&T> {
option.map(|x| unsafe { x.as_ref() })
}

The function transform takes an Option<NonNull<T>> as input, and convert the inner pointer to immutable reference &T if available. The method NonNull::as_ref() is marked unsafe so we need an unsafe block. The snippet causes an compilation error:

Read More


Rc and RefCell tricks

Say we need a type Cursor<T> , which contains a mutable reference to T . The type has a method, say .dup() , which returns a new instance of Cursor<T> , while sharing the interior reference across instances. Such design is a common pattern in database driver library. Users might expect to hold multiple cursors simultaneously, each of which owns a reference to the same connection object.

A possible implementation might look like:

struct Cursor<'a, T> {
obj: &'a mut T,
}

impl<'a, T> Cursor<'a, T> {
fn new(t: &'a mut T) -> Cursor<'a, T> {
Cursor { obj: t }
}

fn dup(&mut self) -> Cursor<T> {
Cursor { obj: self.obj }
}
}

fn main() {
let mut i = 1;
let mut cursor_a = Cursor::new(&mut i);
let _cursor_b = cursor_a.dup();
}

Perfect and neat, and luckily Rust compiler did not complain. As a freshman in Rust, it’s not easy to keep compiler silent all the time, especially when playing with references. But once I try to chain up the constructor and .dup() , the compiler screams:

Read More


三月十日杂感

我还是说出了那句话。我们走到一起,已经有七天了。

起初并没有表白的想法。当晚看完电影后,两人都有微妙的感觉,但直到最后也没有表露心意。「就当做又一次平常的约电影好了」目送她进楼时我想。看电影时没有说出来,她回去了我就更说不出来了——我认为我们当时还是普通朋友关系,表白显得太唐突了。我兴奋又失落地回到宿舍。兴奋的是我们之间的关系貌似又近了一些,失落的是这样下去大概率是没有结果的。

但当晚,她接连发了几条动态,内容和语气都暗示了她对我的好感,只是和我一样,缺乏勇气。两个相互有好感却又没有恋爱经验的人(当然这是我后来知道的),就要这样还未开始便结束了彼此的关系,这怕是上天对胆小鬼的惩罚吧。

Read More