Obtain a Random Available TCP Port with Bash

2021-03-10 | Tech

On Linux, we sometimes want to pick an unused TCP port at random. This comes up from time to time on a server, for example when an administrator wants to expose an HTTP port for a user, or when you just need a free port for IPC. Let’s make it happen with a short bash function built from standard command-line tools.

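function unused_port() {
    # Number of ports to print (default: 1).
    local N=${1:-1}
    # Candidate ports minus the local ports of existing TCP sockets, shuffled.
    # comm needs both inputs sorted the same (lexical) way.
    comm -23 \
        <(seq "1025" "65535" | sort) \
        <(ss -Htan |
            # Keep only what follows the last ':' of the local address,
            # so IPv6 addresses such as [::1]:631 are handled as well.
            awk '{sub(/.*:/, "", $4); print $4}' |
            sort -u) |
        shuf |
        head -n "$N"
}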

We will take the function apart step by step in the following paragraphs.

Information Theory: KL Divergence

2020-01-15 | Tech

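Assume there are two hypotheses $H_1$ and $H_2$, and a random variable $X$ taking values in the alphabet $\{a_1,\ldots,a_n\}$. Under hypothesis $H_i$, $X$ has probability mass function $p(X=a_j|H_i)=p_i(a_j)$. By Bayes' theorem, with the law of total probability expanding the denominator, we have for any symbol $a_k$: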

$$ p(H_i|a_k) = \frac{p(H_i)p_i(a_k)}{p_1(a_k)p(H_1)+p_2(a_k)p(H_2)} $$

Writing this out for $i=2$ and $i=1$ and dividing, the common denominator cancels; taking logarithms and rearranging gives:

$$ \log \frac{p_2(a_k)}{p_1(a_k)} = \log \frac{p(H_2|a_k)}{p(H_1|a_k)} - \log \frac{p(H_2)}{p(H_1)} $$

which says that $\log \frac{p_2(a_k)}{p_1(a_k)}$ equals the change in the log-odds of $H_2$ against $H_1$ caused by observing $X=a_k$ (posterior log-odds minus prior log-odds). We define $\log \frac{p_2(a_k)}{p_1(a_k)}$ to be the discrimination information for $H_2$ over $H_1$ when $X=a_k$. Its expectation under $p_2$ is the KL divergence, denoted as:

$$D_{KL}(P_2||P_1) = \sum_k p_2(a_k) \log \frac{p_2(a_k)}{p_1(a_k)} $$

which is sometimes denoted as $I(p_2,p_1;X)$, or simply $I(p_2,p_1)$ when there is no ambiguity.

KL divergence can be interpreted as the expected information about $X$ gained when the distribution shifts from $p_1$ to $p_2$, where $p_1$ and $p_2$ are regarded as the prior and posterior distributions.
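As a small numerical illustration of the definition, here is a minimal sketch in Haskell. It assumes natural logarithms (so the result is in nats) and that both distributions are given as lists in the same symbol order; the name klDivergence and the sample distributions are made up for this example.

```haskell
-- D_KL(p2 || p1) for two pmfs over the same finite alphabet, in nats.
-- Terms with p2(a) = 0 contribute 0 by the usual convention.
klDivergence :: [Double] -> [Double] -> Double
klDivergence p2 p1 = sum [ q * log (q / p) | (q, p) <- zip p2 p1, q > 0 ]

main :: IO ()
main = do
  let p1 = [0.5, 0.5]   -- "prior": a fair coin
      p2 = [0.9, 0.1]   -- "posterior": a biased coin
  print (klDivergence p2 p1)   -- about 0.368
  print (klDivergence p1 p2)   -- about 0.511
  print (klDivergence p1 p1)   -- 0.0
```

The first two results differ, which shows that KL divergence is not symmetric in its arguments.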

Information Theory: Entropy and Mutual Information

2020-01-04 | Tech

Let $X$ be a discrete random variable taking values in $\{a_1, \ldots, a_n\}$, with $\mathbb{P}(X=a_k)=p_k$. The entropy $H(X)$ is defined as:

$$H(X)= - \sum_k p_k \log p_k$$

When regarded as a function of $\{p_k\}$, entropy satisfies the following properties:

1. $H(p_1,\ldots,p_n)$ is continuous and non-negative;
2. $H(p_1,\ldots,p_n)$ is concave w.r.t. $(p_1,\ldots,p_n)$;
3. $H(p_1,\ldots,p_n)$ attains its unique maximum at $(\frac{1}{n},\ldots,\frac{1}{n})$;
4. $H(n):=H(\frac{1}{n},\ldots,\frac{1}{n})$ increases with $n$;
5. $H(p_1,\ldots,p_n)=H(p_1+\ldots+p_k,p_{k+1},\ldots,p_n)+(p_1+\ldots+p_k)H(p_1',\ldots,p_k')$, where $p_i'=p_i/(p_1+\ldots+p_k)$.

Property 5 is the so-called additivity. That is, if we observe $X$ in two steps, first obtaining a value from $\{\hat{a},a_{k+1},\ldots,a_n\}$ and then, if $\hat{a}$ was selected, another value from $\{a_1,\ldots,a_k\}$, the entropy of the whole system is the entropy of the first step plus the entropy of the second step weighted by the probability that it takes place.

Note that a function satisfying properties 1, 4 and 5 must have the form $H(\vec{p})= - C \sum_k p_k \log p_k$ with $C>0$, so the entropy function is unique up to a constant factor (equivalently, up to the choice of logarithm base).

Entropy measures the uncertainty of a random variable. Intuitively, entropy reaches its maximum $\log n$ when all symbols occur with the same probability, and its minimum $0$ when $p_k=1$ for some $k$.

Entropy also represents the smallest average length needed to encode a message. Say we have a message consisting of symbols $a_1,\ldots,a_n$, occurring with probabilities $p_1,\ldots,p_n$. Now we want to assign a codeword (an $N$-ary string) to each symbol, such that no codeword is a prefix of another. The lengths of the codewords are denoted $l_1,\ldots,l_n$. Shannon’s source coding theorem states that the average code length $\sum_k p_k l_k$ cannot be less than $H(p_1,\ldots,p_n)$ (taking $N$ as the logarithm base).
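The following sketch checks two of the statements above on one concrete distribution. It assumes a binary code ($N=2$); the distribution, the prefix code 0, 10, 110, 111 and the helper names entropy and averageLength are chosen only for illustration. For this dyadic distribution the average code length meets the entropy bound, and the two-step (additivity) decomposition with $k=2$ gives the same entropy.

```haskell
-- Shannon entropy of a pmf in the given logarithm base.
-- Zero-probability symbols contribute nothing.
entropy :: Double -> [Double] -> Double
entropy base ps = negate (sum [ p * logBase base p | p <- ps, p > 0 ])

-- Average codeword length for a pmf and the matching codeword lengths.
averageLength :: [Double] -> [Int] -> Double
averageLength ps ls = sum (zipWith (\p l -> p * fromIntegral l) ps ls)

main :: IO ()
main = do
  let ps = [0.5, 0.25, 0.125, 0.125]
      ls = [1, 2, 3, 3]            -- lengths of the codewords 0, 10, 110, 111
  print (entropy 2 ps)             -- about 1.75 bits
  print (averageLength ps ls)      -- 1.75
  -- Two-step observation: merge the first two symbols, then split them again.
  let s   = 0.5 + 0.25
      lhs = entropy 2 ps
      rhs = entropy 2 [s, 0.125, 0.125] + s * entropy 2 [0.5 / s, 0.25 / s]
  print (abs (lhs - rhs) < 1e-12)
```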

Proof of the Gumbel Max Trick

2019-08-01 | Tech

Statement

Assume that $\alpha_1, \alpha_2, \ldots, \alpha_n > 0$ satisfy $\sum_k\alpha_k=1$. Define

$$Z=\arg\max_k\{\log\alpha_k+G_k\}$$

where $G_1,\ldots,G_n \text{ i.i.d.}\sim Gumbel(0,1)$, whose PDF and CDF are defined as

$$\begin{align} f(x)&=e^{-(x+e^{-x})} \\ F(x)&=e^{-e^{-x}}\end{align}$$

Then $\mathbb{P}(Z=k)=\alpha_k$.

Proof

Set $u_k=\log{\alpha_k}+G_k$. We prove the claim by direct calculation.

$$\begin{align} \mathbb{P}(Z=k)&=\mathbb{P}(u_k \geq u_j,\forall j \neq k) \\ &=\int_{-\infty}^\infty \mathbb{P}(u_k \geq u_j, \forall j \neq k|u_k)\mathbb{P}(u_k) du_k \\ &=\int_{-\infty}^\infty \prod_{j\neq k}\mathbb{P}(u_k \geq u_j|u_k)\mathbb{P}(u_k) du_k \\ &=\int_{-\infty}^\infty \prod_{j\neq k}e^{-e^{-u_k+\log \alpha_j}} e^{-(u_k-\log\alpha_k+e^{-(u_k-\log\alpha_k)})} du_k \\ &=\int_{-\infty}^\infty e^{-\sum_{j\neq k}\alpha_je^{-u_k}} \alpha_k e^{-(u_k+\alpha_k e^{-u_k})} du_k \\ &=\alpha_k \int_{-\infty}^\infty e^{-u_k-(\alpha_k+\sum_{j\neq k}\alpha_j)e^{-u_k}} du_k \\ &= \alpha_k \end{align}$$.
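The last equality can be checked directly. Because $\sum_j\alpha_j=1$, the remaining integrand is exactly the $Gumbel(0,1)$ density $f$, which integrates to one. Equivalently, substituting $t=e^{-u_k}$ (so that $dt=-e^{-u_k}du_k$):

$$\int_{-\infty}^\infty e^{-u_k-e^{-u_k}}\,du_k=\int_0^\infty e^{-t}\,dt=1$$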

Application

The trick is commonly used in deep learning to sample from a discrete distribution. Replacing the arg max with a softmax (the Gumbel-softmax relaxation) makes the sampling step differentiable.


Option::as_ref

2019-06-26 | Tech

Let’s consider the following function:

use std::ptr::NonNull;

fn transform<T>(option: &Option<NonNull<T>>) -> Option<&T> {
    option.map(|x| unsafe { x.as_ref() })
}

The function transform takes a reference to an Option<NonNull<T>> as input, and converts the inner pointer to an immutable reference &T if possible. The method NonNull::as_ref() is marked unsafe, so we need an unsafe block. The snippet causes a compilation error:

Rc, RefCell and Interior Mutability

2019-06-23 | Tech

Say we need a type Cursor<T>, which holds a mutable reference to T. A method .dup() duplicates the internal reference, wraps it in a new instance of Cursor<T> and returns it. This pattern is common in database driver libraries: users can hold multiple cursors simultaneously, each owning a (mutable) reference to the same connection object.

One might implement it with a plain mutable reference:

struct Cursor<'a, T> {
    obj: &'a mut T,
}

impl<'a, T> Cursor<'a, T> {
    fn new(t: &'a mut T) -> Cursor<'a, T> {
        Cursor { obj: t }
    }

    fn dup(&mut self) -> Cursor<T> {
        Cursor { obj: self.obj }
    }
}

fn main() {
    let mut i = 1;
    let mut cursor_a = Cursor::new(&mut i);
    let _cursor_b = cursor_a.dup();
}

Perfect and neat, and luckily the Rust compiler does not complain. Fresh Rustaceans often have to work hard to satisfy the compiler, especially when fighting with references.

The invocations of ::new() and .dup() are on separate lines. Now what if we chain the constructor and .dup()? This time the compiler fails:

Haskell Notes: State Monad

2018-12-15 | Tech

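A pseudo-function f' :: a -> b that depends on an external state s can be rewritten as f :: a -> s -> (b, s) to make it well-defined; that is, the state s is passed explicitly through the input and the output. Now we want to use a Monad to hide this state passing.

Note that the final state s in the output (b, s) depends not only on the input state, but also on the whole sequence of functions that have modified the state before, and on their logic. We therefore cannot simply define the Monad as something like (a, s); otherwise the result of combining two functions with >=> would be independent of the functions' logic, which is not what we expect.

Consider the following definition:

newtype State s a = State { runState :: s -> (a, s) }

Because -> is right-associative, f :: a -> s -> (b, s) and f :: a -> State s b are equivalent (up to the newtype wrapper). With s fixed, State s can be made into a Monad. A value of type State s a is often called a state processor.

Now let us try to define (>>=) :: State s a -> (a -> State s b) -> State s b. In p >>= f, p carries all the state-processing logic that came before; we want to fuse the logic of p and f into a new state processor and return it.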

p >>= f = State $ \s ->
    let (a, s')  = runState p s
        p2       = f a            -- :: State s b
        (b, s'') = runState p2 s'
    in  (b, s'')

return is trivial:

return a = State $ (\s -> (a, s))

fmap can be defined as follows:

fmap :: (a -> b) -> (State s a) -> (State s b)
fmap f = \pIn -> State $ \s ->
    let (a, s') = runState pIn s
        b       = f a
    in  (b, s')

In this way, we can chain a series of functions that depend on external state into a single function that depends on external state; feeding it an initial state then yields the result.
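The pieces above can be assembled into a complete program. The sketch below is one way to do it, not the definition used by any library: it adds the Functor and Applicative instances that current GHC requires before a Monad instance can be declared, and the helpers tick and label are hypothetical names introduced only to have something to run.

```haskell
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f p = State $ \s ->
    let (a, s') = runState p s
    in  (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  pf <*> pa = State $ \s ->
    let (f, s')  = runState pf s
        (a, s'') = runState pa s'
    in  (f a, s'')

instance Monad (State s) where
  p >>= f = State $ \s ->
    let (a, s') = runState p s
    in  runState (f a) s'

-- Return the current counter value and increment the counter.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

-- Attach a fresh number to an item; the counter is threaded implicitly.
label :: String -> State Int (Int, String)
label x = do
  n <- tick
  return (n, x)

main :: IO ()
main = do
  let (labelled, final) = runState (mapM label ["a", "b", "c"]) 0
  print labelled   -- [(0,"a"),(1,"b"),(2,"c")]
  print final      -- 3
```

No function here mentions the counter in its arguments; the initial state 0 is supplied once, at the call to runState.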

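Haskell Notes: An Introduction to Monads

2018-12-14 | Tech

Motivation

Pure functions look perfect, but they cannot model many real-world tasks. This is because a pure function is a well-defined mapping: for a given input it returns a unique output. This model falls short when facing tasks such as:

Tasks that may fail, such as most IO.
Tasks that depend on external state, such as a (pseudo-)random number generator.
Non-deterministic tasks, i.e. tasks that may produce several outputs for a given input. These are relatively rare in imperative programming (IP).
Tasks that affect the outside world, such as most write operations.

These problems can be solved with the mathematical technique of domain extension.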

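Domain Extension

In mathematics, when the setting in which a problem is posed is too small to contain its solution, we usually extend that setting. A similar technique can be applied here.

Suppose f: A -> B is not a well-defined function:

If f may fail, we can extend B to Err(B) = B ∪ { reasons of failures }, where the reasons of failures may be descriptions of the exception, or something like a null value. Then f': A -> Err(B) is a well-defined mapping that behaves like f. In fact, this is the Maybe Monad and the Either Monad.
If f depends on external state, we define Pref(B) as the set of all mappings from the external state space to B. Then f': A -> Pref(B) is a well-defined mapping that behaves like f. In other words, for a given input a, f'(a) returns a function that captures how to obtain the result from each possible state once a is known. In fact, this is the State Monad.
If f is non-deterministic, we extend B to Power(B), the power set of B. Then f': A -> Power(B) is a well-defined mapping that behaves like f. In fact, this is the List Monad.
If f depends on the real world, we extend B to IO(B), whose elements are pseudo-functions with range B that may affect the real world. These pseudo-functions are outside the realm of pure functions, but there is no problem in treating them as elements. Then f': A -> IO(B) is a well-defined mapping that behaves like f. In fact, this is the IO Monad.

All of the above have one thing in common: the range of a function that is not well-defined is extended so that the function becomes well-defined. Described in Haskell, they all have a similar type: f :: a -> m b, where m is the extension rule.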
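To make the shape a -> m b concrete, here is a small sketch of three such functions, one for each of the first three extensions (failure without a reason, failure with a reason, and non-determinism). The function names are hypothetical and chosen only for illustration.

```haskell
-- May fail: the range Int is extended with Nothing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- May fail with a reason: the range is extended with error messages.
checkedDiv :: Int -> Int -> Either String Int
checkedDiv _ 0 = Left "division by zero"
checkedDiv x y = Right (x `div` y)

-- Non-deterministic: every integer square root is returned.
squareRoots :: Int -> [Int]
squareRoots n = [ r | r <- [negate n .. n], r * r == n ]

main :: IO ()
main = do
  print (safeDiv 6 3)      -- Just 2
  print (safeDiv 6 0)      -- Nothing
  print (checkedDiv 6 0)   -- Left "division by zero"
  print (squareRoots 4)    -- [-2,2]
  print (squareRoots 3)    -- []
```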

A question follows: how should such new functions be combined? To answer it we need to abstract the relevant logic. This is the Monad.

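Monad

Here we start from practical needs and try to derive the necessary conditions for a type constructor to be a Monad.

We fix two names:

A function of type a -> m b is called a monadic function.
A function of type a -> b is called a non-monadic function.

The first problem to solve is how monadic functions are combined. This problem is of great practical importance: a monadic function often represents some computational task, and combining such functions amounts to running several tasks in sequence, which is a very common need.

We want an operator of type (b -> m c) -> (a -> m b) -> (a -> m c), written here as >=> (because of its shape it is often called the fish operator; note that with this argument order it corresponds to <=< in Control.Monad, whose >=> takes its arguments in the opposite order). A natural idea is that the Monad m needs some trivial unboxing operation extract' :: m a -> a. By “trivial” we mean that extract' should not lose any information about its argument. But this usually cannot be done, because m a typically carries more information than a, so extract' cannot be a well-defined mapping. For example, the value Nothing in Maybe a has no corresponding value in a.

In fact, we do not need such a strong unboxing operation. When m is already a Functor, the unboxing operation can be weakened to join :: m (m a) -> m a. Let us try to build >=> from fmap and join.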

f :: b -> m c
g :: a -> m b

fmap f :: m b -> m (m c)
(fmap f) . g :: a -> m (m c)
join . (fmap f) . g :: a -> m c

-- i.e.

f >=> g = join . (fmap f) . g

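The Functor assumption is easy to satisfy. Of course we may define several different fmaps, and the resulting Monads will have different semantics. The join assumption is also easy to satisfy, since m (m a) usually carries as much information as m a. So this approach is practical.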
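The construction can be tried out on Maybe. In the sketch below, composeM is a hypothetical name for the operator built from fmap and join with the argument order used in the derivation (the first argument runs second), and safeRecip and safeSqrt are made-up monadic functions. The last line compares the result with the library operator <=< from Control.Monad, which has the same argument order.

```haskell
import Control.Monad (join, (<=<))

-- Composition of monadic functions using only fmap and join.
composeM :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
composeM f g = join . fmap f . g

safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

main :: IO ()
main = do
  let h = safeSqrt `composeM` safeRecip   -- reciprocal first, then square root
  print (h 4)      -- Just 0.5
  print (h 0)      -- Nothing (the reciprocal fails)
  print (h (-4))   -- Nothing (the square root fails)
  print (h 4 == (safeSqrt <=< safeRecip) 4)   -- True
```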

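Next we consider combining a monadic function with a non-monadic function. We expect an operation >.> :: (b -> c) -> (a -> m b) -> (a -> m c). Note that the result is a -> m c rather than a -> c, because we do not want to lose the extra information produced by a -> m b. Naturally, we want a trivial boxing operation return :: a -> m a. With it and >=> we can build the operation above: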

f :: b -> c
g :: a -> m b

return . f :: b -> m c
(return . f) >=> g :: a -> m c

-- i.e.

f >.> g = (return . f) >=> g

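Combining a non-monadic function with a monadic function in the other direction is trivial.

In summary, we obtain the basic conditions for being a Monad:

It is a Functor, i.e. there is fmap :: (a -> b) -> m a -> m b
It has a trivial unboxing operation join :: m (m a) -> m a
It has a trivial boxing operation return :: a -> m a

To make “trivial” precise, we require the three functions to satisfy the following axioms (f below is a non-monadic function):

return . f == (fmap f) . return (triviality of return)
join . fmap (fmap f) == (fmap f) . join (triviality of join)

In fact, Category Theory has two more axioms:
join . (fmap join) == join . join
join . fmap return == join . return == id
The four axioms above describe the relationships among Id (the identity Functor), m, m^2 and m^3, and make the corresponding diagrams commute.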
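The four axioms can be spot-checked on concrete values. The sketch below does this for Maybe and for lists with one arbitrarily chosen function and a few arbitrarily chosen values; it is an illustration on sample inputs, not a proof.

```haskell
import Control.Monad (join)

main :: IO ()
main = do
  let f = (+ 1) :: Int -> Int
      x = 3 :: Int
  -- return . f == fmap f . return
  print ((return . f) x == ((fmap f . return) x :: Maybe Int))
  -- join . fmap (fmap f) == fmap f . join
  let mm = [[1, 2], [3]] :: [[Int]]
  print ((join . fmap (fmap f)) mm == (fmap f . join) mm)
  -- join . fmap join == join . join
  let mmm = [[[1], [2, 3]], [[4]]] :: [[[Int]]]
  print ((join . fmap join) mmm == (join . join) mmm)
  -- join . fmap return == join . return == id
  let m = [1, 2, 3] :: [Int]
  print ((join . fmap return) m == m && (join . return) m == m)
```

Each line prints True for these inputs.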

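Monad Typeclass

The following is the (abridged) definition in Prelude:

class Applicative m => Monad m where

    return :: a -> m a
    (>>=)  :: m a -> (a -> m b) -> m b

Neither join nor the fish operator appears here; instead a more commonly used operator >>= (usually called the bind operator) is used. This is because in practice we rarely compose functions directly, but write in a non-point-free style.

In addition, there is the operator >> :: m a -> m b -> m b. Together, return, >>= and >> form the basis of do-notation, which we will not go into here.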
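As a brief illustration of how these operators relate, the sketch below recovers >>= from fmap and join under the hypothetical name bind, and writes the same list computation once in do-notation and once with explicit >>=. The names pairsDo and pairsBind are made up for the example.

```haskell
import Control.Monad (join)

-- >>= expressed through fmap and join.
bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = join (fmap f m)

-- A computation in the List Monad, written with do-notation.
pairsDo :: [(Int, Char)]
pairsDo = do
  n <- [1, 2]
  c <- "ab"
  return (n, c)

-- The same computation with the bind operator written out.
pairsBind :: [(Int, Char)]
pairsBind = [1, 2] >>= \n -> "ab" >>= \c -> return (n, c)

main :: IO ()
main = do
  print pairsDo                  -- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
  print (pairsDo == pairsBind)   -- True
  print (bind (Just 3) (\x -> Just (x + 1)) == (Just 3 >>= \x -> Just (x + 1)))   -- True
```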


Haskell Notes: Applicative

2018-11-18 | Tech

Motivation

Functor solves the problem of mapping regular single-parameter functions into a sub-category, but that’s not easy for functions with more than one parameter.

Let’s consider a function with two parameters f :: a -> b -> c, which can also be read as a -> (b -> c). Applying fmap to f yields fmap f :: m a -> m (b -> c). That’s still some distance from what we expect: f' :: m a -> m b -> m c. To get f', we need a transformation from m (b -> c) to m b -> m c. Here we denote it as <*> :: m (b -> c) -> m b -> m c. We will show below that the same transformation also suffices for functions with more parameters.

Now consider a function with three parameters f :: a -> b -> c -> d. We are going to transform it into a wrapped-value version, with the help of fmap and <*>.

f :: a -> b -> c -> d

(fmap f) :: m a -> m (b -> (c -> d))

\a_ b_ -> (fmap f a_) <*> b_
    :: m a -> m b -> m (c -> d)

\a_ b_ c_ -> ((fmap f a_) <*> b_) <*> c_
    :: m a -> m b -> m c -> (m d)

Here \a_ b_ c_ -> ((fmap f a_) <*> b_) <*> c_ has the desired type. Most of the time what we actually want is to apply the parameters directly rather than to obtain the function itself, so the code can simply be written as ((fmap f a) <*> b) <*> c, where a, b and c are wrapped values. The parentheses can be omitted if precedences are set properly, which leads to a neat and easy-to-read form:

f `fmap` a <*> b <*> c

In Haskell, fmap has an infix synonym <$>. So finally we get: f <$> a <*> b <*> c.
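Here is a short runnable sketch of this form, using Maybe and lists as the wrapper m. The three-parameter function volume is a hypothetical example.

```haskell
-- An ordinary three-parameter function.
volume :: Int -> Int -> Int -> Int
volume x y z = x * y * z

main :: IO ()
main = do
  print (volume <$> Just 2 <*> Just 3 <*> Just 4)    -- Just 24
  print (volume <$> Just 2 <*> Nothing <*> Just 4)   -- Nothing
  print ((+) <$> [1, 2] <*> [10, 20])                -- [11,21,12,22]
```

With Maybe, a single missing argument makes the whole result Nothing; with lists, the function is applied to every combination of arguments.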

Applicative

Haskell pre-defines a type class Applicative, which captures the pattern of <*>. Any type that implements Applicative works well with <$> and <*>.

class Functor f => Applicative (f :: * -> *) where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
  GHC.Base.liftA2 :: (a -> b -> c) -> f a -> f b -> f c
  (*>) :: f a -> f b -> f b
  (<*) :: f a -> f b -> f a

Note that an Applicative is also a Functor. Apart from <*>, there are some other helper functions or operators in Applicative.

pure lifts an unwrapped value into f in the most basic way, e.g. (:[]) for List or Just for Maybe. This can be handy when lifting an unwrapped value to a wrapped one.

liftA2 transforms a binary function into the corresponding version on wrapped values. It exists because binary operators are frequently passed around among higher-order functions.

*> takes two wrapped values, sequences them, and returns the second one; the result of the first is discarded but its effect is kept. This is quite useful for an Applicative with action semantics, such as IO. Sequencing actions is so useful that Haskell has syntactic sugar for it, the do-notation (which is defined in terms of the Monad operators >> and >>=; >> behaves like *>). In particular:

do
    putStrLn "1"
    putStrLn "2"

is equivalent to

putStrLn "1" *> putStrLn "2"

<* is similar, but returns the first value instead. Both will be revisited when studying Monad.
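The helper operations can be tried directly on Maybe, lists and IO. The values below are arbitrary and serve only as a sketch.

```haskell
import Control.Applicative (liftA2)

main :: IO ()
main = do
  print (pure 1 :: Maybe Int)              -- Just 1
  print (pure 1 :: [Int])                  -- [1]
  print (liftA2 (+) (Just 1) (Just 2))     -- Just 3
  print (Just 1 *> Just 2)                 -- Just 2
  print (Just 1 <* Just 2)                 -- Just 1
  print (Nothing *> Just 2 :: Maybe Int)   -- Nothing
  -- In IO, both actions run in order: prints 1, then 2.
  putStrLn "1" *> putStrLn "2"
```

The line with Nothing shows that *> does not simply ignore its first argument: the failure of the first value is kept even though its result is discarded.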

Haskell Notes: Category Theory and Functor

2018-11-18 | Tech

Category Theory

A category consists of three parts:

A collection of objects.
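A collection of morphisms, each of which maps one object to another.
A composition operator for these morphisms, i.e. morphisms can be composed. If f: A -> B and g: B -> C are morphisms, f.g generates a new morphism A -> C. (Composition is written here in diagrammatic order, f first and then g; Haskell's (.) would write the same morphism as g . f.)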

Note that a morphism has no specific semantics of mapping, but simply links two objects together. Morphisms are also called Arrows.

Examples

Set Category: Set

All sets and ordinary functions form a category. Functions need not be surjective, since morphisms have no mapping semantics.

Group Category: Grp

All groups and homomorphisms between groups form a category. A group has a specific algebraic structure, which morphisms should preserve.

Laws

Three laws that a category should obey:

Composition should be associative.
Composition should be closed in the category, i.e. if f: A -> B and g: B -> C, there must be an h: A -> C satisfying h = f . g.
For each object A, there should exist an identity morphism id(A): A -> A s.t. for every f: A -> B, f = id(A) . f = f . id(B).

Note that:

There may exist several morphisms between A and B.
An identity has type A -> A, but a morphism with such a type need not be an identity.

Functors in Category Theory

A functor maps a category to another category. It consists of two mappings, one for objects and one for morphisms, that preserve composition and identities.

There’s a trivial functor from Grp to Set (the forgetful functor), which maps groups to their underlying sets, and group homomorphisms to functions with the same behavior but defined on sets instead of groups.

Parametric Types in Haskell

It’s common to create new types that hold values of other types. The List[a] type constructor creates types that hold sequences of values of the same type; Maybe[a] creates types that hold the outcome of an operation (failure, or success with a returned value). In Haskell syntax these are written [a] and Maybe a.

Usually we expect derived types to inherit functions from types being wrapped. For example, List[Int] should have element-wise addition as Int does, and Maybe[Int] should have similar operations with no burden of re-wrapping and unwrapping. Such ‘inheritance’ should be done automatically if possible, since it is only concerned with the structure of types instead of specific functions.

Hask Category

The Haskell language itself forms a category, with types as objects and functions as morphisms. This category is called Hask.

Hask Functors

class Functor m where
    fmap :: (a -> b) -> m a -> m b

A parametric type implementing class Functor is a functor in the categorical sense, mapping Hask to one of its sub-categories, whose objects are the types m a. The type constructor m maps objects, and the fmap defined for m maps the corresponding functions.

It’s worth noting that (a -> b) -> m a -> m b can also be read as (a -> b) -> (m a -> m b), as -> is right-associative. This may provide a clearer view of fmap: it takes a regular function in Hask and returns the corresponding function in the sub-category.

Examples:

fmap (+ 1) :: List[Int] -> List[Int] generates the element-wise increment on List[Int].

fmap (+ 1) :: Maybe Int -> Maybe Int generates the following function:

maybeSucc :: Maybe Int -> Maybe Int
maybeSucc Nothing  = Nothing
maybeSucc (Just x) = Just (x + 1)

Note that fmap lifts one-parameter functions only; lifting a binary function such as (+) needs more than Functor provides.
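To see the same idea on a user-defined type, here is a minimal sketch. The Tree type is hypothetical and exists only for this example; its fmap applies a function to every stored value while keeping the shape of the tree. The last two lines spot-check the functor laws (identity and composition are preserved) on one sample value.

```haskell
-- A binary tree holding values of type a.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

instance Functor Tree where
  fmap _ Leaf         = Leaf
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)

main :: IO ()
main = do
  let t = Node (Node Leaf 1 Leaf) 2 Leaf :: Tree Int
  print (fmap (+ 1) [1, 2, 3 :: Int])        -- [2,3,4]
  print (fmap (+ 1) (Just 1 :: Maybe Int))   -- Just 2
  print (fmap (+ 1) t)                       -- Node (Node Leaf 2 Leaf) 3 Leaf
  print (fmap id t == t)                     -- True
  print (fmap ((* 2) . (+ 1)) t == (fmap (* 2) . fmap (+ 1)) t)   -- True
```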
