Author Archives: Hong

Swap Training and Test Data During Cross-Validation in scikit-learn

Scikit-learn is a well known Python machine learning library. It provides various utilities for machine learning, including those for cross-validation. In a standard \(K\)-fold cross-validation, the data are split into \(K\) subsets (with equal size). There are \(K\) rounds of training and testing. In each round, one subset is used as test data and all other subsets are used as training data. Under this setup, as long as \(K > 2\), there are always more training data than test data in each round of the cross-validation. Whilst this is desirable in most cases, in some machine learning applications, it is more desirable to have training data less than test data. For example, in graph embedding, each node in the network has a vector representation and labels. When running cross-validation, it is more desirable to use a smaller number of nodes as training data than the number of nodes as test data, since this better mimics the real-world scenario in terms of the amount of available training data (e.g., here). In scikit-learn, we can achieve this by swapping training and test data.

Continue reading

Creating Multiple-Choice Exams with Answering Boxes Using LaTeX

For a multiple-choice exam, to ease the grading procedure, it is often preferred to ask students to write their answers collectively in an answer sheet with their choices of answers filled in boxes. However, LaTeX, in particular the exam document class, does not directly provide the feature to automatically generate such boxes. In this post, we will let LaTeX to automatically generate these answering boxes, and with correct answers filled in when the answers document class option is turned on. The effects are displayed below, with correct answers shown and not shown, respectively. Their respective PDF files are also available: Without answers; with answers.

Continue reading

Use HTTP Clients with SOCKS Proxies (or SSH Tunnels) on GNU/Linux

On GNU/Linux, it is easy to create SOCKS proxies using programs such as ssh or tor. However, many applications on GNU/Linux, such as LibreOffice and genymotion (up to the date on which this post is written), can be configured to directly use HTTP proxies (or web proxies), but not SOCKS proxies. In this post, we will use privoxy, a non-cache web proxy, to enable these applications to use SOCKS proxies.

Continue reading

Enable Auto Completion for pip in Zsh

Pip is a package management system for installing and managing Python software packages. To enable auto completion for pip in zsh, the documentation of pip suggests adding the following line to ~/.zshrc:

eval "`pip completion --zsh`"

However, merely having this line would not enable auto completion for pip3. To enable auto completion for pip3 as well, add the following line after the line above:

compctl -K _pip_completion pip3

Enable Natural Scrolling for Trackpads Using libinput

Libinput is a library to handle input devices in Wayland and X.Org. It can be used as a drop-in replacement for evdev and synaptics in X.Org, and it is supported by a wide range of desktop environments, including GNOME and Xfce. In this post, we will see how to enable natural scrolling for trackpads using libinput. We will also leave mouses alone, i.e., no natural scrolling for mouses.

First, we need to know the name of the trackpad to enable natural scrolling for. This can be easily known by executing xinput --list. My output includes the following:

⎡ Virtual core pointer                          id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer                id=4    [slave  pointer  (2)]
⎜   ↳ bcm5974                                   id=13   [slave  pointer  (2)]

It is easy to see that my trackpad is bcm5974.

Continue reading

Send Git Patches Using GUI Email Clients

Git is a popular version control system that manages many impacting open source projects. Many of these projects, such as Linux kernel, zsh, GNU Emacs, etc. require patches and pull requests to be sent to their mailing lists. This can usually be done via git send-email or git imap-send, which, however, often takes a large amount of configuration and headaches for many people to make it work for the first time. Here, I propose a new approach to send git patches via email with GUI email clients, such as Thunderbird, Evolution, Claws Mail and Apple Mail.

Continue reading

Parallelize make by Default

The make utility is an standard utility on POSIX systems (GNU/Linux, macOS, etc.) that update files derived from other files, such as compiling source files to their binary forms. It is widely supported and used across different fields such as organizing and building C/C++/Fortran projects, building Sphinx documentation, etc.

The most popular implementation of the make utility is probably GNU make, which is usually the default make program on various GNU/Linux distributions. (On macOS, the version of the default GNU make is pretty old. Please consult Install and Use GNU Command Line Tools on macOS/OS X for a newer version.) It adds one very important feature besides the standard make specification: parallelization. The command line option -j can be used to specify the maximum number of jobs that it is allowed to run simultaneously. However, it is quite annoying to type up this option every time when using it—we want a setting such that the CPU can be fully utilized by default. To achieve this goal, add the following lines to your ~/.bashrc if you use bash or ~/.zshrc if you use zsh:

# set MAKEFLAGS
if type nproc &>/dev/null; then   # GNU/Linux
  export MAKEFLAGS="$MAKEFLAGS -j$(($(nproc)-1))"
elif type sysctl -n hw.ncpu &>/dev/null; then   # macOS, FreeBSD
  export MAKEFLAGS="$MAKEFLAGS -j$(($(sysctl -n hw.ncpu)-1))"
fi

The code above sets the envrionmental variable MAKEFLAGS, which specifies the command line arguments of any invoked make subprocesses. It is set such that the maximum number of jobs that is allowed to be run simultaneously is equal to the number of available CPU cores minus 1. In this way, the hardware is more or less fully utilized when using make, with one CPU core left for other potential tasks on the system.

Attachment Reminder in Emacs message-mode

When sending out an email, sometimes we just forgot to attach the attachments. An attachment reminder can largely prevent this: You are asked to confirm whether you have forgotten the attachments if your message body shows that you may need one. However, Emacs by default does not provide such a feature in its mail composing mode message-mode, which is used in email clients such as gnus and mu4e. Here is an attachment reminder based on this comment:

(defun my-message-current-line-cited-p ()
  "Indicate whether the line at point is a cited line."
  (save-match-data
    (string-match (concat "^" message-cite-prefix-regexp)
                  (buffer-substring (line-beginning-position) (line-end-position)))))

(defun my-message-says-attachment-p ()
  "Return t if the message suggests there can be an attachment."
  (save-excursion
    (goto-char (point-min))
    (save-match-data
      (let (search-result)
        (while
            (and (setq search-result (re-search-forward "\\(attach\\|pdf\\|file\\)" nil t))
                 (my-message-current-line-cited-p)))
        search-result))))

(defun my-message-has-attachment-p ()
  "Return t if the message has an attachment."
  (save-excursion
    (goto-char (point-min))
    (save-match-data
      (re-search-forward "<#part" nil t))))

(defun my-message-pre-send-check-attachment ()
  (when (and (my-message-says-attachment-p)
             (not (my-message-has-attachment-p)))
    (unless
        (y-or-n-p "The message suggests that you may want to attach something, but no attachment is found. Send anyway?")
      (error "It seems that an attachment is needed, but none was found. Aborting sending."))))
(add-hook 'message-send-hook 'my-message-pre-send-check-attachment)

If “attach”, “pdf” or “file” are in the message you are sending, a confirmation would be required to confirm the attachments if they are not attached.

Set up Synapse (Matrix Homeserver) on Ubuntu 16.04

Matrix is an open standard for decentralized persistent communication, shares somewhat similar goals to Jabber/XMPP. It attracts people from using centralized communicating software such as Facebook Messenger, WhatsApp, etc. In the Matrix protocol, a piece of software called “homeserver” plays a key role to connect users. To use Matrix, one such server must be set up. In this post, we will set up Synapse, an implementation of the Matrix homeserver maintained by the Matrix team, using a minimal configuration on Ubuntu 16.04.

Continue reading