Swap Training and Test Data During Cross-Validation in scikit-learn

Scikit-learn is a well-known Python machine learning library. It provides various utilities for machine learning, including utilities for cross-validation. In standard \(K\)-fold cross-validation, the data are split into \(K\) subsets of (approximately) equal size, and there are \(K\) rounds of training and testing. In each round, one subset is used as test data and the remaining \(K - 1\) subsets are used as training data. Under this setup, as long as \(K > 2\), there are always more training data than test data in each round. Whilst this is desirable in most cases, some machine learning applications call for less training data than test data. For example, in graph embedding, each node in the network has a vector representation and labels. When running cross-validation, it is preferable to train on fewer nodes than are used for testing, since this better mimics the real-world scenario in terms of the amount of available training data (e.g., here). In scikit-learn, we can achieve this by swapping training and test data.
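
As a rough illustration of the idea, the sketch below wraps scikit-learn's KFold splitter so that each round trains on the single held-out fold and tests on the rest. The helper name swapped_splits and the toy arrays X and y are assumptions made for this example; this is one possible approach, not necessarily the one used in the full post.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (hypothetical): 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2


def swapped_splits(cv, X, y=None):
    """Yield (train, test) index pairs with the two roles exchanged.

    `cv` is any scikit-learn splitter that has a split() method.
    """
    for train_idx, test_idx in cv.split(X, y):
        # The single held-out fold becomes the training set,
        # and the remaining K - 1 folds become the test set.
        yield test_idx, train_idx


kfold = KFold(n_splits=5, shuffle=True, random_state=0)
splits = list(swapped_splits(kfold, X, y))

for train_idx, test_idx in splits:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Because functions such as cross_val_score accept an iterable of (train, test) index arrays as their cv argument, a list like splits can be passed there directly.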


Creating Multiple-Choice Exams with Answering Boxes Using LaTeX

For a multiple-choice exam, grading is often easier if students write all their answers on a single answer sheet, filling their choices into boxes. However, LaTeX, and in particular the exam document class, does not directly provide a feature for generating such boxes automatically. In this post, we will make LaTeX generate these answer boxes automatically, with the correct answers filled in when the answers document class option is turned on. The effects are displayed below, with correct answers shown and not shown, respectively. Their respective PDF files are also available: without answers; with answers.


Use HTTP Clients with SOCKS Proxies (or SSH Tunnels) on GNU/Linux

On GNU/Linux, it is easy to create SOCKS proxies using programs such as ssh or tor. However, many applications on GNU/Linux, such as LibreOffice and genymotion (as of the time of writing), can be configured to use HTTP proxies (or web proxies) directly, but not SOCKS proxies. In this post, we will use privoxy, a non-caching web proxy, to enable these applications to use SOCKS proxies.


Enable Auto Completion for pip in Zsh

Pip is a package management system for installing and managing Python software packages. To enable auto completion for pip in zsh, the documentation of pip suggests adding the following line to ~/.zshrc:

eval "`pip completion --zsh`"

However, this line alone does not enable auto completion for pip3. To enable auto completion for pip3 as well, add the following line after the line above:

compctl -K _pip_completion pip3