Category Archives: Python

Platform Dependent Python Coverage Test with Tox

Last updated on December 20, 2020

When testing Python programs, coverage.py is often used to measure code coverage, and enforcing 100% code coverage is regarded as a good practice:

# .coveragerc
[report]
# Enforce 100% coverage test
fail_under = 100
show_missing = True

However, if some lines of code are platform dependent (i.e., they are never executed on at least one platform), code coverage tests usually fail on that platform. For example, the following code snippet would always lead to coverage below 100% on Windows, where the block is never executed:

if os.name != 'nt':
    # Do something if the OS is not Windows...

You can ask coverage.py to ignore this block by adding a comment # pragma: no cover, but then coverage.py would ignore it on all platforms, including all non-Windows platforms where the block does run. If you use tox for testing, this issue can be resolved cleanly.

Continue reading

Catching FileNotFoundError? Watch Out!

Last updated on October 3, 2020

In Python, FileNotFoundError is an exception that is raised when a requested file does not exist. Many people assume that when their programs fail to open a file in read-only mode or delete a file, FileNotFoundError must be raised and they should only need to process that. For example, some people would write code similar to:

def process_file(path):
    import sys

    try:
        f = open(path, 'r')  # or os.remove(path)
    except FileNotFoundError as e:
        print(f"File {path} not found!", file=sys.stderr)
        return
    # process the file...

However, this code may actually trigger unexpected errors. The reason is that a failure to open a file in read-only mode or to delete a file is not necessarily caused by the file not existing. Very often the cause is something else: insufficient permission, or the path being a directory. In these cases, PermissionError or IsADirectoryError is raised instead of FileNotFoundError. So, in the example above, one would want to catch all of them.
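A minimal sketch of catching all three exceptions might look like the following. The error message and the convention of returning None on failure are assumptions made for illustration, not taken from the post.

```python
import sys


def process_file(path):
    """Return the file's text, or None if it cannot be opened for reading."""
    try:
        f = open(path, 'r')
    except (FileNotFoundError, PermissionError, IsADirectoryError) as e:
        # Report the actual cause instead of assuming the file is missing.
        print(f"Cannot open {path}: {e}", file=sys.stderr)
        return None
    with f:
        return f.read()
```

All three exceptions are subclasses of OSError, so catching OSError would also work if you do not need to tell them apart.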

Continue reading

Nikola: How to Deploy Compiled Webpages to a Different Git Repository

Last updated on November 18, 2018

Nikola is one of the most popular static website generators. It compiles source files into final publishable webpages offline and then uploads those files to a web host. Compared to dynamic websites such as those powered by PHP or Ruby on Rails, static websites offer better security and faster page loading.

Nikola provides some utilities to ease deployment (i.e., uploading the compiled webpages), especially for deploying to GitHub Pages. Unfortunately, Nikola does not (and its team does not plan to) provide a direct way to deploy the compiled webpages to a git repository different from the one that hosts the source files. This is useful when you want to keep the source files in a private git repository while the repository that hosts the compiled webpages stays public. Luckily, Nikola lets you customize its deploy commands. Assuming output is the directory where the compiled webpages are located, set DEPLOY_COMMANDS in conf.py as follows (replace the placeholder YOUR_EMAIL with your email address, the placeholder YOUR_REPOSITORY_URL with your designated git repository on GitHub/GitLab/BitBucket/etc., and master with your designated branch):

DEPLOY_COMMANDS = {
    'default': [
        "cd output && git init && git config user.email 'YOUR_EMAIL' && touch .nojekyll && git add .",
        "cd output && git commit -a -m 'Nikola'",
        "cd output && git push -f YOUR_REPOSITORY_URL master",
    ],
}

Now running nikola deploy should deploy the compiled webpages to your designated git repository and branch.

Swap Training and Test Data During Cross-Validation in scikit-learn

Last updated on October 11, 2018

Scikit-learn is a well-known Python machine learning library. It provides various utilities for machine learning, including those for cross-validation. In a standard \(K\)-fold cross-validation, the data are split into \(K\) subsets of (approximately) equal size. There are \(K\) rounds of training and testing. In each round, one subset is used as test data and all other subsets are used as training data. Under this setup, as long as \(K > 2\), there are always more training data than test data in each round of the cross-validation. Whilst this is desirable in most cases, in some machine learning applications it is more desirable to have less training data than test data. For example, in graph embedding, each node in the network has a vector representation and labels. When running cross-validation, it is more desirable to train on fewer nodes than are used for testing, since this better mimics the real-world scenario in terms of the amount of available training data (e.g., here). In scikit-learn, we can achieve this by swapping training and test data.
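One way to sketch the swap, assuming scikit-learn's KFold splitter, is shown below. The helper name swapped_kfold_indices is hypothetical, and the approach in the full post may differ.

```python
import numpy as np
from sklearn.model_selection import KFold


def swapped_kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs with the usual K-fold roles swapped.

    KFold yields (large part, single fold); here the single fold becomes
    the training set and the remaining folds become the test set.
    """
    placeholder = np.zeros((n_samples, 1))  # KFold only needs the sample count
    for large_part, single_fold in KFold(n_splits=n_splits).split(placeholder):
        yield single_fold, large_part


splits = list(swapped_kfold_indices(n_samples=10, n_splits=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

With 10 samples and 5 folds, each round trains on one fold and tests on the other four.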

Continue reading

Enable Auto Completion for pip in Zsh

Last updated on August 8, 2017

Pip is a package management system for installing and managing Python software packages. To enable auto completion for pip in zsh, the documentation of pip suggests adding the following line to ~/.zshrc:

eval "`pip completion --zsh`"

However, merely having this line would not enable auto completion for pip3. To enable auto completion for pip3 as well, add the following line after the line above:

compctl -K _pip_completion pip3

Too Many Escaping Backslashes? Avoid Them!

Last updated on October 1, 2016

Backslash escaping is common in programming. Sometimes we let a file go through a few filters or template engines, such as markdown, quik, etc. Things become even worse if we are writing the template files from a string that itself requires escaping for any literal backslash. On Windows, things are more horrible than on Unices (You know why, right? Hint: path separator). Then, if you need a “real” backslash in the final output, you may end up with four or eight or sixteen backslashes in the original file. This is horrible. To avoid this situation, I wrote a short preprocessing script in Python to double or quadruple or octuple or zzzuple your backslashes.
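The script itself is not shown in this excerpt; the following is only a hypothetical sketch of the idea. The function name multiply_backslashes and its levels parameter are illustrative assumptions.

```python
def multiply_backslashes(text, levels=1):
    """Replace every backslash in text with 2**levels backslashes.

    levels=1 doubles, levels=2 quadruples, levels=3 octuples, and so on:
    one level for each filter or template engine that consumes an escape.
    """
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return text.replace("\\", "\\" * (2 ** levels))


path = r"C:\temp"
# Two levels: each backslash becomes four, enough for two unescaping passes.
escaped = multiply_backslashes(path, levels=2)
print(escaped)
```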

Continue reading

Use Travis CI with Jython

Last updated on November 16, 2018

This post was updated on Feb 11, 2013, since the old way no longer works.

Travis CI is a hosted continuous integration service for the open source community that runs tests for your GitHub projects on every push and pull request. However, at the time of writing, Travis CI does not officially support Jython, a Python interpreter written in Java. This post will help you set up a Jython testing environment for a Python project on Travis CI.

Continue reading