Wed 09 Mar 2016 — Thu 21 Nov 2019

Python Catch-Up

Useful notes on modern developments in Python.

Strings

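Python 3 source files are UTF-8 by default. Since Python 3.6, the console and filesystem encodings on Windows are UTF-8 as well, matching typical Linux systems.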

You can write a formatted string for expression interpolation.

This can include width, precision, and underscore formatting, amongst others.

Annoyingly, underscores are included in width.

f" \
  This is a formatted string, which can contain arbitrary expressions: \
  {2**16:011_},\
  {233/3:.4}\
"
'   This is a formatted string, which can contain arbitrary expressions:   000_065_536,  77.67'

Numbers

You can put arbitrary underscores into number literals. This has no effect, and is just for readability.

1_000_000 # One miillllllion dollars
1000000

There is a math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) function for comparing floating point numbers. Both tolerances are keyword-only.
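A quick sketch of why this is useful. The numbers are just illustrative; the point is that a relative tolerance alone does nothing when comparing against zero, so you probably want abs_tol there.

```python
import math

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
total = 0.1 + 0.2
exactly_equal = total == 0.3
close_enough = math.isclose(total, 0.3)  # uses the default rel_tol=1e-09

# Near zero a relative tolerance is tiny, so pass an absolute one instead.
near_zero_default = math.isclose(1e-12, 0.0)
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exactly_equal, close_enough, near_zero_default, near_zero_abs)
```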

Dates and Times

Python now has functions to do timings in nanoseconds, such as time.time_ns() and time.perf_counter_ns(). These work on Linux and Windows.
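A minimal sketch of the nanosecond timers, assuming Python 3.7 or later. The workload being timed is arbitrary.

```python
import time

# perf_counter_ns is a monotonic clock meant for measuring intervals.
start = time.perf_counter_ns()
sum(range(100_000))
elapsed_ns = time.perf_counter_ns() - start

# time_ns is the wall clock: integer nanoseconds since the epoch.
wall_clock_ns = time.time_ns()

print(f"elapsed: {elapsed_ns:_} ns")
print(f"since epoch: {wall_clock_ns:_} ns")
```

Both return plain integers, so there is no floating point rounding to worry about.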

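There is now a concept of a fold. This is used when the clocks go back and the same wall-clock time occurs twice: the fold attribute on a datetime says which of the two occurrences is meant.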
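A small sketch of the fold attribute on its own. The date is only an example of a night when clocks went back in some time zones; without a tzinfo attached, fold is just a flag and has no effect on arithmetic.

```python
from datetime import datetime

first = datetime(2019, 10, 27, 1, 30)           # fold defaults to 0
second = datetime(2019, 10, 27, 1, 30, fold=1)  # the second, repeated 01:30

print(first.fold, second.fold)

# For naive datetimes, fold is ignored in comparisons.
same_wall_time = first == second
print(same_wall_time)

# replace() can set or clear the flag.
cleared = second.replace(fold=0)
print(cleared.fold)
```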

Tuples

Python 2 used to allow tuple unpacking in function parameters, written as def fn((a, b)):.

This was removed in Python 3. But if you see something weird that looks like that, then now you know why.

Annotations

You can put arbitrary expressions on variable names, and on function arguments and return values.

from sys import modules

annotated_variable: "This variable name has an annotation" = 5 / 3

argument_b = "argument b"

def annotated_function(a: "argument a",
                       b: argument_b = 8) -> "return value annotation":
    return b


[
    ## Variable annotations are stored on the current module.
    modules[__name__].__annotations__,
    ## Function annotations are stored on the function object.
    annotated_function.__annotations__
]

[{'annotated_variable': 'This variable name has an annotation'},
{'a': 'argument a', 'b': 'argument b', 'return': 'return value annotation'}]

These show up in documentation generated using pydoc.

At the time of writing, the plan (PEP 563) is for annotations to be lazily evaluated by default in a future version (Python 4.0).

You can get this behaviour now:

from __future__ import annotations

Annotations are mostly used for type hints now. They might be useful for other things, but you should think carefully first.
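A sketch of what lazy evaluation looks like in practice. The __future__ import has to be the first statement in a module, so to keep this self-contained I build a tiny module from a string with exec; in real code you would just put the import at the top of your file. The names NotDefinedYet and AlsoNotDefined are deliberately undefined.

```python
# Source for a throwaway module that opts in to lazy annotations.
source = '''
from __future__ import annotations

def lazy(x: NotDefinedYet) -> AlsoNotDefined:
    return x
'''

namespace = {}
exec(source, namespace)

# The annotations were never evaluated, so the undefined names are harmless.
# They are stored as plain strings instead.
lazy_annotations = namespace["lazy"].__annotations__
print(lazy_annotations)
```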

Type Hints

https://docs.python.org/3.8/library/typing.html

This is still subject to change. It's based on annotations (see above). Annotate your function signatures and return value with your types.

Use the mypy static type checker to verify your program. Type hints do nothing at run-time.

You can create types by combining built-in types and assigning them to a variable. Alternatively, you can use NewType('NewTypeName', oldType).
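A minimal sketch of both approaches. Inventory, UserId and stock_for are names I made up for illustration.

```python
from typing import Dict, List, NewType

# A type alias: just another name for a combination of existing types.
Inventory = Dict[str, List[int]]

# NewType gives the checker a distinct type. At run-time it simply
# returns its argument, so the value is still a plain int.
UserId = NewType("UserId", int)

def stock_for(user: UserId, inventory: Inventory) -> List[int]:
    return inventory.get(str(user), [])

user = UserId(42)
print(user, type(user))
print(stock_for(user, {"42": [1, 2, 3]}))
```

A checker such as mypy should complain if you pass a bare int where a UserId is expected, even though nothing stops you at run-time.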

When creating your own classes, you can inherit from the types in the typing package. You might want to use multiple inheritance for this.

If we need to use None as a type, we can write None directly: inside a type hint it is treated as type(None).

Important built-in types inside the typing package are:

Any
anything that can go anywhere. Use it to hack around it if the type system is being difficult. Consider whether you should be using object instead.
Union[A,B,C]
accepts one of the specified types.
Optional[A]
alias for Union[A,None].
Intersection[A,B,C]
not implemented. Type must be a subtype of all of the listed types.
Tuple[A,B,C]
requires all the types in the specified order.
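Callable[[A,B,C],X] or Callable[...,X]
a function taking arguments of types A, B and C and returning X. Use ... to leave the arguments unspecified.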
Type[C]
accepts the class object C.
Generic[A,B,C]
takes type parameters when you specialize (subclass) it. You can fill in some, none or all of the type parameters each time you subclass.
TypeVar('SomeGenericParameter')
defines a type parameter, to be used by Generic. Can be marked as covariant or contravariant. Can specify complicated type bounding rules.
@typing.no_type_check
a decorator which turns off type checks for this function or class

In addition to this, it also includes generic versions of most of the built-in types.
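Here is a sketch that pulls a few of these together. Stack, apply and describe are hypothetical names for illustration only; nothing here is checked at run-time, so you would need mypy to see any benefit.

```python
from typing import Callable, Generic, List, Optional, TypeVar, Union

T = TypeVar("T")

class Stack(Generic[T]):
    """A hypothetical generic container, parameterised by its item type."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> Optional[T]:
        # Optional[T] is Union[T, None]: None signals an empty stack.
        return self._items.pop() if self._items else None

def apply(fn: Callable[[int, int], int], a: int, b: int) -> int:
    # fn must take two ints and return an int.
    return fn(a, b)

def describe(value: Union[int, str]) -> str:
    return f"{type(value).__name__}: {value}"

numbers: Stack[int] = Stack()
numbers.push(1)
print(numbers.pop(), numbers.pop())
print(apply(lambda x, y: x + y, 2, 3))
print(describe("hello"))
```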

There's also a special class for NamedTuple:

from typing import NamedTuple

class TestIt(NamedTuple):
    description: str
    x: int

TestIt(
    "hello",
    5
)

TestIt(description='hello', x=5)

Sometimes we have the situation where we can't refer to a type because it isn't declared yet. For example, if a class needs to refer to itself in its methods. This is called a forward reference. We can use a string with the type's name instead.
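A small sketch of a forward reference, using a made-up Node class that refers to itself before its definition is complete.

```python
from typing import Optional

class Node:
    # "Optional[Node]" is a string because Node does not exist yet
    # while the class body is still being executed.
    def __init__(self, value: int, parent: "Optional[Node]" = None) -> None:
        self.value = value
        self.parent = parent

    def root(self) -> "Node":
        return self if self.parent is None else self.parent.root()

child = Node(2, parent=Node(1))
print(child.root().value)

# The annotation is stored as the string we wrote.
print(Node.root.__annotations__)
```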

Destructuring

Called unpacking in Python. * unpacks arrays (lists, tuples and other iterables), and ** unpacks dictionaries.

Arrays can be unpacked for assignment, in other array constructors, and in function calls:

first, _, *third_onwards = [1, 2, 3, 5, 4]
print("array unpacked using assignment", third_onwards)

print(
    "array constructed by unpacking other arrays",
    [
        *[1, 2, 3, 4, 5],
        6, 7, 8,
        *[9, 10]
    ]
)

print(
    "function call positional parameters filled by unpacking array",
    *[1, 2, 3, 4, 5]
)

def varargs(first, *rest):
    return rest

print("use of varargs", varargs(1, 2, 3, 4, 5))
array unpacked using assignment [3, 5, 4]
array constructed by unpacking other arrays [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
function call positional parameters filled by unpacking array 1 2 3 4 5
use of varargs (2, 3, 4, 5)

Trying to use the * syntax for dictionaries gives you the keys.

The ** syntax for dictionaries doesn't work on the left-hand side of an assignment, but everything else is OK.

a, _, *other_keys = {"a": 1, "b": 2, "c": 3, "d": 4}
print("dictionary keys unpacked using assignment", other_keys)

## You can't use the ** syntax in assignments.
## a, _, **other_keys = {"a": 1, "b": 2, "c": 3, "d": 4}

print(
    "dictionary constructed by unpacking other dictionaries",
    {
        "a": 1,
        **{"b": 2, "c": 3},
        **{"d": 4, "e": 5, "f": 6}
    }
)

def f(a, b, c):
    return (a, b, c)

print(
    "keyword arguments met by unpacking dictionary",
    f(**{"a": 1, "b": 2, "c": 3})
)

def kw_varargs(a, **kwargs):
    return kwargs

print(
    "keyword varargs function call",
    kw_varargs(**{"a": 1, "b": 2, "c": 3})
)
dictionary keys unpacked using assignment ['c', 'd']
dictionary constructed by unpacking other dictionaries {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
keyword arguments met by unpacking dictionary (1, 2, 3)
keyword varargs function call {'b': 2, 'c': 3}

Bare * Trick

Any function arguments after a varargs *rest argument must be passed as keyword arguments.

If you don't want to accept varargs arguments, but you do want to force some to be keyword arguments, then you can just use a bare star.

def f(*, x):
    return x

f(x=0)
0

Files

It's worth looking at the built-in glob module.
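A quick sketch of glob, run against a temporary directory so it doesn't depend on what's on your disk. The file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a.txt, b.py and nested/c.txt inside the temporary directory.
    os.makedirs(os.path.join(root, "nested"))
    for relative in ("a.txt", "b.py", os.path.join("nested", "c.txt")):
        open(os.path.join(root, relative), "w").close()

    # A plain pattern only looks in one directory.
    top_level = sorted(glob.glob(os.path.join(root, "*.txt")))

    # With recursive=True, ** matches any number of directories (including none).
    everything = sorted(glob.glob(os.path.join(root, "**", "*.txt"), recursive=True))

    top_level_names = [os.path.relpath(path, root) for path in top_level]
    all_names = [os.path.relpath(path, root) for path in everything]

print(top_level_names)
print(all_names)
```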

Generator Delegation

A nice yield from syntax was introduced in Python 3.3.

def generator_a():
    yield 1
    yield 2
    yield 3

def generator_b():
    yield "head"
    yield from generator_a()
    yield from range(-3, 0)
    yield "tail"

[thing for thing in generator_b()]
['head', 1, 2, 3, -3, -2, -1, 'tail']

Unittest

Update: use pytest instead. Much better.

https://docs.python.org/3/library/unittest.html

It's based on JUnit.

You inherit from unittest.TestCase, and give it methods named like def test_thing(self):.

Inside those methods, the self object will have some assertion methods available.

If we implement setUp(self): or tearDown(self): methods, then they will automatically be called.

We run our tests using python3 -m unittest my-tests.py.
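A minimal sketch of a test case with setUp and tearDown. TestList is a made-up example; I run it programmatically here so the snippet is self-contained, but normally you would use the command line above.

```python
import unittest

class TestList(unittest.TestCase):
    def setUp(self):
        # Called before every test method.
        self.items = [3, 1, 2]

    def tearDown(self):
        # Called after every test method, whether it passed or failed.
        self.items.clear()

    def test_sorted(self):
        self.assertEqual(sorted(self.items), [1, 2, 3])

    def test_membership(self):
        self.assertIn(2, self.items)

# Load and run the tests without going through the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestList)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```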

Subtests

Sometimes we want to run several small checks inside one test method and have each failure reported separately. We can do this using the subTest context manager and the with keyword:

with self.subTest(thing="whatever"):
    self.assertSomething(computation(thing))
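A fuller, runnable sketch of the same idea. The squares being checked are just a stand-in for a real computation.

```python
import unittest

class TestSquares(unittest.TestCase):
    def test_squares(self):
        cases = [(0, 0), (2, 4), (-3, 9)]
        for n, expected in cases:
            # Each subTest is reported separately, labelled with n,
            # and a failure in one does not stop the others.
            with self.subTest(n=n):
                self.assertEqual(n * n, expected)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSquares)
subtest_result = unittest.TextTestRunner(verbosity=0).run(suite)
print(subtest_result.wasSuccessful())
```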

Debugging

https://docs.python.org/3/library/pdb.html

The debugger has an API, some Python command line options, and some interactive commands.

In particular, you can use the breakpoint() function to enter debug mode.

python3 -m pdb some-python.py

Once you're in debug mode, press 'h' to get help on available commands.

TODO Asynchrony

asyncio coroutines https://www.python.org/dev/peps/pep-0492/ https://www.python.org/dev/peps/pep-0530/ https://docs.python.org/3/library/asyncio.html

You can use the yield keyword inside an async def function to create an asynchronous generator (PEP 525). You then consume it with async for, inside another coroutine.

You can do the same inside comprehensions and generator expressions.
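A small sketch of an asynchronous generator and how to consume it, as far as I understand PEP 525 and PEP 530. It assumes Python 3.7 or later for asyncio.run; countdown and main are made-up names.

```python
import asyncio

async def countdown(start):
    # async def plus yield makes this an asynchronous generator.
    for n in range(start, 0, -1):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield n

async def main():
    collected = []
    # Asynchronous generators are consumed with async for.
    async for n in countdown(3):
        collected.append(n)

    # The same works inside a comprehension.
    doubled = [n * 2 async for n in countdown(3)]
    return collected, doubled

collected, doubled = asyncio.run(main())
print(collected, doubled)
```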

Metaprogramming and reflection

See metaclasses.

Classes have an __init_subclass__() method, which is called on the parent class each time a subclass of it is defined.
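A sketch of one common use: a parent class that keeps a registry of its subclasses without needing a metaclass. Plugin, CsvExporter and JsonExporter are hypothetical names.

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        # Runs once for each subclass, at the moment it is defined.
        # Extra keywords in the class statement arrive here as arguments.
        super().__init_subclass__(**kwargs)
        Plugin.registry[name or cls.__name__] = cls

class CsvExporter(Plugin, name="csv"):
    pass

class JsonExporter(Plugin):
    pass

print(sorted(Plugin.registry))
```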

There is an inspect.signature(some_function) function. It understands types.

from inspect import signature

def fn(a, b, c, *args, **kwargs):
    return None

str(signature(fn))
'(a, b, c, *args, **kwargs)'
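A short sketch of pulling annotations and defaults out of a signature. The scale function is made up for the example.

```python
from inspect import signature

def scale(value: float, factor: int = 2) -> float:
    return value * factor

sig = signature(scale)

# Each parameter carries its annotation and default value.
factor = sig.parameters["factor"]
print(factor.annotation, factor.default)
print(sig.return_annotation)

# Parameters without a default are marked with a sentinel.
value_has_default = sig.parameters["value"].default is not sig.empty
print(value_has_default)
```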

zipapp

zipapp is a built in package for Python which allows you to package up a program for easy distribution.

python3 -m zipapp source-directory -m "initial_module:main_function" -o my-application.pyz
python3 my-application.pyz
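The same thing can be done from Python with zipapp.create_archive. This sketch builds a throwaway source directory in a temporary folder, packages it, and runs the result; the file and function names mirror the placeholder names in the command above.

```python
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as work:
    # A source directory containing a single module with an entry point.
    source = os.path.join(work, "source-directory")
    os.makedirs(source)
    with open(os.path.join(source, "initial_module.py"), "w") as handle:
        handle.write("def main_function():\n    print('hello from the archive')\n")

    # main= generates the __main__.py that calls our entry point.
    target = os.path.join(work, "my-application.pyz")
    zipapp.create_archive(source, target, main="initial_module:main_function")

    # Run the archive with the current interpreter and capture what it prints.
    completed = subprocess.run(
        [sys.executable, target],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    archive_output = completed.stdout.strip()

print(archive_output)
```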

VirtualEnv

VirtualEnv is an obsolete way to create an isolated Python environment with its own installed packages. This helps when different projects have different dependencies.

At some point, Python started shipping the pyvenv script instead.

Now, it's just built into Python as the venv module.

(I generally prefer to use Nix to create my environments instead.)

pip install virtualenv && virtualenv venv # Distant past
pyvenv venv # Past
python3.6 -m venv venv # Current

Useful Python Libraries

Functional Programming

See functools.

If you need to pass around standard arithmetic operators, see the operator module.
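A few small examples of the two modules working together. The data is arbitrary.

```python
from functools import partial, reduce
import operator

# operator.mul is the function form of the * operator.
product = reduce(operator.mul, [1, 2, 3, 4], 1)

# partial fixes the first argument, giving a one-argument function.
double = partial(operator.mul, 2)

# itemgetter(1) builds a function that returns element 1 of its argument.
by_second = sorted([("b", 2), ("a", 3), ("c", 1)], key=operator.itemgetter(1))

print(product, double(21), by_second)
```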

Stats and Matrices

Numpy

Pandas

Pandas docs

Missing data is NaN. Ooops.

Pandas is basically a bit like data.frames or dplyr from R. It provides data structures, indexing, sorting, grouping.

You can broadcast simple operations like + over every element in a DataFrame or Series.

Data is mutable.

Basic stats and IO.

Sparse data structures available.

useful pandas snippets

  • Series

    A vector

  • DataFrame

    Like R's data.frame

    Has a .info() method (like str() from R, also shows memory usage)

  • Date and Time

    Timestamp: a datetime

    • to_timestamp

    Period: a datetime with a duration

    • PeriodIndex has a defined frequency
    • can't use binary operations on two Periods with different frequencies

    Timedelta: a duration

    • use resample to change the frequency
    • to_timedelta

    DateOffset: a frequency (a bit like a Timedelta, I suppose)

    • can be 'anchored', e.g. to day N of the month or day M of the week
    • isAnchored() tells you if it's anchored
    • can be added to and subtracted from Timestamps
    • can roll Timestamps forward and backward to the nearest
    • Indexing

      All the various indexes here support partial slicing by strings. For example, my_timeseries["2016"].

Scipy

scikit-learn

http://scikit-learn.org/stable/user_guide.html

A toolkit of basic machine learning algorithms.

Command Line

Try Argh. This is based on the built in argparse, but should be nicer.

Update: Argh doesn't appear to be maintained any more. Click is a good alternative.

Emacs Integration for Python

https://www.emacswiki.org/emacs/PythonProgrammingInEmacs

Python mode is built in. It doesn't play well with org-babel, however. In particular, you can't put any empty lines inside functions or classes.

Furthermore, if you set :results output on your source block, the output will be fairly nonsensical with lots of unnecessary prompts.

The ob-ipython package solves this problem. However, it also introduces a new problem by breaking table output.

Libraries and Packages

There are various tools to make emacs aware of virtualenv.

However, I prefer to use nix packages instead.

Python Packaging

A module is an anything.py file. You can import anything that's defined in it (functions, classes, variables, and so on).

A package is a directory which contains an __init__.py file. All the files inside it are its modules. It may also contain other directories as sub-packages, imported with a dotted syntax.

Python has a concept of the Python Path (sys.path). This includes the standard library and site-packages directories of the Python install itself, the directory containing the script which was run, and any directories listed in the PYTHONPATH environment variable.

You may import any module or package which is in a directory on the Python Path.
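A sketch showing the path at work: it writes a throwaway module into a temporary directory, adds that directory to sys.path, and imports it. The module name hypothetical_module is invented; in practice you would more likely set the environment variable or install the package.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Write a one-line module into a directory Python doesn't know about.
    with open(os.path.join(directory, "hypothetical_module.py"), "w") as handle:
        handle.write("GREETING = 'hello from the path'\n")

    # Once the directory is on sys.path, the module becomes importable.
    sys.path.append(directory)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("hypothetical_module")
        greeting = module.GREETING
    finally:
        sys.path.remove(directory)

print(greeting)
```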

The tool setuptools creates a distributable for a package. First you need to put a setup.py file inside the package. This defines which other packages should be included in the distributable and which external packages it depends on.

Setuptools also allows you to generate executables by specifying entry points within your packages.

Setuptools can produce a source distribution (sdist), which is portable across platforms.

python3 setup.py sdist

Setuptools can also produce a binary distribution (wheel), which is only fully portable if it and its dependencies are pure Python. Python does not deal with the complexities of compiling non-Python code for other platforms. Building a wheel needs the wheel package to be installed.

python3 setup.py bdist_wheel

Lastly, setuptools has a way for you to install your packages locally with a symlink, so that you can make changes to them and observe the effect on other Python programs on your system that use them.

python3 setup.py develop

I haven't been able to make the develop command work well with the Nix package manager.