Python Catch-Up
Useful notes on modern developments in Python.
Strings
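UTF-8 is now the default source-file encoding. Since Python 3.6 it is also the default console and filesystem encoding on Windows, as it already was on most Linux systems.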
Python 3.6 added formatted string literals (f-strings), which interpolate expressions.
The format specification after the colon can include width, precision, and underscore grouping, amongst others.
Annoyingly, underscores are included in width.
f"This is a formatted string, which can contain arbitrary expressions: {2**16:011_}, {233/3:.4}"
'This is a formatted string, which can contain arbitrary expressions: 000_065_536, 77.67'
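A few more format specifications, as a small sketch with arbitrary values. The zero-padded case shows the width counting the underscores: the result is 11 characters long, but only 9 of them are digits.

```python
value = 2**16  # 65536

grouped = f"{value:_}"      # underscore grouping only: '65_536'
padded = f"{value:011_}"    # zero-padded to width 11, underscores included: '000_065_536'
approx = f"{233/3:.4}"      # 4 significant digits: '77.67'
fixed = f"{3.14159:8.3f}"   # 3 decimal places, right-aligned in 8 characters: '   3.142'

print(grouped, padded, approx, fixed, sep="\n")
print(len(padded), sum(ch.isdigit() for ch in padded))  # 11 9
```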
Numbers
You can put single underscores between the digits of number literals (Python 3.6+). They have no effect, and are just for readability.
1_000_000 # One miillllllion dollars
1000000
There is a math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) function for comparing floating point numbers.
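A short sketch of why the function is needed, and of its two keyword-only tolerances. The values are arbitrary. Note that the default relative tolerance can never match anything against exactly zero, which is what abs_tol is for.

```python
import math

a = 0.1 + 0.2
print(a == 0.3)                                 # False: binary floating point rounding
print(math.isclose(a, 0.3))                     # True with the default rel_tol=1e-09
# The relative tolerance scales with the inputs, so it rejects anything near zero...
print(math.isclose(1e-12, 0.0))                 # False
# ...which is what the absolute tolerance is for.
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))   # True
```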
Dates and Times
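Python 3.7 added nanosecond versions of the clock functions, such as time.time_ns() and time.perf_counter_ns(). They return integers, so they avoid the precision loss of float seconds. These work on Linux and Windows.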
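A minimal timing sketch using time.perf_counter_ns(). Summing a range is just an arbitrary workload to time.

```python
import time

start = time.perf_counter_ns()
total = sum(range(1_000_000))
elapsed_ns = time.perf_counter_ns() - start  # an int, so no float rounding

print(f"sum = {total:_}, took {elapsed_ns:_} ns")
```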
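There is now a concept of a fold: datetime and time objects have a fold attribute (Python 3.6). It distinguishes the two occurrences of the same wall-clock time when the clocks go back.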
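A sketch of fold in action. To stay self-contained it uses a made-up ToyZone tzinfo (hypothetical, not a real time zone) in which the clocks go back from 02:00 to 01:00 on 31 October 2021, so every wall time between 01:00 and 02:00 happens twice. For real zones you would use zoneinfo.ZoneInfo (Python 3.9+) instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone: UTC+1 until the clocks go back (02:00 -> 01:00 local)
    on 2021-10-31, UTC+0 afterwards."""

    CLOCKS_GO_BACK = datetime(2021, 10, 31, 2, 0)  # local wall time

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None, fold=0)
        start_of_repeat = self.CLOCKS_GO_BACK - timedelta(hours=1)
        repeated_hour = start_of_repeat <= wall < self.CLOCKS_GO_BACK
        if wall < start_of_repeat or (repeated_hour and dt.fold == 0):
            return timedelta(hours=1)  # before the change, or first pass through 01:xx
        return timedelta(0)            # after the change, or second pass (fold=1)

    def dst(self, dt):
        return timedelta(0)  # simplification: DST is not modelled separately

    def tzname(self, dt):
        return "TOY"


first = datetime(2021, 10, 31, 1, 30, tzinfo=ToyZone())  # fold=0 by default
second = first.replace(fold=1)                           # same wall time, second occurrence

print(first.astimezone(timezone.utc).time())   # 00:30:00
print(second.astimezone(timezone.utc).time())  # 01:30:00
```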
Tuples
Python 2 allowed tuple unpacking in function parameters, as in def fn((a, b)):.
This was removed in Python 3. But if you see something weird that looks like that, then now you know why.
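The Python 3 way is to take the tuple as one parameter and unpack it in the body. A minimal sketch, reusing the fn name from above:

```python
# Python 2 only (a SyntaxError in Python 3):
# def fn((a, b)):
#     return a + b

def fn(pair):
    a, b = pair  # unpack explicitly in the body
    return a + b


points = [(1, 2), (3, 4)]
print([fn(p) for p in points])     # [3, 7]
print([a + b for a, b in points])  # unpacking in a for clause still works: [3, 7]
```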
Annotations
You can attach arbitrary expressions as annotations to variable names, and to function arguments and return values.
from sys import modules

annotated_variable: "This variable name has an annotation" = 5 / 3
argument_b = "argument b"

def annotated_function(a: "argument a", b: argument_b = 8) -> "return value annotation":
    return b

[
    ## Variable annotations are stored on the current module.
    modules[__name__].__annotations__,
    ## Function annotations are stored on the function object.
    annotated_function.__annotations__
]
[{'annotated_variable': 'This variable name has an annotation'}, {'a': 'argument a', 'b': 'argument b', 'return': 'return value annotation'}]
These show up in documentation generated using pydoc.
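Lazily evaluated annotations were originally planned to become the default in a future release (PEP 563). Python 3.14 did make annotations lazy by default, through a different mechanism (PEP 649).
Since Python 3.7 you can opt in to postponed evaluation, which stores annotations as strings: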
from __future__ import annotations
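With the import in place, annotations are kept as strings and not evaluated at definition time, so they can mention names that don't exist yet. A small sketch with made-up greet and Greeting names; the future import has to be the first statement in the module.

```python
from __future__ import annotations


def greet(name: str) -> Greeting:  # Greeting isn't defined yet; fine, it isn't evaluated
    return Greeting(name)


class Greeting:
    def __init__(self, name: str) -> None:
        self.name = name


print(greet.__annotations__)  # {'name': 'str', 'return': 'Greeting'}
print(greet("world").name)    # world
```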
Annotations are mostly used for type hints now. They might be useful for other things, but you should think carefully first.
Type Hints
https://docs.python.org/3.8/library/typing.html
This is still subject to change. It's based on annotations (see above). Annotate your function signatures and return values with your types.
Use the mypy static type checker to verify your program. Type hints do nothing at run-time.
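A tiny sketch of that last point, with a made-up double function: the second call violates the hints but still runs, because nothing checks them at run time.

```python
def double(x: int) -> int:
    return x * 2


print(double(21))    # 42
# The hints aren't enforced, so this runs and prints abab.
# A static checker such as mypy would report the str argument as an error.
print(double("ab"))
```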
You can create type aliases by combining types and assigning the result to a variable. Alternatively, NewType('NewTypeName', OldType) creates a distinct type: the checker will reject a plain OldType value where a NewTypeName is expected.
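A sketch of both approaches. Scores, UserId and best_player are illustrative names, not part of typing. At run time the NewType is just the underlying int; the distinction only exists for the type checker.

```python
from typing import Dict, List, NewType

# A type alias: just another name for the combined type.
Scores = Dict[str, List[int]]

# A NewType: a distinct type to the checker, but a plain int at run time.
UserId = NewType("UserId", int)


def best_player(scores: Scores) -> str:
    """Return the name with the highest single score."""
    return max(scores, key=lambda name: max(scores[name]))


uid = UserId(42)
print(best_player({"ann": [3, 9], "bob": [7, 8]}))  # ann
print(uid, type(uid))                                # 42 <class 'int'>
```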
When creating your own classes, you can inherit from the types in the typing package. You might want to use multiple inheritance for this.
In type hints, None is accepted as shorthand for type(None), so you can write None directly, for example -> None or Union[int, None].
Important built-in types inside the typing package are:
- Any: anything can go anywhere. Use it to hack around the type system if it is being difficult. Consider whether you should be using object instead.
- Union[A,B,C]: accepts one of the specified types.
- Optional[A]: alias for Union[A,None].
- Intersection[A,B,C]: not implemented. The type would have to be a subtype of all of the listed types.
- Tuple[A,B,C]: requires all the types in the specified order.
- Callable[[A,B,C],X] or Callable[...,X]: a function taking arguments of types A, B and C (or any arguments) and returning X.
- Type[C]: accepts the class object C (or a subclass of it), rather than an instance.
- Generic[A,B,C]: takes type parameters when you specialize (subclass) it. You can fill in some, none or all of the type parameters each time you subclass.
- TypeVar('SomeGenericParameter'): defines a type parameter, to be used by Generic. Can be marked as covariant or contravariant. Can be restricted to an upper bound or to a list of allowed types.
- @typing.no_type_check: a decorator which turns off type checks for this function or class.
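In addition to this, it also includes generic versions of most of the built-in types, such as List and Dict. Since Python 3.9 you can also subscript the built-in types directly, e.g. list[int].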
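A sketch combining TypeVar, Generic, Optional and Callable. Stack and apply_to_each are illustrative names, not part of typing.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")  # a type parameter


class Stack(Generic[T]):
    """A minimal generic stack; T is filled in where the stack is used."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> Optional[T]:
        """Return the top item, or None if the stack is empty."""
        return self._items.pop() if self._items else None


def apply_to_each(items: List[T], fn: Callable[[T], T]) -> List[T]:
    return [fn(item) for item in items]


ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)
print(ints.pop(), ints.pop(), ints.pop())          # 2 1 None
print(apply_to_each([1, 2, 3], lambda n: n * 10))  # [10, 20, 30]
```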
There's also a typed version of namedtuple, typing.NamedTuple:
from typing import NamedTuple

class TestIt(NamedTuple):
    description: str
    x: int

TestIt("hello", 5)
TestIt(description='hello', x=5)
Sometimes we can't refer to a type because it isn't defined yet, for example when a class refers to itself in its own methods. This is called a forward reference. We can write the type's name as a string instead.
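A sketch with a made-up linked-list Node class. The string annotation stays a string on the function, and typing.get_type_hints resolves it to the real class when you need it.

```python
from typing import Optional, get_type_hints


class Node:
    """A linked-list node whose annotations refer to Node itself."""

    def __init__(self, value: int, next_node: Optional["Node"] = None) -> None:
        self.value = value
        self.next_node = next_node

    def prepend(self, value: int) -> "Node":
        return Node(value, self)


head = Node(3).prepend(2).prepend(1)
print(head.value, head.next_node.value, head.next_node.next_node.value)  # 1 2 3
print(Node.prepend.__annotations__["return"])  # Node (still just a string)
print(get_type_hints(Node.prepend)["return"])  # the Node class itself
```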
Destructuring
Python calls this unpacking. * unpacks lists and other iterables, and ** unpacks dictionaries.
Lists can be unpacked for assignment, inside other list literals, and in function calls:
first, _, *third_onwards = [1, 2, 3, 5, 4]
print("array unpacked using assignment", third_onwards)
print(
    "array constructed by unpacking other arrays",
    [*[1, 2, 3, 4, 5], 6, 7, 8, *[9, 10]]
)
print(
    "function call positional parameters filled by unpacking array",
    *[1, 2, 3, 4, 5]
)

def varargs(first, *rest):
    return rest

print("use of varargs", varargs(1, 2, 3, 4, 5))

array unpacked using assignment [3, 5, 4]
array constructed by unpacking other arrays [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
function call positional parameters filled by unpacking array 1 2 3 4 5
use of varargs (2, 3, 4, 5)
Using the * syntax on a dictionary gives you its keys.
The ** syntax for dictionaries doesn't work on the left-hand side of an assignment, but it works everywhere else.
a, _, *other_keys = {"a": 1, "b": 2, "c": 3, "d": 4}
print("dictionary keys unpacked using assignment", other_keys)
## You can't use the ** syntax in assignments.
## a, _, **other_keys = {"a": 1, "b": 2, "c": 3, "d": 4}
print(
    "dictionary constructed by unpacking other dictionaries",
    {"a": 1, **{"b": 2, "c": 3}, **{"d": 4, "e": 5, "f": 6}}
)

def f(a, b, c):
    return (a, b, c)

print(
    "keyword arguments met by unpacking dictionary",
    f(**{"a": 1, "b": 2, "c": 3})
)

def kw_varargs(a, **kwargs):
    return kwargs

print(
    "keyword varargs function call",
    kw_varargs(**{"a": 1, "b": 2, "c": 3})
)

dictionary keys unpacked using assignment ['c', 'd']
dictionary constructed by unpacking other dictionaries {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
keyword arguments met by unpacking dictionary (1, 2, 3)
keyword varargs function call {'b': 2, 'c': 3}
Bare * Trick
Any parameters after a varargs *rest parameter can only be passed as keyword arguments.
If you don't want to accept varargs arguments, but you do want to force some to be keyword arguments, then you can just use a bare star.
def f(*, x):
    return x

f(x=0)
0
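A sketch of the mistake the bare star guards against. connect and its parameters are hypothetical: host may be passed positionally, but timeout and retries must be named.

```python
def connect(host, *, timeout=10, retries=3):
    """Hypothetical function: host may be positional, the rest must be keywords."""
    return (host, timeout, retries)


print(connect("example.org", timeout=5))  # ('example.org', 5, 3)

try:
    connect("example.org", 5)  # timeout passed positionally
    rejected = False
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```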
Files
It's worth looking at the built-in glob module, which finds file paths matching shell-style wildcard patterns.
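A sketch of glob.glob on a small throwaway directory tree (the file names are made up). A plain * stays within one directory, while ** with recursive=True also descends into subdirectories.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small, throwaway directory tree to search.
    for name in ["a.py", "b.py", "notes.txt", os.path.join("pkg", "c.py")]:
        path = os.path.join(root, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    # "*" matches within a single directory.
    top_level = sorted(
        os.path.relpath(p, root) for p in glob.glob(os.path.join(root, "*.py"))
    )
    # "**" with recursive=True also matches any number of subdirectories (including none).
    everywhere = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(root, "**", "*.py"), recursive=True)
    )

print(top_level)   # ['a.py', 'b.py']
print(everywhere)  # ['a.py', 'b.py', 'pkg/c.py'] (with a backslash on Windows)
```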
Generator Delegation
A nice yield from syntax was introduced in Python 3.3, for delegating to another generator or iterable.
def generator_a():
    yield 1
    yield 2
    yield 3

def generator_b():
    yield "head"
    yield from generator_a()
    yield from range(-3, 0)
    yield "tail"

[thing for thing in generator_b()]
['head', 1, 2, 3, -3, -2, -1, 'tail']
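yield from does more than loop: values passed to send() go straight through to the inner generator, and the yield from expression evaluates to the inner generator's return value. A small sketch with made-up averager and collect generators:

```python
def averager():
    """Receives numbers via send(); returns their mean once it is sent None."""
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:
            return total / count if count else 0.0
        total += value
        count += 1


def collect(results):
    """Delegates to averager(): send() calls pass straight through, and
    yield from evaluates to averager's return value."""
    mean = yield from averager()
    results.append(mean)


results = []
gen = collect(results)
next(gen)  # run up to the first yield inside averager()
for number in [10, 20, 30]:
    gen.send(number)
try:
    gen.send(None)  # averager returns, then collect finishes
except StopIteration:
    pass
print(results)  # [20.0]
```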
Unittest
Update: use pytest instead. It's much better.
https://docs.python.org/3/library/unittest.html
It's based on JUnit.
You inherit from unittest.TestCase, and give it methods named like def test_thing(self):.
Inside those methods, the self object has assertion methods available, such as assertEqual and assertTrue.
If we implement setUp(self) or tearDown(self) methods, then they are called automatically before and after each test method.
We run our tests using python3 -m unittest my_tests.py. The file must be importable as a module, so use underscores rather than hyphens in its name.
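A minimal test case sketch, assuming it is saved as a hypothetical my_tests.py and run with the command above. setUp gives each test method a fresh list.

```python
import unittest


class TestListStack(unittest.TestCase):
    def setUp(self):
        # Called before every test method, so each test gets a fresh list.
        self.stack = [1, 2]

    def test_push(self):
        self.stack.append(3)
        self.assertEqual(self.stack, [1, 2, 3])

    def test_pop(self):
        self.assertEqual(self.stack.pop(), 2)
        self.assertEqual(self.stack, [1])
```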
Subtests
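Sometimes we want to run the same check over several inputs as separate little tests, so that one failure doesn't hide the others. We can do this using the subTest context manager and the with keyword:
for thing in ["whatever", "something else"]:
    with self.subTest(thing=thing):
        self.assertTrue(computation(thing))  # computation() stands in for the code under test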
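A complete, runnable version of the pattern, with a made-up check on squares. If one value failed, the report would name that value, and the loop would carry on with the rest.

```python
import unittest


class TestSquares(unittest.TestCase):
    def test_even_numbers_have_even_squares(self):
        for n in [0, 2, 4, 6]:
            # Each n is reported separately if it fails,
            # and the loop keeps going after a failure.
            with self.subTest(n=n):
                self.assertEqual((n * n) % 2, 0)
```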
Debugging
https://docs.python.org/3/library/pdb.html
The debugger has an API, some Python command line options, and some interactive commands.
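In particular, you can call the built-in breakpoint() function (Python 3.7+) to enter debug mode at that point in your program. You can also start a whole script under the debugger:
python3 -m pdb some-python.py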
Once you're in debug mode, press 'h' to get help on available commands.
TODO Asynchrony
asyncio coroutines https://www.python.org/dev/peps/pep-0492/ https://www.python.org/dev/peps/pep-0530/ https://docs.python.org/3/library/asyncio.html
An async def function that contains yield is an asynchronous generator (Python 3.6). You consume it with async for, inside a coroutine.
Comprehensions and generator expressions can also use async for and await (PEP 530).
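A sketch of an asynchronous generator being consumed both ways. ticker is a made-up name, and the short sleep just stands in for real asynchronous work.

```python
import asyncio


async def ticker(n, delay=0.01):
    """An asynchronous generator: an async def function containing yield."""
    for i in range(n):
        await asyncio.sleep(delay)
        yield i


async def main():
    seen = []
    async for i in ticker(3):  # async for drives the async generator
        seen.append(i)
    squares = [i * i async for i in ticker(4)]  # an asynchronous comprehension
    return seen, squares


print(asyncio.run(main()))  # ([0, 1, 2], [0, 1, 4, 9])
```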
Metaprogramming and reflection
See metaclasses.
Classes have an __init_subclass__() class method (Python 3.6), which is called on the parent class each time a subclass of it is defined.
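A common use is a plugin registry. In this sketch Plugin, CsvExporter and JsonExporter are made-up names; keyword arguments in the class statement are passed on to __init_subclass__.

```python
class Plugin:
    """Hypothetical base class that keeps a registry of its subclasses."""

    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        # Runs once per subclass, when its class statement is executed.
        super().__init_subclass__(**kwargs)
        Plugin.registry[name or cls.__name__.lower()] = cls


class CsvExporter(Plugin, name="csv"):  # name="csv" goes to __init_subclass__
    pass


class JsonExporter(Plugin):
    pass


print(sorted(Plugin.registry))  # ['csv', 'jsonexporter']
```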
There is an inspect.signature(some_function) function. It understands parameter kinds, default values and annotations.
from inspect import signature

def fn(a, b, c, *args, **kwargs):
    return None

str(signature(fn))
'(a, b, c, *args, **kwargs)'
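A slightly richer sketch, with a hypothetical fetch function used only for its signature. It shows the annotations and parameter kinds, and Signature.bind, which checks arguments against the signature without calling the function.

```python
from inspect import Parameter, signature


def fetch(url: str, *, timeout: float = 5.0, **headers) -> bytes:
    """Hypothetical function, used only for its signature."""
    raise NotImplementedError


sig = signature(fetch)
print(sig)  # (url: str, *, timeout: float = 5.0, **headers) -> bytes
print(sig.parameters["timeout"].kind is Parameter.KEYWORD_ONLY)  # True
print(sig.return_annotation)  # <class 'bytes'>

# bind() maps arguments to parameters, raising TypeError if they don't fit.
bound = sig.bind("https://example.org", timeout=1.0, accept="text/html")
print(bound.arguments)
```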
zipapp
zipapp is a built-in module (Python 3.5+) which packages a directory of Python code into a single .pyz archive for easy distribution.
python3 -m zipapp source-directory -m "initial_module:main_function" -o my-application.pyz
python3 my-application.pyz
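The same thing can be done from Python with zipapp.create_archive. This sketch builds a one-module program (using the placeholder names from the command above) in a temporary directory, archives it, and runs the archive with the current interpreter.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # A one-module program in source-directory/ (placeholder names).
    source = pathlib.Path(tmp, "source-directory")
    source.mkdir()
    (source / "initial_module.py").write_text(
        "def main_function():\n    print('hello from the archive')\n"
    )

    # Like: python3 -m zipapp source-directory -m "initial_module:main_function" -o my-application.pyz
    target = pathlib.Path(tmp, "my-application.pyz")
    zipapp.create_archive(source, target, main="initial_module:main_function")

    # Like: python3 my-application.pyz
    run = subprocess.run(
        [sys.executable, str(target)], capture_output=True, text=True, check=True
    )

output = run.stdout.strip()
print(output)  # hello from the archive
```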
VirtualEnv
VirtualEnv is an obsolete way to create an isolated Python environment with its own installed packages. This helps when different projects have different dependencies.
Python 3.3 started shipping the pyvenv script instead (it was deprecated in 3.6 and removed in 3.8).
Now it's built in as the venv module, which you run from the command line with python3 -m venv.
(I generally prefer to use Nix to create my environments instead.)
pip install virtualenv && virtualenv venv  # Distant past
pyvenv venv                                # Past
python3 -m venv venv                       # Current
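The venv module can also be driven from Python. This sketch creates a throwaway environment without pip (to keep it fast and avoid needing ensurepip), then asks the environment's own interpreter whether it is inside a virtual environment, which is the case when sys.prefix differs from sys.base_prefix.

```python
import os
import pathlib
import subprocess
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = pathlib.Path(tmp, "venv")
    # Symlink the interpreter on POSIX, as the python3 -m venv command does by default.
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)

    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python = env_dir / bin_dir / ("python.exe" if os.name == "nt" else "python")

    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    check = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )

inside_venv = check.stdout.strip()
print(inside_venv)  # True
```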
Useful Python Libraries
Functional Programming
See functools.
If you need to pass around standard arithmetic operators, see the operator module.
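A small sketch of the two modules together, with arbitrary data: operator supplies the arithmetic and item access as ordinary functions, and functools combines or specializes them.

```python
import operator
from functools import partial, reduce

numbers = [1, 2, 3, 4]

product = reduce(operator.mul, numbers, 1)  # 1 * 1 * 2 * 3 * 4 = 24
double = partial(operator.mul, 2)           # a one-argument "multiply by 2"
doubled = list(map(double, numbers))        # [2, 4, 6, 8]

pairs = [("b", 2), ("a", 3), ("c", 1)]
by_value = sorted(pairs, key=operator.itemgetter(1))  # [('c', 1), ('b', 2), ('a', 3)]

print(product, doubled, by_value)
```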
Stats and Matrices
Numpy
Pandas
Missing data is NaN. Ooops.
Pandas is basically a bit like data.frames or dplyr from R. It provides data structures, indexing, sorting, grouping.
You can broadcast simple operations like + over every element in a DataFrame or Series.
Data is mutable.
Basic stats and IO.
Sparse data structures available.
- Series: a vector.
- DataFrame: like R's data.frame.
  - Has a .info() method (like str() from R; it also shows memory usage).
- Date and Time
  - Timestamp: a datetime.
    - to_timestamp
  - Period: a datetime with a duration.
    - PeriodIndex has a defined frequency.
    - You can't use binary operations on two Periods with different frequencies.
  - Timedelta: a duration.
    - Use resample to change the frequency.
    - to_timedelta
  - DateOffset: a frequency (a bit like a Timedelta, I suppose).
    - Can be 'anchored', e.g. to day N of the month or day M of the week. isAnchored() tells you if it's anchored.
    - Can be added to and subtracted from Timestamps.
    - Can roll Timestamps forward and backward to the nearest
Scipy
scikit-learn
http://scikit-learn.org/stable/user_guide.html
A toolkit of basic machine learning algorithms.
Emacs Integration for Python
https://www.emacswiki.org/emacs/PythonProgrammingInEmacs
Python mode is built in. It doesn't play well with org-babel, however. In particular, you can't put any empty lines inside functions or classes.
Furthermore, if you set :results output on your source block, the output will be fairly nonsensical, with lots of unnecessary prompts.
The ob-ipython package solves this problem. However, it also introduces a new problem by breaking table output.
Libraries and Packages
There are various tools to make Emacs aware of virtualenv.
However, I prefer to use nix packages instead.
Python Packaging
A module is an anything.py file. You can import anything that's defined in it (functions, classes, variables, and so on).
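A package is a directory which contains an __init__.py file. All the .py files inside it are its modules. It may also contain other directories as sub-packages, imported with a dotted syntax (package.subpackage.module). Since Python 3.3, a directory without an __init__.py can also be imported, as a namespace package.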
Python has a concept of the Python Path (sys.path). This includes the standard library and site-packages (dist-packages on Debian) folders of the Python install itself, the directory containing the script that was run, and any directories listed in the PYTHONPATH environment variable.
You may import any module or package which is in a directory on the Python Path.
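A sketch of both ideas at once: it writes a made-up demo_shapes package (an __init__.py plus a circle module) into a temporary directory, puts that directory on sys.path, and imports the module with the dotted syntax. Setting PYTHONPATH before starting Python would add the directory to sys.path in the same way.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout:
    #   root/demo_shapes/__init__.py   <- makes demo_shapes a package
    #   root/demo_shapes/circle.py     <- a module inside it
    package = pathlib.Path(root, "demo_shapes")
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "circle.py").write_text(
        "import math\n\n\ndef area(r):\n    return math.pi * r * r\n"
    )

    sys.path.insert(0, root)       # put root on the Python Path for this process
    importlib.invalidate_caches()  # the files were created after start-up
    try:
        circle = importlib.import_module("demo_shapes.circle")  # like: import demo_shapes.circle
        unit_area = circle.area(1.0)
    finally:
        sys.path.remove(root)

print(unit_area)  # 3.141592653589793
```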
The tool setuptools creates a distributable for a package. First you need to put a setup.py file next to the package, at the root of the project. This defines which packages should be included in the distributable and which external packages it depends on.
Setuptools also allows you to generate executables by specifying entry points within your packages.
Setuptools can produce a source distribution (sdist), which is portable across platforms.
python3 setup.py sdist
Setuptools can also produce a binary distribution (wheel), which is only fully portable if it and its dependencies are pure Python. Python does not deal with the complexities of compiling non-Python code to other platforms.
python3 setup.py bdist_wheel
Lastly, setuptools has a way for you to install your packages locally in development mode, linking to your source rather than copying it, so that you can make changes to them and observe the effect on other Python programs on your system that use them.
python3 setup.py develop
I haven't been able to make the develop command work well with the Nix package manager.