Raymond Hettinger
Chief trainer for Mutable Minds. Certified Public Accountant. Python guru. Alloy and TLA⁺ enthusiast. Aspiring pianist. Former pilot. Born at 320 ppm CO₂.
6 Apr
#Python factlet: The dict.popitem() method is guaranteed to remove key/value pairs in LIFO order.

>>> d = dict(red=1, green=2, blue=3)
>>> d.popitem()
('blue', 3)
>>> d.popitem()
('green', 2)
>>> d.popitem()
('red', 1)

1/
In contrast, OrderedDict.popitem() supports both FIFO and LIFO extraction of key/value pairs.

>>> from collections import OrderedDict
>>> d = OrderedDict(red=1, green=2, blue=3)

>>> d.popitem(last=False) # FIFO
('red', 1)

>>> d.popitem() # LIFO
('blue', 3)

2/
OrderedDict can efficiently move entries to either end without a hash table update.

>>> d = OrderedDict(red=1, green=2, blue=3)

>>> d.move_to_end('green')
>>> list(d)
['red', 'blue', 'green']

>>> d.move_to_end('green', last=False)
>>> list(d)
['green', 'red', 'blue']

3/
3 Jan
#Python factlet: The len() function insists that the corresponding __len__() method return a value x such that:

0 ≤ x.__index__() ≤ sys.maxsize

* 3.0 and '3' don't have an __index__ method.
* -1 is too small.
* sys.maxsize+1 is too big.
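
Two hypothetical classes make the other failure modes concrete (the exception types are what CPython raises; the class names are made up):

import sys

class Fractional:
    def __len__(self):
        return 3.0              # float has no __index__ method

class Huge:
    def __len__(self):
        return sys.maxsize + 1  # has __index__, but exceeds sys.maxsize

len(Fractional())   # raises TypeError
len(Huge())         # raises OverflowError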

1/
You could call __len__() successfully, but the len() function fails:

class A:
    def __len__(self):
        return -1

>>> a = A()

>>> a.__len__()
-1

>>> len(a)
...
ValueError: __len__() should return >= 0

2/
Classes written in C typically implement __len__() with the mp_length or sq_length slot. That constrains them to sys.maxsize limits:

>>> r = range(10**100)
>>> r.__len__()
Traceback (most recent call last):
...
OverflowError: Python int too large to convert to C ssize_t

3/
2 Jan
@yera_ee Each way has its advantages.

With dataclasses, you get nice attribute access, error checking, a name for the aggregate data, and a more restrictive equality test. All good things.

Dicts are at the core of the language and are interoperable with many other tools: json, **kw, …
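
A small illustration of both sides (the Point class is hypothetical):

from dataclasses import dataclass, asdict
import json

@dataclass
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
p.y                    # attribute access
# Point(1.0) would raise TypeError: missing argument 'y'

d = asdict(p)          # {'x': 1.0, 'y': 2.0} -- back to a plain dict
json.dumps(d)          # dicts plug straight into json ...
Point(**d)             # ... and into **kw
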
@yera_ee Dicts have a rich assortment of methods and operators.
People learn to use dicts on their first day.
Many existing tools accept or return dicts.
pprint() knows how to handle dicts.
Dicts are super fast.
JSON.
Dicts underlie many other tools.
@yera_ee Embrace dataclasses but don't develop an aversion to dicts.

Python is a very dict-centric language.

Mentally rejecting dicts would be like developing an allergy to the language itself. It leads to fighting the language rather than working in harmony with it.
27 Dec 20
1/ #Python tip: Override the signature for *args with the __text_signature__ attribute:

import random

def randrange(*args):
    'Choose a random value from range(start[, stop[, step]]).'
    return random.choice(range(*args))

randrange.__text_signature__ = '(start, stop, step, /)'
2/ The attribute is accessed by the inspect module:

>>> import inspect
>>> inspect.signature(randrange)
<Signature (start, stop, step, /)>
3/ Tooltips and help() will now be more informative:

>>> help(randrange)
randrange(start, stop, step, /)
Choose a random value from range(start[, stop[, step]]).
25 Oct 20
1/ #Python tip: The functools.cache() decorator is astonishingly fast.

Even an empty function that returns None can be sped up by caching it. 🤨

docs.python.org/3/library/func…
2/ Here are the timings:

$ python3.9 -m timeit -r11 -s 'def s(x):pass' 's(5)'
5000000 loops, best of 11: 72.1 nsec per loop

$ python3.9 -m timeit -r11 -s 'from functools import cache' -s 'def s(x):pass' -s 'c=cache(s)' 'c(5)'
5000000 loops, best of 11: 60.6 nsec per loop
3/ Behind the scenes, there is a single dictionary lookup and a counter increment.

Some kinds of arguments will take longer to hash, but you get the idea: @cache has very little overhead.
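
A rough pure-Python sketch of the idea (not the actual CPython implementation, which keeps the hot path in C and also tracks hit/miss statistics):

def tiny_cache(func):
    memo = {}
    def wrapper(*args):
        # a cache hit costs a single dict lookup
        try:
            return memo[args]
        except KeyError:
            result = memo[args] = func(*args)
            return result
    return wrapper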
6 Sep 20
1/ #Python data science tip: To obtain a better estimate (on average) for a vector of multiple parameters, analyze the sample vectors in aggregate rather than estimating each component by its own mean.

Surprisingly, this works even if the components are unrelated to one another.
2/ One example comes from baseball.

Individual batting averages near the beginning of the season aren't as good a performance predictor as individual batting averages that have been “shrunk” toward the collective mean.

Shockingly, this also works for unrelated variables.
3/ If this seems unintuitive, then you're not alone. That is why it is called Stein's paradox 😉

The reason it works is that errors in one estimate tend to cancel the errors in the other estimates.

statweb.stanford.edu/~ckirby/brad/o…
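
A minimal sketch of the shrinkage idea, assuming every component has the same known variance and there are at least four components (the helper is hypothetical, not taken from the linked paper):

from statistics import mean

def shrink_toward_mean(xs, sigma2):
    # Positive-part James-Stein estimate: pull each component toward
    # the grand mean, shrinking more when the spread is small.
    k = len(xs)
    xbar = mean(xs)
    spread = sum((x - xbar) ** 2 for x in xs)
    c = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [xbar + c * (x - xbar) for x in xs]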
31 Aug 20
Another building block for a #Python floating point ninja toolset:

def veltkamp_split(x):
    'Exact split into two 26-bit precision components'
    t = x * 134217729.0
    hi = t - (t - x)
    lo = x - hi
    return hi, lo

csclub.uwaterloo.ca/~pbarfuss/dekk…
Input: one signed 53-bit precision float

Output: two signed 26-bit precision floats

Invariant: x == hi + lo

Constant: 134217729.0 == 2.0 ** 27 + 1.0
Example:

>>> from math import pi
>>> hi, lo = veltkamp_split(pi)
>>> hi + lo == pi
True
>>> hi.hex()
'0x1.921fb58000000p+1'
>>> lo.hex()
'-0x1.dde9740000000p-26'

Note all the trailing zeros and the difference between the two exponents. Also both the lo and hi values are signed.
9 Aug 20
#Python tip: #hypothesis is good at finding bugs; however, as often as not, the bug is in your understanding of what the code is supposed to do.

1/
Initial belief: The JSON module is buggy because #hypothesis finds cases that don't round-trip.

Bugs in understanding:
* The JSON spec doesn't have NaNs
* A JSON module feature is that lists and tuples both serialize into arrays but can't be distinguished when deserialized.
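
A sketch of the round-trip property that surfaces the tuple/list surprise (the strategy and test name are made up for illustration):

from hypothesis import given, strategies as st
import json

json_values = st.recursive(
    st.none() | st.booleans() | st.floats(allow_nan=False) | st.text(),
    lambda children: st.lists(children) | st.tuples(children),
)

@given(json_values)
def test_round_trip(value):
    assert json.loads(json.dumps(value)) == value

# hypothesis quickly reports a failing example such as the empty
# tuple: it serializes to [] and comes back as a list, not a tuple.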

2/
Initial belief: The colorsys module is buggy because #hypothesis finds conversions that don't round-trip.

Bugs in understanding:
* color gamuts are limited
* colors in one gamut may not be representable in another
* implementations are constrained by underlying specifications

3/
24 Jun 20
#Python tip: Given inexact data, subtracting nearly equal values increases relative error significantly more than absolute error.

4.6 ± 0.2 Age of Earth (4.3%)
4.2 ± 0.1 Age of Oceans (2.4%)
___
0.4 ± 0.3 Huge relative error (75%)

This is called “catastrophic cancellation”.

1/
The subtractive cancellation issue commonly arises in floating point arithmetic. Even if the inputs are exact, intermediate values may not be exactly representable and will have an error bar. Subsequent operations can amplify the error.

1/7th is inexact but within ± ½ ulp.
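
For example, the sum 1 + 1e-15 has to round to the nearest float near 1.0, and a single subtraction then exposes that rounding as a roughly 11% relative error:

>>> (1 + 1e-15) - 1     # exact answer would be 1e-15
1.1102230246251565e-15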

2/
A commonly given example arises when estimating derivatives with f′(x) ≈ Δf(x) / Δx.

Intuitively, the estimate improves as Δx approaches zero, but in practice, the large relative error from the deltas can overwhelm the result.

Make Δx small, but not too small. 😟
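
A small sketch of both regimes (the choice of f(x) = sin(x) at x = 1.0 is arbitrary):

from math import sin, cos

def forward_diff(f, x, h):
    # f'(x) ≈ (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

exact = cos(1.0)                      # true derivative of sin at 1.0
for h in (1e-4, 1e-8, 1e-12):
    err = abs(forward_diff(sin, 1.0, h) - exact)
    print(f'{h:.0e}  error = {err:.1e}')

# The error shrinks as h shrinks until roughly h ≈ √(machine epsilon),
# then cancellation in f(x + h) - f(x) starts to dominate and the
# estimate gets worse again.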

3/
19 Jun 19
#python 3.8 Good news for anyone working on number theory problems.

The three-argument form of pow() just got more powerful. When the exponent is -1, it computes modular multiplicative inverses.

>>> pow(38, -1, 137)
119
>>> 119 * 38 % 137
1
RSA key generation example:

prime1 = 865035927998844907
prime2 = 13228623409150767103
totient = (prime1 - 1) * (prime2 - 1)
private = 9262355554452364883609426718195904769
public = pow(private, -1, totient)
assert public * private % totient == 1
For the curious, the private key can be any random integer smaller than and relatively prime to the totient:

import math
from random import randrange

while True:
    private = randrange(totient)
    if math.gcd(private, totient) == 1:
        break