Performance and Alternatives
intspan piggybacks on Python’s set type, and intspanlist piggybacks on list, so both store every integer individually. Unlike Perl’s Set::IntSpan, these types are not optimized for long contiguous runs. For sets of several hundred or even a few thousand members, you’ll probably never notice the difference.
But if you’re doing extensive processing of large sets (e.g. with 100K, 1M, or more elements), or doing numerous set operations on them (e.g. union or intersection), a data structure based on lists of ranges, run length encoding, or Judy arrays might perform and scale better. Horses for courses.
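To make the trade-off concrete, here is a minimal sketch of the "list of ranges" idea. It is not part of intspan or of any module listed below; the helper names to_ranges, contains, and union are hypothetical and exist only for illustration. Each contiguous run is stored as one inclusive (start, stop) pair, so memory depends on the number of runs rather than the number of members.

```python
from bisect import bisect_right

def to_ranges(values):
    """Collapse integers into sorted, inclusive (start, stop) runs."""
    runs = []
    for v in sorted(set(values)):
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)   # extend the current run
        else:
            runs.append((v, v))           # start a new run
    return runs

def contains(runs, value):
    """Membership test by binary search over the run start points."""
    i = bisect_right(runs, (value, float("inf"))) - 1
    return i >= 0 and runs[i][0] <= value <= runs[i][1]

def union(a, b):
    """Union of two run lists; merges overlapping or adjacent runs."""
    merged = []
    for start, stop in sorted(a + b):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged
```

With this representation, to_ranges(range(1, 1_000_001)) is a single pair, whereas a set holds a million separate entries. The flip side is that a set with few contiguous runs gains nothing, and membership tests cost a binary search instead of a hash lookup.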
There are several modules you might want to consider as alternatives or supplements. AFAIK, none of them provides the convenient integer span specification intspan does, but they have other virtues:
- cowboy provides generalized ranges and multi-ranges. Bonus points for the tagline: “It works on ranges”
- spans provides several different kinds of ranges and then sets for those ranges. Includes nice datetime based intervals similar to PostgreSQL time intervals, and float ranges/sets. More ambitious and general than intspan, but lacks truly convenient input or output methods akin to intspan.
- ranger is a generalized range and range set module. It supports open and closed ranges, and includes mapping objects that attach one or more objects to range sets.
- rangeset is a generalized range set module. It also supports infinite ranges.
- judy is a Python wrapper around Judy arrays, which are implemented in C. No docs or tests to speak of.
- RoaringBitmap, a hybrid array and bitmap structure designed for efficient compression and fast operations on sets of 32-bit integers.