B. Performance expectations

Psyco can compile code that uses arbitrary object types and extension modules. Operations that it does not know about will be compiled into direct calls to the C code that implements them. However, some specific operations can be optimized, and sometimes massively so -- this is the core idea around which Psyco is built, and the reason for the sometimes impressive results.

The other reason for the performance improvement is that the machine code does not have to decode the pseudo-code ("bytecode") over and over again while interpreting it. Removing this overhead is what compilers classically do. They also simplify the frame objects, making function calls more efficient. So does Psyco. But doing only this would be ineffective with Python, because each bytecode instruction still has a lot of run-time work to do (typically, looking up the type of the arguments in tables, invoking the corresponding operation and building a resulting Python object).
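As a rough illustration of the two costs described above, the following sketch lists the bytecode of a trivial function and then calls it with different argument types. It assumes a modern CPython 3 interpreter with the standard `dis` module, not the Python 2.2 interpreter this appendix was written for; opcode names differ between versions, and the `add` function is purely hypothetical.

```python
import dis


def add(a, b):
    # A single "+" in the source becomes one binary-operation opcode.
    return a + b


# Opcode names the interpreter decodes each time add() runs.
# The exact names depend on the CPython version in use.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# The same opcode serves every argument type: at run time the interpreter
# looks up the operand types, dispatches to the matching implementation
# and builds a new result object.
int_result = add(1, 2)        # integer addition
float_result = add(1.5, 2.0)  # float addition
str_result = add("ab", "cd")  # string concatenation
print(int_result, float_result, str_result)
```

Removing only the decoding loop would leave the per-call type lookup and result allocation in place, which is the point the paragraph above makes.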

The type-based look-ups and the successive construction and destruction of objects for all intermediate values are what Psyco can most successfully eliminate, but it needs to be taught about a type and its operations before it can do so.

We list below the specifically optimized types and operations. Possible performance gains are just wild guesses; specialization is known to give often-good-but-hard-to-predict gains. Remember, all operations not listed below work well -- they just cannot be much accelerated.

Virtual-time objects are objects that, when used as intermediate values, are simply not built at run-time at all. The noted performance gains only apply if the object can actually remain virtualized. Any unsupported operation will force the involved objects to be built normally.
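The sketch below shows the kind of code in which intermediate values are candidates for virtualization: a loop whose temporaries are plain integers combined only with `*` and `+`. The function name `sum_of_squares` is hypothetical, and whether a given Psyco version really keeps every value virtual is an assumption, not something the code can verify. The `psyco.bind()` call is guarded so the example still runs, uncompiled, where Psyco is not installed.

```python
def sum_of_squares(n):
    total = 0
    for i in range(n):
        sq = i * i          # intermediate integer, never stored anywhere
        total = total + sq  # another intermediate integer
    return total


try:
    # Psyco only exists for old CPython 2.x on x86; elsewhere the import
    # fails and the function simply runs in the normal interpreter.
    import psyco
    psyco.bind(sum_of_squares)
except ImportError:
    pass

result = sum_of_squares(10)
print(result)
```

Under the normal interpreter, each `i * i` and each `total + sq` allocates a fresh integer object; those allocations are what a virtual-time integer avoids.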

Type	Operations	Notes
Any built-in type	reading members and methods	(1) (py2.2)
Built-in function and method	call	(1) (py2.2)
Integer	truth testing, unary `+` `-` `~` `abs()`, binary `+` `-` `*` `|` `&` `<<` `>>` `^`, comparison	(2)
Dictionary	`len()`	(4)
Float	truth testing, unary `+` `-` `abs()`, binary `+` `-` `*` `/`, comparison	(5)
Function	call	(6)
Sequence iterators	`iter()` and `next()`	(7)
List	`len()`, item get and set, concatenation	(8)
Long	all arithmetic operations	(9)
Instance method	call	(1) (py2.2)
String	`len()`, item get, slicing, concatenation	(10)
Tuple	`len()`, item get, concatenation	(11)
Type	call	(py2.2)
array.array	item get, item set	(15)

Built-in function	Notes
`range`	(8)
`xrange`	(13)
`chr`, `ord`	(10)
`id`
`type`	(py2.2)
`len`, `abs`, `divmod`
`apply`	(14)
the whole `math` module	(16)

Notes:

(py2.2): Python 2.2 only.
(1): In the common "object.method(args)" the intermediate bound method object is never built; it is translated into a direct call to the function that implements the method. For C methods, the underlying PyMethodDef structure is decoded at compile-time. Algorithms doing repetitive calls to methods of e.g. lists or strings can see huge benefits.
(2): Virtual-time integers can be 100 times faster than their regular counterparts.
(4): Complex data structures are not optimized yet, beyond (1). In a future version it is planned to allow these structures to be re-implemented differently by Psyco, with an implementation that depends on actual run-time usage.
(5): Psyco does not know about the Intel FPU instruction set. It emits calls to C functions that just add or multiply two doubles together. Virtual-time floats are still about 10 times faster than Python.
(6): Virtual-time functions occur when defining a function inside another function, with some default arguments.
(7): Sequence iterators are virtual-time, making for loops over sequences as efficient as what you would write in C.
(8): Short lists and ranges of step 1 are virtualized. A for loop over a range is as efficient as the common C for loop. For the other cases of lists see (4).
(9): Minimal support only. Objects of this type are never virtualized. The majority of the CPU time is probably spent doing the actual operation anyway, not in the Python glue.
(10): Virtual-time strings come in many flavors: single characters implemented as a single byte; slices implemented as a pointer to a portion of the full string; concatenated strings implemented as a (possibly virtual) list of the strings this string is the join of. Text-manipulation algorithms should see massive speed-ups.
(11): Programs manipulating small tuples in local variables can see them completely virtualized away. In general however, the gains with tuples are mostly derived from the various places where Python (and Psyco that mimics it) internally manipulates tuples.
(13): Psyco can optimize range well enough to make xrange useless. Indeed, with no specific support xrange would be less efficient than range! Currently xrange is almost identical to range.
(14): Without keyword argument dictionary.
(15): Type codes 'I' and 'L' are not supported. Type code 'f' does not support item assignment. The speed of a complex algorithm using an array as buffer (like manipulating an image pixel-by-pixel) should be very high; closer to C than plain Python.
(16): Missing: frexp, ldexp, log, log10, modf. See note (5).
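To make some of the notes concrete, here are three small functions written in the style that notes (1), (8), (10) and (15) describe as favorable. All function names are hypothetical, the code is ordinary Python that runs with or without Psyco, and no speed-up figure is implied for any of them.

```python
from array import array


def upper_all(words):
    # Note (1): repeated "object.method(args)" calls on lists and strings.
    out = []
    for w in words:
        out.append(w.upper())
    return out


def count_char(text, ch):
    # Notes (8) and (10): a for loop over a range of step 1, single-character
    # item gets on a string, ord() and integer comparison.
    code = ord(ch)
    count = 0
    for i in range(len(text)):
        if ord(text[i]) == code:
            count = count + 1
    return count


def invert_pixels(buf):
    # Note (15): item get and item set on an array used as a pixel buffer.
    # Type code 'B' is used here; the note excludes 'I' and 'L'.
    for i in range(len(buf)):
        buf[i] = 255 - buf[i]
    return buf


words = upper_all(["psyco", "python"])
hits = count_char("banana", "a")
pixels = invert_pixels(array("B", [0, 100, 255]))
print(words, hits, pixels.tolist())
```

Each function can be passed to `psyco.bind()` where Psyco is available; the results are the same either way.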