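Python source code is first compiled to bytecode, which is cached in .pyc files and then executed by the Python virtual machine. Understanding how bytecode works helps in writing efficient code. Most of this post is drawn from a PyCon talk video.
Take the following code as an example: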
def fib(n):
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current
Once the function is loaded, we can inspect the information attached to its compiled code object:
In [6]: fib.__code__.co_argcount
Out[6]: 1
In [7]: fib.__code__.co_consts
Out[7]: (None, 2, 0, 1, (0, 1))
In [8]: fib.__code__.co_name
Out[8]: 'fib'
In [9]: fib.__code__.co_varnames
Out[9]: ('n', 'current', 'next')
In [10]: fib.__code__.co_filename
Out[10]: '/tmp/fib.py'
In [13]: fib.__code__.co_code
Out[13]: b'|\x00\x00d\x01\x00k\x00\x00r\x10\x00|...'
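Note that the first entry of fib.__code__.co_consts is None. In the CPython version used here, that first slot is reserved for the function's docstring and holds None when there is none. None is also the constant a function needs whenever it can fall off the end without an explicit return, because it then returns None implicitly; later examples show this.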
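The same attributes can be collected in a script rather than typed one by one at the prompt. The following is a minimal sketch; the helper name `describe` is hypothetical and not part of the original post, and the exact contents of `co_consts` and the length of `co_code` vary between CPython versions.

```python
def fib(n):
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


def describe(func):
    """Collect a few co_* attributes of a function's code object (hypothetical helper)."""
    code = func.__code__
    return {
        "name": code.co_name,
        "argcount": code.co_argcount,
        "varnames": code.co_varnames,
        "consts": code.co_consts,          # version-dependent
        "bytes": len(code.co_code),        # version-dependent
    }


if __name__ == "__main__":
    for key, value in describe(fib).items():
        print(f"{key:>8}: {value!r}")
```

Only the name, argument count, and local variable names are stable enough to rely on; the other fields are best treated as a peek at the implementation.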
Disassembling the bytecode of fib gives the following:
In [3]: dis.dis(fib)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 POP_JUMP_IF_FALSE       16
  3          12 LOAD_FAST                0 (n)
             15 RETURN_VALUE
  4     >>   16 LOAD_CONST               4 ((0, 1))
             19 UNPACK_SEQUENCE          2
             22 STORE_FAST               1 (current)
             25 STORE_FAST               2 (next)
  5          28 SETUP_LOOP              37 (to 68)
        >>   31 LOAD_FAST                0 (n)
             34 POP_JUMP_IF_FALSE       67
  6          37 LOAD_FAST                2 (next)
             40 LOAD_FAST                1 (current)
             43 LOAD_FAST                2 (next)
             46 BINARY_ADD
             47 ROT_TWO
             48 STORE_FAST               1 (current)
             51 STORE_FAST               2 (next)
  7          54 LOAD_FAST                0 (n)
             57 LOAD_CONST               3 (1)
             60 INPLACE_SUBTRACT
             61 STORE_FAST               0 (n)
             64 JUMP_ABSOLUTE           31
        >>   67 POP_BLOCK
  8     >>   68 LOAD_FAST                1 (current)
             71 RETURN_VALUE
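The listing printed by dis.dis can also be walked programmatically with dis.get_instructions, which yields one Instruction object per opcode. The sketch below counts how often each opcode appears in fib; the helper name `opcode_histogram` is an assumption made for illustration, and on newer CPython releases the opcode names and counts differ from the listing above (for example, BINARY_ADD and SETUP_LOOP no longer exist).

```python
import dis
from collections import Counter


def fib(n):
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


def opcode_histogram(func):
    """Count occurrences of each opcode name in a function (hypothetical helper)."""
    return Counter(ins.opname for ins in dis.get_instructions(func))


if __name__ == "__main__":
    for opname, count in opcode_histogram(fib).most_common():
        print(f"{opname:<24}{count}")
```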
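dis.opname is a list holding the names of all Python opcodes. It reserves 256 slots, although far fewer are actually in use. With it, the raw co_code bytes shown earlier map straight back to the disassembly above: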
In [6]: dis.opname[ord('|')]
Out[6]: 'LOAD_FAST'
In [7]: dis.opname[ord('d')]
Out[7]: 'LOAD_CONST'
In [8]: dis.opname[ord('k')]
Out[8]: 'COMPARE_OP'
dis.opmap is a dictionary for the reverse lookup, from opcode name to opcode number:
In [12]: dis.opmap['LOAD_FAST']
Out[12]: 124 ==> '|'
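The two tables can be exercised together in a short script. This is a sketch under the assumption of a Python 3 interpreter, where indexing a bytes object already yields an integer, so ord() is not needed; the helper names are hypothetical. The numeric value of an opcode such as LOAD_FAST is not stable across CPython versions, so the code never hard-codes 124.

```python
import dis


def roundtrip(name):
    """Map an opcode name to its number and back again via dis.opmap and dis.opname."""
    return dis.opname[dis.opmap[name]]


def decode_first_opcode(func):
    """Return (number, name) for the first byte of a function's co_code."""
    opcode = func.__code__.co_code[0]   # bytes indexing gives an int in Python 3
    return opcode, dis.opname[opcode]


if __name__ == "__main__":
    print(dis.opmap["LOAD_FAST"], roundtrip("LOAD_FAST"))
    print(decode_first_opcode(roundtrip))
```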
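Generally speaking, the fewer bytecode instructions a piece of code compiles to, the better, although instruction count is only a rough guide. Take creating a dict object as an example; the disassembly makes the comparison easy: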
In [3]: dis.dis('a = {}')
  1           0 BUILD_MAP                0
              3 STORE_NAME               0 (a)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
Compare with:
In [4]: dis.dis('a = dict()')
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 STORE_NAME               1 (a)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
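To check the comparison on whatever interpreter is at hand, the instructions can be counted and the two snippets timed. The sketch below assumes that dis.get_instructions accepts a source string, as dis.dis does; the helper names are hypothetical, and the timings it prints depend on the machine and the CPython version, so no numbers are quoted here.

```python
import dis
import timeit


def opnames(source):
    """Opcode names CPython emits for a source snippet."""
    return [ins.opname for ins in dis.get_instructions(source)]


def count_instructions(source):
    """Number of bytecode instructions for a source snippet."""
    return len(opnames(source))


if __name__ == "__main__":
    for snippet in ("a = {}", "a = dict()"):
        seconds = timeit.timeit(snippet, number=200_000)
        print(f"{snippet!r}: {count_instructions(snippet)} instructions, {seconds:.4f}s")
```

The literal wins on instruction count because dict() has to look up the name dict and then make a call, while {} is a single BUILD_MAP.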
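Another common piece of advice is to keep use of the dot (.) to a minimum, because every attribute access costs an extra lookup instruction and so affects performance: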
In [3]: dis.dis('fib.fib(10)')
  1           0 LOAD_NAME                0 (fib)
              3 LOAD_ATTR                0 (fib)
              6 LOAD_CONST               0 (10)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 RETURN_VALUE
Compare with:
In [5]: dis.dis('fib(10)')
  1           0 LOAD_NAME                0 (fib)
              3 LOAD_CONST               0 (10)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE
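One common way to act on this advice, not shown in the original post, is to hoist an attribute lookup out of a hot loop by binding the method to a local name once. The sketch below is illustrative only: both function names are hypothetical, and recent CPython versions optimize method calls well enough that the gain may be small or negligible, so measure before relying on it.

```python
import timeit


def squares_with_dot(n):
    res = []
    for i in range(n):
        res.append(i * i)      # res.append is looked up on every iteration
    return res


def squares_hoisted(n):
    res = []
    append = res.append        # look the bound method up once, outside the loop
    for i in range(n):
        append(i * i)
    return res


if __name__ == "__main__":
    for func in (squares_with_dot, squares_hoisted):
        seconds = timeit.timeit(lambda: func(1000), number=2000)
        print(f"{func.__name__}: {seconds:.3f}s")
```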
One more example:
In [12]: def slow_week():
   ....:     seconds_per_day = 86400
   ....:     return seconds_per_day * 7
In [13]: dis.dis(slow_week)
  2           0 LOAD_CONST               1 (86400)
              3 STORE_FAST               0 (seconds_per_day)
  3           6 LOAD_FAST                0 (seconds_per_day)
              9 LOAD_CONST               2 (7)
             12 BINARY_MULTIPLY
             13 RETURN_VALUE
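Without the intermediate variable seconds_per_day, the result is very different, because the compiler folds 86400 * 7 into a single constant ahead of time: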
In [14]: def fast_week():
   ....:     return 86400 * 7
In [15]: dis.dis(fast_week)
  2           0 LOAD_CONST               3 (604800)
              3 RETURN_VALUE
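The effect can be confirmed without reading a disassembly, by checking whether the product already sits among the function's compile-time constants. This is a small sketch; the helper name `is_precomputed` is hypothetical.

```python
def slow_week():
    seconds_per_day = 86400
    return seconds_per_day * 7


def fast_week():
    return 86400 * 7


def is_precomputed(func, value):
    """True if `value` already appears in the function's co_consts (hypothetical helper)."""
    return value in func.__code__.co_consts


if __name__ == "__main__":
    print(is_precomputed(fast_week, 604800))   # product stored as a constant
    print(is_precomputed(slow_week, 604800))   # product computed at run time
```

Both functions return the same value; only fast_week carries the product as a ready-made constant, because the compiler does not track the value of a local variable across statements.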
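As the examples above show, executing bytecode in the virtual machine comes down to a series of stack operations. Each frame in Python works with two stacks: the evaluation stack and the block stack. The block stack records the position of blocks such as try/except and with, so that the interpreter can hook in at the right moment: when an exception is raised it knows how far to pop and which finally clauses to run, and when a with block ends it knows which resources to release.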
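To make the evaluation stack concrete, here is a toy interpreter for a handful of instructions. It is a teaching sketch only and not how CPython's evaluation loop is implemented: operands are written inline instead of being indices into co_consts and co_varnames, the block stack is left out, and BINARY_MULTIPLY is the old opcode name used in the listings of this post.

```python
def run(instructions, local_vars=None):
    """Execute a tiny subset of CPython-style instructions on an evaluation stack."""
    stack = []
    local_vars = dict(local_vars or {})
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)                    # push a constant
        elif opname == "LOAD_FAST":
            stack.append(local_vars[arg])        # push a local variable
        elif opname == "STORE_FAST":
            local_vars[arg] = stack.pop()        # pop into a local variable
        elif opname == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)           # replace two operands by the product
        elif opname == "RETURN_VALUE":
            return stack.pop()                   # top of stack is the result
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


# The slow_week listing from earlier, transcribed by hand.
SLOW_WEEK = [
    ("LOAD_CONST", 86400),
    ("STORE_FAST", "seconds_per_day"),
    ("LOAD_FAST", "seconds_per_day"),
    ("LOAD_CONST", 7),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]

if __name__ == "__main__":
    print(run(SLOW_WEEK))
```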
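Different bytecode instructions also take different amounts of time to run, just as on a CPU the DIV instruction takes more clock cycles than ADD. As a rule of thumb, the load instructions in Python rank as follows, from fastest to slowest (here > means "faster than"):
LOAD_CONST > LOAD_FAST > LOAD_NAME or LOAD_GLOBAL
Instructions such as SETUP_LOOP, SETUP_WITH, and SETUP_EXCEPT operate on both stacks at once, so they are considerably more expensive. LOAD_ATTR and BINARY_SUBSCR, which look up attributes and subscripts, often involve dictionary lookups and are also quite expensive.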
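Which load instruction a name compiles to depends on where the name lives. The sketch below contrasts a module-level name with a function parameter; the names `FACTOR`, `scale_global`, `scale_local`, and `load_ops` are hypothetical, and newer CPython versions may emit fused or specialized variants whose names still begin with LOAD_FAST.

```python
import dis

FACTOR = 7  # module-level name, reached through a global lookup


def scale_global(x):
    return x * FACTOR            # FACTOR compiles to LOAD_GLOBAL


def scale_local(x, factor=7):
    return x * factor            # factor is a local, compiled to a LOAD_FAST variant


def load_ops(func):
    """Names of the LOAD_* instructions in a function, in order."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    ]


if __name__ == "__main__":
    print(load_ops(scale_global))
    print(load_ops(scale_local))
```

Passing a frequently used global in as a default argument, as scale_local does, is one way to turn a dictionary-based lookup into an array-indexed one, at the cost of a less obvious signature.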
Finally, a more involved comparison: the same algorithm written three ways. Version one:
In [26]: def squares1():
   ....:     res = []
   ....:     i = 0
   ....:     while i <= 10:
   ....:         res.append(i ** 2)
   ....:         i += 1
   ....:     return res
In [27]: dis.dis(squares1)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (res)
  3           6 LOAD_CONST               1 (0)
              9 STORE_FAST               1 (i)
  4          12 SETUP_LOOP              43 (to 58)
        >>   15 LOAD_FAST                1 (i)
             18 LOAD_CONST               2 (10)
             21 COMPARE_OP               1 (<=)
             24 POP_JUMP_IF_FALSE       57
  5          27 LOAD_FAST                0 (res)
             30 LOAD_ATTR                0 (append)
             33 LOAD_FAST                1 (i)
             36 LOAD_CONST               3 (2)
             39 BINARY_POWER
             40 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             43 POP_TOP
  6          44 LOAD_FAST                1 (i)
             47 LOAD_CONST               4 (1)
             50 INPLACE_ADD
             51 STORE_FAST               1 (i)
             54 JUMP_ABSOLUTE           15
        >>   57 POP_BLOCK
  7     >>   58 LOAD_FAST                0 (res)
             61 RETURN_VALUE
Version two:
In [28]: def squares2():
   ....:     res = []
   ....:     for i in range(0, 11):
   ....:         res.append(i ** 2)
   ....:     return res
In [29]: dis.dis(squares2)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (res)
  3           6 SETUP_LOOP              40 (to 49)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (0)
             15 LOAD_CONST               2 (11)
             18 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             21 GET_ITER
        >>   22 FOR_ITER                23 (to 48)
             25 STORE_FAST               1 (i)
  4          28 LOAD_FAST                0 (res)
             31 LOAD_ATTR                1 (append)
             34 LOAD_FAST                1 (i)
             37 LOAD_CONST               3 (2)
             40 BINARY_POWER
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             44 POP_TOP
             45 JUMP_ABSOLUTE           22
        >>   48 POP_BLOCK
  5     >>   49 LOAD_FAST                0 (res)
             52 RETURN_VALUE
Version three:
In [30]: def squares3():
   ....:     return [i**2 for i in range(0, 11)]
In [31]: dis.dis(squares3)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x7fa3775d7780, file "<ipython-input-30-f61e67381afc>", line 2>)
              3 LOAD_CONST               2 ('squares3.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               3 (0)
             15 LOAD_CONST               4 (11)
             18 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             21 GET_ITER
             22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             25 RETURN_VALUE
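Reading the three listings side by side suggests which version does less work per iteration, but the only reliable check is to measure. The sketch below first confirms that the three versions agree and then times them with timeit; the repetition count is an arbitrary choice, and the relative results depend on the CPython version and the machine, so none are quoted here.

```python
import timeit


def squares1():
    res = []
    i = 0
    while i <= 10:
        res.append(i ** 2)
        i += 1
    return res


def squares2():
    res = []
    for i in range(0, 11):
        res.append(i ** 2)
    return res


def squares3():
    return [i ** 2 for i in range(0, 11)]


if __name__ == "__main__":
    # All three must agree before their timings mean anything.
    assert squares1() == squares2() == squares3()
    for func in (squares1, squares2, squares3):
        seconds = timeit.timeit(func, number=100_000)
        print(f"{func.__name__}: {seconds:.3f}s")
```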