Python Series: Small Tips for Speeding Up Python

This post collects some small tips for speeding up computation in Python. Many of the methods are only touched on briefly rather than studied in depth; the goal for now is a rough map of which acceleration technique fits which situation, to be studied more carefully when actually needed.

Analyzing code run time

Summary:

  • Single-run timing: take the difference of two time.time() calls, or use %%time for a cell (%time for a single line)
  • Average run time: the timeit module, or %%timeit
  • Profiling run time by function: the profile module, or %prun
  • Profiling run time line by line: line_profiler, or %lprun

Timing a single run of code

General method

import time
# record the start time
tic = time.time()
# run the workload
much_job = [x**2 for x in range(1, 1000000, 3)]
# record the end time
toc = time.time()
# print the elapsed time with 5 significant digits
print('used {:.5}s'.format(toc - tic))
used 0.13954s

In Jupyter

%%time
much_job = [x**2 for x in range(1, 1000000, 3)]
CPU times: user 135 ms, sys: 7.1 ms, total: 142 ms
Wall time: 140 ms

Measuring average run time

General method

from timeit import timeit

g = lambda x: x**2 + 1

def main():
    return g(2)**120

timeit('main()', globals={'main': main}, number=10)
2.5913002900779247e-05
help(timeit)
Help on function timeit in module timeit:

timeit(stmt='pass', setup='pass', timer=<built-in function perf_counter>, number=1000000, globals=None)
    Convenience function to create Timer object and call timeit method.
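
A single timeit call can still be skewed by whatever else the machine is doing; timeit.repeat (from the same module) repeats the whole measurement so you can take the minimum. A minimal sketch reusing the function above:

from timeit import repeat

g = lambda x: x**2 + 1

def main():
    return g(2)**120

# run the 10-call measurement 5 times and keep the fastest run,
# which is the one least disturbed by background activity
times = repeat('main()', globals={'main': main}, repeat=5, number=10)
print(min(times))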

In Jupyter

%%timeit -n 10
g = lambda x: x**2 + 1

def main():
    return g(2)**120

main()
2.34 µs ± 211 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

Profiling run time by function

General method

def relu(x):
    return x if x > 0 else 0

def main():
    result = [relu(x) for x in range(-100000, 100000, 1)]
    return result

import profile
profile.run('main()')
      200006 function calls in 0.672 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.672    0.672 :0(exec)
     1    0.000    0.000    0.000    0.000 :0(setprofile)
200000    0.322    0.000    0.322    0.000 <ipython-input-9-aae7666d2db5>:1(relu)
     1    0.000    0.000    0.671    0.671 <ipython-input-9-aae7666d2db5>:4(main)
     1    0.348    0.348    0.671    0.671 <ipython-input-9-aae7666d2db5>:5(<listcomp>)
     1    0.001    0.001    0.672    0.672 <string>:1(<module>)
     1    0.000    0.000    0.672    0.672 profile:0(main())
     0    0.000             0.000          profile:0(profiler)

In Jupyter

The result is the same as above, but displayed in a pop-up pane.

%prun main()

Profiling run time line by line

If the %lprun magic does not produce correct output, see: Interactive Python: cannot get %lprun to work, although line_profiler is imported properly

General method

def relu(x):
    return x if x > 0 else 0

def main():
    result = [relu(x) for x in range(-100000, 100000, 1)]
    return result

from line_profiler import LineProfiler

lprofile = LineProfiler(main, relu)
lprofile.run('main()')
lprofile.print_stats()
Timer unit: 1e-06 s

Total time: 0.077296 s
File: <ipython-input-72-aae7666d2db5>
Function: relu at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def relu(x):
     2    200000      77296.0      0.4    100.0      return(x if x>0 else 0)

Total time: 0.259755 s
File: <ipython-input-72-aae7666d2db5>
Function: main at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def main():
     5         1     259754.0 259754.0    100.0      result=[relu(x) for x in range(-100000,100000,1)]
     6         1          1.0      1.0      0.0      return (result)

In Jupyter

If you get the error UsageError: Line magic function %lprun not found, there are two fixes:

  • Temporary fix: run %load_ext line_profiler
  • Permanent fix: add the following to ~/.ipython/profile_default/ipython_config.py (create the file with ipython profile create if it does not exist):

    c.TerminalIPythonApp.extensions = [
        'line_profiler',
    ]

%load_ext line_profiler

%lprun -f main -f relu main()

Speeding up lookups

Summary:

  • Membership tests in a single collection: a set is faster than a list
  • Joint lookups across two paired lists: a dict is faster than two lists

Use a set instead of a list for membership tests

data = (i**2 + 1 for i in range(1000000))

Slow method: list

list_data = list(data)

%%time
1098987 in list_data
CPU times: user 25.4 ms, sys: 0 ns, total: 25.4 ms
Wall time: 25.1 ms

False

Fast method: set

# the generator `data` was already exhausted by list(data) above,
# so build the set from list_data
set_data = set(list_data)

%%time
1098987 in set_data
CPU times: user 7 µs, sys: 0 ns, total: 7 µs
Wall time: 12.9 µs

False
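
One caveat not covered above: converting to a set is itself an O(n) pass over the data, so it only pays off when you test membership many times. A rough sketch of the trade-off:

import time

list_data = list(range(1000000))

# building the set is a one-off O(n) cost ...
tic = time.time()
set_data = set(list_data)
print('set build: {:.5f}s'.format(time.time() - tic))

# ... amortised over many O(1) membership tests
tic = time.time()
for _ in range(1000):
    -1 in set_data
print('1000 set lookups: {:.5f}s'.format(time.time() - tic))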

Use a dict instead of two lists for paired lookups

list_a = [2*i - 1 for i in range(1000000)]
list_b = [i**2 for i in list_a]

Slow method: two lists

%%time
print(list_b[list_a.index(876567)])
768369705489
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 11.9 ms

Fast method: dict

a = [1, 2]
b = [3, 4]
dict(zip(a, b))
{1: 3, 2: 4}
dict_ab = dict(zip(list_a, list_b))

%%time
print(dict_ab.get(876567))
768369705489
CPU times: user 179 µs, sys: 12 µs, total: 191 µs
Wall time: 109 µs

Speeding up loops

Summary:

  • A for loop is faster than the equivalent while loop
  • Avoid repeating the same computation inside a loop

Prefer for loops over while loops

Slow method: while

%%time
s, i = 0, 0
while i < 10000:
    i += 1
    s += i
print(s)
50005000
CPU times: user 4.14 ms, sys: 7 µs, total: 4.15 ms
Wall time: 4 ms

Fast method: for

%%time
s = 0
for i in range(1, 10001):
    s += i
print(s)
50005000
CPU times: user 3.05 ms, sys: 2 µs, total: 3.06 ms
Wall time: 2.91 ms

Avoid repeated computation inside the loop body

Slow method

a = [i**2 + 1 for i in range(2000)]

%%time
# sum(a) is recomputed on every iteration
b = [i/sum(a) for i in a]
CPU times: user 54.7 ms, sys: 76 µs, total: 54.8 ms
Wall time: 53.6 ms

Fast method

%%time
sum_a = sum(a)
b = [i/sum_a for i in a]
CPU times: user 360 µs, sys: 0 ns, total: 360 µs
Wall time: 367 µs
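
The same idea applies beyond function calls: any attribute lookup repeated inside a loop can be hoisted out. A small sketch (my own illustration, not from the original) that binds list.append to a local name:

import time

data = range(1000000)

result = []
tic = time.time()
for x in data:
    result.append(x * 2)     # result.append is re-resolved on every iteration
print('attribute lookup each time: {:.4f}s'.format(time.time() - tic))

result = []
append = result.append      # resolve the bound method once, outside the loop
tic = time.time()
for x in data:
    append(x * 2)
print('hoisted: {:.4f}s'.format(time.time() - tic))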

Speeding up functions

Summary:

  • Replace recursion with a loop; recursion is slower
  • Use the lru_cache mechanism to speed up recursion
  • Use numba to compile functions

Replace recursion with a loop

Slow method: recursion

%%time
def fib(n):
    return 1 if n in (1, 2) else fib(n-1) + fib(n-2)
print(fib(30))
832040
CPU times: user 273 ms, sys: 2.78 ms, total: 275 ms
Wall time: 274 ms

Fast method: loop

%%time
def fib(n):
    if n in (1, 2):
        return 1
    a, b = 1, 1
    for i in range(2, n):
        a, b = b, a + b
    return b
print(fib(30))
832040
CPU times: user 195 µs, sys: 13 µs, total: 208 µs
Wall time: 149 µs

Speed up a recursive function with caching

Slow method: recursion

%%time
def fib(n):
    return 1 if n in (1, 2) else fib(n-1) + fib(n-2)
print(fib(30))
832040
CPU times: user 275 ms, sys: 2.79 ms, total: 277 ms
Wall time: 275 ms

Fast method: caching

Reference for lru_cache

%%time
from functools import lru_cache

# cache at most 100 results;
# maxsize=None means unbounded, and performance is best when maxsize is a power of two
@lru_cache(100)
def fib(n):
    return 1 if n in (1, 2) else fib(n-1) + fib(n-2)
print(fib(30))
832040
CPU times: user 293 µs, sys: 19 µs, total: 312 µs
Wall time: 227 µs
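
A function wrapped with lru_cache also exposes cache_info() and cache_clear() (standard functools API), which make it easy to confirm the cache is actually being hit:

from functools import lru_cache

@lru_cache(100)
def fib(n):
    return 1 if n in (1, 2) else fib(n-1) + fib(n-2)

fib(30)
# hits/misses show how often the cache short-circuited the recursion
print(fib.cache_info())   # e.g. CacheInfo(hits=27, misses=30, maxsize=100, currsize=30)
fib.cache_clear()         # drop all cached results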

Use numba to speed up Python functions

Slow method

%%time
def my_power(x):
    return x**2

def my_power_sum(n):
    s = 0
    for i in range(1, n+1):
        s = s + my_power(i)
    return s

print(my_power_sum(1000000))
333333833333500000
CPU times: user 456 ms, sys: 1.81 ms, total: 458 ms
Wall time: 456 ms

Fast method: numba

Reference:

%%time
from numba import jit

@jit
def my_power(x):
    return x**2

@jit
def my_power_sum(n):
    s = 0
    for i in range(1, n+1):
        s = s + my_power(i)
    return s

print(my_power_sum(1000000))
333333833333500000
CPU times: user 100 ms, sys: 1.11 ms, total: 101 ms
Wall time: 100 ms
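
One caveat worth knowing: in older numba releases a bare @jit could silently fall back to slow object mode when compilation failed (newer releases make nopython mode the default), so the usual recommendation is @jit(nopython=True) or its shorthand numba.njit, which fails loudly instead of degrading. A minimal sketch:

from numba import njit

@njit
def my_power_sum(n):
    s = 0
    for i in range(1, n + 1):
        s += i**2   # compiled to machine code, no Python object overhead
    return s

# the first call pays the compilation cost; later calls run at full speed
print(my_power_sum(1000000))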

Speeding up with standard-library functions

Summary:

  • Use collections.Counter to speed up counting
  • Use collections.ChainMap to speed up merging dicts

Use collections.Counter to speed up counting

Slow method

data = [x**2 % 1989 for x in range(2000000)]

%%time
values_count = {}
for i in data:
    i_cnt = values_count.get(i, 0)
    values_count[i] = i_cnt + 1
print(values_count.get(4, 0))
8044
CPU times: user 682 ms, sys: 440 µs, total: 682 ms
Wall time: 680 ms

Usage of the method:

dict.get(key, default=None)

Parameters:

  • key – the key to look up in the dict.
  • default – the value to return when the key is not present.

Fast method: collections.Counter

%%time
from collections import Counter
values_count = Counter(data)
print(values_count.get(4, 0))
8044
CPU times: user 234 ms, sys: 0 ns, total: 234 ms
Wall time: 233 ms
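
Counter is a dict subclass, so the .get() call above works unchanged; it also adds conveniences such as most_common():

from collections import Counter

values_count = Counter(data)
# top-3 (value, count) pairs, sorted by count
print(values_count.most_common(3))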

Use collections.ChainMap to speed up merging dicts

Slow method

dict_a = {i: i+1 for i in range(1, 1000000, 2)}
dict_b = {i: 2*i+1 for i in range(1, 1000000, 3)}
dict_c = {i: 3*i+1 for i in range(1, 1000000, 5)}
dict_d = {i: 4*i+1 for i in range(1, 1000000, 7)}

%%time
result = dict_a.copy()
result.update(dict_b)
result.update(dict_c)
result.update(dict_d)
print(result.get(9999, 0))
10000
CPU times: user 79.9 ms, sys: 33.6 ms, total: 113 ms
Wall time: 112 ms

Fast method: ChainMap

%%time
from collections import ChainMap
chain = ChainMap(dict_a, dict_b, dict_c, dict_d)
print(chain.get(9999, 0))
10000
CPU times: user 186 µs, sys: 18 µs, total: 204 µs
Wall time: 150 µs
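
Part of why ChainMap is so fast is that it copies nothing: it is a view that searches the underlying dicts in order. That changes the semantics for duplicate keys, so the two approaches are not always interchangeable. A small illustration:

from collections import ChainMap

a = {'x': 1}
b = {'x': 2}

merged = a.copy()
merged.update(b)
print(merged['x'])   # 2: with update(), the last dict wins

chain = ChainMap(a, b)
print(chain['x'])    # 1: with ChainMap, the first dict wins

a['x'] = 10
print(chain['x'])    # 10: a view, so later changes to the dicts show through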

Vectorize with numpy instead of lists

Summary:

  • Use an array instead of a list for arithmetic
  • Use np.ufunc instead of math.func
  • Use np.where instead of if

Use an array instead of a list

Slow method: list

%%time
a = range(1, 1000000, 3)
b = range(1000000, 1, -3)
c = [3*a[i] - 2*b[i] for i in range(0, len(a))]
CPU times: user 176 ms, sys: 7.75 ms, total: 184 ms
Wall time: 181 ms

Fast method: array

%%time
import numpy as np
array_a = np.arange(1, 1000000, 3)
array_b = np.arange(1000000, 1, -3)
array_c = 3*array_a - 2*array_b
CPU times: user 4.18 ms, sys: 935 µs, total: 5.12 ms
Wall time: 3.65 ms

Use np.ufunc instead of math.func

Slow method: math.func

%%time
import math
a = range(1, 1000000, 3)
b = [math.log(x) for x in a]
CPU times: user 117 ms, sys: 1.88 ms, total: 119 ms
Wall time: 116 ms

Fast method: np.ufunc

%%time
import numpy as np
array_a = np.arange(1, 1000000, 3)
array_b = np.log(array_a)
CPU times: user 20.8 ms, sys: 1.98 ms, total: 22.8 ms
Wall time: 21.3 ms

Use np.where instead of if

import numpy as np
array_a = np.arange(-100000, 1000000)

Slow method: np.vectorize

np.vectorize wraps an ordinary function so that it accepts arrays, but underneath it is still essentially a Python-level for loop.

%%time
relu = np.vectorize(lambda x: x if x > 0 else 0)
array_b = relu(array_a)
CPU times: user 241 ms, sys: 29.1 ms, total: 270 ms
Wall time: 325 ms

Fast method: np.where

%%time
relu = lambda x: np.where(x > 0, x, 0)
array_b = relu(array_a)
CPU times: user 4.86 ms, sys: 1e+03 µs, total: 5.86 ms
Wall time: 4.36 ms
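
For this particular relu there is an even more direct ufunc, np.maximum, which broadcasts the scalar 0 against the array in a single C-level pass (an alternative, not from the original):

import numpy as np

array_a = np.arange(-100000, 1000000)
array_b = np.maximum(array_a, 0)   # elementwise max(x, 0), i.e. relu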

Speeding up pandas

Summary:

  • For computation: use np.ufunc functions instead of applymap
  • When building a DataFrame: preallocate storage instead of growing it dynamically
  • For file I/O: use csv instead of excel
  • Use pandarallel, a multi-process tool for pandas

Use np.ufunc instead of applymap

Slow method: applymap

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-10, 11, size=(100000, 26)),
                  columns=list('abcdefghijklmnopqrstuvwxyz'))
df.head()


(output: the first 5 rows of the DataFrame, 5 rows × 26 columns of random integers in [-10, 10])


%time dfresult = df.applymap(lambda x: np.sin(x) + np.cos(x))
CPU times: user 9.28 s, sys: 72 ms, total: 9.36 s
Wall time: 9.35 s

Fast method: np.ufunc

%%time
dfresult = np.sin(df) + np.cos(df)
CPU times: user 256 ms, sys: 41.3 ms, total: 298 ms
Wall time: 325 ms

Preallocate storage instead of growing dynamically

Slow method: dynamic growth

%%time
df = pd.DataFrame(columns=list('abcdefghijklmnopqrstuvwxyz'))
for i in range(10000):
    df.loc[i, :] = range(i, i+26)
CPU times: user 12.5 s, sys: 0 ns, total: 12.5 s
Wall time: 12.5 s

Fast method: preallocated storage

%%time
df = pd.DataFrame(np.zeros((10000, 26)),
                  columns=list('abcdefghijklmnopqrstuvwxyz'))
for i in range(10000):
    df.loc[i, :] = range(i, i+26)
CPU times: user 3.06 s, sys: 16.4 ms, total: 3.07 s
Wall time: 3 s
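
Even the preallocated version still pays the per-row cost of .loc assignment. When the rows are generated in Python anyway, a usually faster pattern (my suggestion, not from the original) is to accumulate them in a plain list and call the DataFrame constructor once:

%%time
import pandas as pd

# accumulate plain Python rows, then build the DataFrame in one shot
rows = [list(range(i, i + 26)) for i in range(10000)]
df = pd.DataFrame(rows, columns=list('abcdefghijklmnopqrstuvwxyz'))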

Use csv instead of excel for file I/O

Slow method: writing an excel file

%%time
df.to_excel('data.xlsx')
CPU times: user 4.64 s, sys: 37.4 ms, total: 4.68 s
Wall time: 4.86 s

Fast method: writing a csv file

%%time
df.to_csv('data.csv')
CPU times: user 300 ms, sys: 1.9 ms, total: 302 ms
Wall time: 300 ms
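
If the file only needs to be read back by pandas, binary formats are typically faster again than csv. A side note on dependencies: to_pickle ships with pandas itself, while to_parquet requires pyarrow or fastparquet to be installed:

df.to_pickle('data.pkl')        # pandas-native binary format
df.to_parquet('data.parquet')   # columnar format, needs pyarrow or fastparquet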

Use pandarallel, a multi-process tool for pandas

Slow method

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-10, 11, size=(100000, 26)),
                  columns=list('abcdefghijklmnopqrstuvwxyz'))
df.head()


(output: the first 5 rows of the DataFrame, 5 rows × 26 columns of random integers in [-10, 10])


%%time
result = df.apply(np.sum, axis=1)
CPU times: user 11.1 s, sys: 0 ns, total: 11.1 s
Wall time: 11.1 s

Fast method: pandarallel

Reference:

Supported methods (without parallelisation → with parallelisation):

  • df.apply(func) → df.parallel_apply(func)
  • df.applymap(func) → df.parallel_applymap(func)
  • df.groupby(args).apply(func) → df.groupby(args).parallel_apply(func)
  • df.groupby(args1).col_name.rolling(args2).apply(func) → df.groupby(args1).col_name.rolling(args2).parallel_apply(func)
  • series.map(func) → series.parallel_map(func)
  • series.apply(func) → series.parallel_apply(func)
  • series.rolling(args).apply(func) → series.rolling(args).parallel_apply(func)
%%time
from pandarallel import pandarallel
pandarallel.initialize(nb_workers=4)
result = df.parallel_apply(np.sum, axis=1)
New pandarallel memory created - Size: 2000 MB
Pandarallel will run on 4 workers
CPU times: user 38.8 ms, sys: 58.8 ms, total: 97.6 ms
Wall time: 3.3 s

Speeding up with dask

A brief introduction to dask

Dask provides ways to scale Pandas, Scikit-Learn, and Numpy workflows with minimal rewriting. It integrates well with these tools so that it copies most of their API and uses their data structures internally. Moreover, Dask is co-developed with these libraries to ensure that they evolve consistently, minimizing friction caused from transitioning from workloads on a local laptop, to a multi-core workstation, and to a distributed cluster. Analysts familiar with Pandas/Scikit-Learn/Numpy will be immediately familiar with their Dask equivalents, and have much of their intuition carry over to a scalable context.


Use cases

Dask use cases can be roughly divided in the following two categories:

  • Large NumPy/Pandas/Lists with dask.array, dask.dataframe, dask.bag to analyze large datasets with familiar techniques. This is similar to Databases, Spark, or big array libraries
  • Custom task scheduling. You submit a graph of functions that depend on each other for custom workloads. This is similar to Luigi, Airflow, Celery, or Makefiles

Tutorials:

Use dask to speed up a dataframe

Slow method: pandas

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 6, size=(100000000, 5)),
                  columns=list('abcde'))
%time df.groupby('a').mean()
CPU times: user 4.13 s, sys: 3.72 s, total: 7.85 s
Wall time: 8.03 s


          b         c         d         e
a
0  2.500325  2.499488  2.500634  2.500346
1  2.499725  2.499342  2.499707  2.500463
2  2.499565  2.499577  2.500171  2.499852
3  2.499790  2.499861  2.499205  2.500443
4  2.500874  2.499877  2.499486  2.499790
5  2.499937  2.499493  2.500126  2.500856

Fast method: dask

import dask.dataframe as dd

df_dask = dd.from_pandas(df, npartitions=40)
%time df_dask.groupby('a').mean().compute()
CPU times: user 12 s, sys: 6.58 s, total: 18.6 s
Wall time: 8.44 s


          b         c         d         e
a
0  2.500325  2.499488  2.500634  2.500346
1  2.499725  2.499342  2.499707  2.500463
2  2.499565  2.499577  2.500171  2.499852
3  2.499790  2.499861  2.499205  2.500443
4  2.500874  2.499877  2.499486  2.499790
5  2.499937  2.499493  2.500126  2.500856
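
Note that on this machine dask is no faster than pandas here (Wall time 8.44 s vs 8.03 s): the data already fits in memory and the scheduling adds overhead. Where dask.dataframe really pays off is data larger than memory, for example loading a whole directory of csv files lazily (a sketch; the file pattern is hypothetical):

import dask.dataframe as dd

# lazily treat all matching files as one out-of-core dataframe
df_dask = dd.read_csv('data_*.csv')
print(df_dask.groupby('a').mean().compute())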

Use dask.delayed to speed things up

Slow method

import time

def muchjob(x):
    time.sleep(5)
    return x**2

%%time
result = [muchjob(i) for i in range(5)]
result
CPU times: user 22 ms, sys: 15.4 ms, total: 37.4 ms
Wall time: 25 s

[0, 1, 4, 9, 16]

Fast method: dask.delayed

%%time
from dask import delayed, compute

values = [delayed(muchjob)(i) for i in range(5)]
result = compute(*values, scheduler='multiprocessing')
CPU times: user 10.2 ms, sys: 2.93 ms, total: 13.1 ms
Wall time: 5.01 s

Speeding up with multithreading and multiprocessing

Summary:

  • For IO-bound tasks: use multithreading
  • For CPU-bound tasks: use multiprocessing

Use multithreading to speed up IO-bound tasks

Slow method: serial

%%time
def writefile(i):
    with open(str(i) + '.txt', 'w') as f:
        s = ('hello %d' % i) * 10000000
        f.write(s)

# serial execution
for i in range(10):
    writefile(i)
CPU times: user 441 ms, sys: 971 ms, total: 1.41 s
Wall time: 1.4 s

Fast method: multithreading

%%time
import threading

def writefile(i):
    with open(str(i) + '.txt', 'w') as f:
        s = ('hello %d' % i) * 10000000
        f.write(s)

# multithreaded execution
thread_list = []
for i in range(10):
    t = threading.Thread(target=writefile, args=(i,))
    # mark as a daemon thread
    t.setDaemon(True)
    thread_list.append(t)

for t in thread_list:
    # start the thread
    t.start()

for t in thread_list:
    # wait for the worker threads to finish
    t.join()
CPU times: user 527 ms, sys: 1.51 s, total: 2.04 s
Wall time: 3.97 s
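
The standard library's concurrent.futures offers a higher-level way to write the same fan-out; the context manager starts the pool and joins every worker on exit (an equivalent sketch, not from the original):

from concurrent.futures import ThreadPoolExecutor

def writefile(i):
    with open(str(i) + '.txt', 'w') as f:
        f.write(('hello %d' % i) * 10000000)

with ThreadPoolExecutor(max_workers=10) as pool:
    # map schedules one task per argument and blocks until all finish
    list(pool.map(writefile, range(10)))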

Use multiprocessing to speed up CPU-bound tasks

Slow method: serial

%%time
import time

def muchjob(x):
    time.sleep(5)
    return x**2

# serial execution
ans = [muchjob(i) for i in range(8)]
print(ans)
[0, 1, 4, 9, 16, 25, 36, 49]
CPU times: user 38.3 ms, sys: 20.4 ms, total: 58.8 ms
Wall time: 40 s

Fast method: multiprocessing

%%time
import time
import multiprocessing

def muchjob(x):
    time.sleep(5)
    return x**2

# multi-process execution with a pool of 4 workers
pool = multiprocessing.Pool(processes=4)
result = []
for i in range(8):
    result.append(pool.apply_async(muchjob, (i,)))
pool.close()
pool.join()
ans = [res.get() for res in result]
print(ans)
[0, 1, 4, 9, 16, 25, 36, 49]
CPU times: user 18.5 ms, sys: 523 ms, total: 541 ms
Wall time: 10.8 s
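
The same job with concurrent.futures.ProcessPoolExecutor (an equivalent sketch, not from the original); note the __main__ guard, which is required on platforms that spawn new processes:

from concurrent.futures import ProcessPoolExecutor
import time

def muchjob(x):
    time.sleep(5)
    return x**2

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        # map preserves input order, so ans matches the serial result
        ans = list(pool.map(muchjob, range(8)))
    print(ans)   # [0, 1, 4, 9, 16, 25, 36, 49]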
