python - Efficiently accumulating a collection of sparse scipy matrices -
i've got collection of o(n) nxn scipy.sparse.csr_matrix
, , each sparse matrix has on order of n elements set. want add these matrices regular nxn numpy array. (n on order of 1000). arrangement of non-zero elements within matrices such resulting sum isn't sparse (virtually no 0 elements left in fact).
at moment i'm doing
reduce(lambda x,y: x+y,[m.toarray() m in my_sparse_matrices])
which works bit slow: of course sheer amount of pointless processing of zeros going on there absolutely horrific.
is there better way ? there's nothing obvious me in docs.
update: per user545424's suggestion, tried alternative scheme of summing sparse matrices, , summing sparse matrices onto dense matrix. code below shows approaches run in comparable time (python 2.6.6 on amd64 debian/squeeze on quad-core i7)
import numpy np import numpy.random import scipy import scipy.sparse import time n=768 s=768 d=3 def mkrandomsparse(): m=np.zeros((s,s),dtype=np.float32) r=np.random.random_integers(0,s-1,d*s) c=np.random.random_integers(0,s-1,d*s) e in zip(r,c): m[e[0],e[1]]=1.0 return scipy.sparse.csr_matrix(m) m=[mkrandomsparse() in xrange(n)] def plus_dense(): return reduce(lambda x,y: x+y,[m.toarray() m in m]) def plus_sparse(): return reduce(lambda x,y: x+y,m).toarray() def sum_dense(): return sum([m.toarray() m in m]) def sum_sparse(): return sum(m[1:],m[0]).toarray() def sum_combo(): # sum sparse matrices 'onto' dense matrix? return sum(m,np.zeros((s,s),dtype=np.float32)) def benchmark(fn): t0=time.time() fn() t1=time.time() print "{0:16}: {1:.3f}s".format(fn.__name__,t1-t0) in xrange(4): benchmark(plus_dense) benchmark(plus_sparse) benchmark(sum_dense) benchmark(sum_sparse) benchmark(sum_combo) print
and logs out
plus_dense : 1.368s plus_sparse : 1.405s sum_dense : 1.368s sum_sparse : 1.406s sum_combo : 1.039s
although can 1 approach or other come out ahead factor of 2 or messing n,s,d parameters... nothing order of magnitude improvement you'd hope see considering number of 0 adds should possible skip.
i think i've found way speed factor of ~10 if matrices sparse.
in [1]: scipy.sparse import csr_matrix in [2]: def sum_sparse(m): ...: x = np.zeros(m[0].shape) ...: in m: ...: ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr)) ...: x[ri,a.indices] += a.data ...: return x ...: in [6]: m = [np.zeros((100,100)) in range(1000)] in [7]: x in m: ...: x.ravel()[np.random.randint(0,x.size,10)] = 1.0 ...: m = [csr_matrix(x) x in m] in [17]: (sum(m[1:],m[0]).todense() == sum_sparse(m)).all() out[17]: true in [18]: %timeit sum(m[1:],m[0]).todense() 10 loops, best of 3: 145 ms per loop in [19]: %timeit sum_sparse(m) 100 loops, best of 3: 18.5 ms per loop
Comments
Post a Comment