python - Efficiently accumulating a collection of sparse scipy matrices


I've got a collection of O(N) NxN scipy.sparse.csr_matrix, and each sparse matrix has on the order of N elements set. I want to add all these matrices together to get a regular NxN numpy array. (N is on the order of 1000). The arrangement of the non-zero elements within the matrices is such that the resulting sum certainly isn't sparse (virtually no zero elements left, in fact).

At the moment I'm doing

```python
reduce(lambda x,y: x+y, [m.toarray() for m in my_sparse_matrices])
```

which works, but is a bit slow: of course, the sheer amount of pointless processing of zeros going on there is absolutely horrific.

Is there a better way? There's nothing obvious to me in the docs.
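For comparison, the same reduction can be written with a preallocated accumulator and in-place addition, which at least avoids the chain of intermediate sum arrays that reduce creates (a sketch, using a small stand-in list since my_sparse_matrices isn't defined here; it still touches every zero):

```python
import numpy as np
import scipy.sparse

# Stand-in for my_sparse_matrices: three 4x4 identity matrices in CSR form.
n = 4
my_sparse_matrices = [scipy.sparse.identity(n, dtype=np.float32, format='csr')
                      for _ in range(3)]

# Accumulate in place: one dense result buffer is reused throughout
# (toarray() still allocates a temporary per matrix, though).
acc = np.zeros((n, n), dtype=np.float32)
for m in my_sparse_matrices:
    acc += m.toarray()
```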

Update: per user545424's suggestion, I tried the alternative schemes of summing the matrices while still sparse, and of summing the sparse matrices onto a dense matrix. The code below shows that all the approaches run in comparable time (Python 2.6.6 on amd64 Debian/Squeeze on a quad-core i7):

```python
import numpy as np
import numpy.random
import scipy
import scipy.sparse
import time

n=768
s=768
d=3

def mkrandomsparse():
    m=np.zeros((s,s),dtype=np.float32)
    r=np.random.random_integers(0,s-1,d*s)
    c=np.random.random_integers(0,s-1,d*s)
    for e in zip(r,c):
        m[e[0],e[1]]=1.0
    return scipy.sparse.csr_matrix(m)

ms=[mkrandomsparse() for x in xrange(n)]

def plus_dense():
    return reduce(lambda x,y: x+y,[m.toarray() for m in ms])

def plus_sparse():
    return reduce(lambda x,y: x+y,ms).toarray()

def sum_dense():
    return sum([m.toarray() for m in ms])

def sum_sparse():
    return sum(ms[1:],ms[0]).toarray()

def sum_combo():  # sum sparse matrices 'onto' a dense matrix?
    return sum(ms,np.zeros((s,s),dtype=np.float32))

def benchmark(fn):
    t0=time.time()
    fn()
    t1=time.time()
    print "{0:16}:  {1:.3f}s".format(fn.__name__,t1-t0)

for i in xrange(4):
    benchmark(plus_dense)
    benchmark(plus_sparse)
    benchmark(sum_dense)
    benchmark(sum_sparse)
    benchmark(sum_combo)
    print
```

and it logs out

```
plus_dense      :  1.368s
plus_sparse     :  1.405s
sum_dense       :  1.368s
sum_sparse      :  1.406s
sum_combo       :  1.039s
```

Although I can get one approach or the other to come out ahead by a factor of 2 or so by messing with the n, s, d parameters, it's nothing like the order-of-magnitude improvement you'd hope to see considering the number of zero-adds it should be possible to skip.
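One way to actually skip the pairwise sums (an alternative sketch, not from the original post) is to concatenate every matrix's COO triplets and combine them in a single dense scatter-add; np.add.at accumulates repeated (row, col) pairs correctly:

```python
import numpy as np
import scipy.sparse

# Small random test set; scipy.sparse.random generates sparse matrices
# with the requested density.
s = 16
mats = [scipy.sparse.random(s, s, density=0.05, format='coo', random_state=i)
        for i in range(10)]

# Concatenate all (row, col, value) triplets from every matrix...
rows = np.concatenate([m.row for m in mats])
cols = np.concatenate([m.col for m in mats])
vals = np.concatenate([m.data for m in mats])

# ...and scatter-add them into one dense array in a single pass.
# np.add.at (unlike fancy-indexed +=) accumulates duplicate indices.
dense = np.zeros((s, s))
np.add.at(dense, (rows, cols), vals)
```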

I think I've found a way to speed things up by a factor of ~10, if your matrices are very sparse.

```
In [1]: from scipy.sparse import csr_matrix

In [2]: def sum_sparse(m):
   ...:     x = np.zeros(m[0].shape)
   ...:     for a in m:
   ...:         ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr))
   ...:         x[ri,a.indices] += a.data
   ...:     return x
   ...:

In [6]: m = [np.zeros((100,100)) for i in range(1000)]

In [7]: for x in m:
   ...:     x.ravel()[np.random.randint(0,x.size,10)] = 1.0
   ...:
   ...: m = [csr_matrix(x) for x in m]

In [17]: (sum(m[1:],m[0]).todense() == sum_sparse(m)).all()
Out[17]: True

In [18]: %timeit sum(m[1:],m[0]).todense()
10 loops, best of 3: 145 ms per loop

In [19]: %timeit sum_sparse(m)
100 loops, best of 3: 18.5 ms per loop
```
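The trick in sum_sparse is decoding the CSR index structure by hand: indptr[i]:indptr[i+1] delimits row i's slice of data and indices, so repeating each row number by its per-row nonzero count yields a row index for every stored value. A tiny illustration (my own example values, not from the answer above):

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[1., 0., 2.],
                         [0., 0., 0.],
                         [3., 4., 0.]]))

counts = np.diff(a.indptr)                     # nonzeros per row: [2 0 2]
ri = np.repeat(np.arange(a.shape[0]), counts)  # row index per stored entry

print(ri)         # [0 0 2 2]
print(a.indices)  # column index per stored entry: [0 2 0 1]
print(a.data)     # [1. 2. 3. 4.]

# Scatter the values into a dense array. Within one CSR matrix each
# (row, col) pair is unique, so fancy-indexed += is safe here.
x = np.zeros(a.shape)
x[ri, a.indices] += a.data
```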
