Cost functional calculation for global optimization in CUDA
I am trying to optimize a function (say, find its minimum) of n parameters (x1, ..., xn). The xi's are bound to a certain range (for example -200 to 200), and if a parameter leaves that range the function goes to infinity very fast. However, n can be large (from 20 to 60-70), and computing the function's value takes a long time.
I don't think the details of the function are of much relevance, but here are some: it consists of a weighted sum of 20-30 smaller functions (all different), which in part consist of sums of dot products under inverse trigonometric functions (arcsin, arccos, arctan, etc.), for example arcsin(x1 . x2) + arcsin(x4 . x7) + ...
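To make the shape of the problem concrete, here is a minimal CUDA sketch of what one such objective could look like; the parameter count, term count, weights, and index pairs are made up, since the real function is not shown.

    // Sketch only: the real objective is not shown, so the parameter count,
    // term count, weights, and index pairs below are placeholders.
    #include <math.h>

    #define N_PARAMS 64   // hypothetical number of parameters (20 to 60-70 in practice)
    #define N_TERMS  25   // hypothetical number of weighted sub-functions

    __device__ double evaluate(const double *x,        // one candidate, N_PARAMS values
                               const double *weights,  // one weight per sub-function
                               const int2   *pairs)    // (i, j) parameter indices per term
    {
        double sum = 0.0;
        for (int t = 0; t < N_TERMS; ++t) {
            double p = x[pairs[t].x] * x[pairs[t].y];  // product of two parameters
            p = fmin(1.0, fmax(-1.0, p));              // keep asin's argument in [-1, 1]
            sum += weights[t] * asin(p);
        }
        return sum;
    }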
In general the function has many local minima, so approaches such as (naive) conjugate gradients or quasi-Newton methods are useless, and searching the entire domain by brute force is too slow.
My initial idea is to use some sort of massive parallelization in combination with a genetic algorithm, which performs many searches at different spots in the domain of the function and at regular intervals checks whether some of the searches have reached local minima. If so, it compares them, discards all results but the smallest one, and continues the search until a reasonably small value is found.
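The evaluation step of such a parallel search maps naturally onto a GPU. As a rough sketch, one thread per candidate solution, each calling the hypothetical evaluate() device function from the sketch above (all names and layouts are illustrative only):

    // Sketch: one thread evaluates one candidate, reusing the hypothetical
    // evaluate() and N_PARAMS from the earlier sketch.
    __global__ void evaluate_population(const double *candidates,  // n_candidates x N_PARAMS
                                        double       *fitness,     // one value per candidate
                                        const double *weights,
                                        const int2   *pairs,
                                        int           n_candidates)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (c < n_candidates)
            fitness[c] = evaluate(&candidates[c * N_PARAMS], weights, pairs);
    }

The selection/mutation step of the genetic algorithm could then run on the host between kernel launches, or be moved onto the GPU as well.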
My two questions are:
1) Is it possible to implement this problem in CUDA or a similar technology? Can CUDA compute the value of such a function fast enough?
2) Would it be better/faster to implement the problem on a multicore PC (with 12+ cores)?
It is possible to implement global optimization algorithms on GPUs, but you may have to modify the algorithms to get the best performance. I have personally implemented about half a dozen population-based metaheuristics on GPUs, and it is possible to get excellent speedups relative to a multi-threaded CPU.
With population-based algorithms iterated over generations, you may find that the population size becomes the limiting factor for performance. If your objective function is not parallelizable, then the maximum number of parallel threads is the number of candidate solutions in the population. GPUs work best with, at a minimum, tens of thousands of simultaneous threads, which is not always practical for population-based algorithms due to population inertia and other effects.
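In this particular case the objective is a sum of independent sub-terms, so it can itself be parallelized, which raises the thread count beyond the population size. A hedged sketch, reusing the hypothetical constants and layout from the earlier sketches: one block per candidate, one thread per sub-term, with a shared-memory reduction.

    // Sketch: one block per candidate, one thread per sub-term of the weighted
    // sum, reduced in shared memory. Reuses N_PARAMS / N_TERMS from the earlier
    // sketch. Assumes blockDim.x >= N_TERMS and blockDim.x is a power of two.
    __global__ void evaluate_population_blockwise(const double *candidates,
                                                  double       *fitness,
                                                  const double *weights,
                                                  const int2   *pairs)
    {
        extern __shared__ double partial[];            // blockDim.x doubles
        const double *x = &candidates[blockIdx.x * N_PARAMS];

        int t = threadIdx.x;
        double term = 0.0;
        if (t < N_TERMS) {
            double p = x[pairs[t].x] * x[pairs[t].y];
            p = fmin(1.0, fmax(-1.0, p));
            term = weights[t] * asin(p);
        }
        partial[t] = term;
        __syncthreads();

        // standard shared-memory tree reduction
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (t < stride) partial[t] += partial[t + stride];
            __syncthreads();
        }
        if (t == 0) fitness[blockIdx.x] = partial[0];
    }

A launch such as evaluate_population_blockwise<<<pop_size, 32, 32 * sizeof(double)>>>(...) would then use pop_size * 32 threads instead of pop_size.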
You can get around the population limitations to some extent by running multiple instances of the optimization in parallel, each starting from a different random seed. Furthermore, you can arrange for the parallel instances to communicate over some network topology (this would be an island model algorithm), so that the parallel instances can collaboratively evolve solutions to complex problems. I have implemented this in OpenCL for my MScE thesis.
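The island-model idea can be sketched in plain host code as several independent populations started from different seeds, with the best candidate of each island copied to its neighbour on a ring every few generations. Everything below is illustrative; objective() is a placeholder, and a real version would call the GPU evaluation kernels sketched earlier.

    #include <vector>
    #include <random>
    #include <algorithm>

    // Placeholder cost; stands in for the real (GPU-evaluated) objective.
    double objective(const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += v * v;
        return s;
    }

    int main() {
        const int n_islands = 8, pop_size = 64, n_params = 32;
        const int generations = 1000, migrate_every = 50;

        using Candidate = std::vector<double>;
        std::vector<std::vector<Candidate>> islands(n_islands,
            std::vector<Candidate>(pop_size, Candidate(n_params)));

        // different seed per island so the searches start in different regions
        for (int i = 0; i < n_islands; ++i) {
            std::mt19937 rng(1000 + i);
            std::uniform_real_distribution<double> box(-200.0, 200.0);
            for (auto& c : islands[i])
                for (auto& v : c) v = box(rng);
        }

        for (int g = 1; g <= generations; ++g) {
            // ... one GA generation per island (selection, crossover, mutation),
            //     with fitness computed by the GPU kernels sketched earlier ...

            if (g % migrate_every == 0) {
                // ring migration: each island's best candidate replaces one
                // resident of the next island
                std::vector<Candidate> best(n_islands);
                for (int i = 0; i < n_islands; ++i)
                    best[i] = *std::min_element(islands[i].begin(), islands[i].end(),
                        [](const Candidate& a, const Candidate& b) {
                            return objective(a) < objective(b);
                        });
                for (int i = 0; i < n_islands; ++i)
                    islands[(i + 1) % n_islands][0] = best[i];
            }
        }
        return 0;
    }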
So overall, here are succinct answers to your questions:
1) Yes, you can implement this in CUDA or a similar technology. Your speedup will depend on how parallelizable your objective function is and on how many parallel instances you want to run at the same time.
2) It would probably be faster to implement on a multicore CPU, due to the wider range of existing libraries and the fact that CPU programming models are conceptually simpler than GPU models. Whether or not it is "better" depends on what you value.
This is still an area of active research, and it will probably take you longer to build a working GPU implementation than it would take for a multicore CPU implementation. If implementation time is your primary concern, I would recommend that you take a look at the PaGMO project (and its Python bindings PyGMO), which is an excellent implementation of an island model optimizer with a wide range of local and global optimization functions included. The islands can be assigned any arbitrary algorithm to run independently, and you can specify exactly how they communicate to collaboratively optimize your objective function.
http://sourceforge.net/apps/mediawiki/pagmo/index.php?title=main_page
Your question is well within my research area, and I would be happy to help further if you need it.