c++ - Why would array<T, N> ever be slower than vector<T>? -
today decided benchmark , compare differences in gcc optimizability of std::vector
, std::array
. generally, found expected: performing task on each of collection of short arrays faster performing tasks on collection equivalent vectors.
however, found something unexpected: using std::vector
store collection of arrays faster using std::array
. in case result of artifact of large amount of data on stack, tried allocating array on heap , in c-style array on heap (but results still resemble array of arrays on stack , vector of arrays).
any idea why std::vector
ever outperform std::array
(on compiler has more compile-time information)?
i compiled using gcc-4.7 -std=c++11 -o3
(gcc-4.6 -std=c++0x -o3
should result in conundrum). runtimes computed using bash
-native time
command (user time).
code:
#include <array> #include <vector> #include <iostream> #include <assert.h> #include <algorithm> template <typename vec> double fast_sq_dist(const vec & lhs, const vec & rhs) { assert(lhs.size() == rhs.size()); double result = 0.0; (int k=0; k<lhs.size(); ++k) { double tmp = lhs[k] - rhs[k]; result += tmp * tmp; } return result; } int main() { const std::size_t k = 20000; const std::size_t n = 4; // declare data structure collection // (uncomment 1 of these time it) // array of arrays // runtime: 1.32s std::array<std::array<double, n>, k > mat; // array of arrays (allocated on heap) // runtime: 1.33s // std::array<std::array<double, n>, k > & mat = *new std::array<std::array<double, n>, k >; // c-style heap array of arrays // runtime: 0.93s // std::array<double, n> * mat = new std::array<double, n>[k]; // vector of arrays // runtime: 0.93 // std::vector<std::array<double, n> > mat(k); // vector of vectors // runtime: 2.16s // std::vector<std::vector<double> > mat(k, std::vector<double>(n)); // fill collection arbitrary values (std::size_t k=0; k<k; ++k) { (std::size_t j=0; j<n; ++j) mat[k][j] = k*n+j; } std::cerr << "constructed" << std::endl; // compute sum of pairwise distances in collection double tot = 0.0; (std::size_t j=0; j<k; ++j) { (std::size_t k=0; k<k; ++k) tot += fast_sq_dist(mat[j], mat[k]); } std::cout << tot << std::endl; return 0; }
nb 1: versions print same result.
nb 2: , demonstrate runtime differences between std::array<std::array<double, n>, k>
, std::vector<std::array<double, n> >
, , std::vector<std::vector<double> >
wasn't assignment/initialization when allocating, runtimes of allocating collection (i.e. commenting out computation , printing of tot
) 0.000s, 0.000s, , 0.004s, respectively.
nb 3: each method compiled , run separately (not timed back-to-back within same executable), prevent unfair differences in caching.
nb 4:
assembly array of arrays: http://ideone.com/sm8db
assembly vector of arrays: http://ideone.com/vhpjv
assembly vector of vectors: http://ideone.com/rztne
nb 5: absolutely clear, in no way intending criticize stl. absolutely love stl and, not use frequently, details of effective use have taught me lot of subtle , great features of c++. instead, intellectual pursuit: timing things learn principles of efficient c++ design.
furthermore, unsound blame stl, because difficult deconvolve etiology of runtime differential: optimizations turned on, can compiler optimizations slow code rather quicken it. optimizations turned off, can unnecessary copy operations (that optimized out , never executed in production code), can biased against data types more others.
if curious me, i'd love figuring out.
consider second , third tests. conceptually, identical: allocate k * n * sizeof(double)
bytes off heap , access them in same way. why different times?
all of "faster" tests have 1 thing in common: new[]
. of slower tests allocated new
or on stack. vector
uses new[]
under hood™. obvious cause new[]
, new
have more different implementations expected.
what i'm going suggest new[]
fall mmap
, allocate directly on page boundary, giving alignment speedup, whereas other 2 methods not allocate on page boundary.
consider using os allocation function directly map committed pages, , place std::array<std::array<double, n>, k>
it.
Comments
Post a Comment