c++ - Why would array<T, N> ever be slower than vector<T>?


Today I decided to benchmark and compare some differences in how well gcc can optimize std::vector and std::array. Generally, I found what I expected: performing a task on each of a collection of short arrays is faster than performing the task on a collection of equivalent vectors.

However, I found something unexpected: using a std::vector to store the collection of arrays is faster than using a std::array. In case this was an artifact of the large amount of data on the stack, I also tried allocating the array on the heap and as a C-style array on the heap (but the results still resemble the array of arrays on the stack and the vector of arrays).

Any idea why std::vector would ever outperform std::array (when the compiler has more compile-time information about the latter)?

I compiled using gcc-4.7 -std=c++11 -O3 (gcc-4.6 -std=c++0x -O3 produces the same conundrum). Runtimes were computed using the bash-native time command (user time); a small in-process std::chrono alternative is sketched after the code below.

Code:

#include <array>
#include <vector>
#include <iostream>
#include <assert.h>
#include <algorithm>

template <typename Vec>
double fast_sq_dist(const Vec & lhs, const Vec & rhs) {
  assert(lhs.size() == rhs.size());
  double result = 0.0;
  for (std::size_t k = 0; k < lhs.size(); ++k) {
    double tmp = lhs[k] - rhs[k];
    result += tmp * tmp;
  }
  return result;
}

int main() {
  const std::size_t K = 20000;
  const std::size_t N = 4;

  // declare the data structure for the collection
  // (uncomment exactly one of these to time it)

  // array of arrays
  // runtime: 1.32s
  std::array<std::array<double, N>, K> mat;

  // array of arrays (allocated on the heap)
  // runtime: 1.33s
  //  std::array<std::array<double, N>, K> & mat = *new std::array<std::array<double, N>, K>;

  // C-style heap array of arrays
  // runtime: 0.93s
  //  std::array<double, N> * mat = new std::array<double, N>[K];

  // vector of arrays
  // runtime: 0.93s
  //  std::vector<std::array<double, N> > mat(K);

  // vector of vectors
  // runtime: 2.16s
  //  std::vector<std::vector<double> > mat(K, std::vector<double>(N));

  // fill the collection with arbitrary values
  for (std::size_t k = 0; k < K; ++k) {
    for (std::size_t j = 0; j < N; ++j)
      mat[k][j] = k * N + j;
  }

  std::cerr << "constructed" << std::endl;

  // compute the sum of all pairwise distances in the collection
  double tot = 0.0;

  for (std::size_t j = 0; j < K; ++j) {
    for (std::size_t k = 0; k < K; ++k)
      tot += fast_sq_dist(mat[j], mat[k]);
  }

  std::cout << tot << std::endl;

  return 0;
}
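As a side note on the timing methodology: the figures above came from bash's time, but if you would rather measure in-process, a minimal sketch along these lines (using std::chrono, available under -std=c++11; not how the numbers above were produced) should give comparable results:

#include <chrono>
#include <iostream>

int main() {
  // Wrap the benchmark body between the two now() calls instead of timing
  // the whole executable with bash's time.
  auto start = std::chrono::steady_clock::now();

  // ... fill the collection and compute tot here ...

  auto stop = std::chrono::steady_clock::now();
  std::cerr << "elapsed: "
            << std::chrono::duration<double>(stop - start).count()
            << " s" << std::endl;
  return 0;
}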

NB 1: All versions print the same result.

NB 2: To demonstrate that the runtime differences between std::array<std::array<double, N>, K>, std::vector<std::array<double, N> >, and std::vector<std::vector<double> > weren't due to assignment/initialization during allocation, the runtimes of allocating the collection alone (i.e. commenting out the computation and the printing of tot) were 0.000s, 0.000s, and 0.004s, respectively.

NB 3: Each method was compiled and run separately (not timed back-to-back within the same executable), to prevent unfair differences in caching.

NB 4:
Assembly for the array of arrays: http://ideone.com/sm8db
Assembly for the vector of arrays: http://ideone.com/vhpjv
Assembly for the vector of vectors: http://ideone.com/rztne

NB 5: To be absolutely clear, I am in no way intending to criticize the STL. I absolutely love the STL and, not only do I use it frequently, the details of its effective use have taught me a lot of subtle and great features of C++. Instead, this is an intellectual pursuit: I was simply timing things to learn principles of efficient C++ design.

Furthermore, it would be unsound to blame the STL, because it is difficult to deconvolve the etiology of the runtime differential: with optimizations turned on, it can come from compiler optimizations that slow the code rather than quicken it. With optimizations turned off, it can come from unnecessary copy operations (that would be optimized out and never executed in production code), which can be biased against some data types more than others.

If you are as curious as I am, I'd love help figuring this out.

Consider the second and third tests. Conceptually, they are identical: both allocate K * N * sizeof(double) bytes off the heap and access them in the same way. So why the different times?

all of "faster" tests have 1 thing in common: new[]. of slower tests allocated new or on stack. vector uses new[] under hood™. obvious cause new[] , new have more different implementations expected.

What I'm going to suggest is that new[] will fall back to mmap and allocate directly on a page boundary, giving you an alignment speedup, whereas the other two methods will not allocate on a page boundary.
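One way to poke at that hypothesis is to print the base address each allocation strategy actually returns and see which ones land on a page boundary. A rough sketch (the 4096-byte page size is an assumption about your platform; sysconf(_SC_PAGESIZE) gives the real value, and K and N mirror the benchmark above):

#include <array>
#include <vector>
#include <cstdint>
#include <cstdio>

int main() {
  const std::size_t K = 20000;
  const std::size_t N = 4;

  // one big object via plain new
  std::array<std::array<double, N>, K> * a =
      new std::array<std::array<double, N>, K>;
  // K row objects via new[]
  std::array<double, N> * b = new std::array<double, N>[K];
  // vector allocates its element buffer dynamically through its allocator
  std::vector<std::array<double, N> > v(K);

  // print each base address and its offset into a (presumed) 4096-byte page;
  // an offset of 0 means the block starts exactly on a page boundary
  std::printf("new    %p  page offset %zu\n", static_cast<void *>(a),
              static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(a) % 4096));
  std::printf("new[]  %p  page offset %zu\n", static_cast<void *>(b),
              static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(b) % 4096));
  std::printf("vector %p  page offset %zu\n", static_cast<void *>(v.data()),
              static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(v.data()) % 4096));

  delete a;
  delete[] b;
  return 0;
}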

Consider using an OS allocation function to directly map committed pages, and then placing a std::array<std::array<double, N>, K> into that memory.
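A sketch of that idea, assuming a Linux-style mmap with MAP_ANONYMOUS (adjust for your platform; error handling is minimal):

#include <array>
#include <cstddef>
#include <new>
#include <sys/mman.h>

int main() {
  const std::size_t K = 20000;
  const std::size_t N = 4;
  typedef std::array<std::array<double, N>, K> Mat;

  // Ask the OS directly for page-aligned, zero-filled anonymous memory.
  void * raw = mmap(0, sizeof(Mat), PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED)
    return 1;

  // Construct the array in place; an array of doubles is trivially
  // constructible, so this costs nothing at runtime.
  Mat & mat = *new (raw) Mat;

  mat[0][0] = 1.0;  // use mat exactly as in the benchmark above

  // Mat is trivially destructible, so simply unmap when done.
  munmap(raw, sizeof(Mat));
  return 0;
}

Since the backing memory is guaranteed to start on a page boundary, this should tell you whether alignment alone accounts for the difference.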

