        Parallel for loop in OpenMP

                  Question

                  I'm trying to parallelize a very simple for loop, but this is my first attempt at using OpenMP in a long time, and the run times baffle me. Here is my code:

                  #include <cmath>
                  #include <iostream>
                  #include <vector>
                  #include <algorithm>
                  
                  using namespace std;
                  
                  int main () 
                  {
                      int n=400000,  m=1000;  
                      double x=0,y=0;
                      double s=0;
                      vector< double > shifts(n,0);
                  
                  
                      #pragma omp parallel for 
                      for (int j=0; j<n; j++) {
                  
                          double r=0.0;
                          for (int i=0; i < m; i++){
                  
                              double rand_g1 = cos(i/double(m));
                              double rand_g2 = sin(i/double(m));     
                  
                              x += rand_g1;
                              y += rand_g2;
                              r += sqrt(rand_g1*rand_g1 + rand_g2*rand_g2);
                          }
                          shifts[j] = r / m;
                      }
                  
                      cout << *std::max_element( shifts.begin(), shifts.end() ) << endl;
                  }
                  

                  I compile with

                  g++ -O3 testMP.cc -o testMP  -I /opt/boost_1_48_0/include
                  

                  that is, without "-fopenmp", and I get these timings:

                  real    0m18.417s
                  user    0m18.357s
                  sys     0m0.004s
                  

                  When I do use "-fopenmp",

                  g++ -O3 -fopenmp testMP.cc -o testMP  -I /opt/boost_1_48_0/include
                  

                  I get these numbers:

                  real    0m6.853s
                  user    0m52.007s
                  sys     0m0.008s
                  

                  which doesn't make sense to me. How can using eight cores result in only a 3-fold increase in performance? Am I coding the loop correctly?

                  Recommended answer

                  You should use the OpenMP reduction clause for x and y:

                  #pragma omp parallel for reduction(+:x,y)
                  for (int j=0; j<n; j++) {
                  
                      double r=0.0;
                      for (int i=0; i < m; i++){
                  
                          double rand_g1 = cos(i/double(m));
                          double rand_g2 = sin(i/double(m));     
                  
                          x += rand_g1;
                          y += rand_g2;
                          r += sqrt(rand_g1*rand_g1 + rand_g2*rand_g2);
                      }
                      shifts[j] = r / m;
                  }
                  

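                  With reduction, each thread accumulates its own partial sums in private copies of x and y, and at the end all partial values are added together to obtain the final values. Without the clause, x and y are shared, so every thread updates the same two variables: that is a data race, and the contention on them is the likely reason the speed-up was so poor.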
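                  To make the effect of the clause more concrete, here is a minimal sketch of roughly what reduction(+:x,y) arranges, written out by hand. The names Sums, accumulate_manually, x_local and y_local are illustrative and do not come from the original post, and a real OpenMP runtime is free to combine the partial values in a different way. The pragmas are simply ignored when the file is compiled without -fopenmp, in which case the function runs serially.

                  ```cpp
                  #include <cmath>

                  // Hypothetical result type for this sketch (not from the original post).
                  struct Sums { double x; double y; };

                  // Sums cos(i/m) and sin(i/m) over n outer iterations using per-thread
                  // partial sums, mimicking by hand what reduction(+:x,y) does.
                  Sums accumulate_manually(int n, int m) {
                      double x = 0.0, y = 0.0;                  // shared final totals

                      #pragma omp parallel
                      {
                          double x_local = 0.0, y_local = 0.0;  // private to each thread

                          #pragma omp for nowait
                          for (int j = 0; j < n; j++) {
                              for (int i = 0; i < m; i++) {
                                  x_local += std::cos(i / double(m));
                                  y_local += std::sin(i / double(m));
                              }
                          }

                          // Each thread adds its partial sums to the shared totals
                          // exactly once, so the shared variables are touched rarely.
                          #pragma omp critical
                          {
                              x += x_local;
                              y += y_local;
                          }
                      }
                      return Sums{x, y};
                  }
                  ```

                  Calling accumulate_manually(400000, 1000) would reproduce the accumulation from the question. Because the partial sums are added in a thread-dependent order, the last few digits of the result can differ between runs; the reduction clause has the same property.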

                  Serial version:
                  25.05s user 0.01s system 99% cpu 25.059 total
                  OpenMP version w/ OMP_NUM_THREADS=16:
                  24.76s user 0.02s system 1590% cpu 1.559 total
                  

                  See: superlinear speed-up :)
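                  For reference, the speed-up behind both sets of timings can be worked out from the wall-clock times quoted above, using the usual definitions S = T_serial / T_parallel and E = S / p, where p is the number of threads. Taking p = 8 for the question is an assumption based on its mention of eight cores, and the two serial times differ, so the runs were presumably made on different machines and only the ratios are comparable.

                  ```latex
                  \begin{align*}
                  S_{\text{question}} &= \frac{18.417}{6.853} \approx 2.69, & E_{\text{question}} &= \frac{2.69}{8} \approx 0.34 \\
                  S_{\text{answer}} &= \frac{25.059}{1.559} \approx 16.07, & E_{\text{answer}} &= \frac{16.07}{16} \approx 1.005
                  \end{align*}
                  ```

                  An efficiency just above 1 is what the "superlinear" remark refers to; a margin this small is within what ordinary timing noise could produce.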
