问题描述
我只是自学了一些 OpenMP,这可能很愚蠢.基本上,我试图在 C++ 中并行化广度优先搜索程序,每个节点都需要很长时间来处理.这是一个示例代码:
I just taught myself some OpenMP and this could be stupid. Basically I'm trying to parallelize a breadth first search program in c++, with each node taking a long time to process. Here's an example code:
queue<node*> q;
q.push(head);
while (!q.empty()) {
qSize = q.size();
for (int i = 0; i < qSize; i++) {
node* currNode = q.front();
q.pop();
doStuff(currNode);
q.push(currNode);
}
}
处理函数 doStuff() 非常昂贵,我想对其进行并行化.但是,如果我通过将 #pragma omp parallel for
放在 for 行之前来并行化 for 循环,则在运行时会弹出各种奇怪的错误.我猜原因是这样 q.front()
和 q.push()
也将并行化,并且多个线程可能会通过相同的节点q.front()
(因为它们都在任何 q.push
被处理之前就被处理了).
The processing function doStuff() is quite expensive and I want to parallelize it. However if I parallelize the for loop by putting #pragma omp parallel for
right before the for line, all kinds of weird error pop up at runtime. I'm guessing the reason is that this way q.front()
and q.push()
will also get parallelized, and multiple threads will possibly get the same node through q.front()
(because they all got processed before any q.push
has been processed).
我该如何解决这个问题?
How can I get around this?
推荐答案
解决方案是使用临界区保护对队列的访问.
The solution is to protect access to the queue with a critical section.
queue<node*> q;
q.push(head);
while (!q.empty()) {
qSize = q.size();
#pragma omp parallel for
for (int i = 0; i < qSize; i++) {
node* currNode;
#pragma omp critical
{
currNode = q.front();
q.pop();
}
doStuff(currNode);
#pragma omp critical
q.push(currNode);
}
}
这类似于拥有一个共同的互斥锁并锁定它.
This is similar to having a common mutex and locking it.
此版本在效率上有一些限制:在 for 循环结束时,尽管有工作在队列中,但一些线程可能会空闲.就处理队列为空但某些线程仍在计算的情况而言,制作一个线程在队列中有东西时连续工作的版本有点棘手.
There are some limits in efficiency with this version: At the end of the for loop, some threads may idle, despite work being in the queue. Making a version where threads continuously work whenever there is something in the queue is a bit tricky in terms of handling the situations where the queue is empty but some threads are still computing.
根据节点中涉及的数据大小,缓存效应和错误共享可能也会对性能产生重大影响.但这不能用具体的例子来讨论.在许多情况下,简单版本可能足够高效,但获得最佳性能可能变得任意复杂.
Depending of the data size involved in a node, you may also have significant performance impact of cache-effects and false sharing. But that can't be discussed with a specific example. The simple version will probably be sufficiently efficient in many cases, but getting optimal performance may become arbitrarily complex.
无论如何,您必须确保 doStuff
不会修改任何全局或共享状态.
In any case, you have to ensure that doStuff
does not do modify any global or shared state.
这篇关于并行化广度优先搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!