Question
When writing CUDA applications, you can work either at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced math):
(Source: tomshw.it)
I assume the tradeoff between the two is increased performance for the low-level API at the cost of increased code complexity. What are the concrete differences, and are there any significant things that you cannot do with the high-level API?
I am using CUDA.net for interop with C#, and it is built as a copy of the driver API. This encourages writing a lot of rather complex code in C#, whereas the C++ equivalent would be simpler using the runtime API. Is there anything to gain by doing it this way? The one benefit I can see is that it is easier to integrate intelligent error handling with the rest of the C# code.
Recommended answer
The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don't have to distribute cubin files with your application, or deal with loading them through the driver API. As you have noted, it is generally easier to use.
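As a rough illustration of that convenience, here is a minimal runtime API sketch. The kernel `scale` and the helper `run_scale` are hypothetical names made up for this example; the point is only that the kernel lives in the same source file as the host code, nvcc links it into the executable, and there is no explicit initialization or module loading.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by a factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns 0 on success, 1 on any CUDA error or unexpected result.
int run_scale()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = nullptr;
    // The first runtime call initializes the device and context implicitly.
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return 1;
    }

    // Execution configuration syntax: grid size and block size in <<< >>>.
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f) return 1;
    return 0;
}

int main()
{
    int rc = run_scale();
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Built with something like `nvcc scale.cu -o scale`, this produces a single executable with the device code embedded.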
In contrast, the driver API is harder to program but provides more control over how CUDA is used. The programmer has to deal directly with initialization, module loading, etc.
Apparently, more detailed device information can be queried through the driver API than through the runtime API. For instance, at the time of writing, the free memory available on the device could be queried only through the driver API (cuMemGetInfo); later runtime releases added an equivalent call, cudaMemGetInfo.
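A minimal sketch of the free-memory query through the driver API, assuming device 0 and the device's primary context. The helper name `query_free_memory` is made up for this example. Newer toolkits also expose `cudaMemGetInfo` in the runtime API, so on a current installation either route should work.

```cuda
#include <cstdio>
#include <cuda.h>

// Hypothetical helper: fills in free and total device memory in bytes.
// Returns 0 on success, 1 on any driver API error.
int query_free_memory(size_t *free_bytes, size_t *total_bytes)
{
    CUdevice dev;
    CUcontext ctx;

    // Unlike the runtime API, initialization and context setup are explicit.
    if (cuInit(0) != CUDA_SUCCESS) return 1;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;  // assumes device 0
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return 1;

    CUresult res = cuCtxSetCurrent(ctx);
    if (res == CUDA_SUCCESS)
        res = cuMemGetInfo(free_bytes, total_bytes);

    cuDevicePrimaryCtxRelease(dev);
    return res == CUDA_SUCCESS ? 0 : 1;
}

int main()
{
    size_t free_bytes = 0, total_bytes = 0;
    if (query_free_memory(&free_bytes, &total_bytes) != 0) {
        std::fprintf(stderr, "query failed\n");
        return 1;
    }
    std::printf("free: %zu bytes, total: %zu bytes\n", free_bytes, total_bytes);
    return 0;
}
```

This needs to be linked against the driver library, for example with `-lcuda`.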
From the CUDA Programmer's Guide:
It is composed of two APIs:
- A low-level API called the CUDA driver API,
- A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.
These APIs are mutually exclusive: An application should use either one or the other.
The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.
In contrast, the CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.
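To make the "explicit function calls" point concrete, here is a sketch of a driver API launch. It is not the guide's own example: it uses `cuLaunchKernel`, the launch call in current toolkits, rather than the older per-parameter calls from the guide's era. The module path, the kernel name `scale`, and the helper `run_scale_driver` are assumptions made up for this illustration, and the device code is assumed to be compiled separately.

```cuda
#include <cstdio>
#include <cuda.h>

// Assumed device code, compiled separately (e.g. `nvcc -ptx scale.cu -o scale.ptx`):
//   extern "C" __global__ void scale(float *data, float factor, int n)
//   { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) data[i] *= factor; }
// extern "C" keeps the kernel name unmangled so it can be looked up by string.

// For brevity, error paths return early without releasing resources.
#define CHECK(call)                                                        \
    do {                                                                   \
        CUresult r_ = (call);                                              \
        if (r_ != CUDA_SUCCESS) {                                          \
            std::fprintf(stderr, "%s failed (error %d)\n", #call, (int)r_); \
            return 1;                                                      \
        }                                                                  \
    } while (0)

// Returns 0 on success, 1 on any driver API error or unexpected result.
int run_scale_driver(const char *module_path)
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr dptr;

    // Explicit initialization and context management.
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Explicit module management: load the compiled device code from disk.
    CHECK(cuModuleLoad(&mod, module_path));
    CHECK(cuModuleGetFunction(&fn, mod, "scale"));

    CHECK(cuMemAlloc(&dptr, bytes));
    CHECK(cuMemcpyHtoD(dptr, host, bytes));

    // Kernel parameters are passed as an array of pointers to each argument.
    float factor = 2.0f;
    int count = n;
    void *params[] = { &dptr, &factor, &count };

    // Grid and block dimensions are plain arguments instead of <<< >>>.
    CHECK(cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1,
                         0, nullptr, params, nullptr));
    CHECK(cuCtxSynchronize());

    CHECK(cuMemcpyDtoH(host, dptr, bytes));
    CHECK(cuMemFree(dptr));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));

    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f) return 1;
    return 0;
}

int main(int argc, char **argv)
{
    int rc = run_scale_driver(argc > 1 ? argv[1] : "scale.ptx");
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Compared with a runtime API version of the same work, the extra steps are initialization, context setup, module loading, the function lookup by name, and the parameter array. The module file also has to be shipped alongside the executable.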
There is no noticeable performance difference between the APIs. How your kernels use memory and how they are laid out on the GPU (in warps and blocks) will have a much more pronounced effect.