GPU Computing with CUDA

Extreme Optimization Numerical Libraries for .NET Professional

NVIDIA's CUDA is one of the most widely used GPU computing platforms. The Extreme Optimization Numerical Libraries for .NET let you take advantage of CUDA-enabled graphics cards and devices through their distributed computing framework. CUDA support is provided by the CudaProvider class.

This section only discusses issues specific to CUDA. For general information on the distributed computing framework, see the previous section on distributed and GPU computing.

Prerequisites

Version 4.0 or higher of the .NET Framework is required in order to use the CUDA functionality. You also need to have the NVIDIA CUDA Toolkit v5.5 (for 32-bit) or v7.5 (for 64-bit) installed on your machine. The toolkit can be downloaded from NVIDIA's website.

To run the software, you need a CUDA-enabled graphics card with compute capability 1.3 or higher.

Creating CUDA-enabled applications

The first step in adding CUDA support is to add a reference to the CUDA provider assembly for your platform, Extreme.Numerics.Cuda.Net40.x86.dll or Extreme.Numerics.Cuda.Net40.x64.dll, to your project.

Next, you need to inform the distributed computing framework that you are using the CUDA provider:

C#

DistributedProvider.Current =
    Extreme.Mathematics.Distributed.CudaProvider.Default;

VB

DistributedProvider.Current =
    Extreme.Mathematics.Distributed.CudaProvider.Default

F#

DistributedProvider.Current <-
    Extreme.Mathematics.Distributed.CudaProvider.Default

Finally, you need to adapt your code to use distributed arrays where appropriate. The guidelines for working with distributed arrays from the previous section apply to CUDA code as well.

CUDA-specific functionality

The CUDA provider exposes a number of functions specific to the CUDA environment:

GetAvailableMemory
Returns the free memory available on the device, in bytes. Because of memory fragmentation, it is unlikely that a single block of this size can be allocated.

GetTotalMemory
Returns the total memory on the device, in bytes.

GetDeviceLimit(Int32)
A wrapper for the CUDA runtime's cudaDeviceGetLimit function.

The GetAvailableMemory method is particularly useful for verifying that all device memory has been properly released.
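
As a quick illustration, the sketch below records the free device memory before and after a computation to check for leaked device buffers. It is not taken from the product documentation: it assumes CudaProvider.Default is the active provider and that GetAvailableMemory returns the amount in bytes as a numeric value.

C#

// Minimal leak-check sketch; the return type of GetAvailableMemory is
// not specified here, so 'var' is used.
var cuda = Extreme.Mathematics.Distributed.CudaProvider.Default;

var before = cuda.GetAvailableMemory();

// ... run GPU computations with distributed arrays here ...

var after = cuda.GetAvailableMemory();

// If all device buffers were released, the two values should match,
// apart from any allocations made by the CUDA runtime itself.
Console.WriteLine("Device memory still in use: {0} bytes", before - after);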

Interoperating with other CUDA libraries

The CUDA provider supplies a large number of functions that are optimized for CUDA GPUs. Sometimes, however, it is necessary to call into external CUDA libraries. This section outlines how to do so.

A pointer to device memory can be obtained from a distributed array through its NativeStorage property; this applies to both vectors and matrices. The property returns a storage structure whose fields describe how the data is laid out in device memory.

For vectors, the Values property is an IntPtr that points to the start of the memory block that contains the data for the vector. The Offset is the number of elements (not bytes) from the start of the memory block where the first element in the vector is stored. This information can be combined to get the starting address for the vector's elements.

Storage for vectors may not be contiguous. This can happen, for example, when the vector represents a row in a matrix. The Stride property specifies the number of elements between vector elements. A value of 1 corresponds to contiguous storage.
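
For example, for a vector of double-precision values, the device address of element i can be computed from these fields as shown below. The helper is purely illustrative and is not part of the library; the field types are assumptions.

C#

// Illustrative helper only: computes the device address of the i-th
// element of a vector whose layout is given by the Values, Offset and
// Stride fields of its NativeStorage structure (element type: double).
static IntPtr GetVectorElementAddress(IntPtr values, int offset, int stride, int i)
{
    // Offset and Stride are expressed in elements, not bytes.
    return values + (offset + i * stride) * sizeof(double);
}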

For matrices, the Values property is an IntPtr that points to the start of the memory block that contains the data for the matrix. The Offset is the number of elements (not bytes) from the start of the memory block where the first element of the matrix is stored. As for vectors, this information can be combined to get the starting address of the matrix's elements.

Matrices are stored in column-major order. This means that columns are stored contiguously. It is possible that not all elements in a matrix are contiguous. The LeadingDimension property specifies the number of elements between the start of each column. This is usually equal to the number of rows in the matrix, but not always.
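
Similarly, for a matrix of double-precision values, the device address of element (i, j) can be computed as follows. Again, the helper is illustrative only and not part of the library.

C#

// Illustrative helper only: computes the device address of element (i, j)
// of a column-major matrix whose layout is given by the Values, Offset and
// LeadingDimension fields of its NativeStorage structure (element type: double).
static IntPtr GetMatrixElementAddress(IntPtr values, int offset, int leadingDimension, int i, int j)
{
    // All quantities are in elements, not bytes; consecutive columns
    // start leadingDimension elements apart.
    return values + (offset + i + j * leadingDimension) * sizeof(double);
}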

Once the device addresses of the data have been obtained, they can be passed to an external function. If this function modifies the values of an array, this should be signaled by invalidating the array's local data. Otherwise, an outdated local copy of the data may be used when retrieving the results. This can be done with a call to Invalidate(DistributedDataLocation).
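
The overall pattern looks roughly like the sketch below. The external routine, the distributed vector, and the DistributedDataLocation value passed to Invalidate are placeholders; only the requirement to invalidate the local copy after an external write is taken from the text above.

C#

// 'deviceVector' stands for a distributed vector whose data lives on the
// device, and 'ExternalCudaRoutine' for a P/Invoke declaration of an
// external CUDA function; both are placeholders for illustration.
var storage = deviceVector.NativeStorage;
IntPtr devicePointer = storage.Values + storage.Offset * sizeof(double);

// The external routine writes to the device buffer.
ExternalCudaRoutine(devicePointer, deviceVector.Length);

// Mark the local (host) copy as stale so that the results are fetched
// from the device the next time they are needed. The exact
// DistributedDataLocation member to pass is an assumption here.
deviceVector.Invalidate(DistributedDataLocation.Local);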

The CUDA provider has an overloaded Copy method that can copy from device to host, host to device, and device to device.

Copyright © 2004-2023 ExoAnalytics Inc. All rights reserved.
Send comments on this topic to support@extremeoptimization.com