Category: Parallel Programming

Basic thread synchronization in C++

This code is from Chapter 3 of C++ Concurrency in Action by Anthony Williams and can be downloaded directly from the book's companion site. I made some minor changes and compiled it in Visual Studio 2013. Making the code available on MSDN may help those who are learning multithreading in C++. Solution has […]

Histogram on GPU using CUDA

The following sample demonstrates how to compute a histogram on a GPU. The CUDA SDK has a histogram sample that works for 64 and 256 bins, with code that runs on both the GPU and the CPU. The SDK's CPU code uses bit manipulation, which should be replaced with SSE2 instructions. This article demonstrates 3 simple, entry-level examples showing the […]
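As a point of reference for the 256-bin case mentioned above, here is a minimal sequential C++ sketch of a byte histogram. It is not the SDK code and not the article's GPU code; the function name `histogram256` is a hypothetical helper chosen for illustration, and it assumes the input is a flat array of 8-bit values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the article): counts how often each
// 8-bit value occurs in `data`, using one bin per possible value.
std::array<std::uint32_t, 256> histogram256(const std::vector<std::uint8_t>& data) {
    std::array<std::uint32_t, 256> bins{};  // value-initialized to zero
    for (std::uint8_t value : data) {
        ++bins[value];                      // the byte value is the bin index
    }
    // Sanity check: every input element landed in exactly one bin.
    assert(std::accumulate(bins.begin(), bins.end(), std::size_t{0}) == data.size());
    return bins;
}
```

A GPU version typically has to deal with many threads incrementing the same bin at once, which is the part this sequential sketch does not show.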

Matrix Transpose on GPU using CUDA

The following sample demonstrates matrix transpose on a GPU. It starts with sequential code on the CPU and progresses towards more advanced optimizations: first a parallel transformation on the CPU, then several transformations on the GPU. In real life, it is impractical to do just a single matrix operation on the GPU due to the cost […]
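The sequential CPU starting point can be sketched roughly as follows. This is only an illustration, assuming a row-major matrix stored in a flat `std::vector<float>`; the function name `transpose` and its signature are hypothetical and not taken from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): returns the transpose of a
// rows x cols row-major matrix as a cols x rows row-major matrix.
std::vector<float> transpose(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```

Reads walk the input row by row while writes jump through the output with a stride of `rows`, which is the memory-access pattern that the later optimizations are usually aimed at.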

Red-eye Removal with CUDA

This code demonstrates the basic steps for red-eye correction in pictures. It requires a picture of someone with red eyes and a small template file, which is a picture of a red eye used to find the eyes in the original picture. While the sample works with the picture that I tested it with, it requires […]

NVIDIA GPU Architecture & CUDA Programming Environment

1. Introduction. The GPU was invented by NVIDIA in 1999. Originally, GPUs were purely fixed-function devices, meaning that they were designed specifically to process stages of the graphics pipeline such as vertex and pixel shaders, but they have evolved into increasingly flexible programmable processors. Modern GPUs are fully programmable manycore chips built around an array of […]

Sockets with Adaptive Communication Environment (ACE) Framework

The Adaptive Communication Environment, or ACE for short, is a C++ framework for developing portable, high-performance, networked, multithreaded applications. The references below cover the history of the framework in greater detail. ACE provides abstractions for sockets, demultiplexing loops, threads, and synchronization primitives, and can be used in high-performance, distributed, real-time, and embedded systems. ACE customers include Boeing and NASA, […]

Dining Philosophers in C++11

I have previously shown how to solve the problem using C#. I have to say that C++11 is officially awesome! Here’s my offering using that language. There are numerous solutions to the dining philosophers problem on the internet, and I will not be reciting the story. You can find it at one of the following two […]
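The excerpt does not show which strategy the post uses, so the following is only one common C++11 approach, sketched under assumptions: each chopstick is a `std::mutex`, and `std::lock` acquires both neighbours together so that no philosopher can hold one chopstick while waiting forever for the other. The function name `dine` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the post): seats `n` philosophers, lets each
// one eat `meals` times, and returns how many meals each philosopher finished.
std::vector<int> dine(int n, int meals) {
    assert(n >= 2 && meals >= 0);  // with one seat, left and right would be the same mutex
    std::vector<std::mutex> chopsticks(n);
    std::vector<int> eaten(n, 0);  // element i is written only by philosopher i
    std::vector<std::thread> philosophers;

    for (int i = 0; i < n; ++i) {
        philosophers.emplace_back([&, i] {
            std::mutex& left = chopsticks[i];
            std::mutex& right = chopsticks[(i + 1) % n];
            for (int k = 0; k < meals; ++k) {
                // std::lock takes both mutexes using a deadlock-avoidance algorithm.
                std::lock(left, right);
                std::lock_guard<std::mutex> hold_left(left, std::adopt_lock);
                std::lock_guard<std::mutex> hold_right(right, std::adopt_lock);
                ++eaten[i];  // "eat" while holding both chopsticks
            }
        });
    }
    for (auto& p : philosophers) {
        p.join();
    }
    return eaten;
}
```

Calling `dine(5, 100)` runs the classic five-seat table; the alternative of always locking the lower-numbered chopstick first would also avoid deadlock, at the cost of writing the ordering by hand.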

Dining Philosophers in C#

There are numerous solutions to the dining philosophers problem on the internet, and I will not be reciting the story. You can find it at one of the following two links: MSDN Magazine, Rosetta Code. A few notes. Each philosopher is marked with a circle and can pick a chopstick on the left and on […]

Producer Consumer inter-thread communication using condition variables in C++

In this example, we show inter-thread communication between producer and consumer threads using condition_variable. In the end, the two threads will ping-pong, passing control to each other. The code can be modified so that the producer reads and queues chunks of data from some source while the consumer thread processes data from the queue. For example, […]
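The ping-pong hand-off described above can be sketched as follows. This is a minimal illustration rather than the post's actual code: the function name `ping_pong`, the `producer_turn` flag, and the string log are all assumptions introduced here so the behaviour can be inspected.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not from the post): runs `rounds` exchanges between a
// producer and a consumer thread and returns the order in which they ran.
std::vector<std::string> ping_pong(int rounds) {
    assert(rounds >= 0);
    std::mutex m;
    std::condition_variable cv;
    bool producer_turn = true;      // guarded by m
    std::vector<std::string> log;   // guarded by m

    std::thread producer([&] {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            // The predicate form of wait() handles spurious wakeups.
            cv.wait(lock, [&] { return producer_turn; });
            log.push_back("ping");
            producer_turn = false;  // hand control to the consumer
            cv.notify_one();
        }
    });
    std::thread consumer([&] {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !producer_turn; });
            log.push_back("pong");
            producer_turn = true;   // hand control back to the producer
            cv.notify_one();
        }
    });

    producer.join();
    consumer.join();
    return log;
}
```

To move from ping-pong to a real pipeline, the boolean flag would typically be replaced by a queue guarded by the same mutex, with the consumer waiting until the queue is non-empty.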

HDR Tone Mapping with CUDA 5

Note that the code for the article is available here. In this example, for the sake of learning, we are going to butcher some great images. Let’s describe the problem first. We are going to take some HDR images and modify their luminosity to lighten them. We could also darken the images, or apply changes only […]