C programming – adding multi-threading

A simple API

I have not worked a lot with concurrency in the past, though in my C++ code I made a small multi-threaded job system, that I barely used.

This time I wanted to introduce it quite early in the code to maximize chances of using it when applicable. And just to be more fun, decided to start from scratch.

So as a note I should say, I have not studied a whole lot about this, so just I might be doing it all wrong!!! (But I’ll learn by making mistakes)

I defined the following requirements for the threading API, trying to keep it as simple as possible:

  • run code asynchronously
  • be able to dedicate a thread to one task only
  • use multiple threads to run tasks concurrently
  • a task should be able to run multiple times without having to be rescheduled

That way multi-threading in my code can be used three ways:

  • constantly run independent asynchronous tasks
  • OR use all processors to concurrently execute a list of tasks as fast as possible
  • OR combine the two above

Let’s try implementing

NOTE: for now, because I’m working solely on Windows internally the code uses win32 API like CreateThread & WaitForMultipleObjects

First I needed something to safely read/write states in different threads:

typedef struct {
	CriticalSection critical_section;
	uint            value;
} mt_uint;

void c_mt_uint_init(mt_uint *p, uint value);
uint c_mt_uint_get(mt_uint *p);
void c_mt_uint_set(mt_uint *p, uint value);
void c_mt_uint_release(mt_uint *p);

I then decided on the following:

  • a job is just a function pointer with a bit of data (512 might be too much, I’ll see later on)
  • an asynchronous job is a job plus a state that can be queried
#define JOB_DONE     0
#define JOB_CONTINUE 1

/* job function pointer
 * return JOB_DONE if job is finished 
 * return JOB_CONTINUE if job needs to run again */
typedef uint(*job_func)(void*); 

typedef struct {
	job_func function;
	uint8    data[512];
} Job;

typedef enum {
	ASYNC_JOB_IDLE,
	ASYNC_JOB_QUEUED,
	ASYNC_JOB_PROCESSING,
	ASYNC_JOB_FINISHED,
	ASYNC_JOB_STATE_COUNT
} AsyncJobState;

typedef struct {
	mt_uint state;
	Job     job;
} AsyncJob;

void c_parallel_async_job_init(AsyncJob *async);
void c_parallel_async_job_release(AsyncJob *async);

I needed structures for the two different multi-threading use-cases (asynchronous jobs & parallel execution)

  • asynchronous job queue:
    • self-expanding ring buffer of AsyncJob pointers
  • job batch for multi-threaded execution (waits for all jobs to be done)
    • pointer to Job array
    • thread-safe job assignment count
    • thread-safe job processing count

And a state value to keep track of

  • global execution (the ON/OFF for threads)
  • concurrent mode (for parallel execution with job batch)

The worker thread loop looks like this:

  • check global execution (if thread should stop)
  • check if concurrent mode is active (the job batch is active)
    • if yes, get next available job in batch (increment batch assign count), execute until JOB_DONE is returned (or global execution is stopped)
    • increment batch process count
  • else check for asynchronous job in queue
    • if job available, execute it until JOB_DONE is returned (or global execution is stopped)
  • otherwise sleep for 1 millisecond

Now the API:

void c_parallel_start(uint thread_count);
void c_parallel_stop();
void c_parallel_queue(AsyncJob *async_job);
void c_parallel_execute(Job *jobs, uint count);

c_parallel_start: create and start the desired number of worker threads, default value zero will create one thread per processor (minus one)

c_parallel_stop: set global execution to off, then wait for all threads to finish

c_parallel_queue:

  • add AsyncJob pointer to async job queue
  • job will get picked up by worker thread if concurrent mode is inactive (or no more jobs to get from job batch)

c_parallel_execute:

  • set-up job batch: jobs processed count is zero, jobs assignment count is initial job count
  • set concurrent mode to active
  • loop until job processed count is equal to initial job count:
    • not in the loop – what happens in worker threads:
      • try to get a job from batch
      • if available run until JOB_DONE is returned
      • else try to get AsyncJob from queue
    • in “main” thread (caller thread):
      • try to get job from batch
        • if available run until JOB_DONE is returned
        • else set concurrent mode to inactive & sleep until process count is equal to initial job count
  • set concurrent mode to inactive

With this I should be able to meet all my requirements.

To dedicate one thread, all I have to do is create an asynchronous job that always returns JOB_CONTINUE, so the worker thread is not used for batch jobs. If I want to free the thread, I can have that job return JOB_DONE, and reschedule it later on if needed.

Missing features / possible improvements (probably forgetting a lot)

  • pausing threads
  • execute jobs in a queue only for a designated time (like if some processing time is left after rendering a frame)

Cranking up the speed (hopefully…)

I’m now gonna try to change the doc generator tool to make use of these new cool toys.

At the very least, I will able to parallelize reading the C files, and then make sure the HTML generation is happening concurrently.

I’m fully expecting to be hitting some initial issues where the multi-threaded code is slower than the sequential, but it’s a good use case to improve the API and get rid of bugs.

Stay tuned!