
A simple API
I have not worked a lot with concurrency in the past, though in my C++ code I made a small multi-threaded job system, that I barely used.
This time I wanted to introduce it quite early in the code to maximize chances of using it when applicable. And just to be more fun, decided to start from scratch.
So as a note I should say, I have not studied a whole lot about this, so just I might be doing it all wrong!!! (But I’ll learn by making mistakes)
I defined the following requirements for the threading API, trying to keep it as simple as possible:
- run code asynchronously
- be able to dedicate a thread to one task only
- use multiple threads to run tasks concurrently
- a task should be able to run multiple times without having to be rescheduled
That way multi-threading in my code can be used three ways:
- constantly run independent asynchronous tasks
- OR use all processors to concurrently execute a list of tasks as fast as possible
- OR combine the two above
Let’s try implementing
NOTE: for now, because I’m working solely on Windows internally the code uses win32 API like CreateThread & WaitForMultipleObjects
First I needed something to safely read/write states in different threads:
typedef struct {
CriticalSection critical_section;
uint value;
} mt_uint;
void c_mt_uint_init(mt_uint *p, uint value);
uint c_mt_uint_get(mt_uint *p);
void c_mt_uint_set(mt_uint *p, uint value);
void c_mt_uint_release(mt_uint *p);
I then decided on the following:
- a job is just a function pointer with a bit of data (512 might be too much, I’ll see later on)
- an asynchronous job is a job plus a state that can be queried
#define JOB_DONE 0
#define JOB_CONTINUE 1
/* job function pointer
* return JOB_DONE if job is finished
* return JOB_CONTINUE if job needs to run again */
typedef uint(*job_func)(void*);
typedef struct {
job_func function;
uint8 data[512];
} Job;
typedef enum {
ASYNC_JOB_IDLE,
ASYNC_JOB_QUEUED,
ASYNC_JOB_PROCESSING,
ASYNC_JOB_FINISHED,
ASYNC_JOB_STATE_COUNT
} AsyncJobState;
typedef struct {
mt_uint state;
Job job;
} AsyncJob;
void c_parallel_async_job_init(AsyncJob *async);
void c_parallel_async_job_release(AsyncJob *async);
I needed structures for the two different multi-threading use-cases (asynchronous jobs & parallel execution)
- asynchronous job queue:
- self-expanding ring buffer of AsyncJob pointers
- job batch for multi-threaded execution (waits for all jobs to be done)
- pointer to Job array
- thread-safe job assignment count
- thread-safe job processing count
And a state value to keep track of
- global execution (the ON/OFF for threads)
- concurrent mode (for parallel execution with job batch)
The worker thread loop looks like this:
- check global execution (if thread should stop)
- check if concurrent mode is active (the job batch is active)
- if yes, get next available job in batch (increment batch assign count), execute until JOB_DONE is returned (or global execution is stopped)
- increment batch process count
- else check for asynchronous job in queue
- if job available, execute it until JOB_DONE is returned (or global execution is stopped)
- otherwise sleep for 1 millisecond
Now the API:
void c_parallel_start(uint thread_count);
void c_parallel_stop();
void c_parallel_queue(AsyncJob *async_job);
void c_parallel_execute(Job *jobs, uint count);
c_parallel_start: create and start the desired number of worker threads, default value zero will create one thread per processor (minus one)
c_parallel_stop: set global execution to off, then wait for all threads to finish
c_parallel_queue:
- add AsyncJob pointer to async job queue
- job will get picked up by worker thread if concurrent mode is inactive (or no more jobs to get from job batch)
c_parallel_execute:
- set-up job batch: jobs processed count is zero, jobs assignment count is initial job count
- set concurrent mode to active
- loop until job processed count is equal to initial job count:
- not in the loop – what happens in worker threads:
- try to get a job from batch
- if available run until JOB_DONE is returned
- else try to get AsyncJob from queue
- in “main” thread (caller thread):
- try to get job from batch
- if available run until JOB_DONE is returned
- else set concurrent mode to inactive & sleep until process count is equal to initial job count
- try to get job from batch
- not in the loop – what happens in worker threads:
- set concurrent mode to inactive
With this I should be able to meet all my requirements.
To dedicate one thread, all I have to do is create an asynchronous job that always returns JOB_CONTINUE, so the worker thread is not used for batch jobs. If I want to free the thread, I can have that job return JOB_DONE, and reschedule it later on if needed.
Missing features / possible improvements (probably forgetting a lot)
- pausing threads
- execute jobs in a queue only for a designated time (like if some processing time is left after rendering a frame)
Cranking up the speed (hopefully…)
I’m now gonna try to change the doc generator tool to make use of these new cool toys.
At the very least, I will able to parallelize reading the C files, and then make sure the HTML generation is happening concurrently.
I’m fully expecting to be hitting some initial issues where the multi-threaded code is slower than the sequential, but it’s a good use case to improve the API and get rid of bugs.
Stay tuned!