Authoring C Content
This article is meant for kata authors and translators who would like to create new content in C. It attempts to explain how to create and organize things in a way conforming to authoring guidelines, shows the most common pitfalls and how to avoid them.
This article is not a standalone tutorial on creating kata or translations. It's meant to be a complementary, C-specific part of a more general set of HOWTOs and guidelines related to content authoring. If you are going to create a C translation, or a new C kata from scratch, please make yourself familiar with the aforementioned documents related to authoring in general first.
#
General infoAny technical information related to the C setup on Codewars can be found on the C reference page (language versions, available libraries, and setup of the code runner).
#
DescriptionC code blocks can be inserted with C-specific part in sequential code blocks:
```c
...your code here...
```
C-specific paragraphs can be inserted with language conditional rendering:
~~~if:c
...text visible only for C description...
~~~
~~~if-not:c
...text not visible in C description...
~~~
#
Tasks and RequirementsSome concepts don't always translate well to or from C. Because of this, some constructs should be avoided and some translations just shouldn't be done. Some high-level languages, like Python or JavaScript, reside on the exact opposite end of the spectrum than C, and translating kata between them and C can result in significant differences in difficulty, requirements, and design of the solution.
- C is much lower-level than many other popular languages available on Codewars. For this reason, many kata, even if their task can be translated to C directly, can turn out much harder in C than in the original language. There are many kata that were originally created as very easy and beginner-friendly (for example 8 kyu). But after translating into C, and adding aspects like memory management, or two dimensional C arrays, etc. they are not so easy anymore, and newbies complain that kata ranked 8 kyu is too difficult for them while it should be an entry-level task.
- C is statically typed, so any task that depends on dynamic typing can be difficult to translate into C, and attempts of forcing a C kata to reflect a dynamically typed interface can lead to ideas that enforce a really bad design.
- There are just a few additional libraries available for the C runner, so almost everything has to be implemented manually by the author or the user. Kata that take advantage of additional packages installed for other languages become much more difficult in C.
#
Coding#
Code styleUnlike for example Python or Java, there's no single guide for C code style, or even a set of naming conventions, which would be widely adopted and agreed upon by C programmers. Traditional naming conventions are using snake_case
, Win32 API naming conventions are using PascalCase
, there are GNU guidelines, Microsoft guidelines, Google guidelines, and some of them contradict each other. Just use whatever set of guidelines you like, but when you do, use it consistently.
#
Header filesNot as much of a problem for C as it is for C++, but still, C authors often forget to include required header files, or just leave them out deliberately because "it works" even when some of the files are not included. It happens mostly due to the following reasons:
- The compiler provides an implicit declaration of a function, when it's encountered in the code and was not declared. However, this behavior is not standard and is now deprecated. You need to explicitly include header files for library functions you use or declare them in some other way.
- Some header files include other header files indirectly, for example, file
foo.h
contains line#include <bar.h>
, which might appear to make the explicit include forbar.h
unnecessary. It's not true though, because the filefoo.h
might change one day, or might depend on some compiler settings or command line options, and after some changes to the configuration of the C runner, thebar.h
might be not included there anymore. That's why every file (i.e. code snippet) of a kata should explicitly include all required header files declaring functions used in it. - The author might think that header files for the testing framework are included automatically by the code runner. That is not the case though, and test suites need to include
criterion/criterion.h
explicitly.
#
Compilation warningsCompiler options related to warnings used by the C runner are somewhat strict and require some discipline to get the code to compile cleanly. -Wall
and -Wextra
may cause numerous warnings and some of them are very pedantic. However, code of C kata should still compile cleanly, without any warnings logged to the console. Even when a warning does not cause any problem with tests, users get distracted by them and blame them for failing tests.
#
Working with pointers and memory managementUnlike many modern, high-level languages, C does not manage memory automatically. Manual memory management is a very vast and complex topic, with many possible ways of achieving the goal depending on a specific case, caveats, and pitfalls.
Data hidden behind pointers can be arranged in many possible ways. Whenever a kata passes in a pointer to the user's solution or requires it to return or manipulate a pointer or data referenced by a pointer, it should explicitly and clearly provide all information necessary to carry out the operation correctly. The information can be put in one or more of the following places:
- The code itself. Specifying that a pointer points to
const
data can serve as a hint that it has not been allocated dynamically and won't be freed. Size hints for array parameters can help understanding how arrays are organized, etc. Correctly specified types can be very helpful, but not always sufficient. - Language-specific paragraph in the kata description.
- As a comment in the "Solution setup" snippet.
- When necessary, sample tests should present an example of how data is composed, passed to the user solution, fetched from it, worked on, and cleaned up afterwards.
When the structure, layout, or allocation scheme of pointed data is not described, users cannot know how to implement requirements without causing either a crash or a memory leak. Authors can choose the ownership strategy their kata should use, and the memory can be managed either by the test suite, by the user, or both. However, they should be aware of the advantages and disadvantages of each such strategy, and when and which applies the best.
Possible ways of handling memory management are described in the Memory Management in C kata article.
One of the consequences of unmanaged memory is that it's strongly recommended against returning string constants from C functions, especially when translating kata from other languages. Returning a string in other languages is not a problem, but in C it always raises questions of who should allocate it and how it should be allocated. Consider replacing the string with some simpler data type (eventually aliased with a typedef
), and/or provide some symbolic constants for available values. For example, if the requirement for the JavaScript version is: "Return the string 'BLACK' if a black pawn will be captured first, 'WHITE' if a white one, and 'NONE' if all pawns are safe.", C version should preferably provide and use the named constants BLACK
, WHITE
and NONE
. If the author decides to keep raw C-strings as elements of the kata interface, they should clearly specify the required allocation scheme.
#
Tests#
Testing frameworkC kata use the Criterion testing framework to implement and execute tests. Read its reference page to find out how to structure tests into groups and test cases, what assertions are available, etc.
Criterion supports many features that can be very helpful, but (unfortunately) are not commonly used by C authors. It allows for parameterized tests, setting up additional data, test fixture setup, teardown, custom descriptions, etc.
#
Test feedbackYou should notice that the report hooks used by the Codewars test runner produce one test output entry per assertion, so the test output panel can get very noisy.
#
Random utilitiesUnlike some other languages, C does not provide too many means of generating random numbers which could be used to build random tests. stdlib.h
header provides the rand
function which, while being quite simple, satisfies the majority of needs, but sometimes can be tricky to be used correctly.
Before rand
is called for the first time, it must be seeded with srand
. A call to srand
should be performed only once, in the setup phase of the random tests. srand
usually uses the current time as a seed, so authors need to include time.h
before using the time
function.
rand
can return integers only up to RAND_MAX
. There's no standard-compliant way to generate random values of types unsigned int
, long
, or double
. Authors who would like to generate random values out of the domain of rand
have to craft them manually. (TODO: create article with snippets with RNGs for types other than int
)
Additionally, the value of RAND_MAX
might differ on different platforms, or even change. For the current Codewars setup it's 2^31-1
, but there are some common platforms with RAND_MAX
being as small as 2^15-1
. This makes the code using rand
even less portable, and while portability might not be a big concern for Codewars kata, it could turn out to be an issue for users trying to reproduce random tests locally.
An alternative to rand
could be using random devices, like /dev/urandom
. This way of generating random numbers could partially alleviate the issue of the rand
being capped at RAND_MAX
, but also could inflate the amount of the boilerplate code and could cause additional problems with portability.
#
Reference solutionIf the test suite happens to use a reference solution to calculate expected values (which should be avoided when possible), or some kind of reference data like precalculated arrays, etc., it must not be possible for the user to call it, redefine, overwrite or directly access its contents. To prevent this, it should be defined as static
in the tests implementation file.
The reference solution or data must not be defined in the Preloaded code.
#
Input mutationGeneral guidelines for submission tests contain a section related to input mutation and how to prevent users from abusing it to work around kata requirements. Since C does not have reference semantics, it might appear that C kata are not affected by this problem, but it's not completely true. While data is passed to the user solution by value, it indeed cannot be easily modified by the user solution. However, when data is passed indirectly, by a pointer, or as an array, it can be modified even when it's marked as const
. Constness of a function argument can be forcefully cast away by a user and then they would be able to modify values passed as const T*
or as elements of const T[]
. It's usually not a problem in "real world" C programming, but on Codewars, users can take advantage of vulnerable test suites and modify their behavior this way. After calling a user solution, tests should not rely on the state of such values and they should consider them as potentially modified by a user.
#
Calling assertionsCriterion provides a set of useful assertions, but when used incorrectly, they can cause a series of problems:
- Stacktraces of a crashing user solution can reveal details that should not be visible,
- Use of an assertion not suitable for the given case may lead to incorrect test results,
- Incorrectly used assertions may produce confusing or unhelpful messages.
To avoid the above problems, calls to assertion functions should respect the following rules:
- The expected value should be calculated before invoking an assertion. The
expected
parameter passed to the assertion should not be a function call expression, but a value calculated directly beforehand. - Appropriate assertion functions should be used for each given test.
cr_assert_eq
is not suitable in all situations. Usecr_assert_float_eq
for floating point comparisons,cr_assert
for tests on boolean values,cr_assert_str_*
to test strings andcr_assert_arr_*
to test arrays. - Some additional attention should be paid to the order of parameters passed to assertion macros. It differs between various assertion libraries, and it happens to be quite often confused by authors, mixing up
actual
andexpected
in assertion messages. For the C testing framework, the order is(actual, expected)
. - To avoid unexpected crashes in tests, it's recommended to perform some additional assertions before assuming that the answer returned by the user solution has some particular form, or size. For example, if the solution returns a pointer (possibly pointing to an array), an explicit assertion should be added to check whether the returned pointer is valid, and not, for example,
NULL
; the size of the returned array, potentially reported by an output parameter, should be verified before accessing an element which could turn out to be located outside of its bounds. - Default messages produced by assertion macros are confusing, so authors should provide custom messages for failed assertions.
#
TestabilityIn C, not everything can be easily tested. It's not possible to reliably verify the size or bounds of a returned buffer, or the validity of a returned pointer. It's difficult to test for conditions which result in a crash or undefined behavior. It cannot be reliably verified whether there's no memory leaks and if all allocated memory were correctly released. Sometimes the only way is to skip some checks or crash the tests.
#
PreloadedAs C is a quite low-level language, it often requires some boilerplate code to implement non-trivial tests, checks, and assertions. It can be tempting to put some code that would be common to sample tests and submission tests in the Preloaded snippet, but this approach sometimes proves to be problematic (see here why), and can cause some headaches for users who are interested in training on the kata locally, or checking how the user solution is called, etc. It's strongly discouraged to use preloaded code to make the code common for test snippets if it would hide some information or implementation details interesting to the user.
#
Example test suiteBelow you can find an example test suite that covers most of the common scenarios mentioned in this article. Note that it does not present all possible techniques, so actual test suites can use a different structure, as long as they keep to established conventions and do not violate authoring guidelines.
//include headers for Criterion
#include <criterion/criterion.h>
//include all required headers
#include <math.h>
#include <time.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
//redeclare the user solution
void square_every_item(double items[], size_t size);
//reference solution defined as static
static void square_every_item_ref(double items[], size_t size)
{
for(size_t i = 0; i<size; ++i)
{
items[i] *= items[i];
}
}
//custom comparer for floating-point values
static int cmp_double_fuzzy_equal(double* a, double* b) {
double diff = *a - *b;
return fabs(diff) < 1e-10 ? 0 : diff;
}
//random test case generator
static void fill_random_array(double array[], size_t size) {
for(size_t i=0; i<size; ++i) {
//use rand to generate doubles
array[i] = (double)rand() / RAND_MAX * 100;
}
}
//helper function
static size_t get_mismatch_position(double actual[], double reference[], size_t size) {
for(size_t i=0; i<size; ++i) {
if(cmp_double_fuzzy_equal(actual+i, reference+i))
return i;
}
return SIZE_MAX;
}
//setup function, called by test suite setup macro below
void setup_random_tests(void) {
srand(time(NULL));
}
//a test case of fixed_tests suite for primary scenario
Test(fixed_tests, example_array) {
double items[5] = { 0.0, 1.1, 2.2, 3.3, 4.4 };
double expected[5] = { 0.0, 1.21, 4.84, 10.89, 19.36 };
square_every_item(items, 5);
//assertion macro suitable for arrays of doubles
cr_assert_arr_eq_cmp(items, expected, 5, cmp_double_fuzzy_equal);
}
//a test case of fixed_tests suite for potential edge case
Test(fixed_tests, empty_array) {
const double dummy = 42.5;
double items[1] = { dummy };
square_every_item(items, 0);
cr_assert_eq(items[0], dummy, "Empty array should not be tampered with.");
}
//setup of the test suite, if necessary
TestSuite(random_tests, .init=setup_random_tests);
//a set of small random tests, with verbose debugging messages
Test(random_tests, small_arrays) {
double input[10];
double actual[10];
double reference[10];
for(int i=0; i<10; ++i) {
//generate test case
size_t array_size = rand() % 10 + 1;
fill_random_array(input, array_size);
//kata requires the input to be mutated, so tests need to copy it, because
//input array is used after calling user and reference solution
memcpy(reference, input, sizeof(double) * array_size);
square_every_item_ref(reference, array_size);
//copy is made from original input, and not from an array fed to
//the reference solution
memcpy(actual, input, sizeof(double) * array_size);
square_every_item(actual, array_size);
//assertion uses custom message to avoid confusing test output
//it also uses data from original, non-mutated input array
size_t invalid_position = get_mismatch_position(actual, reference, array_size);
cr_assert_arr_eq_cmp(actual, reference, array_size, cmp_double_fuzzy_equal,
"Invalid answer at position %zu for input value %f, expected %f but got %f",
invalid_position,
input[invalid_position],
reference[invalid_position],
actual[invalid_position]);
}
}
//a set of large random tests, with not so detailed debugging messages
Test(random_tests, large_arrays) {
double array[1000]; //small enough to be allocated on the stack,
double reference[1000]; //but you can use dynamic memory if necessary.
for(int i=0; i<10; ++i) {
//generate test cases
size_t array_size = rand() % 200 + 801;
fill_random_array(array, array_size);
//since original array is no used after tests, it's enough to create only one copy
memcpy(reference, array, sizeof(double) * array_size);
square_every_item_ref(reference, array_size);
square_every_item(array, array_size);
//assertion uses custom message
cr_assert_arr_eq_cmp(array, reference, array_size, cmp_double_fuzzy_equal, "Invalid answer for arrays of size %zu", array_size);
}
}