CSE320/reference_doc/CSE320_ReferenceDoc.md
2022-04-16 11:33:13 -04:00


CSE 320 Reference

NOTE: This document has traditionally been provided (in PDF form) at the beginning of the course; however, it was written in the ancient past and the source was no longer available. This version (in Markdown) has been reverse-engineered from the PDF source, so that it can be updated in the future. The reverse engineering turned up some errors in the original document, and it likely introduced new errors. But now the errors can be corrected if somebody reports them 😃.

Using the Terminal

Great resources for understanding and working with the command line:

http://www.ibm.com/developerworks/library/l-lpic1-103-1/

https://learnpythonthehardway.org/book/appendixa.html

GCC

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
	printf("Hello World!\n");
	return EXIT_SUCCESS;
}

Lines 1 and 2

Lines 1 and 2 are C preprocessor directives that include function prototypes for some of the functions in the C standard library (aka libc). For now you can loosely relate these to the import statements you might find at the top of a Java file.

import java.util.Scanner;

The C preprocessor is a very powerful tool and you will learn about it in future assignments. For now, just accept this basic explanation of what these two lines do. The #include directive takes the contents of the .h file and copies it into the .c file before the C compiler actually translates the C code.

:nerd: Files that end in .h are called header files. They typically contain preprocessor macros, function prototypes, struct information, and typedefs.

Line 4

Line 4 is how you describe the main() function of a C program. In C, if you are creating an executable program it must have one and ONLY one main function. It should also be as isolated as possible: if you can (and for this class you always should), put main() in its own .c file. Any main function you write in this course MUST return an integer value (in older textbooks/documentation they might return void; watch out).

This is sort of similar to the main() declaration in Java. In Java, arrays, since they are objects, have various different attributes (e.g. length). C is not an object oriented language and hence arrays contain no such information (arrays in C are very similar to arrays in MIPS). To remedy this issue two arguments are passed: argc, which contains how many elements are in the array and argv, which is an array of strings which contains each of the arguments passed on the command line. Even if no arguments are passed by the user, argv will contain at least one argument which is the name of the binary being executed.

:nerd: If you look through other C programs, you might see that there are quite a few different ways to declare main. In this course you may declare main just as it is in the helloworld example unless specified otherwise in the homework assignment.

😱 It is crucial that there exists exactly one main() function in your whole program. C is not like Java, where you can have a different main in every file and then choose which main you want to run. If you have more than one main, compilation will fail with an error. For example, assume you had two files main1.c and main2.c and you tried to compile them both into one program (a reasonable thing to do). If both main1.c and main2.c define a main function, you get the following linker error when you try to compile:

/tmp/cc8eYGEA.o: In function main:
main2.c:(.text+0x0): multiple definition of main
/tmp/ccaaqneq.o:main1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

This error means that the main function is defined twice within your program. This concept extends to all functions. Two functions CANNOT have the same name under normal conditions. In addition, function overloading is not allowed in C. Example: Assume you had the file func.c with the following function declarations.

void func(int a);
void func(int a, int b);

This will result in the following error

func.c:5:6:error: conflicting types for func
void func(int a, int b) {
     ^
func.c:1:6: note: previous definition of func was here
void func(int a) {

Line 5

Line 5 is how this program is printing out its values to standard output (stdout). The printf function can be compared to the System.out.printf() function in Java. This function accepts a char* argument known as the format string (assume for now char* is equivalent to the Java String type). This will work fine for when you know ahead of time what you want to print, but what if you want to print a variable?

If you assume C is like Java, you may try to concatenate strings in the following form:

int i = 5;
printf("The value of i is " + i + "\n");

If you try to compile this code, GCC may give you some of the following cryptic error messages:

error: invalid operands to binary + (have char * and char *)

or

warning: format not a string literal and no format arguments [-Wformat-security]

Unfortunately, C does not have string concatenation via the + operator. However, the printf() function takes a variable number of arguments after the format string. To print a variable, you place one of the many available conversion specifiers (a % sign followed by one or more characters) in the format string and pass the value as an additional argument. For example, the %d specifier prints an int in decimal.

:nerd: You can view a list of all printf formats here. Alternatively you can use the command man 3 printf in your terminal to view the documentation for printf as well. This is an example of a man page (manual page). Man pages are how most of the library functions in C are documented. You are highly encouraged to utilize them as they are extremely useful and highly beneficial. Man pages are also available online.

The printf function always prints to the file stream known as stdout (standard output). There are three standard streams that are usually available to each program, namely: stdin (standard input), stdout, and stderr (standard error). Prior to *nix, computer programs needed to specify and be connected to a particular I/O device such as magnetic tapes. This made portability nearly impossible. Later in the course we will delve deeper into “files” and how they represent abstract devices in Unix-like operating systems. For now, understand that they work much like your typical .txt file: they can be written to and read from.

Line 6

Line 6 is the end of the main function. The value returned in main is the value that represents the return code of the program. In *nix when a program exits successfully, the value returned is usually zero. When it has some sort of an error, the value is usually a non-zero number. Since these values are defined by programmers and they may be different depending on the system you are using, it is usually best to use the constants EXIT_SUCCESS and EXIT_FAILURE which are defined in stdlib.h for simple cases as they will represent the respective exit codes for each system.

The term *nix is used for describing operating systems that are derived from the Unix operating system (ex. BSD, Solaris) or clones of it (ex. Linux).

Compiling C Code

Begin compiling the following program:

#include<stdio.h>
#include<stdlib.h>

int main(int argc, char* argv[]) {
	printf("Hello World!\n");
	return EXIT_SUCCESS;
}

Navigate on the command line to where the .c file is located. If the file was called helloworld.c, type the following command to compile the program.

$ gcc helloworld.c

The $ is the command-line prompt. Your prompt may differ.

If no messages print, that means there were no errors and the executable was produced. To double check that your program produced a binary you can type the ls command to list all items in the directory.

$ ls
a.out helloworld.c
$

The file a.out is your executable program. To run this program, put a ./ in front of the binary name.

$ ./a.out
Hello World!
$

The ./ has a special meaning. The . translates to the path of the current directory. So if your file was in the cse320 directory on the user's desktop, typing ./a.out would really translate to the path /home/user/Desktop/cse320/a.out.

Compilation Flags

Modify the helloworld program to sum up the values from 0 to 5.

#include<stdio.h>
#include<stdlib.h>

int main(int argc, char *argv[]) {
	int i, sum;
	for(i = 0; i < 6; i++) {
		sum += i;
	}
	printf("The sum of all integers from 0-5 is: %d\n", sum);
	return EXIT_SUCCESS;
}

Compile and run this program.

$ gcc helloworld2.c
$ ./a.out

The sum of all integers from 0-5 is: 15
$

This program compiled with no errors and even produced the correct result. However, there is a subtle but hazardous bug in this code. The developers of the gcc C compiler have built in some functionality (enabled by flags) to help programmers find such bugs.

Add the flags -Wall and -Werror to the gcc command when compiling, like so:

$ gcc -Wall -Werror helloworld2.c
helloworld2.c:7:3: error: variable 'sum' is uninitialized when used here
        [-Werror,-Wuninitialized]
    sum += i;
    ^~~
helloworld2.c:5:12: note: initialize the variable 'sum' to silence this warning
    int i, sum;
              ^
               = 0
1 error generated.
$

Depending on your compiler (gcc, clang, etc.) the above error and message may differ. Recent versions of gcc only produce an error when optimization (-O1, -O2, or -O3) is enabled.

The flag -Wall enables warnings for all constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros.

The flag -Werror converts all warnings to errors. Source code which triggers warnings will be rejected.

This error means that the variable sum was used without being initialized. Why does this matter? The C language does not actually specify how the compiler should treat uninitialized variables. Implementations of the C compiler may zero them out for you, but really there is no specification of how this situation should be handled. This can lead to undefined behavior and cause the program to work one way on one system and differently on other systems. To fix this error, simply initialize the variable sum to the value desired (0).

#include<stdio.h>
#include<stdlib.h>

int main(int argc, char *argv[]) {
	int i, sum = 0;
	for(i = 0; i < 6; i++) {
		sum += i;
	}
	printf("The sum of all integers from 0-5 is: %d\n", sum);
	return EXIT_SUCCESS;
}

Compile the program again and you should no longer see any errors.

$ gcc -Wall -Werror helloworld2.c
$ ./a.out
The sum of all integers from 0-5 is: 15
$

😱 In this class, you MUST ALWAYS compile your assignments with the flags -Wall -Werror. This will help you locate mistakes in your program, and the grader will compile your assignment with these flags as well. Consider this your warning: -Wall -Werror are necessary. Do not work through your assignment without these flags and then attempt to fix the errors they highlight at the last minute.

GNU Make and Makefiles

As you program more in C, you will continue to add more flags and more files to your programs. Typing these commands over and over again will eventually become an error-laden chore. Also, as you add more files, rebuilding every file every time, even if it didn't change, will make compilation take a long time. To help alleviate this issue, build tools were created. One such tool is GNU Make (you will be required to use Make in this class). Make itself has lots of options and features that can be configured. While mastering Make is not required for this class, you will probably want to learn how to make simple changes to what we supply.

Refer here for a great Makefile tutorial and information resource. You will always be provided with a working makefile, this is provided for extended learning.

http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/

Header Files

There are some coding practices that you should become familiar with in C from the beginning. The C compiler reads through your code once and only once. This means all functions and variables you use must be declared before they are used, or the compiler will not know how to compile the code and will exit with errors. This is why we have header files: we declare all of our function prototypes in a .h file and #include it in our .c file. This way we can write the bodies of our functions in any order and call them in any order we please.

A header file is just a file which ends in the .h extension. Typically you declare function prototypes, define struct and union types, #include other header files, #define constants and macros, and typedef. Some header files also expose global variables, but this is strongly discouraged as it can cause compilation errors.

When you declare function prototypes in a .h file, you can then define the body of the function inside of any .c file. Typically, though, if the header file was called example.h, we would define the functions in example.c. If we were producing a massive library like libc, we might instead declare all the function prototypes in a single header file but put each function definition in its own file. It's all a matter of preference, but these are two common practices. You should never define function bodies in the header, though; this will just cause you issues later.

There are two ways to specify where the include directive looks for header files. If you use <>, when the preprocessor encounters the include statement it will look for the file in a predefined location on your system (usually /usr/include). If you use "", the preprocessor will look in the current directory of the file being processed. Typically system and library headers are included using <>, and custom headers that you have made for your program are included using "".

Header file example

example.h:

#include <stdio.h>
#include <stdlib.h>

#define TRUE 1
#define FALSE 0

struct student {
	char *first_name;
	char *last_name;
	int age;
	float gpa;
};

int foo(int a, int b);
void bar(void);

example.c:

#include "example.h"

int main(int argc, char *argv[]){
	bar();
	return EXIT_SUCCESS;
}

void bar(void){
	printf("foo: %d", foo(2, 3));
}

int foo(int a, int b) {
	return a * b;
}

Header Guard a.k.a Include Guard

While using header files solves one issue, they create issues of their own. What if multiple files include the same header file? What if header file A includes header file B, and header file B includes header file A? If we keep including the same header file multiple times, this will make our source files larger than needed and slow down the compilation process. It may also cause errors if there are variables declared in the code. If two files keep including each other how does the compiler know when to stop? To prevent such errors one must utilize header guards. The header guard is used to prevent double and cyclic inclusion of a header file.

Header Guard example

In grandparent.h:

struct foo {
    int member;
};

In parent.h:

#include "grandparent.h"

In child.h:

#include "grandparent.h"
#include "parent.h"

The preprocessor will produce output for child.h that contains two literal copies of the foo definition (one included directly and one through parent.h), and the compiler will then report an error because struct foo is defined twice. The fix:

In grandparent.h:

#ifndef GRANDPARENT_H
#define GRANDPARENT_H
struct foo {
    int member;
};
#endif

In parent.h:

#include "grandparent.h"

In child.h:

#include "grandparent.h"
#include "parent.h"

#ifndef, #define, and #endif are preprocessor directives that prevent the double inclusion. When child.h pulls in grandparent.h a second time (through parent.h), the guard macro is already defined, so the #ifndef test fails and the second definition of foo is never included. Read here for more information.

You should always use header files and guards in your assignments. Newer compilers now support what is known as #pragma once. This directive performs the same operation as the header guard, but it may not be a cross platform solution when considering older machines.

Directory Structure

To help with a clear and consistent structure to your programs, you can use the following directory structure. This is a common directory structure for projects in C.

.
├── Makefile
├── include
│   ├── debug.h
│   └── func.h
└── src
    ├── main.c
    └── func.c

😱 You will be REQUIRED to follow this structure for ALL the homework assignments for this class. Failure to do so will result in a ZERO.

Datatype Sizes

Depending on the system and the underlying architecture, which can have different word sizes etc., datatypes can have various different sizes. In a language like Java, many of these issues are hidden from the programmer. The JVM creates another layer of abstraction which allows the programmer to believe all datatypes are the same size no matter the underlying architecture. C, on the other hand, does not have this luxury. The programmer has to consider everything about the system being worked on. To make programs cross-platform, code and logic need to be tested, comparing results and output, and altered accordingly.

C lacks the ability to add new datatypes to its specification. Instead, it works with models known as LP64, ILP64, LLP64, ILP32, and LP32. The I stands for INT, the L stands for LONG and the P stands for POINTER. The number after the letters gives the size, in bits, of the types named by those letters.

The typical sizes of these models are described below in the following table (in bits):

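| Data model | short | int | long | long long | pointer |
|------------|-------|-----|------|-----------|---------|
| LP32       | 16    | 16  | 32   | 64        | 32      |
| ILP32      | 16    | 32  | 32   | 64        | 32      |
| LLP64      | 16    | 32  | 32   | 64        | 64      |
| LP64       | 16    | 32  | 64   | 64        | 64      |
| ILP64      | 16    | 64  | 64   | 64        | 64      |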

Notice that the size of an integer on one machine could be different from that on another machine depending on which model the machine runs. To prove this to yourself, use the special operator in the C language known as sizeof. The operator sizeof will tell you the size of a specific datatype in bytes. As an exercise, you should create the following program and run it in your development environment and on a system with a different underlying architecture (such as 'Sparky') and compare the results.

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
	/* Basic data types */
	/* sizeof yields a size_t, which is printed with %zu */
	printf("=== Basic Data Types ===\n");
	printf("short: %zu bytes\n", sizeof(short));
	printf("int: %zu bytes\n", sizeof(int));
	printf("long: %zu bytes\n", sizeof(long));
	printf("long long: %zu bytes\n", sizeof(long long));
	printf("char: %zu byte(s)\n", sizeof(char));
	printf("double: %zu bytes\n", sizeof(double));
	/* Pointers */
	printf("=== Pointers ===\n");
	printf("char*: %zu bytes\n", sizeof(char*));
	printf("int*: %zu bytes\n", sizeof(int*));
	printf("long*: %zu bytes\n", sizeof(long*));
	printf("void*: %zu bytes\n", sizeof(void*));
	printf("double*: %zu bytes\n", sizeof(double*));
	/* Special value - This may have undefined results... why? */
	printf("=== Special Data Types ===\n");
	printf("void: %zu byte(s)\n", sizeof(void));
	return EXIT_SUCCESS;
}

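To further illustrate why this is a problem, consider the following program. Where long is 64 bits wide it prints 8589934592, but where long is only 32 bits wide the value does not fit, and strtol returns LONG_MAX instead.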

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
	// 0x200000000 -> 8589934592 in decimal
	long value = strtol("200000000", NULL, 16);
	printf("value: %ld\n", value);
	return EXIT_SUCCESS;
}

In libc, there exists a header stdint.h which defines special types (such as int32_t and int64_t) that are guaranteed to have the same size no matter what system you are on.

Endianness

When dealing with multi byte values and different architectures, the endianness of each architecture should also be taken into account. There are many ways to detect what endianness your machine is, for example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
	unsigned int i = 1;
	char *c = (char*)&i; // Point at the byte of i stored at the lowest address
	if(*c) {
		printf("little endian\n");
	} else {
		printf("big endian\n");
	}
	return EXIT_SUCCESS;
}

Can you think of why this works? Could you explain it if asked on an exam?

Assembly

During the compilation process, a C program is translated to an assembly source file. This matters because the exact same C implementation can perform well on one system and terribly on another; in that case, the programmer has to inspect the assembly code for more information.

Example:

// asm.c
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(int argc, char *argv[]) {
	char buffer[1024];
	// Get user input
	fgets(buffer, 1024, stdin);
	int64_t value = strtoll(buffer, NULL, 10);
	printf("You entered %" PRId64 "\n", value);
	return EXIT_SUCCESS;
}

Test the program with 32-bit binaries vs 64-bit binaries. To be able to compile a 32-bit binary on a 64-bit machine, utilize the -m32 flag provided by gcc-multilib (installed during HW0). Here is how to compile each program respectively:

$ gcc -Wall -Werror -m32 asm.c -o 32.out
$ gcc -Wall -Werror -m64 asm.c -o 64.out

Run each program and you should see this output:

$ ./64.out
75
You entered 75
$ ./32.out
75
You entered 75

75 is a value that is entered by the user. You can enter any number you choose.

Notice that even though the two programs are compiled for different architectures, they still produce the same results. They are assembled using different instruction sets, though. To see this, compile the programs with the -S flag, which stores the intermediate assembly of the program in a .s file.

For the 64-bit program run:

$ gcc -Wall -Werror -m64 -S asm.c

Take a look at asm.s which was just generated in the current working directory.

# x86-64 assembly for asm.c
	.file "asm.c"
	.section .rodata
.LC0:
	.string "You entered %ld\n"
	.text
	.globl main
	.type main, @function
main:
.LFB2:
	.cfi_startproc
	pushq %rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq %rsp, %rbp
	.cfi_def_cfa_register 6
	subq $1072, %rsp
	movl %edi, -1060(%rbp)
	movq %rsi, -1072(%rbp)
	movq %fs:40, %rax
	movq %rax, -8(%rbp)
	xorl %eax, %eax
	movq stdin(%rip), %rdx
	leaq -1040(%rbp), %rax
	movl $1024, %esi
	movq %rax, %rdi
	call fgets
	leaq -1040(%rbp), %rax
	movl $10, %edx
	movl $0, %esi
	movq %rax, %rdi
	call strtoll
	movq %rax, -1048(%rbp)
	movq -1048(%rbp), %rax
	movq %rax, %rsi
	movl $.LC0, %edi
	movl $0, %eax
	call printf
	movl $0, %eax
	movq -8(%rbp), %rcx
	xorq %fs:40, %rcx
	je .L3
	call __stack_chk_fail
.L3:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE2:
	.size main, .-main
	.ident "GCC: (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010"
	.section .note.GNU-stack,"",@progbits

Now compile it for x86 using the following command:

$ gcc -Wall -Werror -m32 -S asm.c

Again, take a look at asm.s which was just generated in current working directory.

# x86 assembly for asm.c
	.file "asm.c"
	.section .rodata
.LC0:
	.string "You entered %lld\n"
	.text
	.globl main
	.type main, @function
main:
.LFB2:
	.cfi_startproc
	leal 4(%esp), %ecx
	.cfi_def_cfa 1, 0
	andl $-16, %esp
	pushl -4(%ecx)
	pushl %ebp
	.cfi_escape 0x10,0x5,0x2,0x75,0
	movl %esp, %ebp
	pushl %ecx
	.cfi_escape 0xf,0x3,0x75,0x7c,0x6
	subl $1060, %esp
	movl %ecx, %eax
	movl 4(%eax), %eax
	movl %eax, -1052(%ebp)
	movl %gs:20, %eax
	movl %eax, -12(%ebp)
	xorl %eax, %eax
	movl stdin, %eax
	subl $4, %esp
	pushl %eax
	pushl $1024
	leal -1036(%ebp), %eax
	pushl %eax
	call fgets
	addl $16, %esp
	subl $4, %esp
	pushl $10
	pushl $0
	leal -1036(%ebp), %eax
	pushl %eax
	call strtoll
	addl $16, %esp
	movl %eax, -1048(%ebp)
	movl %edx, -1044(%ebp)
	subl $4, %esp
	pushl -1044(%ebp)
	pushl -1048(%ebp)
	pushl $.LC0
	call printf
	addl $16, %esp
	movl $0, %eax
	movl -12(%ebp), %edx
	xorl %gs:20, %edx
	je .L3
	call __stack_chk_fail
.L3:
	movl -4(%ebp), %ecx
	.cfi_def_cfa 1, 0
	leave
	.cfi_restore 5
	leal -4(%ecx), %esp
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
.LFE2:
	.size main, .-main
	.ident "GCC: (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010"
	.section .note.GNU-stack,"",@progbits

Additionally you can log into sparky, and use the C compiler on that machine. It will generate 32-bit SPARC assembly.

$ gcc -Wall -Werror -S asm.c
# 32-bit SPARC assembly
        .file   "asm.c"
        .section        ".rodata"
        .align 8
.LLC0:
        .asciz  "You entered %lld\n"
        .section        ".text"
        .align 4
        .global main
        .type   main, #function
        .proc   04
main:
        save    %sp, -1128, %sp
        st      %i0, [%fp+68]
        st      %i1, [%fp+72]
        add     %fp, -1032, %g1
        mov     %g1, %o0
        mov     1024, %o1
        sethi   %hi(__iob), %g1
        or      %g1, %lo(__iob), %o2
        call    fgets, 0
        nop
        add     %fp, -1032, %g1
        mov     %g1, %o0
        mov     0, %o1
        mov     10, %o2
        call    strtoll, 0
        nop
        std     %o0, [%fp-8]
        sethi   %hi(.LLC0), %g1
        or      %g1, %lo(.LLC0), %o0
        ld      [%fp-8], %o1
        ld      [%fp-4], %o2
        call    printf, 0
         nop
        mov     0, %g1
        mov     %g1, %i0
        return  %i7+8
         nop
        .size   main, .-main
        .ident  "GCC: (GNU) 4.9.1"

Assembly Analysis

The assembly generated for a particular architecture varies greatly even though it all accomplishes the exact same task on each system. Notice that the SPARC assembly is shorter than the other two (40 lines for SPARC, 67 lines for x86, and 51 lines for x86-64) and that the registers used are different in all three examples.

Take a look at how the format string in the printf call got translated:

printf("You entered %" PRId64 "\n", value);
.string "You entered %ld\n"  # x86-64; 64-bits
.string "You entered %lld\n" # x86; 32-bits
.asciz  "You entered %lld\n" # SPARC; 32-bits

See that PRId64 got translated to different formats: %ld and %lld. This is because int64_t is defined as a different underlying type on each platform (long on x86-64, long long on the two 32-bit targets) so that it is always exactly 64 bits wide. In the SPARC code, notice that there are nop instructions after the calls to printf, strtoll, and fgets, and after the return. This is because of a technique known as delayed branching used in the SPARC architecture: the instruction immediately following a call or branch is executed before control is transferred, and here the compiler fills that slot with a nop.

In the x86 assembly, notice the subl and pushl instructions which are used to manipulate the stack before calling functions. These instructions are absent from the x86-64 example. This is because the x86 architecture has half as many general-purpose registers as x86-64, so the convention is to push the arguments for a function call onto the stack to compensate for this. At the core, the Application Binary Interface differs between the systems. There are also various other differences that can't be seen by looking at the assembly, such as variable-sized instruction formats, but, in general, you should just be aware that any C code gets translated very differently depending on the machine.

Preprocessor

Sometimes the easiest way to see what is happening in your program is to just use print statements. This is a method that everyone can use (and already knows how to use!). However, we shouldn't just put printf all over our program. We do not always want to see these printouts (way too much information for normal operation) and we don't want to have to comment/uncomment lines constantly.

One possible solution to this is passing a command line argument that turns debugging on and off. This might be an acceptable solution, but it will clutter our code with lots of if statements to check if debugging is enabled or not, make our binary larger when we don't want debugging enabled, etc. Instead we will use some preprocessor tricks to give us some logging statements when we compile with the flag -DDEBUG. When we compile without the flag -DDEBUG, none of these debugging statements will be printed.

We have defined in the given Makefile a debug target. This compiles your program with the -DDEBUG flag and -g, the latter of which is necessary for gdb to work. You can simply run:

$ make clean debug

as opposed to make clean all to set your program up for debugging.

Create a new header called debug.h. We can define each of these macros in this header and use them in main() by adding #include "debug.h" to main.c.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include<stdlib.h>
#include<stdio.h>

#define debug(msg) printf("DEBUG: %s", msg)

#endif

Then in your program use the debug macro

main.c:

#include "debug.h"

int main(int argc, char *argv[]) {
	debug("Hello, World!\n");
	return EXIT_SUCCESS;
}

Then compile your program and run it.

$ make clean all
$ bin/hw1
DEBUG: Hello, World!

Great! You just created your first preprocessor macro. Unfortunately this is no better than just adding a print statement. Let's fix that!

The preprocessor has #if, #elif, and #else directives that we can use to control what gets added during compilation (and #endif for completing an if/else block). Let's create an if directive that will include a section of code if DEBUG is defined within the preprocessor.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include<stdlib.h>
#include<stdio.h>

#define debug(msg) printf("DEBUG: %s", msg)

#endif

main.c:

#include "debug.h"

int main(int argc, char *argv[]) {
	#ifdef DEBUG
		debug("Debug flag was defined\n");
	#endif
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

When we compile this program, the preprocessor will check whether DEBUG was defined. Let's test this out.

$ make clean all
$ bin/hw1
Hello, World!

Cool, the debug message didn't print out. Now let's define DEBUG during the compilation process, and run the program again.

$ make clean debug
$ bin/hw1
DEBUG: Debug flag was defined
Hello, World!

Here you can see that debug was defined so that extra code between #ifdef DEBUG and #endif was included. This technique will work for certain situations, but if we have a lot of logging messages in our program this will quickly clutter our code and make it unreadable. Fortunately we can do better.

Instead of doing #ifdef DEBUG all over our program we can instead do #ifdef DEBUG around our #define debug macro.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include<stdlib.h>
#include<stdio.h>

#ifdef DEBUG
	#define debug(msg) printf("DEBUG: %s", msg)
#endif

#endif

main.c:

#include"debug.h"

int main(int argc, char *argv[]) {
	debug("Debug flag was defined\n");
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

There is an issue with this, but let's try to compile the program.

$ make clean debug
$ bin/hw1
DEBUG: Debug flag was defined
Hello, World!

Cool, it works. Now let's try to compile it without defining -DDEBUG.

$ make clean all
/tmp/cc6F04VW.o: In function `main':
debug.c:(.text+0x1a): undefined reference to `debug'
collect2: error: ld returned 1 exit status

Whoops. What happened here? When we used -DDEBUG the debug macro was defined, so it worked as expected. When we don't compile with -DDEBUG, the #define debug line is skipped, so nothing is substituted in our program. Since we used debug in the middle of our code, the compiler treats it as a call to an ordinary function named debug, and the linker then fails because no such function exists. Luckily this is easy to fix. We simply have to add an #else case to our preprocessor conditional to handle this situation.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include<stdlib.h>
#include<stdio.h>

#ifdef DEBUG
	#define debug(msg) printf("DEBUG: %s", msg)
#else
	#define debug(msg)
#endif

#endif

main.c:

#include"debug.h"

int main(int argc, char *argv[]) {
	debug("Debug flag was defined\n");
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

Here we tell the preprocessor to replace any occurrences of debug(msg) with nothing. Now, when we don't compile with -DDEBUG, the preprocessor simply replaces debug("Debug flag was defined\n") with nothing, leaving only an empty statement. Let's compile again.

$ make clean all
$ bin/hw1
Hello, World!

Cool. Now we can embed debug macros all over our program that look like normal functions. There are still a few more cool tricks we can do to make this better. A few special predefined names are available: __LINE__, __FILE__, and __FUNCTION__. These evaluate to the line number where the macro is used, the name of the file it is used in, and the name of the function it is used in (__FUNCTION__ is GCC's spelling of the standard __func__ identifier). Let's play with this a bit.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include<stdlib.h>
#include<stdio.h>

#ifdef DEBUG
	#define debug(msg) printf("DEBUG: %s:%s:%d %s", __FILE__, __FUNCTION__, __LINE__, msg)
#else
	#define debug(msg)
#endif

#endif

main.c:

#include"debug.h"
int main(int argc, char *argv[]) {
	debug("Debug flag was defined\n");
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

Let's compile this program and run.

$ make clean debug
$ bin/hw1
DEBUG: debug.c:main:11 Debug flag was defined
Hello, World!

As you can see, __FILE__, __FUNCTION__, and __LINE__ were replaced with the corresponding values for the place where debug was called in the program. Pretty cool, but we can do even better! Normally when we want to print something we use printf() with format specifiers and variable arguments to print useful information. With our current setup, though, we can't do that. Fortunately for us, the preprocessor offers a __VA_ARGS__ macro which we can use to accomplish this.

I want to point out that the syntax for this gets a bit crazy and hard to understand (complex preprocessor stuff is a bit of a black art). I'll try my best to describe it, but you may need to do some more googling if the explanation below is not sufficient.

debug.h:

#ifndef DEBUG_H
#define DEBUG_H
#include <stdlib.h>
#include <stdio.h>

#ifdef DEBUG
	#define debug(fmt, ...) printf("DEBUG: %s:%s:%d " fmt, __FILE__, __FUNCTION__, __LINE__, ##__VA_ARGS__)
#else
	#define debug(fmt, ...)
#endif

#endif

main.c:

#include"debug.h"

int main(int argc, char *argv[]) {
	debug("Program has %d args\n", argc);
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

First let's compile and run the program and see the results.

$ make clean debug
$ bin/hw1
DEBUG: debug.c:main:11 Program has 1 args
Hello, World!
$ make clean all
$ bin/hw1
Hello, World!

The macro works as expected, but let's try to explain it a bit.

First we changed the definition of the macro to be #define debug(fmt, ...). The first argument fmt is the format string that we normally define for printf and ... is the way to declare a macro that accepts a variable number of arguments.
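As a smaller illustration of ... and __VA_ARGS__ on their own, here is a sketch with a hypothetical format_into macro and describe_argc function (neither is part of debug.h). The macro forwards all of its variable arguments to snprintf:

```c
#include <stdio.h>

/* Hypothetical helper: everything after size is forwarded unchanged,
 * so the format string itself is the first variable argument. */
#define format_into(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Writes "argc=<n>" into out and returns the number of characters written. */
int describe_argc(char *out, size_t size, int argc) {
	return format_into(out, size, "argc=%d", argc);
}
```

Because the format string is part of __VA_ARGS__ here, at least one argument is always supplied and no special handling for an empty argument list is needed.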

Next we have "DEBUG: %s:%s:%d " fmt. The C compiler concatenates string literals that are adjacent to each other, so if fmt is the string "crazy %d concatenation" then this expression becomes "DEBUG: %s:%s:%d crazy %d concatenation". Note that this only works when fmt is a string literal. Then come __FILE__, __FUNCTION__, and __LINE__, which fill in the %s:%s:%d part of the format string, and then we reach the next confusing piece: , ##__VA_ARGS__. The macro __VA_ARGS__ expands into the variable arguments provided to debug, but what about the crazy , ##? This is a GCC extension that deletes the preceding comma when no variable arguments are passed, e.g. debug("I have no varargs"). Without it, that call would expand to printf(..., __LINE__, ) and the dangling comma would cause a compilation error.
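Both tricks can be seen in isolation in the following sketch. The names log_line, log_with_args, and log_without_args are hypothetical, and the , ## form assumes a compiler that supports the GNU extension (GCC and Clang both do):

```c
#include <stdio.h>

/* Adjacent string literals are merged by the compiler, so "[log] " fmt
 * becomes a single format string. fmt must be a string literal. */
#define log_line(buf, size, fmt, ...) \
	snprintf((buf), (size), "[log] " fmt, ##__VA_ARGS__)

/* With a variable argument: the format becomes "[log] %d items". */
int log_with_args(char *out, size_t size, int n) {
	return log_line(out, size, "%d items", n);
}

/* Without variable arguments: ## removes the comma that would otherwise dangle. */
int log_without_args(char *out, size_t size) {
	return log_line(out, size, "done");
}
```

Writing into a buffer with snprintf instead of printing is only a choice made for this sketch so the result is easy to inspect; the macro mechanics are the same as in debug.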

This is one of the many interesting things we can use the C preprocessor for. Lastly, remember that preprocessor macros are plain text replacement performed before compilation, which can be dangerous when we are careless about how we use them. For example, it is customary to never end a macro definition with a ;, since most programmers put a semicolon after a macro invocation just as they would after most statements. Some programmers like to wrap the code in a macro with a do { /* some code here */ } while(0) loop. Each statement inside the loop body ends with its own ;, and a macro made up of multiple statements then expands to a single statement, so it behaves correctly even as the body of an if or else without braces. You still have to terminate the macro with a ; when you use it, which makes it look like a normal function call in your C code.
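To see why the wrapper matters, consider this sketch with two hypothetical macros (they are illustrations only, not part of debug.h). Both bump two counters, but only one is wrapped in do { } while (0):

```c
/* Hypothetical two-statement macros for illustration. */
#define BUMP_BOTH_UNSAFE(a, b) (a)++; (b)++
#define BUMP_BOTH(a, b) do { (a)++; (b)++; } while (0)

/* Only the first statement is guarded by the if; (b)++ always runs. */
int bump_unsafe(int cond) {
	int a = 0, b = 0;
	if (cond)
		BUMP_BOTH_UNSAFE(a, b);
	return a + b;
}

/* The whole macro is one statement, so the if guards both increments. */
int bump_safe(int cond) {
	int a = 0, b = 0;
	if (cond)
		BUMP_BOTH(a, b);
	return a + b;
}
```

bump_unsafe(0) returns 1 because (b)++ escaped the if, while bump_safe(0) returns 0.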

Our final product will look like this:

#ifndef DEBUG_H
#define DEBUG_H
#include <stdlib.h>
#include <stdio.h>

#ifdef DEBUG
	#define debug(fmt, ...) do { printf("DEBUG: %s:%s:%d " fmt, __FILE__, __FUNCTION__, __LINE__, ##__VA_ARGS__); } while(0)
#else
	#define debug(fmt, ...)
#endif

#endif