# CSE 320 Reference

**NOTE: This document has traditionally been provided (in PDF form) at the beginning
of the course; however, it was written in the ancient past and the source was no longer
available. This version (in Markdown) has been reverse-engineered from the PDF source,
so that it can be updated in the future. The reverse engineering turned up some errors
in the original document, and it likely introduced new errors. But now the errors can
be corrected if somebody reports them :smiley:.**

## Using the Terminal

Great resources for understanding and working with the command line:

[http://www.ibm.com/developerworks/library/l-lpic1-103-1/](http://www.ibm.com/developerworks/library/l-lpic1-103-1/)

[https://learnpythonthehardway.org/book/appendixa.html](https://learnpythonthehardway.org/book/appendixa.html)

## GCC

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    printf("Hello World!\n");
    return EXIT_SUCCESS;
}
```

### Lines 1 and 2

Lines 1 and 2 are C **preprocessor** directives which include
**function prototypes** for some of the functions in the **C standard library**
(aka libc). For now you can just vaguely relate these to the `import`
statements you might find at the top of a Java file.

```java
import java.util.Scanner;
```

The C preprocessor is a very powerful tool and you will learn about it
in future assignments. For now, just accept this basic explanation of
what these two lines do. The `#include` directive takes the contents of
the `.h` file and copies it into the `.c` file before the C compiler
actually translates the C code.

> :nerd: Files that end in `.h` are called header files. They typically
contain preprocessor macros, function prototypes, **struct information**,
and **typedefs**.

### Line 4

Line 4 is how you declare the `main()` function of a C program. In C,
if you are creating an executable program it must have one and ONLY one
main function. It should also be as isolated as possible: if you can
(and for this class you always should), put `main()` in its own `.c`
file. Any main function you write in this course MUST return an integer
value (older textbooks/documentation might declare it as returning
`void`; watch out).

This is sort of similar to the `main()` declaration in Java. In Java,
arrays, since they are objects, have various attributes (*e.g.*
length). C is not an object-oriented language and hence arrays carry
no such information (arrays in C are very similar to arrays in
MIPS). To remedy this issue two arguments are passed: `argc`,
which holds the number of elements in the array, and `argv`, which is an
array of strings containing each of the arguments passed on the
command line. Even if no arguments are passed by the user, `argv` will
contain at least one element, which is the name of the binary being
executed.

> :nerd: If you look through other C programs, you might see that
there are quite a few different ways to declare `main`. In this course
you may declare `main` just as it is in the `helloworld` example unless
specified otherwise in the homework assignment.

> :scream: It is crucial that there exists exactly one `main()` function
in your whole program. C is not like Java, where you can have a
different main in every file and then choose which main you want to
run. If you have more than one main, the build will fail with an
error. For example, assume you had two files `main1.c` and
`main2.c` and you tried to compile them both into one program
(a reasonable thing to do). If both `main1.c` and `main2.c` have a main
function defined in them, you get the following linker error when you
try to compile:

```
/tmp/cc8eYGEA.o: In function ‘main’:
main2.c:(.text+0x0): multiple definition of ‘main’
/tmp/ccaaqneq.o:main1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
```

This error means that the main function is defined twice within your
program. This concept extends to all functions. Two functions *CANNOT*
have the same name under normal conditions. In addition, function
overloading is not allowed in C. Example: Assume you had the file
`func.c` with the following function declarations.

```c
void func(int a);
void func(int a, int b);
```

This will result in the following error:

```
func.c:5:6: error: conflicting types for ‘func’
 void func(int a, int b) {
      ^
func.c:1:6: note: previous definition of ‘func’ was here
 void func(int a) {
```
### Line 5

Line 5 is how this program prints its values to standard
output (stdout). The `printf()` function can be compared to the
`System.out.printf()` method in Java. This function accepts a `char*`
argument known as the format string (assume for now `char*` is equivalent
to the Java `String` type). This works fine when you know ahead
of time what you want to print, but what if you want to print a
variable?

If you assume C is like Java, you may try to concatenate strings in
the following form:

```java
int i = 5;
printf("The value of i is " + i + "\n");
```

If you try to compile this code, GCC may give you some of the
following cryptic error messages:

```
error: invalid operands to binary + (have ‘char *’ and ‘char *’)
```

or

```
warning: format not a string literal and no format arguments [-Wformat-security]
```

Unfortunately, C does not have string concatenation via the `+`
operator. However, the `printf()` function also takes a variable number
of arguments after the format string. In order to print a variable you
have to specify one of many available **conversion specifiers**
(a `%` sign followed by one or more characters). For example, the `%d`
specifier prints an integer.

> :nerd: A list of all printf formats can be found online. Alternatively
you can use the command `man 3 printf` in your terminal to view the
documentation for printf as well. This is an example of a man
page (manual page). Man pages are how most of the library functions in
C are documented. You are highly encouraged to use them, as they are
extremely useful. Man pages are also available online.

The printf function always prints to the file stream known as `stdout`
(standard output). There are three **standard streams** that are usually
available to each program, namely: `stdin` (standard input), `stdout`, and
`stderr` (standard error). Prior to `*nix`, computer programs needed to
specify and be connected to a particular I/O device such as a magnetic
tape drive. This made portability nearly impossible. Later in the course we
will delve deeper into “files” and how they represent abstract devices
in Unix-like operating systems. For now understand that they work
much like your typical `.txt` file: they can be written to and read from.

### Line 6

Line 6 is the end of the main function. The value returned from main is
the return code (exit status) of the program. In `*nix`, when
a program exits successfully, the value returned is usually zero. When
it encounters some sort of error, the value is usually non-zero.
Since these values are defined by programmers and may
differ depending on the system you are using, it is usually best
to use the constants `EXIT_SUCCESS` and `EXIT_FAILURE`, which are defined in
`stdlib.h`. For simple cases they represent the respective exit
codes for each system.

> The term `*nix` is used for describing operating systems that are
derived from the *Unix* operating system (e.g. BSD, Solaris) or clones of
it (e.g. Linux).

## Compiling C Code

Begin by compiling the following program:

```c
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char* argv[]) {
    printf("Hello World!\n");
    return EXIT_SUCCESS;
}
```

Navigate on the command line to where the `.c` file is located. If the
file is called `helloworld.c`, type the following command to compile the
program.

```
$ gcc helloworld.c
```

> The `$` is the command line prompt. **Your prompt may differ**.

If no messages print, that means there were no errors and the
executable was produced. To double check that the compiler produced a
binary you can type the `ls` command to list all items in the directory.

```
$ ls
a.out helloworld.c
$
```

The file **`a.out`** is your executable program. To run this program,
put a `./` in front of the binary name.

```
$ ./a.out
Hello World!
$
```

> The `./` has a special meaning. The `.` translates to the path of the
current directory. So if your file was in the cse320 directory on the
user’s desktop then when you type `./a.out` this would really
translate to the path `/home/user/Desktop/cse320/a.out`.

## Compilation Flags

Modify the `helloworld` program to sum up the values from 0 to 5.

```c
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char *argv[]) {
    int i, sum;
    for(i = 0; i < 6; i++) {
        sum += i;
    }
    printf("The sum of all integers from 0-5 is: %d\n", sum);
    return EXIT_SUCCESS;
}
```

Compile and run this program.

```
$ gcc helloworld2.c
$ ./a.out
The sum of all integers from 0-5 is: 15
$
```

This program compiled with no errors and even produced the correct
result. However, there is a subtle but hazardous bug in this code. The
developers of the **gcc C compiler** have built in some functionality
(enabled by flags) to help programmers find such bugs.

Add the flags `-Wall` and `-Werror` to the `gcc` command when compiling, like so:

```
$ gcc -Wall -Werror helloworld2.c
helloworld2.c:7:3: error: variable 'sum' is uninitialized when used here [-Werror,-Wuninitialized]
        sum += i;
        ^~~
helloworld2.c:5:12: note: initialize the variable 'sum' to silence this warning
    int i, sum;
              ^
               = 0
1 error generated.
$
```

> Depending on your compiler (gcc, clang, etc.) the above error and
message may differ. Recent versions of gcc only produce an error when
optimization (`-O1`, `-O2`, or `-O3`) is enabled.

> The flag `-Wall` enables warnings for all constructions that some users
consider questionable, and that are easy to avoid (or modify to prevent
the warning), even in conjunction with macros.

> The flag `-Werror` converts all warnings to errors. Source code
> which triggers warnings will be rejected.

This error means that the variable `sum` was used without being
initialized. Why does this matter? The C language does not specify an
initial value for an uninitialized local variable. A particular compiler
or system may happen to zero it out for you, but nothing guarantees
this; the variable simply holds whatever was previously in that memory.
Reading it is undefined behavior and can cause the program to
work one way on one system and differently on other systems. To fix this
error, simply initialize the variable `sum` to the desired value (0).

```c
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char *argv[]) {
    int i, sum = 0;
    for(i = 0; i < 6; i++) {
        sum += i;
    }
    printf("The sum of all integers from 0-5 is: %d\n", sum);
    return EXIT_SUCCESS;
}
```

Compile the program again and you should no longer see any errors.

```
$ gcc -Wall -Werror helloworld2.c
$ ./a.out
The sum of all integers from 0-5 is: 15
$
```

> :scream: In this class, you *MUST ALWAYS* compile your assignments
> with the flags `-Wall -Werror`. This will help you locate mistakes in
> your program and the grader will compile your assignment with these
> flags as well. Consider this your warning: `-Wall -Werror` are
> necessary. Do not progress through your assignment without using
> these flags and then attempt to fix the errors they highlight at the
> last minute.

## GNU Make and Makefiles

As you program more in C, you will continue to add more flags and more
files to your programs. Typing these commands over and over again will
eventually become an error-prone chore. Also, as you add more files,
rebuilding every file every time, even if it didn’t change, will make
compiling your program take a long time. To help alleviate these issues,
build tools were created. One such tool is GNU Make (you will be
required to use Make in this class). Make itself has lots of options
and features that can be configured. While mastering Make is not
required for this class, you will probably want to learn how to make
simple changes to what we supply.

Refer
[here](http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/)
for a great Makefile tutorial and information resource. **You will
always be provided with a working makefile; this is provided for
extended learning.**

## Header Files

There are some coding practices that you should become familiar with
in C from the beginning. The C compiler reads through your code once
and only once. This means all functions and variables you use must be
declared in advance of their usage, or the compiler will not know how to
compile the code and will exit with errors. This is why we have header
files: we declare all of our function prototypes in a `.h` file and
`#include` it in our `.c` file. This lets us write the bodies of our
functions in any order and call them in any order we please.

A header file is just a file which ends in the `.h` extension. Typically
it declares **function prototypes**, defines `struct` and `union` types,
`#include`s other header files, `#define`s constants and macros, and
declares `typedef`s. Some header files also expose global variables, but
this is strongly discouraged as it can cause errors (for example,
multiple definitions) when the header is included from more than one
`.c` file.

When you declare function prototypes in a `.h` file, you can then define
the body of the function inside of any `.c` file. Typically, though, if
the header file was called `example.h`, we would define the functions in
`example.c`. If we were producing a massive library like
[stdlibc](https://en.wikipedia.org/wiki/C_standard_library), we
might instead declare all the function prototypes in a single header file
but put each function definition in its own file. It’s all
a matter of preference, but these are two common practices. You should
never define function bodies in the header, though; this will just cause
you issues later.

There are two ways to specify where the include directive looks for
header files. If you use `<>`, when the preprocessor encounters the
include directive it will look for the file in a predefined location
on your system (usually `/usr/include`). If you use `""`, the preprocessor
will look first in the directory of the file being
processed. Typically system and library headers are included using `<>`,
and custom headers that you have made for your program are included
using `""`.

### Header file example

In example.h:

```c
#include<stdio.h>
#include<stdlib.h>

#define TRUE 1
#define FALSE 0

struct student {
    char *first_name;
    char *last_name;
    int age;
    float gpa;
};

int foo(int a, int b);
void bar(void);
```

In a `.c` file that includes `example.h`:

```c
#include"example.h"

int main(int argc, char *argv[]){
    bar();
    return EXIT_SUCCESS;
}

void bar(void){
    printf("foo: %d\n", foo(2, 3));
}

int foo(int a, int b) {
    return a * b;
}
```

### Header Guard a.k.a. Include Guard

While header files solve one issue, they create issues of their
own. What if multiple files include the same header file? What if
header file A includes header file B, and header file B includes
header file A? If we keep including the same header file multiple
times, our source files become larger than needed and the
compilation process slows down. It may also cause errors if there are
variables or types defined in the header. If two files keep including
each other, how does the compiler know when to stop? To prevent such
errors one must utilize **header guards**. A header guard prevents double
and cyclic inclusion of a header file.

### Header Guard example

In grandparent.h:

```c
struct foo {
    int member;
};
```

In parent.h:

```c
#include "grandparent.h"
```

In child.h:

```c
#include "grandparent.h"
#include "parent.h"
```

The preprocessor will produce a temporary file that contains two literal
copies of the `foo` definition, and this will cause a compiler error
because `struct foo` is defined twice. The fix:

In grandparent.h:

```c
#ifndef GRANDPARENT_H
#define GRANDPARENT_H
struct foo {
    int member;
};
#endif
```

In parent.h:

```c
#include "grandparent.h"
```

In child.h:

```c
#include "grandparent.h"
#include "parent.h"
```

`#ifndef`, `#define`, and `#endif` are preprocessor directives that
prevent the double inclusion. When `child.h` includes `grandparent.h`
for the second time (through `parent.h`), the macro `GRANDPARENT_H` is
already defined, so the `#ifndef` test is false and the second
definition of `foo` is never included.
Read [here](https://en.wikipedia.org/wiki/Include_guard#Double_inclusion)
for more information.

> You should always use header files and guards in your
assignments. Newer compilers also support what is known as `#pragma once`.
This directive performs the same operation as the header guard,
but it is not part of the C standard, so it may not be a cross platform
solution when considering older machines.

### Directory Structure

To help with a clear and consistent structure to your programs, you
can use the following directory structure. This is a common directory
structure for projects in C.

```
.
├── Makefile
├── include
│   ├── debug.h
│   └── func.h
└── src
    ├── main.c
    └── func.c
```

> :scream: You will be **REQUIRED** to follow this structure for **ALL** the homework
assignments for this class. Failure to do so will result in a ZERO.

## Datatype Sizes

Depending on the system and the underlying architecture, which can
have different word sizes etc., datatypes can have different
sizes. In a language like Java, many of these issues are hidden from
the programmer. The JVM creates another layer of abstraction which
allows the programmer to assume all datatypes are the same size no
matter the underlying architecture. C, on the other hand, does not
have this luxury. The programmer has to consider everything about the
system being worked on. To make programs cross platform, code and
logic need to be tested on each platform, comparing results and output,
and altered accordingly.

C does not fix the exact size of its basic datatypes in its
specification. Instead, platforms follow data models known as LP64,
ILP64, LLP64, ILP32, and LP32. The `I` stands for `INT`, the `L` stands for
`LONG` and the `P` stands for `POINTER`. The number after the letters
gives the size in bits of the types named by those letters.

The typical sizes for these models are given in the following
table (in bits):

```
Model   short   int   long   long long   pointer
LP32      16     16     32       64         32
ILP32     16     32     32       64         32
LLP64     16     32     32       64         64
LP64      16     32     64       64         64
ILP64     16     64     64       64         64
```

Notice that the size of an integer on one machine could be different
from that on another machine depending on which model the machine
uses. To prove this to yourself, use the special operator in the C
language known as `sizeof`. The `sizeof` operator will tell you the size of
a specific datatype in bytes. As an exercise, you should create the
following program and run it in your development environment and on
a system with a different underlying architecture (such as 'Sparky')
and compare the results.

```c
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    /* Basic data types (sizeof yields a size_t, printed with %zu) */
    printf("=== Basic Data Types ===\n");
    printf("short: %zu bytes\n", sizeof(short));
    printf("int: %zu bytes\n", sizeof(int));
    printf("long: %zu bytes\n", sizeof(long));
    printf("long long: %zu bytes\n", sizeof(long long));
    printf("char: %zu byte(s)\n", sizeof(char));
    printf("double: %zu bytes\n", sizeof(double));
    /* Pointers */
    printf("=== Pointers ===\n");
    printf("char*: %zu bytes\n", sizeof(char*));
    printf("int*: %zu bytes\n", sizeof(int*));
    printf("long*: %zu bytes\n", sizeof(long*));
    printf("void*: %zu bytes\n", sizeof(void*));
    printf("double*: %zu bytes\n", sizeof(double*));
    /* Special value - This may have undefined results... why? */
    printf("=== Special Data Types ===\n");
    printf("void: %zu byte(s)\n", sizeof(void));
    return EXIT_SUCCESS;
}
```

To further illustrate why this is a problem, consider the following program.

```c
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    // 0x200000000 -> 8589934592 in decimal
    long value = strtol("200000000", NULL, 16);
    printf("value: %ld\n", value);
    return EXIT_SUCCESS;
}
```

On a system where `long` is 64 bits wide this prints 8589934592, but
where `long` is only 32 bits wide the value does not fit and `strtol`
returns `LONG_MAX` instead.

In libc, there exists a header `stdint.h` which defines special
fixed-width types (such as `int32_t` and `int64_t`). If you use them,
they are guaranteed to be the correct size no matter what system you
are on.

## Endianness

When dealing with multi-byte values and different architectures, the
**endianness** of each architecture should also be taken into
account. There are many ways to detect the endianness of your machine,
for example:

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    unsigned int i = 1;
    char *c = (char*)&i; // Point at the lowest-addressed byte of i
    if(*c) {
        printf("little endian\n");
    } else {
        printf("big endian\n");
    }
    return EXIT_SUCCESS;
}
```

Can you think of why this works? Could you explain it if asked on an exam?

## Assembly

During the compilation process, a C program is translated to an
assembly source file. This is important because the exact same C
implementation could have great performance on one system and
terrible performance on another. In this case, the programmer has to
inspect the assembly code for more information.

Example:

```c
// asm.c
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(int argc, char *argv[]) {
    char buffer[1024];
    // Get user input
    fgets(buffer, 1024, stdin);
    int64_t value = strtoll(buffer, NULL, 10);
    printf("You entered %" PRId64 "\n", value);
    return EXIT_SUCCESS;
}
```

Test the program as a 32-bit binary and as a 64-bit binary. To be able
to compile a 32-bit binary on a 64-bit machine, use the `-m32`
flag, which requires gcc-multilib (installed during HW0). Here is how to
compile each program respectively:

```
$ gcc -Wall -Werror -m32 asm.c -o 32.out
$ gcc -Wall -Werror -m64 asm.c -o 64.out
```

Run each program and you should see this output:

```
$ ./64.out
75
You entered 75
$ ./32.out
75
You entered 75
```

> 75 is a value that is entered by the user. You can enter any number you choose.

Notice that even though the two programs are compiled for different
architectures, they still produce the same results. These programs are
assembled using different instruction sets though. To see this, compile
the program with the `-S` flag. This flag will store the intermediate
assembly of the program in a `.s` file.

For the 64-bit program run:

```
$ gcc -Wall -Werror -m64 -S asm.c
```

Take a look at `asm.s` which was just generated in the **current working directory**.

```
# x86-64 assembly for asm.c
    .file "asm.c"
    .section .rodata
.LC0:
    .string "You entered %ld\n"
    .text
    .globl main
    .type main, @function
main:
.LFB2:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    subq $1072, %rsp
    movl %edi, -1060(%rbp)
    movq %rsi, -1072(%rbp)
    movq %fs:40, %rax
    movq %rax, -8(%rbp)
    xorl %eax, %eax
    movq stdin(%rip), %rdx
    leaq -1040(%rbp), %rax
    movl $1024, %esi
    movq %rax, %rdi
    call fgets
    leaq -1040(%rbp), %rax
    movl $10, %edx
    movl $0, %esi
    movq %rax, %rdi
    call strtoll
    movq %rax, -1048(%rbp)
    movq -1048(%rbp), %rax
    movq %rax, %rsi
    movl $.LC0, %edi
    movl $0, %eax
    call printf
    movl $0, %eax
    movq -8(%rbp), %rcx
    xorq %fs:40, %rcx
    je .L3
    call __stack_chk_fail
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size main, .-main
    .ident "GCC: (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010"
    .section .note.GNU-stack,"",@progbits
```

Now compile it for x86 using the following command:

```
$ gcc -Wall -Werror -m32 -S asm.c
```

Again, take a look at `asm.s` which was just generated in the current working directory.

```
# x86 assembly for asm.c
    .file "asm.c"
    .section .rodata
.LC0:
    .string "You entered %lld\n"
    .text
    .globl main
    .type main, @function
main:
.LFB2:
    .cfi_startproc
    leal 4(%esp), %ecx
    .cfi_def_cfa 1, 0
    andl $-16, %esp
    pushl -4(%ecx)
    pushl %ebp
    .cfi_escape 0x10,0x5,0x2,0x75,0
    movl %esp, %ebp
    pushl %ecx
    .cfi_escape 0xf,0x3,0x75,0x7c,0x6
    subl $1060, %esp
    movl %ecx, %eax
    movl 4(%eax), %eax
    movl %eax, -1052(%ebp)
    movl %gs:20, %eax
    movl %eax, -12(%ebp)
    xorl %eax, %eax
    movl stdin, %eax
    subl $4, %esp
    pushl %eax
    pushl $1024
    leal -1036(%ebp), %eax
    pushl %eax
    call fgets
    addl $16, %esp
    subl $4, %esp
    pushl $10
    pushl $0
    leal -1036(%ebp), %eax
    pushl %eax
    call strtoll
    addl $16, %esp
    movl %eax, -1048(%ebp)
    movl %edx, -1044(%ebp)
    subl $4, %esp
    pushl -1044(%ebp)
    pushl -1048(%ebp)
    pushl $.LC0
    call printf
    addl $16, %esp
    movl $0, %eax
    movl -12(%ebp), %edx
    xorl %gs:20, %edx
    je .L3
    call __stack_chk_fail
.L3:
    movl -4(%ebp), %ecx
    .cfi_def_cfa 1, 0
    leave
    .cfi_restore 5
    leal -4(%ecx), %esp
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE2:
    .size main, .-main
    .ident "GCC: (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010"
    .section .note.GNU-stack,"",@progbits
```

Additionally, you can log into sparky and use the C compiler on that
machine. It will generate 32-bit SPARC assembly.

```
$ gcc -Wall -Werror -S asm.c
```

```
# 32-bit SPARC assembly
    .file "asm.c"
    .section ".rodata"
    .align 8
.LLC0:
    .asciz "You entered %lld\n"
    .section ".text"
    .align 4
    .global main
    .type main, #function
    .proc 04
main:
    save %sp, -1128, %sp
    st %i0, [%fp+68]
    st %i1, [%fp+72]
    add %fp, -1032, %g1
    mov %g1, %o0
    mov 1024, %o1
    sethi %hi(__iob), %g1
    or %g1, %lo(__iob), %o2
    call fgets, 0
    nop
    add %fp, -1032, %g1
    mov %g1, %o0
    mov 0, %o1
    mov 10, %o2
    call strtoll, 0
    nop
    std %o0, [%fp-8]
    sethi %hi(.LLC0), %g1
    or %g1, %lo(.LLC0), %o0
    ld [%fp-8], %o1
    ld [%fp-4], %o2
    call printf, 0
    nop
    mov 0, %g1
    mov %g1, %i0
    return %i7+8
    nop
    .size main, .-main
    .ident "GCC: (GNU) 4.9.1"
```

## Assembly Analysis
|
|||
|
|
|||
|
The assembly generated for a particular architecture varies greatly
|
|||
|
even though it all accomplishes the exact same task on each
|
|||
|
system. Notice that the SPARC assembly is shorter than the other two
|
|||
|
(40 lines for SPARC, 67 lines for x86, and 51 lines for x86-64) and
|
|||
|
that the registers used are different in all three examples.
|
|||
|
|
|||
|
Take a look at how the format string in the printf call got translated:
|
|||
|
|
|||
|
```c
|
|||
|
printf("You entered %" PRId64 "\n", value);
|
|||
|
```
|
|||
|
|
|||
|
```
|
|||
|
.string "You entered %ld\n" # x86-64; 64-bits
|
|||
|
.string "You entered %lld\n" # x86; 32-bits
|
|||
|
.asciz "You entered %lld\n" # SPARC; 32-bits
|
|||
|
```
|
|||
|
|
|||
|
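Notice that `PRId64` expanded to different conversion specifiers: `%ld` on x86-64 and `%lld` on the two 32-bit targets. `int64_t` is always exactly 64 bits wide, but the underlying type that provides those 64 bits differs by platform (`long` on x86-64 Linux, `long long` on the 32-bit systems), so the matching `printf` specifier differs too.

In the SPARC code, notice that there is a `nop` instruction after each call to `printf`, `strtoll`, and `fgets`, and after the `return`. This is because of a technique known as **delayed branching** used in the SPARC architecture: the instruction immediately following a call or branch is executed before control actually transfers, so the compiler fills that slot with a `nop` when it has nothing useful to put there.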
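
The `PRId64` expansion can be reproduced in a few lines. The following is a minimal sketch, not part of `asm.c`: the helper name `format_entered` is hypothetical, and it writes into a buffer with `snprintf` instead of printing so that the result is easy to inspect.

```c
#include <assert.h>
#include <inttypes.h>  /* PRId64; also includes <stdint.h> for int64_t */
#include <stdio.h>

/* Hypothetical helper (not part of asm.c): formats `value` with the same
 * format string as the printf call, but into `buf`, and returns the
 * number of characters written. */
int format_entered(char *buf, size_t size, int64_t value) {
    /* PRId64 is itself a string literal ("ld" or "lld", depending on the
     * platform) that the compiler concatenates with its neighbours. */
    int written = snprintf(buf, size, "You entered %" PRId64 "\n", value);
    assert(written >= 0 && (size_t)written < size);  /* not truncated */
    return written;
}
```

Calling `format_entered(buf, sizeof buf, 42)` with a 64-byte `buf` stores `"You entered 42\n"` whichever specifier `PRId64` expanded to, which is the point of using the macro instead of hard-coding `%ld` or `%lld`.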
|
|||
|
|
|||
|
In the x86 assembly, notice the `subl` and `pushl` instructions, which are used to manipulate the stack before calling functions. These instructions are absent from the x86-64 example. This is because the x86 architecture has half as many general-purpose registers as x86-64, so its calling convention passes function arguments on the stack instead of in registers. At the core, the **Application Binary Interface** differs between the systems. There are also various other differences that can’t be seen by looking at the assembly, such as variable-sized instruction formats, but, in general, you should just be aware that any C code gets translated very differently depending on the machine.
|
|||
|
|
|||
|
## Preprocessor
|
|||
|
|
|||
|
Sometimes the easiest way to see what is happening in your program is to just use print statements. This is a method that everyone can use (and already knows how to use!). However, we shouldn’t just put `printf` all over our program. We do not always want to see these printouts (way too much information for normal operation) and we don’t want to have to comment/uncomment lines constantly.
|
|||
|
|
|||
|
One possible solution to this is passing a command line argument that
|
|||
|
turns debugging on and off. This might be an acceptable solution but it
|
|||
|
will clutter our code with lots of if statements to check if debugging
|
|||
|
is enabled or not, make our binary larger when we don’t want debugging
|
|||
|
enabled, etc. Instead we will use some preprocessor tricks to give us
|
|||
|
some logging statements when we **compile with** the flag
|
|||
|
`-DDEBUG`. When we **compile without** the flag `-DDEBUG`, none of these
|
|||
|
debugging statements will be printed.
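
For contrast, here is a rough sketch of what the command line argument approach might look like. The `-d` flag and the function names `debug_requested` and `add` are hypothetical and are not part of any assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical runtime switch: returns 1 if "-d" appears among the
 * command line arguments, 0 otherwise. */
int debug_requested(int argc, char *argv[]) {
    assert(argc >= 0);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-d") == 0)
            return 1;
    }
    return 0;
}

/* Every call site needs its own check, and the check (plus the message
 * string) stays in the binary even when debugging is never turned on. */
int add(int a, int b, int debug_enabled) {
    if (debug_enabled)
        printf("DEBUG: add(%d, %d)\n", a, b);
    return a + b;
}
```

The flag has to be threaded through to every function that wants to log (or stored in a global), which is the clutter that the preprocessor approach avoids.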
|
|||
|
|
|||
|
We have defined in the given Makefile a `debug` target. This compiles
|
|||
|
your program with the `-DDEBUG` flag and `-g`, the latter of which is
|
|||
|
necessary for gdb to work. You can simply run:
|
|||
|
|
|||
|
```
|
|||
|
$ make clean debug
|
|||
|
```
|
|||
|
|
|||
|
as opposed to `make clean all` to set your program up for debugging.
|
|||
|
|
|||
|
Create a new header called `debug.h`. We can define each of these macros in this header and use them in `main()` by adding `#include "debug.h"` to `main.c`.
|
|||
|
|
|||
|
debug.h:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include<stdlib.h>
|
|||
|
#include<stdio.h>
|
|||
|
|
|||
|
#define debug(msg) printf("DEBUG: %s", msg)
|
|||
|
|
|||
|
#endif
|
|||
|
```
|
|||
|
|
|||
|
Then in your program use the debug macro:
|
|||
|
|
|||
|
main.c:
|
|||
|
|
|||
|
```c
|
|||
|
#include "debug.h"
|
|||
|
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
debug("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
Then compile your program and run it.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean all
|
|||
|
$ bin/hw1
|
|||
|
DEBUG: Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
Great! You just created your first **preprocessor macro**. Unfortunately
|
|||
|
this is no better than just adding a print statement. Let's fix that!
|
|||
|
|
|||
|
The preprocessor has `#if`, `#elif`, and `#else` **directives** that we can use to control what gets added during compilation (and `#endif` for closing an if/else block). Let's create an *if* directive that will include a section of code if `DEBUG` is defined within the preprocessor.
|
|||
|
|
|||
|
debug.h:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include<stdlib.h>
|
|||
|
#include<stdio.h>
|
|||
|
|
|||
|
#define debug(msg) printf("DEBUG: %s", msg)
|
|||
|
|
|||
|
#endif
|
|||
|
```
|
|||
|
main.c:
|
|||
|
|
|||
|
```c
|
|||
|
#include "debug.h"
|
|||
|
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
#ifdef DEBUG
|
|||
|
debug("Debug flag was defined\n");
|
|||
|
#endif
|
|||
|
printf("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
When we compile this program, the preprocessor checks whether `DEBUG` has been defined (for example with `#define DEBUG` or the `-DDEBUG` flag). Let's test this out.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean all
|
|||
|
$ bin/hw1
|
|||
|
Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
Cool, the debug message didn’t print out. Now let's define `DEBUG` during the compilation process, and run the program again.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean debug
|
|||
|
$ bin/hw1
|
|||
|
DEBUG: Debug flag was defined
|
|||
|
Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
Here you can see that `DEBUG` was defined, so the extra code between `#ifdef DEBUG` and `#endif` was included. This technique will work for certain situations, but if we have a lot of logging messages in our program this will quickly clutter our code and make it unreadable. Fortunately we can do better.
|
|||
|
|
|||
|
Instead of putting `#ifdef DEBUG` all over our program, we can put a single `#ifdef DEBUG` around our `#define debug` macro.
|
|||
|
|
|||
|
debug.h:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include<stdlib.h>
|
|||
|
#include<stdio.h>
|
|||
|
|
|||
|
#ifdef DEBUG
|
|||
|
#define debug(msg) printf("DEBUG: %s", msg)
|
|||
|
#endif
|
|||
|
|
|||
|
#endif
|
|||
|
```
|
|||
|
|
|||
|
main.c:
|
|||
|
|
|||
|
```c
|
|||
|
#include"debug.h"
|
|||
|
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
debug("Debug flag was defined\n");
|
|||
|
printf("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
There is an issue with this, but let's try to compile the program.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean debug
|
|||
|
$ bin/hw1
|
|||
|
DEBUG: Debug flag was defined
|
|||
|
Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
Cool, it works. Now let's try to compile it without `-DDEBUG`.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean all
|
|||
|
/tmp/cc6F04VW.o: In function `main':
|
|||
|
debug.c:(.text+0x1a): undefined reference to `debug'
|
|||
|
collect2: error: ld returned 1 exit status
|
|||
|
```
|
|||
|
|
|||
|
Whoops. What happened here? When we compiled with `-DDEBUG`, the `debug` macro was defined, so it worked as expected. When we don’t compile with `-DDEBUG`, the `#define debug` line is skipped, so the preprocessor never substitutes anything for `debug(...)` in our program. The compiler then treats `debug` as a call to an ordinary function, and the linker fails because no function with that name exists. Luckily this is easy to fix. We simply have to add an `#else` branch to our preprocessor conditional to handle this case.
|
|||
|
|
|||
|
debug.h:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include<stdlib.h>
|
|||
|
#include<stdio.h>
|
|||
|
|
|||
|
#ifdef DEBUG
|
|||
|
#define debug(msg) printf("DEBUG: %s", msg)
|
|||
|
#else
|
|||
|
#define debug(msg)
|
|||
|
#endif
|
|||
|
|
|||
|
#endif
|
|||
|
```
|
|||
|
|
|||
|
main.c:
|
|||
|
|
|||
|
```c
|
|||
|
#include"debug.h"
|
|||
|
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
debug("Debug flag was defined\n");
|
|||
|
printf("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
Here we tell the preprocessor to replace any occurrences of `debug(msg)` with nothing. Now, when we don’t compile with `-DDEBUG`, the preprocessor simply replaces `debug("Debug flag was defined\n")` with nothing, leaving only the trailing `;` as an empty statement. Let's compile again.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean all
|
|||
|
$ bin/hw1
|
|||
|
Hello, World!
|
|||
|
```
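
To make the empty expansion concrete, here is a small sketch that uses the same non-debug definition of `debug` inside a loop. The function name `count_iterations` is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Same shape as the non-debug branch of debug.h: the call and its
 * argument text vanish before the compiler ever sees them. */
#define debug(msg)

/* Returns how many times the loop body ran. */
int count_iterations(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        debug("inside the loop\n");  /* expands to nothing, leaving just ';' */
        count++;
    }
    assert(count == (n > 0 ? n : 0));
    return count;
}
```

Because the macro expands to nothing, the loop body contains only an empty statement followed by `count++`, so no `printf` call or message string ends up in the compiled code.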
|
|||
|
|
|||
|
Cool. Now we can embed debug macros all over our program that look like normal functions. There are still a few more cool tricks we can do to make this better. The compiler provides a few special predefined names: `__LINE__`, `__FILE__`, and `__FUNCTION__`. They evaluate to the *line number* where the macro is called, the *file name* that the macro is called in, and the *function name* that the macro is called in. (`__LINE__` and `__FILE__` are standard preprocessor macros; `__FUNCTION__` is GCC's name for the standard identifier `__func__`.) Let's play with this a bit.
|
|||
|
|
|||
|
debug.h:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include<stdlib.h>
|
|||
|
#include<stdio.h>
|
|||
|
|
|||
|
#ifdef DEBUG
|
|||
|
#define debug(msg) printf("DEBUG: %s:%s:%d %s", __FILE__, __FUNCTION__, __LINE__,msg)
|
|||
|
#else
|
|||
|
#define debug(msg)
|
|||
|
#endif
|
|||
|
|
|||
|
#endif
|
|||
|
```
|
|||
|
|
|||
|
main.c:
|
|||
|
|
|||
|
```c
|
|||
|
#include"debug.h"
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
debug("Debug flag was defined\n");
|
|||
|
printf("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
Let's compile this program and run.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean debug
|
|||
|
$ bin/hw1
|
|||
|
DEBUG: debug.c:main:11 Debug flag was defined
|
|||
|
Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
As you can see all the `__FILE__`, `__FUNCTION__`, and `__LINE__` were
|
|||
|
replaced with the corresponding values for when debug was called in the
|
|||
|
program. Pretty cool, but we can still do even better! Normally when
|
|||
|
we want to print something we use `printf()` and use the format
|
|||
|
specifiers and variable arguments to print useful information. With our
|
|||
|
current setup though we can’t do that. Fortunately for us the
|
|||
|
preprocessor offers up a `__VA_ARGS__` macro which we can use to
|
|||
|
accomplish this.
|
|||
|
|
|||
|
> I want to point out that the syntax for this gets a bit crazy and hard to understand (complex preprocessor stuff is a bit of a black art). I’ll try my best to describe it but you may need to do some more googling if the below explanation is not sufficient.
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include <stdlib.h>
|
|||
|
#include <stdio.h>
|
|||
|
|
|||
|
#ifdef DEBUG
|
|||
|
#define debug(fmt, ...) printf("DEBUG: %s:%s:%d " fmt, __FILE__, __FUNCTION__,__LINE__, ##__VA_ARGS__)
|
|||
|
#else
|
|||
|
#define debug(fmt, ...)
|
|||
|
#endif
|
|||
|
|
|||
|
#endif

/* main.c */
#include "debug.h"
|
|||
|
|
|||
|
int main(int argc, char *argv[]) {
|
|||
|
debug("Program has %d args\n", argc);
|
|||
|
printf("Hello, World!\n");
|
|||
|
return EXIT_SUCCESS;
|
|||
|
}
|
|||
|
```
|
|||
|
|
|||
|
First let's compile and run the program and see the results.
|
|||
|
|
|||
|
```
|
|||
|
$ make clean debug
|
|||
|
$ bin/hw1
|
|||
|
DEBUG: debug.c:main:11 Program has 1 args
|
|||
|
Hello, World!
|
|||
|
$ make clean all
|
|||
|
$ bin/hw1
|
|||
|
Hello, World!
|
|||
|
```
|
|||
|
|
|||
|
The macro works as expected, but let's try to explain it a bit.
|
|||
|
|
|||
|
First we changed the definition of the macro to be `#define debug(fmt, ...)`.
|
|||
|
The first argument `fmt` is the format string that we normally
|
|||
|
define for printf and `...` is the way to declare a macro that accepts a
|
|||
|
variable number of arguments.
|
|||
|
|
|||
|
Next we have `"DEBUG: %s:%s:%d " fmt`. The C compiler can **concatenate string literals** that are next to each other. So if `fmt` was the string `"crazy %d concatenation"` then this expression evaluates to `"DEBUG: %s:%s:%d crazy %d concatenation"`. Then we have our predefined macros, which supply the values for the `"DEBUG: %s:%s:%d "` part, and then we reach the next confusing piece: `, ##__VA_ARGS__`. The macro `__VA_ARGS__` expands into the variable arguments provided to the debug statement, but then we have this crazy `, ##`. This is a GNU extension (supported by GCC and Clang) that removes the preceding comma when no variable arguments are passed to the debug macro, e.g. `debug("I have no varargs")`. If we didn’t do this, the previous debug statement would expand to a `printf` call with a trailing comma and fail to compile.
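
Both tricks can be tried in isolation. The sketch below assumes GCC or Clang, since `, ##__VA_ARGS__` is a GNU extension. The macro `FORMAT_MSG` and the function `format_msg_demo` are hypothetical names; the macro writes into a buffer with `snprintf` so the expanded text can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro with the same structure as debug(): "msg: " fmt
 * relies on adjacent string literals being concatenated, and
 * ", ##__VA_ARGS__" (a GNU extension) drops the comma when the caller
 * passes no variable arguments. */
#define FORMAT_MSG(buf, size, fmt, ...) \
    snprintf((buf), (size), "msg: " fmt, ##__VA_ARGS__)

/* Exercises both forms; returns the length of the no-varargs message. */
int format_msg_demo(void) {
    char buf[64];

    /* One variable argument: expands to snprintf(..., "msg: " "crazy %d concatenation", 3) */
    FORMAT_MSG(buf, sizeof buf, "crazy %d concatenation", 3);
    assert(strcmp(buf, "msg: crazy 3 concatenation") == 0);

    /* No variable arguments: the comma before ##__VA_ARGS__ is removed. */
    int len = FORMAT_MSG(buf, sizeof buf, "I have no varargs");
    assert(strcmp(buf, "msg: I have no varargs") == 0);
    return len;
}
```

Replacing `, ##__VA_ARGS__` with a plain `, __VA_ARGS__` makes the second call expand to `snprintf(..., "msg: " "I have no varargs", )`, which does not compile.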
|
|||
|
|
|||
|
This is one of the many interesting things we can use the C preprocessor for. Lastly, preprocessor macros are textual replacements performed before compilation, which can be dangerous when we are careless about how we use them. For example, it is customary to never end a macro definition with a `;`, since most programmers will put a semicolon after the macro as they would after most statements. Some programmers like to wrap the code in macros with a `do { /* some code here */ } while (0)` loop. They do this because a macro made up of multiple statements then expands to a single statement, so it behaves correctly even as the body of an `if` without braces. Each statement inside the loop still gets its own `;`, and you still terminate the macro with a `;` when you use it, which makes it look like a normal function call in your C code.
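
To see why the `do { ... } while (0)` wrapper matters, consider a macro used as the body of an `if` that has an `else`. This is a minimal sketch; `SWAP_INT` and `sort_pair` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical multi-statement macro, wrapped so that it expands to
 * exactly one statement. Note there is no trailing ';' in the definition. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Puts *x and *y in ascending order; returns 1 if they were swapped. */
int sort_pair(int *x, int *y) {
    assert(x && y);
    if (*x > *y)
        SWAP_INT(*x, *y);  /* one statement, so the else below still pairs with the if */
    else
        return 0;          /* already ordered, nothing swapped */
    return 1;
}
```

If `SWAP_INT` were defined with bare braces `{ ... }` instead, the `;` written after the macro call would end the `if` statement and the `else` would be a syntax error.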
|
|||
|
|
|||
|
Our final product will look like this:
|
|||
|
|
|||
|
```c
|
|||
|
#ifndef DEBUG_H
|
|||
|
#define DEBUG_H
|
|||
|
#include <stdlib.h>
|
|||
|
#include <stdio.h>
|
|||
|
|
|||
|
#ifdef DEBUG
|
|||
|
#define debug(fmt, ...) do { printf("DEBUG: %s:%s:%d " fmt, __FILE__, __FUNCTION__, __LINE__, ##__VA_ARGS__); } while (0)
|
|||
|
#else
|
|||
|
#define debug(fmt, ...)
|
|||
|
#endif
|
|||
|
|
|||
|
#endif
|
|||
|
```
|