Homework 2 Debugging and Fixing - CSE 320 - Spring 2022
Professor Eugene Stark
Due Date: Friday 3/4/2022 @ 11:59pm
Introduction
In this assignment you are tasked with updating an old piece of software, making sure it compiles, and that it works properly in your VM environment.
Maintaining old code is a chore and an often hated part of software engineering. It is definitely one of the aspects which are seldom discussed or thought about by aspiring computer science students. However, it is prevalent throughout industry and a worthwhile skill to learn. Of course, this homework will not give you a remotely realistic experience in maintaining legacy code or code left behind by previous engineers but it still provides a small taste of what the experience may be like. You are to take on the role of an engineer whose supervisor has asked you to correct all the errors in the program, plus add additional functionality.
By completing this homework you should become more familiar with the C programming language and develop an understanding of:
- How to use tools such as gdb and valgrind for debugging C code.
- Modifying existing C code.
- C memory management and pointers.
- Working with files and the C standard I/O library.
The Existing Program
Your goal will be to debug and extend an old program called par,
which was written by Adam M. Costello and posted to Usenet in 1993.
I have rearranged the original source code and re-written the Makefile
to conform to the format we are using for the assignments in this course.
Besides a bug that was present in the original version, I have introduced
a few additional bugs here and there to make things more interesting
and educational for you 😉.
Although you will need to correct these bugs in order to make the program
function, they do not otherwise change the program behavior from what
the author intended.
The par program is a simple paragraph reformatter. It is basically
designed to read text from the standard input, parse the text into
paragraphs, which are delimited by empty lines, chop each paragraph up
into a sequence of words (forgetting about the original line breaks),
choose new line breaks to optimize some criteria that are designed to
produce a pleasing result, and then finally output the paragraph with
the new line breaks. There are several parameters that can be set
which affect the result: the width of the output text, the length of
a "prefix" and a "suffix" to be prepended and appended to each output line,
a parameter "hang", which affects the default value of "prefix", and
a boolean parameter "last", which affects the way the last line of a
paragraph is treated.
What you have to do is to first get the program to compile (for the most part, I did not modify the original code, which requires some changes for it to compile cleanly with the compiler and settings we are using). Then, you need to test the program and find and fix the bugs that prevent it from functioning properly. Some of the bugs existed in the original version and some I introduced for the purposes of this assignment. Finally, you will make some modifications to the program.
As you work on the program, limit the changes you make to the minimum necessary
to achieve the specified objectives. Don't rewrite the program;
assume that it is essentially correct and just fix a few compilation errors and
bugs as described below. You will likely find it helpful to use git for this (I did).
Make exploratory changes first on a side branch (i.e. not the master branch),
then when you think you have understood the proper changes that need to be made,
go back and apply those changes to the master branch. Using git will help you
to back up if you make changes that mess something up.
Getting Started - Obtain the Base Code
Fetch base code for hw2 as you did for the previous assignments.
You can find it at this link:
https://gitlab02.cs.stonybrook.edu/cse320/hw2.
Once again, to avoid a merge conflict with respect to the file .gitlab-ci.yml,
use the following command to merge the commits:
git merge -m "Merging HW2_CODE" HW2_CODE/master --strategy-option=theirs
🤓 I hope that by now you would have read some git documentation to find out what --strategy-option=theirs does, but in case you didn't 😠 I will say that merging in git applies a "strategy" (the default strategy is called "recursive", I believe) and --strategy-option allows an option to be passed to the strategy to modify its behavior. In this case, theirs means that whenever a conflict is found, the version of the file from the branch being merged (in this case HW2_CODE/master) is to be used in place of the version from the currently checked-out branch. An alternative to theirs is ours, which makes the opposite choice. If you don't specify one of these options, git will leave conflict indications in the file itself and it will be necessary for you to edit the file and choose the code you want to use for each of the indicated conflicts.
Here is the structure of the base code:
.
├── .gitlab-ci.yml
└── hw2
    ├── doc
    │   ├── par.1
    │   ├── par.doc
    │   └── protoMakefile
    ├── hw2.sublime-project
    ├── include
    │   ├── buffer.h
    │   ├── debug.h
    │   ├── errmsg.h
    │   └── reformat.h
    ├── Makefile
    ├── rsrc
    │   ├── banner.txt
    │   ├── gettysburg.txt
    │   └── loremipsum.txt
    ├── src
    │   ├── buffer.c
    │   ├── errmsg.c
    │   ├── main.c
    │   ├── par.c
    │   └── reformat.c
    ├── test_output
    │   └── .git-keep
    └── tests
        ├── basecode_tests.c
        ├── rsrc
        │   ├── banner.txt
        │   ├── basic.in -> gettysburg.txt
        │   ├── basic.out
        │   ├── blank_lines.txt
        │   ├── EOF.in
        │   ├── EOF.out
        │   ├── gettysburg.txt
        │   ├── loremipsum.txt
        │   ├── prefix_suffix.in -> banner.txt
        │   ├── prefix_suffix.out
        │   ├── valgrind_leak.in -> gettysburg.txt
        │   ├── valgrind_leak.out
        │   ├── valgrind_uninitialized.err
        │   ├── valgrind_uninitialized.in -> loremipsum.txt
        │   └── valgrind_uninitialized.out
        ├── test_common.c
        └── test_common.h
The src directory contains C source code files buffer.c, par.c, reformat.c,
and errmsg.c, which were part of the original code. In addition, I have added
a new file main.c, with a single main() function that simply calls
original_main() in par.c. This is to satisfy our requirement (for Criterion)
that main() is the only function in main.c.
The include directory contains C header files buffer.h, reformat.h, and
errmsg.h, which were part of the original source code. I have also added our
debug.h header file which may be of use to you.
The doc directory contains documentation files that were part of the original
distribution of par. The file par.1 is in the format traditionally used
for Unix manual pages. This file is intended to be processed with
the formatting program nroff with argument -man; for example:
nroff -man doc/par.1 | less
could be used to format and view its contents.
The tests directory contains C source code (in file basecode_tests.c) for some Criterion
tests that can help guide you toward bugs in the program. These are not guaranteed
to be complete or exhaustive. The files test_common.c and test_common.h contain auxiliary code
used by the tests. The subdirectory tests/rsrc contains input files and reference output files
that are used by the tests.
The par program was not designed to be particularly conducive to unit testing,
so all the tests we will make (including the tests used in grading) will be so-called
"black box" tests, which test the input-output behavior of the program running as a
separate process from the test driver.
The test_common.c file contains helper functions for launching an instance of par
as a separate process, redirecting stdin from an input file, collecting the
output produced on stdout and stderr, checking the exit status of the program,
and comparing the output against reference output.
The test_output directory is a "dummy" directory which is used to hold the output
produced when you run the Criterion tests. Look there if you want to understand,
for example, why the tests reported that the output produced by your program was
not as expected.
Before you begin work on this assignment, you should read the rest of this
document. In addition, we advise you to read the
Debugging Document. One of the main goals of this assignment
is to get you to learn how to use the gdb debugger, so you should right away
be looking into how to use this while working on the tasks in the following sections.
Part 1: Debugging and Fixing
You are to complete the following steps:
- Clean up the code, fixing any compilation issues, so that it compiles without error using the compiler options that have been set for you in the Makefile. Use git to keep track of the changes you make and the reasons for them, so that you can later review what you have done and also so that you can revert any changes you made that don't turn out to be a good idea in the end.
- Fix bugs.
  Run the program, exercising the various options, and look for cases in which the program crashes or otherwise misbehaves in an obvious way. We are only interested in obvious misbehavior here; don't agonize over program behavior that might just have been the choice of the original author. You should use the provided Criterion tests to help point the way, though they are not guaranteed to be exhaustive.
- Use valgrind to identify any memory leaks or other memory access errors. Fix any errors you find.
  Run valgrind using a command of the following form:
  $ valgrind --leak-check=full --show-leak-kinds=all --undef-value-errors=yes [PAR PROGRAM AND ARGS]
  Note that the bugs that are present will all manifest themselves in some way either as incorrect output, program crashes or as memory errors that can be detected by valgrind. It is not necessary to go hunting for obscure issues with the program output. Also, do not make gratuitous changes to the program output, as this will interfere with our ability to test your code.
  😱 The author of this program was pretty fastidious about freeing memory before exiting the program. Once you have fixed the bugs, the program should exit without any type of memory leak reported by valgrind, including memory that is "still reachable" at the time of exit. "Still reachable" memory corresponds to memory that is in use when the program exits and can still be reached by following pointers from variables in the program. Although some people consider it to be untidy for a program to exit with "still reachable" memory, it doesn't cause any particular problem. For the present program, however, there should not be any "still reachable" memory.
  😱 You are NOT allowed to share or post on PIAZZA solutions to the bugs in this program, as this defeats the point of the assignment. You may provide small hints in the right direction, but nothing more.
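To illustrate the difference between "still reachable" memory and an outright leak as described in the valgrind step above, here is a small sketch that is unrelated to the par source. The names kept, make_reachable, make_lost, and release_reachable are invented for this illustration. When compiled without optimization and run under valgrind with the options shown above, a block like the one from make_reachable() would be expected to show up as "still reachable" if it is never released, and the one from make_lost() as "definitely lost".

```c
#include <stdlib.h>
#include <string.h>

/* File-scope pointer: whatever it references stays reachable until exit. */
static char *kept = NULL;

/* Allocates a block that would be "still reachable" if the program exited
 * now: it has not been freed, but 'kept' still points to it. */
char *make_reachable(void)
{
    kept = malloc(16);
    if (kept != NULL)
        strcpy(kept, "reachable");
    return kept;
}

/* Allocates a block that would be "definitely lost": the only pointer to
 * it is a local variable that disappears when the function returns. */
void make_lost(void)
{
    char *p = malloc(16);
    if (p != NULL)
        strcpy(p, "lost");
}

/* Frees the reachable block so that nothing is left allocated at exit. */
void release_reachable(void)
{
    free(kept);
    kept = NULL;
}
```

Calling release_reachable() before exiting is the kind of cleanup the note above asks for; the block from make_lost() cannot be cleaned up at all, because no pointer to it survives.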
Part 2: Changes to the Program
Rewrite/Extend Options Processing
The basecode version of par performs its own ad hoc processing of command-line options.
This is likely due to the fact that there did not exist a commonly accepted library
package for performing this function at the time the program was written.
However, as options processing is a common function that is performed by most programs,
and it is desirable for programs on the same system to be consistent in how they interpret
their arguments, more elaborate standardized libraries have since been written
for this purpose. In particular, the POSIX standard specifies a getopt() function,
which you can read about by typing man 3 getopt. A significant advantage to using a
standard library function like getopt() for processing command-line arguments,
rather than implementing ad hoc code to do it, is that all programs that use
the standard function will perform argument processing in the same way
rather than having each program implement its own quirks that the user has to remember.
For this part of the assignment, you are to replace the original argument-processing
code in main() with code that uses the GNU getopt library package.
In addition to the POSIX standard getopt() function, the GNU getopt package
provides a function getopt_long() that understands "long forms" of option
arguments in addition to the traditional single-letter options.
In your revised program, main() should use getopt_long() to traverse the
command-line arguments, and it should support the following option syntax
(in place of what was originally used by the program):
- --version (long form only): Print the version number of the program.
- -w WIDTH (short form) or --width WIDTH (long form): Set the output paragraph width to WIDTH.
- -p PREFIX (short form) or --prefix PREFIX (long form): Set the value of the "prefix" parameter to PREFIX.
- -s SUFFIX (short form) or --suffix SUFFIX (long form): Set the value of the "suffix" parameter to SUFFIX.
- -h HANG (short form) or --hang HANG (long form): Set the value of the "hang" parameter to HANG.
- -l LAST (short form) or either --last or --no-last (long form): Set the value of the boolean "last" parameter. For the short form, the values allowed for LAST should be either 0 or 1.
- -m MIN (short form) or either --min or --no-min (long form): Set the value of the boolean "min" parameter. For the short form, the values allowed for MIN should be either 0 or 1.
You will probably need to read the Linux "man page" on the getopt package.
This can be accessed via the command man 3 getopt. If you need further information,
search for "GNU getopt documentation" on the Web.
😱 You MUST use the getopt_long() function to process the command line arguments passed to the program. Your program should be able to handle cases where the (non-positional) flags are passed IN ANY order. Make sure that you test the program with prefixes of the long option names, as well as the full names.
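As a rough illustration of how the option syntax listed above might map onto getopt_long(), here is a minimal sketch. The struct par_options type, the parse_options() function, and the OPT_* codes are invented for this example and are not part of the base code; the real program keeps its settings in its own variables, and how parsing failures should feed into the program's error reporting is left open here. Note that atoi() performs no validation, so a real implementation would still need to reject malformed numbers and values other than 0 or 1 for -l and -m.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the parsed settings (not part of the base code). */
struct par_options {
    int width, prefix, suffix, hang, last, min;
    int show_version;
};

/* Walk argv with getopt_long() and store what was found in *opts.
 * Returns 0 on success, -1 if an option was unrecognized, ambiguous,
 * or missing its argument. */
int parse_options(int argc, char **argv, struct par_options *opts)
{
    /* Codes outside the char range for options that have no short form. */
    enum { OPT_VERSION = 256, OPT_LAST, OPT_NO_LAST, OPT_MIN, OPT_NO_MIN };

    static const struct option long_opts[] = {
        {"version", no_argument,       NULL, OPT_VERSION},
        {"width",   required_argument, NULL, 'w'},
        {"prefix",  required_argument, NULL, 'p'},
        {"suffix",  required_argument, NULL, 's'},
        {"hang",    required_argument, NULL, 'h'},
        {"last",    no_argument,       NULL, OPT_LAST},
        {"no-last", no_argument,       NULL, OPT_NO_LAST},
        {"min",     no_argument,       NULL, OPT_MIN},
        {"no-min",  no_argument,       NULL, OPT_NO_MIN},
        {NULL, 0, NULL, 0}
    };
    int c;

    while ((c = getopt_long(argc, argv, "w:p:s:h:l:m:", long_opts, NULL)) != -1) {
        switch (c) {
        case OPT_VERSION: opts->show_version = 1;           break;
        case 'w':         opts->width  = atoi(optarg);      break;
        case 'p':         opts->prefix = atoi(optarg);      break;
        case 's':         opts->suffix = atoi(optarg);      break;
        case 'h':         opts->hang   = atoi(optarg);      break;
        case 'l':         opts->last   = atoi(optarg) != 0; break;
        case 'm':         opts->min    = atoi(optarg) != 0; break;
        case OPT_LAST:    opts->last = 1;                   break;
        case OPT_NO_LAST: opts->last = 0;                   break;
        case OPT_MIN:     opts->min  = 1;                   break;
        case OPT_NO_MIN:  opts->min  = 0;                   break;
        default:          return -1; /* getopt_long() already printed a diagnostic */
        }
    }
    return 0;
}
```

Because getopt_long() itself accepts any unambiguous prefix of a long option name, an invocation such as par --pre 2 is handled by the same table entry as --prefix 2, while an ambiguous prefix such as --no is reported as an error.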
Revise the Error Message Scheme
The original program uses a very ad hoc scheme for error-message reporting:
if an error occurs, a string describing the error is stored into a global
character array errmsg with a hard-coded maximum size. (This hard-coded
size also appears in the fprintf() format string in par.c,
which creates undesirable implicit coupling between par.c and errmsg.c.)
At various points in the program, the existence of an error condition is checked
by looking to see if the first character of the error message string is a null
character '\0'. Before the program terminates, if an error message exists,
then it is printed and the program exits with an error status, otherwise it exits
with a success indication.
Your job is to revise the error message scheme to make it somewhat more general
and to eliminate the hard-coded limitation on the length of an error message.
In particular, you should replace the interface defined in errmsg.h by the
following function prototypes (exactly as shown):
/**
* @brief Set an error indication, with a specified error message.
* @param msg Pointer to the error message. The string passed by the caller
* will be copied.
*/
void set_error(char *msg);
/**
* @brief Test whether there is currently an error indication.
* @return 1 if an error indication currently exists, 0 otherwise.
*/
int is_error();
/**
* @brief Issue any existing error message to the specified output stream.
* @param file Stream to which the error message is to be issued.
* @return 0 if either there was no existing error message, or else there
* was an existing error message and it was successfully output.
* Return non-zero if the attempt to output an existing error message
* failed.
*/
int report_error(FILE *file);
/**
* Clear any existing error indication and free storage occupied by
* any existing error message.
*/
void clear_error();
The global array errmsg should be removed from errmsg.h and replaced
by a pointer variable declared as static char * in errmsg.c.
The functions whose prototypes are given above should be implemented so
that there is no fixed maximum imposed on the length of an error message.
This means that error messages should be dynamically allocated on the
heap (for example, using strdup()). The implementation should take care
not to leak any memory used for error messages; for example if a new error
message is set when one already exists. Before exiting, the program should
call clear_error() to cause any existing error message to be freed.
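One possible shape for such an implementation is sketched below. It is only an illustration under a few assumptions that the text above does not settle: a NULL pointer is taken to mean "no error", a failed strdup() simply leaves no error set, and report_error() writes the message as-is without adding a newline. The static variable name errmsg is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Current error message, or NULL when no error is set (assumed convention). */
static char *errmsg = NULL;

void set_error(char *msg)
{
    /* Copy first, then free the old message, so that nothing leaks and a
     * caller passing the current message back in is still handled safely. */
    char *copy = (msg != NULL) ? strdup(msg) : NULL;
    free(errmsg);
    errmsg = copy;
}

int is_error()
{
    return errmsg != NULL;
}

int report_error(FILE *file)
{
    if (errmsg == NULL)
        return 0;                       /* nothing to report */
    return fputs(errmsg, file) == EOF;  /* non-zero if the write failed */
}

void clear_error()
{
    free(errmsg);
    errmsg = NULL;
}
```

With this arrangement, calling set_error() twice in a row frees the first message before storing the second, and a final clear_error() leaves nothing allocated for valgrind to report.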
Part 3: Testing the Program
For this assignment, you have been provided with a basic set of Criterion tests to help you debug the program.
In the tests/basecode_tests.c file, there are five test examples.
You can run these with the following command:
$ bin/par_tests
To obtain more information about each test run, you can supply the
additional option --verbose=1.
You can also specify the option -j1 to cause the tests to be run sequentially,
rather than in parallel using multiple processes, as is the default.
The -j1 flag is necessary if the tests could interfere with each other in
some way if they are run in parallel (such as writing the same output file).
You will probably find it useful to know this; however the basecode tests have
been written so that they each use output files named after the test and
(hopefully) will not interfere with each other.
The tests have been constructed so that they will point you at most of the
problems with the program.
Each test has one or more assertions to make sure that the code functions
properly. If there was a problem before an assertion, such as a "segfault",
the test will print the error to the screen and continue to run the
rest of the tests.
The basecode test cases check the program operation by reading input from
a pre-defined input file, redirecting stdout and stderr to output files,
and comparing the output produced against pre-defined reference files.
Some of the tests use valgrind to verify that no memory errors are found.
If errors are found, then you can look at the log file that is left behind
(in the test_output directory) by the test code.
Alternatively, you can better control the information that valgrind provides
if you run it manually.
The tests included in the base code are not true "unit tests", because they all
run the program as a black box using system().
You should be able to follow the pattern to construct some additional tests of
your own, and you might find this helpful while working on the program.
You are encouraged to try to write some of these tests so that you learn how
to do it. Note that in the next homework assignment unit tests will likely
be very helpful to you and you will be required to write some of your own.
Criterion documentation for writing your own tests can be found
here.
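The black-box pattern described above can be sketched in plain C as follows. This is not the code in test_common.c; the helper names run_command() and par_output_matches() are invented here, and the executable path bin/par and the use of the cmp utility for the comparison are assumptions made for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command with system(); return its exit status, or -1 if the
 * command could not be started or the shell did not exit normally. */
int run_command(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical helper: run par with the given arguments on an input file,
 * capture stdout in 'out', and compare it with the reference file 'ref'.
 * Returns 0 if the program exited with status 0 and the files match. */
int par_output_matches(const char *args, const char *in,
                       const char *out, const char *ref)
{
    char cmd[1024];
    int n;

    /* "bin/par" is an assumed location for the executable. */
    n = snprintf(cmd, sizeof cmd, "bin/par %s < %s > %s", args, in, out);
    if (n < 0 || (size_t)n >= sizeof cmd || run_command(cmd) != 0)
        return -1;

    /* cmp -s exits with 0 when the two files are identical. */
    n = snprintf(cmd, sizeof cmd, "cmp -s %s %s", out, ref);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    return run_command(cmd);
}
```

Inside a Criterion test, a call such as par_output_matches("", "tests/rsrc/basic.in", "test_output/basic.out", "tests/rsrc/basic.out") could then be checked with an assertion like cr_assert_eq(..., 0), with each test writing to its own output file so that parallel runs do not interfere.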
😱 Be sure that you test non-default program options to make sure that the program does not crash or otherwise misbehave when they are used.
Hand-in Instructions
Ensure that all files you expect to be on your remote repository are committed and pushed prior to submission.
This homework's tag is: hw2
$ git submit hw2