Assignment two questions and answers

Here are some questions and answers about assignment two. Suggestions for additions to this list are welcome (via e-mail).

New questions and answers will be added at the end. This means that the questions are not organized into topics. I hope that the list isn't long enough to make this too big a problem, and I think it's more important for people to be able to read only the new stuff, by scrolling down to the end.

Some issues are the same as for assignment one. Questions and answers from assignment one which still apply have been placed into a separate file. Any further such questions will be added at the bottom here, then placed into the merged file only for assignment three... etc.


Consider the file "input", which looks like this (I've added statement numbers):
0x = 3 + 3;
1y = x + 4;
2z = y * x;
3w = 14;
4x = w + 3;
5v = 29 - v;

We see that v is used in statement 5 only; w is used in statements 3 and 4 only; x is used in statement 0, 1, 2, and 4 only; y is used in statement 2 only; and z is used in statement 2 only.

Thus the live-ranges are as follows:
varstatementslive-range
v55-5
w3, 43-4
x0, 1, 2, 40-4
y1, 21-2
z22-2

We go through the line numbers, creating adjacencies:
linelive variablesedges added
0xnone
1x, y{x,y}
2x, y, z{x,z}, {y,z}, and we would add {x,y} except it's already there
3w, x{w,x}
4w, xwe would add {w,x} except it's already there
5vnone

This yields a graph which we could draw like this:

Since x, y, and z form a complete graph of order 3, they must be distinct colours. w can then be the same colour as either z or y, and v can be the same colour as any of them. So we need three memory locations.


There is now a "testgraph.c", available in ~ajr/a2 on CDF or here, which you can link with your graph.c to test it independently from main.c. It does not test the colouring algorithm, just a few things in the vertex and edge management. Make sure that your graph.c compiles with testgraph.c, emalloc.c, and no other .c files, using the original .h files.


Q: Which data representation should I use for the graph in assignment two, linked lists of adjacencies or an adjacency matrix?

A: You should not use linked lists of adjacencies, because this graph will typically be very dense. An adjacency matrix is almost always faster, but may use more memory. However, for a dense graph, the adjacency matrix uses less memory. And is still usually faster (it depends upon the task; it's faster for determining whether there is an edge between two arbitrary vertices).

Someone pointed out an interesting data structure which I hadn't thought of, which is to represent the complement of the graph as adjacency lists. If the graph is very dense, this could take less memory than the adjacency matrix.

But I don't think we have enough information about typical amounts of live-range overlap to be able to have any confidence that the adjacency-list-of-the-complement version will typically take less memory than the adjacency matrix, and it's certainly more complex and slower. I'd say to use the adjacency matrix representation for this assignment. It's easier, too.


Q: Can I add a function to symbol.c and call that function from main.c?

A: No, not if it's not a function specified in the originally-distributed symbol.h. Each of your main.c and symbol.c files must be able to work with other people's files. For example, I should be able to compile the my main.c, my graph.c, and your symbol.c, and get a working program. Or use your graph.c and my main.c. You can't change the specifications stated in the .h files.


Q: I need to add some (private) information to the GRAPH data type. Don't I have to put this in graph.h?

A: No. Graph.h is written at a sufficiently abstract level that it does not depend on the data representation, which is a private matter for graph.c. You can add members to the struct graph inside graph.c without modifying graph.h or any file which calls anything in graph.c.


Q: So we cannot add any "helper functions" to graph.c?

A: You can add internal "helper" functions to any file you are writing. You declare them within the .c file itself with 'extern'. You don't put their declarations in the .h file, and thus other files cannot call them. They are like private methods in java.

(Actually there is a keyword "static", which means two different things in C (they got stingy about allocating new keywords), but in the case of a function declaration it always means that it is not made available to other files. This keyword is quite underused in C and we won't expect you to use it in this course; sooner or later we'll be moving to C++ and using real classes again. However, the sample solution for assignment one will use "static" properly and you can see how it works at that point.)

The only rule is that each file has to be a unit -- self-contained except for the specifications in the supplied .h files. For example, a "helper function" used in graph.c and not declared in any .h file must appear in graph.c. Then your graph.c will work in combination with anyone else's other files.

After this assignment is submitted, someone could choose the best graph.c, the best symbol.c, and the best main.c, perhaps all from different students, and they would work correctly together.


Q: What is the "degree" of a vertex?

A: The degree of a vertex is the number of other vertices which are adjacent to this vertex. This is the same as saying it's the number of edges which mention this vertex. It's a local measure of how complicated the graph is, in a particular sense, and thus relevant to this somewhat heuristic graph-colouring algorithm: If we colour the vertices of higher degree first, we're more likely to end up with a small number of colours (to the extent that the graph is colourable with a small number of colours at all).

The term is defined at the end of the definitions section in the readings, bottom of page 372 in the graph theory section of the booklet.


Q: Can we assume a maximum number of variables so as to declare an array of vertices? Can I put a #define MAXSIZE 100 somewhere?

A: No. The input program may have more variables than this. It must be dynamic. You must make your program work with any number of variables so long as there is enough memory.

The structure of symbol.c makes it quite straightforward to do this dynamically, based on the number of symbols seen in the initial scanning phase. Use malloc() where necessary, as done in new_graph() in graph.c.


Q: Do we have to use something like quicksort for the sort by degree, or is an insertion sort ok?

A: Anything better than a bubblesort is ok, including an insertion sort. On second thought, a bubblesort is ok too, because the rest of the colouring algorithm is O(n3), so even with a bubblesort the whole thing is still O(n3).


Q: My program gives a "segmentation error" (or any other kind of run-time OS error), but I can't figure out where it is happening. What's the best way to track these down?

A: For most investigation into program behaviour, simple printfs scattered through the code, telling you the value of variables at interesting points, gives a lot of information. But specifically for mysterious crashes, you can compile and link with -g, and then use gdb, a debugger. I can't really tell you a lot here about using a debugger, but I wrote out a short example session which might help. The key command is "where" (then "quit").


Q: Help! My program doesn't work!

A:

  1. Think of a particular undesirable behaviour your program exhibits. ("Doesn't work" isn't a good enough starting point for a debugging task.)
  2. Trace your program up to the point of misbehaviour and figure out what the values of variables should be at the various points.
  3. Put in printf statements which show the values of variables at various points so that you can determine at what point the behaviour of your program is diverging from your expectations.
To determine the point of misbehaviour when the evidence of misbehaviour is a run-time exception such as "segmentation exception" or "bus error", use gdb as discussed in the previous question.


Q: Can I define new global variables?

A: You can define further variables at the file level, if needed, but you can't use them from other .c files. Just like "helper functions" -- add what you want to the module itself, but don't change the interface.


Q: So should I add global variables to store the adjacency information in graph.c?

A: No, because then it won't work if you create more than one graph.

Variables defined at the file level, but only used in that one .c file, are like class static variables in java. What you want for storing the adjacency information is like an instance variable in java. To achieve this, add your data into struct graph.

Similarly you can add to any of the other struct type definitions as needed, to hold your private data relating to each instance of that data type.


Q: Is the structure in symbol.c with the buckets basically a two-dimensional array?

A: You could think of it that way and I couldn't say you were wrong, but I don't think that that's the best way to think of it.

It is a set, without regard to order. The hashing algorithm gives you an efficient way of doing lookups.

hash() is a simple and quick function which deterministically produces a number from 0 to HASHSIZE (semi-inclusive, i.e. including 0 but not including HASHSIZE) which is the bucket number for that symbol. This is the standard symbol lookup technique used in compilers. If the number of buckets is close to the number of symbols (or more) and the hashing algorithm is good, you get approximately O(1) lookup time, way better even than a binary search tree.


Q: Speaking of two-dimensional arrays, how do I create a dynamic two-dimensional array for the adjacency matrix? No matter what I try, I get compiler errors.

A: You can't really do dynamic two-dimensional arrays in C, at least not the simple kind.

To do dynamic arrays, we use the fact that all we really need is a pointer to the zeroth item of the array; e.g. if we want an array of ten ints, we can use a value of type pointer-to-int to access the memory area through.

This doesn't work for more dimensions. We can allocate a dynamic number of arrays of five ints (as opposed to a dynamic number of ints), but that "5" is part of the type. The allocation would look like this:

	p = (int (*)[5])malloc(m * sizeof(int[5]));

So there's no way to make that dynamic, because the type has to be fully specified. That is, this pointer-versus-array dance gets us through just one dynamic array dimension.

One good solution is to do the indexing calculations yourself. Instead of trying to use a type like (int(*)[n]), use plain ol' int*:

	int *p;
	...
	p = (int *)malloc(m * n * sizeof(int));

Then instead of p[i][j], simply use p[i * n + j].

That is, it works like this, supposing that m==2 and n==3:
iji*n+j
000
011
022
103
114
125
and all is good because we did indeed allocate 6 ints of space.

There are other possible ways to do this.


Q: What is symbol_set_seen() supposed to do?

A: As already implemented in main.c in the starter code, every time we see a given symbol, we call symbol_set_seen().

For example, if we take the first sample input file, and look at the variable 'x':

        x = 3 + 3;
        y = x + 4;
        z = y * x;
        w = 14;
        x = w + 3;
        v = 29 - v;
We will call symbol_set_seen() many times, but looking only at the calls for the symbol 'x', it will be called with second arguments 0, 1, 2, and 4.

This is the data we need to be able to deduce the live-range of 'x'. The variable 'x' is live from the statements numbered 0 to 4 inclusive.

So, symbol_set_seen() will need to keep track, separately for each symbol, the min and the max of all of the statement number occurrence values seen so far.

Later on, the boolean function live_range_is_overlapping() in symbol.c will be able to use the live-range data (the min and max) to determine whether or not the two argument symbols' live-ranges overlap.


Q: How do I run the program with the file "input" as input?

A: ./a.out <input
The "<" is "i/o redirection". See the Student Guide to CDF, or look up i/o redirection in any reasonable how-to-use-unix book or CSC-209-like textbook.


Q: I don't understand token.c at all.

Reply: You don't have to understand token.c; you should be able to use it without looking at the insides. In fact, the using it is implemented in the starter code too.

Not that I want to discourage you from asking about it. We'll answer questions about token.c just as much as questions about any other aspect of the starter code. But you shouldn't worry about it; it shouldn't interfere with your work on the assignment.

Furthermore, you do have to be able to use code which you don't understand, or don't have source code for, or don't have the time to read. You should be able to use token.c without understanding it.


Q: How do I initialize members of a struct? The following gives a compile error:

struct symbol {
    ...
    int somethingelse = 3;
    ...
};

A: That is because the struct definition defines a type, not an object. There isn't an actual "somethingelse" member to initialize, not yet, until you malloc one of these structs.

So you have to assign to it it after the malloc() in symbol_lookup(). See new_graph() in graph.c for a similar example.


Q: Where do I store the adjacency matrix in graph.c? Do I add a global variable?

A: No, you do not add a global variable; you must have a separate adjacency matrix for each GRAPH which exists in the program. Thus you should add a variable or variables to store the adjacency information into struct graph.


Q: Why does symbol.c #include "token.h"?

A: Because I made a mistake. It shouldn't. (Obviously, we won't complain if your submitted symbol.c still has this #include in it, but I've removed it from my own solution...)


Q: What does that comment about "not distributed with starter code" in symbol.h mean?

A: Also a mistake (left-over from an earlier plan about what portions of the assignment you had to do and what we gave you). Sorry. There is a new version of symbol.h without that silly comment, but either symbol.h will work the same (and you don't submit it, anyway).


[
main course page]