CSC 209 lab 03 exercises, week of 24 May 2022

[solutions are available (requires teach.cs authentication)]

Attendance

Please run "/u/csc209h/summer/present" on the console of a lab workstation, or get the TA to mark you as present.

Part one

1) Using a loop of the form

	for i in *
	do
	    ...
	done

, write a loop to output the names of the files in the current directory which are exactly three lines long (check each file with an 'if').

2) Why do variable interpolations which might have spaces in them need to be quoted? Write a shell script which looks something like this:

	cat "$1"/foo

but try it with and without the quotes, and try it with an argument which is a directory with spaces in the name. (This will take a little work to set up!)
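As a small illustration of what the shell does with an unquoted interpolation, here is a sketch which just counts the arguments a command receives. The function name "count_args" and the directory name 'my dir' are made up for this example, and no such directory needs to exist.

```shell
# Report how many arguments the command actually received.
count_args() {
    echo "$#"
}

dir='my dir'    # hypothetical directory name containing a space

# Unquoted: the shell splits the value at the space, giving "my" and "dir/foo".
unquoted=$(count_args $dir/foo)

# Quoted: the value stays together as the single word "my dir/foo".
quoted=$(count_args "$dir"/foo)

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With cat in place of count_args, the unquoted version would therefore try to open two files, neither of which is the one you meant.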

3) Recall our files /u/csc209h/summer/pub/ex/01/[0-9] for which we found the "odd one out" two weeks ago. One of the possible solutions was to diff them all in a loop.

Write a shell script which reads a bunch of file names on the standard input, one per line, and outputs the names of those which differ from the first file. (That is, we're assuming that the first file is necessarily not the odd one out, just to make this a more manageable exercise.)

Remember: A loop of the form "while read foo" will iterate until end-of-file on stdin. (You will use a more appropriate variable name, or list of variable names.)
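Here is a minimal sketch of such a loop, which only echoes each line back in brackets so that you can see the spaces. It uses "IFS= read -r", a common variation which keeps leading spaces and backslashes exactly as typed; whether you need that variation for this exercise is for you to decide. The function name "show_lines" is made up for the example.

```shell
# Print each input line in brackets so that leading and repeated
# spaces are visible.
show_lines() {
    while IFS= read -r line
    do
        printf '[%s]\n' "$line"
    done
}

printf '%s\n' ' leading space' 'two  spaces' | show_lines
```

Note that "$line" is quoted where it is used; without the quotes, the double space would collapse to a single one.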

Make sure that your program works for file names with spaces in them. I've copied the files into a new directory under /u/csc209h/summer/pub/ex/03 with an awkward name beginning with a space, and you should test your program against that (e.g. "find /u/csc209h/summer/pub/ex/03/\ * -type f -print | sh yourprog"). Make sure that the output contains the correct file path name, including the double space where it occurs.

4) Write a loop which reads all of its standard input, which should be a sequence of integers, one per line, and then outputs the sum. You don't have to deal with incorrect input (although an empty input (no bytes at all) is acceptable and should output zero).

Your loop exits upon end-of-file, so that it is possible to type commands like

	(echo 3; echo 4) | sh yourscript

as well as

	seq 100 | sh yourscript
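As a sketch of the general shape (a related task, not the sum itself), here is a loop which counts its input lines using shell arithmetic. The function name "count_lines" is made up for the example.

```shell
# Count the lines on standard input; an empty input yields 0.
count_lines() {
    n=0
    while read -r line
    do
        n=$((n + 1))
    done
    echo "$n"
}

seq 5 | count_lines
(echo 3; echo 4) | count_lines
```

The variable is initialized before the loop, which is what makes the empty-input case come out as zero rather than as an empty string.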

5) February 2020 had five Saturdays. This is unusual for a February; it can only happen when February 1 occurs on Saturday and the year is a leap year.

Write a short loop in sh using cal to count how many such Februaries there are from the year 2000 to 3000. (Neither 2000 nor 3000 meets this criterion, so it doesn't matter whether the loop is inclusive.)

(Example invocation of "cal" to get the calendar for February 2034:

	cal 2 2034

)

Part two: Exercises for credit

1) A "simple substitution" code is a code in which each letter is always encoded as a particular other letter. For example, perhaps 'a' goes to 'q', and 'b' goes to 'x'.

Simple substitution codes can usually be broken pretty easily. Some software tools can help with the process. In /u/csc209h/summer/pub/lab/03 you will find some encrypted text and some useful tools, some of which you will (re)write today. You probably want to be "cd"'d to that directory for most of the lab time, although of course you can only create files in your own directory.

Let's start with the file "problem1". Look at this file with "cat problem1".

In this encrypted file, an 'a' actually represents 'x'; a 'b' actually represents 'u'; and so on for a list of 26 substitutions. The fact that a given letter always goes to the same other letter no matter where it is in the file is what makes this a "simple" substitution code.

To simplify this lab, we will use lower-case letters only.

Create a file (e.g. in your home directory) whose contents are the two lines

	a x
	b u

This means that 'a' goes to 'x' and 'b' goes to 'u'.

The program "subst2tr" converts a decryption file of this format to appropriate arguments to tr to perform the decryption. Type "./subst2tr file" to see this in action (replacing "file" with the appropriate path name to the file you just created above).
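To give an idea of what such a conversion involves, here is a rough sketch of a stand-in. It is not the actual subst2tr: the name "key2tr" is made up, it reads the key on stdin only, and its output format (the two tr sets separated by a space) is an assumption rather than something taken from the real tool.

```shell
# Hypothetical stand-in for subst2tr: turn "cipher plain" pairs,
# one per line, into two sets suitable as arguments to tr.
key2tr() {
    from=
    to=
    while read -r cipher plain
    do
        from=$from$cipher
        to=$to$plain
    done
    echo "$from $to"
}

printf 'a x\nb u\n' | key2tr
```

For the two-line key file above, this prints "ab xu".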

Using backquotes, make an appropriate tr command by substituting the subst2tr output into the command-line appropriately. Also redirect in the encrypted file "problem1".

[Formulate your command-line before reading my solution below.]

Solution:

	tr `./subst2tr file` <problem1

Make sure you understand this command-line before proceeding.
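If the backquotes are the unclear part, this self-contained sketch may help. Here "echo ab xu" stands in for the subst2tr run (an assumption about the general shape of its output), and the function name "decrypt_demo" is made up.

```shell
# The shell runs the backquoted command first, then splits its output
# into words, so tr ends up with the two arguments "ab" and "xu".
decrypt_demo() {
    tr `echo ab xu`
}

echo 'abba cab' | decrypt_demo
```

Every 'a' becomes 'x' and every 'b' becomes 'u', while the 'c' passes through unchanged, so this prints "xuux cxu".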

Now, those two letters don't constitute very much progress in breaking this particular encryption, and this one is too difficult anyway. So let's skip ahead to the solution, which you will find in problem1.key. Run an appropriate command-line to decrypt "problem1" using "problem1.key" (and also using subst2tr and tr).

Store this command-line (which decrypts problem1 using problem1.key and our tools) in a file named "lab03a". If this is in your home directory, then while cd'd to /u/csc209h/summer/pub/lab/03 you can type "sh ~/lab03a" to run it. (Or equivalently, "sh $HOME/lab03a".) Test it, and 'cat' it to make sure its contents are just the one appropriate line (you will submit this for part of your lab credit).

2) Aliases

Putting that decryption command into a file as above could save some typing.

Alternatively, you can recall previous commands with up-arrow. But you can't easily recall previous commands with up-arrow forever.

Let's look at a more general solution for this sort of thing.

Try typing this command:

        alias go='tr `./subst2tr file` <problem1'

Then instead of saying "tr `./subst2tr file` <problem1", you can just say "go".

Aliases can also be used to add default arguments to certain commands. Consider this:

        alias rm='rm -i'

Now when we type "rm file", it will ask us before deleting it. This alias might be in your account by default, as described next.

Type "alias" (return) to see a list of all existing aliases.

But if you close the terminal window and open a new one, you'll lose your "go" alias. In the current case that's probably just as well, but in general, how do we make aliases "stick"? Answer: Add 'alias' commands into a file which gets executed upon login or when the terminal window is created. bash reads commands from a file ".bashrc" in your home directory when started, so add this alias command there.

In bash there are two files in question, called ".bashrc" and ".bash_login" (also known as ".bash_profile" — either name works). .bash_login is executed when you log in; .bashrc is executed for all other shells. So usually we put the aliases in .bashrc, and arrange for .bashrc also to be run by .bash_login (this is usually in place by default).

However, you must take care not to put any commands in your .bashrc which require interactivity or require a terminal (such as "stty"); those should go in .bash_login. Commands in the .bashrc file are always executed whenever a shell is started; commands in .bash_login are executed only when it thinks it's a "login shell".

(There's nothing to submit for this part of the lab.)

3) Letter-frequency tool

Now let's look at problem2.

The program "bfreq" in this directory tells you byte frequencies. Run "./bfreq problem2" to get a list of byte frequencies in this file; "012" indicates the newline (also known as \012), and the second-listed entry with 21 occurrences is a space.
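For a feel of how such counts can be produced with standard tools, here is a rough sketch. It is not bfreq: the name "letter_freq" is made up, it counts lower-case letters only (no spaces or newlines), and its output format is not meant to match bfreq's.

```shell
# Count occurrences of each lower-case letter on standard input.
# The output is in letter order, one "count letter" pair per line.
letter_freq() {
    tr -cd 'a-z' | fold -w 1 | sort | uniq -c
}

printf 'hello world\n' | letter_freq
```

The sort is needed here because uniq only merges adjacent identical lines.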

These entries are listed in numeric order by character. Pipe this into an appropriate 'sort' command to see the characters ordered by their frequency of occurrence, with the commonest character first. Put that entire command-line (the bfreq piped to the sort) in a file named "lab03b" for later submission. Again, while cd'd to /u/csc209h/summer/pub/lab/03 you should be able to type "sh ~/lab03b" to execute your command-line.

As you see, the commonest letter is 'k'. Could this be code for 'e', the commonest letter in most English-language text? Start a decryption key file and put as the first entry that 'k' goes to 'e'. Call your file "key3" for later submission.

The second-commonest letter in English-language text is usually 't'. But 'v' occurs only slightly less frequently than 'k' in this text. So for us to guess that 'v' represents 't' is getting pretty dicey; this letter-frequency analysis normally works better on larger text samples. But actually, this guess happens to be correct — in this encrypted text, 'v' does indeed represent 't'. So put this into your "key3" file too.

Now try running the decryption, and you get:

	tfe ewhe tfyt zeb ac ehwex yrteu tfez; tfe pcca hx crt hbteuuea dhtf tfehu
	scbex.  xc eet ht se dhtf lyexyu.

Could "tfe" mean "the"? The 't' and 'e' are the results of the decryption, but the 'f' is the original text. So, might 'f' go to 'h'? We notice a second occurrence after the semicolon, so let's try this hypothesis. Add this translation to "key3", so that when you run the decryption again, you see the words "the".

Look also at the second sentence. Many English-language sentences begin with the letter 't', but this sentence apparently does not. However, the commonest letter for the start of a word in English is 's'. Perhaps 'x' goes to 's'. Add this as well.

Spend as much or as little time as you like further decrypting this text; when you've had enough, you can see the answer in problem2.key and proceed to question 4.

4) A key-file-checking tool

In your key3 file, you have two columns of letters. If this is a complete and correct decryption key, we would expect all 26 letters to occur in column 1, with no repeats; and we would expect all 26 letters to occur in column 2, with no repeats.

Of course, your key3 file is probably not complete. Run /u/csc209h/summer/pub/lab/03/checkkey against your key file (your file can be supplied either on stdin or as a command-line argument, in the usual unix tool way). Unsurprisingly, this reports a number of key letters which are missing in column 1. But does it report any other complaints? The checkkey program also checks for duplicate letters in either of the columns.
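The duplicate check can be sketched with standard tools. This is not how checkkey itself is written (I'm not showing its source here); the function name "dup_letters" is made up, and the sketch assumes the two columns are separated by exactly one space, as in the example key file earlier.

```shell
# Print the letters which occur more than once in the given column
# (1 or 2) of a key file read from standard input.
dup_letters() {
    cut -d ' ' -f "$1" | sort | uniq -d
}

printf 'a x\nb u\nc x\n' | dup_letters 2
```

In this made-up key, both 'a' and 'c' go to 'x', so the sketch prints "x" for column 2 and nothing for column 1.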

Run checkkey against the problem1.key file. You might not have noticed these errors! As it so happens, these errors did not affect the output because those letters do not appear in this short encrypted text.

Run checkkey against the problem2.key file and you will see what it looks like when a key file passes the check.

(There's nothing to submit for this part of the lab.)

5) A more substantial decrypting task

In the file "shortstory" you will find an encrypted short story. This is enough text for substantial letter-frequency analysis with "bfreq". The usual letter-frequency order in English-language text is: etaoinshrdlu
(although this might not be perfectly adhered to in a given text sample).

Your task for this final part of the lab is to decrypt this short story and produce a key file for the decryption. Call your key file "key5" (as opposed to "key3" from question 3 above).

To get part marks for this part of the lab, you need to get at least half the letters right, AND your submitted key5 file must pass "checkkey" with no complaints.
To get the full marks for this part of the lab, you need to get all of the letters right, and again your submitted key5 file must pass "checkkey" with no complaints.

To submit

First of all, during the lab time, on the console of a lab workstation, you must run "/u/csc209h/summer/present", or get the TA to mark you as present.

Then, by midnight at the end of Friday May 27, you must submit your files "lab03a" (from part two question 1), "lab03b" (from part two question 3), "key3" (from part two question 3; possibly very incomplete but should be correctly formatted), and "key5" (from part two question 5).

Submit these files with the command

	submit -c csc209h -a lab03 lab03a lab03b key3 key5

Note in this command that the first "lab03" is the "assignment" name you are submitting under, and the other arguments are the files to submit.
You may still change your files and resubmit them any time up to the due time, using "-f" as described towards the end of lab one; and you can use "submit -l" as also described in lab one.