Sunday 28 April 2013

Completely Reliable Conclusions

(This post is another follow-up to Learning to Program, which in turn was a kind of follow-up to Not a Tool, but a Philosophy of Knowledge.)

I've been reading A Revolution in Mathematics? What Really Happened a Century Ago and Why It Matters Today (Notices of the AMS, vol.59 no.1, Jan 2012, pp31-37) and a stack of other papers by mathematician Frank Quinn. He describes how, since the early twentieth century, professional mathematicians have worked in a particular way, the way they found most effective. Shockingly, Quinn points out that this is not how students are taught to do mathematics (in the USA) at any level up to and including a large part of undergraduate material. Instead, because of a misunderstanding by maths educators, students are taught outdated and inferior methods from the nineteenth century. This difference, although it seems both obvious and remarkable once pointed out, has remained unnoticed by mathematicians and educators until very recently.

Quinn goes on to argue that this unnoticed difference could to a large extent explain the poor and declining results of mathematics education in the USA. (And to the extent that other countries use the same poor methods, we should expect the same poor results elsewhere too.) The problem has become worse in recent years, says Quinn, largely because those in control of maths education believe that their methods are just fine, and believe that success will come with more vigorous application of the same poor methods. So rather than change these methods, the authorities have successfully championed "more of the same", leading to a "death spiral" from which escape appears uncertain.

I find Frank Quinn's ideas very convincing. I've done enough maths tuition that from personal experience I recognise some of the concrete problems of student understanding that he describes. It's rather frightening to see these problems explained as merely symptoms of a larger systemic problem. I think Quinn's ideas are crucial for the future of maths teaching, but I believe there are lessons here for us too, as we consider how best to teach programming. It's possible that we too have been going about things the wrong way, misunderstanding the fundamentals, and that the problems we face in teaching programming are exactly the same problems that Quinn describes. If so, the good news is that we can immediately draw on Quinn's experience and suggestions of how to do it better. I'll return to this thought later, but first let's examine more closely what he says about the difference between nineteenth and twentieth century methods in mathematics.

Up until the late nineteenth century, a mathematical proof depended to a large extent on physical intuitions and on understanding what a mathematical model "really" meant:

The conventional wisdom is that mathematics has always depended on error-free logical argument, but this is not completely true. It is quite easy to make mistakes with infinitesimals, infinite series, continuity, differentiability, and so forth, and even possible to get erroneous conclusions about triangles in Euclidean geometry. When intuitive formulations are used, there are no reliable rule-based ways to see these are wrong, so in practice ambiguity and mistakes used to be resolved with external criteria, including testing against accepted conclusions, feedback from authorities, and comparison with physical reality. (See A Revolution in Mathematics?, p31.)

The great revolution in mathematics, which took place from about 1890 to 1930, was to replace intuitive concepts and intuitive proofs with systems of explicit rules, like the rules of a game, without worrying about what they "really" meant. To some people, including philosophers and elite mathematicians with excellent intuition, this seemed like a loss, but it brought with it an amazing benefit: arguments based on consistent application of these rules led to completely reliable conclusions. This meant that mathematicians could build a much more complex edifice of rules and proofs than would ever have been possible in the nineteenth century, safe in the confidence that however complex, it was all still correct. The new methods also opened up mathematics research to ordinary mathematicians, not just to super-stars with extraordinary intuition. Of course, constructing the required proofs still required imagination and experience, but there was now a systematic way to proceed when you got stuck:

When someone reaches his personal limits of heuristic reasoning and intuition, the reasons for failure are obscure and there is not much that can be done about it. This is why advanced mathematics was limited to a few extraordinary people up through the nineteenth century, and why students feel stupid when they reach their limits today. The great discovery of the early twentieth century was that basing mathematics on disciplined reasoning rather than intuition makes it accessible to ordinary people. When people reach the limits of good basic logical skills then the failures are localized and can usually be identified and fixed. There is a clear, though disciplined and rigorous, way forward. Experts do eventually develop powerful intuitions, but these can now be seen as a battery, charged by thousands of hours of disciplined reasoning and refinement. (See Reform Mathematics Education ..., p11.)

As programmers, we can recognise that debugging a proof under the twentieth century mathematical regime is very much like debugging a program: "failures are localized" and so, with disciplined reasoning, "they can usually be identified and fixed". Twentieth century methods are "error-displaying", in the sense that if a mathematical argument produces a false conclusion, then it will be possible to find the error, because the error will be in the mis-application of some rule. Mistakes happen; potential proofs usually have errors, in the same way that programs, when first written, usually have bugs. But if the steps of a proof are set out in precise detail (similar to the precise detail you need in a computer program) then you will always be able to find exactly where a rule was broken. This in turn will often suggest an alternative approach or a fix that will mend the proof.

(And of course, another link with computing is that it's one thing to define a system of rules, it's another thing to be sure of applying them completely reliably, with no mistakes. How can you construct social and physical systems which guarantee this, given only fallible humans and unreliable physical artifacts? Nothing in the real world is perfect. What custom and technique could in practice guarantee complete reliability? Mathematicians have worked out their own answers to that question, but that question is exactly the concern of computer science! That question calls forth all of computer science from low-level machine design, through compilers, operating systems, programming and interaction design, to psychology and organisation theory. It is scarcely co-incidence that the mathematicians John von Neumann and Alan Turing were computer pioneers. They wanted to see their method embodied in machinery.)

So, standard practice in modern "core" mathematics revolves around systems of formal rules and the completely reliable conclusions that people can draw from them. What about mathematics education? In what sense is it still using nineteenth century methods and why exactly is that bad?

With nineteenth century methods, intuition is key, and mathematics education has concentrated on intuitive understanding first and skill at applying formal rules second (or never). One problem with this is that correct intuition in mathematics comes from working with rules and internalising them. Trying to hand students an intuition first is very dangerous, because they will often get the wrong intuition, or just be confused. And confusion is perhaps the better outcome, because intuitions once gained can be very hard to change. Quinn cites Do naïve theories ever go away? by Dunbar et al. (in Thinking with Data: 33rd Carnegie Symposium on Cognition, Ed. Lovett and Shah, 2007). The problem is that subsequent learning doesn't seem to actually correct an earlier misunderstanding; it only modifies the misunderstanding: "even when conceptual change appears to have taken place, students still have access to the old naïve theories and ... these theories appear to be actively inhibited rather than reorganized and absorbed into the new theory" (Dunbar et al., 2007, p.202). A bad intuition is dangerous because it is so sticky and difficult to change.

In the USA, with its popular "reform math" curriculum, this emphasis on "understanding" has gone hand-in-hand with a drastic decline in the practical abilities demanded of students at every level. Expertise in disciplined reasoning, with careful step-by-step application of formal rules, is seen by the nineteenth century educational authorities as at best a secondary concern. Since it is unimportant, the authorities think it can be dropped with no loss. But this is a mistake which has dire consequences. Quinn comments that, as a university maths lecturer, he is responsible for developing the math skills needed by engineering students. However:

Our goals for student learning are set by what it takes to deal effectively with the real world, and can't be redefined. The problem is that, as compared with fixed real-world goals, useful preparation of incoming students has been declining for thirty years. The decline accelerated 10-15 years ago and the bottom has almost dropped out in the last five years. (See Reform Mathematics Education ..., p3, written in 2012.)

Since rigorous thinking is not emphasised by the educational authorities, it is scarcely surprising that students have internalised the principle that precision is not important. Quinn gives an example of how this typically plays out on the ground:

Recently a student came to try to get more partial credit. He had put a plus instead of a comma in an expression, turning it from a vector to a real number. "But there was an example that looked like this, and anyway it is only one symbol and almost all the others are right." He had never heard the words 'conceptual' and 'error' used together; it made no sense to him and he would not accept it as a justification for a bad grade. (See Reform Mathematics Education ..., p4.)

What, you might ask, has this got to do with learning to program computers? Well, first of all, these are the same people we are trying to teach how to program! Quinn's example is eerily similar to the experiences I've had with some programming students, people who seem unable to comprehend that complete precision is necessary. The gap in understanding is so wide that in many cases it seems impossible to cross. You can think that you have communicated, but then you look at what they are doing later and you see that they didn't get it at all. They are not working precisely to find their error. They are still just superstitiously shuffling the symbols in their program, hoping that they will hit the jackpot and their program will appear to work.

What can we do? Maybe, if Quinn is right, not much. Maybe for them the race is lost. Maybe we would have had to start a decade or more earlier, with very different maths teaching in school. And so, if we really think "everyone should learn to code", maybe that's where we should start today, before it's too late.

Secondly, in programming education we may by chance have thrown the baby out with the bathwater, just as the reform maths educators did with calculators in schools. In the old days, before calculators, children had to learn to do long multiplication, for example 431 × 27, using pencil and paper. This was a bit of a chore, now largely dropped in favour of tapping the problem into a calculator, which gets the answer more reliably. However, it turns out that the children were learning a lot more by accident than how to multiply long numbers. They were learning to set out their working in an error-displaying form, essentially a twentieth century proof that their answer was correct. They had to be absolutely precise in their working, but if they made a mistake, they could check their working, find the mistake and correct it. Their teachers could periodically take in this work and confirm that their pupils were working accurately and in a proper error-displaying way. Not only all that, but children were intuitively learning something about the mathematical structure of numbers by working with them, so that when they came to polynomials they found that working out (4x² + 3x + 1) × (2x + 7) was not much different to working out 431 × 27. (In fact it's a bit simpler, because there are fewer carries.) To someone with a calculator, they are entirely different.
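
To make the parallel concrete, here is a small Python sketch (my own illustration, not anything from the sources above): the same column-by-column routine multiplies both the digits of 431 × 27 and the coefficients of (4x² + 3x + 1) × (2x + 7); the only extra step for the integers is the carrying, which amounts to evaluating the columns at x = 10.

    # A sketch of the parallel drawn above (my own example, not from the post):
    # long multiplication and polynomial multiplication are the same
    # column-by-column procedure; only the integer case needs carries.

    def multiply_columns(a, b):
        """Multiply two coefficient lists, lowest power / least significant digit first."""
        columns = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                columns[i + j] += ai * bj
        return columns

    # (4x^2 + 3x + 1) × (2x + 7): the column sums are the answer, no carries needed.
    print(multiply_columns([1, 3, 4], [7, 2]))   # [7, 23, 34, 8]  ->  8x^3 + 34x^2 + 23x + 7

    # 431 × 27: exactly the same column sums; substituting x = 10 does the carrying.
    columns = multiply_columns([1, 3, 4], [7, 2])
    print(sum(c * 10 ** place for place, c in enumerate(columns)))   # 11637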

I wonder if, in the way we try to teach programming nowadays, we may have fallen into some similar traps, by not realising what students accidentally learned in the past. For example — and here's a heretical thought — are languages with GOTO really as bad as we imagine for novice programmers? Dijkstra claimed that "It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration". But don't forget this was just a claim: where's the evidence? Dijkstra himself obviously first learned to program in a language with only GOTOs, as I did, and I'm fairly happy that it did us no lasting damage. In fact I think it forced us to think about our code and to work with it in a particular detailed way, and this earlier practice may have served us well later, even when we programmed in structured languages.

The question of whether structured languages are better later, for expert coding, is not the point. Clearly for doing sums, using a calculator is better later than using pencil and paper. But for the reasons outlined above, it's not better earlier. The question in either case should not be just "can students do the old problems more reliably?". Of course students with calculators get the answer more reliably than students without, but they don't learn those other early skills which support future learning. Could this, perhaps, be the case with us too? (Although this view is far from mainstream, I am relieved that I'm not the only person to suggest it. See Where Dijkstra went wrong: the value of BASIC as a first programming language.)

And finally, whether we are computer scientists promoting "computational thinking" or core mathematicians promoting twentieth century methods, it strikes me that we actually want the same thing. Perhaps we should make common cause. The easiest way to save core mathematics might be through computer science. After all, since the mathematicians did us the favour of founding our discipline, the least we could do in return would be to help them save theirs.

(There's lots more interesting stuff on Frank Quinn's education webpage. Well worth a read.)

Sunday 21 April 2013

Three Good Questions

Good questions are important. It's often fairly straightforward to find a good answer, if you're given a good question. It's a lot harder to find a good question, given nothing. Over years of doing system design, I found three "meta" questions particularly useful. These have sometimes helped me to imagine things that might go wrong during the construction or operation of a system. Maybe they could help you too.

I was reminded of these "meta" questions by an email from Aaron Sloman following my post on Learning to Program. He said:

"I could have added that I noticed something different when teaching philosophy. Doing conceptual analysis well requires the ability to dredge out of your vast store of knowledge examples that illustrate, test or refute, some concept, distinction, or thesis.

"I found over many years of teaching that I could teach some students to produce examples merely by doing it myself in discussions: they somehow got the idea and were able to do something similar. With other students they seemed to lack the ability to access their long-term memory in an appropriate way, and I was not able to think up a way of teaching them that helped them to develop that ability. I suspect this is related to the ability to do well in the Jeopardy game, but is more complicated, and less a matter of speed.

"More importantly it may also be related to some high level requirements for being a good system designer: too often systems are designed with flaws that the designer should have noticed without waiting for users to discover them. That requires the ability to trawl through what you already know, and I fear some designers are very bad at that, and design bad systems."

So maybe my Three Good Questions help me to "trawl through what I already know", and to imagine what might be. Anyway, here are the questions:
  • Does it scale? Usually it doesn't. When you take a prototype system that works fine in the lab and try to scale it up, you almost always run into problems. Sometimes these are latency or bandwidth problems. (A system which works fine in the lab on a lightly loaded local-area network is going to feel very different to users on a congested wide-area network.) Sometimes there are problems with algorithms that have unfortunate growth characteristics. There might even be an accidental reliance on particular people who are available to make the prototype system work, but who will not be available in a larger deployed system. (For example, I am suspicious that the currently over-hyped "Massively Open Online Courseware" will fail to scale in exactly this way.) There are lots of ways that a small system can fail to scale and only one of them has to be true for it to be a fatal problem for your system.

  • Does it have a start-of-day problem? A "start-of-day problem" is a familiar concept to system designers, but it is not a term in everyday use. A system has a start-of-day problem if it can reproduce itself once it exists, or keep itself going when it is going, but needs an entirely different mechanism to start it going from nothing. For example, think of the problem of powering-up a ship from nothing. When its main engines are running, electricity comes from generators driven by those engines. (In some ships like the QM2, that's actually all that the main engines do: power transmission to the propellers is entirely electric.) However, the control systems and starter motors for the main engines need electricity to work. What provides that electricity if the main engines are not already going? This is a start-of-day problem, and it is solved in this case by having auxiliary generators to provide that electric power. (But they themselves have their own start-of-day problem, and they need to be provided with further mechanisms to start them from cold ...) We see this problem over and over in system design. For example, it is traditional to write compilers in their own language. But in that case how do you compile the first compiler for that language? (By having a bootstrap compiler written in something else, that you run once and then never need again.)

  • What are the security implications? If we are doing system design properly, then we think about security all the way through, not just at the end. We think about exactly what security guarantees we are going to provide and to whom. We think about the traditional security properties of Confidentiality, Integrity and Availability (and maybe a few others). But this question is asking you to go a step further, and to consider the unintended consequences which might result from correctly implementing the security guarantees that you think you want. For example, to prevent an online password-guessing attack, it's usual to lock out an account after, say, three wrong attempts. That's great for confidentiality, because a locked-out attacker can't now get access to your data. However, the unintended consequence is that neither can you! The lock-out mechanism which delivers better confidentiality has also delivered worse availability: if all the attacker in fact wanted was a denial-of-service attack, you have handed it to him on a plate. (A small sketch of this trade-off follows below.)
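
To make the lock-out example concrete, here is a minimal sketch (entirely hypothetical names, not code from any real system) of a three-strikes policy. It keeps the attacker out of your data, but it also lets anyone who can merely submit wrong passwords for your username lock you out too:

    # A minimal, hypothetical sketch of the trade-off described above: the same
    # lock-out that protects confidentiality hands an attacker a denial-of-service lever.

    MAX_ATTEMPTS = 3
    failed_attempts = {}      # username -> count of consecutive failures
    locked_accounts = set()

    def try_login(username, password, check_password):
        if username in locked_accounts:
            return "locked out"               # availability lost, even for the legitimate user
        if check_password(username, password):
            failed_attempts[username] = 0
            return "ok"
        failed_attempts[username] = failed_attempts.get(username, 0) + 1
        if failed_attempts[username] >= MAX_ATTEMPTS:
            locked_accounts.add(username)     # confidentiality preserved ...
        return "wrong password"

    # An attacker who only wants denial of service never needs the right password:
    for _ in range(3):
        try_login("victim", "guess", lambda u, p: False)

    # ... and now the real user, with the real password, is locked out too.
    print(try_login("victim", "correct-password", lambda u, p: True))   # "locked out"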

As I was writing this, it struck me that the idea of trying to produce a counter-example is also a very strong technique in system design. To phrase this as a "meta" question, we might say: If you think it can't work, can you prove it can't work? Sometimes by trying (and failing) to prove that it can never work, you can actually solve the original problem. For example, I was once working on a project with an apparently insurmountable architectural problem. If we couldn't solve it, the project we were working on would almost certainly be closed down. We'd been trying to solve it for a couple of months, but with no success. So one Friday afternoon, I decided to finally put the issue to rest and I tried to prove to a couple of colleagues that a solution was impossible. I didn't manage to do that, because over the space of about an hour, by trying to construct that proof I was finally able to find a solution to our problem.

System design is always an exercise in imagination. We often know the answer, but we don't know that we know it because we need someone to ask the right question, a useful and productive question which calls forth the answer in a flash of insight. Good system designers distinguish themselves by being able to ask exactly those fruitful and productive questions. Can you teach that? I'm not sure.

Sunday 14 April 2013

Some Useful Things

Little things can sometimes make a big difference. Here are a few things that I've found useful.

Food: Kuhn Rikon Epicurean Garlic Press

If you are going to use a garlic press, then this is the one to pick. It's heavy duty, made from stainless steel. It's probably the best garlic press in the world. To see a demo, have a look at America's Test Kitchen Equipment Review: Garlic Press where they come to the same conclusion and show off this little device. (Kuhn Rikon have another garlic press called the Easy Squeeze, which is a lot cheaper. It has a slightly different action and plastic handles. It's not nearly as good.)

Of course, if you are being macho you can just crush garlic with a kitchen knife. For example see this YouTube video. (Though the cook in this video makes a bit of a performance of removing the skin. The trick is to just trim off the ends of the garlic first with a knife, like you would while topping-and-tailing mange-tout peas. Then use the flat of your knife and gently lean your whole weight on the garlic until it just goes "click!". You can then just peel off the whole of the skin in one go rather than tediously picking pieces of skin out of the mashed up garlic.)

Coding: redemo.py

This little Python GUI program is endlessly useful when you find yourself tweaking a regular expression to make it parse your data correctly. In fact I would nowadays usually start by copying some of my example data into redemo.py's input box and then start writing my regular expression directly into its regexp box. That way I can check as I go along that the regexp is doing what I want, and when I make a mistake I'll get immediate feedback. (Redemo.py highlights the data that currently matches, reacting character-by-character as you change the data or the regexp.) When you get your regexp to work, you can just copy it straight into the program that's going to use it. And you're finished.

(On Windows, you can find redemo.py in the standard Python distribution, under Tools/scripts or Tools/demo depending on the version you are using. However, on Ubuntu Linux you have to sudo apt-get install python-examples or python3-examples, then find the mysterious place where it's been hidden, using say find /usr -type f -name redemo.py. Worth the effort though.)
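
If redemo.py isn't to hand, a rough fallback for the same check-as-you-go habit is a few throwaway lines of plain Python. (The sample data and pattern below are placeholders of my own, just to show the shape of the loop: paste in real data, tweak the pattern, re-run.)

    # A stand-in for redemo.py's check-as-you-go workflow: edit the pattern,
    # re-run, and look at the captured groups until they come out as expected.

    import re

    sample = "2013-04-28 ERROR disk full on /dev/sda1"
    pattern = r"(\d{4}-\d{2}-\d{2}) (\w+) (.*)"

    match = re.search(pattern, sample)
    if match:
        print(match.groups())   # ('2013-04-28', 'ERROR', 'disk full on /dev/sda1')
    else:
        print("no match - adjust the pattern and try again")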

Food: Star Anise

This is a useful ingredient that you might not think of adding to meat dishes. (I use it in my chilli recipe.) Obviously star anise plays a big part in Chinese cooking — it's one of the five spices — but it's less common in Western cooking. But as Harold McGee says in McGee on Food and Cooking, when you cook star anise with onions "the result is the production of sulfur-phenolic aromatics that intensify the meatiness of the dish." (p430)

Coding: Inconsolata font

Earlier this year I switched to Inconsolata for all my program editing, command shells and so on. It's not a huge change, but I think it's just a bit nicer than the alternatives. For example, you get a zero with a slash through it, easily distinguishable from an uppercase letter "O", and you can quite easily tell the difference between numeral one, lowercase "L" and uppercase "I".

Sunday 7 April 2013

Why are variables so confusing?

(This post is a follow-up to Learning to Program.)

For many people, the first stumbling block when they are learning to program is that they find variables very confusing. Why is this?

It doesn't seem quite so surprising that novice programmers find functions and objects hard to understand. (These topics seem to form the next two major stumbling blocks for novice programmers.) Functions and objects both draw on an understanding of abstraction, and the idea of abstraction seems to be hard for some people to grasp; they seem to find it difficult, when presented with particular examples of abstraction, to understand that these are all illustrations of the same general concept. It's as though you need to already understand abstraction before you can see it in the examples.

So, functions and objects seem like major ideas, but ... variables? Why is it so difficult to use variables? To expert programmers, variables seem entirely obvious. It's really hard to see what the problem is. I'd like to suggest a possible explanation, which is that the novices tend to confuse one variable with another because they are suffering from a short-term memory overload. Maybe novices can only hold one or two variables in mind as they think about their program.

At first it seems unlikely that this could be the case. We all know about George Miller's seminal paper The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information (Psychological Review, 1956, vol 63, pp81-97). We expect our short-term memory to be able to hold around five to nine "things" before we start confusing or forgetting them. But that number depends upon how easily we recognise the things, based on "categorical" knowledge from our long-term memory. If we don't have that knowledge, then all the things tend to look similar, and the number we can hold in short-term memory can be a lot less. It can be as low as only one thing!

For example, Henrik Olsson and Leo Poom, in their paper Visual memory needs categories (PNAS, 2005, 102(24), pp8776-8780), describe experiments in which they presented their subjects with visual patterns which were easy to distinguish when viewed side-by-side, but which were confusing when viewed one after another because they were drawn from the same categories. (For example, differently oriented ovals within ovals.) They found that for these kinds of patterns, the subjects' short-term visual memory could only hold one thing: the last thing they saw. This was quite a surprise to the participants in the experiment, who expected to be able to do a lot better than that.

This kind of confusion is not restricted to visual memory. A similar effect is described by Malcolm Gladwell in his book Blink:

"Have a friend pour Pepsi into one glass and Coke into another and try to tell them apart. Let's say you succeed. Congratulations. Now, let's try the test again, in a slightly different form. This time have your tester give you three glasses, two of which are filled with one of the Colas and the third with the other. In the beverage business, this is called a triangle test. This time around, I don't want you to identify which is Coke and which is Pepsi. All I want you to say is which of the three drinks is not like the other two. Believe it or not, you will find this task incredibly hard. If a thousand people were to try this test, just over one-third would guess right — which is not much better than chance; we might as well guess." (Blink, p185.)

Unless you are an expert taster, unless you have practiced and have the categorical knowledge in your long-term memory which lets you recognise and distinguish these different tastes, your short-term memory only holds one thing, only one taste, the last thing you tasted. And we don't expect that.

So what I'm suggesting is that a similar effect might happen with variables in programs. When experts look at variables, they are applying categorical knowledge from their long-term memory. They recognise how variables are used and what their purpose is in the program. They recognise what each variable is for. Experts choose much better names for variables than novices. When an expert chooses a name for a variable, they tend to choose a name which reflects the purpose of the variable. Given a poorly written program, when an expert has understood it, one thing they will certainly do is to choose better names for the variables.

Novices, on the other hand, don't do any of this. They can't. They don't have the categorical knowledge in their long-term memory about patterns of variable use. Of course, they can tell one variable from another, just like people can tell Pepsi from Coke. But if I'm right, they will find it extremely hard to understand programs which require them to hold in their mind more than just a single previous variable. We might therefore expect that x = x + y would be just on the limit of understanding for a novice. (There are two variables, but x actually means two different things on each side of the assignment, which might push it over the edge.)
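
One way to help a novice over that edge is to spell the assignment out as three separate steps, making the two meanings of x explicit. Here is a small worked trace of my own:

    # A worked trace separating the two meanings of x in "x = x + y": the x on the
    # right is the old value, read first; the x on the left is where the new value
    # is stored, only after the right-hand side has been worked out.

    x = 3
    y = 4

    old_x = x           # step 1: read the current value of x (3)
    new_x = old_x + y   # step 2: work out the right-hand side (3 + 4 = 7)
    x = new_x           # step 3: only now does x change

    print(x)            # 7, exactly what the one-line "x = x + y" would have produced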

If novice programmers cannot understand their programs because they cannot intuitively distinguish variables, this might explain some otherwise mysterious features of their code. For example, I'm sure I'm not the first programming teacher to notice the remarkable enthusiasm of novices for inventing new variables and scattering their code with redundant assignments, as if they hope that one more might do the trick and make their program work. (And by accident, it sometimes does.) It might also partly explain a frequent misunderstanding when we come to functions and objects: the idea that every variable with the same name is actually the same variable. (As experts we have no problem with the idea that we can use the same name for local variables in different functions, even ones that call one another. Or that this or self in a method is different when invoked on different objects. But if novices have to fall back on merely comparing the textual names of variables, they are bound to be confused when they try to reason about different variables that happen to have the same name.)
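
For example (a toy sketch of my own, not taken from any student's code), the two variables called n below are completely unrelated, and count is a different variable in each Counter object:

    # The same name can denote entirely different variables: each function has its
    # own local n, and each object has its own count, even though the names match.

    def total(numbers):
        n = 0                       # this n belongs to total()
        for value in numbers:
            n += value
        return n

    def longest(words):
        n = 0                       # a different n: same name, unrelated variable
        for word in words:
            n = max(n, len(word))
        return n

    class Counter:
        def __init__(self):
            self.count = 0
        def tick(self):
            self.count += 1         # self is whichever object the method was called on

    a, b = Counter(), Counter()
    a.tick(); a.tick(); b.tick()
    print(total([1, 2, 3]), longest(["cat", "zebra"]), a.count, b.count)   # 6 5 2 1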

Of course, without intuitive understanding, many novice programmers would just give up, and many do. However, if they persist, after a while novices will start to unconsciously pick up the long-term categorical knowledge that they need. (Or at least those who are ultimately successful must do this.) But rather than leaving this to chance, there are some things we can do to help them on their way: we can at least try to explicitly teach some categorical knowledge about variables through worked examples. I have found that explaining the roles of variables seems to help a lot. (But just talking about roles is not quite enough; you have to get students to engage more and to practice describing the roles of variables in example programs and in their own programs.)
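
As a sketch of the kind of worked example I mean (the program is my own toy illustration, and the role labels are the ones commonly used in the roles-of-variables literature):

    # A toy worked example with each variable labelled by its role.

    prices = [4.50, 2.00, 7.25]   # fixed value: set once, never changed afterwards
    total = 0.0                   # gatherer: accumulates a result as the loop runs
    highest = 0.0                 # most-wanted holder: the best value seen so far
    for price in prices:          # most-recent holder: takes each value in turn
        total += price
        if price > highest:
            highest = price

    print(total, highest)         # 13.75 7.25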