Sunday, November 19, 2006

debugging in lisp

Tim and I installed an update last night, and since we were bringing the server down and back up anyway, we used the opportunity to bring the back-end up in slime so we could take a look at some conditions that were being logged during operation. Normally, we use detachtty to run both the web-facing component (which we call nexus) and the agent-facing component (called archon). This is useful for starting up a long-running process and then coming back to it, but we miss out on the convenience of slime. (I need to find some information on how to get the convenience of remote slime while still being able to detach/re-attach. Any pointers?) In any case, we were getting this simple-error:

simple-error in message-handler:
There is no applicable method for the generic function
when called with arguments (NIL).

At first glance, it looked like we were missing a database connection. Maybe we were losing it? But in the course of tracking down the cause, we figured out that it only happened when we tried to do a select on the database looking for software called "Part Control System (C:\\Pc\\)".

If I were a real Lispnik, at this point I would launch into a great example about how you can use the power of Lisp and the slime development environment to solve a problem like this very quickly. Unfortunately, I'm not. Instead, I floundered around all day following one bad assumption with an even worse hunch after another.

Gee, it must be an issue with the backslashes, let me try a zillion combinations of escaping and unescaping them. Or, maybe it is that database connection, let me try creating one locally and seeing if I can use that. Hmm, oh, wait...

Now, if I had just used the tools I had, and walked through the problem one step at a time, I would have found the answer much sooner. It turns out the original error is entirely correct. If I had taken the effort to go through the backtrace one function at a time, as I finally did, I would have found that one of the functions was defined as:

(defun sql-output (sql-expr &optional database)

and the function that calls it doesn't actually pass in a database! It turns out that the code path that required the database was only triggered if there were backslashes in the sql expression. I've got an email in to the clsql list to see if this is a real bug, or just a figment of my imagination. I do know that setting a default value for the &optional database to *default-database* fixed the problem for me.
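To make the failure mode concrete, here is a minimal sketch of the same pattern. Everything in it is a hypothetical stand-in (toy-database, *toy-default-database*, toy-doubles-backslashes-p, toy-output-broken, toy-output-fixed); it is not the clsql source, and it assumes only that the backslash path dispatches a generic function on the database argument, which is what the error message suggests.

```lisp
;; Hypothetical stand-ins for illustration only; not the clsql definitions.
(defclass toy-database () ())

(defvar *toy-default-database* (make-instance 'toy-database))

;; A generic function with a method for databases, but none for NIL.
(defgeneric toy-doubles-backslashes-p (database))
(defmethod toy-doubles-backslashes-p ((database toy-database)) t)

(defun toy-escape (sql-expr database)
  "Double each backslash in SQL-EXPR if DATABASE asks for it."
  (if (toy-doubles-backslashes-p database)
      (with-output-to-string (out)
        (loop for ch across sql-expr
              do (when (char= ch #\\) (write-char ch out))
                 (write-char ch out)))
      sql-expr))

;; Before: an omitted DATABASE silently becomes NIL.
(defun toy-output-broken (sql-expr &optional database)
  (if (find #\\ sql-expr)
      (toy-escape sql-expr database) ; only this branch touches DATABASE
      sql-expr))

;; After: an omitted DATABASE falls back to the default.
(defun toy-output-fixed (sql-expr &optional (database *toy-default-database*))
  (if (find #\\ sql-expr)
      (toy-escape sql-expr database)
      sql-expr))
```

With these definitions, (toy-output-broken "plain") returns its argument untouched, so the missing database goes unnoticed. Only a string containing a backslash, such as "C:\\Pc\\", reaches the generic function with NIL and signals the no-applicable-method error, while toy-output-fixed handles the same string without complaint.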

This is not to say that slime, the debugger and the inspection tools are not powerful. Once I actually started using them, I was able to find out what was wrong pretty quickly. It was trepidation about diving into the clsql code itself that caused me so much wasted effort. The backtrace showed me where the error was and I had the code on my machine (heck, hitting v on the line in the backtrace will take you right there), yet I avoided it by trying to cargo-cult my way out of the problem. In the end, the clsql code in question was not even that complex. I see the old-timers on #lisp respond to queries all the time to the effect of: "what is the backtrace telling you?" Maybe we are just so new at it that we haven't come to trust our tools yet, or we come from a background with tools that weren't trustworthy (C++ template errors, anyone?). One thing I can take comfort in: there will be no end of opportunities to get more familiar with debugging in Lisp.


