<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Scott Schneider</title>
	<atom:link href="http://www.scott-a-s.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.scott-a-s.com</link>
	<description>Computer Science, computers and science</description>
	<lastBuildDate>Sat, 29 Sep 2012 22:38:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Traces vs. Snapshots: Print Statements and Debuggers</title>
		<link>http://www.scott-a-s.com/traces-vs-snapshots/</link>
		<comments>http://www.scott-a-s.com/traces-vs-snapshots/#comments</comments>
		<pubDate>Sun, 09 Sep 2012 20:53:43 +0000</pubDate>
		<dc:creator>Scott Schneider</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.scott-a-s.com/?p=27</guid>
		<description><![CDATA[To my surprise, some programmers consider using print statements instead of debuggers as a wholly inferior means of debugging. As I view the debugging process, they are complementary techniques. But the issue is not really &#8220;print statements&#8221; versus &#8220;debuggers.&#8221; It&#8217;s traces versus snapshots. Traces provide a long term view over a small set of data, [...]]]></description>
				<content:encoded><![CDATA[<p>
To my surprise, some programmers consider using print statements instead of debuggers as a wholly inferior means of debugging. As I view the debugging process, they are complementary techniques. But the issue is not really &#8220;print statements&#8221; versus &#8220;debuggers.&#8221; It&#8217;s traces versus snapshots.
</p>
<p>
Traces provide a long-term view over a small set of data, and snapshots show <em>all</em> of the data from a moment in time. Or, as a figure:
</p>
<center>
<img src="../files/state_view.png" width="293">
</center>
<h1>Traces</h1>
<p>
Most of my programming time is spent working on the runtimes for parallel systems. Whether it&#8217;s <a href="projects/#streamflow">multithreaded memory allocation</a>, <a href="projects/#cellgen">automatic data transfers</a>, or tracking messages in a <a href="http://www-01.ibm.com/software/data/infosphere/streams/">distributed system</a>, these all have one thing in common: they&#8217;re event based. The code I write is not the code driving the program; my code is servicing an application.
</p>
<p>
When I have bugs, they are typically algorithmic in nature. My code correctly implements my understanding of the problem, but my understanding is wrong. Rarely does a program actually crash. Segfaults are actually a relief, because diagnosing the problem will probably be easy: just fire up the debugger and find the null pointer or empty container.
</p>
<p>
Rather, most of the time, the end results of the program are wrong, and I need to figure out why. Doing so requires recording just enough of the execution of the program to be able to spot something that disagrees with my understanding of what should happen. In other words, I need a trace.
</p>
<p>
Typically, I start by instrumenting the most visible entries into the runtime system. For example, for a memory allocator, I&#8217;ll log every allocation and deallocation request. For an allocation, I&#8217;ll record the size and the memory address returned. For a deallocation, I&#8217;ll record the memory address being freed. Doing this provides me with a trace that is complete in time (it covers the whole execution of the program), but incomplete in program state (it covers only a select few values). But by having a trace, I can look forward and backward in time at my leisure, looking for aberrant behavior: say, the same memory address returned for two allocation requests without a deallocation in between.
</p>
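<p>
A check like that one can be automated once the trace exists. The following is a minimal sketch in Python, not the tooling used for the allocator above. The record format (<code>event</code>, <code>address</code>, <code>size</code>) and the function name <code>find_double_allocations</code> are assumptions made for illustration.
</p>

```python
def find_double_allocations(trace):
    """Return addresses handed out again while still allocated.

    `trace` is a list of (event, address, size) tuples in program order.
    This record format is hypothetical, chosen only for this sketch.
    """
    live = set()      # addresses allocated and not yet freed
    suspects = []
    for event, address, size in trace:
        if event == "alloc":
            if address in live:
                # Same address returned twice with no free in between.
                suspects.append(address)
            live.add(address)
        elif event == "free":
            live.discard(address)
    return suspects


trace = [
    ("alloc", 0x1000, 64),
    ("alloc", 0x2000, 32),
    ("free", 0x1000, None),
    ("alloc", 0x1000, 64),   # fine: 0x1000 was freed first
    ("alloc", 0x2000, 16),   # suspicious: 0x2000 is still live
]
print([hex(a) for a in find_double_allocations(trace)])   # ['0x2000']
```

<p>
The scan only works because the trace is complete in time: a single missing deallocation record would turn a legitimate reuse of an address into a false alarm.
</p>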
<p>
When tracking messages in a distributed system, I&#8217;ll log the receipt and submission of each message as it flows through the system. By looking at the traces for all of the processes in the system, I can construct the message flow and look for messages that are out of place.
</p>
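<p>
As a rough illustration of combining per-process traces, the sketch below groups send and receive records by message id and reports the ones that do not pair up. The process names, the <code>(event, message_id)</code> record format and the function name <code>unmatched_messages</code> are all hypothetical. A real system would log far more per message.
</p>

```python
from collections import defaultdict


def unmatched_messages(process_logs):
    """Group send/recv records by message id and report incomplete ones.

    `process_logs` maps a process name to its list of (event, message_id)
    records. Both the names and the record format are assumptions.
    """
    seen = defaultdict(lambda: {"send": [], "recv": []})
    for process, records in process_logs.items():
        for event, message_id in records:
            seen[message_id][event].append(process)

    # Sent by some process, but no process logged receiving it.
    never_received = sorted(
        m for m, e in seen.items() if e["send"] and not e["recv"]
    )
    # Received by some process, but no process logged sending it.
    never_sent = sorted(
        m for m, e in seen.items() if e["recv"] and not e["send"]
    )
    return never_received, never_sent


logs = {
    "A": [("send", 1), ("send", 2)],
    "B": [("recv", 1), ("send", 3)],
    "C": [("recv", 3), ("recv", 4)],
}
lost, orphaned = unmatched_messages(logs)
print(lost, orphaned)   # [2] [4]
```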
<p>
I&#8217;m rarely lucky enough that these top-level traces provide enough information to find the bug. These traces are just the start, as they hint where to explore next. Do I need more data at my current level of instrumentation, or do I need to start instrumenting deeper parts of my algorithms? (Note that &#8220;instrumenting&#8221; is just a fancy way of saying &#8220;adding more print statements.&#8221;) In terms of my figure, I start with a narrow vertical slice of the program state, and selectively broaden its width as my understanding of the problem matures.
</p>
<p>
In situations like the above, a single snapshot of the entire program state is not going to show me what I need to know. A snapshot, as provided by a debugger, can tell me the entirety of the program&#8217;s state, but it cannot tell me how the program came to <em>be</em> in that state. I need history, and lots of it. I could set breakpoints, observe the state, resume, wait for the next breakpoint, and then observe again. And sometimes, I do this. But repeating this process thousands of times is not feasible, and good traces can easily reach into the hundreds of thousands of events.
</p>
<h1>Snapshots</h1>
<p>
Of course, sometimes I still need snapshots. When a program crashes or hangs, I reach for a debugger. In those instances, I want to be able to inspect the entire system state at my leisure. Debuggers are essential for this, because it&#8217;s infeasible to log the entire system state; debuggers are interactive, and allow exploration of the system state much in the same way that traces allow exploration of algorithmic behavior over time.
</p>
<p>
Sometimes I&#8217;ll even reach for a debugger after spending a long time inspecting traces. If I can spot where things go wrong in a trace, but I&#8217;m already at the finest granularity of logging possible, then I start to suspect system issues like memory corruption. But in such a case, traces showed me where to look. I never would have been able to discover exactly where in my program to point the debugger without the trace. (I&#8217;m assuming, of course, that this is the kind of memory corruption that <a href="http://valgrind.org/">valgrind</a> cannot find.)
</p>
<p>
Debuggers are for when I have started to question even the most fundamental of operations, and I need to observe <em>exactly</em> what is happening at a point in time. In fact, I can rely on traces because I already have a good idea of what the system is doing at all points of the program. When I use traces, what I question is not the system itself, but my algorithms that run on top of it. Once I start to suspect the system itself, I reach for the debugger.
</p>
<h1>A Mental Model</h1>
<p>
Whether using traces, snapshots or both, the purpose is to build a mental model of what your program is actually doing, because your current one is wrong. (If it wasn&#8217;t, you wouldn&#8217;t have a bug.) Knowing the entire state of your program during its entire time of execution is not realistic for interesting programs. So we investigate sections of the state-time space. And, in general, we want to look at <em>slices</em> that cover all of one of those dimensions. If I&#8217;m confident that a particular value is involved in an error, then I want to see all of those values, over all time&#8212;a vertical slice, a <em>trace</em>. If my view of the state-time space does not cover all time, then there&#8217;s always the possibility that the error is lurking somewhere in the times I did not cover. If I&#8217;m confident that an error occurs at a particular moment in time, then I want a horizontal slice, a <em>snapshot</em>, so I can observe all values across that moment.
</p>
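<p>
To make the two kinds of slice concrete, here is a toy sketch that pretends the whole state-time space has been recorded, which is exactly what is unrealistic for interesting programs. The queue variables <code>head</code>, <code>tail</code> and <code>count</code> are invented for the example.
</p>

```python
# One dictionary per time step. The variables are hypothetical.
history = [
    {"head": 0, "tail": 0, "count": 0},
    {"head": 0, "tail": 1, "count": 1},
    {"head": 0, "tail": 2, "count": 2},
    {"head": 1, "tail": 2, "count": 2},   # bug: count should be 1 here
]


def trace(history, name):
    """Vertical slice: one value over all time."""
    return [state[name] for state in history]


def snapshot(history, step):
    """Horizontal slice: all values at one moment."""
    return dict(history[step])


print(trace(history, "count"))   # [0, 1, 2, 2]
print(snapshot(history, 3))      # {'head': 1, 'tail': 2, 'count': 2}
```

<p>
The trace of <code>count</code> alone does not look wrong. Neither does the last snapshot, unless you already know that <code>count</code> should equal <code>tail</code> minus <code>head</code>. Which slice exposes the bug depends on which part of the mental model is broken.
</p>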
<p>
If you ever find yourself producing single-line traces where you keep adding reported values, you don&#8217;t want a trace. You want a snapshot, and a debugger is the better tool. If you ever find yourself setting breakpoints in a debugger, writing down values, letting it run until the next breakpoint and again writing down values, then you don&#8217;t want a snapshot. You want a trace.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scott-a-s.com/traces-vs-snapshots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computer Science is Not Math</title>
		<link>http://www.scott-a-s.com/cs-is-not-math/</link>
		<comments>http://www.scott-a-s.com/cs-is-not-math/#comments</comments>
		<pubDate>Fri, 04 May 2012 02:22:24 +0000</pubDate>
		<dc:creator>Scott Schneider</dc:creator>
				<category><![CDATA[computer science]]></category>
		<category><![CDATA[math]]></category>

		<guid isPermaLink="false">http://www.scott-a-s.com/?p=26</guid>
		<description><![CDATA[A surprisingly common sentiment among some programmers is that &#8220;computer science is math.&#8221; Certainly, computer science as a rigorous discipline emerged from mathematics. Now, we consider such foundational work to be theoretical computer science. For example, Alonzo Church&#8217;s lambda calculus and Alan Turing&#8217;s Turing machine provided a theoretical foundation for computation. At the time, the [...]]]></description>
				<content:encoded><![CDATA[<p>
A surprisingly common sentiment among some programmers is that &#8220;computer science is math.&#8221; Certainly, computer science as a rigorous discipline emerged from mathematics. Now, we consider such foundational work to be <em>theoretical</em> computer science. For example, Alonzo Church&#8217;s <a href="http://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a> and Alan Turing&#8217;s <a href="http://en.wikipedia.org/wiki/Turing_machine">Turing machine</a> provided a theoretical foundation for computation. At the time, the two self-identified as mathematicians, and were clearly doing mathematics.  So if the foundations of computer science are math, how is it that computer science as a whole is not math?
</p>
<p>
Simply, computer science has grown well beyond its purely theoretical roots. We invented real computers, which are not theoretical devices. In doing so, we had to deal with the complicated and messy reality of designing, implementing, using and programming computers. Those areas of study are also computer science. My operating definition of computer science is: everything to do with <em>computation</em>, both in the abstract and in the implementation.
</p>
<p>
The relationship I am claiming:</p>
<center>
<img src="../files/cs_math_venn.png" width="400">
</center>
<h1>Camping Buddies</h1>
<p>
Much like physics, computer science has two camps: theory and experimentation. However, the relationship between the two camps is not the same as it is in physics. In physics, experimentalists often have the job of testing the theories produced by the theoreticians. If the experimentalists are ahead of the theoreticians, then theoreticians must develop new theories to explain results discovered by the experimentalists that are inconsistent with current understanding.
</p>
<p>
The &#8220;I explain your results&#8221; and &#8220;I test your theories&#8221; relationship does not exist in computer science. Our experimentalists are generally called <em>systems</em> researchers. When a theoretical computer scientist <a href="http://www.cs.berkeley.edu/~virgi/matrixmult.pdf">proves that matrix multiplication can be done in <em>O(n^2.3727)</em> time</a>, a systems researcher is never going to produce any results that disagree with that theoretical result. The theoreticians have discovered a <em>mathematical</em> fact, and yes, I use that word deliberately.
</p>
<p>
What systems people may do is provide evidence that while such a result is theoretically interesting, real systems may never take advantage of it. We (yes, I include myself in this group) do so by designing and implementing novel systems, from which we learn what is feasible and useful.
</p>
<p>
Computer science theoreticians and systems researchers do not always work in isolation. I am good enough in math and theory to know when I am not good enough in math and theory. I have worked on a project where in order to solve an interesting systems problem, I needed a sophisticated model that was beyond my ability to discover. In response, the theoreticians I worked with had to quiz me: in order to build a model, they had to know what kind of information we could reliably measure from our system. All of us were doing &#8220;computer science,&#8221; despite performing very different tasks.
</p>
<h1>Naming Names</h1>
<p>
I am in the systems camp. I have (at least) an intuition for the whole system stack, from knowing what kind of code a compiler is likely to emit for particular language semantics, to how the operating system will behave under that workload, and what the processor itself must do to execute it. My research almost always has messy empirical results.  Broadly, I am interested in improving the performance of software, which means lots of experiments, lots of results and lots of interpretation. That process is not math.
</p>
<p>
But there are people who are not only theoreticians or only systems researchers. I used a broad brush when painting the divide between theory and systems. It does not capture the entirety of the field; many people work in both theory and systems, and there are probably people who feel that the two categories don&#8217;t capture what they do. Which, of course, is my point: computer science is a large discipline that goes far beyond the parts that we all agree are math.
</p>
<p>
There are plenty of computer scientists who straddle the divide. I think this is particularly common in programming languages. Results in programming language research may be theoretical. The researchers who are able to prove something about, say, a type system, are often the same people who design a language and implement a compiler that embodies the theoretical result. In short, the divide between theory and systems research is not as clean as it is in physics.
</p>
<p>
Example the second: consider networking. The algorithms that govern how individual <a href="http://tools.ietf.org/html/rfc5681">TCP connections avoid congestion</a> are certainly computer science. There is also a large amount of mathematical <em>reasoning</em> that goes into designing and understanding how individual connections governed by these algorithms will behave. But, in the end, what matters is how they work in practice. These algorithms are the result of design, experimentation, interpretation of results and iterating. (And iterating.)
</p>
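<p>
For a feel of what these algorithms do, here is a deliberately simplified model of the slow start and congestion avoidance phases described in RFC 5681. It is a sketch under loud assumptions, not an implementation of the RFC. It works in whole round trips rather than per acknowledgment, it starts the window at one segment, it treats every loss as one detected by duplicate acknowledgments, and it uses the window itself in place of the amount of data in flight. The function name and parameters are invented.
</p>

```python
def simulate_cwnd(rounds, loss_rounds, ssthresh=16.0):
    """Return the congestion window, in segments, at each round trip.

    Simplified model: per-round-trip updates, a one-segment initial
    window, and every loss handled as a fast-retransmit style loss.
    """
    cwnd = 1.0
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss: halve the threshold (floor of two segments) and
            # continue from there.
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh
        elif ssthresh > cwnd:
            # Slow start: the window roughly doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: about one more segment per round trip.
            cwnd += 1.0
    return history


print(simulate_cwnd(10, {6}, ssthresh=8.0))
# [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 11.0, 5.5, 6.5, 7.5]
```

<p>
The arithmetic fits in a dozen lines. Whether the constants and the reaction to loss behave well on real networks is the part that takes experimentation.
</p>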
<p>
When someone simply says &#8220;computer science is math,&#8221; they are doing a disservice to all of these other fields in the discipline that are clearly not just math.  Of course, we use mathematical reasoning whenever we can, but so does all of science and engineering. Math is the common language across all empirical disciplines, but they do not all tell the same story.
</p>
<p>
Aside from programming languages and networking, the field of computer science also includes operating systems, databases, artificial intelligence, file and storage systems, processor design, graphics, scheduling, and distributed and parallel systems. There are more areas than I can exhaustively list, but luckily, <a href="http://www.acm.org/about/class/ccs98-html">someone else has</a>. All of these areas <em>use</em> math to varying degrees, and some even have highly theoretical sub-fields. To the point, even, that I would agree that the theoretical basis for some of those areas is arguably math. For example, <a href="http://en.wikipedia.org/wiki/Relational_algebra">relational algebra</a> is math, but it&#8217;s also the theoretical foundation of relational databases. But if we make the blanket statement &#8220;databases is math,&#8221; we miss all of the implementation and design on the systems side that allows actual databases to exist in our world.
</p>
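<p>
The math side of that example is small enough to sketch. Below, three relational algebra operators (selection, projection and natural join) are written over lists of dictionaries. The relations and attribute names are made up, and the nested-loop join is the naive definition, not how a database would execute it.
</p>

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicates."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join: pair rows that agree on all shared attributes."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l).intersection(r)
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


# Hypothetical relations, for illustration only.
employees = [
    {"name": "Ada", "dept": "theory"},
    {"name": "Ken", "dept": "systems"},
    {"name": "Barbara", "dept": "systems"},
]
departments = [
    {"dept": "theory", "floor": 2},
    {"dept": "systems", "floor": 3},
]

in_systems = select(
    natural_join(employees, departments),
    lambda row: row["dept"] == "systems",
)
print(project(in_systems, ["name", "floor"]))
# [{'name': 'Ken', 'floor': 3}, {'name': 'Barbara', 'floor': 3}]
```

<p>
Everything a real database adds on top of these definitions, such as storage layout, indexes, query planning and concurrency control, is the systems side.
</p>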
<h1>SCIENCE!</h1>
<p>
It&#8217;s impossible to discuss the nature of computer science without recognizing the elephant in the room: is it science? I won&#8217;t discuss that—not out of lack of interest, but because others have done a better job than I could. Cristina Videira Lopes covered the topic in an <a href="http://tagide.com/blog/2012/03/research-in-programming-languages/">excellent essay</a>, where I also learned about Stefan Hanenberg&#8217;s <a href="http://www.cs.washington.edu/education/courses/cse590n/10au/hanenberg-onward2010.pdf">paper on a similar topic</a>. Everything I have to say on the subject is derivative of their points.
</p>
<h1>Best Intentions</h1>
<p>
Those who claim that &#8220;computer science is math&#8221; generally have good intentions. They are usually responding to the notion that computer science is just programming, which is, of course, false. Anyone who has taught beginning programmers knows how difficult it is to convey to them that underneath all of the accidental complexities lies something fundamental.
</p>
<p>
But it is still a gross simplification to call the entire discipline of computer science &#8220;math.&#8221; Related to math, foundations in math—sure. But after a while, it makes sense to group the theoretical foundations of computation along with the design and implementation itself. That grouping is computer science.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.scott-a-s.com/cs-is-not-math/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
