Star (-) Watch (-)

JoelOnSoftware

Joel on Software

Back to Basics

回归基础

by Joel Spolsky Tuesday, December 11, 2001

We spend a lot of time on this site talking about exciting Big Picture Stuff like .NET versus Java, XML strategy, Lock-In, competitive strategy, software design, architecture, and so forth. All this stuff is a layer cake, in a way. At the top layer, you've got software strategy. Below that, we think about architectures like .NET, and below that, individual products: software development products like Java or platforms like Windows.

我们在这个站点花了大量时间来讲令人振奋的大场景像：.NET和Java，XML战略，Lock-In,竞争策略，软件设计，架构等等。在某种意义上，所有这些东西就像一个分层的蛋糕。在最顶层，是软件策略。在这下面，我们考虑像.NET一样的架构，再下面，是单独的产品：像Java 这样的软件开发产品或者像Windows一样的平台。

Go lower on the cake, please. DLLs? Objects? Functions? No! Lower! At some point you're thinking about lines of code written in programming languages.

请往这个分层蛋糕的底层再走一点。动态链接库？对象？函数? 不对！请再底层一点！在某一点上,你总归要开始考虑编程语言写的一行行代码。

Still not low enough. Today I want to think about CPUs. A little bit of silicon moving bytes around. Pretend you are a beginning programmer. Tear away all that knowledge you've built up about programming, software, management, and get back to the lowest level Von Neumann fundamental stuff. Wipe J2EE out of your mind for a moment. Think Bytes.

层次还是不够低。今天我想讨论CPU,一小块的硅片承载着的字节流。假设你是个入门程序员。摒弃所有你学到的关于编程，软件，管理的知识，然后回到最底层的冯诺依曼基础计算机结构。把你脑袋里装的J2EE忘掉一会儿。想想比特流。

Why are we doing this? I think that some of the biggest mistakes people make even at the highest architectural levels come from having a weak or broken understanding of a few simple things at the very lowest levels. You've built a marvelous palace but the foundation is a mess. Instead of a nice cement slab, you've got rubble down there. So the palace looks nice but occasionally the bathtub slides across the bathroom floor and you have no idea what's going on.

为什么要这样做？我觉得人们之所以在最高架构层构思的时候都会犯错误，是源于人们对最底层基本概念残缺不全的认识。你造了一座金碧辉煌的宫殿，但基础却一塌糊涂。本该是坚固水泥板的地方，你放置的却是残砖碎瓦。所以宫殿看起来虽然漂亮，但是有时候浴缸在浴室滑动了，你却不知道为什么。

So today, take a deep breath. Walk with me, please, through a little exercise which will be conducted using the C programming language.

所以今天，请深呼吸，跟我做一个用C语言写的小练习。

Remember the way strings work in C: they consist of a bunch of bytes followed by a null character, which has the value 0. This has two obvious implications:

There is no way to know where the string ends (that is, the string length) without moving through it, looking for the null character at the end.

Your string can't have any zeros in it. So you can't store an arbitrary binary blob like a JPEG picture in a C string.

记住C语言里字符串的工作机制：他们由一串字节然后一个null（值为0）字符构成。两点含义不言而喻：

如果不遍历这个字符串去寻找null字符，就无法获知它的结尾（也就是获得字符串长度）
你的字符串里面不能有0值，所以你不能把任意的块数据（如JPEG图像）存储在C字符串里。

Why do C strings work this way? It's because the PDP-7 microprocessor, on which UNIX and the C programming language were invented, had an ASCIZ string type. ASCIZ meant "ASCII with a Z (zero) at the end."

为什么C字符串是这样的呢？因为PDP-7微处理器（在它之上产生了UNIX和C编程语言）有一种ASCIZ字符串类型。ASCIZ的意思是“以零结尾的ASCII码”。

Is this the only way to store strings? No, in fact, it's one of the worst ways to store strings. For non-trivial programs, APIs, operating systems, class libraries, you should avoid ASCIZ strings like the plague. Why?

这是存储字符串的唯一方法么？不是，实际上，这是最糟糕的存储字符串的方法之一。对于重要的程序，接口，操作系统，类库，你应该像避免瘟疫一样避免使用ASCIZ字符串。为什么？

Let's start by writing a version of the code for strcat, the function which appends one string to another.

让我们先开始写一遍strcat（也就是把一个字符串接在另一个后面的函数）代码试试，

    void strcat( char* dest, char* src )
    {
         while (*dest) dest++;
         while (*dest++ = *src++);
    }

Study the code a bit and see what we're doing here. First, we're walking through the first string looking for its null-terminator. When we find it, we walk through the second string, copying one character at a time onto the first string.

研究一下这段代码看看我们在做什么。首先，我们在遍历第一个字符串寻找其终结符。当我们找到了，就遍历第二个字符串，把它的字符拼接到第一个字符串后。

This kind of string handling and string concatenation was good enough for Kernighan and Ritchie, but it has its problems. Here's a problem. Suppose you have a bunch of names that you want to append together in one big string:

这样的字符串拼接处理对Kernighan和Ritchie[1]来说已经足够好了，但它也有问题。问题就是：假设你有一堆名字，然后你想把它们一起拼接到一个大的字符串里：

char bigString[1000];     /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");

This works, right? Yes. And it looks nice and clean.

这样是可以的，对吧？是的。而且看上去干净利落。

What is its performance characteristic? Is it as fast as it could be? Does it scale well? If we had a million strings to append, would this be a good way to do it?

它的性能特点如何？达到能达到的最快速度了么？扩放性[2]好么？如果我们有上百万的字符串要拼接，这样做是个好办法么？

No. This code uses the Shlemiel the painter's algorithm. Who is Shlemiel? He's the guy in this joke:

不，这段代码使用了Shlemiel画家算法。谁是Shlemiel？他就是下面这个笑话里的人物：

Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

Shlemiel获得了街道粉刷的工作，也就是在马路中间画虚线。第一天，他带了一罐涂料街然后刷完了300码的马路。 “干得很漂亮！”他老板说“你真是个高效的工人！“ 然后付给了他一个铜板。

The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

第二天Shlemiel 只刷完了150码。 “嗯，虽然没有昨天好，但你还是个高效工人。150码还是可以的“ 然后又付了他一个铜板。

The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

第三天Shlemiel只刷了30码。 “才30码！”他老板吼道。“无法忍受！你第一天干的活儿是今天的10倍！发生什么情况了？”

"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"

“我也没办法” Shlemiel 说：“每天我都离涂料罐越来越远！”

(For extra credit,what are the real numbers? )

题外话，具体数字是多少？

This lame joke illustrates exactly what's going on when you usestrcat like I just did. Since the first part of strcat has to scan through the destination string every time, looking for that dang null terminator again and again, this function is much slower than it needs to be and doesn't scale well at all. Lots of code you use every day has this problem. Many file systems are implemented in a way that it's a bad idea to put too many files in one directory, because performance starts to drop off dramatically when you get thousands of items in one directory. Try opening an overstuffed Windows recycle bin to see this in action -- it takes hours to show up, which is clearly not linear in the number of files it contains. There must be a Shlemiel the Painter's Algorithm in there somewhere. Whenever something seems like it should have linear performance but it seems to have n-squared performance, look for hidden Shlemiels. They are often hidden by your libraries. Looking at a column of strcats or a strcat in a loop doesn't exactly shout out "n-squared," but that is what's happening.

这个蹩脚笑话阐述了当你就像我那样使用strcat时究竟发生了什么。因为strcat的第一部分每次都必须要扫描整个目标字符串，一次又一次的寻找null终结符，这个函数比实际需要的要慢很多而且扩放性根本不好。很多你日常使用的函数都有这个问题。许多文件系统都是以一种方式实现允许在一个文件夹下面放许多文件，这其实是个糟糕的想法，因为当你在一个文件夹下面放成千上万的文件的时候性能就开始剧烈恶化。你可以尝试打开一个塞满了的Windows回收站来亲身见证一下这个说法 – 要花上你几个小时才能打开，打开所花费的时间显然和文件夹里的文件数量是不成线性相关的。这中间某个地方肯定也有Shlemiel的画家算法。不管什么地方，看起来本该是线性时间性能的结果看起来却是N次方性能，这时就应该寻找隐藏的Shlemiel。通常他们被你所调用的库所隐藏起来了。仅仅看一列strcat调用或者是一个循环里的strcat调用不会精确的显示N次方低效问题，但那就是正在发生的事情。

How do we fix this? A few smart C programmers implemented their own mystrcat as follows:

我们应该如何修正这个问题呢？一些聪明的C程序员实现了他们自己的mystrcat函数，如下所示：

    char* mystrcat( char* dest, char* src )
    {
         while (*dest) dest++;
         while (*dest++ = *src++);
         return --dest;
    }

What have we done here? At very little extra cost we're returning a pointer to the end of the new, longer string. That way the code that calls this function can decide to append further without rescanning the string:

我们这里做了什么改动呢？我们仅仅花很小的额外代价返回了一个指向新分配的更长字符串末尾的指针。这样调用这个函数的代码就可以决定从该处拼接而不是重新扫描整个字符串：

    char bigString[1000];     /* I never know how much to allocate...我从来不知道要分配多大空间 */
    char *p = bigString;
    bigString[0] = '\0';
    p = mystrcat(p,"John, ");
    p = mystrcat(p,"Paul, ");
    p = mystrcat(p,"George, ");
    p = mystrcat(p,"Joel ");

This is, of course, linear in performance, not n-squared, so it doesn't suffer from degradation when you have a lot of stuff to concatenate.

性能而言这当然是线性的，而不是N次方的，所以当你有一大堆字符串要拼接的时候，它就不会被性能下降所困扰。

The designers of Pascal were aware of this problem and "fixed" it by storing a byte count in the first byte of the string. These are called Pascal Strings. They can contain zeros and are not null terminated. Because a byte can only store numbers between 0 and 255, Pascal strings are limited to 255 bytes in length, but because they are not null terminated they occupy the same amount of memory as ASCIZ strings. The great thing about Pascal strings is that you never have to have a loop just to figure out the length of your string. Finding the length of a string in Pascal is one assembly instruction instead of a whole loop. It is monumentally faster.

Pascal的设计者意识到了这个问题，并且通过在字符串的第一个字节里存储字节数来“修正“这个问题。这些就被称为Pascal字符串。它们可以包含0值并且不是NULL分隔的。然而因为一个字节只能存储0-255之间的数字，Pascal字符串长度就被限制在255字节以内，但因为他们不是NULL分隔的所以他们和ASCIZ消耗了相同数量的存储空间。Pascal字符串最大的好处就是不需要写个循环来获得字符串的长度。获取Pascal字符串的长度只需要一个汇编指令而不是一整个循环。那当然是快多了。

The old Macintosh operating system used Pascal strings everywhere. Many C programmers on other platforms used Pascal strings for speed. Excel uses Pascal strings internally which is why strings in many places in Excel are limited to 255 bytes, and it's also one reason Excel is blazingly fast.

旧的苹果计算机基本都是用Pascal字符串。许多在其他平台上的C程序员也会因为速度而采用Pascal字符串。 Excel内部也使用了Pascal字符串，这也是为什么Excel里面很多地方字符串被限制在255字节以内，当然这也是为什么Excel超级快的一个原因。

For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:

很长一段时间，如果你想要把Pascal字符串常量放在你的C代码里，你必须这样写：

    char* str = "\006Hello!";

Yep, you had to count the bytes by hand, yourself, and hardcode it into the first byte of your string. Lazy programmers would do this, and have slow programs:

是的，你得自己手动计算字符串字节数，然后把它硬编码进你字符串的第一字节。但懒的程序员就会这么做，然后就写出了很慢的程序:

    char* str = "*Hello!";
    str[0] = strlen(str) - 1;

Notice in this case you've got a string that is null terminated (the compiler did that) as well as a Pascal string. I used to call these fucked strings because it's easier than calling them null terminated pascal strings but this is a rated-G channel so you will have use the longer name.

注意，在这种情况下你得到的是NULL分隔的字符串（编译器保证）以及Pascal字符串。我曾经称其为“操蛋“字符串，因为其实称为NULL分隔的Pascal字符串要简单一些。不过这儿可是个G评价的频道，所以你还是用长点儿的名字吧。

I elided an important issue earlier. Remember this line of code?

我前面省略了个重要问题。还记得这行代码么？

    char bigString[1000];     /* I never know how much to allocate...我从来不知道要分配多少空间。。。 */

Since we're looking at the bits today I shouldn't have ignored this. I should have done this correctly: figured out how many bytes I needed and allocated the right amount of memory. Shouldn't I have?

因为我们今天在研究比特流，所以我不应该忽略这个的。我应该正确的完成这件事情：先搞清楚我需要多少字节然后分配准确数量的内存。难道不是么？

Because otherwise, you see, a clever hacker will read my code and notice that I'm only allocating 1000 bytes and hoping it will be enough, and they'll find some clever way to trick me into strcatting a 1100 byte string into my 1000 bytes of memory, thus overwriting the stack frame and changing the return address so that when this function returns, it executes some code which the hacker himself wrote. This is what they're talking about when they say that a particular program has a buffer overflow susceptibility. It was the number one cause of hacks and worms in the olden days before Microsoft Outlook made hacking easy enough for teenagers to do.

因为否则，如你所见，一个聪明的黑客就会读到我的代码然后注意到我只分配了1000字节并且希望这就够了，而他们就会找到一些巧妙的方法骗我把1100字节的字符串拼接进我1000字节的内存，然后就可以覆盖我的帧栈并且修改函数返回地址，最后当这个函数返回的时候，它就会去执行黑客自己写的代码。这就是他们讨论说某个程序有可能缓冲区溢出漏洞时会发生的情况。这就是过去黑客攻击和蠕虫病毒的首要诱因。当然这是微软OUTLOOK让黑客攻击容易到青少年就能做成之前的旧时代了。

OK, so all those programmers are just lame-asses. They should have figured out how much memory to allocate. But really, C does not make this easy on you. Let's go back to my Beatles example:

恩，所有的这些程序员都是SB。他们应该搞清楚要分配多少内存。但是实际上，C语言并没有使这项工作对你变得容易。我们来回顾下我的甲壳虫例子：

    char bigString[1000];     /* I never know how much to allocate... */
    char *p = bigString;
    bigString[0] = '\0';
    p = mystrcat(p,"John, ");
    p = mystrcat(p,"Paul, ");
    p = mystrcat(p,"George, ");
    p = mystrcat(p,"Joel ");

How much should we allocate? Let's try doing this The Right Way.

我们要分配多少呢？让我们尝试用正确的方法来做这件事情能够。

    char* bigString;
    int i = 0;
    i = strlen("John, ") 
         + strlen("Paul, ") 
         + strlen("George, ") 
         + strlen("Joel ");
    bigString = (char*) malloc (i + 1); 
    /* remember space for null terminator! */
    /* 记住要给NULL分隔符分配空间 */
    ...

My eyes glazeth over. You're probably about ready to change the channel already. I don't blame you, but bear with me because it gets really interesting.

我的眼睛发呆完毕。你也许已经正要准备换台了，我不会怪你。不过耐心点儿，因为事情开始变得有意思了。

We have to scan through all the strings once just figuring out how big they are, then we scan through them again concatenating. At least if you use Pascal strings the strlen operation is fast. Maybe we can write a version of strcat that reallocates memory for us.

我们必须扫描所有的字符串以确定他们到底有多大，然后我们在拼接的时候要再扫描一遍。至少如果你使用Pascal字符串的话strlen操作就会很快。也许我们可以写一个新版本的strcat来帮我们重新分配所需内存空间。

That opens another whole can of worms: memory allocators. Do you know how malloc works? The nature of malloc is that it has a long linked list of available blocks of memory called the free chain. When you call malloc, it walks the linked list looking for a block of memory that is big enough for your request. Then it cuts that block into two blocks -- one the size you asked for, the other with the extra bytes, and gives you the block you asked for, and puts the leftover block (if any) back into the linked list. When you call free, it adds the block you freed onto the free chain. Eventually, the free chain gets chopped up into little pieces and you ask for a big piece and there are no big pieces available the size you want. So malloc calls a timeout and starts rummaging around the free chain, sorting things out, and merging adjacent small free blocks into larger blocks. This takes 3 1/2 days. The end result of all this mess is that the performance characteristic ofmalloc is that it's never very fast (it always walks the free chain), and sometimes, unpredictably, it's shockingly slow while it cleans up. (This is, incidentally, the same performance characteristic of garbage collected systems, surprise surprise, so all the claims people make about how garbage collection imposes a performance penalty are not entirely true, since typical malloc implementations had the same kind of performance penalty, albeit milder.)

这就打开了另一整罐的蠕虫。内存分配器。你知道malloc是如何工作的么？malloc的本质是它维护了一个称为空闲链的很长的可用内存空间链表。当你调用malloc，它就会遍历这个链表寻找足够你请求空间那么大的空闲内存块。然后他就把这块内存切成两部分，一部分就是你要的大小，另一部分就是多余的部分，把你请求的内存返回给你，然后把剩下的内存块（如果有的话）放回到链表里。当你调用free的时候，它就会把你要释放的内存块加到空闲链表中。最终，空闲链表里的块就会被越切越小，当你请求大的块空间的时候，哪里没有你所要大小的块空间。所以malloc会调用一个超时然后翻箱倒柜的整理空闲链表，把东西整理清楚，合并相邻的小空闲块整理成大空闲块。这也得花上个三两天，所有这一团糟设计的最终结果就是malloc的性能表现不佳，它从来没有快过（因为要遍历整个空闲链），甚至有时在它进行整理的时候，会无法预测地 令人震惊地慢（这内在的和垃圾回收器系统的性能表现是一致的，意料之外吧，所以当人们说使用垃圾回收系统会造成性能下降并不完全准确，因为通常malloc实现也会有这种性能开销，当然会温和一点儿。）

Smart programmers minimize the potential distruption of malloc by always allocating blocks of memory that are powers of 2 in size. You know, 4 bytes, 8 bytes, 16 bytes, 18446744073709551616 bytes, etc. For reasons that should be intuitive to anyone who plays with Lego, this minimizes the amount of weird fragmentation that goes on in the free chain. Although it may seem like this wastes space, it is also easy to see how it never wastes more than 50% of the space. So your program uses no more than twice as much memory as it needs to, which is not that big a deal.

聪明的程序员会通过总是分配2整数幂次大小的内存块空间来最小化malloc整理造成的性能中断。也就是：4字节，8字节，16字节，18446744073709551616字节等等。这最小化了空闲链中的碎片化数量。原因我想对玩过乐高积木的人都很直观。虽然看起来这有点浪费空间，但是很容易就发现它怎么都不会浪费超过一半的空间。所以你的程序最多也不会使用超过两倍的所需空间，这当然也不是什么大问题。

Suppose you wrote a smart strcat function that reallocates the destination buffer automatically. Should it always reallocate it to the exact size needed? My teacher and mentor Stan Eisenstat suggests that when you call realloc, you should always double the size of memory that was previously allocated. That means that you never have to call realloc more than lg n times, which has decent performance characteristics even for huge strings, and you never waste more than 50% of your memory.

假定你实现了一个很智能的strcat函数，该函数能够自动的重新分配目标缓冲区的空间。那么每次都应该只分配精确的所需空间大小么？我的导师Stan Eisenstat 建议当调用realloc的时候，你每次都应该分配前一次分配的两倍空间。这也意味着你永远不要调用realloc超过lgn次，这即使是对于巨型字符串也是相当不错的性能数据了，而且你永远也不会浪费超过50%的内存空间。

Anyway. Life just gets messier and messier down here in byte-land. Aren't you glad you don't have to write in C anymore? We have all these great languages like Perl and Java and VB and XSLT that never make you think of anything like this, they just deal with it, somehow. But occasionally, the plumbing infrastructure sticks up in the middle of the living room, and we have to think about whether to use a Stringclass or a StringBuilder class, or some such distinction, because the compiler is still not smart enough to understand everything about what we're trying to accomplish and is trying to help us not write inadvertent Shlemiel the Painter algorithms.

总之，在底层到字节流的地方，生活总是变得越来越糟。你再也不需要写C，你难道不觉得高兴么？我们所有的这些伟大的编程语言例如Perl，Java，VB以及XSLT 他们可以让你不需要考虑任何像这样的东西，它们会帮你处理好这些，但时不时的这整个架构就会竖在你起居室的中间，然后你就要考虑到底是用String类还是用StringBuilder类，或者是其他类似的东西，因为编译器还没有聪明到能够理解我们想要完成的所有事情或者是尝试帮助我们不要编写无用的Shlemiel画家算法。

Last week I wrote that you can't implement the SQL statementSELECT author FROM books fast when your data is stored in XML. Just in case everybody didn't understand what I was talking about, and now that we've been rolling around in the CPU all day, this assertion might make more sense.

上周我提到了，当你的数据存储在XML里的时候你无法实现很快的SQL语句 SELECT author FROM books。为了避免没人听懂我在说什么的情况，而且我们现在整天都在讨论CPU，这个断言就显得更加有意义

How does a relational database implement SELECT author FROM books? In a relational database, every row in a table (e.g. the bookstable) is exactly the same length in bytes, and every fields is always at a fixed offset from the beginning of the row. So, for example, if each record in the books table is 100 bytes long, and the author field is at offset 23, then there are authors stored at byte 23, 123, 223, 323, etc. What is the code to move to the next record in the result of this query? Basically, it's this:

关系数据库是如何实现SELECT author FROM books这个SQL语句的呢？在关系数据库中，所有表中的行（例如books表）都精确的是同一长度的，所有的字段离行起始的偏离总是固定的。例如，如果books表里的记录总是100字节长，author字段位于偏离23处，那么就会有author字段被存储在偏离为23，123，223，323等等的地方。在这个查询里移动到下一个记录的代码是什么样子的？基本上就是这个样子：

    pointer += 100;

One CPU instruction. Faaaaaaaaaast.

仅一个CPU指令。非常快。

Now lets look at the books table in XML.

现在让我们来看看XML里的books表

    <?xml blah blah>
    <books>
         <book>
              <title>UI Design for Programmers</title>
              <author>Joel Spolsky</author>
         </book>
         <book>
              <title>The Chop Suey Club</title>
              <author>Bruce Weber</author>
         </book>
    </books>

Quick question. What is the code to move to the next record?

简单的问个问题，移动到下一个记录的代码是什么样子的？

Uh...

额…

At this point a good programmer would say, well, let's parse the XML into a tree in memory so that we can operate on it reasonably quickly. The amount of work that has to be done here by the CPU to SELECT author FROM books will bore you absolutely to tears. As every compiler writer knows, lexing and parsing are the slowest part of compiling. Suffice it to say that it involves a lot of string stuff, which we discovered is slow, and a lot of memory allocation stuff, which we discovered is slow, as we lex, parse, and build an abstract syntax tree in memory. That assumes that you have enough memory to load the whole thing at once. With relational databases, the performance of moving from record to record is fixed and is, in fact, one CPU instruction. That's very much by design. And thanks to memory mapped files you only have to load the pages of disk that you are actually going to use. With XML, if you preparse, the performance of moving from record to record is fixed but there's a huge startup time, and if you don't preparse, the performance of moving from record to record varies based on the length of the record before it and is still hundreds of CPU instructions long.

在这个点上，一个好的程序员就会说，好，让我们把这个XML在内存里解析成树，这样我们就能在上面相对快速的进行操作。在这里CPU为了执行SELECT author FROM books要做的工作绝对会把你烦得眼泪掉下来。因为每个编译器作者都知道，词法和解析是编译过程中最慢的环节。因为随着我们进行词法分析，并且在内存里构建抽象语法树（假定你能一次把所有东西都装载到内存）,这过程包含了大量的字符处理内容，我们发现这是慢的；而且包含了大量的内存分配，我们发现这也是慢的。如果是关系数据库，从一条记录移动到另一条记录的性能开销是固定的，实际上，只有一个CPU指令。这是完全是设计带来的。并且多亏了内存映射文件，你可以仅装载磁盘上你实际需要使用的页面进内存。而XML，如果你预解析，那么从一个记录移动大另一个记录的时间就是固定的但是就会带来巨大的启动时间，如果你不预解析，那么从一个记录移动到另一个记录的性能就会随记录的长度变化，并且始终会是成百上千的CPU指令那么长

What this means to me is that you can't use XML if you need performance and have lots of data. If you have a little bit of data, or if what you're doing doesn't have to be fast, XML is a fine format. And if you really want the best of both worlds, you have to come up with a way to store metadata next to your XML, something like Pascal strings' byte count, which give you hints about where things are in the file so that you don't have to parse and scan for them. But of course then you can't use text editors to edit the file because that messes up the metadata, so it's not really XML anymore.

这对我来说就意味着：如果你有大量的数据而且你需要性能，那么你就不能使用XML。如果你只有一点点数据，或者你在做些不需要很快的事情，XML是个不错的格式。如果你确实想要两个世界的长处，你得想个办法把元数据存在你XML的旁边，就像Pascal字符串的长度一样，告诉你东西在文件中的存放位置线索这样你就不需要扫描然后解析这些它们。不过，然后你就不能使用文本编辑器来编辑文件，因为那样会弄乱那些元数据，所以这又不再是XML了。

For those three gracious members of my audience who are still with me at this point, I hope you've learned something or rethought something. I hope that thinking about boring first-year computer-science stuff like how strcat and malloc actually work has given you new tools to think about the latest, top level, strategic and architectural decisions that you make in dealing with technologies like XML. For homework, think about why Transmeta chips will always feel sluggish. Or why the original HTML spec for TABLES was so badly designed that large tables on web pages can't be shown quickly to people with modems. Or about why COM is so dang fast but not when you're crossing process boundaries. Or about why the NT guys put the display driver into kernelspace instead of userspace.

对于那些在这点还在看下去的亲爱的读者们，我希望你们学到了一些东西或者至少重新思考了一些东西。我希望思考无聊的计算机科学第一学年的内容，诸如strcat和malloc是如何工作的，已经给了你新的工具来思考最近的，最高层的，战略架构决策当你处理像XML那样的技术的时候。作为课外作业，思考为什么全美达芯片总是感觉很慢，或者为什么原来的HTML TABLE规范设计的如此糟糕以至于web网页里的大表格不能很快的呈现给使用猫上网的人们，或者为什么COM是如此快，但一旦跨越进程边界就没那么快了？或者为什么NT的设计者们决定把显卡驱动放到内核态而不是用户态。

These are all things that require you to think about bytes, and they affect the big top-level decisions we make in all kinds of architecture and strategy. This is why my view of teaching is that first year CS students need to start at the basics, using C and building their way up from the CPU. I am actually physically disgusted that so many computer science programs think that Java is a good introductory language, because it's "easy" and you don't get confused with all that boring string/malloc stuff but you can learn cool OOP stuff which will make your big programs ever so modular. This is a pedagogical disaster waiting to happen. Generations of graduates are descending on us and creating Shlemiel The Painter algorithms right and left and they don't even realize it, since they fundamentally have no idea that strings are, at a very deep level, difficult, even if you can't quite see that in your perl script. If you want to teach somebody something well, you have to start at the very lowest level. It's like Karate Kid. Wax On, Wax Off. Wax On, Wax Off. Do that for three weeks. Then Knocking The Other Kid's Head off is easy.

这些是让你要思考字节的事情，而且他们影响到了我们在做各种关于架构和战略的高层大决定。这也是为什么我对于教育的观点是一年级计算机专业学生应该从C学起，从CPU开始学起。我实在是感到浑身不舒服有那么多的计算机科学项目觉得Java是个很好的介绍语言，因为它很“容易“而且你不会被那些无聊的字符串/内存分配东西困扰并且可以开始学很酷的面向对象东西（面向对象能够让你的大程序变得更加模块化）。这简直就是等着发生的教学灾难。数代的毕业生围绕在我们周围并且大约会写出Shlemiel粉刷算法而且不会意料到，因为他们不知道字符串本质上，在非常深的层面上，是非常难的。甚至你不能从你的PERL脚本里看出来，如果你要教好某人一些东西，你得从最底层开始。就像电影空手道小孩里那样。上蜡，刮蜡。上蜡，刮蜡。这样坚持练习3个礼拜，再击败其他小孩就轻而易举了。