JoelOnSoftware

Joel on Software

Craftsmanship

手艺

by Joel Spolsky Monday, December 01, 2003

Making software is not a manufacturing process. In the 1980s everyone was running around terrified that Japanese software companies were setting up "software factories" that could churn out high quality code on an assembly line. It didn't make any sense then and it doesn't make sense now. Shoving a lot of programmers into a room and lining them up in neat rows did not really help get the bug counts down.

制作软件不是生产过程。 在20世纪80年代,每个人都在奔走相告说日本人已经设立了“软件工厂”能够通过流水线生产书高质量的软件代码。 在那个时代这完全是胡扯,现在也一样。 把一堆程序员赶紧一个房间,把他们整齐的排成一行行并不能把软件错误数量降下来。

If writing code is not assembly-line style production, what is it? Some have proposed the label craftsmanship. That's not quite right, either, because I don't care what you say: that dialog box in Windows that asks you how you want your help file indexed does not in any way, shape, or form resemble what any normal English speaker would refer to as "craftsmanship."

Writing code is not production, it's not always craftsmanship (though it can be), it's design. Design is that nebulous area where you can add value faster than you add cost. The New York Times magazine has been raving about the iPodand how Apple is one of the few companies that knows how to use good design to add value. But I've talked enough about design, I want to talk about craftsmanship for a minute: what it is and how you recognize it.

话说回来,如果编写软件不是流水线式的生产,那它又是什么呢? 有的人提议将其归类为手艺。 这也不是很准确, 因为那个在Window里让你选择如何索引你的帮助文件的对话框 无论从何种角度:形状 形式 都无法代表了人们心目中的“手艺”的概念。

写代码不是生产过程,也不总是手艺(虽然可以),它更像是一种设计。 设计是那种模糊不清的,增加价值比增加消耗更容易的领域。 纽约时报曾经报道Ipod以及苹果公司为为数不多的几家知道如何运用设计为产品增加价值的公司。 不过我已经说了很多设计了, 我想花几分钟讨论下手艺: 什么是这种手艺,你如何认识这种手艺。

I'd like to tell you about at a piece of code I've been rewriting for CityDesk 3.0: the file import code. (Advertainment: CityDesk is my company's easy-to-use content management product.)

The spec seems about as simple as any snippet of code can be. The user chooses a file using a standard dialog box, and the program copies that file into the CityDesk database.

我想聊聊我最近帮CityDesk3.0写的一段代码:关于文件导入的。 (广告:CityDesk是我们公司推出的易用的内容管理产品)

规范看起来像是任何简单的代码片段都能完成的那样。 用户通过一个标准对话框选择一个文件,然后CityDesk程序就会把这个文件导入到系统数据库。

This turned out to be a great example of one of those places where "the last 1% of the code takes 90% of the time." The first draft of the code looked like this:

1 . Open the file

2 . Read it all into a big byte array

3 . Store the byte array in a record

结果这项功能成了那个经典的“1%的代码是如何花掉90%时间”讨论的又一个样板。 第一版的代码实现的功能大概是这个样子的:

1 . 打开文件

2 . 把文件全部读入大字节数组

3 . 将字节数组存入数据记录

Worked great. For reasonable sized existing files it was practically instantaneous. It had a few little bugs, which I worked through one at a time.

The big bug surfaced when I stress-tested it by dragging a 120 MB file into CityDesk. Now, it is not common by any means for people to post 120 MB files on their web sites. In fact, it's quite rare. But it's not impossible, either. The code worked but took almost a minute and provided no visual feedback -- the app just froze and appeared to be completely locked up. This is obviously not ideal.

完美的工作, 对于合理大小的存在文件,上传几乎是瞬间的事情。我也遇到一些小的错误,不过我一个个的解决了。

大麻烦是当我开始压力测试这个功能的时候,我拖了一个120MB的文件进CityDesk。 现在虽然不管怎么说人们都已经很少发布120MB的文件到网站了。 事实上确实很少,但这也还是可能的。这段代码还能工作,不过花了几乎整整一分钟,而且没有任何的视觉反馈 – 程序就像冻结住了那样,看起来像完全被锁住了。 很显然这不是理想状态。

From a UI perspective, what I really wanted was for long operations to bring up a progress bar of some sort, along with a Cancel button. In the ideal world you would be able to continue doing other operations with CityDesk while the file copy proceeded in the background. There were three obvious ways to do this:

1 . From a single thread, polling frequently for input events

2 . By launching a second thread and synchronizing it carefully

3 . By launching a second process and synchronizing it less carefully

从用户界面设计的角度, 我实际想要的就是对于这样的长时间的操作,系统会弹出某个进度条,以及一个取消按钮。 理想情况下,在文件在后台上传的时候你应该还是要能够用CityDesk进行其他操作。 有三种最直观的方法来达成这个设计:

1 . 单线程, 频繁的轮询用户的输入信息。

2 . 开第二个线程,小心的进行线程同步。

3 . 开第二个进程,粗心的进行进程同步。

My experience with #1 is that it never quite works. It is too hard to ensure that all the code throughout your application can be run safely while a file copy operation is in progress. And Eric S. Raymond has convinced me that threads are usually not as good a solution as separate processes: indeed years of experience have shown me that programming with multiple threads creates much additional complexity and introduces whole new categories of dangerously frightful heisenbugs. #3 seemed like a good solution, especially since our underlying database is multiuser and doesn't mind lots of processes banging on it at the same time. So that's what I'm planning to do when I get back from Thanksgiving vacation.

我对第一个策略的经验是基本上不可能。 要保证在文件上传的同时你软件的其他功能运转正常太困难了。 EricRaymond说服了我,相比线程,进程会是解决问题的更好策略:实际上我多年的编程经验也告诉我,多线程引入了更多的复杂度,引入了一整个类别的危险可怕的黑森错误 。 第三个策略似乎是个不错的解决方案, 特别是我们的数据库是多用户的,所以它不会介意多个进程同时敲它家门。 所以这就是我放感恩节假后回来打算做的事情。

Notice, though, the big picture. We've gone from read the file/save it in the database to something significantly more complicated: launch a child process, tell it to read the file and save it in the database, add a progress bar and cancel button to the child process, and then some kind of mechanism so the child can notify the parent when the file has arrived so it can be displayed. There will also be some work passing command line arguments to the child process, and making sure the window focus behaves in an expected manner, and handling the case of the user shutting down their system while a file copy is in progress. I would guesstimate that when all is said and done I'll have ten times as much code to handle large files gracefully, code that maybe 1% of our users will ever see.

注意,恩,注意高层次的场景。 我们已经从读/写文件然后存入数据库转换到了要复杂的多的场景:启动一个子进程,告诉它去读取文件然后存入数据库, 添加一个进度条和取消按钮到子进程上, 然后通过某种机制,子进程能够通知父进程文件已经寸好了所以能显示了。 当然也还会有一些传递行参数到紫禁城的工作, 还会有一些确保窗口焦点正确的工作, 还要考虑用户在上传文件的时候关机的情况。 我估算了一下当所有这些都确定下来的时候我大概要花10倍的时间来完美的处理大文件,而这些代码可能只有1%的用户才会用到。

And of course, a certain type of programmer will argue that my new child-process architecture is inferior to the original. It's "bloated" (because of all the extra lines of code). It has more potential for bugs, because of all the extra lines of code. It's overkill. It's somehow emblematic of why Windows is an inferior operating system, they will say. What's all this about progress indicators? they sneer. Just hit Ctrl+Z and then "ls -l" repeatedly and watch to see if the file size is growing!

当然,某些类型的程序员会争辩说我子进程式的架构要比原来的茶。 它已经“膨胀”(因为那些额外写的代码)了。 也因为这些额外的代码,它更容易包含潜在的错误。 这太多余了。 这同时几乎也意味着他们会争辩说为什么windows是更次的操作系统噢乖。 为什么要这些进度条?他们会冷笑。 你只要按Ctrl+Z然后重复的输入“ls –l”看文件大小是不是还在增长就可以了!

The moral of the story is sometimes fixing a 1% defect takes 500% effort. This is not unique to software, no sirree, now that I'm managing all these construction projects I can tell you that. Last week, finally, our contractor finally put the finishing touches on the new Fog Creek offices. This consisted of installing shiny blue acrylic on the front doors, surrounded by aluminium trim with a screw every 20 cm. If you look closely at the picture, the aluminium trim goes all the way around each door. Where the doors meet, there are two pieces of vertical trim right next to each other. You can't tell this from the picture, but the screws in the middle strips are almost but not exactlylined up. They are, maybe, 2 millimeters off. The carpenter working on this measured carefully, but he was installing the trim while the doors were on the ground, not mounted in place, and when the doors were mounted, "oops," it became clear that the screws were not exactly lined up.

这个故事的重点是有时候修复一个缺陷要花500%的代价。 不单单是软件行业, 是的,先生,因为我现在也管理着建筑工程,举个例子。 上周,终于我们的承包商完成了FogCreek办公室的所有装修。 这项工程包括了给前门刷上闪亮的蓝色油漆,同时要包上铝质的边框每20CM会有扭曲。 如果你仔细的看着画面, 铝质的边框包着所有的门。碰接的地方会有两块垂直的边框彼此挨着。 虽然从画面上看不出来,但是铝边的20CM扭曲导致这些垂直的中接虽然没有完全但也差不多对齐了,大约有2MM的误差。 负责这项工作的木工非常仔细, 不过他安装的时候这些们就放在地上, 而不是标准了放好, 当门装好的时候 “啊哦”, 看起来很显然因为这些扭曲,这些中接并没有完全对齐。

This is probably not that uncommon; there are lots of screws in our office that don't line up perfectly. The problem is that fixing this once the holes are drilled would be ridiculously expensive. Since the correct placement for the screws is only a couple of millimeters away, you can't just drill new holes in the door; you'd probably have to replace the whole door. It's just not worth it. Another case where fixing a 1% defect takes 500% effort, and it explains why so many artifacts in our world are 99% good, not 100% good. (Our architect never stops raving about some really, really expensive house in Arizona where every screw lined up.)

这可能很场景; 我们办公室里有很多这种没有完全对齐的地方。 问题就在于如果你想在框架都已经钻好的时候来修正这个问题,那么花费就变得异常昂贵了。 因为离正确对齐的摆放,这种扭曲造成的误差也就只有几个毫米, 你不可能在门上新钻个孔; 你大概会想直接把整个门换掉。 根本不值得那么去做。 这就是1%的缺陷花费500%代价的另一个例子,而且它也说明了为什么世界上只有99%的人造品是好的 而不是100%都是好的。(我们的设计师不停的跟我们讲只有Arizona州的某些非常非常昂贵的房子这些扭曲才是完全对齐来的。)

It comes down to an attribute of software that most people think of ascraftsmanship. When software is built by a true craftsman, all the screws line up. When you do something rare, the application behaves intelligently. More effort went into getting rare cases exactly right than getting the main code working. Even if it took an extra 500% effort to handle 1% of the cases.

Craftsmanship is, of course, incredibly expensive. The only way you can afford it is when you are developing software for a mass audience. Sorry, but internal HR applications developed at insurance companies are never going to reach this level of craftsmanship because there simply aren't enough users to spread the extra cost out. For a shrinkwrapped software company, though, this level of craftsmanship is precisely what delights users and provides longstanding competitive advantage, so I'll take the time and do it right. Bear with me.

这样说下来软件有一个品质人们还是会觉得像手艺的。 当软件是被一个真正的手艺人制作的时候, 所有的扭曲都会被完全对齐。 当你做些不常见的事情的时,程序会显的很智能。 当主要的功能开始工作的时候,要花更多的精力来让这些极端的情况完全正确。虽然这要花500%的代价去解决1%的情况。

手艺当然是非常昂贵的。 你能承受的起的唯一情况就是你在为大量的受众开发软件。 很抱歉,不过保险公司开发的内部人事系统永远达不到这种手工品的程度,因为没有足够的用户来承担这些额外的花费。 而对于一个包装袋软件公司,这种程度的手艺正是让用户满意 并且为用户带来长足有竞争力的优势, 所以我要花时间把这件事情做对。 等等吧。