
PC: On archives, FAQs and the best possible use of time - human time is more valuable than machine time



This is a response to Jerry's post of February 13, 2002, titled "The start
of the PC list archives."

While I have been just a 'lurker' here so far, I have received a great deal
of help from many of the posts and also from many of the members
individually.

You asked if you should spend your time and energy to create an archive of
the PC mailing list. Now the greedy and selfish answer to your question is
"Of course archiving is a great idea - and you are just the person to do
it." My answer, however, is don't even think of wasting your time in such a
way.

I am a computer scientist by desire if not yet by design, and I must say
that I find the thought of your archiving "by hand" five years' worth of
posts spiritually and intellectually horrifying. Please allow me to explain
why (and I will from here until I say otherwise address the group as a
whole, not just Jerry).

First off, what we are discussing here is more work for our list moderator,
the human. The only reason a human should ever do work is to solve some
problem. So the first question is: "Is there a problem with the list that
would be solved by archiving all of the posts?"

There is. I will list all the 'problems' that come immediately to mind:

1. There are many posts with good information.
2. It would be nice if I could see all of that good information without
having been a member of this mailing list from its inception.
3. I notice people ask a lot of the same questions that I saw answered last
year.
4. Instead of wasting the time of those with knowledge to share by having
them answer the same questions over and over, it would be nice if instead of
asking such a question, a new list user could look first to see if it has
already been addressed.

Our good-natured list moderator has perceived these problems, and is the
first to donate his valuable time to solve them - by creating an archive,
and offering to put it into a more usable format for us so as to save our
time.

But is that really the best solution? I will now segue, if I may, to a
pseudo history lesson on an analogous aspect of Internet culture,
engineering, and practice.

This mailing list we have here is a descendant of the early ARPAnet mailing
lists and a cousin of the usenet newsgroups that grew up beside them (long
before Gore had 'invented' it all as "the information superhighway"). This was back
before HTML - before SGML (the begrudged step-mother of the original HTML)
was turned into a big-spender's dream method of incurring debt - back when
the only thing on the net was text - in ASCII format. E-mail was king.
Usenet spawned a great many discussions and projects just as this mailing
list seems to have given birth to the PC Historical Society, or at least has
had no small part in its creation and maintenance.

Those early usenet groups ran into some problems - some of which I list
below:

1. There were many e-mails with good information.
2. It would have been nice if everyone could have seen all of that good
information without having been a member of any given usenet group from its
inception.
3. People asked a lot of the same questions that had previously been
answered.
4. Instead of wasting the time of those with knowledge to share by having
them answer the same questions over and over, it would have been nice if
instead of asking such a question, a new list user could simply have looked
first to see if it had already been addressed.

And so the usenet group would usually create some accessible archive of all
of the e-mails that had been sent. An archive was typically a
chronologically ordered collection of every e-mail ever sent in a usenet
group - and arranged into files by some specified amount of time - like the
yearbooks of well-read periodicals.

These archives were created by computer programs written to perform the
repetitive work of sorting e-mails into groups of varying sizes and/or
dates. The list managers were happy and everyone was amazed at how easy it
was to find information.

But then list managers began to notice a recurrence of problems 3 and 4 as
listed just above. They checked their systems - everything was working.
There was a new problem. The archives had grown so large that they were a
pain to go through, especially for those who joined the group for what they
thought should be a quick and simple answer to their questions. Also, no one
wanted to wade through 19 e-mails of "Hi, how are you" content only to find
that the 20th was information on something unrelated to their problem, and
then to repeat that for 1000 e-mails to find that their question had not
been asked.

Luckily (for us - not in terms of chance), the usenet group managers hit
upon an amazingly scalable solution. They, too, saw the need for
organization by subject. But they realized that re-archiving every single
e-mail by hand could not possibly have been the correct solution. Instead,
they decided to use only a small sample of those e-mails to give their
semi-exclusive (meaning private in terms of a community) usenet groups an
all-inclusive front-end. They made FAQs - lists of frequently asked
questions, answered by information culled from the large pool of e-mails
available in the group - with excess verbiage removed. Every usenet group
worth its salt had a FAQ. At the top was a list of every single question -
accurately but succinctly phrased - so that the user could peruse them to
see if his/hers had been answered - then further on down the page was the
question with its answer.

In this manner, people who were not subscribed to any given usenet group
could view its FAQ in search of answers to their questions or even just for
information in general - without bothering the list members (and, more
importantly at that time, the hardware that stored and parsed all the
e-mails) by asking repeat or trivial questions. Usenet groups in this manner
became great 'think-tanks' and resulted in many things we take for granted
today - such as computing standards and many software products themselves -
all because the list members with the valuable knowledge and skills were
unencumbered by silly questions from pushy people who should have gone to
the library instead - but could now just visit the FAQ to see the great
thoughts of great thinkers.

Let's come back to 2002. No doubt my point was not lost on anyone reading
this. The 'problems' this list faces are the same as the ones faced by those
original usenet groups. As I am sure everyone has assumed, my vote (for what
it counts) is for solving them the same way that the usenet elders decided
to solve theirs - by creating a well-designed FAQ. But only if such an
amount of work is really justifiable. And that leads me to my next leap into
computer science culture - a look at what needs to be done and how it needs
to be done.

The first step is to look at what we are dealing with - and that is a record
of posts to a mailing list.

Each post contains for our purposes three useful features: (1) the date of
the post, (2) the subject line of the post, and (3) the body, or content, of
the post.
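
As a rough illustration of those three features, here is a minimal Python
sketch that pulls them out of one plain-text post using the standard
library. The sample message and the name extract_features are made up for
the example; a real list export may need extra handling for attachments or
odd encodings.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message, used only for illustration.
RAW_POST = """\
From: someone@example.com
Date: Wed, 13 Feb 2002 10:15:00 -0500
Subject: PC: The start of the PC list archives

Should I spend time building an archive?
"""


def extract_features(raw):
    """Return the (date, subject, body) of one plain-text post."""
    msg = message_from_string(raw)
    date = parsedate_to_datetime(msg["Date"])  # feature 1: date of the post
    subject = msg["Subject"]                   # feature 2: subject line
    body = msg.get_payload()                   # feature 3: body text
    return date, subject, body


date, subject, body = extract_features(RAW_POST)
print(date.date(), "|", subject, "|", body.strip())
```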

We have identified three feasible methods of collecting these posts into a
usable and useful pool of information: (1) a chronological archive, as first
attempted by the usenet moderators, (2) a FAQ list, and (3) as alluded to by
Jerry, an archive arranged into collections of posts by subject.

So our choice is really between an archive (however it is arranged) and a
FAQ. I will now attempt to explain each in terms of each method's features.

1. The Archive
1a. is a collection of every single post to the mailing list.
1b. contains many words in each post that do not pertain to the subject of
its sorted group.
1c. can be easily automated, though not with especially high quality.
1d. can be searched by keyword - and from within an e-mail query, as well,
depending on the list-server used.
1e. can be accessed by all members of the list.

2. The FAQ
2a. is a collection of only the most important or most interesting
information presented in the e-mails.
2b. contains only words that pertain to the subject of each question.
2c. provides a very high-quality information source, though it cannot be
easily automated.
2d. can be searched by keyword - though not from within an e-mail query,
except with a great deal of programming work and implementation.
2e. can be accessed by all members of the group, but also by non-members.

So we know our problem(s), our system and materials, and our methods of
solution. How can we answer Jerry's questions?

What follows is what Jerry said and also my replies to each point.

>>> I'm trying to find out (a) if this is something everyone
>>>would be interested in seeing, and (b) if so, is it in a
>>>useful format.

(a) An archive is a great idea.
(b) Archives as I have always known them come in one format - chronological.
I saw the archive as it exists, and it, too, is chronological. However, this
question implies that the idea of arranging an archive by some other method
is being considered. I will address that below.

>>>To begin with, the archive will be updated
>>>manually, probably once a month or so.

Bad, bad idea. Jerry's time is too valuable to us (and more importantly, to
him) to be wasted on monotonous, repetitive tasks. This one in
particular just screams for a computer program to carry it out.

>>>Eventually, once I finally get the
>>>new web and mail server installed, I'll set it up so that
>>>messages will be archived automatically as they are
>>>posted.

When Jerry does get this, he may kick himself for having archived anything
by hand - the new system will no doubt include some utility for archiving
all previous posts as well as then archiving the new ones as they are
posted. Even if it doesn't, there are free programs that are able to do
these things.

That covers my reactions to Jerry's e-mail content specifically. Now allow
me to continue with my ideas on why wasting Jerry's time by asking him for a
manual archive is a bad idea; this is addressed to Jerry directly.

I think it may be best if you use your time more wisely and for things that
will be less boring (for you to do, not for us to read through).

While a formatted archive can be a fairly useful tool for some lists in
which every e-mail has valuable information that would lead to hundreds of
man-hours spent on faq creation, this is not one of those lists. I believe
one could distill all the knowledge that is to be gleaned here into a very
small, though ever-expanding, faq.

Allow me to explain myself. This list sends me 3-4 e-mails per day on
average, if any at all. Typically, every 6th post addresses a topic that has
not yet been addressed, but let's give the benefit of the doubt and say that
a full 20% are not replies to previous posts. Most of the ones that are
replies to posts offer no new/unique information. Additionally, about half
of the 20% that are not replies are posts such as "Great site, Jerry," "When
is the next PC convention," etc. - posts that may spawn the presentation of
useful information, but do not in themselves contain it or even pose
interesting questions. At best 10% of the posts are useful information -
please understand I am not saying that most are devoid of any value
whatsoever - just that their inclusion in an archive will offer only dead
weight to be sifted through when I am searching for an answer to a question
I might have.

I feel I can back up my opinion by asking one rhetorical question: Would you
format the archive by subject instead of by date? I feel that your post has
already answered this question - you seem to see an archive by subject as
the more useful option (and rightly so). One difficulty with the archival
idea is that it is the right solution to the wrong problem, or perhaps more
correctly, the wrong solution to the right problem. Another difficulty is
that creating such an archive is exactly the type of task which led early
computer scientists to say things like "computers will make things easier."

There are programs that will archive the posts by date into groups, the size
of which you can determine beforehand. It will take a program maybe a night
(if it's poorly written, running on a slow machine, and has a huge workload)
to archive all of the messages. Your time is worth more than that. Better to
let the computer do the work for you - if someone thinks they must have an
archive of every single message, let them hunt by date of post. And if you
feel really generous, you can even add a specialized list archive search
program - available for free, I believe. You do enough just to moderate the
group and keep your site up to date as perhaps the best of its kind, bar
none - no need to exhaust yourself with menial tasks that brainless
computers can do more quickly.
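
To give a feel for how little work the chronological version is for a
machine, here is a minimal Python sketch that sorts posts into monthly
groups. It is not any particular archiving program; the function name
archive_by_month and the three sample posts are invented for illustration,
and it assumes every post carries a well-formed Date header.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def archive_by_month(raw_posts):
    """Group raw posts into {'YYYY-MM': [(date, subject), ...]}, oldest first."""
    buckets = defaultdict(list)
    for raw in raw_posts:
        msg = message_from_string(raw)
        when = parsedate_to_datetime(msg["Date"])
        buckets[when.strftime("%Y-%m")].append((when, msg["Subject"]))
    # Sort the months, and the posts within each month, chronologically.
    return {month: sorted(buckets[month]) for month in sorted(buckets)}


# Hypothetical posts, deliberately out of order.
SAMPLE_POSTS = [
    "Date: Tue, 05 Mar 2002 09:00:00 +0000\nSubject: Convention dates\n\nWhen is it?\n",
    "Date: Wed, 13 Feb 2002 10:15:00 +0000\nSubject: List archives\n\nShould I?\n",
    "Date: Fri, 01 Feb 2002 08:30:00 +0000\nSubject: Great site\n\nThanks!\n",
]

archive = archive_by_month(SAMPLE_POSTS)
for month, posts in archive.items():
    print(month, [subject for _, subject in posts])
```

The group size here is fixed at one month; changing the strftime pattern to
"%Y" would give yearly groups instead.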

That doesn't solve your problem though - but neither would the idea of
organizing the archive by subject. So what is your problem? An archive takes
care of "I have x number of posts - it would be nice if anyone could see any
one of them at any time." I believe your real problem, though, is "I have
seen a lot of posts with really good questions, and many with great answers;
there must be a way to collect these into one central location whereby all
worthy knowledge could be dispensed." If I am correct in thinking that, then
the solution is contained in the logic of properly formulating the problem.
Questions with answers - a FAQ. Every one of the useful usenet groups to
this day has an extensive FAQ associated with it. You have the perfect
vehicle for such a FAQ already in place - your PC site. I think a FAQ is a
better medium for what you hope to accomplish.

What is the difference, you may ask? After all, isn't an
archive-organized-by-subject just like a FAQ? Yes - but mostly no. The
Internet comes from a long line of formal and informal standards - standards
that have been broken by the business-on-the-web mentality so prevalent
nowadays. To those of us who care, an archive is quite different from a faq.
You have here a private mailing list - outsiders presumably cannot penetrate
its murky depths without becoming an insider. You have knowledge culled from
many insiders that could be useful to everyone - not just for other
insiders. The purpose of the original faqs was to take the archived
information from mailing lists and distill it into one large document
organized by questions which were in turn organized by subject. This also
presented a way of giving the information out to a larger community - to
eliminate the pollution of the mailing list by simplistic questions that had
been answered by previous posts. FAQs are sometimes created automatically,
but the useful ones are done by hand. They are created by picking the most
repeated subjects (found by skimming the subject lines, not the bodies of
posts - a strong argument against weak subject lines) and then distilling
the useful parts of the threads into a single frequently asked question (and
of course its answer). Other posts with very interesting subject matter
(again located by subject line) would be added to the list of repeated
threads.
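
The skimming of subject lines is the one part of FAQ-building a program can
help with. Below is a minimal Python sketch, assuming the subject lines have
already been collected into a list; the names normalize_subject and
faq_candidates, the sample subjects, and the threshold of two posts are all
my own choices for illustration. It only suggests topics - writing the
question and answer is still a job for a human.

```python
import re
from collections import Counter

# Matches leading reply/forward markers such as "Re:", "RE:", "Fwd:".
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply markers and case so a thread counts as one topic."""
    return REPLY_PREFIX.sub("", subject).strip().lower()


def faq_candidates(subjects, min_posts=2):
    """Return (topic, count) pairs for topics seen at least min_posts times."""
    counts = Counter(normalize_subject(s) for s in subjects)
    return [(topic, n) for topic, n in counts.most_common() if n >= min_posts]


# Hypothetical subject lines.
SUBJECTS = [
    "Where to find parts",
    "Re: Where to find parts",
    "RE: Re: Where to find parts",
    "Great site, Jerry",
    "When is the next PC convention",
    "Re: When is the next PC convention",
]

print(faq_candidates(SUBJECTS))
```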

Let's imagine that you decide to create a FAQ to solve your problem. You
spend maybe 2 hours deciding on a subject breakdown into which you can
organize the questions. Then you start skimming all of the subject lines to
find topics of interest or topics that are repeated often enough to justify
adding a question to the faq - 5 hours later, you're done. Then you spend
maybe 13 hours taking all of those e-mails you collected and copying the
salient points over into a well written question and answer format. In the
end you spent 20 hours and converted all 20 of those hours into a valuable
(in terms of usable and useful information) FAQ. 100% of the time and 100%
of the value.

But let's imagine instead that you go ahead and archive by subject, by hand.
You have to decide on a subject breakdown that would be best - 2 hours. You
have to read each post's subject line - 5 hours. Then once you have them
somewhat organized that way, you have to read the body of each post to be
sure you have it under the right subject - 5 hours. As I said earlier,
roughly 10% of all those posts will be useful, usable information. So you
spent 60% of the time (comparatively - 12 of 20 hours) to create a product
with 10% value.

So the question now is how can you justify a time savings of 40% for a
quality loss of 90%? Especially considering we get the same 10% quality if
you spend "0 time" and just let a computer organize it chronologically.
(None of this, by the way, is intended to be correct in terms of
mathematical probability or analysis - just approximation)
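
For anyone who wants to check the rough arithmetic, it can be written out as
a few lines of Python. The hour figures and the 10% estimate are the guesses
made above, nothing more; the variable names are only for this sketch.

```python
# Hour estimates from the two scenarios above; they are rough guesses.
faq_hours = 2 + 5 + 13      # subject breakdown + skimming + writing up
archive_hours = 2 + 5 + 5   # subject breakdown + subject lines + bodies

useful_fraction = 0.10      # estimated share of posts with useful information

time_ratio = archive_hours / faq_hours   # archive time relative to FAQ time
time_saved = 1 - time_ratio
value_lost = 1 - useful_fraction

print(f"archive takes {time_ratio:.0%} of the FAQ time")
print(f"time saved: {time_saved:.0%}, value lost: {value_lost:.0%}")
```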

Besides - the only logical outcome of any archive organized by subject is to
become a FAQ on your site. So in the long term, saving 40% initially costs
you more: the 12 hours on the archive plus the 20 on the FAQ.

For these reasons, I think everyone - you and everyone else both on and off
the list - would best be served by a FAQ as opposed to an archived version
of the same.

If you think a FAQ is too much bother, then I urge you not to ever do any
repetitive task "by hand" - especially archiving posts. Have one of those
free programs archive the posts by date, and get a free list archive search
engine program (if that functionality isn't already built into your
list-server to begin with).

No matter what you decide to do, please don't waste your time on work that
for the most part results in little more value than doing
nothing at all would - not for our sakes. Computers are supposed to prevent
that time-wasting, not cause it. I, for one, am very capable of waiting for
the day when you can have it all done "automatically" for you. If you feel
you must spend any time at all on this, I just want to be on record as
saying that I think a faq is the best choice and a better use of human time.
And also to say that you don't have to do it all yourself. I am sure there
are some of us in the group who would be willing to add questions and
answers to the faq, to limit your job as much as possible to that of faq
editor. Of course, once it's up there, the hard part - the questions - will
be supplied by list members and may be added immediately by you if you find
them interesting enough - and then an answer can be added in. One more plug
for a faq - the subject structure can easily be changed - just move the
questions to a new category. There is no need to change the question for
such a move if the question is worded well to begin with. Consequently,
there is also no need to spend more than 10 minutes deciding on the
"perfect" subject structure. Just some thoughts.

~ Jeremy

