There was a time when people using the news groups infrastructure (mostly) happily realized their ephemeral discussions were being increasingly archived by a site called DejaNews. In the mid- to late 1990s the news groups were already ancient by Internet standards, having existed for more than a decade. But as popular online services like CompuServe and AOL gave their users access to the news groups, and as Internet access began to grow exponentially, news groups came into existence faster than people could find them.
The DejaNews archive proved its value time and time again when people used it to find old discussions for a variety of purposes. I often used DejaNews in my research to find citations, references to books, names of scholars, and other information that people had shared very casually without thought for preserving the knowledge where it could be easily found. I also found myself increasingly using DejaNews to cite myself, when people asked me to explain very detailed concepts I had discussed at length in previous years (and which I no longer had time to research all over again).
A few years ago DejaNews ran into trouble and it, like so many other Internet properties, was sold and resold. Google eventually acquired the service in 2001 and renamed it Google Groups. At the time, many of us were relieved to know that a great archive would be retained in essentially the same form it was always available in.
Since then Google has “improved” Google Groups and, as with so many other unnecessary upgrades, they have destroyed the essential usefulness of the Google Groups archive. In short, it is no longer possible to find an exact article. Instead, Google Groups Search returns only “threads” as results, and you are forced to manually view every article in the discussion (some of which include hundreds or thousands of individual posts).
Keep in mind that I am not talking about the Google Groups service — only the search tool. As a Web forum Google Groups works okay. I have used worse software (and better) for Web communities. It is the SEARCH function that is horribly broken.
Not only can you not zero in on the exact, precise, specific articles that contain the expressions you ask for, you cannot even get all the articles by specified authors to open up. I have spent many frustrating hours browsing old discussions where I posted long citations, only to see irrelevant posts (by other people) presented to me in full text format while my own articles were collapsed.
When you have to look through 300 articles to find the only article that contains a very specific expression, you quickly realize that the search tool is not doing a proper job of searching. It’s equivalent to driving up to a gas station and asking the attendant how to find Frazier Street, and the attendant replies, “You see that neighborhood over there on that hill? Frazier Street is somewhere in that vicinity.”
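To make the difference between thread-level and article-level results concrete, here is a minimal sketch of the kind of lookup I keep wishing for. It is only an illustration: the `Article` record and the `find_articles` function are hypothetical names of my own, and nothing here reflects how Google Groups actually stores or searches messages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical record for a single news group posting."""
    message_id: str
    author: str
    body: str


def find_articles(thread: List[Article], phrase: str,
                  author: Optional[str] = None) -> List[Article]:
    """Return only the articles in a thread whose own text contains the phrase.

    If an author is given, articles by anyone else are skipped.
    Matching is case-insensitive.
    """
    needle = phrase.casefold()
    wanted = author.casefold() if author is not None else None
    hits = []
    for article in thread:
        if needle not in article.body.casefold():
            continue
        if wanted is not None and article.author.casefold() != wanted:
            continue
        hits.append(article)
    return hits
```

Given a 300-article thread, a search like this hands back the one article that contains the expression rather than the whole neighborhood.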
I’ve tried using the Advanced Search in several ways. I’ve tried filtering out subject lines, authors, and unrelated expressions. However, you either end up with nothing or you end up with a list of discussions where the links put you smack dab into the middle of the wrong part.
Since 2001 I am sure that the number of news group postings has increased by the billions. But it appears to me that Google has supplementalized the Groups archive into virtual non-existence. That is, the individual messages may as well not exist because you have little hope of finding the right message.
There are certainly plenty of plausible extenuating circumstances that might explain why Google took a really good tool and destroyed it. For example, many news group messages (in olden times called “articles”) consisted of long messages cited within long messages. You might have a 15-article long thread where every other message quotes in full the original message — and the rest might break up the original, or quote some of the quoters.
The duplicate content you have to sift through in news group messages is horrendous — there should be no doubt about that. And the methods people used for denoting quoted text were legion, so any attempt to filter out quoted text would have to be extremely complex.
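To give a feel for why this is hard, here is a rough sketch of a quote filter. It assumes only a handful of quoting conventions (a leading `>`, `|`, or `:`, initials followed by `>`, and a "So-and-so wrote:" attribution line). The patterns and the `strip_quoted` name are my own illustrative choices, and a real filter would need far more rules than this.

```python
import re

# Assumed quote markers: ">", "|", ":" or short initials such as "MM>".
# Real postings used many more styles than these.
QUOTE_PREFIX = re.compile(r"^\s*(?:[>|:]|[A-Za-z]{1,8}>)")

# Assumed attribution lines such as "In article <...>, Bob wrote:".
ATTRIBUTION = re.compile(r"^.*\b(?:wrote|writes|said)\b.*:\s*$")


def strip_quoted(body: str) -> str:
    """Keep only the lines that look like the poster's own words."""
    kept = []
    for line in body.splitlines():
        if QUOTE_PREFIX.match(line) or ATTRIBUTION.match(line):
            continue  # looks like quoted text or an attribution line
        kept.append(line)
    return "\n".join(kept).strip()
```

Even this small heuristic misfires: a poster whose own sentence begins with a colon, or who writes "As I said:" at the end of a line, loses that line. Multiply that by every quoting style ever invented and the complexity becomes clear.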
The sheer volume of postings also leads one to imagine a vast array of hard drives would be required just to contain one image of the non-binary groups (the binary groups, I believe, are not archived by Google Groups anyway). Even Google’s legendary database and operating system technologies may have buckled beneath the task of indexing and making searchable the unimaginable volume of “me too” and “can you please answer this question that is in the FAQ that I am too lazy to read” posts, much less the actually useful posts that contain unique, irreplaceable information.
And then there are privacy concerns. For example, does Google actually store all those articles with the “X-No-Archive” headers even though it should not be showing them? And does it archive the quotable followups that replicate those no-archived messages? What about all the cancellation messages that anti-news group spam hunters sent out — are they archived? (Point of information: Cancellations became so abused that a robot was set up to repost cancelled messages, copying the original cancellation message).
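For readers who have never looked at a raw article, here is a small sketch of how an archiver might check for the opt-out. It uses Python's standard `email` parser and assumes the common convention of `X-No-Archive: yes`, either as a header or as the first line of the body. The function name is hypothetical, and I am not claiming this is how Google handles it.

```python
from email import message_from_string


def opted_out_of_archiving(raw_article: str) -> bool:
    """Return True if the article asks not to be archived.

    Assumes the convention "X-No-Archive: yes", given either as a
    header or as the first non-blank line of the message body.
    """
    msg = message_from_string(raw_article)

    # Header lookup is case-insensitive in the email package.
    header = (msg.get("X-No-Archive") or "").strip().lower()
    if header == "yes":
        return True

    body = msg.get_payload()
    if not isinstance(body, str):
        return False  # multipart message; not handled in this sketch

    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
    return first_line.lower().replace(" ", "") == "x-no-archive:yes"
```

Note that a check like this says nothing about followups that quote the no-archive article in full, which is exactly the gap raised above.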
I do understand that Google took on an immense responsibility, and over the years they have resorted to some darned clever mechanisms (in my Computer Science perspective) for managing the cosmic volumes of information they have been indexing.
But that doesn’t change the fact that I cannot find very specific, uniquely worded articles that just happen to have been posted as part of extremely long discussions. Even if Google Groups Search zeros in on the right discussion, it still doesn’t open up the right article for me — and in most searches I find that it doesn’t even put me in the right part of the discussion.
There have been several stages of evolution in Google Groups Search, and with each stage of improvement I have seen more and more useful, worthwhile functionality vanish. It is now virtually impossible to search Google Groups for information that was once vital to know — information that I often would like to refer to again.
We have lost an entire generation of thought and expression.
I have only rarely seen people acknowledge the archived news group discussions as part of the World Wide Web. In fact, all Web-based archives have been treated pretty much as spam, even by me. I used to be able to find the same discussions that Google Groups archived in multiple Web forum gateways (once very useful interfaces for people who didn’t have direct access to news reading software). Many of these gateways were monetized. Or maybe AdSense spammers, realizing that the gateway model easily lent itself to monetization, created many unnecessary gateways.
I remember complaining on a few occasions about those faux gateway sites. But now I cannot find their archived discussions at all, and it would be most useful to have at least a few of them available. So did Google delist the gateways, or did it adjust its Web search algorithm to penalize the gateways without completely removing them? Or did they simply go away, as so many once helpful Web sites have vanished through the years?
Without a searchable archive that produces meaningful, atomic results, we cannot reasonably include the news group discussions in the searchable Web. That is, the Invisible Web seems to have rallied against the advances made in search technology and it has swallowed up a once very accessible portion of the Visible Web.
Which leads me to conclude that we need another way to search Google Groups. It’s too late for anyone else to create an archive (unless Google is willing to share all those ancient articles) that supports a better search function.
We have lost direct access to a multitude of documents that covered tens of thousands of topics, documents that in their time helped to broaden human knowledge.
I spent years reading and writing news group messages. I don’t have enough time to devote more years to finding them again. I seriously doubt anyone else has that much time on their hands, either.
The situation at Google Groups is a very powerful example of just how easily search technology can lose its way.
So anyone who creates content through news groups (or just through Google Groups) now may want to give serious consideration to developing methods for making their posts findable, particularly in very long discussions where the search index loses track of just who said what.






