Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!howland.erols.net!newsfeed.mathworks.com!intgwlon.nntp.telstra.net!nsw.nntp.telstra.net!news.syd.connect.com.au!news.mel.connect.com.au!news.per.connect.com.au!news.waia.asn.au!news.iinet.net.au!not-for-mail
From: david@cn.net.au (David Novak)
Newsgroups: alt.internet.research,sci.research,alt.answers,sci.answers,news.answers
Subject: Information Research FAQ v.3.7 (Part 9/9)
Followup-To: poster
Date: 14 July 1999 00:00:00 GMT
Organization: iiNet Technologies
Lines: 895
Approved: news-answers-request@MIT.EDU
Message-ID: <7mebs2$h8v$9@news.iinet.net.au>
NNTP-Posting-Host: reggae-10-178.nv.iinet.net.au
Summary: Information Research FAQ: Resources, Tools & Training
Xref: senator-bedfellow.mit.edu sci.research:19075 alt.answers:43090 sci.answers:10341 news.answers:162267

Archive-name: internet/info-research-faq/part9
Posting-Frequency: monthly
Last-modified: July 14 1999
URL: http://cn.net.au
Copyright: (c) 1999 David Novak
Maintainer: David Novak <david@cn.net.au> 

                        Information Research FAQ     (Part 9/9)

    This part of the FAQ highlights other aspects of information research.
    See also the full faq (http://cn.net.au/faq.txt) and the spire project
    (http://cn.net.au). Please note the disclaimer statement on Part 1 of
    this FAQ.

                                Contents 

    		----- Part 9 -----
    31. More on Internet Research
          31.1 three definitions of the Internet
          31.2 information, transaction, entertainment
          31.3 information formats
          31.4 information preparation
          31.5 publishing motivation
          31.6 promoting information
          31.7 information clumps
          31.8 bringing it together
    32. More on the Commercial Information Sphere
          32.1 structure of the database industry
          32.2 advanced search technologies
          32.3 the art of searching
    33. More on the Information Service Industry
          33.1 judging information value
    34. Emerging Trends in the information sphere
    35. Question and Answer Section
          35.1 How do I find information on the Internet?
    36. Acknowledgments
    ___________________________________________________

 31. More on Internet Research

    Let's agree the Internet is great fun to surf, but less valuable when you
    have a specific question in mind.

    To improve our search skills, we begin by understanding how information
    is arranged on the Internet. Contrary to myth, information is not
    disorganized but rather organized very carefully along clear patterns.
    Many patterns are specific to the information format (text document,
    webpage, email message, printed article). Further patterns match the way
    we become aware of information, or are specific to the information
    systems (mailing list, faq, peer-reviewed journal). Your understanding
    of the strengths and weaknesses of each pattern, each format, each
    system, guides your search for information. We shall start by shattering
    the Internet, and commenting on the many pieces.

 __ 31.1 three definitions of the Internet

    Let us be careful when we use the word 'Internet'.

    1_ The Internet is a physical network; more than a million computers
    continuously exchanging information. The Internet allows us to transfer
    information around the world.

    2_ The Internet is a landscape of information available on almost every
    topic imaginable. This information appears almost chaotically
    distributed to the world, but holds clear patterns. For instance,
    linking the information are various structures like government web
    links, search engines and FAQ documents.

    3_ The Internet is a community of 100+ million individuals. These are
    real people who choose to interact, discuss and share information online.

    What we learn here is not so important as the technique - break the
    large seemingly chaotic system into smaller pieces: pieces which
    hopefully make more sense. Eventually, when we've made sense of the
    little bits, perhaps we can comment astutely on the big picture.

    In this example, let me just draw your attention to the way most of our
    research effort focuses on the second definition: a landscape of
    information. Much of the best information originates in the third
    definition: the Internet is a community. Sometimes it is far more
    effective to ask real people than search the information cyberspace.

    Let us now illuminate more important facets of the Internet.

 __ 31.2 information, transaction, entertainment

    There is a triad of functions to all online activity:

    Function      -  Activity  -      Unit
    ----------------------------------------
    Information   -  Research  -  The Fact or Conclusion
    Exchange      -  Business  -  The Transaction
    Entertainment -  Play      -  The Experience

    Each Internet function grows at a different rate and moves in a
    different direction. The development of forums is firmly in the smallest
    segment dealing with information. This segment is quite poorly organized
    and confusing. The entertainment function in contrast is well financed
    and graphically innovative with clear, profitable opportunities.

    Much of the web is prepared with Exchange or Entertainment in mind.
    "Brochureware" (purely promotional webpages) is rarely required for
    research, but is critical to securing a transaction. Entertainment
    related, or just entertaining, websites abound. Let us recognize just
    how few webpages are information & research related.

    My own experience suggests we are just beginning to see the movements
    towards profiting from providing information. The direct sale of
    information is still chaotic and unrewarding.

 __ 31.3 information formats

    The way information is packaged has a great bearing on the content,
    quality and use of the information. This theme is evident throughout the
    work of the spire project, and is particularly applicable to Internet
    information. Webpages, text files, software, email and database entries
    each have particular qualities. Each shapes, constrains and restricts
    the informative content. These qualities apply irrespective of the
    information involved.

    Books are dense, factual, a little old. Articles are short, sharp, more
    recent. News is puff, introductory, immediate. Each way the information
    is packaged, each format, presents the information to set standards.

    Information formats on the Internet are the same. Webpages are
    graphical, technical to produce, and not easily updated. FAQs are easier
    to maintain, text only, and attract more peer review. Mailing lists are
    simpler still, text, short, immediate, very peer-reviewed, characterized
    by discussion and resource discovery. Newsgroups are characterized by
    extremely low costs, vulnerable to trashing, poorly managed. Email is
    simple to use and suited to one-to-one discussion.

    Let's look at books more closely. Books are created by authors who have
    something to write. Books are printed and marketed by publishers to the
    bookstores, who then provide them to the readers. Each facet of this
    process defines the resource. Books have quality, editorial vetting but
    minimal peer-review, marketable value and a potentially lengthy
    preparation time.

    When it comes to research, why look for a book when searching for
    information about digital money?  Books would just have the wrong
    qualities - would present the information poorly. We need a more current
    format (digital money is a fast moving topic), and a more peer-reviewed
    format (books have editorial vetting, but not intrinsic peer-review).
    Why not search for a mailing list, an FAQ, or an association website?
    These formats have qualities more appropriate to our question.

 __ 31.4 information preparation

    Information flows also impress patterns on Internet information. Most
    information is transplanted to the web - first created elsewhere. The
    source of information imparts as much pattern as the eventual format the
    information takes.

    Information may appear as a webpage, and conform to our expectations for
    all webpages, but the information may have been prepared from the
    discussion on a mailing list - and thus enjoy a more topical, specific,
    timely and peer-reviewed quality.

    Let's look at FAQs. The best resource in the world on copyright law is
    the musings of a group of copyright lawyers who form the copyright
    mailing list. The copyright FAQ supported by this group is a logical
    document summarizing much of the discussion of this mailing list. FAQs
    are vetted by the news.answers team, then automatically mirrored around
    the world. From its origins in the mailing list, the FAQ is a
    peer-reviewed document, often full of links to further resources,
    topical, knowledgeable and factual. As an FAQ, the document is not
    immediate, graphical or financially rewarding (some FAQs stagnate).

    Only some Internet information is created within the Internet
    environment. The concept of 'brochureware' describes the traits common
    to promotional webpages prepared directly from paper promotional
    brochures.

    One of the more exciting trends is the movement of information from the
    dusty shelves of government offices and association libraries to their
    more accessible websites. The quality of information retained in your
    average government agency, from quality research reports, to detailed
    studies, to current industry monitoring, is very high. These qualities
    are then brought over to the web format. Such web-documents tend to be
    isolated (not linked to other related resources) and perhaps a little
    behind the time line, but of a generally high quality.

    An exciting holistic view of the Internet information landscape is
    based on these descriptions. Information flows through a collection of
    systems. At certain points, information groups together, and generates
    new, perhaps higher quality information, which then flows in a different
    system - flowing to different people.

    The flow of information from one person to another, from one format to
    another, imprints qualities to the information along the way. Each
    organization, or subsequent re-organization, imparts specific styles and
    conventions and quality to the result.

 __ 31.5 publishing motivation

    Information appears on the Internet for one very specific reason.
    Someone publishes (DUH). The motivation behind publishing creates
    distinct patterns. Patterns we will use to better search for answers on
    the web.

    Ask yourself who is publishing, and why.

    Six months ago, one of the biggest groups of publishers was single
    individuals publishing documents written from their personal expertise.
    A typical document would be one with minimal peer review, a list of
    aging links to further resources, simple graphics, variable to short
    length, prone to bias, but moderately reliable because the publisher
    knows their topic well. These pages are usually located in private
    sub-directories of a web server (usually starting /~name/).

    Commercial Sites publish mainly for promotional value. Their secondary
    purpose is to provide sales information to prospective clients. Rarely
    do commercial sites go beyond this. Commercial webpages often reside on
    their own domain name, as a .com, or in sub-directories - without the
    tilde symbol. Commercial sites also tend to age badly. They are very
    noticeable from their front page.

    Government agencies are emerging as valued publishers. Slowly their
    dormant information becomes available through this new medium. Currently
    almost all government documents on the Internet also appear in print,
    meaning they are factual, exhaustively reviewed, tend to be a little old
    (but age well), and come from highly paid knowledgeable people who
    believe it is their duty to inform others. Such documents are lengthy
    and appear on .gov domains.

    Gee! This is fun.

    Grant-funded projects create brilliant research resources and hold much
    promise in pushing the limits of this technology. I am eager to see the
    results of the US Patents project, and appreciate the value of having
    Supreme Court rulings on the Internet. Often such projects are short on
    money but deeply focused on content. Most projects reside on educational
    servers and are widely discussed within knowledgeable groups.

    Associations publish association-kind-of-things. Most are initially
    just like the commercial webpages, but with time become much more
    factual and research-worthy. Most associations are dedicated to
    developing awareness of their chosen topic, albeit coloured by their
    chosen bias. Few associations are significant publishers, but this
    segment will begin to liberate dormant information within associations.

    Let's reverse. The key is to always watch who is the publisher. We can
    assume a great deal, quickly. We are unlikely to find the latest changes
    to patent law from government or commercial publishers. Such
    organizations are simply not motivated to present such information.

 __ 31.6 promoting information

    Publishing is one achievement, but you and I will never read any
    information until we learn it exists. This simple fact creates even more
    patterns to Internet information. Knowledge of information moves through
    set routes on its way from writer to reader.

    Promotion is not simple. It is a process which takes time, effort and
    perhaps money. Information without serious promotion tends not to be
    promoted far from the source. Another way to phrase this; you must
    search close to the source to find poorly promoted information.

    A search engine indexes pages relatively indiscriminately. This also
    means a site of quality is not likely to reach your attention. The odds
    are not good, and from a promotion point of view, search engines
    generate minimal traffic to your webpage. Search engines drop you rather
    randomly into a website. It is often necessary to move up a directory to
    understand the purpose and motivation of a site you find interesting.

    Information published through advertising tends to have a financial
    payoff for the promoter. This kind of information tends to be
    promotional information, 'brochureware'.

    The alternatives are to promote a webpage or website through one of the
    referral tools. Each such tool accepts links on some criterion. Each
    tool you use to locate information also selects particular types of
    information for your attention.

    If you arrive at a document by recommendation through a mailing list,
    the document is likely to be recent, on-topic, and specific to the
    purpose of the mailing list. Alternatively, (for poor mailing lists) it
    will be wildly off topic and trash. You are unlikely to see referrals to
    old documents or documents of historical importance. These are the
    qualities most acceptable to the mailing list environment.

    Webpages or websites found through historically important documents, by
    their nature, tend to be long-lasting websites with lasting importance
    in the field. Such documents point to other similar documents or
    websites which have achieved a long-lasting importance. You are unlikely
    to find specific documents, but rather sites which focus or bring
    together information. In short, there is little motivation to link to
    specific webpages, when a link to important websites is considered just
    as good.

    Directory trees, FAQs, guidebooks and related promotion tools all work
    as historically important documents.

    Similar generalizations can be made of each type of promotional tool,
    and become important in rapidly seeking out information which matches
    our intention, as well as summarizing the likely motivation - and bias -
    of webpages we are interested in.

 __ 31.7 information clumps

    Information Clumps. Information is created, nurtured, develops, gets
    transplanted, gets arranged and then becomes visible through a process
    which brings similar information together.

    As we have discussed, there is a collection of factors which deeply
    affect any information appearing on the Internet. Motivation,
    Preparation, Format and Promotion define the quality and content of any
    given item of information. With so many influences, we should not be
    surprised to learn information naturally groups together. In reality,
    there is nothing natural involved - it is a social phenomenon reinforced
    each time you and I visit or read one resource but not another.

    History can explain some aspects of Internet development. As a small
    collection of sites become dominant in particular fields, by collecting
    and delivering better content to more people, new sites find it
    progressively more difficult to capture attention. This dynamic works
    for websites reaching out for visitors, and discussion groups reaching
    out for subscribers. In each case, seniority counts.

    Seniority counts in several ways too. Promotion is directly related to
    quality, interest, traffic and time. The longer a site is active, the
    better the footpath develops, the more people visit. Secondly, quality
    content is directly related to access to quality content, peer review,
    and time/money. Important existing sites gain in every way.

    This results in a grand system where the first-in, best-dressed, can
    capture the high ground and secure a grand lead in awareness and
    footpath over competitors who follow. Yahoo is a prime example of a
    directory tree, not even the best in most areas, which has achieved
    unparalleled traffic & awareness.

    This competition is equally evident where no money is involved. Perhaps
    our association wishes to create a new referral website, or an open
    mailing list, or an informative guide. All sound concepts and effective
    projects. However, if older, established resources exist, the work will
    be long and arduous.

    Despite the marketing message, the Internet is not a world where the
    best information floats to the top. The Internet will not let you
    reach millions. You must compete for the attention, participation,
    devotion and assistance in a manner very similar to building a business.

    In concrete terms, information clumps on the Internet. The best resource
    could appear on any Internet system (webpages, email mailing lists,
    ftp-archives, faqs, online databases, newsgroups...) but we can be
    fairly certain the best information will congregate in just one or two.

    Consider our article "Searching the Web"
    (http://cn.net.au/webpage.html). We progressively search different web
    tools, looking for the most worthy. Searching the Internet is the same.
    You must touch each system to see which system is dominant, where the
    information is congregating for your topic.

 __ 31.8 bringing it together

    In summary, we have broken down and discussed various qualities of
    published information and promoted information. We have made sweeping
    generalizations and educated guesses about information on the Internet.
    Now what?

    When a painter begins to paint, they have already visualized some of the
    image. They already have a concept of the finished result. Internet
    research is no different. We start by building a vision of the
    information we seek. Who would publish it? Where would it reside? What
    is its motivation? How would we find it? We now have a practical vision.

    The address is the key. The URL for any item of information gives us a
    surprising amount of information - particularly now we are making
    generalizations about information patterns. We can guess if information
    resides on a personal webpage, a funded university project, or a
    commercial project. If the information resides on a .gov website, the
    quality is likely to be higher and conform to our expectations of
    government resources.
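
    The address rules of thumb above can be sketched in a few lines of
    Python. This is only an illustration: the function name, the category
    labels and the example.* addresses are invented here, and the rules are
    the rough generalizations of section 31.5 (tilde directories, .gov,
    .com, educational servers), not a reliable classifier.

```python
from urllib.parse import urlparse

def guess_publisher(url):
    """Guess the likely publisher type from a URL (rule of thumb only)."""
    parts = urlparse(url)
    labels = (parts.hostname or "").lower().split(".")
    # A tilde sub-directory usually marks an individual's personal pages.
    if "/~" in parts.path:
        return "personal"
    if "gov" in labels:
        return "government"
    if "edu" in labels:
        return "educational"
    if "com" in labels:
        return "commercial"
    return "unknown"

# Hypothetical addresses, used only to exercise the rules.
for address in ("http://www.example.gov/reports/study.html",
                "http://www.example.edu/~jsmith/links.html",
                "http://www.example.com/products/"):
    print(address, "->", guess_publisher(address))
```

    The tilde test comes first on purpose: a personal page hosted on a
    university server tells us more about its author than about the
    university.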

    We use this new-found experience in three ways. Firstly, we restrict our
    searches to the most likely sources. Secondly, we quickly jump through
    lists of resources (such as those generated by search engines) to the
    sources which match our expectations. Thirdly, our understanding of the
    relative qualities of information guides our judgment of information
    value.

    Internet newcomers often expect to have instant access to the latest
    information at the touch of the button in beautiful colour and peer
    reviewed quality prose. Who is publishing this? Where is this
    information coming from? Who would help us find this? Such a vision is
    fantasy. If we were instead to look for an association website,
    dedicated to a certain type of research, or an informed newsgroup,
    maintained by people passionate about sharing this technology, then we
    have made four steps forward. We are clear about where to look for the
    answers we seek, and we will know quickly if the answers are not online.
    ___________________________________________________

 32. More on the Commercial Information Sphere

 __ 32.1 structure of the database industry

    The commercial information sphere existed in the 1970s and earlier. It
    is far more developed, far better organized, far better funded, almost
    always far more valuable and expensive than every other research
    resource.

    For the most part, commercial information is arranged reasonably
    uniformly in large databases of full-text or bibliographic information.
    Some databases are small, single source documents, while others are vast
    unfocused collections of, for example, all the news from the last 15
    years.

    Most directories and journals can be made into a database, but
    single-source databases do not enjoy much financial success. The market
    is too limited and the cost of promotion too high (except in a local
    market with newspapers). To overcome this difficulty, single sources are
    grouped together into larger collections of databases on a particular
    topic. These large database groups have become the primary tool for
    commercial research.

    Developing these databases requires considerable expertise and expense.
    Sometimes data requires abstracting, interpreting, and as with some
    Lexis-Nexis and WestLaw databases, even expert legal interpretation.
    Sometimes this is accomplished by firms developing a portfolio of
    databases. Sometimes this is accomplished individually.

    The marketing and consumer billing of such databases is then provided by
    a relatively small collection of large database retailers. A list can be
    found in our "Commercial Databases" article. As an indication of the
    size of this market, Knight-Ridder sold Dialog & Datastar for a figure
    approaching half a billion dollars.

    This industry consists of a wide collection of players, each improving
    and developing the information from individual periodicals, journals,
    news items... All very confusing for the end user.

    This is elegantly illustrated by the database descriptions for
    Lexis-Nexis databases (their preferred term is libraries). See
    http://www.lexis-nexis.com/lncc/sources/libcont/aust.html as an example
    of a 'Database about Australia'.

    Many single-sources appear in different commercial databases. Further,
    different databases sometimes include different information from the
    same single-source. One database may include just abstracts, another may
    include fulltext, chemical indexing and more.

    Most researchers are unfamiliar with what exactly is being searched.

    This state of affairs is not unproductive. Searching a 'Database about
    Australia' is uncomplicated. You receive information on Australia. It
    is simple, informative and incomplete. Of course, researchers are busy
    people. Time is critical and results matter. This system also gives rise
    to great customer loyalty to database retailers brought on by ignorance
    and obfuscation in the quest for simplicity. Unfortunately, I am hard
    pressed to compare prices let alone describe the differences between
    information products.

    Pricing actually mirrors that of many a developed industry, and is
    remarkably similar to the telephone or banking industry. As one friend
    commented, "bullshit baffles the brains". The prices are complex on
    purpose. It becomes very unrewarding to compare prices, and any
    conclusions are only valid in specific circumstances - and will not
    hold in others. This trend, familiar to us as a multitude of banking
    charges and telephone pricing schedules, reinforces our need to stop
    price hunting and trust our favoured information retailers.

    This is not to say we should not try to compare prices - but for the
    most part, you will find comparing prices a most unrewarding experience.
    It really requires you to search and retrieve the same information on
    different systems - and this does not even begin to touch different
    databases, or database groupings, or variables which change over time
    like download speeds.

    Optimistically, there are actually very few important databases in each
    field. It may be simple to browse each of the databases in your field
    and compare directly. You may never need to know more than a few
    databases intimately. Realistically, you will yearn for a simpler
    solution.

    The commercial information industry has distributed information this way
    for several decades. It is both sophisticated and quite difficult. You
    will need to become experienced with inverted indexes, search techniques
    (Boolean, truncation, proximity, field limits ...), and properly
    phrasing the question in a way that will be answered by a database
    search. I have always found the value of a database search directly
    proportional to the length of the query.
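
    For readers who have not met an inverted index, here is a minimal
    sketch in Python. The three records and the function name are invented
    for illustration, and real database hosts build far richer indexes
    (fields, word positions, controlled vocabularies). The idea is simply
    that each word points to the records containing it, so a search
    becomes set arithmetic rather than a scan of every record.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

# Hypothetical three-record database, invented for illustration.
documents = {
    1: "Patent law reform in Australia",
    2: "Patent research methods",
    3: "Market research in Australia",
}
index = build_inverted_index(documents)

# An 'and' search is the intersection of two sets of record ids.
print(sorted(index["patent"] & index["research"]))
```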

    If you are incompletely skilled at database research, you will take
    longer, pay more and locate far more information (or unwisely discard
    more) than necessary.

    This is very different from searching AltaVista and WebCrawler.

    Doing your own research offers an opportunity to more closely influence
    the research process. Sometimes only you understand the topic and
    sometimes you can more quickly discard unimportant details. Certainly it
    is becoming simpler to undertake some work yourself.

    Many of the commercial databases are also available in a CD format.
    Substantial subscription costs limit their availability to large
    research institutions and libraries, but exceptions exist. I believe
    world books in print costs AU$5000+. Provided you can find casual
    access, it will cost you far less. Keep an eye on the age, though.
    Sometimes (and only sometimes) online information is more recent.
 
    The decision between undertaking research on your own or seeking
    external help is really a decision based on your research expertise,
    your budget, your access to information, your time, and the importance
    of finding all the information available. It also depends on your access
    to some decent research assistance. Your decision.

    What I do know is a newcomer to the commercial information sphere will
    seriously underestimate the difficulty involved in searching, and
    underestimate both the cost of research and the cost of research
    assistance. Keep in mind this same system serves the needs of large
    commercial conglomerates, professional legal research, and well financed
    government studies. The commercial information sphere contains far more
    valuable information than you need. Often the Internet is just an
    interesting sneeze in comparison.

  #  Article: The Gale Directory of Databases (bi-annual in two volumes)
    includes a factual article as the foreword, following the development of
    this industry.

 __ 32.2 advanced search technologies

    Searching is both science and art. The science is the range of
    improvements to the blunt system of simply asking for a word. The good
    news is an experienced searcher can accomplish wonders - collecting
    articles of 70%+ interest regularly on expensive databases. The bad news
    is most of the best search technology is not implemented on all the
    databases you will search, and rarely ever on databases available on the
    Internet.

    The art is a kind of magic, of choosing just the right words at the
    right times, and in phrasing your request for information in a way which
    tightly describes your interest without removing information which
    should interest you. The art of searching relies heavily on an
    understanding of what is possible within a given system. Much of this,
    you guessed it, involves creative visualizing.

    Current search technology allows us several ways to refine our search:

    Straight Word Searches:
    All search situations allow you to ask for the presence of words in a
    block of text. If you ask for the right words, then you will quickly
    locate the information you desire. For best results, you obviously
    search the desired text several times with different terms, and you
    consider the possibility of different spellings for the same words. I
    use this frequently to locate information in web pages, in large
    documents like online directories or the archives of past discussion on
    forums.

    Text Fragments:
    The simplest refinement to straight searching involves searching for
    parts of a word - if you are interested in surfing, search for surf;
    better yet, search for " surf" with the space in front of the word.
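
    A small Python sketch shows why the leading space matters. The sample
    sentence and the function name are invented for illustration. Note one
    side effect of the trick: a word at the very start of the text has no
    space in front of it and is missed.

```python
def find_fragment(text, fragment):
    """Return the start position of every case-insensitive occurrence."""
    haystack, needle = text.lower(), fragment.lower()
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits

# Hypothetical text, invented for illustration.
text = "Surf reports: the surfing was good, but the windsurfers stayed ashore."

# 'surf' also matches inside 'windsurfers'; ' surf' matches word starts only.
print(len(find_fragment(text, "surf")))
print(len(find_fragment(text, " surf")))
```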

    Truncation:
    Some search engines don't allow searches for text fragments, and you
    must explain your intention by adding a truncation mark (usually * or ?)
    to the ends of words. For most professional research, alga? will
    include both algae and algal. I was once badly lost because of the
    spelling difference between aging and ageing. There are a number of
    improvements on this concept too. Sometimes there are special symbols
    for a single non-space character (car?a), sometimes there is automatic
    awareness of multiple spellings (colour & color). Sometimes there is
    even automatic awareness of synonyms. Often you are initially unaware
    important information is indexed under a slightly different spelling,
    so truncation is strongly suggested for most searching.
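
    Truncation marks differ from system to system, so the following Python
    sketch adopts one convention purely as an assumption: * stands for any
    run of letters (including none) and ? for exactly one letter. The
    function names are invented for illustration. Always check the help
    pages of the database you are actually searching.

```python
import re

def truncation_to_regex(term):
    """Translate a truncated term into a whole-word regular expression.

    Assumed convention: '*' = any run of letters, '?' = exactly one letter.
    """
    pieces = []
    for ch in term:
        if ch == "*":
            pieces.append(r"\w*")
        elif ch == "?":
            pieces.append(r"\w")
        else:
            pieces.append(re.escape(ch))
    return re.compile(r"\b" + "".join(pieces) + r"\b", re.IGNORECASE)

def matches(term, text):
    """Return every word in text matched by the truncated term."""
    return [m.group(0) for m in truncation_to_regex(term).finditer(text)]

print(matches("alga?", "Algae and algal blooms"))
print(matches("ag*ing", "aging or ageing"))
```

    Under this convention alga? does not match the bare word alga; a
    search for alga* would.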

    Thesaurus:
    An improvement on truncation is the opportunity to look directly at a
    list of words, either keywords, or descriptors. This allows you to see
    the range of spellings before you search. This is also ideal for
    searches of company names or proper places so you can select only the
    words you are interested in. Some library catalogues present subject
    searches in this way: a list of subject categories arranged
    alphabetically.

    Boolean operators:
    Changing tack, searching for multiple words calls for "and, or, not"
    concepts. I want this word and that word, but not another word. It is
    simple enough. Many of the search engines allow for "not" with the -
    sign, and commercial databases often add brackets. Use of "not" is
    frowned upon in textbooks (it is too easy, it is said, to dismiss
    information you are interested in), but "and" and "or" are absolutely
    necessary for complex questions like: I want [(spaghetti or noodle) and
    pasta] or (Italian and cuisine). With most internet search engines, but
    not all commercial searches, you will find the "and" is assumed.
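
    The bracketed question above maps neatly onto set operations. The
    following Python sketch is only an illustration: the five records and
    the helper name having() are invented, and real search systems evaluate
    such queries against an index rather than by rereading every record.

```python
import re

# Hypothetical five-record collection, invented for illustration.
docs = {
    1: "Fresh pasta and spaghetti recipes",
    2: "Noodle soup from Japan",
    3: "Italian cuisine through the ages",
    4: "Pasta shapes: a noodle by another name",
    5: "Italian grammar for beginners",
}

def having(word):
    """Return the ids of the records that contain the word."""
    return {doc_id for doc_id, text in docs.items()
            if word.lower() in re.findall(r"\w+", text.lower())}

# [(spaghetti or noodle) and pasta] or (Italian and cuisine)
hits = (((having("spaghetti") | having("noodle")) & having("pasta"))
        | (having("Italian") & having("cuisine")))
print(sorted(hits))                      # [1, 3, 4]

# "not" removes records, including ones you might have wanted.
print(sorted(hits - having("recipes")))  # [3, 4]
```

    Here | plays the part of "or", & the part of "and", and set
    subtraction the part of "not".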

    Proximity operators:
    The next dramatic improvement fixes the position of words relative to
    one another. In this category we have adjacent (often written as adj,
    next, or "inserted in quotes"), near (by how many words), or in the same
    sentence. Often it is wise to stretch the distance a little (within
    two), but where available, proximity is the best way to remove the
    dross without affecting the value of information. "Patent near
    Research" is much more precise than "Patent and Research".
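
    A minimal Python sketch of a "near" test follows, assuming distance is
    counted in words and that adjacent words are one apart. The function
    name near() and the sample sentences are invented; commercial systems
    each define their own operators and counting rules.

```python
import re

def near(text, first, second, within=2):
    """True if the two words occur no more than `within` words apart.

    Adjacent words count as a distance of 1 (an assumption of this sketch).
    """
    words = re.findall(r"\w+", text.lower())
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= within for i in first_at for j in second_at)

close = "Patent research is costly"
far = "The patent office funds unrelated medical research"

print(near(close, "patent", "research"))  # True
print(near(far, "patent", "research"))    # False
```

    Both sentences would satisfy "Patent and Research", but only the first
    satisfies "Patent near Research".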

    Fields:
    By separating information into different fields, we can selectively
    search different portions of the information. I want the title to show
    the word "Patent" and the abstract to include the words "Patent
    Research". Field searching is a common way to refine a search, but be
    aware that searching titles alone is very likely to remove some desired
    information, whereas searching descriptors and not abstracts may
    dramatically improve the content.

    Date Field:
    Are you really interested in information more than 15 years old? Library
    catalogues frequently have many aging books, and date limiting is very
    wise.
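
    Field and date limits can be sketched together in a few lines of
    Python. The records, field names and the search() helper below are all
    invented for illustration; real databases have their own field labels
    and limit commands.

```python
# Hypothetical bibliographic records, invented for illustration.
records = [
    {"title": "Patent research methods",
     "abstract": "How to run patent research online.", "year": 1997},
    {"title": "Library funding",
     "abstract": "Includes a note on patent research.", "year": 1996},
    {"title": "Patent law primer",
     "abstract": "An overview of patent research and law.", "year": 1978},
]

def search(records, title_word=None, abstract_phrase=None, since=None):
    """Return titles of records that pass every limit that was supplied."""
    hits = []
    for rec in records:
        if title_word and title_word.lower() not in rec["title"].lower().split():
            continue  # word missing from the title field
        if abstract_phrase and abstract_phrase.lower() not in rec["abstract"].lower():
            continue  # phrase missing from the abstract field
        if since is not None and rec["year"] < since:
            continue  # older than the date limit
        hits.append(rec["title"])
    return hits

print(search(records, abstract_phrase="patent research"))
# ['Patent research methods', 'Library funding', 'Patent law primer']
print(search(records, title_word="patent", abstract_phrase="patent research",
             since=1985))
# ['Patent research methods']
```

    The second search shows the trade-off: the title limit drops a record
    that does discuss patent research, and the date limit drops the 1978
    primer.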

    Further Enhancements:
    There are some special techniques available on a few systems which bear
    discussing. Sorting allows you to shape the presentation of the
    information. When applied to financial information, this is particularly
    valuable. Alerts allow you to automatically repeat a previous search and
    have the information sent to you. Multiple database searching allows you
    to search a collection of databases concurrently. Ranking positions
    certain information at the top and is valuable when your search is not
    time or price limited.

 __ 32.3 the art of searching

    The artistic side of searching has two parts. Firstly, the selection
    of accurate words is not automated. The searcher needs to approach the
    information beast fully recognizing he or she is likely to get either
    tons of information... or far too little. Knowing when to expand, when
    to go more in-depth, and how to handle fields in which you have little
    experience are talents. The search technology itself is simple.

    The trouble lies in retrieving from databases with far too much
    information for simple word selection. It also flares when you are
    dealing with databases charging upwards of $2 a minute plus an
    additional cost per item retrieved. You decide very quickly to get good
    at searching once you receive a bill for $200 of irrelevant information.

    The simplest solution to this difficulty is to practice. You will find
    all Research Libraries provide access to slightly older articles through
    CD-rom databases. Search these to hone your skills.

    I saw a small book on search techniques from an early course in
    Alexander Library - but it is very basic. Most librarians build
    experience in using search systems either internally, or through a
    series of courses given by traveling database officers like the periodic
    training by Dialog-Insearch. These are expensive, but include some free
    time searching the expensive databases (no, they don't let you take
    information back with you).

    Now, there must be something else I can share with you on this topic.
    Firstly, learn something about how the databases are built in the first
    place. It helps if you know what an inverted text database looks like. I
    know an aging book whose Chapters 3 & 6 I found particularly impressive
    on this topic. I'll see if I can publish it here.
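
    For readers who have not met the idea, an inverted text database
    stores, for each word, the list of records containing it, so a search
    looks up words rather than reading records. The Python sketch below is
    a toy version under that description; the three records and the
    function name are invented, and production systems add positions,
    fields and much else.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the sorted list of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical records, invented for illustration.
documents = {
    1: "Patent research in Australia",
    2: "Research libraries and their databases",
    3: "Australian patent law",
}

index = build_inverted_index(documents)
print(index["patent"])    # [1, 3]
print(index["research"])  # [1, 2]
```

    Notice that "Australia" and "Australian" are filed as separate entries,
    which is exactly why truncation matters.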

    Secondly, something personal about technique... I always find the uglier
    the search query, the better the result. Honestly. A search combining
    numerous elements improves your chances of getting it right.

    Thirdly, I always try to change my search techniques to match the
    medium. I am likely to be more careful with broad searches of expensive
    databases, whereas free databases often lead me to gather 50 articles,
    then weed them out by hand (most CD-roms allow you to select only the
    ones you want). Always bring a 3.5'' floppy with you when visiting a
    library on the off-chance you want to download and look at results
    another time.

    Fourthly, I almost always find one of the most difficult challenges is
    locating those specific terms which appear in 80% of the documents which
    will interest you. When searching the Internet for information about
    government use of the web, the specific terms required were government
    and publishing (not even government publish was close). All other search
    terms gave far too much garbage. Yes, of course, being an expert in a
    particular field is an edge in already knowing these special terms.

    There are two escape hatches here. Firstly, if you can find one or two
    articles which interest you, often you can browse these articles for
    those special words. Sometimes even, the descriptors of an interesting
    article will give you a specific subject heading. I've heard this
    technique called the "Pearl Development Technique" but I just think of
    it as a good idea. The second escape hatch is the use of free databases
    to prepare you for going online. If you have ready access to a CD-rom
    database, search this first - get the right search words on the free
    databases, then go online.
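
    The first escape hatch can be roughed out in Python. This is a sketch
    of one possible approach, not an established algorithm: it keeps the
    words shared by every article you already like and lists first those
    that are rarest in the articles you did not want. The function name and
    the sample titles are invented.

```python
import re
from collections import Counter

def candidate_terms(pearls, others, top=5):
    """Words found in every 'pearl' article, rarest in the other articles first."""
    def words(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    shared = set.intersection(*(words(p) for p in pearls))
    background = Counter(w for o in others for w in words(o))
    # Sort by how often the word appears elsewhere, then alphabetically.
    return sorted(shared, key=lambda w: (background[w], w))[:top]

# Hypothetical article titles, invented for illustration.
pearls = ["Government publishing on the web grows",
          "The web and government publishing policy"]
others = ["The web grows quickly",
          "Policy on the trade of wheat"]

print(candidate_terms(pearls, others, top=2))  # ['government', 'publishing']
```

    Common words like "the" sink to the bottom because they appear
    everywhere; the words at the top are candidates for your next search.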

    Oh, of course, there is also the option of just asking someone involved
    for the proper words. I like to ask my clients if they know what words
    are likely to be used. It's not the mark of an amateur to ask these
    questions, by the way.

    A couple of side issues

    1) Keep an eye on the type of document you are searching. If you want
    full text - don't go looking in bibliography databases. More to the
    point, don't start word searching databases with really big files
    without using the proximity indicators and descriptive fields. I hated
    paying for that 20 page document which included all the words I was
    interested in - but on different pages.

    2) Also, keep an eye on the quality of the documents you are retrieving.
    I know a search of newspapers sounds impressive, but they are rarely
    capable of explaining anything in depth and are notorious for running
    advertorials. I try to keep newsprint for locating experts - not for
    information. I have also been trapped by obscure magazines with
    appealing articles, only to learn the magazine is one of a large number
    of very basic business mags which like to use fillers, or just don't
    like to pay for good journalism. A single article of 5 pages from
    Scientific American blows 20 small fillers out of the water. In fact the
    length of an article is a hint of depth.

    Oh, if you are looking for some really good books on this issue, try the
    manuals Dialog sends you to start, try looking for text databases in
    your library to locate something to teach you about building a database,
    then proceed to one of the search books recommended at the end of the
    professional research page.


    ___________________________________________________

 33. More on the Information Service Industry

    Private Detectives, Professional Database Researchers, Library
    Researchers, Legal Researchers, Commercial Database Producers,
    Commercial Database Retailers, Magazines, News Organizations, Libraries:
    this is a big industry. Information Research is just a process linking
    together people seeking information with people who provide it.

 __ 33.1 judging information value

    Information has value. It also has other qualities that will assist you
    to judge information you may consider buying.

    Accuracy: the factual nature of the information presented. If the
    statistics purport to show a particular trend - how large is the margin
    of error? How large is the sample size? How likely is it that factual
    errors crept in during their development? The measurement of statistical
    error is now a refined science in some fields. A statistical result can
    be inaccurate when the sample size is too small, the margin of error is
    too large, the sample collection procedure is incorrect, or in a number
    of other situations.
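
    To give a feel for how sample size drives the margin of error, here is
    a short Python sketch using the standard textbook approximation for a
    surveyed proportion. It assumes a simple random sample and a 95%
    confidence level (z = 1.96); real surveys with other designs need
    other formulas, and the function name is invented.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a sample of n.

    Assumes a simple random sample; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for three sample sizes, shown in percentage points.
for n in (100, 1000, 10000):
    print(n, round(100 * margin_of_error(0.5, n), 1))
# 100 9.8
# 1000 3.1
# 10000 1.0
```

    A survey of 100 people reporting "50% agree" could easily mean anything
    from about 40% to 60%, which is worth remembering before paying for the
    result.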

    Reliability: the support for trusting the solutions, both from
    additional resources and from being able to duplicate the conclusions.
    This includes the reputation of the researchers. No matter how
    inaccurate and biased you may believe certain facts to be, successful
    independent support of a suggested fact does improve its value.

    Bias: conscious or subconscious influences that affect information. Bias
    can occur in collection, preparation and presentation of information.
    Most information you find will be tainted. Secondary information is
    deeply affected. Statistics are not necessarily less biased.

    We counter bias in several ways. Firstly, we try to be aware of bias.
    Where is bias likely? In which direction would the bias push the
    information? Secondly, we try to collect information with different
    bias. This is why research based solely on government research, no
    matter how accurate and reliable, is less valuable. Often information
    from different countries can counter bias. Thirdly, we need to accept
    bias is likely to exist. This is why primary sources are often more
    valuable than secondary sources. This is why tertiary sources, like
    experts, can rarely stand alone.

    Age: The date information was created or compiled will feature
    prominently in the value of information. Dates given sometimes mean the
    date information was created, and sometimes the date information was
    compiled. How old is a book compiled in 1995, that took the author 10
    years to finish? I find statistics often present forecasts, prominently
    displaying recent compilation dates while still using old census data
    or the like to draw their conclusions. Information on the Internet
    typically has no date, and can be severely challenged because of this.

    Purpose: purpose merits further discussion. When you are uncertain about
    potential bias, you can look for reasons to distrust the information
    instead. Suspicion is not equivalent to bias, but it can be thought
    provoking. Privately, I have heard repeated rumours that important
    national statistics have been fudged in different countries. A
    government research report investigating the price of books in Australia
    would have a political purpose, a purpose that provides the climate for
    some potentially significant bias. A tell-all book by industry experts
    often includes a wealth of insider experience difficult to find
    elsewhere. While there may be a purpose of self-aggrandizement, that
    purpose is less a climate for significant bias. Medical research has
    perhaps the greatest climate for significant bias, and this suggests the
    greatest standard of proof and external, reliable support.

    Accuracy, reliability, bias, age and purpose are very important in
    research. This is what leads us to an appraisal of value. For years, the
    tobacco industry funded 'independent' research finding smoking minimally
    harmful to health. It now seems likely there were errors of accuracy
    and of bias. Certainly, purpose was in doubt. As new studies show
    smoking is harmful, we can also say the original research lacked
    reliability. In some topics, like business and the Internet, research is
    perpetually suspect because it also ages so quickly.

    I have seen further discussions which add 'Coverage' and 'Authority' to
    this checklist. Both have bearing on the value of the information
    contained. By coverage, we mean how much detail is invested in covering
    a specific topic. Sparse or shallow coverage is closely tied to missing
    critical aspects of information. News stories frequently have limited
    coverage.

    Once you are acclimatized to these elements, you begin to see potential
    for error in a whole range of information. Real-estate association
    figures, expert opinions, toothpaste advertisements and national GDP
    figures all occasionally display some degree of warping and
    manipulation, clouding the truth. The solution is awareness, comparison
    and careful analysis. As an aside, this is part of the reason for my
    personal dislike for market research: it is often taken far more
    seriously than warranted and means far less than suggested.
    ___________________________________________________

 34. Emerging Trends in the information sphere

    Firstly, for the past few years, individual database owners/maintainers
    have been flirting with the idea of making paid access available through
    the Internet, rather than the existing system of allowing database
    retailing firms to promote and market their databases. I have heard
    rumours most database producers earn up to 30% of retail price when
    delivered through database retailing firms. The Internet is not a
    commercially viable alternative...yet, but some have emerged with
    alternative funding despite this (Library of Congress, ERIC, see section
    13). Others are creeping in around the edges by offering subscribers
    access at a much reduced flat annual fee (Computer Select at one time).
    I expect to see much more of this once a meaningful way to charge by the
    page emerges. Digital money holds the key but despite the hype,
    practical use appears to be some time away.

    A second trend is Internet publishing itself. Gradually, the information
    is getting easier to locate (don't laugh please - it's undignified). We
    are also getting better at using the Internet as a tool to disseminate
    information. We have the very visible, if perhaps short-lived, search
    engines, but also other efforts like archives of FAQs, archives of
    guidebooks, applying the Dewey Decimal system to the Internet,
    specialist directories, subject guides, specialist search engines. This
    will be a lively field for several years to come. As it gets easier to
    locate the good information, perhaps the lines between commercial
    quality and Internet quality will begin to merge in places.

    The third trend is the very promising prospect of paying for information
    by the page through the Internet - viewing the results in a web page
    immediately. There are some technical hurdles yet, but certain elements
    are already appearing in ventures like DialogWeb. This step may prove
    profitable for ATM vendors and owners of Internet cafes, pubs and
    kiosks. It will also herald a dramatic drop in the cost of information.
    ___________________________________________________

 36. Question and Answer Section

 __ 36.1 How do I find information on the Internet?

    A search for information on the Internet is not essentially different
    from the standard information search process. You still need to start by
    outlining carefully just what you are hoping to locate. You also need to
    be aware of the peculiarities of the Internet as a researchable resource
    (or rather a collection of resources). If you expect instant delivery of
    exactly what you require, free, then you need a reality check (and I am
    sure you will get one real soon). Sadly, the printed media tends to
    overlook this.

    As with all resources, the more familiar you are with a given resource,
    the more efficiently you will work. Get to know the Internet for a time
    first. Understand how it works. Then re-adjust your expectations and
    file it as just another collection of resources, perhaps preferable in
    certain circumstances.

    A more complete answer to this question starts with a great deal of
    reading and is a primary purpose of the spire project.
    ___________________________________________________

 37. Acknowledgments

    I would like to thank my sponsors who have made this possible, and my
    wife Fiona, who I love and cherish dearly.

    The spire project is the culmination of several years bridging
    information research and internet development. The information research
    industry is on the verge of a radical transformation set to add meaning
    to the oft-used saying "Information Revolution". The development of the
    Internet is currently delayed by many factors, but to grow further, we
    need to radically improve the middle ground of content-rich
    resource-linked webpages. I feel this is the most beautiful form
    information can take in this emerging information landscape. It is also
    a most effortful area to work in.

    The spire project is the most advanced information guide today. Thanks
    to the many readers who assist in building and refining this
    information. Your help is appreciated.
    ___________________________________________________
    Copyright (c) 1998 by David Novak, all rights reserved.
    This FAQ may be posted to any USENET newsgroup, on-line service,
    website, or BBS as long as it is posted unaltered in its entirety
    including this copyright statement. This FAQ may not be included in
    commercial collections or compilations without express permission from
    the author. Please post permission requests to david@cn.net.au
    -----------------------------------
    David Novak - david@cn.net.au
