INMD Ad Hoc Folks                                              C. Hanson
INTERNET-DRAFT                                                  Arcticus
draft-hanson-nnmp-01.txt                                 Sept 16th, 1999
Expires Feb 14th, 2000

                        INMD: Internet Metadata

Status of this Document

This is a rough draft of an idea that deserves wider dissemination
and comment. Distribution is unlimited. Sarcastic humor is included.

This is the second (01) revision and reflects changes made to expand
the scope of the rating system beyond Netnews to include generalized
Internet resources such as web sites/servers/documents. The second
edition changes the proposed name from NNMP to INMD.

This document is an Internet-Draft and is NOT offered in accordance
with Section 10 of RFC2026, and the author does not provide the IETF
with any rights other than to publish as an Internet-Draft.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Contributing Authors

People who have recently contributed ideas to this brainstorm include
Steve Koren, Bob Maple, David Kessner, Ian Cahoon, Eric Schultz,
Frank Weed, Earl Miles, Jamie Krutz, Michael Ash and Dave Warner.

Abstract

We attempt to design an extensible parallel distributed client-server
customizable profiles-based ratings system with as many buzzwords as
possible.
Additionally, the system should be capable of amassing, tracking and
indexing other Netnews/Internet "metadata" -- any data about netnews
articles or Internet resources other than the articles/resources
themselves. By this method, we seek to win the Nobel Peace Prize in
Communications by staving off smothering regulation and making the
world's largest and most democratic information/discussion forum
useful again.

Table of Contents

 1. Definitions
 2. Introduction
 3. Pros
 4. Cons
 5. Client
 6. Server
 7. Example of Usage
 8. Notes
 9. Related Works
10. Security Considerations
11. Author's Address

Definitions

Metadata: Data about data. In the context of Netnews, metadata is any
information about the news articles other than the news articles
themselves; in this case, rating information. In the case of Internet
resources (web/ftp servers/sites/documents, etc.), this could mean
rating or classification information.

Introduction

NNTP-based Netnews, aka Usenet, once a valuable source of information
(signal), has for quite some time been drowning in 'noise'. The
signal has been lost. The noise consists of deliberate 'spam',
off-topic content, and content of little to no value. More vexing,
the actual signal level has not been increasing at the same rate as
the noise, and indeed may even be decreasing as numbers of quality
authors depart Netnews in general disgust at the decline of the
medium.

Those familiar with the 'early' and 'late' days of Usenet understand
the difference. Anyone who has seen the evolution (devolution?) of
the Lightwave mailing list into the Lightwave newsgroup, and the
degeneration of both into a morass of noise, will see the problem
clearly.

A filter needs to be constructed to separate the signal from the
noise at the reader's bidding.

The problems are many. To begin with, not all readers agree on what
is signal and what is noise.
Additionally, a filter must be capable of coping with the vast
numbers of messages in different formats and languages that travel
the Usenet. A filter must be very clever to interpret the actual
content of a message regardless of the format, language, encoding or
other medium-specific attributes. Finally, a filter must be immune to
accidental or even very deliberate filter manipulation or spoofing.

What nature of computer exists that can meet these strict criteria?
Only one: The amassed brainpower of the global Internet userbase.

We propose to design and implement a system to facilitate the rating
of Usenet/Internet articles/content by each and every reader/browser
(if they so desire). The system stores and indexes these ratings and
will permit users to sort and present (unviewed) Usenet
messages/Internet resources based upon how a user-selected set of
peers rated said messages/content.

Pros

* Distributed Client-Server Architecture
* 'Expert' system
* Does not enforce censorship
* Difficult to spoof

Cons

* May be a resource hog with large numbers of raters.
* No (?) known interesting business model for implementers or service
  providers to profit from.

Client

The client needs to be integrated into popular newsreaders/browsers.
The client needs to contact the INMD server and negotiate any
login/security session issues to access the user's Profile.
Subsequently, when the user reads a Usenet message/views a document,
the client should offer the user the option to submit a rating for
the message/document. Ratings could be a simple 1 to 10 scale, or -10
to 10, or such, or could attempt to rate multiple aspects of the
message -- relevance, coherency, G/PG/PG13/R/NC17/X, overall score,
etc.

When retrieving messages/documents, the client should request rating
information from the server pertaining to the message(s)/document(s)
in question, and use the ratings information to rank and/or filter
the unread messages.
Which profile or profiles are used to generate the rating, and in
what proportions, should be up to the user.

Server

The server should securely keep a profile for each authenticated user
it 'hosts', and track any rating submitted by that user by Usenet
message-id or Internet URL (Uniform Resource Locator). Additionally,
other users should be able to request a rating by specifying profile
id and message-id/URL. The server should be able to accept and
fulfill requests for ratings from multiple profiles and multiple
message-ids/URLs in one transaction for efficiency. The format of the
data in the actual TCP/IP packets is not yet defined, but should be
as concise and brief as possible.

Optionally, the server could generate a list of the 'top 50' (or top
n) most popular profiles it hosts, and perhaps a list of profiles
most closely matching a user's own ratings. (See Related Works for
examples of this type of feature.)

Database speed and compactness are the primary criteria for the
server. Record sizes for each rating have not yet been estimated.

Example of Usage

User 'Joe' starts up his INMD-enabled newsreader. It contacts his
ISP's NNTP and INMD servers (news.foo.net and ratings.foo.net) and
performs whatever authentication is necessary. Joe browses the group
alt.frogs.small.green. His newsreader sees that Joe has selected
several other users' profiles (biff@bar.com and jim@baz.org) to use
in rating messages in alt.frogs.small.green. As the newsreader
fetches NNTP headers from the NNTP server, it also contacts
ratings.bar.com and ratings.baz.org to fetch ratings for each article
via their message-id. The newsreader has not yet fetched the actual
article(s). It then sorts all of the messages in
alt.frogs.small.green according to the composite score it calculated
from biff's and jim's ratings.

As Joe reads, he rates each message himself. His ratings are
submitted to his ratings server (ratings.foo.net) by his netnews
client.
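
As a rough illustration of the fetch-and-sort step in Joe's session,
the sketch below models each ratings server as an in-memory table and
combines biff's and jim's ratings into a weighted mean. Everything in
it is an assumption made for illustration: the draft defines no wire
format or scoring formula, so the RatingStore class, the function
names, the weights, the 1 to 10 scores and the message-ids are all
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RatingStore:
    """Hypothetical in-memory stand-in for one INMD server's database."""
    # Maps (profile_id, item_id) -> score; item_id is a message-id or URL.
    ratings: dict = field(default_factory=dict)

    def submit(self, profile_id, item_id, score):
        self.ratings[(profile_id, item_id)] = score

    def lookup(self, profile_ids, item_ids):
        """Answer many profiles and many ids in a single request."""
        return {
            (p, i): self.ratings[(p, i)]
            for p in profile_ids
            for i in item_ids
            if (p, i) in self.ratings
        }


def composite_scores(found, weights, item_ids):
    """Weighted mean per item over the profiles that rated it.

    Items that none of the chosen profiles rated get None.
    """
    scores = {}
    for item in item_ids:
        total = weight_sum = 0.0
        for profile, weight in weights.items():
            if (profile, item) in found:
                total += weight * found[(profile, item)]
                weight_sum += weight
        scores[item] = total / weight_sum if weight_sum else None
    return scores


def rank(scores):
    """Highest composite first; unrated items last, in original order."""
    rated = [i for i in scores if scores[i] is not None]
    unrated = [i for i in scores if scores[i] is None]
    rated.sort(key=lambda i: scores[i], reverse=True)
    return rated + unrated


# Stand-ins for ratings.bar.com and ratings.baz.org.
bar, baz = RatingStore(), RatingStore()
bar.submit("biff@bar.com", "<1@example.invalid>", 8)
bar.submit("biff@bar.com", "<2@example.invalid>", 2)
baz.submit("jim@baz.org", "<1@example.invalid>", 6)
baz.submit("jim@baz.org", "<3@example.invalid>", 9)

ids = ["<1@example.invalid>", "<2@example.invalid>",
       "<3@example.invalid>", "<4@example.invalid>"]
weights = {"biff@bar.com": 2.0, "jim@baz.org": 1.0}  # Joe's proportions

found = {**bar.lookup(weights, ids), **baz.lookup(weights, ids)}
scores = composite_scores(found, weights, ids)
order = rank(scores)
print(order)
# ['<3@example.invalid>', '<1@example.invalid>',
#  '<2@example.invalid>', '<4@example.invalid>']
```

Placing unrated articles last is only one possible policy; a client
could just as well show them first or assign them a neutral score.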
Later, when Sue reads alt.frogs.small.green, her news client requests
the ratings profiles from Joe, Biff and Jim (because she had
previously selected them) and sorts and filters based upon the
composite score.

Now, Sue decides to read alt.controversial.topic. She has never read
this group before. Her newsreader suggests the top 50 profiles for
the group. She recognizes several notable personalities whose
opinions she agrees with, and several she vehemently opposes. She
selects a few profiles of people she agrees with, and a few of her
associates as well. In this way, she avoids becoming a victim of any
sort of mass censorship -- she chooses who has the ability to
restrict her media input.

Later, Joe begins to read alt.molecular.diagrams. He does not select
any profiles to sort his messages by because he does not recognize
any names in the field. He begins to rate messages himself. After a
few days (weeks?) he has rated a statistically significant number of
messages, and his news client queries his INMD server (and a few
other well-known servers?) for a list of other profiles that rated
the same articles as Joe, and rated them similarly. Now, though Joe
does not know any of these individuals, he at least knows they share
a roughly similar opinion on the content of messages in
alt.molecular.diagrams, so he selects a few (more is always better)
profiles to govern the filtering and sorting.

Similarly, in a web/Internet document situation, the client software
should query and retrieve rating profile entries for a document URL
prior to retrieving/displaying the document itself. Potentially, the
ratings could be retrieved and used for filtering as the
browser/client encounters references to URLs in other documents. In
this way, the browser/client could advise the user of the rating
implications as the user moves the mouse pointer over destination
URLs to 'test the waters'.
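
The profile-matching query from Joe's alt.molecular.diagrams session
(profiles that "rated the same articles as Joe, and rated them
similarly") could be sketched as below. The draft names no similarity
measure, so the mean absolute difference over co-rated articles, the
minimum-overlap cutoff, the function name and the sample profiles are
all assumptions chosen for illustration.

```python
def similar_profiles(mine, others, min_overlap=3, top_n=5):
    """Rank other profiles by how closely their ratings match `mine`.

    mine:   {item_id: score} for the requesting user
    others: {profile_id: {item_id: score}}
    Returns [(profile_id, mean_abs_diff, overlap)], closest first.
    Profiles sharing fewer than `min_overlap` rated items are skipped,
    since a match on one or two articles says very little.
    """
    results = []
    for profile_id, theirs in others.items():
        shared = mine.keys() & theirs.keys()
        if len(shared) < min_overlap:
            continue
        diff = sum(abs(mine[i] - theirs[i]) for i in shared) / len(shared)
        results.append((profile_id, diff, len(shared)))
    # Smallest difference first; larger overlap breaks ties.
    results.sort(key=lambda t: (t[1], -t[2], t[0]))
    return results[:top_n]


joe = {"<m1>": 9, "<m2>": 2, "<m3>": 7, "<m4>": 5}
others = {
    "ann@example.org": {"<m1>": 8, "<m2>": 3, "<m3>": 7},
    "bob@example.org": {"<m1>": 1, "<m2>": 10, "<m3>": 2, "<m4>": 5},
    "cat@example.org": {"<m1>": 9, "<m9>": 4},  # too little overlap
}

for profile_id, diff, overlap in similar_profiles(joe, others):
    print(profile_id, round(diff, 2), overlap)
# ann@example.org 0.67 3
# bob@example.org 5.25 4
```

A real server would probably want a measure that also corrects for
raters who score everything high or low, but that is beyond what this
draft specifies.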
Submitting a rating for a page is considered trivially obvious and is
not elaborated on here.

Notes

* Because the system assigns a level of respect to popular 'raters',
  one could imagine certain parties who perform well becoming
  de-facto expert raters for certain groups. A cyber-Siskel or
  electronic Ebert. ;) This may be roughly analogous to the position
  of "Sub-Op" or "Board-Op" or "Moderator" from the BBS and
  CompuServe era. Who knows what this level of respect can be
  parlayed into. Power and World Domination? ;)

* Concerning cliques and Special Interest Groups: A system such as
  this would allow organizations to create and maintain their own
  ratings/approval profiles. Whether use of these profiles is
  voluntary or not could be a source of abuse. In a voluntary sense,
  companies such as SurfWatch and NetNanny and the like could sell
  the use of their profiles to those who desire them. Other
  organizations that might be motivated to maintain and publish their
  own 'official' profile might include RIAA, MPAA, PMRC, PTA, NEA,
  NRA, Republicans, Democrats, Libertarians, Fascists, Communists,
  National Governments and every other special interest group on the
  planet. If users want their thinking to be guided/restricted by a
  group or groups, that shall be their choice.

* Spam, being the natural target of this endeavour, will not fare
  well. Anyone who does read it will derive a natural satisfaction
  from feeling like they're doing the world a favour (and getting
  revenge on the poster) by downrating it. The server might want to
  specially flag messages that get rapidly and substantially
  downrated by a majority of the readers, and provide this list as an
  independent 'profile' for those who wish to use it. This could
  present a non-legislative technological and democratic answer to
  the spam problem. People only spam because they know it works. By
  'works' I mean that it gets read by some fraction of a large number
  of people, willingly or not.
  However, if we dramatically reduce that fraction, we reduce the
  motivation to post ineffective spam, perhaps shuttling it
  permanently into an evolutionary backwater, like wisdom teeth or
  the appendix. Out-Darwinned. If a Spam gets posted in the forest
  and no one is there to read it, does it continue to get posted?

* The protocol should be extensible -- if other useful categories of
  metadata are devised, the same server and protocol should be
  capable of supporting, storing, indexing and delivering the new
  metadata as well. The index field is an NNTP message-id or an
  Internet URL in the above cases, but could conceivably be any
  unique identifier for other unforeseen types of metadata.

Related Works

Deja News:   http://www.deja.com
Slashdot:    http://www.slashdot.org
MovieLens:   http://www.movielens.umn.edu
MovieCritic: http://www.moviecritic.com
Net Nanny:   http://www.netnanny.com
SurfWatch:   http://www.surfwatch.com
Google:      http://www.google.com

Security Considerations

There must be heaps of security considerations. They need to be
discussed.

As I understand it, in order to use a port number less than 1024
under Unix, the program needs to run with special privileges. Is this
true? Desired? Other than that, it is hoped that the server could run
without any special system privileges.

Author's Address

Chris Hanson
xenon@arcticus.com

Steve Koren
Bob Maple
David Kessner
Ian Cahoon
Eric Schultz
Frank Weed
Earl Miles
Jamie Krutz
Michael Ash
Dave Warner

I have created a mailing list to discuss this document at
NNMP@Arcticus.com. All of the people listed above are currently
subscribed.

Expires Feb 14th, 2000.