From: -dsr- Date: Wed, 17 Mar 1999 11:09:28 -0500 Message-ID: <19990317110928.18566@tao.ne.mediaone.net> To: usenet-format@clari.net Subject: draft-ietf-usefor-article.x (long) INTERNET DRAFT to be NEWS Expires 19990501 News Article Format draft-ietf-usefor-article-02 USEFOR Working Group Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). It is hoped that this document will obsolete RFC 1036 and will become an Internet standard. This document is a successor to Henry Spencer's "Son of 1036" Draft, and has been referred to as "Grandson of 1036". Distribution of this memo is unlimited. Abstract This Draft defines the format of network news articles, and defines roles and responsibilities for humans and software. Network news articles resemble mail messages but are broadcast to potentially large audiences, using a flooding algorithm that propagates one copy to each interested host (or group thereof), typically stores only one copy per host, and does not require any central administration or systematic registration of interested users. Network news originated as the medium of communication for Usenet, circa 1980. 
The term "Usenet" refers to the protocols established in RFC 1036 and successors; the software implementing those protocols; the network of hosts exchanging traffic using that software; and also the traffic itself. Cooperating subnets are possible; these are groups of hosts which agree to hold each other and themselves to an internally adopted set of standards concerning protocol details or implementations. When a cooperating subnet does not exchange traffic with general Usenet hosts, then it is no longer a part of Usenet, but a separate entity. Since then Usenet has grown explosively, and most Internet sites participate in it. In addition, the news technology is now in widespread use for other purposes, on the Internet and elsewhere. This document is intended to provide a definitive guide to the article format and interpretations thereof. Backward compatibility is a major goal, but where this document and earlier documents or practices collide, this document should be used.

Table of Contents

1. Introduction

1.1 Scope and Objectives

"Netnews" is a set of protocols that enables news "articles" (which resemble mail messages) to be broadcast to potentially large audiences, using a flooding algorithm which propagates copies throughout a network of participating hosts, typically storing only one copy per host and making it available on demand to readers able to access that host. Articles are grouped, for convenience of access, into "newsgroups", and the newsgroups themselves are arranged into "hierarchies". An important characteristic of Netnews is the lack of any requirement for a central administration or for the establishment of any controlling host to manage the network. A network which limits participation to some restricted set of hosts (within some company, for example) is a "closed" network; otherwise it is an "open" network.
A set of hosts within a network which, by mutual arrangement, operates some variant (whether more or less restrictive) of the Netnews protocols is a "cooperating subnet". "Usenet" is a particular worldwide open network based upon the Netnews protocols. Anybody can join (it is simply necessary to negotiate an exchange of articles with one or more other participating hosts). Usenet "belongs" to those who administer the hosts of which it is comprised. There is no Cabal with overall authority to direct what is to be allowed. Nevertheless, there do exist agencies within Usenet that have authority to establish policies and to perform administrative functions, but such authority derives solely from the consent of those sites which choose to recognise it (and who can decline to exchange articles with sites which choose not to recognise it). Usually, the authority of such an agency is restricted to a particular hierarchy, or group of hierarchies. A "policy" is a rule intended to facilitate the smooth operation of a network by establishing parameters which restrict behaviour that, whilst technically unexceptionable, would nevertheless contravene some accepted standard of "Good Netkeeping". Since the ultimate beneficiaries of a network are its human readers, who will be less tolerant of poorly designed interfaces than mere computers, articles in breach of established policy can cause considerable annoyance to their recipients. Policies may well vary from network to network, from hierarchy to hierarchy within one network, and even between individual newsgroups within one hierarchy. It is assumed, for the purposes of this document, that agencies with the proper authority to establish such policies will exist.
However, for the benefit of networks and hierarchies without such agencies, and to provide a basis upon which such agencies can build, this present document often provides default policy parameters, usually introducing them by a phrase such as "As a matter of policy ...". [If we follow this route, then that phrase (or one like it, perhaps using the word "default") can be introduced at various places in the existing text, for example when discussing the lengths of lines in articles, when discussing the lengths of components of newsgroup names, and when discussing Mime Content-Types, and also in connection with the Checkpolicies header, if we decide to have it.] The purpose of this present document is to define the protocols to be used for Netnews in general, and for Usenet in particular, and to set standards to be followed by software that implements those protocols. It is NOT the purpose of this document to define how the authority of various agencies to exercise control or oversight of the various parts of Usenet is established (that is itself a matter of policy). Nevertheless, it is assumed that such authorities will exist, and tools are provided within the protocols for their use. 1.2 Historical Outline Network news originated as the medium of communication for Usenet, circa 1980. Since then Usenet has grown explosively, and many Internet sites participate in it. In addition, the news technology is now in widespread use for other purposes, on the Internet and elsewhere. The earliest news interchange used the so-called "A News" article format. Shortly thereafter, an article format vaguely resembling Internet mail was devised and used briefly. Both of those formats are completely obsolete; they are documented in appendix A for historical reasons only. With publication of [RFC-850] in 1983, news articles came to closely resemble Internet mail messages, with some restrictions and some additional headers. 
[RFC-1036] in 1987 updated [RFC-850] without making major changes. A Draft popularly referred to as "Son of 1036" [RFC-1036BIS] was written in 1994 by Henry Spencer. That document formed the original basis for this document. Much is taken directly from Son of 1036, and it is hoped that we have followed its spirit and intentions. 1.3 Transport As in this document's predecessors, the exact means used to transmit articles from one host to another is not specified. NNTP [RFC-977] is the most common transmission method on the Internet, but much transmission takes place entirely independent of the Internet. Other methods in use include the UUCP protocol [RFC-976] extensively used in the early days of Usenet, FTP, downloading via satellite, tape archives, and physically delivered magnetic and optical media. 2. Definitions, Notations and Conventions 2.1 Definitions. An "article" is the unit of news, analogous to a [MAIL] "message". A "poster" is the person or software that composes and submits a possibly compliant article to an injecting agent. The poster is analogous to [MAIL]'s author(s). A "posting agent" is software that assists posters to prepare articles, including adding required headers and determining whether the final article is compliant to this standard. If the article is compliant it passes the article on to an injecting agent for final checking and injection into the news stream. If the article is not compliant or rejected by the injecting agent then the posting agent informs the poster with an explanation of the error. An "injecting agent" takes the finished article from the posting agent (often via the NNTP "post" command ) performs some final checks and passes it on to a relaying agent for general distribution. A "relaying agent" is software which receives allegedly compliant articles from injecting agents and/or other relaying agents, and possibly passes copies on to other relaying agents and serving agents. 
A "serving agent" takes an article from a relaying agent and files it in a "news database" . It also provides an interface for reading agents to access the news database. A "reader" is the person or software reading news articles. A "reading agent" is software which presents articles to a reader. A "newsgroup" is a single news forum, a logical bulletin board, having a name and nominally intended for articles on a specific topic. An article is "posted to" a single newsgroup or several newsgroups. When an article is posted to more than one newsgroup, it is said to be "crossposted"; note that this differs from posting the same text as part of each of several articles, one per newsgroup. A "hierarchy" is the set of all newsgroups whose names share a first component. A newsgroup may be "moderated", in which case submissions are not posted directly, but mailed to a "moderator" for consideration and possible posting. Moderators are typically human but may be implemented partially or entirely in software. A "followup" is an article containing a response to the contents of an earlier article (the followup's "precursor"). A "followup agent" is a combination of reading agent, and posting agent that aids in the preparation and posting of a followup. A "reply agent" is a combination of reading agent and mailer that aids in the preparation and posting of an email response to an article. A "message ID" is a unique identifier for an article, usually supplied by the posting agent which posted it. It distinguishes the article from every other article ever posted anywhere. Articles with the same message ID are treated as identical copies of the same article even if they are not in fact identical. A "gateway" is software which receives news articles and converts them to messages of some other kind (e.g. mail to a mailing list), or vice versa; in essence it is a translating relaying agent that straddles boundaries between different methods of message exchange. 
The most common type of gateway connects newsgroup(s) to mailing list(s), either unidirectionally or bidirectionally, but there are also gateways between news networks using this document's news format and those using other formats. A "control message" is an article which is marked as containing control information; a relaying or serving agent receiving such an article may (subject to permissions etc.) take actions beyond just filing and passing on the article. An article's "reply address" is the address to which mailed replies should be sent. This is the address specified in the article's From header (see section 5.2), unless it also has a Reply-To header (see section 6.3). 2.2 Textual Notations Throughout this document, [MAIL] is short for "the current RFCs governing electronic mail formats, beginning with the historical [RFC-822] and continuing to its modern successors". "ASCII" is short for "the ANSI X3.4 character set" [ANSI- X3.4]. While "ASCII" is often misused to refer to various character sets somewhat similar to X3.4, in this document "ASCII" means X3.4 and only X3.4. ASCII is a 7 bit character set. Please note that this document requires that all agents be 8 bit clean; that is, they must accept and transmit data without changing or omitting the 8th bit. Certain words used to define the significance of individual requirements are capitalized. "MUST", "SHOULD", "MAY" and the same words followed by "NOT" should be read as having the same meaning as in [RFC-2119]. In particular, to be fully compliant with this document, software must satisfy every relevant "MUST" requirement. Software that satisfies every relevant "SHOULD" requirement but not every "MUST" requirement is partially compliant. [However, we could step back from this by requiring less rigour in observing "SHOULD" in the case of "matters of policy". Or perhaps we could introduce an "OUGHT" category.] This document contains explanatory notes using the following format. 
These may be skipped by persons interested solely in the content of the specification. The purpose of the notes is to explain why choices were made, to place them in context, or to suggest possible implementation techniques. NOTE: While such explanatory notes may seem superfluous in principle, they often help the less-than-omniscient reader grasp the purpose of the specification and the constraints involved. Given the limitations of natural language for descriptive purposes, this improves the probability that implementors and users will understand the true intent of the specification in cases where the wording is not entirely clear. [Remarks enclosed in square brackets, such as this one, are not part of this document, but are editorial notes to explain matters amongst ourselves, or to point out alternatives, or to indicate work yet to be done.] All numeric values are given in decimal unless otherwise indicated. Octets are assumed to be unsigned values for this purpose. Throughout this document we will give examples of various definitions, headers and other specifications. It MUST be remembered that these samples are for the aid of the reader only and do NOT define any specification themselves. In order to prevent possible conflict with "Real World" entities and people the top level domain of ".example" is used in all sample domains and addresses. The hierarchy of example.* is also used as a sample hierarchy. Information on the ".example" top level domain is in [TEST-TLDS]. 2.3 Relation To Mail and MIME The primary intent of this document is to describe the news article format. Insofar as news articles are a subset of [MAIL]'s message format augmented by some new headers, this document incorporates many (though not all) of the provisions of [MESSFOR], with the aim of enabling news articles to pass through mail systems and vice versa, provided only that they contain the minimum headers required for the mode of transport being used. 
Unfortunately, the match is not perfect, but it is the intention of this document that gateways between [MAIL] and news should be able to operate with the minimum of tinkering.

[This document has been designed to fit on top of the drafts currently in preparation for Mail [MESSFOR]. It is expected that those drafts will have progressed to the RFC stage by the time the present document is complete, at which time all references to [MESSFOR] in the present text will be replaced by references to that RFC.]

Likewise, this document incorporates many (though not all) of the provisions of the MIME standards [RFC-2045 et seq] which, though designed with [MAIL] in mind, are mostly applicable to news.

2.4 Syntax Notation

This document uses the Augmented Backus Naur Form described in [RFC-2234]. A discussion of this is outside the bounds of this document, but it is expected that implementors will be able to quickly understand it with reference to the defining document. Much of the syntax in this document is incorporated directly from that given in [MESSFOR] or in the MIME specifications [RFC-2045 et seq], but with appropriate modifications to permit the use of full 8bit characters, and to remove those parts of the syntax given in [MESSFOR] that are regarded as "obsolete". Full details of this are explained in section 4.1.

[Alternatively, we could move some parts of 4.1 forward to here.]

NOTE: News parsers historically have been much less permissive than [MAIL] parsers, and this is reflected in the modifications referred to, and in some further specific rules.

NOTE: Following [RFC-2234], literal text included in the syntax is to be regarded as case-insensitive. However, in contradistinction to [MAIL], the Netnews protocols are sensitive to case in some instances (as in newsgroup names, some header parameters, etc.). Care has been taken to indicate this explicitly where required.
2.5 Language

Various constant strings in this document, such as header names and month names, are derived from English words. Despite their derivation, these words do NOT change when the poster or reader employing them is interacting in a language other than English. Posting and reading agents MAY translate as appropriate in their interaction with the poster or reader, but the forms that actually appear in articles MUST be the English-derived ones defined in this document.

3. Changes to the existing protocols

This document prescribes many changes, clarifications and new features since the protocols described in [RFC-1036] and [RFC-1036BIS]. It is the intention that they can be assimilated into Usenet as it presently operates without major interruption to the service, though some of the new features may not begin to show benefit until they become widely implemented. This section summarizes the main changes, and comments on some features of the transition.

3.1 Principal Changes

o The [MAIL] conventions for parenthesis-enclosed comments in headers are supported.
o Whitespace is permitted in Newsgroups headers, permitting folding of such headers. Indeed, all news headers can now be folded.
o An enhanced syntax for the Path header enables the injection point of and the route taken by an article to be determined with certainty.
o Netnews is firmly established as an 8bit medium.
o Large parts of MIME are recognized as an integral part of Netnews.
o The charset for headers is always UTF-8. This will, inter alia, permit newsgroup-names with non-ASCII characters.
o There is a new Control command 'mvgroup' to facilitate group renaming.
o There are several new headers defined, such as Replaces and Author-Ids, leading to increased functionality.
o There are numerous other small changes, clarifications and enhancements.

[Doubtless many other changes should be listed, but there is little point in doing so until our text is nearing completion.
The above gives the flavour of what should be said.]

3.2 Transitional Arrangements

An important distinction must be made between serving and relaying agents which are responsible for the distribution and storage of news articles, and user agents which are responsible for interactions with users. It is important that the former should be upgraded to conform to this document as soon as possible to provide the benefit of the enhanced facilities. Fortunately, the number of distinct implementations of such agents is rather small, at least so far as the main "backbone" of Usenet is concerned, and many of the new features are already supported. Contrariwise, there are a great number of implementations of user agents, installed on a vastly greater number of small sites. Therefore, the new functionality has been designed so that existing agents may continue to be used, although the full benefits may not be realised until a substantial proportion of them have been upgraded. In the list which follows, care has been taken to distinguish the implications for both kinds of agent.

o [MAIL] style comments in headers do not affect serving and relaying agents (note that the Newsgroups and Path headers do not contain them). They are unlikely to hinder their proper display in existing user agents except in the case of the References header in agents which thread articles. Therefore, it is provided that they SHOULD NOT be generated except where permitted by the previous standards.

o Because of its importance to all serving agents, the extension permitting whitespace and folding in Newsgroups headers SHOULD NOT be used unless the user is willing to take the risk of misprocessed articles. It is believed that most existing implementations handle this correctly, but this is not certain. User agents are unaffected.

o The new style of Path header is already consistent with the previous standards.
However, the intention is that relaying agents should henceforth reject articles in the old style, and so this should be offered as a configurable option for relaying agents. User agents are unaffected. o The vast majority of serving, relaying and transport agents are believed to be already 8bit clean (in the slightly restricted sense in which that term is used in the MIME standards). User agents that do not implement MIME may be disadvantaged, but no more so than at present when faced with 8bit characters (which currently abound in spite of the previous standards). o The introduction of MIME reflects a practice that is already widespread. Articles in strict compliance with the previous standards (using strict ASCII) will be unaffected. Many user agents already support it, at least to the extent of widely used charsets such as ISO8859-1. Users expecting to read articles using the more exotic charsets will need to acquire suitable reading agents. It is not intended, in general, that any single user agent will be able to display every charset known to IANA, but all such agents MUST support ASCII. Serving and relaying agents are not affected. o The use of the UTF-8 charset for headers will not affect any existing usage, since ASCII is a strict subset of UTF-8. Insofar as newsgroup names containing non-ASCII characters can now be expected to arise, support from serving and relaying agents will be necessary. It is believed that the customary storage structure used by serving agents can already cope (perhaps not ideally) with such names. Note that it is not necessary for serving and relaying agents to understand all the characters available in UTF-8, though it is desirable for them to be displayable for diagnostic purposes via some escape mechanism using, for example, the visible subset of ASCII. For users expecting to use the more exotic charsets available under UTF-8, the remarks already made in connection with MIME will apply. 
o The new Control: mvgroup command will need to be implemented in serving agents. It SHOULD be used in conjunction with pairs of matching rmgroup and newgroup commands (injected shortly after the mvgroup) until such time as mvgroup is widely implemented. The new Replaces header is also effectively a Control command, and transitional arrangements are provided which should be used in the meantime. User agents are unaffected.

o The headers newly introduced by this document can safely be ignored by existing software, albeit with loss of the new functionality.

4. Basic Format

4.1 Overall Syntax

Much of the syntax of News Articles is based on the corresponding syntax defined by [MESSFOR], which is deemed to have been incorporated into this standard as required. However, there are some important differences arising from the fact that [MESSFOR] does not recognise anything other than US-ASCII characters, that it does not recognise the MIME headers [RFC2045], and that it includes much syntax described as "obsolete".

The following syntactic forms supersede the corresponding rules given in [MESSFOR] and [RFC2045]:

   text        = %d1-9 /         ; all octets except
                 %d11-12 /       ; US-ASCII NUL, CR and LF
                 %d14-255
   ctext       = NO-WS-CTL /     ; all of except
                 %d33-39 /       ; SP, HTAB, "(", ")"
                 %d42-91 /       ; and "\"
                 %d93-255
   qtext       = NO-WS-CTL /     ; all of except
                 %d33 /          ; SP, HTAB, "\" and <">
                 %d35-91 /
                 %d93-255
   ftext       = %d33-57 /       ; all octets except
                 %d59-126 /      ; CTL, SP and ":"
                 %d128-255
   token       = 1*
   tspecials   = "(" / ")" / "<" / ">" / "@" /
                 "," / ";" / ":" / "\" / <"> /
                 "/" / "[" / "]" / "?" / "="

Wherever in this standard the syntax is stated to be taken from [MESSFOR], it is to be understood as the syntax defined by [MESSFOR] after making the above changes, but NOT including any syntax defined in section 4 ("Obsolete syntax") of [MESSFOR]. Software compliant with this standard MUST NOT generate any of the syntactic forms defined in that Obsolete Syntax, although it MAY accept such syntactic forms.
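The numeric ranges given for "text" and "ftext" above translate directly into octet tests. The following is a minimal sketch in Python, not part of the specification; the names TEXT_RANGES, FTEXT_RANGES, is_text, is_ftext and all_text are hypothetical and chosen only for illustration.

```python
# Octet classes copied from the numeric ranges in the rules above.
# All names in this sketch are illustrative assumptions, not defined by the draft.
TEXT_RANGES = ((1, 9), (11, 12), (14, 255))       # excludes NUL (0), LF (10), CR (13)
FTEXT_RANGES = ((33, 57), (59, 126), (128, 255))  # excludes CTLs, SP (32), ":" (58)


def _in_ranges(octet, ranges):
    """Return True if the octet value falls inside any (low, high) range."""
    return any(low <= octet <= high for low, high in ranges)


def is_text(octet):
    """True if the octet is allowed by the 'text' rule."""
    return _in_ranges(octet, TEXT_RANGES)


def is_ftext(octet):
    """True if the octet is allowed by the 'ftext' rule."""
    return _in_ranges(octet, FTEXT_RANGES)


def all_text(data):
    """True if every octet of a bytes object is allowed by the 'text' rule."""
    return all(is_text(octet) for octet in data)
```

For instance, all_text("Grüße".encode("utf-8")) holds because octets above 127 are permitted, whereas all_text(b"a\r\nb") does not, since CR and LF are excluded from "text".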
Certain syntax from the MIME specifications [RFC2045 et seq] is also considered a part of this Standard (see ...).

The following syntactic forms, taken from [RFC2234] or from [MESSFOR], are repeated here for convenience only:

   ALPHA       = %x41-5A /       ; A-Z
                 %x61-7A         ; a-z
   CR          = %x0D            ; carriage return
   CRLF        = CR LF
   DIGIT       = %x30-39         ; 0-9
   HTAB        = %x09            ; horizontal tab
   LF          = %x0A            ; line feed
   SP          = %x20            ; space
   NO-WS-CTL   = %d1-8 /         ; US-ASCII control characters
                 %d11 /          ; which do not include the
                 %d12 /          ; carriage return, line feed,
                 %d14-31 /       ; and whitespace characters
                 %d127
   WSP         = SP / HTAB       ; Whitespace characters
   FWS         = ([*WSP CRLF] 1*WSP)   ; Folding whitespace
   comment     = "(" *([FWS] (ctext / quoted-pair / comment))
                 [FWS] ")"
   CFWS        = *([FWS] comment) (([FWS] comment) / FWS )
   <">         = %d34            ; quote mark
   quoted-pair = "\" text
   quoted-string = *CFWS <"> *(FWS (qtext / quoted-pair)) <"> *CFWS
   unstructured = *( [FWS] text )

4.2. Syntax of News Articles

The overall syntax of a news article is:

   article        = 1*header separator body
   header         = header-name ":" SP header-content CRLF
   header-name    = 1*name-character *( "-" 1*name-character )
   name-character = ALPHA / DIGIT
   header-content = usenet-header-content / unstructured
   usenet-header-content =
   separator      = CRLF
   body           = *( *998text CRLF )
   nonblank-text  = 1*( [FWS] nbtext )
   nbtext         = qtext /      ; all of except
                    "\" / <">    ; SP and HTAB

An article consists of some headers followed by a body. An empty line separates the two. The headers contain structured information about the article and its transmission. A header begins with a header-name identifying it, and can be continued onto subsequent lines as described in section 4.3.2. The body is largely unstructured text significant only to the poster and the readers.

NOTE: Terminology here follows the current custom in the news community, rather than the [MESSFOR] convention of referring to what is here called a "header" as a "header-field" or "field".
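As an illustration of the rule "article = 1*header separator body", the sketch below splits an article in canonical form (lines terminated by CRLF) into its headers and its body. It is a simplified reading of the grammar under stated assumptions, not a conforming parser: the function name split_article is hypothetical, header-names are not validated, and continuation lines are simply kept attached to the header they continue.

```python
def split_article(raw):
    """Split a canonical-form article (bytes) into ([(name, content), ...], body).

    Sketch only: assumes CRLF line endings and the ': ' (colon, SP) form
    required of headers; header-names are not checked against the grammar.
    """
    # The first truly empty line (CRLF immediately after a header's CRLF)
    # is the separator; everything after it, including further empty
    # lines, belongs to the body.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no empty separator line found")

    headers = []
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: attach it, still folded, to the previous header.
            if not headers:
                raise ValueError("article starts with a continuation line")
            name, content = headers[-1]
            headers[-1] = (name, content + b"\r\n" + line)
        else:
            name, colon, content = line.partition(b": ")
            if not colon or not name:
                raise ValueError("malformed header line: %r" % line)
            headers.append((name, content))
    return headers, body
```

Working on bytes rather than decoded strings keeps the sketch neutral about 8bit content in the body. A line consisting only of white space is not treated as the separator here, in keeping with the requirement that the separator line be truly empty.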
Note that the separator line must be truly empty, not just a line containing white space. Further empty lines following it are part of the body, as are empty lines at the end of the article. NOTE: The syntax above defines the canonical form of a news article as a sequence of lines each terminated by CRLF. This does not prevent serving agents or transport agents from storing or handling the article in other formats (e.g. using a single LF in place of CRLF) so long as the overall effects achieved are as defined by this document when operating on the canonical form. 4.3. Headers 4.3.1. Names and Contents Despite the restrictions on header-name syntax imposed by the grammar, relayers and reading agents SHOULD tolerate header names containing any ASCII printable character other than colon (":", ASCII 58). [That brings it into line with as given in [MESSFOR].] Header-names SHOULD be either those defined in this standard, or those defined in [MESSFOR], or those defined in any extension to either of these standards, or other names beginning with "X-". Software SHOULD NOT attempt to interpret headers not described in this standard or in its extensions. Relaying agents MUST pass them on unaltered and reading agents MUST enable them to be displayed, at least optionally. Posters wishing to convey non-standard information in headers SHOULD use header-names beginning with "X-". No standard header name will ever be of this form. Reading agents SHOULD ignore "X-" headers, or at least treat them with great care. The order of headers in an article is not significant. However, posting agents are encouraged to put mandatory headers (see section 5) first, followed by optional headers (see section 6), followed by "X-" headers and headers not defined in this standard or its extensions. Relaying agents MUST NOT change the order of the headers in an article. Header-names are case-insensitive. 
There is a preferred case convention, which posters and posting agents SHOULD use: each hyphen-separated "word" has its initial letter (if any) in uppercase and the rest in lowercase, except that some abbreviations have all letters uppercase (e.g. "Message-ID" and "MIME-Version"). The forms used in this standard are the preferred forms for the headers described herein. Relaying and reading agents MUST, however, tolerate articles not obeying this convention.

[I thought we were doing away with header classes, except to discuss eXperimental. Consensus, please?]

4.3.2 Header Classes

There are four special classes of headers that may be present in an article: Experimental, Persistent, Comment, and Variant. All other headers are ephemeral. These classes are significant in how newsreaders and servers should treat them when encountered.

4.3.3 Experimental Headers

Experimental headers are headers which begin with "X-". They are to be used by newsreaders proposing new headers for some utility or for comments to be propagated with the article. There are no established headers that are considered experimental headers; an established header cannot be experimental. Attempts to create new headers that are to be adopted as standard headers MUST begin their lives as experimental headers.

4.3.4 Persistent Headers

Persistent headers are headers which begin with "P-" (or "X-P-", hereafter referred to simply as "P- headers") which persist across followups either identically or by simple modification. Headers with this behavior include:

Newsgroups
   Content is carried over into all followups. Modified by content of Followup-To header.

Subject
   Content is carried over into all followups. Modified by prefixing with "Re: " if not already present. Also modified by user, often with a "(was: )" phrase preserving the previous content.

References
   Content is carried over into all followups. Modified by appending content of Message-ID header.
NOTE: Though traditionally old newsreaders would treat Keywords as a persistent header, it is not a persistent header. More modern newsreaders do not treat it as such.

4.3.5. Variant Headers

Variant Headers are headers that are modified on articles when they are propagated. Variant headers have a "V-" prefix. Variant headers may be experimental ("X-V-"), persistent ("P-V-"), or both ("X-P-V-").

4.3.6. Header Classes

There are four special classes of headers that may be present in an article: Experimental, Persistent, Comment, and Variant. All other headers are ephemeral. These classes are significant in how newsreaders and servers should treat them when encountered.

4.3.6.1 Experimental Headers

Experimental headers are headers which begin with "X-". They are to be used by newsreaders proposing new headers for some utility or for comments to be propagated with the article. There are no established headers that are considered experimental headers; an established header cannot be experimental. Attempts to create new headers that are to be adopted as standard headers MUST begin their lives as experimental headers.

4.3.6.2 Persistent Headers

Persistent headers are headers which begin with "P-" (or "X-P-", hereafter referred to simply as "P- headers") which persist across followups either identically or by simple modification. Headers with this behavior include:

Newsgroups
   Content is carried over into all followups. Modified by content of Followup-To header.

Subject
   Content is carried over into all followups. Modified by prefixing with "Re: " if not already present. Also modified by user, often with a "(was: )" phrase preserving the previous content.

References
   Content is carried over into all followups. Modified by appending content of Message-ID header.

NOTE: Though traditionally old newsreaders would treat Keywords as a persistent header, it is not a persistent header. More modern newsreaders do not treat it as such.
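The three carry-over rules listed above (Newsgroups, Subject, References) can be sketched as a small function that derives the headers of a followup from those of its precursor. This is an illustration only, under the assumption that headers are held in a plain dictionary keyed by preferred-case header-name; the function name followup_headers is hypothetical, any user edits to the Subject (such as a "(was: )" phrase) are left out, and no special Followup-To values are handled.

```python
def followup_headers(precursor):
    """Derive Newsgroups, Subject and References for a followup.

    `precursor` is a dict of the precursor's headers (an assumed
    representation). Only the carry-over rules described in the text
    are applied; nothing else about the followup is generated.
    """
    # Newsgroups: carried over, unless the precursor has a Followup-To.
    newsgroups = precursor.get("Followup-To") or precursor["Newsgroups"]

    # Subject: carried over, prefixed with "Re: " if not already present.
    subject = precursor["Subject"]
    if not subject.startswith("Re: "):
        subject = "Re: " + subject

    # References: carried over, with the precursor's Message-ID appended.
    earlier = precursor.get("References", "")
    references = (earlier + " " + precursor["Message-ID"]).strip()

    return {
        "Newsgroups": newsgroups,
        "Subject": subject,
        "References": references,
    }
```

Keywords is deliberately not copied, in line with the NOTE above that it is not a persistent header.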
4.3.6.3 Examples Newsgroups: alt.test Subject: Persistent Header Example Message-ID: <001@news.site.example> P-Author-IDs: User-Agent: experimental/0.1g (P-Author-ID Compliant) From: jane@site.invalid (Jane Smith) Newsgroups: alt.test Followup-To: misc.test Subject: Re: Persistent Header Example Message-ID: <002@news.site.example> References: <001@news.site.example> P-Author-IDs: User-Agent: modern/1.2 (Author-ID non-Compliant; P- header compliant) Keywords: persistence, good ideas From: andrew@isp.invalid Newsgroups: misc.test Subject: Further example (was: Re: Persistent Header Example) Message-ID: <001@news.isp.example> References: <001@news.site.example> <002@news.site.example> P-Author-IDs: User-Agent: codeveloper/2.0b (Author-ID Compliant) 4.3.6.4 Comment Headers Comment headers are headers that are strictly local and MUST NOT be propagated outside of a restricted subnet for local testing purposes. Comment headers have a prefix of "C-". Due to their limited scope, they MUST NOT be combined with any other prefix, such as "X-C-" headers. Headers with this behavior include: Xref Used by servers to keep track of crossposted articles' article numbers in the crossposted-to news groups in the local news spool as an aid to newsreaders marking such articles as read. 4.3.6.5. Variant Headers Variant Headers are headers that are modified on articles when they are propagated. Variant headers have a "V-" prefix. Variant headers may be experimental ("X-V-"), persistent ("P-V-"), or both ("X-P-V-"). 4.3.7. White Space and Continuations [The following text is taken from [MESSFOR], adapted to the different terminology used for this standard.] Each header is logically a single line of characters comprising the header-name, the colon with its following SP, and the header-content. For convenience, however, the header-content can be split into a multiple-line representation; this is called "folding".
The general rule is that wherever this standard allows for FWS (which includes CFWS, but not simply SP or HTAB) a CRLF followed by AT LEAST one SP or HTAB may instead be inserted. For example, the header:

   Approved: modname@modsite.com(Acting Moderator of comp.foo.bar)

can be represented as:

   Approved: modname@modsite.com
      (Acting Moderator of comp.foo.bar)

NOTE: Though header-contents are defined in such a way that folding can take place between many of the lexical tokens, folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a header-content is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items, even if it is allowed elsewhere. Folding MUST NOT be carried out in such a way that any line of a header is made up entirely of WSP characters and nothing else. [That is taken from a rather unsatisfactory line in section 3.2.4 of [MESSFOR] (which seems to allow WSP-only lines to arise from FWS but not from CFWS). The situation could arise where two FWS or CFWS could be adjacent, according to the syntax (I believe this is possible in [MESSFOR], which goes to show how sloppy their syntax is), or where FWS or CFWS is allowed at the end of a line.] The colon following the header name on the start-line MUST be followed by white space, even if the header is empty. If the header is not empty, at least some of the content MUST appear on the start-line. Posting agents MUST enforce these restrictions, but relaying agents SHOULD accept even articles that violate them. Posters and posting agents SHOULD use SP, not HTAB, where white space is desired in headers (some existing software expects this), and MUST use SP immediately following the colon after a header-name (this was an RFC 1036 requirement). Relaying agents SHOULD accept HTAB in all such cases, however.
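As an illustration of the folding rules above, the following Python sketch unfolds a header block and reports two of the violations a posting agent is asked to enforce. The function names unfold and folding_problems are hypothetical, the input is assumed to be a header block with CRLF line endings, and the rule that a non-empty header must have some content on its start-line is not checked.

```python
def unfold(header_block: str) -> list:
    """Join continuation lines (starting with SP or HTAB) onto their start-line."""
    logical = []
    for line in header_block.split("\r\n"):
        if line[:1] in (" ", "\t") and logical:
            # Only the CRLF is removed; the leading white space stays
            # part of the logical line.
            logical[-1] += line
        elif line:
            logical.append(line)
    return logical


def folding_problems(header_block: str) -> list:
    """Report WSP-only lines and start-lines whose colon is not followed by SP."""
    problems = []
    for number, line in enumerate(header_block.split("\r\n"), start=1):
        if line == "":
            continue  # trailing empty string left by the final CRLF
        if line.strip(" \t") == "":
            problems.append(f"line {number}: consists only of white space")
        elif line[0] not in " \t":
            _name, colon, content = line.partition(":")
            if not colon or not content.startswith(" "):
                problems.append(f"line {number}: header-name not followed by ': '")
    return problems
```

A relaying agent would, per the text above, accept articles that fail these checks; the second function is only relevant to posting agents.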
Since the white space beginning a continuation line remains a part of the logical line, headers can be "broken" into multiple lines only at FWS or CFWS. Posting agents SHOULD NOT break headers unnecessarily (but see section 4.6). 4.3.8 Comments Strings of characters which are treated as comments may be included in header contents wherever the syntactic element CFWS occurs. They consist of characters enclosed in parentheses. Such strings are considered comments so long as they do not appear within a quoted-string. Comments may be nested. A comment is normally used to provide some human readable informational text, except at the end of an address
which contains no phrase, as in fred@foo.bar.com (Fred Bloggs) as opposed to "Fred Bloggs" <fred@foo.bar.com> The former is a deprecated, but commonly encountered, usage and reading agents SHOULD take special note of such comments as indicating the name of the person whose address
it is. In all other situations a comment is semantically interpreted as a single SP. Since a comment is allowed to contain FWS, folding is permitted within it as well as immediately preceding and immediately following it. Also note that, since quoted-pair is allowed in a comment, the parenthesis and backslash characters may appear in a comment so long as they appear as a quoted-pair. Semantically, the enclosing parentheses are not part of the comment token; the token is what is contained between the two parentheses. Since comments have not hitherto been permitted in news articles, except in a few specified places, posters and posting-agents SHOULD NOT insert them except in those places. However, compliant software MUST accept them in all places where they are syntactically allowed. 4.3.9. Undesirable Headers A header whose content is empty is said to be an empty header. Relaying and reading agents SHOULD NOT consider presence or absence of an empty header to alter the semantics of an article (although syntactic rules, such as requirements that certain header names appear at most once in an article, MUST still be satisfied). Posting and injecting agents SHOULD delete empty headers from articles before posting them; relaying agents MUST pass them untouched. Headers that merely state defaults explicitly (e.g., a Followup-To header with the same content as the Newsgroups header, or a MIME Content-Type header with contents "text/plain; charset=us-ascii") or state information that reading agents can typically determine easily themselves (e.g. the length of the body in octets) are redundant and posters and posting agents SHOULD NOT include them. 4.4. Body 4.4.1. Body Format Issues The body of an article MAY be empty, although posting agents SHOULD consider this an error condition (meriting returning the article to the poster for revision). 
A posting or injecting agent which does not reject such an article SHOULD issue a warning message to the poster and supply a non-empty body. Note that the separator line MUST be present even if the body is empty. NOTE: Some existing news software is known to react badly to body-less articles, hence the request for posting and injecting agents to insert a body in such cases. The sentence "This article was probably generated by a buggy news reader" has traditionally been used in this situation. Note that an article body is a sequence of lines terminated by CRLFs, not arbitrary binary data, and in particular it MUST end with a CRLF. However, relaying agents SHOULD treat the body of an article as an uninterpreted sequence of octets (except as mandated by changes of CRLF representation and by control-message processing) and SHOULD avoid imposing constraints on it. See also section 4.6. 4.4.2. Body Conventions A body is by default an uninterpreted sequence of octets for most of the purposes of this standard. However, a MIME Content-Type header may impose some structure or intended interpretation upon it, and may also specify the character set in accordance with which the octets are to be interpreted. NOTE: The syntax does not permit the NUL octet to appear in a body, and the octets CR and LF MUST ONLY occur together as CRLF. See also section 4.6 for limits on the length of a line. It is a common practice for followup agents to enable the incorporation of the followed-up article (the "precursor") as a quotation. This SHOULD be done by prefacing each line of the quoted text (even if it is empty) with the character ">" (or preferably with "> "). This will result in multiple levels of ">" when quoted content itself contains quoted content. The followup agent SHOULD also precede the quoted content by an "attribution line" incorporating at least the name of the precursor's poster.
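The quoting practice just described might be sketched in Python as follows. The function name quote_precursor and the wording of the attribution line are illustrative assumptions; only the "> " prefix on every line (empty lines included) and the presence of the poster's name come from the text above.

```python
def quote_precursor(body: str, poster_name: str) -> str:
    """Return an attribution line followed by the precursor body quoted with "> "."""
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by the final CRLF

    # Every line is prefixed, even an empty one, so existing quotations
    # simply gain one more level of ">".
    quoted = ["> " + line for line in lines]

    attribution = f"{poster_name} wrote:"  # hypothetical wording
    return "\r\n".join([attribution] + quoted) + "\r\n"
```

Trimming the quoted text to the minimum necessary remains the poster's job and is not attempted here.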
The following convention for attribution lines, whilst not mandated by this Standard, is intended to facilitate their automatic recognition and processing by sophisticated reading agents. The following fields describing the precursor SHOULD, if present, be in the given order. A single Newsgroup name (the one from which the followup is being made) enclosed within <...> or The precursor's Message-ID enclosed within <...> or The precursor's poster's Name enclosed within "..." The precursor's poster's Email address enclosed within <...> or The fields may be separated by arbitrary text, they may be folded in the same way as headers, and they should be terminated by a ":" followed by two CRLFs. Example: On in <12345678@foo.com> on 24 Dec 1997 16:40:20 +0000 "Joe D. Bloggs" wrote: NOTE: The use of the standard character ">" facilitates automatic analysis of articles. The inclusion of the Message-ID in the attribution would enable reading agents to retrieve the precursor by clicking on it. However, readers are warned not to assume that attributions are accurate, especially within multiply nested quotations. NOTE: Posters SHOULD edit quoted context to trim it down to the minimum necessary. However, followup agents SHOULD NOT attempt to enforce this beyond issuing a warning (past attempts to do so have been found to be notably counter-productive). A "personal signature" is a short closing text automatically added to the end of articles by posting agents, identifying the poster and giving his network addresses, etc. If a poster or posting agent does append such a signature to an article, it MUST be preceded with a delimiter line containing (only) two hyphens (ASCII 45) followed by one SP (ASCII 32). The signature is considered to extend from the last occurrence of that delimiter up to the end of the article (or up to the end of the part in the case of a multipart MIME body). 
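A minimal Python sketch of the signature rule above: the delimiter is a line containing exactly two hyphens followed by one SP, and the signature starts after its last occurrence. The function name split_signature is hypothetical, and a single-part body with CRLF line endings is assumed.

```python
from typing import List, Optional, Tuple

SIGNATURE_DELIMITER = "-- "  # two hyphens (ASCII 45) followed by one SP (ASCII 32)


def split_signature(body: str) -> Tuple[List[str], Optional[List[str]]]:
    """Split a body into (text lines, signature lines or None)."""
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by the final CRLF

    # Search backwards so that the LAST delimiter line wins.
    for index in range(len(lines) - 1, -1, -1):
        if lines[index] == SIGNATURE_DELIMITER:
            return lines[:index], lines[index + 1:]
    return lines, None
```

Note that a line holding "--" without the trailing SP is not treated as a delimiter.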
Followup agents, when incorporating quoted text from a precursor, SHOULD NOT include the signature in the quotation. Posting agents SHOULD discourage (at least with a warning) signatures of excessive length (4 lines is a commonly accepted limit). 4.5. Characters And Character Sets Transmission paths for news articles MUST treat news articles as uninterpreted sequences of octets, excluding the values 0 (ASCII NUL) and 13 and 10 (ASCII CR and LF, which MUST only appear in the combination which denotes a line separator). NOTE: this corresponds to the range of octets permitted for MIME "8bit data" [RFC-2045]. An octet, or a sequence of octets, may represent a character in some Coded Character Set (CCS) [RFC-2130] as determined by some Character Encoding Scheme (CES) [RFC-2130]. If it comes to a relaying agent's attention that it is being asked to pass an article using the Content-Transfer-Encoding "8bit" to a relaying agent that does not support it, it SHOULD report this error to its administrator. It MUST refuse to pass the article and MUST NOT re-encode it with different MIME encodings. NOTE: This strategy will do little harm. The target relaying agent is unlikely to be able to make use of the article on its own servers, and the usual flooding algorithm will likely find some alternative route to get the article to destinations where it is needed. 4.5.1. Character Sets within Article Headers Within article headers, the CES is UTF-8 [ISO-10646 or RFC-2279] and hence the CCS is the Universal Multiple-Octet Coded Character Set (UCS) [ISO-10646] (which is essentially a superset of Unicode [UNICODE] and expected to remain so). However, interpreting the octets directly as ASCII characters should ensure correct behaviour in most situations. NOTE: UTF-8 is an encoding for 16bit (and even 32bit) character sets with the property that any octet less than 128 immediately represents the corresponding ASCII character, thus ensuring upwards compatibility with previous practice.
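The compatibility property mentioned in the NOTE can be checked with a few lines of Python. The helper names header_octets and is_pure_ascii are illustrative assumptions; only the standard str.encode call is relied upon.

```python
def header_octets(text: str) -> bytes:
    """Encode header text as UTF-8, the CES used within article headers."""
    return text.encode("utf-8")


def is_pure_ascii(octets: bytes) -> bool:
    """True when every octet is below 128, i.e. the text is plain ASCII."""
    return all(octet < 128 for octet in octets)


# ASCII text is unchanged octet for octet by the encoding.
ascii_header = header_octets("Subject: Film at 11")

# A non-ASCII character becomes a sequence of octets, none below 128.
accented = header_octets("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
```

Software that only understands ASCII therefore still sees every ASCII character of a UTF-8 header at its usual octet value.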
Non-ASCII characters from UCS are represented by sequences of octets greater than 127. Only those octet sequences explicitly permitted by [RFC-2279] shall be used. UCS includes all characters from the ISO-8859 series of character sets [ISO-8859] (which includes all Greek and Arabic characters) as well as the more elaborate characters used in Japan and China. See the following section for the appropriate treatment of UCS characters by reading agents. Notwithstanding the great flexibility permitted by UTF-8, there is need for restraint in its use in order that the essential components of headers may be discerned using reading agents that cannot present the full UCS range. In particular, header-names MUST be in ASCII, and certain other components of headers, as defined elsewhere in this standard - notably s (as in s), s, s s and s - MUST be in ASCII. s, s (as in
es) and s (as in s) MAY use other character sets. For s see below. Where the use of non-ASCII characters, encoded in UTF-8, is permitted as above, they MAY also be encoded using the MIME mechanism defined in RFC-2047 [RFC-2047], but this usage is deprecated within news articles (even though it is required in mail messages) since it is less legible in older reading agents which support neither it nor UTF-8. Nevertheless, reading agents SHOULD support this usage, but only in those contexts explicitly mentioned in [RFC-2047]. 4.5.2 Character Sets within Article Bodies Within article bodies, the CES and CCS implied by any Content-Transfer-Encoding and Content-Type headers [RFC-2045] SHOULD be applied by reading agents. In the absence of such headers, reading agents cannot be relied upon to display correctly more than the ASCII characters. [Observe that reading agents are not forbidden to "guess", or to interpret as UTF-8 regardless, which would be the simplest course for them to take.] NOTE: It is not expected that reading agents will necessarily be able to present characters in all possible character sets, although they MUST be able to present all ASCII characters. For example, a reading agent might be able to present only the ISO-8859-1 (Latin 1) characters [ISO-8859], in which case it SHOULD present undisplayable characters using some distinctive glyph, or by exhibiting a suitable warning. Older reading agents that do not understand MIME headers or UTF-8 should be able to display bodies in ASCII (with some loss of human comprehensibility) except possibly when the Content-Transfer-Encoding is "8bit". NOTE: Be warned that it will never be safe to send raw binary data in the body of news articles, because the presence of ASCII NUL and changes of representation will inevitably corrupt it. Such data MUST be encoded (e.g. by using Content-Transfer-Encoding: base64). 
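As an illustration of the last NOTE, the following Python sketch encodes arbitrary binary data with base64 so that the body contains neither NUL nor bare CR or LF. The function name encode_binary_body is hypothetical; base64.encodebytes is the standard-library call that produces output lines of at most 76 characters.

```python
import base64


def encode_binary_body(data: bytes) -> bytes:
    """Return `data` as base64 text with CRLF line endings."""
    encoded = base64.encodebytes(data)  # "\n"-terminated lines of <= 76 chars
    return encoded.replace(b"\n", b"\r\n")
```

An article carrying such a body would also need the header "Content-Transfer-Encoding: base64" so that reading agents know to decode it.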
Posters SHOULD avoid using control characters in ASCII (or other CCSs) except for tab (ASCII 9), formfeed (ASCII 12), and backspace (ASCII 8). Tab signifies sufficient horizontal white space to reach the next of a set of fixed positions; posters are warned that there is no standard set of positions, so tabs should be avoided if precise spacing is essential. Formfeed signifies a point at which a reading agent SHOULD pause and await reader interaction before displaying further text. Backspace SHOULD be used only for underlining, done by a sequence of underscores (ASCII 95) followed by an equal number of backspaces, signifying that the same number of text characters following are to be underlined. Posters are warned that underlining is not available on all output devices and is best not relied on for essential meaning. Reading agents SHOULD recognize underlining and translate it to the appropriate commands for devices that support it. Reading agents MUST NOT pass other control characters or escape sequences unaltered to the output device. Followup agents MUST be careful to apply appropriate encodings to the outbound followup. A followup to an article containing non-ASCII material is very likely to contain non-ASCII material itself. 4.6. Size Limits The syntax provides for the lines of a body to be up to 998 octets in length, not including the CRLF. All software compliant with this standard MUST support lines of at least that length, both in headers and in bodies, and all such software SHOULD support lines of arbitrary length. In particular, relaying agents MUST transmit lines of arbitrary length without truncation or any other modification. NOTE: The limit of 998 octets is consistent with the corresponding limit in [MESSFOR]. In plain-text messages (those with no MIME headers, or those with a MIME Content-Type of text/plain) posting agents SHOULD endeavour to keep the length of body lines within some reasonable limit. 
The size of this limit is a matter of policy, the default being to keep within 79 characters at most, and preferably within 72 characters (to allow room for quoting in followups). However, posting agents MUST permit the poster to include longer lines if he so insists. NOTE: Plain-text messages are intended to be displayed "as-is" without any special action (such as automatic line splitting) on the part of the recipient. The policy limit (e.g. 72 or 79) should be expressed as a number of characters (as they will be displayed by a reading agent) rather than as the number of octets used to encode them. Posting agents SHOULD fold headers by inserting CRLF followed by 1*WSP at positions (preferably higher-level ones - see 4.3.2) where this is syntactically allowed so as to keep, so far as is possible, all header lines within 79 characters. Likewise, injecting agents SHOULD fold any headers generated automatically by themselves. Relaying agents MUST NOT fold header lines (i.e. they must pass on the folding as received). NOTE: There is NO restriction on the number of lines into which a header may be split, and hence there is NO restriction on the total length of a header (in particular it may, by suitable folding, be made to exceed the 998 octets restriction pertaining to a single header line). NOTE: This standard provides no upper bound on the overall size of a single article, but neither does it forbid relaying agents from dropping articles of excessive length. It is, however, suggested that any limits thought appropriate by particular agents would be more appropriately expressed in megabytes than in kilobytes. 4.7. Example Here is a sample article: Path: server.example,unknown.site2.example@site2.example, relay.site.example,site.example,injector.site.example%jsmith Newsgroups: example.announce,example.chat Message-ID: <9urrt98y53@site.example> From: Ann Example Subject: Announcing a new sample article. 
Date: Fri, 27 Mar 1998 12:12:50 +1300 Approved: example.announce moderator Followup-To: example.chat Reply-To: Ann Example Expires: Wed, 22 Apr 1998 12:12:50 -0700 Organization: Site1, The Number one site for examples. User-Agent: ExampleNews/3.14 (Unix) Keywords: example, announcement, standards, RFC 1036, Usefor Summary: The URL for the next standard. Just a quick announcemnt that a new standard example article has been released; it is in the new USEFOR draft obtainable from ftp.ietf.org. Ann. -- Ann Example Sample Poster to the Stars "The opinions in this article are bloody good ones" - from J Clarke. 5. Mandatory Headers An article MUST have one, and only one, of each of the following headers: Date, From, Message-ID, Subject, Newsgroups, Path. NOTE: [MAIL] specifies (if read most carefully) that there must be exactly one Date header and exactly one From header, but otherwise does not restrict multiple appearances of headers. (Notably, it permits multiple Message-ID headers!) This appears singularly useless, or even harmful, in the context of news, and much current news software will not tolerate multiple appearances of mandatory headers. Note also that there are situations, discussed in the relevant parts of section 6, where References, Sender, or Approved headers are mandatory. In control articles, specific values are required for certain headers. In the discussions of the individual headers, the content of each is specified using the syntax notation. The convention used is that the content of, for example, the Subject header is defined as . NOTE: see also Section 7.1.1 5.1. Date The Date header contains the date and time that the article was submitted for transmission. The content syntax is defined in the Message Format Standard [MESSFOR]. Date-content = date-time 5.2. From The From header contains the electronic address(es), and possibly the full name, of the article's author(s) . 
The format of the From header is defined in the Message Format Standard [MESSFOR]. All mailboxes in the From-content field MUST either belong to the poster(s) of the article (or the poster(s) are authorized by the owners to use the mailboxes) or end in the top level domain of ".invalid". From-content = mailbox-list 5.2.1 Examples: From: John Smith From: John Smith , dave@isp.example From: John Smith , andrew@isp.example, fred@site2.example From: Jan Jones From: Jan Jones From: dave@isp.example (Dave Smith) NOTE: the last example is in an obsolete syntax. 5.3. Message-ID The Message-ID header contains the article's message ID, a unique identifier distinguishing the article from every other article. The format of the Message-ID header is defined in the Message Format Standard [MESSFOR]. An article's message ID MUST be unique and MUST NEVER be reused. Message-ID-content = msg-id 5.4. Subject The Subject field contains a short string identifying the topic of the message. When used in a followup, the field body SHOULD start with the string "Re: " (a "back reference") followed by the contents of the pure-subject of the precursor. subject-content = [ back-reference ] pure-subject CRLF pure-subject = nonblank-text back-reference = %x52.65.3A.20 ; which is a case-sensitive "Re: " The pure-subject MUST NOT begin with "Re: ". The default subject-content of a followup is the string "Re: " followed by the contents of the pure-subject of the precursor. Any leading "Re: " in the pure-subject MUST be stripped. Followup agents MAY remove instances of non-standard back-reference (such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the subject-content when composing the subject of a followup and add a correct back-reference in front of the result. NOTE: that would be "SHOULD remove instances" except that we cannot find a sufficiently robust and simple algorithm to do the necessary natural language processing.
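The default subject-content of a followup can be sketched in Python as below. The function name followup_subject is hypothetical. The sketch strips only the standard, case-sensitive "Re: " and, in line with the NOTE, makes no attempt to recognise non-standard back-references.

```python
BACK_REFERENCE = "Re: "  # %x52.65.3A.20, case-sensitive


def followup_subject(precursor_subject: str) -> str:
    """Return "Re: " followed by the pure-subject of the precursor."""
    pure_subject = precursor_subject
    # Any leading "Re: " is stripped so that the pure-subject never
    # begins with a back-reference, however many were stacked up.
    while pure_subject.startswith(BACK_REFERENCE):
        pure_subject = pure_subject[len(BACK_REFERENCE):]
    return BACK_REFERENCE + pure_subject
```

Because the comparison is case-sensitive, a precursor subject beginning with "RE: " is treated as an ordinary pure-subject and simply gains a correct back-reference in front.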
Followup agents MUST NOT use any other string except "Re: " as a back reference. Specifically, a translation of "Re: " into a local language or usage MUST NOT be used. Agents SHOULD NOT depend on nor enforce the use of back references by followup agents. For compatibility with legacy news software the subject-content of a control message MAY start with the string "cmsg ", non-control messages MUST NOT start with the string "cmsg ". 5.4.1 Examples: In the following examples, please note that only "Re: " is mandated by this DRAFT. "was: " is a convention used by many English-speaking posters to signal a change in subject matter. Software should be able to deduce this information from References. Subject: Film at 11. Subject: Re: Film at 11 Subject: Use of Godwin's law considered harmful (was: Film at 11) Subject: Godwin's law (was: Film at 11) Subject: Re: Godwin's law (was: Film at 11) 5.5. Newsgroups The Newsgroups header's content specifies which newsgroup(s) the article is posted to: Newsgroups-content = newsgroup-name *( ng-delim *FWS newsgroup-name ) *FWS newsgroup-name = component *( "." component ) component = component-start *( component-start / component-other ) component-start = Un-lowercase / Un-digit Un-lowercase = / Un-uppercase = / Un-digit = / component-other = "+" / "-" / "_" ng-delim = "," where the items are as described in [UNICODE]. An article's Newsgroups header may not contain a duplicated newsgroup-name component. The inclusion of folding white space within a newsgroup-name is a newly introduced feature in this standard. It MUST be accepted by all conforming implementations (relaying agents, serving agents and reading agents). Posting agents should be aware that except for experimental posting to 'test' newsgroups or within cooperating subnets, such postings may be rejected by overly-critical old-style relaying agents. 
When a sufficient number of relay agents are in conformance, posting agents SHOULD generate such whitespace in the form of so as to keep the length of lines in the relevant headers (notably Newsgroups and Followup-To) to no more than 79 characters (or other agreed policy limit - see 4.6). Before such critical mass occurs, injecting agents MAY reformat such headers by removing whitespace inserted by the posting agent, but relaying agents MUST NOT do so. A newsgroup name consists of one or more components. Components MAY contain non-ASCII letters, but these MUST be encoded in UTF-8 and not according to RFC-2047. A component MUST contain at least one letter (and must, according to the syntax, begin and end with a letter or digit). Components SHOULD begin with a letter. Composite characters (made by overlaying one character with another) and format characters, as allowed in certain parts of Unicode and needed by certain languages, must use whatever canonical conventions apply to those parts of Unicode (such conventions are not defined in this Standard). The use of "_" in a component is deprecated. Serving agents MAY refuse to accept newsgroups using that component. NOTE: Components composed entirely of digits would cause problems for the commonly used implementation technique of using the component as the name of a directory, whilst also using sequential numbers to distinguish the articles within a group. NOTE: Uppercase letters MUST NOT be used. Although converting ASCII uppercase letters to their lowercase counterparts is straightforward enough, it would be unreasonable to expect software to do the same in parts of Unicode for which it was not configured (in general, a table lookup would be required). Thus software MAY attempt to convert Un-uppercase letters according to the mappings defined by [UNICODE], but this behaviour is not required.
Whilst there is no longer any technical reason to limit the length of a component (formerly, it was limited to 14 characters) nor to limit the total length of a newsgroup-name, it should be noted that these names are also used in the newsgroups line (6.6.1.2) where an overall limit applies, and moreover excessively long names can be exceedingly inconvenient in practical use. Agencies responsible for individual hierarchies SHOULD therefore, as a matter of policy, set reasonable limits for the length of a component and of a newsgroup name. In the absence of such explicit policies, the default figures are 30 characters and 72 characters respectively. NOTE: The newsgroup-name as encoded in UTF-8 should be regarded as the canonical form. Reading agents may convert it to whatever character set they are able to display (see 4.5.2) and serving agents may possibly need to convert it to some form more suitable as a filename. Simple algorithms for both kinds of conversion are readily available. Posters SHOULD use only the names of existing newsgroups in the Newsgroups header, because newsgroups are not created simply by being posted to. However, it is legitimate to cross-post to newsgroup(s) which do not exist on the posting agent's host, provided that at least one of the newsgroups DOES exist there, and followup agents MUST accept this (posting agents MAY accept it, but SHOULD at least alert the poster to the situation and request confirmation). Relaying agents MUST NOT rewrite Newsgroups headers in any way, even if some or all of the newsgroups do not exist on the relaying agent's host. 
5.5.1 Forbidden newsgroup names The following newsgroup-names MUST NOT be used: Newsgroup-names having only one component (reserved for newsgroups whose propagation is restricted to a single host, or local network, and for pseudo-newsgroups such as "poster" (because it has special meaning in the Followup-To header (see section 6.1)), "newsgroups" (likewise), "junk" (frequently used for pseudo-newsgroups internal to serving agents) and "control" (likewise). Any newsgroup-name beginning with "control." (Used as a pseudo-newsgroup by many serving agents.) Any newsgroup-name containing the component "ctl" (likewise) "to" or any newsgroup-name beginning with "to." (reserved for test messages sent on an essentially point-to-point basis (see also the ihave/sendme protocol described in section 7.2) Any newsgroup-name containing the component "all" (because this is used as a wildcard in some implementations) A newsgroup MUST NOT appear more than once in the Newsgroups header. The order of newsgroup names in the Newsgroups header is not significant. 5.6 Path The Path header shows the route a message took from its entry into the USENET system to the current system. It is a list of site identifiers with the origin on the right. Each relaying, injecting or serving agent that processes the article adds one or more entries to this header. Aside from tracing the route articles take in moving over the network, Path is used primarily to allow relaying systems to not send articles to sites known to already have them, in particular the site they came from. This improves the efficiency of links. Path is also used for USENET statistics gathering and flow tracking. Finally the presence of a "%" delimiter in the Path header can be used to identify an article injected in conformance with this standard. 5.6.1 Format path-content = old-path / new-path old-id = 1*( ALPHA / digit / "-" | "." 
| "_") old-path = old-id *(punctuation old-id) punctuation = LWSP / %x21-2f / %x3a-40 / %x5b-60 / %x7b-7f ; These are ! " # $ % & ' ( ) * ; + , - . / : ; < = > ? @ [ \ ; ] ^ _ ` { | } ~ DEL new-delims = [FWS] ("@" / "/" / "," ) [FWS] new-path = post-injection "%" pre-injection delim-plus-id = [FWS] "!" [FWS] old-id / new-delims site-id post-injection = *(site-id 1*new-delims) site-id pre-injection = site-id *delim-plus-id site-id = ALPHA word ; UUCP name / ALPHA ; for "x" tail entry / "." word ; other registered name / ; as per RFC 1034 / ; numeric IP address rep ; specified in rfc820 etc. / "[" dotted-quad "]" / "[" "]" ; per RFC1884 word = 1*(ALPHA / digit / "-" / "_") 5.6.2 Adding an entry to the Path header. When a system receives a message from another system, it MUST add its own unique name (path-identity or site-id) and a delimiter to the beginning of the Path string. In addition, if needed, folding-whitespace MAY be added. The path-identity added MUST be unique. To this end it should be one of: 1. A name registered previously in the UUCP maps database (found in the newsgroup comp.mail.maps), containing no dot character. 3. The fully qualified domain name or MX record, retrievable via the Internet DNS service. 4. An encoding of an IP address -- dotted quad or for IPv6 as per RFC1884. These encodings using SHOULD NOT be used prior to draft-implementation-date. Whichever form is chosen, a site SHOULD use a form which can be verified using one of the schemes described below by all sites to which it will forward news articles. If all forwarding is by NNTP or other internet based protocols, then the FQDN or IP address encodings are advised. For the purposes of comparison, FQDN entries should be put in an all-lower-case canonical form. Because RFC1036 specified any punctuation or whitespace could act as delimiter, programs SHOULD accept this, with the exception that IPv6 addresses containing colons MUST be treated as a single unit. 
Modern programs MUST generate only the set "!,%@" plus optional additional whitespace. When a site receives an article from another site, it SHOULD (and eventually MUST) verify the identity of the source site. When processing an article from a source, the leftmost entry of the Path line should be extracted, converted to a canonical form, and tested to see if it matches the canonical form of the verified identity of the source. If it does, a "," should be used as the delimiter: the comma and then the receiving site's path-identity MUST be prepended to the Path line. The method of verification is up to the site. Any method of suitable authenticity may be chosen, with the consideration that in the event of problems at the source site, the relaying site may be called upon to reliably identify it. If the leftmost entry does not match the verified identity of the source, then the receiving site should prepend an "@" delimiter, then a simple form of the verified identity of the source, then a "," delimiter and then the receiving site's own path-identity. This adding of two identities to the line MUST NOT be done if the provided and verified identities match. For articles received from an internet source, the unique IPv4 (or IPv6) address or properly verified FQDN, whichever is shorter, is encouraged for the generated ID. 5.6.3 The tail Entry For historical reasons, the rightmost entry in the Path string generated by most systems is not a site name, but a "user name". However, the Path string is not an E-mail address and MUST NOT be used to contact the user. Injecting agents MAY place any string here that is not a path-identity. If no meaning is anticipated the string "x" SHOULD be used. RFC1036 suggested that the last entry could be a site name, requiring software to check it when feeding, but said it also should have a user-id for very old systems. As of this specification, a system MUST NOT treat the tail entry as a path-identity.
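The prepending rules of section 5.6.2 can be sketched as follows. This is only an illustration under stated assumptions: the function names prepend_path and canonical are hypothetical, the verified identity of the source is assumed to have been established already by whatever method the site uses, canonicalisation is reduced to lower-casing, and only the delimiters "!", ",", "%", "@", "/" and whitespace are recognised when extracting the leftmost entry.

```python
import re


def canonical(identity: str) -> str:
    """Reduce an identity to a comparable form (assumption: lower-casing only)."""
    return identity.strip().lower()


def prepend_path(path: str, own_id: str, verified_source: str) -> str:
    """Return the Path content after this site has added its entry.

    path            -- Path content as received
    own_id          -- this site's path-identity
    verified_source -- identity of the sending site, verified out of band
    """
    # Leftmost entry: everything before the first recognised delimiter.
    leftmost = re.split(r"[!,%@/\s]", path.lstrip(), maxsplit=1)[0]
    if canonical(leftmost) == canonical(verified_source):
        # Identities match: prepend "," and our own path-identity only.
        return f"{own_id},{path}"
    # Mismatch: record the verified identity as well, marked with "@".
    return f"{own_id},{verified_source}@{path}"
```

A path whose leftmost entry matches the verified source therefore grows by one entry, and a path whose leftmost entry fails verification grows by two.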
Typically this field will be the only entry on the Path string generated by a poster, or if not generated by the posting-agent, by the injecting agent, which will prepend a "%" and then its own verifiable path-identity. The percent divides the verified part of the Path line from any entries provided prior to injection into the news network. There may be more than one entry to the left of the percent, and all but the last are to be treated as sites. Injecting Agents SHOULD use the tail entry for local authentication information on the source of an article. For example, if they wish to store an encoding of the IP address of a source machine connecting to do the injection, and/or the UID of an invoking user or any other such information, they may encode it in the tail entry, provided they do so in a manner that will not match any site identifier (e.g. ending with a dot). 5.6.4 The Injecting Agent Entry The injecting agent's path identity is a special case. This identity MUST be a FQDN which can be used as a domain for E-mail connections (i.e. it should have either an A or MX record). See the Duties of an Injecting Agent (section 7.1) and RFC 2142. 5.6.5 Delimiter Summary A summary of delimiters and the meaning they imply for the name on the right, or in addition, the name to the left.
,      Verified or generated identity.
@      Name failed verification test. Name on left is identity generated by site further to the left.
%      Optional pre-injection entries followed by tail entry. Commonly just the tail entry, either "x" or an encoding of login identity. Name on left is FQDN of site that handles mail for Injecting Agent. The presence of two "%" in a path indicates a double-injected error.
!      Entry is unverified. Identity on left is an old-style system not conformant with this specification.
       Folding Whitespace MUST NOT be used as the sole delimiter.
Other  Treat as "!" as per RFC1036
"/"    Reserved for future use, treat as ","
;      Semicolon is reserved for the generation of extensible headers.
:      The colon is a valid delimiter for legacy systems; however, inside an IPv6 numeric address, surrounded in square brackets, it is a part of the path-identifier.
_      This should not be treated as punctuation (a delimiter), contrary to RFC1036. Treat as part of identifiers.
5.6.6 Other formatting Issues The Path header MUST NOT be truncated. Whitespace MAY be present in the Path to make it easier to represent. However, there is no requirement to do so. Whitespace MUST NOT be used as a delimiter. 5.6.6.1 Use of "!" Old USENET relaying and injecting programs almost all delimit Path: entries with the "!" delimiter, and these entries are not verified. As such, the presence of "%" as a delimiter will indicate the article was injected by software conforming to this standard, and the presence of "!" as a delimiter will indicate the message passed through systems developed prior to this standard. Prior to the draft-implementation-date, messages with mixed sets of delimiters will be common. After that date, all messages SHOULD NOT have "!" delimiters prior to the "%" delimiter. 5.6.7 Suggested Verification Methods Sites attempting to verify an incoming entry SHOULD take the following approaches for common transports. They are not required, but not following them may lead to wasteful double-entry Path additions. If the incoming article arrives through some protocol local to the site, such as UUCP, that protocol MUST include a means of verifying the article source site, and this should match. In UUCP implementations, commonly each incoming connection has a unique login name and password; that login name could be used to build a suitable verified identifier. Here is an example of a suitable verification method for an article arriving via a TCP/IP protocol such as NNTP: 1. If it is an encoding of an IP address, it should be decoded into a canonical form.
If that address does not match the source's IP, a reverse-DNS (in-addr.arpa PTR record) lookup should be done on the provided address, followed by a regular DNS "A" record lookup on the returned name. That A record may contain several IP addresses. So long as one matches the IP address from the path, and another matches the source IP address, this is considered a match. 2. If it is an Internet DNS style FQDN, then the name should be looked up with DNS. The A records MUST contain an IP address that is the verified address of the source. 3. (It should be noted that when generating a name after a non-match, if an FQDN is desired, simply doing a reverse DNS (PTR) lookup on the IP address is not sufficient to generate the FQDN. The returned name must be mapped back to A records to assure it matches the source's IP address.) 5.6.8 Issues There is no firm way to distinguish a path entry generated by new software from one generated by old software that assumes any delimiter is valid. However, use of "!" by old software has become effectively universal. Sites are not strictly required to use a standard form for their path entry, but if they don't, path lines out of that site get longer due to the adding of the identity. However, groups of associated sites wanting a common identity may decide to use that and let the receiver add the specific site. 6. Optional Headers The headers appearing in this section have established meanings. They MUST be interpreted according to the definitions made in this document. None of them are required to appear in every article. All of the headers appearing in this document MUST NOT appear more than once in an article. Headers not appearing in this document (i.e. X-headers, headers defined by cooperating subnets) are exempt from this requirement. See "Responsibilities of Agents" for a clear picture.
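The rule that headers defined in this document must not be repeated could be checked along the following lines. This is a minimal sketch: the function name repeated_headers is hypothetical, the KNOWN_HEADERS set is an assumption that lists only the headers discussed in sections 5.5 to 6.7 here (a real implementation would list every header the document defines), and header names are compared case-insensitively.

```python
from collections import Counter

# Assumption: partial list, covering only headers discussed in this part of the draft.
KNOWN_HEADERS = {
    "newsgroups", "path", "followup-to", "sender", "expires",
    "reply-to", "references", "control", "distribution",
}


def repeated_headers(header_lines):
    """Return the sorted names of known headers that occur more than once.

    header_lines -- unparsed header lines; folded continuation lines
                    (those starting with whitespace) are skipped.
    """
    names = Counter(
        line.split(":", 1)[0].strip().lower()
        for line in header_lines
        if ":" in line and not line[0].isspace()
    )
    return sorted(n for n, count in names.items() if count > 1 and n in KNOWN_HEADERS)
```

Headers outside the known set, such as X-headers, are deliberately ignored even when they repeat.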
6.1 Followup-To The Followup-To header contents specify which newsgroup(s) followups should be posted to: Followup-To-content = Newsgroups-content / "poster" The syntax is the same as that of the Newsgroups content, with the exception that the magic word "poster" means that followups should be mailed to the article's reply address rather than posted. In the absence of Followup-To, the default newsgroup(s) for a followup are those in the Newsgroups header and for this reason the Followup-To header should not be included if it just duplicates the Newsgroups header. 6.2 Sender The Sender header specifies the email address of the entity which actually sent this article, if that entity is different from the one given in the From header. This header SHOULD NOT appear in an article unless the sender is different from the author. This header is appropriate for use by automatic article posters. See [DRUMS] for Sender-content = mailbox-list 6.3 Expires The Expires header content specifies a date and time when the article is deemed to be no longer useful and should be removed ("expired"). The content syntax is the same as that of the Date content which is defined in the Message Format Standard [MESSFOR]. expires-content = date-time An Expires header SHOULD only be used in an article if the requested expiry time is earlier or later than the default would normally be for that article. Local policy for each serving agent will dictate when this header is obeyed and authors SHOULD NOT depend on it being completely followed. 6.3. Reply-To The Reply-To header content specifies a reply address(es) to be used for personal replies for the author(s) of the article when this is different from the author's address(es) given in the From header. The format of the Reply-To header is defined in the Message Format Standard [MESSFOR]. In the absence of Reply-To, the reply address(es) is the address(es) in the From header. For this reason a Reply-To SHOULD NOT be included if it just duplicates the From header.
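The defaulting rules for Followup-To and Reply-To described so far can be combined into one small routine. The sketch below is illustrative only: the name followup_target and the dictionary-of-headers representation are assumptions, header contents are taken as already unfolded strings, and no address parsing is attempted.

```python
def followup_target(headers):
    """Decide where a followup to an article should go.

    headers -- mapping of header name to unfolded header content.
    Returns ("mail", [reply_address]) or ("post", [newsgroup, ...]).
    """
    followup_to = headers.get("Followup-To", "").strip()
    if followup_to == "poster":
        # Mail the reply address: Reply-To if present, otherwise From.
        reply = headers.get("Reply-To") or headers["From"]
        return ("mail", [reply.strip()])
    # Otherwise post to Followup-To, defaulting to the Newsgroups header.
    groups = followup_to or headers["Newsgroups"]
    return ("post", [g.strip() for g in groups.split(",") if g.strip()])
```

For example, an article carrying "Followup-To: poster" and no Reply-To yields a mail target built from the From header.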
Use of a Reply-To header is preferable to including a similar request in the article body, because reply agents can take account of Reply-To automatically. "Reply-To: <> " MAY be used to indicate that the poster does not wish to receive email replies. Reply-To-content = From-content 6.3.1 Examples:
Reply-To: John Smith
Reply-To: John Smith , dave@isp.example
Reply-To: John Smith , andrew@isp.example, fred@site2.example
Reply-To: Please do not reply <>
6.4. References The References header content lists optionally CFWS-separated message ids of precursors. The format of the References header is defined in the Message Format Standard [MESSFOR]. A followup MUST have a References header, and an article that is not a followup MUST NOT have a References header. In a followup, if the precursor did not have a References header, the followup's References content MUST be formed by the message ID of the precursor. A followup to an article which had a References header MUST have a References header containing the precursor's References content, plus the precursor's message ID appended to the end of the list (separated from it by optional CFWS). Followup Agents SHOULD NOT trim message ids out of the References content unless the number of message ids exceeds 31 in which case message ids SHOULD be trimmed until there are only 31. Trimming SHOULD be done by removing the sixth (6th) message-id and any incomplete or otherwise broken message-ids. If Followup Agents trim any message-ids out of the References content, then they MUST leave the first five and the last nine message ids and they SHOULD also leave any message ids mentioned in the body of the article intact. NOTE: Software writers should be aware that the number of message ids in this header may exceed 31 and software must be able to handle this without problem. References-content = msg-id [msg-id...]
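As an illustration of the References rules above, a followup agent might build the header content roughly as follows. This is a sketch under assumptions: the name followup_references is hypothetical, a message id is approximated as any "<...>" token without whitespace, anything not matching that pattern is treated as broken and dropped, and ids mentioned in the article body are not given special protection.

```python
import re

MAX_REFS = 31  # limit named in the text above


def followup_references(precursor_refs: str, precursor_msgid: str) -> str:
    """Build the References content for a followup.

    precursor_refs  -- the precursor's References content ("" if it had none)
    precursor_msgid -- the precursor's own message id
    """
    # Ids may be separated by whitespace or not separated at all.
    refs = re.findall(r"<[^<>\s]+>", precursor_refs)
    refs.append(precursor_msgid)
    # Trim by repeatedly removing the sixth id; the first five always
    # survive, and so do the most recent ones.
    while len(refs) > MAX_REFS:
        del refs[5]
    return " ".join(refs)
```

Because removal always happens at the sixth position, the start of the thread and its most recent ancestry are both preserved.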
6.4.1 Examples: References: References: References: <222@site1.example><87tfbyv@site7.example><67jimf@site666.example> References: 6.5. Control The Control header content marks the article as a control message, and specifies the desired actions (other than the usual ones of filing and passing on the article): Control-content = verb *( FWS argument ) verb = 1*( ALPHA / DIGIT ) argument = 1* ftext The verb indicates what action should be taken, and the argument(s) (if any) supply details. In some cases, the body of the article may also contain details. The next section describes the standard verbs. 6.6. Control Messages The following sections document the group control messages. "Message" is used herein as a synonym for "article" unless context indicates otherwise. Group control messages are a special class of control messages, that request the group configuration on a server be updated. All of the group control messages MUST have an Approved header (section 6.10). They SHOULD use one of the authentication mechanisms defined in section TBD. The execution of the actions requested by control messages is subject to local administrative restrictions, which MAY deny requests or refer them to an administrator for approval. The descriptions below are generally phrased in terms suggesting mandatory actions, but any or all of these MAY be subject to local administrative approval (either as a class or case-by-case). Analogously, where the description below specifies that a message or portion thereof is to be ignored, this action MAY include reporting it to an administrator. Relaying Agents MUST propagate even control messages they do not understand. In the following sections, each type of control message is defined syntactically by defining its arguments and its body. For example, "cancel" is defined by defining cancel-arguments and cancel-body. 
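The Control-content syntax given in section 6.5 is simple enough to split mechanically. The following sketch assumes the header content has already been unfolded, treats any run of whitespace as FWS, and uses the hypothetical name parse_control; it does not decide whether a verb is known, since unknown control messages still have to be propagated.

```python
def parse_control(content: str):
    """Split Control header content into (verb, [arguments])."""
    parts = content.split()
    # verb = 1*( ALPHA / DIGIT ): non-empty, ASCII letters and digits only.
    if not parts or not (parts[0].isascii() and parts[0].isalnum()):
        raise ValueError("Control content does not start with a valid verb")
    return parts[0], parts[1:]
```

For instance, "newgroup example.admin.info moderated" splits into the verb "newgroup" and two arguments.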
6.6.1 The "newgroup" Control Message newgroup-ctrl = "newgroup" FWS groupname [ FWS flags ] flags = "moderated" groupname ; defined in [NEWS] The "newgroup" control message requests the specified group be created or changed. The text "moderated" is appended to mark the group as moderated. The message contains a "multipart/news-groupinfo" (section 6.6.1.1) body part containing machine- and human-readable information about the group. The newgroup command is also used to update the description line or moderation status of a group. NOTE: It is also possible to send newgroups for existing groups that don't change anything to ensure the group exists on all systems ("booster" newgroups). Implementations might want to test for this condition before attempting to update their configuration. 6.6.1.1 multipart/news-groupinfo The "multipart/news-groupinfo" body structure contains information about a (new) newsgroup. The MIME content type definition of "multipart/news-groupinfo" is:
MIME type name: multipart
MIME subtype name: news-groupinfo
Required parameters: boundary (see [MIME2])
Optional parameters: none
Encoding considerations: "7bit" or "8bit" is sufficient and MUST be used to maintain compatibility.
Security considerations: to be added
A "multipart/news-groupinfo" body part contains the following subparts: 1. An "application/news-groupinfo" part (section 6.6.1.2) containing the name and description line of the group(s). This part is mandatory. 2. Other parts containing useful information about the background of the newgroup message. 3. Parts containing initial named articles for the newsgroup. See section 6.6.1.3 for details. 6.6.1.2 application/news-groupinfo The "application/news-groupinfo" body part contains brief information on a newsgroup, i.e. the group's name, its description and the moderation flag. NOTE: This part has a format that makes the whole "multipart/news-groupinfo" structure compatible with [1036BIS].
The MIME content type definition of "application/news-groupinfo" is:
MIME type name: application
MIME subtype name: news-groupinfo
Optional parameters: none
Encoding considerations: "7bit" or "8bit" is sufficient and MUST be used to maintain compatibility. Note that the descriptions may use [MIME3].
Security considerations: to be added
The content of the "application/news-groupinfo" body part is defined as:
groupinfo-body = descriptor-tag CRLF 1*( description-line CRLF )
descriptor-tag = %x46.6F.72 SP %x79.6F.75.72 SP %x6E.65.77.73.67.72.6F.75.70.73 SP %x66.69.6C.65.3A
 ; case sensitive "For your newsgroups file:"
description-line = newsgroup-name [ 1*WSP description ]
description = nonblank-text
moderation-flags = [ moderated-literal ]
moderated-literal = %x28.4D.6F.64.65.72.61.74.65.64.29
 ; case sensitive "(Moderated)"
The "application/news-groupinfo" is used in conjunction with the "newgroup" (section 6.6.1) and "mvgroup" control messages (section 6.6.3) as part of a "multipart/news-groupinfo" (section 6.6.1.1) MIME structure. Moderated newsgroups MUST be marked by appending the case sensitive text " (Moderated)" at the end. It is NOT recommended that the moderator's email address be included in the description. Although, in accordance with [NNTP], [MESSFOR] and 4.6 of this document, a description line could have a maximum length of 998 octets, as a matter of policy a far lower limit, expressed in characters, SHOULD be set. By default, in the absence of explicit policies, the description length SHOULD be limited in such a way that the newsgroup name, the tab (interpreted as an 8-character tab that takes one at least to column 24) and the description (excluding flags) fit into the first 79 characters. NOTE: Servers that use a "newsgroups" file will store the group descriptions there as is, i.e. without any conversion of charsets or encoding. NOTE: The descriptions will also be used with the [NNTP] LIST NEWSGROUPS command. The descriptions will be sent as is, i.e.
without any conversion of charsets or encoding. 6.6.1.3 Initial Named Articles Some parts of a multipart/news-groupinfo structure MAY contain an initial set of named articles. These parts are identified by the Article-Name header just like normal named article postings. The named articles are filed separately as single postings, where the headers of the enclosing control message are copied to every part that contains a named article except that: Content-* and Article-* headers MUST be taken from the body part. The message id MUST be changed by inserting /partX before the @ sign, where X is the number of the body part, starting with 0. The Control header of the enclosing message header MUST be stripped. It MAY be replaced by a "Control: named" header. Signatures (Auth, X-Auth...) of the enclosing message SHOULD be stripped. They MAY be replaced by a signature of the own site. The resulting articles are for internal use of the server and its users only, they MUST NOT, repeat MUST NOT be forwarded to other sites. Nested multipart/* structures are allowed, they are not recursively expanded to separate articles. 6.6.2 The "rmgroup" Control Message rmgroup-ctrl = "rmgroup" FWS groupname The "rmgroup" control message requests the specified group be removed from the list of valid groups. The Content-Type of the body is unspecified; it MAY contain anything, usually an explaining text. NOTE: It is also possible to send rmgroups for nonexisting, bogus groups to ensure the group is removed on all systems ("booster" rmgroups). Implementations might want to test for this before attempting to update their configuration. 6.6.3 The "mvgroup" Control Message mvgroup-ctrl = "mvgroup" FWS ( mvgrp-groups / mvgrp-hrchy) mvgrp-groups = groupname [ FWS groupname ] mvgrp-hrchy = groupnamepart ".*" FWS groupnamepart groupnamepart = groupname ; syntactically 6.6.3.1 Single group The "mvgroup" control message requests the first specified group to be moved to the second group. 
The message contains a "multipart/news-groupinfo" (section 6.6.1.1) body part containing machine- and human-readable information about the new group. When this message is received, the new group SHOULD be created and all articles, including named articles, SHOULD be copied or moved to the new group, then the old, now empty group SHOULD be deleted. NOTE: For servers that use a file system directory structure to organize message storage, this operation is quite efficiently implemented as a single directory rename operation. If the old group does not exist, the message is ignored unless the new group does not exist either, in which case the new group is created just as for a "newgroup" message. An indication that the old group was replaced by the new group MAY be left in the server's configuration and be made available to clients. NOTE: For servers that use an "active" file this means an entry in the form "oldgroup xxx yyy =newgroup" is created. NOTE: If the old group did not exist, this is considered a local configuration error. Therefore it is best to correct this error when a mvgroup is received. If both groups exist, the groups MAY be "merged". If this is done, it MUST be done correctly, i.e. implementations MUST take care that the messages in the group being deleted are renumbered accordingly to avoid overwriting articles in one group with those of the other and that crossposted articles don't appear twice. Otherwise, the old group is just deleted. In all cases, information transported in the "multipart/news-groupinfo" body part is applied to the new group. Named articles are taken from the mvgroup message, the new group (if already existent) and the old group in this precedence. As a special case, the second name, i.e. the one of the new group MAY be omitted.
In this case, only the information of the group is updated according to the contained "multipart/news-groupinfo". Until most relay agents conform to this document, whenever a mvgroup control message for a single group is issued, a corresponding pair of rmgroup and newgroup control messages SHOULD be issued a few days later. 6.6.3.2 Multiple Groups If the first name ends with the character sequence ".*", the mvgroup message requests a whole (sub)hierarchy to be moved. The same procedure as for single groups (section 6.6.3.1) applies to every matched group; however, some systems might be able to optimize the process. NOTE: For servers that use a file system directory structure to organize message storage, this process can be optimized by renaming the parent directory instead of every group's directory. To avoid recursion, the new groups' names MUST NEVER match the old groups' name pattern; i.e. moving a whole (sub)hierarchy to a subhierarchy of the original hierarchy is explicitly disallowed. Until a critical mass of relay agents are in compliance, whenever a mvgroup control message for multiple groups is issued, a corresponding set of rmgroup and newgroup control messages for all the affected groups SHOULD be issued a few days later. 6.6.4 The "checkgroups" Control Message The "checkgroups" control message contains a list of all valid groups in a complete hierarchy. The "Control:" header has the following format:
checkgroup-ctrl = "checkgroups" [ FWS chkscope ] [ FWS chksernr ]
chkscope = 1*( ["!"] newsgroup-name-part )
chksernr = "#" 1*DIGIT
The chkscope parameter(s) specifies the (sub)hierarchy(s) for which this "checkgroups" message applies. 6.6.4.1 Example:
Control: checkgroups de !de.alt #248
NOTE: "Old" software is known to ignore the "chkscope" parameter. Thus a "checkgroups" message SHOULD also contain the groups of other subhierarchies the sender is not responsible for. "New" software MUST ignore groups which don't fall into the scope of the "checkgroups" message. If no scope for the checkgroups message is given, it applies to all hierarchies for which group statements appear in the message. "Checkgroups" messages MAY also contain a serial number, which can be any positive integer (i.e. just numbered or the date in YYYYMMDD). It SHOULD increase by an arbitrary value with every change to the group list and MUST NOT ever decrease. NOTE: This was added to circumvent security problems in situations where the Date header can not be signed. The body of the message is an "application/news-checkgroups" part containing the list of ALL valid groups (and possibly deletion confirmations) for the specified hierarchies. 6.6.5 application/news-checkgroups The "application/news-checkgroups" body part contains a complete list of all newsgroups in a top level hierarchy, their description lines and moderation status. The MIME content type definition of "application/news-checkgroups" is:
MIME type name: application
MIME subtype name: news-checkgroups
Optional parameters: none
Encoding considerations: "7bit" or "8bit" is sufficient and MUST be used to maintain compatibility. Note that the descriptions may use [MIME3].
Security considerations: to be added
The content of the "application/news-checkgroups" body part is defined as:
checkgroups-body = *( invalidation CRLF ) 1*( valid-group CRLF )
invalidation = "!" groupname *( "," *WSP groupname )
valid-group = description-line
description-line ; see section 6.6.1.2
The "application/news-checkgroups" content type is used in conjunction with the "checkgroups" control message (section 6.6.4). 6.6.5.1 Examples A "newgroup" with bilingual charter and policy information:
From: admin@example.invalid (example.all Administrator)
Newsgroups: example.admin.groups,example.admin.announce
Date: 27 Feb 1997 12:50:22 +14:00 (EST)
Subject: Group example.admin.info created.
Approved: admin@example.invalid
Control: newgroup example.admin.info moderated
Message-ID:
Content-Type: multipart/news-groupinfo; boundary="nxtprt"
Content-Transfer-Encoding: 8bit

This is a MIME control message.
--nxtprt
Content-Type: application/news-groupinfo

For your newsgroups file:
example.admin.info Information on the example.* hierarchy (Moderated)
--nxtprt
Content-Type: multipart/alternative ; differences = content-language ; boundary = nxtlang
Article-Name: example.admin.info: charter

--nxtlang
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Language: en

The group example.admin.info contains regularly posted information on the example.* hierarchy.
--nxtlang
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
Content-Language: de

Die Gruppe example.admin.info enthält regelmäßig versandte Informationen über die example.*-Hierarchie.
--nxtlang--
--nxtprt--

plain "rmgroup":
From: admin@example.invalid (example.all Administrator)
Newsgroups: example.admin.groups, example.admin.announce
Date: 4 Jul 1997 22:04 +02:00 (PST)
Subject: Deletion of example.admin.obsolete
Message-ID:
Approved: admin@example.invalid
Control: rmgroup example.admin.obsolete

The group example.admin.obsolete is obsolete. Please remove it from your system.

plain "mvgroup":
From: admin@example.invalid (example.all Administrator)
Newsgroups: example.admin.groups, example.admin.announce
Date: 30 Jul 1997 22:04 +02:00 (CEST)
Subject: Moving example.oldgroup to example.newgroup
Message-ID:
Approved: admin@example.invalid
Control: mvgroup example.oldgroup example.newgroup
Content-Type: multipart/news-groupinfo; boundary=nxt

--nxt
Content-Type: application/news-groupinfo

For your newsgroups file:
example.newgroup The new replacement group.
--nxt

The group example.oldgroup is replaced by example.newgroup. Please update your configuration.
--nxt--

more complex "mvgroup" for a whole hierarchy: The charter of the group example.talk.jokes contained a reference to example.talk.jokes.d, which is also being moved. So the charter is updated.
From: admin@example.invalid (example.all Administrator)
Newsgroups: example.admin.groups, example.admin.announce
Date: 30 Jul 1997 22:04 +02:00 (PST)
Subject: Deletion of example.admin.obsolete
Message-ID:
Approved: admin@example.invalid
Control: mvgroup example.talk.* example.conversation
Content-Type: multipart/news-groupinfo; boundary=nxt; chartas=1

--nxt
Content-Type: application/news-groupinfo

For your newsgroups file:
example.conversation.boring Boring conversations.
example.conversation.interesting Interesting conversations.
example.conversation.jokes Jokes and funny stuff.
example.conversation.jokes.d Discussion about example.conversation.jokes.

Article-Name: example.conversation.jokes: charter

This group is to publish jokes and other funny stuff. Discussions about the articles posted here should be redirected to example.conversation.jokes.d; adding a Followup-to: header is recommended.
--nxt--

6.6.6 Cancel The cancel message requests that one or more target articles be "canceled", i.e. be withdrawn from circulation or access. This message MAY be issued by entities which processed the target article(s) while it was still a proto-article (i.e. posters, posting agents, moderators and injecting agents; see also Gateways [2.1]). Other entities MUST NOT use this method to remove articles. NOTE: A separate method for other entities to cancel articles will be defined in a later draft.
cancel-arguments = 1*( message-id CFWS )
cancel-body = body
The argument(s) identify the article(s) to be canceled, by message-id. The body SHOULD contain an indication of why the cancellation was requested. The cancel message SHOULD be posted to the same newsgroup(s), with the same distribution(s), as the article(s) it is attempting to cancel.
In order for a cancel message to remove an article either: 1. The mailing addresses from the From line of the cancel message and the target article match and the target article is otherwise unauthenticated. 2. At least one authentication method of the target article MUST be matched by the cancel message plus the mailing addresses from the From line of the cancel message and the target article MAY match. NOTE: The Sender, From or Approved headers MUST NOT be used as an "authentication method" within the meaning of the previous paragraph. If the above conditions are satisfied then the relaying or serving agent SHOULD delete the target article completely and immediately (or at the minimum make the article unavailable for relaying or serving) and also SHOULD reject any copies of this article that appear. See also section 7 on duties of Serving and Relaying agents. 6.6.7 ihave, sendme The ihave and sendme control messages implement a crude batched predecessor of the NNTP [rrr] protocol. They are largely obsolete in the Internet, but still see use in the UUCP environment, especially for backup feeds that normally are active only when a primary feed path has failed. NOTE: The ihave and sendme messages defined here have ABSOLUTELY NOTHING TO DO WITH NNTP, despite similarities of terminology. The two messages share the same syntax: ihave-arguments = *( message-id space ) relayer-name sendme-arguments = ihave-arguments ihave-body = *( message-id CRLF ) sendme-body = ihave-body Message IDs MUST appear in either the arguments or the body, but NOT both. Relayers SHOULD generate the form putting message IDs in the body, but the other form MUST be supported for backward compatibility. The ihave message states that the named relaying agent has received articles with the specified message IDs, which may be of interest to the relaying agents receiving the ihave message. 
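The shared ihave/sendme syntax, including the rule that message IDs appear in the arguments or in the body but not both, might be handled as in the sketch below. The name parse_ihave is hypothetical; the function assumes the verb has already been removed from the Control content and that the body is available as a single string.

```python
def parse_ihave(arguments: str, body: str):
    """Parse ihave or sendme arguments and body.

    Returns (relayer_name, [message_id, ...]).
    Raises ValueError if the relayer name is missing or if message IDs
    appear in both the arguments and the body.
    """
    # The last argument is the relayer name; any preceding ones are ids.
    *arg_ids, relayer = arguments.split()
    body_ids = [line.strip() for line in body.splitlines() if line.strip()]
    if arg_ids and body_ids:
        raise ValueError("message IDs given in both arguments and body")
    return relayer, arg_ids or body_ids
```

Both the preferred body form and the older argument form produce the same result, which keeps the caller independent of which form a neighbor generates.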
The sendme message requests that the agent receiving it send the articles having the specified message IDs to the named relaying agent. These control messages are normally sent essentially as point-to-point messages, by using "to." newsgroups (see section 5.5.1) that are sent only to the relaying agent the messages are intended for. The two relaying agents MUST be neighbors, exchanging news directly with each other. Each relaying agent advertises its new arrivals to the other using ihave messages, and each uses sendme messages to request the articles it lacks. To reduce overhead, ihave and sendme messages SHOULD be sent relatively infrequently and SHOULD contain reasonable numbers of message IDs. If ihave and sendme are being used to implement a backup feed, it may be desirable to insert a delay between reception of an ihave and generation of a sendme, so that a slightly slow primary feed will not cause large numbers of articles to be requested unnecessarily via sendme. 6.6.8 Obsolete control messages. The following forms of control messages are declared obsolete by this document: sendsys version whogets senduuname 6.7. Distribution 6.7.1 Historical Note The original Distribution header provided a means to limit the distribution of articles to a subset of the sites which received the newsgroups it was posted to. It was designed to control a feed. Each site feeding other sites would, for each feed, configure the list of distributions appropriate to send to that site. If an article had a Distribution header, a check would be made to see if any of the distributions in the header matched the distribution list for the feed. Sadly, this list was often configured in the form "all distributions except the following" where the local distributions would be listed. This meant an unknown distribution, leaked from an external site, would match the "all distributions" and get fed out.
This meant that once an article leaked out from a distribution's subnet, it flooded the entire net, or at least the very large subset that used the "all but these" style of feed configuration. Indeed, many sites deliberately wanted this flood. Hub sites at national and multinational ISPs wanted to receive all the local distributions, for the use of their users in the individual geographic regions. This assured netwide propagation of all distributions, defeating the purpose of the header. It became close to valueless.

6.7.1.1 New Semantics

While distributions SHOULD still control feeds as they do now, they SHOULD also be associated with the site. Each site SHOULD maintain a list of the distributions of which it is a "member." Newsreaders SHOULD also allow the user to maintain a list of distributions of which the user is a member.

Newsreaders MAY also keep track of distributions the user wishes to belong to. In this event, they should examine the Distribution headers of articles to be presented to the user, and SHOULD NOT display an article if the user does not belong to any of the distributions named in it.

6.7.1.2 Planned Uses

Distributions can now be used to define rigid subsets of the net that sites can "subscribe" to.

For example, say a party wishes to issue third-party cancel messages that delete spam or net abuse at sites which wish to listen to that canceller. These messages would now be posted to a specific distribution. They might still reach the entire net, and would make it to hubs, but they would only have effect at sites which explicitly took membership in the distribution, even without authentication.

However, as these might be very high volume messages -- especially if there are many such third-party cancel services -- it remains possible for sites to ask their feeders not to feed articles in this distribution at all, thus making the system efficient.
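The reader-side membership rule of section 6.7.1.1 might be sketched as follows in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the header is taken to be a comma-separated list of case-insensitive names, entries beginning with "!" are ignored, and an article with no Distribution header (or none of the remaining names) is simply displayed. None of those choices is mandated by the text above.

```python
from __future__ import annotations


def should_display(distribution_header: str | None, memberships: set[str]) -> bool:
    """Decide whether a newsreader shows an article to the user.

    `memberships` is the user's own list of distributions. The function
    name and parameters are hypothetical, chosen for this sketch.
    """
    if distribution_header is None:
        # Assumption: an article without a Distribution header is shown.
        return True

    named = set()
    for entry in distribution_header.split(","):
        name = entry.strip().lower()
        # Assumption: entries starting with "!" are skipped, since the
        # reader-side rule only speaks of distributions that are "named".
        if name and not name.startswith("!"):
            named.add(name)

    if not named:
        # Assumption: nothing usable was named, so the article is shown.
        return True

    # Distribution names are compared case-insensitively.
    user_memberships = {m.lower() for m in memberships}
    return bool(named & user_memberships)
```

A site-level check for third-party cancel distributions could use the same comparison, with the site's membership list in place of the user's.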
6.7.2 Definition

The Distribution header specifies geographical or organizational limits to an article's propagation:

     Distribution-content  = distribution *( dist-delim distribution )
     dist-delim            = ","
     distribution          = positive-distribution / negative-distribution
     positive-distribution = *FWS distribution-name *FWS
     negative-distribution = *FWS "!" distribution-name *FWS
     distribution-name     = 1*letter

[That is more restrictive than Henry, omitting '+', '-' and '_', but more liberal in allowing uppercase letters, which in fact are commonly used, and in not specifying any 14 character limit.]

A distribution is case-insensitive (i.e. "US", "Us" and "us" all specify the same distribution). In the absence of a Distribution header, the default Distribution-content is "world". However, "world" SHOULD NOT be explicitly mentioned unless a negative-distribution is also present, as in

     Distribution: world, !us

"All" MUST NOT be used as a distribution-name.

Articles MUST NOT be passed between relaying agents unless the sending agent has been configured to supply and the receiving agent has requested to receive BOTH of

(a) at least one of the newsgroups in the article's Newsgroups header, and

(b) at least one of the positive-distributions in the article's Distribution header and none of the negative-distributions.

Exceptionally, ALL relaying agents are deemed willing to supply or accept the distribution "world", and NO relaying agent should supply or accept the distribution "local".

Posting agents SHOULD NOT provide a default Distribution header without giving the poster an opportunity to override it. Followup agents SHOULD initially supply the same Distribution header as found in the precursor.

All the two-letter country names (e.g.
"us") commonly used as top-level domain names may be used as distributions, but the common non-country top-level domain names (such as "edu" and "com") are NOT distributions; moreover, top-level newsgroup-names (such as "comp" and "soc") are NOT distributions. Apart from the above, distribution-names are a matter for negotiation between the relaying agents or cooperating subnets involved.

6.8. Keywords

The Keywords field contains a comma-separated list of important words and phrases intended to describe some aspect of the content of the article. The format of the Keywords header is defined in the Message Format Standard [MESSFOR].

NOTE: The list is comma-separated, NOT space-separated.

6.9. Summary

The Summary header content is a short phrase summarizing the article's content.

     summary-content = non-blank-text CRLF
     non-blank-text  = 1*(FWS text)

The summary SHOULD be terse. Authors SHOULD avoid trying to cram their entire article into the headers; even the simplest query usually benefits from a sentence or two of elaboration and context, and not all reading agents display all headers. On the other hand, the summary should give more detail than the Subject.

6.10. Approved

The Approved header content indicates the mailing addresses (and possibly the full names) of the persons or entities approving the article for posting:

     Approved-content = From-content

An Approved header is required in all postings to moderated newsgroups. If this header is not present then relaying and serving agents MUST reject the article. An Approved header is also required in certain control messages, to reduce the probability of accidental posting of same; see the relevant parts of section 6.6.

Please see section 7.1 on how injecting agents should treat posts to moderated groups that do not contain this header.
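The rejection rule for relaying and serving agents in section 6.10 could be checked along the following lines. This Python sketch rests on assumptions: headers are held in a dictionary keyed by their usual capitalization, the set of moderated newsgroups is known to the agent, and the function name is hypothetical. It does not cover the control-message cases of section 6.6 or the injecting-agent behavior of section 7.1.

```python
from __future__ import annotations


def must_reject_unapproved(headers: dict[str, str], moderated_groups: set[str]) -> bool:
    """Return True if the article goes to a moderated group without Approved.

    `headers` and `moderated_groups` are hypothetical inputs for this
    sketch; a real agent obtains them from its own article store and
    newsgroup configuration.
    """
    # The Newsgroups header is a comma-separated list of newsgroup names.
    newsgroups = [
        name.strip()
        for name in headers.get("Newsgroups", "").split(",")
        if name.strip()
    ]
    posted_to_moderated = any(name in moderated_groups for name in newsgroups)

    # An empty Approved header is treated here as absent (an assumption).
    has_approved = bool(headers.get("Approved", "").strip())

    return posted_to_moderated and not has_approved
```

Note that the mere presence of Approved is all that is tested; as stated in section 6.6, the header is not an authentication method.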
6.11 Lines

The Lines header content indicates the number of lines in the body of the article:

     Lines-content = 1*digit

The line count includes all body lines, including the signature if any, and including empty lines (if any) at the beginning or end of the body. (The single empty separator line between the headers and the body is not part of the body.) The "body" here is the body as found in the posted article as transmitted by the posting agent.

Software SHOULD NOT use the value of Lines for any purpose other than to display an estimate to humans. This header will be deprecated in a future RFC.

6.12 Xref

The Xref header content indicates where an article was filed by the last server to process it:

     Xref-content    = server 1*( CFWS location )
     server          = server-name
     location        = newsgroup-name ":" article-locator
     article-locator = 1*

The serving agent's name is included so that software can determine which serving agent generated the header. The locations specify what newsgroups the article was filed under (which may differ from those in the Newsgroups header) and where it was filed under them. The exact form of an article locator is implementation-specific.

NOTE: The traditional form of an article locator is a decimal number, with articles in each newsgroup numbered consecutively starting from 1. NNTP demands that such a model be provided, and there may be other software which expects it, but it seems desirable to permit flexibility for unorthodox implementations.

An agent inserting an Xref header into an article MUST delete any previous Xref header(s). A relaying agent MUST only create and/or relay an Xref header if it is correct on all the receiving agents to which the article is forwarded. Serving agents SHOULD insert this header, unless the information in an existing Xref header (apart from the server name) is already correct, in which case the header should be left unchanged. An agent MUST use the same name in Xref headers as it uses in Path headers.
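To make the Xref structure concrete, here is a minimal Python sketch that splits an Xref header content into the server name and its list of locations. It assumes that the separators are plain whitespace (comments permitted by CFWS are not handled) and that a newsgroup-name never contains ":", so the first colon in each location ends the newsgroup name. The function name is hypothetical, and the article locator is returned as an opaque string because its form is implementation-specific.

```python
from __future__ import annotations


def parse_xref(content: str) -> tuple[str, list[tuple[str, str]]]:
    """Split Xref content into (server, [(newsgroup, locator), ...])."""
    fields = content.split()
    if len(fields) < 2:
        # The grammar requires a server followed by at least one location.
        raise ValueError("Xref needs a server name and at least one location")

    server, location_fields = fields[0], fields[1:]
    locations = []
    for field in location_fields:
        # Split on the first ":" only; the locator is kept as-is.
        newsgroup, separator, locator = field.partition(":")
        if not separator or not newsgroup or not locator:
            raise ValueError(f"malformed location: {field!r}")
        locations.append((newsgroup, locator))

    return server, locations
```

For example, `parse_xref("news.example.com comp.lang.c:1234 comp.std.c:567")` yields the server name `news.example.com` and two locations with the traditional decimal locators.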
6.13 Organization

The Organization header content is a short phrase identifying the author's organization:

     organization-content = nonblank-text CRLF

NOTE: Posting and injection agents are discouraged from providing a default value for this header unless it is acceptable to all posters using these agents. Unless this header contains useful information (including some indication of the author's physical location), posters are discouraged from including it.

6.14 User-Agent

The User-Agent header contains information about the user agent (typically a newsreader) generating the article. This is for statistical purposes and for tracing standards violations to the specific software needing correction. Although OPTIONAL, user agents SHOULD include this header with the articles they generate.

The field MAY contain multiple product tokens and comments identifying the agent and any subproducts which form a significant part of the user agent, such as external agents used for message composition, separate injecting agents (such as those used by offline newsreaders), and significant libraries that are part of such agents. The products are listed in order of their significance for identifying the application, not necessarily in chronological order of handling prior to injection.

Injecting agents MAY include product information for servers (such as INN/1.7.2), but servers MUST NOT generate or modify this header to list themselves. User-Agent MUST NOT be modified after injection, but MAY be stripped or have its contents replaced prior to re-injection by another user agent such as an anonymizing gateway.

     User-Agent         = "User-Agent:" SP User-Agent-content
     User-Agent-content = product *(CFWS product) [CFWS]

At least one product MUST be present. The first token MUST NOT be a comment. Comments relate to the previously named product, not the product following it.
     product         = token ["/" product-version]
     product-version = token

Product tokens should be short and to the point -- they MUST NOT be used for information beyond the canonical name of the product and its version. Although any token character MAY appear in a product-version, this token SHOULD be used only for a version identifier (i.e., successive versions of the same product SHOULD differ only in the product-version portion of the product value). Product tokens MUST identify products.

NOTE: Variations from RFC 1945:

1. product token is required and MUST be first,

2. use of other text in the syntactic usage of the product token which is not a token is forbidden,

3. comment allows quoted-pair,

4. "{" and "}" are allowed in token (product and product-version) in news,

5. octets from character sets othe