tunefeed

(Build a newsgroups pattern for a remote feed)

SYNOPSIS

tunefeed [-hv] [-d days] [-t threshold] local remote [traffic]

DESCRIPTION

Given two active files, tunefeed generates an INN newsfeeds pattern for a feed from the first site to the second. The pattern sends the second site everything in its active file that the first site carries, while trying to minimize the number of rejected articles. It does this by noting differences between the two active files and then trying to generate wildcard patterns that cover the similarities without including much (or any) unwanted traffic.
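To make concrete what such a pattern means, here is a minimal Python sketch of how a comma-separated newsfeeds pattern list can be evaluated against a group name. It is not taken from tunefeed or INN. It assumes that the last matching pattern wins and that a leading ! excludes the group, and it uses Python's fnmatch as a rough stand-in for INN's wildmat matching. The function name is_sent is hypothetical, and the @ (poison) prefix is not handled.

```python
from fnmatch import fnmatchcase


def is_sent(group, pattern_list):
    """Return True if group would be sent under a comma-separated pattern list.

    Assumption: patterns are checked in order and the last one that
    matches decides; a leading "!" means "do not send".
    """
    sent = False
    for pattern in pattern_list.split(","):
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        # fnmatchcase only approximates wildmat semantics.
        if fnmatchcase(group, pattern):
            sent = not negated
    return sent


patterns = "alt.*,!alt.binaries.*,!alt.punk*,ba.*,!ba.jobs.agency"
for name in ("alt.music", "alt.binaries.pictures", "ba.jobs.agency"):
    print(name, is_sent(name, patterns))
```

A group matched by no pattern at all is treated as not sent.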

local and remote should be standard active files. You can probably get the active file of a site that you feed (provided they're running INN) by using the getlist program (getlist -h news.server.com will retrieve its active file) or by connecting to their NNTP port and typing LIST ACTIVE.
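For reference, each line of a standard active file holds a group name, a high article number, a low article number, and a status flag, separated by whitespace. The following Python sketch shows one way to read such lines into a dictionary. The function name parse_active and the sample article numbers are made up for illustration, and this is not how tunefeed itself reads the files.

```python
def parse_active(lines):
    """Return {group: (high, low, flag)} from active-file lines."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, high, low, flag = fields[:4]
        groups[name] = (int(high), int(low), flag)
    return groups


# Illustrative sample lines; the numbers are invented.
sample = [
    "comp.lang.c 0000012345 0000012000 y",
    "alt.binaries.misc 0000500000 0000499000 y",
]
active = parse_active(sample)
print(sorted(active))
```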

tunefeed makes an effort to avoid complex patterns when they're of minimal gain. threshold is the number of messages per day at which to worry about excluding a group; if a group the remote site doesn't want to receive gets fewer than that number of messages per day, then that group is either sent or not sent depending on which choice results in the simplest (shortest) wildcard pattern. If you want a pattern that exactly matches what the remote site wants, use -t 0.
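The effect of the threshold can be pictured as sorting the local groups into three sets: groups the remote site carries, groups it does not carry that are busy enough to be worth excluding, and low-traffic groups that may go either way. The Python sketch below illustrates only that classification step, under the assumption that traffic is available as a mapping from group name to messages per day. The function name classify is hypothetical, and the sketch says nothing about how tunefeed then searches for the shortest pattern.

```python
DEFAULT_THRESHOLD = 250  # default value given under OPTIONS


def classify(local, remote, traffic, threshold=DEFAULT_THRESHOLD):
    """Split the local groups into wanted, unwanted and optional sets."""
    wanted, unwanted, optional = set(), set(), set()
    for group in local:
        if group in remote:
            wanted.add(group)      # remote carries it: must be sent
        elif traffic.get(group, 0) < threshold:
            optional.add(group)    # low traffic: send or not, whichever is simpler
        else:
            unwanted.add(group)    # busy and unwanted: must be excluded
    return wanted, unwanted, optional


local = {"a.b", "a.c", "a.d"}
remote = {"a.b"}
traffic = {"a.c": 10, "a.d": 900}
print(classify(local, remote, traffic))
```

With a threshold of 0 the optional set is always empty, which corresponds to asking for an exact match.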

Ideally, tunefeed likes to be given the optional third argument, traffic, which points at a file listing traffic numbers for each group. The format of this file is a group name, whitespace, and then the number of messages per day it receives. Without such a file, tunefeed will attempt to guess traffic by taking the difference between the high and low numbers in the active file as the amount of traffic in that group per day. This will almost never be accurate, but it should at least be a ballpark figure. If you know approximately how many days of traffic the active file numbers represent, you can tell tunefeed this information using the -d flag.
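The two sources of traffic figures described above can be sketched in Python as follows. This is an illustration and not tunefeed's own code: the function names are hypothetical, and dividing the high/low difference by the number of days is an assumption about how the -d value is applied.

```python
def read_traffic(lines):
    """Parse "group messages-per-day" lines into {group: rate}."""
    traffic = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            traffic[fields[0]] = float(fields[1])
    return traffic


def guess_traffic(high, low, days=1.0):
    """Estimate messages per day from active-file article numbers.

    Assumption: the high/low difference covers `days` days of traffic.
    """
    return max(high - low, 0) / days


print(read_traffic(["comp.lang.c 42"]))
print(guess_traffic(12345, 12000))           # difference taken as one day
print(guess_traffic(12345, 12000, days=30))  # difference spread over 30 days
```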

tunefeed's output will look something like:

    comp.*,humanities.classics,misc.*,news.*,rec.*,sci.*,soc.*,talk.*,\
    alt.*,!alt.atheism,!alt.binaries.*,!alt.punk*,\
    !alt.sex*,!alt.video.dvd,\
    bionet.*,biz.*,gnu.*,vmsnet.*,\
    ba.*,!ba.jobs.agency,ca.*,sbay.*

(with each line prefixed by a tab, and with standard INN newsfeeds continuation syntax). Due to the preferences of the author, it will also be sorted as Big Eight, then alt.*, then global non-language hierarchies, then regional and language hierarchies.
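The layout described here (a leading tab on every line, and a trailing comma and backslash on every line but the last) can be reproduced with a short Python sketch. The function name format_pattern is hypothetical, and how tunefeed decides where to break lines is not covered; the caller supplies the grouping.

```python
def format_pattern(lines):
    """Join groups of patterns into tab-indented, backslash-continued text.

    `lines` is a list of lists; each inner list becomes one output line.
    """
    out = []
    for i, patterns in enumerate(lines):
        text = "\t" + ",".join(patterns)
        if i < len(lines) - 1:
            text += ",\\"  # continuation marker on all but the last line
        out.append(text)
    return "\n".join(out)


print(format_pattern([
    ["comp.*", "news.*"],
    ["alt.*", "!alt.sex*"],
    ["ba.*", "!ba.jobs.agency"],
]))
```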

OPTIONS

-d days, --days=days

Assume that the difference between the high and low numbers in the active file represents days days of traffic.

-h, --help

Print out this documentation (which is done simply by feeding the script to perldoc -t).

-t threshold, --threshold=threshold

Allow any group with fewer than threshold articles per day in traffic to be either sent or not sent depending on which choice makes the wildcard patterns simpler. If a threshold isn't specified, the default value is 250.

-v, --version

Print out the version of tunefeed and exit.

BUGS

This program takes a long time to run, not to mention being a nasty memory hog. The algorithm is thorough, but definitely not very optimized, and isn't all that friendly.

Guessing traffic from active file numbers is going to produce very skewed results on sites with expiration policies that vary widely by group.

There is no way to optimize for article size when avoiding rejections, only for the number of articles.

There should be a way to turn off the author's idiosyncratic ordering of hierarchies, or to specify a different ordering, without editing this script.

This script should attempt to retrieve the active file from the remote site automatically if so desired.

This script should be able to be given some existing wildcard patterns and take them into account when generating new ones.

CAVEATS

Please be aware that your neighbor's active file may not accurately represent the groups they wish to receive from you. As with everything, choices made by automated programs like this one should be reviewed by a human and the remote site should be notified, and if they have sent explicit patterns, those should be honored instead. I definitely do *not* recommend running this program on any sort of automated basis.

AUTHOR

Russ Allbery <eagle@eyrie.org>.

Last modified and spun 2024-02-25