Document: FSC-0047
Version:  001
Date:     28-May-90

          The ^ASPLIT Kludge Line For Splitting Large Messages

                              Pat Terry
                             5:494/4.101
                pat.Terry@p101.f4.n494.z5.fidonet.org
                       pterry@m2xenix.psg.org

Status of this document:

     This FSC suggests a proposed protocol for the FidoNet(r) community,
     and requests discussion and suggestions for improvements.
     Distribution of this document is unlimited.

     Fido and FidoNet are registered marks of Tom Jennings and Fido
     Software.

Objectives
==========

Several packers place a limit on the size of message that can be
transmitted. This is often of the order of 14K which, while sufficient
for most purposes, is inadequate for several applications, and in
particular for long messages gated to and from UUCP land.

A SPLIT/UNSPLIT suite of two programs has been developed, intended to
handle this problem. SPLIT will split long .MSG format messages into
smaller packets. After transmission to a remote site, the packets may be
merged by UNSPLIT to recreate the original message, as closely as
possible. The only differences are the addition of a kludge line and,
possibly, a few line breaks.

The system ensures that each large message, when split, generates a
collection of small messages, each of which is still valid in its own
right. If recombination is not effected, the messages will still be
usefully received, and, in particular, split messages to UUCP should
still all get to their destinations, albeit in parts.

After some weeks of testing, the system seems to be sufficiently stable
and useful to justify making an FSC proposal.

The ^A SPLIT kludge line
========================

Messages split and joined by this system make use of an ^A kludge line,
which has the form below. It is proposed in this note that this become
the basis for a "standard".

One of these lines is added to the list of kludges preceding each part
of a split message. When recombined, a line of this form remains, for
reasons which will appear later.

Generically the lines look like this, in fixed columns:

   ^ASPLIT: date time @net/node nnnnn pp/xx +++++++++++

   where  nnnnn  gives the original message number from which the
                 components have been derived (cols 41 - 45)
          pp     gives the part number (cols 47 and 48)
          xx     gives the total number of parts (cols 50 and 51)

For example

   ^ASPLIT: 30 Mar 90 11:12:34 @494/4 123 02/03 +++++++++++

in which the fields are, from left to right:

   30 Mar 90     Date (when split)
   11:12:34      Time (when split)
   @494/4        Node (of origin), introduced by "@"
   123           MSG number (at time of split)
   02            Part number
   03            Total parts
   +++++++++++   Eye catcher

Thus a large file (existing as 123.MSG when the splitter was run)
originating from 494/4 might be split into 3 parts with the split lines

   ^ASPLIT: 30 Mar 90 11:12:34 @494/4 123 01/03 ++++++++++++
   ^ASPLIT: 30 Mar 90 11:12:34 @494/4 123 02/03 ++++++++++++
   ^ASPLIT: 30 Mar 90 11:12:34 @494/4 123 03/03 ++++++++++++

Columns 9 through 45 are really a "uniquifier". The nnnnn message number
is just the one the message had when it was split, and is of no other
significance. Similarly, the system does not use 4-d addressing for the
net/node component, because this is of no real interest to this
application, and requires parsing a file like BINKLEY.CFG, or similar
extra work, to determine the other components.

This is, admittedly, verbose, but if recombination fails for any reason
(such as the packets not all arriving at once) one can still recombine
or examine the relevant pieces manually. Note also that the lines are
added to messages that are themselves "long", and the *relative*
increase in length is actually very small. Further justification will be
found below.

Splitting large messages
========================

When splitting large messages, the following happens:

The message base is scanned for large messages.

For each of the (few) large messages found that qualify, the large
message is split into parts.

The original FTSC header is placed in each component part, save that the
FileAttach bit (if any) is removed from the 2nd, 3rd ... parts.

No attempt is made to modify the To: or From: fields.

The Subject: field for the 2nd, 3rd ... parts is modified to include a
leading part number.

The original kludge lines are retained in the first part. Most other
"leading" kludges, like ^AFMPT, ^ATOPT, ^AINTL are retained in these
parts. However, ^AEID and ^AMSGID lines, if any, are removed from the
2nd, 3rd ... parts. This is potentially awkward, but is to avoid "dupe
detectors" discarding the 2nd, 3rd ... parts, and in practice should
cause no real problems. Large echomail messages originating on a system
will presumably have their ^AEID lines added to the constituent parts at
scanning/packing time on that system (i.e. AFTER splitting), and other
large messages should probably not reach this stage - they should have
been split or discarded earlier.

A ^ASPLIT line is added to each part to allow for possible later
recombination.

If the message is addressed to "UUCP" in the FTSC header, the To: lines
at the start of the message text are copied to all parts.

The "body" of the message is then split between the various parts. An
attempt is made to split at the end of a line in each case.

The trailing tear line, ^AVia, ^APath etc. lines are added to all parts.

Joining ("unsplitting") messages
================================

When reconstituting large messages, the following happens:

The message base is scanned for messages with ^ASPLIT lines. A list is
made of messages to be unsplit, with each message having a list of its
component parts. If a duplicate component part is found, it is discarded
(thus partially getting around the problem of any discarded ^AEID lines
in the components).

Messages marked "in transit" or "sent" are not eligible for
recombination. Nor are messages with a split component number of 00, as
these will only exist as the result of an earlier recombination.

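The selection rules just described can be sketched in Python. This is an
illustration only, not the code of the UNSPLIT program. The names
parse_split, group_parts and complete_sets, and the (name, kludge_lines,
eligible) tuple shape, are hypothetical. The sketch matches fields by
whitespace rather than by strict column position, and it assumes that a
set is a candidate for recombination only once every part from 01 to xx
is present.

```python
import re

# Whitespace-tolerant pattern (an assumption of this sketch; the proposal
# itself specifies fixed columns).
SPLIT_RE = re.compile(
    r"^\x01SPLIT:\s+(?P<stamp>\d{1,2} \w{3} \d{2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"@(?P<origin>\d+/\d+)\s+(?P<msg>\d+)\s+(?P<part>\d{2})/(?P<total>\d{2})\b"
)


def parse_split(line):
    """Return (uniquifier, part, total) for a ^ASPLIT line, else None."""
    m = SPLIT_RE.match(line)
    if m is None:
        return None
    # Date/time, net/node and message number together act as the uniquifier.
    key = (" ".join(m["stamp"].split()), m["origin"], int(m["msg"]))
    return key, int(m["part"]), int(m["total"])


def group_parts(messages):
    """Group eligible messages by uniquifier.

    messages: iterable of (name, kludge_lines, eligible) tuples, where
    eligible is False for messages marked "in transit" or "sent".
    """
    groups = {}
    for name, kludges, eligible in messages:
        if not eligible:
            continue
        parsed = next(filter(None, map(parse_split, kludges)), None)
        if parsed is None:
            continue
        key, part, total = parsed
        if part == 0:  # part 00 marks an already recombined message
            continue
        _, parts = groups.setdefault(key, (total, {}))
        parts.setdefault(part, name)  # first copy wins, duplicates dropped
    return groups


def complete_sets(groups):
    """Return {uniquifier: [names in part order]} for sets with no gaps."""
    return {
        key: [parts[i] for i in range(1, total + 1)]
        for key, (total, parts) in groups.items()
        if all(i in parts for i in range(1, total + 1))
    }
```

A caller would pass the scanned message base to group_parts and then
recombine only the sets returned by complete_sets, leaving incomplete
sets in place for a later run or for manual inspection.
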
For each set of components of messages to be recombined the following
happens:

The first component is examined so as to extract the kludge lines, and
any UUCP "To: " lines. These, and the FTSC header, are written out to a
new file, with the ^ASPLIT line modified to have a component number of
00, so as to prevent further splitting should the splitter program be
reapplied to the recombined message. If this is not done, large messages
can get into a tedious split-unsplit-split-unsplit... cycle each time
the system is run.

The text portions of the first and subsequent parts are then merged
(discarding extra copies of kludges, UUCP "To:" lines and the like).

Any tearline, Origin, ^APATH, ^AVia lines etc. are appended.

Normally the component files are then automatically deleted.

Justification for "human readable" uniquifier.
==============================================

Most systems do not display kludge lines, and the ^ASPLIT line should be
of no real interest. However, in one particular application which was
using this system, the ^ASPLIT lines were made visible for messages that
could not be recombined (because they become too large for gating from
FidoNet to another RFC-822 compliant network), and hence it has been
deemed essential that a "visible" line derived from ^ASPLIT be human
readable, easily spotted, and comprehensible.

For much the same reason, fixed columns have been used, rather than free
format, so that archaic FORTRAN programmers could easily develop
"unsplitters" after getting all the pieces!

Lastly, in this system the kludge lines are sorted so that the ^ASPLIT
line is the last kludge line before the message body proper.

Acknowledgements
================

Particular thanks must be expressed to Randy Bush for offering to test
this system in its earliest releases on the very busy 1/5 zonegate, and
for suggesting various improvements.

Thanks for testing are also due to Dave Wilson who operates the 5/1
zonegate at the other end of the link from Randy, and to Mike Lawrie of
Rhodes Computer Centre for useful suggestions regarding the form of the
^ASPLIT line acceptable to non-Fido users.

Prototype system
================

A version of SPLIT/UNSPLIT using this system may be FREQ'd from 1:105/42
or 5:494/4 using the magic name SPLITTER. At this time I have
unsubstantiated reports that it does not work in conjunction with
systems running Novell software (I have no access to Novell). It works
fine using Msged and QMail.

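For readers without access to the prototype, the two core steps of
"Splitting large messages" (dividing the body at line ends and building
a ^ASPLIT line for each part) might be sketched in Python as follows.
This is not the SPLIT program itself. The names format_split_line and
split_body are hypothetical, and the layout rests on assumptions not
stated in this document: that ^A is a single byte occupying column 1,
that the @net/node field is padded to end at column 40, and that the
message number is right-justified in columns 41 - 45.

```python
def format_split_line(date, time, net_node, msg_no, part, total):
    """Build one ^ASPLIT kludge line in fixed columns.

    Assumed layout: ^A in col 1, date from col 9, time from col 19,
    @net/node from col 28, nnnnn in cols 41-45, pp in cols 47-48,
    xx in cols 50-51, then an eleven-character eye catcher.
    """
    origin = ("@" + net_node).ljust(13)  # cols 28-40 (assumed padding)
    return (
        f"\x01SPLIT: {date:<9} {time:<8} {origin}"
        f"{msg_no:>5} {part:02d}/{total:02d} " + "+" * 11
    )


def split_body(lines, limit):
    """Group whole lines into parts of at most `limit` characters.

    Splits are made only at line ends. A single line longer than `limit`
    is kept whole in a part of its own, which is a simplification of
    this sketch.
    """
    parts, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > limit:
            parts.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append(current)
    return parts
```

A splitter along these lines would call split_body once per large
message, then prepend to part i of n the retained kludges followed by
format_split_line(date, time, net_node, msg_no, i, n), using the same
date, time and message number for every part so that the parts share one
uniquifier.
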