From 9c45d9ad13fdf439d44d7443ae75da15ea0223ed Mon Sep 17 00:00:00 2001 From: Sam Varshavchik Date: Mon, 19 Aug 2013 16:39:41 -0400 Subject: Initial checkin Imported from subversion report, converted to git. Updated all paths in scripts and makefiles, reflecting the new directory hierarchy. --- rfc822/rfc822.sgml | 625 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 625 insertions(+) create mode 100644 rfc822/rfc822.sgml (limited to 'rfc822/rfc822.sgml') diff --git a/rfc822/rfc822.sgml b/rfc822/rfc822.sgml new file mode 100644 index 0000000..f0d7c93 --- /dev/null +++ b/rfc822/rfc822.sgml @@ -0,0 +1,625 @@ + + + + + Sam Varshavchik, Author, Courier Mail Server + + + rfc822 + 3 + Double Precision, Inc. + + + + rfc822 + RFC 822 parsing library + + + + + + +#include <rfc822.h> + +#include <rfc2047.h> + +cc ... -lrfc822 + + + + + + DESCRIPTION + + +The rfc822 library provides functions for parsing E-mail headers in the RFC +822 format. This library also includes some functions to help with encoding +and decoding 8-bit text, as defined by RFC 2047. + + +The format used by E-mail headers to encode sender and recipient +information is defined by +RFC 822 +(and its successor, +RFC 2822). +The format allows the actual E-mail +address and the sender/recipient name to be expressed together, for example: +John Smith <jsmith@example.com> + + +The main purposes of the rfc822 library are to: + + +1) Parse a text string containing a list of RFC 822-formatted addresses into +its logical components: names and E-mail addresses. + + +2) Access those individual components. + + +3) Allow some limited modifications of the parsed structure, and then +convert it back into a text string. 
+ + + Tokenizing an E-mail header + + + +struct rfc822t *tokens=rfc822t_alloc_new(const char *header, + void (*err_func)(const char *, int, void *), + void *func_arg); + +void rfc822t_free(tokens); + + + + +The rfc822t_alloc_new() function (supersedes +rfc822t_alloc(), which is now +obsolete) accepts an E-mail header, and parses it into +individual tokens. This function allocates and returns a pointer to an +rfc822t +structure, which is later used by +rfc822a_alloc() to extract +individual addresses from these tokens. + + +The err_func argument, if not NULL, is a pointer +to a callback +function. The function is called in the event that the E-mail header is +corrupted to the point that it cannot even be parsed. This is a rare instance +-- most forms of corruption are still valid at least on the lexical level. +The only time this error is reported is in the event of mismatched +parentheses, angle brackets, or quotes. The callback function receives the +header pointer, an index to the syntax error in the +header string, and the func_arg argument. + + +The semantics of err_func are subject to change. It is recommended +to leave this argument as NULL in the current version of the library. + + +rfc822t_alloc_new() returns a pointer to a +dynamically-allocated rfc822t +structure. A NULL pointer is returned if there's insufficient memory to +allocate this structure. The rfc822t_free() function +destroys the +rfc822t structure and frees all +dynamically allocated memory. + + + +Until rfc822t_free() is called, the contents of +header MUST +NOT be destroyed or altered in any way. The contents of +header are not +modified by rfc822t_alloc_new(); however, the +rfc822t structure contains +pointers to portions of the supplied header, +and they must remain valid. 
+ + + + + Extracting E-mail addresses + + + +struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens); + +void rfc822a_free(addrs); + + + + +The rfc822a_alloc() function returns a +dynamically-allocated rfc822a +structure that contains individual addresses that were logically parsed +from an rfc822t structure. The +rfc822a_alloc() function returns NULL if +there was insufficient memory to allocate the rfc822a structure. The +rfc822a_free() function destroys the rfc822a structure, and frees all +associated dynamically-allocated memory. The rfc822t structure passed +to rfc822a_alloc() must not be destroyed before rfc822a_free() destroys the +rfc822a structure. + + +The rfc822a structure has the following fields: + + +struct rfc822a { + struct rfc822addr *addrs; + int naddrs; +} ; + + + + +The naddrs field gives the number of +rfc822addr structures +that are pointed to by addrs, which is an array. +Each rfc822addr +structure represents either an address found in the original E-mail header, +or the contents of some legacy "syntactical sugar". +For example, the +following is a valid E-mail header: + + + +To: recipient-list: tom@example.com, john@example.com; + + + + Typically, all of this, except for "To:", +is tokenized by rfc822t_alloc(), then parsed by +rfc822a_alloc(). +"recipient-list:" and +the trailing semicolon are a legacy mailing list specification that is no +longer in widespread use, but must still be accounted for. The resulting +rfc822a structure will have four +rfc822addr structures: one for +"recipient-list:"; +one for each address; and one for the trailing semicolon. +Each rfc822addr structure has the following +fields: + + +struct rfc822addr { + struct rfc822token *tokens; + struct rfc822token *name; +} ; + + + + +If tokens is a null pointer, this structure +represents some +non-address portion of the original header, such as +"recipient-list:" or a +semicolon. Otherwise it points to a structure that represents the E-mail +address in tokenized form. 
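The rule that a null tokens pointer marks a non-address entry can be sketched in C as follows. This is only an illustration under stated assumptions: the structures are local stand-ins that mirror the fields documented on this page (a real program would include rfc822.h instead), and count_real_addresses() is a hypothetical helper, not part of the library.

```c
#include <stddef.h>

/* Stand-in declarations mirroring the fields documented on this page;
   a real program would get these from <rfc822.h>. */
struct rfc822token {
    struct rfc822token *next;
    int token;
    const char *ptr;
    int len;
};

struct rfc822addr {
    struct rfc822token *tokens;  /* NULL for "recipient-list:" or ";" */
    struct rfc822token *name;
};

struct rfc822a {
    struct rfc822addr *addrs;
    int naddrs;
};

/* Hypothetical helper: count the entries that carry an actual address,
   skipping the legacy group-list entries whose tokens pointer is NULL. */
int count_real_addresses(const struct rfc822a *a)
{
    int i, n = 0;

    for (i = 0; i < a->naddrs; i++)
        if (a->addrs[i].tokens != NULL)
            n++;
    return n;
}
```

For the "recipient-list:" header shown earlier, which yields four rfc822addr entries, a loop of this shape would be expected to count the two real addresses and skip the group name and the semicolon.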
+ + +name either points to the tokenized form of a +non-address portion of +the original header, or to a tokenized form of the recipient's name. +name will be NULL if the recipient name was not provided. For the +following address: +Tom Jones <tjones@example.com> - the +tokens field points to the tokenized form of +"tjones@example.com", +and name points to the tokenized form of +"Tom Jones". + + +Each rfc822token structure contains the following +fields: + + +struct rfc822token { + struct rfc822token *next; + int token; + const char *ptr; + int len; +} ; + + + + +The next pointer builds a linked list of all +tokens in this name or +address. The possible values for the token field +are: + + + + 0x00 + + +This is a simple atom - a sequence of non-special characters that +is delimited by whitespace or special characters (see below). + + + + 0x22 + + +The value of the ASCII quote - this is a quoted string. + + + + Open parenthesis: '(' + + +This is an old style comment. A deprecated form of E-mail +addressing uses - for example - +"john@example.com (John Smith)" instead of +"John Smith <john@example.com>". +This old-style notation defined +parenthesized content as arbitrary comments. +The rfc822token with +token set to '(' is created for the contents of +the entire comment. + + + + Symbols: '<', '>', '@', and many others + + +The remaining possible values of token include all +the characters in RFC 822 headers that have special significance. + + + + + +When an rfc822token structure does not represent a +special character, the ptr field points to a text +string giving its contents. +The contents are NOT null-terminated; the len +field contains the number of characters included. +The macro rfc822_is_atom(token) indicates whether +ptr and len are used for +the given token. +Currently rfc822_is_atom() returns true if +token is a zero byte, '"', or +'('. + + +Note that it's possible that len might be zero. 
+This happens with null addresses used as return addresses for delivery status +notifications. + + + + Working with E-mail addresses + + +void rfc822_deladdr(struct rfc822a *addrs, int index); + +void rfc822tok_print(const struct rfc822token *list, + void (*func)(char, void *), void *func_arg); + +void rfc822_print(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *callback_arg); + +void rfc822_addrlist(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_namelist(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_praddr(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_prname(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_prname_orlist(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +char *rfc822_gettok(const struct rfc822token *list); +char *rfc822_getaddrs(const struct rfc822a *addrs); +char *rfc822_getaddr(const struct rfc822a *addrs, int index); +char *rfc822_getname(const struct rfc822a *addrs, int index); +char *rfc822_getname_orlist(const struct rfc822a *addrs, int index); + +char *rfc822_getaddrs_wrap(const struct rfc822a *, int); + + + + +These functions are used to work with individual addresses that are parsed +by rfc822a_alloc(). + + +rfc822_deladdr() removes a single +rfc822addr structure, whose +index is given, from the address array in +rfc822a. +naddrs is decremented by one. + + +rfc822tok_print() converts a tokenized +list of rfc822token +objects into a text string. The callback function, +func, is called one +character at a time, for every character in the tokenized objects. An +arbitrary pointer, func_arg, is passed unchanged as +the additional argument to the callback function. 
+rfc822tok_print() is not usually the most +convenient and efficient function, but it has its uses. + + +rfc822_print() takes an entire +rfc822a structure, and uses the +callback functions to print the contained addresses, in their original form, +separated by commas. The function pointed to by +print_func is used to +print each individual address, one character at a time. Between the +addresses, the print_separator function is called to +print the address separator, usually the string ", ". +The callback_arg argument is passed +along unchanged, as an additional argument to these functions. + + +The functions rfc822_addrlist() and +rfc822_namelist() also print the +contents of the entire rfc822a structure, but in a +different way. +rfc822_addrlist() prints just the actual E-mail +addresses, not the recipient +names or comments. Each E-mail address is followed by a newline character. +rfc822_namelist() prints just the names or comments, +followed by newlines. + + +The functions rfc822_praddr() and +rfc822_prname() are just like +rfc822_addrlist() and +rfc822_namelist(), except that they print a single name +or address in the rfc822a structure, given its +index. The +functions rfc822_gettok(), +rfc822_getaddrs(), rfc822_getaddr(), +and rfc822_getname() are equivalent to +rfc822tok_print(), rfc822_print(), +rfc822_praddr() and rfc822_prname(), +but, instead of using a callback function +pointer, these functions write the output into a dynamically allocated buffer. +That buffer must be destroyed by free(3) after use. +These functions will +return a null pointer in the event of a failure to allocate memory for the +buffer. + + +rfc822_prname_orlist() is similar to +rfc822_prname(), except that it will +also print the legacy RFC822 group list syntax (which is also parsed by +rfc822a_alloc()). rfc822_praddr() +will print an empty string for an index +that corresponds to a group list name (or terminating semicolon). +rfc822_prname() will also print an empty string. 
+rfc822_prname_orlist() will +instead print either the name of the group list, or a single string ";". +rfc822_getname_orlist() will instead save it into a +dynamically allocated buffer. + + +The function rfc822_getaddrs_wrap() is similar to +rfc822_getaddrs(), except +that the generated text is wrapped on or about the 73rd column, using +newline characters. + + + + + Working with dates + + +time_t timestamp=rfc822_parsedt(const char *datestr); +const char *datestr=rfc822_mkdate(time_t timestamp); +void rfc822_mkdate_buf(time_t timestamp, char *buffer); + + + + +These functions convert between timestamps and dates expressed in the +Date: E-mail header format. + + +rfc822_parsedt() returns the timestamp corresponding to +the given date string (0 if there was a syntax error). + + +rfc822_mkdate() returns a date string corresponding to +the given timestamp. +rfc822_mkdate_buf() writes the date string into the +given buffer instead, +which must be big enough to accommodate it. + + + + + Working with 8-bit MIME-encoded headers + + + +int error=rfc2047_decode(const char *text, + int (*callback_func)(const char *, int, const char *, void *), + void *callback_arg); + +extern char *str=rfc2047_decode_simple(const char *text); + +extern char *str=rfc2047_decode_enhanced(const char *text, + const char *charset); + +void rfc2047_print(const struct rfc822a *a, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *); + + +char *buffer=rfc2047_encode_str(const char *string, + const char *charset); + +int error=rfc2047_encode_callback(const char *string, + const char *charset, + int (*func)(const char *, size_t, void *), + void *callback_arg); + +char *buffer=rfc2047_encode_header(const struct rfc822a *a, + const char *charset); + + + + +These functions provide additional logic to encode or decode 8-bit content +in 7-bit RFC 822 headers, as specified in RFC 2047. + + +rfc2047_decode() is a basic RFC 2047 decoding function. 
+It receives a +pointer to some 7-bit RFC 2047-encoded text, and a callback function. The +callback function is repeatedly called. Each time it's called it receives a +piece of decoded text. The arguments are: a pointer to a text fragment, number +of bytes in the text fragment, followed by a pointer to the character set of +the text fragment. The character set pointer is NULL for portions of the +original text that are not RFC 2047-encoded. + + +The callback function also receives callback_arg, as +its last +argument. If the callback function returns a non-zero value, +rfc2047_decode() +terminates, returning that value. Otherwise, +rfc2047_decode() returns 0 after +a successful decoding. rfc2047_decode() returns -1 if it +was unable to allocate sufficient memory. + + +rfc2047_decode_simple() and +rfc2047_decode_enhanced() are alternatives to +rfc2047_decode() which forgo a callback function, and +return the decoded text +in a dynamically-allocated memory buffer. The buffer must be +free(3)-ed after +use. rfc2047_decode_simple() discards all character set +specifications, and +merely decodes any 8-bit text. rfc2047_decode_enhanced() +is a compromise that avoids +discarding all character set information. The local character set being used +is specified as the second argument to +rfc2047_decode_enhanced(). Any RFC +2047-encoded text in a different character set will be prefixed by the name of +the character set, in brackets, in the resulting output. + + +rfc2047_decode_simple() and +rfc2047_decode_enhanced() return a null pointer +if they are unable to allocate sufficient memory. + + +The rfc2047_print() function is equivalent to +rfc822_print(), followed by +rfc2047_decode_enhanced() on the result. The callback +functions are used in +an identical fashion, except that they receive text that's already +decoded. 
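The callback contract described for rfc2047_decode() (fragment pointer, byte count, character set name or NULL, then callback_arg, with a non-zero return stopping the decode) might be satisfied by a callback along these lines. This is a minimal sketch, not code from the library: struct decoded_buf and collect_fragment() are hypothetical names, and the sketch simply concatenates fragments while ignoring the character set.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, passed to the decoder as callback_arg. */
struct decoded_buf {
    char *text;   /* NUL-terminated result, NULL while empty */
    size_t len;   /* bytes stored, not counting the terminator */
};

/*
 * Callback matching the documented signature: text fragment, fragment
 * length in bytes, character set name (NULL for text that was not
 * RFC 2047-encoded), and the caller's callback_arg.
 * Returns 0 to keep going, non-zero to make the decoder stop.
 */
int collect_fragment(const char *frag, int n, const char *charset, void *arg)
{
    struct decoded_buf *buf = arg;
    char *p;

    (void)charset;  /* this sketch ignores the character set */

    if (n < 0)
        return -1;

    /* Grow the buffer to hold the new fragment plus a terminator. */
    p = realloc(buf->text, buf->len + (size_t)n + 1);
    if (p == NULL)
        return -1;

    memcpy(p + buf->len, frag, (size_t)n);
    buf->len += (size_t)n;
    p[buf->len] = '\0';
    buf->text = p;
    return 0;
}
```

Going by the synopsis above, such a callback would be passed as rfc2047_decode(text, collect_fragment, &buf) with buf initialized to { NULL, 0 }, and buf.text released with free(3) afterwards. A caller that needs the character set would have to record the charset argument per fragment rather than discard it.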
+ + +The function rfc2047_encode_str() takes a +string and charset +being the name of the local character set, then encodes any 8-bit portions of +string using RFC 2047 encoding. +rfc2047_encode_str() returns a +dynamically-allocated buffer with the result, which must be +free(3)-ed after +use, or NULL if there was insufficient memory to allocate the buffer. + + +The function rfc2047_encode_callback() is similar to +rfc2047_encode_str() +except that the callback function is repeatedly called to receive the +encoded string. Each invocation of the callback function receives a pointer +to a portion of the encoded text, the number of characters in this portion, +and callback_arg. + + +The function rfc2047_encode_header() is basically +equivalent to rfc822_getaddrs(), followed by +rfc2047_encode_str(). + + + + + + Working with subjects + + + +char *basesubj=rfc822_coresubj(const char *subj); + +char *basesubj=rfc822_coresubj_nouc(const char *subj); + + + + +This function takes the contents of the subject header, and returns the +"core" subject header that's used in the specification of the IMAP THREAD +function. This function is designed to strip all subject line artifacts that +might've been added in the process of forwarding or replying to a message. +Currently, rfc822_coresubj() performs the following transformations: + + + Whitespace + + Leading and trailing whitespace is removed. Consecutive +whitespace characters are collapsed into a single whitespace character. +All whitespace characters are replaced by a space. + + + + Re:, (fwd) [foo] + + +These artifacts (and several others) are removed from +the subject line. + + + + + Note that this function does NOT do MIME decoding. In order to +implement IMAP THREAD, it is necessary to call something like +rfc2047_decode() before +calling rfc822_coresubj(). + + +This function returns a pointer to a dynamically-allocated buffer, which +must be free(3)-ed after use. 
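The whitespace transformation listed above can be illustrated in isolation. The following C sketch is an assumption-laden stand-in, not the library's implementation: normalize_subject_ws() is a hypothetical name, and it covers only the whitespace step (it does not remove "Re:", "(fwd)" or similar artifacts, and it does not change case).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the library's code: returns a malloc()ed copy
 * of subj with leading and trailing whitespace removed and every run of
 * whitespace replaced by one space. Returns NULL if malloc() fails.
 */
char *normalize_subject_ws(const char *subj)
{
    char *out = malloc(strlen(subj) + 1);
    char *q = out;
    int pending_space = 0;

    if (out == NULL)
        return NULL;

    for (; *subj; subj++) {
        if (isspace((unsigned char)*subj)) {
            /* Remember the gap, but only once some text was written;
               it is emitted only if more text follows. */
            pending_space = (q != out);
            continue;
        }
        if (pending_space) {
            *q++ = ' ';
            pending_space = 0;
        }
        *q++ = *subj;
    }
    *q = '\0';
    return out;
}
```

Like the buffer returned by rfc822_coresubj(), the result of this sketch is dynamically allocated and would need to be released with free(3).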
+ + +rfc822_coresubj_nouc() is like +rfc822_coresubj(), except that the subject +is not converted to uppercase. + + + + + SEE ALSO + + +rfc2045(3), +reformail(1), +reformime(1). + + -- cgit v1.2.3