| author | Sam Varshavchik | 2013-08-19 16:39:41 -0400 |
|---|---|---|
| committer | Sam Varshavchik | 2013-08-25 14:43:51 -0400 |
| commit | 9c45d9ad13fdf439d44d7443ae75da15ea0223ed (patch) | |
| tree | 7a81a04cb51efb078ee350859a64be2ebc6b8813 /rfc822 | |
| parent | a9520698b770168d1f33d6301463bb70a19655ec (diff) | |
| download | courier-libs-9c45d9ad13fdf439d44d7443ae75da15ea0223ed.tar.bz2 | |
Initial checkin
Imported from subversion report, converted to git. Updated all paths in
scripts and makefiles, reflecting the new directory hierarchy.
Diffstat (limited to 'rfc822')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | rfc822/.gitignore | 5 |
| -rw-r--r-- | rfc822/ChangeLog | 271 |
| -rw-r--r-- | rfc822/Makefile.am | 47 |
| -rw-r--r-- | rfc822/configure.in | 125 |
| -rw-r--r-- | rfc822/encode.c | 255 |
| -rw-r--r-- | rfc822/encode.h | 55 |
| -rw-r--r-- | rfc822/encodeautodetect.c | 138 |
| -rw-r--r-- | rfc822/imaprefs.c | 1059 |
| -rw-r--r-- | rfc822/imaprefs.h | 109 |
| -rw-r--r-- | rfc822/imapsubj.c | 305 |
| -rw-r--r-- | rfc822/reftest.c | 367 |
| -rw-r--r-- | rfc822/reftest.txt | 106 |
| -rw-r--r-- | rfc822/rfc2047.c | 729 |
| -rw-r--r-- | rfc822/rfc2047.h | 87 |
| -rw-r--r-- | rfc822/rfc2047u.c | 1050 |
| -rw-r--r-- | rfc822/rfc822.c | 826 |
| -rw-r--r-- | rfc822/rfc822.h | 296 |
| -rw-r--r-- | rfc822/rfc822.sgml | 625 |
| -rw-r--r-- | rfc822/rfc822_getaddr.c | 46 |
| -rw-r--r-- | rfc822/rfc822_getaddrs.c | 108 |
| -rw-r--r-- | rfc822/rfc822_mkdate.c | 112 |
| -rw-r--r-- | rfc822/rfc822_parsedt.c | 256 |
| -rw-r--r-- | rfc822/rfc822hdr.c | 160 |
| -rw-r--r-- | rfc822/rfc822hdr.h | 49 |
| -rw-r--r-- | rfc822/testsuite.c | 134 |
| -rw-r--r-- | rfc822/testsuite.txt | 100 |
26 files changed, 7420 insertions, 0 deletions
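Of the files listed in the diffstat, rfc822/encode.c carries the base64 transfer encoder: it buffers up to 57 input bytes (one 76-character output line) and maps each 3-byte group to 4 characters, padding a short final group with `=`. The following is a minimal standalone sketch of that grouping step only, not the library's code. The name `sketch_base64` and its buffer contract are assumptions made for illustration; the imported code instead streams its output through a caller-supplied callback.

```c
#include <assert.h>
#include <stddef.h>

/* Standard base64 alphabet, the same one used in rfc822/encode.c. */
static const char b64tab[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
** Hypothetical helper (not part of courier-libs): encode n bytes from
** 'in' into 'out', NUL-terminate, and return the number of characters
** written, not counting the NUL.
** 'out' must have room for (n + 2) / 3 * 4 + 1 bytes.
*/
size_t sketch_base64(const unsigned char *in, size_t n, char *out)
{
	size_t i, j = 0;

	assert(in != NULL && out != NULL);

	for (i = 0; i < n; i += 3)
	{
		/* Missing bytes in the final group are treated as zero. */
		unsigned a = in[i];
		unsigned b = i + 1 < n ? in[i + 1] : 0;
		unsigned c = i + 2 < n ? in[i + 2] : 0;

		out[j++] = b64tab[a >> 2];
		out[j++] = b64tab[((a & 3) << 4) | (b >> 4)];

		/* Characters that cover only missing bytes become '='. */
		out[j++] = i + 1 < n ? b64tab[((b & 15) << 2) | (c >> 6)] : '=';
		out[j++] = i + 2 < n ? b64tab[c & 63] : '=';
	}

	out[j] = '\0';
	return j;
}
```

Under these assumptions, a caller that wants line breaks like those the diff emits would feed the helper at most 57 bytes at a time and append a newline after each call.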
diff --git a/rfc822/.gitignore b/rfc822/.gitignore new file mode 100644 index 0000000..5aa453d --- /dev/null +++ b/rfc822/.gitignore @@ -0,0 +1,5 @@ +/reftest +/rfc822.3 +/rfc822.config +/rfc822.html +/testsuite diff --git a/rfc822/ChangeLog b/rfc822/ChangeLog new file mode 100644 index 0000000..6dc142a --- /dev/null +++ b/rfc822/ChangeLog @@ -0,0 +1,271 @@ +2013-02-20 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc2047.c, rfc2047u.c: workaround for invalid utf-8 input making + libidn go off the rails. + +2011-02-12 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc2047.c (do_encode_words_method): Avoid splitting RFC 2047-encoded + works in a middle of a grapheme. + +2011-02-10 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Likely bug fixed. + (rfc2047_encode_str): Ignore invalid charset sequence when encoding + RFC-2047, too much code assumes that encoding always works. + +2011-01-24 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822/rfc2047.c (rfc2047_encode_callback): Rewrite broken logic. + +2011-01-22 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822/encodeautodetect.c (libmail_encode_autodetect): Remove obsolete + unicode API. Determine encoding with heuristics based entirely on + the content. Remove charset arg, replace with "use7bit", to force + qp or base64, instead of 8bit. Take a binaryflag param that gets set + to indicate whether base64 was selected based on binary content. + + * rfc2047.c (rfc2047_encode_str): Removed + rfc2047_encode_callback_base64, invoked from rfc2047_encode_str(). + Rewrite rfc2047_encode_str() to use the new unicode API. + + * rfc2047u.c: Unicode API updates. + +2011-01-09 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822_parsedt.c: Eliminate the dependency on ctype, replaced + them with macros. + +2011-01-08 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822hdr.c (rfc822hdr_namecmp): Factor out rfc822hdr_namecmp from + rfc822hdr_is_addr, and make it usable, generically. 
+ +2010-06-25 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822.c (rfc822_print_common_nameaddr): Prevent segfault if + address decode fails. + +2009-11-22 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822.c: Removed rfc822_praddr(). + + * rfc822_getaddr.c (rfc822_getaddr): Implement rfc822_getaddr() by + calling rfc822_display_addr_tobuf(), instead of rfc822_praddr(). + + * testsuite.c (doaddr): Remove rfc822_addrlist() and rfc822_namelist(). + +2009-11-21 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822_getaddr.c: Remove rfc822_prname() and rfc822_prname_orlist(), + replaced by rfc822_display_name() with a NULL character set. + + * rfc2047u.c (rfc822_display_name): Semantical change -- without + an explicit name, display the address as the name. If the requested + character set is NULL, do not decode RFC2047-encoded content, return + it as is. + +2009-11-17 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc2047u.c (rfc2047_print_unicodeaddr): Fix several formatting + issues with deprecated RFC 822 distribution lists: spurious comma + adter the last address, pass the space after the ':' as a separator + character. + + * rfc2047.c (counts2/save): Fix line-wrapping of encoded addresses. + + * rfc2047u.c (rfc822_display_addr_tobuf): New function. + +2009-11-14 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc822.c (rfc822_print_common): Rewrite. + + * rfc2047u.c (rfc822_display_name_int): Fixed various rules for + encoding names to be more MIME compliant. + (rfc822_display_addr_str): Renamed from rfc822_display_addr(), for a + consistent API. + (rfc822_display_addr): New function, decode the wire format of a single + address. Names are MIME decoded, addresses are IDN-decoded. + (rfc2047_print_unicodeaddr): Do not output a dummy name for an + address without one. + (rfc822_display_addr_str_tobuf): New function, version of + rfc822_display_addr_str() that collects the output into a buffer. 
+ + * rfc2047.c (rfc822_encode_domain): New function -- IDN-encode a domain, + with an optional "user@". + (rfc2047_encode_header_addr): Renamed rfc2047_encode_header(), for a + consistent API. + (rfc2047_encode_header_tobuf): New function, encode a header from + displayed format to wire format. Names are encoded using RFC 2047, + addresses using IDN. + +2009-11-08 Sam Varshavchik <mrsam@courier-mta.com> + + * rfc2047.h: Expose raw RFC 2047 decoding function, + rfc2047_decoder(). + + * rfc822hdr.c (rfc822hdr_is_addr): New function. + + * rfc822.c (tokenize): Tweak the logic for collecting RFC 2047 atoms. + + * rfc2047u.c (rfc822_display_name): New function, + replaces rfc2047_print(). + (rfc822_display_name_tobuf): New function, + replaces rfc2047_print(). + (rfc822_display_namelist): New function, + replaces rfc822_namelist(). + (rfc822_display_addr): New function, replaces rfc2047_print(). + (rfc2047_print_unicodeaddr): Renamed from rfc2047_print_unicode(). + (rfc822_display_hdrvalue): New function, replaces rfc2047_decode(), + rfc2047_decode_simple(), rfc2047_decode_enhanced(). + (rfc822_display_hdrvalue_tobuf): New function, ditto. + + * rfc2047.c: Removed rfc2047_decode(), rfc2047_decode_simple(), + rfc2047_decode_enhanced(), rfc2047_print(). + + * Makefile.am: Link against GNU IDN library. + +2008-11-30 Sam Varshavchik <mrsam@courier-mta.com> + + * imaprefs.c (dorefcreate): Clean up usage of rfc822_threadsearchmsg(). + A malloc() failure wasn't checked correctly. + +2008-06-13 Mr. Sam <mrsam@courier-mta.com> + + * rfc822_getaddr.c: Backslashed special characters in address names + weren't being dequoted correctly by rfc822_getname() and + rfc822_getname_orlist(). + +2007-02-25 Mr. Sam <mrsam@courier-mta.com> + + * rfc822.c (parseaddr): rfc822a_alloc() would corrupt and misparse + RFC2047-encoded atoms. + +2006-01-21 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (encodebase64): Fix compiler warning. + + * rfc822.c (parseaddr): Ditto. + +2005-11-15 Mr. 
Sam <mrsam@courier-mta.com> + + * encode.c (quoted_printable): encode spaces that precede a newline. + +2004-08-29 Mr. Sam <mrsam@courier-mta.com> + + * imapsubj.c (rfc822_coresubj_keepblobs): New function to strip + non-core subject appendages, but keep [blobs]. + +2004-05-29 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Use base64 to MIME-encode + instead of quoted-printable, where it's more efficient to do so. + +2004-04-14 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Fix bug introduced in 0411. + +2004-04-11 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (a_rfc2047_encode_str): Improve compliance with RFC 2047 + for MIME-encoded recipient lists. + (rfc2047_encode_callback): New argument: qp_allow - function that + indicates acceptable characters in QP-encoded words. + (rfc2047_encode_str): Ditto. + (rfc2047_qp_allow_any, rfc2047_qp_allow_comment) + (rfc2047_qp_allow_word): Possible arguments to qp_allow for various + situations. + +2004-04-09 Mr. Sam <mrsam@courier-mta.com> + + * encode.c: Moved rfc2045/rfc2045encode.c here, renamed all functions + to use the libmail_ prefix. + +2003-11-18 Tim Rice <tim@multitalents.net> + + * configure.in: Fix MSG_WARN. + +2003-10-20 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047u.c (rfc2047_print_unicode): Unicode-aware version of + rfc2047_print(). + +2003-07-08 Mr. Sam <mrsam@courier-mta.com> + + * imaprefs.c (rfc822_threadmsgrefs): New rfc822_threadmsgrefs takes + an array of References: headers, instead of a single References: + string. + +2003-03-20 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Fix MIME encoding of "_". + +2002-12-23 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Fix loop on broken + locales where isspace(U+0x00A0) is true. + +2002-09-19 Mr. Sam <mrsam@courier-mta.com> + + * RFC 2231 support. + +2002-08-08 Mr. 
Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Fix MIME encoding of words + with = and ? characters. + +2002-05-20 Mr. Sam <mrsam@courier-mta.com> + + * rfc822_parsedt.c (rfc822_parsedt): Ignore obviously invalid years + (someone else can worry about Y10K). + +2002-04-07 Mr. Sam <mrsam@courier-mta.com> + + * rfc822_mkdate.c (rfc822_mkdate_buf): Explicit (int) cast gets + the file compiled under Cygwin. + +2002-03-09 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (rfc2047_encode_callback): Fix MIME-encoding of spaces. + +2002-03-04 Mr. Sam <mrsam@courier-mta.com> + + * rfc822.c (rfc822_prname_orlist): Dequote quoted-strings. + +2001-06-27 Mr. Sam <mrsam@courier-mta.com> + + * rfc2047.c (a_rfc2047_encode_str): Fix incorrect MIME encoding of + address name in old-style RFC-822 format. + +2001-04-17 Mr. Sam <mrsam@courier-mta.com> + + * rfc822.c (rfc822t_alloc): Explicitly cast arg to (void *). + +2000-12-22 Mr. Sam <mrsam@courier-mta.com> + + * reftest.c: Fix dependency on qsort sorting order of identical keys. + +2000-12-11 Mr. Sam <mrsam@courier-mta.com> + + * imapsubj.c (stripsubj): Recode subject stripping. + +2000-11-18 Mr. Sam <mrsam@gwl.email-scan.com> + + * imaprefs.c: Update to draft-05.txt-bis (sort top level siblings + by date. + +Mon Apr 5 00:58:37 EDT 1999 + +* Yes, I've decided to start a Change Log. librfc822 now has a life of its +own, so it might as well have it. + +* Courier needs tokens in a link list, not an array. Rewrote most token +handling code. + +* Fixed some issues with handling of () comments. + +* Changed *pr* functions to pass along a caller-provided void, also for + courier. librfc822 should now be threadable (like, who cares...) + +* Added a testsuite diff --git a/rfc822/Makefile.am b/rfc822/Makefile.am new file mode 100644 index 0000000..cda7f9d --- /dev/null +++ b/rfc822/Makefile.am @@ -0,0 +1,47 @@ +# +# Copyright 1998 - 2009 Double Precision, Inc. See COPYING for +# distribution information. 
+ + +noinst_LTLIBRARIES=libencode.la librfc822.la + +librfc822_la_SOURCES=rfc822.c rfc822.h rfc822hdr.c rfc822hdr.h \ + rfc822_getaddr.c rfc822_getaddrs.c \ + rfc822_mkdate.c rfc822_parsedt.c rfc2047u.c \ + rfc2047.c rfc2047.h imapsubj.c imaprefs.h imaprefs.c \ + encodeautodetect.c +librfc822_la_LIBADD = $(LIBIDN_LIBS) + +DISTCLEANFILES=rfc822.config + +AM_CFLAGS = $(LIBIDN_CFLAGS) + +libencode_la_SOURCES=encode.c encode.h + +BUILT_SOURCES=rfc822.3 rfc822.html +noinst_DATA=$(BUILT_SOURCES) + +noinst_PROGRAMS=testsuite reftest +testsuite_SOURCES=testsuite.c +testsuite_DEPENDENCIES=librfc822.la ../unicode/libunicode.la +testsuite_LDADD=librfc822.la ../unicode/libunicode.la +testsuite_LDFLAGS=-static + +reftest_SOURCES=reftest.c imaprefs.h +reftest_DEPENDENCIES=librfc822.la ../unicode/libunicode.la +reftest_LDADD=librfc822.la ../unicode/libunicode.la +reftest_LDFLAGS=-static + +EXTRA_DIST=testsuite.txt reftest.txt $(BUILT_SOURCES) + +if HAVE_SGML +rfc822.html: rfc822.sgml ../docbook/sgml2html + ../docbook/sgml2html rfc822.sgml rfc822.html + +rfc822.3: rfc822.sgml ../docbook/sgml2html + ../docbook/sgml2man rfc822.sgml rfc822.3 +endif + +check-am: + ./testsuite | cmp -s - $(srcdir)/testsuite.txt + ./reftest | cmp -s - $(srcdir)/reftest.txt diff --git a/rfc822/configure.in b/rfc822/configure.in new file mode 100644 index 0000000..3dc3659 --- /dev/null +++ b/rfc822/configure.in @@ -0,0 +1,125 @@ +dnl Process this file with autoconf to produce a configure script. +dnl +dnl Copyright 1998 - 2009 Double Precision, Inc. See COPYING for +dnl distribution information. + +AC_INIT(rfc822lib, 0.13, [courier-users@lists.sourceforge.net]) + +>confdefs.h # Kill PACKAGE_ macros + +AC_CONFIG_SRCDIR(rfc822.c) +AC_CONFIG_AUX_DIR(../..) +AM_INIT_AUTOMAKE([foreign no-define]) +AM_CONFIG_HEADER(config.h) + +dnl Checks for programs. 
+AC_USE_SYSTEM_EXTENSIONS +AC_PROG_CC +AC_PROG_LIBTOOL + +if test "$GCC" = yes ; then + CXXFLAGS="$CXXFLAGS -Wall" + CFLAGS="$CFLAGS -Wall" +fi + +CFLAGS="$CFLAGS -I.. -I$srcdir/.." + +dnl Checks for libraries. + +dnl Checks for header files. +AC_HEADER_STDC +AC_CHECK_HEADERS(locale.h) + +dnl Checks for typedefs, structures, and compiler characteristics. +AC_C_CONST +AC_TYPE_SIZE_T +AC_STRUCT_TM +AC_SYS_LARGEFILE + +dnl Checks for library functions. + +AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]], + [Support IDN (needs GNU Libidn)]), + libidn=$withval, libidn=yes) + +if test "$libidn" != "no" +then + PKG_CHECK_MODULES(LIBIDN, libidn >= 0.0.0, [libidn=yes], [libidn=no]) + if test "$libidn" != "yes" + then + libidn=no + AC_MSG_WARN([Libidn not found]) + else + libidn=yes + AC_DEFINE(LIBIDN, 1, [Define to 1 if you want Libidn.]) + fi +fi +AC_MSG_CHECKING([if Libidn should be used]) +AC_MSG_RESULT($libidn) + +AC_CHECK_FUNCS(strcasecmp strncasecmp setlocale) + +AC_CACHE_CHECK([how to calculate alternate timezone],librfc822_cv_SYS_TIMEZONE, + +AC_TRY_COMPILE([ +#include <time.h> +],[ +int main() +{ +time_t t=altzone; + + return (0); +} +], librfc822_cv_SYS_TIMEZONE=altzone, + + AC_TRY_COMPILE([ +#include <time.h> +],[ +int main() +{ +int n=daylight; + + return (0); +} + ], librfc822_cv_SYS_TIMEZONE=daylight, + + AC_TRY_COMPILE([ +#include <time.h> + +extern struct tm dummy; +],[ +int main() +{ +long n=dummy.tm_gmtoff; + + return (0); +} + ] ,librfc822_cv_SYS_TIMEZONE=tm_gmtoff, + librfc822_cv_SYS_TIMEZONE=unknown + ) + ) + ) +) + +case $librfc822_cv_SYS_TIMEZONE in +tm_gmtoff) + AC_DEFINE_UNQUOTED(USE_TIME_GMTOFF,1, + [ The time offset is specified in the tm_gmtoff member ]) + ;; +altzone) + AC_DEFINE_UNQUOTED(USE_TIME_ALTZONE,1, + [ The daylight savings time offset is in the altzone member ]) + ;; +daylight) + AC_DEFINE_UNQUOTED(USE_TIME_DAYLIGHT,1, + [ The daylight savings time offset is in the tm_isdst member ]) + ;; +*) + AC_MSG_WARN([Cannot figure out 
how to calculate the alternate timezone, will use GMT]) + ;; +esac + +AM_CONDITIONAL(HAVE_SGML, test -d ${srcdir}/../docbook) + +echo "libidn=$libidn" >rfc822.config +AC_OUTPUT(Makefile) diff --git a/rfc822/encode.c b/rfc822/encode.c new file mode 100644 index 0000000..a6f791b --- /dev/null +++ b/rfc822/encode.c @@ -0,0 +1,255 @@ +/* +** Copyright 2003-2004 Double Precision, Inc. See COPYING for +** distribution information. +*/ + +/* +*/ +#include "encode.h" +#include <string.h> +#include <stdlib.h> + +static int quoted_printable(struct libmail_encode_info *, + const char *, size_t); +static int base64(struct libmail_encode_info *, + const char *, size_t); +static int eflush(struct libmail_encode_info *, + const char *, size_t); + +void libmail_encode_start(struct libmail_encode_info *info, + const char *transfer_encoding, + int (*callback_func)(const char *, size_t, void *), + void *callback_arg) +{ + info->output_buf_cnt=0; + info->input_buf_cnt=0; + + switch (*transfer_encoding) { + case 'q': + case 'Q': + info->encoding_func=quoted_printable; + info->input_buffer[0]=0; /* Recycle for qp encoding */ + break; + case 'b': + case 'B': + info->encoding_func=base64; + break; + default: + info->encoding_func=eflush; + break; + } + info->callback_func=callback_func; + info->callback_arg=callback_arg; +} + +int libmail_encode(struct libmail_encode_info *info, + const char *ptr, + size_t cnt) +{ + return ((*info->encoding_func)(info, ptr, cnt)); +} + +int libmail_encode_end(struct libmail_encode_info *info) +{ + int rc=(*info->encoding_func)(info, NULL, 0); + + if (rc == 0 && info->output_buf_cnt > 0) + { + rc= (*info->callback_func)(info->output_buffer, + info->output_buf_cnt, + info->callback_arg); + info->output_buf_cnt=0; + } + + return rc; +} + +static int eflush(struct libmail_encode_info *info, const char *ptr, size_t n) +{ + while (n > 0) + { + size_t i; + + if (info->output_buf_cnt == sizeof(info->output_buffer)) + { + int rc= 
(*info->callback_func)(info->output_buffer, + info->output_buf_cnt, + info->callback_arg); + + info->output_buf_cnt=0; + if (rc) + return rc; + } + + i=n; + + if (i > sizeof(info->output_buffer) - info->output_buf_cnt) + i=sizeof(info->output_buffer) - info->output_buf_cnt; + + memcpy(info->output_buffer + info->output_buf_cnt, ptr, i); + info->output_buf_cnt += i; + ptr += i; + n -= i; + } + return 0; +} + +static int base64_flush(struct libmail_encode_info *); + +static int base64(struct libmail_encode_info *info, + const char *buf, size_t n) +{ + if (!buf) + { + int rc=0; + + if (info->input_buf_cnt > 0) + rc=base64_flush(info); + + return rc; + } + + while (n) + { + size_t i; + + if (info->input_buf_cnt == sizeof(info->input_buffer)) + { + int rc=base64_flush(info); + + if (rc != 0) + return rc; + } + + i=n; + if (i > sizeof(info->input_buffer) - info->input_buf_cnt) + i=sizeof(info->input_buffer) - info->input_buf_cnt; + + memcpy(info->input_buffer + info->input_buf_cnt, + buf, i); + info->input_buf_cnt += i; + buf += i; + n -= i; + } + return 0; +} + +static const char base64tab[]= +"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; + +static int base64_flush(struct libmail_encode_info *info) +{ + int a=0,b=0,c=0; + int i, j; + int d, e, f, g; + char output_buf[ sizeof(info->input_buffer) / 3 * 4+1]; + + for (j=i=0; i<info->input_buf_cnt; i += 3) + { + a=(unsigned char)info->input_buffer[i]; + b= i+1 < info->input_buf_cnt ? + (unsigned char)info->input_buffer[i+1]:0; + c= i+2 < info->input_buf_cnt ? 
+ (unsigned char)info->input_buffer[i+2]:0; + + d=base64tab[ a >> 2 ]; + e=base64tab[ ((a & 3 ) << 4) | (b >> 4)]; + f=base64tab[ ((b & 15) << 2) | (c >> 6)]; + g=base64tab[ c & 63 ]; + if (i + 1 >= info->input_buf_cnt) f='='; + if (i + 2 >= info->input_buf_cnt) g='='; + output_buf[j++]=d; + output_buf[j++]=e; + output_buf[j++]=f; + output_buf[j++]=g; + } + + info->input_buf_cnt=0; + + output_buf[j++]='\n'; + return eflush(info, output_buf, j); +} + +static const char xdigit[]="0123456789ABCDEF"; + +static int quoted_printable(struct libmail_encode_info *info, + const char *p, size_t n) +{ + char local_buf[256]; + int local_buf_cnt=0; + +#define QPUT(c) do { if (local_buf_cnt == sizeof(local_buf)) \ + { int rc=eflush(info, local_buf, local_buf_cnt); \ + local_buf_cnt=0; if (rc) return (rc); } \ + local_buf[local_buf_cnt]=(c); ++local_buf_cnt; } while(0) + + if (!p) + return (0); + + while (n) + { + + + /* + ** Repurpose input_buffer[0] as a flag whether the previous + ** character was a space. + ** + ** A space before a newline gets escaped. 
+ */ + + if (info->input_buffer[0]) + { + if (*p == '\n') + { + QPUT('='); + QPUT('2'); + QPUT('0'); + } + else + { + QPUT(' '); + } + ++info->input_buf_cnt; + } + + info->input_buffer[0]=0; + + if (*p == ' ') + { + info->input_buffer[0]=1; + p++; + --n; + continue; + } + + if (info->input_buf_cnt > 72 && *p != '\n') + { + QPUT('='); + QPUT('\n'); + info->input_buf_cnt=0; + } + + if ( *p == '\n') + info->input_buf_cnt=0; + else if (*p < ' ' || *p == '=' || *p >= 0x7F) + { + QPUT('='); + QPUT(xdigit[ (*p >> 4) & 15]); + QPUT(xdigit[ *p & 15 ]); + info->input_buf_cnt += 3; + p++; + --n; + continue; + } + else info->input_buf_cnt++; + + QPUT( *p); + p++; + --n; + } + + if (local_buf_cnt > 0) + return eflush(info, local_buf, local_buf_cnt); + + return 0; +} diff --git a/rfc822/encode.h b/rfc822/encode.h new file mode 100644 index 0000000..e2d68b0 --- /dev/null +++ b/rfc822/encode.h @@ -0,0 +1,55 @@ +/* +*/ +#ifndef rfc822_encode_h +#define rfc822_encode_h + +/* +** Copyright 2004 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +#if HAVE_CONFIG_H +#include "rfc822/config.h" +#endif +#include <stdio.h> +#include <sys/types.h> +#include <stdlib.h> +#include <time.h> + +#ifdef __cplusplus +extern "C" { +#endif + +struct libmail_encode_info { + char output_buffer[BUFSIZ]; + int output_buf_cnt; + + char input_buffer[57]; /* For base64 */ + int input_buf_cnt; + + int (*encoding_func)(struct libmail_encode_info *, + const char *, size_t); + int (*callback_func)(const char *, size_t, void *); + void *callback_arg; +}; + +const char *libmail_encode_autodetect_fp(FILE *, int, int *); +const char *libmail_encode_autodetect_fpoff(FILE *, int, off_t, off_t, int *); +const char *libmail_encode_autodetect_buf(const char *, int); + +void libmail_encode_start(struct libmail_encode_info *info, + const char *transfer_encoding, + int (*callback_func)(const char *, size_t, void *), + void *callback_arg); + +int libmail_encode(struct libmail_encode_info *info, + const char *ptr, + size_t cnt); + +int libmail_encode_end(struct libmail_encode_info *info); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/rfc822/encodeautodetect.c b/rfc822/encodeautodetect.c new file mode 100644 index 0000000..ad1ebc6 --- /dev/null +++ b/rfc822/encodeautodetect.c @@ -0,0 +1,138 @@ +/* +** Copyright 2003-2011 Double Precision, Inc. See COPYING for +** distribution information. 
+*/ + +/* +*/ +#include "encode.h" +#include <string.h> +#include <stdlib.h> +#include "../unicode/unicode.h" + +static const char *libmail_encode_autodetect(int use7bit, + int (*func)(void *), void *arg, + int *binaryflag) +{ + int l=0; + int longline=0; + int c; + + size_t charcnt=0; + size_t bit8cnt=0; + + if (binaryflag) + *binaryflag=0; + + while ((c = (*func)(arg)) != EOF) + { + unsigned char ch= (unsigned char)c; + + ++charcnt; + + ++l; + if (ch < 0x20 || ch >= 0x80) + { + if (ch != '\t' && ch != '\r' && ch != '\n') + { + ++bit8cnt; + l += 2; + } + } + + if (ch == 0) + { + if (binaryflag) + *binaryflag=1; + + return "base64"; + } + + if (ch == '\n') l=0; + else if (l > 990) + { + longline=1; + } + + } + + if (use7bit || longline) + { + if (bit8cnt > charcnt / 10) + return "base64"; + + return "quoted-printable"; + } + + return bit8cnt ? "8bit":"7bit"; +} + +struct file_info { + FILE *fp; + off_t pos; + off_t end; +}; + +static int read_file(void *arg) +{ +int c; +struct file_info *fi = (struct file_info *)arg; + if (fi->end >= 0 && fi->pos > fi->end) + return EOF; + c = getc(fi->fp); + fi->pos++; + return c; +} + +static int read_string(void * arg) +{ +int c; +unsigned char **strp = (unsigned char **)arg; + if (**strp == 0) + return EOF; + c = (int)**strp; + (*strp)++; + return c; +} + +const char *libmail_encode_autodetect_fp(FILE *fp, int use7bit, + int *binaryflag) +{ + return libmail_encode_autodetect_fpoff(fp, use7bit, 0, -1, + binaryflag); +} + +const char *libmail_encode_autodetect_fpoff(FILE *fp, int use7bit, + off_t start_pos, off_t end_pos, + int *binaryflag) +{ +struct file_info fi; +off_t orig_pos = ftell(fp); +off_t pos = orig_pos; +const char *rc; + + if (start_pos >= 0) + { + if (fseek(fp, start_pos, SEEK_SET) == (off_t)-1) + return NULL; + else + pos = start_pos; + } + + fi.fp = fp; + fi.pos = pos; + fi.end = end_pos; + + rc = libmail_encode_autodetect(use7bit, &read_file, &fi, + binaryflag); + + if (fseek(fp, orig_pos, SEEK_SET) == 
(off_t)-1) + return NULL; + return rc; +} + +const char *libmail_encode_autodetect_buf(const char *str, int use7bit) +{ + return libmail_encode_autodetect(use7bit, &read_string, &str, + NULL); +} diff --git a/rfc822/imaprefs.c b/rfc822/imaprefs.c new file mode 100644 index 0000000..dc58869 --- /dev/null +++ b/rfc822/imaprefs.c @@ -0,0 +1,1059 @@ +/* +** Copyright 2000-2003 Double Precision, Inc. +** See COPYING for distribution information. +*/ + +/* +*/ + +#include "config.h" + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <time.h> + +#include "rfc822.h" +#include "imaprefs.h" + +static void swapmsgdata(struct imap_refmsg *a, struct imap_refmsg *b) +{ + char *cp; + char c; + time_t t; + unsigned long ul; + +#define swap(a,b,tmp) (tmp)=(a); (a)=(b); (b)=(tmp); + + swap(a->msgid, b->msgid, cp); + swap(a->isdummy, b->isdummy, c); + swap(a->flag2, b->flag2, c); + + swap(a->timestamp, b->timestamp, t); + swap(a->seqnum, b->seqnum, ul); + +#undef swap +} + +struct imap_refmsgtable *rfc822_threadalloc() +{ +struct imap_refmsgtable *p; + + p=(struct imap_refmsgtable *)malloc(sizeof(struct imap_refmsgtable)); + if (p) + memset(p, 0, sizeof(*p)); + return (p); +} + +void rfc822_threadfree(struct imap_refmsgtable *p) +{ +int i; +struct imap_refmsghash *h; +struct imap_subjlookup *s; +struct imap_refmsg *m; + + for (i=0; i<sizeof(p->hashtable)/sizeof(p->hashtable[0]); i++) + while ((h=p->hashtable[i]) != 0) + { + p->hashtable[i]=h->nexthash; + free(h); + } + + for (i=0; i<sizeof(p->subjtable)/sizeof(p->subjtable[0]); i++) + while ((s=p->subjtable[i]) != 0) + { + p->subjtable[i]=s->nextsubj; + free(s->subj); + free(s); + } + + while ((m=p->firstmsg) != 0) + { + p->firstmsg=m->next; + if (m->subj) + free(m->subj); + free(m); + } + free(p); +} + +static int hashmsgid(const char *msgid) +{ +unsigned long hashno=0; + + while (*msgid) + { + unsigned long n= (hashno << 1); + +#define HMIDS (((struct imap_refmsgtable *)0)->hashtable) +#define HHMIDSS ( 
sizeof(HMIDS) / sizeof( HMIDS[0] )) + + if (hashno & HHMIDSS ) + n ^= 1; + + hashno= n ^ (unsigned char)*msgid++; + } + + return (hashno % HHMIDSS); +} + +struct imap_refmsg *rfc822_threadallocmsg(struct imap_refmsgtable *mt, + const char *msgid) +{ +int n=hashmsgid(msgid); +struct imap_refmsg *msgp= (struct imap_refmsg *) + malloc(sizeof(struct imap_refmsg)+1+strlen(msgid)); +struct imap_refmsghash *h, **hp; + + if (!msgp) return (0); + memset(msgp, 0, sizeof(*msgp)); + strcpy ((msgp->msgid=(char *)(msgp+1)), msgid); + + h=(struct imap_refmsghash *)malloc(sizeof(struct imap_refmsghash)); + if (!h) + { + free(msgp); + return (0); + } + + for (hp= &mt->hashtable[n]; *hp; hp= & (*hp)->nexthash) + { + if (strcmp( (*hp)->msg->msgid, msgp->msgid) > 0) + break; + } + + h->nexthash= *hp; + *hp=h; + h->msg=msgp; + + msgp->last=mt->lastmsg; + + if (mt->lastmsg) + mt->lastmsg->next=msgp; + else + mt->firstmsg=msgp; + + mt->lastmsg=msgp; + return (msgp); +} + +struct imap_refmsg *rfc822_threadsearchmsg(struct imap_refmsgtable *mt, + const char *msgid) +{ +int n=hashmsgid(msgid); +struct imap_refmsghash *h; + + for (h= mt->hashtable[n]; h; h= h->nexthash) + { + int rc=strcmp(h->msg->msgid, msgid); + + if (rc == 0) return (h->msg); + if (rc > 0) + break; + } + return (0); +} + +static int findsubj(struct imap_refmsgtable *mt, const char *s, int *isrefwd, + int create, struct imap_subjlookup **ptr) +{ + char *ss=rfc822_coresubj(s, isrefwd); + int n; + struct imap_subjlookup **h; + struct imap_subjlookup *newsubj; + + if (!ss) return (-1); + n=hashmsgid(ss); + + for (h= &mt->subjtable[n]; *h; h= &(*h)->nextsubj) + { + int rc=strcmp((*h)->subj, ss); + + if (rc == 0) + { + free(ss); + *ptr= *h; + return (0); + } + if (rc > 0) + break; + } + if (!create) + { + free(ss); + *ptr=0; + return (0); + } + + newsubj=malloc(sizeof(struct imap_subjlookup)); + if (!newsubj) + { + free(ss); + return (-1); + } + memset(newsubj, 0, sizeof(*newsubj)); + newsubj->subj=ss; + newsubj->nextsubj= *h; 
+ newsubj->msgisrefwd= *isrefwd; + *h=newsubj; + *ptr=newsubj; + return (0); +} + +static void linkparent(struct imap_refmsg *msg, struct imap_refmsg *lastmsg) +{ + msg->parent=lastmsg; + msg->prevsib=lastmsg->lastchild; + if (msg->prevsib) + msg->prevsib->nextsib=msg; + else + lastmsg->firstchild=msg; + + lastmsg->lastchild=msg; + msg->nextsib=0; +} + + +static void breakparent(struct imap_refmsg *m) +{ + if (!m->parent) return; + + if (m->prevsib) m->prevsib->nextsib=m->nextsib; + else m->parent->firstchild=m->nextsib; + + if (m->nextsib) m->nextsib->prevsib=m->prevsib; + else m->parent->lastchild=m->prevsib; + m->parent=0; +} + +static struct imap_refmsg *dorefcreate(struct imap_refmsgtable *mt, + const char *newmsgid, + struct rfc822a *a) + /* a - references header */ +{ +struct imap_refmsg *lastmsg=0, *m; +struct imap_refmsg *msg; +int n; + +/* + (A) Using the Message-IDs in the message's references, link + the corresponding messages together as parent/child. Make + the first reference the parent of the second (and the second + a child of the first), the second the parent of the third + (and the third a child of the second), etc. The following + rules govern the creation of these links: + + If no reference message can be found with a given + Message-ID, create a dummy message with this ID. Use + this dummy message for all subsequent references to this + ID. +*/ + + for (n=0; n<a->naddrs; n++) + { + char *msgid=a->addrs[n].tokens ? rfc822_getaddr(a, n):NULL; + + msg=msgid ? rfc822_threadsearchmsg(mt, msgid):0; + if (!msg) + { + msg=rfc822_threadallocmsg(mt, msgid ? msgid:""); + if (!msg) + { + if (msgid) + free(msgid); + + return (0); + } + msg->isdummy=1; + } + + if (msgid) + free(msgid); + +/* + If a reference message already has a parent, don't change + the existing link. +*/ + + if (lastmsg == 0 || msg->parent) + { + lastmsg=msg; + continue; + } + +/* + Do not create a parent/child link if creating that link + would introduce a loop. 
For example, before making + message A the parent of B, make sure that A is not a + descendent of B. + +*/ + + for (m=lastmsg; m; m=m->parent) + if (strcmp(m->msgid, msg->msgid) == 0) + break; + if (m) + { + lastmsg=msg; + continue; + } + + linkparent(msg, lastmsg); + + lastmsg=msg; + } + +/* + (B) Create a parent/child link between the last reference + (or NIL if there are no references) and the current message. + If the current message has a parent already, break the + current parent/child link before creating the new one. Note + that if this message has no references, that it will now + have no parent. + + NOTE: The parent/child links MUST be kept consistent with + one another at ALL times. + +*/ + + msg=*newmsgid ? rfc822_threadsearchmsg(mt, newmsgid):0; + + /* + If a message does not contain a Message-ID header line, + or the Message-ID header line does not contain a valid + Message ID, then assign a unique Message ID to this + message. + + Implementation note: empty msgid, plus dupe check below, + implements that. + */ + + if (msg && msg->isdummy) + { + msg->isdummy=0; + if (msg->parent) + breakparent(msg); + } + else + { +#if 1 + /* + ** If two or more messages have the same Message ID, assign + ** a unique Message ID to each of the duplicates. + ** + ** Implementation note: just unlink the existing message from + ** it's parents/children. + */ + if (msg) + { + while (msg->firstchild) + breakparent(msg->firstchild); + breakparent(msg); + newmsgid=""; + + /* Create new entry with an empty msgid, if any more + ** msgids come, they'll hit the dupe check again. 
+ */ + + } +#endif + msg=rfc822_threadallocmsg(mt, newmsgid); + if (!msg) return (0); + } + + if (lastmsg) + { + for (m=lastmsg; m; m=m->parent) + if (strcmp(m->msgid, msg->msgid) == 0) + break; + if (!m) + linkparent(msg, lastmsg); + } + return (msg); +} + +static struct imap_refmsg *threadmsg_common(struct imap_refmsg *m, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum); + +static struct imap_refmsg *rfc822_threadmsgaref(struct imap_refmsgtable *mt, + const char *msgidhdr, + struct rfc822a *refhdr, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum); + +struct imap_refmsg *rfc822_threadmsg(struct imap_refmsgtable *mt, + const char *msgidhdr, + const char *refhdr, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum) +{ + struct rfc822t *t; + struct rfc822a *a; + struct imap_refmsg *m; + + t=rfc822t_alloc_new(refhdr ? refhdr:"", NULL, NULL); + if (!t) + { + return (0); + } + + a=rfc822a_alloc(t); + if (!a) + { + rfc822t_free(t); + return (0); + } + + m=rfc822_threadmsgaref(mt, msgidhdr, a, subjheader, dateheader, + dateheader_tm, seqnum); + + rfc822a_free(a); + rfc822t_free(t); + return m; +} + + +struct imap_refmsg *rfc822_threadmsgrefs(struct imap_refmsgtable *mt, + const char *msgid_s, + const char * const * msgidList, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum) +{ + struct imap_refmsg *m; + struct rfc822token *tArray; + struct rfc822addr *aArray; + + struct rfc822a a; + size_t n, i; + + for (n=0; msgidList[n]; n++) + ; + + if ((tArray=malloc((n+1) * sizeof(*tArray))) == NULL) + return NULL; + + if ((aArray=malloc((n+1) * sizeof(*aArray))) == NULL) + { + free(tArray); + return NULL; + } + + for (i=0; i<n; i++) + { + tArray[i].next=NULL; + tArray[i].token=0; + tArray[i].ptr=msgidList[i]; + tArray[i].len=strlen(msgidList[i]); + + aArray[i].name=NULL; + 
aArray[i].tokens=&tArray[i]; + } + + a.naddrs=n; + a.addrs=aArray; + + m=rfc822_threadmsgaref(mt, msgid_s, &a, subjheader, dateheader, + dateheader_tm, seqnum); + + free(tArray); + free(aArray); + return m; +} + +static struct imap_refmsg *rfc822_threadmsgaref(struct imap_refmsgtable *mt, + const char *msgidhdr, + struct rfc822a *refhdr, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum) +{ + struct rfc822t *t; + struct rfc822a *a; + struct imap_refmsg *m; + + char *msgid_s; + + t=rfc822t_alloc_new(msgidhdr ? msgidhdr:"", NULL, NULL); + if (!t) + return (0); + a=rfc822a_alloc(t); + if (!a) + { + rfc822t_free(t); + return (0); + } + + msgid_s=a->naddrs > 0 ? rfc822_getaddr(a, 0):strdup(""); + + rfc822a_free(a); + rfc822t_free(t); + + if (!msgid_s) + return (0); + + m=dorefcreate(mt, msgid_s, refhdr); + + free(msgid_s); + + if (!m) + return (0); + + + return threadmsg_common(m, subjheader, dateheader, + dateheader_tm, seqnum); +} + +static struct imap_refmsg *threadmsg_common(struct imap_refmsg *m, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum) +{ + if (subjheader && (m->subj=strdup(subjheader)) == 0) + return (0); /* Cleanup in rfc822_threadfree() */ + + if (dateheader) + dateheader_tm=rfc822_parsedt(dateheader); + + m->timestamp=dateheader_tm; + + m->seqnum=seqnum; + + return (m); +} + +/* + (2) Gather together all of the messages that have no parents + and make them all children (siblings of one another) of a dummy + parent (the "root"). These messages constitute first messages + of the threads created thus far. 
+ +*/ + +struct imap_refmsg *rfc822_threadgetroot(struct imap_refmsgtable *mt) +{ + struct imap_refmsg *root, *m; + + if (mt->rootptr) + return (mt->rootptr); + + root=rfc822_threadallocmsg(mt, "(root)"); + + if (!root) return (0); + + root->parent=root; /* Temporary */ + root->isdummy=1; + + for (m=mt->firstmsg; m; m=m->next) + if (!m->parent) + { + if (m->isdummy && m->firstchild == 0) + continue; /* Can happen in reference creation */ + + linkparent(m, root); + } + root->parent=NULL; + return (mt->rootptr=root); +} + +/* +** +** (3) Prune dummy messages from the thread tree. Traverse each +** thread under the root, and for each message: +*/ + +void rfc822_threadprune(struct imap_refmsgtable *mt) +{ + struct imap_refmsg *msg; + + for (msg=mt->firstmsg; msg; msg=msg->next) + { + struct imap_refmsg *saveparent, *m; + + if (!msg->parent) + continue; /* The root, need it later. */ + + if (!msg->isdummy) + continue; + + /* + ** + ** If it is a dummy message with NO children, delete it. + */ + + if (msg->firstchild == 0) + { + breakparent(msg); + /* + ** Don't free the node, it'll be done on msgtable + ** purge. + */ + continue; + } + + /* + ** If it is a dummy message with children, delete it, but + ** promote its children to the current level. In other words, + ** splice them in with the dummy's siblings. + ** + ** Do not promote the children if doing so would make them + ** children of the root, unless there is only one child. + */ + + if (msg->firstchild->nextsib && + msg->parent->parent) + continue; + + saveparent=msg->parent; + breakparent(msg); + + while ((m=msg->firstchild) != 0) + { + breakparent(m); + linkparent(m, saveparent); + } + } +} + +static int cmp_msgs(const void *, const void *); + +int rfc822_threadsortsubj(struct imap_refmsg *root) +{ + struct imap_refmsg *toproot; + +/* +** (4) Sort the messages under the root (top-level siblings only) +** by sent date. 
In the case of an exact match on sent date or if +** either of the Date: headers used in a comparison can not be +** parsed, use the order in which the messages appear in the +** mailbox (that is, by sequence number) to determine the order. +** In the case of a dummy message, sort its children by sent date +** and then use the first child for the top-level sort. +*/ + size_t cnt, i; + struct imap_refmsg **sortarray; + + for (cnt=0, toproot=root->firstchild; toproot; + toproot=toproot->nextsib) + { + if (toproot->isdummy) + rfc822_threadsortsubj(toproot); + ++cnt; + } + + if ((sortarray=malloc(sizeof(struct imap_refmsg *)*(cnt+1))) == 0) + return (-1); + + for (cnt=0; (toproot=root->firstchild) != NULL; ++cnt) + { + sortarray[cnt]=toproot; + breakparent(toproot); + } + + qsort(sortarray, cnt, sizeof(*sortarray), cmp_msgs); + + for (i=0; i<cnt; i++) + linkparent(sortarray[i], root); + free(sortarray); + return (0); +} + +int rfc822_threadgathersubj(struct imap_refmsgtable *mt, + struct imap_refmsg *root) +{ + struct imap_refmsg *toproot, *p; + +/* +** (5) Gather together messages under the root that have the same +** extracted subject text. +** +** (A) Create a table for associating extracted subjects with +** messages. +** +** (B) Populate the subject table with one message per +** extracted subject. For each message under the root: +*/ + + for (toproot=root->firstchild; toproot; toproot=toproot->nextsib) + { + const char *subj; + struct imap_subjlookup *subjtop; + int isrefwd; + + /* + ** (i) Find the subject of this thread by extracting the + ** base subject from the current message, or its first child + ** if the current message is a dummy. + */ + + p=toproot; + if (p->isdummy) + p=p->firstchild; + + subj=p->subj ? p->subj:""; + + + /* + ** (ii) If the extracted subject is empty, skip this + ** message. + */ + + if (*subj == 0) + continue; + + /* + ** (iii) Lookup the message associated with this extracted + ** subject in the table. 
+ */ + + if (findsubj(mt, subj, &isrefwd, 1, &subjtop)) + return (-1); + + /* + ** + ** (iv) If there is no message in the table with this + ** subject, add the current message and the extracted + ** subject to the subject table. + */ + + if (subjtop->msg == 0) + { + subjtop->msg=toproot; + subjtop->msgisrefwd=isrefwd; + continue; + } + + /* + ** Otherwise, replace the message in the table with the + ** current message if the message in the table is not a + ** dummy AND either of the following criteria are true: + */ + + if (!subjtop->msg->isdummy) + { + /* + ** The current message is a dummy + ** + */ + + if (toproot->isdummy) + { + subjtop->msg=toproot; + subjtop->msgisrefwd=isrefwd; + continue; + } + + /* + ** The message in the table is a reply or forward (its + ** original subject contains a subj-refwd part and/or a + ** "(fwd)" subj-trailer) and the current message is + not. + */ + + if (subjtop->msgisrefwd && !isrefwd) + { + subjtop->msg=toproot; + subjtop->msgisrefwd=isrefwd; + } + } + } + return (0); +} + +/* +** (C) Merge threads with the same subject. For each message +** under the root: +*/ + +int rfc822_threadmergesubj(struct imap_refmsgtable *mt, + struct imap_refmsg *root) +{ + struct imap_refmsg *toproot, *p, *q, *nextroot; + char *str; + + for (toproot=root->firstchild; toproot; toproot=nextroot) + { + const char *subj; + struct imap_subjlookup *subjtop; + int isrefwd; + + nextroot=toproot->nextsib; + + /* + ** (i) Find the subject of this thread as in step 4.B.i + ** above. + */ + + p=toproot; + if (p->isdummy) + p=p->firstchild; + + subj=p->subj ? p->subj:""; + + /* + ** (ii) If the extracted subject is empty, skip this + ** message. + */ + + if (*subj == 0) + continue; + + /* + ** (iii) Lookup the message associated with this extracted + ** subject in the table. + */ + + if (findsubj(mt, subj, &isrefwd, 0, &subjtop) || subjtop == 0) + return (-1); + + /* + ** (iv) If the message in the table is the current message, + ** skip it. 
+ */ + + /* NOTE - ptr comparison IS NOT LEGAL */ + + subjtop->msg->flag2=1; + if (toproot->flag2) + { + toproot->flag2=0; + continue; + } + subjtop->msg->flag2=0; + + /* + ** Otherwise, merge the current message with the one in the + ** table using the following rules: + ** + ** If both messages are dummies, append the current + ** message's children to the children of the message in + ** the table (the children of both messages become + ** siblings), and then delete the current message. + */ + + if (subjtop->msg->isdummy && toproot->isdummy) + { + while ((p=toproot->firstchild) != 0) + { + breakparent(p); + linkparent(p, subjtop->msg); + } + breakparent(toproot); + continue; + } + + /* + ** If the message in the table is a dummy and the current + ** message is not, make the current message a child of + ** the message in the table (a sibling of its children). + */ + + if (subjtop->msg->isdummy) + { + breakparent(toproot); + linkparent(toproot, subjtop->msg); + continue; + } + + /* + ** If the current message is a reply or forward and the + ** message in the table is not, make the current message + ** a child of the message in the table (a sibling of its + ** children). + */ + + if (isrefwd) + { + p=subjtop->msg; + if (p->isdummy) + p=p->firstchild; + + subj=p->subj ? p->subj:""; + + str=rfc822_coresubj(subj, &isrefwd); + + if (!str) + return (-1); + free(str); /* Don't really care */ + + if (!isrefwd) + { + breakparent(toproot); + linkparent(toproot, subjtop->msg); + continue; + } + } + + /* + ** Otherwise, create a new dummy container and make both + ** messages children of the dummy, and replace the + ** message in the table with the dummy message. + */ + + /* What we do is create a new message, then move the + ** contents of subjtop->msg (including its children) + ** to the new message, then make the new message a child + ** of subjtop->msg, and mark subjtop->msg as a dummy msg. 
+ */ + + q=rfc822_threadallocmsg(mt, "(dummy)"); + if (!q) + return (-1); + + q->isdummy=1; + + swapmsgdata(q, subjtop->msg); + + while ((p=subjtop->msg->firstchild) != 0) + { + breakparent(p); + linkparent(p, q); + } + linkparent(q, subjtop->msg); + + breakparent(toproot); + linkparent(toproot, subjtop->msg); + } + return (0); +} + +/* +** (6) Traverse the messages under the root and sort each set of +** siblings by sent date. Traverse the messages in such a way +** that the "youngest" set of siblings are sorted first, and the +** "oldest" set of siblings are sorted last (grandchildren are +** sorted before children, etc). In the case of an exact match on +** sent date or if either of the Date: headers used in a +** comparison can not be parsed, use the order in which the +** messages appear in the mailbox (that is, by sequence number) to +** determine the order. In the case of a dummy message (which can +** only occur with top-level siblings), use its first child for +** sorting. +*/ + +static int cmp_msgs(const void *a, const void *b) +{ + struct imap_refmsg *ma=*(struct imap_refmsg **)a; + struct imap_refmsg *mb=*(struct imap_refmsg **)b; + time_t ta, tb; + unsigned long na, nb; + + while (ma && ma->isdummy) + ma=ma->firstchild; + + while (mb && mb->isdummy) + mb=mb->firstchild; + + ta=tb=0; + na=nb=0; + if (ma) + { + ta=ma->timestamp; + na=ma->seqnum; + } + if (mb) + { + tb=mb->timestamp; + nb=mb->seqnum; + } + + return (ta && tb && ta != tb ? ta < tb ? -1: 1: + na < nb ? -1: na > nb ? 
1:0); +} + +struct imap_threadsortinfo { + struct imap_refmsgtable *mt; + struct imap_refmsg **sort_table; + size_t sort_table_cnt; +} ; + +static int dothreadsort(struct imap_threadsortinfo *, + struct imap_refmsg *); + +int rfc822_threadsortbydate(struct imap_refmsgtable *mt) +{ + struct imap_threadsortinfo itsi; + int rc; + + itsi.mt=mt; + itsi.sort_table=0; + itsi.sort_table_cnt=0; + + rc=dothreadsort(&itsi, mt->rootptr); + + if (itsi.sort_table) + free(itsi.sort_table); + return (rc); +} + +static int dothreadsort(struct imap_threadsortinfo *itsi, + struct imap_refmsg *p) +{ + struct imap_refmsg *q; + size_t i, n; + + for (q=p->firstchild; q; q=q->nextsib) + dothreadsort(itsi, q); + + n=0; + for (q=p->firstchild; q; q=q->nextsib) + ++n; + + if (n > itsi->sort_table_cnt) + { + struct imap_refmsg **new_array=(struct imap_refmsg **) + (itsi->sort_table ? + realloc(itsi->sort_table, + sizeof(struct imap_refmsg *)*n) + :malloc(sizeof(struct imap_refmsg *)*n)); + + if (!new_array) + return (-1); + + itsi->sort_table=new_array; + itsi->sort_table_cnt=n; + } + + n=0; + while ((q=p->firstchild) != 0) + { + breakparent(q); + itsi->sort_table[n++]=q; + } + + qsort(itsi->sort_table, n, sizeof(struct imap_refmsg *), cmp_msgs); + + for (i=0; i<n; i++) + linkparent(itsi->sort_table[i], p); + return (0); +} + +struct imap_refmsg *rfc822_thread(struct imap_refmsgtable *mt) +{ + if (!mt->rootptr) + { + rfc822_threadprune(mt); + if ((mt->rootptr=rfc822_threadgetroot(mt)) == 0) + return (0); + if (rfc822_threadsortsubj(mt->rootptr) || + rfc822_threadgathersubj(mt, mt->rootptr) || + rfc822_threadmergesubj(mt, mt->rootptr) || + rfc822_threadsortbydate(mt)) + { + mt->rootptr=0; + return (0); + } + } + + return (mt->rootptr); +} diff --git a/rfc822/imaprefs.h b/rfc822/imaprefs.h new file mode 100644 index 0000000..7c1d11f --- /dev/null +++ b/rfc822/imaprefs.h @@ -0,0 +1,109 @@ +/* +*/ +#ifndef imaprefs_h +#define imaprefs_h + +/* +** Copyright 2000-2003 Double Precision, Inc. 
+** See COPYING for distribution information. +*/ + +#if HAVE_CONFIG_H +#include "rfc822/config.h" +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +/* +** Implement REFERENCES threading. +*/ + +/* The data structures */ + +struct imap_refmsg { + struct imap_refmsg *next, *last; /* Link list of all msgs */ + struct imap_refmsg *parent; /* my parent */ + struct imap_refmsg *firstchild, *lastchild; /* Children link list */ + struct imap_refmsg *prevsib, *nextsib; /* Link list of siblings */ + + char isdummy; /* this is a dummy node (for now) */ + char flag2; /* Additional flag */ + + char *msgid; /* msgid of this message */ + + char *subj; /* dynalloced subject of this msg */ + time_t timestamp; /* Timestamp */ + unsigned long seqnum; /* Sequence number */ +} ; + +struct imap_refmsgtable { + struct imap_refmsg *firstmsg, *lastmsg; /* Link list of all msgs */ + + /* hash table message id lookup */ + + struct imap_refmsghash *hashtable[512]; + + struct imap_subjlookup *subjtable[512]; + + struct imap_refmsg *rootptr; /* The root */ +} ; + +struct imap_refmsgtable *rfc822_threadalloc(void); +void rfc822_threadfree(struct imap_refmsgtable *); +struct imap_refmsg *rfc822_threadmsg(struct imap_refmsgtable *mt, + const char *msgidhdr, + const char *refhdr, + const char *subjheader, + + const char *dateheader, + time_t dateheader_tm, + /* Set one or other */ + + unsigned long seqnum); + +struct imap_refmsg *rfc822_threadmsgrefs(struct imap_refmsgtable *mt, + const char *msgid_s, + const char * const * msgidList, + const char *subjheader, + const char *dateheader, + time_t dateheader_tm, + unsigned long seqnum); + +struct imap_refmsg *rfc822_thread(struct imap_refmsgtable *mt); + + /* INTERNAL FUNCTIONS FOLLOW */ + + +struct imap_refmsghash { + struct imap_refmsghash *nexthash; + struct imap_refmsg *msg; +} ; + +struct imap_subjlookup { + struct imap_subjlookup *nextsubj; + char *subj; + struct imap_refmsg *msg; + int msgisrefwd; +} ; + +struct imap_refmsg 
*rfc822_threadallocmsg(struct imap_refmsgtable *mt, + const char *msgid); +void rfc822_threadprune(struct imap_refmsgtable *mt); +struct imap_refmsg *rfc822_threadgetroot(struct imap_refmsgtable *mt); +struct imap_refmsg *rfc822_threadsearchmsg(struct imap_refmsgtable *mt, + const char *msgid); +int rfc822_threadsortsubj(struct imap_refmsg *root); +int rfc822_threadgathersubj(struct imap_refmsgtable *mt, + struct imap_refmsg *root); +int rfc822_threadmergesubj(struct imap_refmsgtable *mt, + struct imap_refmsg *root); +int rfc822_threadsortbydate(struct imap_refmsgtable *mt); + + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/rfc822/imapsubj.c b/rfc822/imapsubj.c new file mode 100644 index 0000000..2f6adfd --- /dev/null +++ b/rfc822/imapsubj.c @@ -0,0 +1,305 @@ +/* +** Copyright 2000 Double Precision, Inc. +** See COPYING for distribution information. +*/ + +/* +*/ +#include "config.h" +#include <stdio.h> +#include <ctype.h> +#include <stdlib.h> +#include <string.h> +#include "rfc822.h" + +#if HAVE_STRCASECMP + +#else +#define strcasecmp stricmp +#endif + +#if HAVE_STRNCASECMP + +#else +#define strncasecmp strnicmp +#endif + +/* Skip over blobs */ + +static char *skipblob(char *p, char **save_blob_ptr) +{ + char *q; + char *orig_p=p; + int isalldigits=1; + + if (*p == '[') + { + for (q= p+1; *q; q++) + if (*q == '[' || *q == ']') + break; + else if (strchr("0123456789", *q) == NULL) + isalldigits=0; + + if (*q == ']') + { + p=q+1; + + while (isspace((int)(unsigned char)*p)) + { + ++p; + } + + if (save_blob_ptr && *save_blob_ptr && !isalldigits) + { + while (orig_p != p) + *(*save_blob_ptr)++=*orig_p++; + } + + return (p); + } + } + return (p); +} + +static char *skipblobs(char *p, char **save_blob_ptr) +{ + char *q=p; + + do + { + p=q; + q=skipblob(p, save_blob_ptr); + } while (q != p); + return (q); +} + +/* Remove artifacts from the subject header */ + +static void stripsubj(char *s, int *hasrefwd, char *save_blob_buf) +{ + char *p; + char *q; + int doit; + 
+ for (p=q=s; *p; p++) + { + if (!isspace((int)(unsigned char)*p)) + { + *q++=*p; + continue; + } + while (p[1] && isspace((int)(unsigned char)p[1])) + { + ++p; + } + *q++=' '; + } + *q=0; + + do + { + doit=0; + /* + ** + ** (2) Remove all trailing text of the subject that matches + ** the subj-trailer ABNF, repeat until no more matches are + ** possible. + ** + ** subj-trailer = "(fwd)" / WSP + */ + + for (p=s; *p; p++) + ; + while (p > s) + { + if ( isspace((int)(unsigned char)p[-1])) + { + --p; + continue; + } + if (p-s >= 5 && strncasecmp(p-5, "(FWD)", 5) == 0) + { + p -= 5; + *hasrefwd |= CORESUBJ_FWD; + continue; + } + break; + } + *p=0; + + for (p=s; *p; ) + { + for (;;) + { + char *orig_blob_ptr; + int flag=CORESUBJ_FWD; + + /* + ** + ** (3) Remove all prefix text of the subject + ** that matches the subj-leader ABNF. + ** + ** subj-leader = (*subj-blob subj-refwd) / WSP + ** + ** subj-blob = "[" *BLOBCHAR "]" *WSP + ** + ** subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" + ** + ** BLOBCHAR = %x01-5a / %x5c / %x5e-7f + ** ; any CHAR except '[' and ']' + */ + + if (isspace((int)(unsigned char)*p)) + { + ++p; + continue; + } + + q=skipblobs(p, NULL); + + if (strncasecmp(q, "RE", 2) == 0) + { + flag=CORESUBJ_RE; + q += 2; + } + else if (strncasecmp(q, "FWD", 3) == 0) + { + q += 3; + } + else if (strncasecmp(q, "FW", 2) == 0) + { + q += 2; + } + else q=0; + + if (q) + { + orig_blob_ptr=save_blob_buf; + + q=skipblob(q, &save_blob_buf); + if (*q == ':') + { + p=q+1; + *hasrefwd |= flag; + continue; + } + + save_blob_buf=orig_blob_ptr; + } + + + /* + ** (4) If there is prefix text of the subject + ** that matches the subj-blob ABNF, and + ** removing that prefix leaves a non-empty + ** subj-base, then remove the prefix text. 
+ ** + ** subj-base = NONWSP *([*WSP] NONWSP) + ** ; can be a subj-blob + */ + + orig_blob_ptr=save_blob_buf; + + q=skipblob(p, &save_blob_buf); + + if (q != p && *q) + { + p=q; + continue; + } + save_blob_buf=orig_blob_ptr; + break; + } + + /* + ** + ** (6) If the resulting text begins with the + ** subj-fwd-hdr ABNF and ends with the subj-fwd-trl + ** ABNF, remove the subj-fwd-hdr and subj-fwd-trl and + ** repeat from step (2). + ** + ** subj-fwd-hdr = "[fwd:" + ** + ** subj-fwd-trl = "]" + */ + + if (strncasecmp(p, "[FWD:", 5) == 0) + { + q=strrchr(p, ']'); + if (q && q[1] == 0) + { + *q=0; + p += 5; + *hasrefwd |= CORESUBJ_FWD; + + for (q=s; (*q++=*p++) != 0; ) + ; + doit=1; + } + } + break; + } + } while (doit); + + q=s; + while ( (*q++ = *p++) != 0) + ; + if (save_blob_buf) + *save_blob_buf=0; +} + +char *rfc822_coresubj(const char *s, int *hasrefwd) +{ + char *q=strdup(s), *r; + int dummy; + + if (!hasrefwd) + hasrefwd= &dummy; + + *hasrefwd=0; + if (!q) return (0); + + for (r=q; *r; r++) + if ((*r & 0x80) == 0) /* Just US-ASCII casing, thanks */ + { + if (*r >= 'a' && *r <= 'z') + *r += 'A'-'a'; + } + stripsubj(q, hasrefwd, 0); + return (q); +} + +char *rfc822_coresubj_nouc(const char *s, int *hasrefwd) +{ + char *q=strdup(s); + int dummy; + + if (!hasrefwd) + hasrefwd= &dummy; + + *hasrefwd=0; + if (!q) return (0); + + stripsubj(q, hasrefwd, 0); + return (q); +} + +char *rfc822_coresubj_keepblobs(const char *s) +{ + char *q=strdup(s), *r; + int dummy; + + if (!q) return (0); + + r=strdup(s); + if (!r) + { + free(q); + return (0); + } + + stripsubj(q, &dummy, r); + strcat(r, q); + free(q); + return (r); +} diff --git a/rfc822/reftest.c b/rfc822/reftest.c new file mode 100644 index 0000000..1d2ea9a --- /dev/null +++ b/rfc822/reftest.c @@ -0,0 +1,367 @@ +/* +** Copyright 2000 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +/* +*/ + +#include "config.h" + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <time.h> + +#if HAVE_STRINGS_H +#include <strings.h> +#endif + +#if HAVE_LOCALE_H +#include <locale.h> +#endif + +#include "rfc822.h" +#include "imaprefs.h" + + +static void test1() +{ +struct imap_refmsgtable *mt=rfc822_threadalloc(); +char buf[20]; + + strcpy(buf, "a@b"); + rfc822_threadallocmsg(mt, buf); + strcpy(buf, "c@d"); + rfc822_threadallocmsg(mt, buf); + + printf("%s\n", (rfc822_threadsearchmsg(mt, "a@b") + ? "found":"not found")); + printf("%s\n", (rfc822_threadsearchmsg(mt, "c@d") + ? "found":"not found")); + printf("%s\n", (rfc822_threadsearchmsg(mt, "e@f") + ? "found":"not found")); + + rfc822_threadfree(mt); +} + +static void prtree(struct imap_refmsg *m) +{ + printf("<%s>", m->msgid ? m->msgid:""); + + if (m->isdummy) + { + printf(" (dummy)"); + } + + printf(".parent="); + if (m->parent) + printf("<%s>", m->parent->msgid ? m->parent->msgid:""); + else + printf("ROOT"); + + printf("\n"); + + for (m=m->firstchild; m; m=m->nextsib) + prtree(m); +} + +static void prpc(struct imap_refmsgtable *mt) +{ + struct imap_refmsg *root=rfc822_threadgetroot(mt), *m; + + if (!root) + return; + + for (m=root->firstchild; m; m=m->nextsib) + prtree(m); + + printf("\n\n"); +} + +static void test2() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<1>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<2>", + "<1>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<4>", + "<1> <2> <3>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + prpc(mt); + rfc822_threadfree(mt); +} + +static void test3() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<4>", + "<2> <1> <3>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<3>", + "<1> <2>", + "subject 1", + "Thu, 29 Jun 2000 
14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<2>", + "<1>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<1>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + prpc(mt); + rfc822_threadfree(mt); +} + +static void test4() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<1>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<2>", "<1>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<4>", "<1> <2> <3>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + prpc(mt); + rfc822_threadprune(mt); + prpc(mt); + rfc822_threadfree(mt); +} + +static void test5() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<4>", "<1> <2> <3>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<3>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + prpc(mt); + rfc822_threadprune(mt); + prpc(mt); + rfc822_threadfree(mt); +} + +static void prsubj(struct imap_refmsgtable *p) +{ + struct imap_subjlookup *s; + int i; + + for (i=0; i<sizeof(p->subjtable)/sizeof(p->subjtable[0]); i++) + for (s=p->subjtable[i]; s; s=s->nextsubj) + printf("subject(%s)=<%s>\n", s->subj, + s->msg->msgid ? 
s->msg->msgid:""); + printf("\n\n"); +} + +static void test6() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<message1>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message10>", NULL, + "subject 2", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 2); + + rfc822_threadmsg(mt, "<message3>", "<message2>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 3); + + rfc822_threadmsg(mt, "<message11>", NULL, + "Re: subject 4", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 4); + + rfc822_threadmsg(mt, "<message12>", NULL, + "subject 4", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 5); + + rfc822_threadmsg(mt, "<message13>", NULL, + "subject 5", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 6); + + rfc822_threadmsg(mt, "<message14>", NULL, + "re: subject 5", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 7); + + rfc822_threadprune(mt); + rfc822_threadsortsubj(rfc822_threadgetroot(mt)); + rfc822_threadgathersubj(mt, rfc822_threadgetroot(mt)); + prpc(mt); + prsubj(mt); + rfc822_threadfree(mt); +} + +static void test7() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<message1>", "<message1-dummy>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message2>", "<message2-dummy>", + "subject 1", + "Thu, 29 Jun 2000 14:41:58 -0700", 0, 1); + rfc822_threadprune(mt); + rfc822_threadsortsubj(rfc822_threadgetroot(mt)); + rfc822_threadgathersubj(mt, rfc822_threadgetroot(mt)); + prpc(mt); + prsubj(mt); + rfc822_threadmergesubj(mt, rfc822_threadgetroot(mt)); + prpc(mt); + rfc822_threadfree(mt); +} + +static void test8() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<message4>", NULL, + "subject 2", + "Thu, 29 Jun 2000 14:41:51 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message2>", NULL, + "subject 1", + "Thu, 29 Jun 2000 14:41:52 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message1>", "<message1-dummy>", + "subject 1", + "Thu, 29 
Jun 2000 14:41:53 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message3>", NULL, + "Re: subject 2", + "Thu, 29 Jun 2000 14:41:54 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message10>", NULL, + "subject 10", + "Thu, 29 Jun 2000 14:41:55 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message11>", NULL, + "subject 10", + "Thu, 29 Jun 2000 14:41:56 -0700", 0, 1); + + rfc822_threadprune(mt); + rfc822_threadsortsubj(rfc822_threadgetroot(mt)); + rfc822_threadgathersubj(mt, rfc822_threadgetroot(mt)); + prpc(mt); + prsubj(mt); + rfc822_threadmergesubj(mt, rfc822_threadgetroot(mt)); + prpc(mt); + rfc822_threadfree(mt); +} + +static void test9() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<message1>", NULL, + "subject 1", + "Thu, 20 Jun 2000 14:41:55 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message2>", NULL, + "subject 1", + "Thu, 19 Jun 2000 14:41:51 -0700", 0, 2); + + rfc822_threadmsg(mt, "<message3>", NULL, + "subject 1", + "Thu, 21 Jun 2000 14:41:56 -0700", 0, 3); + + rfc822_threadmsg(mt, "<message4>", "<message2>", + "subject 2", + "Thu, 21 Jun 2000 14:41:54 -0700", 0, 6); + + rfc822_threadmsg(mt, "<message5>", "<message2>", + "subject 2", + "Thu, 21 Jun 2000 14:41:53 -0700", 0, 5); + + rfc822_threadmsg(mt, "<message6>", "<message2>", + "subject 2", + "Thu, 20 Jun 2000 14:41:52 -0700", 0, 4); + + + rfc822_threadprune(mt); + rfc822_threadsortsubj(rfc822_threadgetroot(mt)); + rfc822_threadgathersubj(mt, rfc822_threadgetroot(mt)); + rfc822_threadmergesubj(mt, rfc822_threadgetroot(mt)); + rfc822_threadsortbydate(mt); + prpc(mt); + rfc822_threadfree(mt); +} + +static void test10() +{ + struct imap_refmsgtable *mt=rfc822_threadalloc(); + + rfc822_threadmsg(mt, "<message1>", NULL, + "subject 1", + "Thu, 20 Jun 2000 14:41:58 -0700", 0, 1); + + rfc822_threadmsg(mt, "<message4>", "<message1>", + "subject 2", + "Thu, 21 Jun 2000 14:41:58 -0700", 0, 6); + + rfc822_threadmsg(mt, "<message1>", NULL, + "subject 2", + "Thu, 21 Jun 2000 14:41:58 -0700", 0, 
5); + + rfc822_threadmsg(mt, "<message4>", "<message1>", + "subject 2", + "Thu, 21 Jun 2000 14:41:58 -0700", 0, 6); + + rfc822_threadprune(mt); + rfc822_threadsortsubj(rfc822_threadgetroot(mt)); + rfc822_threadgathersubj(mt, rfc822_threadgetroot(mt)); + rfc822_threadmergesubj(mt, rfc822_threadgetroot(mt)); + rfc822_threadsortbydate(mt); + prpc(mt); + rfc822_threadfree(mt); +} + +int main(int argc, char **argv) +{ + +#if HAVE_SETLOCALE + setlocale(LC_ALL, "C"); +#endif + + test1(); + test2(); + test3(); + test4(); + test5(); + test6(); + test7(); + test8(); + test9(); + test10(); + return (0); +} diff --git a/rfc822/reftest.txt b/rfc822/reftest.txt new file mode 100644 index 0000000..5f7a137 --- /dev/null +++ b/rfc822/reftest.txt @@ -0,0 +1,106 @@ +found +found +not found +<1>.parent=<(root)> +<2>.parent=<1> +<3> (dummy).parent=<2> +<4>.parent=<3> + + +<2>.parent=<(root)> +<3>.parent=<2> +<4>.parent=<3> +<1>.parent=<(root)> + + +<1>.parent=<(root)> +<2>.parent=<1> +<3> (dummy).parent=<2> +<4>.parent=<3> + + +<1>.parent=<(root)> +<2>.parent=<1> +<4>.parent=<2> + + +<1> (dummy).parent=<(root)> +<2> (dummy).parent=<1> +<3>.parent=<(root)> +<4>.parent=<3> + + +<3>.parent=<(root)> +<4>.parent=<3> + + +<message1>.parent=<(root)> +<message10>.parent=<(root)> +<message2> (dummy).parent=<(root)> +<message3>.parent=<message2> +<message11>.parent=<(root)> +<message12>.parent=<(root)> +<message13>.parent=<(root)> +<message14>.parent=<(root)> + + +subject(SUBJECT 1)=<message2> +subject(SUBJECT 2)=<message10> +subject(SUBJECT 5)=<message13> +subject(SUBJECT 4)=<message12> + + +<message1-dummy> (dummy).parent=<(root)> +<message1>.parent=<message1-dummy> +<message2-dummy> (dummy).parent=<(root)> +<message2>.parent=<message2-dummy> + + +subject(SUBJECT 1)=<message1-dummy> + + +<message1-dummy> (dummy).parent=<(root)> +<message1>.parent=<message1-dummy> +<message2>.parent=<message1-dummy> + + +<message4>.parent=<(root)> +<message2>.parent=<(root)> +<message1-dummy> 
(dummy).parent=<(root)> +<message1>.parent=<message1-dummy> +<message3>.parent=<(root)> +<message10>.parent=<(root)> +<message11>.parent=<(root)> + + +subject(SUBJECT 10)=<message10> +subject(SUBJECT 1)=<message1-dummy> +subject(SUBJECT 2)=<message4> + + +<message4>.parent=<(root)> +<message3>.parent=<message4> +<message1-dummy> (dummy).parent=<(root)> +<message1>.parent=<message1-dummy> +<message2>.parent=<message1-dummy> +<(dummy)> (dummy).parent=<(root)> +<message10>.parent=<(dummy)> +<message11>.parent=<(dummy)> + + +<(dummy)> (dummy).parent=<(root)> +<message2>.parent=<(dummy)> +<message6>.parent=<message2> +<message5>.parent=<message2> +<message4>.parent=<message2> +<message1>.parent=<(dummy)> +<message3>.parent=<(dummy)> + + +<message1>.parent=<(root)> +<>.parent=<message1> +<(dummy)> (dummy).parent=<(root)> +<>.parent=<(dummy)> +<message4>.parent=<(dummy)> + + diff --git a/rfc822/rfc2047.c b/rfc822/rfc2047.c new file mode 100644 index 0000000..f80e862 --- /dev/null +++ b/rfc822/rfc2047.c @@ -0,0 +1,729 @@ +/* +** Copyright 1998 - 2011 Double Precision, Inc. See COPYING for +** distribution information. 
+*/ + +#include "rfc822.h" +#include <stdio.h> +#include <ctype.h> +#include <string.h> +#include <stdlib.h> +#include <errno.h> + +#include "rfc822hdr.h" +#include "rfc2047.h" +#include "../unicode/unicode.h" +#if LIBIDN +#include <idna.h> +#include <stringprep.h> +#endif + + +#define RFC2047_ENCODE_FOLDLENGTH 76 + +static const char xdigit[]="0123456789ABCDEF"; +static const char base64tab[]= +"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; + +static char *a_rfc2047_encode_str(const char *str, const char *charset, + int isaddress); + +static void rfc2047_encode_header_do(const struct rfc822a *a, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, + void *), void *ptr) +{ + rfc822_print_common(a, &a_rfc2047_encode_str, charset, + print_func, print_separator, ptr); +} + +static char *rfc822_encode_domain_int(const char *pfix, + size_t pfix_len, + const char *domain) +{ + char *q; + +#if LIBIDN + int err; + char *p; + size_t s=strlen(domain)+16; + char *cpy=malloc(s); + + if (!cpy) + return NULL; + + /* + ** Invalid UTF-8 can make libidn go off the deep end. Add + ** padding as a workaround. 
+ */ + + memset(cpy, 0, s); + strcpy(cpy, domain); + + err=idna_to_ascii_8z(cpy, &p, 0); + free(cpy); + + if (err != IDNA_SUCCESS) + { + errno=EINVAL; + return NULL; + } +#else + char *p; + + p=strdup(domain); + + if (!p) + return NULL; +#endif + + q=malloc(strlen(p)+pfix_len+1); + + if (!q) + { + free(p); + return NULL; + } + + if (pfix_len) + memcpy(q, pfix, pfix_len); + + strcpy(q + pfix_len, p); + free(p); + return q; +} + +char *rfc822_encode_domain(const char *address, + const char *charset) +{ + char *p=libmail_u_convert_tobuf(address, charset, "utf-8", NULL); + char *cp, *q; + + if (!p) + return NULL; + + cp=strchr(p, '@'); + + if (!cp) + { + q=rfc822_encode_domain_int("", 0, p); + free(p); + return q; + } + + ++cp; + q=rfc822_encode_domain_int(p, cp-p, cp); + free(p); + return q; +} + +static char *a_rfc2047_encode_str(const char *str, const char *charset, + int isaddress) +{ + size_t l; + char *p; + + if (isaddress) + return rfc822_encode_domain(str, charset); + + for (l=0; str[l]; l++) + if (str[l] & 0x80) + break; + + if (str[l] == 0) + { + size_t n; + + for (l=0; str[l]; l++) + if (strchr(RFC822_SPECIALS, str[l])) + break; + + if (str[l] == 0) + return (strdup(str)); + + for (n=3, l=0; str[l]; l++) + { + switch (str[l]) { + case '"': + case '\\': + ++n; + break; + } + + ++n; + } + + p=malloc(n); + + if (!p) + return NULL; + + p[0]='"'; + + for (n=1, l=0; str[l]; l++) + { + switch (str[l]) { + case '"': + case '\\': + p[n++]='\\'; + break; + } + + p[n++]=str[l]; + } + p[n++]='"'; + p[n]=0; + + return (p); + } + + return rfc2047_encode_str(str, charset, rfc2047_qp_allow_word); +} + +static void count(char c, void *p); +static void counts2(const char *c, void *p); +static void save(char c, void *p); +static void saves2(const char *c, void *p); + +char *rfc2047_encode_header_addr(const struct rfc822a *a, + const char *charset) +{ +size_t l; +char *s, *p; + + l=1; + rfc2047_encode_header_do(a, charset, &count, &counts2, &l); + if ((s=malloc(l)) == 0) return 
(0); + p=s; + rfc2047_encode_header_do(a, charset, &save, &saves2, &p); + *p=0; + return (s); +} + + +char *rfc2047_encode_header_tobuf(const char *name, /* Header name */ + const char *header, /* Header's contents */ + const char *charset) +{ + if (rfc822hdr_is_addr(name)) + { + char *s=0; + + struct rfc822t *t; + struct rfc822a *a; + + if ((t=rfc822t_alloc_new(header, NULL, NULL)) != 0) + { + if ((a=rfc822a_alloc(t)) != 0) + { + s=rfc2047_encode_header_addr(a, charset); + rfc822a_free(a); + } + rfc822t_free(t); + } + return s; + } + + return rfc2047_encode_str(header, charset, rfc2047_qp_allow_word); +} + +static void count(char c, void *p) +{ + ++*(size_t *)p; +} + +static void counts2(const char *c, void *p) +{ + if (*c == ',') + count(*c++, p); + + count('\n', p); + count(' ', p); + + while (*c) count(*c++, p); +} + +static void save(char c, void *p) +{ + **(char **)p=c; + ++*(char **)p; +} + +static void saves2(const char *c, void *p) +{ + if (*c == ',') + save(*c++, p); + + save('\n', p); + save(' ', p); + + while (*c) save(*c++, p); +} + +static int encodebase64(const char *ptr, size_t len, const char *charset, + int (*qp_allow)(char), + int (*func)(const char *, size_t, void *), void *arg) +{ + unsigned char ibuf[3]; + char obuf[4]; + int rc; + + if ((rc=(*func)("=?", 2, arg)) || + (rc=(*func)(charset, strlen(charset), arg))|| + (rc=(*func)("?B?", 3, arg))) + return rc; + + while (len) + { + size_t n=len > 3 ? 
3:len; + + ibuf[0]= ptr[0]; + if (n>1) + ibuf[1]=ptr[1]; + else + ibuf[1]=0; + if (n>2) + ibuf[2]=ptr[2]; + else + ibuf[2]=0; + ptr += n; + len -= n; + + obuf[0] = base64tab[ ibuf[0] >>2 ]; + obuf[1] = base64tab[(ibuf[0] & 0x03)<<4|ibuf[1]>>4]; + obuf[2] = base64tab[(ibuf[1] & 0x0F)<<2|ibuf[2]>>6]; + obuf[3] = base64tab[ ibuf[2] & 0x3F ]; + if (n < 2) + obuf[2] = '='; + if (n < 3) + obuf[3] = '='; + + if ((rc=(*func)(obuf, 4, arg))) + return rc; + } + + if ((rc=(*func)("?=", 2, arg))) + return rc; + return 0; +} + +#define ISSPACE(i) ((i)=='\t' || (i)=='\r' || (i)=='\n' || (i)==' ') +#define DOENCODEWORD(c) \ + ((c) < 0x20 || (c) > 0x7F || (c) == '"' || \ + (c) == '_' || (c) == '=' || (c) == '?' || !(*qp_allow)((char)c)) + +/* +** Encode a character stream using quoted-printable encoding. +*/ +static int encodeqp(const char *ptr, size_t len, + const char *charset, + int (*qp_allow)(char), + int (*func)(const char *, size_t, void *), void *arg) +{ + size_t i; + int rc; + char buf[3]; + + if ((rc=(*func)("=?", 2, arg)) || + (rc=(*func)(charset, strlen(charset), arg))|| + (rc=(*func)("?Q?", 3, arg))) + return rc; + + for (i=0; i<len; ++i) + { + size_t j; + + for (j=i; j<len; ++j) + { + if (ptr[j] == ' ' || DOENCODEWORD(ptr[j])) + break; + } + + if (j > i) + { + rc=(*func)(ptr+i, j-i, arg); + + if (rc) + return rc; + if (j >= len) + break; + } + i=j; + + if (ptr[i] == ' ') + rc=(*func)("_", 1, arg); + else + { + buf[0]='='; + buf[1]=xdigit[ ( ptr[i] >> 4) & 0x0F ]; + buf[2]=xdigit[ ptr[i] & 0x0F ]; + + rc=(*func)(buf, 3, arg); + } + + if (rc) + return rc; + } + + return (*func)("?=", 2, arg); +} + +/* +** Calculate whether the next word should be RFC2047-encoded. +** +** Returns 0 if not, 1 if any character in the next word is flagged by +** DOENCODEWORD(). +*/ + +static int encode_word(const unicode_char *uc, + size_t ucsize, + int (*qp_allow)(char), + + /* + ** Points to the starting offset of word in uc. + ** At exit, points to the end of the word in uc. 
+ */ + size_t *word_ptr) +{ + size_t i; + int encode=0; + + for (i=*word_ptr; i<ucsize; ++i) + { + if (ISSPACE(uc[i])) + break; + + if (DOENCODEWORD(uc[i])) + encode=1; + } + + *word_ptr=i; + return encode; +} + +/* +** Calculate whether the next sequence of words should be RFC2047-encoded. +** +** Whatever encode_word() returns for the first word, look at the next word +** and keep going as long as encode_word() keeps returning the same value. +*/ + +static int encode_words(const unicode_char *uc, + size_t ucsize, + int (*qp_allow)(char), + + /* + ** Points to the starting offset of words in uc. + ** At exit, points to the end of the words in uc. + */ + + size_t *word_ptr) +{ + size_t i= *word_ptr, j, k; + + int flag=encode_word(uc, ucsize, qp_allow, &i); + + if (!flag) + { + *word_ptr=i; + return flag; + } + + j=i; + + while (j < ucsize) + { + if (ISSPACE(uc[j])) + { + ++j; + continue; + } + + k=j; + + if (!encode_word(uc, ucsize, qp_allow, &k)) + break; + i=j=k; + } + + *word_ptr=i; + return flag; +} + +/* +** Encode a sequence of words. +*/ +static int do_encode_words_method(const unicode_char *uc, + size_t ucsize, + const char *charset, + int (*qp_allow)(char), + size_t offset, + int (*encoder)(const char *ptr, size_t len, + const char *charset, + int (*qp_allow)(char), + int (*func)(const char *, + size_t, void *), + void *arg), + int (*func)(const char *, size_t, void *), + void *arg) +{ + char *p; + size_t psize; + int rc; + int first=1; + + while (ucsize) + { + size_t j; + size_t i; + + if (!first) + { + rc=(*func)(" ", 1, arg); + + if (rc) + return rc; + } + first=0; + + j=(RFC2047_ENCODE_FOLDLENGTH-offset)/2; + + if (j >= ucsize) + j=ucsize; + else + { + /* + ** Do not split rfc2047-encoded words across a + ** grapheme break. 
+ */ + + for (i=j; i > 0; --i) + if (unicode_grapheme_break(uc[i-1], uc[i])) + { + j=i; + break; + } + } + + if ((rc=libmail_u_convert_fromu_tobuf(uc, j, charset, + &p, &psize, + NULL)) != 0) + return rc; + + + if (psize && p[psize-1] == 0) + --psize; + + rc=(*encoder)(p, psize, charset, qp_allow, + func, arg); + free(p); + if (rc) + return rc; + offset=0; + ucsize -= j; + uc += j; + } + return 0; +} + +static int cnt_conv(const char *dummy, size_t n, void *arg) +{ + *(size_t *)arg += n; + return 0; +} + +/* +** Encode, or not encode, words. +*/ + +static int do_encode_words(const unicode_char *uc, + size_t ucsize, + const char *charset, + int flag, + int (*qp_allow)(char), + size_t offset, + int (*func)(const char *, size_t, void *), + void *arg) +{ + char *p; + size_t psize; + int rc; + size_t b64len, qlen; + + /* + ** Convert from unicode + */ + + if ((rc=libmail_u_convert_fromu_tobuf(uc, ucsize, charset, + &p, &psize, + NULL)) != 0) + return rc; + + if (psize && p[psize-1] == 0) + --psize; + + if (!flag) /* If not converting, then the job is done */ + { + rc=(*func)(p, psize, arg); + free(p); + return rc; + } + free(p); + + /* + ** Try first quoted-printable, then base64, then pick whichever + ** one gives the shortest results. + */ + qlen=0; + b64len=0; + + rc=do_encode_words_method(uc, ucsize, charset, qp_allow, offset, + &encodeqp, cnt_conv, &qlen); + if (rc) + return rc; + + rc=do_encode_words_method(uc, ucsize, charset, qp_allow, offset, + &encodebase64, cnt_conv, &b64len); + if (rc) + return rc; + + return do_encode_words_method(uc, ucsize, charset, qp_allow, offset, + qlen < b64len ? encodeqp:encodebase64, + func, arg); +} + +/* +** RFC2047-encoding pass. 
+*/ +static int rfc2047_encode_callback(const unicode_char *uc, + size_t ucsize, + const char *charset, + int (*qp_allow)(char), + int (*func)(const char *, size_t, void *), + void *arg) +{ + int rc; + size_t i; + int flag; + + size_t offset=27; /* FIXME: initial offset for line length */ + + while (ucsize) + { + /* Pass along all the whitespace */ + + if (ISSPACE(*uc)) + { + char c= *uc++; + --ucsize; + + if ((rc=(*func)(&c, 1, arg)) != 0) + return rc; + continue; + } + + i=0; + + /* Check if the next word needs to be encoded, or not. */ + + flag=encode_words(uc, ucsize, qp_allow, &i); + + /* + ** Then proceed to encode, or not encode, the following words. + */ + + if ((rc=do_encode_words(uc, i, charset, flag, + qp_allow, offset, + func, arg)) != 0) + return rc; + + offset=0; + uc += i; + ucsize -= i; + } + + return 0; +} + + +static int count_char(const char *c, size_t l, void *p) +{ +size_t *i=(size_t *)p; + + *i += l; + return (0); +} + +static int save_char(const char *c, size_t l, void *p) +{ +char **s=(char **)p; + + memcpy(*s, c, l); + *s += l; + return (0); +} + +char *rfc2047_encode_str(const char *str, const char *charset, + int (*qp_allow)(char c)) +{ + size_t i=1; + char *s, *p; + unicode_char *uc; + size_t ucsize; + int err; + + /* Convert string to unicode */ + + if (libmail_u_convert_tou_tobuf(str, strlen(str), charset, + &uc, &ucsize, &err)) + return NULL; + + /* + ** Perform two passes: calculate size of the buffer where the + ** encoded string gets saved into, then allocate the buffer and + ** do a second pass to actually do it. 
+ */ + + if (rfc2047_encode_callback(uc, ucsize, + charset, + qp_allow, + &count_char, &i)) + { + free(uc); + return NULL; + } + + if ((s=malloc(i)) == 0) + { + free(uc); + return NULL; + } + + p=s; + (void)rfc2047_encode_callback(uc, ucsize, + charset, + qp_allow, + &save_char, &p); + *p=0; + free(uc); + return (s); +} + +int rfc2047_qp_allow_any(char c) +{ + return 1; +} + +int rfc2047_qp_allow_comment(char c) +{ + if (c == '(' || c == ')' || c == '"') + return 0; + return 1; +} + +int rfc2047_qp_allow_word(char c) +{ + return strchr(base64tab, c) != NULL || + strchr("*-=_", c) != NULL; +} diff --git a/rfc822/rfc2047.h b/rfc822/rfc2047.h new file mode 100644 index 0000000..7b9f9a1 --- /dev/null +++ b/rfc822/rfc2047.h @@ -0,0 +1,87 @@ +#ifndef rfc2047_h +#define rfc2047_h + +#include <stdlib.h> +/* +** Copyright 1998 - 2009 Double Precision, Inc. See COPYING for +** distribution information. +*/ + +#ifdef __cplusplus +extern "C" { +#endif + + + +struct unicode_info; + +/* +** Raw RFC 2047 parser. +** +** rfc2047_decoder() repeatedly invokes the callback function, passing it +** the decoded RFC 2047 string that's given as an argument. +*/ + +int rfc2047_decoder(const char *text, + void (*callback)(const char *chset, + const char *lang, + const char *content, + size_t cnt, + void *dummy), + void *ptr); + +/* +** rfc2047_print_unicodeaddr is like rfc822_print, except that it converts +** RFC 2047 MIME encoding to 8 bit text. +*/ + +struct rfc822a; + +int rfc2047_print_unicodeaddr(const struct rfc822a *a, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), + void *ptr); + + +/* +** And now, let's encode something with RFC 2047. Encode the following +** string in the indicated character set, into a malloced buffer. Returns 0 +** if malloc failed. 
+*/ + +char *rfc2047_encode_str(const char *str, const char *charset, + int (*qp_allow)(char c) /* See below */); + + +/* Potential arguments for qp_allow */ + +int rfc2047_qp_allow_any(char); /* Any character */ +int rfc2047_qp_allow_comment(char); /* Any character except ( ) " */ +int rfc2047_qp_allow_word(char); /* See RFC2047, bottom of page 7 */ + + + +/* +** rfc2047_encode_header_tobuf allocates a buffer, and MIME-encodes a header. +** +** The name of the header, passed as the first parameter, should be +** "From", "To", "Subject", etc... It is not included in the encoded contents. +*/ +char *rfc2047_encode_header_tobuf(const char *name, /* Header name */ + const char *header, /* Header's contents */ + const char *charset); + +/* +** rfc2047_encode_header_addr allocates a buffer, and MIME-encodes an +** RFC822 address header. +** +*/ +char *rfc2047_encode_header_addr(const struct rfc822a *a, + const char *charset); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/rfc822/rfc2047u.c b/rfc822/rfc2047u.c new file mode 100644 index 0000000..c1bc48b --- /dev/null +++ b/rfc822/rfc2047u.c @@ -0,0 +1,1050 @@ +/* +** Copyright 1998 - 2009 Double Precision, Inc. See COPYING for +** distribution information. 
+*/ + +#include "rfc822.h" +#include <stdio.h> +#include <ctype.h> +#include <string.h> +#include <stdlib.h> +#include <errno.h> + +#include "rfc822hdr.h" +#include "rfc2047.h" +#include "../unicode/unicode.h" + +#if LIBIDN +#include <idna.h> +#include <stringprep.h> +#endif + + +static ssize_t rfc822_decode_rfc2047_atom(const char *str, + size_t cnt, + + void (*callback)(const char *, + const char *, + const char *, + size_t, + void *), + void *ptr); + +static int rfc2047_decode_unicode(const char *text, + const char *chset, + void (*callback)(const char *, size_t, + void *), + void *ptr); + +struct decode_unicode_s { + const char *mychset; + + char *bufptr; + size_t bufsize; +} ; + +static void save_unicode_text(const char *p, size_t l, void *ptr) +{ + struct decode_unicode_s *s= + (struct decode_unicode_s *)ptr; + + if (s->bufptr) + memcpy(s->bufptr+s->bufsize, p, l); + + s->bufsize += l; +} + +struct rfc822_display_name_s { + const char *chset; + void (*print_func)(const char *, size_t, void *); + void *ptr; +}; + +static void unknown_charset(const char *chset, + const char *tochset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + static const char unknown[]="[unknown character set: "; + + (*print_func)(unknown, sizeof(unknown)-1, ptr); + (*print_func)(chset, strlen(chset), ptr); + (*print_func)(" -> ", 4, ptr); + (*print_func)(tochset, strlen(tochset), ptr); + (*print_func)("]", 1, ptr); +} + +static void rfc822_display_addr_cb(const char *chset, + const char *lang, + const char *content, + size_t cnt, + void *dummy) +{ + struct rfc822_display_name_s *s= + (struct rfc822_display_name_s *)dummy; + char *ptr; + char *buf; + + buf=malloc(cnt+1); + + if (!buf) + return; + + memcpy(buf, content, cnt); + buf[cnt]=0; + + ptr=libmail_u_convert_tobuf(buf, chset, s->chset, NULL); + free(buf); + + if (ptr) + { + (*s->print_func)(ptr, strlen(ptr), s->ptr); + free(ptr); + } + else + { + unknown_charset(chset, s->chset, s->print_func, s->ptr); + 
return; + } +} + +static +int rfc822_display_name_int(const struct rfc822a *rfcp, int index, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + struct rfc822_display_name_s s; + const struct rfc822addr *addrs; + + struct rfc822token *i; + int prev_isatom=0; + int isatom=0; + ssize_t rc; + + if (index < 0 || index >= rfcp->naddrs) return 0; + + addrs=rfcp->addrs+index; + + if (!addrs->name) + return rfc822_display_addr(rfcp, index, chset, print_func, ptr); + + if (chset == NULL) + { + s.chset="iso-8859-1"; + } + else + { + s.chset=chset; + } + + s.print_func=print_func; + s.ptr=ptr; + + for (i=addrs->name; i; i=i->next, prev_isatom=isatom) + { + isatom=rfc822_is_atom(i->token); + if (isatom && prev_isatom) + (*print_func)(" ", 1, ptr); + + if (i->token == '"' || i->token == '(') + { + size_t l=i->len; + char *p, *q, *r; + + if (i->token == '(') + { + if (l > 2) + l -= 2; + else + l=0; + } + + p=malloc(l+1); + + if (!p) + return -1; + + if (l) + { + if (i->token == '(') + { + memcpy(p, i->ptr+1, l); + } + else + { + memcpy(p, i->ptr, l); + } + } + + + p[l]=0; + + for (q=r=p; *q; *r++ = *q++) + if (*q == '\\' && q[1]) + ++q; + + *r=0; + + if (chset == NULL) + { + (*print_func)(p, strlen(p), ptr); + } + else if (rfc822_display_hdrvalue("subject", + p, s.chset, + print_func, + NULL, ptr) < 0) + { + free(p); + return -1; + } + free(p); + continue; + } + + if (i->token) + { + char c= (char)i->token; + + (*print_func)(&c, 1, ptr); + continue; + } + + rc=chset ? 
rfc822_decode_rfc2047_atom(i->ptr, i->len, + rfc822_display_addr_cb, + &s):0; + + if (rc < 0) + return -1; + + if (rc == 0) + { + (*print_func)(i->ptr, i->len, ptr); + continue; + } + + if (i->next && i->next->token == 0) + { + rc=rfc822_decode_rfc2047_atom(i->next->ptr, + i->next->len, + NULL, NULL); + + if (rc < 0) + return -1; + + if (rc > 0) + isatom=0; /* Suppress the separating space */ + } + } + return 0; +} + +int rfc822_display_name(const struct rfc822a *rfcp, int index, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + const struct rfc822addr *addrs; + + if (index < 0 || index >= rfcp->naddrs) return 0; + + addrs=rfcp->addrs+index; + + if (!addrs->tokens) + return 0; + + return rfc822_display_name_int(rfcp, index, chset, + print_func, ptr); +} + +char *rfc822_display_name_tobuf(const struct rfc822a *rfcp, int index, + const char *chset) +{ + struct decode_unicode_s s; + char *p; + + s.bufptr=0; + s.bufsize=1; + + if (rfc822_display_name(rfcp, index, chset, save_unicode_text, &s) < 0) + return NULL; + s.bufptr=p=malloc(s.bufsize); + if (!p) + return (0); + + s.bufsize=0; + if (rfc822_display_name(rfcp, index, chset, save_unicode_text, &s) < 0) + { + free(s.bufptr); + return (0); + } + save_unicode_text("", 1, &s); + + return (p); +} + +int rfc822_display_namelist(const struct rfc822a *rfcp, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + int n; + + for (n=0; n<rfcp->naddrs; n++) + { + if (rfcp->addrs[n].tokens) + { + int err=rfc822_display_name(rfcp, n, chset, + print_func, ptr); + + if (err < 0) + return err; + + (*print_func)("\n", 1, ptr); + } + } + return 0; +} + +int rfc822_display_addr_str(const char *tok, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + const char *p; + + p=strchr(tok,'@'); + + if (!p) + p=tok; + else + ++p; + + if (chset != NULL) + { + int err=0; + char *utf8_ptr; + + if (p > tok) + (*print_func)(tok, p-tok, 
ptr); + +#if LIBIDN + /* + ** Invalid UTF-8 can make libidn go off the deep end. Add + ** padding as a workaround. + */ + { + size_t s=strlen(p)+16; + char *cpy=malloc(s); + + if (!cpy) + return 0; + memset(cpy, 0, s); + strcpy(cpy, p); + + err=idna_to_unicode_8z8z(cpy, &utf8_ptr, 0); + free(cpy); + } + + if (err != IDNA_SUCCESS) + utf8_ptr=0; +#else + utf8_ptr=0; +#endif + + if (utf8_ptr == 0) + (*print_func)(p, strlen(p), ptr); + else + { + char *q=libmail_u_convert_tobuf(utf8_ptr, + "utf-8", + chset, NULL); + if (q) + { + (*print_func)(q, strlen(q), ptr); + free(q); + } + else + { + (*print_func)(p, strlen(p), ptr); + } + free(utf8_ptr); + } + } + else + { + (*print_func)(tok, strlen(tok), ptr); + } + return 0; +} + +int rfc822_display_addr(const struct rfc822a *rfcp, int index, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr) +{ + const struct rfc822addr *addrs; + char *tok; + int rc; + + if (index < 0 || index >= rfcp->naddrs) return 0; + + addrs=rfcp->addrs+index; + + if (!addrs->tokens) + return 0; + + tok=rfc822_gettok(addrs->tokens); + + if (!tok) + return 0; + + rc=rfc822_display_addr_str(tok, chset, print_func, ptr); + free(tok); + return rc; +} + +int rfc2047_print_unicodeaddr(const struct rfc822a *a, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), + void *ptr) +{ + const char *sep=NULL; + int n; + + for (n=0; n<a->naddrs; ++n) + { + struct decode_unicode_s nbuf; + const struct rfc822addr *addrs; + size_t i=0; + char *cpbuf; + int need_braces=0; + + addrs=a->addrs+n; + + nbuf.bufptr=0; + nbuf.bufsize=1; + + if (rfc822_display_name_int(a, n, charset, + save_unicode_text, &nbuf) < 0) + return -1; + + nbuf.bufptr=malloc(nbuf.bufsize); + nbuf.bufsize=0; + if (!nbuf.bufptr) + return -1; + + if (rfc822_display_name_int(a, n, charset, + save_unicode_text, &nbuf) < 0) + { + free(nbuf.bufptr); + return -1; + } + nbuf.bufptr[nbuf.bufsize]=0; + + if (addrs->tokens == 
0) + { + size_t i; + + if (nbuf.bufsize == 1) /* ; */ + sep=0; + + if (sep) + (*print_separator)(sep, ptr); + + for (i=0; i<nbuf.bufsize; ++i) + (*print_func)(nbuf.bufptr[i], ptr); + free(nbuf.bufptr); + if (nbuf.bufsize > 1) + (*print_separator)(" ", ptr); + sep=NULL; + continue; + } + if (sep) + (*print_separator)(sep, ptr); + + if (!addrs->name) + { + nbuf.bufsize=0; + nbuf.bufptr[0]=0; + } + + for (i=0; i<nbuf.bufsize; i++) + if (strchr(RFC822_SPECIALS, nbuf.bufptr[i])) + break; + + cpbuf=libmail_u_convert_tobuf(nbuf.bufptr, "utf-8", charset, + NULL); + + if (!cpbuf) + { + const char *errmsg="\"(unknown character set)\""; + + while (*errmsg) + (*print_func)(*errmsg++, ptr); + need_braces=1; + } + else + { + if (i < nbuf.bufsize) + { + (*print_func)('"', ptr); + + for (i=0; cpbuf[i]; ++i) + { + if (cpbuf[i] == '\\' || + cpbuf[i] == '"') + (*print_func)('\\', ptr); + (*print_func)(cpbuf[i], ptr); + } + (*print_func)('"', ptr); + need_braces=1; + } + else + { + for (i=0; cpbuf[i]; ++i) + { + need_braces=1; + (*print_func)(cpbuf[i], ptr); + } + } + + free(cpbuf); + } + free(nbuf.bufptr); + + if (need_braces) + { + (*print_func)(' ', ptr); + (*print_func)('<', ptr); + } + + nbuf.bufptr=0; + nbuf.bufsize=1; + + if (rfc822_display_addr(a, n, charset, + save_unicode_text, &nbuf) < 0) + return -1; + + nbuf.bufptr=malloc(nbuf.bufsize); + nbuf.bufsize=0; + if (!nbuf.bufptr) + return -1; + + if (rfc822_display_addr(a, n, charset, + save_unicode_text, &nbuf) < 0) + { + free(nbuf.bufptr); + return -1; + } + for (i=0; i<nbuf.bufsize; i++) + (*print_func)(nbuf.bufptr[i], ptr); + + free(nbuf.bufptr); + + if (need_braces) + (*print_func)('>', ptr); + sep=", "; + } + + return 0; +} + +static int rfc2047_print_unicode_addrstr(const char *addrheader, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), + void (*err_func)(const char *, int, void *), + void *ptr) +{ + struct rfc822t *t; + struct rfc822a *a; + int rc; + + 
t=rfc822t_alloc_new(addrheader, err_func, ptr); + + if (!t) + return -1; + + a=rfc822a_alloc(t); + + if (!a) + { + rfc822t_free(t); + return -1; + } + rc=rfc2047_print_unicodeaddr(a, charset, print_func, print_separator, + ptr); + rfc822a_free(a); + rfc822t_free(t); + return (rc); +} + +struct rfc822_display_hdrvalue_s { + + void (*display_func)(const char *, size_t, void *); + void *ptr; +}; + +static void rfc822_display_hdrvalue_print_func(char c, void *ptr) +{ + struct rfc822_display_hdrvalue_s *s= + (struct rfc822_display_hdrvalue_s *)ptr; + + (*s->display_func)(&c, 1, s->ptr); +} + +static void rfc822_display_hdrvalue_print_separator(const char *cp, void *ptr) +{ + struct rfc822_display_hdrvalue_s *s= + (struct rfc822_display_hdrvalue_s *)ptr; + + (*s->display_func)(cp, strlen(cp), s->ptr); + (*s->display_func)("", 0, s->ptr); /* Signal wrap point */ +} + +int rfc822_display_hdrvalue(const char *hdrname, + const char *hdrvalue, + const char *charset, + void (*display_func)(const char *, size_t, + void *), + void (*err_func)(const char *, int, void *), + void *ptr) +{ + struct rfc822_display_hdrvalue_s s; + + s.display_func=display_func; + s.ptr=ptr; + + if (rfc822hdr_is_addr(hdrname)) + { + return rfc2047_print_unicode_addrstr(hdrvalue, + charset, + rfc822_display_hdrvalue_print_func, + rfc822_display_hdrvalue_print_separator, + NULL, + &s); + } + + return rfc2047_decode_unicode(hdrvalue, charset, display_func, ptr); +} + +struct rfc822_display_hdrvalue_tobuf_s { + void (*orig_err_func)(const char *, int, void *); + void *orig_ptr; + + size_t cnt; + char *buf; +}; + +static void rfc822_display_hdrvalue_tobuf_cnt(const char *ptr, size_t cnt, + void *s) +{ + ((struct rfc822_display_hdrvalue_tobuf_s *)s)->cnt += cnt; +} + +static void rfc822_display_hdrvalue_tobuf_save(const char *ptr, size_t cnt, + void *s) +{ + if (cnt) + memcpy(((struct rfc822_display_hdrvalue_tobuf_s *)s)->buf, + ptr, cnt); + + ((struct rfc822_display_hdrvalue_tobuf_s *)s)->buf += cnt; +} + 
+static void rfc822_display_hdrvalue_tobuf_errfunc(const char *ptr, int index, + void *s) +{ + void (*f)(const char *, int, void *)= + ((struct rfc822_display_hdrvalue_tobuf_s *)s)->orig_err_func; + + if (f) + f(ptr, index, + ((struct rfc822_display_hdrvalue_tobuf_s *)s)->orig_ptr); +} + +char *rfc822_display_addr_tobuf(const struct rfc822a *rfcp, int index, + const char *chset) +{ + struct rfc822_display_hdrvalue_tobuf_s nbuf; + int errcode; + char *ptr; + + nbuf.buf=0; + nbuf.cnt=1; + + errcode=rfc822_display_addr(rfcp, index, chset, + rfc822_display_hdrvalue_tobuf_cnt, &nbuf); + + if (errcode < 0) + return NULL; + + ptr=nbuf.buf=malloc(nbuf.cnt); + nbuf.cnt=0; + if (!ptr) + return NULL; + + errcode=rfc822_display_addr(rfcp, index, chset, + rfc822_display_hdrvalue_tobuf_save, &nbuf); + + if (errcode < 0) + { + free(nbuf.buf); + return NULL; + } + *nbuf.buf=0; + return ptr; +} + +char *rfc822_display_hdrvalue_tobuf(const char *hdrname, + const char *hdrvalue, + const char *charset, + void (*err_func)(const char *, int, + void *), + void *ptr) +{ + struct rfc822_display_hdrvalue_tobuf_s s; + int errcode; + char *bufptr; + + s.orig_err_func=err_func; + s.orig_ptr=ptr; + s.cnt=1; + + errcode=rfc822_display_hdrvalue(hdrname, hdrvalue, charset, + rfc822_display_hdrvalue_tobuf_cnt, + rfc822_display_hdrvalue_tobuf_errfunc, + &s); + + if (errcode < 0) + return NULL; + + bufptr=s.buf=malloc(s.cnt); + + if (!bufptr) + return NULL; + + errcode=rfc822_display_hdrvalue(hdrname, hdrvalue, charset, + rfc822_display_hdrvalue_tobuf_save, + rfc822_display_hdrvalue_tobuf_errfunc, + &s); + if (errcode) + { + free(bufptr); + return NULL; + } + *s.buf=0; + return bufptr; +} + +char *rfc822_display_addr_str_tobuf(const char *tok, const char *chset) +{ + struct rfc822_display_hdrvalue_tobuf_s s; + int errcode; + char *bufptr; + + s.cnt=1; + + errcode=rfc822_display_addr_str(tok, chset, + rfc822_display_hdrvalue_tobuf_cnt, + &s); + + if (errcode < 0) + return NULL; + + 
bufptr=s.buf=malloc(s.cnt); + + if (!bufptr) + return NULL; + + errcode=rfc822_display_addr_str(tok, chset, + rfc822_display_hdrvalue_tobuf_save, + &s); + if (errcode < 0) + { + free(bufptr); + return NULL; + } + *s.buf=0; + return bufptr; +} + + +static const char xdigit[]="0123456789ABCDEFabcdef"; + +static const unsigned char decode64tab[]={ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 62, 0, 0, 0, 63, + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 0, 0, 0, 99, 0, 0, + 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, + 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 0, 0, 0, 0, 0, + 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, + 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 +}; + +static int nyb(int c) +{ + const char *p; + int n; + + p=strchr(xdigit, c); + + if (!p) + return 0; + + n=p-xdigit; + + if (n > 15) + n -= 6; + + return n; +} + +static size_t decodebase64(const char *ptr, size_t cnt, + char *dec_buf) +{ + size_t i, j; + char a,b,c; + size_t k; + + i=cnt / 4; + i=i*4; + k=0; + for (j=0; j<i; j += 4) + { + int w=decode64tab[(int)(unsigned char)ptr[j]]; + int x=decode64tab[(int)(unsigned char)ptr[j+1]]; + int y=decode64tab[(int)(unsigned char)ptr[j+2]]; + int z=decode64tab[(int)(unsigned char)ptr[j+3]]; + + a= (w << 2) | (x >> 4); + b= (x << 4) | (y >> 2); + c= (y << 6) | z; + dec_buf[k++]=a; + if ( ptr[j+2] != '=') + dec_buf[k++]=b; + if ( ptr[j+3] != '=') + dec_buf[k++]=c; + } + return (k); +} + + +static ssize_t rfc822_decode_rfc2047_atom(const 
char *str, + size_t cnt, + + void (*callback)(const char *, + const char *, + const char *, + size_t, + void *), + void *ptr) +{ + const char *chset_str; + const char *enc_str; + const char *content_str; + + char *chset; + char *lang; + + char *content; + + size_t i; + size_t j; + size_t k; + + size_t content_len; + + if (cnt < 2 || str[0] != '=' || str[1] != '?') + return 0; + + chset_str=str+2; + + for (i=2; i<cnt; i++) + if (str[i] == '?') + break; + + if (i >= cnt) + return 0; + + enc_str= str + ++i; + + for (; i < cnt; i++) + if (str[i] == '?') + break; + + if (i >= cnt) + return 0; + + content_str= str + ++i; + + while (1) + { + if (cnt-i < 2) + return 0; + + if (str[i] == '?' && str[i+1] == '=') + break; + ++i; + } + + for (j=0; chset_str[j] != '?'; ++j) + ; + + chset=malloc(j+1); + + if (!chset) + return -1; + + memcpy(chset, chset_str, j); + chset[j]=0; + + lang=strchr(chset, '*'); /* RFC 2231 */ + + if (lang) + *lang++ = 0; + else + lang=""; + + content_len=str + i - content_str; + + content=malloc(content_len+1); + + if (!content) + { + free(chset); + return -1; + } + + switch (*enc_str) { + case 'q': + case 'Q': + + k=0; + for (j=0; j<content_len; j++) + { + char c; + + if (content_str[j] == '=' && i-j >= 3) + { + content[k]=(char)(nyb(content_str[j+1])*16 + + nyb(content_str[j+2])); + ++k; + j += 2; + continue; + } + + c=content_str[j]; + if (c == '_') + c=' '; + content[k]=c; + ++k; + } + break; + + case 'b': + case 'B': + k=decodebase64(content_str, content_len, content); + break; + default: + free(content); + free(chset); + return (0); + } + + if (callback) + (*callback)(chset, lang, content, k, ptr); + free(content); + free(chset); + return i + 2; +} + +int rfc2047_decoder(const char *text, + void (*callback)(const char *chset, + const char *lang, + const char *content, + size_t cnt, + void *dummy), + void *ptr) +{ + ssize_t rc; + + while (text && *text) + { + size_t i; + + for (i=0; text[i]; i++) + { + if (text[i] == '=' && text[i+1] == '?') + 
break; + } + + if (i) + (*callback)("iso-8859-1", "", text, i, ptr); + + text += i; + + if (!*text) + continue; + + rc=rfc822_decode_rfc2047_atom(text, strlen(text), + callback, ptr); + + if (rc < 0) + return -1; + + if (rc == 0) + { + (*callback)("iso-8859-1", "", text, 2, ptr); + text += 2; + continue; + } + + text += rc; + + for (i=0; text[i]; i++) + { + if (strchr(" \t\r\n", text[i]) == NULL) + break; + } + + if (text[i] != '=' || text[i+1] != '?') + continue; + + rc=rfc822_decode_rfc2047_atom(text+i, strlen(text+i), NULL, + NULL); + + if (rc < 0) + return -1; + if (rc > 0) + text += i; + } + + return 0; +} + +static int rfc2047_decode_unicode(const char *text, + const char *chset, + void (*callback)(const char *, size_t, + void *), + void *ptr) +{ + struct rfc822_display_name_s s; + + s.chset=chset; + s.print_func=callback; + s.ptr=ptr; + + return rfc2047_decoder(text, rfc822_display_addr_cb, &s); +} diff --git a/rfc822/rfc822.c b/rfc822/rfc822.c new file mode 100644 index 0000000..c51460d --- /dev/null +++ b/rfc822/rfc822.c @@ -0,0 +1,826 @@ +/* +** Copyright 1998 - 2009 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +/* +*/ +#include "rfc822.h" +#include <stdio.h> +#include <ctype.h> +#include <stdlib.h> +#include <string.h> + +static void tokenize(const char *p, struct rfc822token *tokp, int *toklen, + void (*err_func)(const char *, int, void *), void *voidp) +{ +const char *addr=p; +int i=0; +int inbracket=0; + + *toklen=0; + while (*p) + { + if (isspace((int)(unsigned char)*p)) + { + p++; + i++; + continue; + } + +#define SPECIALS "<>@,;:.[]()%!\"\\?=/" + + switch (*p) { + int level; + + case '(': + if (tokp) + { + tokp->token='('; + tokp->ptr=p; + tokp->len=0; + } + level=0; + for (;;) + { + if (!*p) + { + if (err_func) (*err_func)(addr, i, + voidp); + if (tokp) tokp->token='"'; + ++*toklen; + return; + } + if (*p == '(') + ++level; + if (*p == ')' && --level == 0) + { + p++; + i++; + if (tokp) tokp->len++; + break; + } + if (*p == '\\' && p[1]) + { + p++; + i++; + if (tokp) tokp->len++; + } + + i++; + if (tokp) tokp->len++; + p++; + } + if (tokp) ++tokp; + ++*toklen; + continue; + + case '"': + p++; + i++; + + if (tokp) + { + tokp->token='"'; + tokp->ptr=p; + } + while (*p != '"') + { + if (!*p) + { + if (err_func) (*err_func)(addr, i, + voidp); + ++*toklen; + return; + } + if (*p == '\\' && p[1]) + { + if (tokp) tokp->len++; + p++; + i++; + } + if (tokp) tokp->len++; + p++; + i++; + } + ++*toklen; + if (tokp) ++tokp; + p++; + i++; + continue; + case '\\': + case ')': + if (err_func) (*err_func)(addr, i, voidp); + ++p; + ++i; + continue; + + case '=': + + if (p[1] == '?') + { + int j; + + /* exception: =? ... ?= */ + + for (j=2; p[j]; j++) + { + if (p[j] == '?' && p[j+1] == '=') + break; + + if (p[j] == '?' || p[j] == '=') + continue; + + if (strchr(RFC822_SPECIALS, p[j]) || + isspace(p[j])) + break; + } + + if (p[j] == '?' 
&& p[j+1] == '=') + { + j += 2; + if (tokp) + { + tokp->token=0; + tokp->ptr=p; + tokp->len=j; + ++tokp; + } + ++*toklen; + + p += j; + i += j; + continue; + } + } + /* FALLTHROUGH */ + + case '<': + case '>': + case '@': + case ',': + case ';': + case ':': + case '.': + case '[': + case ']': + case '%': + case '!': + case '?': + case '/': + + if ( (*p == '<' && inbracket) || + (*p == '>' && !inbracket)) + { + if (err_func) (*err_func)(addr, i, voidp); + ++p; + ++i; + continue; + } + + if (*p == '<') + inbracket=1; + + if (*p == '>') + inbracket=0; + + if (tokp) + { + tokp->token= *p; + tokp->ptr=p; + tokp->len=1; + ++tokp; + } + ++*toklen; + + if (*p == '<' && p[1] == '>') + /* Fake a null address */ + { + if (tokp) + { + tokp->token=0; + tokp->ptr=""; + tokp->len=0; + ++tokp; + } + ++*toklen; + } + ++p; + ++i; + continue; + default: + + if (tokp) + { + tokp->token=0; + tokp->ptr=p; + tokp->len=0; + } + while (*p && !isspace((int)(unsigned char)*p) && strchr( + SPECIALS, *p) == 0) + { + if (tokp) ++tokp->len; + ++p; + ++i; + } + if (i == 0) /* Idiot check */ + { + if (err_func) (*err_func)(addr, i, voidp); + if (tokp) + { + tokp->token='"'; + tokp->ptr=p; + tokp->len=1; + ++tokp; + } + ++*toklen; + ++p; + ++i; + continue; + } + if (tokp) ++tokp; + ++*toklen; + } + } +} + +static void parseaddr(struct rfc822token *tokens, int ntokens, + struct rfc822addr *addrs, int *naddrs) +{ +int flag, j, k; + + *naddrs=0; + + while (ntokens) + { + int i; + + /* atoms (token=0) or quoted strings, followed by a : token + is a list name. */ + + for (i=0; i<ntokens; i++) + if (tokens[i].token && tokens[i].token != '"') + break; + if (i < ntokens && tokens[i].token == ':') + { + ++i; + if (addrs) + { + addrs->tokens=0; + addrs->name=i ? 
tokens:0; + for (j=1; j<i; j++) + addrs->name[j-1].next=addrs->name+j; + if (i) + addrs->name[i-1].next=0; + addrs++; + } + ++*naddrs; + tokens += i; + ntokens -= i; + continue; /* Group=phrase ":" */ + } + + /* Spurious commas are skipped, ;s are recorded */ + + if (tokens->token == ',' || tokens->token == ';') + { + if (tokens->token == ';') + { + if (addrs) + { + addrs->tokens=0; + addrs->name=tokens; + addrs->name->next=0; + addrs++; + } + ++*naddrs; + } + ++tokens; + --ntokens; + continue; + } + + /* If we can find a '<' before the next comma or semicolon, + we have new style RFC path address */ + + for (i=0; i<ntokens && tokens[i].token != ';' && + tokens[i].token != ',' && + tokens[i].token != '<'; i++) + ; + + if (i < ntokens && tokens[i].token == '<') + { + int j; + + /* Ok -- what to do with the stuff before '>'??? + If it consists exclusively of atoms, leave them alone. + Else, make them all a quoted string. */ + + for (j=0; j<i && (tokens[j].token == 0 || + tokens[j].token == '('); j++) + ; + + if (j == i) + { + if (addrs) + { + addrs->name= i ? tokens:0; + for (k=1; k<i; k++) + addrs->name[k-1].next=addrs->name+k; + if (i) + addrs->name[i-1].next=0; + } + } + else /* Intentionally corrupt the original toks */ + { + if (addrs) + { + tokens->len= tokens[i-1].ptr + + tokens[i-1].len + - tokens->ptr; + /* We know that all the ptrs point + to parts of the same string. */ + tokens->token='"'; + /* Quoted string. */ + addrs->name=tokens; + addrs->name->next=0; + } + } + + /* Any comments in the name part are changed to quotes */ + + if (addrs) + { + struct rfc822token *t; + + for (t=addrs->name; t; t=t->next) + if (t->token == '(') + t->token='"'; + } + + /* Now that's done and over with, see what can + be done with the <...> part. */ + + ++i; + tokens += i; + ntokens -= i; + for (i=0; i<ntokens && tokens[i].token != '>'; i++) + ; + if (addrs) + { + addrs->tokens=i ? 
tokens:0; + for (k=1; k<i; k++) + addrs->tokens[k-1].next=addrs->tokens+k; + if (i) + addrs->tokens[i-1].next=0; + ++addrs; + } + ++*naddrs; + tokens += i; + ntokens -= i; + if (ntokens) /* Skip the '>' token */ + { + --ntokens; + ++tokens; + } + continue; + } + + /* Ok - old style address. Assume the worst */ + + /* Try to figure out where the address ends. It ends upon: + a comma, semicolon, or two consecutive atoms. */ + + flag=0; + for (i=0; i<ntokens && tokens[i].token != ',' && + tokens[i].token != ';'; i++) + { + if (tokens[i].token == '(') continue; + /* Ignore comments */ + if (tokens[i].token == 0 || tokens[i].token == '"') + /* Atom */ + { + if (flag) break; + flag=1; + } + else flag=0; + } + if (i == 0) /* Must be spurious comma, or something */ + { + ++tokens; + --ntokens; + continue; + } + + if (addrs) + { + addrs->name=0; + } + + /* Ok, now get rid of embedded comments in the address. + Consider the last comment to be the real name */ + + if (addrs) + { + struct rfc822token save_token; + + memset(&save_token, 0, sizeof(save_token)); + + for (j=k=0; j<i; j++) + { + if (tokens[j].token == '(') + { + save_token=tokens[j]; + continue; + } + tokens[k]=tokens[j]; + k++; + } + + if (save_token.ptr) + { + tokens[i-1]=save_token; + addrs->name=tokens+i-1; + addrs->name->next=0; + } + addrs->tokens=k ? 
tokens:NULL; + for (j=1; j<k; j++) + addrs->tokens[j-1].next=addrs->tokens+j; + if (k) + addrs->tokens[k-1].next=0; + ++addrs; + } + ++*naddrs; + tokens += i; + ntokens -= i; + } +} + +static void print_token(const struct rfc822token *token, + void (*print_func)(char, void *), void *ptr) +{ +const char *p; +int n; + + if (token->token == 0 || token->token == '(') + { + for (n=token->len, p=token->ptr; n; --n, ++p) + (*print_func)(*p, ptr); + return; + } + + if (token->token != '"') + { + (*print_func)(token->token, ptr); + return; + } + + (*print_func)('"', ptr); + n=token->len; + p=token->ptr; + while (n) + { + if (*p == '"' || (*p == '\\' && n == 1)) (*print_func)('\\', ptr); + if (*p == '\\' && n > 1) + { + (*print_func)('\\', ptr); + ++p; + --n; + } + (*print_func)(*p++, ptr); + --n; + } + (*print_func)('"', ptr); +} + +void rfc822tok_print(const struct rfc822token *token, + void (*print_func)(char, void *), void *ptr) +{ +int prev_isatom=0; +int isatom; + + while (token) + { + isatom=rfc822_is_atom(token->token); + if (prev_isatom && isatom) + (*print_func)(' ', ptr); + print_token(token, print_func, ptr); + prev_isatom=isatom; + token=token->next; + } +} + +static void rfc822_prname_int(const struct rfc822addr *addrs, + void (*print_func)(char, void *), + void *ptr) + +{ + struct rfc822token *i; + int n; + int prev_isatom=0; + int isatom=0; + + for (i=addrs->name; i; i=i->next, prev_isatom=isatom) + { + isatom=rfc822_is_atom(i->token); + if (isatom && prev_isatom) + (*print_func)(' ', ptr); + + if (i->token == '"') + { + for (n=0; n<i->len; n++) + { + if (i->ptr[n] == '\\' && + n + 1 < i->len) + ++n; + (*print_func)(i->ptr[n], ptr); + } + continue; + } + + if (i->token != '(') + { + print_token(i, print_func, ptr); + continue; + } + + for (n=2; n<i->len; n++) + (*print_func)(i->ptr[n-1], ptr); + } +} + +static void rfc822_print_common_nameaddr_cntlen(char c, void *p) +{ + ++ *(size_t *)p; +} + +static void rfc822_print_common_nameaddr_saveaddr(char c, void 
*p) +{ + char **cp=(char **)p; + + *(*cp)++=c; +} + +static int rfc822_print_common_nameaddr(const struct rfc822addr *addrs, + char *(*decode_func)(const char *, + const char *, int), + const char *chset, + void (*print_func)(char, void *), + void *ptr) +{ + size_t n=1; + char *addrbuf, *namebuf; + char *p, *q; + int print_braces=0; + + if (addrs->tokens) + rfc822tok_print(addrs->tokens, + rfc822_print_common_nameaddr_cntlen, &n); + + + p=addrbuf=malloc(n); + + if (!addrbuf) + return -1; + + if (addrs->tokens) + rfc822tok_print(addrs->tokens, + rfc822_print_common_nameaddr_saveaddr, &p); + + *p=0; + + n=1; + + rfc822_prname_int(addrs, + rfc822_print_common_nameaddr_cntlen, &n); + + p=namebuf=malloc(n); + + if (!p) + { + free(addrbuf); + return -1; + } + + rfc822_prname_int(addrs, + rfc822_print_common_nameaddr_saveaddr, &p); + + *p=0; + + p=(*decode_func)(namebuf, chset, 0); + + free(namebuf); + if (!p) + { + free(addrbuf); + return -1; + } + + for (namebuf=p; *p; p++) + { + print_braces=1; + (*print_func)(*p, ptr); + } + free(namebuf); + + p=(*decode_func)(addrbuf, chset, 1); + free(addrbuf); + + if (!p) + return -1; + + if (print_braces) + (*print_func)(' ', ptr); + + for (q=p; *q; ++q) + if (*q != '.' 
&& *q != '@' && strchr(RFC822_SPECIALS, *q)) + { + print_braces=1; + break; + } + + if (print_braces) + (*print_func)('<', ptr); + + for (addrbuf=p; *p; p++) + (*print_func)(*p, ptr); + + if (print_braces) + (*print_func)('>', ptr); + + free(addrbuf); + return (0); +} + +int rfc822_print(const struct rfc822a *rfcp, void (*print_func)(char, void *), + void (*print_separator)(const char *s, void *), void *ptr) +{ + return rfc822_print_common(rfcp, 0, 0, print_func, print_separator, ptr); +} + +int rfc822_print_common(const struct rfc822a *rfcp, + char *(*decode_func)(const char *, const char *, int), + const char *chset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), + void *ptr) +{ +const struct rfc822addr *addrs=rfcp->addrs; +int naddrs=rfcp->naddrs; + + while (naddrs) + { + if (addrs->tokens == 0) + { + rfc822tok_print(addrs->name, print_func, ptr); + ++addrs; + --naddrs; + if (addrs[-1].name && naddrs) + { + struct rfc822token *t; + + for (t=addrs[-1].name; t && t->next; t=t->next) + ; + + if (t && (t->token == ':' || t->token == ';')) + (*print_separator)(" ", ptr); + } + continue; + } + else if (addrs->name && addrs->name->token == '(') + { /* old style */ + + if (!decode_func) + { + rfc822tok_print(addrs->tokens, print_func, ptr); + (*print_func)(' ', ptr); + rfc822tok_print(addrs->name, print_func, ptr); + } + else + { + if (rfc822_print_common_nameaddr(addrs, + decode_func, + chset, + print_func, + ptr) < 0) + return -1; + } + } + else + { + if (!decode_func) + { + int print_braces=0; + + if (addrs->name) + { + rfc822tok_print(addrs->name, + print_func, ptr); + (*print_func)(' ', ptr); + print_braces=1; + } +#if 1 + else + { + struct rfc822token *p; + + for (p=addrs->tokens; p && p->next; p=p->next) + if (rfc822_is_atom(p->token) && + rfc822_is_atom(p->next->token)) + print_braces=1; + } +#endif + + if (print_braces) + (*print_func)('<', ptr); + + rfc822tok_print(addrs->tokens, print_func, ptr); + + if (print_braces) + 
(*print_func)('>', ptr); + } + else + { + if (rfc822_print_common_nameaddr(addrs, + decode_func, + chset, + print_func, + ptr) < 0) + return -1; + } + } + ++addrs; + --naddrs; + if (naddrs) + if (addrs->tokens || (addrs->name && + rfc822_is_atom(addrs->name->token))) + (*print_separator)(", ", ptr); + } + return 0; +} + +void rfc822t_free(struct rfc822t *p) +{ + if (p->tokens) free(p->tokens); + free(p); +} + +void rfc822a_free(struct rfc822a *p) +{ + if (p->addrs) free(p->addrs); + free(p); +} + +void rfc822_deladdr(struct rfc822a *rfcp, int index) +{ +int i; + + if (index < 0 || index >= rfcp->naddrs) return; + + for (i=index+1; i<rfcp->naddrs; i++) + rfcp->addrs[i-1]=rfcp->addrs[i]; + if (--rfcp->naddrs == 0) + { + free(rfcp->addrs); + rfcp->addrs=0; + } +} + +struct rfc822t *rfc822t_alloc_new(const char *addr, + void (*err_func)(const char *, int, void *), void *voidp) +{ +struct rfc822t *p=(struct rfc822t *)malloc(sizeof(struct rfc822t)); + + if (!p) return (NULL); + memset(p, 0, sizeof(*p)); + + tokenize(addr, NULL, &p->ntokens, err_func, voidp); + p->tokens=p->ntokens ? (struct rfc822token *) + calloc(p->ntokens, sizeof(struct rfc822token)):0; + if (p->ntokens && !p->tokens) + { + rfc822t_free(p); + return (NULL); + } + tokenize(addr, p->tokens, &p->ntokens, NULL, NULL); + return (p); +} + +struct rfc822a *rfc822a_alloc(struct rfc822t *t) +{ +struct rfc822a *p=(struct rfc822a *)malloc(sizeof(struct rfc822a)); + + if (!p) return (NULL); + memset(p, 0, sizeof(*p)); + + parseaddr(t->tokens, t->ntokens, NULL, &p->naddrs); + p->addrs=p->naddrs ? 
(struct rfc822addr *) + calloc(p->naddrs, sizeof(struct rfc822addr)):0; + if (p->naddrs && !p->addrs) + { + rfc822a_free(p); + return (NULL); + } + parseaddr(t->tokens, t->ntokens, p->addrs, &p->naddrs); + return (p); +} diff --git a/rfc822/rfc822.h b/rfc822/rfc822.h new file mode 100644 index 0000000..3d437e3 --- /dev/null +++ b/rfc822/rfc822.h @@ -0,0 +1,296 @@ +/* +*/ +#ifndef rfc822_h +#define rfc822_h + +/* +** Copyright 1998 - 2009 Double Precision, Inc. +** See COPYING for distribution information. +*/ + +#if HAVE_CONFIG_H +#include "rfc822/config.h" +#endif + +#include <time.h> + +#ifdef __cplusplus +extern "C" { +#endif + +#define RFC822_SPECIALS "()<>[]:;@\\,.\"" + +/* +** The text string we want to parse is first tokenized into an array of +** struct rfc822token records. 'ptr' points into the original text +** string, and 'len' has how many characters from 'ptr' belongs to this +** token. +*/ + +struct rfc822token { + struct rfc822token *next; /* Unused by librfc822, for use by + ** clients */ + int token; +/* + Values for token: + + '(' - comment + '"' - quoted string + '<', '>', '@', ',', ';', ':', '.', '[', ']', '%', '!', '=', '?', '/' - RFC atoms. + 0 - atom +*/ + +#define rfc822_is_atom(p) ( (p) == 0 || (p) == '"' || (p) == '(' ) + + const char *ptr; /* Pointer to value for the token. */ + int len; /* Length of token value */ +} ; + +/* +** After the struct rfc822token array is built, it is used to create +** the rfc822addr array, which is the array of addresses (plus +** syntactical fluff) extracted from those text strings. Each rfc822addr +** record has several possible interpretation: +** +** tokens is NULL - syntactical fluff, look in name/nname for tokens +** representing the syntactical fluff ( which is semicolons +** and list name: +** +** tokens is not NULL - actual address. The tokens representing the actual +** address is in tokens/ntokens. 
If there are comments in +** the address that are possible "real name" for the address +** they are saved in name/nname (name may be null if there +** is none). +** If nname is 1, and name points to a comment token, +** the address was specified in old-style format. Otherwise +** the address was specified in new-style route-addr format. +** +** The tokens and name pointers are set to point to the original rfc822token +** array. +*/ + +struct rfc822addr { + struct rfc822token *tokens; + struct rfc822token *name; +} ; + +/*************************************************************************** +** +** rfc822 tokens +** +***************************************************************************/ + +struct rfc822t { + struct rfc822token *tokens; + int ntokens; +} ; + +struct rfc822t *rfc822t_alloc_new(const char *p, + void (*err_func)(const char *, int, void *), void *); + /* Parse addresses */ + +void rfc822t_free(struct rfc822t *); /* Free rfc822 structure */ + +void rfc822tok_print(const struct rfc822token *, void (*)(char, void *), void *); + /* Print the tokens */ + +/*************************************************************************** +** +** rfc822 addresses +** +***************************************************************************/ + +struct rfc822a { + struct rfc822addr *addrs; + int naddrs; +} ; + +struct rfc822a *rfc822a_alloc(struct rfc822t *); +void rfc822a_free(struct rfc822a *); /* Free rfc822 structure */ + +void rfc822_deladdr(struct rfc822a *, int); + +/* rfc822_print "unparses" the rfc822 structure. Each rfc822addr is "printed" + (via the attached function). NOTE: instead of separating addresses by + commas, the print_separator function is called. 
+*/ + +int rfc822_print(const struct rfc822a *a, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *); + +/* rfc822_print_common is an internal function */ + +int rfc822_print_common(const struct rfc822a *a, + char *(*decode_func)(const char *, const char *, int), + const char *chset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *); + +/* Extra functions */ + +char *rfc822_gettok(const struct rfc822token *); +char *rfc822_getaddr(const struct rfc822a *, int); +char *rfc822_getaddrs(const struct rfc822a *); +char *rfc822_getaddrs_wrap(const struct rfc822a *, int); + +void rfc822_mkdate_buf(time_t, char *); +const char *rfc822_mkdate(time_t); +time_t rfc822_parsedt(const char *); + +#define CORESUBJ_RE 1 +#define CORESUBJ_FWD 2 + +char *rfc822_coresubj(const char *, int *); +char *rfc822_coresubj_nouc(const char *, int *); +char *rfc822_coresubj_keepblobs(const char *s); + +/* +** Display a header. Takes a raw header value, and formats it for display +** in the given character set. +** +** hdrname -- header name. Determines whether the header contains addresses, +** or unstructured data. +** +** hdrvalue -- the actual value to format. +** +** display_func -- output function. +** +** err_func -- if this function returns a negative value, to indicate an error, +** this may be called just prior to the error return to indicate where the +** formatting error is, in the original header. +** +** ptr -- passthrough last argument to display_func or err_func. +** +** repeatedly invokes display_func to pass the formatted contents. +** +** Returns 0 upon success, -1 upon a failure. 
+*/ + +int rfc822_display_hdrvalue(const char *hdrname, + const char *hdrvalue, + const char *charset, + void (*display_func)(const char *, size_t, + void *), + void (*err_func)(const char *, int, void *), + void *ptr); + +/* +** Like rfc822_display_hdrvalue, except that the converted header is saved in +** a malloc-ed buffer. The pointer to the malloc-ed buffer is returned, the +** caller is responsible for free-ing it. An error condition is indicated +** by a NULL return value. +*/ + +char *rfc822_display_hdrvalue_tobuf(const char *hdrname, + const char *hdrvalue, + const char *charset, + void (*err_func)(const char *, int, + void *), + void *ptr); + +/* +** Display a recipient's name in a specific character set. +** +** The index-th recipient in the address structure is formatted for the given +** character set. If the index-th entry in the address structure is not +** a recipient address (it represents an obsolete list name indicator), +** this function reproduces it literally. +** +** If the index-th entry in the address structure is a recipient address without +** a name, the address itself is formatted for the given character set. +** +** If 'charset' is NULL, the name is formatted as is, without converting +** it to any character set. +** +** A callback function gets repeatedly invoked to produce the name. +** +** Returns a negative value upon a formatting error. +*/ + +int rfc822_display_name(const struct rfc822a *rfcp, int index, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr); + +/* +** Display a recipient's name in a specific character set. +** +** Uses rfc822_display_name to place the generated name into a malloc-ed +** buffer. The caller must free it when it is no longer needed. +** +** Returns NULL upon an error. +*/ + +char *rfc822_display_name_tobuf(const struct rfc822a *rfcp, int index, + const char *chset); + +/* +** Display names of all addresses. Each name is followed by a newline +** character. 
+** +*/ +int rfc822_display_namelist(const struct rfc822a *rfcp, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr); + +/* +** Display a recipient's address in a specific character set. +** +** The index-th recipient in the address structure is formatted for the given +** character set. If the index-th entry in the address structure is not +** a recipient address (it represents an obsolete list name indicator), +** this function produces an empty string. +** +** If 'charset' is NULL, the address is formatted as is, without converting +** it to any character set. +** +** A callback function gets repeatedly invoked to produce the address. +** +** Returns a negative value upon a formatting error. +*/ + +int rfc822_display_addr(const struct rfc822a *rfcp, int index, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr); + +/* +** Like rfc822_display_addr, but the resulting displayable string is +** saved in a buffer. Returns a malloc-ed buffer, the caller is responsible +** for free()ing it. A NULL return indicates an error. +*/ + +char *rfc822_display_addr_tobuf(const struct rfc822a *rfcp, int index, + const char *chset); + +/* +** Like rfc822_display_addr, but the user@domain gets supplied in a string. +*/ +int rfc822_display_addr_str(const char *tok, + const char *chset, + void (*print_func)(const char *, size_t, void *), + void *ptr); + +/* +** Like rfc822_display_addr_str, but the resulting displayable string is +** saved in a buffer. Returns a malloc-ed buffer, the caller is responsible +** for free()ing it. A NULL return indicates an error. +*/ +char *rfc822_display_addr_str_tobuf(const char *tok, + const char *chset); + +/* +** address is a hostname, which is IDN-encoded. 'address' may contain an +** optional 'user@', which is preserved. Returns a malloc-ed buffer, the +** caller is responsible for freeing it. 
+*/ +char *rfc822_encode_domain(const char *address, + const char *charset); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/rfc822/rfc822.sgml b/rfc822/rfc822.sgml new file mode 100644 index 0000000..f0d7c93 --- /dev/null +++ b/rfc822/rfc822.sgml @@ -0,0 +1,625 @@ +<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"> +<!-- Copyright 2001-2007 Double Precision, Inc. See COPYING for --> +<!-- distribution information. --> +<refentry> + <info><author><firstname>Sam</firstname><surname>Varshavchik</surname><contrib>Author</contrib></author><productname>Courier Mail Server</productname></info> + + <refmeta> + <refentrytitle>rfc822</refentrytitle> + <manvolnum>3</manvolnum> + <refmiscinfo class='manual'>Double Precision, Inc.</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>rfc822</refname> + <refpurpose>RFC 822 parsing library</refpurpose> + </refnamediv> + + <refsynopsisdiv> + + <informalexample> + <programlisting format="linespecific"> +#include <rfc822.h> + +#include <rfc2047.h> + +cc ... -lrfc822 +</programlisting> + </informalexample> + </refsynopsisdiv> + + <refsect1> + <title>DESCRIPTION</title> + + <para> +The rfc822 library provides functions for parsing E-mail headers in the RFC +822 format. This library also includes some functions to help with encoding +and decoding 8-bit text, as defined by RFC 2047.</para> + + <para> +The format used by E-mail headers to encode sender and recipient +information is defined by +<ulink url="http://www.rfc-editor.org/rfc/rfc822.txt">RFC 822</ulink> +(and its successor, +<ulink url="http://www.rfc-editor.org/rfc/rfc2822.txt">RFC 2822</ulink>). 
+The format allows the actual E-mail +address and the sender/recipient name to be expressed together, for example: +<literal moreinfo="none">John Smith <jsmith@example.com></literal></para> + + <para> +The main purposes of the rfc822 library are to:</para> + + <para> +1) Parse a text string containing a list of RFC 822-formatted addresses into +its logical components: names and E-mail addresses.</para> + + <para> +2) Access those individual components.</para> + + <para> +3) Allow some limited modifications of the parsed structure, and then +convert it back into a text string.</para> + + <refsect2> + <title>Tokenizing an E-mail header</title> + + <informalexample> + <programlisting format="linespecific"> +struct rfc822t *tokens=rfc822t_alloc_new(const char *header, + void (*err_func)(const char *, int, void *), + void *func_arg); + +void rfc822t_free(tokens); +</programlisting> + </informalexample> + + <para> +The <function moreinfo="none">rfc822t_alloc_new</function>() function (supersedes +<function moreinfo="none">rfc822t_alloc</function>(), which is now +obsolete) accepts an E-mail <parameter moreinfo="none">header</parameter>, and parses it into +individual tokens. This function allocates and returns a pointer to an +<structname>rfc822t</structname> +structure, which is later used by +<function moreinfo="none">rfc822a_alloc</function>() to extract +individual addresses from these tokens.</para> + + <para> +The <parameter moreinfo="none">err_func</parameter> argument, if not NULL, is a pointer +to a callback +function. The function is called in the event that the E-mail header is +corrupted to the point that it cannot even be parsed. This is a rare instance +-- most forms of corruption are still valid at least on the lexical level. +The only time this error is reported is in the event of mismatched +parentheses, angle brackets, or quotes.
The callback function receives the +<parameter moreinfo="none">header</parameter> pointer, an index to the syntax error in the +header string, and the <parameter moreinfo="none">func_arg</parameter> argument.</para> + + <para> +The semantics of <parameter moreinfo="none">err_func</parameter> are subject to change. It is recommended +to leave this argument as NULL in the current version of the library.</para> + + <para> +<function moreinfo="none">rfc822t_alloc</function>() returns a pointer to a +dynamically-allocated <structname>rfc822t</structname> +structure. A NULL pointer is returned if there's insufficient memory to +allocate this structure. The <function moreinfo="none">rfc822t_free</function>() function +destroys the +<structname>rfc822t</structname> structure and frees all +dynamically allocated memory.</para> + + <note> + <para> +Until <function moreinfo="none">rfc822t_free</function>() is called, the contents of +<parameter moreinfo="none">header</parameter> MUST +NOT be destroyed or altered in any way. The contents of +<parameter moreinfo="none">header</parameter> are not +modified by <function moreinfo="none">rfc822t_alloc</function>(); however, the +<structname>rfc822t</structname> structure contains +pointers to portions of the supplied <parameter moreinfo="none">header</parameter>, +and they must remain valid.</para> + </note> + </refsect2> + + <refsect2> + <title>Extracting E-mail addresses</title> + + <informalexample> + <programlisting format="linespecific"> +struct rfc822a *addrs=rfc822a_alloc(struct rfc822t *tokens); + +void rfc822a_free(addrs); +</programlisting> + </informalexample> + + <para> +The <function moreinfo="none">rfc822a_alloc</function>() function returns a +dynamically-allocated <structname>rfc822a</structname> +structure that contains individual addresses that were logically parsed +from a <structname>rfc822t</structname> structure.
The +<function moreinfo="none">rfc822a_alloc</function>() function returns NULL if +there was insufficient memory to allocate the <structname>rfc822a</structname> structure. The +<function moreinfo="none">rfc822a_free</function>() function destroys the <structname>rfc822a</structname> structure, and frees all +associated dynamically-allocated memory. The <structname>rfc822t</structname> structure passed +to <function moreinfo="none">rfc822a_alloc</function>() must not be destroyed before <function moreinfo="none">rfc822a_free</function>() destroys the +<structname>rfc822a</structname> structure.</para> + + <para> +The <structname>rfc822a</structname> structure has the following fields:</para> + <informalexample> + <programlisting format="linespecific"> +struct rfc822a { + struct rfc822addr *addrs; + int naddrs; +} ; +</programlisting> + </informalexample> + + <para> +The <structfield>naddrs</structfield> field gives the number of +<structname>rfc822addr</structname> structures +that are pointed to by <structfield>addrs</structfield>, which is an array. +Each <structname>rfc822addr</structname> +structure represents either an address found in the original E-mail header, +<emphasis>or the contents of some legacy "syntactical sugar"</emphasis>. +For example, the +following is a valid E-mail header:</para> + + <informalexample> + <programlisting format="linespecific"> +To: recipient-list: tom@example.com, john@example.com; +</programlisting> + </informalexample> + + <para>Typically, all of this, except for "<literal moreinfo="none">To:</literal>", +is tokenized by <function moreinfo="none">rfc822t_alloc</function>(), then parsed by +<function moreinfo="none">rfc822a_alloc</function>(). +"<literal moreinfo="none">recipient-list:</literal>" and +the trailing semicolon are a legacy mailing list specification that is no +longer in widespread use, but must still be accounted for.
The resulting +<structname>rfc822a</structname> structure will have four +<structname>rfc822addr</structname> structures: one for +"<literal moreinfo="none">recipient-list:</literal>"; +one for each address; and one for the trailing semicolon. +Each <structname>rfc822addr</structname> structure has the following +fields:</para> + <informalexample> + <programlisting format="linespecific"> +struct rfc822addr { + struct rfc822token *tokens; + struct rfc822token *name; +} ; +</programlisting> + </informalexample> + + <para> +If <structfield>tokens</structfield> is a null pointer, this structure +represents some +non-address portion of the original header, such as +"<literal moreinfo="none">recipient-list:</literal>" or a +semicolon. Otherwise it points to a structure that represents the E-mail +address in tokenized form.</para> + + <para> +<structfield>name</structfield> either points to the tokenized form of a +non-address portion of +the original header, or to a tokenized form of the recipient's name. +<structfield>name</structfield> will be NULL if the recipient name was not provided. For the +following address: +<literal moreinfo="none">Tom Jones <tjones@example.com></literal> - the +<structfield>tokens</structfield> field points to the tokenized form of +"<literal moreinfo="none">tjones@example.com</literal>", +and <structfield>name</structfield> points to the tokenized form of +"<literal moreinfo="none">Tom Jones</literal>".</para> + + <para> +Each <structname>rfc822token</structname> structure contains the following +fields:</para> + <informalexample> + <programlisting format="linespecific"> +struct rfc822token { + struct rfc822token *next; + int token; + const char *ptr; + int len; +} ; +</programlisting> + </informalexample> + + <para> +The <structfield>next</structfield> pointer builds a linked list of all +tokens in this name or +address.
The possible values for the <structfield>token</structfield> field +are:</para> + + <variablelist> + <varlistentry> + <term>0x00</term> + <listitem> + <para> +This is a simple atom - a sequence of non-special characters that +is delimited by whitespace or special characters (see below).</para> + </listitem> + </varlistentry> + <varlistentry> + <term>0x22</term> + <listitem> + <para> +The value of the ascii quote - this is a quoted string.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Open parenthesis: '('</term> + <listitem> + <para> +This is an old style comment. A deprecated form of E-mail +addressing uses - for example - +"<literal moreinfo="none">john@example.com (John Smith)</literal>" instead of +"<literal moreinfo="none">John Smith <john@example.com></literal>". +This old-style notation defined +parenthesized content as arbitrary comments. +The <structname>rfc822token</structname> with +<structfield>token</structfield> set to '(' is created for the contents of +the entire comment.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Symbols: '<', '>', '@', and many others</term> + <listitem> + <para> +The remaining possible values of <structfield>token</structfield> include all +the characters in RFC 822 headers that have special significance.</para> + </listitem> + </varlistentry> + </variablelist> + + <para> +When a <structname>rfc822token</structname> structure does not represent a +special character, the <structfield>ptr</structfield> field points to a text +string giving its contents. +The contents are NOT null-terminated, the <structfield>len</structfield> +field contains the number of characters included. +The macro rfc822_is_atom(token) indicates whether +<structfield>ptr</structfield> and <structfield>len</structfield> are used for +the given <structfield>token</structfield>. 
+Currently <function moreinfo="none">rfc822_is_atom</function>() returns true if +<structfield>token</structfield> is a zero byte, '<literal moreinfo="none">"</literal>', or +'<literal moreinfo="none">(</literal>'.</para> + + <para> +Note that it's possible that <structfield>len</structfield> might be zero. +This happens with null addresses used as return addresses for delivery status +notifications.</para> + </refsect2> + + <refsect2> + <title>Working with E-mail addresses</title> + <informalexample> + <programlisting format="linespecific"> +void rfc822_deladdr(struct rfc822a *addrs, int index); + +void rfc822tok_print(const struct rfc822token *list, + void (*func)(char, void *), void *func_arg); + +void rfc822_print(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *callback_arg); + +void rfc822_addrlist(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_namelist(const struct rfc822a *addrs, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_praddr(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_prname(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +void rfc822_prname_orlist(const struct rfc822a *addrs, + int index, + void (*print_func)(char, void *), + void *callback_arg); + +char *rfc822_gettok(const struct rfc822token *list); +char *rfc822_getaddrs(const struct rfc822a *addrs); +char *rfc822_getaddr(const struct rfc822a *addrs, int index); +char *rfc822_getname(const struct rfc822a *addrs, int index); +char *rfc822_getname_orlist(const struct rfc822a *addrs, int index); + +char *rfc822_getaddrs_wrap(const struct rfc822a *, int); +</programlisting> + </informalexample> + + <para> +These functions are used to work with individual addresses that are parsed +by <function 
moreinfo="none">rfc822a_alloc</function>().</para> + + <para> +<function moreinfo="none">rfc822_deladdr</function>() removes a single +<structname>rfc822addr</structname> structure, whose +<parameter moreinfo="none">index</parameter> is given, from the address array in +<structname>rfc822a</structname>. +<structfield>naddrs</structfield> is decremented by one.</para> + + <para> +<function moreinfo="none">rfc822tok_print</function>() converts a tokenized +<parameter moreinfo="none">list</parameter> of <structname>rfc822token</structname> +objects into a text string. The callback function, +<parameter moreinfo="none">func</parameter>, is called one +character at a time, for every character in the tokenized objects. An +arbitrary pointer, <parameter moreinfo="none">func_arg</parameter>, is passed unchanged as +the additional argument to the callback function. +<function moreinfo="none">rfc822tok_print</function>() is not usually the most +convenient and efficient function, but it has its uses.</para> + + <para> +<function moreinfo="none">rfc822_print</function>() takes an entire +<structname>rfc822a</structname> structure, and uses the +callback functions to print the contained addresses, in their original form, +separated by commas. The function pointed to by +<parameter moreinfo="none">print_func</parameter> is used to +print each individual address, one character at a time. Between the +addresses, the <parameter moreinfo="none">print_separator</parameter> function is called to +print the address separator, usually the string ", ". +The <parameter moreinfo="none">callback_arg</parameter> argument is passed +along unchanged, as an additional argument to these functions.</para> + + <para> +The functions <function moreinfo="none">rfc822_addrlist</function>() and +<function moreinfo="none">rfc822_namelist</function>() also print the +contents of the entire <structname>rfc822a</structname> structure, but in a +different way. 
+<function moreinfo="none">rfc822_addrlist</function>() prints just the actual E-mail +addresses, not the recipient +names or comments. Each E-mail address is followed by a newline character. +<function moreinfo="none">rfc822_namelist</function>() prints just the names or comments, +followed by newlines.</para> + + <para> +The functions <function moreinfo="none">rfc822_praddr</function>() and +<function moreinfo="none">rfc822_prname</function>() are just like +<function moreinfo="none">rfc822_addrlist</function>() and +<function moreinfo="none">rfc822_namelist</function>(), except that they print a single name +or address in the <structname>rfc822a</structname> structure, given its +<parameter moreinfo="none">index</parameter>. The +functions <function moreinfo="none">rfc822_gettok</function>(), +<function moreinfo="none">rfc822_getaddrs</function>(), <function moreinfo="none">rfc822_getaddr</function>(), +and <function moreinfo="none">rfc822_getname</function>() are equivalent to +<function moreinfo="none">rfc822tok_print</function>(), <function moreinfo="none">rfc822_print</function>(), +<function moreinfo="none">rfc822_praddr</function>() and <function moreinfo="none">rfc822_prname</function>(), +but, instead of using a callback function +pointer, these functions write the output into a dynamically allocated buffer. +That buffer must be destroyed by <function moreinfo="none">free</function>(3) after use. +These functions will +return a null pointer in the event of a failure to allocate memory for the +buffer.</para> + + <para> +<function moreinfo="none">rfc822_prname_orlist</function>() is similar to +<function moreinfo="none">rfc822_prname</function>(), except that it will +also print the legacy RFC822 group list syntax (which is also parsed by +<function moreinfo="none">rfc822a_alloc</function>()). <function moreinfo="none">rfc822_praddr</function>() +will print an empty string for an index +that corresponds to a group list name (or terminating semicolon). 
+<function moreinfo="none">rfc822_prname</function>() will also print an empty string. +<function moreinfo="none">rfc822_prname_orlist</function>() will +instead print either the name of the group list, or a single string ";". +<function moreinfo="none">rfc822_getname_orlist</function>() will instead save it into a +dynamically allocated buffer.</para> + + <para> +The function <function moreinfo="none">rfc822_getaddrs_wrap</function>() is similar to +<function moreinfo="none">rfc822_getaddrs</function>(), except +that the generated text is wrapped on or about the 73rd column, using +newline characters.</para> + + </refsect2> + + <refsect2> + <title>Working with dates</title> + <informalexample> + <programlisting format="linespecific"> +time_t timestamp=rfc822_parsedt(const char *datestr) +const char *datestr=rfc822_mkdate(time_t timestamp); +void rfc822_mkdate_buf(time_t timestamp, char *buffer); +</programlisting> + </informalexample> + + <para> +These functions convert between timestamps and dates expressed in the +<literal moreinfo="none">Date:</literal> E-mail header format.</para> + + <para> +<function moreinfo="none">rfc822_parsedt</function>() returns the timestamp corresponding to +the given date string (0 if there was a syntax error).</para> + + <para> +<function moreinfo="none">rfc822_mkdate</function>() returns a date string corresponding to +the given timestamp. 
+<function moreinfo="none">rfc822_mkdate_buf</function>() writes the date string into the +given buffer instead, +which must be big enough to accommodate it.</para> + + </refsect2> + + <refsect2> + <title>Working with 8-bit MIME-encoded headers</title> + + <informalexample> + <programlisting format="linespecific"> +int error=rfc2047_decode(const char *text, + int (*callback_func)(const char *, int, const char *, void *), + void *callback_arg); + +extern char *str=rfc2047_decode_simple(const char *text); + +extern char *str=rfc2047_decode_enhanced(const char *text, + const char *charset); + +void rfc2047_print(const struct rfc822a *a, + const char *charset, + void (*print_func)(char, void *), + void (*print_separator)(const char *, void *), void *); + + +char *buffer=rfc2047_encode_str(const char *string, + const char *charset); + +int error=rfc2047_encode_callback(const char *string, + const char *charset, + int (*func)(const char *, size_t, void *), + void *callback_arg); + +char *buffer=rfc2047_encode_header(const struct rfc822a *a, + const char *charset); +</programlisting> + </informalexample> + + <para> +These functions provide additional logic to encode or decode 8-bit content +in 7-bit RFC 822 headers, as specified in RFC 2047.</para> + + <para> +<function moreinfo="none">rfc2047_decode</function>() is a basic RFC 2047 decoding function. +It receives a +pointer to some 7bit RFC 2047-encoded text, and a callback function. The +callback function is repeatedly called. Each time it's called it receives a +piece of decoded text. The arguments are: a pointer to a text fragment, number +of bytes in the text fragment, followed by a pointer to the character set of +the text fragment. The character set pointer is NULL for portions of the +original text that are not RFC 2047-encoded.</para> + + <para> +The callback function also receives <parameter moreinfo="none">callback_arg</parameter>, as +its last +argument. 
If the callback function returns a non-zero value, +<function moreinfo="none">rfc2047_decode</function>() +terminates, returning that value. Otherwise, +<function moreinfo="none">rfc2047_decode</function>() returns 0 after +a successful decoding. <function moreinfo="none">rfc2047_decode</function>() returns -1 if it +was unable to allocate sufficient memory.</para> + + <para> +<function moreinfo="none">rfc2047_decode_simple</function>() and +<function moreinfo="none">rfc2047_decode_enhanced</function>() are alternatives to +<function moreinfo="none">rfc2047_decode</function>() which forego a callback function, and +return the decoded text +in a dynamically-allocated memory buffer. The buffer must be +<function moreinfo="none">free</function>(3)-ed after +use. <function moreinfo="none">rfc2047_decode_simple</function>() discards all character set +specifications, and +merely decodes any 8-bit text. <function moreinfo="none">rfc2047_decode_enhanced</function>() +is a compromise to +discarding all character set information. The local character set being used +is specified as the second argument to +<function moreinfo="none">rfc2047_decode_enhanced</function>(). Any RFC +2047-encoded text in a different character set will be prefixed by the name of +the character set, in brackets, in the resulting output.</para> + + <para> +<function moreinfo="none">rfc2047_decode_simple</function>() and +<function moreinfo="none">rfc2047_decode_enhanced</function>() return a null pointer +if they are unable to allocate sufficient memory.</para> + + <para> +The <function moreinfo="none">rfc2047_print</function>() function is equivalent to +<function moreinfo="none">rfc822_print</function>(), followed by +<function moreinfo="none">rfc2047_decode_enhanced</function>() on the result. 
The callback +functions are used in +an identical fashion, except that they receive text that's already +decoded.</para> + + <para> +The function <function moreinfo="none">rfc2047_encode_str</function>() takes a +<parameter moreinfo="none">string</parameter> and <parameter moreinfo="none">charset</parameter> +being the name of the local character set, then encodes any 8-bit portions of +<parameter moreinfo="none">string</parameter> using RFC 2047 encoding. +<function moreinfo="none">rfc2047_encode_str</function>() returns a +dynamically-allocated buffer with the result, which must be +<function moreinfo="none">free</function>(3)-ed after +use, or NULL if there was insufficient memory to allocate the buffer.</para> + + <para> +The function <function moreinfo="none">rfc2047_encode_callback</function>() is similar to +<function moreinfo="none">rfc2047_encode_str</function>() +except that the callback function is repeatedly called to receive the +encoded string. Each invocation of the callback function receives a pointer +to a portion of the encoded text, the number of characters in this portion, +and <parameter moreinfo="none">callback_arg</parameter>.</para> + + <para> +The function <function moreinfo="none">rfc2047_encode_header</function>() is basically +equivalent to <function moreinfo="none">rfc822_getaddrs</function>(), followed by +<function moreinfo="none">rfc2047_encode_str</function>().</para> + + </refsect2> + + <refsect2> + + <title>Working with subjects</title> + + <informalexample> + <programlisting format="linespecific"> +char *basesubj=rfc822_coresubj(const char *subj); + +char *basesubj=rfc822_coresubj_nouc(const char *subj); +</programlisting> + </informalexample> + + <para> +This function takes the contents of the subject header, and returns the +"core" subject header that's used in the specification of the IMAP THREAD +function. 
This function is designed to strip all subject line artifacts that +might've been added in the process of forwarding or replying to a message. +Currently, <function moreinfo="none">rfc822_coresubj</function>() performs the following transformations:</para> + <variablelist> + <varlistentry> + <term>Whitespace</term> + <listitem> + <para>Leading and trailing whitespace is removed. Consecutive +whitespace characters are collapsed into a single whitespace character. +All whitespace characters are replaced by a space.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>Re:, (fwd) [foo]</term> + <listitem> + <para> +These artifacts (and several others) are removed from +the subject line.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>Note that this function does NOT do MIME decoding. In order to +implement IMAP THREAD, it is necessary to call something like +<function moreinfo="none">rfc2047_decode</function>() before +calling <function moreinfo="none">rfc822_coresubj</function>().</para> + + <para> +This function returns a pointer to a dynamically-allocated buffer, which +must be <function moreinfo="none">free</function>(3)-ed after use.</para> + + <para> +<function moreinfo="none">rfc822_coresubj_nouc</function>() is like +<function moreinfo="none">rfc822_coresubj</function>(), except that the subject +is not converted to uppercase.</para> + </refsect2> + </refsect1> + + <refsect1> + <title>SEE ALSO</title> + + <para> +<ulink url="rfc2045.html"><citerefentry><refentrytitle>rfc2045</refentrytitle><manvolnum>3</manvolnum></citerefentry></ulink>, +<ulink url="reformail.html"><citerefentry><refentrytitle>reformail</refentrytitle><manvolnum>1</manvolnum></citerefentry></ulink>, +<ulink url="reformime.html"><citerefentry><refentrytitle>reformime</refentrytitle><manvolnum>1</manvolnum></citerefentry></ulink>.</para> + </refsect1> +</refentry> diff --git a/rfc822/rfc822_getaddr.c b/rfc822/rfc822_getaddr.c new file mode 100644 index 
0000000..6286727 --- /dev/null +++ b/rfc822/rfc822_getaddr.c @@ -0,0 +1,46 @@ +/* +** Copyright 1998 - 2008 Double Precision, Inc. +** See COPYING for distribution information. +*/ + +/* +*/ +#include "rfc822.h" +#include <stdlib.h> + +static void cntlen(char c, void *p) +{ + if (c != '\n') + ++ *(size_t *)p; +} + +static void saveaddr(char c, void *p) +{ + if (c != '\n') + { + char **cp=(char **)p; + + *(*cp)++=c; + } +} + +char *rfc822_getaddr(const struct rfc822a *rfc, int n) +{ + return rfc822_display_addr_tobuf(rfc, n, NULL); +} + +char *rfc822_gettok(const struct rfc822token *t) +{ +size_t addrbuflen=0; +char *addrbuf, *ptr; + + rfc822tok_print(t, &cntlen, &addrbuflen); + + if (!(addrbuf=malloc(addrbuflen+1))) + return (0); + + ptr=addrbuf; + rfc822tok_print(t, &saveaddr, &ptr); + addrbuf[addrbuflen]=0; + return (addrbuf); +} diff --git a/rfc822/rfc822_getaddrs.c b/rfc822/rfc822_getaddrs.c new file mode 100644 index 0000000..7fffc40 --- /dev/null +++ b/rfc822/rfc822_getaddrs.c @@ -0,0 +1,108 @@ +/* +** Copyright 1998 - 2009 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +/* +*/ +#include "rfc822.h" +#include <stdlib.h> + +static void cntlen(char c, void *p) +{ + c=c; + ++ *(size_t *)p; +} + +static void cntlensep(const char *p, void *ptr) +{ + while (*p) cntlen(*p++, ptr); +} + +static void saveaddr(char c, void *ptr) +{ + *(*(char **)ptr)++=c; +} + +static void saveaddrsep(const char *p, void *ptr) +{ + while (*p) saveaddr(*p++, ptr); +} + +char *rfc822_getaddrs(const struct rfc822a *rfc) +{ + size_t addrbuflen=0; + char *addrbuf, *ptr; + + if (rfc822_print(rfc, &cntlen, &cntlensep, &addrbuflen) < 0) + return NULL; + + if (!(addrbuf=malloc(addrbuflen+1))) + return (0); + + ptr=addrbuf; + if (rfc822_print(rfc, &saveaddr, &saveaddrsep, &ptr) < 0) + { + free(addrbuf); + return NULL; + } + + addrbuf[addrbuflen]=0; + return (addrbuf); +} + +static void saveaddrsep_wrap(const char *p, void *ptr) +{ +int c; + + while ((c=*p++) != 0) + { + if (c == ' ') c='\n'; + saveaddr(c, ptr); + } +} + +char *rfc822_getaddrs_wrap(const struct rfc822a *rfc, int w) +{ + size_t addrbuflen=0; + char *addrbuf, *ptr, *start, *lastnl; + + if (rfc822_print(rfc, &cntlen, &cntlensep, &addrbuflen) < 0) + return NULL; + + if (!(addrbuf=malloc(addrbuflen+1))) + return (0); + + ptr=addrbuf; + + if (rfc822_print(rfc, &saveaddr, &saveaddrsep_wrap, &ptr) < 0) + { + free(addrbuf); + return NULL; + } + + addrbuf[addrbuflen]=0; + + for (lastnl=0, start=ptr=addrbuf; *ptr; ) + { + while (*ptr && *ptr != '\n') ptr++; + if (ptr-start < w) + { + if (lastnl) *lastnl=' '; + lastnl=ptr; + if (*ptr) ++ptr; + } + else + { + if (lastnl) + start=lastnl+1; + else + { + start=ptr+1; + if (*ptr) ++ptr; + } + lastnl=0; + } + } + return (addrbuf); +} diff --git a/rfc822/rfc822_mkdate.c b/rfc822/rfc822_mkdate.c new file mode 100644 index 0000000..ad84276 --- /dev/null +++ b/rfc822/rfc822_mkdate.c @@ -0,0 +1,112 @@ +/* +** Copyright 1998 - 1999 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +/* +*/ + +#include "rfc822.h" + +#include <sys/types.h> +#include <time.h> +#include <stdio.h> +#include <string.h> +#if HAVE_UNISTD_H +#include <unistd.h> +#endif + +static const char * const months[]={ + "Jan", + "Feb", + "Mar", + "Apr", + "May", + "Jun", + "Jul", + "Aug", + "Sep", + "Oct", + "Nov", + "Dec"}; + +static const char * const wdays[]={ + "Sun", + "Mon", + "Tue", + "Wed", + "Thu", + "Fri", + "Sat"}; + +void rfc822_mkdate_buf(time_t t, char *buf) +{ +struct tm *p; +int offset; + +#if USE_TIME_ALTZONE + + p=localtime(&t); + offset= -(int)timezone; + + if (p->tm_isdst > 0) + offset= -(int)altzone; + + if (offset % 60) + { + offset=0; + p=gmtime(&t); + } + offset /= 60; +#else +#if USE_TIME_DAYLIGHT + + p=localtime(&t); + offset= -(int)timezone; + + if (p->tm_isdst > 0) + offset += 60*60; + if (offset % 60) + { + offset=0; + p=gmtime(&t); + } + offset /= 60; +#else +#if USE_TIME_GMTOFF + p=localtime(&t); + offset= p->tm_gmtoff; + + if (offset % 60) + { + offset=0; + p=gmtime(&t); + } + offset /= 60; +#else + p=gmtime(&t); + offset=0; +#endif +#endif +#endif + + offset = (offset % 60) + offset / 60 * 100; + + sprintf(buf, "%s, %02d %s %04d %02d:%02d:%02d %+05d", + wdays[p->tm_wday], + p->tm_mday, + months[p->tm_mon], + p->tm_year+1900, + p->tm_hour, + p->tm_min, + p->tm_sec, + offset); +} + +const char *rfc822_mkdate(time_t t) +{ +static char buf[50]; + + rfc822_mkdate_buf(t, buf); + return (buf); +} diff --git a/rfc822/rfc822_parsedt.c b/rfc822/rfc822_parsedt.c new file mode 100644 index 0000000..036be34 --- /dev/null +++ b/rfc822/rfc822_parsedt.c @@ -0,0 +1,256 @@ +/* +** Copyright 1998 - 2011 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +/* +*/ +#include "config.h" +#include <stdio.h> +#include <string.h> +#include <time.h> + +#define my_isalpha(c) ( ( (c) >= 'a' && (c) <= 'z' ) || \ + ( (c) >= 'A' && (c) <= 'Z' ) ) + +#define my_isdigit(c) ( (c) >= '0' && (c) <= '9' ) + +#define my_isalnum(c) ( my_isalpha(c) || my_isdigit(c) ) + +#define my_isspace(c) ( (c) == ' ' || (c) == '\t' || (c) == '\r' || (c) == '\n') + +/* +** time_t rfc822_parsedate(const char *p) +** +** p - contents of the Date: header, attempt to parse it into a time_t. +** +** returns - time_t, or 0 if the date cannot be parsed +*/ + +static unsigned parsedig(const char **p) +{ + unsigned i=0; + + while (my_isdigit(**p)) + { + i=i*10 + **p - '0'; + ++*p; + } + return (i); +} + +static const char * const weekdays[7]={ + "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" + } ; + +static const char * const mnames[13]={ + "Jan", "Feb", "Mar", "Apr", + "May", "Jun", "Jul", "Aug", + "Sep", "Oct", "Nov", "Dec", NULL}; + +#define leap(y) ( \ + ((y) % 400) == 0 || \ + (((y) % 4) == 0 && (y) % 100) ) + +static unsigned mlength[]={31,28,31,30,31,30,31,31,30,31,30,31}; +#define mdays(m,y) ( (m) != 2 ? mlength[(m)-1] : leap(y) ? 29:28) + +static const char * const zonenames[] = { + "UT","GMT", + "EST","EDT", + "CST","CDT", + "MST","MDT", + "PST","PDT", + "Z", + "A", "B", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", + "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", + NULL}; + +#define ZH(n) ( (n) * 60 * 60 ) + +static int zoneoffset[] = { + 0, 0, + ZH(-5), ZH(-4), + ZH(-6), ZH(-5), + ZH(-7), ZH(-6), + ZH(-8), ZH(-7), + 0, + + ZH(-1), ZH(-2), ZH(-3), ZH(-4), ZH(-5), ZH(-6), ZH(-7), ZH(-8), ZH(-9), ZH(-10), ZH(-11), ZH(-12), + ZH(1), ZH(2), ZH(3), ZH(4), ZH(5), ZH(6), ZH(7), ZH(8), ZH(9), ZH(10), ZH(11), ZH(12) }; + +#define lc(x) ((x) >= 'A' && (x) <= 'Z' ? 
(x) + ('a'-'A'):(x)) + +static unsigned parsekey(const char **mon, const char * const *ary) +{ +unsigned m, j; + + for (m=0; ary[m]; m++) + { + for (j=0; ary[m][j]; j++) + if (lc(ary[m][j]) != lc((*mon)[j])) + break; + if (!ary[m][j]) + { + *mon += j; + return (m+1); + } + } + return (0); +} + +static int parsetime(const char **t) +{ + unsigned h,m,s=0; + + if (!my_isdigit(**t)) return (-1); + + h=parsedig(t); + if (h > 23) return (-1); + if (**t != ':') return (-1); + ++*t; + if (!my_isdigit(**t)) return (-1); + m=parsedig(t); + if (**t == ':') + { + ++*t; + + if (!my_isdigit(**t)) return (-1); + s=parsedig(t); + } + if (m > 59 || s > 59) return (-1); + return (h * 60 * 60 + m * 60 + s); +} + +time_t rfc822_parsedt(const char *rfcdt) +{ +unsigned day=0, mon=0, year; +int secs; +int offset; +time_t t; +unsigned y; + + /* Ignore day of the week. Tolerate "Tue, 25 Feb 1997 ... " + ** without the comma. Tolerate "Feb 25 1997 ...". + */ + + while (!day || !mon) + { + if (!*rfcdt) return (0); + if (my_isalpha(*rfcdt)) + { + if (mon) return (0); + mon=parsekey(&rfcdt, mnames); + if (!mon) + while (*rfcdt && my_isalpha(*rfcdt)) + ++rfcdt; + continue; + } + + if (my_isdigit(*rfcdt)) + { + if (day) return (0); + day=parsedig(&rfcdt); + if (!day) return (0); + continue; + } + ++rfcdt; + } + + while (*rfcdt && my_isspace(*rfcdt)) + ++rfcdt; + if (!my_isdigit(*rfcdt)) return (0); + year=parsedig(&rfcdt); + if (year < 70) year += 2000; + if (year < 100) year += 1900; + + while (*rfcdt && my_isspace(*rfcdt)) + ++rfcdt; + + if (day == 0 || mon == 0 || mon > 12 || day > mdays(mon,year)) + return (0); + + secs=parsetime(&rfcdt); + if (secs < 0) return (0); + + offset=0; + + /* RFC822 sez no parenthesis, but I've seen (EST) */ + + while ( *rfcdt ) + { + if (my_isalnum(*rfcdt) || *rfcdt == '+' || *rfcdt == '-') + break; + ++rfcdt; + } + + if (my_isalpha((int)(unsigned char)*rfcdt)) + { + int n=parsekey(&rfcdt, zonenames); + + if (n > 0) offset= zoneoffset[n-1]; + } + else + { + int 
sign=1; + unsigned n; + + switch (*rfcdt) { + case '-': + sign= -1; + case '+': + ++rfcdt; + } + + if (my_isdigit(*rfcdt)) + { + n=parsedig(&rfcdt); + if (n > 2359 || (n % 100) > 59) n=0; + offset = sign * ( (n % 100) * 60 + n / 100 * 60 * 60); + } + } + + if (year < 1970) return (0); + if (year > 9999) return (0); + + t=0; + for (y=1970; y<year; y++) + { + if ( leap(y) ) + { + if (year-y >= 4) + { + y += 3; + t += ( 365*3+366 ) * 24 * 60 * 60; + continue; + } + t += 24 * 60 * 60; + } + t += 365 * 24 * 60 * 60; + } + + for (y=1; y < mon; y++) + t += mdays(y, year) * 24 * 60 * 60; + + return ( t + (day-1) * 24 * 60 * 60 + secs - offset ); +} + +const char *rfc822_mkdt(time_t t) +{ +static char buf[80]; +struct tm *tmptr=gmtime(&t); + + buf[0]=0; + if (tmptr) + { + sprintf(buf, "%s, %02d %s %04d %02d:%02d:%02d GMT", + weekdays[tmptr->tm_wday], + tmptr->tm_mday, + mnames[tmptr->tm_mon], + tmptr->tm_year + 1900, + tmptr->tm_hour, + tmptr->tm_min, + tmptr->tm_sec); + } + return (buf); +} diff --git a/rfc822/rfc822hdr.c b/rfc822/rfc822hdr.c new file mode 100644 index 0000000..1ce3106 --- /dev/null +++ b/rfc822/rfc822hdr.c @@ -0,0 +1,160 @@ +/* +** Copyright 2001-2011 Double Precision, Inc. +** See COPYING for distribution information. +*/ + +#include "config.h" +#include <stdio.h> +#include <ctype.h> +#include <stdlib.h> +#include <string.h> +#include "rfc822hdr.h" + + +/* +** Read the next mail header. +*/ + +int rfc822hdr_read(struct rfc822hdr *h, FILE *f, off_t *pos, off_t epos) +{ + size_t n=0; + int c; + + for (;;) + { + if ( n >= h->hdrsize) + { + size_t hn=h->hdrsize + 1024; + char *p= h->header ? 
realloc(h->header, hn): + malloc(hn); + + if (!p) + return (-1); + + h->header=p; + h->hdrsize=hn; + } + + if (pos && *pos >= epos) + { + h->header[n]=0; + break; + } + + c=getc(f); + if (c == EOF) + { + if (pos) + *pos=epos; + h->header[n]=0; + break; + } + if (pos) + ++*pos; + + h->header[n]=c; + if (c == '\n') + { + if (n == 0) + { + if (pos) + *pos=epos; + h->header[n]=0; + break; + } + + c=getc(f); + if (c != EOF) + ungetc(c, f); + if (c == '\n' || c == '\r' || + !isspace((int)(unsigned char)c)) + { + h->header[n]=0; + break; + } + } + n++; + if (h->maxsize && n + 2 > h->maxsize) + --n; + } + + if (n == 0) + { + if (pos) + *pos=epos; + h->value=h->header; + return (1); + } + + for (h->value=h->header; *h->value; ++h->value) + { + if (*h->value == ':') + { + *h->value++=0; + while (*h->value && + isspace((int)(unsigned char)*h->value)) + ++h->value; + break; + } + } + return (0); +} + +void rfc822hdr_fixname(struct rfc822hdr *h) +{ + char *p; + + for (p=h->header; *p; p++) + { + *p=tolower((int)(unsigned char)*p); + } +} + +void rfc822hdr_collapse(struct rfc822hdr *h) +{ + char *p, *q; + + for (p=q=h->value; *p; ) + { + if (*p == '\n') + { + while (*p && isspace((int)(unsigned char)*p)) + ++p; + *q++=' '; + continue; + } + *q++ = *p++; + } + *q=0; +} + +/* This is, basically, a case-insensitive US-ASCII comparison function */ + +#define lc(x) ((x) >= 'A' && (x) <= 'Z' ? 
(x) + ('a'-'A'):(x)) + +int rfc822hdr_namecmp(const char *a, const char *b) +{ + int rc; + + while ((rc=(int)(unsigned char)lc(*a) - (int)(unsigned char)lc(*b))==0) + { + if (!*a) + return 0; + ++a; + ++b; + } + + return rc; +} + +int rfc822hdr_is_addr(const char *hdr) +{ + return rfc822hdr_namecmp(hdr, "from") == 0 || + rfc822hdr_namecmp(hdr, "to") == 0 || + rfc822hdr_namecmp(hdr, "cc") == 0 || + rfc822hdr_namecmp(hdr, "bcc") == 0 || + rfc822hdr_namecmp(hdr, "resent-from") == 0 || + rfc822hdr_namecmp(hdr, "resent-to") == 0 || + rfc822hdr_namecmp(hdr, "resent-cc") == 0 || + rfc822hdr_namecmp(hdr, "resent-bcc") == 0; +} diff --git a/rfc822/rfc822hdr.h b/rfc822/rfc822hdr.h new file mode 100644 index 0000000..3926d6d --- /dev/null +++ b/rfc822/rfc822hdr.h @@ -0,0 +1,49 @@ +/* +*/ +#ifndef rfc822hdr_h +#define rfc822hdr_h + +/* +** Copyright 2001 Double Precision, Inc. +** See COPYING for distribution information. +*/ + + +#if HAVE_CONFIG_H +#include "rfc822/config.h" +#endif +#include <sys/types.h> +#include <stdio.h> +#include <string.h> +#include <stdlib.h> + +#ifdef __cplusplus +extern "C" { +#endif + +struct rfc822hdr { + char *header; + char *value; + + size_t hdrsize; + size_t maxsize; +} ; + +#define rfc822hdr_init(h,s) \ + do { memset((h), 0, sizeof(*h)); (h)->maxsize=(s); } while(0) + +#define rfc822hdr_free(h) \ + do { if ((h)->header) free ((h)->header); } while (0) + +int rfc822hdr_read(struct rfc822hdr *, FILE *, off_t *, off_t); +void rfc822hdr_fixname(struct rfc822hdr *); +void rfc822hdr_collapse(struct rfc822hdr *); + +int rfc822hdr_namecmp(const char *a, const char *b); +int rfc822hdr_is_addr(const char *hdr); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/rfc822/testsuite.c b/rfc822/testsuite.c new file mode 100644 index 0000000..7064c42 --- /dev/null +++ b/rfc822/testsuite.c @@ -0,0 +1,134 @@ +/* +** Copyright 1998 - 2006 Double Precision, Inc. +** See COPYING for distribution information. 
+*/ + +#include "rfc822.h" +#include "rfc2047.h" +#include <stdio.h> +#include <stdlib.h> + + +static void print_func(char c, void *p) +{ + p=p; + putchar(c); +} + +static void print_separator(const char *s, void *p) +{ + p=p; + printf("%s", s); +} + +static struct rfc822t *tokenize(const char *p) +{ +struct rfc822t *tp; +int i; +char buf[2]; + + printf("Tokenize: %s\n", p); + tp=rfc822t_alloc_new(p, NULL, NULL); + if (!tp) exit(0); + buf[1]=0; + for (i=0; i<tp->ntokens; i++) + { + buf[0]=tp->tokens[i].token; + if (buf[0] == '\0' || buf[0] == '"' || buf[0] == '(') + { + printf("%s: ", buf[0] == '"' ? "Quote": + buf[0] == '(' ? "Comment":"Atom"); + if (fwrite(tp->tokens[i].ptr, tp->tokens[i].len, 1, + stdout) != 1) + exit(1); + + printf("\n"); + } + else printf("Token: %s\n", buf[0] ? buf:"atom"); + } + return (tp); +} + +static struct rfc822a *doaddr(struct rfc822t *t) +{ +struct rfc822a *a=rfc822a_alloc(t); + + if (!a) exit(0); + printf("----\n"); + rfc822_print(a, print_func, print_separator, NULL); + printf("\n"); + return (a); +} + +int main() +{ + struct rfc822t *t1, *t2, *t3, *t4, *t5; + struct rfc822a *a1, *a2, *a3, *a4, *a5; + + t1=tokenize("nobody@example.com (Nobody (is) here\\) right)"); + t2=tokenize("Distribution list: nobody@example.com daemon@example.com"); + t3=tokenize("Mr Nobody <nobody@example.com>, Mr. Nobody <nobody@example.com>"); + t4=tokenize("nobody@example.com, <nobody@example.com>, Mr. 
Nobody <nobody@example.com>"); + + t5=tokenize("=?UTF-8?Q?Test?= <nobody@example.com>, foo=bar <nobody@example.com>"); + + a1=doaddr(t1); + a2=doaddr(t2); + a3=doaddr(t3); + a4=doaddr(t4); + a5=doaddr(t5); + + rfc822a_free(a5); + rfc822a_free(a4); + rfc822a_free(a3); + rfc822a_free(a2); + rfc822a_free(a1); + rfc822t_free(t5); + rfc822t_free(t4); + rfc822t_free(t3); + rfc822t_free(t2); + rfc822t_free(t1); + +#define FIVEUTF8 "\xe2\x85\xa4" + +#define FIVETIMES4 FIVEUTF8 FIVEUTF8 FIVEUTF8 FIVEUTF8 + +#define FIVETIMES16 FIVETIMES4 FIVETIMES4 FIVETIMES4 FIVETIMES4 + +#define FIVEMAX FIVETIMES16 FIVETIMES4 FIVETIMES4 + + { + char *p=rfc2047_encode_str(FIVEMAX, "utf-8", + rfc2047_qp_allow_any); + + if (p) + { + printf("%s\n", p); + free(p); + } + } + + { + char *p=rfc2047_encode_str(FIVEMAX FIVEUTF8, "utf-8", + rfc2047_qp_allow_any); + + if (p) + { + printf("%s\n", p); + free(p); + } + } + + { + char *p=rfc2047_encode_str(FIVEMAX "\xcc\x80", "utf-8", + rfc2047_qp_allow_any); + + if (p) + { + printf("%s\n", p); + free(p); + } + } + + return (0); +} diff --git a/rfc822/testsuite.txt b/rfc822/testsuite.txt new file mode 100644 index 0000000..a543187 --- /dev/null +++ b/rfc822/testsuite.txt @@ -0,0 +1,100 @@ +Tokenize: nobody@example.com (Nobody (is) here\) right) +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Comment: (Nobody (is) here\) right) +Tokenize: Distribution list: nobody@example.com daemon@example.com +Atom: Distribution +Atom: list +Token: : +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Atom: daemon +Token: @ +Atom: example +Token: . +Atom: com +Tokenize: Mr Nobody <nobody@example.com>, Mr. Nobody <nobody@example.com> +Atom: Mr +Atom: Nobody +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +Token: , +Atom: Mr +Token: . +Atom: Nobody +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +Tokenize: nobody@example.com, <nobody@example.com>, Mr. 
Nobody <nobody@example.com> +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: , +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +Token: , +Atom: Mr +Token: . +Atom: Nobody +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +Tokenize: =?UTF-8?Q?Test?= <nobody@example.com>, foo=bar <nobody@example.com> +Atom: =?UTF-8?Q?Test?= +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +Token: , +Atom: foo +Token: = +Atom: bar +Token: < +Atom: nobody +Token: @ +Atom: example +Token: . +Atom: com +Token: > +---- +nobody@example.com (Nobody (is) here\) right) +---- +Distribution list: nobody@example.com, daemon@example.com +---- +Mr Nobody <nobody@example.com>, "Mr. Nobody" <nobody@example.com> +---- +nobody@example.com, nobody@example.com, "Mr. Nobody" <nobody@example.com> +---- +=?UTF-8?Q?Test?= <nobody@example.com>, "foo=bar" <nobody@example.com> +=?utf-8?B?4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk?= +=?utf-8?B?4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk?= =?utf-8?B?4oWk?= +=?utf-8?B?4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk4oWk?= =?utf-8?B?4oWkzIA=?= |
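The rfc822_gettok() function in rfc822/rfc822_getaddr.c above builds its result in two passes over the same character callback interface: a first pass with a counting callback (cntlen) to size the buffer, then a second pass with a storing callback (saveaddr) to fill it. The following is a minimal, self-contained sketch of that pattern, not the library's code. The names emit_chars() and collect_text() are hypothetical; emit_chars() is only a stand-in for rfc822tok_print() so that the example compiles without the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rfc822tok_print(): feeds each character of a
** NUL-terminated string to the callback, one at a time. */
static void emit_chars(const char *s, void (*func)(char, void *), void *arg)
{
	while (*s)
		(*func)(*s++, arg);
}

/* Pass 1 callback: count characters, skipping newlines. */
static void count_char(char c, void *p)
{
	if (c != '\n')
		++*(size_t *)p;
}

/* Pass 2 callback: store characters through a moving write pointer. */
static void save_char(char c, void *p)
{
	if (c != '\n')
	{
		char **cp = (char **)p;

		*(*cp)++ = c;
	}
}

/* Returns a malloc()ed copy of s without newlines, or NULL on failure.
** (collect_text is a hypothetical name used for this sketch only.) */
char *collect_text(const char *s)
{
	size_t len = 0;
	char *buf, *ptr;

	emit_chars(s, count_char, &len);	/* pass 1: measure */

	buf = malloc(len + 1);
	if (!buf)
		return NULL;

	ptr = buf;
	emit_chars(s, save_char, &ptr);		/* pass 2: fill */
	assert(ptr == buf + len);		/* both passes must agree */
	buf[len] = 0;
	return buf;
}
```

The design relies on both callbacks applying the same filter (here, dropping newlines); if they disagreed, the second pass could write past the allocation. The caller owns the returned buffer and releases it with free(3), as the manual page describes for the rfc822_get*() family.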
