Message format

The Internet email message format is defined by RFC 5322, with multimedia content attachments defined in RFC 2045 through RFC 2049, collectively called Multipurpose Internet Mail Extensions (MIME). RFC 5322 replaced RFC 2822 in 2008; RFC 2822 had in turn replaced RFC 822 in 2001, after RFC 822 had been the standard for Internet email for nearly 20 years. Published in 1982, RFC 822 was based on the earlier RFC 733 for the ARPANET.

Internet email messages consist of two major sections:

  • Header – Structured into fields such as From, To, Cc, Subject, and Date, which carry information about the email.
  • Body – The basic content, as unstructured text, sometimes containing a signature block at the end. It plays the same role as the body of a regular letter.

The header is separated from the body by a blank line.
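The split into header and body can be illustrated with Python's standard email package. This is a minimal sketch; the sample message and its addresses are made up for illustration.

```python
from email import message_from_string

# A made-up message: header fields, one blank line, then the body.
raw = (
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.org>\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(raw)

header_fields = msg.items()   # list of (name, value) pairs, in order
body = msg.get_payload()      # everything after the first blank line

for name, value in header_fields:
    print(f"{name} -> {value}")
print(repr(body))
```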

Message header

Each message has exactly one header, which is structured into fields. Each field has a name and a value. RFC 5322 specifies the precise syntax.

Informally, each line of text in the header that begins with a printable character begins a separate field. The field name starts at the first character of the line and ends before the separator character “:”. The separator is followed by the field value (the “body” of the field). The value continues onto subsequent lines if those lines have a space or tab as their first character. Field names and values are restricted to 7-bit ASCII characters. Non-ASCII values may be represented using MIME encoded words.
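The informal rules above can be sketched as a small hand-written parser. This is an illustration only, not a full RFC 5322 implementation; the function name parse_header and the sample text are assumptions made for the example.

```python
def parse_header(header_text):
    """Return a list of (name, value) pairs from a raw header block.

    Simplified sketch: it follows the informal rules only and does not
    validate field names or handle comments and quoted strings.
    """
    fields = []
    for line in header_text.splitlines():
        if not line:
            break  # blank line: the header ends here
        if line[0] in " \t":
            # Continuation line: append it to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            fields[-1] = (name, value + line)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing ':' separator in {line!r}")
            fields.append((name, value))
    return [(name, value.strip()) for name, value in fields]


sample = (
    "Subject: This is a\r\n"
    " long subject\r\n"
    "From: a@example.org\r\n"
    "\r\n"
    "body text"
)
parsed = parse_header(sample)
print(parsed)
```

Unfolding here simply drops the line break and keeps the leading whitespace of the continuation line, so a folded value reads as one logical line.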

Header fields

Email header fields can be multi-line. Each line should be at most 78 characters long and must never exceed 998 characters. Header fields defined by RFC 5322 can contain only US-ASCII characters; characters in other sets can be encoded using a syntax specified in RFC 2047. The IETF EAI working group has since defined standards-track extensions, replacing earlier experimental ones, that allow UTF-8 encoded Unicode characters within the header. In particular, this allows email addresses to use non-ASCII characters. Such characters may be used only when the servers handling the message support these extensions.
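As a rough illustration of the RFC 2047 syntax, the sketch below uses Python's standard email.header module to turn a non-ASCII value into an ASCII-only encoded word and back. The sample text is made up, and the exact encoded form depends on the library's choice of encoding.

```python
from email.header import Header, decode_header, make_header

original = "Café menu"  # made-up value containing a non-ASCII character

# Encode as an RFC 2047 encoded word; the result is plain ASCII.
encoded = Header(original, charset="utf-8").encode()

# Decode it back into a Unicode string.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```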

The message header must include at least the following fields:

  • From: The email address, and optionally the name, of the author(s). In many email clients this can be changed only through the account settings.
  • Date: The local time and date when the message was written. Like the From: field, many email clients fill this in automatically when sending. The recipient’s client may then display the time in the recipient’s own format and time zone.

The message header should include at least the following fields:

  • Message-ID: Also an automatically generated field; used to prevent multiple delivery and for reference in In-Reply-To: (see below).
  • In-Reply-To: Message-ID of the message that this is a reply to. Used to link related messages together. This field applies only to reply messages.
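The required and recommended fields can be sketched with Python's standard email package. The addresses and the example.org domain are placeholders, and the library generates the Date and Message-ID values much as a mail client would.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

# Original message with the two required fields plus a Message-ID.
original_id = make_msgid(domain="example.org")  # placeholder domain
msg = EmailMessage()
msg["From"] = "Alice Example <alice@example.org>"
msg["Date"] = formatdate(localtime=True)
msg["Message-ID"] = original_id
msg.set_content("Hello.\n")

# A reply gets its own Message-ID and points back at the original.
reply = EmailMessage()
reply["From"] = "Bob Example <bob@example.org>"
reply["Date"] = formatdate(localtime=True)
reply["Message-ID"] = make_msgid(domain="example.org")
reply["In-Reply-To"] = original_id
reply.set_content("Hi Alice.\n")

print(msg.as_string())
print(reply.as_string())
```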

RFC 3864 describes registration procedures for message header fields at the IANA; it provides for permanent and provisional message header field names, including fields defined for MIME, netnews, and HTTP, and references the relevant RFCs. Common header fields for email include:

  • To: The email address(es), and optionally name(s) of the message’s recipient(s). Indicates primary recipients (multiple allowed), for secondary recipients see Cc: and Bcc: below.
  • Subject: A brief summary of the topic of the message. Certain abbreviations are commonly used in the subject, including “RE:” and “FW:”.
  • Bcc: Blind carbon copy; addresses added to the SMTP delivery list but not (usually) listed in the message data, remaining invisible to other recipients.
  • Cc: Carbon copy; secondary recipients. Many email clients mark messages in the inbox differently depending on whether the user is in the To: or Cc: list.
  • Content-Type: Information about how the message is to be displayed, usually a MIME type.
  • Precedence: Commonly has the value “bulk”, “junk”, or “list”; used to indicate that automated “vacation” or “out of office” responses should not be returned for this mail, e.g. to prevent vacation notices from being sent to all other subscribers of a mailing list. Sendmail uses this header to affect prioritization of queued email, with “Precedence: special-delivery” messages delivered sooner. With modern high-bandwidth networks, delivery priority is less of an issue than it once was. Microsoft Exchange respects a fine-grained automatic response suppression mechanism, the X-Auto-Response-Suppress header.
  • References: Message-ID of the message that this is a reply to, the Message-ID of the message that the previous reply was a reply to, and so on.
  • Reply-To: Address that should be used to reply to the message.
  • Sender: Address of the actual sender acting on behalf of the author listed in the From: field (secretary, list manager, etc.).
  • Archived-At: A direct link to the archived form of an individual email message.

Note that the To: field is not necessarily related to the addresses to which the message is delivered. The actual delivery list is supplied separately to the transport protocol, SMTP, and may or may not have been extracted from the header content originally. The To: field is similar to the addressing at the top of a conventional letter, which is delivered according to the address on the outer envelope. In the same way, the From: field does not have to name the real sender of the email message. Some mail servers apply email authentication systems to messages being relayed. Data pertaining to the server’s activity is also part of the header, as described below.
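The distinction between the header and the delivery list can be sketched as follows. The helper envelope_recipients is a hypothetical function written for this example: it derives a delivery list from the To:, Cc:, and Bcc: fields and then removes Bcc: from the message data, which is one common way a sending client can behave. The addresses are placeholders.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(msg):
    """Collect delivery addresses from To/Cc/Bcc, then drop Bcc.

    Hypothetical helper: the returned list is what would be handed to
    the transport (SMTP) separately from the message data.
    Note that it modifies msg in place.
    """
    values = []
    for name in ("To", "Cc", "Bcc"):
        values.extend(str(v) for v in msg.get_all(name, []))
    recipients = [addr for _, addr in getaddresses(values) if addr]
    del msg["Bcc"]  # keep blind-copy recipients out of the message data
    return recipients


msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "Bob <bob@example.org>"
msg["Cc"] = "carol@example.org"
msg["Bcc"] = "dave@example.org"
msg.set_content("Meeting moved to 3 pm.\n")

recipients = envelope_recipients(msg)
print(recipients)        # the envelope: all three addresses
print(msg.as_string())   # the message data: no Bcc field
```

The envelope list could just as well be supplied by hand and contain addresses that appear nowhere in the header, which is why the To: field alone says little about where a message was actually delivered.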

SMTP defines the trace information of a message, which is also saved in the header using the following two fields:

  • Received: when an SMTP server accepts a message, it inserts this trace record at the top of the header, so the records read from the most recent hop down to the earliest.
  • Return-Path: when the delivery SMTP server makes the final delivery of a message, it inserts this field at the top of the header.
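Because each server adds its trace record at the top, reading the Received: fields in reverse gives the path in chronological order. The sketch below assumes a made-up message with two hops; the host names and timestamps are invented for illustration.

```python
from email import message_from_string

# Invented example: the final delivery server added Return-Path and the
# topmost Received field; the earlier hop appears below it.
raw = (
    "Return-Path: <alice@example.org>\n"
    "Received: from mx.example.net by mail.example.com;"
    " Tue, 2 Jan 2024 10:00:05 +0000\n"
    "Received: from client.example.org by mx.example.net;"
    " Tue, 2 Jan 2024 10:00:01 +0000\n"
    "From: alice@example.org\n"
    "To: bob@example.com\n"
    "Subject: Trace demo\n"
    "\n"
    "Hello.\n"
)

msg = message_from_string(raw)

hops = msg.get_all("Received", [])     # top to bottom: newest first
chronological = list(reversed(hops))   # oldest hop first

for number, hop in enumerate(chronological, start=1):
    print(number, hop)
```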

Other header fields that the receiving server adds at the top of the header may also be called trace fields, in a broader sense.

  • Authentication-Results: when a server carries out authentication checks, it can save the results in this field for consumption by downstream agents.
  • Received-SPF: stores results of SPF checks in more detail than Authentication-Results.
  • Auto-Submitted: used to mark automatically generated messages.
  • VBR-Info: claims VBR (Vouch by Reference) whitelisting.
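A downstream agent might read an Authentication-Results: value roughly as sketched below. The function parse_authentication_results is a hypothetical helper, and the parsing is deliberately simplified: it assumes the common "server-id; method=result ..." layout and ignores comments, quoted strings, and version tokens. The sample value is invented.

```python
def parse_authentication_results(value):
    """Return (authserv_id, {method: result}) from a field value.

    Simplified sketch: splits on ';' and reads only the leading
    'method=result' token of each part.
    """
    parts = [part.strip() for part in value.split(";")]
    authserv_id = parts[0]
    results = {}
    for part in parts[1:]:
        if not part or part == "none":
            continue  # empty segment, or no checks were performed
        method, _, result = part.split()[0].partition("=")
        results[method] = result
    return authserv_id, results


# Invented sample value for illustration.
sample_value = (
    "mx.example.com; spf=pass smtp.mailfrom=example.org;"
    " dkim=fail header.d=example.org"
)
server, checks = parse_authentication_results(sample_value)
print(server, checks)
```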
