Open Source Mail System
This document describes an implementation of an enterprise-class email system built entirely with open source and/or freely available software. The author has worked with a number of enterprise customers implementing various systems and has sought to duplicate much of the functionality found in proprietary mail systems using open source and free software.
Many sites on the web present various pieces of this solution, and the author owes the successful implementation of this system to many sources. Rather than duplicating that information, the author provides a higher-level description of the decision points and components required to implement a system that meets the requirements specified in this document.
During the course of reviewing and exploring what the Open Source community has been doing, I became impressed with the scope and quality of what is available. My investigations grew from an initial exploration of file sharing and single sign-on into a replication of an enterprise IT organization using only Open Source software. While I was impressed with the technology and its ability to integrate, it was not necessarily a straightforward process. The information required was mostly available; however, it took significant effort to pull these various sources of information together in order to create the required solution.
While this email system was built within the context of an overall enterprise infrastructure, this document focuses on the implementation of the email system and only briefly touches on the other systems required to implement the solution. More information on these dependent systems is located in additional documents on this site. However, in order to understand some of the "why" behind this implementation, it is necessary to lay out what needed to be accomplished.
There are many different approaches to, and a number of solutions for, implementing an email system. The simplest is to use an email client and rely on someone else's mail relay and infrastructure; however, that covers only a small part of what is required for an enterprise mail system.
So what is meant by an enterprise mail system? The following list provides a brief overview of the primary requirements for building this system:
The system must provide the ability for users and clients to access their mail
The system must provide a mail relay for sending authorized users' mail to other systems
The system must be able to receive valid mail from other mail agents
The system must have security as a focus
The system must allow for user mail boxes without requiring a "shell account"
The system must allow for users to have mail aliases
The system must be able to send and receive mail from the Internet
The system must be able to send and receive mail from the local network (Intranet)
The system must be composed of open source and/or free software only
The system must allow for hosting virtual mail domains
While this encompasses the bulk of the general requirements, there are also a number of "nice to have" items that governed a number of choices, such as:
Ability to access the system using Windows Mobile devices
The ability to leverage "direct push" technology for mail delivery to mobile devices
The ability to send and receive mail both from within the Intranet and while connecting from the Internet
While a number of commercial email systems are typically found in enterprise organizations today, such as Microsoft Exchange Server, IBM Lotus Domino, and Novell GroupWise, the primary model for implementing this system is Microsoft Exchange Server, for better or for worse. This does not indicate a preference for one commercial system over another, or for commercial systems in general. Rather, Microsoft Exchange is relatively common in most of the organizations I have worked with, and in order to claim parity with commercial systems, a set of open source components should be comparable to some specific commercial system. Given that, and the author's familiarity with Microsoft Exchange, it was the chosen platform for comparison.
The good news is that even though some components the author would like to see do not seem to be available (or the author has not yet identified them), the current state of what is available for implementing an enterprise email system is very good. That is, much of the required functionality has been implemented and can be integrated successfully into a reasonable, working system.
Security is somewhat of a nebulous subject when it comes to email systems; however, my requirements are rather simple and encompass the following:
The system must provide a presence within the DMZ for access from outside the infrastructure
The system must allow for at least three levels of trust: Internet (little or no trust), DMZ (medium level of trust), and corporate or fully trusted network
The system must allow for users to have access to email without having to have a shell account on any of the servers
The system must allow for users with shell accounts to be able to use the same email system
The authorization and authentication systems must allow for integration of other components (remote access, wireless network access, shell access, etc.), while allowing authorization on a component-by-component and user-by-user basis
A user's email must be accessible both from within the corporate infrastructure and from outside the corporate infrastructure
Regarding network security, I adhere to a security principle known as a "protocol break" between the three trust zones. This principle says that any protocol used from a less secure zone into a more secure zone cannot also be used from that more secure zone into the next, even more secure zone. That is, whatever protocol is used to access resources within the DMZ cannot also be used by the systems within the DMZ to communicate back into the corporate infrastructure.
As an example, while I do allow HTTP access to email, the user accesses email via HTTP, which is the protocol allowed through the perimeter firewall into the DMZ. From the DMZ, the user is authenticated via LDAP against the secure corporate network, and mail is accessed via IMAP from the DMZ back into the corporate network. This demonstrates that the protocol used to enter the DMZ (HTTP) is not allowed from the DMZ into the corporate network.
To finish out the examples for email, you obviously have to allow SMTP from the Internet into the DMZ, as this is the protocol used for exchanging email between external email systems. I also perform LDAP authentication and authorization from the gateway SMTP servers; however, in keeping with the protocol break rule, I do not use SMTP from the DMZ into the secure environment, but instead use LMTP from the gateway email router into the enterprise email system for final delivery to the users.
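The protocol break rule is mechanical enough to check automatically. The sketch below is a hypothetical Python model, not part of the author's setup: each zone gets a numeric trust level, permitted flows are listed as (source, destination, protocol) tuples taken from the examples above, and any protocol that enters a zone from a less trusted zone and is then reused toward a more trusted zone is reported.

```python
# Hypothetical encoding of the three trust zones: a higher number means more trusted.
TRUST = {"internet": 0, "dmz": 1, "corporate": 2}

# Permitted flows as (source zone, destination zone, protocol), from the examples above.
FLOWS = [
    ("internet", "dmz", "HTTP"),
    ("internet", "dmz", "SMTP"),
    ("dmz", "corporate", "LDAP"),
    ("dmz", "corporate", "IMAP"),
    ("dmz", "corporate", "LMTP"),
]


def protocol_break_violations(flows):
    """Return flows that reuse, toward a more trusted zone, a protocol that
    entered the source zone from a less trusted zone."""
    # First pass: which protocols arrive in each zone from a less trusted one.
    inbound = {}
    for src, dst, proto in flows:
        if TRUST[src] < TRUST[dst]:
            inbound.setdefault(dst, set()).add(proto)
    # Second pass: flag any inward flow whose protocol also arrived inbound at its source.
    return [
        (src, dst, proto)
        for src, dst, proto in flows
        if TRUST[src] < TRUST[dst] and proto in inbound.get(src, set())
    ]


print(protocol_break_violations(FLOWS))  # []
# Relaying SMTP straight through the DMZ would break the rule:
print(protocol_break_violations(FLOWS + [("dmz", "corporate", "SMTP")]))
# [('dmz', 'corporate', 'SMTP')]
```

Keeping the permitted flows in one list like this makes it easy to re-check the rule whenever a new service is exposed.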
The solution presented in this paper owes significant thanks to many different sources of information across the Internet. However, the issue the author found when implementing this system was a general lack of information at a level above "this is my configuration file and here is a sentence on one or two of the options I decided to use". As a result, while the information is available, the author believes that a high-level overview, one that shows all the components and how they are integrated, would significantly ease the adoption of such an open source solution for enterprise mail systems.
There are a number of components that must be successfully integrated in order to provide a complete enterprise solution. Some of these components receive greater scrutiny in this document; others are only briefly described, either because sufficient information is already available elsewhere or because the effort required to integrate them, and their contribution to the overall solution, are minimal in comparison to other components.
A quick note: the author's choice of one component over another is generally not an endorsement of that component, but rather reflects either that the component was easier to integrate with other components or that the author was unaware of alternatives at the time the system was put together. The choice of components may also result from the specific requirements imposed by the author or by the solution the author was attempting to replicate.
Anyone undertaking the integration of a significant set of components should start by documenting their requirements and ensuring that the components selected best meet those specific requirements, which may or may not be similar to the requirements specified in this document. Discussions of this product being superior to that product may be interesting, or at least entertaining, but the author would propose that almost any solution that rises to prominence is by nature a good solution. The real question, according to this author, is how well a given product or solution meets one's specific requirements; a debate over "this is better than that" in the absence of such a context is primarily an exercise in entertainment.
Now that we have this behind us, the rest of this section provides a general overview of the components used to create the solution. The following sections provide insight into these components:
Linux (Gentoo specifically, but the specific flavor is generally not significant)
Apache HTTP Server
A number of requirements remain unfilled, and a number of issues remain with the current implementation; however, the solution continues to be refined as the Open Source community pushes the various components forward. There are also a number of very good products that could be substituted and would still yield a high-quality solution. Several components were evaluated but not used in the final solution, and I will provide some background on why this specific set of components was implemented, so that readers with differing requirements may gain some insight into how these other offerings might better suit them. The following components were evaluated; each provides a high-quality solution or alternative and is worth evaluating when seeking a comprehensive solution for a slightly or significantly different set of requirements:
The following sections provide greater insights into each of these components.
Gentoo Linux has become my OS of choice for a number of reasons. First, I love the fact that it is a source-based distribution. I don't know if I would recommend Gentoo for enterprise use, as I am not sure what it would cost in actual maintenance and operations, but as I work on it more or less as a hobby, and it is my platform of choice for my local infrastructure, it is quite interesting.
However, the implementation of this email system is not dependent on any specific Linux variant and should be applicable to just about any distribution. There may be notable differences on BSD or other systems based on differing *nix variants (typical Linux is really neither System V nor BSD); even so, this guide focuses on the architecture rather than the "brass tacks" of any specific product or implementation.
Postfix fits nicely into this solution both as the mail relay of choice for the DMZ and as the local MTA for all the machines in both the DMZ and the local network. Its focus on security and its configurability, especially with regard to security and "lock down", are quite good. Its only drawback, in the author's opinion, is its rather cryptic documentation and dizzying array of potential configurations (which we hope to help with through this article). Additionally, while the Postfix documentation describes a number of installation variations, the author has found few "complete" examples of how to set up the server for each situation.
Even so, Postfix is definitely the author's MTA of choice for this implementation.
Additionally, Postfix provides LDAP integration for address rewriting (for aliasing or virtual hosting) and is highly configurable with regard to filtering, delivery rules, delivery mechanisms, and security. However, qmail or recent versions of Sendmail each have their own strengths and weaknesses and could easily have been substituted for Postfix within the described system and architecture.
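To give a feel for how the DMZ gateway's role translates into Postfix settings, the sketch below renders a minimal main.cf fragment from Python. The parameter names (inet_interfaces, mynetworks, smtpd_relay_restrictions, relay_domains, relay_transport, and the TLS settings) are standard Postfix parameters, but the addresses, domain, certificate paths, and the internal hub and mail store hosts are hypothetical. Note that smtpd_relay_restrictions requires Postfix 2.10 or later; older releases put the same checks in smtpd_recipient_restrictions.

```python
def gateway_main_cf(public_ip, dmz_ip, relay_domain, mailstore_ip, hub_ip):
    """Render a hypothetical main.cf fragment for the DMZ mail gateway."""
    settings = [
        # Listen on the public address (SMTP from the Internet) and the DMZ
        # address (outbound relay from the internal mail hub).
        ("inet_interfaces", f"{public_ip}, {dmz_ip}"),
        # Only loopback and the internal mail hub may relay outbound mail.
        ("mynetworks", f"127.0.0.0/8, {hub_ip}/32"),
        ("smtpd_relay_restrictions", "permit_mynetworks, reject_unauth_destination"),
        # Accept mail for our domain and hand it to the mail store over LMTP on port 24,
        # which keeps SMTP from crossing into the corporate network (the protocol break).
        ("relay_domains", relay_domain),
        ("relay_transport", f"lmtp:inet:[{mailstore_ip}]:24"),
        # Opportunistic TLS in both directions: used when offered, never required.
        ("smtpd_tls_security_level", "may"),
        ("smtp_tls_security_level", "may"),
        ("smtpd_tls_cert_file", "/etc/postfix/tls/gateway.crt"),
        ("smtpd_tls_key_file", "/etc/postfix/tls/gateway.key"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings)


print(gateway_main_cf("203.0.113.1", "10.1.1.10", "example.com", "192.168.1.20", "192.168.1.25"))
```

Generating the fragment from a few parameters is only a convenience; the same lines could be maintained by hand in main.cf.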
OpenLDAP is used for a number of pieces of the implementation because of its ability to provide a consolidated configuration mechanism. A number of extensions need to be made to the LDAP schema, which will be documented and detailed in this document. The primary issue with leveraging LDAP in this implementation was integrating LDAP sensibly into the various components without excessive duplication of information and with minimal deviation from the "packaged" schemas. However, the author has spent considerable time working with the various components and with the LDAP schema to provide a consistent and logical integration of all the components.
More information on the specific implementation details and requirements for LDAP integration with the various products is detailed later in this document.
DBMail was chosen for its ability to store email in a relational database rather than in either mbox or maildir format. Additionally, DBMail provides an LMTP daemon, which met our need for the required protocol break between the Internet and the corporate network.
DBMail also integrates with LDAP for authentication and authorization of local email users, which provides a central repository for email-related administration.
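As a rough illustration of the Postfix side of the LDAP integration, the sketch below renders a lookup file in Postfix's ldap_table(5) format (server_host, search_base, version, start_tls, query_filter, result_attribute are real ldap_table parameters). The server URL, search base, and especially the mailAlias attribute are hypothetical; the actual attribute names depend on the schema extensions your directory uses.

```python
def ldap_lookup_cf(server_url, search_base, query_attr, result_attr):
    """Render a Postfix ldap_table(5) file mapping an address to a result attribute."""
    lines = [
        f"server_host = {server_url}",
        f"search_base = {search_base}",
        # StartTLS requires LDAP protocol version 3.
        "version = 3",
        "start_tls = yes",
        # Postfix substitutes %s with the full address being looked up.
        f"query_filter = ({query_attr}=%s)",
        f"result_attribute = {result_attr}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical directory layout and alias attribute; adjust to your schema extensions.
aliases_cf = ldap_lookup_cf(
    "ldap://ldap.example.com:389",
    "ou=people,dc=example,dc=com",
    "mailAlias",
    "mail",
)
print(aliases_cf)
```

Saved as, say, /etc/postfix/ldap-aliases.cf, such a file would be referenced from main.cf with `virtual_alias_maps = ldap:/etc/postfix/ldap-aliases.cf`, so that an alias address is rewritten to the user's primary mail address.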
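To show what the gateway-to-store LMTP hop looks like from a client's point of view, here is a small sketch using Python's standard smtplib.LMTP class, which speaks LHLO instead of EHLO. The mail store host is hypothetical, and port 24 is taken from the text (smtplib.LMTP otherwise defaults to port 2003). In the described system Postfix performs this hop itself; the sketch is only useful for testing the LMTP listener by hand.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical address of the corporate mail store's LMTP listener.
MAILSTORE_HOST = "192.168.1.20"
# Port 24, as in the text; smtplib.LMTP otherwise defaults to 2003.
MAILSTORE_PORT = 24


def build_message(sender, recipient, subject, body):
    """Build a simple single-recipient message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def deliver_via_lmtp(msg, host=MAILSTORE_HOST, port=MAILSTORE_PORT):
    """Hand one message to the mail store over LMTP and return refused recipients."""
    # smtplib.LMTP greets with LHLO instead of EHLO; the rest of the dialogue mirrors SMTP.
    with smtplib.LMTP(host, port) as lmtp:
        return lmtp.send_message(msg)


msg = build_message("bob@example.org", "alice@example.com", "Hello", "Delivered over LMTP.")
# deliver_via_lmtp(msg)  # requires a reachable LMTP listener
```

The sketch sends one recipient per transaction because LMTP returns a separate status for each accepted recipient after DATA.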
The overall architecture is rather simple. It is driven by a view of the world as a three-level trust system. The most trusted component of our system, the one we hope we have the greatest control over, is our corporate network. It is protected from the less trusted world by a firewall. Between our corporate network and the rest of the world lies the DMZ. While we feel confident that we can exert some control over the DMZ, and the DMZ is protected by a second firewall, it is by its nature more vulnerable than our corporate network. Outside of our second firewall is the Internet, over which we have absolutely no control.
This gives us a picture of the world with three zones: our corporate network, which is highly secured and tightly controlled; the DMZ, which is allowed to touch the rest of the world but over which we still want to maintain some control; and the Internet, over which we have no control.
On top of these three trust zones are layered a number of other components: the network and infrastructure, the protocols used for communication between the solution components, and the applications themselves that make up our email solution.
The network architecture is driven by the three-level trust system described earlier. The Internet and corporate network architectures are the simplest, so they are presented first, leaving the DMZ network architecture to be explained and analyzed in greater detail last.
The Internet comprises both routable and unroutable addresses. Routable addresses are required so that systems maintained by different organizations can interoperate and communicate with each other. Unroutable addresses are for use within a given infrastructure, to enable network communication within a limited system. NAT and masquerading allow systems within a private network (unroutable in the sense of the broader Internet) to participate in the broader Internet. As such, in order to allow communication to and from the Internet, at a minimum the mail gateways must have routable addresses.
Within my corporate or secure infrastructure, I maintain one or more private networks that are never addressable from the Internet, as they are protected behind two firewalls (one between the DMZ and the Internet and the other between the DMZ and the corporate network). No communication is allowed into the corporate network from the Internet unless a machine in the corporate network initiates the conversation. If resources within the corporate network need to access information on the Internet, they are NATed at least once (and in my current infrastructure, twice) before being released into the "wild" of the Internet.
For the sake of discussion, let's call out some network numbers. Suppose I have been allocated a block of publicly addressable Internet addresses, say 203.0.113.0/29, which means that I have 203.0.113.1-203.0.113.6 as my available addresses (203.0.113.0 and 203.0.113.7 are reserved as the network and broadcast addresses; see other resources on subnetting). Let's say I assign 203.0.113.1 to my primary mail gateway. I would assign this address to a machine in the DMZ and configure my firewall to allow network connections to this machine from the Internet for the SMTP protocol (TCP port 25).
Further, for the sake of this discussion, let's assume that I use 192.168.1.0/24 for my private corporate network. That is, all machines within my network that do not lie in my DMZ will have IP addresses within this block, which is a private address block and should not be routed by any router within the public Internet address space. Within my private network(s), my routers will route these addresses, but my border routers will not allow them to "leak" into the public Internet, and even if they did, properly configured routers would not route them any further.
So we have a server that can be contacted by other email servers, and we have our private network. Now to the DMZ. Within the DMZ, I usually use a completely different private address block. That is, while I could just create another subnet within the 192.168 address block, I usually use a subnet within the 10.0.0.0/8 address block. You could switch this around, depending on how many private addresses your infrastructure requires. Let's assume that I have allocated the 10.1.1.0/24 subnet for use within the DMZ. This allows communication from the corporate (again, private) network to the DMZ and, firewall rules permitting, from the DMZ into the corporate subnet, but what about our gateway mail server?
For the gateway mail server, I assign two addresses: the public address 203.0.113.1 and an address within the private DMZ block, such as 10.1.1.10. I set up all servers within the DMZ to use this (usually called "dual-homed") networking scheme. Further, I only allow communication between machines within the DMZ on the private addresses, rather than the public Internet addresses. Firewall rules additionally filter the packets that are allowed both into and out of the DMZ to the Internet.
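The subnet arithmetic behind this addressing plan can be checked with Python's standard ipaddress module. In this sketch 203.0.113.0/29 is a hypothetical stand-in for the public allocation (it is an RFC 5737 documentation range, not a real assignment); the private blocks and the gateway's DMZ address follow the text.

```python
import ipaddress

# Hypothetical stand-in: 203.0.113.0/29 is a documentation range (RFC 5737),
# not a real allocation; the private blocks match the text.
public_block = ipaddress.ip_network("203.0.113.0/29")
corporate = ipaddress.ip_network("192.168.1.0/24")
dmz = ipaddress.ip_network("10.1.1.0/24")

# hosts() skips the network and broadcast addresses, leaving six usable hosts in a /29.
usable = list(public_block.hosts())
print(f"network {public_block.network_address}, broadcast {public_block.broadcast_address}")
print(f"usable: {usable[0]} - {usable[-1]} ({len(usable)} addresses)")

# The dual-homed gateway: one public address, one private DMZ address.
gateway = {
    "public": ipaddress.ip_address("203.0.113.1"),
    "dmz": ipaddress.ip_address("10.1.1.10"),
}
print(gateway["public"] in public_block, gateway["dmz"] in dmz)  # True True

# Both internal blocks are RFC 1918 space and must not overlap each other.
print(corporate.is_private, dmz.is_private, corporate.overlaps(dmz))  # True True False
```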
This network architecture allows me, for instance, to make a border email gateway contactable via SMTP from the Internet while still allowing administrators to SSH into the machine using its private network address (10.1.1.10). That is, I configure each service individually and specify on which address it is available, so that while SSH is allowed from the corporate network or DMZ, it is not allowed from the public Internet.
As this paper is focused on email and not network architecture, I will only note that there are ways to enable SSH access from outside the corporate network, but they require stronger authentication and authorization checks than SSH alone provides. For instance, in order to SSH into the DMZ from the Internet, a user must first authenticate and establish a secure network channel, such as a VPN, before being allowed to SSH into the DMZ.
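Per-address service binding is easy to audit once the listen addresses are written down. The following hypothetical sketch records which addresses each service on the dual-homed gateway listens on and flags any internal-only service (here just SSH) bound to an address outside the DMZ and corporate networks. The listener table is illustrative, not taken from a real host; a wildcard bind such as 0.0.0.0 is caught too, since it falls outside both internal networks.

```python
import ipaddress

# Internal networks from the text.
INTERNAL_NETS = [
    ipaddress.ip_network("10.1.1.0/24"),     # DMZ
    ipaddress.ip_network("192.168.1.0/24"),  # corporate
]

# Hypothetical listen addresses for services on the dual-homed gateway.
LISTENERS = {
    "smtp": ["203.0.113.1", "10.1.1.10"],
    "ssh": ["10.1.1.10"],
}

# Services that must never listen on a public address.
INTERNAL_ONLY = {"ssh"}


def exposed_internal_services(listeners):
    """Return (service, address) pairs where an internal-only service listens
    on an address outside the DMZ and corporate networks."""
    exposed = []
    for service, addresses in listeners.items():
        if service not in INTERNAL_ONLY:
            continue
        for addr in addresses:
            ip = ipaddress.ip_address(addr)
            if not any(ip in net for net in INTERNAL_NETS):
                exposed.append((service, addr))
    return exposed


print(exposed_internal_services(LISTENERS))  # []
print(exposed_internal_services({"ssh": ["0.0.0.0"]}))  # [('ssh', '0.0.0.0')]
```

On the gateway itself, the corresponding settings would be an OpenSSH `ListenAddress 10.1.1.10` line in sshd_config and the Postfix `inet_interfaces` parameter.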
For the purposes of describing the protocols used within the solution architecture, we again fall back to our three levels of trust and our principle of "protocol breaks" described earlier. For email, we allow HTTPS and SMTP into the DMZ from the Internet. HTTPS provides secure access to email from outside the corporate infrastructure, as you don't want your corporate secrets flying around the Internet unencrypted. SMTP is obviously required in order to send and receive email from other organizations. More specifically, we prefer ESMTP with TLS-encrypted sessions when available, but do not require them for communication with other systems.
From within the DMZ, we allow LDAP (with TLS, port 389) and LMTP (port 24) into the corporate network. LDAP is currently used for authentication and authorization of remote users accessing mail via HTTPS, but it also allows for address verification and for user authentication and authorization for mail relay if required. LMTP is used from the email gateway to the mail store for local delivery. LDAP can also be used for address rewriting for aliases or virtual mail domains.
Within the corporate network, SMTP is used for mail relay between servers, and IMAP is used for access to users' mail from their email client of choice. Since a SQL database is used for mail storage, the local mail delivery agent talks to MySQL (TCP port 3306) locally; this is worth noting, especially if scale-out is required within the local mail delivery and mail store subsystems, since that database traffic would then cross the network. Outgoing mail is sent via SMTP from the user's client to a mail hub within the corporate infrastructure and then relayed outbound via the mail gateway to the rest of the Internet.
This section details key decision points that must be considered and decided upon by the architect(s) and implementation team. The author presents his views on the various trade-offs and why particular implementation directions were chosen. Given differing requirements or operational parameters, among other considerations, the author might well have made different decisions; the answers provided here should therefore serve as a starting point, and every distinct implementation may reach a different conclusion or find a better approach based on its specific needs. That is, there are few "right" answers and generally only a few "good" answers, which depend on many factors.
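The protocol flows described in this section can be summarized as a default-deny rule table. The sketch below is a plain Python model rather than firewall syntax; the HTTPS (443) and IMAP (143) port numbers are the usual defaults and are assumptions where the text does not state them, and the DMZ-to-corporate IMAP entry reflects the webmail access described in the protocol break example.

```python
# Allowed (source zone, destination zone, TCP port) triples; everything else is denied.
ALLOWED = {
    ("internet", "dmz", 25): "SMTP from other mail systems",
    ("internet", "dmz", 443): "HTTPS webmail",
    ("dmz", "corporate", 389): "LDAP with TLS (StartTLS)",
    ("dmz", "corporate", 24): "LMTP to the mail store",
    ("dmz", "corporate", 143): "IMAP from the webmail front end",
    ("corporate", "corporate", 25): "SMTP between internal servers",
    ("corporate", "corporate", 143): "IMAP from mail clients",
    ("corporate", "corporate", 3306): "MySQL mail store",
    ("corporate", "dmz", 25): "outbound relay via the mail gateway",
}


def is_allowed(src, dst, port):
    """Default deny: a flow passes only if it is explicitly listed."""
    return (src, dst, port) in ALLOWED


for flow in [("internet", "dmz", 25), ("internet", "corporate", 143), ("dmz", "corporate", 25)]:
    print(flow, "allow" if is_allowed(*flow) else "deny")
```

The last flow is denied on purpose: SMTP enters the DMZ from the Internet, so it may not continue into the corporate network.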
Again, this guidance should be used as just that, "guidance" and most assuredly should not be taken as "the only right answer"! Carefully consider your situation and use these items to guide your own decision making process based upon your unique environment, needs, and constraints.
Most "traditional" Unix systems store email as files on the file system. There have basically been two models for storing email as files: the "mbox" and the "maildir" formats. While these have served their purposes, and many very large email systems are built on them, most "modern" mail systems leverage a relational database system of some kind for storing email.
Without delving too deeply into the details, I wanted my mail system to use a relational database backend rather than traditional file-system-based storage. I am sure there are security and scalability arguments on both sides; however, my decision was based less on these issues than on the fact that most modern mail systems outside the Unix world have moved away from file-based storage for a number of reasons.
One note of caution: when choosing database rather than file-based mail storage, you significantly limit your options in other areas of the architecture. Simply put, there is greater and broader support for both "mbox" and "maildir" among the various components of the architecture than there is for other storage formats.
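As an illustration of what relational mail storage offers, here is a deliberately tiny schema using Python's built-in sqlite3 module. It is a hypothetical sketch, not DBMail's actual schema (which is considerably richer); the table and column names are invented for the example. The point is that a question such as "which messages in this mailbox are unread?" becomes a query rather than a directory scan or an mbox parse.

```python
import sqlite3

# Hypothetical, minimal schema for illustration only.
SCHEMA = """
CREATE TABLE mailboxes (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (owner, name)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    mailbox_id INTEGER NOT NULL REFERENCES mailboxes(id),
    sender TEXT NOT NULL,
    subject TEXT,
    raw BLOB NOT NULL,
    seen INTEGER NOT NULL DEFAULT 0
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def deliver(conn, owner, sender, subject, raw, mailbox="INBOX"):
    """Store one message, creating the mailbox on first delivery."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO mailboxes (owner, name) VALUES (?, ?)",
            (owner, mailbox),
        )
        (mailbox_id,) = conn.execute(
            "SELECT id FROM mailboxes WHERE owner = ? AND name = ?", (owner, mailbox)
        ).fetchone()
        conn.execute(
            "INSERT INTO messages (mailbox_id, sender, subject, raw) VALUES (?, ?, ?, ?)",
            (mailbox_id, sender, subject, raw),
        )


def unread_subjects(conn, owner, mailbox="INBOX"):
    """List subjects of unread messages, oldest first."""
    rows = conn.execute(
        """SELECT m.subject FROM messages m
           JOIN mailboxes b ON b.id = m.mailbox_id
           WHERE b.owner = ? AND b.name = ? AND m.seen = 0
           ORDER BY m.id""",
        (owner, mailbox),
    )
    return [subject for (subject,) in rows]


store = open_store()
deliver(store, "alice", "bob@example.org", "Hello", b"raw RFC 5322 message")
print(unread_subjects(store, "alice"))  # ['Hello']
```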