My goal was to build a resilient Rancher environment that is self-healing.
- Rancher Server: A single server with a backup AMI I can use to replace it if needed. The database is on RDS with Multi-AZ replication. (I tried full HA, but it would not remain stable and had certificate issues, among other problems.) See this page for the diagram.
- AMI for my base image: This image has my preferred version of Docker, my registry certificate, the Gluster driver, and the AWS CLI tools.
- Launch Configurations (2): one for my log server cluster and one for what I am calling my RancherPool. (This is where the majority of my containers live.) Both are configured to mount the Gluster volume, change the SSH port to 2222, and install the Rancher agent with the correct label for each group.
- AutoScaling Groups (2): one for my Graylog server cluster and one for the RancherPool. My RancherPool ASG starts with 3 instances and scales up to 5 according to CPU load. The LogServer one is set to stay at 3.
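To make the launch configuration and ASG settings above a bit more concrete, here is a minimal Python sketch of how the per-group user-data script and the group sizes could be expressed. It is not my actual configuration: the label values (`role=logserver`, `role=rancherpool`), the registration URL, the Gluster source, the mount point, and the untagged `rancher/agent` image are all placeholder assumptions, and treating 3 as the RancherPool minimum is also an assumption. The size keys simply mirror the `MinSize`, `MaxSize`, and `DesiredCapacity` parameter names used by the AWS Auto Scaling API.

```python
# Sketch only: every label, URL, path, and image name here is a placeholder.

def render_user_data(host_label, registration_url, gluster_source,
                     mount_point="/mnt/gluster", agent_image="rancher/agent"):
    """Build the bash user-data script for one launch configuration."""
    lines = [
        "#!/bin/bash",
        "set -e",
        "# Mount the shared Gluster volume (the client is baked into the AMI).",
        f"mkdir -p {mount_point}",
        f"mount -t glusterfs {gluster_source} {mount_point}",
        "# Move sshd to port 2222.",
        "sed -i -E 's/^#?Port .*/Port 2222/' /etc/ssh/sshd_config",
        "service ssh restart || service sshd restart",
        "# Register the host with Rancher; the label decides which stacks land here.",
        "docker run -d --privileged "
        f"-e CATTLE_HOST_LABELS='{host_label}' "
        "-v /var/run/docker.sock:/var/run/docker.sock "
        f"{agent_image} {registration_url}",
    ]
    return "\n".join(lines) + "\n"


# Group sizes as described above. MinSize for RancherPool is assumed to be 3.
ASG_SIZES = {
    "RancherPool": {"MinSize": 3, "MaxSize": 5, "DesiredCapacity": 3},
    "LogServer": {"MinSize": 3, "MaxSize": 3, "DesiredCapacity": 3},
}


if __name__ == "__main__":
    print(render_user_data(
        host_label="role=rancherpool",
        registration_url="https://rancher.example.com/v1/scripts/TOKEN",
        gluster_source="gluster1.example.com:/shared",
    ))
```

Generating the script from one function keeps the two launch configurations identical except for the host label, which is the only thing that should differ between the groups.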
On the Rancher side, I have my application stacks, which are configured to run on the appropriate hosts.
- ELK stack: (es, logstash, kibana), all configured to run on my LogServer hosts.
- Jobber: runs on one of my Gluster servers. (This backs up my Gluster volume to S3 every hour.) Link to Jobber project on GitHub.
- Registry: (from the catalog, but customized) runs on the RancherPool instances and stores its data on the Gluster volume.
- A load balancer: I am using the Advanced routing options, so that I can route based on DNS names.
- Route53: This provides me with DNS names for any container that exposes a port. I set up a Route53 zone just for Rancher. I can then use CNAMEs in my production zone to make the URLs nice and simple.
- Logspout: to gather the logs and send them to Logstash. (runs on all hosts)
- Prometheus: Produces some pretty graphs. (Runs on any/all instances.)
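The host placement described in this list can be sketched as a set of Rancher scheduling labels. The snippet below is only an illustration in Python: `io.rancher.scheduler.affinity:host_label` and `io.rancher.scheduler.global` are the Rancher 1.x (Cattle) scheduler labels, but the host label values (`role=logserver`, `role=rancherpool`) and the service names are assumptions made up for the example, not taken from my compose files.

```python
# Sketch only: host label values and service names are hypothetical.

def scheduling_labels(host_label=None, run_on_all_hosts=False):
    """Return the service labels that pin a service to a group of hosts."""
    labels = {}
    if run_on_all_hosts:
        # Global services get one container on every matching host.
        labels["io.rancher.scheduler.global"] = "true"
    if host_label:
        # Only schedule onto hosts that carry this label.
        labels["io.rancher.scheduler.affinity:host_label"] = host_label
    return labels


PLACEMENT = {
    # ELK services stay on the LogServer hosts.
    "elasticsearch": scheduling_labels(host_label="role=logserver"),
    "logstash": scheduling_labels(host_label="role=logserver"),
    "kibana": scheduling_labels(host_label="role=logserver"),
    # The registry lives on the RancherPool hosts.
    "registry": scheduling_labels(host_label="role=rancherpool"),
    # Logspout runs on every host.
    "logspout": scheduling_labels(run_on_all_hosts=True),
}


if __name__ == "__main__":
    for service, labels in PLACEMENT.items():
        print(service, labels)
```

The labels have to match whatever host label the Rancher agent was registered with when the instance booted, which is why the launch configuration for each group sets its own label.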