Yes, you read that right: we at Velostrata are using our own Velostrata solution to automate the movement of servers between on-prem and the cloud. It’s a powerful use case, and it all starts with one of our key goals as a company: to deliver and maintain an enterprise-grade product that we find as valuable as our customers do. Let’s dive in and start at the beginning.
In continued pursuit of this goal, we’ve built a new, continuous integration process for each development sub-team. On the data-path sub-team, for example, each change in our software’s code goes through the following process after being committed to our code repository:
- Build process – compile the code and make sure there are no errors or warnings.
- Unit tests – Most of our classes have unit tests which validate the API, functionality, and corner cases.
- Component tests – Each component (cache, write-back, migration, etc.) has its own dedicated tests, which check specific use cases using our end-to-end data-path system. Most of the component tests are run with specific disruptions injected during the test itself, to verify durability, integrity and consistency.
- Integration tests – Long tests (each takes more than an hour) that check the stability and data integrity of each component using an end-to-end data-path system and real cloud components (S3, for example). Most of those tests are run with random disruptions.
In this blog, we’ll focus on our component tests.
In the data-path team, we have more than 20 tests that, due to some environment restrictions, cannot be run in parallel. To shorten the overall run (so each developer gets feedback on the stability of their changes as quickly as possible), we’ve divided those tests into 4 categories, which run in parallel across 4 servers.
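To make the idea of splitting the tests concrete, here is a minimal sketch of one way to balance tests across 4 groups. It is not how our categories were actually chosen: it assumes each test has a known estimated duration, and the function name and test names are hypothetical. It uses a simple greedy rule, placing the longest tests first and always adding to the group that is currently shortest.

```python
import heapq


def split_into_groups(durations, group_count=4):
    """Greedily assign each test to the group with the smallest total time.

    `durations` maps a (hypothetical) test name to its estimated minutes.
    """
    # Heap of (total_minutes, group_index); the lightest group pops first.
    heap = [(0, index) for index in range(group_count)]
    groups = [[] for _ in range(group_count)]
    # Place the longest tests first so the short ones can even things out.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, minutes in ordered:
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + minutes, index))
    return groups


if __name__ == "__main__":
    # Made-up durations, purely for illustration.
    example = {"cache": 10, "write_back": 8, "migration": 6, "resize": 4}
    print(split_into_groups(example))
```

Each group can then be assigned to its own test server, so the slowest group determines how long a developer waits for feedback.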
The test servers are regular Linux virtual machines located within our on-premises ESX environment, and we use Jenkins as a scheduler. Recently, as the number of virtual machines that are used in our offices has increased, we’ve started to suffer from false test failures and really slow test executions. These issues are due to a lack of resources (a CPU bottleneck) on the ESX host machine.
To overcome those problems without getting another ESX host, we decided to practice what we preach and enjoy the flexibility that the public cloud can offer. And, of course, we’re using our own product, Velostrata, to do that.
We evaluated the best solution architecture that would fit our needs, particularly from a performance and cost perspective, and came up with the following:
- Build servers will be run in the cloud during the “rush hours”, which are Sunday to Thursday between 9am and 6pm.
- Workload machines don’t perform a lot of IO operations so we can use our “Cost Optimized” cloud extension.
- Workload instances don’t need to manage a state so we can leverage “AWS Spot Instances” for additional cost savings. Even if a spot instance is taken back mid-test, we can simply repeat the test on a new instance so there is no risk here.
- We aim for maximum efficiency, so that all processes (including server management) are automated.
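The "rush hours" rule in the list above can be sketched as a small Python function. This is an illustration rather than an excerpt from our script: the names are hypothetical, and it assumes the time is taken from the server's local clock and that 6pm itself already counts as outside the window.

```python
from datetime import datetime, time

# datetime.weekday() numbers Monday as 0 and Sunday as 6,
# so Sunday to Thursday is {6, 0, 1, 2, 3}.
RUSH_DAYS = {6, 0, 1, 2, 3}
RUSH_START = time(9, 0)   # 9am
RUSH_END = time(18, 0)    # 6pm (assumed exclusive)


def desired_location(now):
    """Return "cloud" during rush hours and "on-prem" otherwise."""
    in_rush_hours = (
        now.weekday() in RUSH_DAYS
        and RUSH_START <= now.time() < RUSH_END
    )
    return "cloud" if in_rush_hours else "on-prem"


if __name__ == "__main__":
    print(desired_location(datetime.now()))
```

Keeping this decision in a pure function that takes the current time as an argument makes it easy to check the edge cases (Friday, Saturday, 6pm sharp) without waiting for them to come around.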
The actual implementation of this architecture was simple: we wrote a Python script (which you can download here from our Velostrata GitHub Community page) that is scheduled to run as a dedicated Jenkins job every 5 minutes. The script checks the date and time to determine whether the servers should be on-premises or in the cloud, and uses APIs (the Velostrata RESTful API, the Jenkins RESTful API and the VMware SDK) to move them accordingly.
To make Jenkins work with the in-cloud servers, we needed to create another 4 Jenkins nodes and configure them with fixed IP addresses (the Jenkins API doesn’t allow you to change a node’s IP after the node is created). With Velostrata, doing this is easy: we simply create 4 Elastic Network Interfaces (ENIs) in AWS and pass them as an argument to Velostrata’s Run In Cloud API. Velostrata attaches the given ENI to the newly created cloud instance instead of creating a new interface, so the in-cloud servers always get the same IP address.
We use AWS spot instances wherever possible to reduce costs, but Amazon can reclaim them at any time. If this happens, Velostrata will automatically update that instance’s state to “Terminated”, which will automatically initiate Velostrata’s “Move Back On-Prem” operation. Our script (which runs every five minutes) will see that the server is now on-premises when it should be in the cloud, and will initiate a new “Run In Cloud” task automatically as a result.
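The recovery behavior described above amounts to comparing where each server should be with where it actually is, on every run. Here is a minimal sketch of that comparison. It makes no real API calls: the function names, the action labels and the "task in progress" flag are all hypothetical, and in practice the current state would be read through the Velostrata and VMware APIs.

```python
def plan_action(desired, actual, task_in_progress=False):
    """Decide what to do for one server, or return None to leave it alone."""
    # Do nothing if the server is already in place or is mid-move.
    if task_in_progress or desired == actual:
        return None
    return "run_in_cloud" if desired == "cloud" else "move_back_on_prem"


def reconcile(desired, servers):
    """Map each server that needs moving to its planned action.

    `servers` maps a server name to (actual_location, task_in_progress).
    """
    actions = {}
    for name, (actual, task_in_progress) in servers.items():
        action = plan_action(desired, actual, task_in_progress)
        if action is not None:
            actions[name] = action
    return actions


if __name__ == "__main__":
    # build-2 lost its spot instance and has been moved back on-premises.
    state = {
        "build-1": ("cloud", False),
        "build-2": ("on-prem", False),
    }
    print(reconcile("cloud", state))
```

Because each run looks only at the current state, a reclaimed spot instance needs no special handling: it simply shows up as a server in the wrong place on the next pass.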
Another feature we rely on is “Auto Stop Cloud Extension”. When there are no VMs running in the cloud, the cloud extension is stopped automatically to reduce costs. As a result, the cost of this setup (including the Velostrata Cost Optimized Cloud Extension and Spot Instances) is only about $180 per month, which is significantly less than buying and maintaining new ESX hosts! On the performance side, our current ESX host came ‘back to life’, and the other VMs running on it now have much more CPU available. In addition, choosing a larger instance in the cloud helped us reduce test run time by more than 40% per instance. For reference, here is an example of the logs while our scripts run:
We hope you’ve found this look into how Velostrata is using the public cloud alongside our own software to accelerate and optimize our own testing cycles useful. If you’d like more information on how Velostrata can help you with your Dev/Test, Hybrid Cloud, or Cloud Migration Projects, don’t hesitate to drop us a line.