At Eyeview, as at any other startup, we look closely at our technology costs. As VP of Architecture, I am responsible for forecasting our technology costs for the upcoming year, expressed as a percentage of revenue, and for tracking our spending to make sure we meet that forecast. In this post, I will share what we have learned about cost analysis, tips for cost efficiency across the AWS ecosystem, and the challenges we faced along the way.
The Starting Point
If you want to get anywhere with cost analysis, you need to start by tagging. Cost allocation tags are key-value pairs you can attach to your AWS resources so that you can slice and dice the cost later on. AWS generates some of these tags automatically, but user-defined tags are the focus here. We generally use a single tag that represents the resource's logical “service”: for EC2 it is the application name, for DynamoDB it is the table name, and so on. We then group these tag values in different logical ways for different types of reports.
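To make the idea concrete, here is a minimal Python sketch of what tagging buys you: summing cost line items by a single service tag. It assumes the tag key is called `service` and that each line item is a dict with a `cost` and a `tags` mapping; both are hypothetical, not Eyeview's actual schema. Untagged items land in their own bucket so gaps in tagging stay visible.

```python
from collections import defaultdict

# Hypothetical tag key: the post only says a single tag names the
# resource's logical "service", not what the key is called.
SERVICE_TAG = "service"


def cost_by_service(line_items, tag_key=SERVICE_TAG):
    """Sum cost per service tag value; untagged items get their own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        service = item.get("tags", {}).get(tag_key) or "untagged"
        totals[service] += item["cost"]
    return dict(totals)


items = [
    {"cost": 120.0, "tags": {"service": "bidder"}},            # EC2: application name
    {"cost": 35.5, "tags": {"service": "ad-delivery-table"}},  # DynamoDB: table name
    {"cost": 10.0, "tags": {}},                                # resource nobody tagged
    {"cost": 30.0, "tags": {"service": "bidder"}},
]
print(cost_by_service(items))
```

A growing "untagged" total is usually the first thing worth chasing, since anything in it cannot be attributed in later reports.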
The Basics of Cost Analysis
AWS itself provides tools for analyzing your costs. The one Eyeview uses most is AWS Cost Explorer, which recently got a fresh look. Cost Explorer is a great ad hoc tool that lets you slice and dice costs by application, service and usage. Filtering by a cost allocation tag value and grouping by usage type is an insightful way to understand what each service costs.
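If you prefer to script the same view, the Cost Explorer API exposes it through `GetCostAndUsage`. The sketch below only builds the keyword arguments for boto3's `get_cost_and_usage` call, filtered by one tag value and grouped by the `USAGE_TYPE` dimension; the tag key `service`, the value `bidder` and the dates are hypothetical placeholders.

```python
import json


def cost_explorer_request(tag_key, tag_value, start, end):
    """Build keyword arguments for Cost Explorer's GetCostAndUsage.

    Filters to one cost allocation tag value and groups by usage type.
    Dates are "YYYY-MM-DD" strings; End is exclusive.
    """
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": tag_key, "Values": [tag_value]}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


# Hypothetical tag key/value and date range.
request = cost_explorer_request("service", "bidder", "2018-01-01", "2018-02-01")
print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, you would pass it as `boto3.client("ce").get_cost_and_usage(**request)` and read the per-usage-type amounts from the `Groups` list in each entry of the response's `ResultsByTime`.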
Within Cost Explorer you will find other helpful tools, including reports on Reserved Instance coverage and utilization. Reserved Instances (RIs) are a great way to save on EC2 costs, second only to Spot Instances. If you run on-demand EC2 instances and forecast that you will keep using them over the next year, Reserved Instances are the way to go. The coverage report tells you how much of your on-demand workload is covered by Reserved Instance hours, while the utilization report tells you how much of the Reserved Instance hours you purchased are actually being used by your instances. Combining the two reports shows which instance types you should buy reservations for.
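The arithmetic behind the two reports is simple enough to sketch. Coverage divides the reserved hours applied to running instances by all running hours; utilization divides the same applied hours by the reserved hours purchased. The rule for picking purchase candidates below (low coverage, and either no reservations yet or existing ones nearly fully used) and its thresholds are my own illustrative assumptions, and the sketch ignores real-world details such as instance size flexibility.

```python
from dataclasses import dataclass


@dataclass
class InstanceTypeHours:
    instance_type: str
    running_hours: float       # hours instances of this type actually ran
    ri_hours_purchased: float  # reserved hours bought for this type
    ri_hours_applied: float    # reserved hours matched to running instances

    @property
    def coverage(self) -> float:
        # Share of running hours paid for by reservations.
        return self.ri_hours_applied / self.running_hours if self.running_hours else 0.0

    @property
    def utilization(self) -> float:
        # Share of purchased reserved hours that were actually used.
        return self.ri_hours_applied / self.ri_hours_purchased if self.ri_hours_purchased else 0.0


def purchase_candidates(rows, max_coverage=0.8, min_utilization=0.95):
    """Instance types that are under-covered and whose existing RIs are
    (nearly) fully used. Thresholds are illustrative assumptions."""
    return [
        r.instance_type
        for r in rows
        if r.coverage < max_coverage
        and (r.ri_hours_purchased == 0 or r.utilization >= min_utilization)
    ]


rows = [
    InstanceTypeHours("c5.xlarge", 7200, 3600, 3600),  # half covered, RIs fully used
    InstanceTypeHours("r4.large", 1440, 1440, 720),    # RIs only half used: don't buy more
    InstanceTypeHours("m5.large", 720, 720, 720),      # already fully covered
    InstanceTypeHours("t2.medium", 1000, 0, 0),        # no reservations yet
]
print(purchase_candidates(rows))
```

The utilization check matters as much as the coverage check: buying more reservations for a type whose existing RIs already sit idle only adds waste.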
AWS can also deliver in-depth cost reports directly to an S3 bucket. We did not find the raw reports useful on their own and concluded that other services can analyze them much better, which leads me to the next set of tools.
The last native AWS tool you will probably want to take a closer look at is AWS Trusted Advisor, although you need the Business support plan to use its cost optimization section. Trusted Advisor highlights inefficiencies in EC2 utilization and also makes RI recommendations.
Taking Cost Analysis Further
While Cost Explorer is not bad, we reached a point where we needed more insight. Specifically, we needed (a) team-level reports and (b) a financial view of COGS (cost of goods sold) vs. OPEX (operating expenses).
Team-level insights are important to us because each engineering team uses a different part of our AWS infrastructure, with different usage patterns. Our data team uses Kinesis Streams, along with EC2 instances running the Kinesis Client Library (KCL), to populate S3, while our bidding team consumes the same streams to populate ad delivery data in DynamoDB. To understand how our architecture is evolving from a cost point of view, it helps to trace cost changes back to their source and analyze which team's architecture affected the bottom line. That analysis can then help drive the next architectural decision.
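A rough sketch of the team-level rollup: map each service tag value to its owning team, then compare two months to see which team's costs moved. The service names, team names and figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from service tag values to owning teams.
SERVICE_TO_TEAM = {
    "kcl-s3-loader": "data",         # EC2 fleet running KCL, writing to S3
    "bidder": "bidding",
    "ad-delivery-table": "bidding",  # DynamoDB table
}


def cost_by_team(service_costs, mapping=SERVICE_TO_TEAM):
    """Roll per-service costs up to teams; unmapped services are flagged."""
    totals = defaultdict(float)
    for service, cost in service_costs.items():
        totals[mapping.get(service, "unassigned")] += cost
    return dict(totals)


def team_deltas(previous, current):
    """Change in cost per team between two periods (missing teams count as 0)."""
    teams = sorted(set(previous) | set(current))
    return {team: current.get(team, 0.0) - previous.get(team, 0.0) for team in teams}


jan = {"kcl-s3-loader": 800.0, "bidder": 1300.0, "ad-delivery-table": 400.0}
feb = {"kcl-s3-loader": 820.0, "bidder": 1500.0, "ad-delivery-table": 400.0, "legacy-cron": 50.0}
print(team_deltas(cost_by_team(jan), cost_by_team(feb)))
```

Keeping the mapping separate from the tags means a service can change owners without retagging anything.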
An analysis of COGS vs. OPEX is important from a financial point of view for two reasons. First, it helps you understand what drives costs and whether the company's technology scales from a cost perspective. Second, passing an audit often requires that all costs incurred directly in service of revenue (including the associated AWS costs) are recorded as COGS. The balance of the costs should be shifted “below the line” into OPEX.
There are a few ways to achieve both of these. One is to create Cost Explorer reports with the right filters that include all of the relevant tag values, but Cost Explorer has some limitations that make this less than ideal. Another is to add a cost allocation tag for each of the two groupings. That works but is not very flexible: a resource cannot be split across more than one group, and a new tag only affects costs going forward, not retroactively.
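A rule-based approach avoids both problems: keep the mapping outside the tags and apply it to historical per-service costs, so a rule change recomputes every past month and a shared resource can be split by fraction. The minimal sketch below uses hypothetical service names and split ratios; it illustrates the idea rather than how any particular tool implements it.

```python
import math
from collections import defaultdict

# Hypothetical rules: each service maps to (category, fraction) pairs summing to 1.
COGS_OPEX_RULES = {
    "bidder": [("COGS", 1.0)],
    "kinesis-events": [("COGS", 0.75), ("OPEX", 0.25)],  # shared stream, split by fraction
    "ci-runners": [("OPEX", 1.0)],
}


def validate_rules(rules):
    """Fail fast if a service's fractions do not add up to 1."""
    for service, splits in rules.items():
        total = sum(fraction for _, fraction in splits)
        if not math.isclose(total, 1.0):
            raise ValueError(f"fractions for {service!r} sum to {total}, not 1")


def allocate(service_costs, rules, default="unallocated"):
    """Split each service's cost across categories according to the rules."""
    totals = defaultdict(float)
    for service, cost in service_costs.items():
        for category, fraction in rules.get(service, [(default, 1.0)]):
            totals[category] += cost * fraction
    return dict(totals)


# Per-service cost history; editing a rule and re-running recomputes every month.
history = {
    "2018-01": {"bidder": 1000.0, "kinesis-events": 400.0, "ci-runners": 200.0},
    "2018-02": {"bidder": 1200.0, "kinesis-events": 480.0, "ci-runners": 200.0},
}
validate_rules(COGS_OPEX_RULES)
for month, costs in sorted(history.items()):
    print(month, allocate(costs, COGS_OPEX_RULES))
```

The same `allocate` function works for team-level grouping; only the rules change.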
One tool we looked at, and perhaps you should too, is ice, an open source AWS cost analysis tool created by Netflix. In the end, we found it neither intuitive nor easy enough to use.
We then tried Stax, which proved very effective for us. Stax is a cloud management tool that gives us better visibility and insight into the cost and efficiency of our AWS workloads. You can define your own cost allocation rules and build different views on top of them.
An added bonus is the daily and weekly reports, which summarize how we are doing compared to past weeks and months.
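If you want a similar summary from your own data, a week-over-week comparison takes only a few lines. This is a hypothetical sketch, not how Stax computes its reports; it compares the seven days ending on a given date with the seven days before, treating missing days as zero.

```python
from datetime import date, timedelta


def week_over_week(daily_costs, as_of):
    """Compare the 7 days ending on `as_of` (inclusive) with the 7 days before.

    daily_costs maps datetime.date -> cost; missing days count as zero.
    Returns (this_week, last_week, fractional_change or None).
    """
    def window_total(end):
        return sum(daily_costs.get(end - timedelta(days=i), 0.0) for i in range(7))

    this_week = window_total(as_of)
    last_week = window_total(as_of - timedelta(days=7))
    change = (this_week - last_week) / last_week if last_week else None
    return this_week, last_week, change


# Hypothetical data: 100/day for one week, then 110/day for the next.
start = date(2018, 3, 1)
daily = {start + timedelta(days=i): (100.0 if i < 7 else 110.0) for i in range(14)}
this_week, last_week, change = week_over_week(daily, start + timedelta(days=13))
print(f"this week ${this_week:,.2f}, last week ${last_week:,.2f}, change {change:+.1%}")
```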
Tips for Cost Efficiency
If you have reached this milestone, you should at least get something practical out of it. Here are some other items worth looking at to optimize costs on core services: