Best Practices for Performance Tuning and Troubleshooting Amazon Aurora

Amazon Aurora is a fully managed, MySQL- and PostgreSQL-compatible relational database engine, known for enterprise-class performance with open-source-like cost-efficiency. Its simple setup makes it a popular choice for expanding existing MySQL and PostgreSQL environments, and its push-button migration tools let many organizations lift and shift their on-prem MySQL and PostgreSQL workloads to the cloud while keeping their existing code, tools, and applications intact.

Once you’ve made the jump to Aurora, you’ll need to monitor it. Luckily, if you’re familiar with MySQL and MariaDB, much of what you’ll see here will look a lot like the monitoring strategy you already use for those platforms.

What should you be monitoring?

For most of us, “everything” is neither a necessary nor a desirable answer. Here are four things you must include in your Aurora monitoring strategy.

Figure 1: Query execution is a top performance metric for many databases, including Amazon Aurora.

1. Query volume: As with most database performance monitoring strategies, query execution should top your Aurora list. For Aurora, that starts with queries per second. You can take it a step further by looking at your reads and writes and making sure they stay within expected ranges. In Aurora, reads are tracked as Select Throughput and writes as DML (insert, update, or delete) Throughput. MySQL veterans may recognize these metrics as rollups of Com_select + Qcache_hits and Com_insert + Com_update + Com_delete, respectively. (A quick CLI sketch after this list shows one way to pull these counters out of CloudWatch.)

2. Query performance: Behind query execution, latency is always a top measure of database performance. There are a couple of Aurora-specific metrics, Select Latency and DML Latency, that you should be sure to track. If you’re a fan of MySQL’s slow queries metric, which counts any query whose execution time exceeds a configured number of seconds, that’s also available in Aurora.

3. Connections: Sometimes a database “outage” is really a case of a client failing to connect to the database, likely because Aurora has reached its configured max_connections limit. Monitor Open Database Connections and Failed Connection Attempts to avoid this situation.

4. Resource utilization: Many database performance problems aren’t caused by the database at all, but by resource constraints in the underlying disk, CPU, memory, and network. Here are a few to keep an eye on.

Disk I/O: Aurora storage scales automatically, so unlike MySQL and MariaDB you won’t need to track storage utilization. But if your database requests are waiting on the storage volumes, you’ll see it in the Disk Queue Depth metric.

Network: Aurora’s key traffic measures, Network Receive Throughput and Network Transmit Throughput, represent only network traffic to and from clients and do not include traffic to the storage volumes.

Memory: Memory constraints can drag your Aurora instance down if your dataset outgrows the cache. Watch the Freeable Memory metric, and check that your Read IOPS remain consistently low; when the working set no longer fits in memory, reads spill over to the storage volumes and Read IOPS climbs.

CPU: If your disk, memory, and network stats look normal but you’re still seeing performance problems, CPU utilization could be the culprit.
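If you’d rather pull any of these numbers yourself than hunt for them in a console, they’re all available through the CloudWatch API. Here’s a minimal sketch using the AWS CLI; the cluster identifier and time window are placeholders, and the display names above map to CloudWatch metric names without spaces (SelectThroughput, DMLThroughput, DiskQueueDepth, FreeableMemory, and so on):

  # Average reads per second (SelectThroughput) for a hypothetical cluster,
  # sampled in 5-minute buckets over a one-hour window.
  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name SelectThroughput \
    --dimensions Name=DBClusterIdentifier,Value=my-aurora-cluster \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-01T01:00:00Z \
    --period 300 \
    --statistics Average

  # The same call works for any other metric in this section; swap in the
  # metric name, or scope to a single instance with Name=DBInstanceIdentifier.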

What should you be looking for?

Once you’ve got hundreds of metrics in front of you, it can be hard to figure out where to focus. Here are our top recommendations:

1. Resource constraints: Aurora, like other MySQL databases, lives and breathes on CPU, memory, disk and network, so you’ll want to make sure it has access to them at all times.

2. Drastic drops in throughput: Sudden changes in query volume could indicate an issue with connections or an emerging issue beyond the database.

3. Increased query errors: Drastic increases could signal an issue with your client application or your database.

4. Slowest query statements: Identify your slowest 5% of query statements by runtime (see the sketch after Figure 2 for one way to pull the raw data).


Figure 2: Once you’ve found your slowest queries, a query explain plan tool can help you correct them.
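If you don’t have a query analysis tool in place yet, the raw material for that list is the slow query log itself. Here’s a hedged sketch using the AWS CLI, assuming slow query logging is already enabled in your DB parameter group (slow_query_log = 1 with a suitable long_query_time) and that the instance name and log file name below are placeholders you’d replace with your own:

  # See which log files the instance currently exposes.
  aws rds describe-db-log-files --db-instance-identifier my-aurora-instance-1

  # Download the most recent portion of the slow query log for offline analysis.
  aws rds download-db-log-file-portion \
    --db-instance-identifier my-aurora-instance-1 \
    --log-file-name slowquery/mysql-slowquery.log \
    --output text > mysql-slowquery.log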

Where can you look for these metrics?

It’s entirely possible to gather the key metrics for your Aurora DB cluster yourself through a combination of Amazon sources. Here’s all you need to know about where to find what.

1. Amazon RDS console: RDS exposes dozens of high-level metrics to monitor the health and performance of your Aurora DB cluster. In fact, if you’ve gone metric hunting for other databases, you may notice Aurora has more metrics than most in RDS. But there are limits to what you will find in the Amazon RDS console. Some metrics are shown only for specific instance classes. If you’re familiar with MySQL, you may find familiar metrics under different names or with different units of measurement, so double-check carefully. In addition, you won’t find the following metrics at all:

  • AuroraBinlogReplicaLag
  • DeleteLatency
  • DeleteThroughput
  • EngineUptime
  • InsertLatency
  • InsertThroughput
  • NetworkThroughput
  • Queries
  • UpdateLatency
  • UpdateThroughput

2. AWS CLI: For quick spot checks on specific metrics, you can install the AWS CLI. Just remember that you’ll have to specify the dimension Name=EngineName,Value=aurora (see the first sketch after this list).

3. Amazon CloudWatch: If you’re sticking within the AWS stack, CloudWatch is going to be your best bet for visualizing, tracking, and alerting on key Aurora metrics. It brings in everything you can find in the RDS console, plus the key metrics listed above. You can even send your Amazon Aurora audit logs to CloudWatch Logs, which lets you alert on events such as database logins, user information, details of queries executed, and impacted tables. CloudWatch covers many of the core features you’d expect from dedicated performance monitoring software, but it may take considerable configuration to match what other tools offer out of the box. For example, you’ll have to know exactly which metrics you want to alert on and set up each threshold individually (the second sketch below shows what a single alarm looks like).
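To make the CLI spot check in item 2 concrete, here’s a minimal sketch; the only assumption is that the AWS CLI is configured with credentials for the account and Region that hold your cluster:

  # Discover which Aurora metrics CloudWatch is currently reporting.
  aws cloudwatch list-metrics \
    --namespace AWS/RDS \
    --dimensions Name=EngineName,Value=aurora

  # Feed any of those metric names into get-metric-statistics with the same
  # dimension for a quick, account-wide spot check.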
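And because alerting in CloudWatch means defining each threshold yourself, here’s a sketch of a single alarm; the alarm name, cluster identifier, threshold, and SNS topic ARN are all placeholders you’d replace with values that fit your environment:

  # Alarm when the cluster averages more than 900 open connections for three
  # consecutive 5-minute periods (pick a threshold just below max_connections).
  aws cloudwatch put-metric-alarm \
    --alarm-name aurora-connection-ceiling \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBClusterIdentifier,Value=my-aurora-cluster \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --comparison-operator GreaterThanThreshold \
    --threshold 900 \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:db-alerts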

When it’s time for a database performance platform

While Amazon CloudWatch certainly provides many key metrics, it doesn’t really provide you with a lot of context. It won’t show the relationships between your database and other parts of your infrastructure. That’s probably OK for organizations with a couple dozen databases, but connecting the dots yourself doesn’t scale in complex environments that include multiple database management systems and hybrid or multi-cloud infrastructure.

If you want to quickly get a feel for what’s going on across your entire data layer, or if you’d like historical detail on your Aurora DB cluster for longer than two weeks, consider trying a cloud database performance platform like SelectStar, which ties together hundreds of metrics from your Aurora instances with critical resource utilization stats from your RDS infrastructure.


Try SelectStar for 14 days. No credit card required.
