5 Steps to Optimize Hadoop Monitoring

Hadoop continues to gain steam as a favored option for DBAs, yet many teams struggle to find a monitoring tool that keeps up with the dynamic needs of the Hadoop ecosystem. One of those needs is genuine infrastructure monitoring, because the underlying infrastructure is often the culprit behind many of the issues affecting your clusters and other Hadoop resources.

In this post, we’ll cover five key steps to help you optimize your Hadoop monitoring and gain the insight you need to keep your Hadoop infrastructure and clusters performing optimally.

1. FOCUS ON SERVICE-ORIENTED MONITORING

Hadoop isn’t the first technology to call for service-oriented monitoring; we first saw the approach gain popularity with Microsoft SCOM. Rather than focusing on the individual components that power the ecosystem, monitoring the services they deliver gives a better indication of what may be faltering. After all, end-user wait times, whether for internal or external users, increasingly define how IT organizations measure success.

With Hadoop in particular, a service-focused approach is essential to ensure that the entire ecosystem performs at the level you need.
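To make the idea concrete, here is a minimal sketch of a service-level rollup. The `ServiceCheck` structure, the three status levels, and the `nightly-etl` service are hypothetical illustrations, not part of any particular product. A service is reported as degraded when any component it depends on is unhealthy, or when its end-user latency misses its objective even though every component looks fine.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Health levels ordered so that max() returns the worst one."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class ServiceCheck:
    """Hypothetical service definition: the components it depends on plus a
    latency objective for the end-user-facing operation."""
    name: str
    components: dict[str, Status]
    latency_ms: float
    latency_slo_ms: float


def service_status(check: ServiceCheck) -> Status:
    # Start from the worst component status (OK if there are no components).
    worst = max(check.components.values(), default=Status.OK)
    # A service that misses its latency objective is degraded even if every
    # individual component reports healthy.
    if check.latency_ms > check.latency_slo_ms:
        worst = max(worst, Status.WARNING)
    return worst


if __name__ == "__main__":
    etl = ServiceCheck(
        name="nightly-etl",  # hypothetical service name
        components={"namenode": Status.OK, "resourcemanager": Status.OK},
        latency_ms=5400.0,
        latency_slo_ms=3000.0,
    )
    print(etl.name, service_status(etl).name)
```

Here every component is healthy, yet the service is flagged WARNING because users are waiting longer than the objective allows, which is exactly the kind of problem component-only monitoring misses.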

2. BRING IN MONITORING FOR KEY TECHNOLOGIES

Understanding how your Hadoop clusters are performing is one piece of the puzzle, but to truly optimize performance, your Hadoop monitoring also needs to include metrics for the key technologies that make up the Hadoop ecosystem. These include:

  • HDFS: The Hadoop Distributed File System is the storage layer of the Hadoop architecture. It stores large data sets by splitting files into blocks and distributing (and replicating) those blocks across the DataNodes in a cluster, while the NameNode tracks where each block lives.
  • YARN: Yet Another Resource Negotiator, or YARN for short, manages resources and schedules work across your Hadoop clusters. It splits these duties between a global ResourceManager, which allocates cluster-wide resources, and a per-application ApplicationMaster, which negotiates resources for an individual application; a NodeManager on each node runs the resulting containers.
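A little arithmetic shows why HDFS capacity metrics grow faster than the data you load. The sketch below assumes the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). Both are configurable per cluster and per file, so substitute your own values.

```python
import math

# Common Hadoop 2.x/3.x defaults; check hdfs-site.xml for your cluster's values.
DEFAULT_BLOCK_SIZE_MB = 128
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = DEFAULT_BLOCK_SIZE_MB,
                   replication: int = DEFAULT_REPLICATION) -> tuple[int, int, float]:
    """Return (logical blocks, block replicas, raw MB stored across DataNodes)."""
    # A file is cut into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Each block is stored `replication` times on different DataNodes.
    replicas = blocks * replication
    # A partial last block only occupies its actual size on disk, so raw
    # usage is the file size multiplied by the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(300))  # a 300 MB file with default settings
```

For a 300 MB file, that works out to three blocks (128 MB + 128 MB + 44 MB), nine block replicas, and about 900 MB of raw DataNode storage, which is why Capacity Used climbs roughly three times as fast as the data you ingest.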

3. MONITOR KEY HDFS METRICS

The best way to optimize your Hadoop monitoring? Monitor the specific technologies that make up your Hadoop ecosystem. For your HDFS resources, we recommend tracking the following metrics:

  • NameNode
    • Blocks Missing
    • Capacity Remaining
    • Capacity Used
    • DataNodes Dead
    • Metric Endpoint
    • Volume Failures Total
  • DataNode
    • Blocks Read
    • Blocks Removed
    • Blocks Replicated
    • Blocks Written
    • Data Read
    • Data Written
    • Disk Remaining
    • Failed Volumes
    • Metric Endpoint
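Most of these values are published as JSON by each Hadoop daemon's JMX servlet at `/jmx`. The sketch below assumes a NameNode web address you supply (port 9870 by default on Hadoop 3, 50070 on Hadoop 2) and uses bean and attribute names (`FSNamesystem`, `FSNamesystemState`, `MissingBlocks`, `NumDeadDataNodes`, and so on) as we understand them; verify them against your own cluster's `/jmx` output before relying on them. The alert thresholds and helper names are illustrative.

```python
import json
import urllib.request

# Metric labels from the list above -> (JMX bean "name" key, attribute).
# Attribute names are our understanding of the NameNode's /jmx output.
NAMENODE_METRICS = {
    "Blocks Missing": ("FSNamesystem", "MissingBlocks"),
    "Capacity Total": ("FSNamesystem", "CapacityTotal"),  # used for a free-space ratio
    "Capacity Used": ("FSNamesystem", "CapacityUsed"),
    "Capacity Remaining": ("FSNamesystem", "CapacityRemaining"),
    "DataNodes Dead": ("FSNamesystemState", "NumDeadDataNodes"),
    "Volume Failures Total": ("FSNamesystemState", "VolumeFailuresTotal"),
}


def fetch_jmx(base_url: str, timeout: float = 10.0) -> dict:
    """Download all beans from a daemon's /jmx servlet. If this call fails,
    that failure is itself the "Metric Endpoint" signal."""
    with urllib.request.urlopen(f"{base_url}/jmx", timeout=timeout) as resp:
        return json.load(resp)


def bean_key(object_name: str) -> str:
    """'Hadoop:service=NameNode,name=FSNamesystem' -> 'FSNamesystem'."""
    _, _, props = object_name.partition(":")
    pairs = dict(p.split("=", 1) for p in props.split(",") if "=" in p)
    return pairs.get("name", object_name)


def extract(jmx: dict, mapping: dict) -> dict:
    """Pull the labelled attributes out of a parsed /jmx response."""
    beans = {bean_key(b["name"]): b for b in jmx.get("beans", [])}
    values = {}
    for label, (bean, attr) in mapping.items():
        if attr in beans.get(bean, {}):
            values[label] = beans[bean][attr]
    return values


def evaluate(m: dict, min_free_ratio: float = 0.10) -> list:
    """Turn NameNode metrics into readable alerts (example thresholds)."""
    alerts = []
    if m.get("Blocks Missing", 0) > 0:
        alerts.append(f"{m['Blocks Missing']} missing block(s)")
    if m.get("DataNodes Dead", 0) > 0:
        alerts.append(f"{m['DataNodes Dead']} dead DataNode(s)")
    if m.get("Volume Failures Total", 0) > 0:
        alerts.append(f"{m['Volume Failures Total']} failed volume(s)")
    total = m.get("Capacity Total", 0)
    if total and m.get("Capacity Remaining", total) / total < min_free_ratio:
        alerts.append("HDFS free capacity below threshold")
    return alerts


def counter_rate(prev: float, curr: float, interval_s: float) -> float:
    """Per-second rate for a cumulative DataNode counter such as BlocksRead.
    If the counter went backwards, assume the daemon restarted and count
    from zero."""
    delta = curr - prev if curr >= prev else curr
    return delta / interval_s
```

In use, you would call `evaluate(extract(fetch_jmx(url), NAMENODE_METRICS))` on a schedule. DataNode activity counters such as Blocks Read and Blocks Written accumulate from daemon start, which is why `counter_rate` compares two samples instead of reporting the raw value.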

4. MONITOR KEY YARN METRICS

In addition to HDFS metrics, you should continually collect key metrics for your YARN resources. We recommend tracking the following:

  • ResourceManager
    • NodeManagers Active
    • NodeManagers Decommissioned
    • NodeManagers Lost
    • NodeManagers Rebooted
    • NodeManagers Unhealthy
    • Applications Completed
    • Applications Failed
    • Applications Running
    • Metric Endpoint
    • Queue Memory Allocated
    • Queue Memory Available
    • Users Active
  • NodeManager
    • Containers Allocated
    • Containers Completed
    • Containers Failed
    • Containers Launched
    • Memory Allocated
    • Memory Allocated Ratio
    • Memory Available
    • Metric Endpoint
    • Virtual Cores Allocated
    • Virtual Cores Allocated Ratio
    • Virtual Cores Available
    • Virtual Cores Total
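On the YARN side, the ResourceManager exposes cluster-wide counts through its REST API at `/ws/v1/cluster/metrics` (web port 8088 by default). The sketch below maps the ResourceManager metrics listed above onto that response and derives allocation ratios. The ratios in the list are NodeManager metrics; the sketch computes the cluster-wide equivalent, and the same `allocated_ratio` helper applies per node if you feed it a NodeManager's own allocated and available values. Helper names and thresholds are our own illustrative choices.

```python
import json
import urllib.request

# Labels from the list above -> fields of the ResourceManager's
# /ws/v1/cluster/metrics "clusterMetrics" object.
RM_FIELDS = {
    "NodeManagers Active": "activeNodes",
    "NodeManagers Decommissioned": "decommissionedNodes",
    "NodeManagers Lost": "lostNodes",
    "NodeManagers Rebooted": "rebootedNodes",
    "NodeManagers Unhealthy": "unhealthyNodes",
    "Applications Completed": "appsCompleted",
    "Applications Failed": "appsFailed",
    "Applications Running": "appsRunning",
}


def fetch_cluster_metrics(rm_url: str, timeout: float = 10.0) -> dict:
    """Return the 'clusterMetrics' object from the ResourceManager."""
    url = f"{rm_url}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]


def allocated_ratio(allocated: float, available: float) -> float:
    """Share of capacity in use: allocated / (allocated + available)."""
    total = allocated + available
    return allocated / total if total else 0.0


def summarize(cm: dict) -> dict:
    """Collect the listed counts plus derived memory and vcore ratios."""
    summary = {label: cm.get(field, 0) for label, field in RM_FIELDS.items()}
    summary["Memory Allocated Ratio"] = allocated_ratio(
        cm.get("allocatedMB", 0), cm.get("availableMB", 0))
    summary["Virtual Cores Allocated Ratio"] = allocated_ratio(
        cm.get("allocatedVirtualCores", 0), cm.get("availableVirtualCores", 0))
    return summary


def yarn_alerts(summary: dict, max_ratio: float = 0.90) -> list:
    """Flag lost/unhealthy NodeManagers and near-full capacity (example thresholds)."""
    alerts = []
    for label in ("NodeManagers Lost", "NodeManagers Unhealthy"):
        if summary[label] > 0:
            alerts.append(f"{label}: {summary[label]}")
    for label in ("Memory Allocated Ratio", "Virtual Cores Allocated Ratio"):
        if summary[label] > max_ratio:
            alerts.append(f"{label} above {max_ratio:.0%}")
    return alerts
```

Treat Applications Failed the same way as the DataNode counters: it only grows, so alerting on its change between polls is usually more useful than alerting on its absolute value.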

5. FIND A COMPREHENSIVE DATABASE MONITORING SOLUTION

When it comes to monitoring your Hadoop ecosystem, not all tools are created equal. A monitoring system that gives you the metrics you need across your entire Hadoop ecosystem, infrastructure included, can transform the operation of your Hadoop clusters and related technologies.

To determine the best solution for your needs, look for a database monitoring solution that offers a free trial, so you can see first-hand how it handles your organization’s unique requirements: Hadoop, as well as the other database and infrastructure tools your team may be using.

For a deeper analysis of how to optimize Hadoop monitoring, join our upcoming webinar, “Maximizing the Performance of Your Hadoop Clusters & Infrastructure,” presented by SelectStar’s Mike Kelly.
