The big 3 observability tools: Datadog vs New Relic vs Splunk

Comparing their price, usability, and features with recommendations for picking your observability stack.


When we consider adding observability to our stack, we may default to popular options like Datadog. While these tools have a ton of features covering most use cases, they also come with premium pricing. This blog covers a tool-by-tool comparison of these platforms, comments from existing users, and our recommendations for your observability stack.

Observability tools are costly, accounting for about 30% of a company’s outside vendor spending. Their pricing also lacks visibility and transparency, making it nearly impossible to work out how a bill has been calculated. The various services have different pricing models, each with limits (number of hosts/devices/requests/ingest) and multiple tiers with monthly and annual billing options.

Datadog reviews

Data from Statista shows that the observability tools market, worth $12.9 billion in 2020, is set to reach $19.3 billion by 2024. These tools are used across industries, including media, communications, financial services, technology, health, the public sector, and manufacturing, in all regions of the world. Organizations are also increasing the number of observability tools used while consolidating the number of vendors. (Splunk, State of Observability 2022)

Observability Tools used

Here is what users have to say

Before we dive into the feature-specific comparison, let’s look at what the users of these tools have to say. These are some common likes, dislikes, and things to keep in mind that we found on over half a dozen review sites, including G2, Gartner, and Software Advice.

New Relic

Liked:

  • Straightforward setup
  • Share dashboards externally
  • Transparent pricing with great ROI
  • AI-based anomaly detection and alerting
  • Built on OpenTelemetry standards
  • Go backward in time and analyze historical bottlenecks and consumption trends

Disliked:

  • Complex query builder, limited regex, and other limitations of NRQL
  • The learning curve to make use of all customization options can be overwhelming
  • Incorrect query results, bugs, sluggish experience

Keep in mind:

  • It comes with a generous free tier; use it yourself to see if it works
  • Their network monitoring doesn’t yet support all infra
  • Use their data ingest cost estimator

Splunk

Liked:

  • Works well for high-volume data ingestion
  • Indexing and ML applied to the data make it valuable
  • Built-in reports and dashboards that can be customized
  • Built on OpenTelemetry standards
  • They offer workload, ingest, and entity pricing options

Disliked:

  • Monthly pricing is expensive and opaque
  • Lack of visualization options, room for improvement with their interface

Keep in mind:

  • There is a 10k session/month per host limit for RUM
  • If you don’t want the bundle offerings in Standard and Pro plans, you can also choose individual offerings as per your requirements
  • Consider Splunk’s flexible pricing options to find the one that best suits you
  • If you’re also considering using Splunk for IT Ops and Security, check their cloud platform

Datadog

Liked:

  • Stability and the constant addition of new features
  • Out-of-the-box integrations
  • AI and ML Capabilities
  • All tools in one place, with a ton of customization options

Disliked:

  • Pricing is opaque, inflexible billing, and unexpected charges
  • Hard to find the right metric and look at historical data
  • Deploying the agents is still a very manual task

Keep in mind:

  • Leverage estimated usage and set alerts to avoid overspending
  • Monthly billing is 20-50% higher than annual billing
  • Their security offerings are relatively new and not up to speed with the rest of their monitoring
  • Their free trial is limited compared to their competitors
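The gap between monthly and annual billing noted above is easy to underestimate. Here is a minimal sketch of the arithmetic, assuming the 20-50% premium applies uniformly to a per-host list price. The $15/month/host rate is the Pro infrastructure price quoted later in this post, and the host count of 40 is purely hypothetical.

```python
def monthly_billing_range(annual_rate, units, low=0.20, high=0.50):
    """Return (annual-plan cost, low estimate, high estimate) per month.

    annual_rate is the per-unit monthly price when billed annually;
    low/high are the assumed premium bounds for month-to-month billing.
    """
    base = annual_rate * units
    return base, base * (1 + low), base * (1 + high)


# Hypothetical fleet of 40 hosts at $15/month/host (annual billing).
base, low_cost, high_cost = monthly_billing_range(annual_rate=15.0, units=40)
print(f"billed annually: ${base:,.2f}/month")
print(f"billed monthly:  ${low_cost:,.2f} to ${high_cost:,.2f}/month")
```

Real invoices also include usage-based items, so treat this as a lower bound on the difference rather than a forecast.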

Moving into the comparison, we compare the tools feature by feature, starting with APM and log management and ending with alerting and network monitoring.

Note: Here we only compare non-enterprise tiers: free tiers (where available), Splunk’s Standard tier, Datadog’s Pro tier, and New Relic’s Pro tier. If you’re interested in enterprise pricing, check out their pricing pages: Datadog, Splunk, New Relic.

APM - Application Performance Monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Tracing | Automatic trace_Id injection to logs, connect traces to infra metrics, network calls, and live processes | Collect all trace data, AI-powered methods to sift through trace data | Observes 100% of traces and provides actionable insights |
| Live visibility | All ingested traces and service dependencies over the last 15 minutes | NoSample™ full-fidelity tracing, collecting 100% of traces combined with AI-driven directed troubleshooting makes detection time fast | Real-time streaming sends data every 5 seconds, can view, visualize and query that data |
| Control | Set SLOs, track trends, and monitor KPIs by generating span-based metrics using any set of tags | Turns every span and trace into metrics, to create pre-built service monitoring dashboards | Manage SLOs with automated service level management |
| Deploy | Monitor and compare impacts of canary, blue-green, and shadow deploy | Can have multiple, distinct application environments that don’t interact directly with each other but that are all being monitored by Splunk APM | Tracking deployments create deployment markers that appear in APM charts. |
| Supported languages | Java, .NET, PHP, Node.js, Ruby, Python, Go, or C++ applications | Java, Python, .Net (Core and Framework), Node.js, GoLang, Ruby, and PHP | Java, .NET, PHP, Node.js, Ruby, Python, Go, or C applications |
| Frameworks | hundreds of frameworks | hundreds of frameworks | hundreds of frameworks |
| Performance monitors | Applications, hosts, containers, serverless functions, and PaaS | Applications, containers, serverless functions, microservices | Applications, hosts, containers, database services, or grouping of these |
| Related products | Continuous Profiler | AlwaysOn continuous code profiling | New Relic Edge with Infinite Traces |
| Support for OpenTelemetry | Yes | Yes | Yes |
| Starting price | $31/month/host (Billed annually) | $55/month/host (Billed annually) Also offer usage-based pricing | Free tier, pay as you go |
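To put the per-host starting prices above in perspective, here is a rough sketch that scales them to a fleet. It assumes the annual-billing list prices apply linearly with no volume discounts. New Relic is left out because its pricing is usage-based rather than per host. The host count of 25 is hypothetical.

```python
# Annual-billing list prices per host per month, as quoted in this post.
APM_PRICE_PER_HOST = {"Datadog": 31.0, "Splunk": 55.0}


def apm_monthly_cost(hosts):
    """Estimated monthly APM bill per vendor for a given number of hosts."""
    return {vendor: price * hosts for vendor, price in APM_PRICE_PER_HOST.items()}


for vendor, cost in apm_monthly_cost(hosts=25).items():
    print(f"{vendor}: ${cost:,.0f}/month (${cost * 12:,.0f}/year)")
```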

Log management

FeaturesDatadogSplunkNew Relic
Ingest data fromAny source, at any scaleIngest from any source at any scale. Separate ingest and indexed logs to reduce costs. Complement existing agents with OpenTelemetryAny text-based data using the forwarder that works best in your environment
TrackingTrack trends, metrics, and KPIs from all logsBlend logs with real-time metrics, in context troubleshooting
DashboardVisualize summarized logs data on dashboards. create and save granular views.Log metrics into real-time dashboardsAutomatically clusters logs into patterns and detects outliers
Search and querySearch, filter, and analyze logs on the fly—no complex query language requiredNo-code search. Watch critical logs with unified filters and time controls. Easily transition between logs, traces, and metrics.
Related FeaturesLive Tail & Logging Without LimitsInfinite Logging using S3 buckets
Decouples log ingestion and indexing
PricingFree tier, pay as you go
IngestStarts at $ 0.10 per GB ingested or scanned GB/moStarts at $ 0.10 / host / GB ingested / mo
Retention3-day retention starts $1.06 / million log events/moStandard retention is 30 days
IndexStarts at $5 / indexed GB/ host /mo

Infra monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Environments supported | On-premise, hybrid, IoT, and multi-cloud environments | On-prem, hybrid or multi-cloud | Cloud and on-prem infrastructure |
| Visibility | Tens of thousands of metrics, out of the box. One-click correlation of related metrics | Correlation between your hybrid infrastructure and microservices, insights for faster troubleshooting | Spot all issues and monitor performance in one place |
| Integrations | Vendor-backed integrations for k8s, serverless and 500+ popular technologies | 250+ cloud service integrations and pre-built dashboards out of the box | Close to 500 integrations are available |
| Historical records | Yes, even on resources that don't exist anymore | - | Time travel back to incident's origins and replay the historical state |
| Starting price | Limited free plan, Pro plan starts at $15/month/host | $15/month/host | Free tier, pay as you go |

Serverless monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Visibility | Get all your functions in one place. | Pre-built visualizations | Visualize, trace, alert |
| Real-Time | Ingest, search, and analyze 100% of traces live over the last 15 minutes. Real-time alerts on memory, timeout, and concurrency metrics | Monitoring and alerting on every function | - |
| Metrics monitored | 1. Cold starts 2. Errors 3. Memory 4. Timeout and latency 5. Concurrency and custom metrics for CX | 1. Cold starts 2. Errors 3. Invocations 4. Compute duration 5. Custom business & CX metrics | 1. Invocations 2. Errors 3. Spans 4. Custom metrics |
| Supports | Lambda, Google Cloud Functions, Azure Functions, AWS SAM, Serverless Framework, and AWS CDK integrations | Lambda, Google Cloud Functions, Azure Functions | Lambda, Google Cloud Functions, Azure Functions |
| Test in CI/CD pipeline | Yes, Integrate to CI/CD pipelines | Yes, automatically pass/fail builds based on the performance budget in your CI/CD pipelines | Yes, Integrate to CI/CD and build pipelines |
| Pricing | Workload monitoring starts at $5 / active function/mo | Usage-based pricing | Free tier, pay as you go |

Real user monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Visibility | Web apps, Native mobile apps, app backends | Web apps, Native mobile apps, app backends | Browser monitoring and mobile monitoring |
| Core web vitals | Yes, page load, interactivity and visual stability. Filter by location, device, etc. | Yes, page load, interactivity and visual stability. Filter by location, device, etc. | Yes, page load, interactivity and visual stability. Filter by location, device, etc. Set alerts when vitals drops. |
| Full session analysis | Contextualize user sessions attributes like user ID, email, and name. Ingest custom metrics and track business-critical user actions | Including route change, API calls, impact of images and resources on user | See trends with sessions, filter by app and device versions |
| Native mobile apps | Troubleshoot app crashes, set up alerts, connect server-side and client-side metrics | Auto capture common client attributes - app crash report, full app lifecycle visibility, network requests and errors | Insights into crashes, handled exceptions, and network failures |
| Session replay | Yes, 30-day retention policy | - | Reproduce incidents using event trails and mobile breadcrumbs |
| Integrates with | logs, APM, profiler | Splunk APM for Backend visibility | |
| Pricing | Starting at $0.45 / 1,000 sessions / month* | Starting at $14 / 10,000 sessions / mo | Free tier, pay as you go |
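The two listed RUM prices use different session blocks (per 1,000 versus per 10,000), which makes them hard to compare at a glance. The sketch below normalizes both to a per-1,000-session rate. It assumes pricing scales linearly and ignores the conditions behind the asterisk on Datadog's price and Splunk's per-host session limit. The 250,000 sessions/month figure is hypothetical.

```python
# (list price in USD, sessions covered by that price), as quoted in this post.
RUM_PRICING = {
    "Datadog": (0.45, 1_000),
    "Splunk": (14.0, 10_000),
}


def price_per_thousand(price, sessions):
    """Normalize a block price to cost per 1,000 sessions."""
    return price / sessions * 1_000


def monthly_rum_cost(vendor, sessions):
    """Estimated monthly cost for a vendor, assuming linear scaling."""
    price, block = RUM_PRICING[vendor]
    return price_per_thousand(price, block) * sessions / 1_000


for vendor, (price, block) in RUM_PRICING.items():
    unit = price_per_thousand(price, block)
    total = monthly_rum_cost(vendor, sessions=250_000)
    print(f"{vendor}: ${unit:.2f} per 1,000 sessions, ${total:,.2f} for 250,000")
```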

Synthetic monitoring

FeaturesDatadogSplunkNew Relic
LocationsSimulated requests and actions from around the globe, and synthetic private locationsRun simulated tests from nearly 50 global locationsSimulate traffic across thousands of public and private locations
Monitor typesAllows single and chained requests at these levels: HTTP, SSL, DNS, WebSocket, TCP, UDP, ICMP, and gRPC health check.Request level, run level metrics, test-level, page-level, transaction-level metricsBroken links, certificate check, ping, step, simple and scripted browser monitors, and API tests.
TroubleshootingGet full context for troubleshooting failed test runs with correlated metrics, traces, and logsTrack and report SLOs and SLAs for uptime and performance. 300+ optimization recommendations to fix defects and improve UXIdentify issues from a third party, backend service, and infrastructure. Improve end-user experience with user-centric metrics
RecordRecord browser tests and monitors customer experiences with end-to-end testsCapture screenshots and simulated sessions. Configure test schedules and set up alerts.Scripted browsers tests are driven by Selenium WebJS to emulate customer navigation, action, and more
Test in CI/CD pipelineYes, Integrate to CI/CD pipelinesYes, automatically pass/fail builds based on the performance budget in your CI/CD pipelinesYes, Integrate to CI/CD and build pipelines
Starting priceUptime Tests $1/mo/ 10,000 requests (billed annually)Free tier, pay as you go
API Tests $ 5 /mo /10,000 test runs (billed annually)API Tests $ 4 /mo/10,000 test runs (billed annually)
Browser Tests $ 12 /mo /1,000 test runs (billed annually)Browser Tests $12/mo/1,000 test runs (billed annually)

Alerts and incident management

FeaturesDatadogSplunkNew Relic
NotificationsDatadog's web & mobile app, Slack app, Hangouts Chat, and Microsoft Teams, and moreMeta-data-rich alerts on any device, incl. iOS, and Android appsSmart detection distinguishes between critical and minor concerns. Scheduling and muting capabilities are also available
AutomationAutomatically apply alerts to new hosts, and detect anomalies in apps, infra, and services. Automated incident management workflowsAutomate scheduling, time-sensitive actions incl. escalations, war room, and post-incident resolutionBaseline conditions automatically adjust based on the system's behavior. Anomaly detection. Automatically sets permissions, no personal data collected.
ContextDescribe the incident and pass on assessment fields such as root cause, detection method, services, etc.Identify similar incidents using historical insights and audit trails. Use resources like run books, articles, and dashboards to help responders triage and resolve incidents fasterYou can include charts about the incident to provide context
Custom triggersWith an Anomaly monitor, set anomaly detection, trigger window, and recovery window. Advanced options with seasonality, algorithms availableRules Engine is a full-stack service level feature that allows you to set certain conditions, and trigger custom actionSet alert conditions specific to data sources or data behavior thresholds
Integrations100+ integrations works with your existing workflow100+ integrations out-of-the-boxYes, all major integrations
ReportsCreate, track, and report on critical SLOs and visualize them on dashboardsPost-Incident review, MTTA/MTTR performance report, On-call report, Incident frequency report
Starting Price$20/user/mo (billed annually)Up to 10 users - $ 5/user/mo (Billed annually)Free tier, pay as you go
10+ users $23/user/mo (billed annually)

Splunk On-Call was previously VictorOps, which was acquired by Splunk in 2018.
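All three vendors advertise anomaly detection against a baseline that adapts to the system's behavior. Their actual algorithms are not described here, so the following is only a generic illustration of the idea: a trailing-window z-score check, where a point is flagged if it sits more than `threshold` standard deviations from the mean of the previous `window` points. The function name, parameters, and sample latencies are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def rolling_anomalies(values, window=5, threshold=3.0):
    """Yield (index, value) for points far from the trailing-window baseline."""
    history = deque(maxlen=window)
    for i, v in enumerate(values):
        # Only evaluate once a full window of earlier points is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                yield i, v
        history.append(v)


latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]
print(list(rolling_anomalies(latencies_ms)))  # [(6, 250)]
```

Note that a flagged point still enters the baseline afterwards and inflates it for the next few samples. Production systems typically handle this, along with seasonality, in more sophisticated ways.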

Network monitoring

FeaturesDatadogSplunkNew Relic
VisibilityMonitor the performance of connections among your hosts, services, virtual private clouds (VPCs), and other elements of your on-prem, public, or private cloud.-Analyze all of your network, app, infrastructure, and digital experiences on a single platform
Network metricsTraffic between any two endpoints, TCP retransmits, latency, connection churn,-Network syslogs, Network flow logs, cloud flow logs
Device MetricsAutomatically discover and collect metrics on your network from any device, drill down and create custom views to evaluate device performance-Device performance via SNMP
ForecastingUse forecasting to determine when interfaces will exceed their available bandwidth-
DNSAnalyze system-wide DNS performance, Assess DNS server health with request-volume, response-time, and error-code metrics,--
Starting priceNetwork Performance Monitoring $5 /mo/host (billed annually)Free tier, pay as you go
Network device monitoring $7/mo/device (billed annually)
  • New Relic’s solution is based on the ktranslate docker container. This single container image is hosted in your environment to collect and process your data to be exported to the Event, Metric, and Log APIs and displayed in New Relic.

As we can see from the comparisons above, Splunk and New Relic have nearly caught up to Datadog’s offerings on most fronts through their recent push towards observability, acquisitions, contributions to open-source projects, and partnerships. They also offer aggressive pricing, which undercuts Datadog and other expensive competitors on most fronts.

Choosing your observability stack

As the business impact of outages rises, more and more businesses are likely to spend on observability tools. While the above comparisons give you a basic overview of which tools to use and how much each might cost, here are a few considerations to keep in mind while choosing an observability tool.

  1. Analyze which parts of your stack need monitoring the most, then analyze and try out tools specific to that need. There are open-source and free-tier tools available for most of these features.
    1. Infrastructure
    2. Networks
    3. Application performance
    4. End-user experience
    5. Alerting
  2. Understand which pricing model works best for your current infra setup. Each vendor has different pricing models, so try out their free trials and see whether you are comfortable with their ease of use and pricing transparency.
    1. Workload-based pricing
    2. The volume of telemetry ingested
    3. Number of users
    4. Event-based pricing
  3. If you’re not quite ready for enterprise volume discounts, look to use multiple tools from different vendors. More than half of the customers surveyed use ten or more tools.
  4. Work on your existing strengths. Enterprise customers using these tools have dedicated IT teams set up for monitoring. As a smaller company looking to build on the cloud, leverage automation where possible so you can spend more time on your code and less time watching your systems.
  5. Try installing tools like Prometheus and Grafana to understand how much work it is before testing a tool like Datadog. This will help you understand if the time saved using these tools is worth the money you’ll spend on them.
  6. Have a way to train your team, and make it easy to onboard new employees into your stack. Once your observability stack is up and running, it is important to have good getting started guides (internal or external), which can help your team learn and work with your observability tools easily.
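To get a feel for the self-hosted route mentioned in the list above, it helps to see what Prometheus actually scrapes: a plain-text page of metrics. The sketch below renders one counter in the Prometheus text exposition format using only the standard library. The metric name, labels, and values are hypothetical, and label values are not escaped here. In practice you would more likely use an official client library than hand-roll this.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to a number.
    Label values are assumed not to need escaping in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts keyed by their label sets.
samples = {
    (("method", "GET"), ("status", "200")): 1027,
    (("method", "POST"), ("status", "500")): 3,
}
print(render_counter("http_requests_total", "Total HTTP requests.", samples))
```

Serving this text over HTTP, pointing a Prometheus scrape job at it, and building a Grafana dashboard on top is the part of the work that a hosted vendor bundles for you.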

Conclusion

Choosing between Datadog, Splunk, and New Relic can be difficult, as they have similar offerings at first glance. But there are a few things each platform does better than the others. Using the feature-level comparisons and considerations mentioned above, you can narrow down your options and make a decision that saves cost and works well with your stack.

While there is no shortage of observability tools in the market today, choosing from prominent vendors has its advantages in terms of compatibility, cost, and easier monitoring. And that's why companies today are choosing to consolidate their vendors.

If you are interested in seeing the progress the industry has made in the past 12 months and where it is headed, check out this insightful blog by Hayden James.

Argonaut takes the complexity out of your app and infra deployments. You can also integrate third-party apps like your favorite observability tool. Get started with Argonaut today.