New Relic Dashboards as Code
Structlog, New Relic and NerdGraphQL API
Dive deep into structured logging in Python with our comprehensive guide. Explore New Relic’s potential from core features to NerdGraphQL, and master creating dashboards and alerts with hands-on snippets. A must-read for both novices and experts looking to enhance their monitoring skills!
What’s structlog?
Structured logging, often referred to as “struct logging,” is an approach to logging where log messages are not just free-form text, but rather adhere to a consistent structure or format, usually key-value pairs. This consistent format makes it easier to query, analyze, and understand logs, especially in larger and more complex systems where logs might be generated at a very high volume.
Python’s standard logging module provides a flexible framework for emitting logs from Python applications. It’s versatile and can be adapted to various needs, but by default, it primarily focuses on unstructured text-based logs. In modern, distributed systems, parsing and understanding unstructured logs can become cumbersome, especially when trying to correlate events across multiple services.
This is where structlog comes into play. structlog is a Python library designed to bring structured logging to Python applications without requiring a complete overhaul of existing logging setups. It builds upon the standard logging module but introduces a way to produce structured logs. With structlog, you can seamlessly transition from traditional log messages to structured ones, enhancing the clarity and utility of your logs.
For example, instead of logging a message like “User John logged in from IP 192.168.1.1”, you might log the structured message:
{
    "event": "user_login",
    "username": "John",
    "ip_address": "192.168.1.1"
}
This structured format is much more machine-friendly and can be easily ingested into modern log analysis tools or platforms to produce insightful metrics and visualizations.
In summary, while the standard logging module in Python offers a foundational logging framework, structlog enhances this by facilitating structured logging, making log data more accessible, organized, and valuable.
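The same idea can be sketched with nothing but the standard library: a logging.Formatter that renders each record as a JSON object. This is an illustrative stand-in for what structlog does, not part of any library; the class and the `fields` attribute name are invented for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object instead of free-form text."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname}
        # Merge any extra key-value pairs attached via `extra={"fields": ...}`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


demo_logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)

# Emits: {"event": "user_login", "ip_address": "192.168.1.1", "level": "INFO", "username": "John"}
demo_logger.info("user_login", extra={"fields": {"username": "John", "ip_address": "192.168.1.1"}})
```

Every record now comes out as a machine-parseable key-value document, which is exactly the property structlog gives you without the boilerplate.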
Quick Setup for Structured Logging
structlog provides a flexible way to implement structured logging in Python applications. It seamlessly integrates with the standard logging module while offering additional benefits like structured log messages. Let’s explore how to set up structlog and utilize its capabilities effectively.
Initialization
To begin with, you can create a customizable initialization process that allows you to determine how logs will be generated and processed. Below is a possible initialization procedure for structlog:
import logging
import sys

import structlog


def add_module_and_lineno(logger, name, event_dict):
    # Note: this relies on a private structlog helper, which may change between releases.
    frame, module_str = structlog._frames._find_first_app_frame_and_name(additional_ignores=[__name__])
    event_dict["module"] = module_str
    event_dict["lineno"] = frame.f_lineno
    return event_dict


def init_logging(level=logging.INFO):
    logging.basicConfig(stream=sys.stdout, level=level, force=True)
    # LOCAL, SentryJsonProcessor, and send_to_newrelic are defined elsewhere in the
    # application; send_to_newrelic is shown later in this article.
    cloud_reporting = (
        [
            SentryJsonProcessor(level=logging.ERROR),
            send_to_newrelic,
            structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S.%f"),
            structlog.processors.JSONRenderer(sort_keys=True),
        ]
        if not LOCAL
        else [structlog.dev.ConsoleRenderer(colors=LOCAL)]
    )
    structlog.configure(
        wrapper_class=structlog.stdlib.BoundLogger,
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.StackInfoRenderer(),
            structlog.dev.set_exc_info,
            add_module_and_lineno,
            structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S", utc=False),
            *cloud_reporting,
        ],
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=False,
    )
    log = structlog.get_logger()
    log.propagate = False  # Avoid propagation to the AWS Lambda root logger, which would duplicate records.
    return log


logger = init_logging()
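Under the hood, every structlog processor is simply a callable with the signature (logger, method_name, event_dict) that returns a possibly modified event dict, and the configured chain is applied in order. A minimal standalone sketch of that folding; the processor names here are simplified stand-ins for the real structlog ones:

```python
import datetime


# Every structlog-style processor has the same signature:
#   (logger, method_name, event_dict) -> event_dict
def add_level(logger, method_name, event_dict):
    event_dict["level"] = method_name
    return event_dict


def add_timestamp(logger, method_name, event_dict):
    # Fixed date so the example is deterministic; the real TimeStamper uses "now".
    event_dict["timestamp"] = datetime.datetime(2024, 1, 1).strftime("%Y-%m-%d %H:%M.%S")
    return event_dict


def run_pipeline(processors, method_name, event_dict):
    # structlog folds the event dict through each processor, in order.
    for processor in processors:
        event_dict = processor(None, method_name, event_dict)
    return event_dict


result = run_pipeline([add_level, add_timestamp], "warning", {"event": "user_login"})
# result == {"event": "user_login", "level": "warning", "timestamp": "2024-01-01 00:00.00"}
```

This is why ordering in the `processors` list matters: `send_to_newrelic` above sees every key the earlier processors have already added.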
Tagging with Contextvars and Sentry
Leveraging contextvars with Sentry provides a robust mechanism to tag your logs for better traceability and clarity. Here’s an example of how you can set a tag using contextvars and sentry_sdk:
import sentry_sdk
from structlog.contextvars import bind_contextvars


def set_tag(name, value):
    sentry_sdk.set_tag(name, value)
    bind_contextvars(**{name: value})
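What bind_contextvars and the merge_contextvars processor accomplish together can be sketched with the standard contextvars module alone. The `_CONTEXT` variable and helper names below are illustrative, not structlog internals:

```python
import contextvars

# Illustrative stand-in for structlog's per-context storage.
_CONTEXT = contextvars.ContextVar("log_context", default={})


def bind(**kwargs):
    # Copy-on-write so other asyncio tasks / contexts are unaffected.
    _CONTEXT.set({**_CONTEXT.get(), **kwargs})


def merge_context(event_dict):
    # What structlog.contextvars.merge_contextvars does at log time:
    # every bound value is merged into the event dict.
    return {**_CONTEXT.get(), **event_dict}


bind(request_id="abc-123")
event = merge_context({"event": "user_login"})
# event == {"request_id": "abc-123", "event": "user_login"}
```

Because the storage is a ContextVar, tags bound inside one request or task do not leak into logs emitted by concurrent ones.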
Emitting Events with Logger
Logging events, especially with specific identifiers like event IDs, can be invaluable for metrics, statistics, and monitoring. Here’s an example of emitting a warning log with an event ID:
try:
    ...  # Some operation that may raise.
except RetryableError as ex:
    monitoring.set_status(monitoring.STATUS_RETRYABLE_ERROR)
    logger.warning(f"Retrying: {str(ex)}", event_id=monitoring.EVENT_RETRY)
    self.retry_message_later(record)
    raise
In this scenario, if a RetryableError occurs, the code sets a retry status and then logs a warning with a specific event ID. This makes it simpler to trace and analyze specific events in your logs, especially when aggregating event statistics.
With this setup, you can effectively utilize structlog in your Python application to enhance the quality and usability of your logs. By combining structured logging with tools like Sentry, you ensure better traceability and insights from your application’s operations.
New Relic
New Relic is a powerful cloud-based platform designed to help developers monitor, debug, and optimize the performance of their applications. It operates as an application performance management (APM) tool, but it offers much more than just performance insights. New Relic provides real-time data analytics, infrastructure monitoring, digital customer experience, and more, making it a comprehensive solution for end-to-end observability in modern software environments.
One of the standout features of New Relic is its capability for log collection. Logs, as we’ve discussed with structlog, provide crucial insights into the operational aspects of an application. New Relic’s Logs platform enables you to aggregate, filter, and analyze logs from various sources, offering a centralized space for all your log data. When combined with structured logs, like those produced by structlog, New Relic becomes an even more potent tool, allowing for faster troubleshooting and deeper insights.
A notable aspect of New Relic’s logging system is its use of NRQL (New Relic Query Language). NRQL is a query language tailored to New Relic’s platform, allowing users to fetch, filter, and analyze their data seamlessly. For those familiar with SQL, NRQL will feel quite intuitive. When applied to a structlog-based event database, NRQL shines by making audit and data requests fast and straightforward. Imagine the ability to quickly query structured logs with a statement like:
SELECT * FROM Log WHERE event_type = 'user_login' AND user = 'JohnDoe'
This intuitive approach to log analysis significantly reduces the time spent sifting through logs and brings forth actionable insights with minimal effort.
In conclusion, New Relic serves as an all-in-one platform for application monitoring and analysis. Its integration capabilities, especially when dealing with structured logs, paired with the power of NRQL, make it an indispensable tool in a developer’s toolkit.
Sending Structlog to New Relic
New Relic offers robust integrations with major cloud platforms such as AWS, GCP, and Azure, simplifying the process of collecting and analyzing logs from applications hosted on these platforms. Additionally, for traditional setups, New Relic provides standalone agents to ensure that wherever your application resides, you can still take advantage of New Relic’s insights.
However, while New Relic’s integration with AWS CloudWatch is straightforward, it doesn’t inherently support structlog out of the box. This means, for those utilizing structured logging with structlog, some additional steps are needed. Fortunately, you can send your structured logs directly to New Relic using their API.
Below is a Python function that demonstrates how you can send structured logs from structlog to New Relic:
import json
import os

import httpx

NEW_RELIC_LICENSE_KEY = None
MAX_NR_LOG_SIZE = 512


def send_to_newrelic(logger, log_method, event_dict):
    global NEW_RELIC_LICENSE_KEY
    if not NEW_RELIC_LICENSE_KEY:
        raw = os.getenv("NEW_RELIC_LICENSE_KEY")
        NEW_RELIC_LICENSE_KEY = json.loads(raw)["LicenseKey"]
    if not NEW_RELIC_LICENSE_KEY or event_dict["level"] == "debug":
        return event_dict
    headers = {"X-License-Key": NEW_RELIC_LICENSE_KEY}
    event_dict["event"] = str(event_dict["event"])[:MAX_NR_LOG_SIZE]
    event_dict.pop("sentry", None)  # Drop Sentry internals, if present.
    payload = {
        "message": event_dict["event"],
        # ENV is the application's environment name, defined elsewhere.
        "attributes": {"environment": ENV, **event_dict},
    }
    try:
        # Adjust the URL based on the region.
        endpoint = "https://log-api.eu.newrelic.com/log/v1" if YOUR_REGION == "EU" else "https://log-api.newrelic.com/log/v1"
        response = httpx.post(endpoint, json=payload, headers=headers)
    except Exception as ex:
        logger.info(f"Can't send logs to New Relic: {ex}")
        return event_dict
    if not response.is_success:
        logger.info(f"Can't send logs to New Relic: {response} {response.text}")
    return event_dict
Note the endpoint URL. If your services operate in Europe, use the EU endpoint (log-api.eu.newrelic.com); otherwise, use log-api.newrelic.com.
With this function in place, you can seamlessly direct your structured logs to New Relic, ensuring they’re stored, analyzed, and accessible when you need them.
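If you want to sanity-check the payload shape without touching the network, the construction step can be isolated into a small pure function. `build_payload` is a hypothetical helper written for this sketch, not part of the processor above:

```python
MAX_NR_LOG_SIZE = 512  # Same cap as in the processor above.


def build_payload(event_dict, environment):
    # Truncate the message and mirror every remaining key into the
    # New Relic Log API "attributes" object.
    event_dict = dict(event_dict)  # Don't mutate the caller's dict.
    event_dict["event"] = str(event_dict["event"])[:MAX_NR_LOG_SIZE]
    event_dict.pop("sentry", None)  # Drop Sentry internals, if present.
    return {
        "message": event_dict["event"],
        "attributes": {"environment": environment, **event_dict},
    }


payload = build_payload({"event": "x" * 1000, "level": "info"}, "staging")
# payload["message"] is truncated to 512 characters.
```

Keeping the construction separate from the HTTP call makes the truncation and key-filtering rules trivially unit-testable.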
NerdGraphQL
GraphQL is a dynamic query language for APIs that lets clients request exactly the data they need. NerdGraphQL is New Relic’s adaptation of GraphQL, offering a flexible interface for the platform. Through NerdGraphQL, users can directly engage with the vast datasets in New Relic, including the ability to programmatically create dashboards and alerts.
The advantage of creating dashboards and alerts via code lies in its precision. It eliminates potential mismatches between the sent data and the monitored metrics, ensuring that what you push aligns with what you observe. Moreover, having these configurations as part of your codebase ensures auditability. Every change is tracked within your repository, reducing the risk of unauditable modifications by users through other means. In essence, NerdGraphQL not only brings the power of GraphQL to New Relic but also reinforces best practices in monitoring through code-driven configurations.
Creating and Configuring Dashboards in New Relic with NerdGraphQL
To ensure that our New Relic dashboards dynamically reflect the data we desire, we’ll use NerdGraphQL to programmatically create or update widgets. The provided code segments consist of functions that query the NerdGraphQL API to achieve this, and make use of the tenacity
library to handle retries, given the occasional unpredictability of the API.
Querying NerdGraphQL
The query_nerdgraph function sends a GraphQL query to New Relic’s API, leveraging the requests library. Using tenacity, the function retries up to 5 times or for 10 seconds if any issues arise. Should the response return with a status code above 300, an exception is raised.
import json

import requests
import tenacity

# NEW_RELIC_ACCOUNT_SECRET and ENDPOINT (the NerdGraph URL, e.g.
# https://api.newrelic.com/graphql, or https://api.eu.newrelic.com/graphql
# for EU accounts) are defined elsewhere.


@tenacity.retry(stop=tenacity.stop_after_attempt(5) | tenacity.stop_after_delay(10))
def query_nerdgraph(query):
    headers = {"API-Key": NEW_RELIC_ACCOUNT_SECRET}
    response = requests.post(ENDPOINT, headers=headers, json={"query": query}, timeout=20)
    if response.status_code > 300:
        raise RuntimeError(f"Nerdgraph query failed with a {response.status_code}.")
    return json.loads(response.content)
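For readers unfamiliar with tenacity, the combined stop condition `stop_after_attempt(5) | stop_after_delay(10)` means “give up as soon as either limit is hit.” A rough stdlib-only equivalent, for illustration only (tenacity itself handles waits, exception filters, and more):

```python
import time


def retry(func, attempts=5, max_seconds=10.0):
    # Stop after `attempts` tries OR `max_seconds`, whichever comes
    # first, re-raising the last error on failure.
    deadline = time.monotonic() + max_seconds
    last_exc = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_exc = exc
            if time.monotonic() >= deadline:
                break
    raise last_exc


calls = []


def flaky():
    # Fails twice, then succeeds, like a transiently unavailable API.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient")
    return "ok"


result = retry(flaky)  # Succeeds on the third attempt.
```

In production, prefer the real tenacity decorator; it also supports backoff between attempts, which this sketch omits.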
Creating or Updating Widgets
The create_or_update_widget function either creates a new widget or updates an existing one on a dashboard. Depending on whether a widget with the provided title exists, the function uses the appropriate GraphQL mutation template to perform the desired operation.
def create_or_update_widget(widget, title, config, page):
    template = CREATE_WIDGET_QUERY if not widget else UPDATE_WIDGET_QUERY
    context = {
        "account_id": NEW_RELIC_ACCOUNT_ID,
        "widget_id": widget["id"] if widget else None,
        "page_guid": page["guid"],
        "title": title,
        **config,
    }
    context.setdefault("style", "line")
    query = template % context
    response = query_nerdgraph(query)
    errors = response.get("errors")
    if not errors:
        logger.info(f"Updated or created {title}: {response}")
    else:
        logger.error(f"Errors while updating or creating a widget: {errors}")
Dashboard Configuration
The create_dashboards function manages dashboard configurations, going through each widget’s config and either creating or updating the widget on the dashboard using the function above. You need to create the dashboards in advance and obtain their GUIDs first.
def create_dashboards(dashboard_name=None):  # None means all dashboards.
    for dashboard_guid, widgets_config in DASHBOARDS_CONFIG.items():
        response = query_nerdgraph(GET_DASHBOARD_QUERY % dashboard_guid)["data"]["actor"]["entity"]
        if dashboard_name and dashboard_name != response["name"]:
            continue
        logger.info(f"Configuring dashboard {response['name']}")
        page = response["pages"][0]
        page_name = page["name"]
        row = 1
        column = 1
        for title, config in widgets_config.items():
            if type(config) is str:
                config = {"nrql": config}
            logger.info(f"Configuring widget [{page_name}][{title}]")
            widget = next((widget for widget in page["widgets"] if widget["title"] == title), None)
            config.setdefault("row", row)
            config.setdefault("column", column)
            column += 4
            if column > 12:
                column = column % 12
                row = (row + 4) % 12
            create_or_update_widget(widget, title, config, page)
        if dashboard_name and dashboard_name == response["name"]:
            break
The associated GraphQL queries are built using New Relic’s GraphQL interface, accessible at https://api.eu.newrelic.com/graphiql. This intuitive interface allows users to construct, test, and visualize their GraphQL queries before integrating them into their applications. The code below includes templates both for retrieving dashboard information (GET_DASHBOARD_QUERY) and for updating or creating widgets (UPDATE_WIDGET_QUERY and CREATE_WIDGET_QUERY).
GET_DASHBOARD_QUERY = """
{
  actor {
    entity(guid: "%s") {
      ... on DashboardEntity {
        guid
        name
        pages {
          guid
          name
          widgets {
            rawConfiguration
            id
            title
          }
        }
      }
    }
  }
}
"""

UPDATE_WIDGET_QUERY = """
mutation {
  dashboardUpdateWidgetsInPage(
    guid: "%(page_guid)s"
    widgets: {
      id: %(widget_id)s
      title: "%(title)s"
      configuration: {
        %(style)s: {
          nrqlQueries: {
            accountId: %(account_id)s
            query: "%(nrql)s"
          }
        }
      }
      layout: {
        column: %(column)s
        row: %(row)s
        width: 4
        height: 3
      }
    }
  ) {
    errors {
      description
      type
    }
  }
}
"""

CREATE_WIDGET_QUERY = """
mutation {
  dashboardAddWidgetsToPage(
    guid: "%(page_guid)s"
    widgets: {
      title: "%(title)s"
      configuration: {
        %(style)s: {
          nrqlQueries: {
            accountId: %(account_id)s
            query: "%(nrql)s"
          }
        }
      }
      layout: {
        column: %(column)s
        row: %(row)s
        width: 4
        height: 3
      }
    }
  ) {
    errors {
      description
      type
    }
  }
}
"""
It’s worth noting that by programmatically creating or updating dashboard widgets, you have the power to automate and standardize monitoring dashboards across various environments or applications, ensuring consistency and reducing manual overhead.
Creating and Configuring Alerts in New Relic with NerdGraphQL
To proactively monitor and respond to specific events or states within our application, we’ll use NerdGraphQL to programmatically manage alert policies and conditions in New Relic. Here’s a succinct overview of the provided code:
Setting Up Alerts
The function create_alerts runs through a predefined configuration (ALERTS_CONFIG) for each alert policy. It checks whether the alert policy already exists using the FIND_POLICY template, creates one if necessary using the CREATE_POLICY template, then checks for conditions linked to that policy with the FIND_CONDITION template, and finally either creates or updates the conditions using CREATE_CONDITION or UPDATE_CONDITION respectively.
Don’t forget to set up alert notifications over Slack or another channel of your choice.
def create_alerts():
    for name, config in ALERTS_CONFIG.items():
        logger.info(f"Checking policy {name}")
        policy = query_nerdgraph(FIND_POLICY % {"account_id": NEW_RELIC_ACCOUNT_ID, "name": name})["data"]["actor"][
            "account"
        ]["alerts"]["policiesSearch"]["policies"]
        if not policy:
            logger.info(f"Creating policy {name}")
            policy = query_nerdgraph(CREATE_POLICY % {"account_id": NEW_RELIC_ACCOUNT_ID, "name": name})["data"][
                "alertsPolicyCreate"
            ]
        else:
            policy = policy[0]
        condition = query_nerdgraph(FIND_CONDITION % {"account_id": NEW_RELIC_ACCOUNT_ID, "policy_id": policy["id"]})[
            "data"
        ]["actor"]["account"]["alerts"]["nrqlConditionsSearch"]["nrqlConditions"]
        if not condition:
            logger.info(f"Creating conditions for {policy}")
            response = query_nerdgraph(
                CREATE_CONDITION % {"account_id": NEW_RELIC_ACCOUNT_ID, "policy_id": policy["id"], **config},
            )
            logger.info(response)
        else:
            logger.info(f"Updating conditions {condition} for {policy}")
            condition = condition[0]
            response = query_nerdgraph(
                UPDATE_CONDITION
                % {"account_id": NEW_RELIC_ACCOUNT_ID, "policy_id": policy["id"], "id": condition["id"], **config},
            )
            logger.info(response)
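The find-or-create pattern used for policies (and again for conditions) is what makes this script safe to re-run: executing it twice never produces duplicates. Stripped of the GraphQL plumbing, it reduces to something like the sketch below, where `ensure_policy` and its in-memory lists are illustrative stand-ins for the API calls:

```python
def ensure_policy(name, existing, created):
    # Idempotent find-or-create: look the policy up by name first and
    # only create it when it is missing, mirroring create_alerts.
    for policy in existing:
        if policy["name"] == name:
            return policy, False  # Found; nothing created.
    policy = {"id": len(existing) + len(created) + 1, "name": name}
    created.append(policy)
    return policy, True  # Newly created.


existing = [{"id": 1, "name": "Too many retryable errors"}]
created = []

found, was_created = ensure_policy("Too many retryable errors", existing, created)
new, new_created = ensure_policy("New policy", existing, created)
# Re-running with the same names would create nothing further.
```

This idempotency is the property that lets you wire the script into CI or a deploy step without worrying about drift or duplication.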
LIST_POLICIES = """
{
  actor {
    account(id: %s) {
      alerts {
        policiesSearch {
          nextCursor
          policies {
            id
            name
            incidentPreference
          }
        }
      }
    }
  }
}
"""

CREATE_POLICY = """
mutation {
  alertsPolicyCreate(
    accountId: %(account_id)s
    policy: {name: "%(name)s", incidentPreference: PER_CONDITION}
  ) {
    accountId
    id
    incidentPreference
    name
  }
}
"""

FIND_POLICY = """
{
  actor {
    account(id: %(account_id)s) {
      alerts {
        policiesSearch(searchCriteria: { name: "%(name)s" }) {
          policies {
            id
            name
          }
        }
      }
    }
  }
}
"""

CREATE_CONDITION = """
mutation {
  alertsNrqlConditionStaticCreate(
    accountId: %(account_id)s,
    condition: {
      enabled: true
      name: "Condition"
      nrql: { query: "%(nrql)s" }
      terms: {
        operator: ABOVE
        priority: CRITICAL
        threshold: %(threshold)d,
        thresholdDuration: %(threshold_duration)d,
        thresholdOccurrences: ALL
      }
    }
    policyId: "%(policy_id)s"
  ) {
    id
    name
  }
}
"""

FIND_CONDITION = """
{
  actor {
    account(id: %(account_id)s) {
      alerts {
        nrqlConditionsSearch(searchCriteria: {name: "Condition", policyId: "%(policy_id)s"}) {
          totalCount
          nrqlConditions {
            id
            name
          }
        }
      }
    }
  }
}
"""

UPDATE_CONDITION = """
mutation {
  alertsNrqlConditionStaticUpdate(
    accountId: %(account_id)s
    condition: {
      nrql: { query: "%(nrql)s" }
      terms: {
        operator: ABOVE
        priority: CRITICAL
        threshold: %(threshold)d,
        thresholdDuration: %(threshold_duration)d,
        thresholdOccurrences: ALL
      }
    }
    id: "%(id)s"
  ) {
    id
    name
  }
}
"""
Config examples
# DEFAULT_TIMEFRAME, AWS_DASHBOARD_GUID, and EVENT_RETRY are defined elsewhere.
AWS_DASHBOARD_CONFIG = {
    "Queue sizes": {
        "nrql": f"SELECT max(`aws.sqs.ApproximateNumberOfMessagesVisible`) FROM Metric "
        f"WHERE aws.sqs.QueueName not like '%dlq%' "
        f"TIMESERIES since {DEFAULT_TIMEFRAME} ago FACET `aws.sqs.QueueName` LIMIT 1000",
    },
    "DLQ sizes": {
        "nrql": "SELECT max(`aws.sqs.ApproximateNumberOfMessagesVisible`) "
        "FROM Metric where aws.sqs.QueueName like '%dlq%' "
        "timeseries since 1 day ago FACET `aws.sqs.QueueName` LIMIT 1000",
        "style": "area",
    },
    "Publishing rates": f"SELECT sum(`aws.sqs.NumberOfMessagesSent`) FROM Metric since {DEFAULT_TIMEFRAME} ago "
    f"timeseries facet aws.sqs.QueueName ",
    "DDB Table sizes": f"FROM Metric SELECT max(aws.dynamodb.ItemCount) TIMESERIES since {DEFAULT_TIMEFRAME} ago "
    f"FACET aws.dynamodb.TableName",
    "DDB Source products": f"FROM Metric SELECT max(aws.dynamodb.ItemCount) TIMESERIES since {DEFAULT_TIMEFRAME} ago "
    f"FACET aws.dynamodb.TableName where aws.dynamodb.TableName like '%sourceProducts%'",
    "DDB gtins router": f"FROM Metric SELECT max(aws.dynamodb.ItemCount) TIMESERIES since {DEFAULT_TIMEFRAME} ago "
    f"FACET aws.dynamodb.TableName where aws.dynamodb.TableName like '%gtinsRouting%'",
    "DDB target products": f"FROM Metric SELECT max(aws.dynamodb.ItemCount) TIMESERIES since {DEFAULT_TIMEFRAME} ago "
    f"FACET aws.dynamodb.TableName where aws.dynamodb.TableName like '%TargetProducts%'",
}

DASHBOARDS_CONFIG = {
    AWS_DASHBOARD_GUID: AWS_DASHBOARD_CONFIG,
}

ALERTS_CONFIG = {
    "Too many retryable errors": {
        "nrql": f"SELECT count(*) from Log where event_id='{EVENT_RETRY}' and environment = 'production'",
        "threshold": 1000,
        "threshold_duration": 10 * 60,
    }
}
Harnessing the capabilities of New Relic’s NerdGraphQL API provides a powerful way to programmatically manage and configure dashboards and alerts. The code snippets above show a systematic approach to querying, updating, and creating dashboard widgets, as well as establishing alert conditions based on specific criteria. By integrating these scripts into your system management routines, you not only streamline your monitoring setup but also ensure consistency and traceability, preventing the discrepancies that can arise from manual configuration. Leveraging such automation underlines a commitment to precision and robust system monitoring.