HTML and XML Inspectors
The Inspectors API provides a useful way to review or manipulate the output of your HTML or XML resources. The API uses Nokogiri, a Ruby gem that lets you work directly with the nodes of a parsed document tree through a DOM-like API.
This is a safer way to modify HTML/XML content than alternatives such as string manipulation or regular expressions, which are error-prone and can fail on unexpected input.
Bridgetown doesn’t install the Nokogiri gem directly, so be sure to run `bundle add nokogiri` if it isn’t already part of your bundle.
Inspectors only apply to files Bridgetown considers Resources, so any HTML or XML file in your project that lacks front matter won’t be processed by your Inspectors. Add two lines of triple dashes (`---`) to the top of any such file to mark it as a Resource.
Your First Inspector
Let’s add an oft-requested feature to our site: automatic `target="_blank"` attributes on all outgoing links. It’s easy with an HTML Inspector.
We’ll create a new builder plugin and use the `inspect_html` method to access the Nokogiri document and update all the relevant links.
```ruby
class Builders::Inspectors < SiteBuilder
  def build
    inspect_html do |document|
      document.query_selector_all("a").each do |anchor|
        # Leave alone any link that already specifies a target
        next if anchor[:target]

        href = anchor[:href]
        # Only touch absolute http(s) links that don't point back to this site
        next unless href&.start_with?("http") && !href.include?(site.config.url)

        anchor[:target] = "_blank"
      end
    end
  end
end
```
Note that `query_selector_all` is an alias for Nokogiri’s `css` method. We also provide `query_selector` as an alias for `at_css`.
In the example above, we loop through all `a` tags and skip any tag that already has a `target` or isn’t a true external link; otherwise, we set its `target` attribute to `_blank`.
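To make it clearer which links the builder treats as external, here is a minimal plain-Ruby sketch of the same check pulled out into a hypothetical `external_link?` helper (not part of Bridgetown). It applies the same prefix and substring tests as the builder, so it can be tried in irb without Bridgetown or Nokogiri.

```ruby
# Hypothetical helper mirroring the external-link check in the builder above.
# Plain Ruby only: no Bridgetown or Nokogiri required.
def external_link?(href, site_url)
  # Relative paths, fragments, mailto: links, and missing hrefs don't start with "http"
  return false unless href&.start_with?("http")

  # Absolute links containing the site's own URL are treated as internal
  !href.include?(site_url)
end

site_url = "https://example.com"
puts external_link?("https://rubygems.org/gems/nokogiri", site_url) # prints "true"
puts external_link?("https://example.com/docs", site_url)           # prints "false"
puts external_link?("/docs/plugins", site_url)                      # prints "false"
puts external_link?(nil, site_url)                                  # prints "false"
```

One consequence of this design: in Ruby, `"any string".include?("")` is `true`, so if the site URL were an empty string, every absolute link would be treated as internal and no `target` would be added.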
Another feature you might want is a “#” link appended to each heading in your content, so readers can copy a permalink to that particular heading. Here’s an HTML Inspector that does it:
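```ruby
# Inside a builder's build method, as in the previous example
inspect_html do |document|
  document.query_selector_all("article h2[id], article h3[id]").each do |heading|
    # Add a space, then an <a class="heading-anchor" href="#id">#</a> link
    heading << document.create_text_node(" ")
    heading << document.create_element(
      "a", "#",
      href: "##{heading[:id]}",
      class: "heading-anchor"
    )
  end
end
```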
You can see this in action right on this very page!
Inspector blocks support an optional second `resource` argument if you need access to the underlying Resource object.
Works with XML Too
If you need to work with XML files such as feeds or sitemaps, you can do so with the `inspect_xml` method. It works just like `inspect_html`, except that it can optionally take an extension argument (the default is `xml`).
```ruby
inspect_xml do |document, resource|
  # Work on any .xml file, or…
  # manually check for a specific XML format:
  next unless document.root.name == "urlset"

  # Yay, we found a sitemap!
end

inspect_xml "opml" do |document|
  # OPML files are outlines which can contain URLs or other structured text.
  urls = document.query_selector_all("outline[url]").map { _1[:url] }
  # Do something with the list of URLs in the .opml file…
end
```
Performance Considerations
All resources that produce HTML or XML output (rather than JSON or some other format) are processed through any defined Inspectors. For better performance and fidelity, a single Nokogiri document per resource is shared across all Inspectors, rather than a new Nokogiri document being instantiated for each Inspector.
Nokogiri relies on a C extension which in turn uses libxml2, so performance should be fast unless your project contains a very large number of resources.
If you need to bypass Inspectors for certain large resources to avoid the overhead of Nokogiri, set the front matter variable `bypass_inspectors: true` to tell Bridgetown not to parse that resource with Nokogiri. To apply this to a whole group of resources, make it a default with front matter defaults.