Skip to main content

std/xml — Tree-walk performance builtins

std/xml is feature-complete for correctness: parse, findAll, getAttr, getChildren, and the streaming parseFold family cover every shape of input. This page documents the four builtins added in v0.21.0 that exist purely to cut the FFI/allocation cost of walking a parsed XmlNode tree.

The motivation came from a sunholo/ailang-parse profile: one walk of a 79 KB HTML page produced 21,651 function calls and 43,732 trace events, dominated by getChildren allocating a fresh [XmlNode] per node and getAttr paying a separate FFI crossing per attribute. The fix is API-level, not runtime-level — three new builtins that fold what the walker was doing into one call.

All four functions operate on the existing XmlNode ADT. std/html callers benefit for free because std/html.parse returns the same shape.

When to use

Pick the primitive by the shape of your result, not by what your old code happens to look like:

You wrote…Use insteadReason
foldl(f, init, getChildren(node)) (scalar accumulator: count/sum/string/map)foldChildren(node, init, f)Skips the allocated [XmlNode] and one FFI crossing per visit.
flatMap(processChild, getChildren(node)) (walker output is a flat list)flatMapChildren(node, processChild)foldChildren is the wrong primitive here — its AILANG-side accumulator (prepend+reverse) allocates O(N²) bytes vs flatMapChildren's O(N) Go-native append.
map(processChild, getChildren(node)) (one output element per child)mapChildren(node, processChild)Same reasoning. Pre-sized output slice, exact length known.
First child matching a predicate via filter + headfoldChildrenStep(node, None, \_, c. if pred(c) then Stop(Some(c)) else Continue(None))Stops the walk on the first match instead of visiting every child.
Pulling N attributes from one element via N × getAttrlet m = getAttrMap(node) in (map.get(m, "a"), map.get(m, "b"), …)Collapses N FFI crossings into one. Crossover at N ≥ 2.
if getTag(n) == "" then … // text? // else …match nodeKind(n) { KindElement => …, KindText => …, KindComment => … }Pattern-matchable, exhaustiveness-checkable, one FFI call.

If you only need a single attribute (N=1), keep using getAttrgetAttrMap allocates a map and is a net loss there.

Footgun warningfoldChildren is for scalar accumulators only. If your handler returns a list and you accumulate via prepend + reverse (or by repeatedly appending immutable lists), you will allocate more than the classic flatMap(_, getChildren(_)) did, not less. flatMapChildren exists exactly for the list-output case. This was the v0.21.0 M4 hotfix after sunholo/ailang-parse reported a 229 MB → 395 MB regression on a 1.7 MB Mollie page.

API

foldChildren[a](node: XmlNode, init: a, f: (a, XmlNode) -> a) -> a

Fold over the direct children of an Element without materialising [XmlNode]. Non-Element nodes (Text, Comment) return init unchanged.

import std/xml (foldChildren, getTag)

-- Count how many direct <li> children a <ul> has.
func countListItems(ul: XmlNode) -> int =
foldChildren(ul, 0, \acc, child.
if getTag(child) == "li" then acc + 1 else acc
)

Cost: O(direct children) calls to f. Zero intermediate list allocation. Compared to foldl(f, init, getChildren(n)), saves one [XmlNode] allocation and the per-element FFI crossings that materialising the list would cost.

foldChildrenStep[a](node: XmlNode, init: a, f: (a, XmlNode) -> FoldStep[a]) -> a

Bounded variant of foldChildren. The handler returns FoldStep[a] = Continue(a) | Stop(a) (from std/iter). Stop(acc') halts the walk immediately and returns acc'.

import std/iter (FoldStep, Continue, Stop)
import std/option (Option, None, Some)
import std/xml (foldChildrenStep, getTag)

-- First direct child whose tag is "h1".
func firstH1(node: XmlNode) -> Option[XmlNode] =
foldChildrenStep(node, None, \acc, c.
if getTag(c) == "h1"
then Stop(Some(c))
else Continue(acc)
)

Cost: O(prefix until Stop). Mirrors the existing parseFoldStep short-circuit pattern.

getAttrMap(node: XmlNode) -> Map[string, string]

Build a Map[string, string] from all attributes of an Element. Non-Element nodes return an empty map. Duplicate attribute names: last write wins (consistent with findAttrs semantics).

import std/xml (getAttrMap)
import std/map as map

-- One FFI call, then O(1) per attribute lookup.
func extractImgFields(img: XmlNode) -> {src: Option[string], alt: Option[string], srcset: Option[string]} =
let attrs = getAttrMap(img) in
{ src = map.get(attrs, "src")
, alt = map.get(attrs, "alt")
, srcset = map.get(attrs, "srcset")
}

Cost: 1 FFI call + O(attrs) Go-side map build, then O(1) per lookup. Crossover with getAttr at N ≥ 2 attribute probes per node. Source order is not preserved — if you need ordering, keep using the existing Element(_, attrs, _) field directly, which is the ordered list.

nodeKind(node: XmlNode) -> NodeKind

Classify a node as KindElement | KindText | KindComment without string-equality on the tag. Pattern-matchable and exhaustiveness-checkable.

import std/xml (NodeKind, KindElement, KindText, KindComment, nodeKind, getText)

func render(n: XmlNode) -> string =
match nodeKind(n) {
KindElement => renderElement(n),
KindText => getText(n),
KindComment => ""
}

Cost: 1 FFI call returning a tagged value with no fields.

type NodeKind = KindElement | KindText | KindComment

A small ADT with three nullary constructors. The constructors are prefixed Kind to avoid colliding with XmlNode's own Element / Text / Comment constructors — both types are exported from std/xml, and the prefix lets you import both without rename.

flatMapChildren[a](node: XmlNode, f: (XmlNode) -> [a]) -> [a]

Added v0.21.0 (M4 hotfix). Go-iterative flatMap over the direct children of an Element. f returns a list per child; results are concatenated into one Go-native slice with O(1) amortised append. Use this — not foldChildren — when the walker output is a flat list. Non-Element nodes return [].

import std/xml (flatMapChildren)

-- Process every direct child into 0+ Block outputs; flatten the result.
func htmlProcessChildren(node: XmlNode) -> [Block] =
flatMapChildren(node, htmlProcessNode)

Cost: 1 FFI call returning a single allocated slice. No [XmlNode] intermediate, no AILANG-side accumulator. Drops the prepend+reverse penalty foldChildren would impose for list output.

mapChildren[a](node: XmlNode, f: (XmlNode) -> a) -> [a]

Added v0.21.0 (M4 hotfix). Like flatMapChildren but each callback returns exactly one output element. Output slice is pre-sized to the children count. Non-Element nodes return [].

import std/xml (mapChildren, getTag)

-- Snapshot of the direct child tag names in document order.
func childTags(node: XmlNode) -> [string] =
mapChildren(node, \c. getTag(c))

Cost: 1 FFI call. Exactly len(children) output elements; no resize.

Cost model

The four builtins target three concrete bottlenecks identified by the ailang-parse profile. Use this table to predict whether a rewrite will pay off:

PatternFFI crossings / nodeAllocations / nodeAfter rewrite
foldl(f, init, getChildren(n))1 (getChildren) + 1 per child1 × [XmlNode]foldChildren: 1 FFI call total, zero list alloc
getAttr(img, k) over 100 images7000getAttrMap: 100 FFI calls + 100 small map allocs, O(1) lookups
getTag(n) == "" text-detect1 (getTag) + 1 string compare1 × stringnodeKind: 1 FFI call returning a tagged variant

The wins compound. A walker that does all three on every node — fold direct children, classify each, then for each element pull multiple attributes — sees the AILANG-level wallclock drop materially. Measured numbers (see "Measured impact" below) are 20-29% wallclock reduction on the benchmark example, with the upper end approached as attribute fan-out grows.

The wins do not appear in Go-level microbenchmarks. FnCallerN in test mode is a direct Go call with no closure-invocation overhead — exactly the overhead the design targets. Quote AILANG-level timings (from runnable examples) when reporting impact; Go benches only verify the floor cost.

Measured impact

From examples/runnable/xml_walk_perf.ail — a synthetic tree of 2,000 <row> children with 10 attributes each, walked via clock.now() wallclock comparison:

WalkerTime (median of 5 runs)
Classic: foldl + getChildren + 10× getAttr / child~130 ms
New: foldChildren + getAttrMap + 10× pure map lookups~103 ms

≈ 20 % wallclock reduction; roughly 18,000 fewer AILANG↔Go crossings per walk. The win scales with attributes-probed-per-node: for the workload that drove this design (<img> elements with 7 attributes each, ≥100 images per page), expected reduction is closer to 30 %.

Run the example yourself:

ailang run --caps Clock examples/runnable/xml_walk_perf.ail

Migration: rewriting a walker

A typical walker node -> [Block] looks like this before:

func processChildren(node: XmlNode) -> [Block] =
flatMap(processNode, getChildren(node))

func processNode(n: XmlNode) -> [Block] =
if getTag(n) == ""
then [TextBlock(getText(n))]
else
let src = getAttr(n, "src") in
let alt = getAttr(n, "alt") in
let title = getAttr(n, "title") in
[ImgBlock(src, alt, title)]

After (one FFI call per concern):

import std/xml (foldChildren, getAttrMap, nodeKind, KindElement, KindText, KindComment, getText)
import std/map as map

func processChildren(node: XmlNode) -> [Block] =
foldChildren(node, [], \acc, c. append(acc, processNode(c)))

func processNode(n: XmlNode) -> [Block] =
match nodeKind(n) {
KindText => [TextBlock(getText(n))],
KindComment => [],
KindElement =>
let attrs = getAttrMap(n) in
[ImgBlock(map.get(attrs, "src"), map.get(attrs, "alt"), map.get(attrs, "title"))]
}

For long child lists, the append-per-step pattern above is O(n²); accumulate into a chunked builder or reverse-cons-then-reverse if that becomes the bottleneck.

  • std/iterFoldStep, Continue, Stop for the foldChildrenStep handler shape.
  • std/xml.parseFold / parseFoldStep — the same fold-over-bounded-input pattern, but at the parse level for streaming large documents.
  • std/html — returns the same XmlNode ADT, so all four builtins apply unchanged to HTML walks.
  • Stdlib Index — the full std/xml and std/html surface.

See also

The "Cost visibility" axiom (A9) requires every new stdlib function to ship with an explicit cost model. The table in Cost model is the contract.