Our first solution was to build a custom web crawler that would extract our content and convert it to JSON or Markdown.
It worked—kind of.
But it was brittle. Since our original site wasn’t built with semantic HTML, even minor design changes could break the parser. We quickly realized that trying to retrofit clean structure onto an unstructured system was the wrong approach.