I think if your proposal is always strictly better than not closing, then I support it. Programming in Arc tries to take away the need to read raw html, so we can handle a little extra verbosity in the emitted code in some situations.
It looks like the HTML specification defines this as a "non-void-html-element-start-tag-with-trailing-solidus parse error." The spec says that in this case, "The parser behaves as if the U+002F (/) is not present," but also that "[browsers] may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification."
I don't know of any browsers that abort the parsing altogether, so it's still reliable to write the HTML that way.
However, the similarity to XML is actively misleading in this case. When you process that document as HTML, you still get structure like this:
So if you're trying to write a polyglot HTML/XML document, self-closing <p /> tags still probably aren't a great option. Closing the paragraphs explicitly, like so, makes it clearer how the structure will end up:
I think modern HTML does have a reliable common subset with XML. Modern HTML treats <br></br> and <p /> as parse errors, but it treats <br /> and <p></p> as valid. To write HTML/XML polyglot content, you just need to pay attention to whether you're dealing with a void element like "br" or a non-void element like "p".
Incidentally, why use an HTML/XML polyglot at all? There are at least a few situations where it can make sense:
- You're serving it as HTML, but (at least someday) you might want to use an XML-processing tool on it or serve it as XHTML.
- You're trying to serve it as XHTML, but you're worried you'll mess up your server configuration and serve it as HTML by mistake.
- You're confident you can serve it as XHTML today, but you have a backup plan to serve it as HTML if needed. In particular, you're afraid someday your XHTML will be invalid due to a bug in your code, a bug in a browser, an intentional spec violation in a browser (e.g. for security or user privacy), or a backwards-incompatible change in the spec. The XHTML spec dictates that an invalid page won't be displayed at all, so if you end up with invalid XHTML for any of those reasons, your site will be rather unusable until you can implement a fix. If that happens at a time you're not ready to drop everything and look at the bug in depth, then you can make a pretty quick switch to serving it as HTML, and most of the page will display again.
Because of the brittle handling of errors, XHTML still hasn't really gotten off the ground. So it seems like the primary value of the HTML/XML polyglot is to serve a document as HTML but use XML-processing tools on it behind the scenes.
A side note...
In the very early days of XML and XHTML, when people were trying to make their HTML pages as XML-like as possible, many browsers would interpret something like <br/> as an element with the tag name "br/". That's why people got into the habit of putting in a space like <br />. That way those browsers would instead interpret the / as an attribute named "/", which was mostly harmless. Nowadays, the space is pretty much vestigial and you can just write <br/> if you want to.