Does ePub restrict HTML to only some subset?


I was thinking about creating an ePub reader. All the ePub files I have seen so far seemed very simple: just text paragraphs with some big font for the title, and some rectangular illustration images. So, I thought ePub provides only simple ways to describe the text content.

But it seems that an ePub file contains lots of HTML and CSS. I opened a sample ePub and it contained text in <p> elements with a class attribute. Does that mean an ePub can basically be like a website archive? Can the author use any advanced formatting or layout feature that is used when creating an HTML website? If so, I would have to implement a whole web browser to create an ePub reader.

Or is the HTML allowed in ePub somehow restricted to only certain tags and attributes, like the HTML that is allowed when writing on an online forum?


PS: I did some research on my own after posting this, and my conclusion is that it is the former. I tried some popular ePub apps from the Android market, and they all seem odd in terms of GUI (meaning they are probably non-native). There does not seem to be a definitive way to know whether an app is native or a web app, but one trick is to enable the display of layout boundaries: those apps show no boundaries inside the ePub view itself, which means it is probably a web-view.

I searched GitHub for ePub viewers, and they all seem to be using JavaScript or a web-view, including this Android ePub viewer.

So those ePub apps are probably just parsing the metadata files in the ePub format, and for the rendering of the book itself they delegate to the web-view, using some sort of JavaScript framework to add a UI on top of it.
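As a rough sketch of that metadata-parsing step (not taken from any of the apps above): an ePub is a ZIP archive whose META-INF/container.xml points at a package (OPF) document, and that document's manifest and spine give the content files in reading order. The function name `reading_order` is made up for illustration, and the sketch assumes a well-formed book with plain (not percent-encoded) hrefs.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def reading_order(epub):
    """Return archive paths of the content documents in spine order.

    `epub` is a path or a binary file-like object (anything zipfile accepts).
    Hypothetical helper: no error handling for malformed books.
    """
    with zipfile.ZipFile(epub) as zf:
        # 1. META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")

        # 2. The manifest maps item ids to files; the spine orders them.
        package = ET.fromstring(zf.read(opf_path))
        hrefs = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }

        # 3. Manifest hrefs are relative to the OPF file's own directory.
        base = posixpath.dirname(opf_path)
        return [
            posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")]))
            for ref in package.findall("opf:spine/opf:itemref", NS)
        ]
```

Each returned path is an XHTML (or SVG) file inside the archive. A reader along the lines described above would presumably load those files into the web-view one after another rather than render them itself.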

If someone knows better, please correct me.


Solution

  • My understanding of previous ePub specs is that an ePub is a web archive of sorts: a compressed archive consisting of metadata, fonts, images, and content.

    It used to be that this content was only in a specially-flavored XHTML format, but it looks like they've also added SVG content documents. I've admittedly lost track of the ePub spec changes (I didn't realize they had merged efforts with the W3C), but hopefully the spec links above can give an idea of what's different between a standard HTML5 web page and what ePub expects.


    EDIT: I should also mention that a lot of the readers I worked with back in the day had the bad habit of stripping out formatting and just presenting text (not even text with embedded fonts -- a big no-no for non-English texts). Not sure if this was the reader software being "robust" and acting against ePub formatting that would break their app, or something else.
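To make the "stripping out formatting" behaviour concrete, here is a minimal sketch of what such a reader effectively does. It is my own guess, not code from any actual reader: it keeps only the text nodes of a content document and throws away every tag, class, inline style and stylesheet, which is exactly how embedded fonts end up ignored. The names `TextOnly` and `strip_formatting` are invented for the example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring all tags, attributes and CSS."""

    # Elements whose text content is never shown as body text.
    SKIP = {"head", "style", "script"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def strip_formatting(xhtml):
    """Return the visible text of an XHTML string, separated by spaces."""
    parser = TextOnly()
    parser.feed(xhtml)
    parser.close()
    return " ".join(parser.parts)
```

Note that the `class` and `style` attributes never reach the output, so any font or layout the author specified is silently lost. A reader built on a web-view avoids this by handing the unmodified document to the browser engine.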