The Document Object Model (DOM) is an application programming interface for HTML and XML documents. It defines the logical structure of a document and the way that document can be accessed and manipulated.
When the browser parses an HTML document, it converts it into a DOM, which it then uses for all further operations. The DOM represents the HTML document as a tree built from its tags. For example, consider the following document:
```html
<!DOCTYPE HTML>
<html>
  <head>
    <title>DOM | Hackinbits</title>
  </head>
  <body>
    <h1>Welcome to hackinbits</h1>
    <p>Learn programming and technology in bits.</p>
  </body>
</html>
```
You can edit the example document and see its tree structure at hixie.ch.
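To make the tree shape concrete, here is a minimal Python sketch that uses the standard library's `html.parser` module to turn the example document into nested nodes and print them as an indented outline. It is an illustration only, not how a browser builds its DOM internally. The `Node`, `TreeBuilder`, `build_tree` and `outline` names are hypothetical, and for simplicity the sketch drops whitespace-only text, ignores the doctype, and assumes well-formed input without void elements.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical minimal tree node: a tag name (or '#text') plus children."""

    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTMLParser's start/end/data callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # open elements; the last one is the current parent

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element (assumes well-formed input).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.stack[-1].children.append(Node("#text", data.strip()))


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return one indented line per node, two spaces per tree level."""
    label = f'"{node.text}"' if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


DOC = """<!DOCTYPE HTML>
<html>
  <head>
    <title>DOM | Hackinbits</title>
  </head>
  <body>
    <h1>Welcome to hackinbits</h1>
    <p>Learn programming and technology in bits.</p>
  </body>
</html>"""

print("\n".join(outline(build_tree(DOC))))
```

In the printed outline, `html` contains `head` and `body`, and `body` contains `h1` and `p`, mirroring the nesting of the tags. A real browser's DOM additionally contains a doctype node and some whitespace-only text nodes, which this sketch leaves out.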
Parsing of an HTML Document by the Browser
Let's briefly discuss how the browser parses an HTML document and generates the DOM. When the browser processes an HTML document, it performs the following steps:
- Conversion: The browser first converts the raw bytes it receives into individual characters, based on the document's specified character encoding (e.g., UTF-8).
- Tokenizing: Next, the browser reads the string of characters produced by the first step and converts it into distinct tokens, as specified by the HTML standard; for example, "<html>" is a start-tag token.
- Lexing: The tokens produced in the second step are converted into "objects", each of which defines its properties and rules.
- DOM construction: Finally, these objects are linked into a tree data structure that also captures the parent-child relationships between the HTML tags in the original document. For example, the html object is the parent of the body object, the body object is the parent of the paragraph object, and so on.
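The four steps above can be sketched in Python as four small functions chained together. This is a simplified illustration under stated assumptions, not a browser's actual parser: it relies on the standard library's `html.parser` for tokenizing, represents tokens as hypothetical `("start", tag)`, `("end", tag)` and `("text", data)` tuples, and assumes well-formed input with no void elements. The function names `convert`, `tokenize`, `lex` and `construct` are made up to match the step names.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


# Step 1: Conversion - raw bytes become characters using the document's encoding.
def convert(raw: bytes, encoding: str = "utf-8") -> str:
    return raw.decode(encoding)


# Step 2: Tokenizing - characters become (kind, value) tokens.
class _Tokenizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text for simplicity
            self.tokens.append(("text", data.strip()))


def tokenize(text: str) -> list:
    tokenizer = _Tokenizer()
    tokenizer.feed(text)
    tokenizer.close()
    return tokenizer.tokens


# Step 3: "Lexing" - start-tag and text tokens become node objects.
@dataclass
class Node:
    name: str  # tag name, or "#text" for text nodes
    text: str = ""
    children: list = field(default_factory=list)


def lex(tokens: list) -> list:
    objects = []
    for kind, value in tokens:
        if kind == "start":
            objects.append(Node(value))
        elif kind == "text":
            objects.append(Node("#text", text=value))
        else:
            objects.append(("end", value))  # end tags only close nodes
    return objects


# Step 4: DOM construction - link the objects into a parent/child tree.
def construct(objects: list) -> Node:
    document = Node("#document")
    stack = [document]
    for obj in objects:
        if isinstance(obj, Node):
            stack[-1].children.append(obj)
            if obj.name != "#text":
                stack.append(obj)  # an element becomes the new current parent
        elif len(stack) > 1 and stack[-1].name == obj[1]:
            stack.pop()  # matching end tag closes the current element
    return document


raw = b"<html><body><h1>Welcome</h1><p>Learn in bits.</p></body></html>"
dom = construct(lex(tokenize(convert(raw))))
html = dom.children[0]
body = html.children[0]
print([child.name for child in body.children])
```

Keeping each step as its own function makes the data handed from one stage to the next visible: bytes, then a string, then a flat token list, then a flat list of objects, and finally a single tree rooted at `#document` in which html is the parent of body and body is the parent of h1 and p.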
The DOM generated by the above steps is used by the browser for all further processing.
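Because the DOM is an API as well as a tree, later processing reads and changes the document through standard DOM methods such as `getElementsByTagName`, `createElement` and `appendChild`. Browsers expose these to JavaScript; to stay in Python, the sketch below uses the standard library's `xml.dom.minidom`, which implements a subset of the W3C DOM interfaces. Note that minidom is an XML parser, so this assumes well-formed markup (the doctype is omitted) and illustrates the DOM interface only, not how a browser parses HTML.

```python
from xml.dom import minidom

MARKUP = (
    "<html><head><title>DOM | Hackinbits</title></head>"
    "<body><h1>Welcome to hackinbits</h1>"
    "<p>Learn programming and technology in bits.</p></body></html>"
)

doc = minidom.parseString(MARKUP)

# Access: find the <h1> element and read its text node.
heading = doc.getElementsByTagName("h1")[0]
print(heading.firstChild.data)

# Manipulate: create a new <p> element with a text node and append it to <body>.
body = doc.getElementsByTagName("body")[0]
new_p = doc.createElement("p")
new_p.appendChild(doc.createTextNode("DOM trees can be changed at runtime."))
body.appendChild(new_p)

# The body now holds two <p> elements.
print(len(body.getElementsByTagName("p")))
```

The method names mirror the ones browsers expose to JavaScript on document and element objects, so the same access-then-modify pattern carries over to scripts running in a page.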