Bending the rules with ranges
Ranges. If you are a web developer, depending on the use case, the word can make you quiver. Read up on them if you have a chance. Ranges are somewhat obscure, especially if you want to do anything with them. For instance... the latest project I'm working on is called "ForwardLink Protocol". I will avoid talking about that for now and just say it is a reciprocal linking process that has to do with revisions in time and... ranges.
The use case:
- User selects text on a page they find interesting (we'll call this the origin)
- That text is sent to their clipboard
- They then insert that text into a website that they are a contributor to (we'll call this a feeder)
- The text is smart enough to tell the origin that the feeder has cited a block of text, and both the origin and the feeder are notified that there is linkage between them
- A visitor to the feeder website sees a link and follows it to the origin where (and this is the hard part) the original block of text is highlighted with a link around it
- Please note: The origin server is notified of the citation only after the article and the text copied from the origin are created!
"So what is the problem?" you say. Four characters: "H.T.M.L". To put it lightly, all browsers modify the html they have been sent at runtime, so what you see in the page is no longer the original downloaded from the server. So back to ranges: how do I select a range? Embarrassing as it may seem, I didn't have the answer.
- Position? - Nope, the page can change size depending on the browser and if the document is updated
- Object offset? - Read previous
- Text? - The user can select text between tags, partial tags, child to parent, there are no rules
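To make the dead ends above concrete, here is a small sketch. The sample markup, the `class="intro"` attribute, and the variable names are all made up for illustration; they are not from the ForwardLink code.

```javascript
// Hypothetical sample data, not taken from the ForwardLink code.
const asServed = '<p>The <em>quick</em> brown fox jumps.</p>';
// The same paragraph after a script or browser has touched it (assumed for illustration).
const asRendered = '<p class="intro">The <em>quick</em> brown fox jumps.</p>';
const selectedText = 'quick brown fox';

// Character offsets into the markup shift as soon as the markup changes.
const offsetAsServed = asServed.indexOf('quick');
const offsetAsRendered = asRendered.indexOf('quick');
console.log(offsetAsServed === offsetAsRendered); // false

// Searching for the copied text directly fails too: "</em>" sits between the words.
const naiveIndex = asServed.indexOf(selectedText);
console.log(naiveIndex); // -1
```

The user sees "quick brown fox" as one continuous run of text, but nothing in the markup matches it character for character.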
So what is the solution?
Simple... ignore it.
That is, if we can sanitize the html down to just words, and do the same with the selected text... ahhhhh, now we can compare them. But how? How can I strip the html out of a string and still keep each word's position within the html document? That is what I thought about for a few hours. Remember jQuery.sheet? Jison to the rescue! We build a super simple parser to do the work for us. The parser can analyze words, html, and characters and return something in their place if needed, so we don't need to change the whole document, just the parts that matter.
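The trick of "stripping" html while keeping positions can be sketched without a parser generator. What follows is not the Jison grammar from the project; it is a hand-rolled regular expression standing in for it, and the function name `wordsWithOffsets` is my own invention for this example. The idea is to skip over tags rather than delete them, so every offset still points into the original string.

```javascript
// Return the words of an html string together with their offsets in the ORIGINAL string.
// Tags are skipped, never removed, so the offsets stay valid for the untouched markup.
function wordsWithOffsets(html) {
  // First alternative swallows a whole tag; the captured second alternative is a word
  // (a run of lowercase, uppercase, or numeric characters).
  const scanner = /<[^>]*>|([A-Za-z0-9]+)/g;
  const words = [];
  for (const match of html.matchAll(scanner)) {
    if (match[1] !== undefined) {
      words.push({
        text: match[1],
        start: match.index,                  // inclusive
        end: match.index + match[1].length,  // exclusive
      });
    }
  }
  return words;
}

console.log(wordsWithOffsets('<p>Hi <b>there</b></p>'));
```

A regular expression this small will trip over comments, script blocks, a `>` inside an attribute value, or a stray `<` in the text, which is a good argument for using a real parser as described above.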
Here is how it works:
- the html body is analyzed into an array of words, where a word is a group of characters that are lowercase letters, uppercase letters, or numbers
- the phrase that was selected earlier is analyzed in the same way as the html body
- the html body words are then compared with the phrase words to find the point at which they match up
- the html body is analyzed once more; this time we know where the words start and end within the html body, and if a word is detected to be in that range it is encapsulated as the selected phrase or citation block
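The four steps can be sketched end to end like this. It is a rough approximation under a few assumptions: a regular expression stands in for the Jison parser, the function names are made up, the `<span class="citation">` wrapper is hypothetical, and wrapping each word on its own is just one way to read the last step.

```javascript
const SCANNER = /<[^>]*>|([A-Za-z0-9]+)/g;

// Steps 1 and 2: reduce markup (or plain text) to words, remembering where each one sits.
function tokenize(source) {
  const words = [];
  for (const m of source.matchAll(SCANNER)) {
    if (m[1] !== undefined) {
      words.push({ text: m[1], start: m.index, end: m.index + m[1].length });
    }
  }
  return words;
}

// Step 3: find the first index in bodyWords where the phrase words line up, or -1.
function findPhrase(bodyWords, phraseWords) {
  if (phraseWords.length === 0) return -1;
  for (let i = 0; i + phraseWords.length <= bodyWords.length; i++) {
    if (phraseWords.every((w, j) => bodyWords[i + j].text === w.text)) return i;
  }
  return -1;
}

// Step 4: rebuild the html, wrapping each matched word individually so that
// no wrapper ever straddles an existing tag boundary.
function highlight(html, phrase, open = '<span class="citation">', close = '</span>') {
  const bodyWords = tokenize(html);
  const phraseWords = tokenize(phrase);
  const first = findPhrase(bodyWords, phraseWords);
  if (first === -1) return html; // nothing to highlight, leave the markup alone

  let out = '';
  let cursor = 0;
  for (const w of bodyWords.slice(first, first + phraseWords.length)) {
    out += html.slice(cursor, w.start) + open + html.slice(w.start, w.end) + close;
    cursor = w.end;
  }
  return out + html.slice(cursor);
}

console.log(highlight('<p>The <em>quick</em> brown fox jumps.</p>', 'quick brown fox'));
```

Wrapping word by word is clumsier than one wrapper around the whole phrase, but a single wrapper starting inside `<em>` and ending outside it would produce badly nested markup.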
Success! And not only does it work well, I haven't been able to find a browser where it doesn't work: Internet Explorer, Firefox, Chrome. That isn't to say it doesn't have bugs. I'm sure it does, we are imperfect, but it works. Bending the rules feels so good, especially after hitting a few walls.
Check out the code for yourself
The downside is that an object in memory is not always the same as its tag written out as text. Many attributes can be changed at runtime, and once you run the analysis on the document's DOM object, all of that is lost. But you can't have everything, I guess.
