HTML DOM for mIRC ($dom)
Written by PennyBreed @ irc.nuphrax.com on #botdev
This set of aliases lets you retrieve, query, and traverse an HTML DOM
using an InternetExplorer instance. The calls support nesting, for example:
$dom($dom($dom_open(http://foo.bar),xpaths,link),str,type)
This was my original goal for my $xml script, before I realized that it would not accept
malformed HTML. I finished that script anyway. These scripts mostly simplify the $com calls
and provide a quick way to clean them up with /dom_free and /dom_freeall.
Instead of loading the page in IE, I THINK you can use COM to reach Microsoft's DOM handler directly and give it a page or local file, instead of requiring IE to download and parse the page. I'll report back if I find what I'm looking for. Also, mmmm, COMs!
Found what I was looking for: msxml2.DOMDocument.6.0. After creating the COM object, you can load XML (or well-formed HTML) locally instead of requiring IE to download the file.
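A minimal sketch of loading a local file this way, assuming MSXML 6 is installed and the file is well-formed XML. The alias name /xmlFile and the derived com name are hypothetical; async, load, documentElement and nodeName are standard DOMDocument members. Note that load does not raise a COM error when parsing fails (it just returns false and leaves documentElement empty), so a bad file will most likely surface as a failure in one of the later calls and land in the :error cleanup.

```mirc
; Hypothetical alias: /xmlFile <ComName> <path to a local, well-formed XML file>
; Sketch only: parses the file with msxml2.DOMDocument.6.0 and echoes
; the name of its root element, then closes every COM object it opened.
; async is set to false (property put = 4) so load returns after parsing.
alias xmlFile {
  var %root = $1 $+ root
  comopen $1 msxml2.DOMDocument.6.0
  if ($comerr) {
    echo -a :: comerr on open
  }
  elseif (!$com($1,async,4,bool,$false) || $comerr) {
    echo -a :: comerr when setting async
  }
  elseif (!$com($1,load,1,bstr,$2-) || $comerr) {
    echo -a :: comerr when calling load
  }
  elseif (!$com($1,documentElement,2,dispatch* %root) || $comerr) {
    echo -a :: comerr when retrieving documentElement
  }
  elseif (!$com(%root,nodeName,2) || $comerr) {
    echo -a :: comerr when retrieving nodeName
  }
  else {
    echo -a :: root element: $com(%root).result
  }
  :error
  reseterror
  ; only close what was actually opened
  if ($com($1)) comclose $1
  if ($com(%root)) comclose %root
}
```

Usage would look like /xmlFile xdoc C:\path\to\file.xml (the path is just a placeholder).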
If you want to download a page and then traverse its DOM, you can use msxml2.xmlhttp or similar to fetch the page, then use msxml2.DOMDocument to parse what was downloaded. It should be quite a bit faster than using InternetExplorer.Application to download and parse the page/file into a usable DOM.
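A rough sketch of that two-step approach, under a few stated assumptions: the alias name /xmlFetch and the derived com names are hypothetical, the request is made synchronously (async = $false), so mIRC will freeze until the download finishes, and the response is passed back into loadXML as a single string, so only short responses are likely to survive mIRC's string length limits. DOMDocument will still reject malformed HTML.

```mirc
; Hypothetical alias: /xmlFetch <ComName> <url>
; Sketch only: GETs <url> with msxml2.xmlhttp, parses responseText with
; msxml2.DOMDocument.6.0, and echoes the root element name.
; Synchronous request: mIRC blocks until the server answers.
alias xmlFetch {
  var %http = $1 $+ http, %root = $1 $+ root
  comopen %http msxml2.xmlhttp
  ; only open the parser if the first comopen succeeded, so $comerr reflects both
  if (!$comerr) comopen $1 msxml2.DOMDocument.6.0
  if ($comerr) {
    echo -a :: comerr on open
  }
  elseif (!$com(%http,open,1,bstr,GET,bstr,$2,bool,$false) || $comerr) {
    echo -a :: comerr when calling open
  }
  elseif (!$com(%http,send,1) || $comerr) {
    echo -a :: comerr when calling send
  }
  elseif (!$com(%http,responseText,2) || $comerr) {
    echo -a :: comerr when retrieving responseText
  }
  elseif (!$com($1,loadXML,1,bstr,$com(%http).result) || $comerr) {
    echo -a :: comerr when calling loadXML
  }
  elseif (!$com($1,documentElement,2,dispatch* %root) || $comerr) {
    echo -a :: comerr when retrieving documentElement
  }
  elseif (!$com(%root,nodeName,2) || $comerr) {
    echo -a :: comerr when retrieving nodeName
  }
  else {
    echo -a :: root element: $com(%root).result
  }
  :error
  reseterror
  if ($com(%http)) comclose %http
  if ($com($1)) comclose $1
  if ($com(%root)) comclose %root
}
```

Keeping the fetch and the parse in separate objects mirrors the suggestion above: xmlhttp only downloads, and DOMDocument only parses.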
msxml2.DOMDocument Object Reference: http://msdn.microsoft.com/en-us/library/ms766487(v=vs.85).aspx
msxml2.xmlhttp Object Reference: http://msdn.microsoft.com/en-us/library/ms759148(v=vs.85).aspx
So, after a bit of research and some hit-and-miss coding, I think I have something of interest. Instead of DOMDocument, have you tried htmlfile? It tolerates malformed HTML and XML, while still letting you load content locally.
I haven't looked into whether it can fetch and process a remote file (plus JavaScript), but here's a simple test alias as an example:
; Usage: /domLoad <ComName> <HTML code>
alias domLoad {
  comopen $1 htmlfile
  var %page = $1 $+ page
  if ($comerr) {
    echo -a :: comerr on open
  }
  elseif (!$com($1,open,1) || $comerr) {
    echo -a :: comerr when opening htmlfile
  }
  elseif (!$com($1,write,1,bstr,$2-) || $comerr) {
    echo -a :: comerr when writing htmlfile
  }
  elseif (!$com($1,close,1) || $comerr) {
    echo -a :: comerr when closing htmlfile
  }
/*
This is just for testing purposes.
From here you could start using getElement*By*()
methods or similar.
*/
  elseif (!$com($1,body,2,dispatch* %page) || $comerr) {
    echo -a :: comerr when retrieving body
  }
  elseif (!$com(%page,innerHTML,2) || $comerr) {
    echo -a :: comerr when retrieving innerHTML of body
  }
  else {
    echo -a :: $com(%page).result
  }
  :error
  reseterror
  ; close only the objects that were actually opened, so cleanup can't raise a new error
  if ($com($1)) comclose $1
  if ($com(%page)) comclose %page
}
--
I'd like to apologize if I've come off as smug or snobbish while commenting (attitude doesn't come across well in text). My intention was to be insightful and give feedback, not to be egotistical or arrogant.
The msxml2 DOMDocument objects don't accept malformed HTML and always throw parsing errors on it. I actually started out using them before I realized that. I saved the old code and made it strictly for XML, but haven't posted it yet; I haven't really been doing any mSL lately. Furthermore, loading pages in IE lets you wait until the page has finished loading and then access content filled in by JavaScript or other means.