Phishing/malware Link Detection

By Yawhatnever on Jun 10, 2013

;Some assembly required.
;More instructions are in the leading script comments.

This is a two-part script that checks links posted in IRC channels and provides a warning for suspicious links.
The first part checks the domain for strings that match, or nearly match, phrases that you specify.
The second part checks the page against Google's phishing and malware blacklists. This lookup can be disabled via an alias near the top of the script (instructions in the comments).

/** EXAMPLE
(10:18:23pm) secure.login.fasebook.com
(10:18:23pm) <@Curiosity> [CAUTION]: {match:'facebook',domain:'fasebook.com',trusted:'unknown'}

(10:18:48pm) ianfette.org
(10:18:50pm) <@Curiosity> [WARNING]: This site may contain malware. You can read more about this warning at: http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site=http%3A%2F%2Fianfette%2Eorg%2F [Advisory provided by Google's Safe Browsing Lookup API]
*/
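
How the second part turns a lookup response into a warning can be sketched outside mIRC. The Python below is only an illustration and is not part of the script: the function name `describe_lookup` is hypothetical, and the status codes, result strings and warning texts are copied from the script's own comments and its `send_googlewarning` alias. It performs no network I/O. One known difference: Python's `quote` leaves `.` unescaped, whereas the script's `url_encode` alias encodes it as `%2E`.

```python
from urllib.parse import quote

DIAGNOSTIC = "http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site="


def describe_lookup(status, body, url):
    """Return the warning text for a lookup response, or None if no warning applies."""
    if status != 200:
        # 204 means the URL is on neither blacklist; 400/401/503 are errors
        # that the script writes to a log instead of warning the channel.
        return None
    result = body.strip()
    if result == "phishing":
        # Phishing URLs are defanged rather than linked.
        return "Suspected phishing page: " + url.replace(".", "(dot)")
    if result == "malware":
        text = "This site may contain malware."
    elif result == "phishing,malware":
        text = "This site may contain malware or try to steal your information."
    else:
        return None
    return (text + " You can read more about this warning at: "
            + DIAGNOSTIC + quote(url, safe=""))
```

For instance, `describe_lookup(204, "", "http://ianfette.org/")` yields no warning, while a 200 response with the body `malware` yields the malware advisory with a diagnostic link.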

Both parts require Mozilla's public suffix list to be saved to a text file in the save directory. To save the list, type the following into the editbox (without quotes):
" //url -an http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 | var %file $_phishingDir(mozilla_effective_tld_names.txt) | write %file | run %file | run $nofile(%file) "

The public suffix list is used to correctly parse the base domain name.
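
As a rough illustration of why the list is needed, here is a minimal Python sketch of base-domain parsing. It is not the script's code: `RULES` is a tiny hand-picked sample standing in for the full list, and the names `public_suffix` and `base_domain` are hypothetical. It follows the same idea as the script's `$parseDomain`: find the longest matching suffix rule, honour `*` wildcard and `!` exception rules, and keep one extra label.

```python
# Tiny sample standing in for Mozilla's full public suffix list (assumption).
RULES = {"com", "org", "uk", "co.uk", "*.ck", "!www.ck"}


def public_suffix(domain, rules=RULES):
    """Return the public suffix of domain, or None if no rule matches."""
    labels = domain.lower().split(".")
    candidates = [labels[i:] for i in range(len(labels))]  # longest first
    # Exception rules win: the suffix is the rule minus its leftmost label.
    for cand in candidates:
        if "!" + ".".join(cand) in rules:
            return ".".join(cand[1:])
    # Otherwise the longest exact or wildcard match is the suffix.
    for cand in candidates:
        if ".".join(cand) in rules or ".".join(["*"] + cand[1:]) in rules:
            return ".".join(cand)
    return None


def base_domain(domain, rules=RULES):
    """Return one label plus the public suffix, or None if that is impossible."""
    suffix = public_suffix(domain, rules)
    if suffix is None or suffix == domain.lower():
        return None  # unknown TLD, or the input is itself a public suffix
    keep = len(suffix.split(".")) + 1
    return ".".join(domain.lower().split(".")[-keep:])
```

With this sample, `foo.example.co.uk` reduces to `example.co.uk`, `secure.login.fasebook.com` reduces to `fasebook.com`, and strings such as `o.o` are rejected because `o` is not a known suffix.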

SSL is required to connect to Google's API. http://www.mirc.com/ssl.html
//echo -a $sslready

You will need to generate an API key to use Google's service.
Sign up here: https://developers.google.com/safe-browsing/key_signup
Add your key to alias 'api_client_key'.

You should whitelist trusted domains to reduce API lookups and to keep warnings from being displayed for valid matching sites. For instance, if you're checking for 'runescape' with an edit distance range of 0-2, you would want to whitelist 'runescape.com'. 'runescape.wikia.com' would also return a match, but wikia.com can be considered a trusted domain so that would be whitelisted as well. 'services-runescaepee.com' would return a match, but should obviously NOT be whitelisted.
To add or remove sites from the black/white lists, use /trust [-rbw] <example.com> (remove/black/white).
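
The edit-distance matching described above can be sketched in Python. This is an illustration under assumptions, not the script's implementation: `levenshtein` and `check_for_words` are hypothetical names, unit costs are assumed for insertions, deletions and replacements, and `WORDS` copies the word list and ranges from the script's `check_mywords` alias. As in the script's `$checkForWords`, each alphanumeric group of the domain is compared against each word, and the first word within its allowed range is returned.

```python
import re


def levenshtein(a, b):
    """Case-insensitive edit distance with unit costs."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # replace, or match
        prev = cur
    return prev[-1]


def check_for_words(domain, words):
    """Return the first word whose distance to any alphanumeric group is in range."""
    for group in re.findall(r"[a-z\d]+", domain, re.I):
        for word, (low, high) in words.items():
            if low <= levenshtein(group, word) <= high:
                return word
    return None


# Word list and ranges as configured in the script's check_mywords alias.
WORDS = {"runescape": (0, 2), "imageshack": (0, 2),
         "fbcdn": (0, 0), "facebook": (0, 2)}
```

Here `check_for_words("services-runescaepee.com", WORDS)` returns `runescape`, because `runescaepee` is two insertions away from it. This is exactly the kind of match that should stay off the whitelist.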

/*
* Written by Yawhatnever (Travis) irc.swiftirc.net #mSL
* Other sources are listed within the code.
* Free to use and modify as long as this comment is included with any substantial part of the script.
*/
/***********************************************
* An API key is required to use Google's service.
* Sign up here: https://developers.google.com/safe-browsing/key_signup
* Add your key to alias 'api_client_key'.
************************************************
* Requires SSL. 
 //echo -a $sslready
* http://www.mirc.com/ssl.html
************************************************
* mozilla_effective_tld_names.txt should be updated occasionally in order to remain accurate.
* Type the following:
 //url -an http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 | var %file $_phishingDir(mozilla_effective_tld_names.txt) | write %file | run %file | run $nofile(%file)

*/

;***************************** BEGIN CUSTOMIZED SETTINGS ********************************
alias -l api_client_key return $null | ;return your API key value instead of $null
alias -l check_mywords {
  /*
  * Change/add strings to search for and the edit distance allowed for each.
  * Be careful of short strings and relatively large edit distances. 
  *** The allowed edit distance range for each word must be valid isnum syntax (e.g. 0-2).
  * Use a distance of 0 for exact matching.
  * (Edit distance = minimum number of insertions, deletions, or replacements)
  * [http://en.wikipedia.org/wiki/Levenshtein_distance]
  */
  return $checkForWords($1, runescape, 0-2, imageshack, 0-2, fbcdn, 0, facebook, 0-2)
}
alias -l debugChannel {
  return $null 
  /*
  * Optionally return a channel name to output API lookups and string matches to. Useful for monitoring matches until you're confident that you've whitelisted the most common trusted domains.
  * Ops in this channel may add domains to the trusted list by typing !white example.com (read the comments in the text event below).
  */
}
alias -l logMatches return $true | ;turn logging of string matches on or off
alias -l logLookups return $true | ;turn logging of API lookups on or off
alias -l api_client_lookup_enabled return $true | ;optionally disable usage of Google's API and only use string matching.
alias -l api_client_name return Sojourner
alias -l api_client_version return 0.01
alias _phishingDir {
  ;If you edit this, be sure the correct files are moved into the directory!
  if (!$isdir($qt($mircdirmalware and phishing\))) mkdir $qt($mircdirmalware and phishing\)
  return $qt($mircdirmalware and phishing\ $+ $noqt($1-))
}

;If you wish to restrict this script to specific nicks, uncomment the following line and replace the Mars rover names with your nicks.
;on *:text:$($iif(!$istok(Curiosity Opportunity Spirit Sojourner,$me, 32),*)):#:noop
;***************************** END CUSTOMIZED SETTINGS ********************************

on $*:text:/^[!.]((b)lack|(w)hite|(r)emove) [-a-z\d.]+$/Si:#:{
  /*
  * Syntax: !black/white/remove example.com
  * Whitelist domains (especially for sites that match phrases you check for) to prevent lookups and warnings for trusted domains.
  * Blacklisted domains will always show a caution message but will still be looked up on the Google API, which may result in an additional warning if a match is returned.
  */
  if (!$debugChannel) || ($nick !isop $debugChannel) return
  trust - $+ $regml(2) $2
  if ($parseDomain($2)) notice $nick *** $+($upper($left($regml(1), 1)), $right($regml(1), -1), :) $v1
  else notice $nick *** Invalid domain.
}
alias trust {
  /*
  * /trust [-rbw] <example.com>
  * The -r switch removes an entry.
  * The -b/w switches add a domain to the black or white list, respectively.
  * If no switch is given, the domain will be added to the white list.
  */
  loadtrusted
  if (!$regex(trustedalias, $1, /^-[bwr]$/Si)) tokenize 32 -w $1
  var %domain $parseURL($2).domain
  if (!%domain) { 
    if (!$nick) echo -asec info * Error: Invalid domain.
    return
  }
  if ($1 == -r) hdel trusted %domain
  elseif ($1 == -b) hadd trusted %domain $false
  elseif ($1 == -w) hadd trusted %domain $true
  hsave trusted $_phishingDir(trusted.hsh)
  if (!$nick) echo -asec info * $gettok(White Black Remove, $findtok(-w -b -r, $1, 32), 32) $+ : %domain
}
alias -l loadtrusted {
  if ($hget(trusted)) return
  hmake trusted
  if ($isfile($_phishingDir(trusted.hsh))) hload trusted $_phishingDir(trusted.hsh)
}
on *:start:{
  loadtrusted
  noop $parseDomain(example.com)
}
on $*:text:$($catchURLex):#:{
  var %urls $extractURL($1-)
  var %c 1
  while ($gettok(%urls, %c, 32)) {
    if (!$trusted($v1)) {
      if ($api_client_lookup_enabled) && (!$spamcheck($network, #, $gettok(%urls, %c, 32))) google_malwareapi # $gettok(%urls, %c, 32) | ;if page has not been posted to channel recently, send to lookup alias
      if ($matchSummary($gettok(%urls, %c, 32))) var %matches %matches $v1 | ;if an alarm string was matched in the domain, add to list
    }
    inc %c
  }
  if (%matches) {
    if ($debugChannel) {
      !msg $debugChannel # $+ : $1-
      !msg $debugChannel %matches
    }
    if ($logmatches) write $_phishingDir(string_phrase_matches.log) $asctime($gmt) GMT $network # $+(<, $nick, >) $1- //matches: %matches 
    !msg # [CAUTION]: %matches
  }
}
alias google_malwareapi {
  /*
  * /google_malwareapi #channel http://example.com/path/
  * Adds #channel to the list of channels waiting for a response for example.com
  * If (example.com is pending or cached) { does /send_googlewarning example.com }
  * Else { opens a socket to check example.com }
  */
  if (!$api_client_key) {
    echo -esac info * Error: Google's API requires an API key. Edit alias 'api_client_key' in the header section of the script. https://developers.google.com/safe-browsing/key_signup
    halt
  }
  hadd -m malwareapi $+($network, :chans:, $2) $addtok($hget(malwareapi, $+($network, :chans:, $2)), $1, 32)
  if ($hget(malwareapi, $2)) send_googlewarning $2
  else {
    var %ticks $ticks
    while ($dvar(checksite.,%ticks,.site)) inc %ticks
    sockopen -e $+(checksite.,%ticks) sb-ssl.google.com 443
    set -e %checksite. $+ %ticks $+ .site $2
    hadd -mu60 malwareapi $2 pending
    if ($debugChannel) msg $debugChannel api lookup $1 $+ : $2
    if ($logLookups) write $_phishingDir(google_api_lookups.log) $asctime($gmt) GMT $network $1 $2
  }
}
alias send_googlewarning {
  /*
  * /send_googlewarning example.com
  * Sends a warning to channels listed as waiting for a response from that domain with the type of risk and link to more info.
  * Sending the message individually rather than using multi-target messages is intentional. (+B should not stop a url warning)
  */
  var %url $1
  if ($hget(malwareapi, $+($network, :chans:, %url))) tokenize 32 $v1 $debugChannel
  else return
  var %result $hget(malwareapi, %url)
  if (%result == phishing) var %warning Suspected phishing page:
  elseif ($v1 == malware) var %warning This site may contain malware.
  elseif ($v1 == phishing,malware) var %warning This site may contain malware or try to steal your information.
  if (%warning) msg $* [WARNING]: %warning $iif(%result != phishing, You can read more about this warning at: http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site= $+ $url_encode(%url), $replace(%url,.,(dot))) [Advisory provided by Google's Safe Browsing Lookup API]
  if ($hget(malwareapi, %url) != pending) {
    if ($logLookups) write $_phishingDir(google_api_lookups.log) %result %url
    hdel malwareapi $+($network, :chans:, %url)
  }
}
on *:SOCKOPEN:checksite.*:{
  var %site $dvar($sockname, .site)
  var %form $encodeForm(client, $api_client_name, key, $api_client_key, appver, $api_client_version, pver, 3.1, url, %site)
  ;save the form in case of error for logs
  set -e % $+ $sockname $+ .form %form
  sockwrite -nt $sockname GET /safebrowsing/api/lookup? $+ %form HTTP/1.1
  sockwrite -nt $sockname Host: sb-ssl.google.com
  sockwrite -nt $sockname user-agent: mIRC/ $+ $version
  if ($cookies) sockwrite -nt $sockname $v1
  sockwrite -nt $sockname Connection: close
  sockwrite -nt $sockname $crlf
} 
on *:SOCKREAD:checksite.*:{
  /*
  HTTP/1.1 204 No Content
  -not in database
  HTTP/1.1 200 OK
  -phishing
  -malware
  -phishing,malware
  400
  -bad request (incorrect format) (missing parameters, invalid url, improper encoding)
  401
  -api key  not authorized
  503
  -unavailable (failure or throttled)
  */
  var %r,%site $dvar($sockname, .site)
  var %form $dvar($sockname, .form)
  sockread -f %r
  if (!$sock($sockname).mark) {
    ;header
    if (%r == $null) sockmark $sockname 1
    elseif (%r == HTTP/1.1 204 No Content) hadd -mu1200 malwareapi %site OK
    elseif (%r == HTTP/1.1 200 OK) hadd malwareapi %site receiving
    elseif (%r == HTTP/1.1 403 Forbidden) logerror %site 403 Forbidden; Form: %form
    elseif (%r == HTTP/1.1 401 Not Authorized) logerror %site 401 Not Authorized; Form: %form
    elseif (%r == HTTP/1.1 400 Bad Request) logerror %site 400 Bad Request; Form: %form
    elseif (HTTP/1.1 503 * iswm %r) logerror %site $v2 $+ ; Form: %form
    cookie_check %r
  }
  else {
    ;content body
    if ($hget(malwareapi, %site) == receiving) hadd -mu1200 malwareapi %site %r
  }
}
on *:sockclose:checksite.*:{
  send_googlewarning $dvar($sockname, .site)
  unset $+(%, $sockname, .*)
}
alias -l logerror {
  write $_phishingDir(google_malware_api_errors.log) $asctime($gmt) GMT $1-
  echo -st Logged Google phishing/malware lookup API Error: $1-
}
alias -l spamcheck {
  /*
  * $spamcheck($network, #, site)
  * returns $true if site/# combo has been checked in the last 20 seconds.
  */
  var %result $hget(malwareapi, $+($1, $2, $3))
  hadd -mu20 malwareapi $+($1, $2, $3) $true
  return %result
}
alias -l matchSummary {
  /*
  * $matchSummary(example.com)
  * checks domain for specific strings
  * if a match is found, returns {match:'matchString',domain:'example.com',trusted:'[yes/no/unknown]'}
  */
  var %checkmatch = $check_mywords($parseURL($1).fulldomain), %trusted = $trusted($1), %color
  if (%trusted == $null) %color = 07unknown
  elseif (%trusted == $true) %color = 03yes
  elseif (%trusted == $false) { 
    %color = 04no
    if (!%checkmatch) %checkmatch = none
  }
  if (%checkmatch) && ($parseURL($1).domain) return $+({match:',%checkmatch,',$chr(44),domain:',$v1,',$chr(44),trusted:',$chr(3),%color,$chr(3),',$chr(125)) 
}
alias -l trusted return $hget(trusted, $parseURL($1).domain)
alias checkForWords {
  /*
  * $checkForWords(string, word1, N1[-N2], word2, N1[-N2], ..., wordN, N1[-N2])
  * Checks each group of alphanumeric characters against a list of words. Returns the first word match with a LV distance within range N1-N2
  * $checkForWords(runerscape.com, runescape, 0-1) would return runescape
  * $checkForWords(imageshaack.us, runescape, 0-1, imageshack, 0-2) would return imageshack
  * returns $null if there was no match
  */
  ;fill backreferences with all alphanumeric strings
  noop $regex(checkwords,$1,/([a-z\d]+)/Sig)
  ;loop through backreferences
  var %c 1
  while ($regml(checkwords, %c) != $null) {
    ;loop through match strings being checked
    var %word 2, %range 3
    while ($eval($+($, %word), 2) != $null) {
      ;If a match string is within its allowed edit distance from the string being checked, return the match string.
      if ($levenshtein($regml(checkwords, %c), $v1) isnum $eval($+($, %range), 2)) return $eval($+($, %word), 2)
      inc %word 2
      inc %range 2
    }
    inc %c
  }
}
alias tld_list_url return http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
alias dvar {
  /*
  * $dvar(foo., bar., %ticks)
  * returns value of %foo.bar.46516501
  */
  return $eval($+(%, $replace($1-, $chr(32), $null)), 2)
}
alias url_encode {
  ;source/license:
  ;https://github.com/david-schor/CodeArchive/blob/master/mSL/net/url_encode.mrc
  return $regsubex(urlencode,$1, /([\W\s])/Sg, $iif(\t == $chr(32), +, $+(%, $base($asc(\t), 10, 16, 2))))
}
alias encodeform {
  /*
  * $encodeform(user, name, pass, pass word)
  * returns user=name&pass=pass+word
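  * leading, trailing, and consecutive spaces are trimmed...
  */
  paramToVar $*
  ;build the result first so the temporary %param* variables can be cleaned up before returning
  var %form $regsubex(encodeform,$left($str(@=@&, $calc($0 / 2)), -1), /@/g, $url_encode($eval($+(%, param., \n), 2)))
  unset %param*
  return %form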
}
alias -l paramToVar {
  inc -u %param
  set -u %param. $+ %param $1-
}
alias -l _cookieFile return cookies.ini
alias cookie_check {
  /*
  * /cookie_check <header>
  * Checks if a header response sets a cookie.
  * TODO: real cookie parsing
  */
  if (!$sockname) return
  elseif ($regex(cookie,$1-,/^Set-Cookie: ([^=]+)=([^;]+)/i)) {
    var %1 $regml(cookie,1), %2 $regml(cookie,2) | ;just to be sure $regml() getting reset later on won't ever cause an issue
    writeini $_cookieFile $parseDomain($sock($sockname).addr) %1 %2
  }
}
alias cookies {
  /*
  * $cookies
  * returns Cookie: cookie=value; cookie2=value;
  * returns $null if no cookies are saved for site
  */
  if (!$sockname) || (!$isid) return
  var %addr $parseDomain($sock($sockname).addr)
  var %c 1
  while ($readini(cookies.ini,n,%addr,$ini(cookies.ini,%addr,%c)) != $null) {
    var %cookies %cookies $ini(cookies.ini,%addr,%c) $+ = $+ $v1 $+ ;
    inc %c
  }
  if (%cookies) return Cookie: %cookies
}
alias catchURLex return /\b((?:https?://)?)([a-z\d][-a-z\d]*(?:\.[a-z\d][-a-z\d]*)+)((?:/[-a-z\d._~:/!$&()*+=,;%]*)?)((?:\?[-a-z\d._~:/!$&()*+=,;%?]*)?)((?:#[-a-z\d._~:/!$&()*+=,;%]*)?)/Sig
alias extractURL {
  /*
  * $extractURL(STRING)
  * Returns a space-delimited list of all URLs in STRING.
  */
  if (!$regex(catchurl,$1-,$catchURLex)) return
  ;URL is caught into 5 backreferences. Some can be empty. The second is the domain and it's the only non-optional one
  var %c 2
  while ($regml(catchurl, %c)) {
    ;only labels with a valid public suffix are added to the url list (aka removing matches like 'o.o')
    if ($parseDomain($regml(catchurl, %c))) {
      var %protocol $iif($regml(catchurl, $calc(%c - 1)), $lower($v1), http://)
      var %domain $lower($regml(catchurl, %c))
      var %path $iif($regml(catchurl, $calc(%c + 1)), $v1, /)
      var %query $regml(catchurl, $calc(%c + 2))
      var %fragment $regml(catchurl, $calc(%c + 3))
      var %urls $addtok(%urls,$+(%protocol, %domain, %path, %query, %fragment),32)
    }
    inc %c 5
  }
  return %urls
}
alias parseURL {
  /*
  ** $parseURL(example.com)
  ** Parses a single URL and returns the section specified by $prop.
  ** If $prop is $null, returns full URL.
  ** Properties:
  * protocol
  * fulldomain
  * domain
  * publicsuffix
  * path
  * query
  * pathquery
  * fragment
  * TODO: Port, IPv4, IPv6
  */
  var %url $extractURL($1)
  if (!$prop) || (!%url) return %url
  noop $regex(catchurl, %url, $catchURLex)
  if ($prop == protocol) return $gettok($regml(catchurl, 1), 1, 58)
  elseif ($prop == fulldomain) return $regml(catchurl, 2)
  elseif ($prop == domain) return $parseDomain($regml(catchurl, 2))
  elseif ($prop == publicsuffix) return $parseDomain($regml(catchurl, 2)).suffix
  elseif ($prop == path) return $regml(catchurl, 3)
  elseif ($prop == query) return $regml(catchurl, 4)
  elseif ($prop == pathquery) return $+($regml(catchurl, 3), $regml(catchurl, 4))
  elseif ($prop == fragment) return $regml(catchurl, 5)

}
alias parseDomain {
  /*
  * returns a single label with a public suffix appended
  * $parsedomain(example.com)
  * returns $null if .com is not a valid top level domain
  * otherwise returns 'example.com'
  * $parsedomain(foo.example.co.uk) returns 'example.co.uk'
  * $parsedomain(foo.example.co.uk).suffix returns 'co.uk'
  */  
  ;Some top level domains (like .jp) have a lot of rules and thus are expensive to parse. Caching allows $parseDomain() to be used repeatedly on a domain without repeating the expensive parsing.
  if ($hget(domaincache,0).item > 20000) hfree domaincache | ;prevent the cached domains table from becoming very large
  if (!$hget(domaincache)) hmake domaincache 10000
  if (!$hget(domaincache,$1)) hadd domaincache $1 $parseDomainInternal($1).both
  if ($prop == suffix) return $gettok($hget(domaincache, $1), 2, 32)
  else return $gettok($hget(domaincache, $1), 1, 32)
}
alias -l parseDomainInternal {
  if ($numtok($1, 46) < 2) return
  if (!$hget(tld)) filltld
  var %tld = $gettok($1, -1, 46)
  if ($hget(tld, %tld) !isnum) return | ;don't check for $null, domains could end up like '.ps4'...
  var %rules = $v1, %c = 0, %level = 1, %result
  while (%c <= %rules) {
    var %x = 1, %currentrule = $hget(tld, %tld $+ %c), %rulex = /((?<=\.|^) $+ $replace(%currentrule, ., \., !, $null, *, [-a-z\d]+) $+ )$/i
    if ($regex(suffix, $1, %rulex)) {
      if ($left(%currentrule, 1) == !) { %result = $remove(%currentrule, !) | break }
      if ($numtok(%currentrule, 46) > $numtok(%result,46)) %result = $regml(suffix, 1)
    }
    inc %c
  }      
  if (%result == $1) return | ;example: $parsedomaininternal(co.uk) returns $null
  if ($prop == suffix) return %result
  var %domain $lower($gettok($1, $calc($numtok($1, 46) - $numtok(%result, 46)) $+ -, 46))
  if ($prop == both) return %domain %result
  return %domain
}
alias -l _mozillaSuffixList return $_phishingDir(mozilla_effective_tld_names.txt)
alias -l filltld {
  ;fills tld hash table with mozilla's tld list
  if ($hget(tld)) hfree tld
  hmake tld 10000
  if (!$file($_mozillaSuffixList)) {
    .timer 1 0 tldGetDialog
    hfree tld
    halt
  }
  else filter -fkg $_mozillaSuffixList addrule ^(?!//)\S
}
alias -l tldGetDialog {
  if (%tldgetdialog) return
  else set -eu6000 %tldgetdialog $true 
  beep 1
  var %msg $!parseDomain() requires mozilla's public suffix list to be saved to $_mozillaSuffixList in order to function. $crlf $+ The list and notepad.exe should have opened automatically, but if they haven't the list is here: http://goo.gl/ht6EO - When saving from notepad you must select 'File -> Save as' and overwrite the existing mozilla_effective_tld_names.txt with UTF-8 encoding selected.
  noop $input(%msg,obv,Setup Required)
  .timer 1 1 unset %tldgetdialog
  url -an $tld_list_url
  write $_mozillaSuffixList
  run $_mozillaSuffixList
  echo -aesc info * %msg
}
alias -l addrule {
  ;adds a tld rule to the table
  var %tld = $gettok($1, -1, 46)
  if ($hget(tld,%tld) == $null) hadd tld %tld 0
  else hinc tld %tld
  hadd tld %tld $+ $hget(tld, %tld) $1
}
/*
* This is a rewritten version of codemastr's Levenshtein Distance alias. This version fixes a few errors and runs faster.
* In addition to rewriting parts, I've also included more details in the comments.
* Additional information about the function of the alias is available with the original version:
* http://www.mircscripts.org/showdoc.php?type=code&id=2127
* http://www.mircscripts.org/comments.php?cid=2127
* 
* Syntax 1: $levenshtein(string1, string2)
* Syntax 2: $levenshtein(string1, string2, insertCost, replaceCost, deleteCost)
* $editdistance() is the same function.
* Case-sensitive versions are $levenshteincs()/$editdistancecs()
*
* Modified by Yawhatnever (Travis) - irc.swiftirc.net #mSL
* Free to use in any script, just attribute the sources above :)
*/

alias editdistance return $levenshteininternal($1,$2,$3,$4,$5,$false)
alias editdistancecs return $levenshteininternal($1,$2,$3,$4,$5,$true)
alias levenshtein return $levenshteininternal($1,$2,$3,$4,$5,$false)
alias levenshteincs return $levenshteininternal($1,$2,$3,$4,$5,$true)

alias -l levenshteininternal {
  var %x = $len($1), %y = $len($2), %matrixsize.y = $calc(%y + 1)
  if ($5 isnum) var %ins_cost = $3, %rep_cost = $4, %del_cost = $5
  else var %ins_cost = 1, %rep_cost = 1, %del_cost = 1
  ;keep the early returns consistent with the matrix below: the left column (x = 0) uses delete cost
  if (!%x) return $calc(%y * %del_cost)
  if (!%y) return $calc(%x * %ins_cost)
  hmake lvmatrix
  set -u %matrixsize.x $calc(%x + 1)
  ;fill bottom row with insert cost
  var %i, %c = 1, %cost = %ins_cost
  while (%c < %matrixsize.x) {
    matrixset %c 0 %cost
    inc %c
    inc %cost %ins_cost
  }
  ;fill left column with delete cost
  var %c = 0, %cost = 0
  while (%c < %matrixsize.y) {
    matrixset 0 %c %cost
    inc %c
    inc %cost %del_cost
  }
  %c = 1
  while (%c <= %x) {
    %i = 1
    while (%i <= %y) {
      if ($levenshteinequal($mid($1, %c, 1), $mid($2, %i, 1), $6)) %cost = 0
      else %cost = %rep_cost
      matrixset %c %i $levenshteinmin(%c, %i, %ins_cost, %del_cost, %cost)
      inc %i
    }
    inc %c
  }
  var %return $matrixget(%x, %y)
  ;The following line is used for debug purposes.
  ;var %c %y | while (%c >= 0) { echo -sg $regsubex($left($str(@-,%matrixsize.x),-1),/@/g,$base($matrixget($calc(\n - 1),%c),10,10,2)) | dec %c }
  :error
  hfree lvmatrix
  return %return
}
alias -l levenshteinmin {
  /*
  * compare(str1, str2)
  * the value at point ($len(str1), $len(str2)) on the grid will be the levenshtein/edit distance
  * use $matrixget(x, y) to get the value at point (x, y)
  *
  *{y}
  * 2|4|3|2|1|1|
  * r|3|2|1|0|1|
  * t|2|1|0|1|2|
  * s|1|0|1|2|3|
  *  |0|1|2|3|4|
  *    |s|t|r|1|{x}
  */
  ; bottom row is insert cost, left column is delete cost
  ; $levenshteinmin(x,y,ins_cost,del_cost,rep_cost)
  var %left = $calc($matrixget($calc($1 - 1), $2) + $3)
  var %below = $calc($matrixget($1, $calc($2 - 1)) + $4)
  var %diag = $calc($matrixget($calc($1 - 1), $calc($2 - 1)) + $5)
  return $gettok($sorttok(%left %below %diag, 32, n), 1, 32)
}
alias -l matrixset hadd lvmatrix $calc(%matrixsize.x * $2 + $1) $3
alias -l matrixget {
  /*
  * $matrixget(x coord, y coord)
  * returns the value stored at point (x, y)
  * bottom left is (0, 0)
  * |6|7|8|
  * |3|4|5|
  * |0|1|2|
  * e.g. value of point(2, 2) is stored in the hash table with key "8"
  */
  return $hget(lvmatrix,  $calc(%matrixsize.x * $2 + $1))
}
alias -l levenshteinequal {
  ;character 1, character 2, $true = case sensitive/$false = insensitive
  if ($1 != $2) return $false
  elseif ($1 === $2) return $true
  elseif (!$3) return $true
}

Comments

Yawhatnever   -  Nov 16, 2016

Updated to use v3.1 of google's api. New keys being generated weren't working with v3.0.

v3.x will only work until early 2017, so be aware of that. If it breaks send me a message, because I probably won't update it unless I know someone is actually using it.

Exuviax   -  Jan 26, 2014

I really like this script, but I am very new to Coding. A guppy. I learn fast and added the specified parameters to your script. But it isn't working, I logged into my test account and posted a porn link in chat and it wasn't removed. In fact, the bot didn't even respond.

Yawhatnever  -  Jan 26, 2014

Did you blacklist the domain of the porn site? I didn't really write it with filtering out porn or other "inappropriate" links in mind, but it can function as a blacklist if that's what you want.

Also, unless you've added some commands to ban a user the most it will do is post a warning. With the possibility of false positives or incorrect usage of black/white lists, I felt it was best to leave out banning and let moderators decide how to handle the warnings. You can change the behavior to kick/ban relatively easily if that's what you would prefer.

Exuviax  -  Jan 26, 2014

I was hoping it would just auto grab all inappropriate websites by using the Google API. Although, I guess I could add it to the Blacklist although that process seems lengthy. I have a mock up script I use right now for banning all links, and I was hoping I could use this one instead, but as I have read through it I realize the lack of Permit commands and what not. Maybe I will look into blending of the two scripts. Anyway, I am looking to create a full Chat Bot script that looks much like Nightbot or Moobot and post it for free online, although I am new to mIRC I am not all that new to Scripting, would you be interested in this project?

Yawhatnever  -  Jan 27, 2014

Google's API only checks against their phishing and malware blacklists. If you don't allow links that have been deemed inappropriate and you don't want to make the moderators do any work, then a whitelist will always work better than a blacklist.

I wondered if you were planning to use it for twitch when you referenced messages being removed. That's not something normally included with IRC clients.

This script is probably not what you're looking for. It would take a fair amount of work to modify it to allow each channel to control what it does when links are posted. It also only handles "clickable" links, i.e. links that have not been modified to avoid a filter (since the main purpose was to deal with phishing links being posted and people clicking without realizing it was a fake website).

As much as I hate to admit it, mIRC simply can't operate at the scale of Nightbot/moobot. If you want to offer some of the same features with your own twist for a few channels (or make the scripts available so people can run their own version for their channel) that's one thing, but it would be impossible to run a bot for even a few hundred channels using mIRC.

Exuviax  -  Jan 27, 2014

I understand, I am only looking for a personal bot. I don't want my Exubot to replace Nightbot. I just want to create a script for a personal ChatBot for twitch and then opensource it for any twitch power users to use.


Ryahn   -  Aug 25, 2013

I am still a little new to mirc. I am unable to add my api key to the alias that it keeps asking me to.

Yawhatnever  -  Aug 25, 2013
alias -l api_client_key return 123

where '123' is your API key.

You need to edit the alias in the script. Is that what you tried?

Yawhatnever  -  Aug 28, 2013

Are you sure the script is only loaded once?

Ryahn  -  Aug 28, 2013

Pretty sure. I deleted it last time and added it back. I will have to go back through to check

EDIT: Well, now it's not throwing any errors. Is there supposed to be anything that comes up when you white- or blacklist a URL? If so, I am not getting anything to come up.

Yawhatnever  -  Aug 29, 2013

To use the !white/black/remove commands from another nick, you must specify a debug channel (alias debugChannel) and the nick must be op in that channel.


ProIcons   -  Jun 16, 2013

Your idea is good. Your code seems well managed and well commented. Haven't Tried it yet, but it seems like a neat snippet.
Good Job.

Yawhatnever  -  Jun 16, 2013

Thanks.
