jaytea commented on a Page, Bad Word Detection Engine (Using Socket)  -  Oct 17, 2011

@ ( |\b), doesn't \b match a space too?

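\b doesn't match any characters, it matches positions. the space is useless there only if whatever precedes it is a word character. 'a(?= |\b)' is functionally equivalent to 'a\b', because a space after 'a' is already a word boundary. the same cannot be said for ';(?= |\b)' and ';\b': ';' is not a word character, so ';\b' only matches when a word character follows, while ';(?= |\b)' also matches when a space follows.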
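
to check the claim about the two pairs of patterns, here is a minimal sketch in Python rather than mIRC. it assumes Python's `re` module handles `\b` and lookaheads the same way PCRE does for these particular patterns; the helper name `matches` and the sample strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# After a word character, a following space is already a word boundary,
# so the " |" alternative adds nothing: both patterns agree on every sample.
samples_a = ["a b", "a", "ab", "a;"]
same_for_a = all(
    matches(r"a(?= |\b)", s) == matches(r"a\b", s) for s in samples_a
)

# After ';' (a non-word character) the two patterns differ:
# ';' followed by a space is NOT a word boundary.
semi_lookahead = matches(r";(?= |\b)", "; b")  # the space alternative matches
semi_boundary = matches(r";\b", "; b")         # no boundary between ';' and ' '
```

with these samples the two 'a' patterns always agree, while for "; b" only the lookahead version matches.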

But there are times you want to be creative, and creativity and innovation go hand in hand, Dean.

there's a difference between creativity that results in something new and better, and creativity that produces code that can mislead others and only serves to make the code worse overall. this seems a prime example of what i always say: too much time is spent on the trivial 0.001% and not nearly enough on the function and logic of the overall idea. for instance: a script that pulls a data file from a web server EVERY TIME a message is received is simply a bad idea, especially when that data file is likely to change far less frequently than that, and when having an out-of-date data file is not at all detrimental to the functioning of the script. it is slow, as alabama pointed out (possibly too slow to function reliably in certain cases), but not just that: it is overkill to the point where it just plain doesn't make sense.
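
one common alternative to fetching on every message is to keep the list in memory and refresh it only after some interval. a rough sketch of that idea in Python follows. it is not the script's actual code: the class name `CachedWordList`, the `fetch` callable and the one-hour `max_age` are all assumptions picked for illustration.

```python
import time

class CachedWordList:
    """Keeps a word list in memory and re-fetches it only when it is stale."""

    def __init__(self, fetch, max_age=3600.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable that downloads the list
        self._max_age = max_age    # seconds before the cached copy counts as stale
        self._clock = clock
        self._words = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            self._words = self._fetch()
            self._fetched_at = now
        return self._words


# Stand-in for the real download, counting how often it is called.
calls = []

def fake_fetch():
    calls.append(1)
    return ["badword"]

cache = CachedWordList(fake_fetch, max_age=3600.0)
for _ in range(1000):          # simulate 1000 incoming messages
    words = cache.get()
```

in this sketch the 1000 simulated messages trigger a single fetch, and a slightly out-of-date list is accepted as the trade-off.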

The only advantage of using a socket is, you don't have to store the file in your hard drive, saving some space.

this isn't a real advantage because this doesn't concern anyone. modern hard disks (say, 300GB) are capable of storing over 100,000,000 files of that size :P

$regex(%1-,/(^| )\Q $+ $bvar(&bw,1-).text $+ \E\b/)

why \b on one side and not the other? does that make sense given the purpose of the script and the nature of the data? did you look at the whole data file? some of the words in the list include wildcards making them incompatible with your current method.
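
to illustrate the wildcard problem, a small Python sketch. several things here are assumptions, not facts about the data file: the entries "darn*" and "d?rn" are invented, '*' is assumed to mean any run of non-space characters and '?' exactly one, and `re.escape` stands in for `\Q...\E`. the second half shows one possible way to treat both sides of a word the same (whitespace or string edge), not necessarily the right choice for the script.

```python
import re

# What the quoted pattern does with a wildcard entry: \Q...\E (here re.escape)
# makes the '*' literal, so the entry no longer behaves as a wildcard.
literal = re.compile(r"(^| )" + re.escape("darn*") + r"\b")
literal_hits_word = literal.search("well darnit") is not None
# Even the literal text fails when a space follows, because there is
# no word boundary between '*' and ' '.
literal_hits_star = literal.search("darn* ok") is not None

def entry_to_regex(entry):
    """Translate a wildcard entry into a regex (assumed wildcard semantics)."""
    parts = []
    for ch in entry:
        if ch == "*":
            parts.append(r"\S*")   # any run of non-space characters
        elif ch == "?":
            parts.append(r"\S")    # exactly one non-space character
        else:
            parts.append(re.escape(ch))
    # Same rule on both sides: whitespace or the edge of the string.
    return re.compile(r"(?<!\S)" + "".join(parts) + r"(?!\S)", re.IGNORECASE)

wild_hits_word = entry_to_regex("darn*").search("well darnit") is not None
wild_hits_inside = entry_to_regex("darn*").search("goshdarnit") is not None
single_hits = entry_to_regex("d?rn").search("you durn fool") is not None
```

the escaped version never matches "darnit", while the translated version does and still refuses to match in the middle of another word.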
