Bloom Filters
I recently implemented a Bloom Filter for some internal file processing. A bloom filter is an interesting probabalistic data structure modeled around a giant bit array - you take your data set and hash them in order to activate different bits. You can then test your different input values against the filter - if you get back a negative, you know your input value isn't in the filter. If you get a positive, it means it might be in the filter. The error chances of getting a positive for a value that isn't in the filter increases based on there being more items in your data set - but this can be offset by using more hashing functions or starting with a larger bit array.
It's proven very useful in the context I've been using it anyway!
|
Bloom Filters by Example
http://billmill.org/bloomfilter-tutori…
|