Josh - 2016-07-20 14:33:45
Typically, the fastest method to look up an IP is
SELECT * FROM DATABASE.table WHERE ip_end >= INET_ATON('$ip') ORDER BY ip_end LIMIT 1
where the table stores ip_start and ip_end as integers (ip2long form), the rows are sorted, and ip_end is indexed. It's very fast and one line of code.
How does your method compare?
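(The idea behind that one-line query can be sketched outside SQL as well. The following Python sketch, with hypothetical sample data, mirrors the same rule: keep blocks sorted by ip_end and take the first block whose ip_end is >= the target, then check the block actually covers the address.)

```python
import bisect
from socket import inet_aton
from struct import unpack

def ip_to_int(ip):
    """Equivalent of MySQL's INET_ATON(): dotted quad -> 32-bit integer."""
    return unpack("!I", inet_aton(ip))[0]

# Hypothetical sample data: (ip_start, ip_end, location), sorted by ip_end.
blocks = [
    (ip_to_int("1.0.0.0"), ip_to_int("1.0.0.255"), "AU"),
    (ip_to_int("1.0.1.0"), ip_to_int("1.0.3.255"), "CN"),
    (ip_to_int("1.0.4.0"), ip_to_int("1.0.7.255"), "AU"),
]
ends = [b[1] for b in blocks]

def lookup(ip):
    """First block with ip_end >= ip -- the same rule as the SQL query."""
    n = ip_to_int(ip)
    i = bisect.bisect_left(ends, n)
    if i == len(blocks):
        return None                      # past the last block
    start, end, loc = blocks[i]
    return loc if start <= n else None   # address fell in a gap between blocks

print(lookup("1.0.2.5"))  # -> CN
```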
Chi H. - 2016-07-23 17:12:38 - In reply to message 1 from Josh
Hi, my code uses binary search like the original code, but the original code was badly implemented. I was the winner with a 40% improvement over the original query, so the original implementation had a lot of overhead: it used a lot of redundant code, especially in time-critical loops. I also do a one-time pre-calculation over the whole database file, so it's even faster.
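(Chi's actual class isn't shown in this thread; the general technique being described, binary search over a flat file of fixed-width records, can be sketched like this. The two-integer record layout here is a simplification for illustration, not the real ip2Location .BIN format.)

```python
import struct
import tempfile

# Build a tiny dummy database file: each record is two packed 32-bit
# big-endian ints (ip_start, ip_end), written in sorted order.
RECORD = struct.Struct(">II")
ranges = [(0, 255), (256, 1023), (1024, 4095), (4096, 65535)]

db = tempfile.TemporaryFile()
for r in ranges:
    db.write(RECORD.pack(*r))
db.flush()
count = len(ranges)  # pre-calculated once, not on every lookup

def lookup(ip_int):
    """Binary search over the file: seek to the middle record and compare."""
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        db.seek(mid * RECORD.size)
        start, end = RECORD.unpack(db.read(RECORD.size))
        if end < ip_int:
            lo = mid + 1
        elif start > ip_int:
            hi = mid - 1
        else:
            return (start, end)
    return None

print(lookup(2000))  # -> (1024, 4095)
```

Each lookup touches at most log2(count) records, which is why the file never needs to fit in memory.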
Josh - 2016-07-23 17:27:01 - In reply to message 2 from Chi H.
Why not COMPARE it to the method I gave you and see the performance difference? The DB should be sorted by address block, so it shouldn't take long.
Nobody uses the code provided by these DB vendors; there are several methods out there. It might take a bit of work if you have to rebuild the DB into the proper format, but nothing that can't be done overnight, and you only have to do it once.
I'd also be interested to see how it compares to the SPATIAL method, which requires rebuilding the DB.
Chi H. - 2016-07-23 17:59:23 - In reply to message 3 from Josh
I was also interested in the spatial method. I suppose you can use MySQL from the console? I can try it! The test.php file contains 100,000 IP addresses. My winning class needs 10-40 seconds for 100,000 lookups, depending on the CPU. I highly doubt MySQL is faster!
Josh - 2016-07-23 19:28:08 - In reply to message 4 from Chi H.
I just wrote a little script to try it. I have a full IPv4 DB, 6.9M records, sorted by ip_start, with ip_start and ip_end stored in ip2long() (long decimal) format to specify each block. It's not from ip2Location.com, but it should have a similar structure.
I have an E3-1276v3 CPU at 3.6GHz.
Using consecutive IPs, 100K lookups returning the entire record: 5.7 seconds
Using random IPs: 95 seconds
BUT the next time I ran it: 17 seconds
And each time I ran it again (with random addresses), it took less time, from 17 seconds down to just under 6 seconds. So it looks like 5.7 seconds is fully cached. The full database is 670 MB, so it's not implausible that the entire thing could be cached if you have enough memory.
$db = new mysql_db();                    // custom DB wrapper class
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $ip = long2ip(rand(0, 3758096383));  // random IPv4 up to 223.255.255.255
    $res = $db->get_row("SELECT * FROM ipLocationDB WHERE ip_end >= INET_ATON('$ip') ORDER BY ip_end LIMIT 1");
}
$stop = microtime(true);
$duration = $stop - $start;
echo "Duration: $duration seconds\n";
Chi H. - 2016-07-23 20:10:35 - In reply to message 5 from Josh
So, you need to cache it. I hadn't tried that. Thanks for sharing!
Chi H. - 2016-07-23 20:18:52 - In reply to message 6 from Chi H.
Btw, the test.php script also compares the output with the original, so there is a little overhead!
Josh - 2016-07-23 21:14:14 - In reply to message 7 from Chi H.
Yes, I take my tests out once I'm sure it works. The random generator has some overhead as well, but it's just a fraction of a second all told.
A full DB is nearly 7M records; you should generate a dummy one to do real-life testing. Anyone with a full DB will have a lot more records, and things can be very different with millions of records.
I'm not sure if it's the mysql cache or the filesystem; it's probably a bit of both.
I tested this method against a spatial DB and it was about the same. The spatial stuff is a kind of murky magic and I prefer to use something more straightforward.
Chi H. - 2016-07-23 22:10:24 - In reply to message 8 from Josh
Did you try my package on your PC? IMO it should be faster. My PC is a G3258, running in VMware! BTW, my spatial algorithm is the same speed. A Z-curve is quite easy to understand!
Josh - 2016-07-23 22:34:33 - In reply to message 9 from Chi H.
No, I don't have time to mess with it. Too many moving parts, and we've spent time developing our own class for a couple of applications, so it's not something we're going to swap out unless we need to.
To convince people, you should compare it to what most of the world is using and prove that it scales to a full-size database. If it doesn't scale, it's not usable with a real DB.
Realize that on a live system doing live lookups, most of the addresses fall in the same blocks, and there are a lot of repeat addresses rather than unique ones, so you have to optimize to avoid looking up the same address thousands of times. And if you're just doing occasional lookups, like a website getting info for a visitor's address, the lookup speed isn't that important. You don't want it to be 5 seconds, but the difference between .005 seconds and .02 seconds isn't noticeable.
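(The "don't look up repeat addresses" point can be sketched with a simple memoizing wrapper; `db_lookup` here is a hypothetical stand-in for whatever real lookup backend is in use.)

```python
from functools import lru_cache

calls = 0  # count how often the expensive backend actually runs

def db_lookup(ip):
    """Dummy backend standing in for the real database query."""
    global calls
    calls += 1
    return "record-for-" + ip

@lru_cache(maxsize=65536)
def lookup_cached(ip):
    """Memoized wrapper: repeat addresses never hit the backend twice."""
    return db_lookup(ip)

for _ in range(1000):
    lookup_cached("8.8.8.8")  # 1000 requests, but only one real lookup
print(calls)  # -> 1
```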