North American Network Operators Group
On Wed, Jul 12, 2006 at 06:24:08PM -0400,
 Jim Popovitch <jimpop@xxxxxxxxx> wrote
 a message of 32 lines which said:

> The strangeness is that some of their crawling is looking for URLs
> with multiple exclamation points, those URLs never existed. This may
> be indicative of a character translation on my system or theirs.

From my experience (and I talked with people - or at least intelligent
bots - at Gigablast), their HTML parser is seriously broken and it
generates non-existent URLs quite often. For instance,

<a href="http://www.example.fr/Cafe%20au%20lait">

will make their crawler ask for "/Cafe". I reported the problem months
ago but got nothing except the standard "Thanks for telling us".
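
To make the symptom concrete, here is a minimal Python sketch. It assumes nothing about Gigablast's actual code, which I have not seen: request_path() shows what a crawler should request for that link, and truncated_path() is a purely hypothetical bug (decode the percent-escapes, then cut at the first space) that happens to reproduce the "/Cafe" request. The class and function names are mine, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags, exactly as written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def request_path(url):
    """Expected behaviour: keep the percent-escapes in the path untouched."""
    return urlsplit(url).path or "/"


def truncated_path(url):
    """Hypothetical bug (an assumption, not Gigablast's known code):
    decode the escapes first, then stop at the first space."""
    return unquote(urlsplit(url).path).split(" ")[0]


html = '<a href="http://www.example.fr/Cafe%20au%20lait">'
parser = LinkExtractor()
parser.feed(html)
url = parser.links[0]

print(request_path(url))    # /Cafe%20au%20lait
print(truncated_path(url))  # /Cafe
```

Whatever the real cause, the escaped path has to go out on the wire unchanged; decoding it before building the request is enough to produce URLs that never existed on the server.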