I am currently working on a project for which I need to identify WordPress blogs as fast as possible, given a list of URLs. I decided to write a review on this topic since I found relevant but sparse hints on how to do it.
First of all, guessing whether a website uses WordPress by analysing its HTML code is straightforward as long as nothing has been done to hide it, and in most cases nothing has. As WordPress is one of the most popular content management systems, downloading every page and performing a check afterward is an option that should not be too costly if the number of web pages to analyze is small. However, downloading even a reasonable number of web pages may take a lot of time, which is why other techniques have to be found to address this issue.
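To make the HTML-based check concrete, here is a minimal sketch in Python. The markers it looks for (`wp-content/`, `wp-includes/` and a `generator` meta tag mentioning WordPress) are my own assumptions about what an unmodified WordPress page usually contains, and the function name is hypothetical. A site that hides its CMS will slip through.

```python
import re

# Assumed markers: paths and meta tag that a default WordPress install
# typically leaves in its HTML. This list is illustrative, not exhaustive.
WP_HTML_MARKERS = re.compile(
    r'wp-content/|wp-includes/|'
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress',
    re.IGNORECASE,
)


def looks_like_wordpress_html(html: str) -> bool:
    """Return True if the downloaded HTML contains one of the assumed markers."""
    return WP_HTML_MARKERS.search(html) is not None
```

This only works once the page has been downloaded, which is exactly the cost that becomes a problem on longer URL lists.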
The way I chose to do it is twofold: the first filter is URL-based, whereas the final selection uses HTTP HEAD requests.
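A HEAD request returns only the response headers, so it is much cheaper than fetching the whole page. The sketch below shows one possible way to use it with Python's standard library. Which headers to inspect is an assumption on my part: many WordPress installs send an `X-Pingback` header ending in `xmlrpc.php`, or a `Link` header pointing to the REST API (`wp-json`, `api.w.org`), but both can be disabled. The function names are hypothetical.

```python
from urllib.request import Request, urlopen


def headers_suggest_wordpress(headers) -> bool:
    """Inspect a mapping of header names to values for assumed WordPress hints."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    # Assumption: WordPress advertises its pingback endpoint by default.
    if lowered.get("x-pingback", "").endswith("xmlrpc.php"):
        return True
    # Assumption: recent versions advertise the REST API in a Link header.
    link = lowered.get("link", "")
    return "api.w.org" in link or "wp-json" in link


def head_check(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and test the response headers; False on network errors."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            # dict() keeps only the last value of a repeated header,
            # which is good enough for a sketch.
            return headers_suggest_wordpress(dict(response.headers.items()))
    except OSError:
        return False
```

Keeping the header test separate from the network call makes it easy to try other hints without sending any requests.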
There are webmasters who create a subfolder named “wordpress” which can be seen clearly in the URL, providing a kind of K.O. victory. If the URL points to a non-text ...
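The “wordpress” subfolder hint can be tested without any network access. Here is a minimal sketch, with a hypothetical function name, that matches only a full path segment, so that a post slug which merely mentions WordPress does not count.

```python
from urllib.parse import urlsplit


def has_wordpress_folder(url: str) -> bool:
    """Return True if one of the path segments is exactly 'wordpress'."""
    path_segments = urlsplit(url).path.lower().split("/")
    return "wordpress" in path_segments
```

For instance, `http://example.org/wordpress/2013/05/post/` passes the filter, whereas `http://example.org/blog/about-wordpress-themes/` does not.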