Slumming Through My Web Site Traffic Logs, Part 1
I keep track of my web site traffic numbers. Partly I'm just curious about what I can find out about visitors, but I'm also interested in what people do when they get here, and I've adjusted my page content and navigation paths accordingly. For example, I recently reorganized my home page to provide better information to first-time visitors and to reflect the trend toward micro-blogging.
Usually I don't look at the "raw" statistics. My service vendor, Squarespace, tracks unique visitors by type as a standard service, and I also use Google Analytics to track unique pageviews and internal navigation. But Squarespace does provide a "raw" traffic log that I check from time to time.
It's my impression that the overall proportion of incoming "junk links" from fake blogs and fake ecommerce sites (e.g., "Viagra for Women" is a hot one right now) is increasing. I don't have any trend data to report, but here's what I found this morning when I classified the most recent incoming links as of 6:30 AM (note that I don't recommend you undertake this type of analysis without active spyware detection and antivirus software):
I classify as "good" links those that come from other web sites and blogs and from email servers such as Gmail and Yahoo. "Bad" links I've classified as follows:
Blogspot and Blogger are the most egregious sources of junk links. These sources appear to be one-page, automatically generated "blogs" that probably (I hope) get shut down almost as quickly as they pop up. Some display mixes of Roman and Cyrillic characters. What they hope to gain from linking to my blog I don't know. Possibly what I am seeing is a remnant of attempts to add fake trackbacks to the comment fields of my blog posts. Fortunately, Squarespace is very good at controlling comment spam and I check frequently myself, so the volume of comment spam I get is very low and I have not had to add any sort of hard-to-read verification techniques to my comment fields.
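For anyone who wants to repeat this kind of sorting without clicking through to the linking sites, here is a minimal Python sketch of the good/junk split described above. It is only a rough heuristic, not the method I used by hand: the host lists, the category labels, and the function names are all assumptions made up for illustration, and treating every Blogspot or Blogger address as junk will certainly misfile some legitimate blogs.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical host lists, chosen for illustration only; adjust to your own log.
MAIL_HOSTS = ("mail.google.com", "mail.yahoo.com")
JUNK_HOST_SUFFIXES = ("blogspot.com", "blogger.com")


def host_matches(host, suffixes):
    """True if host equals one of the suffixes or is a subdomain of one."""
    return any(host == s or host.endswith("." + s) for s in suffixes)


def classify_referrer(url):
    """Sort one referrer URL into a rough category based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return "unknown"  # blank or unparseable referrer
    if host_matches(host, JUNK_HOST_SUFFIXES):
        return "junk"  # crude assumption: every Blogspot/Blogger host is junk
    if host_matches(host, MAIL_HOSTS):
        return "good (email)"
    return "good (site or blog)"


def tally(referrers):
    """Count how many referrers fall into each category."""
    return Counter(classify_referrer(u) for u in referrers)
```

Calling `tally()` on a list of referrer URLs copied from the raw traffic log returns a count per category. Because the classification looks only at the hostname, nothing is ever fetched from the linking site.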
Another impression I have is that the number and proportion of incoming links I am getting from "closed" sources is increasing. By "closed" I mean sources that require a password to enter; examples are private bookmarking systems and university and college-based teaching systems. I may also count the increasing number of incoming links from Facebook here, since Facebook generally makes it easy to link out from an entry. I'll address this in a future entry in this series.
Note to statisticians: I am aware that concentrating an analysis on the "most recent 50 incoming links" is not a true random sample since I do not know how representative these 50 are or what kind of periodicity exists. I would be very interested in hearing from folks who have performed more rigorous analysis along these lines and whether you have found anything similar -- or different.