Tag Archives: blog

Web analytics used here at the eclectic blog

Thought you might be interested in the web analytics used on this blog; in total, five pieces of technology collect the data that I then use for web analysis here. They are:

  1. SiteCatalyst / Omniture – http://www.omniture.com/ – Sun standard, embedded in blogs.sun.com (and monitors all Sun websites), produces the page hits total
  2. SiteMeter – http://www.sitemeter.com/ – you can access my results yourself by simply clicking on the SiteMeter logo on this page and here’s the link: http://www.sitemeter.com/stats.asp?site=s38horkan
  3. StatCounter http://www.statcounter.com/
  4. Google Analytics http://www.google.com/analytics/
  5. ClustrMaps – http://www2.clustrmaps.com/ – simple location counter displayed as an informative graphic; here’s the link to my hit counter: http://www2.clustrmaps.com/counter/maps.php?url=http://blogs.sun.com/eclectic/

Why use more than one? Frankly, web analytics is a shaky area: no single tool seems to catch every hit the way I’d like, and no two measure hits in the same way, so I use several web analytics packages to ‘triangulate’ the best view possible. For instance, one will count some spider traffic as hits whilst another won’t, and I want to know the difference between humans and web crawlers. Furthermore, some have functionality which the others don’t, and some produce quick-to-see ‘snapshots’ whilst others produce detailed ‘drill-downs’.
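To make the human versus crawler point concrete, here is a minimal Python sketch of how a tool might split hits by user-agent string. The marker list and the function names are my own assumptions for illustration; none of the five products above is known to work exactly this way, and real tools use much longer, maintained lists. That is partly why their totals differ.

```python
# Hypothetical crawler markers, chosen for illustration only.
CRAWLER_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a web crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def split_hits(user_agents):
    """Return (human_hits, crawler_hits) for a list of user-agent strings."""
    crawler_hits = sum(1 for ua in user_agents if is_crawler(ua))
    return len(user_agents) - crawler_hits, crawler_hits


sample = [
    "Mozilla/5.0 (X11; Linux) Firefox/3.0",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Yahoo! Slurp",
]
humans, crawlers = split_hits(sample)
```

A tool with a shorter marker list would report more ‘human’ hits for the same traffic, which is the kind of disagreement I see between packages.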

For instance, Sun’s web analytics is the corporate system, so it’s enterprise grade and highly flexible. Sadly this also means it’s extremely large scale and quite hard to manipulate, because the amount of configuration you have to do is just horrendous (but it can give you the most detail).

So SiteCatalyst / Omniture is too much hassle for quick updates and ClustrMaps is really eye candy for users. I therefore only really use SiteMeter for quick updates without logging in, and StatCounter and Google Analytics for more detailed, but still quickly available, reports on readers’ favourite articles and pages.

For 2 to 5 above you’ll need to sign up for online accounts and add the tracking code yourself; this isn’t too hard, it just takes a little time.

For 1 it’s already there on all the Sun websites and blogs. However, if you are a Sun blogger you need to request access to the corporate Omniture / SiteCatalyst web analytics system, then learn how to use it, and you will probably still want something else as well for a quick info ‘fix’ (see the problems I describe above).

Most of all this is about personal preference, and what works for you; for about two years after starting blogging I was a data demon, wanting to understand and interpret the stats, and now, well I’m a little more relaxed.

Make Google notice your Blog

I posted this in response to an internal email titled “What makes Google notice a blog?”; it’s pretty universal and applicable to most search engines, so I thought I’d share it as a blog article.

Here are a few suggestions, hope they help. Wayne.

1) You may want to write your blog in a manner which is spider, as well as human, friendly.

Include meta-data and micro-format information, such as tags. Don’t forget that key words in headers will increase the ‘value’ of that key word (for many search engines). Always make sure that “SCRIPT” HTML segments are followed by “NOSCRIPT” segments (most spiders, Google’s specifically, don’t “do” JavaScript). Keep your web page code lean so that it is easy for spiders to ‘consume’.
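If you want to check the SCRIPT / NOSCRIPT advice on your own pages, something like the following Python sketch would do it. It uses only the standard library `html.parser` module; the class and function names are hypothetical, and it assumes that ‘followed by’ means the very next tag after a closing SCRIPT tag is an opening NOSCRIPT tag.

```python
from html.parser import HTMLParser


class NoscriptChecker(HTMLParser):
    """Counts SCRIPT blocks that are not immediately followed by NOSCRIPT."""

    def __init__(self):
        super().__init__()
        self.awaiting_noscript = False  # True just after a </script>
        self.missing = 0

    def _resolve(self, next_is_noscript: bool) -> None:
        # The first tag seen after </script> decides whether it was covered.
        if self.awaiting_noscript:
            if not next_is_noscript:
                self.missing += 1
            self.awaiting_noscript = False

    def handle_starttag(self, tag, attrs):
        self._resolve(tag == "noscript")

    def handle_endtag(self, tag):
        self._resolve(False)
        if tag == "script":
            self.awaiting_noscript = True

    def close(self):
        super().close()
        self._resolve(False)  # a script at the very end of the page


def count_scripts_without_noscript(html: str) -> int:
    checker = NoscriptChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


good = '<script>document.write("hi")</script><noscript>hi</noscript>'
bad = "<p><script>track()</script></p>"
```

Calling `count_scripts_without_noscript` on `good` and `bad` tells you how many script blocks in each would leave a non-JavaScript spider with nothing to read.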

Re: Tags / Meta-Data / Micro-Formats – I use the Operator plug in / add on for Firefox, which informs the user about semantic data embedded in a viewed web page.

Re: Keyword Analysis – I use the SeoQuake plug in / add on for Firefox, which allows me to do dynamic keyword (and related key word) analysis.

Here’s an article I wrote on tag format standardization: ‘Tic, Tag, Toe’. I recommend that you standardize on a tag format that is search engine friendly. Don’t over-tag or under-tag, but try to match your article’s tags with those of similar articles, and join in with the subject matter’s folksonomy if at all possible (i.e. the tags people are using when talking about that subject matter; Technorati and Delicious are both good examples).

As well as embedding all the tags for all of the articles on the front page (have a look in Operator if you decide to use it, or another semantic data ‘explorer’), I also embed tag links to the major blog directories and social bookmarking sites on the individual page for each entry. Here’s an article which demonstrates this: ‘Roller Weblogger blog post tag link code for blogs.sun.com, technorati and del.icio.us’. I’ve since superseded this code with a nicer layout and more blog directories / social bookmarking sites; you can see the example at the end of the page for any blog article I’ve written. Give me a shout if you’d like the newer code.

2) Google’s PageRank algorithms work on links: inbound and outbound, how many there are, and the PageRank of the pages at the other end of those links.
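For the curious, here is a small Python sketch of the simplified, textbook form of PageRank, computed by repeated iteration over a tiny link graph. This is an assumption-laden illustration, not Google’s production algorithm: the damping factor of 0.85, the iteration count, the function name and the three-page graph are all mine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small 'random jump' share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph c ends up with the highest rank because it has two inbound links, a comes second because its one inbound link is from the well-ranked c, and b, with no inbound links at all, comes last.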

Link to sources, and get inbound links from those sources (reciprocal links) if possible.

Don’t forget to trackback articles that you reference; if the trackback fails, try leaving a comment with a link to your article that references it.

3) Make sure you let sites such as Google know you’ve updated your site and that you’d like it re-“spider”ed, indexed and advertised.

This is done by “blog pinging” search engines and blog directories, so that they are informed that your site has been updated and can send over their spiders when they get the chance (most search engines / blog directories want to do this quite quickly, as they want to be first with any potentially newsworthy content that draws traffic).

Personally I wanted a more granular level of control over this than is offered by the standard blog ping functionality embedded in Roller, and so I wrote my own stand-alone version: ‘Free XML-RPC blog ping site submitter: “Blog Ping”’.
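As a rough idea of what a blog ping looks like on the wire, here is a Python sketch using the standard library `xmlrpc.client` module and the widely used `weblogUpdates.ping` method, which takes the blog name and URL. This is not the code of my “Blog Ping” tool; the function names are made up for the example, and the endpoint you pass to `send_ping` is whatever ping service you choose (none is assumed here).

```python
import xmlrpc.client


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint: str, blog_name: str, blog_url: str):
    """Send the ping to one XML-RPC endpoint and return its response.

    The endpoint URL is supplied by the caller; this function does
    network I/O, so it is not called in this example.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


payload = build_ping_payload("eclectic", "http://blogs.sun.com/eclectic/")
```

Looping `send_ping` over your own list of endpoints, and recording which ones succeed, is one way to get the more granular control I was after.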

4) Other things to consider…

PageRank of your site and individual pages; how well does your article compete with articles of a similar nature?

Have pages been bookmarked in del.icio.us, Technorati, etc., i.e. are they being shared?

P.S. This article doesn’t mention quality of written articles, cadence of posts, timeliness of posts to current events, etc., as it focuses purely on the current electronic mechanisms for getting noticed by a search engine like Google and not the related, but extremely important, human and social element that gains you readership.

links for 2008-11-21

The Commoditization of Massive Data Analysis – O’Reilly Radar: Excellent piece by Joseph Hellerstein, comparing the traditional, enterprise favoured database (SQL based, relational databases) and the online favoured datastore (MapReduce and Hadoop).

links for 2008-09-13

The PutPlace Blog » Blog Archive » When Bad Things Happen to Good Computers: A variety of “photos of melted, damaged and destroyed hardware”; unsurprisingly the punchlines in the last one…

You really know you’re using software heavily…

…when you’re raising bugs against it.

So, my first Roller Weblogger bug: ROL-1667 or, rather, “Date URLs incorrectly use updateTime to sort entries”.

Basically the get entries pager is selecting entries based upon ‘Updated Date’ and not ‘Published Date’, so accessing entries via entry date, which you’d assume would use ‘Published Date’, actually displays them based upon ‘Updated Date’.

This affects all date based blog entry selections, so access via date string based URLs or via the Calendar (either the large or small variant, whose selections resolve to date based URLs) is affected too.
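To show why this matters, here is a tiny Python sketch of the two behaviours. It is not Roller’s Java code: the `Entry` class, its field names and the sample posts are all hypothetical, and the only point is what happens when a date based lookup filters on the wrong field.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    title: str
    pub_date: date     # stands in for 'Published Date'
    update_date: date  # stands in for 'Updated Date'


def entries_for_day(entries, day, field):
    """Return titles of entries whose chosen date field falls on the given day."""
    return [e.title for e in entries if getattr(e, field) == day]


entries = [
    # Published in 2007, but re-tagged (and so updated) in February 2008.
    Entry("old post", date(2007, 1, 5), date(2008, 2, 1)),
    Entry("new post", date(2008, 2, 1), date(2008, 2, 1)),
]

day = date(2008, 2, 1)
buggy = entries_for_day(entries, day, "update_date")   # what the bug shows
expected = entries_for_day(entries, day, "pub_date")   # what you'd assume
```

With the update date as the key, the old post turns up under a day on which it was never published, which is the symptom I saw after re-tagging a lot of old entries.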

Thankfully Dave raised it for me on the Roller bug tracking site, although I’ve since created my own account too.

Given the Open Source paradigm, I’ve decided to try and contribute directly and fix it myself, if no one gets to it before me that is.

Dave was kind enough to give me the following advice re: contributing to Roller:

I usually point potential contributors to this: http://cwiki.apache.org/confluence/x/2hsB

You can also contribute by telling us where our wiki and docs need improvement.

– Dave

During our email exchange about the bug I also asked Dave about overriding existing macros, especially the macro code for things like get weblog entries (the paging macro getWeblogEntriesPager) and the large calendar (or hCalendarTableBig as it’s also known).

He gave me the following advice:

Two places to look for additional info on macro coding:

1) Template Author Guide (get it here: http://roller.apache.org/download.cgi)
Lists all models, macros and shows HTML generated by each.

2) weblog.vm (http://tinyurl.com/yuwfvu)
Source code for all of the Roller macros.

– Dave

I found this bug whilst doing some template enhancements around differing content per category, which I hope to implement once this bug is fixed. It showed up because of the tag policy I had implemented, which left me with a large number of blog entries that had been updated.

Tic, Tag, Toe

Or rather “tagging, tags, and blog tag policy” or even “what’s the best / most optimal tag nomenclature / syntax”. After redesigning the blog interface I decided to start to rationalise my tags – and to institute a ‘tag policy’.

Tag Policy

  1. Use “-” to delimit multi-word tags
  2. Use all lower case characters
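If you’d rather apply the policy with a script than by hand, a minimal Python sketch might look like this. The function name and the exact set of separators it recognises (“+”, “_”, “-”, white space, “%2B” and “%20”) are my assumptions, taken from the delimiters tested further down.

```python
import re

# Separators treated as word breaks: "+", "_", "-", white space, and the
# URL-encoded forms "%2B" and "%20" (matched case-insensitively).
_SEPARATORS = re.compile(r"(?:%2B|%20|[+_\s-])+", re.IGNORECASE)


def normalise_tag(tag: str) -> str:
    """Lower-case a tag and join its words with "-"."""
    words = _SEPARATORS.split(tag.strip())
    return "-".join(word.lower() for word in words if word)


examples = ["Web+Analytics", "Open Source", "blog%20ping", "tag_policy"]
normalised = [normalise_tag(tag) for tag in examples]
```

Tags that already follow the policy come back unchanged, so the function can be run safely over an existing tag list.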

But “Why ?”

For a long time I had been using the “+” symbol to link multi-word tags, but I found that Google Translate (which I use for the language translation capability, up on the top right of the page if you’re reading the blog at http://blogs.sun.com/eclectic/) was having problems processing URLs which contain “+” or “%2B”.

Here’s a little table I whipped up documenting the issues I was coming up against using multi-word tags, after trying out a number of delimiters, not just “+”, against a variety of technology.

Delimiters tested were: “+”, “%2B”, “_”, “ ”, “%20” and “-”. Sites / technology tested were: Roller Weblogger (4.0-dev, the version we currently run http://blogs.sun.com on), Google Translate, Google Search, Technorati, Del.icio.us and Slynker.

Results for “+” (plus sign), “%2B” (encoded plus sign) and “_” (underscore character):

Roller Weblogger 4.0-dev
  “+”: Will save and retrieve posts which use tags with “+” in the editor. Will not resolve tag URLs which use “+” (actually the main site will, but individual blogs can’t).
  “%2B”: Will save and retrieve posts which use tags with “%2B” in the editor. Will resolve tag URLs which use “%2B”.
  “_”: Will save and retrieve posts which use tags with “_” in the editor. Will resolve tag URLs which use “_”.

Google Search
  “+”: Will search and retrieve multi-word tags as they are written, i.e. with the “+”; the search produces a small number of results because of the infrequency of using “+” to separate written words.
  “%2B”: Will search and retrieve multi-word tags as they are written, i.e. with the “%2B”; the search produces a small number of results because of the infrequency of using “%2B” to separate written words.
  “_”: Will search and retrieve multi-word tags as they are written, i.e. with the “_”; the search produces a small number of results because of the infrequency of using “_” to separate written words.

Google Translate
  “+”: Attempts to resolve tag URLs which use “+”, encoding the URL to use “%2B” instead (which Roller can serve, see above), then promptly fails.
  “%2B”: Fails to resolve the correct URL to translate using “%2B”.
  “_”: Resolves tag URLs which use “_” and continues to translate them successfully.

Technorati
  “+”: Resolves tag URLs which use “+” correctly. Replaces the “+” with “ ” and produces good results based upon that.
  “%2B”: Resolves tag URLs which use “%2B” correctly. Replaces “%2B” with “ ” and produces good results based upon that.
  “_”: Resolves tag URLs which use “_” correctly. Produces smaller, but not unreasonable, results, because of the infrequency of using “_” to separate written words.

Del.icio.us
  “+”: Resolves tag URLs which use “+” correctly. Produces results based upon using “+”.
  “%2B”: Resolves tag URLs which use “%2B” correctly. Replaces “%2B” with “+” and produces results based upon using “+”.
  “_”: Resolves tag URLs which use “+” correctly. Produces results based upon using “+”.

Slynker
  “+”: Fails to resolve “+”. Produces no results.
  “%2B”: Attempts to resolve tag URLs which use “%2B”, encoding the URL to use “%252B” instead. Produces results based upon using “+”.
  “_”: Resolves tag URLs which use “_” correctly. Produces results based upon using “_”.

Results for “ ” (space), “%20” (encoded space) and “-” (minus sign):

Roller Weblogger 4.0-dev
  “ ”: Will save posts which use tags with “ ” in the editor. Will not retrieve posts which use tags with “ ” in the editor; instead it separates the words, retrieving them all in alphabetical order. Will resolve tag URLs which use “ ”, encoding the URL to use “%20” instead.
  “%20”: Will save and retrieve posts which use tags with “%20” in the editor. Will resolve tag URLs which use “%20”.
  “-”: Will save and retrieve posts which use tags with “-” in the editor. Will resolve tag URLs which use “-”.

Google Search
  “ ”: Will search and retrieve multi-word tags as they are written, i.e. with the “ ”; the search produces a large number of results.
  “%20”: Will search and retrieve multi-word tags as they are written, i.e. with the “%20”; the search produces a small number of results because of the infrequency of using “%20” to separate written words.
  “-”: Will search and retrieve multi-word tags as they are written, i.e. with the “-”, and will replace the “-” with “ ” as well, thus retrieving the largest amount of related information.

Google Translate
  “ ”: Attempts to resolve tag URLs which use “ ”, encoding the URL to use “%20” instead (which Roller can serve, see above), then promptly fails.
  “%20”: Fails to resolve the correct URL to translate using “%20”.
  “-”: Resolves tag URLs which use “-” and continues to translate them successfully.

Technorati
  “ ”: Resolves tag URLs which use “ ” correctly, after re-encoding the URL with “%20”. Produces good results based upon using “ ”.
  “%20”: Resolves tag URLs which use “%20” correctly, replaces the “%20” with “ ” and produces good results based upon that.
  “-”: Resolves tag URLs which use “-” correctly. Produces smaller, but not unreasonable, results, because of the infrequency of using “-” to separate written words.

Del.icio.us
  “ ”: Resolves tag URLs which use “ ” correctly, after re-encoding the URL with “%20”. Produces results based upon using “ ”.
  “%20”: Resolves tag URLs which use “%20” correctly. Replaces “%20” with “ ” and produces results based upon using “ ”.
  “-”: Resolves tag URLs which use “-” correctly. Produces results based upon using “-”.

Slynker
  “ ”: Attempts to resolve tag URLs which use “ ”, encoding the URL to use “%20” instead. Produces results based upon using “ ”.
  “%20”: Resolves tag URLs which use “%20” correctly. Replaces “%20” with “ ” and produces results based upon using “ ”.
  “-”: Resolves tag URLs which use “-” correctly. Produces results based upon using “-”.
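
Several of the behaviours in the tables above can be reproduced with Python’s standard `urllib.parse` functions. This sketch shows the general mechanisms only (percent-encoding, double encoding, and the form-style reading of “+” as a space); I’m assuming, not claiming, that these are what the sites were doing internally. The `encode_tag` helper is a made-up name.

```python
from urllib.parse import quote, unquote_plus


def encode_tag(tag: str) -> str:
    """Percent-encode every character that is not unreserved in a URL."""
    return quote(tag, safe="")


plus_encoded = encode_tag("web+analytics")    # "+" becomes "%2B"
space_encoded = encode_tag("web analytics")   # " " becomes "%20"

# Encoding an already encoded tag escapes the "%" itself, which would give
# the "%252B" seen in the Slynker row.
double_encoded = encode_tag(plus_encoded)

# Form-style decoding reads "+" as a space, which would explain a site
# replacing "+" with " ".
plus_as_space = unquote_plus("web+analytics")

# "-" and "_" are unreserved, so they pass through untouched.
minus_unchanged = encode_tag("web-analytics")
underscore_unchanged = encode_tag("web_analytics")
```

The last two lines are the interesting ones for the tag policy: a delimiter that never needs encoding gives each platform nothing to get wrong.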

As you’ve probably surmised by now, the issue is really about the convergence of two technologies, and the incompatibilities they currently have: tagging blog posts (and other stuff too) and URL encoding. It is not down to the limitations that differing web1.0 and web2.0 platforms have around tag syntax, specifically multi-word tags, but to how correctly these platforms adhere to RFC 1738, the Uniform Resource Locators (URL) specification.

The problem is that tagging generally uses a relatively free form syntax (driven mainly by the communities which use and propagate said tag nomenclature, or “Folksonomy”), when and where possible, but that URL encoding has a variety of reserved characters, which conflict with the characters used in tags.

Characters for special use in defining URL syntax include the following “Reserved Characters”, which should be encoded where possible (although, as the data in the tables above shows, even the encoded URLs fail to produce the expected, or required, results).

Character                   Hex   Dec
“$” (the dollar sign)       24    36
“&” (ampersand symbol)      26    38
“+” (plus sign)             2B    43
“,” (comma symbol)          2C    44
“/” (forward slash)         2F    47
“:” (the colon)             3A    58
“;” (the semi-colon)        3B    59
“=” (equal sign)            3D    61
“?” (the question mark)     3F    63
“@” (the ‘at’ symbol)       40    64
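
The hex and decimal values above are just the ASCII codes of each character, and the encoded form is “%” followed by the hex value. A short Python sketch, with a hypothetical `reserved_table` helper, regenerates them and cross-checks against the standard library:

```python
from urllib.parse import quote

# The ten reserved characters listed in the table above.
RESERVED = "$&+,/:;=?@"


def reserved_table():
    """Return (character, hex code, decimal code, encoded form) per character."""
    return [
        (ch, format(ord(ch), "X"), ord(ch), "%" + format(ord(ch), "X"))
        for ch in RESERVED
    ]


rows = reserved_table()

# Cross-check: urllib's own percent-encoding agrees with the computed form.
all_match = all(quote(ch, safe="") == encoded for ch, _, _, encoded in rows)
```

So “+” is hex 2B, decimal 43, and travels in a URL as “%2B”, which is exactly the form that gave Google Translate so much trouble.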

The above are “Reserved Characters” when it comes to URL encoding, and they include some of the most popular delimiters used in multi-word tags (specifically “+”, which is used a great deal, especially on Technorati). As I found in the investigation above, they cause a number of issues when used both in multi-word tags and in URLs, so I have decided to standardise on “-” as the multi-word tag delimiter of choice.

For me it has a number of advantages:

  1. it is saved and retrieved correctly in tags in the Roller edit post page
  2. the URL is encoded correctly in Roller too
  3. it resolves correctly whilst using Google Translate
  4. it returns all search results for both “-” and “ ” in Google Search – an unexpected bonus, in terms of returning search results (and thus being included in said search results)
  5. it returns reasonable results from Technorati, based upon “-”
  6. it returns reasonable results from Del.icio.us, based upon “-”
  7. it returns reasonable results from Slynker, based upon “-”

As to the issue of upper versus lower case, I have standardised on all lower case, as this has little effect in searches (outside of Technorati, which returns slightly differing results, albeit with a low delta between the results returned).

You may be able to see that I have started to retroactively replace the tags so far created with this new standard – however I have focused on the most popular tags for the time being, and I will continue to use this format from now on.

I found that this article on “URL Encoding (or: ‘What are those “%20” codes in URLs?’)” provided a nice overview of the issues of URL encoding, and of RFC 1738 itself.