
Free XML-RPC blog ping site submitter: “Blog Ping”

Here’s a free, as in beer, Blog Directory and Search Engine Site Submitter I wrote which works by sending out an XML-RPC Blog Ping.

“In blogging, ping is an XML-RPC-based push mechanism by which a weblog notifies a server that its content has been updated. An XML-RPC signal is sent to one or more “ping servers,” which can then generate a list of blogs that have new material. Many blog authoring tools automatically ping one or more servers each time the blogger creates a new post or updates an old one.”

– according to the Wikipedia article: ‘Ping (blogging)‘.
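To make the quote above a little more concrete, here is a minimal sketch of what such a ping can look like on the wire. It assumes the widely used `weblogUpdates.ping` method, which takes the blog's name and URL as two string parameters; this post doesn't say which method Blog Ping itself calls, and the blog title and URL below are placeholders.

```python
import xmlrpc.client


def build_ping_request(blog_title: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    # Assumption: the ping server follows the common weblogUpdates.ping
    # convention of two string parameters, blog name then blog URL.
    return xmlrpc.client.dumps(
        (blog_title, blog_url), methodname="weblogUpdates.ping"
    )


def parse_ping_request(body: str):
    """Decode a request body back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(body)
    return method, params


if __name__ == "__main__":
    # Placeholder blog details, for illustration only.
    print(build_ping_request("Example Blog", "http://example.com/blog/"))
```

Nothing is sent over the network here; the sketch only builds and decodes the XML document that a ping server would receive in an HTTP POST.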

One of the things I have noticed most across the blogosphere and the wider ‘net during my time blogging, and it’s been just less than a year, is the obsession people have with “SEO”, or ‘Search Engine Optimization‘. There are a huge number of blogs dedicated to the subject, many of which simply go on about SEO and nothing else.

Frankly I feel that if I’m going to be putting effort into blogging then I might as well make it easy for people to find me and I’m sure plenty of other people feel the same way. If using technologies like blog pinging and other techniques like SEO is the norm, then it’s almost as though you are forced into doing the same to compete for readership; a technical ‘arms race’ in the competition for the attention of your readers.

So I wrote a program to help me notify the largest number of directories and search engines possible, in a simple and convenient way, and you’re welcome to use it as well.

You can download the Blog Ping application here: Blog Ping v1.0 (BlogPing.Jar)

If you click on the link to the file above and you have Java installed, you may find that it attempts to run the application from where it is, so it’s probably best to do a ‘Save as…’ and save a local copy.

Please don’t deep link directly to the file above, instead link to the page you are currently on at: https://horkan.com/2008/04/22/blog-ping-search-submitter-seo

Here’s what the application looks like in action:

Blog Ping Application FAQ

Basically what does this software do ?

It notifies a variety of Blog Directories and Search Engines that your blog has been updated recently, which is often followed by those Blog Directories and Search Engines checking your site for new content, using a technique called ‘Spidering‘. Once verified by these ‘Web Crawlers’ they list your new content, postings, etc., in their directories and search results.

The list of Blogs is configurable, as is the list of Blog Directory and Search Engine Servers (with a maximum of one thousand). The application also has a default list of those Servers, which includes some of the most popular ones that I have successfully tested the application with.

Will it improve the quality of my site ?

C’mon, now you’re just being silly, of course it won’t.

Will it improve the quality of conversation about my site ?

Again ‘No’, only you can do that by engaging your audience.

Will it get you listed on a large number of Search Engines and Blog Directories ?

Yes.

Will it generate page hits from people potentially coming to read your site ?

Yes, but very much dependent on the volume, quality, and cadence, of your blog posts. I’ve tested it against sites with small volumes of blog posts and poor cadence, and I have found that despite being listed in those blog directories, etc., it does not generate page hits.

Fundamentally you need three things, but you need these anyway to create a blog which gets regular readership, and that is:

  • A good volume of posts
  • Good quality of posts with interesting and engaging content
  • Regular postings, ‘Cadence’

Will it connect you with people who are genuinely interested in your topic matter ?

Maybe, maybe not, you’ll just have to see. It’s very dependent on the answer to the question above…

When should I use this software and how often should I use it ?

I recommend that you use it when you’ve initially set up a site, to make as many Blog Directories and Search Engines aware of your new site as possible. You should probably be aware that a number of them require you to create an account with them; however, none of the servers in the default list included with the application do.

After that I recommend that you only use it after posting a new blog entry and definitely not more than once a day (even with new posts and content).

What will this software do on my system ?

It will load up, along with the libraries it is bundled with. It uses the vanilla Java Swing libraries, as well as the Apache XML-RPC project libraries.

Once loaded up it’ll do nothing until you either:

  • add, modify or delete a blog which you want to notify blog directories about (it’ll save your blogs in a file called “blog.txt” in the same directory as the application is run out of)
  • add, modify, or delete a blog directory / search engine to notify via an XML-RPC ping (it’ll save your servers in a file called “ping.txt” in the same directory as the application is run out of)
  • start a blog ping session, where it will cycle through all the blogs you’ve added, and through all the blog directories you’ve added, and send each one an XML-RPC ping call (it’ll connect to the Internet via the Apache XML-RPC libraries, so you may need to let Java or BlogPing.jar have access through any locally configured firewalls)
  • have a look at the about page, which will load info from a special page from this blog, where I’ll post help and any news or updates about the application
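The ping session described in the list above boils down to two nested loops. The following is only a rough sketch in Python rather than the application's own Java code: the function names are made up for illustration, and it assumes each server answers a `weblogUpdates.ping` call with a struct containing an `flerror` flag, which is the common convention but isn't confirmed anywhere in this post.

```python
import xmlrpc.client
from typing import Callable, Iterable, List, Tuple


def send_weblog_ping(server_url: str, title: str, blog_url: str) -> bool:
    """Send one weblogUpdates.ping call; True if the server accepted it."""
    proxy = xmlrpc.client.ServerProxy(server_url)
    reply = proxy.weblogUpdates.ping(title, blog_url)
    # Assumption: the reply is a struct whose "flerror" member is false on success.
    return isinstance(reply, dict) and not reply.get("flerror", True)


def run_ping_session(
    blogs: Iterable[Tuple[str, str]],
    servers: Iterable[str],
    send_ping: Callable[[str, str, str], bool] = send_weblog_ping,
) -> List[str]:
    """Ping every server once for every blog; return one log line per attempt."""
    server_list = list(servers)
    log = []
    for title, blog_url in blogs:
        for server in server_list:
            try:
                status = "OK" if send_ping(server, title, blog_url) else "REJECTED"
            except (OSError, xmlrpc.client.Error) as exc:
                # Network problems and XML-RPC faults are logged, not fatal,
                # so one dead server doesn't stop the rest of the session.
                status = f"FAILED ({exc})"
            log.append(f"{title} -> {server}: {status}")
    return log
```

Passing the sender in as a parameter means the loop can be tried out with a stand-in function before pointing it at real ping servers.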

What do I need to get it to work ?

An installed copy of the Sun Java Runtime Environment (JRE). I set the software to be compatible with JRE version 1.5 and above, although I’ve only tested it from JRE 1.6 onwards.

Whilst writing it I used the Java Development Kit (JDK) version 1.6.0_05 (or 1.6.0 update 5, as it’s also known, the latest current version) so the JRE which matches this will definitely work.

You can get the Sun JRE here: http://java.com/en/download/index.jsp

Will it work on my system ? I use Windows, Linux, Solaris, etc.

Yes, it very much should, because of Java’s platform independence: programs written in the Java language have to run similarly on any supported hardware / operating-system platform via the Java Virtual Machine (Java VM or JVM). It should work on any system for which there is an available JRE.

For a full list of Operating Systems, System Configurations, and platforms, supported with a JRE (version 1.6) from Sun Microsystems, please see this page: http://java.sun.com/javase/6/webnotes/install/system-configurations.html

How do I run this software ?

Easy peasy, once the JRE is installed, two choices, command line or desktop environment.

If you are using the command line the following command should run it:

java -jar BlogPing.jar

Otherwise you should be able to simply ‘click’ on it from your desktop environment for it to start up.

For this to work, files of the type “jar” (a ‘Java ARchive’) need to be registered as being associated with Java (notably the Java executable). You may find that the application doesn’t start in this instance, and a common cause is that another application, most frequently compression and de-compression software like Rar or Zip (or their windowed versions, like WinRAR or WinZip), has already made this association and thus will be started up instead of Java.

How do I use this software ?

I’ve used screen grabs to show how to use the program, you can resize these images using the “Body Image Size” function over in the top of the right hand side bar (options are “Small”, “Medium”, and “Large”, and they should be set on “Medium” when you first come to the page).

When the program loads this is the first screen, and as it is such a simple program there is not much to it.

There are four menu items of note: “Exit” (under “File”), “About” (under “Help”), “Add Blogs” and “Add Pings” (both under “Menu”).

This screen shows the two menu items which you need to use to get the software to ping the servers you want to notify. You need to let the program know which blogs to tell people about and which directories and search engines to notify.

If you don’t have any blogs configured it will ask you to add one.

You need four pieces of information for this:

  • The title of your blog.
  • The URL of your blog.
  • The main URL for blog posts, most frequently the same as the URL of your blog.
  • The URL for syndication services on your blogs, either RSS or Atom. If you don’t know it simply add the URL from above.

If you choose ‘OK’ above it will have saved your blog, and you are free to add more, edit what’s already there, or delete some.

After adding, editing and deleting blogs, choose ‘OK’ to save them or ‘Cancel’ to ignore all the recently made changes.

If you don’t have any servers to send blog pings configured it will ask you to add some.

You can choose to add them individually, or to load the default list.

Having chosen to load in the default list of servers to ping, you are free to add more, edit what’s already there, or delete some.

After adding, editing and deleting servers to ping, choose ‘OK’ to save them or ‘Cancel’ to ignore all the recently made changes.

All servers to be sent blog pings are saved in a text file called “ping.txt” which should be in the same directory, or folder, as the one the ‘BlogPing.jar’ program was started in.

When adding a blog directory or search engine ping service you need just one piece of information: the URL for XML-RPC blog pings for that site.

Here the program is working through a processing cycle of blogs and servers to ping. The results are posted in the main notification output area. Once it’s finished going through all of the blogs you can grab the output and copy it into a text file.

Here’s the about box, it loads a page from this blog, which means I can update the page and make sure news and information about the program is kept up to date.

Why did I write it ?

Because I could and because I wanted more flexibility in using blog pinging over and above the default blog ping functionality in my blog’s platform (Roller Weblogger, created by Dave Johnson, is used exclusively to host http://blogs.sun.com).

You may find that you have a similar requirement, even if you’re using another blogging platform such as WordPress, Movable Type, LiveJournal, or the like.

Plus I really dislike elitism, especially ‘technology’ elitism based on arbitrary things like what or how much you know, for me experience of having ‘done’ something matters more. Giving this software away and distributing it in the way that I am is an attempt to bring this capability to the non-programming, non-scripting, and much wider, blogging community.

Shouldn’t you have just contributed to Roller ?

Probably, in the long term yes, as this is partially a tactical ‘fix’, done primarily to see if I could do it reasonably easily, to which the answer was definitely ‘yes’.

I also wanted more control and granularity when notifying Search Engines and Blog Directories of updates to my site, in fact I initially used it to make sure my blog was listed on as many global sites as possible.

What went into writing it ?

The Java Development Kit (JDK) 1.6.0 update 5 (and of course, as stated above, you’ll need a compatible JRE to run it).

NetBeans version 6.0.1, which you definitely don’t need to run this program, available here: http://download.netbeans.org/netbeans/6.0/final/

Please note that the latest version of NetBeans (6.1) is currently in RC, or ‘release candidate’, form and that I’ll likely update Blog Ping to have been written in that in the very near future.

The Apache XML-RPC libraries, version 3.1, which you also don’t need to run this software, available here: http://ws.apache.org/xmlrpc/

The ProGuard libraries, version 4.2, which you also don’t need to run this software, available here: http://proguard.sourceforge.net/

Anything else interesting about this software ?

Yes, my default set of blog ping services, which I’ll post later, and that I obfuscated the code using the latest version of ProGuard after being inspired by this article written by Geertjan on how to obfuscate Java code written using NetBeans.

Possibly that I wrote it whilst listening to Flanders and Swann, notably “Madeira M’Dear”, famous for its syllepsis.

And maybe that I wrote this blog post listening to New Order, specifically Blue Monday (’95 twelve inch version), and thanks to Walter Milner I think the Pink Fairies got in there somewhere too.

What do you want for this software ? Similar Blog Ping clients are on sale around the ‘net from anything between $30 and $100 ?

Nowt, yadda, zip, nothing, I just wanted to see if I could do it, and found that I could. If you use it, and you like it, please tell people about it, blog about it, add links to this page and this blog (but not the download itself), and leave a comment if you have time.

I’m especially interested in hearing from anyone who downloads and installs Java and the JRE as a result of wanting to use this software, so please let me know if you have.

Link to my blog here: http://blogs.sun.com/eclectic/

Link to the Blog Ping application page here: https://horkan.com/2008/04/22/blog-ping-search-submitter-seo

And comments here please: https://horkan.com/2008/04/22/blog-ping-search-submitter-seo#comments

There are also online services like ‘Pingoat’ and ‘King Ping’ which act as ‘Blog Ping aggregators’ for you; you can find them at http://www.pingoat.com and http://www.kping.com respectively. What’s nice about the software here is that you can configure it to use these ‘Blog Ping aggregators’, if they have an XML-RPC interface, which in the case of ‘Pingoat’ and ‘King Ping’ they both do.

By the way, there is other free Blog Ping software out there, notably Blog Pinger (a Linux command line utility, which as it’s written in Python should run on any OS where you have a compatible Python instance installed) and Submit’em now (a Firefox Add on).

Go check them out, they might be more to your liking, and diversity is good.

Will you be maintaining or updating this software ?

Maybe, it depends on three things: demand (from you guys), use (for me), and time (i.e. what gets priority over this).

What changes would you make ?

Probably spend some time learning more about the XML-RPC blog ping call protocol, I’m sure I could generate much higher volumes of Blog Ping successes if I did.

What license does this software use ?

This instance uses the Creative Commons License. Copyright 2008, Wayne T. Horkan.

Why isn’t this software ‘Open Source’ ?

If there is enough demand for the software, and more importantly changes to the software, then I’ll consider putting the effort into setting it up as Open Source. However, frankly, it’s a very minimal and trifling set of code at the moment.

Is this software anything to do with Sun Microsystems at all ?

This is my personal weblog and on it I do not speak for my employer. However the program was written using Sun technologies and I do work at Sun (although I put this software together as a home project and my current role at Sun rarely involves writing code).

Does that mean that Sun are responsible for it ?

No, definitely not, nag me about it, not Sun, and do that in the comments section of this page please.

Is this a ‘White Hat‘ or ‘Black Hat‘ SEO tool or technique ?

The tool is implicitly amoral, it’s the user that constructs a moral or immoral pattern of usage.

Using this software aggressively to send falsified blog pings will likely, and deservedly, get search engines and blog directories to block your site, potentially even de-listing it, so please don’t be irresponsible in using it.

Have a look at the following Wikipedia article for more information on White Hat versus Black Hat SEO techniques.

Internationalizing a Roller Weblogger based Blog

So I’ve been spending time lately providing better international support to the blog.

In fact check out the variants I’ve put together:

For the translator I used the Yahoo Babelfish translation service, rather than Google Translate (which I use to produce on demand translations of the site, at the time of posting this it should be at the top of the right hand sidebar), because I didn’t want to become tied in to a single Translation Service Supplier.

During translation I switched to using “blog” rather than “weblog” for the title, as many of the languages would translate blog but not weblog (possibly a weakness of the translation service).

I was alluding to the new multi-language pages and the new multi-lingual nature of the blog in the post on St. Patrick’s Day, however I’d only translated some of the posts, not internationalised the site itself, and so it wasn’t really time to go live, but I did want the posts to start being spidered (and the post “Weblog language translator – blog translation on the fly with Roller specific functionality” explains why).

iron l10n zion – or how I did it…

There are a number of ways to internationalize a blog running over Roller Weblogger. For instance, at the Aquarium, another Sun Blog, they use multiple blog instances, like the Japanese Aquarium. I didn’t go for this approach as I wanted to keep to a single blog instance (due to maintainability basically).

I approached the problem by having a language resource file which loads as the session begins based upon the locale determined in the URL.

At run time this is done dynamically like this:

  1. Decide which locale the user is loading the page from
  2. Load the language specific resource file / pack (from a repository of language resource files, of which there is one resource file / pack per language)
  3. The resource file allocates and populates variables with language specific data
  4. Use the above variables throughout the Roller Weblogger template code (HTML mainly) to create the page
  5. Present the page to the user requesting it
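As a rough illustration of the five steps above, outside of Roller and Velocity, here is a small Python sketch. It assumes the locale is the path segment straight after the blog handle in the URL, and the English string is a placeholder I've made up for the example; only the German value comes from the real “_lang_de” file shown further down.

```python
from string import Template
from urllib.parse import urlparse

DEFAULT_LOCALE = "en"

# One resource pack per language: same variable names, different content.
# The "en" value is an invented placeholder for this sketch.
RESOURCES = {
    "en": {"gtSitePrefs": "Site Preferences"},
    "de": {"gtSitePrefs": "Aufstellungsort-Präferenzen"},
}


def locale_from_url(url: str, blog_handle: str = "eclectic") -> str:
    """Step 1: work out the locale from the URL, falling back to the default."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == blog_handle and parts[1] in RESOURCES:
        return parts[1]
    return DEFAULT_LOCALE


def render(template: str, url: str) -> str:
    """Steps 2 to 5: load the resource pack and fill in the template."""
    strings = RESOURCES[locale_from_url(url)]
    return Template(template).substitute(strings)


if __name__ == "__main__":
    page = "<h2>$gtSitePrefs</h2>"
    print(render(page, "http://blogs.sun.com/eclectic/de/entry/some-post"))
    print(render(page, "http://blogs.sun.com/eclectic/entry/some-post"))
```

The template itself never changes between languages; only the resource pack chosen in the first step does, which is the whole point of the approach.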

A number of language resource files were needed, all of which I populated with text based named variables (obviously the name of the variable stays the same, just the content per language resource file is different).

Then I replaced all the specific uses of text across my roller templates with calls to those variables.

Here’s a snippet of the code which decides which language resource file to load, and yes, before you say it, it’s not aesthetically pleasing, but I’m the only person who’ll be debugging it, so I’ll let myself off on that one. As you can see, it checks which locale the page is being called from (based on the URL, but you can’t see that bit). Once it finds a match it loads the language specific resource file (notice I also make sure to load a default at the end if a match can’t be found).

  #if ($model.locale == "en")
      #includeTemplate($model.weblog "_lang_en")
  #elseif ($model.locale == "zh")
      #includeTemplate($model.weblog "_lang_zh")
  #else
      ## No match found, so fall back to the default (English) resource file
      #includeTemplate($model.weblog "_lang_en")
  #end

And here’s an example of a specific language resource file, in this case the start of “_lang_de”, one of the files that would have been loaded based on the logic in the above piece. As you can see it has my (string) variables allocated and populated.

  #set ($gtTitle = "Blog Wayne-Horkans: eklektisch")
  #set ($gtMostPopTags = "Die meisten populären Umbauten")
  #set ($gtSitePrefs = "Aufstellungsort-Präferenzen")

Here’s an example use of the $gtTitle (string) variable from above within the Roller Weblogger template, which Roller builds dynamically at run time:

  $gtTitle

Probably the worst part of this was being adversely affected by Roller timezone and localization sensitivity issues as documented in ROL-1337 “all components involved in weblog rendering need to be locale & timezone sensitive”.

For instance when generating blog specific URLs in my templates, not all of the Roller Weblogger functions, macros and variables are timezone / localisation safe, and so for a number of them I’ve had to step through the templates modding the code to be timezone and localization safe as I go.

This meant that I had to bodge parts of the code with temporary ‘fixes’ to make up for the incomplete coverage, but it will do for now.

Two of the most obvious issues were with dates, as “$utils.formatDate” only produces day and month names in English, and with “$entry.permalink”, as it produces a non-locale specific URL:

  ## Replaced instances of $entry.permalink with $entryLSP (Locale Specific Permalink)
  #set ($entryLSP = "http://blogs.sun.com/eclectic/$model.locale/entry/$utilities.encode($entry.anchor)")

I may also have to write alternate macros which are locale specific, including the one that generates a list of recent entries:

  #set ($rEntries = $model.weblog.getRecentWeblogEntries($chosenCat, $rEntriesTotal))
  ## Have to use this as locale settings don't yet affect "getRecentWeblogEntries"
  #showWeblogEntryLinksList($rEntries)

I don’t want to give the impression that none of the Roller Weblogger timezone / locale specific functionality works. In fact a lot more than I assumed would, did, including the menu (content) functions and Tag URL functions, and I was very pleased that there was the level of support in Roller that there is for internationalization.

I’ll be providing more multi-lingual content, specifically the content rich, article like posts I’ve been doing, so far I’ve translated three posts of this ilk:

Ouch !

So I went and asked one of my friends at work what they thought of my blog, and after sending the following reply they asked me if I’d be posting it (albeit anonymously the little scamp !). …..

You really know you’re using software heavily…

…when you’re raising bugs against it.

So my first Roller Weblogger bug, ROL-1667 or, rather, “Date URLs incorrectly use updateTime to sort entries“.

Basically the get entries pager is selecting entries based upon ‘Updated Date’ and not ‘Published Date’, so accessing entries via entry date, which you’d assume would use ‘Published Date’ actually displays them based upon ‘Updated Date’.

This affects all date based blog entry selections, so access via date string based URLs or via the Calendar (either large or small variant, whose selections resolve to date based URLs) is affected too.
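To show why this matters, here is a tiny Python sketch of the difference between the two selections. It isn't Roller's code: the `Entry` class and the `pub_time` / `update_time` field names are stand-ins I've chosen for the ‘Published Date’ and ‘Updated Date’ mentioned above, and the entries are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Entry:
    title: str
    pub_time: date      # when the entry was first published
    update_time: date   # when the entry was last edited (e.g. re-tagged)


def entries_for_day(entries: List[Entry], day: date, field: str = "pub_time") -> List[str]:
    """Return the titles of entries whose chosen date field falls on the given day."""
    return [e.title for e in entries if getattr(e, field) == day]


if __name__ == "__main__":
    entries = [
        # Published in January, but re-tagged on 10 February.
        Entry("Old post", date(2008, 1, 5), date(2008, 2, 10)),
        Entry("New post", date(2008, 2, 10), date(2008, 2, 10)),
    ]
    day = date(2008, 2, 10)
    print(entries_for_day(entries, day))                  # what you'd expect
    print(entries_for_day(entries, day, "update_time"))   # what the bug shows
```

Selecting on the update date drags the re-tagged January post into the February page, which is the behaviour described in the bug report.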

Thankfully Dave raised it for me on the roller bug traq site, although I’ve since created my own account too.

Given the Open Source paradigm, I’ve decided to try and contribute directly and fix it myself, if no one gets to it before me that is.

Dave was kind enough to give me the following advice re: contributing to Roller:

I usually point potential contributors to this: http://cwiki.apache.org/confluence/x/2hsB

You can also contribute by telling us where our wiki and docs need improvement.

– Dave

During our email exchange about the bug I also asked Dave about overriding existing macros, especially the macro code for things like get weblog entries (the paging macro getWeblogEntriesPager) and the large calendar (or hCalendarTableBig as it’s also known).

He gave me the following advice:

Two places to look for additional info on macro coding:

1) Template Author Guide (get it here: http://roller.apache.org/download.cgi)
Lists all models, macros and shows HTML generated by each.

2) weblog.vm (http://tinyurl.com/yuwfvu)
Source code for all of the Roller macros.

– Dave

I found this bug whilst doing some template enhancements, around differing content per category, which once this bug is fixed I hope to implement. It showed up because of the tag policy I had implemented, and subsequently had a large number of blog entries which had been updated.

First time at #1 most popular blogs on http://blogs.sun.com

“Cripes !”, as seemingly countless British comic book characters have said over the last century, I appear to have got the most popular blog on http://blogs.sun.com for the first time ever in my blogging lifetime.

Let’s hope it’s because of interest in the talk and presentation I’m giving on “Case Studies of Enterprise Architecture” this coming Monday (which I reported a couple of days ago) and not because it’s a slow news day at Sun…

I was pleasantly surprised to find I’d made #1 most popular blog on our collective blog server after finishing off some work tonight (ah, I hear you think, the sheer excitement of weekend work).

Here’s the screen grab – which as you can guess I’m kind of proud of (third column along: “Popular Blogs” – lol).

Screen Grab taken on the 16th of February, 2008, at 19:47

Since writing this entry marking this landmark (sic) I’m even more surprised to find that I’m the victim (or is that ‘lucky winner’ ?) of comment spam. Although I’ve also noticed that they stopped at adding comments on my blog entry about Tom Hanks being a Villa Fan and so, in the style of one of the Catherine Tate sketches doing the rounds “the dirty, Villa dodging… ” (expletive optional).

I’ll do some analysis of the comments as I’m not sure yet whether to moderate (basically delete) them, it’ll be my first time at that too.

And to think that I didn’t even have to hold Frank Zappa over my face to get all these page hits (although I hope to join in the ‘sleeveface’ phenomenon as soon as I can find where I’ve buried the last of my vinyl).

Links for DD-MM-YYYY Not Likely

A response to Alec Muffett‘s recent post “A disappointed (occasional) reader…” on his excellent blog.

I’m sure Alec won’t mind me having posted the following comment in response to his article:

Hi Alec,

Although I sympathise, especially as a fellow ‘blog writer it appears one has to produce a very regular cadence to ensure continued, and growing, readership, I have to agree with your reader, mentioned above.

The fashion for producing a blog post which is simply titled “Links for DD-MM-YYYY” and contains nothing but links is becoming ubiquitous – and even sadder is in full sway across blogs.sun.com.

Like anyone I like to see interesting sites and links, however I go to blogs to read blogs, to gather opinion, see what peeps are chatting about, etc., etc., not to check out someone else’s bookmarks.

I believe that one has to think very hard about what blog postings are for, and if indeed “Links for DD-MM-YYYY” type postings are an adequate and appropriate mechanism for sharing bookmarks with one’s readers.

Personally I feel that links, and bookmarks, are acceptable if introduced to the readers during a posting (or even as reference at the end of a posting), for me there has to be some posting ‘meat’ to go with my ‘link’ vegetables (terrible analogy, but it won’t be the worst thing you’ve forgiven me for).

However I suspect that whilst the “Links for DD-MM-YYYY” helps to produce a regular cadence, and continued readership, it will sustain its use as a blog posting across the blogosphere.

And for the record I really like your blog, as you can probably guess from the number of comments I keep leaving.

All the best, Happy New Year, etc.,

Wayne

I’ll be trying my best to avoid using blog posts as bookmark aggregators, but this is a personal decision, and each to their own.

To back up my assertion that the “Links for DD-MM-YYYY” type posts have become a staple at blogs.sun.com, check out this link to blogs.sun.com’s search facility; as of today it returns 1,092 results for posts which include “Links for”.

In fact, given the number of people writing these types of posts, perhaps that’s where I’m going wrong… :-)

Tic, Tag, Toe

Or rather “tagging, tags, and blog tag policy” or even “what’s the best / most optimal tag nomenclature / syntax”. After redesigning the blog interface I decided to start to rationalise my tags – and to institute a ‘tag policy’.

Tag Policy

  1. Use “-” to delimit multi-word tags
  2. Use all lower case characters
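A policy like this is easy to apply mechanically. The sketch below is just one way it could be done in Python, not something I actually ran over the blog: the `normalise_tag` name is made up, and treating “+”, “%2B”, “_”, spaces and “%20” all as word separators is my assumption about which old delimiters should be folded into “-”.

```python
import re
from urllib.parse import unquote


def normalise_tag(raw: str) -> str:
    """Apply the tag policy: lower case words joined by a single "-"."""
    # Decode percent-escapes first so "%2B" and "%20" behave like "+" and " ".
    decoded = unquote(raw).strip()
    # Assumption: whitespace, "+", "_" and "-" all count as word separators.
    words = re.split(r"[\s+_\-]+", decoded)
    return "-".join(word.lower() for word in words if word)


if __name__ == "__main__":
    for tag in ["Enterprise+Architecture", "enterprise%20architecture", "Roller_Weblogger"]:
        print(tag, "->", normalise_tag(tag))
```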

But “Why ?”

For a long time I had been using the “+” symbol to link multi-word tags, but I found that Google Translate (which I use for the language translation capability, up on the top right of the page if you’re reading the blog at http://blogs.sun.com/eclectic/) was having problems processing URLs which contain “+” or “%2B”.

Here’s a little table I whipped up documenting the issues I was coming up against using multi-word tags, after trying out a number of delimiters, not just “+”, against a variety of technology.

Delimiters tested were: “+”, “%2B”, “_”, “ ”, “%20” and “-”. Sites / technology tested were: Roller Weblogger (4.0-dev, the version http://blogs.sun.com currently runs on), Google Translate, Google Search, Technorati, Del.icio.us and Slynker.

“+” (plus sign) “%2B” (encoded plus sign) “_” (underscore character)
Roller Weblogger 4.0-dev Will save and retrieve posts which use tags with “+” in the editor
Will not resolve tags URL which use “+” (actually the main site will, but individual blogs can’t)
Will save and retrieve posts which use tags with “%2B” in the editor
Will resolve tags URL which use “%2B”
Will save and retrieve posts which use tags with “_” in the editor
Will resolve tags URL which use “_”
Google Search Will search and retrieve multi-word tags as they are written, i.e. with the “+”, search produces a small number of results because of the infrequency of using “+” to separate written words Will search and retrieve multi-word tags as they are written, i.e. with the “%2B”, search produces a small number of results because of the infrequency of using “%2B” to separate written words Will search and retrieve multi-word tags as they are written, i.e. with the “_”, search produces a small number of results because of the infrequency of using “_” to separate written words
Google Translate Attempts to resolve tags URL which use “+”, encoding the URL to use “%2B” instead (which Roller can serve, see above), then promptly fails Fails to resolve the correct URL to translate using “%2B” Resolves tags URL which use “_” and continues to translate them successfully
Technorati Resolves tag URLs which use “+” correctly
Replaces the “+” with ” ” and produces good results based upon that
Resolves tag URLs which use “%2B” correctly
Replaces “%2B” with ” ” and produces good results based upon that
Resolves tag URLs which use “_” correctly
Produces smaller, but not unreasonable, results, due of the infrequency of using “_” to separate written words
Del.iciou.ois Resolves tag URLs which use “+” correctly
Produces results based upon using “+”
Resolves tag URLs which use “%2B” correctly
Replaces “%2B” with “+” and produces results based upon using “+”
Resolves tag URLs which use “+” correctly
Produces results based upon using “+”
Slynker Fails to resolve “+”
Produces no results
Attempts to resolve tags URL which use “%2B”, encoding the URL to use “%252B” instead
Produces results based upon using “+”
Resolves tag URLs which use “_” correctly
Produces results based upon using “_”
” ” (space) “%20” (encoded space) “-” (minus sign)
Roller Weblogger 4.0-dev Will save posts which use tags with ” ” in the editor
Will not retrieve posts which use tags with ” ” in the editor, instead it separates the words, retrieving them all in alphabetical order
Will resolve tags URL which use ” “, encoding the URL to use “%20” instead
Will save and retrieve posts which use tags with “%20” in the editor
Will resolve tags URL which use “%20”
Will save and retrieve posts which use tags with “-” in the editor
Will resolve tags URL which use “-“
Google Search Will search and retrieve multi-word tags as they are written, i.e. with the “ ”, search produces a large number of results
Will search and retrieve multi-word tags as they are written, i.e. with the “%20”, search produces a small number of results because “%20” is rarely used to separate written words
Will search and retrieve multi-word tags as they are written, i.e. with the “-”, and will replace the “-” with “ ” as well, thus retrieving the largest amount of related information
Google Translate Attempts to resolve tag URLs which use “ ”, encoding the URL to use “%20” instead (which Roller can serve, see above), then promptly fails
Fails to resolve the correct URL to translate using “%20”
Resolves tag URLs which use “-” and continues to translate them successfully
Technorati Resolves tag URLs which use “ ” correctly, after re-encoding the URL with “%20”
Produces good results based upon using “ ”
Resolves tag URLs which use “%20” correctly, replaces the “%20” with “ ” and produces good results based upon that
Resolves tag URLs which use “-” correctly
Produces fewer, but not unreasonable, results, because “-” is rarely used to separate written words
Del.icio.us Resolves tag URLs which use “ ” correctly, after re-encoding the URL with “%20”
Produces results based upon using “ ”
Resolves tag URLs which use “%20” correctly
Replaces “%20” with “ ” and produces results based upon using “ ”
Resolves tag URLs which use “-” correctly
Produces results based upon using “-”
Slynker Attempts to resolve tag URLs which use “ ”, encoding the URL to use “%20” instead
Produces results based upon using “ ”
Resolves tag URLs which use “%20” correctly
Replaces “%20” with “ ” and produces results based upon using “ ”
Resolves tag URLs which use “_” correctly
Produces results based upon using “_”

As you’ve probably surmised by now, the issue is actually about the convergence of two technologies, and the incompatibilities they currently have: tagging blog posts (and other stuff too) and URL encoding. It is not due to the limitations that differing web1.0 and web2.0 platforms have around tag syntax, specifically multi-word tags, but to how closely these platforms adhere to the RFC 1738: Uniform Resource Locators (URL) specification.

The problem is that tagging generally uses a relatively free form syntax (driven mainly by the communities which use and propagate said tag nomenclature, or “Folksonomy”), when and where possible, but that URL encoding has a variety of reserved characters, which conflict with the characters used in tags.

Characters for special use in defining URL syntax include the following “Reserved Characters”, which should be encoded where possible (although, as the data in the tables above shows, even the encoded URLs fail to produce the expected, or required, results).

Character                   Hex   Dec
“$” (the dollar sign)       24    36
“&” (ampersand symbol)      26    38
“+” (plus sign)             2B    43
“,” (comma symbol)          2C    44
“/” (forward slash)         2F    47
“:” (the colon)             3A    58
“;” (the semi-colon)        3B    59
“=” (equal sign)            3D    61
“?” (the question mark)     3F    63
“@” (the ‘at’ symbol)       40    64
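
To make the encoding of the reserved characters listed above concrete, here is a minimal sketch in Python (chosen purely for illustration; none of the platforms tested are known to work this way). It uses the standard library’s urllib.parse.quote to percent-encode each character, and shows how encoding an already encoded URL produces the “%252B” form. The tag “web+services” is a made-up example.

```python
from urllib.parse import quote, unquote

# The ten reserved characters listed above.
RESERVED = "$&+,/:;=?@"


def percent_encode(text: str) -> str:
    """Percent-encode everything except letters, digits and "_.-~"."""
    # safe="" stops quote() from leaving "/" unencoded, which is its default.
    return quote(text, safe="")


for ch in RESERVED:
    # The encoded form carries the hex code; ord() gives the decimal code.
    print(ch, percent_encode(ch), ord(ch))

# A hypothetical tag written with "+" encodes to "%2B"...
once = percent_encode("web+services")
# ...and encoding the result again turns "%" into "%25",
# which gives the "%252B" double encoding.
twice = percent_encode(once)
# A single decode of the double-encoded form only gets back to "%2B".
decoded = unquote(twice)
```

The point of the sketch is the last three lines: a service that encodes a URL which has already been encoded ends up looking for a tag that literally contains “%2B”, which is one plausible reading of the failures recorded above.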

The above are “Reserved Characters” when it comes to URL encoding, and they include some of the most popular delimiters used by multi-word tags (specifically “+”, which is used a great deal, especially on Technorati). As I have found in the investigation above, they cause a number of issues when used both in multi-word tags and in URL encoding, so I have decided to standardise on “-” as the multi-word tag delimiter of choice.

For me it has a number of advantages:

  1. it is saved and retrieved correctly in tags in the Roller edit post page
  2. the URL is encoded correctly in Roller too
  3. it resolves correctly whilst using Google Translate
  4. it returns all search results for both “-” and “ ” in Google Search – an unexpected bonus, in terms of returning search results (and thus being included in said search results)
  5. it returns reasonable results from Technorati, based upon “-”
  6. it returns reasonable results from Del.icio.us, based upon “-”
  7. it returns reasonable results from Slynker, based upon “-”

As to the issue of upper versus lower case, I have standardised on all lower case, as this has little effect in searches (outside of Technorati, which returns slightly differing results, albeit with a low delta between the results returned).
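
The standard described above (all lower case, with “-” between words) can be sketched as a small Python function. This is only an illustration of the convention, not the way the tags on this site were actually converted; the function name and the example tags are hypothetical.

```python
import re


def normalise_tag(tag: str) -> str:
    """Lower-case a multi-word tag and join its words with "-"."""
    # Treat "%20" and "%2B" as the delimiters they encode before splitting.
    decoded = re.sub(r"%20|%2B", " ", tag, flags=re.IGNORECASE)
    # Split on runs of whitespace, "+" or "_" and drop any empty pieces.
    words = [word for word in re.split(r"[\s+_]+", decoded) if word]
    return "-".join(word.lower() for word in words)
```

For example, normalise_tag("Sun Microsystems") and normalise_tag("sun+microsystems") both give "sun-microsystems", and a tag that already uses “-” is only lower-cased.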

You may be able to see that I have started to retroactively replace the tags so far created with this new standard – however I have focused on the most popular tags for the time being, and I will continue to use this format from now on.

I found this article on “URL Encoding (or: ‘What are those “%20” codes in URLs?’)” provided a nice overview of the issues of URL encoding, and of RFC 1738 itself.

Out with the old, in with the new – blog design that is…

Given I’ve just redesigned the site, I thought it would be nice to keep a visual comparison of the old 2007 look and feel versus the new 2008 blog interface and design.

Sadly image rendering in IE and Firefox lags behind that of Opera and Safari (these are the four browsers I review the site with), so these images may be a little ‘out of focus’. If that’s the case for you please get back to me and I’ll see if I can do any more to improve the issue, prior to the two ‘largest by volume of users’ browsers catching up of course.

Example 1: UK Government G2G Messaging Sub-Systems

2007 version

2008 version

Example 2: Messaging Sub-Systems in the UK Government

2007 version

2008 version

Example 4: Evolution of UK Government Messaging Systems

2007 version

2008 version

Radical Alex Cox ‘Repo Man’ inspired weblog redesign

Thanks to Alex Cox I’ve radically redesigned my ‘blog inspired by the product branding used in his seminal 1984 film ‘Repo Man’.

Written and directed by Alex, with the ex-Monkee Michael Nesmith as Executive Producer, Repo Man stars Emilio Estevez and Harry Dean Stanton as repossession agents on the trail of a missing car with a little extra in the boot (a stolen ‘nuke). It’s a satirical and surreal comedy, widely seen as one of the first truly independent movies (along with its stablemate, the excellent ‘Rumble Fish’). It has a great, mainly Punk, soundtrack, including Black Flag’s ‘TV Party’, and songs performed by Iggy Pop, Suicidal Tendencies, The Circle Jerks, The Plugz, Burning Sensations, Fear and Juicy Bananas.

I was really struck by the product branding used in Repo Man – it’s pure and simple function over the aesthetic (with a good dollop of humorous irony thrown in for good measure).

After writing to Alex to ask his permission to use the branding as the basis of the look and feel of the site, I was very happy to receive a reply (in quick order too).

Here’s Alex’s response:

You’re welcome to use that look.

It was originally the brainchild of Ralphs Supermarket in Los Angeles, who gave us all their generic stuff. The only labels we had to make said Drink and Food.

John Lydon also used it for his PIL album, ALBUM aka CASSETTE.

So you are in good company!

All best

Alex

Permanent US bases in Iraq? Afraid so.

http://www.alexcox.com/ed_current.htm

I was really pleased about getting a reply as I’m a big fan of Alex, and of his work, and not just because he does the best Jimmy Carl Black impression I’ve ever seen.

As Alex rightly points out the look and feel was later used by John Lydon’s post Sex Pistols / Post Punk band Public Image Limited for their generic release, called ‘Compact Disc’ or ‘Album’ or ‘Cassette’ depending on the format (the branding extended to the singles released, the promotional materials, and the merchandising too).

There are other people for me to thank for different elements of the new look and feel, and of especial mention is Dave Johnson.

For those of you who don’t know Dave, he is the creator and driving force behind the Roller Weblogger (now a project in the Apache Software Foundation) used at Sun Microsystems as its ‘blogging platform of choice (it powers blogs.sun.com), as well as being a fellow Sun employee.

Whilst recently reading Dave’s blog I had an idea that the colour scheme and basic layout he used would be a near perfect springboard for the ideas I had around using the product branding used in Repo Man as the basis for my site’s look and feel redesign.

Thanks to Dave, or rather his blog, I’ve rebuilt the basic layout of this site, incorporating the Repo Man inspired look and feel. To properly credit Dave I added “Derived: Dave Johnson’s Rollerblogger blog CSS” to the header of my CSS file.

Additionally there are a few more people to thank, including www.khmerang.com, whose post on ‘Real World Bar Graphs (with some CSS)’ helped me develop the Tag Pareto / Bar Graph, which I’m using as a page leader rather than the ubiquitous Tag Cloud (although there is an obligatory Tag Cloud on my archives page).

Then there’s blogs.sun.com/junkfood, whose multiple posts on Roller Hacking, specifically ‘Roller: Re-ordering the Category Bar’, helped me develop the code for sorting the tags by frequency.
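
The idea behind sorting tags by frequency for a Pareto style bar graph can be sketched in a few lines. To be clear, this is a Python illustration of the general technique, not the Velocity template code used on this site; the function name, the bar width of 20 units and the sample tags are all assumptions made for the example.

```python
from collections import Counter


def tag_pareto(posts, top=10, width=20):
    """Return (tag, count, bar_width) tuples, most frequent tag first."""
    # Count every tag across all posts.
    counts = Counter(tag for tags in posts for tag in tags)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]
    if not ranked:
        return []
    biggest = ranked[0][1]
    # Scale each bar against the most used tag; never draw a zero-width bar.
    return [(tag, n, max(1, round(width * n / biggest))) for tag, n in ranked]


# Hypothetical tag lists, one per post.
sample_posts = [["sun", "roller"], ["sun", "css"], ["sun", "roller"]]
bars = tag_pareto(sample_posts)
```

Each tuple could then drive one bar, with the width used as a CSS width or a repeated character; a Tag Cloud would use the same counts but map them to font sizes instead.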

That then led me to develop two other new pieces of functionality using Roller’s built-in template scripting language, ‘Velocity’.

Firstly a new menu which incorporates both the page menu functionality and the category menu functionality – it’s included in the banner at the top of the page.

Secondly a new recent posts menu which adapts to the currently chosen category and, instead of linking to the individual page for an entry, links to an anchor on the page of the main blog (also current category dependent). The advantage here is that the reader still has the choice of reading posts around the target post, if they attract the eye.

I also have to thank the friends and colleagues who read and reviewed my blog. A constant theme that came out in their comments was that the nature of the site was too diverse. Amusingly Walter Milner had this to say, which I thought was the most succinct (and most humorous):

So relating to blogging – we have multiple aspects of our personalities, and I suspect that if you mix them on the same ‘channel’ you construct a confused message. One aspect is working at Sun/programming in C/PRINCE, another is a bizarre experience of a walk-on role in ‘The Birds’. I think you should separate them.

And why have you got some 5 channel paper tape as your banner? ;-)

However many of the reviewers wanted different things from my blog. Family (and some friends) generally wanted light fluffy stuff, like what’s going on at home, and what music am I listening to. Work friends generally wanted Sun Microsystems specific content. Whilst friends I had made in the IT Industry generally wanted generic technology information and opinion. Personally I also wanted to be able to blog about contemporary issues and news, in fact to use my blog as a diary of the significant events occurring around me.

In an attempt to reach a compromise I rationalised my blog categories, ‘boiling’ them down to only four categories (not including root, or ‘All’, which makes five). The four are: (1) Home – personal stuff, what record I’m listening to, etc., (2) Life – contemporary news, etc., (3) Tech – from micro IT to macro IT, technology and the technology industry, and (4) Work – stuff about Sun Microsystems, etc.

I’m hoping that by simplifying the categories down to four core areas, and by providing category specific functionality (now and more in the future), it will be easier for readers of this blog to navigate and find the stories and information pertinent to them.

And for those of you who haven’t seen Repo Man, obviously I recommend seeing it, and here’s a link to the theatrical trailer (hosted over at YouTube) for you to either get a ‘taster’ of the film, or remind yourself of it. In one scene in the trailer you can briefly see Emilio Estevez, as ‘Otto’, eating out of a can simply labeled ‘food’ – wonderful.

Link to above clip: http://www.youtube.com/watch?v=554AX4l1tmw

Relevant links:

Weblog language translator – blog translation on the fly with Roller specific functionality

Finally got round to upgrading my ‘Weblog language translator‘ from beta.

Key to improving it was removing the roll over based banner I had implemented (the Google translation service, which I piggy-back off of, only translates circa 3k characters, so the banner header, full of links, was using up the majority of the translation).

Obviously this points out a few of the flaws of the implementation, namely reliance on Google to provide the service (and of course a dependency on the call syntax not changing), and all of the weaknesses that follow on from relying on the Google service, not least the translatable character limit.

This time round I’m much happier with the implementation – and I’ve done a fair bit of testing to ensure it’s fit for purpose.

Unlike the other implementations out on the web I’ve added Roller specific functionality, implemented in JavaScript, creating a ‘main’ (or rather ‘weblog’) page for each language.

I did this because I wanted to tailor the service to be language specific, and because the major search engines outside of the English speaking, Google dominated, Internet, often verify that there is actual language specific content (and I want these search engines to be able to index my site, even if that’s only a couple of pages).

The code uses Roller Weblogger specific URL notations to provide the matching ‘weblog_xx’ (where xx stands for the two character country code – five characters when looking at Traditional and Simplified Chinese) to the target language to be translated to.

Currently it works for the generic weblog URL, all ‘entry’ variants, all ‘date’ variants, and all ‘page’ variants. It doesn’t work for ‘tags’ or ‘category’ variants (mainly because I haven’t had time to research the URL notation), but I hope to get this done soon. I’ll research and code up the other, alternative Roller URL formations when I next revisit the code. I find this acceptable, as it still provides a translation, however without accessing the language specific ‘weblog’ page.
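
The rule described above, where the generic weblog URL and the ‘entry’, ‘date’ and ‘page’ variants get a language specific ‘weblog_xx’ page while ‘tags’ and ‘category’ do not, can be sketched roughly as follows. This is a Python illustration rather than the JavaScript from the page source, and the path shape (/handle/variant/…), the “horkan” handle and the exact language codes are assumptions made for the example, not taken from Roller’s documentation.

```python
# Hypothetical codes: two characters each, five for the two Chinese variants.
LANGUAGE_CODES = {
    "nl", "el", "en", "es", "ar", "pt", "ru",
    "ja", "de", "ko", "fr", "it", "zh-CN", "zh-TW",
}

# The URL variants that are currently handled.
SUPPORTED_VARIANTS = {"entry", "date", "page"}


def language_page(path: str, lang: str):
    """Return the 'weblog_xx' page name for a path, or None if unsupported."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {lang}")
    # Assumed path shape: /<handle>/<variant>/<rest>;
    # a bare /<handle> is treated as the generic weblog URL.
    parts = [part for part in path.split("/") if part]
    variant = parts[1] if len(parts) > 1 else None
    if variant is None or variant in SUPPORTED_VARIANTS:
        return f"weblog_{lang}"
    # 'tags', 'category' and anything unrecognised fall through here,
    # so the caller can translate without the language specific page.
    return None
```

Returning None for the unsupported variants mirrors the behaviour described above: those URLs still get translated, just without the language specific ‘weblog’ page.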

The JavaScript is available via the page source – and you’re welcome to have a look and re-use if you wish (it’s nowhere near the nicest bit of JavaScript available – if you’d like to tidy it up at all you’re more than welcome).

I’ve also added Dutch and Greek to the list of languages that can be translated to, as these have been recently added to Google’s translation service (still no Hindi or Bengali though). That makes a total of fourteen languages, including the already implemented Simplified Chinese, Traditional Chinese (Taiwanese), English, Spanish, Arabic, Portuguese, Russian, Japanese, German, Korean, French and Italian. Plus I’ve replaced the language text with flag icons – which improves the look and feel too.

The icons are “available for free use for any purpose with no requirement for attribution” (although I thought it would be nice to credit the originating site) from FamFamFam, by fellow ‘Brummie‘ Mark James, available at http://www.famfamfam.com/lab/icons/flags/

Previously, after the initial implementation in beta, I found a variety of resources in a similar vein, none of which are Roller specific though, here’s a few examples for you to have a look at if you’re interested:

Have to admit I’m really glad I’ve tidied this up as I was starting to feel as though it was in danger of genuinely being in ‘permanent beta’, and however fashionable that is, in the apocryphal words of Steve Jobs: “real artists ship”.

Using Alternate Style Sheets to switch design

Due to the large number of images and diagrams that will accompany the articles on “UK Government G2G Messaging Sub-Systems” to follow over the next week, I’ve implemented an “Image Resize” function, to allow you to alter the image size of all diagrams in the main body of this site.

You should be able to see a section heading on the right hand side bar called “Body Image Size”; the choices are “Small” (thumbnail), “Medium” (default) and “Large” (body width).

It’s implemented using alternate CSS Style Sheets, and was inspired by Tim Bray’s site ‘Ongoing’, where Tim uses it as a technique to switch between the ‘Serif’ and ‘Sans-Serif’ font types.

I got assistance from this article “Alternative Style: Working With Alternate Style Sheets” by Paul Sowden, hosted over at ‘A List Apart’.

Thanks to Justin Hibbard, Lead Engagement Architect and Systems Engineer (SE) for the Department for Work and Pensions (DWP) at Sun in the UK, whose comment on the issue of “illegible text” on my diagrams prompted me to add this functionality. Justin also points out that images are rendered poorly on both IE and Firefox, whereas Safari (both on Windows and Mac) does a better job. Personally I find Opera has the best image rendering support.

In the future I’m hoping to use this technique to allow the readers to instantly change the look and feel of the site. I like the site look and feel as it is but change is good – and choice is even better.

A few last items before I sign off tonight:

  1. Congratulations to Gordon Brown on his ascension to Prime Minister tonight, and to Harriet Harman as the new deputy leader of the Labour Party.
  2. Congratulations to Harry Saxon on his ascension to Prime Minister last night (Whovian specific content).
  3. Pleased and proud to say that Andy and Joey were both awarded Orange belts (junior 4th Kyu), and that Donna and I were also both awarded Orange belts (adult 4th Kyu), at our Karate classes today.

Stabilising Look and Feel

Since my first post I’ve been working on getting the look and feel that I want, and although I haven’t quite finished here’s an update.

I started with the Sun Pacifico Theme, which at the time I liked a lot. However the more I looked at my blog, the more I wanted something that was, if not unique, at least “mine” – and not just in terms of the content.

Look and feel / web design was the most obvious area to change, but, to an extent I had held back a little, because I knew that it would likely mean delving into a host of technologies – only some of which I was up to date & familiar with.

I wanted something that was very minimal, similar to the Blogger Template Style “Minima” by Douglas Bowman (here) of Stopdesign (here), and as used by my friend Alan Mather on his blog (here).

I feel that the content needs to stand for itself without too many distractions calling the eye’s attention. I find that very busy websites, with lots of “eye candy”, lose detail amongst the noise. I know lots of people are enjoying using technologies like Snap (here), but I wish they would include some mechanism for the user to turn it off, as it can easily get confusing with so many link page pop-ups appearing.

As to the banner, I had previously been impressed with Damien Hirst’s Pharmaceuticals (2005), an example of which is here, an installation he had done as part of his show at the Tate Modern, New York.

But instead of tablets and pills, I thought it would be effective to use small web site logos & icons (mainly the favicon). After getting a version working on that premise I very quickly realised that there would be a variety of copyright issues involved, as well as issues in loading a banner comprised of 300 (5 rows by 60 columns) 16×16 pixel images. That was just too many calls to the web server, meaning page load time was very slow.

So this is pretty much the finalised look and feel for the time being. I’m much more pleased with the banner now and, having utilised CSS Sprites, I’ve reduced the calls to two images – both of which I cache using JavaScript at the start of the page too.
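
For anyone unfamiliar with CSS Sprites, the trick is to pack many small images into one sheet and show each one by shifting the sheet behind a fixed size box, so the browser makes one request instead of hundreds. As a rough illustration only, here is a Python sketch that works out the shift for every icon, using the 5 rows by 60 columns of 16×16 icons mentioned above purely as example dimensions; it is not the layout of the banner actually in use, and the function name is hypothetical.

```python
def sprite_offsets(rows=5, columns=60, size=16):
    """Yield (row, column, css) for every icon in a rows x columns sheet."""
    for row in range(rows):
        for column in range(columns):
            # background-position moves the sheet left and up,
            # hence the negative offsets.
            x, y = -column * size, -row * size
            yield row, column, f"background-position: {x}px {y}px;"


offsets = list(sprite_offsets())
```

With those dimensions the sheet would be 960 by 80 pixels, and the 300 separate image requests collapse into a single one.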

Now that I’m happy with the overall look and feel – I’m going to focus on Site Navigation, followed by a code cleanup, and then, maybe, back to the Design aesthetic.

There might even be time for the odd post or two.

Weblog language translator – beta

I’ve just implemented a weblog language translator, based on Google Translator.

It’s very rough and ready, deserving of the title “beta”, and very simple, but it appears to do quite a nice job of translating into the majority of the World’s most used languages.

I had just been reading The Aquarium (over here), and I was very impressed with its multi-lingual support. I don’t know how the guys are doing this but, after searching on the topic of blog translation, I’m presuming that they are actually translating the text manually (i.e. with human editors).

The languages that I’ve included are: Mandarin (Simplified Chinese), Chinese (Traditional Chinese), English, Spanish, Arabic, Portuguese, Russian, Japanese, German, Korean, French and Italian. 

I wanted to do the fifteen or so most used languages – however the sources I found disagreed slightly on actual numbers and rankings.  The sources I used to understand the breakdown of percentage of languages spoken by the World population were:

1) Dr. Dennis O’Neil’s website (here) at the Behavioral Sciences Department, Palomar College, San Marcos, California.

2) The “Languages of the World” article (here) at The National Virtual Translation Center.

3) The “List of languages by number of native speakers” article (here) at Wikipedia. 

Unfortunately it suffers from two major issues. Firstly it’s limited to the languages supported by the Google Translator service, which unfortunately does not cover a number of the World’s most used languages (notably Hindi and Bengali). Secondly the Google Translation service modifies the page links so that the “Language” links I’ve implemented are translated twice, which causes the service to fail at runtime.

Other issues include: the maximum amount of text that can be translated is limited (or appears to be, so that part of the page doesn’t get translated), the banner I’ve implemented goes awry in some of the translations, the sidebar isn’t getting translated (might be due to the text length limit issue, as the sidebar is written after the content), and, as I don’t speak the majority of these languages, I’m presuming the translation that it does is by no means as good as that of an actual, professional, human translator.

I’m going to tweak the code and look at how (and if) I can use the service to perhaps translate individual components, plus I’m going to see if the Google API can provide a more succinct and elegant dynamic solution. I had tried to implement in both Google Translator and Yahoo Babelfish, but the Babelfish service was erroring out, thus the use of Google – I might try it again later though.

I have other requirements for this functionality too: ideally it should produce pages which can be indexed by the major search engines and it should translate feeds – both RSS and Atom.

Have a look and see what you think – any opinion would be good, especially from those who aren’t native English speakers.

About me…

Hi, my name is Wayne Horkan and I’m the Chief Technologist for the UK and Ireland at Sun Microsystems.

I’ve been at Sun for almost eight years and in my current role for just over two years.

The role I have covers three main areas: Customer and Partner engagements (helping develop systems), Awareness and Adoption (helping to make people more aware of Sun and Sun technologies), and Architectural and Solution Quality (helping to ensure we reduce risk by using Standardization).

Before being assigned my current role I spent most of my time at Sun in Sun’s delivery organization, directly delivering systems and helping people in the adoption of technology. Whilst I’ve been at Sun I’ve always been part of what Sun call Customer Engineering (CE, although it also gets called Field Engineering or FE), this is the field organization which works directly with customers, in comparison to Product Engineering (PE) who innovatively develop our new technologies.

Being at Sun has given me wonderful opportunities to work at a senior level on some of the largest, most diverse and interesting, systems in the world, with some of the best technologists, business people and consultants, including:

  • SOA Design and ‘transformation roadmap’ for one of the largest UK Government organisations.
  • Identity system for an early SOA at one of the world’s largest investment banks (over 42,000 users across over 30 major systems).
  • Consolidation of 5000+ servers at another large investment bank (based out of Canary Wharf).
  • A Total Cost of Ownership (TCO) engagement with a very large ISP.
  • Technical Design Authority (TDA) and Technology Leader brought in as a White Knight at a Data Centre build out which up until then had gotten eight months behind schedule.
  • The Governance design for Sun’s largest customer engagement.

Before working at Sun I spent almost three years as the Chief Architect for Harrods, building amongst other things Harrods Online (v1 and v2, v1 was MS Commerce Server based if you remember that, whilst v2 was Sun and Vignette based).

Prior to Harrods I worked at Keane Inc., a Systems Integrator (SI), as a Technical Consultant. I spent time at Sun Life Assurance (now AXA) building a workflow and document imaging (scanning) solution, and at East Midlands Electricity (EME, followed by PowerGen, currently E.On) developing messaging subsystems and front end applications as part of the deregulation of the Gas and Electricity industries (the 19M programme as it was called).

I also spent a couple of years at Touch Systems, writing software to improve manufacturing process quality and cost, utilizing hand held data collectors, a shop floor network application environment and Statistical Process Control (SPC).

Outside of Sun I also work with a CDFI charity called Street UK, and I give (limited) advice to a CDFI collective called the Fair Finance Consortium.

I’m a supporter of professional membership organisations, and am a member of the British Computer Society (BCS), the Institute of Electrical and Electronics Engineers (IEEE), the Institute of Directors (IoD), the Lunar Society and the Information Technologists Company (ITC).

Contact details

You can get in touch with me here: wayne.horkan-AT-sun-DOT-com

About this site

This site is my personal weblog, hosted and provided by Sun Microsystems, my employer.

This blog is governed by Sun’s blogging policy, or the Sun Guidelines on Public Discourse as it’s called.

Many thanks to Linda Skrocki who recently wrote about Sun’s Revised Blogging Policy (AKA Guidelines on Public Discourse).

Disclaimer

This is a personal weblog, I do not speak for my employer, Sun Microsystems (or Sun Microsystems UK).

Copyright

This work is licensed under a Creative Commons License.

Copyright 2007-2008, Wayne T. Horkan (wayne dot horkan at sun dot com).

FAQ

“I’d like you to come and present to my organisation on…”

  • Sun’s product portfolio, strategy, etc.
  • Futurology.
  • Enterprise Architecture.

Get in touch (see above) and let’s talk.