Archive for January, 2008

Internet-grade

It’s probably been coined already, and I’m sure it’s not a new realisation. Something happened at my employer recently that’s made me wonder whether the old benchmark of “enterprise-grade” is really relevant any more.

Our internal IM system was closed down for a while this week, and when it was restarted a number of us could not reconnect. It turns out that the IM servers had been set up to lock this particular client out. Nothing unusual about that really, as it has happened in the past with unsupported clients that stress the servers in unexpected ways.

What was different this time is that the client in question is part of a new “integrated communications” offering — a version of our e-mail client that has the IM client built-in. This product, which will be sent to market quite soon (and which we will therefore be expecting our customers to buy), has been locked out of our IM infrastructure. The further irony is that the part of the business that markets this software runs a “use what we make” initiative to get people to use development versions of their software in their day-to-day work.

The IM system in question is marketed as enterprise-grade — and in general it lives up to that, having to support a couple of hundred-thousand users at peak. What got me thinking though is that systems like MSN Messenger (or whatever it’s called now) and Yahoo! IM and AOL IM must be supporting millions of connections at a time with nary a blink.

So (if it wasn’t already) I’m knocking “enterprise-grade” off the top-spot of reliability rankings. Nowadays, the top spot surely goes to “Internet-grade”. I mean, just imagine the amount of traffic that must pour through Google Talk and Skype — these are systems that not only do text chat but voice and video as well — while our IM is still struggling with smilies and changing fonts. The trouble, in the case of my employer, is that the name of this IM service is synonymous with the concept of IM there. It doesn’t matter that even an open system like Jabber could scale better.

In my opinion, our software people need to take a look at what Google has done in taking XMPP/Jabber and creating Google Talk. Either that or the company needs to do what another prominent software company did and actually use one of the public IM systems (I can’t remember which one they use, either YIM or AIM) as the corporate IM platform.

I feel for the developers of the new client, who I’m sure would love to have a stable environment to do a large-scale test on. Oh well.

OpenTTD

So I was catching up on the RSS feeds I subscribe to, and came across an article on the latest issue of Full Circle (a magazine about goings-on around Ubuntu Linux). In it I found an article on OpenTTD, an open-source clone of the old ’90s game Transport Tycoon Deluxe. As one who spent many an hour in front of games like Railroad Tycoon in my youth, I had to try it. Unfortunately, I’m hooked…

I’ve been playing the game all night since I found it on Monday afternoon. Sleep seems a distant priority compared to making sure I snag the subsidy for a passenger service from Podlondlington to Nunmubhattan…

It’s easy to install on the Ubuntus, but you do need to obtain the data files from the original CD — the Full Circle article contains instructions on how to do that (or I’m sure the website tells you).

Sure, the graphics don’t measure up to today’s insane system-melting specifications and the isometric view, while state-of-the-art in its day, is at times frustrating (I’m sure there was a control you could use to hide the buildings so you could see behind things… maybe I’m thinking of Lincity). Still, it’s both a great bit of entertainment and a trip down memory lane at the same time. If you’re like me and played with the Tycoon games as a kid, or if you’re a bit of a retrogamer, I encourage you to check it out. Don’t expect to see much of your family for a while though… :)

Cisco XML apps: things made of fail

Since I have a few Cisco phones around here, I’ve played with XML apps. I have written a timezone calculator, an LDAP phone directory lookup utility (which hooks into the “external directory” function of the phones), an app that uses Qantas’ WAP interface to get flight arrival/departure information, and the obligatory RSS reader. They work, in some cases very well, but the inconsistency of the XML interface between different levels of the Cisco firmware makes it a trying exercise.

My latest exercise was an update to the RSS reader I’ve used for ages. I found RSS2Cisco ages ago and have used it quite successfully, but I’ve never really been satisfied with its way of displaying the whole feed in one text page. It works well for news feeds, where all you get is a headline and a teaser, but for things like blogs it’s not suitable (you’re lucky to get through one posting before hitting the limit of a Cisco XML text page). I wanted an interface like a “normal” RSS reader, where it lists the items in the feed in a menu and you then choose an item to be displayed.
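
For the curious, the menu half of that idea can be sketched in a few lines of Python. This is only an illustration and not the rss2cisco code: the function name `rss_to_menu`, the `item_url_template` parameter, the `max_items` cap and the example URL are all hypothetical, and it assumes the feed is plain RSS 2.0 and that the phone accepts a basic `CiscoIPPhoneMenu` object containing `Title`, `Prompt` and `MenuItem` elements.

```python
import xml.etree.ElementTree as ET


def rss_to_menu(rss_xml, item_url_template, max_items=20):
    """Turn an RSS 2.0 feed into a CiscoIPPhoneMenu listing its items.

    item_url_template is a hypothetical URL pattern containing "{index}";
    the phone would request that URL when the matching item is selected.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")

    menu = ET.Element("CiscoIPPhoneMenu")
    ET.SubElement(menu, "Title").text = channel.findtext("title", default="RSS")
    ET.SubElement(menu, "Prompt").text = "Select an item"

    # One MenuItem per feed item, capped so the menu stays a manageable size
    # (max_items is an arbitrary assumption, not a documented phone limit).
    for index, item in enumerate(channel.findall("item")[:max_items]):
        entry = ET.SubElement(menu, "MenuItem")
        ET.SubElement(entry, "Name").text = item.findtext("title", default="(untitled)")
        ET.SubElement(entry, "URL").text = item_url_template.format(index=index)

    return ET.tostring(menu, encoding="unicode")
```

Something like `rss_to_menu(feed_text, "http://example.invalid/rss?item={index}")` would give the phone a list of headlines, and the URL behind each one would return a second page holding just that item’s text.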

Sounded simple, and wasn’t too hard to hack rss2cisco around to make it do my bidding (it’s not optimal yet as every time you read an item it pulls down the entire feed again). The problem I faced was in making the thing work consistently between the 7960 phones and the 7970s.

All my phones are running fairly recent SIP code, but for some reason the 7960 has an ancient XML parser. By ancient, I mean that the level of the XML SDK it supports is tied back to Call Manager 3.0. The 7970s, on the other hand, have support for a much more recent SDK and support some of the fancy operations that you can’t do on a 7960 unless you’re running SCCP firmware. At first I thought that there might have been a hardware limitation and that Cisco couldn’t fit the extra smarts of a client for later SDKs, but the SCCP code can’t be that much simpler than SIP that they’d have more room to fit a better XML browser and all the other features the SCCP code has over SIP…

So the SIP firmware for 7960 has a junk XML browser. You’d think, then, that the 7970 was easier to work with than the 7960… Wrong! Valid XML that worked quite happily on the 7960 would fail with a cryptic “XML Error[4]: Parse Error” message. It took quite a bit of time and quite a bit of trial-and-error to work out some of the dependencies (32 seems to be a magic number, folks…).

Call Manager XML (CMXML) is supposed to be really simple, but I can only imagine how complex it might get to deliver an app with a consistent interface if you had a number of different phone models to support — I have only two, and I’m looking at two different versions of my app!

In their defence, Cisco have provided a way for the phone to identify itself and its SDK level when it makes a request. A set of HTTP headers identify the device, and one specifically states the SDK version supported by the phone client. Reading these headers would allow a developer to adjust the output of their app to cater for the various phones — one app, but multiple output capabilities.
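
As a rough sketch of what “one app, multiple outputs” might look like, the Python below picks a layout from the request headers. Treat the details as assumptions: I believe the SDK level arrives in a header called `x-CiscoIPPhoneSDKVersion`, but check that against what your own phones actually send; the layout names `basic` and `extended` and the `(4, 0)` threshold are made up purely for illustration.

```python
def parse_sdk_version(headers):
    """Extract the SDK version from request headers as a tuple of ints.

    Assumes the phone sends it in an "x-CiscoIPPhoneSDKVersion" header
    (header names are matched case-insensitively). Returns () if the
    header is absent or does not start with a dotted number.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-ciscoipphonesdkversion", "")
    parts = []
    for piece in raw.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def choose_layout(headers, extended_min_sdk=(4, 0)):
    """Pick a hypothetical output layout for the requesting phone.

    Phones reporting at least extended_min_sdk get the "extended" layout;
    everything else, including unidentified clients, gets "basic".
    """
    if parse_sdk_version(headers) >= extended_min_sdk:
        return "extended"
    return "basic"
```

Falling back to the basic layout when the header is missing means an unknown or older phone still gets something it has a chance of parsing.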

It strikes me though as a heck of a lot of work for limited return. These are phones intended for corporate installations, so it’s almost a given that there will be a full-function computer at the same desk. Why would a company invest that much effort developing and supporting an internal application for a single platform that’s tied to desks, when they could write it as a web app and deliver it practically anywhere? I’m starting to see why the Internet is not exactly awash with sites selling CMXML apps…

Having said that though, I love my timezone calculator. With three button presses I can find out the time in any of my six favourite timezones, and I can find any timezone in the world with only a few more presses. An application somewhere on the web couldn’t be anywhere near that speedy for me, and a desktop app would have to be some kind of widget already running and configured (or be the KDE Clock applet, all it takes is a mouseover… shame I’m stuck with GNOME for my work desktop).

So I’m not too keen to apply much development effort to my XML apps. I will stick them on my development site some time soon, but I don’t think it’s worth the effort to keep them functional. The Qantas one, for instance, is totally dependent on the URL and query format of the Qantas WAP application, which is obviously subject to change at any time. I wonder sometimes if a WAP-XML gateway would be useful, but then I think about the effort of writing a system to translate pages delivered over a dying protocol to an interface that never got off the ground…

In case you’re curious what the RSS reader looks like:

and something a bit more voluminous from my blog:

Yes, I am a bit proud of it, even though it’s rubbish… ;)

KDE 4.0: be free.


Since I watch Planet KDE it was easy to get caught up in the excitement around the launch of the new version of KDE (the announcement is here). I was unable to resist giving it a try on the laptop! So this post is coming from Konqueror 4.0.0.

I tried an early Beta of the KDE 4.0 Live CD, but it was still using the KDE3 Kicker and was also a bit unstable. I wasn’t sure if it was the fact I was running in a virtual machine that made the graphics a bit flaky or whether it really was beta-quality code making things a bit funny. The KDE team put a lot of effort into bug-swatting in the weeks leading up to 4.0 being tagged, and it’s a lot better now!

This announcement from the Kubuntu folk shows how to get the KDE 4 packages installed on Gutsy. KDE 4 installs in a different path to KDE 3, so you can try out KDE 4 without affecting your existing environment.

I did have a bit of a heart-starter with this though, as apt-get wanted to remove a package called “kdebase-bin-kde3”, which looked risky! It’s okay though, as equivalent binaries are provided by “kdebase-bin-kde4”. In fact, if you follow Kubuntu’s instructions exactly, you should not see the issue: it happened to me because I did a system update after adding the Kubuntu PPA repository but before installing KDE 4. The system update brought a bunch of updated KDE 3 packages out of the PPA, one of which was to replace the standard “kdebase-bin” package with a “kdebase-bin-kde3”.

First impressions are that Oxygen (the new artwork for 4.0) looks great — it’s a very modern look. Some might think it borrows from Vista, but to me it’s got as much of Mac OS X’s appearance as that of Aero. Plasma (the desktop shell) does some interesting things, like turning desktop icons into widgets, but I’m yet to spend enough time with it to experience the other improvements it brings.

The biggest thing I’m looking forward to trying out is the compositing built into the window manager, KWin. Unfortunately the laptop is a bit old for this to work well (or at all in fact), so I’ll either have to find some magic Xorg setting or get the KDE 4 packages on the desktop machine. I’ve had trouble running Beryl and Compiz thanks to something about the terminal program Yakuake tickling a long-lived bug in X11 (I think part of the reason it’s long-lived is that the X11 folks don’t accept it as a bug but rather a fringe case that Yakuake shouldn’t be exercising, hence a stand-off) so it will be interesting to see if KWin has the same kind of issue.

As for bugs, well there look like plenty. :) As I’m keying this, Konqueror is chewing 100% CPU and the characters are delayed by a couple of seconds (and of course, now that I observe this, it stops doing it). Still with Konqueror, this is about the third time I’ve tried to post this thanks to Konqueror segfaulting for strange reasons. Also, the Alt-F2 program launcher reports that it was unable to launch whatever you told it to, even though it does so successfully.

There has been plenty written by the KDE folks about the “1.0.0 release of KDE 4”, and they’re copping a fair amount of stick from people who think they’ve done the wrong thing by releasing as 4.0.0. I’m on KDE’s side. Although many KDE folks have used their KDE 4 builds as their daily desktop for months, I haven’t seen anyone who wears a KDE hat recommending that others do so. The term “will eat your children” has been used to describe KDE 4 by folks from the KDE team, so there has never been any pretense that KDE 4.0.0 would be a daily desktop for the majority of users. I’ve never really participated in large-scale software development, but I can see their motivation for releasing what they had as 4.0.0 — I’m proof of it. As long as it was a beta I was not really all that fussed about trying it out; even after there were release candidates I wasn’t all that keen. As soon as you call it a release, however, your early-adopters rush in and kick the tyres and your real testing can start.

By being open about 4.0.0’s status (and I don’t think you can get more open than “will eat your children”), they can make sure that subsequent releases are a lot better than they would be if they dragged on in perpetual beta — the model that Google and the Web 2.0 fraternity seem to insist is better, plodding on for months hiding behind beta status and its implicit “get out of jail free” card.

Instead, KDE has shown the courage to take their code, along with its bugs, and hold it up as something they are proud to give to the world. It’s the foundation not only for future releases of KDE, but possibly the start of new ways that people work with their computers. By working with the community, instead of closeted away from it, I believe the KDE team will succeed.

Okay, so that finished a bit more ra-ra than I planned! Seriously, give KDE 4.0 a try… but if you aren’t happy to suffer a few bugs then by all means wait until 4.0.1 or even 4.1. Oh, and be free. :)

Jabber and Google: part two

In part one I mentioned how I was considering using Google Talk as my main chat ID. As it turns out, I talked myself out of it pretty quickly after I delved into using Google Talk to connect to MSN and other services as I do now with my own Jabber server. While there are a lot of links around for using Jabber transports to hook your Google Talk ID to other services, there’s a tiny catch… well, actually, I think it’s a bloody great huge catch personally.

You see, it wasn’t until I read the how-tos that it became clear how it works. The trick is that Google doesn’t run Jabber transports on their own servers, so you therefore need to take advantage of various “open” Jabber servers that do (“open” in this context refers to a server that lets you use its transports without necessarily being a registered user there).

Seeing there didn’t seem to be any restrictions on the servers that could be used, I figured that I could use my own server. Sure enough, after the right incantations to expose the service on the ‘net, I could connect my Google Talk ID through the Jabber-MSN transport on my server to my MSN account. Yay, right? Well, not really — each little test message I sent in either direction incurred three trips over my Internet connection! Yes, three: one to go from my Google Talk client to Google, one back from Google to the transport on my Jabber server, then a third from the transport to MSN. Obviously the same happens in reverse as well (for incoming messages from MSN).

Seeing this as a less than optimum setup, and also being wary of getting listed as a Google Talk-friendly Jabber transport provider, I lopped the transport’s external visibility and went back to using my own JID for transport access. It’s a bit of a shame too; since fring (mentioned briefly in my last post) doesn’t let me connect to an arbitrary Jabber server, to keep connected to everything I’d need two mobile chat programs running.

It’s not like I do that much IM that I need to keep all this running, but it is at least a little bit interesting… :)

Which Nokia device to get?

I’ve developed a very strong desire to be connected to people recently. In the last fortnight I’ve reawakened my Google account and regularly sit on Google Talk, reawakened an old Free World Dialup account and plugged it into my home phone system, and signed up to Twitter. I also found a mobile IM and SIP client called fring that looks good and works really nicely. I’d love to use fring constantly, thanks to its integration to Twitter and Google Talk (heck, it might even make me find my old Skype ID) but…

My current phone is a Nokia N70, which has served me well for a couple of years, but I’m not keen to use it too much for fring because I don’t have a mobile data plan (and my phone company charges fairly steeply for casual data). Besides, it’s only UMTS 3G so the data rate is not great (better than GSM data, but only occasionally so). What I really need is one of the newer devices around that has Wi-Fi built in. Something like the N80, new N82 or E51, or N95. That way I could use fring at home (which is where I am most of the time nowadays) and not have to worry about data costs.

Thinking about spending that kind of money though (again, my phone company is happy to talk to me about upgrading my handset, but the kind of plan I’d have to go onto to get a phone like that would be insane) makes me wonder about other devices. Something like the N800, or even a new N810. I don’t think fring is available on Nokia’s tablet devices, but with the alternate OS platform on the N8x0 I could install just about any kind of IM client I want. Plus I’d have a nice device to web-surf, program MythTV, check mail, and various other tasks.

What about other devices? The Asus EeePC has tweaked my curiosity, but I think it would end up being just a bit too large to fit in with the kind of usage I’m imagining for this type of device. Blackberry is a bit scary to me: it doesn’t really seem to be a general-usage consumer-oriented device (more a corporate connect-back-to-the-proprietary-box-in-the-server-room kind-of thing). The iPod touch is out as well: its closed nature would frustrate the heck out of me (it’s got a browser, but you can’t load anything on it…). The only other manufacturer I’d think about for a mobile device right now is Sony-Ericsson: Ericsson manufactured a couple of the nicest phones I’ve ever owned, but Sony has ruined them for me. I’m just not interested in getting back onto the hardware-to-lock-users-to-the-Sony-tower treadmill.

It’s all just navel-gazing, unfortunately. Realistically, I can’t justify dropping a wad of money on some new shiny just to satisfy what is probably just a bit of a personal fad. I think I’ll wait a bit longer and see how quickly the newly-released N95-8GB drops in price, or how far it pushes the price of the old N95 down — ditto the N810 and N800.

Oh, and I’ll wait for fring to fix my biggest issue: no support for Jabber. Queries on their forum on this have gone unanswered for almost a year. Technically it can’t be a big leap for them, as they have support for Google Talk!

Jabber and Google, part one

I reactivated an idle Google account the other day. A friend of mine from the Netherlands invited me ages ago but I never really did anything with it until I discovered that a Google Mail account can be used for other Google stuff as well, including Google Talk. I read that Google Talk is based on Jabber and works with any Jabber client, so I flicked over to Kopete and plugged in the details. Sure enough it worked… but then it got interesting.

I run a Jabber server for internal things. I wanted to have a secure, private chat facility to use over VPN with my nephews; I want to someday migrate my Nagios IRC bot to Jabber; and I use transports to link into MSN and Yahoo! to reach friends on those networks. The last point is great: I really like the fact that now, from whatever Jabber client I use (even the mobile ones I’ve played with), I merely connect to my Jabber server and I’m online on MSN and Yahoo! as well.

Google Talk, though, has proven to be a bit of a challenge. It’s actually working like a tower, even though it’s based on (arguably) the most open of the IM platforms! You see I more-or-less took for granted that “transport” way of doing things, using my Jabber server to bridge to other networks. There’s no Jabber transport for Jabber though!

What I want to do kind-of flies in the face of how Jabber is designed. Ideally, you’re supposed to only have one Jabber ID (JID) — Jabber creates an open network with servers establishing connections when needed, very much like e-mail, and you only need an ID on one server to be able to chat with anyone on any other server. So what I wanted to do, which was connect to one Jabber server and have it “relay” messages to an ID on a different server, is just not necessary with Jabber. Nor should it be necessary for Google Talk users to send messages to me using my Google Talk ID only — they can send straight to my JID on my Jabber server.

In the early days of Google Talk, Google had not enabled the “server-to-server” functionality that allowed this kind of communication to happen. Google Talk worked just like MSN, Yahoo! or AIM — you had to have a Google Talk account to chat with anyone on Google Talk. While this was the case, folks were looking at making a Jabber-Jabber transport for connecting Jabber servers to Google Talk. At some point, though, Google opened the connectivity paths that allowed Google Talk to exist on the open Jabber network (I’ve tested this for myself). Once this happened, the need for a “Google Talk Transport” for Jabber evaporated in most people’s minds.

The solution nowadays is to use a client that supports multiple connections, and connect to your Jabber and Google Talk accounts at the same time. It works of course, but you don’t get the nice benefits that a transport provides — the main one being access to all your IM services and accounts from a single server connection.

So now, having resigned myself to not being able to bring my home JID and Google Talk ID together, the question arose: do I still need my own Jabber server? My current fave mobile IM client only connects to Google Talk… Could I get by just using the Google Talk service? Find out in Part two! :)

Gentoo + jabberd = aargh

I’ve been running jabberd2 from ~x86 for ages. Tonight I went to make some config changes, and stopped and started jabberd using the init script like usual. Things were different though, as the init script didn’t shut down all the Jabber tasks and I had to stop them manually. When I went to restart it, only two processes were shown and not all the separate processes I was used to.

Nothing was being logged either, as I was trying to find out what was going on and why the processes weren’t starting. It was as if it was suddenly ignoring all my configuration files!

Careful inspection of some output from eix showed the problem: Jabberd 2 has been moved to its own ebuild (jabberd2), and the highest version in the jabberd ebuild is now a 1.4.4-something. Not only that, they’ve hard-masked jabberd2:

# Krzysiek Pawlik  (08 Oct 2007)
# Masked untill the split from net-im/jabberd is complete.
# See bug #178055 and bug #195091
net-im/jabberd2

Looks like the last time I emerged I downgraded my Jabberd 2 to 1.4. No wonder the thing was not responding to me.

This is the kind of thing that happens on Gentoo from time-to-time. It’s why I started a regular process that syncs portage and e-mails me the output of an emerge --pretend world: so that I didn’t get too far behind and have a heap of these things to sort out. This one caught me off guard though.

Note to self: pay closer attention to emerge output in future!
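
One way to make that attention automatic would be to have the nightly job scan the pretend output for downgrades before mailing it. The Python below is a minimal sketch under a couple of assumptions: that emerge marks a downgrade with a `D` among the flags inside the `[ebuild ...]` brackets, and that each package line looks roughly like the one in the comment. The function name `find_downgrades` and the version numbers in the comment are illustrative only, not copied from my system.

```python
import re

# Assumed line shape: "[ebuild     UD] net-im/jabberd-1.4.4 [2.0]"
# (flags inside the brackets, then the package atom being merged).
EBUILD_LINE = re.compile(r"^\[ebuild\s+([^\]]*)\]\s+(\S+)")


def find_downgrades(pretend_output):
    """Return the package atoms whose flags include D (downgrade)."""
    downgrades = []
    for line in pretend_output.splitlines():
        match = EBUILD_LINE.match(line.strip())
        if match and "D" in match.group(1):
            downgrades.append(match.group(2))
    return downgrades
```

If the list comes back non-empty, the e-mail subject could shout about it, which is a lot harder to skim past than one line buried in a long emerge report.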

OpenLDAP database recovery

Something ugly happened to my LDAP database a while back, and I never noticed. I saw it had lost a bunch of records, but I’d put it down to some replication problem and never investigated. It wasn’t until I tried to replace one of the lost records, and got an error from LDAP telling me the non-existent record already existed, that I figured something was really wrong.

Multiple iterations of db_recover, attempts to re-index, dump-and-restores of the raw Berkeley DB files… Nothing helped. In the end, all that was left was the slapcat-delete-slapadd dance.

(You know that your OpenLDAP is especially sick when commands like slapcat generate glibc backtraces. :( )

So with what was left of my LDAP data, I started to compare against my replicated LDAP server. The first thing I noticed was that a number of records that I expected to have been replicated were not. I figured that records in the master directory that were lost to database corruption and not to an LDAP operation (a modify or delete) should have been present on the replicated copy. This was not the case, which makes me think that replication only takes effect after the master directory’s backend is updated, and if something like a corrupted database prevents the master from being updated then the replication doesn’t take place. As Zaphod might say, ten points for directory consistency but minus several million for data preservation… :)

(As I think about this though, the more it doesn’t make sense. If slapd had been unable to update the backend, and hence the replication didn’t take place, surely that would have been returned to me as an update error? I know for a fact that the data I lost made it to the database because I tested an app using the data. It’s unreasonable to me to think that BDB would have returned success on a write operation unless it had actually done so, but I suppose write-caching might create an opportunity for that to occur… No, I suspect a different problem, maybe just replication being suspended at the time, as the real reason that some data was missing from the replica.)

Next I found, despite what I thought was happening based on the lost records, there were quite a few records that were on the replica. This makes me think I’ve had multiple failures, apparently at different times, that have impaired my master directory — one that caused new updates to be lost, the other resulting in loss of existing data.

I’ve added a step to my Bacula processing that performs a slapcat and backs up the resulting LDIF, so if anything happens in the future I have a bit of a chance of running through old files and restoring. The other thing that I’ll kick off is a process to verify the accuracy or integrity of the replica — this might tip me off to a problem sooner rather than later.
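
The replica check doesn’t need to be clever to be useful. As a first cut, something like the Python below could compare a slapcat dump from the master with one from the replica and report which entries exist on only one side. It is a sketch, not what I’m actually running: the function names are hypothetical, it only compares DNs (not attribute values), and it assumes both dumps are ordinary LDIF where long lines are folded with a single leading space.

```python
def ldif_dns(ldif_text):
    """Return the set of "dn:" lines in an LDIF dump, with folded lines rejoined."""
    dns = set()
    current = None
    # The extra "" makes sure a DN on the very last line is still recorded.
    for line in ldif_text.splitlines() + [""]:
        if current is not None and line.startswith(" "):
            current += line[1:]  # continuation of a folded DN line
            continue
        if current is not None:
            dns.add(current)
            current = None
        if line.lower().startswith("dn:"):
            current = line
    return dns


def compare_dumps(master_ldif, replica_ldif):
    """Return (missing_from_replica, extra_on_replica) as sorted lists of DN lines."""
    master = ldif_dns(master_ldif)
    replica = ldif_dns(replica_ldif)
    return sorted(master - replica), sorted(replica - master)
```

Feeding it the text of the two LDIF files and mailing out any non-empty result would at least flag records that never made it across, which is exactly the symptom I missed this time.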

My theory on what the cause of this hassle was? Well a while ago I was having a bit of trouble with partitions filling. At a guess I’d say that OpenLDAP was trying to do something (update a transaction log maybe) at a time when the partition its data lives on was full, and got twisted. Soon I’m going to write a separate post with my (updated) thoughts about isolation of failure domains…

For those that haven’t seen it, here’s the process I used to get things back:

# cd /var/lib
# slapcat > whatsleft.ldif
# /etc/init.d/slapd stop
# mv openldap-data openldap-data-old
# mkdir openldap-data
# chown ldap:ldap openldap-data
# cp -a openldap-data-old/DB_CONFIG openldap-data/
# cd openldap-data
# slapadd < ../whatsleft.ldif
# chown ldap:ldap *
# /etc/init.d/slapd start

Obviously if you find yourself in the unfortunate position of having to use this process, substitute your distribution's values for the path to the OpenLDAP data directory and the user/group that LDAP runs under.