Archive for category Linux

LDAP-backed DNS and DHCP…?

I’m having a bit of an infrastructure redesign here at the Crossed Wires campus.  Each time I have an outage (the last one was caused by a power failure) I learn a little more about the holes in my current setup and what I can do better.

I’m implementing a router box on an old low(-ish)-power PC that will be backed up by a virtual machine on my main virt-box.  I’ve already done most of the preparation for using keepalived to implement VRRP, and a colleague has given me some pointers on using the Linux-HA tools like Heartbeat and DRBD to make services like e-mail and Samba redundant.

I’ve had a soft spot for LDAP for ages; I’ve always thought that putting as much backend data into LDAP as you can would be a really good way to get failover and redundancy.  Instead of having to deal with every single server’s different way of doing replication and failover, just bung everything into LDAP and get that replicating.  Sounds good in theory, but in a nutshell it’s not working out that way for the two least-celebrated but most important components of my (arguably any) network: DNS and DHCP.

There are a number of LDAP-backed DNS projects out there.  If I’m willing to go to the bleeding edge with BIND on my Gentoo build I can get access to the two most talked-about ones (bind9-sdb-ldap and the BIND DLZ LDAP driver), and other solutions like PowerDNS and ldapdns are available.  But none of them offer integration with DHCP, and I’m currently using dhcpd’s “interim DDNS update method” to make sure that hostnames are seen in my DNS when a lease is granted (okay, there’s a Perl daemon that goes with bind9-sdb-ldap, but it seems like a sort-of clunky afterthought).

Speaking of DHCP, LDAP backends for it are virtually non-existent.  The only LDAP-enablement I’ve found for ISC DHCP involves putting the config file into LDAP, not the leases…  I used that for a few days a while ago and pulled it out because it was more work to do it that way (and gave no benefit in failover).

It seems to me it would be a project ripe for the picking: take an integrated DNS/DHCP server like dnsmasq and make it write into LDAP instead of to a file.  If I had more free time I’d probably have a go at it, except for the fact that no-one really seems to be that interested in storing DNS and DHCP in LDAP: that it hasn’t been done says to me that there’s no demand for it, and it’d end up being a big waste of time and effort.

Over to you, lazyweb…  Is this a yawning chasm of unfulfilled networking dreams, or a case of me trying to make something more complex than it needs to be?  After all, the rest of the world gets by with DNS master-slave and DHCP failover, they should be good enough for me too, right?  ;-)


Trouble with apt-get and Squid

I recently started having trouble with APT transactions on my Kubuntu desktop. “apt-get update” would fail for some source entries with the error “The HTTP server sent an invalid reply header”. I thought it was something specific to (K)Ubuntu, but when I had the exact problem on my NSLU2 running Debian I figured the problem must be elsewhere…

I’d recently updated the machine that provides the transparent web proxy function for the network; one of the updates took Squid up to version 3.0 (from 2.6). This was the first thing I was suspicious of.

There’s an option in Squid that controls how it handles an “If-Modified-Since” request from a client. The default is for Squid to respond based on the age of the item in the cache, not based on the real item on the source web page. The comments in the Squid config file indicate that some clients use an IMS when requesting a reload — looks like APT is one of those clients.

Setting this option to “on” (from the default of “off”) in squid.conf fixed the issue for me:

refresh_all_ims on


Comments and Downtime

Observant readers will notice that they are no longer able to respond to posts. The blog-spammers have won the battle but, as they say in the classics, they will not win the war…

I've turned off the comment capability until I can get something in place to bring the rubbish under control (a recent update to PolarBlog helped a bit, in that the crap doesn't display on the site any more, but when I log on I still get to see the mess). I'm thinking of a new site, where I can discuss technical stuff a bit more thoroughly while keeping the private stuff separate if I need to.

The site has had a bit of downtime recently, due to my non-existent monitoring of what's happening on my hosted server. This will change shortly, and I'm looking forward to things returning to the stability they had when I was self-hosting.


Photo printing pain

S went to print some photos the other day, and what was supposed to have been a simple exercise turned out to be a very frustrating one for both of us. I was utterly amazed to discover that even on the eve of 2009 there are web sites that think the world is only viewed through Windows…

S's and my respective creative sides are being adequately satisfied by the iLife suite on the Mac, but there are times when we need to get the pictures out of the silver tower and onto other media—on this occasion paper, for albums and so on. A large retailer here has part of their floor space in each store set aside for those photo printing kiosks, and I introduced S to the art of putting photos onto a USB stick so that she could print some photos when next she went there…

On her return from the shop, she reported that we hadn't successfully put the photos she wanted onto the stick. When she'd plugged the stick in, she'd found fewer than half of the photos we'd stored there. Sure enough, when I plugged the stick in all the files were there safe and sound. Strange thing was I could find nothing in common about the files (uppercase/mixed-case filename, long or 8.3 filename, datestamp, etc.) that would explain why the kiosk had found only some of them.

Annoying, but life is too short to worry about it. After all, this same retailer was plastering adverts of their new web-based photo printing service… S could submit the photos online for printing and pick them up from the store later.

<sarcasm>This is where the fun really started.</sarcasm>

Their app is Flash-based but seems to have some Java involved as well. While it loaded quickly enough, the app portion of the web page had an incongruous grey background that just looked dodgy. S had to create an account and sign onto the site just to get this far though, which was a bit annoying.

The workflow seemed to be to create an album, upload pictures to the album, then select photos from the album for processing. Creating the album went fine, but when the upload function was selected there were no action buttons visible to complete the operation! S was using Safari, but Firefox made no difference.

Then I suggested she use her laptop, which runs Ubuntu 8.04. The situation actually seemed a bit better to start with, as instead of the upload function showing an embedded file selection dialog like it did on the Mac we got a "normal" GNOME file dialog box. However, only some of the photos showed again: this time, it was because they had hard-coded a non-modifiable filename filter for the dialog that was only picking lower-case file extensions!
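For what it's worth, the lower-case-only filter is the sort of bug a case-insensitive comparison avoids. Here is a minimal sketch in Python; it is not the retailer's code (which I never saw), and the extension list and function name are my own assumptions:

```python
from pathlib import PurePath

# Assumed set of photo extensions, stored in lower case only.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_photo(filename: str) -> bool:
    """Match on the file extension regardless of its case."""
    return PurePath(filename).suffix.lower() in PHOTO_EXTENSIONS


if __name__ == "__main__":
    for name in ("IMG_0001.JPG", "holiday.jpg", "notes.txt"):
        print(name, is_photo(name))
```

Lower-casing the suffix before the lookup means a camera's IMG_0001.JPG and a hand-renamed holiday.jpg are treated the same way.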

Trying to work around this, I mounted the stick manually with different mount options. I succeeded in getting all but one of the files showing with a lowercase name, and a rename fixed that one. Back in the web page however, it still didn't like us: any file chosen from the dialog box resulted in a nonsensical error message followed by a "You have selected no files to upload" dialog.

S was beyond caring by this stage (she has a very low threshold for being stuffed around by technology). She went to Snapfish after a friend's recommendation, and found a well-designed and easy-to-use web site that required no downloads or other junk.

So why did this wind me up to the point of spending all this time blogging it? Because nowhere on Big-W's site is there any mention of browser or operating system compatibility. Not even a "we've tested only on Windows, Mac users may experience difficulty"[1]. Not a blessed thing. Their Help page has a single paragraph about trouble uploading, blaming "your IT Department" for "setting certain network properties that inhibit the upload tool from working".

I wonder if the developers of the app were just so blind as to believe that their gunk would just work wherever it was run, or whether they really think that it's a Windows world. Of the two I hope it's the former. ;-)

So Snapfish gets a recommendation for being not just an application hosted on the web but a web application. They do good photos too!

[1] I never expect to see Linux mentioned on these things and get pleasantly surprised on the occasions it is; even if it says "Linux is not supported", someone there at least knows enough to mention it.

The difference between pipe and redirection

Newcomers to UNIX-like operating systems are often confused by the difference between the shell operations pipe and redirection. The difference is easily explained with an example, in the context of web development. The shell command echo "st=1" | ./lifeswork.pl shows how a pipe is used to supply command line input to a script usually invoked via CGI in a web server. This allows the script to be more easily debugged by testing at the command line. The shell command echo "st=1" > ./lifeswork.pl shows how redirection uses command line input to overwrite a script file, destroying the file and the web developer's sanity. Hopefully this example illustrates the difference between pipe and redirect, and helps you avoid the idiotic mistake I just made.
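For anyone who would rather try this without risking a real script, here is a rough Python sketch of what the shell does in each case. The throwaway script stands in for lifeswork.pl, and the function names are made up for illustration:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for lifeswork.pl: it just echoes what it reads on stdin.
SCRIPT_SOURCE = "import sys\nprint('got:', sys.stdin.read().strip())\n"


def pipe_into(script: Path, data: str) -> str:
    """Like `echo data | script`: the script runs and receives data on stdin."""
    result = subprocess.run(
        [sys.executable, str(script)],
        input=data,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def redirect_into(script: Path, data: str) -> None:
    """Like `echo data > script`: the file is truncated and replaced by data."""
    script.write_text(data + "\n")


def demo() -> tuple[str, str]:
    """Run both operations against a temporary script and report the results."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "lifeswork.py"
        script.write_text(SCRIPT_SOURCE)
        piped_output = pipe_into(script, "st=1")
        redirect_into(script, "st=1")
        return piped_output, script.read_text()


if __name__ == "__main__":
    print(demo())
```

After the pipe the script is untouched and has printed what it was fed; after the redirect the script's entire contents are the four characters st=1 and a newline.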


Security blows

I was about to post about how pleased I was with Synergy in helping me tidy up my desktop clutter (by removing a keyboard and mouse from the surface). Ironically, I’m instead posting about a problem with the configuration that will cause me to throw it out and look for something else. Why the title? Because the default configuration of a Linux distribution nowadays has given me no way to fix this ridiculously simple problem without powering off the running PC, VMware guests and all.

The problem is that Synergy and the VMware console don’t play well together (I could have sworn that when I first started using Synergy I had no trouble with it, but there are a few hits around that describe problems like I’ve now hit). The problems people are reporting are that keys like Shift and Ctrl are not passed to the VM (some described here and here).

My problem is slightly different: the screen of my Synergy client (the one that’s running VMware) locked while a VMware guest had focus. Now, the Shift and Ctrl keys are not picked up by gnome-screensaver to unlock the screen. Even the real keyboard attached directly via USB doesn’t work. Big problem, for the following reasons:

* Thanks to password strength rules enforced on the Linux build I use, my password now has a Shift-obtained punctuation character.
* I can’t switch to a virtual console, since that requires Ctrl (e.g. Ctrl-Alt-F1).

Okay, so the keyboard doesn’t work. This client machine just happens to be a tablet PC, and I had hacked gnome-screensaver (to display the onscreen keyboard to allow the screen to be unlocked in tablet mode). I grabbed the pen and tapped out my password, but it *still* didn’t work: even the output of the virtual keyboard gets the Shift modifier dropped. Hmm… Starting to fume now.

Never mind, I’ll connect via the network…

* Fedora does not start SSH by default (okay, yes, and I didn’t make sure it gets started after I’d finished the install).
* There is no remote desktop (VNC server, XDMCP) configured.
* The shiny web-based management interface on VMware Server 2.0 only listens on 127.0.0.1 (or is being blocked by the Fedora firewall).

So with no way to get access to the machine to try and fix it, a power-off is the only solution. Some readers are probably thinking “boo-hoo, diddums had to kill-switch his widdle poota, how tewwible,” but I hate having to do that; not because the system doesn’t recover, but because it’s “problem resolution, Windows-style”.

Even though the real problem was between Synergy and VMware, I’m blaming the (perceived) need for security since without that I wouldn’t have a cryptic password that I can’t enter without Shift and a system I can’t administer over the network. Red Hat and Fedora are doing everything in their power to ensure I don’t fall prey to nasty Internet fiends (there are rich analogies to governmental nannying, but that’s probably over-thinking things).

So in summary: Synergy is great, just as long as you’re not using the VMware console with a password that needs punctuation or uppercase… Remember to have your SSH or other network access enabled before you play!


Sometimes, Gentoo bites

I had a failure of my Cacti system over the weekend, entirely caused by bad Gentoo emerges. Two different problems, both caused by bad upgrades of packages brought in from ~amd64 or ~x86, made Cacti colourfully dysfunctional for a couple of days.

The first was an update to the spine resource poller, part of the Cacti project but installed separately (it used to be called cactid). Turns out that somewhere between 0.8.7a and 0.8.7b, bugs were introduced that made spine unreliable on 64-bit systems. The update brought in an SVN version of spine which, while still labelled 0.8.7a, must have been somewhere after one or more of the bugs came in. The symptom was that every data value obtained via SNMP was garbage and ignored.

The second issue was strange — graphs were getting generated (even those for which there was no data) but there was no text on them! Titles, margins, legend, axes, all were blank. Some posts pointed to a problem accessing the TTF font file provided with rrdtool, but the actual problem turned out to be the upgrade to rrdtool 1.2.28 which introduced different parameters for the configuration of text attributes in graphs — and a corresponding “feature” that suppressed any text output if the new parameters were missing.

So what does “~” have to do with this? The software on your system is built according to the architecture of your machine. In Gentoo, this is called your “arch” (for architecture) and is usually “x86” or “amd64”. Gentoo implements a “testing branch” in an arch which starts with “~”; if a pre-release version of a package exists in portage you can bring it in with the “~x86” keyword. The nice thing about this is that you don’t have to enable a testing repository across your whole system: you can enable the ~ keyword for specific packages on your system, and everything else stays stable.

Unfortunately, this flexibility has a cost. The “amd64” arch seems to lag a bit behind “x86” in terms of packages being marked stable or just simply having packages available. This means that just to get things installed, it’s necessary to flag packages with “x86”, “~amd64” or even “~x86”. This flagging is easily done; almost too easily in fact, as it creates a problem later on when the package you actually set the keyword for eventually becomes stable and you don’t need the keyword set any more. It’s a manual process to revisit the keywords you’ve set and verify that they are still needed (and you know how well manual processes work).

Some time ago I started adding comments to the Portage config file where keywords are set, trying to explain why I set the flag: “to bring in version 1.2.34” for example. That way, if I ever do get around to manually auditing the package.keywords file, I’ll be able to check if some of the keywords are still needed. Still a manual review though.
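A rough sketch of how that review could at least be given a starting point: a small Python helper that pairs each keyword entry with the comment lines directly above it and lists the entries that have no recorded reason. The sample file contents, package atoms and function names here are made up for illustration, and it assumes comments sit on their own lines above the entry (trailing comments on an entry line are not handled):

```python
from dataclasses import dataclass

# Made-up sample in the style of a package.keywords file.
SAMPLE = """\
# to bring in version 1.2.28
net-analyzer/rrdtool ~amd64

net-analyzer/cacti-spine ~amd64
"""


@dataclass
class KeywordEntry:
    atom: str
    keywords: list[str]
    reason: str  # text of the comment lines directly above the entry, if any


def parse_keywords(text: str) -> list[KeywordEntry]:
    """Pair every keyword entry with the comment block immediately preceding it."""
    entries: list[KeywordEntry] = []
    pending: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            pending = []  # a blank line detaches earlier comments
        elif line.startswith("#"):
            pending.append(line.lstrip("#").strip())
        else:
            atom, *keywords = line.split()
            entries.append(KeywordEntry(atom, keywords, " ".join(pending)))
            pending = []
    return entries


def needs_review(entries: list[KeywordEntry]) -> list[str]:
    """Return the atoms whose keywords were set without a recorded reason."""
    return [entry.atom for entry in entries if not entry.reason]


if __name__ == "__main__":
    print(needs_review(parse_keywords(SAMPLE)))
```

It doesn’t tell me whether a keyword is still needed (that would mean asking Portage what is currently stable), but it does show which entries I never explained.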

So in the case of rrdtool and spine, I had set the “~” keyword some time in the past for some reason, possibly to get early access to a bug-fix ebuild. With no established method to revisit the keywords, I continued to pull in unstable versions of packages long after the packages I really needed had been marked stable. Eventually, it bit me.

The pre- and post-upgrade checklist grows some more…  :)


Don’t you hate it when defaults change?

Sometimes when working with computers and networks (as with most things in life) the thing that causes the most problems is the last thing you suspect–or often something you never suspected. I had a reminder of this the other day, when a moderately complex task I’d set myself looked to be scuppered for absolutely no reason I could fathom.

I’ve got a system here that is a host for a virtualisation environment I run. I dedicated a couple of network cables to the adapters owned by the virtualised system, and a third one was attached to the host’s IP stack. To get connectivity for another system, I had to steal the host’s cable though–which wasn’t a problem as the operation of the system works more-or-less entirely from the console rather than over the network. Just for grins, however, I decided to set up connectivity to the host by routing through the virtualised environment it hosts.

Having established the tunnel connection between the virtualiser and the host stack, I set about configuring the special details required to support routing through this system. After a few tries at getting it right, I was rewarded with successful pings between the systems on my LAN and the host system on its routed connection. So I jumped onto the console of the machine and lit up Firefox, but got an error page. I realised I hadn’t set DNS resolution–on the LAN, the machine was having resolv.conf configured by DHCP, so now I had to do it manually.

Okay, so DNS resolver now correctly set, let’s see Firefox WIN! Oh. Fail.

When I hit Try Again or Reload, the page would instantly refresh. This was starting to look like no routing problem. I used dig to test name resolution, and it told me it was being rejected. I looked at my dns.conf… Nope, no subnet restrictions coded there…

So I hit the lazyweb, and it didn’t take too long before I found a forum post that led me to this. In BIND 9.4.1-P1, ISC basically changed the default behaviour of a couple of query filtering settings. This had the effect of rejecting some requests that were previously accepted, such as those from non-local subnets. A reconfiguration of my DNS server gave me success at last.
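As far as I can tell from the release notes, the settings in question are allow-recursion and allow-query-cache, whose defaults became the equivalent of “localnets plus localhost”. The sketch below is not BIND code, just a Python illustration of why a client on a routed subnet falls outside such a list until its network is added; all the addresses are made up:

```python
import ipaddress

# Assumed addressing, purely for illustration.
LOCAL_ACL = ["127.0.0.0/8", "192.168.1.0/24"]  # networks directly attached to the DNS server
ROUTED_SUBNET = "10.0.5.0/24"                  # the host's new routed connection


def is_allowed(client: str, acl: list[str]) -> bool:
    """Return True if the client address falls inside any network in the ACL."""
    address = ipaddress.ip_address(client)
    return any(address in ipaddress.ip_network(network) for network in acl)


if __name__ == "__main__":
    print(is_allowed("192.168.1.20", LOCAL_ACL))
    print(is_allowed("10.0.5.2", LOCAL_ACL))
    print(is_allowed("10.0.5.2", LOCAL_ACL + [ROUTED_SUBNET]))
```

A LAN client passes, the routed host does not, and it passes once its subnet is listed, which is roughly what my reconfiguration amounted to.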

Hooray for persistence! Now, someone hand me some Cat-5 so I can make a cable and plug this thing back in properly. :)


Ubuntu 8.04 Wireless Weirdness

Over the last fortnight I finally got the wriggle-on to upgrade all my (K)Ubuntu systems to Hardy Heron. Various issues occurred with each of them, but overall the entire exercise went smoothly (my wife’s little old Fujitsu Lifebook was probably smoothest of the lot). I had one rather vexing issue however, on my old (I’m tempted to say “ancient”) Vaio laptop.

The onboard wireless on this thing is an ipw2100, hence only 802.11b, and I had a PCMCIA 802.11g NIC lying around (actually it came from the Lifebook, liberated from there after I bought it a Mini-PCI 802.11g card on eBay). On Gutsy, I used the hardware kill-switch to disable the onboard adapter to make double-sure that it wouldn’t try and drag the network down to 11Mbps.

This laptop was the last machine I upgraded to Hardy, and I was playing with KDE 4 on it so I was looking forward to seeing what KDE4-ness made it into Hardy. While the upgrade was taking place the wi-fi connection dropped out, but I didn’t think anything of it since Ubuntu upgrades try and restart the new versions of things and I figured NetworkManager had fallen and couldn’t get up. After the reboot, however, KNetworkManager (still the KDE3 version, don’t get me started there) could find no networks — could find no adapters, in fact.

I logged back into KDE3 and poked. Still no wireless (as if the desktop would make a difference, but I had to make *some* start on pruning the fault tree). The Hardware Drivers Manager was reporting that the Atheros driver was active (for the PCMCIA card), and an unplug-plug cycle generated all kinds of good kernel messages.

On a whim, I flicked the hardware kill-switch for the onboard wifi[1]. Almost instantly, KNetworkManager prompted to get my wallet unlocked — it had found my network and wanted the WPA passphrase. I provided it, and got a connection: via the PCMCIA NIC.

“That’s odd”, I thought, and flicked the switch. A few seconds passed, and the link dropped. Flicked the switch on, link came back. Flicked the switch off again: this time a few minutes went past, but again the link failed. Tried it several times again, and the same thing happened. The state of the kill-switch for the onboard NIC was influencing the other NIC too!

It seems that this is altered behaviour in NetworkManager, applying the state of the hardware switch to all wi-fi adapters. If it annoys me significantly I’d like to think I’ll trawl changelogs, or even better lodge something on Launchpad… more likely though I’ll forget all about it having found a kludgy workaround.

I’ve now added ipw2100 to the module blacklist and things work okay (presumably because the state of the onboard switch can’t be reported any more). I’ll also have a think about whether a few dollars for another g-capable Mini-PCI NIC will be throwing good money after bad, as this laptop really is quite long-in-the-tooth.

Oh yes, that’s right… KDE 4. Next time perhaps. :-)

[1] I can’t think why I did this. I knew that I’d disabled 802.11b in my access point, to make triple-sure an 802.11b device wouldn’t slow my network down… The onboard 802.11b NIC would never successfully get a connection.


Zeroshell redux

I wrote about Zeroshell, and how I thought it was pretty great. I still do, but it hasn’t taken centre-stage in my network configuration like I thought it would. I’ve had to tone down my raves about some of its integrated features as well.

The fact that it hasn’t taken centre-stage is possibly as much to do with VMware’s bogus clock-drift problems as anything, as I haven’t dedicated hardware to my Zeroshell instance yet (I could keep it running virtual, but some of the things I want to do with it will make more sense if it’s a separate machine). VMware Server takes another barb for its handling of VLAN tagging (but to be fair that might be how the Linux 8021q module works). It seems that if you have any VLAN definitions on a network card, VMware won’t get to see any VLAN tags on that NIC. You can get a guest attached to a bridged interface to see the real VLAN tags, but only if Linux has not got any VLAN awareness over that NIC.

Alright, so enough ragging on VMware. I have Zeroshell attached to the networks it needs and all is fine. Except that I can’t actually change anything! The web interface that I spoke so highly of originally is actually very restricted in some areas. One of these is in the RADIUS server, and it bit me badly when I decided I’d use Zeroshell’s RADIUS server to authenticate access to the Web interface of my Linksys switch. Turns out that the Linksys firmware expects a particular attribute to appear in the response from the RADIUS server.

The fact that Linksys don’t document this anywhere is not Zeroshell’s fault, but that there is no interface allowing me to update the records beyond what Zeroshell uses for its own applications is a bit of an issue. It means that instead of a Zeroshell box potentially becoming the hub of administration functions, it is in danger of becoming just another little vertical application server that doesn’t integrate.

Having said that, the backend for most (all?) authentication data is LDAP so a tool like PHPLDAPAdmin might be usable to extend the base records. But, arguably, I shouldn’t have to do that! It is still beta software though, so improvements and enhancements will be made.

The other area that it’s a bit lacking in is monitoring/graphing. Okay sure, I’d probably integrate Zeroshell into the rest of my Cacti setup, but it would be nice if Zeroshell did like other router distros and had a pre-built statistics/graphing page.

Zeroshell is still my pick (I revisited pfSense and fixed the problem updating, but to me it doesn’t have enough function to justify running its own hardware), but it’s just not quite the bees-knees it was when I first saw it.
