S E R V E R   S I D E

Someone left a slightly larger and much nicer Dell Trinitron monitor in the world headquarters' garbage area. Thanks! I'm taking it as a good sign (after just taking it.)

I am really enjoying learning so much more about Linux. What an amazing thing. Has anything else ever been built that is so vast, and at the same time so transparent? The more I learn the more it boggles my mind. All the information you need to learn about the system is an integral part of the system itself.

Every command has a corresponding manual page. You read these manual pages from the command line by typing 'man [command]'. So to read the manual page for, say, the command ifconfig you just type 'man ifconfig' and it spits out a page detailing the proper syntax for this command, a brief description of what it does, and a list of all options. And everything has a man page.
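
For example (these work the same way on pretty much any Linux system):

    man ifconfig     # the manual page for ifconfig
    man -k network   # search all manual page descriptions for 'network'
    man man          # even man itself has a man page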

Of course, if you're rather new to the whole thing like I am, and you really don't know what a command does, the man page might be a bit terse. No problem though, that's what Google is for. The amount of information is just staggering. Sharing information about the system is built into the social fabric of the community much the way man pages are built into the fabric of the system.

It's very cool. But that's not to say it's easy. People who know how to do things tend to answer questions in a way that puts you on the right track, rather than just telling you exactly how to do it. It's a "teach a man to fish" philosophy. It's not always what the newbies think they want (myself included when I am really stuck,) but it is a great way to learn. It forces you to learn.

But even beyond man pages and Google (and mailing lists, which I will talk about later,) more serious adepts have the best learning tool of all - the actual code itself. If you don't know why a certain command option isn't working like you thought it would, and the man page is no help, and you can't find the answer on Google, you can just open up the source code for that command and start reading. Even if you don't understand the code itself, it will be heavily marked up with human readable comments.

I guess this is the ultimate source of the transparency. Good programmers document their code as they go so that other people, without access to the original programmer's mind, can look at the source code and understand it. Documentation is not an afterthought but is, like I said before, an integral part of the thing itself. And this philosophy extends from the source code on up.

You can drive a car your whole life and never understand how a carburetor works; but if you administer a computer running linux you are going to eventually get a sense of how the internals work. You almost have no choice. Because of the transparency, learning how to do things is the same as learning how things work.

To me this is fascinating, but it also means that there is a lot to digest. Here's an example. I am beginning to think very concretely about how to organize the file system on the future server. Like many things in this world, there is no "right" way to set it up. Linux is very flexible, which allows you to do almost anything, including shooting yourself in the foot in an almost infinite number of ways. In order to ensure I don't shoot myself in the foot - *for my given situation* - no one can give me a foolproof recipe for success.

Instead, the people who have the knowledge tend to lay out how they do it, and more importantly *why* they do it that way. They explain the underlying considerations that made them choose a certain path, and by elucidating those underlying considerations (by teaching you about how it works at a more fundamental level,) you can then come to a conclusion for your given situation.

This, again, is the transparency. There are no right answers, except to explain how things work on the next level down. So when looking for answers you are quickly sucked many levels deeper than you might have originally thought you needed to go. And hence you end up learning a lot.

Here is where I ended up in researching file system layout. I don't really need to know all that (nor do I even begin to understand all that!) What I am trying to do is not overly mission critical (lives aren't going to be hanging in the balance,) and it is also not going to have to scale very much, nor will it really be in danger of taxing modern computer hardware. But still I am reading stuff like this, and slowly beginning to get a fuzzy picture of these deeper levels. And that seems to be the linux way. It's turtles all the way down.
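
To make that slightly concrete: a typical first-cut layout for a server looks something like this (purely an illustration - the whole point of all the above is that the right split depends on your situation):

    /boot   ~100 MB      kernels and boot files
    /       a few GB     the base system
    /var    separate     logs, mail spools, databases (grows on its own)
    /home   separate     user accounts and web sites
    swap    ~RAM size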

Less abstractly (sorry to make you slog through all that,) I could imagine the test server going to the colo here in NYC at the beginning of the week of the 18th (about a week from now.) And then the new server will follow quickly after that.
- jim 9-11-2005 8:07 pm [link] [1 comment]

The old Penguin server has been pulled out of deep storage, I have acquired a very old ViewSonic 14 inch monitor, and they are set up and ready to go at the new secret Datamantic world headquarters. I am now waiting for my PowerBook to burn disc 1 of CentOS 4.1 i386 (that is a specific distribution and flavor of Linux,) so I can load it into the server and begin to configure this thing.
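
One sanity check worth doing before burning is comparing the ISO against the checksum published on the mirror (on OS X the command is md5; the exact filename may differ):

    md5 CentOS-4.1-i386-bin1of4.iso
    # compare the output to the value in md5sum.txt on the download mirror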

It is exciting and also a little scary. Like walking around in the dark. I really don't know what is going to happen. I've been studying the very active CentOS mailing list for the last few weeks and there seems to be an awful lot of community support around this distribution. Hopefully that will be enough to get me up and running.

For the record, I am most scared of BIND (the DNS server), followed by email services.
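
From what I've read so far, the scary part is mostly the zone file syntax, which looks something like this (example.com and the addresses here are placeholders):

    $TTL 86400
    @    IN  SOA  ns1.example.com. hostmaster.example.com. (
                  2005090701 ; serial
                  3600       ; refresh
                  900        ; retry
                  604800     ; expire
                  86400 )    ; minimum TTL
         IN  NS   ns1.example.com.
         IN  A    192.0.2.10
    ns1  IN  A    192.0.2.10
    www  IN  A    192.0.2.10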

Here goes...


- jim 9-07-2005 10:09 pm [link] [4 comments]

Down on the Jersey shore today. Spent an enjoyable day doing some leisurely coding. I am now very close to realizing the one click web site setup idea. I load one webpage on my local machine and it asks for a few pieces of information about the remote server (address, username, password,) and then it creates the directory structure, uploads all the php files, and sets up the database.

All I have left is to include an option to transfer the contents of the local database to the database on the remote server.

This way I can develop new sites on my local machine, and then when everything is ready, I can move the site to the remote (real) server with one click.
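
The real thing is a PHP page, but the steps it automates boil down to something like this (the hostnames, paths, and database names here are made up):

    #!/bin/sh
    # Sketch of what the one click setup does - not the actual code.
    REMOTE=user@server.example.com       # gathered from the web form
    WEBROOT=/home/user/public_html       # hypothetical remote web root

    ssh "$REMOTE" "mkdir -p $WEBROOT/newsite"               # directory structure
    scp -r ./newsite/* "$REMOTE:$WEBROOT/newsite/"          # upload the php files
    ssh "$REMOTE" "mysql newsitedb" < schema.sql            # set up the database
    mysqldump localsitedb | ssh "$REMOTE" "mysql newsitedb" # copy the local data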

Not very difficult, but I am very happy that I have it set up this way. It is taking longer than I thought, but this will save a *lot* of time down the road.
- jim 8-21-2005 4:11 am [link] [6 comments]

Lighttpd 1.4.0 released as promised, just in time for me to try it out on the new server. Excellent. I am excited about this webserver.
- jim 8-20-2005 12:46 am [link] [add a comment]

I guess I'll call the new software 'datamantic' and hope that doesn't cause confusion with the site name.

I finished the main part of the installation program yesterday. It's not quite one click, but it's very easy. I still need to ssh into the account to set some directory permissions, but that's not a big deal. I guess I can make a shell script to take care of that (PHP can't do it because it runs as the web server's user, not as me, and therefore doesn't have access to the file system outside the web root - and that's where I need to change some permissions.)
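
Something like this would do it (the paths and group name are placeholders for whatever the account actually uses):

    #!/bin/sh
    # fix-perms.sh - run over ssh after the web based installer finishes
    SITE=/home/user/site            # hypothetical; outside the web root
    chgrp -R www "$SITE/data"       # give the web server's group access
    chmod -R g+rw "$SITE/data"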

This morning I've tackled the last remaining big missing piece in the datamantic software. It's a little embarrassing, because it's so obvious in hindsight that it needed this capability, but until now it couldn't handle floating point numbers. Items (and the metadata atoms of each item) can, in theory, be anything you want. That's the whole point of the system: flexibility. Except they couldn't be floating point numbers, only integers. Obviously for a business system we are going to often need atoms to hold monetary values. And unless we are okay with excluding cents (and of course we're not okay with that,) we need some float support.
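
In database terms it mostly comes down to column types; something along these lines (the table and column names are illustrative, not the real schema):

    mysql datamantic -e "ALTER TABLE atoms ADD value_float FLOAT;"
    # Side note: for money specifically, DECIMAL(10,2) sidesteps float
    # rounding issues, which may be worth considering down the road.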

And now it's in there, at least for adding new information. I still need to bring the edit script up to speed, but that shouldn't be too hard.

I really want to get this done today so that Monday I can shift focus away from the software and start back to dealing with the hardware side of things.
- jim 8-07-2005 9:27 pm [link] [1 comment]

I still have never said anything explanatory about the new software. And it's not that I haven't tried. Explaining software is difficult. So I'm still hoping for a longer, more thorough post, but here's something shorter about what I am working on presently.

The software is basically done. This is what I wrote several months ago during the initial phase of my long absence from blogging. It is a descendant of the software that runs this site, but much more generalized in an effort to target small business websites (especially inventory-centric websites,) instead of just blogging. Where this site has posts and then a bunch of metadata associated with each post (author's name, date, summary, comments, etc...) the new software allows you to define anything as the main post - these are called 'Items' in the new system - and then associate any number and kind of metadata 'Atoms' with each item. All items, and their associated atoms, can be specified through the same sort of browser based interface that I use everywhere.

So, in other words, this site works great for blogging, but not for much else. The new software could be set up to blog, but it can also handle the situation where, say, instead of blogging you want to have a website for your wine business. Now instead of blog posts you have bottles of wine as your main items. Each wine then has a bunch of metadata associated with it (grape types, producer, year, etc....) To do this before would require me to write a lot of code. Now I can do it all through a web interface.

I wish I could explain it better because it's pretty cool.

You specify what your item and its atoms are going to be like. Text, binary data, numbers? Then the software asks you a few questions about your items and their atoms (like, what sort of form elements will be necessary for adding and editing,) and it builds all the necessary user interface pieces for you. I'm not sure how clear that is, but that's the slick part. The posting and editing scripts are very generalized and they can deal with any data, no matter how it is structured.
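
Under the hood that implies a schema shaped roughly like this - a sketch of the idea, not the actual tables:

    mysql datamantic -e "
      CREATE TABLE items (
        item_id   INT AUTO_INCREMENT PRIMARY KEY,
        item_type VARCHAR(64)        -- 'post', 'wine', whatever you define
      );
      CREATE TABLE atoms (
        atom_id INT AUTO_INCREMENT PRIMARY KEY,
        item_id INT,                 -- which item this atom belongs to
        name    VARCHAR(64),         -- 'producer', 'year', 'comment', ...
        value   TEXT                 -- the data itself
      );"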

The whole project was a matter of abstracting the software that runs this site. Boiling it down, but at the same time radically expanding its scope. Freeing it from the conceptual mold of blogging, while maintaining the blog-like ease of creating and editing right in the web browser.

And, like I said, it's largely done. It's already deployed behind a few sites that I will point to eventually. But how the sites look isn't really the main thing. It's how they run. It's how (hopefully) easy they are to build and maintain. That's the key.

So now I am working on polishing the whole package. This is something I've never done before. I have put the older software behind a number of different blogging type sites (and even tried to adapt it for several business sites,) but it is a monstrous pain in the ass to deploy. It takes me the better part of a day to set it up, and that's assuming everything goes right. There aren't any instructions (which even I need, and I wrote the thing,) it's very unintuitive, and the whole thing is just a sprawling mass of weird hacks and dependencies. It runs well once you get it working, but it gives off a sort of "don't even breathe on it" vibe.

So I'm trying to fix that this time around. Yesterday, for instance, I wrote a program that automatically sets up the database for a new installation (of the new software.) If you can believe it, I've never had an automated way to do this. I would just fire up mysql in an ssh session and create all the tables by hand. That's fine if you're just deploying one site, but my goal now is to be able to deploy lots of sites. Quickly and painlessly. So that means building the automated tools to do it.

I thought I could do it in a few hours yesterday morning. How hard could it be? Ended up taking 10 hours. And 6,000 lines of code (not that the number of lines means anything - a better programmer could have done it in fewer.) But I now have an automated way to set up the rather complex database schema. And not just that, but I can also run it against an existing database and it will analyze every table and fix anything that is not right. This includes modifying create statements on table columns that are not formed correctly, adding missing table columns, as well as adding completely missing tables.
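
The core of it is simple even if the program isn't: ask the database what's actually there, compare against what should be there, and fix the difference. In spirit (real table names omitted):

    # create anything missing outright
    mysql sitedb -e "CREATE TABLE IF NOT EXISTS items (item_id INT PRIMARY KEY);"
    # inspect what actually exists
    mysql sitedb -e "SHOW COLUMNS FROM items;"
    # alter in anything that is missing or malformed
    mysql sitedb -e "ALTER TABLE items ADD COLUMN item_type VARCHAR(64);"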

This brings us to the key point of my recent efforts. Maintainability. This is what I learned when I tried to put the old software behind more than one site. The basic problem is that I would make a fix to one installation, but then that fix would not get propagated to the other sites. So they quickly fell out of sync with each other, and then each one had to be maintained as its own independent entity. As they say in the biz: this doesn't scale.

So the new mantra is centralization. And the database creation/updater program I wrote yesterday is the first part of that. If I need to make a change to the database now - say, I realize that the user table needs another column to store a contact phone number for each user - I create that new column in the program I wrote yesterday, and then I run that program across *all* installations of the software. This will keep everything in sync. Building tools to do what you previously did by hand allows you to scale.
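
Mechanically that's just a loop ('dbupdate' is a made-up name for yesterday's program, and so are the database names):

    #!/bin/sh
    # run the schema creation/updater program against every installation
    for db in site1db site2db site3db; do
        ./dbupdate "$db"
    done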

Next up is constructing a similar tool to keep the code in sync across installations. The goal is to make changes in one place (say, on the test server on my development machine,) and then when things are running correctly I want to be able to run one script, and have all the changes to both the database structure and the code itself be replicated across all installations of the software. In other words, as a one person business, I'm trying to make myself scale.
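
The code half will probably end up looking a lot like this (paths hypothetical, and 'dbupdate' is again my stand-in name):

    #!/bin/sh
    # push the current dev code out to every installation of the software
    for site in /home/site1/web /home/site2/web; do
        rsync -av --delete ~/dev/datamantic/ "$site/"
    done
    ./dbupdate site1db && ./dbupdate site2db   # then bring the databases in line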


- jim 8-06-2005 6:02 pm [link] [2 comments]

Updates have been sparse, to say the least. I'm just back from some time with my family on Cape Cod. But now it's August, and August is the month for me to get busy. So hopefully there will start to be some real progress, although I guess I have said this before.

The corporation is all set. I've got my little kit from the state. Wow, I didn't realize how much like a game it is. The corporate seal hand puncher thing (looks sort of like a hole punch, except instead of making a hole it creates a raised round corporate seal on a piece of paper,) is pretty cool. I almost expected there to be a secret decoder ring as well.

Now I just have to get the bank account (evidently the only time I will probably ever use the corporate seal,) and I'm ready to start buying hardware.

My thinking has changed a little bit on how to attack the problem, but the specific hardware and software choices have not changed too much. More on that soon.

To the people here who have contributed money, I apologize for the so far rather slow pace. But like I said, things really should pick up now.
- jim 8-03-2005 7:46 pm [link] [add a comment]

Finally got the last of my missing financial documents that are needed to make the new company happen, which is in turn needed to actually purchase the server that I still haven't completely decided on yet. While it may not sound like it, this is a big step. Or, in other words, if you don't do very much moving, any step is a big step. Onward!
- jim 6-15-2005 8:34 pm [link] [add a comment]

My plan was for this next post to explain a little bit about the new software I wrote during my recent break from blogging. It is for building websites. It shares some roots with the software behind this site, except it is more tailored to maintaining large stores of structured data (inventories) than to blogging. And it does one really clever thing I haven't seen before (although, okay, I haven't really examined every other piece of software in this category.)

But instead I want to quickly outline a piece of software that I haven't written, and which is, most likely, beyond my ability to write. Still, sometimes just being able to articulate an idea is a big step.



- jim 6-15-2005 6:29 pm [link] [add a comment]

This post will be the "which server should I buy?" thread. I plan on doing this with a handful of central questions so that I can return with comments as time goes on. Maybe if it gets too long I will then create a "2nd which server should I buy" thread. These big question threads will be linked from the right-hand navigation column.

I don't really expect anyone to be interested in all this. But if someone is, that is great, and if anyone can contribute anything to the discussion that would be even better.

Here goes: Which server should I buy?

Given that I can pretty specifically say what the server is going to be used for, I think there should be a fairly definite answer to the question. I just don't know what it is yet.

Here's what it will be used for: web serving. Most likely using Apache (although I guess you have to at least give a look at Lighttpd with all the attention it's been getting lately.) Almost all requests will be to PHP scripts generating dynamic web pages by pulling data out of a MySQL database. Additionally there will be a rather large 1+ TB store of ~5 MB binary files that will be served straight from the file system over HTTP to a limited number of simultaneous connections (I don't need this to scale very high.) So that's all very basic web server stuff. [The reasoning behind this architecture and the various possible debates here will be a different post.]

I was initially very attracted to Apple's Xserves because Mac OS X is what I know best (and what I build things on locally, even though they get deployed on Linux.) Plus Apple, and the Apple community, seem a little more friendly in my particular situation, which is something like: I don't mind learning a little and even mucking around on the command line, but it's really not a goal of mine to be a sysadmin, so if Apple can supply me the whole widget, with a nice clean way to automatically download and install binaries, I can just worry about Apache, PHP and MySQL (what I like to do,) and not so much about, say, getting non-standard ethernet drivers to compile under Linux, or trying to set up DNS without a GUI. In fact, I don't mind paying a little more for someone else (Apple) to make these things easy for me.

Upon further research, however, it seems there are some serious performance questions (they may not actually be problems, but they are certainly questions right now) concerning MySQL, and maybe even Apache as well. Ouch. Those are exactly the things I need to run. OS X Server and the G5 chip (IBM's 970) are amazing at a whole host of tasks. Unfortunately it seems like the exact thing I need to do isn't one of them.

So while I haven't made my final decision yet, I feel pretty sure - again, given specifically what I want to do - that Linux is the OS you are "supposed" to use. This basically means that the programs I need to run are built and optimized with the Linux platform in mind. On the other hand, even if some of the more outrageous claims are true, and MySQL and Apache performance really are an order of magnitude slower on OS X, it might be the case that it is still "good enough". I'm not building eBay here. I think we run on a 700 MHz Pentium right now and performance is acceptable. (On the other other hand, I want room to grow....)

So OS X Server vs. Linux is one debate. And then if Linux wins that debate then there is the secondary "which distribution?" question.

I'll get into specific configurations and pricing in the comments.
- jim 6-08-2005 9:33 pm [link] [11 comments]
