❧ And I Thought HTML Was Supposed to Be a Real Markup Language

Every time I compile some C code I write I get – and I counted to make sure – approximately 43 million errors, all of which are nagging me about unbalanced parentheses and forgotten semicolons. And every time I gesticulate violently and sternly address the screen, “If you’re so smart you fix it!”

Which is to say that the compiler doesn’t allow room for error. And it shouldn’t, either. As soon as the compiler tries to be smarter than you are – mind you, it is – and starts fixing your mistakes, it’ll inject some really mind-blowingly stupid code that’ll leave you scratching your head and wondering why you didn’t just fix it in the first place.

What I’m left wondering is why browsers accept bad code. They’re parsing a language with syntax and specifications and all the paperwork to be a legitimate language, yet they willfully drop into “quirks” mode to handle malformed HTML. And the result is predictable: ordinary people don’t give a hoot if their pages are valid because who the heck cares? It renders just fine, doesn’t it?

Why, if my compiler doesn’t, should my browser bend over backward to render pages that are invalid? No programmer would expect invalid code to compile, and yet here we are, nearly two decades after HTML was introduced, still treating it as a baby. Can I say something? HTML is dead simple to write. There are like two rules: every opening tag needs a closing tag, and some tags – such as <a> – need specific attributes. Compare that to C, Python, Java, etc. This isn’t rocket science.

The History of Quirks

“Quirks” mode harkens back to the dark days of the internet. Fledgling web developers (read: pubescent teenagers fiddling on Angelfire) were crafting Web 1.0, replete with Tomb Raider walkthroughs and Real Ultimate Power (which, as an aside, is still hilarious). And these pioneers had no time for “syntax” or “rules”. How could they? Fueled only by raw vision and Tang, they gave birth to the internet.

Corporations were taking notice. “Why,” they said, “we could use this newfangledness for intranets.” And they promptly tasked the most capable employees: the aging site admins whose jobs were slowly being replaced by computers. Well, if you can’t beat them – you know.

And there were the visionaries. We all know them. They were the darlings of Wall Street: eBay, etc. These men and women were going to change the world. On their Herman Miller Aeron chairs they gazed into their crystal balls and revealed to the world its fate, largely a concoction of digital money, internet grocery stores, and beanie babies. The world had enough by 2000, but the bubble boys had left their mark(up. Ha!).

The browsers of the time – Netscape and Internet Explorer – were playing second fiddle to the internet. Success was black and white: either you render the web, or the other guy does. If Joe here sees a jumbled mess at joeisawesome.com with Netscape he’ll do what’s rational: curse loudly and open up Internet Explorer.

And in that dark race quirks mode came to light.

What Quirks Mode Means Today

Quirks mode means that people don’t care. Validating your site is like extra credit on a test. Only that kid with the huge glasses is going to care if he gets it right. Yeah, we’ll try it, but we’re too cool to care.

There’s already a lot of discussion on the subject and it’d be redundant to bring it up. Suffice it to say that the way browsers handle code today is not good.

A Modest Proposal

Kill quirks mode.

But seriously.

We can’t leave out half the web, can we? There’s a lot – a lot – of content out there that’s not valid HTML. And never will be. This is content that people rely on: popular websites, corporate intranets, your website works-in-progress. And cutting out the quirks mode of every browser would mean alienating a lot of people and making life much harder for others. We can’t realistically say that cutting out quirks mode is a good thing (though it’s what I’m personally rooting for). Not to mention that it’ll never happen.

But what if Google didn’t index invalid HTML?

(Giving higher priority to valid over invalid HTML would have a similar effect.)

Google has a very large stick to brandish: if you’re not on Google’s search results, you’re not on the internet. Plain and simple. Companies know this and already invest heavily in search engine optimization. Turn web standards into an SEO strategy and you’ll have even the most remote corners of the web evangelized.

How could we ever get Google to drink the Kool-Aid and actually pull this off? I haven’t the slightest clue. It’s a pie-in-the-sky dream. But it could work.

July 31, 2009


❧ Introduction to Groff

Groff is the GNU implementation of AT&T’s troff and associated programs (pic, tbl, etc.). It’s a typesetting language much like LaTeX.

But who cares! Here’s the canonical helloworld.ms:

.TL
Hello World
.AU
Devrin Talen
.AI
Awesome, Inc.
.AB no
.AE
.PP
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.

Compile this into something useful with

% groff -ms -P-pletter helloworld.ms | ps2pdf - helloworld.pdf

Groff’s info page is chock full of good documentation and should be your first resource for learning how to use it. But here are the rough strokes:

It’s a venerable and time-tested typesetting language that should be a part of anyone’s typesetting arsenal.

October 04, 2008


❧ I Hate Twitter and Yes, Thank You, I Did Take My Pills This Morning

Three weeks ago I pulled the plug on my Twitter account. Almost a year ago I had fallen head over heels for that service, and when I finally cut the cord I felt as if I had pulled my head out of the internet’s ass and taken a breath of fresh air.

Twitter is the cool kids’ table all over again. It’s a positive-feedback system that perpetuates the position of those with the most followers and steals the lunch money from the nerds. Sour grapes, I know. But hear me out.

Meet Devrin

Devrin just signed up for Twitter. Exciting! He posts a message:

Hello, world!

Not brilliant, but it’ll do for an inaugural post. Besides, he has so much to tell the world! He’ll beguile the internet with his 140-character wit and steal their hearts! He’ll have many thousands of followers!

Devrin recovers from his unbridled enthusiasm and ventures into Twitter’s deep bowels. “I must find someone to follow,” he thinks. And follow he does: the first victim is none other than John Gruber, Merlin Mann follows, and Steven Frank is felled soon thereafter. Devrin goes to his timeline and admires it. His post, alongside the likes of those three! He feels moved to post again:

This is super cool!

And quits Twitter forever – or not. He should have. In reality it took 52 weeks and a few hundred “tweets” before he gathered the intestinal fortitude to do it.

Some Background

When Twitter was released it garnered naught but harsh words. No one could see any merit in a system that restricted you to 140 characters. “It’s a waste of bandwidth,” they decided. And moved on.

Twitter exploded a little more than a year later. Suddenly bloggers decided that Twitter was the Next Big Thing and rang up accounts. Their dutiful readerships followed suit, creating accounts for the sole purpose – as I did – of reading their favorite blogger’s tweets. And bloggers loved it: their audience at their fingertips! Have a question? No problem! Tweet it and – zing! – you’ve got answers!

The unexpected explosion melted Twitter’s servers. And now the Big Thing to blog was Twitter’s unreliability: what kind of self-respecting service has hours of downtime? And the slashdot effect increased Twitter’s problems. And users. Today, a bit later, things seem to be running more stably.

Hey You Up There! Twitter Sucks!

Why did Twitter’s first reviewers find so much to hate? They had no audience. Twitter was like writing letters and dropping them on the ground. A little while later these same people were back on Twitter lavishing praise because – neat! – all of a sudden there were all these people that wanted to read this crap you threw on the ground! Before, 140 characters was an arbitrary limitation; now it’s “inspired” and “revolutionary.” What the heck changed?

The people calling the shots now like Twitter. It’s this New Medium that connects people. In reality it’s a big rat race for followers and favorited tweets. Not for those on the top – for those on the bottom. The basement of Twitter is one big Digg comment thread; it’s the usual mix of ep1c fa1lz, brb bathroom, and OMG! coot puppys! It’s as if you could create a site that looked like Facebook to the guys on top and MySpace to those beneath.

The only people that anyone follows – even the popular users – are the popular users. Strange! This is how Twitter fails: there’s no way to discover users. Sure, point me to the search bar. Point me to the “follows” list on everyone’s page. But there’s no way for me to find users that talk about the same stuff I do. Twitter doesn’t put me in touch with these people. Twitter doesn’t forge new connections, it only reproduces the blog & audience relationship that already exists. That’s how Twitter fails. That’s why Twitter sucks.

Twitter is your high school lunchroom where the uncool kids are too busy peering over each other to get a glimpse of the cool table to even notice each other.

One Year Later

And so a year later, thoroughly disillusioned, I pull the plug. Some will read this and agree.

But the vast majority will – I feel – think, “You dolt. You can’t say you don’t like Twitter and because of that Twitter as a whole sucks.” Point taken. Maybe there are those that actually have friends on Twitter – how’d you swing that? – or those that use it because they enjoy reading tweets from those on high. Fair enough. But to the latter: that’s not what Twitter is for. They provide RSS feeds if that’s all you want.

Too many people join Twitter because they hear that it’s awesome. This is to you: don’t. It’s not.

July 30, 2008


❧ Is a Filesystem-Based Blog Right for You?

Chris pointed me to an insightful post by Chris Siebenmann on the shortcomings of filesystem-based blogs. I’ll summarize his points:

The best defense I can come up with is something like this: if you’re having these problems with your file-based blog, you probably shouldn’t be using one. I don’t think file-based blogs are superior to anything driven by a database; in fact they’re pretty much dumber overall. If you need the metadata that database-driven blogs provide you’re probably better off just using one rather than trying to turn a file-based blog into something that it’s not.

I think file-based blogs shine when your blog can be better described as a loose coupling of essays. I make no claims that my posts are worthy of being labeled as such, but they are infrequent and permanent. My tumblr page is what I reserve for musings and link-posting; this site is meant for posts that I’d like everyone to be able to see for a while.

Nevertheless, I do believe there are a few simple tricks that you can pull to adequately address some of Siebenmann’s points.

Modification Times

The solution I have is to include the post date along with the title at the top of my posts. The first two lines of this post are:

Is a Filesystem-Based Blog Right for You?
6/8/2008

Can interprets the first line as the title – and formats a slug accordingly – and the second line as the publication date. The post gets published at noon of the day given and isn’t published if the date is in the future. The modification time of the file has nothing to do with the publication time.

Reminds me of how we used to title our homework assignments in grade school.
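For the curious, here’s a rough Python sketch of that header convention. It is not can’s actual code: the function names, the month/day/year reading of the date, and the slug rules (lowercase, punctuation dropped, underscores between words) are all assumptions made for illustration.

```python
from datetime import datetime, time
import re


def slugify(title):
    """Lowercase the title, drop punctuation, join words with underscores.

    Assumption: this only approximates how can builds its slugs.
    """
    cleaned = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return "_".join(cleaned.split())


def parse_post(source, now=None):
    """Return (title, slug, published_at, is_live) for a post's source text."""
    lines = source.splitlines()
    title = lines[0].strip()
    # Assumption: dates are written month/day/year, as in 6/8/2008.
    day = datetime.strptime(lines[1].strip(), "%m/%d/%Y").date()
    published_at = datetime.combine(day, time(12, 0))  # noon on the given day
    now = now or datetime.now()
    # A post dated in the future is parsed but not yet live.
    return title, slugify(title), published_at, published_at <= now


if __name__ == "__main__":
    post = "Is a Filesystem-Based Blog Right for You?\n6/8/2008\n\nBody text..."
    print(parse_post(post))
```

Note that the file’s modification time never enters into it; everything comes from the first two lines of the text.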

Metadata

Siebenmann is concerned about a lot more than publication times: metadata can include tags, categories, revisions, modification times, and so on. I don’t think file-based blogs are cut out to handle gobs of metadata; if you find yourself needing that data it’s probably time to move to something like Wordpress.

I feel that Siebenmann’s last point – about having local storage for metadata – is addressed by what I just said above.

Bonus Problem: Post-specific Media

File-based engines have no real way to store media in connection with a post. I considered a few options:

  1. Just don’t. Have a /media directory and hard-code in links in posts. Can’t name two pieces of media the same thing (i.e. no foo.jpg in two posts).

  2. Turn each post into a folder with the post text file and any associated media files inside. Just one problem: it’s a real pain in the butt. Kind of defeats the purpose of being able to just drop posts into a directory and publish them.

  3. Create a folder for each post in a /media directory and have can modify links in the post to point at this new location. I’ll explain this one below.

  4. Embrace the possibility of a blog without any media. Would do that, but I already have posts with screenshots and such.

My initial post on can outlined what I was planning: basically to search and replace links in the post source. To quote myself:

But the approach I’d like is something like this:

  • Publishing script creates a directory per post in a specified media base directory.
  • Drop any media corresponding to a particular post into said directory.
  • The post source uses a flag – something like class='media' – in links that reference local media. The publishing script looks for these in posts and prepends the post’s media directory to the link.

The first two are right. The third point is crap. What I want is simplicity: can uses Markdown, and including a class for a link means writing out the link by hand. The solution I use is to just have links that look like:

<a href='/media/is_a_filesystem-based_blog_right_for_you.html/foo.jpg'>...</a>

Simple. During publishing can goes through and replaces that link with this:

<a href='/media/is_a_filesystem-based_blog_right_for_you.html/post_slug/foo.jpg'>...</a>

And I drop foo.jpg into that folder to complete the process. Lets me use identical names and, more importantly, it keeps the media folder nice and organized. Still not as easy as database-based blogs, but thankfully I’m text-heavy and tend not to have any media.

(Disclaimer: can actually doesn’t do any of this. But it will. Soon.)
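Since can doesn’t do this yet, here’s a minimal sketch of what the search-and-replace step might look like. Everything in it is hypothetical: the function name, the regular expression, and the assumption that a local media link is a bare /media/<file> href with no subdirectory.

```python
import re

# Assumption: local media links look like href='/media/<file>' with no
# subdirectory; anything else is left alone.
MEDIA_HREF = re.compile(r"""href=(['"])/media/([^/'"]+)\1""")


def rewrite_media_links(html, post_slug, media_base="/media"):
    """Point bare /media/<file> links at the post's own media directory."""
    def add_slug(match):
        quote, filename = match.group(1), match.group(2)
        return f"href={quote}{media_base}/{post_slug}/{filename}{quote}"
    return MEDIA_HREF.sub(add_slug, html)


if __name__ == "__main__":
    before = "<a href='/media/foo.jpg'>screenshot</a>"
    print(rewrite_media_links(before, "post_slug"))
```

Links that already contain a subdirectory don’t match the pattern, so running the rewrite twice leaves them untouched.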

Pick Your Poison

File-based blogs aren’t popular. For many they’re going to be a square peg in a round hole. But if your blog isn’t complicated or updated often then they offer a simplicity that something like Wordpress can’t.

July 29, 2008


❧ Can

Can is the blogging engine I’ve rolled for my site, with a generous tip o’ the hat to Steven Frank. It’s almost unfair to call this an “engine”: really it’s just 100 lines of Python (and not very good Python, either). Whether that’s an indication of Python’s awesomeness or just my laziness I’ll leave to you. Either way it does what I need it to do:

It’s far from where I want it to be. The templating system sucks. There’s no good way of saving media for posts. It doesn’t spit out an RSS feed. But it has one thing going for it: it’s on Launchpad.net. Branch it, add junk, and merge it back in. Or don’t.

This means that I’ve (again) broken all URLs to my site. I’ve noticed that – because I hardly write anything – I can remove (almost) all unnecessary cruft in the URLs. Before and after:

http://aneviltrend.com/blog/articles/2008/05/06/can

http://aneviltrend.com/archive/can.html

I’m still slapping myself for that /blog/articles bit of the URL. Tim Berners-Lee, the guy who coded the first web browser, penned an excellent article on the art of beautiful URLs.

Where do I see can going from this point forward? The templating system needs an overhaul. I have three template files describing a base page layout with about one line of difference between each, in flagrant violation of DRY principles. The backend for the templates is a hack as well: searching for known strings and using re.sub() to insert HTML. I feel that a cleaner approach would use Python’s built-in DOM support.

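Media support just doesn’t exist. At this point my old posts with screenshots are pulling graphics from a temporary directory I set up. But the approach I’d like is something like this:

  • Publishing script creates a directory per post in a specified media base directory.
  • Drop any media corresponding to a particular post into said directory.
  • The post source uses a flag – something like class='media' – in links that reference local media. The publishing script looks for these in posts and prepends the post’s media directory to the link.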

It seems a bit excessive – why not just hard code in the link? – but this approach seems to be the cleanest from a “writing the post” view: I don’t need to worry about what the generated slug will be (since that will likely be the name of the media directory) and it’s clean markup.

May 15, 2008


❧ Web Presence

I’ve been signing up for an alarming number of web apps lately. Nearly every site that I visit asks me to put down my name before it’ll let me in. And, sucker that I am, I tend to use my real name.

spreading thin

Where am I on the net?

Quite a list. I’m making no claims here though: you may have more or less. The point is, though, that our every move on the web is captured. If I post to the Plan 9 mailing list, Google will list that post as my top search result. What if that post was a nasty reply? What if it was just plain stupid? That post is archived by hundreds of sites. It’s not getting lost.

being careful

Granted it’s easy to simply not care. So what if a Google search of my name turns up someone who trolls forums and pesters mailing lists? The easy answer: that it really doesn’t matter. How many people search for me online? How many would, if they saw those posts, even know me? Will these sites even be around ten years down the road?

And if you don’t use your real name then that answer might suffice. The anonymity of the internet makes it easy to be multiple people. But I’d like to focus on those that are trying to cultivate a presence. Just like the “real world” your name on the internet carries weight. It carries your image. And with how prominent the internet has become it’s beginning to carry a significant amount of your identity.

I believe that we take these online personas for granted. With every web app that gets released we have another opportunity to create yet another identity. Unlike our analog counterpart that forgets and is forgotten, the web identity you create is permanent. The internet, cruel mistress that she is, will never lose that terrifically embarrassing photo. Or video. Or post.

March 01, 2008


❧ A Cursory Look at Makefiles

New to Makefiles? At best they’re confusing, and at worst completely incomprehensible. Here’s a dissection of a simple Makefile.

structure

The basic format of a Makefile follows this:

<targets> : <dependencies>
    <commands>

Multiple entries like this make up a larger Makefile. Each command line must begin with a tab character, not spaces. A simple Makefile to compile a helloworld.c program into an object file might look something like this:

helloworld.o : helloworld.c
    gcc -c -o helloworld.o helloworld.c

The target is helloworld.o, and it depends on having helloworld.c. The -c flag tells gcc to stop after producing the object file rather than linking an executable. Running the following command:

$ make helloworld.o

Will cause gcc to be run as specified in the Makefile.

practical makefiles

Developing a Makefile that might actually be used in a smallish software project involves a bit more work. Generally speaking, the project will consist of several – in this case – .c files, which will need to be linked in interesting ways against each other.

% is a pattern wildcard, and $@ and $^ are automatic variables. Here’s how they might be used:

%.o : %.c
    gcc -c -o $@ $^

The % grabs the matching string from the target and applies it to the dependency. If the target is foo.o, make will search for foo.c. The next variable, $@, expands to the name of the target. Likewise, $^ expands to the list of dependencies.

Thus running

$ make helloworld.o

will, as above, run gcc -c -o helloworld.o helloworld.c. The % will match “helloworld”, the $@ expands to helloworld.o, and the $^ expands to helloworld.c.
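To make the substitution concrete, here’s a small Python sketch that mimics how a single pattern rule gets expanded for one target. It’s a toy model rather than how make is implemented: the function name is made up, it handles only one dependency, and the recipe string is just an example.

```python
def expand_pattern_rule(target_pattern, dep_pattern, recipe, target):
    """Expand a single-dependency pattern rule for a concrete target.

    Returns (dependency, command), or None if the target does not match.
    """
    prefix, suffix = target_pattern.split("%", 1)
    if not (target.startswith(prefix) and target.endswith(suffix)):
        return None
    if len(target) <= len(prefix) + len(suffix):
        return None  # the stem matched by % must be non-empty
    stem = target[len(prefix):len(target) - len(suffix)]
    dependency = dep_pattern.replace("%", stem, 1)
    # $@ becomes the target, $^ becomes the (single) dependency.
    command = recipe.replace("$@", target).replace("$^", dependency)
    return dependency, command


if __name__ == "__main__":
    print(expand_pattern_rule("%.o", "%.c", "gcc -c -o $@ $^", "helloworld.o"))
    # A target that doesn't end in .o simply doesn't match the rule.
    print(expand_pattern_rule("%.o", "%.c", "gcc -c -o $@ $^", "helloworld.h"))
```

The stem (“helloworld” here) is the only thing carried over from the target to the dependency.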

Most Makefiles will also define some standard targets, such as clean:

clean :
    rm -f *.o

That covers the basics. For additional resources on Makefiles check out:

February 12, 2008


❧ a variation on mips

Parallel ISA (PISA) Overview

Uses PC-indirect addressing to specify ‘registers.’ Meant to easily enable out-of-order execution and multiple-issue logic in hardware. Otherwise exactly like MIPS. Pronounced as “pizza.”

The modifications are to the source and destination registers. Source registers are not addressed directly, but rather by providing the PC offset to the instruction whose result will be used. The destination register field is removed.

Example:

addiu   $0, 5   ; x = 5
addiu   $0, 4   ; y = 4
add     -1, -2  ; adds x + y

Hardware can now easily determine that the first two instructions – both immediate loads – can execute in parallel, or out of order, but that the addition depends on their results. The add cannot be issued until both loads complete. Essentially, the ISA makes dependencies between instructions very clear.
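Here’s a toy Python model of the three-instruction example, just to show how the offsets resolve. The tuple encoding and function names are invented for illustration, and it only understands the two opcodes used above.

```python
def run_sequential(program):
    """Evaluate straight-line PISA code; results[pc] is instruction pc's result."""
    results = []
    for pc, (op, a, b) in enumerate(program):
        if op == "addiu":
            # a is the literal $0 (always zero), b is the immediate.
            value = 0 + b
        elif op == "add":
            # Operands are PC offsets: -1 is the previous instruction's result.
            value = results[pc + a] + results[pc + b]
        else:
            raise ValueError(f"unsupported op: {op}")
        results.append(value)
    return results


def dependencies(program):
    """For each instruction, the set of PCs whose results it reads."""
    return [{pc + a, pc + b} if op == "add" else set()
            for pc, (op, a, b) in enumerate(program)]


PROGRAM = [
    ("addiu", "$0", 5),   # x = 5
    ("addiu", "$0", 4),   # y = 4
    ("add", -1, -2),      # x + y
]

if __name__ == "__main__":
    print(run_sequential(PROGRAM))   # [5, 4, 9]
    print(dependencies(PROGRAM))     # [set(), set(), {0, 1}]
```

The dependency sets fall straight out of the encoding: the two immediate loads read nothing, and the add reads exactly the two instructions before it.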

Register File

Essentially a “cache” of registers. Because each entry needs to keep track of the PC of the instruction that wrote it, the register file needs to keep a tag record for each entry. This will incur a much higher hardware overhead in the register file. A direct-mapped approach is used to reduce this penalty.

To support parallel operation two additional bits are needed for each entry: a valid bit and a pending bit.

Non-sequential code

This approach works well for sequential code, but begins to break down when loops and branches are used. Take this example:

addiu   $0, 1
addiu   $0, 5       ; x = 5

loop:
blez    -1, done    ; if( x==0 ) goto done
subi    -2, -3      ; x = x-1
j       loop

done:

The blez branch, because it only checks the result of the addiu $0, 5 instruction, will never be taken. The subi instruction stores its result at a +1 offset from the blez, which the branch does not check.

This issue gives us the following addition to the instruction set. The rd field of the MIPS ISA, previously unused in this implementation, will now be used to store an optional destination entry in the register file. The above example can be rewritten as:

addiu   $0, 1
addiu   $0, 5, +2   ; x = 5, store into PC+2

loop:
blez    +1, done    ; if( x==0 ) goto done
subi    0, -3       ; x = x-1
j       loop

done:

The compiler should make the branch condition inspect the entry written by the last instruction within the loop that assigns the value in question. The last instruction before the loop that assigns that value should then be compiled to write to that same entry, so that both writers target the same PC.

How does this affect the parallel operation of the processor? It should have no effect on the operation and should require minimal additional hardware. Previously, without the destination register field, instructions were essentially writing to offset 0. All that has changed is that instructions can now write to other offsets. The valid and pending bits will still ensure correct parallel operation of superscalar implementations.
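A small Python sketch can show the difference between the two loop listings. This is a toy interpreter, not a hardware model: the tuple encoding, the label table, and the step limit are assumptions, and subi follows the listings in treating both operands as PC offsets.

```python
def run(program, labels, max_steps=100):
    """Run a tiny PISA subset; slots[pc] holds the value written to entry pc.

    Returns (slots, halted); halted is False if max_steps ran out first.
    """
    slots = {}
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, a, b, dest = program[pc]
        if op == "addiu":            # a is $0, b is the immediate
            slots[pc + dest] = b
            pc += 1
        elif op == "subi":           # both operands are PC offsets
            slots[pc + dest] = slots[pc + a] - slots[pc + b]
            pc += 1
        elif op == "blez":           # a is a PC offset, b is a label
            pc = labels[b] if slots[pc + a] <= 0 else pc + 1
        elif op == "j":              # a is a label
            pc = labels[a]
        else:
            raise ValueError(f"unsupported op: {op}")
    return slots, pc >= len(program)


LABELS = {"loop": 2, "done": 5}

# First listing: every instruction writes to its own entry (offset 0).
BROKEN = [
    ("addiu", "$0", 1, 0),
    ("addiu", "$0", 5, 0),
    ("blez", -1, "done", 0),
    ("subi", -2, -3, 0),
    ("j", "loop", None, 0),
]

# Second listing: x is stored at PC+2, the same entry subi updates.
FIXED = [
    ("addiu", "$0", 1, 0),
    ("addiu", "$0", 5, 2),
    ("blez", 1, "done", 0),
    ("subi", 0, -3, 0),
    ("j", "loop", None, 0),
]

if __name__ == "__main__":
    print(run(BROKEN, LABELS))   # never reaches done: ({0: 1, 1: 5, 3: 4}, False)
    print(run(FIXED, LABELS))    # counts down to zero: ({0: 1, 3: 0}, True)
```

In the first version the branch keeps reading the stale 5, so the loop only stops when the step limit runs out; in the second, the branch and the subtraction share one entry and the loop exits after five iterations.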

This approach has the disadvantage of being taxing on compilers. Whether this proves to be a major issue or not will need to be seen.

Disadvantages

This approach suffers from another disadvantage: the inability to reference the result of an instruction that is at a greater offset than N, where N is the register file size. Because the register file uses the PC of an instruction to store entries, an instruction more than an offset of N away from another cannot use the result of the latter. This is an inherent limitation of the instruction set.
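The limitation is easy to see in a sketch of a direct-mapped register file. This one assumes entries are indexed by PC modulo the file size, which the description above doesn’t actually specify; the class and method names are likewise made up.

```python
class RegisterFile:
    """Direct-mapped store of instruction results, indexed by PC mod size."""

    def __init__(self, size):
        self.size = size
        self.tags = [None] * size     # PC of the instruction that wrote each entry
        self.values = [None] * size

    def write(self, pc, value):
        index = pc % self.size
        self.tags[index] = pc         # overwrites whatever lived here before
        self.values[index] = value

    def read(self, pc):
        """Return the result written for pc, or None if it has been evicted."""
        index = pc % self.size
        if self.tags[index] != pc:
            return None
        return self.values[index]


if __name__ == "__main__":
    rf = RegisterFile(4)
    for pc in range(5):               # instructions 0..4 each write a result
        rf.write(pc, pc * 10)
    print(rf.read(1))                 # 10: still within reach
    print(rf.read(0))                 # None: instruction 4 reused entry 0
```

With four entries, the result of instruction 0 is gone as soon as instruction 4 writes, which is the “offset greater than N” problem in miniature.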

A workaround is to write a value that must be accessed later to memory. This might lead to an increased number of memory accesses throughout the program, which will lead to decreased performance.

Another solution is simply to increase the size of the register file. Though this would lead to increased hardware costs and might lead to longer delays in register file reads, this would solve the problem somewhat.

Interestingly, the problem could also be alleviated by adding a second layer register file cache, much like adding a second layer data cache. This would offer the benefits of making register entries available for longer periods, but has the disadvantage of making program flow harder to predict. A compiler would need to keep track of the simulated cache state to determine if a register entry will still be available at a later point in the program.

January 14, 2008


❧ The Assembler

My friend Chris and I recently finished our USB project. I was trying to think of a good way to present this in a post, and decided to highlight one small part of the project: the assembly that drives the lowest layers of the software stack.

Old School

Last semester I had hacked up the assembly for our project without much foresight or care for elegance. We were more preoccupied with trying to grasp the 650-page spec that is USB, and beautiful code was the least of our worries.

Though the code didn’t look good, it still had to be good, and here’s what it had to accomplish:

This becomes a tall task when you only have 9 cycles to work with (one cycle is needed to output on the I/O pins). Let’s look at what the assembly I wrote last semester for transmitting a packet looks like:

#define SIE_TOKEN_BIT
    mov r20, r3
    andi r20,0x01
    add r20, r10
    out %1, r20
    lsr r3
    subi r16, 1
    brne .+4
    jmp .sie_send_token_eop

/* Token: Send bit, and nop until next one */
#define SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT
    NOP2

/* Token: Send bit, and buffer another byte */
#define SIE_TOKEN_BIT_BUFFER
    SIE_TOKEN_BIT
    ld r3, X+

The first thing to note is that all loops were unrolled in this assembly. To send a byte, we had this macro:

#define SIE_TOKEN_BYTE
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_NOP
    SIE_TOKEN_BIT_BUFFER

This would literally copy and paste in the assembly above eight times. And that wasn’t even the worst of it: because the loops were unrolled, we had to copy in enough loop iterations for the worst-case scenario. For the data packet transmit code – which at most could send 103 bits – we had the byte macro copied in 13 times. This equated to about 936 lines of code – for just one small part of the SIE code. The sheer size of this hurt us; our code weighed in at about 20 KB when compiled. On a device with only 32 KB of program flash memory this becomes a bit of a problem (considering that our code was intended as a companion library to an existing user program).
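For the record, the 936 figure works out as follows, assuming each bit costs nine lines: the eight instructions of SIE_TOKEN_BIT plus either NOP2 or the ld, each counted as one line.

```python
import math

LINES_PER_BIT = 9        # 8 instructions in SIE_TOKEN_BIT plus NOP2 or ld
BITS_PER_BYTE = 8
MAX_DATA_BITS = 103      # worst-case data packet length

# 103 bits don't fit in 12 bytes, so the byte macro is pasted 13 times.
byte_macros = math.ceil(MAX_DATA_BITS / BITS_PER_BYTE)
total_lines = byte_macros * BITS_PER_BYTE * LINES_PER_BIT

print(byte_macros, total_lines)   # 13 936
```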

The assembly above shouldn’t be too confusing, but a few notes are in order.

Each line is explained in order:

  1. mov r20, r3

    Copies the buffer register into the temporary register.

  2. andi r20,0x01

    Performs an AND operation on the temporary register with a bit mask to extract the lowest bit. This is the bit that will be sent on the bus.

  3. add r20, r10

    Adds the bit to be sent with 5: what this is essentially doing is differentially encoding the signal and setting the enable pin high at the same time. The pin assignments were: enable on pin 2, D+ on pin 1, and D- on pin 0. Thus if the bit in the temporary register is 1, meaning that we should be sending a differential 1, adding 5 will yield 0b00000110. The enable line is set high, as is D+. D- is low.

  4. out %1, r20

    Outputs the value of the temporary register on PORTA.

  5. lsr r3

    Shifts the buffer register down one, getting it ready for when the next bit will be sent.

  6. subi r16, 1

    Decrements the bit count by one.

  7. brne .+4

    When the bit count hits zero, this branch will not be taken.

  8. jmp .sie_send_token_eop

    If the above branch is not taken – meaning that all bits have been sent – then the code will jump to the end-of-packet handler.
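The first five steps can be modeled in a few lines of Python. This sketch covers only the value written to the port for each bit; it ignores timing, the bit counter, and the end-of-packet jump, and the names are invented for illustration.

```python
ENABLE_AND_D_MINUS = 0b101   # the constant 5 held in r10


def token_bit_outputs(byte, bit_count=8):
    """Yield the port value written for each bit of byte, lowest bit first."""
    buffer = byte                            # r3
    for _ in range(bit_count):
        temp = buffer & 0x01                 # mov r20, r3 / andi r20, 0x01
        temp += ENABLE_AND_D_MINUS           # add r20, r10
        yield temp                           # out %1, r20
        buffer >>= 1                         # lsr r3


def pins(value):
    """Split a port value into (enable, d_plus, d_minus)."""
    return (value >> 2) & 1, (value >> 1) & 1, value & 1


if __name__ == "__main__":
    for value in token_bit_outputs(0b0101, bit_count=4):
        print(f"{value:#05b}", pins(value))
```

A 0 bit leaves the 5 untouched (enable and D- high), while a 1 bit carries into pin 1 and gives 6 (enable and D+ high), so a single add drives both data lines to opposite levels.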

Repeat this seven times, and add a load instruction on the eighth, and you have the complete workings of last semester’s assembly. It worked, true, but it was gross; and with an entire semester to rework stuff I decided to sit down and hammer out some nice code.

New School

We came into this semester knowing that USB with a Mega32 is indeed possible. We also knew what USB was. We figured that a full code overhaul would be in order, and there’s no better place to start than at the bottom.

The assembly, from above, was completely tossed. Little by little we came up with our new assembly – replete with rolled-up loops and clever hacks. These changes required some small modifications to the hardware, but nothing major.

Here’s the code that made it into our final revision:

#define TX_PACKET(label,mem_pointer,bit_count_reg)
    mov     r5, __zero_reg__
    ldi     r20, 0x01

    /* buffer */
    .sie_#label_tx_buffer:
    ld      r10, #mem_pointer+

    /* bit tx */
    .sie_#label_tx_bit:
    lsl     r10
    rol     r20
    out     %4, r20

    /* completion checks */
    dec     #bit_count_reg
    breq    .sie_#label_tx_done

    add     r5, r3
    brcs    .sie_#label_tx_buffer

    ldi     r20, 0x01
    rjmp    .sie_#label_tx_bit

    /* done */
    .sie_#label_tx_done:

I could try to explain all the assembly here, but I’d be repeating an entire chapter of our documentation on the project. Chapter 4 of the documentation covers each line of the assembly and how it works. Check it out, or try and figure out what the assembly is doing on your own. Should you choose to do that, know that:

Explanation aside, the punchline is that nearly 1000 lines of assembly in our previous project got replaced with just 12 lines of cleverness.

January 07, 2008


❧ Everyday Linux: rsync

I have two computers, both of which I like to listen to music on: my laptop, for when I’m in class (ahem, studying), and my desktop when I’m in my room. I use iTunes on my laptop and Amarok on my desktop. You can see that I might have a few issues when syncing my music between the two computers. How I do it is today’s Everyday Linux.

rsync

This is my typical scenario: I’m on campus, and I’ve just got some music from a site like soul sides or Amazon’s sweet new music store. I import my music and listen to it on iTunes. Later I get back to my room, and would like to listen to my new music with the better speakers hooked up to my desktop. How do I sync up my music?

I’m typically nowhere near my desktop, and don’t have access to it, when I get new music. So I need a way to copy whatever music I have over to my desktop. But why not just copy over the music manually? Well, I could, but that’s not the cool way to do it.

The more practical reason is that iTunes on my laptop organizes my music as it sees fit, and I’d rather not have to traverse arcane directory names in order to get to the folder that I want to copy over. Amarok, fortunately, is much more forgiving with its organization, and will put up with the structure that iTunes uses. I also prune my music collection from time to time, and would rather not have to track what changes I make to do the same on my desktop. In short: I need rsync so I have a drop-dead simple syncing system for my music.

rsync is a utility that synchronizes files and directories. They can be two local directories, or one local and one remote; it doesn’t matter. All it takes is one terminal command (granted, you’ll probably spend some time perfecting this command). The other caveat is that (if you’re using rsync with remote computers) rsync needs to be installed on the remote end as well, and if you want to talk to it directly rather than go over ssh you’ll need to set up the rsync daemon there.

Implementation

The first step was to get my desktop set up for rsync. I created a music folder in my home directory to begin, then set up the rsyncd.conf file in my /etc directory.

To set up rsync on my Ubuntu system I followed along at the ubuntu guide entry and at another excellent page. Here are some of the highlights:

The more exciting parts of this file look like this on my system:

[musicbackup]

path = <home>/music
comment = the music backup location
secrets file = /etc/rsyncd.secrets

Not bad at all. The rsyncd.secrets file is next:

<user>:<password>

For me it’s just one line: my name and password.

Now the fun part is on my laptop. This is the rsync command that I use to do a one-way sync from my laptop to my desktop.

rsync --verbose --progress --stats --compress --rsh=/usr/bin/ssh \
    --recursive --times --delete \
    --exclude "Apple" \
    --exclude "Movies" \
    ...
    --exclude "Video" \
    <home>/Music/iTunes/iTunes\ Music/* \
    devrin@<desktop ip>:music

The options I specify include:

It would be a real pain to have to type out this entire command each time I wanted to sync, so I copied it into its own bash script that I called music_backup.sh. Now each time I want to back up I just invoke my little script:

$ ./music_backup.sh

And my music gets synced up. Not bad! You’ll want to read up on the documentation I linked to above to get a better feel for how to use rsync to accomplish your goals. There are a few steps there that I didn’t cover but that should be fairly trivial to do.
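If you’d rather not keep the command in a shell script, here’s one way the same thing might look in Python, with a comment on what each option does. The paths, user, and host are placeholders, and the exclude list is shortened to the three names shown above.

```python
import shlex
import subprocess

# Hypothetical values: substitute your own paths, user, and host.
SOURCE = "/Users/devrin/Music/iTunes/iTunes Music/"
DESTINATION = "devrin@desktop.example:music"
EXCLUDES = ["Apple", "Movies", "Video"]


def build_rsync_command(source, destination, excludes):
    """Return the rsync argument list for a one-way sync of source's contents."""
    command = [
        "rsync",
        "--verbose", "--progress", "--stats",  # report what is transferred
        "--compress",                           # compress data in transit
        "--rsh=/usr/bin/ssh",                   # use ssh as the transport
        "--recursive",                          # descend into subdirectories
        "--times",                              # preserve modification times
        "--delete",                             # remove files gone from the source
    ]
    for name in excludes:
        command += ["--exclude", name]
    return command + [source, destination]


def run_sync():
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(SOURCE, DESTINATION, EXCLUDES), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is deleted by accident.
    print(shlex.join(build_rsync_command(SOURCE, DESTINATION, EXCLUDES)))
```

Passing the arguments as a list means the space in “iTunes Music” needs no escaping. The trailing slash on the source stands in for the shell’s * glob, though the two are not quite identical: the slash form also picks up hidden files and lets --delete prune top-level entries.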

All in all, rsync is a great system. It works perfectly for what I need it to do, and I’m sure that a lot of people have some sort of syncing problem that could be solved elegantly with rsync.

November 05, 2007