IP law is interesting with regard to digital files. Every digital file is a stream of binary digits, so every file just represents a number (and the larger the file, the larger the number). I'm going to go over a few basics of math and computers to lay a foundation for this thought exercise, but feel free to skim the next few paragraphs if you're already familiar with them.
The numbers most of us are familiar with are written in decimal (base 10), using digits that range from 0 to 9, so a stream of decimal digits could be the number 42: the digit 4 followed by the digit 2.
What we think of as 42 can be represented in binary as 101010. The number looks a fair bit bigger, but it still represents 42.
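Python's built-in conversions make the point quickly: the value stays the same, and only the set of digits used to write it changes. This is a minimal sketch using only built-ins.

```python
# The same number, written in two bases.
n = 42
as_binary = format(n, "b")   # binary digits, without Python's "0b" prefix
print(as_binary)             # 101010

# Going the other way: read the string "101010" as a base-2 number.
print(int("101010", 2))      # 42
```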
File sizes are typically described in "bytes," which are groups of 8 binary digits (bits). This is very similar to how we often split decimal digits into groups of 3, putting a comma between each group (one thousand and forty-two can be written 1,042). It doesn't change the number; it's just an easier way to read it.
People also tend to convert binary digits into hexadecimal (base 16), which represents numbers with the digits 0-9 and then A-F for decimal 10 through decimal 15. This turns out to be quite easy, because every group of 4 binary digits translates into exactly one hexadecimal digit (2^4 = 16), so each 8-digit byte always becomes two hexadecimal digits. So our binary version of decimal 42, which is 101010 (or 0010 1010 when padded out to a full byte), becomes 2A.
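Here's a small Python sketch of that grouping done by hand: pad 42 out to a full byte, split it into two 4-bit halves, and turn each half into one hex digit. The last line is the built-in shortcut that does the same thing.

```python
n = 42
byte_bits = format(n, "08b")                # pad to one full byte: "00101010"
high, low = byte_bits[:4], byte_bits[4:]    # split into two 4-bit halves
# Each 4-bit half is a value from 0 to 15, i.e. exactly one hex digit.
hex_digits = format(int(high, 2), "X") + format(int(low, 2), "X")
print(byte_bits, high, low, hex_digits)     # 00101010 0010 1010 2A
print(format(n, "02X"))                     # 2A
```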
Computers encode letters as numbers using a simple code. One of the most common is ASCII, which translates 128 different symbols into numbers (extended variants of ASCII cover 256). To translate a character to a number or a number to a character, you look it up in the chart. For example, 42 represents the character "*".
The chart has all of the letters in both upper and lower case, the decimal digits 0-9, many different forms of punctuation, and some other special characters that we won't go into further.
| A subset of the ASCII chart |
But we can use the chart to translate a word into a pure number. The word "LIFE" translates into the hexadecimal number 4C 49 46 45, and into the binary number 100 1100 0100 1001 0100 0110 0100 0101. Both of these convert to the decimal number 1279870533.
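A short Python sketch of the same translation, using only built-ins. The one assumption is byte order: reading the bytes "big-endian" (first character = most significant byte) is what produces 1279870533; reading them in the opposite order would give a different number. The `sep` argument to `bytes.hex` needs Python 3.8 or later.

```python
# ASCII maps each character to a number, and back.
print(ord("*"))   # 42
print(chr(42))    # *

word = "LIFE"
data = word.encode("ascii")                           # one byte per character
hex_form = data.hex(" ").upper()                      # bytes as hex pairs
bin_form = " ".join(format(b, "08b") for b in data)   # bytes as 8-bit groups
number = int.from_bytes(data, "big")                  # all four bytes as one number
print(hex_form)   # 4C 49 46 45
print(bin_form)   # 01001100 01001001 01000110 01000101
print(number)     # 1279870533

# And back again: the number turns into the word.
print(number.to_bytes(4, "big").decode("ascii"))   # LIFE
```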
So this explains how we can turn a piece of text, such as this blog post, into a number. Other formats just need a similar trick. A program is also a stream of digits, made up of machine instructions called "opcodes." To translate those instructions into numbers, you use a similar chart (or, more likely, a tool such as a compiler that translates something easier to work with into these codes).
| A subset of the Intel instruction set |
A dumb geeky trick I used to do was to type, at a DOS prompt, a set of instructions that printed out a message. I had memorized the opcodes to do that, which only worked because the set was very short. Here's a video of someone else doing the silly party trick:
That program is under 30 bytes long, which makes it a number you can reasonably type out by hand.
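To make that concrete, here is a hypothetical 11-byte DOS .COM program written out as raw bytes, with a comment naming the 8086 instruction each group of bytes encodes. It is not the program from the video or the one described above; it's a sketch that assumes DOS's INT 21h function 09h (print a '$'-terminated string at DS:DX) and the convention that .COM programs are loaded at offset 0100h. The Python only builds the bytes and reads them as one number; it doesn't run them.

```python
# A hypothetical DOS .COM program that prints "Hi".
# .COM files load at offset 0x100; the 8 code bytes come first, so the message is at 0x108.
program = bytes([
    0xB4, 0x09,        # MOV AH, 09h   ; DOS function 09h: print a '$'-terminated string
    0xBA, 0x08, 0x01,  # MOV DX, 0108h ; message address (operand stored little-endian)
    0xCD, 0x21,        # INT 21h       ; call DOS
    0xC3,              # RET           ; exit back to DOS
]) + b"Hi$"            # the message itself, terminated by '$'

print(len(program))                      # 11
print(program.hex(" ").upper())          # B4 09 BA 08 01 CD 21 C3 48 69 24
print(int.from_bytes(program, "big"))    # the whole program as one (large) number
```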
In a similar fashion, graphics files are just representations of what you see on the screen. Generally, they contain a header that describes the image, often followed by a table that sets up an index saying "this color is represented by this number," and then a list of which color makes up each dot of the picture (using those color numbers).
| The raw details of a graphic file |
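Here's a sketch of that layout using a made-up format. The `TOYIMG` magic bytes, the field sizes, and the helper names `encode_toy_image`/`decode_toy_image` are all hypothetical, not a real standard like BMP or GIF: a header with the dimensions and palette size, a palette table mapping index numbers to RGB colors, then one palette index per pixel.

```python
import struct

def encode_toy_image(width, height, palette, pixels):
    """Pack a palette-based image into bytes (hypothetical TOYIMG format)."""
    # Header: magic, width and height (2 bytes each), number of palette entries (1 byte).
    header = b"TOYIMG" + struct.pack(">HHB", width, height, len(palette))
    table = b"".join(bytes(rgb) for rgb in palette)   # 3 bytes (R, G, B) per color
    body = bytes(pixels)                              # one palette index per pixel
    return header + table + body

def decode_toy_image(data):
    """Read the same bytes back into (width, height, palette, pixels)."""
    if data[:6] != b"TOYIMG":
        raise ValueError("not a TOYIMG file")
    width, height, count = struct.unpack(">HHB", data[6:11])
    table_end = 11 + 3 * count
    palette = [tuple(data[i:i + 3]) for i in range(11, table_end, 3)]
    pixels = list(data[table_end:table_end + width * height])
    return width, height, palette, pixels

# A 2x2 checkerboard: index 0 is black, index 1 is white.
palette = [(0, 0, 0), (255, 255, 255)]
pixels = [0, 1,
          1, 0]
data = encode_toy_image(2, 2, palette, pixels)
print(data.hex(" "))
print(decode_toy_image(data))
print(int.from_bytes(data, "big"))   # the whole picture as one number
```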
In a similar fashion, music and sound files are binary representations too, with a header that sets up some details, followed by samples of the sound wave's amplitude taken at a predetermined rate. Videos are similar, but with a different header and different representations within the file. And we can go down the list of every type of digital file: they are all streams of data with headers, "encoding tags," and various other individual details.
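Python's standard `wave` module can show the "header plus samples" structure directly. This sketch generates 80 samples of a 440 Hz tone at 8,000 samples per second (both arbitrary choices), stores each sample's amplitude as a signed 16-bit integer, and writes a WAV file into memory rather than to disk.

```python
import io
import math
import struct
import wave

RATE = 8000       # samples per second (an arbitrary choice for this sketch)
FREQ = 440.0      # pitch of the tone, in Hz
N_SAMPLES = 80    # 10 milliseconds of sound at this rate

# Each sample is the wave's height (amplitude) at one instant, as a signed 16-bit integer.
samples = [int(20000 * math.sin(2 * math.pi * FREQ * t / RATE)) for t in range(N_SAMPLES)]
frames = struct.pack("<%dh" % N_SAMPLES, *samples)   # little-endian, as WAV expects

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 2 bytes = 16 bits per sample
    w.setframerate(RATE)
    w.writeframes(frames)

data = buf.getvalue()
print(data[:4], data[8:12])   # b'RIFF' b'WAVE': the header identifies the format
print(len(data))              # the header plus 160 bytes of samples
```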
What typically differentiates a sound file from a graphic file from a video from anything else is those headers, and often something like a filename extension.
| MP3 file header |
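A minimal sketch of telling formats apart by their first few bytes (often called "magic numbers"). The signatures below are well-known ones, but the list is deliberately tiny; real tools such as the Unix `file` command rely on a much larger database. `guess_format` is a hypothetical helper name.

```python
def guess_format(data: bytes) -> str:
    """Guess a file's type from its first few bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG image"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "GIF image"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV audio"
    if data.startswith(b"ID3"):
        return "MP3 audio (with an ID3v2 tag)"
    # MPEG audio frames start with 11 "sync" bits that are all set to 1.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (e.g. MP3)"
    if data.startswith(b"MZ"):
        return "DOS/Windows executable"
    if data.startswith(b"%PDF"):
        return "PDF document"
    return "unknown (no recognizable header)"

print(guess_format(b"ID3\x04\x00\x00"))          # MP3 audio (with an ID3v2 tag)
print(guess_format("LIFE".encode("ascii")))      # unknown (no recognizable header)
```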
In fact, some file formats don't even have headers; raw PCM audio, for example, is nothing but samples. So on Windows, if you renamed one of your files to .PCM and opened it in a player that accepts raw PCM audio, it would suddenly play back as sound, even if the file had been a program or a graphic before.
Nothing changed about the file other than its name. You can use Notepad to open just about any file, and it will show up as if it were text, without even changing the extension. You can print that file out, and it's still the same file, even if it was never meant to be printed.
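The same idea in a few lines of Python: one unchanged chunk of bytes, read four different ways. The interpretations are illustrative assumptions; in particular, treating each byte as an unsigned 8-bit sample is only one of the ways a raw-PCM player might be configured (signed samples are also common).

```python
data = b"Hi!"   # three bytes; nothing below modifies them

as_text = data.decode("latin-1")          # what a text editor would show
as_big = int.from_bytes(data, "big")      # one number, first byte most significant
as_little = int.from_bytes(data, "little")  # one number, last byte most significant
as_samples = list(data)                   # unsigned 8-bit audio samples (0-255)

print(as_text)      # Hi!
print(as_big)       # 4745505
print(as_little)    # 2189640
print(as_samples)   # [72, 105, 33]
```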
And this is where the concept of IP for digital files starts to become fascinating, as an intellectual exercise. Remember, every one of those files is just a particularly large number.
Let's say you take this blog post and save it to your hard drive; I still own the copyright. But let's say you rename it to .PCM and then record the sound with a tape recorder. At that point, I don't think that I own the sound you made; I think that you own it. But let's say your method of recording was perfect, and you then take the resulting recording, turn it back into text, and it matches my blog post. I still own that blog post.
Or let's say you take a screenshot of this post and save that to your disk. Do I own the picture? Probably. But let's say you renamed it to the save-file format of one of your favorite games and then took a screenshot of the results. I most likely do not own that screenshot. But let's say you knew that game well enough to figure out what letter every move represented, and from that you were able to deduce the original text... I still own the copyright of the original text.
Do I own a recording of the sound of an old dot-matrix printer printing out my blog post, even if someone could listen carefully and tell what every character was?
There are some interesting things there, and obviously you cannot defeat copyright by putting an MP3 header on the front of something copyrighted and claiming it's just "really cacophonous music."
There was a fair amount of controversy a while back about a key used to decrypt HD DVD and Blu-ray discs, because it was small enough that the number was just 16 bytes: "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0."
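Those 16 bytes really are just one integer. This sketch parses the hex string (`bytes.fromhex` skips the spaces between byte pairs) and reads it as a single big-endian number.

```python
KEY = "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"

key_bytes = bytes.fromhex(KEY)                 # 16 raw bytes
key_number = int.from_bytes(key_bytes, "big")  # the same key as one integer

print(len(key_bytes))            # 16
print(key_number)                # the key written as an ordinary decimal number
print(key_number.bit_length())   # 124 (the leading byte 0x09 has four zero bits on top)
```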
There was a sincere attempt by various companies to prevent the distribution of this key, and similarly of software that could use the key to read the data on those discs. The more DMCA takedown requests they issued to remove the key, the more it popped up elsewhere.
At one point, they attempted to have United States laws regarding the export of munitions applied to it, since those laws had a clause that dealt with encryption. This led many people to practice civil disobedience by sending that string out on blogs or in emails, claiming that they were officially "international arms smugglers."
But, like the text above, it's just a number. Eventually, a few very creative individuals found a rather large prime number whose digits include this key.
Numbers aren't owned by anyone, so there was nothing the various companies could do to prevent the distribution of a prime number (or of a statement like "take digits x through y of that prime number" to get the key).
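The "digits x through y" idea can be sketched by building such a prime ourselves: take the key's decimal digits, append a few extra digits, and search for a prime. This is a hypothetical construction for illustration, not the method or the number the people mentioned above actually used. The primality check is a standard Miller-Rabin test with the first twelve primes as bases, so for numbers this size it establishes a "probable prime" rather than a proof; `is_probable_prime` and `embed_in_prime` are hypothetical helper names.

```python
KEY = "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n: int) -> bool:
    """Miller-Rabin test using SMALL_PRIMES as bases (probable prime, not a proof)."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:          # quick trial division first
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:               # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a proves that n is composite
    return True

def embed_in_prime(digits: str, extra: int = 6) -> int:
    """Find a probable prime whose decimal form starts with `digits`."""
    base = int(digits) * 10**extra
    for suffix in range(1, 10**extra, 2):   # odd suffixes only
        if is_probable_prime(base + suffix):
            return base + suffix
    raise ValueError("no prime found; try a larger `extra`")

key_digits = str(int.from_bytes(bytes.fromhex(KEY), "big"))
prime = embed_in_prime(key_digits)

# "Take digits 1 through len(key_digits) of that prime" recovers the key.
recovered = int(str(prime)[:len(key_digits)]).to_bytes(16, "big")
print(prime)
print(recovered.hex(" ").upper() == KEY)   # True
```

Appending digits keeps the key's digits intact at the front, and because primes are fairly dense (roughly one in every hundred numbers of this size), the search finishes almost immediately.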
So... I guess I'm not really claiming that I own the number ten after all.