Places to Put Things

Variables and data types

So far, we have learned how to print lines of text to the console and how to draw basic shapes on the canvas. You might be able to spend endless joy-filled hours playing just with this, but it doesn't begin to scratch the surface of what computing is about. To move forward, the next thing we should learn about is how to store data.

Variables

A variable in programming is essentially a place with a name where you can store some data. A common way to visualize a variable is as a box with a name. When programmers are feeling pedantic, they use the term identifier to mean “name”. Whatever you put in the box is the value of the variable.

Math versus programming

You probably remember the concept of the variable from mathematics. Variables in programming are very similar but not identical to variables in mathematics. An example of some mathematical expressions using variables is:

$b = 33$
$a = b + 20$

If I asked you what the value of $a$ is, you would probably reply “fifty-three,” and you'd be right. In the above, both $a$ and $b$ are variables—they are expected to hold numbers whose values can change. The fact that the values can change is a central feature of variables in both mathematics and programming.

Let's now look at a Processing program that does what the mathematics example above does.

variables1.pde

void setup () {
  int a;
  int b;
 
  b = 33;
  a = b + 20;
}

The second part of the example probably makes sense to you. It's really just the mathematical expressions with semicolons tacked onto the ends. (Remember that the end of a statement in Processing must be marked with a semicolon.) But you might be confused about the lines that begin with int. Those are there because in Processing, as well as many other languages, you must explicitly state that you intend to use a variable before you use it. In other words, Processing expects you to say, “Hey, I plan to use a variable named <somename>,” before you use it. The statements

int a;
int b;

do exactly that—they are variable declarations. They tell the Processing compiler that you plan to use two variables, one named a and the other named b.¹⁾

Processing's rules for variable declarations say only that you must declare a variable before you use it. This means that you don't have to bundle all declarations together.²⁾ So, in the above example, you could have written:

int b;
b = 33;
 
int a;
a = b + 20;

as well.

The criterion you should use to decide where you declare variables is ease of reading. A sensible default is to place all variable declarations at the top of the chunk of code where they are used so they can easily be found. However, sometimes it makes more sense to declare a variable immediately before you use it or maybe even in some other place (as long as it comes before you use it). Experience will eventually give you a good feel for the best place to put your variable declarations.

So, one major difference between variables in mathematics and in Processing is that you need to declare variables in Processing. Another major difference is that the kind of data you can store in a variable isn't limited to numbers. This is true of most programming languages. We'll discuss this further below when we talk about data types.

Literal constants

In the program above, the values 20 and 33 are not variables; they are literal constants or simply literals. A literal constant is a thing whose value is literally the thing itself. In the example below,

void setup () {
  println("I like cherries");
}

the string "I like cherries" is also a literal constant.

Outputting variables

If you run the program variables1.pde above, it will appear as though nothing happened. That's because we didn't tell Processing to output anything. A tree has fallen in the forest, and no one asked to hear it.

One way we can learn the value of a variable is to output it to the console as in the code below:

variables2.pde

void setup () {
  int a;
  int b;
 
  b = 33;
  a = b + 20;
 
  println(b);
  println(a);
}

or, if you like,

variables2a.pde

void setup () {
  int a;
  int b;
 
  b = 33;
  a = b + 20;
 
  print("The value of b is: ");
  println(b);
  print("The value of a is: ");
  println(a);
}

You can also use variables in places where Processing expects literal constants. Here's an example:

variables3.pde

void setup () {
  int a;
  int b;
 
  b = 33;
  a = b + 20;
 
  rect(0, 0, a, b);
}

This will draw a rectangle whose upper left corner is at the origin of the canvas, whose width is the value of a (53 pixels) and whose height is the value of b (33 pixels). This example shows that we can use variables whose values might be the result of calculations based on other stuff to control how things are drawn. That should seem at least a little bit interesting.

Identifier rules

Up to now, we've been using very simple names for variables. Single letter variable names are typical in mathematics:

$y = mx + b$
$c^2 = a^2 + b^2$
$x = \frac{{ - b \pm \sqrt {b^2 - 4ac} }}{{2a}}$

However, most programming languages let you give variables long names, subject to certain rules. The rules in Processing (inherited from Java) say that an identifier starts with an upper- or lowercase letter (an underscore or dollar sign is also legal but uncommon) and then can be made up of any combination of upper- or lowercase letters, digits, and the underscore character. An identifier cannot start with a digit, and spaces in identifiers are not allowed.

There is also a HolyList® of identifiers that Processing has reserved for its own use, the so-called reserved words. You cannot use reserved words for your own identifiers. For example, int and void are reserved words. A list of Processing's reserved words is in Appendix A.

Finally, Processing is a case-sensitive language, meaning that the identifier myvar is different from the identifiers myVar and MYVAR.
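As a small sketch of the naming rules and case sensitivity described above, the example below declares three variables whose names differ only in case, plus one that mixes letters, an underscore, and a digit. It is written as plain Java (the language Processing is built on) so that it can be run outside the Processing IDE; the class, method, and variable names are made up for illustration.

```java
// Plain Java, not a Processing sketch. All names here are hypothetical.
class CaseSensitivityDemo {
  // Adds up three variables whose names differ only in letter case.
  static int sumOfThree() {
    int myvar = 1;
    int myVar = 2;
    int MYVAR = 3;  // three separate boxes, not three spellings of one box
    return myvar + myVar + MYVAR;
  }

  // Uses an identifier made of letters, an underscore, and a digit.
  static int legalLongName() {
    int rec_width2 = 53;
    return rec_width2;
  }

  public static void main(String[] args) {
    System.out.println(sumOfThree());     // prints 6
    System.out.println(legalLongName());  // prints 53
  }
}
```

If the three names referred to the same variable, the sum would be 9 (the last value stored, counted three times) rather than 6.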

Good names

Knowing these rules, we could rewrite variables3.pde as follows:

variables3a.pde

void setup () {
  int var_one;
  int var_two;
 
  var_one = 33;
  var_two = var_one + 20;
 
  rect(0, 0, var_two, var_one);
}

or,

variables3b.pde

void setup () {
  int rec_width;
  int rec_height;
 
  rec_height = 33;
  rec_width = rec_height + 20;
 
  rect(0, 0, rec_width, rec_height);
}

The only difference in the three versions of the program is the names we have given to the variables; the programs are functionally identical. However, of the three versions a lot of programmers would probably prefer variables3b.pde because the variables' names actually describe what they are used for: they are descriptive identifiers. Using descriptive identifiers isn't a language rule; you are free to use descriptive identifiers or not. However, they make reading and understanding your code much easier. And the easier your code is to read, the easier it will be for someone (even you) to fix and expand it.

Closely related to this are coding conventions—mutually agreed upon standard practices for writing code. Coding conventions are not rules; rather they are standard practices that have emerged over time and/or have been agreed upon ahead of time by a team of programmers. A common coding convention for variable names in Processing is to use CamelCase (i.e., capitalizing the first letter of adjacent words) with the first letter of the first word in lower case. Rewriting the example we have been working with so far using the CamelCase convention would yield:

variables3c.pde

void setup () {
  int recWidth;
  int recHeight;
 
  recHeight = 33;
  recWidth = recHeight + 20;
 
  rect(0, 0, recWidth, recHeight);
}

An alternative convention (used in program variables3b.pde) might be to use all lower case letters for variable names and use the underscore character to separate words. CamelCase is the de facto standard with Processing and what I will use in the remainder of this text.

Data types

We learned above that the lines in our example program

int recWidth;
int recHeight;

are variable declarations—they announce to the Processing compiler that you plan to use variables with names recWidth and recHeight. So, what does int mean? Also, we noted earlier that variables in many programming languages, including Processing, can store data that isn't numeric. How does that work? To answer both of these questions, we need to know about data types.

Most programming languages, including Processing, are typed languages. A typed language is one that differentiates between different types of data: an integer number is one type of data, a character such as 'r' is a different type of data, and so on. Typed languages typically predefine a number of data types and let you define additional types.

In Processing, when you declare a variable, you must also state the type of data you will store in the variable. You are free to change the value of the data in a Processing variable whenever you want. However, you are not free to change the type of data you store in it.³⁾

The syntax for declaring a variable in Processing is:

<data-type-of-variable> <variable-identifier>;

To declare a variable named numberOfApples that will be used to store integer data, you would use the statement:

int numberOfApples;

The identifier int is a special Processing term (a keyword) that is used to identify the integer data type. You might remember from mathematics that an integer is any number (positive, negative, or zero) that has no fractional part (…, -3, -2, -1, 0, 1, 2, …). It has effectively the same meaning in Processing. We'll discuss some of the other data types available in Processing shortly.

When you declare variables in a language like Processing, you are actually doing three things. You are:

Telling the system that you plan to use a chunk of computer memory to store some stuff (i.e., the “box” inside which you store values).
Binding an identifier to the box (i.e., giving the box a name).
Telling the system what type of data it can put into the box.

Built-in data types

To declare variables, you must know what data types will be recognized in your program. Most typed languages define a set of primitive types that you can use anywhere. They may also define additional built-in types that are composites of the primitive types and/or composites of composites.

Processing's primitive data types

Processing defines eight primitive types that you can use anywhere in your programs.⁴⁾ Each is discussed below.

int

The int type is used to represent integer values—any number that has no fractional part (…, -3, -2, -1, 0, 1, 2, …). In mathematics, integer values have no upper or lower limit; in Processing they do. An int cannot be smaller than -2,147,483,648 nor be larger than 2,147,483,647. An int occupies 32 bits in memory. Example:

int numBeers;
numBeers = 99;
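The limits above have a practical consequence: arithmetic that goes past the largest int does not produce an error, it silently wraps around to the smallest int. The sketch below shows this in plain Java (the language Processing is built on), using Java's Integer.MAX_VALUE constant; the class and method names are made up for illustration.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
class IntLimitsDemo {
  // The largest value an int can hold.
  static int largestInt() {
    return Integer.MAX_VALUE;
  }

  // Adds one to the largest int; the result wraps around to the smallest int.
  static int oneMoreThanLargest() {
    int n = Integer.MAX_VALUE;
    n = n + 1;  // no error is reported here
    return n;
  }

  public static void main(String[] args) {
    System.out.println(largestInt());          // prints 2147483647
    System.out.println(oneMoreThanLargest());  // prints -2147483648
  }
}
```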

long

A long is identical to an int except that the limits are wider: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. So why would you want to use int when you've also got long available? Because long variables occupy 64 bits of memory, twice as much as an int. Also, on some platforms, calculations made using int types can be faster than the same calculations made using long. The long type is available for your use in Processing if you need it, but it is not used by any of the functions available in Processing. Example:

long milesLeftToPluto;
milesLeftToPluto = 3100000000L;

When you want to use a long literal constant, you must add the letter 'L' (upper or lower case) after the value.
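To see why the 'L' suffix is needed, note that 3100000000 is larger than the biggest value an int can hold, so it can only be written as a long literal. The following is a minimal sketch in plain Java (the language Processing is built on); the class name and the fitsInInt helper are hypothetical and exist only for this illustration.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
class LongLiteralDemo {
  // The value from the example above, written as a long literal.
  static long milesLeftToPluto() {
    return 3100000000L;
  }

  // Reports whether a value lies within the range an int can hold.
  static boolean fitsInInt(long value) {
    return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(fitsInInt(99L));                 // prints true
    System.out.println(fitsInInt(milesLeftToPluto()));  // prints false
  }
}
```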

byte

A byte is identical to an int except that the limits are smaller: -128 to 127. A byte occupies 8 bits of memory. Example:

byte catLivesLeft;
catLivesLeft = 9;

float

The float type is used to represent floating point numbers, that is, numbers with decimal points in them (e.g., 96.8, -0.00304). The float type uses the IEEE 754 single-precision floating point format and occupies 32 bits of memory. Example:

float zeroToSixtyTime;
zeroToSixtyTime = 7.2;

Floating point numbers and floating point calculations in computing are full of subtle issues that don't show up in pure math. These issues are due to the fact that in pure math real numbers have infinite precision, but in computing the precision is limited.
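One way to see the limited precision at work is to add 0.1 to itself ten times. In pure math the result is exactly 1, but with 32-bit floats it is not. The sketch below is plain Java (the language Processing is built on) rather than a Processing sketch, so float literals carry an f suffix; it also uses a loop, which this text has not covered yet. The class and method names are made up for illustration.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
class FloatPrecisionDemo {
  // Adds 0.1 to a running total ten times using 32-bit floats.
  static float addTenTenths() {
    float sum = 0.0f;
    for (int i = 0; i < 10; i++) {
      sum = sum + 0.1f;  // each addition is rounded to the nearest float
    }
    return sum;
  }

  public static void main(String[] args) {
    float sum = addTenTenths();
    System.out.println(sum);          // very close to 1, but not exactly 1
    System.out.println(sum == 1.0f);  // prints false
  }
}
```

The rounding errors are tiny, but they mean that comparing floats for exact equality is rarely a safe thing to do.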

Now is a good time to point out that there is a difference between the literal constant 3.0 and the literal constant 3. The former specifies a float literal constant while the latter specifies an int literal constant. They are different things.

float floatingpointThree;
int integerThree;
 
floatingpointThree = 3.0;
integerThree = 3;
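The difference between the two kinds of three shows up as soon as you divide. As a hedged illustration in plain Java (the language Processing is built on, where float literals need an f suffix), dividing the int 3 by 2 discards the fractional part, while dividing the float 3.0 by 2 keeps it. The class and method names are made up for this example.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
class LiteralKindsDemo {
  // Both operands are int literals, so the division is integer division.
  static int intThreeDividedByTwo() {
    return 3 / 2;
  }

  // One operand is a float literal, so the division is floating point division.
  static float floatThreeDividedByTwo() {
    return 3.0f / 2;
  }

  public static void main(String[] args) {
    System.out.println(intThreeDividedByTwo());    // prints 1
    System.out.println(floatThreeDividedByTwo());  // prints 1.5
  }
}
```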

double

The double type is identical to the float except that it has double the precision—and double the memory requirements. The double type is available for your use in Processing if you need it, but it isn't used otherwise.

char

The char type is used to represent a single alphanumeric or symbolic character: what you might think of as a single letter, number, question mark, etc. A char occupies 16 bits and uses Unicode encoding, a system that translates characters into numeric codes. Character literal constants are created with single quotes. Examples:

char currencySymbol;
char firstInitial;
char secondInitial;
 
currencySymbol = '$';
firstInitial = 'M';
secondInitial = 'K';

Note that in Processing, 3 (without single quotes) is a literal constant for the integer 3 and that '3' (with single quotes) is a literal constant for the character 3. They are different things.

char digitThree;
int integerThree;
 
digitThree = '3';
integerThree = 3;
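To make the difference concrete, the sketch below converts the character '3' to its numeric character code, which is 51 rather than 3. It is plain Java (the language Processing is built on), and the class and method names are made up for illustration.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
class CharVersusIntDemo {
  // Returns the numeric code that represents the character '3'.
  static int codeOfDigitThree() {
    char digitThree = '3';
    return (int) digitThree;  // convert the char to its numeric code
  }

  // Returns the plain integer three for comparison.
  static int integerThree() {
    int integerThree = 3;
    return integerThree;
  }

  public static void main(String[] args) {
    System.out.println(codeOfDigitThree());  // prints 51
    System.out.println(integerThree());      // prints 3
  }
}
```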

boolean

The boolean type is used to represent data that can have one of two values: true or false. Example:

boolean isDone;
isDone = false;

color

The color type is used to represent colors. Opaque colors can be specified using Web color format (e.g., #E300CC). You can also specify a color with an alpha channel (for transparency) in hexadecimal notation (e.g., 0x99E300CC). Examples:

color darkRed;
darkRed = #880000;
 
color transparentBlue;
transparentBlue = 0x660000FF;
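The hexadecimal form can be read as four two-digit pairs: alpha, red, green, and blue, in that order. The sketch below pulls those pairs apart. It assumes the color is packed into a 32-bit integer in AARRGGBB order, and because plain Java (the language Processing is built on) has no color type, it uses int instead. The class and method names are hypothetical, and the bit operations it uses are not covered in this text.

```java
// Plain Java, not a Processing sketch. Class and method names are hypothetical.
// Assumes a color packed as AARRGGBB in a 32-bit int.
class ColorBitsDemo {
  static int alphaOf(int argb) { return (argb >>> 24) & 0xFF; }  // top 8 bits
  static int redOf(int argb)   { return (argb >>> 16) & 0xFF; }
  static int greenOf(int argb) { return (argb >>> 8) & 0xFF; }
  static int blueOf(int argb)  { return argb & 0xFF; }           // bottom 8 bits

  public static void main(String[] args) {
    int transparentBlue = 0x660000FF;
    System.out.println(alphaOf(transparentBlue));  // prints 102 (hex 66)
    System.out.println(redOf(transparentBlue));    // prints 0
    System.out.println(greenOf(transparentBlue));  // prints 0
    System.out.println(blueOf(transparentBlue));   // prints 255 (hex FF)
  }
}
```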

Processing's composite types

In addition to the above primitive types, Processing pre-defines more complex composite data types and also lets you define your own composite types. We will look into this much later.

Geek break: Why doesn't this stuff happen automatically?

Processing is an example of a statically typed language. In statically typed languages, the type of data that can be stored in a variable cannot change once the variable has been created. The most common way to indicate the data type that can be stored in a variable is through declaration. Other examples of statically typed languages are C, C++, C#, and Java.

Contrasting with this are dynamically typed languages. In dynamically typed languages the type of data that can be stored in the variable is free to change at any time. It can store a number in one statement and a string of characters in the next. Since the type of data stored in a variable can change, there's no need to explicitly state what type of data you plan to shove into the box. Because of this, many dynamically typed languages will automatically create a variable for you the first time you try to store something in it, and they will automatically change the type associated with the variable as needed. Many programmers are attracted to the simpler syntax that dynamic typing permits. Popular dynamically typed languages include JavaScript and Python.

So, dynamically typed languages seem pretty cool. Why don't all languages work that way? One disadvantage of dynamically typed languages is that all the automation associated with them consumes computing resources while the program is running. That means slower execution and/or larger memory requirements. Another, and possibly more important, disadvantage is that dynamic typing can be dangerous. Static typing provides an extra safety net by ensuring that new variables are created only when you really want, rather than because you missspeelled a variable that you already created. It also makes sure that only code that is designed for a specific type operates on that type. For example, it might make perfect sense to divide two numbers, but it doesn't really make sense to divide one string by another string.

These might seem like small issues when you are just starting programming, but you quickly learn to appreciate them when your programs become longer and more complex.

Syntactic sugar

Syntactic sugar is any syntax rule that doesn't introduce new power or concepts into the language but rather makes using existing concepts easier, more compact, or more semantically obvious. Two examples of syntactic sugar in Processing are variable initialization and multiple declarations.

Variable initialization

You will notice in the examples above that there is a pattern that occurs over and over again: first we declare a variable, then we give it some initial value. Because this is such a common practice, many languages define rules for variable initialization. In Processing, variable initialization looks like this:

int foo = 66;

The above statement declares an integer variable named foo and gives it an initial value of 66 in one statement. You can initialize any of Processing's primitive types this way—as long as the value on the right of the equals sign is compatible with the variable's type.

If you try to access the value of a variable without initializing it or otherwise giving it a value, the Processing IDE will complain at you. This is because it is widely considered bad programming practice to rely on the default values given to variables by a language. The reason for this is that many languages do not specify any default values for variables—at least for some categories. In other words, the “default” value given to a variable you declare might be a random and/or arbitrary value.⁵⁾

Multiple declarations

Another piece of syntactic sugar that Processing (and many other languages) defines is the ability to declare more than one variable in one declaration statement. You can do this as long as all the variables in the statement are of the same type. The example below:

char firstInitial, secondInitial;

declares two variables, firstInitial and secondInitial, which are both character types. Notice the comma between the two variable names.

You can even combine variable initialization and multiple declaration in the same statement:

char firstInitial = 'M', secondInitial = 'F', thirdInitial = 'K';

Potential pitfalls

Syntactic sugar can be a tricky thing. On the one hand, syntactic sugar makes it easier to write your programs: it increases writability. When done right, it also makes programs easier to read: it increases readability. However, when a language has too much syntactic sugar, the number of ways to express the same idea grows to the point that programs become harder to read, because the reader has to know all the intricacies of all the different possible ways of doing things.

¹⁾

Some programming languages automatically create new variables for you as soon as you use them. Processing isn't one of those.

²⁾

Some popular languages require that you declare all the variables you will use at the very start of the code block in which you will use them. Processing is more flexible on this issue.

³⁾

Some typed languages can automatically change the type of data stored in a variable. In other words, sometimes a given variable might be used to store a number and at another time to store a character or some other type of data. Processing isn't one of those.

⁴⁾

With the exception of color, which is specific to Processing, these are primitive data types used by Java, the language on which Processing is based.

⁵⁾

The language on which Processing is built, Java, does specify default values for certain kinds of variables. However, the kinds of variables we have been using up to now and will use for most of this text do not fit into that category.

Mithat Konar (the wiki)
