====== Places to Put Things ======

//Variables and data types//

So far, we have learned how to print lines of text to the console and how to draw basic shapes on the canvas. You might be able to spend endless joy-filled hours playing just with this, but it doesn't begin to scratch the surface of what computing is about. To move forward, the next thing we should learn about is how to store data.

===== Variables =====

A **variable** in programming is essentially a place with a name where you can store some data. A common way to visualize a variable is as a box with a name. When programmers are feeling pedantic, they use the term **identifier** to mean "name". Whatever you put in the box is the **value** of the variable.

==== Math versus programming ====

You probably remember the concept of the //variable// from mathematics. Variables in programming are very similar but not identical to variables in mathematics. An example of some mathematical expressions using variables is:

$b = 33$\\
$a = b + 20$

If I asked you what the value of $a$ is, you would probably reply "fifty-three," and you'd be right. In the above, both $a$ and $b$ are variables---they are expected to hold numbers whose values can change. The fact that the values can change is a central feature of variables in both mathematics and programming.

Let's now look at a Processing program that does what the mathematics example above does. (Note that if you run the program, you won't see any output because it doesn't print or draw anything. It's just a very simple example to show you some syntax.)

  void setup () {
    int a;
    int b;
    b = 33;
    a = b + 20;
  }

The second part of the example probably makes sense to you. It's really just the mathematical expressions with semicolons tacked onto the ends. (Remember that the end of a statement in Processing must be marked with a semicolon.) But you might be confused about the lines that begin with ''%%int%%''.
Those are there because in Processing, as in many other languages, you must explicitly state that you intend to use a variable before you use it. In other words, Processing expects you to say, "Hey, I plan to use a variable named //such-and-such//," before you use it. The statements

  int a;
  int b;

do exactly that: they are **variable declarations**. They tell the Processing compiler that you plan to use two variables, one named ''%%a%%'' and the other named ''%%b%%''.((Some programming languages automatically create new variables for you as soon as you use them. Processing isn't one of those.))

Processing's rules for variable declarations say only that you must declare a variable before you use it. This means that you don't have to bundle all declarations together. So, in the above example, you could just as well have written:

  int b;
  b = 33;
  int a;
  a = b + 20;

The criterion you should use to decide where you declare variables is ease of reading. A sensible default is to place all variable declarations at the top of the chunk of code where they are used so they can easily be found. However, sometimes it makes more sense to declare a variable immediately before you use it or maybe even in some other place (as long as it comes before you use it). Experience will eventually give you a good feel for the best place to put your variable declarations.

So, one major difference between variables in mathematics and in Processing is that you need to declare variables in Processing. Another major difference is that the //kind// of data you can store in a variable isn't limited to numbers. This is true of most programming languages. We'll discuss this further below when we talk about //data types//.

==== Literal constants ====

In the program above, the values ''%%20%%'' and ''%%33%%'' are not variables; they are **literal constants** or simply **literals**. A literal constant is a thing whose value is literally the thing itself.
In the example below,

  void setup () {
    println("I like cherries");
  }

the string ''%%"I like cherries"%%'' is also a literal constant.

==== Outputting variables ====

If you run the program ''%%variables1.pde%%'' above, it will appear as though nothing happened. That's because we didn't tell Processing to output anything. A tree has fallen in the forest, and no one asked to hear it.

One way we can learn the value of a variable is to output it to the console as in the code below:

  void setup () {
    int a;
    int b;
    b = 33;
    a = b + 20;
    println(b);
    println(a);
  }

or, if you like,

  void setup () {
    int a;
    int b;
    b = 33;
    a = b + 20;
    print("The value of b is: ");
    println(b);
    print("The value of a is: ");
    println(a);
  }

You can also use variables in places where Processing expects literal constants. Here's an example:

  void setup () {
    int a;
    int b;
    b = 33;
    a = b + 20;
    rect(0, 0, a, b);
  }

This will draw a rectangle whose upper left corner is at the origin of the canvas, whose width is the value of ''%%a%%'' (53 pixels) and whose height is the value of ''%%b%%'' (33 pixels).

This example shows that we can use variables whose values might be the result of calculations based on other stuff to control how things are drawn. That should seem at least a little bit interesting.

==== Rules of the name ====

Up to now, we've been using very simple names for variables. Single-letter variable names are typical in mathematics:

$y = mx + b$\\
\\
$c^2 = a^2 + b^2$\\
\\
$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

However, most programming languages let you give variables long names---subject to certain rules.

**TODO verify:** In Processing, an identifier can be as long as you want. It must start with an upper or lowercase letter, the dollar sign ($), or the underscore character (_). Subsequent characters can be upper or lowercase letters, digits, the dollar sign, and the underscore character. Note that spaces in identifiers are not allowed.
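To make the rules above concrete, here is a minimal sketch. It is written as plain Java (the language Processing is built on) rather than as a Processing sketch, so that it can be compiled and run on its own. The class name ''%%IdentifierRules%%'', the method name, and every variable name are made up for illustration; none of them come from Processing itself.

```java
public class IdentifierRules {
    // Declares one variable per naming rule and returns their sum,
    // which shows that each of these names is accepted by the compiler.
    static int sumOfExampleVariables() {
        int lettersOnly = 1;        // starts with a letter
        int _underscoreFirst = 2;   // starts with an underscore
        int $dollarFirst = 3;       // starts with a dollar sign
        int player2Score = 4;       // digits are allowed after the first character

        // The two lines below break the rules and would not compile:
        // int 2ndPlayer = 5;       // starts with a digit
        // int rect width = 6;      // contains a space

        return lettersOnly + _underscoreFirst + $dollarFirst + player2Score;
    }

    public static void main(String[] args) {
        System.out.println(sumOfExampleVariables()); // prints 10
    }
}
```

If you remove the comment markers from either of the two rule-breaking lines, the compiler should refuse to build the program, which is a quick way to check a name you are unsure about.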
There is also a HolyList® of identifiers that Processing has reserved for its own use---so-called **reserved words**. You cannot use reserved words for your own identifiers. For example, ''%%setup%%'' and ''%%int%%'' are reserved words. A list of Processing's reserved words is in Appendix A.

Finally, Processing is a **case sensitive** language, meaning that the identifier ''%%myvar%%'' is different from the identifiers ''%%myVar%%'' and ''%%MYVAR%%''.

==== Good names ====

Knowing these rules, we could rewrite ''%%variables3a.pde%%'' as follows:

  void setup () {
    int var_one;
    int var_two;
    var_one = 33;
    var_two = var_one + 20;
    rect(0, 0, var_two, var_one);
  }

or,

  void setup () {
    int rect_width;
    int rect_height;
    rect_height = 33;
    rect_width = rect_height + 20;
    rect(0, 0, rect_width, rect_height);
  }

The only difference in the three versions of the program is the names we have given to the variables; the programs are functionally identical. However, of the three versions, a lot of programmers would probably prefer ''%%variables3b.pde%%'' because the variables' names actually describe what they are used for---they are **descriptive identifiers**.

You are free to use descriptive identifiers or not. However, they make reading and understanding your code //much// easier. And the easier your code is to read, the easier it will be for someone (even you) to fix and expand it.

Closely related to this are **coding conventions**---mutually agreed-upon standard practices for writing code. Coding conventions are not language rules; rather they are standard practices that have emerged over time and/or have been agreed upon ahead of time by a team of programmers.

A popular coding convention for variable names in Processing is to capitalize the first letter of adjacent words but keep the first letter of the first word in lower case. This format is sometimes referred to as //CamelCase//.
Rewriting the example we have been working with so far using the CamelCase convention would yield:

  void setup () {
    int rectWidth;
    int rectHeight;
    rectHeight = 33;
    rectWidth = rectHeight + 20;
    rect(0, 0, rectWidth, rectHeight);
  }

An alternative convention is often referred to as //snake case//: using all lower case letters for variable names with the underscore character separating words. This format was used in programs ''%%variables3a.pde%%'' and ''%%variables3b.pde%%'' above.

Since CamelCase is the de facto standard with Processing, we will use it in the remainder of this text.

===== Data types =====

We learned above that the lines in our example program

  int rectWidth;
  int rectHeight;

are variable declarations---they announce to the Processing compiler that you plan to use variables with names ''%%rectWidth%%'' and ''%%rectHeight%%''. So, you might be wondering what that ''%%int%%'' means. Also, you might remember that we said variables in many programming languages, including Processing, can store data that isn't numeric. You might be wondering how that works. To answer both of these questions, we need to learn about **data types**.

Processing is what programmers call a **typed language**. A typed language is one that differentiates between different types of data: a number is one type of data, a character such as 'r' is a different type of data, and so on. Typed languages typically predefine a number of data types, and they usually let you define additional custom types.

When Processing creates a variable, that variable will only hold one type of data for its entire life. The value stored in the variable can change, but the type of the value cannot.((Some typed languages can automatically change the type of data stored in a variable. In other words, sometimes a given variable might be used to store a number and at another time to store a character or some other type of data.
Processing isn't one of those.)) So, when you declare a variable in Processing, one of the things you need to do is indicate //the type of data// you will store in the variable.

The syntax for declaring a variable in Processing is the data type, followed by the identifier, followed by a semicolon: //type// //identifier//;

To declare a variable named ''%%numberOfApples%%'' that will be used to store a whole number, you would use the statement:

  int numberOfApples;

The identifier ''%%int%%'' is a special Processing term (a **keyword**) that is used to identify the //integer// data type. You might remember from mathematics that an integer is any number (positive, negative, or zero) that has no fractional part (..., -3, -2, -1, 0, 1, 2, ...). It has effectively the same meaning in Processing. We'll discuss some of the other data types available in Processing shortly.

When you declare variables in a language like Processing, you are actually doing three things. You are:

  * Telling the system that you plan to use a chunk of computer memory to store some stuff (i.e., the "box" inside which you store values).
  * Binding an identifier to the box (i.e., giving the box a name).
  * Telling the system what type of data it can put into the box.

==== Built-in data types ====

To declare variables, you must know what data types will be recognized in your program. Most typed languages define a set of primitive types that you can use anywhere. They may also define additional built-in types that are composites of the primitive types and/or composites of composites.

==== Processing's primitive data types ====

Processing defines eight primitive types that you can use anywhere in your programs.((These are the same primitive data types used by Java, the language on which Processing is based.)) Each is discussed below.

=== int ===

The ''%%int%%'' type is used to represent integer values---any number that has no fractional part (..., -3, -2, -1, 0, 1, 2, ...). In mathematics, integer values have no upper or lower limit; in Processing they do.
An ''%%int%%'' cannot be smaller than -2,147,483,648 or larger than 2,147,483,647. If you're curious about such things, an ''%%int%%'' occupies 32 bits in memory.

Example:

  int numBeers;
  numBeers = 99;

=== long ===

A ''%%long%%'' is identical to an ''%%int%%'' except that the limits are wider: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. So why would you want to use ''%%int%%'' when you've also got ''%%long%%'' available? Because ''%%long%%'' variables occupy 64 bits of memory, twice as much as an ''%%int%%''. The ''%%long%%'' type is available for your use in Processing if you need it, but it is not used by any of the native Processing functions.

Example:

  long milesLeftToPluto;
  milesLeftToPluto = 3100000000L;

When you want to use a ''%%long%%'' literal constant, you must add the letter 'L' (upper or lower case) after the value.

=== byte ===

A ''%%byte%%'' is identical to an ''%%int%%'' except that the limits are smaller: -128 to 127. A ''%%byte%%'' occupies 8 bits of memory.

Example:

  byte catLivesLeft;
  catLivesLeft = 9;

=== float ===

The ''%%float%%'' type is used to represent floating point numbers, that is, numbers with decimal points in them (e.g., 96.8, -0.00304). The ''%%float%%'' type uses the IEEE 754 single-precision floating point format and occupies 32 bits of memory.

Example:

  float zeroToSixtyTime;
  zeroToSixtyTime = 7.2;

Now is a good time to point out that there is a difference between the literal constant ''%%3.0%%'' and the literal constant ''%%3%%''. The former specifies a ''%%float%%'' literal constant while the latter specifies an ''%%int%%'' literal constant. They are different things.

  float floatingpointThree;
  int integerThree;
  floatingpointThree = 3.0;
  integerThree = 3;

=== double ===

The ''%%double%%'' type is identical to the ''%%float%%'' except that it has double the precision---and double the memory requirements. The ''%%double%%'' type is available for your use in Processing if you need it, but it isn't used otherwise.
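To give a feel for what "double the precision" means, here is a minimal sketch that computes one third using each type. It is written as plain Java (the language Processing is built on) rather than as a Processing sketch, so that it can be run on its own. The class name ''%%FloatVersusDouble%%'' and the method names are made up for illustration. Note that plain Java requires an ''%%f%%'' after a ''%%float%%'' literal; the sketch assumes nothing about how the Processing IDE treats decimal literals.

```java
public class FloatVersusDouble {
    // One third stored with 32-bit (single) precision.
    static float oneThirdAsFloat() {
        return 1.0f / 3.0f;
    }

    // One third stored with 64-bit (double) precision.
    static double oneThirdAsDouble() {
        return 1.0 / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(oneThirdAsFloat());  // prints 0.33333334
        System.out.println(oneThirdAsDouble()); // prints 0.3333333333333333
    }
}
```

Neither value is exactly one third, but the ''%%double%%'' keeps roughly twice as many correct digits as the ''%%float%%''.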
=== Geek break: Floating point numbers ===

Floating point numbers, and the calculations that use them, are full of subtle issues in computing that don't show up in pure math. As an example, try running this program:

  void setup () {
    double a;
    double b;
    a = 0.1;
    b = 3.0;
    println(a * b);
  }

The laws of mathematics say the program should print 0.3, but the actual result is different. This odd behavior is caused by two things: (1) in pure math, numbers have infinite precision, but in computing floating point precision is limited, and (2) Processing's numbers are coded using a base two system (i.e., //binary//) whereas the numbers we humans use are coded in base ten (i.e., //decimal//).

So, the takeaway from this is that in computing, you should consider floating point numbers as //approximations// to what they are in the math world. For the kinds of things you're likely to do with them in Processing, this aspect of floating point numbers is unlikely to present a problem. However, there are situations where it does.

=== char ===

The ''%%char%%'' type is used to represent a single alphanumeric or symbolic character---what you might think of as a single letter, number, question mark, etc. A ''%%char%%'' occupies 16 bits and uses Unicode encoding, which is a system that translates characters into numeric codes. Character literal constants are created with single quotes.

Examples:

  char currencySymbol;
  char firstInitial;
  char secondInitial;
  currencySymbol = '$';
  firstInitial = 'M';
  secondInitial = 'K';

Note that in Processing, ''%%3%%'' (without single quotes) is a literal constant for the integer 3 and that ''%%'3'%%'' (//with// single quotes) is a literal constant for the character 3. //They are different things//.

  char digitThree;
  int integerThree;
  digitThree = '3';
  integerThree = 3;

=== boolean ===

The ''%%boolean%%'' type is used to represent data that can have one of two values: ''%%true%%'' or ''%%false%%''.
Example:

  boolean isDone;
  isDone = false;

=== color ===

The ''%%color%%'' type is used to represent colors. Opaque colors can be specified using Web color format (e.g., ''#E300CC''). You can also specify a color with an alpha channel (for transparency) in hexadecimal notation (e.g., ''0x99E300CC'').

Examples:

  color darkRed;
  darkRed = #880000;
  color transparentBlue;
  transparentBlue = 0x660000FF;

==== Processing's composite types ====

In addition to the above primitive types, Processing pre-defines more complex composite data types and also lets you define your own composite types. We will look into this much later.

=== Geek break: Why doesn't this stuff happen automatically? ===

Processing is an example of a **statically typed** language. In statically typed languages, the type of data that can be stored in a variable cannot change once the variable has been created. The most common way to indicate the data type that can be stored in a variable is through declaration. Other examples of statically typed languages are C, C++, C#, and Java.

Contrasting with this are **dynamically typed** languages. In dynamically typed languages, the type of data that can be stored in the variable is free to change at any time. It can store a number in one statement and a character in the next. Since the type of data stored in a variable can change, there's no need to explicitly state what type of data you plan to shove into the box. Because of this, many dynamically typed languages will automatically create a variable for you the first time you try to store something in it, and they will automatically change the type associated with the variable as needed. Many programmers are attracted to the simpler syntax that dynamic typing permits. Popular dynamically typed languages include JavaScript and Python.

So, dynamically typed languages seem pretty cool. Why don't all languages work that way?
One disadvantage of dynamically typed languages is that all the automation associated with them consumes computing resources while the program is running. That means slower execution and/or larger memory requirements.

Another reason is that dynamic typing can be dangerous. Static typing provides an extra safety net by ensuring that new variables are created only when you really want them rather than because you missspeelled a variable that you already created. It also makes sure that only code that is designed for a specific type operates on that type. For example, it might make perfect sense to divide two numbers, but it doesn't really make sense to divide one string by another string. These might seem like small issues when you are just starting programming, but you quickly learn to appreciate them when your programs become longer and more complex.

===== Syntactic sugar =====

**Syntactic sugar** is any syntax rule that doesn't introduce new power or concepts into the language but instead just makes using existing concepts easier, more compact, or more semantically obvious. Two examples of syntactic sugar in Processing are variable initialization and multiple declarations.

==== Variable initialization ====

You will notice in the examples above that there is a pattern that occurs over and over again: first we declare a variable, then we give it some initial value. Because this is such a common practice, many languages define rules for **variable initialization**. In Processing, variable initialization looks like this:

  int foo = 66;

The above statement declares an integer variable named ''%%foo%%'' and gives it an initial value of 66 in one statement. You can initialize any of Processing's primitive types this way, as long as the value on the right of the equals sign is compatible with the variable's type.

If you try to access the value of a variable without initializing it or otherwise giving it a value, the Processing IDE will complain at you.
This is because it is widely considered bad programming practice to rely on the default values given to variables by a language. The reason for this is that many languages do not specify //any// default values for variables---at least for some categories. In other words, the "default" value given to a variable you declare might be a random and/or arbitrary value.((The language on which Processing is built, Java, //[[http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html|does]]// specify default values for certain kinds of variables. However, the kinds of variables we have been using up to now and will use for most of this text do not fit into that category.))

==== Multiple declarations ====

Another piece of syntactic sugar that Processing (and many other languages) defines is the ability to declare more than one variable in one declaration statement. You can do this as long as all the variables in the statement are of the same type. The example below:

  char firstInitial, secondInitial;

declares two variables, ''%%firstInitial%%'' and ''%%secondInitial%%'', which are both character types. Notice the comma between the two variable names.

You can even combine variable initialization and multiple declaration in the same statement:

  char firstInitial = 'M', secondInitial = 'F', thirdInitial = 'K';

==== Potential pitfalls ====

Syntactic sugar can be a tricky thing. On the one hand, syntactic sugar makes it easier to write your programs---it increases **writability**. When done right, it also makes programs easier to read---it increases **readability**. However, when a language has too much syntactic sugar, then the number of ways to express the same idea increases to such a level that it ends up making it //harder// to read because the reader has to know all the intricacies of all the different possible ways of doing things.
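As a small illustration of that trade-off, the sketch below expresses the same idea three ways: separate declaration and assignment, variable initialization, and multiple declaration with initialization. It is written as plain Java (the language Processing is built on) rather than as a Processing sketch, so that it can be run on its own. The class name ''%%SugarComparison%%'' and the method names are made up for illustration.

```java
public class SugarComparison {
    // No sugar: declare each variable first, then assign to it.
    static String longForm() {
        char firstInitial;
        char secondInitial;
        firstInitial = 'M';
        secondInitial = 'K';
        return "" + firstInitial + secondInitial;
    }

    // Variable initialization: declare and assign in one statement each.
    static String withInitialization() {
        char firstInitial = 'M';
        char secondInitial = 'K';
        return "" + firstInitial + secondInitial;
    }

    // Multiple declaration combined with initialization: one statement in total.
    static String withMultipleDeclaration() {
        char firstInitial = 'M', secondInitial = 'K';
        return "" + firstInitial + secondInitial;
    }

    public static void main(String[] args) {
        System.out.println(longForm());                // prints MK
        System.out.println(withInitialization());      // prints MK
        System.out.println(withMultipleDeclaration()); // prints MK
    }
}
```

All three methods behave identically. A reader has to recognize all three forms to follow code that mixes them, which is the readability cost described above.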