Lecture notes on computer science

Fengning (David) Ding


May 22, 2014
Contents
1 First introductions
1.1 A review of binary and hexadecimal
1.2 The von Neumann Architecture
1.3 Algorithms
1.4 Languages
2 Introduction to Python
2.1 Basic Statements
2.1.1 Input and output
2.1.2 Variable Assignments
2.1.3 Data types
2.1.4 Loops
2.2 Functions
2.2.1 Execution stack
2.2.2 Library
2.2.3 Recursion
3 Search Algorithms
3.1 Linear Search
3.2 Binary Search
3.3 Asymptotic analysis
4 Sorting Algorithms Part 1
4.1 Insertion Sort
4.2 Selection Sort
5 Sorting Algorithms Part 2
5.1 Merge Sort
5.2 Quicksort
6 Introduction to Object Oriented Programming
6.1 Classes, instances, methods
6.2 Inheritance and Object Oriented Design
6.3 Various Examples
6.3.1 Linked List
7 Data Structures
7.1 Stack
7.2 Queue
7.3 Binary Tree
7.4 Hash Table
8 Glimpse: Computational Models
8.1 Finite/Cellular Automata
8.2 Turing Machines
8.3 Logical Circuits
8.4 Quantum Machine
9 Glimpse: Machine Learning using Python
9.1 Training/Test Sets
9.2 Measuring Performance
9.3 Supervised Learning
9.3.1 Decision Tree
9.3.2 Ensemble Learning
9.3.3 Neural Network
9.3.4 Support Vector Machine
9.4 Unsupervised learning
9.4.1 K-means clustering
9.4.2 Principal Component Analysis
10 Glimpse: C programming
10.1 Basic Syntax
10.1.1 If statements
11 Glimpse: Systems Programming
11.1 Compilers
11.2 Operating System
12 Glimpse: Web Programming Part 1
12.1 HTML
12.2 Javascript
12.3 CSS
13 Glimpse: Web Programming Part 2
13.1 HTTP protocol
13.2 Server side programming
13.3 Databases
Chapter 1
First introductions
1.1 A review of binary and hexadecimal
Eventually, if you are a computer programmer, you will encounter binary and hexadecimal
numbers, so it is useful to have an idea of how they work.
Machines operate with binary numbers, just as we work with decimal. Physically, 0s
correspond to low voltage (switches that are off) and 1s to high voltage (switches that are on).
A binary number $101011_2$ represents $2^0 + 2^1 + 2^3 + 2^5 = 43$. Each digit of a binary
number is called a bit, and eight bits correspond to a byte.
Example 1. How many values could a one-byte integer store? Well, one byte is 8 bits, so one
byte can store $2^8 = 256$ different integers, from 0 to 255.
Common units in computer science include 32-bit words (4 bytes), 64-bit words (8 bytes),
kilobytes (1024 bytes), megabytes (1024 kilobytes), and gigabytes (1024 megabytes). The latter
three are units of storage, and the first two are units of computation. Most older computers can
only work with 32 bits at a time, so all values are 32-bit words. The newest computers can
work with 64 bits simultaneously.
Exercise 1. How many values could a 32-bit word store? A 64-bit word?
Binary numbers are rather unwieldy, so computer scientists often use hexadecimal numbers
(base 16). By convention we use a=10, b=11, c=12, d=13, e=14, f=15, so a number like
a1 represents $10 \cdot 16 + 1 = 161$. Also by convention, we prefix a hexadecimal number by
0x, so we ought to write 0xa1. This way, we don't confuse hexadecimal numbers with decimal
numbers and variable names. Hexadecimal numbers are convenient, because each digit
represents 4 bits.
Finally, we discuss bitwise operations. The first operation is bitwise-and, &. For each bit
inside a binary number, 0&0 = 0&1 = 1&0 = 0 and 1&1 = 1 (bitwise-and returns one when
both inputs are one). For example, $01010_2 \,\&\, 10111_2 = 00010_2$. Another operation is
bitwise-or, which returns one if either input is one (0|0 = 0, 0|1 = 1|0 = 1|1 = 1). A third
operation is bitwise-not, which takes one input and returns one if it was zero and zero if it
was one. The best way to think about these operations is to think of 1 as true, and 0 as false,
so bitwise-and returns true if input 1 and input 2 are true, bitwise-or returns true if input 1
or input 2 is true, and bitwise-not returns true if the input is not true. It turns out we can
express all logical operations in terms of AND, NOT, and OR. The most useful of these
remaining operations is exclusive-or, XOR, which is usually denoted by ^ in programming
languages, but is probably clearer when denoted by $\oplus$. XOR is true if exactly one of
input 1 and input 2 is true. It turns out that XOR is just addition modulo 2 (per bit).
Finally, a last series of operations are the shift operations, which shift the bits of a number
left or right. For example, 11001 << 2 = 1100100, and 11001 >> 2 = 00110. These operations
are multiplication and (rounded-down) division by powers of 2.
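The operations above can be tried out directly. The following is a minimal sketch in Python, whose operators &, |, ^, ~, << and >> match the notation used here; the variable names a and b are chosen only for illustration.

```python
# Binary literals use the 0b prefix and hexadecimal literals the 0x prefix.
a = 0b01010
b = 0b10111

print(format(a & b, "05b"))   # bitwise-and  -> 00010
print(format(a | b, "05b"))   # bitwise-or   -> 11111
print(format(a ^ b, "05b"))   # exclusive-or -> 11101

# XOR is addition modulo 2, bit by bit.
for x in (0, 1):
    for y in (0, 1):
        assert x ^ y == (x + y) % 2

# Python integers have no fixed width, so ~a equals -a - 1. Masking with
# 0b11111 keeps five bits and gives the bitwise-not described above.
print(format(~a & 0b11111, "05b"))  # -> 10101

# Shifts multiply or divide (rounding down) by powers of 2.
print(format(0b11001 << 2, "b"))    # -> 1100100
print(format(0b11001 >> 2, "b"))    # -> 110
print(0xa1)                         # -> 161
```

Note that Python drops leading zeros when printing, so 00110 appears as 110 unless a width such as "05b" is requested.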
1.2 The von Neumann Architecture
To understand computer programming, it is necessary to understand how computers work.
Modern computers use what is called the von Neumann architecture. The von Neumann
model of computation consists of a CPU, memory, and input/output. Computation proceeds
one instruction at a time, and one instruction is executed every clock cycle. The CPU stores
the program counter, which tells the computer which instruction to execute. At the beginning
of a clock cycle, the CPU checks the program counter to see where the next instruction is
stored. It then goes to memory to retrieve this instruction and executes it. At the very end, it
updates the program counter according to the instruction. This repeats until the CPU sees an
end-of-program marker. Although modern computers do not exactly follow this prescription
(for example, some instructions take more than one clock cycle to execute, CPUs can process
multiple instructions simultaneously, etc.), they do follow the main idea of this architecture: a
computational core (the CPU), which can store values in memory, executing a program that
itself is also stored in memory.
What instructions can a CPU perform? It can perform basic arithmetic. It can load/store
values into memory. It can change the program counter, and moreover, change it only when a
certain condition is true. And finally, it can perform input and output: it can output the result
of its computation, and take input from the user (strictly speaking, this is part of the operating
system). Instructions are given to the CPU in the form of machine language, which is a series
of 0s and 1s.
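To make the fetch-and-execute cycle concrete, here is a small sketch of a toy machine in Python. The instruction names (LOAD, ADD, STORE, JUMP_IF_NONZERO, PRINT, HALT), the single register acc, and the memory layout are all invented for this illustration and do not correspond to any real CPU. As in the von Neumann model, the program and its data sit in the same memory.

```python
def run(memory):
    """Run the program stored in memory; return the list of printed values."""
    pc = 0       # program counter: address of the next instruction
    acc = 0      # a single register holding the current value
    output = []
    while True:
        op, arg = memory[pc]           # fetch the instruction from memory
        pc += 1                        # by default, continue with the next one
        if op == "LOAD":               # load a value from memory
            acc = memory[arg]
        elif op == "STORE":            # store a value into memory
            memory[arg] = acc
        elif op == "ADD":              # basic arithmetic
            acc += memory[arg]
        elif op == "JUMP_IF_NONZERO":  # change the counter conditionally
            if acc != 0:
                pc = arg
        elif op == "PRINT":            # output
            output.append(memory[arg])
        elif op == "HALT":             # end-of-program marker
            return output
        else:
            raise ValueError(f"unknown instruction {op!r}")


# Addresses 0-8 hold instructions, addresses 9-11 hold data.
program = [
    ("LOAD", 10),            # 0: acc = total
    ("ADD", 9),              # 1: acc = acc + counter
    ("STORE", 10),           # 2: total = acc
    ("LOAD", 9),             # 3: acc = counter
    ("ADD", 11),             # 4: acc = acc + (-1)
    ("STORE", 9),            # 5: counter = acc
    ("JUMP_IF_NONZERO", 0),  # 6: repeat while counter is not zero
    ("PRINT", 10),           # 7: output total
    ("HALT", None),          # 8: stop
    3,                       # 9: counter
    0,                       # 10: total
    -1,                      # 11: the constant -1
]

print(run(list(program)))  # computes 3 + 2 + 1 -> [6]
```

The loop in the program exists only because of the conditional jump at address 6, which is the one instruction that changes the program counter to something other than "the next address".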
In real computers, in addition to the CPU and memory, there is the hard drive, which is
also for storage. Unlike memory, the hard drive is persistent, meaning that after the computer
is turned off, things you store remain on the hard drive (whereas in memory, everything is lost).
Your files, programs, etc., are all stored on the hard drive. The hard drive is thousands (maybe
even millions) of times slower than memory, so temporary data should be stored in memory.
1.3 Algorithms
Fundamentally, computer science is about algorithms, which are a definite set of instructions
that solve a problem. These instructions must be precise (i.e., drawn from a specific finite
instruction set), correct (actually solve the problem), and must take finite time to run. For
instance, here is an algorithm to find a specific word inside a list of words:
1. Set current word to be the first word inside the list.
2. While the current word is not the last word of the list:
(a) Check if current word is equal to the given word.
(b) If so, return found.
(c) Otherwise, set current word to be the next word in the list.
3. Return not found.
Exercise 2. The above algorithm (linear search) is actually not completely correct. Spot the
mistake!
How can we improve upon the above algorithm? Well, if we don't know anything else about
the list, we can prove that the above algorithm is optimal. But if we know the list is sorted, we
have the following scheme:
1. Set upper to be the end of the list, and lower to be the beginning.
2. While upper is at least equal to lower:
(a) Set current word to be the word in the middle between lower and upper.
(b) Check if current word is the given word.
(c) If so, return found.
(d) Otherwise, if the given word is before the current word according to dictionary
ordering, set upper to be just before the current word.
(e) Otherwise, if the given word is after the current word, set lower to be just after the
current word.
3. Return not found.
Remark 1. This algorithm is known as binary search. It is remarkably tricky to implement
correctly (even many professionals get it wrong)!
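One possible rendering of the binary search scheme in Python is sketched below. The function name binary_search and the choice to return True or False (rather than a position) are assumptions made for this illustration. The important detail is that the middle position is recomputed in every round from the current values of lower and upper.

```python
def binary_search(words, target):
    """Return True if target occurs in the sorted list words."""
    lower = 0
    upper = len(words) - 1
    while lower <= upper:
        middle = (lower + upper) // 2   # position halfway between the bounds
        current = words[middle]
        if current == target:
            return True
        if target < current:            # dictionary ordering on strings
            upper = middle - 1          # discard the upper half
        else:
            lower = middle + 1          # discard the lower half
    return False


words = ["apple", "banana", "cherry", "date", "fig"]
print(binary_search(words, "date"))   # -> True
print(binary_search(words, "grape"))  # -> False
```

Moving upper and lower strictly past the middle position (middle - 1 and middle + 1) is what guarantees that the range shrinks in every round, so the loop always terminates.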
How can we compare the speed of algorithms? One way is just to measure the time the
algorithm takes, but this is a bad measurement since it depends on which machine you used. A
better measurement is to calculate the running time of the algorithm as a function of the input
size.
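First, we introduce the big-O notation. We write $f(n) = O(g(n))$ if there are positive
constants $C_1$ and $C_2$ such that
$$C_1 \le \lim_{n \to \infty} \frac{f(n)}{g(n)} \le C_2.$$
Intuitively, $f(n) = O(g(n))$ if they grow at the same speed. (Strictly speaking, big-O only
requires the upper bound $C_2$, which says that $f$ grows no faster than $g$; the two-sided
condition is usually written $f(n) = \Theta(g(n))$.)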
Now, let us analyze the algorithms above. Suppose the input list is of size n. The first
algorithm requires us to scan the entire list. For each item of the list, it does a constant
number of operations. Hence, it runs in time O(n).
The second algorithm eliminates half of the list each round, so after about $\log_2 n$ rounds,
the entire list will be eliminated. Each round takes a constant number of operations. Hence, it
runs in time O(log n).
Now, log n grows much more slowly than n: when we double n, linear search will take twice
the amount of time, but binary search takes only one extra unit of time! Hence, we say that
binary search is much faster than linear search.
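The doubling claim can be checked numerically. The sketch below counts the rounds binary search needs in the worst case, when the item is missing, and prints them next to the n steps a full linear scan takes. The helper name binary_search_rounds is made up for this illustration.

```python
def binary_search_rounds(n):
    """Count the rounds binary search needs to rule out a missing item."""
    lower, upper = 0, n - 1
    rounds = 0
    while lower <= upper:
        middle = (lower + upper) // 2
        lower = middle + 1   # pretend the item is always after the middle
        rounds += 1
    return rounds


for n in (1000, 2000, 4000, 8000):
    # A linear scan inspects all n items in the worst case.
    print(n, "items:", n, "linear steps,", binary_search_rounds(n), "binary rounds")
```

Each doubling of n doubles the linear count but adds a single round to the binary count (10, 11, 12, 13 rounds for the four sizes above).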
1.4 Languages
Finally, we talk about programming languages. We saw that the goal of computer science is to
solve problems by writing down a series of instructions for the computer. We did not specify
what instructions we were allowed to use in the algorithm; the programming language consists
of all the instructions that we are allowed to use.
Why programming languages? After all, machines already have their own instruction set!
Well, the machine's instruction set is remarkably low level: add two numbers, store the result
in location 0x754354 in memory, advance the program counter to 0x0003e800, etc. People
generally don't think this way. A programming language is an intermediate: it provides enough
abstractions that programmers can think in that language, but is also easy to translate into
machine language. This translation is done automatically by a program called the compiler.
What are necessary features of programming languages? It turns out that in order to do
interesting computation, all languages need a way to store/load values from memory, to do
arithmetic, and to change the program counter conditionally. Most languages build on top of
the C-abstraction by providing the following features:
1. The ability to declare variables and store values into the variables. The programmer
doesn't need to worry about where the variables are stored in memory; they can use
variables just as they use regular values. The compiler takes care of that.
2. Basic arithmetic, as well as complex expressions like 2 << 3*(4+5/(5-2)). These complex
expressions are parsed by the compiler, and converted to several machine instructions, akin
to "compute 5-2", "compute 5/ans", "compute 4+ans", etc.
3. If statements, which allow you to execute some instruction only when a condition is
true. Optionally, you can include an else statement with the if statement, so that if the
condition is true, certain instructions are run, and otherwise, the other instructions are
run. If statements allow you to change the program counter conditionally, albeit to some
fixed location.
4. Loops, which allow you to run a block of code multiple times (for loop), or to run it
while a certain condition is true (while loop). The idea is that you make incremental
changes/computations, and after running a few times, you finish the computation or the
condition becomes no longer true.
5. Functions, which are blocks of code that may take some input and may return an output.
You can call a function elsewhere in your program, which runs that block of code. This
allows you to avoid copying and pasting code: if, for example, you need to search a list very
often, it makes sense to put the search in a function, so that every time you search a list,
you just call the function. Functions also give you the power of recursion: for instance, to
calculate n!, we could have a function named f, which returns 1 when n = 0 and
$n \cdot f(n-1)$ otherwise.
6. Input/output functions, which allow you to print messages to the user and to read
messages. They also allow you to read from and write to files.
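As a small illustration of several of these features at once (variables, an if statement, a loop, a function, recursion, and output), here is a sketch of the factorial example in Python. The names factorial and factorial_loop are chosen for this illustration; the loop version is included only for comparison.

```python
def factorial(n):
    """Return n! for a non-negative integer n, using recursion."""
    if n == 0:
        return 1                    # base case: stops the recursion
    return n * factorial(n - 1)     # the function calls itself


def factorial_loop(n):
    """Return n! using a for loop instead of recursion."""
    result = 1
    for k in range(1, n + 1):       # k takes the values 1, 2, ..., n
        result = result * k
    return result


print(factorial(5))        # -> 120
print(factorial_loop(5))   # -> 120
```

Without the base case for n = 0, the recursive version would keep calling itself with ever smaller arguments and never return.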
The following features are less universal but still useful:
1. Some languages (C, Java, ML) are typed, meaning that all variables must be declared as
being an integer, a float (decimal), a character (like 'a'), a string (a sequence of
characters), or some other type. If you declared a to be a string, then the compiler of a
strictly typed language such as ML would issue an error when it sees something like a + 1,
since it makes no sense to add an integer to a string. This allows you to catch many bugs.
Often, these languages allow you to create your own types.
2. Some languages (C++, Java, Python) are object-oriented, meaning that you can create
objects from a certain template called a class. For example, you can have an object of class
"window" which contains objects named "close button" (of class "button"), "minimize
button" (of class "button"), etc. Classes can have methods, which are functions that
operate on objects of that class. For example, the button class can have the method
"click", which is called when you click the button. The click method belongs to specific
objects, in the sense that when I call click for the close button, the click for the minimize
button is not called. Moreover, the window class can also have a click method that is
independent of the one for buttons. Finally, since the click methods for window and button
might do many similar things, I can have another class called "widget" with a click method,
and the window and button classes can inherit from the widget class. This way, I don't need
to rewrite code that is common to both classes. The advantage of object orientation is
self-containment: once I have written the code for buttons, I do not need to worry about
where the button is used, since all the logic for buttons is contained in the button class. I
also don't need to worry about other code interfering with my button code.
3. Some languages (Python, ML) are functional, meaning that functions can be stored in
variables, and you can manipulate these functions to create new functions. The idea dates
back to Lisp, but its adoption by mainstream languages is mostly a recent trend.
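The window and button example from item 2, and the idea of functions as values from item 3, can be sketched in Python as follows. The class names follow the text; the attribute names (name, clicks), the returned messages, and the helper twice are invented for this illustration.

```python
class Widget:
    """Code shared by everything that can be clicked."""

    def __init__(self, name):
        self.name = name
        self.clicks = 0

    def click(self):
        self.clicks += 1                 # bookkeeping common to all widgets
        return f"{self.name} clicked"


class Button(Widget):
    pass                                 # inherits click unchanged


class Window(Widget):
    def __init__(self, name):
        super().__init__(name)
        # A window contains two button objects.
        self.close_button = Button("close button")
        self.minimize_button = Button("minimize button")

    def click(self):
        # Reuse the shared logic, then add window-specific behaviour.
        return super().click() + " (window focused)"


window = Window("main window")
print(window.close_button.click())       # -> close button clicked
print(window.minimize_button.clicks)     # -> 0, the other button is untouched


# Functions are values too: pass one in, and build a new one from it.
def twice(f):
    return lambda x: f(f(x))

add_two = twice(lambda x: x + 1)
print(add_two(3))                        # -> 5
```

Clicking the close button leaves the minimize button's counter at zero, which is the sense in which a method call belongs to one specific object.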
Chapter 2
Introduction to Python
2.1 Basic Statements
2.1.1 Input and output
2.1.2 Variable Assignments
2.1.3 Data types
2.1.4 Loops
2.2 Functions
2.2.1 Execution stack
2.2.2 Library
2.2.3 Recursion
Chapter 3
Search Algorithms
3.1 Linear Search
3.2 Binary Search
3.3 Asymptotic analysis
Chapter 4
Sorting Algorithms Part 1
4.1 Insertion Sort
4.2 Selection Sort
Chapter 5
Sorting Algorithms Part 2
5.1 Merge Sort
5.2 Quicksort
Chapter 6
Introduction to Object Oriented
Programming
6.1 Classes, instances, methods
6.2 Inheritance and Object Oriented Design
6.3 Various Examples
6.3.1 Linked List
Chapter 7
Data Structures
7.1 Stack
7.2 Queue
7.3 Binary Tree
7.4 Hash Table
Chapter 8
Glimpse: Computational Models
8.1 Finite/Cellular Automata
8.2 Turing Machines
8.3 Logical Circuits
8.4 Quantum Machine
Chapter 9
Glimpse: Machine Learning using
Python
9.1 Training/Test Sets
9.2 Measuring Performance
9.3 Supervised Learning
9.3.1 Decision Tree
9.3.2 Ensemble Learning
9.3.3 Neural Network
9.3.4 Support Vector Machine
9.4 Unsupervised learning
9.4.1 K-means clustering
9.4.2 Principal Component Analysis
Chapter 10
Glimpse: C programming
C is arguably the most influential language in computer science. It is perhaps difficult
to learn, but it is elegant and gives good insight into how computers work. We start with an
introduction to C before moving on to Python.
10.1 Basic Syntax
Syntax is the set of rules for forming correct C statements. A program can be syntactically
correct while producing incorrect answers; syntactic correctness only allows the compiler to
recognize the program and translate it to machine language.
The easiest way of learning syntax is to look at a sample program:
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}
To compile this program, type in the command line:
gcc -o hello hello.c
This really simple program just prints the message "Hello World" to the screen, but it
illustrates many aspects of C syntax.
The first line, #include <stdio.h>, is a preprocessor directive. It is not actually an
executed instruction, but rather an instruction for the compiler. It tells the compiler to
take the contents of stdio.h and dump them at the beginning of the file before compiling. The
file stdio.h is in a location known to the compiler, and it declares the functions for input
and output (stdio = standard input and output).
The second line, int main(), is the declaration of a function called main. This function
takes no inputs and returns an integer. When you run the program, main will be the first
function to be called, and after it runs, the operating system looks at what main returned
to determine if there were any errors. If main returned 0, this means no error occurred.
The curly braces enclose a block of code that is associated with the function main.
The fourth line, printf(...), makes a call to a function called printf (which stands for print
formatted, not print function). This function is declared in stdio.h, and takes in a string to
print ("Hello World\n"; \n is the symbol for a new line). All C statements must be terminated
by a semicolon.
The fifth line returns 0 since there is no error. return is a keyword: you cannot use it as
the name of a variable or function.
10.1.1 If statements
Let us now look at if statements in C.
#include <stdio.h>
int main() {
    int a;
    printf("Enter an integer: ");
    scanf("%d", &a);
    if (a>=0) {
        printf("|a|=%d\n", a);
    }
    else {
        printf("|a|=%d\n", -a);
    }
    return 0;
}
Line 3, int a;, declares a variable called a that stores an integer value.
Line 5, scanf(...), is the C input function. The first argument, "%d", is called a format
string. The %d tells scanf that you expect the user to input an integer. The second
argument, &a, tells scanf to store the input into a. The funny & causes the address of
a to be passed to scanf instead of the value. To understand why this is necessary, consider
this from the perspective of scanf. If a = 0, and you call scanf(..., a), then when the program
is run, you pass 0 to scanf, not the name a; hence, scanf doesn't know where to store the
input. On the other hand, when you pass &a, you pass the address of a to scanf, so
scanf can store the input into a.
Why doesn't scanf just return a? This has a lot to do with C conventions, which dictate
that functions that can return multiple values ought to do so via the arguments, and should
use the return value to report errors. (scanf itself returns the number of items it
successfully read.)
Chapter 11
Glimpse: Systems Programming
11.1 Compilers
11.2 Operating System
Chapter 12
Glimpse: Web Programming Part 1
12.1 HTML
12.2 Javascript
12.3 CSS
Chapter 13
Glimpse: Web Programming Part 2
13.1 HTTP protocol
13.2 Server side programming
13.3 Databases