
Part II

Elements of Programming


Chapter 4

Coding & Computation


4.0.14

Coding & Computation1


or

What's all this about 1s and 0s?

4.0.15
Topics
Aim: Communicate certain fundamental concepts about computers and
computing. Relax.
Digital encoding
(Digital) computation
Fundamental programming
Applied programming

1 File: coding-computation-slides.tex.


4.0.16
Digital encoding
Digital: discrete
(Analog: continuous)

Encode = to put into or represent with a code

Code: "a system of symbols"
... used for communication (Morse code, braille, "one if by land, ..."), for
instructing computers (machine code), &c.

Codes are discrete: elements are in or out. Period.

4.0.17
Digital encoding: Examples of codes
Written words: "Bob," "Carol," "Ted," "Alice"

Words "stand for" things, events, relations, etc. Make up sentences, etc.

Letters of the alphabet

What an idea! With a few letters, all the words; with a few thousand
words, an infinite number of sentences.

4.0.18
Digital encoding: Examples of codes
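Morse code: Dots: .   Dashes: -
A .-      J .---    S ...
B -...    K -.-     T -
C -.-.    L .-..    U ..-
D -..     M --      V ...-
E .       N -.      W .--
F ..-.    O ---     X -..-
G --.     P .--.    Y -.--
H ....    Q --.-    Z --..
I ..      R .-.
(There's more for other symbols.)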


4.0.19
Digital encoding: Comments on Morse code
How many elementary / primitive symbols? Answer: 2. Why?
How many symbols (per letter, at most) do we need to cover the 26 letters of the
alphabet?
Answer: 2^1 + 2^2 + ... + 2^n until the sum is at least 26, i.e., n = 4.
Note: Why not just 2^5?
Why do the different letters have the symbol patterns they have?
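The counting argument above can be checked in a few lines. This is only a sketch in Python; the function names are made up for illustration. With 2 primitive symbols there are 2^k distinct patterns of length exactly k, so we add these up until the total covers the 26 letters.

```python
def patterns_up_to(n, symbols=2):
    """Count the distinct codes of length 1..n over `symbols` primitive symbols."""
    return sum(symbols ** k for k in range(1, n + 1))


def shortest_max_length(items, symbols=2):
    """Smallest n such that codes of length 1..n can cover `items` things."""
    n = 1
    while patterns_up_to(n, symbols) < items:
        n += 1
    return n


for n in range(1, 5):
    print(n, patterns_up_to(n))      # 1 2, 2 6, 3 14, 4 30

print(shortest_max_length(26))       # 4
```

Fixed-length codes behave differently: with every pattern exactly 5 symbols long there are 2^5 = 32 patterns, which is the comparison the "Why not just 2^5?" question invites.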

4.0.20
Digital encoding: Examples of codes
Braille: array of 3 × 2 = 6 dots, raised or not.
Character codes in computing
1. ASCII (American Standard Code for Information Interchange)
Standard in computing. See the IDT book.
2. EBCDIC: IBM mainframes
3. Unicode: For all the world.

4.0.21
ASCII
Binary code: 1s and 0s.

7 or 8 bits (= binary digits)

P is decimal 80 = 8 × 10^1 + 0 × 10^0

or in binary = 0×2^7 + 1×2^6 + 0×2^5 + 1×2^4 + 0×2^3 + 0×2^2 + 0×2^1 + 0×2^0
or 01010000 in 8-bit binary. 01010000 in 7-bit binary plus a leading parity bit, with parity even.
11010000 in 7-bit binary plus a leading parity bit, with parity odd.
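The figures for P can be reproduced with Python's built-in `ord` and `format`. The helper `with_parity` below is a hypothetical name, and it assumes (as the bit strings on the slide suggest) that the parity bit is placed in front of the 7-bit code.

```python
def with_parity(ch, even=True):
    """Return 8 bits: a leading parity bit followed by the 7-bit ASCII code."""
    bits7 = format(ord(ch), "07b")           # 'P' -> '1010000'
    ones = bits7.count("1")
    # Even parity: total number of 1s (parity bit included) is even.
    # Odd parity: total number of 1s is odd.
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + bits7


print(ord("P"))                       # 80
print(format(ord("P"), "08b"))        # 01010000
print(with_parity("P", even=True))    # 01010000
print(with_parity("P", even=False))   # 11010000
```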


4.0.22
Number coding systems
Decimal (10-based) has 10 symbols possible per 'slot': 0, 1, 2, ..., 9
Binary (2-based) has 2 symbols possible per 'slot': 0 and 1
Octal (8-based) has 8 symbols: 0, 1, 2, 3, 4, 5, 6, and 7
Hexadecimal (16-based) has 16 symbols: 0, 1, ..., 9, A, B, C, D, E, and F
Number coding can be in any base: 2, 3, ..., n. Why base 2 for computers?
Why base 8, base 16?
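To see the same number written in several bases, here is a small Python sketch. The function `to_base` and the `DIGITS` string are illustrative names, not anything from these notes; the method is repeated division by the base, collecting remainders.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base):
    """Write the non-negative integer n using the digit symbols of `base` (2..16)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)    # r is the next digit, right to left
        out.append(DIGITS[r])
    return "".join(reversed(out))


for base in (2, 8, 10, 16):
    print(base, to_base(80, base))   # 2 1010000, 8 120, 10 80, 16 50

# Python's built-in int() goes the other way.
print(int("1010000", 2), int("120", 8), int("50", 16))   # 80 80 80
```

One hint toward the base 8 and base 16 question: each octal digit stands for exactly three bits and each hexadecimal digit for exactly four, so they are compact ways of writing binary.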

4.0.23
Digital encoding.
We encode from a list of atomic symbols (e.g., the alphabet) and compose more complex things by combining these symbols (e.g., words are
composed of letters of the alphabet, sentences are composed of words).
At the most abstract, general level, we can use numbers to be our atomic
symbols (numerals, actually).
So, e.g., P = 80 decimal = 01010000 binary in ASCII.
Let dash (-) = 0, dot (.) = 1. Then in Morse code, P (.--.) = 1001 binary.
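The dash = 0, dot = 1 convention is easy to try out. The sketch below is Python with a deliberately tiny table of standard Morse patterns; `MORSE` and `morse_to_bits` are illustrative names.

```python
# A few International Morse code patterns ('.' = dot, '-' = dash).
MORSE = {"A": ".-", "E": ".", "P": ".--.", "T": "-"}


def morse_to_bits(letter):
    """Rewrite a letter's Morse pattern with dash -> 0 and dot -> 1."""
    return MORSE[letter].replace("-", "0").replace(".", "1")


bits = morse_to_bits("P")
print(bits)                       # 1001
print(int(bits, 2))               # 9: the same bits read as a binary number
print(format(ord("P"), "08b"))    # 01010000: the ASCII encoding of P
```

The same letter gets different bit patterns under different codes, and the same bit pattern can be read as a letter or as a number.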

4.0.24
Digital encoding: Comments
Conventionality (arbitrariness)
Why not P = 1010111? etc.

Generality

Can one type of encoding encode everything that another type of encoding
can encode? Does it matter whether we do decimal or binary?

Why? (Prove it!)


Wait: 01010000 is a (binary) number, yet it's an encoding of P. Which is
it? How can you tell? Why isn't it AH in Morse code? Or a sound or a
picture or a movie?


4.0.25
Digital computation
. . . or a program instruction?
Roughly, a computation is a manipulation and/or recognition of a digital
encoding
A computer is a machine that does computations, that manipulates, recognizes, and acts on digital encodings
Our computers work on binary (1s and 0s) encodings. Why?

4.0.26
Digital computation
Is that all? Can't some computers do more than others?
Yes, that's possible. Size, speed, of course.
Actually, just a few instructions are sufficient to compute all possible
manipulations on binary digital encodings, and these are in turn fully
general.
An amazing fact.

4.0.27
Digital computation
So, all real computers are fundamentally equivalent, just some are bigger
and faster than others.
What about interacting with the world? 'I/O' as we say.
Same thing, just hooked to I/O devices.

Clarification: manipulations aren't just arithmetic; they're anything (on
binary encodings).


4.0.28
Fundamental programming
Computers run by executing program instructions, one after another.
The program instructions instruct the computer to manipulate, recognize,
and act upon digital (binary) encodings.
How are program instructions represented to the computer? As binary
encodings. What about the data used by the program? Same thing. How
does it know?
Basic cycle: (1) fetch the next instruction (from memory into the CPU),
(2) execute the instruction, (3) figure out where the next instruction is
and go to (1).
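The basic cycle can be sketched as a toy machine in Python. Everything here is invented for illustration: the instruction names (LOAD, ADD, STORE, JUMP, HALT), the single accumulator register, and the use of Python tuples for instructions where a real machine would use binary encodings. The point it shows is that instructions and data sit side by side in one memory.

```python
def run(memory):
    """Run a toy stored-program machine; instructions and data share `memory`."""
    pc = 0     # program counter: address of the next instruction
    acc = 0    # accumulator: the machine's one working register
    while True:
        op, arg = memory[pc]         # (1) fetch the next instruction
        pc += 1                      # (3) by default the next one follows
        if op == "LOAD":             # (2) execute it
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JUMP":
            pc = arg                 # (3) ...unless the instruction says otherwise
        elif op == "HALT":
            return memory
        else:
            raise ValueError("unknown instruction: " + str(op))


program = [
    ("LOAD", 4),     # address 0: acc = memory[4]
    ("ADD", 5),      # address 1: acc = acc + memory[5]
    ("STORE", 6),    # address 2: memory[6] = acc
    ("HALT", 0),     # address 3: stop
    2,               # address 4: data
    3,               # address 5: data
    0,               # address 6: the result goes here
]
result = run(program)
print(result[6])     # 5
```

The machine "knows" that address 0 holds an instruction only because the program counter points there; nothing in the memory cell itself marks it as instruction or data.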

4.0.29
Applied programming
Don't like them 1s and 0s (machine language)
So, 'higher-level' languages: metaphor
Compiler: takes your 'higher-level' jottings and translates them into machine language, so your program can be executed.
Note: machine language programs are specific to the machine type you
are running: Intel X, Intel Y, Macintosh, Sun, Alpha, IBM, etc.

4.0.30
Applied programming
Interpreter: Accepts a compact 'semi-compiled' (byte-code) version of
your jottings and executes it by translating it on the fly to machine code
and sending it off for execution. Visual Basic, Excel.
Possibility: Interpreters for each type of machine (hardware), but all can
then execute the same byte-code. "Write once, run everywhere."
Think of the Internet.
And: Java.


4.0.31
Applied programming: On the Internet, etc.
Where does/can code execute?
On your PC (your client, whether Wintel, Mac, Linux, Unix,. . . ): Your
browser (Internet Explorer, Netscape, Hot Metal, . . . )
On your PC: Spreadsheet programs, word processing programs, . . .

4.0.32
Where does code execute? (cont.)
On the server: The Web server program that responds to requests from
your browser and serves up files to you.
On the server: Business programs that run in response to your inputs from
your browser: shopping carts, billing, etc. (Think of buying something on
the Web, or participating in an auction.)
On the server: Business programs that create HTML pages and send them
to you on the fly. PHP, ASP, JSP.

4.0.33
Where does code execute? (cont.)
On your PC, via your browser: JavaScript, VBScript for graphics and
animation (but primitively)
On your PC, via your browser: ActiveX (Microsoft) and Java applets

Downloaded in real time from the server! Why? Pluses? Minuses? Worries?


4.1

Bibliographic Note

A delightful introduction to many of the topics in this chapter is Code: The
Hidden Language of Computer Hardware and Software, by Charles Petzold,
Microsoft Press: Redmond, Washington, 2000.

Chapter 5

Why Program?
People program computers for all sorts of reasons and purposes. Without undue
abuse of reality, we can classify programming activities by degrees of difficulty,
or required know-how. In increasing levels of technical challenge we have:
1. End-user programming
2. Utility-and-analysis (U&A) programming
3. Applications programming
4. Systems programming
An end-user is anyone who interacts with a computer program (such as a
spreadsheet, presentation software, or a word processor) in order to accomplish
a task. Putting together a nontrivial spreadsheet and using it to solve problems counts as, and is perhaps the prototypical case of, end-user programming.
Typically, end-user programming is accomplished through visual or graphical
interfaces. The end-user manipulates these interfaces in order to send instructions to the program. Think (again, prototypically) of selecting a range in
Excel and clicking on a menu in order to set the color displayed in the range. Similarly, formatting in word processors (e.g., Microsoft Word, among others) and
presentation software (e.g., Microsoft PowerPoint, among others) is a form of
end-user programming that is usually done by manipulating a graphical user
interface. End-users program because no one else will do it for them. The hope
is that an end-user can use the software as a tool to solve the problem at hand,
and can do this faster, cheaper, and more effectively than a specialist programmer. Given the requisite domain knowledge and an appropriate software tool,
the end-user can proceed expeditiously to solve the problem at hand, without
having to take the time to explain the substance of the problem to a technician. Such hopes are in fact often reasonable and indeed fulfilled. End-user
programming is a main topic of this book.
At the other end of the technical challenge scale, systems programming includes writing operating systems, pieces of operating systems (such as device
drivers), compilers, database systems, communications software. It also includes
high-end applications programming tasks such as real-time systems and parallel
computing software. Programming at this level requires professional-level skill
and dedication. C, C++, and assembly language are representative programming languages for systems programmers. An education in computer science is
all but necessary as a background to this profession, although there are many
who have learned on the job, by apprenticeship.
Applications programming encompasses the great bulk of computer programming in businesses and organizations generally. Transaction processing
systems, systems that handle the conduct of commerce, such as sales and accounting systems, are prototypical examples. These systems are commonly written in Cobol, C/C++, and Java. Because of their "mission critical" nature they
are usually built with a great deal of care. Perhaps the most important aspect of
this is attention to system requirements, which often are largely peculiar to the
host organization. Thus, systems analysis and design is critical for success of
these systems. Ideally, when analysis and design are done well, programming
(or coding, as it is called) becomes a fairly straightforward task. Many business
school and engineering school graduates enter the job market in an applications
programming context. Often, these graduates will have studied Management
Information Systems. Initially trained to do coding (e.g., by the consulting firm
that hires them), most of these people will quickly move on to systems analysis
and to ascertaining business requirements, and then into general management,
where they will continue to confront the problems of obtaining and maintaining
mission critical information systems for their organizations.
We are centrally concerned in this book with end-user and U&A programming, and only peripherally concerned with systems programming and applications programming. Since end-user programming is a familiar concept, we will
dwell more at length on U&A programming.
Utility-and-analysis (U&A) programming sits between end-user programming and applications programming. Like applications programming, U&A
programming usually involves programming with a language, rather than by
manipulating a graphical user interface. Scripting languages such as Visual
Basic, Perl, Python, and HyperTalk are the prototypical U&A programming
languages. The motivations for U&A programming are much the same as those
for end-user programming. A task is at hand, an analyst or other professional
who is not primarily a programmer is charged with completing the task, and
the end-user tools available are not sufficient. Often it makes excellent sense to
invest some time and effort to write "one-off" or "glue" programs that solve the
problem at hand and that aspire to not much else. Here are some examples;
there are others.
Programming end-user tools.

Programming visually, as end-users do, has the advantage of being very
easy and the disadvantage of being rather limited in what it can do. For
this reason, scripting languages such as Visual Basic for Applications (from
Microsoft) have been created for manipulating end-user tools under program control. In Excel, Word, PowerPoint and other end-user tools such
programs are called macros, and they can be very valuable indeed. Writing
a macro is a prototypical case of U&A programming. Prototypical tasks
for macros include automated loading and formatting of data from an external file, and automation of large or complex tasks involving repetitions
of certain subtasks.
Ad hoc modeling and analysis.

Formulas in spreadsheets can easily be used to build useful models for
many business purposes, especially for financial analysis. That is in fact
what spreadsheets were originally designed for. Once these models reach
a certain, surprisingly low, level of complexity, they become difficult to
validate and maintain. This invites fallacious decision making. Experience
has shown the invitation is frequently accepted. Building the more complex models in separate scripting languages facilitates validation, maintenance, and reuse outside any particular spreadsheet. As a model grows
in sophistication and importance it may be handed off to an applications
programming context. Having the model written in a scripting language
facilitates this, too.

Internet-related tasks.

Extracting information from emails, from ftp sites, or from Web pages
is often valuable, if not necessary. Modern scripting languages facilitate
doing this under program control and on a large scale. This is a better
alternative to doing it manually.

Data cleaning and formatting.

A very common problem for business analysts is to clean up and properly
format a given set of data, as a prelude to analyzing it and presenting the
results to customers. For example, data may come from several different
databases and need to be reformatted and mapped into Excel. The data
may also involve thousands of records, precluding doing this manually.
Scripting languages are first-rate tools for this sort of thing.

Text formatting and information extraction.

Text, which is the source of so much information, is an even greater challenge than data when it comes to cleaning and formatting for subsequent
analysis. Again, modern scripting languages are excellent tools for this
purpose.

Rapid assembly of applications.

It is often possible and desirable to build special-purpose business applications by assembling them from existing software. Microsoft Office is often
used this way. A decision support system (DSS) for a special purpose
or analysis project is assembled ("glued together") from Excel, Access,
Word, and even PowerPoint. The user-analyst sees an Excel interface from
which models are run, data are extracted and saved to an Access database, reports are generated in Word and PowerPoint, all largely under
program control. The user enters information, makes choices, and clicks
on buttons; the software system does the rest.

As should by now be apparent, good U&A programmers find themselves
significantly empowered in many business contexts and seldom lack for opportunities to employ their skills. Our goal here is to get you started on the elements
of U&A programming. We will see that with only a little effort valuable skills
and the resulting empowerment are quite achievable.
This brings us to choice of a scripting language. There are many languages
appropriate for our purposes. The top of any shortlist would include Visual
Basic (for Applications), Perl, and Python. Since we can't study them all we
have to pick one for our main focus, and we have chosen Python. Visual Basic
would probably be our second choice; we shall have occasion to discuss it in
what follows. Here are the main considerations behind this choice.
The strengths of Visual Basic for Applications (VBA) are substantial. Microsoft is committed to it, continues to develop it, and has built it into its Office
products. Thus if you have Excel you have VBA too. Visual Basic, a standalone superset of VBA, is one of the most popular programming languages in
the world, perhaps the most widely known. Employers of business analysts very
often expect, in the sense of anticipate, that U&A programming will be done
in VB or VBA. There are many (smallish) applications programming projects
that are done in VB. VB/VBA is very good at building graphical user interfaces and for programming other Microsoft applications. VBA comes with a
good development and debugging environment.
VBA also has limitations. Here are some of them. It only works on Windows
machines. It now lacks a rich library of general programming (and U&A!)
tools, e.g., for text handling, for mathematical computations, for Internet programming. VB, unlike VBA, costs money after you've already bought Office,
and regular upgrades (costing more money) are more or less mandatory.
Python's strengths include these. Python is open source software backed by
a large and committed community with a strong track record of maintaining and
improving it. Python has achieved excellent acceptance as a scripting language
and is in fact widely used (although not nearly so as VB or even Perl). Python
is free (at http://www.python.org, among other places). This is especially important if you have multiple machines on which you want to run Python and
you balk at paying for multiple licenses for VB. Python is available for Windows
machines, Macintoshes, and Linux machines (among others). The code will run
(pretty much) identically on any installation.
Python is a nice, well-designed language. It is simple and easy to learn, and
has been taught successfully to first-time programming high-school students.
Python can be run in interactive mode, thereby facilitating exploration and
debugging. It can also be run in script mode, as is needed for real applications.
Python works well as a stand-alone language, usable in essentially all modern
computing environments. It comes with a number of well-considered high-level


features that make certain U&A programming tasks very easy. Fundamentally a
scripting language, Python programs can have GUIs (graphical user interfaces),
although we will not discuss them. Python is designed to be extensible and is
continually the beneficiary of extensions built by a large community of U&A
programmers. Python's extensions for Internet programming, for mathematical
and numerical computation, for string (or text) manipulation, and for manipulating Microsoft programs (such as Excel, Access, Word, and PowerPoint) are
especially useful for U&A programming problems.
Finally, the skills one acquires in learning any good programming language
transfer readily when learning a new language. Learning Python and a little
VBA, as we shall be doing, positions you well for learning VB and VBA should
that become useful. On the other hand, there are important things to learn
that do not fit well (or at all) with VBA and that are easily done in Python.

5.1

Information Sources

What follows in this book about Python is meant to fill in the gaps and slightly
extend introductory material on Python. That material is readily available for
free on the Web. See the Python home page for everything to do with Python,
including documentation and tutorials:
http://www.python.org/
At
http://www.python.org/doc/Newbies.html
you will find a number of Python tutorials. I recommend reading first "A Non-Programmer's Tutorial for Python," by Josh Cogliati, at
http://www.honors.montana.edu/~jjc/easytut/easytut/
After that, you might try "How to Think Like a Computer Scientist," the
Python version of Allen Downey's open source book, with Jeff Elkner. It's
at
http://www.ibiblio.org/obp/thinkCSpy/
Then there's the "Python Tutorial," by Guido van Rossum, with Fred L. Drake,
Jr., editor. Guido created Python. This tutorial is excellent, but it's aimed at
people who are experienced C programmers. If you aren't one, many of the
remarks meant to clarify things will be utterly mysterious. Still, it's a good
source. Guido's tutorial is at
http://www.python.org/doc/current/tut/tut.html
In addition, Python's online documentation is excellent. The "Global Module
Index (for quick access to all documentation)" and the "Library Reference (keep
this under your pillow)" come with the Python installation and are available at
http://www.python.org/doc/.
Several good books on Python are in print, but I think the free tutorials
listed above, along with these notes, and the online documentation should be
plenty. If you are after all a book person, here's a short list.
1. Learning Python by Lutz and Ascher [14] is a good introduction to Python
programming, although not to programming in general.



2. Python Essential Reference by Beazley [2] is a terrific (even essential)
reference for anyone starting out new to Python with a nontrivial programming task at hand.

File: why-program.tex

Chapter 6

Visual Basic for Applications: A Brief Tutorial
6.0

Notes

6.0.1

Visual Basic for Applications (VBA)1


Goals:
Familiarity & experience with solving problems algorithmically.
Familiarity with Visual Basic for Applications (VBA). VBA/Excel dialect.
Empower you to build useful applications and to learn more on your own.
Get you comfortable in a programming environment.
Note: But you have to do a lot of work on your own! Lectures are to provide
you a map. You must make the trip yourself. (cf., VBATutor.xls)

1 File: misnotes-vba-slides.tex.


6.0.2
Goals for lecture 1.
1. Introduce the basic concepts of macros in Excel (which are written in
VBA).
2. Show how to record and run a macro, and examine and edit its code in
the Visual Basic Editor.
3. Use the Visual Basic Editor to create a simple VBA program (Sub) and
call it for execution from a button on a worksheet.
4. Use the Visual Basic Editor to create a simple VBA program (Function)
and call it for execution from a cell in a worksheet.
5. Introduce the core structure of VBA programs.
(cf., Worksheets("Lecture1") code module Lecture1)

6.0.3
Macros
Programs in VBA. Rôle of VB and VBA for Microsoft.
What is VBA\Excel good for?
- Assembling, "gluing together," applications in MS Office. (Larger issue:
"component-based applications.")
- Utility (small job) programming, e.g., for data preparation and manipulation, for programming the interface, ...
- For learning how to program.
- For prototype programming.
- For learning about modern software concepts (e.g., OOP) and development
environments (now a good one in Excel for VBA).


6.0.4
More on macros
Recording macros
- Tools => Macro => Record New Macro...
- Stop, relative addressing
- Running the macro: Tools => Macro => Macros...
- Viewing the macro: Tools => Macro => Visual Basic Editor (Alt+F11)

6.0.5
Recorded macro, called Bob
Sub Bob()
'
' Bob Macro
' Macro recorded 2/13/98 by Steven O. Kimbrough
'
'
Range("B3:C4").Select
Selection.Copy
Range("A1").Select
ActiveSheet.Paste
Application.CutCopyMode = False
Range("A1").Select
End Sub

6.0.6
Basics of Visual Basic for Applications
VB, VBA, VBA\Excel, VBA\Word, ...
Macro modules
Subs
Functions
Variables & declaring them
Structure of a VBA\Excel application


6.0.7
A simple Sub
1. Tools => Macro => Visual Basic Editor
2. Insert => Module (Not: Class Module)
3. View => Properties Window
4. Then write some code:
Sub HelloWorld()
MsgBox "Hello world!"
End Sub
Use the VB Help menu item to search on MsgBox. (And use it generally
and often!)
Subs do things, but do not return values. Functions do things, and do
return values.

6.0.8
Now, make it run from Excel...
1. View => Toolbars => Control Toolbox
2. Select and draw a button.
3. Right-click with the button selected => Properties. Set the properties as
desired, then close the Properties Window.
4. Right-click with the button selected => View Code
5. Add a call to HelloWorld:
Private Sub cmdHelloWorld_Click()
    HelloWorld
End Sub
6. Return to Excel, exit design mode, and click the button.


6.0.9
A simple Function
1. Tools => Macro => Visual Basic Editor
2. Select a code module, e.g., the one with Sub HelloWorld.
3. Add code and save work:
Function dblUtility(X, Hi, Lo, Risk) As Double
dblUtility = ((X - Lo) / (Hi - Lo)) ^ Risk
End Function
4. Return to Excel and use this function in a cell, just as any built-in Excel
function.
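For readers following the Python thread of these notes, the same formula can be tried outside Excel. This is a Python rendering of the arithmetic in dblUtility, offered only as a sketch; the name `utility` and the sample numbers are made up for illustration.

```python
def utility(x, hi, lo, risk):
    """Scale x into the range lo..hi, then raise the result to the power risk."""
    return ((x - lo) / (hi - lo)) ** risk


# x = 50 is halfway between lo = 0 and hi = 100.
print(utility(50, 100, 0, 1))      # 0.5
print(utility(50, 100, 0, 2))      # 0.25
print(utility(50, 100, 0, 0.5))    # about 0.707 (the square root of 0.5)
```

With risk = 1 the utility is linear in x; exponents below 1 bend the curve upward and exponents above 1 bend it downward.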

6.0.10
Try these procedures in a code module:
Function dblCubed(X As Double) As Double
    dblCubed = X ^ 3
End Function
Sub CubeMe()
    Dim dblDaNumber As Double
    dblDaNumber = _
        InputBox("What number " & _
        "would you like to cube?")
    dblDaNumber = dblCubed(dblDaNumber)
    MsgBox "And the answer is " & dblDaNumber
End Sub
Dim dblDaNumber As Double declares (Dimensions) the program variable
dblDaNumber as type Double (double-precision floating point).
_ for line continuations. & for string concatenation.


6.0.11
VBA\Excel Programs.
Are collections of Subs and Functions (and
Declarations). Typically they:
1. Are called (started) from Excel.
2. Read in information. From, e.g., worksheets, dialog boxes, files, databases.
3. Store this information in variables.
4. Computationally manipulate the variables.
5. Write out the information. To, e.g., worksheets, dialog boxes, files, databases.
Basic concepts at hand, details now follow.

6.0.12
Goals for lecture 2.
1. Introduce program variables and how to declare them.
2. Discuss and show how (in VBA\Excel) to read and write information
from and to worksheets.
3. Introduce the Object Browser.
(cf., Worksheets("Lecture2") code module Lecture2)

6.0.13
Recall: VBA\Excel Programs.
Are collections of Subs and Functions (and
Declarations). Typically they:
1. Are called (started) from Excel.
2. Read in information. From, e.g., worksheets, dialog boxes, files, databases.
3. Store this information in variables.
4. Computationally manipulate the variables.
5. Write out the information. To, e.g., worksheets, dialog boxes, files, databases.


6.0.14
Variables
Expressions (think of names) in programs that can hold different values at
different times.
X, Hi, Lo, Risk in the dblUtility function.
Option Explicit
Sub FirstVariableExample()
    Dim I As Integer
    Dim MyFirstVariable As Integer
    'Note: Dim I, MyFirstVariable As Integer
    ' leaves MyFirstVariable an Integer and I
    ' as a Variant. Thanks, Bill!
    MyFirstVariable = 3
    For I = 1 To MyFirstVariable
        MsgBox "Showing and counting: " & I
    Next I
End Sub

6.0.15
Variables have data types
Types:
- Integer, Long
- Single, Double
- Currency
- Date
- String
- Variant
Why?


6.0.16
Programs (in general, I emphasize):
Declare variables
Put values into variables
Make calculations with variables
Store results of calculations in variables

6.0.17
Declaring variables (in VBA)
Variables should always be declared.
- Why?
- Variant if not declared.
- Use Option Explicit in the declarations section to force declaration of
variables.

Variables have scope. Why?

Declaring variables
- Dim in a procedure. Scope: that procedure.
- Private (or Dim) in the declarations section of a module. Scope:
that module.
- Public in the declarations section of a module. Scope: entire application.


6.0.18
Value(s) of MySecondVariable?

Option Explicit
Private MySecondVariable As Integer
Sub PublicExample1()
    MsgBox "We're in PublicExample1 " & _
        "and MySecondVariable = " & _
        MySecondVariable
End Sub
Sub PublicExample2()
    MySecondVariable = 23
    MsgBox "We're in PublicExample2 " & _
        "and MySecondVariable = " & _
        MySecondVariable
End Sub

6.0.19
Reading from, and writing to, a worksheet
Sub CosineHardWired()
    Dim MyNumber As Double
    MyNumber = _
        Worksheets("Lecture2").Cells(9, 2).Value
    'Note: Cells(9,2) = row 9, column 2 of the
    'worksheet.
    Worksheets("Lecture2").Cells(11, 2).Value = _
        Cos(MyNumber)
    'This also works:
    'Worksheets("Lecture2").Range("B11").Value = _
    'Cos(MyNumber)
    'And suppose MyTestRange is defined B2:D13.
    'Then this works, too:
    'With Worksheets("Lecture2").Range("MyTestRange")
    '    .Cells(10, 1).Value = Cos(MyNumber)
    'End With
End Sub
Try a nonnumber in B9. Debug. Reset button. Later: the debugger.


6.0.20
Generalizing on reading & writing
Reading & writing the Excel worksheet are special cases of getting and
setting object properties.
So far, the Value property of a particular object, a particular cell.
Why not, say, the color of a cell?
Sub SimpleShowColor()
    Dim MyTempHolder As Variant
    MyTempHolder = _
        Worksheets(2).Cells(14, 2).Interior.ColorIndex
    MsgBox "The interior color of B14 is " _
        & MyTempHolder & "."
End Sub
LOTS of objects and properties in Excel.

6.0.21
Generalizing on reading & writing (cont.)
Sub GetTheWorksheetName()
    Dim Temp As String
    Temp = Range("CellBob").Worksheet.Name
    MsgBox "The name of the worksheet in which " & _
        "the range CellBob resides is " & Temp
End Sub
Sub RenameMeTheWorksheet()
    Dim Temp As String
    Temp = _
        InputBox("New name for this worksheet?")
    ActiveSheet.Name = Temp
End Sub


6.0.22
The object browser, F2 in the VBA Editor
Displays, and lets you explore, all available objects, methods, and properties. Nifty!
member = (method or property)
Right-click on a member. Your code. Excel's code.
Example: Look in Excel, Range class, Cells member (a property). Call
for help.

6.0.23
The Object Browser (cont.)
A word to the wise:
Be ever vigilant. No program documentation is ever complete--or
completely accurate--and the VBA on-line Help is no exception.
Some of the descriptions are just plain wrong. Some of the code
samples don't work. And many, many "gotchas" are left unexplored.
Still, if you take the documentation with a small grain of salt, you'll
find an enormous amount of important information there. And the
easiest way to get to the information is via the Object Browser.
--from Excel 97 Annoyances, p. 205.

6.0.24
Goals for lecture 3.
1. Introduce control structures.
2. Introduce and discuss arrays (and their uses).
3. Discuss calling procedures from within procedures.
(cf., Worksheets("Lecture3") code module Lecture3)


6.0.25
Control structures: For...Next
For doing something a known number of times.
You've seen this already (e.g., Sub FirstVariableExample()).

For <counter> = <start> To <end> [Step <incr>]
    [statements]
Next [<counter>]
<counter> indicates a required counter expression (number or variable).
[Step <incr>] indicates that optionally you may have the symbol
Step followed by (and now mandatory) an increment expression (number or
variable).
And so on, generally.

6.0.26
Control Structures: If...Then
For doing something (THEN) on condition that something else (IF) is true
(else, skip and continue).
If <condition> Then
    [statements]
End If

6.0.27
Control Structures: If...Then...Else
For doing something (THEN) on condition that something else (IF) is
true; otherwise doing the ELSE clause.
If <condition> Then
    [if-statements]
Else
    [else-statements]
End If
Always: Either the if-statements are executed (when <condition> is true),
or the else-statements are executed (when <condition> is false).


6.0.28
If...Then...Else example
Code for a simple comparison of two (cell) values.
Sub SimpleCompare(Left, Right)
    If (Left <> Right) Then
        MsgBox "The two values are different."
        If (Left < Right) Then
            MsgBox "The Right value exceeds the Left."
        Else
            MsgBox "The Left value exceeds the Right."
        End If
    Else
        MsgBox "The two values are the same."
    End If
End Sub

6.0.29
If...Then...Else example (cont.)
The button code to call this Sub.
Private Sub cmdSimpleCompare_Click()
    Dim Left
    Dim Right
    'Both Left and Right will be Variants
    Left = _
        Cells(4, 2).Value
    Right = _
        Cells(4, 3).Value
    Call SimpleCompare(Left, Right)
    'This will also work:
    'SimpleCompare Left, Right
End Sub


6.0.30
An alternative, using If...Then...ElseIf ...
Sub SimpleCompareElseIf(Left, Right)
    If (Left = Right) Then
        MsgBox "The two values are the same."
    ElseIf (Left < Right) Then
        MsgBox "The two values are different."
        MsgBox "The Right value exceeds the Left."
    ElseIf (Left > Right) Then
        'The following also works:
        'ElseIf True Then
        MsgBox "The two values are different."
        MsgBox "The Left value exceeds the Right."
    End If
End Sub
Which is better code? Why? What about ElseIf (Left > Right) Then versus ElseIf True Then?

6.0.31
Control structures: Select Case
A way of generalizing If...Then...Else...
Select Case <test expression>
Case <1st expression list>
[1st statements]
Case <2nd expression list>
[2nd statements]
:
:
Case Else
[else statements]
End Select


6.0.32
Example of Select Case
Function Bonus(performance, salary)
'This function is from the
'Microsoft VB Help files, on Select Case.
Select Case performance
Case 1
Bonus = salary * 0.1
Case 2, 3
Bonus = salary * 0.09
Case 4 To 6
Bonus = salary * 0.07
Case Is > 8
Bonus = 100
Case Else
Bonus = 0
End Select
End Function

6.0.33
Somewhat better
Function dblBonus(performance As Integer, _
salary As Double) As Double
If (performance < 1 Or performance > 10) Then
dblBonus = -9999
Exit Function
End If
Select Case performance
Case 1
dblBonus = salary * 0.1
Case 2, 3
dblBonus = salary * 0.09
Case 4 To 6
dblBonus = salary * 0.07
Case Is > 8
dblBonus = 100
Case Else
dblBonus = 0
End Select
End Function


6.0.34
Control structures: Do...Loop
Alternative to For...Next
Differences?
Two versions, two cases each:
Do {While | Until} <condition>
[statements]
Loop
Do
[statements]
Loop {While | Until} <condition>

6.0.35
Arrays
Fundamental, but basic, data structures. Very widely used in programming.
Similar to vectors and matrices in mathematics.
- But can have more than 2 dimensions
Can have 1, 2, 3, . . . dimensions
Named much as are variables
Great for capturing a range on a worksheet.


6.0.36
Reversing the contents of a column range
See VBATutor.xls, "Lecture3 Arrays" worksheet, Lecture3Arrays code module.
Sub ReverseRange(FromRange As Range, _
        ToRange As Range)
    Dim intFromLength As Integer
    Dim intToLength As Integer
    Dim DaReverseArray() As Variant
    Dim I As Integer
    intFromLength = FromRange.Rows.Count
    intToLength = ToRange.Rows.Count
    If intFromLength <> intToLength Then
        MsgBox "From length: " & intFromLength & " To length: " _
            & intToLength
        MsgBox "Sorry, the two ranges are " & _
            "not conformable. Exiting."
        Exit Sub
    End If

6.0.37
Reversing the contents of a column range (cont.)
    ReDim DaReverseArray(1 To intFromLength)
    For I = 1 To intFromLength
        DaReverseArray(I) = _
            FromRange.Cells(I).Value
    Next I
    For I = 1 To intFromLength
        ToRange.Cells(I).Value = _
            DaReverseArray(intFromLength + 1 - I)
    Next I
End Sub


6.0.38
The button code for calling ReverseRange and for clearing out the range reversed.
Private Sub cmdCallReverse_Click()
Dim FromRange As Range
Dim ToRange As Range
Set FromRange = Range("reverse")
Set ToRange = Range("reversed")
Call ReverseRange(FromRange, ToRange)
End Sub
Private Sub cmdClearReversed_Click()
Range("reversed").ClearContents
End Sub

6.0.39
Command button code for squaring the reverse
Private Sub cmdSquareReversed_Click()
Dim FromRange As Range
Dim ToRange As Range
Dim VectorLength As Integer
Dim FromVector() As Double
Dim ToVector() As Double
Set FromRange = Range("reversed")
Set ToRange = Range("reversedsquared")
VectorLength = FromRange.Rows.Count
ReDim FromVector(1 To VectorLength)
ReDim ToVector(1 To VectorLength)
Dim I As Integer
'Check for numeric input
For I = 1 To VectorLength
If Not IsNumeric(FromRange.Cells(I).Value) Then
MsgBox "Inputs must be numbers. Exiting."
Exit Sub
End If
Next I


6.0.40
(cont.)
For I = 1 To VectorLength
FromVector(I) = FromRange.Cells(I).Value
Next I
Dim MyErrorCode As String
MyErrorCode = "Calling"
Call SquareTheVector(FromVector, _
ToVector, MyErrorCode)
If MyErrorCode = "OK" Then
For I = 1 To VectorLength
ToRange.Cells(I).Value = ToVector(I)
Next I
Else
MsgBox "Failed in cmdSquareReversed. " & _
"Error: " & MyErrorCode
End If
End Sub


6.0.41
Code for SquareTheVector
Sub SquareTheVector(StartArray, _
ReturnArray, ErrorCode As String)
Dim intStartLower As Integer
Dim intStartUpper As Integer
Dim intReturnLower As Integer
Dim intReturnUpper As Integer
Dim IsOK As Boolean
Dim I As Integer
IsOK = False
intStartLower = LBound(StartArray, 1)
intStartUpper = UBound(StartArray, 1)
intReturnLower = LBound(ReturnArray, 1)
intReturnUpper = UBound(ReturnArray, 1)
If ((intStartUpper - intStartLower) = _
(intReturnUpper - intReturnLower)) Then
IsOK = True
Else
ErrorCode = "Not OK"
Exit Sub
End If

6.0.42
(cont.)
For I = 1 To (intStartUpper - _
intStartLower + 1)
If (Not IsNumeric(StartArray(intStartLower _
- 1 + I))) Then
MsgBox "Error code 303. Bibi."
Exit Sub
End If
ReturnArray(intReturnLower - 1 + I) = _
StartArray(intStartLower - 1 + I) ^ 2
Next I
ErrorCode = "OK"
End Sub


6.0.43
Comments
Code is perhaps more complex than is strictly required.
But, error-checking is awfully important and there could be more of it
here. (How? Why?)
Also, illustrates generality (e.g., arrays must be conformable, but need not
have the same indexing arrangement)
Useful generally for programming in VBA and for programming generally:
drop things into arrays, pass the arrays around and use them to record
computations and for looping through when you do computations.
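
The point about indexing arrangements can be seen in a small sketch. The procedure name, the array names, and the bounds below are made up for illustration; the index arithmetic follows the same idea as SquareTheVector, stepping through both arrays by an offset from each lower bound.

```vba
Sub differentBoundsExample()
    ' Hypothetical example: the two arrays have the same length
    ' (they are conformable) but different index ranges.
    Dim FromArray(0 To 2) As Double
    Dim ToArray(11 To 13) As Double
    Dim K As Integer
    FromArray(0) = 1.5
    FromArray(1) = 2
    FromArray(2) = 3
    ' Walk both arrays by the offset K from each lower bound.
    For K = 0 To UBound(FromArray) - LBound(FromArray)
        ToArray(LBound(ToArray) + K) = _
            FromArray(LBound(FromArray) + K) ^ 2
    Next K
    ' ToArray(11) holds 1.5 ^ 2 = 2.25 and ToArray(13) holds 3 ^ 2 = 9.
    MsgBox "ToArray(11) = " & ToArray(11) & _
        " and ToArray(13) = " & ToArray(13)
End Sub
```

Because the loop never assumes that either array starts at 1, the same code works for any pair of one-dimensional arrays of equal length.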

6.0.44
Goals for lecture 5.
1. Introduce the VBA/Excel debugger.
2. Introduce forms programming.
3. Discuss VBATutorSortem.xls, a spreadsheet for sorting addresses.


6.0.45
Code in Module1 of Excel Workbook,
VBATutorSortem.xls
Option Explicit
Sub MakeAddressesHorizontal(StartRow _
As Integer, _
StopRow As Integer, DaColumn As Integer, _
DaSheet As String)
Dim intDaColumn As Integer
Dim I As Integer
intDaColumn = DaColumn
Dim Temp
Dim strDaSheet As String
strDaSheet = DaSheet
Dim intDaStartRow As Integer
intDaStartRow = StartRow

6.0.46
Code in Module1
Dim intAddressRow As Integer
Dim intAddressStartRow As Integer
Dim boolDone As Boolean
boolDone = False
'intDaStartRow is the first row in which
'an address lies.
'intAddressStartRow = the absolute
'row number in which a single address
'begins
'intAddressRow = the current (relative)
'row of the address (reset to 1 for the
'top of each address).
intAddressStartRow = intDaStartRow


6.0.47
Code in Module1
With Worksheets(strDaSheet)
intAddressRow = 1
Do While Not boolDone
'Do the next address
'if we have two empty rows, we're done.
If (.Cells(intAddressRow + _
intAddressStartRow, _
intDaColumn).Value = "" And _
.Cells(intAddressRow + _
intAddressStartRow _
+ 1, intDaColumn).Value = "") Then
boolDone = True
StopRow = intAddressRow + _
intAddressStartRow + 1
End If

6.0.48
Code in Module1
'if we have only one empty row,
'we start a new address
If .Cells(intAddressRow + _
intAddressStartRow, _
intDaColumn).Value = "" Then
intAddressStartRow = intAddressRow + _
intAddressStartRow + 1
intAddressRow = 1
End If


6.0.49
Code in Module1
.Cells(intAddressRow + intAddressStartRow, _
intDaColumn).Select
Selection.Copy
.Cells(intAddressStartRow, intDaColumn + _
intAddressRow).Select
ActiveSheet.Paste
.Cells(intAddressRow + intAddressStartRow, _
intDaColumn).Select
Application.CutCopyMode = False
Selection.ClearContents
intAddressRow = intAddressRow + 1
Loop
End With
End Sub


6.0.50
Code in Module1
Sub SortHorizontalAddresses(StartRow _
As Integer, StopRow As Integer, _
DaColumn As Integer, DaSheet As String)
Dim Top As Range
Dim Bottom As Range
Set Top = _
Worksheets(DaSheet).Cells(StartRow, _
DaColumn)
Set Bottom = _
Worksheets(DaSheet).Cells(StopRow, _
DaColumn + 4)
Range(Top, Bottom).Select
Selection.Sort _
Key1:=Cells(StartRow, DaColumn), _
Order1:=xlAscending, Header:=xlGuess, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Cells.Select
Selection.Columns.AutoFit
Range("A1").Select
End Sub

6.0.51
Code in Module1
Sub StartToSort()
frmSortem.Show
End Sub


6.0.52
The button code for Sheet2
Private Sub CommandButton1_Click()
Dim DaStartRow As Integer
Dim DaStopRow As Integer
Dim DaMainColumn As Integer
Dim OurDataSheet As String
OurDataSheet = "sheet2"
DaStartRow = 3
DaMainColumn = 2
DaStopRow = 0
Call MakeAddressesHorizontal(DaStartRow, _
DaStopRow, DaMainColumn, OurDataSheet)
Call SortHorizontalAddresses(DaStartRow, _
DaStopRow, DaMainColumn, OurDataSheet)
End Sub

6.0.53
The button code for Sheet1
Private Sub CommandButton1_Click()
StartToSort
End Sub

6.0.54
Code for the Form ("frmSortem") buttons
Private Sub cmdCancel_Click()
Unload frmSortem
End Sub


6.0.55
Code for the Form ("frmSortem") buttons (cont.)
Private Sub cmdOK_Click()
Dim DaSheetName As String
Dim DaStartRow As Integer
Dim DaMainColumn As Integer
Dim DaStopRow As Integer
DaSheetName = txtSheetName.Value
DaStartRow = txtStartRow.Value
DaMainColumn = txtStartColumn.Value
Application.ScreenUpdating = False
Sheets(DaSheetName).Select
Call MakeAddressesHorizontal(DaStartRow, _
DaStopRow, DaMainColumn, DaSheetName)
Call SortHorizontalAddresses(DaStartRow, _
DaStopRow, DaMainColumn, DaSheetName)
End Sub

6.0.56
VB List Boxes
To read values into a list box you must point to the values in the "Initialize" part of the code. You would do this to provide a list of options a user
may then select from.
Select or activate the sheet containing those values before showing the
dialog box or user form.

72CHAPTER 6. VISUAL BASIC FOR APPLICATIONS: A BRIEF TUTORIAL

6.0.57
VB List Boxes (cont.)
The code to assign the selection(s) to variables should be placed in the
"Click" part of the code.
In the design mode, right click on the "OK" or some such button and
select "View Code." This is where variables are assigned values after a
button is clicked by the user.
You can call other subs from anywhere in the file to run from a form.
However, they must be called from the "Click" part of the code to be
recognized.

6.0.58
VB List Boxes Tips
Scroll bars appear/disappear automatically depending on the size of the
box and the values read into the box (e.g., if you have a list of 100 items
and design the box to be 2 inches long, a vertical scroll bar will appear).
Set "Integral Height" to false on the properties of a text box if you want
the size of the box to remain fixed.

6.0.59
Stepping through Code
Helpful for debugging
From the VB editor, select
Tools | Macro | Step Into

Yellow bar highlights code about to be executed


Move cursor over code already executed to see values assigned to variables


6.0.60
General tips
Use the macro recorder and then add control statements and variables.
Document all code thoroughly so that you can follow what has been done,
if it is not already obvious.
Use meaningful variable (and procedure) names (e.g., "Year1" is a good
name for a starting year, but "A" is not).
Type all code in lower case so VB can automatically capitalize letters to
better ensure you do not have typographical errors.


Figure 6.1: Contents Tab in the Help Menu for VBA

6.1

First Steps

Visual Basic for Applications (VBA) is the macro language for Excel. It closely
resembles Visual Basic, an independent language from Microsoft, and is also used
as the macro language for Microsoft Access and Word. In what follows, we will
be talking for the most part about Visual Basic for Applications as it applies to
Excel. We will feel free to call it VBA, EVB, Visual Basic, VB, etc., so long as
the context makes confusion unlikely.
Macros consist of one or more VBA code chunks. These code chunks
(procedures) are either functions or subroutines. Here are some simple examples.
' Here is a simple function.
Function bob(x)
    bob = x ^ 2 + 3.34
End Function
' Here is a simple VBA sub.
Sub ted()
    MsgBox "Hello, world!"
End Sub

Use bob as you would any other Excel function.

Note: comments begin with a single quote ('). Everything after it on the
line is ignored.

In Excel, VBA macros reside on special workbook sheets, called modules.


To make a macro, one may simply create a new macro module and type in the
functions and procedures. More on this shortly.
Information about VBA is published in many readily available sources. Both
Microsoft and third parties publish extensive reference manuals and how-to
books for VBA. In addition, VBA closely resembles Visual Basic and there
is a large literature on that. For good online help on VBA in Excel, explore
"Programming with Visual Basic" in the "Contents" window of the MS Excel
help facility (see Figures 6.1 (page 74) and 6.2 (page 76)). We are assuming in
these notes that the reader will do this.


Figure 6.2: Index Tab in the Help Menu for VBA

6.2

Second Steps

6.2.1

Recording Macros

Macros (VBA procedures) can be recorded. Use Record Macro under the Tools
menu. After selecting Record New Macro, you will be prompted for the name
of this new macro. Either give it a new name, or accept the default. A small
window will then appear with a stop button in it. You click the stop button
when you are done recording your macro. First, however, perform as usual some
action in the workbook, e.g., copy one range of cells to another place. When you
are done, stop the macro recorder by clicking the stop button. In sum, there is
a four-step process to record a macro:
1. Start the macro recorder.
Do this by selecting the menu: Tools / Record Macro / Record New
Macro.
2. Name the macro.
You will be prompted for a name and may accept the default presented
by Excel, e.g., Macro1. Once you have done this, a window appears with
a button for stopping the recording of the macro.
3. Record the macro by performing normal activities in the workbook.
It is wise to plan these out before starting to record.
4. Stop recording the macro.
Do this by clicking the stop macro button.
This creates VBA code in a (usually new) module sheet, which Excel will
call Module1 or some such thing. Module sheets reside with the other sheets of
the workbook. As with the other sheets, you click on the tab to view the module
sheet. When it appears, you will see VBA code against a blank background.
While worksheets present spreadsheets (arrays of cells), macro sheets present
text editors. Thus, you can examine and edit the VBA code.
Notice, in particular, a couple of things with regard to your new macro
module sheet. First, macro sheets come with a context-sensitive text editor.
For example, comments (lines beginning with an apostrophe) come out green (by
default) and reserved words come out blue and get capitalized automatically.
Second, the new macro that you just recorded is a Sub, rather than a Function.
Recording macros and examining the results is a good way of learning about
VB, but it takes you only so far. We need to go further.

6.2.2

Assigning a Macro to a Button or Graphic Object

In order to run (or execute) a Sub macro, including macros created with the
Record New Macro facility in Excel, one can choose to assign the macro to a
graphic object that can call the macro. Assigning a macro to a button or graphic
object is easy. For a previously-existing object, select it (e.g., hold down the
Ctrl key and click on the button or graphic object), then choose Assign Macro...
from the Tools menu. You will be prompted with a list of existing Subs and you
make your choice from the list. That done, you may now simply click on the
graphic object and Excel will call the macro and cause it to be executed.
Note: Typically, you will want to create a new button and assign the macro
to it. Use Create Button from the Drawing icon and draw a new button. Excel
will automatically prompt you to assign a macro.

6.2.3

Functions versus Subs

VBA functions return values (one value each), but cannot take actions otherwise.
VBA subs (subroutines) do not return values, but can take actions. (However,
VBA subs can set the values of variables and these variables may be accessed
by other procedures.) VBA functions, once defined in a macro sheet, may be
used in worksheet cells just as any of the functions Excel has built into it.
Functions and subs may call one another, thus you may create very complex
programs in VBA. We will discuss that later. First things first. Now, let's look
at variables in VBA.


6.3

Variables

6.3.1

The Very Basics of Variables

Here is a simple example involving VBA variables:


' Here's an example of two variables in use, along with
' a For...Next... control loop
Sub variableExample1()
    ' Assign the number 3 to the variable, MyVariable
    ' Note: You make up your own, mnemonic,
    ' names for your variables
    MyVariable = 3
    For i = 1 To MyVariable
        MsgBox "Showing and counting: " & i
    Next i
End Sub
The two variables are: MyVariable and i.
Such program variables are used extensively in this sort of programming.
Variables hold values and their values may change during program execution.
Basically, you make computations and assign the results to variables. Then you
make new computations, based on the assigned values of these variables, and
you assign the results to other variables. And on and on.

6.3.2

Variables Have Data Types

Some variables are for holding numbers, some for text, some for dates, and so
on. VBA has a special type of variable, called the variant type. It can hold
about anything, but in general you should avoid being so loose.
The main data types in VB are
1. Boolean. Values: True or False
2. Integer. Values: -32,768 to 32,767
3. Long (integer). Values: -2,147,483,648 to 2,147,483,647
4. Single (single precision floating point). Values: [lots]
5. Double. Values: [lots more than singles]
6. Currency. Values: [lots]
7. Date. Values: January 1, 0100 through December 31, 9999
8. String. Values: 0 through 65,535 characters

9. Variant. Values: Any numeric value up to the range of a Double, or any character text
You set the data type of a VB variable by declaring it. But, if you don't
declare the data type for a variable (as in the variableExample1 procedure,
above), then the default is that the variable is of type variant.
Within a procedure, you may declare variables with the Dim (dimension)
statement.
' Now here's variableExample1 again, but
' with the variables properly declared
Sub variableExample2()
    ' Assign the number 3 to the variable, MyVariable
    ' Note: You make up your own, mnemonic,
    ' names for your variables
    Dim MyVariable As Integer
    Dim I As Integer
    MyVariable = 3
    For I = 1 To MyVariable
        MsgBox "Showing and counting: " & I
    Next I
End Sub

6.3.3

Local and Global Variables

Variables declared this way (explicitly in a procedure with Dim or as variant
by default) are local to the procedure. That is, you can't refer to them (use
their names and get their values) in other procedures. In fact, as illustrated
in variableExample1 and variableExample2, above, you can actually reuse
the same variable names in different procedures. When you do this, you are
really working with different variables, which happen to have the same names.
(Advice: except for counters, like I, and explicitly temporary variables, e.g.,
mytemp, don't do this.)
Point of style: It is normally considered good programming practice to declare all your variables explicitly. Why? In Visual Basic, you can enforce this
by declaring
Option Explicit
in the declarations section of each code module. (The declarations section of
a module is the space before the first procedure, i.e., at the top.) You should
do this. Then, when VB encounters a variable that hasn't been declared, VB
generates an error message. This may initially be irritating, but it's a very good
idea in the long run, since it prevents otherwise undetected errors.
The scope of a variable need not be limited to being local, however. In VBA
in Excel, the scope of a variable may be the procedure in which it is declared (in
which case we say it is local), the module in which it is declared, or the entire
workbook.



When the scope is to be local (within procedure only), declare variables at
the beginning of the procedure with the Dim statement. (See also in the Help
facility: the Static statement.) See examples above, procedures variableExample1
and variableExample2.
When the scope of a variable is to be the module in which it is declared,
declare the variable at the top of the module (in the declarations section), using
Dim. (See also in the Help facility: the Static statement.)
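
Here is a minimal sketch of module-level scope. The names CallCount, countCall, and resetCallCount are invented for this illustration. Both procedures see the same variable because it is declared in the declarations section rather than inside either procedure.

```vba
' Declarations section (top of the module)
Option Explicit
' Module-level: visible to every procedure in this module.
Dim CallCount As Integer

Sub countCall()
    ' The value survives between calls, unlike a local variable.
    CallCount = CallCount + 1
    MsgBox "countCall has run " & CallCount & " time(s)."
End Sub

Sub resetCallCount()
    ' Any procedure in the module may change the variable.
    CallCount = 0
End Sub
```

Run countCall a few times, then resetCallCount, then countCall again, and watch the count.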
When the scope of a variable is to be the entire workbook, pick a module,
and declare the variable in the declarations section using Public (cf., Global).
Here's an example:
' Each module begins with a declarations section, the
' portion at the top, before the procedure declarations
' begin.
' Declare explicit data type checking
Option Explicit
Public MyVar As Integer
Sub publicExample1()
MyVar = 17
MsgBox "We're in publicexample1 and MyVar = " & MyVar
'publicexample2
End Sub
Sub publicExample2()
MsgBox "We're in publicExample2 and MyVar = " & MyVar
End Sub
Note: "Module-level variables remain in existence while Visual Basic is running until the module in which they are defined is edited" (Visual Basic User's
Guide, Microsoft Excel 5.0, p. 121). So play around with this example and see
how this stuff works.

6.3.4

Reading from an Excel worksheet into an Excel Visual Basic Variable

Study these examples:


Sub readfromworksheet1()
    Dim fromworksheet
    ' Note that with Cells(1,2) we are referencing
    ' the first row and second column of the worksheet.
    fromworksheet = Worksheets("Sheet1").Cells(1, 2).Value
    ' The following line works just as well.
    'fromworksheet = Worksheets("Sheet1").Range("b1").Value
    MsgBox "We're in readfromworksheet1 and fromworksheet = " & _
        fromworksheet
    ' Note above, use of "_" as a continuation sign.
End Sub
Sub readfromworksheet2()
    ' Now assume we have defined a range, called testrange1,
    ' whose scope is B2:D4
    Dim fromworksheet
    ' Note that with Cells(1,1) we are referencing
    ' the first row and first column of the named range.
    fromworksheet = Range("testrange1").Cells(1, 1).Value
    ' The following line works just as well, since the
    ' top-left cell of B2:D4 is B2.
    'fromworksheet = Worksheets("Sheet1").Range("b2").Value
    MsgBox "We're in readfromworksheet2. fromworksheet = " & _
        fromworksheet
    ' Note above, use of "_" as a continuation sign.
End Sub

6.3.5

Writing from an Excel Visual Basic Variable to a Worksheet

Just switch from left to right, e.g.,

Worksheets("Sheet1").Cells(1, 2).Value = fromworksheet
The equal sign, =, in this context is the assignment operator. It puts the stuff
on the right into the stuff on the left.

6.4

Boolean Operators

Often we have to test for the truth or falsity of an expression, for example
MyVar > 7.3
will be true if MyVar has a value that is greater than 7.3. If its value is 7.3
or less, the expression will be false. Note: If MyVar is Null, then the expression
evaluates to Null. See comparison operators. This greatly complicates things
and in these notes, I'll ignore the question of nulls.
So, expressions may be either true or false, in which case we say they have
truth values. Expressions having truth values may be combined using Boolean
operators to yield larger expressions, which also have truth values. The Boolean
operators available in VB are: And, Or, and Not.
Each of these operators has a characteristic truth table, as follows.

expression1  expression2  (expression1 And expression2)
T            T            T
T            F            F
F            T            F
F            F            F

Table 6.1: Truth Table for And

expression1  expression2  (expression1 Or expression2)
T            T            T
T            F            T
F            T            T
F            F            F

Table 6.2: Truth Table for Or

expression  (Not expression)
T           F
F           T

Table 6.3: Truth Table for Not

Interestingly, many other Boolean (truth-functional) operators are possible.
That is, many more truth tables are possible. But these three suffice
in that with them any other possible Boolean (truth-functional) operator may
be defined. (How would you prove this?) In fact, Not and And are sufficient in
this way, as are Not and Or. Here's something of a proof.

exp1  exp2  (exp1 And exp2)  Not(Not exp1 Or Not exp2)
T     T     T                T
T     F     F                F
F     T     F                F
F     F     F                F

Table 6.4: Truth Table Showing Definition of And in terms of Not and Or
Can you think of a single Boolean operator that is by itself sufficient?
So, we often need Boolean combinations of statements (or expressions) in
programming. The bottom line is that And, Or, and Not are sufficient for
expressing anything we can possibly express in this way.
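
The definition of And in terms of Not and Or can also be checked mechanically. The following is only a sketch, with invented names (checkAndFromNotOr, TruthValues, ViaAnd, ViaNotOr). It runs through the four combinations of truth values and compares the two expressions on each one.

```vba
Sub checkAndFromNotOr()
    ' Hypothetical check; all names are illustrative only.
    Dim TruthValues(1 To 2) As Boolean
    Dim I As Integer
    Dim J As Integer
    Dim ViaAnd As Boolean
    Dim ViaNotOr As Boolean
    Dim AllAgree As Boolean
    TruthValues(1) = True
    TruthValues(2) = False
    AllAgree = True
    For I = 1 To 2
        For J = 1 To 2
            ' Left column: And used directly.
            ViaAnd = TruthValues(I) And TruthValues(J)
            ' Right column: And defined using only Not and Or.
            ViaNotOr = Not (Not TruthValues(I) Or Not TruthValues(J))
            If ViaAnd <> ViaNotOr Then
                AllAgree = False
            End If
        Next J
    Next I
    MsgBox "The two expressions agree on all four rows: " & AllAgree
End Sub
```

Checking four rows is a proof here only because two Boolean expressions have just four possible combinations of values.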

6.5

Control Structures

There are several of these in Visual Basic, and we'll look at a few of them. (And
you should search the online help under "control structures.") We have already
seen one, the For...Next statement.

6.5.1

For...Next

We've already seen this in action (above). The general structure for a For...Next
statement is:
For <counter> = <start> To <end> [Step <increment>]
[statements]
Next [<counter>]
Note: Items in square brackets, [...], are optional. Items capitalized are
required parts of the statement. Items between left and right angle brackets,
<...>, are required to be filled in by the programmer. Thus, valid examples for
the For...Next statement include the following.
For I = 1 To 3
MsgBox "Hello, world!"
Next
Better style is to do this:
For I = 1 To 3
MsgBox "Hello, world!"
Next I



Or you can count down, if, e.g., MyIncrement is negative.
For MyCounter = MyStart To MyFinish Step MyIncrement
MsgBox "MyCounter = " & MyCounter
Next MyCounter
Note: Be sure all these variables have reasonable values set for them before
executing this statement.
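
As a concrete illustration of counting down, here is a sketch that steps from 10 to 2 by twos. The procedure and variable names are made up for this example. Debug.Print writes to the Immediate window of the VB editor.

```vba
Sub countDownExample()
    ' Hypothetical example; names and values are illustrative only.
    Dim MyCounter As Integer
    Dim MyTotal As Integer
    MyTotal = 0
    ' A negative Step makes the loop count down: 10, 8, 6, 4, 2.
    For MyCounter = 10 To 2 Step -2
        Debug.Print "MyCounter = " & MyCounter
        MyTotal = MyTotal + MyCounter
    Next MyCounter
    ' MyTotal now holds 10 + 8 + 6 + 4 + 2 = 30.
    Debug.Print "MyTotal = " & MyTotal
End Sub
```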

6.5.2

If...Then...

This is a very useful statement in programming languages. The basic structure
in VB is:
If <condition> Then
[statements]
End If
When an If...Then... statement is executed, the <condition> is tested as a
Boolean expression. If it evaluates to True, then the [statements] are executed;
otherwise they are skipped and processing continues with the next statement,
if any.
Note: The <condition> can also be an expression that returns a numeric
value. If when evaluated it returns 0, that is treated as False. Anything else is
treated as True.
Example:
If Age >= 65 Then
NumberOfDeductions = NumberOfDeductions + 1
End If
Note: The <condition> expression may be complex. It may be an arbitrarily
complex Boolean combination of statements.
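
For instance, a condition might combine two tests with Or. The sketch below is an assumption-laden illustration: the variable IsBlind and the starting values are invented, and only Age and NumberOfDeductions come from the example above.

```vba
Sub complexConditionExample()
    ' Hypothetical values, chosen only to exercise the test.
    Dim Age As Integer
    Dim IsBlind As Boolean
    Dim NumberOfDeductions As Integer
    Age = 70
    IsBlind = False
    NumberOfDeductions = 1
    ' The condition is true when either part is true.
    If (Age >= 65) Or IsBlind Then
        NumberOfDeductions = NumberOfDeductions + 1
    End If
    ' With the values above, NumberOfDeductions is now 2.
    MsgBox "NumberOfDeductions = " & NumberOfDeductions
End Sub
```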

6.5.3

If...Then...Else

Probably used even more often than If...Then...


If <condition> Then
[statements to execute if <condition> is true]
Else
[statements to execute if <condition> is false]
End If
You use If...Then...Else when you want to do one thing if a condition obtains, and another if it does not obtain. The =If(...) function in Excel is an
If...Then...Else type of construct. Example: If the value in a certain cell
(or variable) is valid, then display an OK message; otherwise display a not OK
message.
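
The cell-checking example just described might look like the following sketch. It assumes a worksheet named "Sheet1" exists and takes "valid" to mean "numeric"; the choice of cell B1 and the names checkCellExample and CellValue are arbitrary.

```vba
Sub checkCellExample()
    ' Assumes a worksheet named "Sheet1"; cell B1 is an arbitrary choice.
    Dim CellValue As Variant
    CellValue = Worksheets("Sheet1").Cells(1, 2).Value
    ' Here "valid" is taken to mean "can be read as a number".
    If IsNumeric(CellValue) Then
        MsgBox "OK: the cell holds a number."
    Else
        MsgBox "Not OK: the cell does not hold a number."
    End If
End Sub
```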

6.5.4

Select Case

More general than If...Then...Else is Select Case.


Select Case <test expression>
Case <first expression list>
[first statements]
Case <second expression list>
[second statements]
...
Case Else
[else statements]
End Select
Here's an example from a popular Excel/VBA book:
Select Case TotalPoints
Case Is < 50
FinalGrade = "F"
Case Is < 60
FinalGrade = "D"
Case Is < 70
FinalGrade = "C"
Case Is < 80
FinalGrade = "B"
Case Else
FinalGrade = "A"
End Select
This runs, but there's a lot that's wrong with it. The following is much better.
Why?
Sub testcase2()
    Dim TotalPoints As Integer
    Dim FinalGrade As String
    ' 173 is deliberately out of range, to exercise Case Else
    TotalPoints = 173
    Select Case TotalPoints
        Case 0 To 49
            FinalGrade = "F"
        Case 50 To 59
            FinalGrade = "D"
        Case 60 To 69
            FinalGrade = "C"
        Case 70 To 79
            FinalGrade = "B"
        Case 80 To 100
            FinalGrade = "A"
        Case Else
            FinalGrade = "Error in TotalPoints: " & TotalPoints
    End Select
    MsgBox "Final grade is: " & FinalGrade
End Sub

6.5.5

Do...Loop

There are really two forms of Do...Loop: condition-at-the-top and condition-at-the-bottom. Here they are:
Do {While | Until} <condition>
[statements]
Loop
and
Do
[statements]
Loop {While | Until} <condition>
where
{While | Until}
gets unpacked as either While or Until. While <condition> means so long
as the condition is true, and Until <condition> means until the condition
is true. The difference between the condition-at-the-top and the condition-at-the-bottom versions lies mainly in that the condition-at-the-bottom version is
guaranteed to execute its [statements] at least once.
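
The difference shows up when the condition is false from the start. In the following sketch (all names invented for illustration), the same condition, N < 5, is tested at the top of one loop and at the bottom of the other, with N starting at 5 both times.

```vba
Sub doLoopExample()
    ' Hypothetical example; names and values are illustrative only.
    Dim N As Integer
    Dim TopCount As Integer
    Dim BottomCount As Integer
    ' Condition at the top: tested before the first pass,
    ' so the body never runs when N starts at 5.
    N = 5
    TopCount = 0
    Do While N < 5
        TopCount = TopCount + 1
        N = N + 1
    Loop
    ' Condition at the bottom: the body runs once before
    ' the condition is tested for the first time.
    N = 5
    BottomCount = 0
    Do
        BottomCount = BottomCount + 1
        N = N + 1
    Loop While N < 5
    ' TopCount is 0 and BottomCount is 1.
    MsgBox "Top: " & TopCount & "  Bottom: " & BottomCount
End Sub
```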

6.5.6

Exiting a Loop

Sometimes you need to break out of a loop. (Don't we all?) If you're in a
For...Next structure, break out with an Exit For statement. If you're in a
Do...Loop, break out with an Exit Do statement. Note: sometimes you have
to do this, but it's generally considered poor programming practice. Why?
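
Here is a sketch of Exit For in use: a search that stops at the first negative entry. The array contents and the names exitForExample, MyValues, and FirstNegative are invented for this example.

```vba
Sub exitForExample()
    ' Hypothetical example: find the position of the first negative entry.
    Dim MyValues(1 To 5) As Integer
    Dim I As Integer
    Dim FirstNegative As Integer
    MyValues(1) = 4
    MyValues(2) = 7
    MyValues(3) = -2
    MyValues(4) = 9
    MyValues(5) = -6
    FirstNegative = 0   ' 0 means "none found"
    For I = 1 To 5
        If MyValues(I) < 0 Then
            FirstNegative = I
            Exit For    ' Stop looking; later entries are not examined.
        End If
    Next I
    ' With the values above, FirstNegative is 3.
    MsgBox "First negative value is at position " & FirstNegative
End Sub
```

The same search could be written as a Do...Loop whose condition tests both the position and whether a negative value has been found, which avoids the Exit statement altogether.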

6.6

Arrays

Arrays in VB should not be confused with arrays and array commands in Excel,
even though Excel's terminology invites this. All standard third-generation programming languages support arrays, and programs in these languages typically
rely a lot on arrays. Arrays are rather like vectors and matrices in mathematics.
A one-dimensional array is an ordered collection of values, rather like a vector,
which you can access (store or retrieve values) by position. Here's a simple
example.
' From "Code Module5" of vbtutor.xls
Sub arraytester1()
    ' Each variable needs its own As clause;
    ' without one, I would be a Variant.
    Dim I As Integer, MyFirstArray(1 To 6) As Integer
    ' Load up the array
    For I = 1 To 6
        MyFirstArray(I) = I + 3
    Next I
    MsgBox "MyFirstArray(6) = " & MyFirstArray(6)
    ' Dump the array into a worksheet
    For I = 6 To 1 Step -1
        Worksheets("Sheet1").Cells(I, 6).Value = MyFirstArray(I)
    Next I
End Sub
Note: You declare an array in much the same way you declare any other variable.
(But see ReDim in the online help.) All of the elements in an array must have
the same data type. Of course, if the array is of type variant, this is pretty
loose. (But you can't have, e.g., arrays within arrays in VB.)
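
When the size of an array is not known until run time, ReDim sets the bounds after the array has been declared. The following is a minimal sketch with invented names and sizes (redimExample, MyDynamicArray, HowMany).

```vba
Sub redimExample()
    ' Hypothetical example; names and sizes are illustrative only.
    Dim MyDynamicArray() As Integer   ' Declared with no bounds yet
    Dim HowMany As Integer
    Dim I As Integer
    HowMany = 4
    ' Set the bounds at run time.
    ReDim MyDynamicArray(1 To HowMany)
    For I = 1 To HowMany
        MyDynamicArray(I) = I * I
    Next I
    ' Grow the array; Preserve keeps the values already stored.
    ReDim Preserve MyDynamicArray(1 To HowMany + 2)
    ' The upper bound is now 6 and element 4 still holds 16.
    MsgBox "Upper bound is now " & UBound(MyDynamicArray) & _
        " and element 4 is still " & MyDynamicArray(4)
End Sub
```

Without the Preserve keyword, ReDim would reset the elements to zero.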
Here's a more interesting example, using a two-dimensional array.
Sub arraytester2()
    ' Each variable needs its own As clause;
    ' without one, I would be a Variant.
    Dim I As Integer, J As Integer
    Dim MySecondArray(1 To 10, 1 To 20) As Single
    ' Load up the array and dump, forcing
    ' type conversion from Double (the result of Sin) to Single
    For I = 1 To 10
        For J = 1 To 20
            MySecondArray(I, J) = Sin(I + J)
            Worksheets("Sheet2").Cells(I, J).Value = MySecondArray(I, J)
        Next J
    Next I
End Sub
We can go on to higher-dimensional arrays, but I think you get the idea. In
Excel VB programs, you typically only need one- and two-dimensional (maybe
three-dimensional) arrays.

6.7

Miscellaneous Topics

Now we'll discuss a list of useful things (methods and tricks) that didn't
fit easily in the previous discussion.

6.7.1

Constants

Constants are like variables, except that they don't change. You use constants
in order to improve the readability of your program and to help reduce errors.
For example, if the maximum number of students in a classroom is 132, and
you need this value a lot in your program, then you might want to consider
declaring a constant. You might do this:



Public Const maxstudents As Integer = 132
Then, throughout your program, you can just use maxstudents, without having
to worry about typing 132 or making a mistake and typing some other number.
(Recall: Option Explicit.)

6.7.2

The Copy Method

Suppose you wish to copy one worksheet range to another worksheet range. You
can do this in Excel VBA with the copy method. For example:
Sub copytest1()
' Suppose "carol" is the range B3:C4 on Sheet3 and
' "alice" is E4:F5 on Sheet3.
' The following works:
Worksheets("Sheet3").Range("carol").Copy _
destination:=Worksheets("Sheet3").Range("alice")
' And so does this:
Worksheets("Sheet3").Range("carol").Copy _
destination:=Worksheets("Sheet3").Cells(9, 9)
' and so does this:
Worksheets("Sheet3").Range("b3:c4").Copy _
destination:=Worksheets("Sheet3").Cells(10, 2)
End Sub

6.7.3

Referring to Single Column or Row Ranges

Suppose the name denise refers to a range consisting of a single column. Then
Range("denise").Cells(1).Value refers to the value in the topmost cell in
the range.
Sub democells1()
x = Range("denise").Cells(1).Value
y = Range("denise").Cells(2).Value
MsgBox "x = " & x & " and y = " & y
End Sub

6.7.4

Sorting Worksheet Ranges

See the sort method. In Excel VBA you can direct the sorting of a worksheet
range. For example, the following subroutine sorts the range, DaRange, in worksheet, DaWorkSheet, on the column, DaColumn, in descending order.
Sub sort()
Worksheets("DaWorkSheet").Range("DaRange").sort _
key1:=Range("DaColumn"), order1:=xlDescending
End Sub

6.7.5

Calling Subroutines from within Other Subroutines

A reasonable and normal thing to do. In fact it's recommended. Suppose
you had a main subroutine, called main, and you wanted it to call three other
subroutines, named mysub1, mysub2, and mysub3. Here's how:
Sub main()
mysub1
mysub2
mysub3
End Sub

6.7.6

Calling Functions from Other Procedures

Very straightforward. See the bob function at the start of this appendix. Then,
here's an example.
Function bobagain(x)
bobagain = bob(x) * bob(x)
End Function

6.7.7

Selecting Ranges

In RangeTest, we illustrate how to select a worksheet range based on the row
and column indexes of the cells in the corners of the range. Then, we call
SimpleChartRange, passing it a variable whose value has been set to a given
range. SimpleChartRange then produces a (simple) chart based on the values
in the range passed to it. I began by recording a macro for SimpleChartRange
and then I modified it to accept and use the range passed to it.
Option Explicit
Sub RangeTest()
'Assume that we have some numbers in
'A1:E1 in Sheet1 of the workbook. Also
'on that sheet we have a button, that
'when clicked calls this sub.
'Begin by declaring MyRange as a Range
'variable.
Dim MyRange As Range
'How to select a range based on the
'row and column indexes of the corner
'cells in the range?
'Here is one way. The corner cells are
'Cells(1,1)--A1--and Cells(1,5)---E1. Nice, huh?
Set MyRange = _
Worksheets("sheet1").Range(Cells(1, 1), _
Cells(1, 5))
'Now we put up a message box just to test
'to see that all went well.
MsgBox MyRange.Cells(1).Value
'Now we select the range. This is
'another check on the code.
MyRange.Select
'Now we call a sub that will do
'a simple chart on the range.
Call SimpleChartRange(MyRange)
End Sub
Sub SimpleChartRange(TheRange As Range)
'This sub is a minor modification of
'a simple macro I recorded to chart
'a series of numbers in the range A1:E1,
'on Sheet1 of a workbook.
'The big difference is that I'm
'passing in a range variable, called
'TheRange and using it where the
'original macro used, e.g., "a1:e1".
'You might also refer to slides 36-40
'of the VBA tutorial.
TheRange.Select
Charts.Add
ActiveChart.ChartType = xlLineMarkers
ActiveChart.SetSourceData _
Source:=TheRange, _
PlotBy:=xlRows
ActiveChart.Location _
Where:=xlLocationAsObject, _
Name:="Sheet1"
With ActiveChart
.HasTitle = False
.Axes(xlCategory, xlPrimary).HasTitle = False
.Axes(xlValue, xlPrimary).HasTitle = False
End With
Application.CommandBars("chart").Visible = False
End Sub

6.7.8

The Month Function

The VBA/Excel function Month returns the number (an integer) corresponding
to the month in the date passed to the function. This could be used to figure out
when the months change in the data.

Sub DaMonth()
'Assume that cell A5 on Sheet1
'contains a value that is a date,
'e.g., 4/15/98.
'The month function will return
'an integer corresponding to the
'month in the date.
Dim OurMonth As Integer
OurMonth = Month(Worksheets("sheet1").Range("a5").Value)
MsgBox "And the month is " & OurMonth & "."
End Sub

6.7.9

Writing to the Status Bar

Here's code for putting a message on the Excel status bar.


Sub PokeTheStatusBar()
Application.StatusBar = "Macintosh forever!"
End Sub

6.8

Bibliographic Note

A good, elementary introduction to Excel VBA (but not to programming itself)
can be found in [13, page 205].

6.9

Version Notes

File: dt-vbatutor. Created: 951128, from VBTUTORF.DOC. Revised: 951222,
19980502, 19980512.

Chapter 7

VBA Exercises
1. Consider the following code, Sub Question9.
Sub Question9()
Dim I As Integer
Dim J As Integer
For I = 5 To 1 Step -1
For J = 2 To 6 Step 2
Worksheets("Question9").Cells(J, I).Value = (I / J) ^ 3
Next J
Next I
End Sub
Assuming that the worksheet "Question9" is empty before the Sub Question9 is executed, what is in cell D4 after Sub Question9 is executed?
(a) D4 is empty
(b) 15.625
(c) 1
(d) 0.125
(e) 8
2. Suppose that the worksheet "Question10" has the numbers 1, 2 and 3 in
the range A1:C1 (i.e., A1 has a 1 in it, B1 has a 2 in it, C1 has 3) and
2, 4, and 6 in the range A2:C2 (i.e., A2 = 2, B2 = 4, C2 = 6). Suppose
further that otherwise the worksheet is empty. What is in cell A5 after
the Sub Question10, below, is executed?
Sub Question10()
Dim I As Integer
Dim Huh As Double
Huh = 2
For I = 1 To 3
Huh = Huh + Worksheets("Question10").Cells(1, I).Value _
* Worksheets("Question10").Cells(2, I).Value
Next I
Worksheets("Question10").Range("A5").Value = Huh
End Sub
(a) 56
(b) 28
(c) 0
(d) 54
(e) 30
3. Which of the following is NOT a valid data type in Visual Basic for Applications?
(a) Double
(b) Currency
(c) Short
(d) String
(e) Boolean
4. Suppose in worksheet "Question12" A1 = 12, A2 = 3.4, A3 = 3, and otherwise the worksheet is empty. What is in cell B2 after the Sub Question12,
below, is executed?
Sub Question12()
Dim I As Integer
For I = 1 To 3
Worksheets("Question12").Cells(3 - I + 1, 2).Value = _
Worksheets("Question12").Cells(I, 1).Value
Next I
End Sub
(a) 3.4
(b) The cell remains empty
(c) 12
(d) 3
(e) None of the above
5. Suppose the worksheet "Question13" has the following values in the indicated cells in Figure 7.1 (e.g., B2 = 0) and is otherwise empty. What is
in cell D8 after the Sub Question13, below, is executed?

        A     B
 1      4     1
 2      3     0
 3      2    -1
 4      1    -2
 5      0    -3
 6     -1    -4
 7     -2    -5
 8     -3    -6
 9     -4    -7
10     -5    -8
11     -6    -9
12     -7   -10

Figure 7.1: Table for the sub "Question13"


Sub Question13()
Dim I As Integer: Dim AnArray() As Variant
Dim Count As Integer: Dim Response As Variant
Count = Worksheets("Question13").Cells(1, 1).Value
ReDim AnArray(1 To Count)
For I = 1 To Count
AnArray(I) = Worksheets("Question13").Cells(I + 6, 2).Value
Next I
For I = 1 To Count
Worksheets("Question13").Cells(I + 6, 4).Value _
= AnArray(Count - I + 1)
Next I
End Sub
(a) -8
(b) 0
(c) -7
(d) -1
(e) The cell remains empty
6. Suppose the worksheet "Question14" has the values in the indicated cells
in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty.
What is in cell C1 after the Sub Question14, below, is executed?
Sub Question14()

Dim I As Integer
I = Worksheets("Question14").Cells(2, 1).Value
Worksheets("Question14").Cells(1, 3).Value = _
Worksheets("Question14").Cells(I + 1, I - 1).Value
End Sub
(a) 1
(b) -1
(c) 2
(d) -2
(e) None of the above
7. Suppose the worksheet "Question15" has the values in the indicated cells
in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty.
What is in cell C1 after the Sub Question15, below, is executed?
Sub Question15()
Dim I As Integer
Dim J As Integer
Dim K As Integer
I = Worksheets("Question15").Cells(J + 2, K + 1).Value ^ 2
J = Worksheets("Question15").Cells(I, 1).Value
Worksheets("Question15").Cells(1, 3).Value = _
Worksheets("Question15").Cells(I - J, 2).Value
End Sub
(a) -7
(b) 0
(c) 4
(d) The cell remains empty
(e) None of the above
8. Suppose the worksheet "Question16" has the values in the indicated cells
in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty.
What is in cell C3 after the Sub Question16, below, is executed?
Sub Question16()
Dim I As Integer
Dim J As Integer
Dim K As Integer
Dim L As Integer
Do
I = I + 1
Loop While Worksheets("Question16").Cells(I, 1).Value <> ""

Do
J = J + 1
L = Worksheets("Question16").Cells(J, 1).Value
K = Worksheets("Question16").Cells(J, 2).Value
Loop Until L + K < 2
For L = 1 To J
Worksheets("Question16").Cells(L, 3).Value = _
Worksheets("Question16").Cells(L, 1).Value
Worksheets("Question16").Cells(L, 4).Value = _
Worksheets("Question16").Cells(L, 2).Value
Next L
For L = J + 1 To I
Worksheets("Question16").Cells(L, 3).Value = _
Worksheets("Question16").Cells(L, 2).Value
Worksheets("Question16").Cells(L, 4).Value = _
Worksheets("Question16").Cells(L, 1).Value
Next L
End Sub
(a) 2
(b) -1
(c) 4
(d) 0
(e) 1
9. Suppose the worksheet "Question16" has the values in the indicated cells
in Figure 7.1 (above, question 5, e.g., B2 = 0) and is otherwise empty.
What is in cell D12 after the Sub Question16, above, is executed?
(a) -6
(b) -9
(c) The cell is empty
(d) -7
(e) -10
10. Excel supports text box validation on Dialog boxes. Which data type does
it NOT support for this purpose?
(a) Reference
(b) Text
(c) Number
(d) Integer
(e) Date

11. Excel supports a number of controls on Dialog boxes. Which control type
does it NOT support for this purpose?
(a) Text Box
(b) Check Box
(c) Scroll Bar
(d) Slider
(e) Spinner
12. TBA
13. TBA
14. TBA
     A  B  C  D  E
1    1  1  0  1  0
2    1  0  1  0  1

Table 7.1: Input data for Questions 15 and 16


15. Suppose that worksheet "Question15" holds input data as indicated in
Table 7.1, and that the sub Question15 (Figure 7.2, page 99) is executed.
What appears in the range A4:E5 of the worksheet "Question15"?
(a)
     A  B  C  D  E
4    1  0  1  0  1
5    1  1  0  1  0

(b)
     A  B  C  D  E
4    1  1  1  0  1
5    1  0  0  1  0

(c)
     A  B  C  D  E
4    1  1  0  0  1
5    1  0  1  1  0

(d)
     A  B  C  D  E
4    1  1  0  1  1
5    1  0  1  0  0

(e) None of the above.


Hints: (a) See Figure 7.3, page 100, for the definition of the Mod function
in VBA. (b) Perhaps a better name for this sub would be Crossover.

Sub Question15()
Dim Top(1 To 5) As Integer
Dim Bot(1 To 5) As Integer
Dim ResultTop(1 To 5) As Integer
Dim ResultBot(1 To 5) As Integer
Dim I As Integer
Dim J As Integer
J = 0
For I = 1 To 5
Top(I) = Worksheets("Question15").Cells(1, I).Value
Bot(I) = Worksheets("Question15").Cells(2, I).Value
J = J + Top(I) + Bot(I)
Next I
J = (J Mod 4) + 1
For I = 1 To J
ResultTop(I) = Top(I)
ResultBot(I) = Bot(I)
Next I
For I = J + 1 To 5
ResultTop(I) = Bot(I)
ResultBot(I) = Top(I)
Next I
For I = 1 To 5
Worksheets("Question15").Cells(4, I).Value = ResultTop(I)
Worksheets("Question15").Cells(5, I).Value = ResultBot(I)
Next I
End Sub
Figure 7.2: Code for Question 15

This example uses the Mod operator to divide two numbers and
return only the remainder. If either number is a floating-point
number, it is first rounded to an integer.
Dim MyResult
MyResult = 10 Mod 5 ' Returns 0.
MyResult = 10 Mod 3 ' Returns 1.
MyResult = 12 Mod 4.3 ' Returns 0.
MyResult = 12.6 Mod 5 ' Returns 3.

Figure 7.3: Definitional information on Mod from Microsoft's online help for VBA
16. Suppose that worksheet "Question15" holds input data as indicated in
Table 7.1 (page 98), and that the sub Question16 (Figure 7.4, page 100)
is executed. What number appears when the resulting message box (from
the MsgBox J command in the code) is displayed?
(a) 3
(b) 21
(c) 6
(d) 10101
(e) 91
Sub Question16()
Dim Bot(1 To 5) As Integer
Dim I As Integer
Dim J As Integer
J = 0
For I = 1 To 5
Bot(I) = Worksheets("Question15").Cells(2, I).Value
Next I
For I = 1 To 5
J = J + (Bot(6 - I) * 3 ^ (I - 1))
Next I
MsgBox J
End Sub
Figure 7.4: Code for Question 16

     A  B  C  D  E
1    1  0  1  1  0
2    1  0  1  0  1

Table 7.2: Input data for Question 17


17. Suppose that worksheet "Question17" holds input data as indicated in
Table 7.2, and that the sub Question17 (Figure 7.5, page 102) is executed.
What appears in the range A4:F4 of the worksheet "Question17"?
     A  B  C  D  E  F
(a)  0  1  1  1  1  1
(b)  0  1  0  1  1  1
(c)  1  0  1  1  1  1
(d)  1  0  1  0  1  1
(e)  1  1  1  1  1  1

Hints: And and Or work as follows:

Exp1  Exp2  Exp1 And Exp2  Exp1 Or Exp2
1     1     1              1
1     0     0              1
0     1     0              1
0     0     0              0

Sub Question17()
Dim Top(1 To 5) As Integer
Dim Bot(1 To 5) As Integer
Dim Result(1 To 6) As Integer
Dim I As Integer
Dim J As Integer
Dim Carry As Integer
Carry = 0
For I = 1 To 5
Top(I) = Worksheets("Question17").Cells(1, I).Value
Bot(I) = Worksheets("Question17").Cells(2, I).Value
Next I
For I = 5 To 1 Step -1
Result(I + 1) = (Top(I) Or Bot(I)) Or Carry
Carry = (Top(I) And Bot(I)) And Carry
Next I
Result(1) = Carry
For I = 1 To 6
Worksheets("Question17").Cells(4, I).Value = Result(I)
Next I
End Sub
Figure 7.5: Code for Question 17

18. Suppose you want to have a function that, when passed a one-dimensional
array holding floating-point numbers, would return the largest number in
the array. For example, suppose this function is called Question18 and is
called from the following sub:
Sub TestQ18()
Dim Vector(1 To 4) As Double
Vector(1) = 5
Vector(2) = 17
Vector(3) = 34
Vector(4) = 4
Dim Result As Double
Result = Question18(Vector)
MsgBox Result
End Sub
The value displayed after executing the MsgBox Result command would
be 34.
Suppose, further, that someone has provided part of the required function,
Question18, and it is displayed (with absent code indicated) in Figure
7.6, page 105.

By way of answering this question, Question 18, which one of the following
VBA code segments is best for filling in the missing lines of the code in
Figure 7.6, page 105?
(a) MyMax = Max(Vector)
(b) MyMax = 0
For I = Lower To Upper
If MyMax = Vector(I) Then
MyMax = Vector(I)
End If
Next I
(c) MyMax = 0
For I = Lower To Upper
If MyMax < Vector(I) Then
MyMax = Vector(I)
End If
Next I
(d) MyMax = 0
For I = Lower To Upper
If MyMax > Vector(I) Then
MyMax = Vector(I)
End If
Next I
(e) None of the above.

Function Question18(Vector) As Double
Dim Lower As Integer
Dim Upper As Integer
Dim I As Integer
Dim MyMax As Double
Lower = LBound(Vector)
Upper = UBound(Vector)
**** Missing line(s) of VBA code go here. ****
Question18 = MyMax
End Function
Figure 7.6: Code for Question 18

Figure 7.7: Illustration of the Pythagorean theorem. The triangle has three sides:
$a$, $b$, $c$. The angle between sides $a$ and $b$ is $90^\circ$. The lengths of the sides are as
follows: length $a = \|a\|$, length $b = \|b\|$, length $c = \|c\|$. According, then, to
the Pythagorean theorem, $\|c\|^2 = \|a\|^2 + \|b\|^2$.
19. Suppose that we need a function, to be called EDistance2D, for calculating
the (Euclidean) distance between two points in a plane (hence "2D").
Let the points be (x1, x2) and (y1, y2) (these are coordinates in the 2-dimensional plane, e.g., (4, 5.6)). The function is to be given these four
numbers and is to return the distance between the corresponding two
points.
It may be helpful to recall the Pythagorean theorem, as it pertains to the
hypotenuse of a right ($90^\circ$) triangle. See Figure 7.7, page 106.
It will also be helpful to recall that Sqr is the square root function in
VBA. Also, you have been provided with a partial answer to the question,
in the form of a code template. See Figure 7.8, page 107.

Function EDistance2D(x1, x2, y1, y2) As Double
'The two points are (x1,x2) and (y1,y2)
**** Missing line(s) of VBA code go here. ****
End Function
Figure 7.8: Code template for Question 19
By way of answering Question 19, which one of the following VBA code
segments is best for filling in the missing line of the code in Figure 7.8,
page 107?
(a) Sqr(EDistance2D) = (x1 - y1) ^ 2 + (x2 - y2) ^ 2
(b) EDistance2D = Sqr((x1 - y1) ^ 2 + (x2 - y2) ^ 2)
(c) EDistance2D = Sqr((x1 - y2) ^ 2 + (y1 - x2) ^ 2)
(d) Sqr(EDistance2D) = (x1 - x2) ^ 2 + (y1 - y2) ^ 2
(e) EDistance2D = Sqr((x1 - x2) ^ 2 + (y1 - y2) ^ 2)

Function Question20(Lower, Upper) As Double


**** Missing line(s) of VBA code go here. ****
End Function
Figure 7.9: Code template for Question 20
20. Suppose we need a function, call it Question20, that returns a real number (a double) randomly and uniformly distributed between the values
Lower and Upper, which we supply to the function when we call it. VBA
has a built-in function, Rnd, that returns a double that is randomly and
uniformly distributed between 0 and 1. So, what we need is a generalization of Rnd, and we are free to use Rnd in writing this new function.
Happily, someone has supplied a partial answer to this question, in the
form of a code template. See Figure 7.9, page 108.
By way of answering this question, Question 20, which one of the following
VBA code segments is best for filling in the missing line of the code in
Figure 7.9, page 108?
(a) Question20 = (Rnd * (Upper - Lower)) + Lower
(b) Question20 = (Rnd * (Upper + Lower)) - Lower
(c) Question20 = (Rnd * Upper) + Lower
(d) Question20 = Rnd * (Upper + Lower)
(e) Question20 = Rnd * (Upper - Lower)

Chapter 8

Text and Pattern Processing
8.1

The information extraction problem

Consider the following chunk of HTML.


<table border="0" width="100%" cellspacing="0" cellpadding="0">
<!-- WP Market Indices Start -->
<div align="center"><i class="smaller">(As of 10:55 AM on 12/20/01)</i></div><br>
<tr>
<td><p>DJIA</p></td>
<td align="right"><p>&nbsp;&nbsp;10031.90</p></td>
<td align="right"><p>&nbsp;&nbsp;<span class="marketdown">-38.60</span></p></td>
</tr>
<tr>
<td><p>NASDAQ</p></td>
<td align="right"><p>&nbsp;&nbsp;1950.90</p></td>
<td align="right"><p>&nbsp;&nbsp;<span class="marketdown">-31.90</span></p></td>
</tr>
<tr>
<td><p>NYSE</p></td>
<td align="right"><p>&nbsp;&nbsp;584.90</p></td>
<td align="right"><p>&nbsp;&nbsp;<span class="marketdown">-0.20</span></p></td>
</tr>
<tr>
<td><p>S&amp;P 500</p></td>
<td align="right"><p>&nbsp;&nbsp;1145.96</p></td>
<td align="right"><p>&nbsp;&nbsp;<span class="marketdown">-3.60</span></p></td>
</tr>
<tr>
<td><p>AMEX</p></td>
<td align="right"><p>&nbsp;&nbsp;828.74</p></td>
<td align="right"><p>&nbsp;&nbsp;<span class="marketdown">-2.11</span></p></td>
</tr>
<!-- WP Market Indices End -->
</table>

This chunk was extracted from a Web page, which was downloaded from the
AT&T Web service page at http://www.att.net/ shortly after 11 a.m. (Eastern
Standard Time) on December 20, 2001. The original source page, of course,
includes a great deal more HTML code, but that turns out not to matter for
our present purposes. Seeing why is an important part of these purposes.
Our interest is in extracting, under program control, certain information
presented in this page. Specifically, assume first that we are interested in capturing the value of the S&P 500 stock index, as reported on this page. We can
see that the value is 1145.96. How do we get a program to see it (and then do
something useful automatically for us)?
This very special problem is in fact just a specic case of a very general and
widely-encountered problem:
The information extraction problem: Given a body of text, how can
we automatically recover useful information from it?
Note that if we can automatically recover useful information, we can then
(automatically or not) process that information and obtain additional value. A
simple example: Using just the information in the above chunk of HTML we
might compute which stock index has undergone the largest change, as measured
in percentage.
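Here is a minimal sketch of that computation in Python. The index values and point changes are typed in by hand from the HTML chunk above. We assume (the page does not say so) that each reported change is measured against the previous value, i.e., the current value minus the change, and that "largest" means largest in absolute value. The names quotes and percent_change are ours, chosen for illustration.

```python
# Index values and point changes, copied by hand from the HTML chunk above.
quotes = {
    "DJIA": (10031.90, -38.60),
    "NASDAQ": (1950.90, -31.90),
    "NYSE": (584.90, -0.20),
    "S&P 500": (1145.96, -3.60),
    "AMEX": (828.74, -2.11),
}


def percent_change(value, change):
    """Percentage change relative to the previous value (value - change)."""
    previous = value - change
    return 100.0 * change / previous


percent = {name: percent_change(v, c) for name, (v, c) in quotes.items()}

# Assumption: "largest change" means largest in absolute value.
largest = max(percent, key=lambda name: abs(percent[name]))
print(largest, round(percent[largest], 2))
```

Typing the numbers in by hand is, of course, exactly what we want to avoid, which is why the rest of the chapter is about extracting them automatically.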
Usually the information extraction problem presents itself with the following
complication:
Complication to the information extraction problem: The body
of text containing the information of interest changes over time.
The values of the quantities we are interested in change. Some go up, some go
down.
What can a programming language do to help us with the information extraction problem? Think of how you might find the reported value of the S&P
500. You look for a pattern of characters, or a string, that indicates where the
information you want (specifically, the string) is located. The following excerpt
has what we want:
<td><p>S&amp;P 500</p></td>
<td align="right"><p>&nbsp;&nbsp;1145.96</p></td>
How can we describe the pattern? Try this:
Literally, the string "S&amp;P 500", followed by a bunch of nonnumeric junk, followed by a decimal number with two digits to the
right and several digits on the left. That decimal number is the value
we want.
Note: We might be reasonably confident that the S&P 500 index will always
have exactly four digits to the left of its decimal point. Still, prudence counsels us to
allow three to five.
This is all well and good, but how can we program information extraction,
so that it occurs automatically? Regular expressions (REs) let us solve this
problem. Let us see how.
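As a preview, here is one way the pattern described above might be written in Python. This is a sketch under our own assumptions, not the only reasonable pattern: we take "nonnumeric junk" to mean any run of non-digit characters, and we allow three to five digits before the decimal point. The variable names are ours, and the two-line fragment is copied from the excerpt above.

```python
import re

# The two lines of HTML that hold what we want, copied from the excerpt above.
fragment = (
    '<td><p>S&amp;P 500</p></td>\n'
    '<td align="right"><p>&nbsp;&nbsp;1145.96</p></td>\n'
)

# The literal label, then non-digit junk (\D*? takes as little as possible),
# then a captured number: three to five digits, a decimal point, two digits.
pattern = re.compile(r'S&amp;P 500\D*?(\d{3,5}\.\d{2})')

match = pattern.search(fragment)
sp500 = float(match.group(1)) if match else None
print(sp500)
```

The parentheses mark the part of the match we want to keep. The syntax used here is explained in the sections that follow.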

8.2

Regular Expressions (REs)

The general idea is to have a language in which we can express patterns (of
strings or text), which can be matched automatically to a given string (or text).
This is a familiar idea to all those who use word processing and are at least
acquainted with using computers. In any Web browser, in Word, and in many
other programs if you type ctrl-f you will get a dialog box that asks you what
you want to find. You type in a sequence of characters (e.g., theory), click the
appropriate button, and the program finds the first exact match to your search
string (if there is a match). All this is very helpful, but:
1. Once you find your pattern, you want your program to do something useful
with it.
Browsers and word processors normally just find things, then require you,
the user, to take any required actions.
2. What if you aren't exactly sure of the pattern you are seeking, so that a
strictly literal match won't do?
Example: You don't know whether the name is "Smith" or "Smyth", or
perhaps "color" might be spelled "colour", or perhaps you think the word
might be misspelled, or the target text will change over time, but within
a predictable pattern. As we saw above, the exact value of a stock index
will vary, yet we can expect it to fit within a stable pattern.
Regular expressions are a device for handling both of these problems. We'll
discuss the two problems in the context of REs in the next two subsections.
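To make the second problem concrete, here is a small sketch of how the spelling examples above might be expressed as REs in Python. The sample sentence and the variable names are invented for illustration. The square brackets and the question mark are metacharacters, described in the syntax summary later in this chapter.

```python
import re

# [iy] matches either "i" or "y"; u? matches zero or one "u".
names = re.compile(r'Sm[iy]th')    # "Smith" or "Smyth"
colors = re.compile(r'colou?r')    # "color" or "colour"

# An invented sentence to search.
text = "Ms. Smyth prefers the colour red; Mr. Smith prefers the color blue."

found_names = names.findall(text)
found_colors = colors.findall(text)
print(found_names)
print(found_colors)
```

Each pattern finds both spellings in a single pass, which a strictly literal search cannot do.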

8.2.1

Problem 1: Programmed Matches

REs may be incorporated into programming environments. Python and Perl
are particularly known for how well they support REs. Let's look at a simple
example in Python, in which our target string is the Web page fragment reproduced above, from AT&T WorldNet, and our search string is "S&amp;P 500".
See Figure 8.1 (line numbers have been added).

1.>>> import re
2.>>> data = open(r'c:\day\attfragment.html','r').read()
3.>>> len(data)
4.1038
5.>>> sandpmatch = re.search(r'S&amp;P 500',data,re.IGNORECASE)
6.>>> sandpmatch
7.<SRE_Match object at 00F12ED0>
8.>>> sandpmatch.span()
9.(681, 692)
10.>>> sandp = data[681:688]
11.>>> sandp
12.'S&amp;P'
Figure 8.1: Simple re.search Example
Exposition on the lines in the figure:
1. We begin by importing (making available) Python's re module (for regular
expressions).
2. The file we want, holding the target text, is attfragment.html, residing
on the C drive under the day directory. In this line we open the file for
reading, and we read() its contents into our variable data, which now
holds a string corresponding to the entire contents of the target file. In
open, the r in r'c:\day... stands for 'raw'. In raw mode the backslashes
are taken literally. Normally the backslash is an escape character (more
on this shortly) and if we actually want a backslash we have to escape it
with another backslash. Instead of r'c:\day... we would have 'c:\\day...
and so on. Raw mode makes things prettier.
3. Here we use the Python function len to obtain the length in characters
of the string data.
4. Python reports that our file and the corresponding string, data, are 1038
characters long.
5. The search function of the re module takes three arguments: a query
string as a regular expression, the target string, and (optionally) flags to
guide the query. The flag name re.IGNORECASE is itself case-sensitive (it must be typed in capitals), but it tells the
query engine to match regardless of case. Here, this means that 'S&amp;P
500' would match to 's&amp;p 500'.
search pretty much does what its name suggests: it looks through the
target string to find the first match to the query string. search returns a
match object or None, depending on whether it finds a match or not. Note
this unsuccessful search:
>>> sandpmatch2 = re.search(r'S&amp;Q 500',data,re.IGNORECASE)
>>> sandpmatch2
>>> sandpmatch2 == None
1
>>>
6. In line 6 of the figure we ask for the value of sandpmatch after a successful
search.
7. In line 7 Python tells us the search was successful.
8. Given that sandpmatch != None (i.e., that its search was successful),
there is some place that it found the match. Python's span method for
match objects gives us where the match begins and where the first character is that is after the match.
9. Python tells us that the match begins at character 681 and continues
through character 691. Character 692 is the first character after the match.
10. The slice data[681:688] gives us the 7 characters of data beginning with
character 681.
11. We ask Python for what was returned from the slice. . .
12. . . . and Python tells us (surprise, surprise) that it is the string "S&amp;P".
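The same steps can be collected into a short script. What follows is a sketch, not a transcript of the session in Figure 8.1: we assume a two-line string in place of the file attfragment.html, so the positions reported by span differ from those in the figure. We also test against None with "is" and "is not", the usual idiom in current versions of Python. The names found, missing, and matched_text are ours.

```python
import re

# A short stand-in for the contents of the file read in Figure 8.1.
data = ('<td><p>S&amp;P 500</p></td>\n'
        '<td align="right"><p>&nbsp;&nbsp;1145.96</p></td>\n')

found = re.search(r'S&amp;P 500', data, re.IGNORECASE)
missing = re.search(r'S&amp;Q 500', data, re.IGNORECASE)  # no such text

if found is not None:
    # span() gives the start of the match and the position just after it.
    start, end = found.span()
    matched_text = data[start:end]  # the same string as found.group()
    print(start, end, matched_text)

if missing is None:
    print("No match for the second query.")
```

Slicing with the two numbers returned by span recovers exactly the text that matched, just as the slice in line 10 of the figure recovers part of it.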

8.2.2

Problem 2: Pattern Matches

REs support pattern matching against strings, not just literal matching, as in
Web browsers and as in the example of the previous section. Note that a minor
paradox lurks. We want to use query strings to form patterned (nonliteral)
matches against target strings. How does the matching program "know" which
strings are to be taken literally and which are to be taken otherwise? The
solution principle is obvious: some characters are to be understood as special.
They are metacharacters and are not taken literally.
Using metacharacters for pattern matching is familiar to most computer
users. When at the command prompt you type
C:\> dir *.xls
you are requesting a list of files in the current directory ending in ".xls". The
asterisk ("*") is a metacharacter (in this context). It means "match 0 or
more characters of any kind." So, dir *.xls means "Display all files ending in
'.xls', no matter what comes earlier." Ask yourself: What does dir e*e.xls
mean?
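If you would like to check your answer by experiment, Python's standard fnmatch module applies shell-style wildcards of this kind to strings. The sketch below is only an approximation of what dir does: the command prompt has matching quirks of its own, and the file names here are invented.

```python
from fnmatch import fnmatchcase

# Invented file names to test against the wildcard pattern.
names = ["estimate.xls", "ee.xls", "e.xls", "budget.xls", "expense.doc"]

# fnmatchcase compares case-sensitively on every platform.
matches = [name for name in names if fnmatchcase(name, "e*e.xls")]
print(matches)
```

Note that the asterisk may match zero characters, which is why a name need not have anything between its two e's.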
The language of regular expressions has a short list of metacharacters (a
dozen or so) and a clever syntax that combine to provide a very powerful pattern
matching capability. Practice and examples are required to learn how to use
this capability. In §8.3 we summarize the syntax for purposes of reference and
in §8.4 we continue our discussion of problem 2 (pattern matches) in the context
of Python and regular expressions.

8.3

Python's RE Syntax

The following list is taken more or less directly from the online Python Library
Reference for the re module. We present it here for the sake of convenience
and completeness. (There is additional information online regarding the re
module.) The reader will likely find many of the items below unduly arcane,
and rarely used. Learning just a few of the syntactic elements will suffice for
most purposes. We offer the suggestion that items 1-13 and several of the items
in 25-35 are the most useful.
1. "." (Dot.)
In the default mode, this matches any character except a newline. If the
DOTALL flag has been specified, this matches any character including a
newline.
2. "^" (Caret.)
Matches the start of the string, and in MULTILINE mode also matches
immediately after each newline.
3. "$"
Matches the end of the string, and in MULTILINE mode also matches
before a newline. foo matches both 'foo' and 'foobar', while the regular
expression foo$ matches only 'foo'.
4. "*"
Causes the resulting RE to match 0 or more repetitions of the preceding
RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a'
followed by any number of 'b's.
5. "+"
Causes the resulting RE to match 1 or more repetitions of the preceding
RE. ab+ will match 'a' followed by any non-zero number of 'b's; it will
not match just 'a'.
6. "?"
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either 'a' or 'ab'.
7. *?, +?, ??
The "*", "+", and "?" qualifiers are all greedy; they match as much
text as possible. Sometimes this behaviour isn't desired; if the RE <.*>
is matched against '<H1>title</H1>', it will match the entire string,
and not just '<H1>'. Adding "?" after the qualifier makes it perform
the match in non-greedy or minimal fashion; as few characters as possible
will be matched. Using .*? in the previous expression will match only
'<H1>'.


8. {m,n}
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as many repetitions as possible. For example,
a{3,5} will match from 3 to 5 "a" characters. Omitting n specifies an
infinite upper bound; you can't omit m.
9. {m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character
string 'aaaaaa', a{3,5} will match 5 "a" characters, while a{3,5}? will
only match 3 characters.

10. "\"
Either escapes special characters (permitting you to match characters like
"*", "?", and so forth), or signals a special sequence; special sequences
are discussed below.
If you're not using a raw string to express the pattern, remember that
Python also uses the backslash as an escape sequence in string literals;
if the escape sequence isn't recognized by Python's parser, the backslash
and subsequent character are included in the resulting string. However, if
Python would recognize the resulting sequence, the backslash should be
repeated twice. This is complicated and hard to understand, so it's highly
recommended that you use raw strings for all but the simplest expressions.
11. []
Used to indicate a set of characters. Characters can be listed individually,
or a range of characters can be indicated by giving two characters and
separating them by a "-". Special characters are not active inside sets.
For example, [akm$] will match any of the characters "a", "k", "m",
or "$"; [a-z] will match any lowercase letter, and [a-zA-Z0-9] matches
any letter or digit. Character classes such as \w or \S (defined below) are
also acceptable inside a range. If you want to include a "]" or a "-" inside
a set, precede it with a backslash, or place it as the first character. The
pattern []] will match ']', for example.
You can match the characters not within a range by complementing the
set. This is indicated by including a "^" as the first character of the set;
"^" elsewhere will simply match the "^" character. For example, [^5]
will match any character except "5".
12. "|"
A|B, where A and B can be arbitrary REs, creates a regular expression that
will match either A or B. An arbitrary number of REs can be separated
by the "|" in this way. This can be used inside groups (see below) as well.
REs separated by "|" are tried from left to right, and the first one that
allows the complete pattern to match is considered the accepted branch.
This means that if A matches, B will never be tested, even if it would
produce a longer overall match. In other words, the "|" operator is never
greedy. To match a literal "|", use \|, or enclose it inside a character
class, as in [|].

13. (...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved
after a match has been performed, and can be matched later in the string
with the \number special sequence, described below. To match the literals
"(" or ")", use \( or \), or enclose them inside a character class: [(]
[)].
14. (?...)
This is an extension notation (a "?" following a "(" is not meaningful
otherwise). The first character after the "?" determines what the meaning
and further syntax of the construct is. Extensions usually do not create a
new group; (?P<name>...) is the only exception to this rule. Following
are the currently supported extensions.
15. (?iLmsux)
(One or more letters from the set "i", "L", "m", "s", "u", "x".) The
group matches the empty string; the letters set the corresponding flags
(re.I, re.L, re.M, re.S, re.U, re.X) for the entire regular expression. This is useful if you wish to include the flags as part of the regular
expression, instead of passing a flag argument to the compile() function.
Note that the (?x) flag changes how the expression is parsed. It should
be used first in the expression string, or after one or more whitespace
characters. If there are non-whitespace characters before the flag, the
results are undefined.
16. (?:...)
A non-grouping version of regular parentheses. Matches whatever regular
expression is inside the parentheses, but the substring matched by the
group cannot be retrieved after performing a match or referenced later in
the pattern.
17. (?P<name>...)
Similar to regular parentheses, but the substring matched by the group
is accessible via the symbolic group name name. Group names must be
valid Python identifiers. A symbolic group is also a numbered group, just
as if the group were not named. So the group named 'id' in the example
below can also be referenced as the numbered group 1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be
referenced by its name in arguments to methods of match objects, such as
m.group('id') or m.end('id'), and also by name in pattern text (e.g.
(?P=id)) and replacement text (e.g. \g<id>).
18. (?P=name)
Matches whatever text was matched by the earlier group named name.
19. (?#...)
A comment; the contents of the parentheses are simply ignored.
20. (?=...)
Matches if ... matches next, but doesn't consume any of the string.
This is called a lookahead assertion. For example, Isaac (?=Asimov)
will match 'Isaac ' only if it's followed by 'Asimov'.
21. (?!...)
Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's
not followed by 'Asimov'.
22. (?<=...)
Matches if the current position in the string is preceded by a match for
... that ends at the current position. This is called a positive lookbehind
assertion. (?<=abc)def will match "abcdef", since the lookbehind will
back up 3 characters and check if the contained pattern matches. The
contained pattern must only match strings of some fixed length, meaning
that abc or a|b are allowed, but a* isn't.
23. (?<!...)
Matches if the current position in the string is not preceded by a match
for .... This is called a negative lookbehind assertion. Similar to positive
lookbehind assertions, the contained pattern must only match strings of
some fixed length.
The special sequences consist of "\" and a character from the list below. If
the ordinary character is not on the list, then the resulting RE will match the
second character. For example, \$ matches the character "$".
24. \number
Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55
55', but not 'the end' (note the space after the group). This special
sequence can only be used to match one of the first 99 groups. If the first
digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number.
Inside the "[" and "]" of a character class, all numeric escapes are treated
as characters.

25. \A
Matches only at the start of the string.
26. \b
Matches the empty string, but only at the beginning or end of a word. A
word is defined as a sequence of alphanumeric characters, so the end of a
word is indicated by whitespace or a non-alphanumeric character. Inside a
character range, \b represents the backspace character, for compatibility
with Python's string literals.
27. \B
Matches the empty string, but only when it is not at the beginning or end
of a word.
28. \d
Matches any decimal digit; this is equivalent to the set [0-9].
29. \D
Matches any non-digit character; this is equivalent to the set [^0-9].
30. \s
Matches any whitespace character; this is equivalent to the set
[ \t\n\r\f\v].
31. \S
Matches any non-whitespace character; this is equivalent to the set
[^ \t\n\r\f\v].
32. \w
When the LOCALE and UNICODE flags are not specified, matches any
alphanumeric character; this is equivalent to the set [a-zA-Z0-9_]. With
LOCALE, it will match the set [0-9_] plus whatever characters are defined as letters for the current locale. If UNICODE is set, this will match
the characters [0-9_] plus whatever is classified as alphanumeric in the
Unicode character properties database.
33. \W
When the LOCALE and UNICODE flags are not specified, matches any
non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_].
With LOCALE, it will match any character not in the set [0-9_], and
not defined as a letter for the current locale. If UNICODE is set, this will
match anything other than [0-9_] and characters marked as alphanumeric
in the Unicode character properties database.
34. \Z
Matches only at the end of the string.

35. \\
Matches a literal backslash.
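Several of the constructs listed above can be checked directly at the interpreter. The following is a minimal sketch, written for Python 3 (where print is a function) rather than the Python 2 used elsewhere in these notes; the sample strings are taken from the descriptions above or made up for illustration.

```python
import re

# Greedy vs. non-greedy repetition: {m,n} and {m,n}?
greedy = re.match(r'a{3,5}', 'aaaaaa').group(0)
lazy = re.match(r'a{3,5}?', 'aaaaaa').group(0)

# Character sets and complemented sets.
letters = re.findall(r'[akm$]', 'amok $5')
not_five = re.findall(r'[^5]', '5a5b')

# Alternation is tried left to right, so the first branch that works wins,
# even though the second branch would give a longer match.
first_branch = re.match(r'a|ab', 'ab').group(0)

# A named group is also a numbered group.
m = re.match(r'(?P<id>[a-zA-Z_]\w*)', 'total_42 = 7')
by_name, by_number = m.group('id'), m.group(1)

# Backreference: (.+) \1 needs the same text twice, separated by a space.
doubled = re.match(r'(.+) \1', 'the the') is not None
not_doubled = re.match(r'(.+) \1', 'the end') is None

# Lookahead and lookbehind assertions check context without consuming it.
ahead = re.match(r'Isaac (?=Asimov)', 'Isaac Asimov').group(0)
behind = re.search(r'(?<=abc)def', 'abcdef').group(0)

print(greedy, lazy, first_branch, by_name, ahead, behind)
```

Note that the backreference test uses re.match, which anchors at the start of the string; re.search would find the shorter repeat 'e e' inside 'the end'.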

8.4

Problem 2 (cont.): Pattern Matches with Python

Figure 8.2 shows a Python program that uses RE pattern matching
to find the value of the S&P 500 index.
1.>>> sandpindex = re.search(r'(S&amp;P 500)(.*?)(\d+\.\d+)',data,\
re.IGNORECASE|re.DOTALL)
2.>>> sandpindex.group(0)
3.'S&amp;P 500</p></td>\n<td align="right"><p>&nbsp;&nbsp;1145.96'
4.>>> sandpindex.group(1)
5.'S&amp;P 500'
6.>>> sandpindex.group(2)
7.'</p></td>\n<td align="right"><p>&nbsp;&nbsp;'
8.>>> sandpindex.group(3)
9.'1145.96'
10.>>> sandpindex.span(3)
11.(735, 742)
12.>>> data[735:742]
13.'1145.96'
>>>
Figure 8.2: RE Program for Finding S&P 500 Index
Points arising:
1. In line 1, in the RE we use parentheses to create three groups. Reading
from left to right: group 1 is our old friend (S&amp;P 500); group 2,
(.*?), is the junk between group 1 and the index value; and group 3,
(\d+\.\d+), matches the index value. Lines 4-5 show the match to group
1; lines 6-7 show the match to group 2; lines 8-9 show the match to group
3; and since group 0 is the entire match, lines 2-3 show it.
2. Groups 2 and 3 use Python's RE metacharacters and syntax to effect
pattern matching. Group 1, as we saw previously, is an RE that effects
only literal matching.
3. Group 2 = (.*?). (See §8.3 items 1 (page 114), 4 (page 114), and 7 (page
114).) The parentheses define the group. The dot normally means "any character
at all, except a newline," but because the flag re.DOTALL is present
(line 1), even newlines are matched. So, the dot matches any character at
all in this query. The asterisk means "0 or more occurrences matching the
previous expression," i.e., any number of characters at all. The question
mark means "don't be greedy; stop at the first pattern satisfying the
previous expression."

4. Group 3 = (\d+\.\d+). (See §8.3 items 28 (page 118), 5 (page 114), and
35 (page 119).) \d+ means "one or more decimal digits." \. means "one
occurrence of the dot character." Note: The backslash is used to escape
from the normal meaning of the dot character, causing the RE engine to
take it literally.
5. The backslash at the end of the first physical line of line 1 is Python's line continuation
character. Because the parentheses of the call are still open it is not strictly
necessary, but it is useful for display purposes.
6. Note the re.IGNORECASE|re.DOTALL construction in line 1. This sets two
flags for the RE. As mentioned earlier, IGNORECASE says to do matching
regardless of upper or lower case, and DOTALL says that the dot should
match every character, including the newline character, \n, which by default
it fails to match.
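The points above can be tried without the original Web page. What follows is a small sketch in Python 3 syntax; the data string is an assumption, assembled from the fragment shown in lines 3 and 7 of Figure 8.2 with closing tags added, and is far shorter than the real page.

```python
import re

# Stand-in for the downloaded page (assumed; built from the fragment in Figure 8.2).
data = ('<td><p>S&amp;P 500</p></td>\n'
        '<td align="right"><p>&nbsp;&nbsp;1145.96</p></td>')

pattern = r'(S&amp;P 500)(.*?)(\d+\.\d+)'
sandpindex = re.search(pattern, data, re.IGNORECASE | re.DOTALL)

# groups() returns groups 1, 2 and 3 as a tuple.
label, junk, value = sandpindex.groups()
start, end = sandpindex.span(3)

print(label)             # text matched by group 1
print(float(value))      # the index value converted to a number
print(data[start:end])   # slicing with span(3) recovers group 3

# Without re.DOTALL the dot cannot cross the newline, so nothing is found.
no_dotall = re.search(pattern, data, re.IGNORECASE)
print(no_dotall)
```

Because group 2 is non-greedy, it stops at the first place where group 3 can match, which is why the junk between the label and the number ends up in group 2 and nothing more.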

8.5

For More Information

See "Regular Expression HOWTO," by A.M. Kuchling. This document may be
found at
http://www.python.org/doc/howto/, and also at
http://py-howto.sourceforge.net/pdf/regex.pdf
Also, most of the standard Python books have at least brief introductions to
the re module. The standard reference on regular expressions is Jeffrey E.F.
Friedl, 1997. Mastering Regular Expressions, O'Reilly [8]. Unfortunately, Friedl
focuses on Perl and assumes the old Python RE syntax. But it's a good book
and the principles remain valid.

8.6

Exercises

8.6.1
Find two sources on the Web that report the NASDAQ levels. As of December
25, 2001, both
http://money.cnn.com/markets/nasdaq.html, and
http://moneycentral.msn.com/investor/research/msnbc/newsnap.asp?symbol=$COMPX

do this.
Write a Python script, to be run from the command line (in script mode),
that grabs the reported value of the NASDAQ, and that prints out (to the
screen):
1. what the two values are,
2. where they came from,
3. what the difference between them is, and
4. the date/time at which these queries were made.
Hints: In addition to using the re module to extract necessary information,
you will find it useful to use the urllib module for downloading Web pages and
the time module for finding the date and time. The following interaction with
Python in interactive mode should be helpful to you in this regard.
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import urllib
>>> cnnnasdaq = "http://money.cnn.com/markets/nasdaq.html"
>>> cnnpage = urllib.urlopen(cnnnasdaq).read()
>>> len(cnnpage)
26697
>>> cnnfile = open('d:\\cnn.txt','w')
>>> cnnfile.write(cnnpage)
>>> cnnfile.close()
>>> import time
>>> now = time.localtime(time.time())
>>> print time.asctime(now)
Tue Dec 25 14:52:30 2001

8.7

*Compiling Regular Expressions

Here is a script, bob.py, that illustrates compilation of REs.


import re
data = open(r'c:\day\attfragment.html').read()
print len(data)
tomatch = re.compile(r'S&amp;P 500',re.DOTALL|re.IGNORECASE)
mymatch = tomatch.search(data)
print mymatch.group(0)
tomatch = re.compile(r"""(S&amp;P 500)(.*?)(\d+\.\d+)""",
                     re.S|re.I)
# Note: re.S = re.DOTALL, re.I = re.IGNORECASE
if tomatch is None:
    print "tomatch is None."
else:
    print tomatch
mymatch = tomatch.search(data)
if mymatch is None:
    print "It's None"
else:
    print mymatch.group(3)
When run, bob.py produces this output (as it should):
>>>
1038
S&amp;P 500
<SRE_Pattern object at 0CAB5590>
1145.96
>>>
Note:
1. If queries are to be done repeatedly on a regular expression, then compiling
it once and running particular queries many times will be more efficient.
2. Compilation produces an SRE_Pattern object from an RE. Roughly:
SRE_PatternObject = re.compile(RE)
Then, instead of re.search(RE,TargetString) we use
SRE_PatternObject.search(TargetString). The effects are the same.
3. I had trouble with the compilation flags. If re.VERBOSE was present the
search failed to find anything. Moreover, even with triple quoting, spreading out the RE to be compiled across several lines only produced errors.
But what's above (""") does work.
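One likely explanation for the re.VERBOSE trouble in note 3 is that verbose mode ignores unescaped whitespace in the pattern, so the space in "S&amp;P 500" silently disappears. This is offered as an assumption, since the failing pattern itself is not shown. The sketch below, in Python 3 syntax with a made-up one-fragment data string, shows the failure and one way around it.

```python
import re

# Assumed sample data, taken from the match shown in Figure 8.2.
data = 'S&amp;P 500</p></td>\n<td align="right"><p>&nbsp;&nbsp;1145.96'

# In verbose mode the unescaped space is dropped, so this pattern really
# looks for "S&amp;P500" and finds nothing.
broken = re.compile(r'(S&amp;P 500)(.*?)(\d+\.\d+)', re.S | re.I | re.X)

# Escaping the space (or writing it as [ ]) keeps it significant. The RE can
# then be spread across several lines, with comments after '#'.
verbose = re.compile(r"""
    (S&amp;P\ 500)   # group 1: the literal label
    (.*?)            # group 2: as little junk as possible
    (\d+\.\d+)       # group 3: the index value
    """, re.S | re.I | re.X)

broken_result = broken.search(data)
verbose_value = verbose.search(data).group(3)
print(broken_result)
print(verbose_value)
```

The same caution applies to a literal "#" in a verbose pattern, which would start a comment unless escaped or placed in a character class.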

File: text-pattern.tex

Chapter 9

Programming Excel
Microsoft's COM (Component Object Model) is a standard by which so-called
client programs can manipulate other programs called server programs.1 In
the lingo of the trade, a COM server (server program) "exposes its objects" in
accordance with the COM standard so that client programs can get the server to
do things for them. Client programs, of course, must know how to communicate
with COM objects.

1 Different names abound and create confusion. Names have come and gone, including
OLE, Automation, DCOM, COM+, and ActiveX. The programming community has more or
less settled on `COM' as the name of this evolving technology standard. The following passage
from a recent Microsoft publication indicates that Microsoft may have acceded to common
practice. "The key technology that makes individual Office applications programmable and
makes creating an integrated Office solution possible is the Component Object Model (COM)
technology known as automation" [15, page 37]. We'll stick with `COM'.

Excel, Access, Word, PowerPoint and the other parts of the Microsoft Office
suite all support and conform to COM. That is, each of these programs is capable
of being a COM server. Moreover, the VBA built into them is COM-aware so
that VBA programs can be COM clients. In fact, an Excel (Access, Word,. . . )
macro is a COM client that uses the Excel (Access, Word,. . . ) host as a COM
server.
From the perspective of a COM client, a COM server program has a certain
look and feel. A server is a hierarchical collection of objects, each object having
its own properties. Each application, of course, has its own characteristic objects
and hierarchy, its own object model. Once you know the object model for a
server, you can write a client-side program to manipulate the server. What this
amounts to is simply changing the properties of the server's objects.
This is in principle a very elegant design. When you work interactively with
Excel, manipulating it in the end-user programming style (using the graphical interface), you are essentially just creating and deleting objects (such as
worksheets in a workbook), and changing properties of objects. To enter a new
number in a cell (object) is just to change the cell's Value property. Coloring
and other formatting operations should be understood similarly. Here by way
of example is a line of Excel VBA code:
Worksheets("Sheet1").Range("A1").Interior.ColorIndex = 3
This is a simple assignment statement. It assigns the value 3 to the stuff on the
left. Interpreted, what's going on is 3 is assigned to the ColorIndex property of
the Interior (object), which is part of the A1 Range object (the northwest-most
cell), which is part of the Sheet1 Worksheet object. Since 3 is Excel's code for
the color red, execution of this statement causes the A1 cell on Sheet1 to turn
red. Similarly,
Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
is VBA code for putting "Yo, world!" in A1, i.e., setting the Value property of
A1 to "Yo, world!".
We're now going to discuss in more detail how to manipulate Excel from
Python, instead of from VBA. As we will see, however, the difference is slight.
This is a consequence of using COM. Once you've learned it from Python, you've
learned it from VBA, and vice versa.

9.1

Excel COM from Python

Our focus will be entirely on using Python for COM client-side programming.
That is, our Python scripts will manipulate the exposed objects of a COM
server, here Microsoft Excel. Python can also be used to create a COM server
(which could be manipulated by an Excel VBA client, among others!). The
additional steps to do this are really minimal. We shall forbear in part because
security considerations prohibit this kind of programming on machines in public
labs. The interested reader might consult [9] or the Website
http://starship.python.net/crew/mhammond/ppw32/
and proceed cheerfully at home.
Here's the obligatory "Hello, world" program, with Python the client and
Excel the COM server. And to liven things up, we even turn the A1 cell red.
(Line numbers have been added.)
Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
1.>>> import win32com.client
2.>>> xl = win32com.client.Dispatch('Excel.Application')
3.>>> xl.Visible = 1
4.>>> xl.Workbooks.Add()
5.<COMObject Add>
6.>>> xl.Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
7.>>> xl.Worksheets("Sheet1").Range("A1").Interior.ColorIndex = 3
What's going on here is really quite simple. Line 1 imports the Win32 COM
client module. Making Python COM aware boils down to importing the right
module. That's it. Now we can do things. The first thing we do (line 2) is to launch
("dispatch" is the technical term) a new Excel process. If we wanted to dispatch
Word we would say "Word.Application" instead of "Excel.Application",
and so on. Note that xl is a Python variable, which will now be an instance of
an Excel COM object. (Try typing type(xl) at the Python prompt.) We could
have used any other valid Python variable name. xl was chosen for purely mnemonic
reasons. Think how confusing it would have been to use word.
By default Excel is dispatched in the background, so you can't see it. In
line 3 we set the Visible property of the dispatched Excel application instance,
xl, to true, i.e., to 1. After Python executes this line you will see Excel come
alive but without any worksheets. We have to add them and we do in line 4.
Here, the interpretation is slightly new. Add() is not a property of Workbooks,
it is a method (note the parentheses), a program associated with the object
Workbooks, which now gets executed. The effect is that a new workbook, by
default Book1 with 3 worksheets, is added to the collection of workbooks in xl.
If we execute this line again, Book2 with 3 worksheets is added. We'll stick with
Book1 for now.
Line 6 is our "Hello, world" program. I wrote it by typing xl. and then
copying
Worksheets("Sheet1").Range("A1").Value = "Yo, world!"
from the VBA program discussed above. I did the same thing in line 7. The
effects are the same as in the VBA program, and for the same reasons. The
pattern is evident and straightforward. You program Excel from Python or
VBA or any other environment supporting COM clients by identifying the Excel
object you want to affect and then either assigning a value to one of its properties
or identifying and executing one of its methods. That's essentially it.
The rest is details. The big question now is "How do I learn about the Excel
object model?" We'll cover what you most need here. You can also explore
with the Object Browser in Excel (View then Object Browser from the VBA
development environment in Excel). The best documentation I have found is
in [15], but note the blurb on the cover: "The hard-core programming guide to
Microsoft Office XP development." (You don't need it.)
Perhaps the easiest way to get a handle on the Excel object model is to
record VBA macros in Excel and examine them. Here's one, for copying from
the range B2:C3 to B5:C6.
Sub Macro4()
'
' Macro4 Macro
' Macro recorded 12/26/2001 by Steven O. Kimbrough
'
Range("B2:C3").Select
Selection.Copy
Range("B5").Select
ActiveSheet.Paste
Application.CutCopyMode = False
Range("A1").Select
End Sub
What it does is plain. On Excel's ActiveSheet, it copies the range B2:C3 to
the range whose northwest-most cell is B5. Then it turns off CutCopyMode. A
literal transformation to Python won't quite work, but this does (when executed
from the command line).
import win32com.client
xl = win32com.client.Dispatch('Excel.Application')
print xl.Name
xl.Range("B2:C3").Select()
xl.Selection.Copy()
xl.Range("B5").Select()
xl.ActiveSheet.Paste()
xl.Application.CutCopyMode = 0
xl.Range("A1").Select()
See the pattern? Just a few changes are required. The Python line
xl.Application.CutCopyMode = False
won't work as is because Python 2.1 has no built-in False and so (rightly) treats False as an undefined variable. If you
declare False = 0 then this line will work. Also, Select, Copy, and Paste are
all methods, rather than properties, and so in Python require their parentheses
to be present.
There are simpler ways to do copying.
>>> xl.Range("C2:D3").Copy(xl.Range("C6"))
1
>>>
See the pattern? The 1 returned by Python indicates that the operation was
successful. Here is another way of copying one range to another.
>>> xl.ActiveSheet.Range("C2:D3").Value
((u'a', u'c'), (u'b', u'd'))
>>> xl.ActiveSheet.Range("C5:D6").Value = xl.ActiveSheet.Range("C2:D3").Value
>>> bob = xl.ActiveSheet.Range("C2:D3").Value
>>> bob
((u'a', u'c'), (u'b', u'd'))
>>> xl.ActiveSheet.Range("A10:B11").Value = bob
>>>
Notice that this won't take any formatting along. The tuple (u'a', u'c')
represents the top row of the range C2:D3: C2 = 'a' and D2 = 'c'. u'a' means
that 'a' is encoded with Unicode, which is what Excel does with strings. If you
are copying Unicode from one part of Excel to another, keeping everything in
Unicode is fine. If you wish to work with Excel strings in Python, you need to
convert the Unicode to ASCII. The Python str() function is available for this:
>>> xl.ActiveSheet.Range("C2:D3").Value
((u'a', u'c'), (u'b', u'd'))
>>> bob = xl.ActiveSheet.Range("C2:D3").Value
>>> bob
((u'a', u'c'), (u'b', u'd'))
>>> carol = str(bob[0][0]),str(bob[0][1])
>>> carol
('a', 'c')
If you have a string in Python and you want to write it out to Excel, Python
automatically makes the conversion to unicode.
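The value of a multi-cell range arrives as a tuple of rows, each row itself a tuple. A minimal sketch of picking such a value apart follows; the tuple is hard-coded from the session above so that it runs without Excel, and it is written for Python 3, where every string is already Unicode and no str() conversion is needed.

```python
# Value of the 2x2 range C2:D3 as shown in the session above (hard-coded here).
bob = (('a', 'c'), ('b', 'd'))

top_row = bob[0]                              # cells C2 and D2
first_column = tuple(row[0] for row in bob)   # cells C2 and C3
columns = tuple(zip(*bob))                    # transpose: rows become columns

print(top_row)
print(first_column)
print(columns)
```

Indexing is row first, then column, so bob[1][0] is the cell one row down in the first column of the range (C3).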
Most of what you want to do using Python as a client to Excel is one of the
following three things:
1. Write information onto a worksheet
2. Read information off of a worksheet
3. Format a worksheet
Writing and reading to Excel are simple inverses. Here are some writing (to
Excel) examples.
>>> xl.Range("A6").Value = "Hello."
>>> xl.Range("A7").Value = 12.3
>>> xl.Range("A8").Value = 12.3
>>> xl.Cells(9,1).Formula = "=sum(a7:a8)"

And here are some related reading (from Excel) examples:


>>> bob = xl.Cells(6,1).Value
>>> bob
u'Hello.'
>>> str(bob)
'Hello.'
>>> carol = xl.Range("A7").Value
>>> carol
12.300000000000001
>>> ted = xl.Cells(8,1).Value
>>> ted
12.300000000000001
>>> alice = xl.Range("A9").Value
>>> alice
24.600000000000001
>>> alicebee = xl.Range("A9").Formula
>>> alicebee
u'=SUM(A7:A8)'
>>> str(alicebee)
'=SUM(A7:A8)'
>>>
And that's about all there really is for reading and writing to Excel cells. Yes,
there are tricks, such as

>>> xl.Range("A1:B3").Value = 12
which assigns 12 to every cell in the range. These are things you learn when
you need them. Better when starting out to keep to the KISS principle.
And formatting is simply a minor variation. Instead of reading or writing
the Value or Formula property of a cell (or range of cells), we read or write
some other property, such as (see above) the Interior.ColorIndex property. The
details, however, are a bit daunting, especially since Excel relies on special
program constants, standing for integers whose values are really hard to find.
So, I'm going to show you how, but I'll put the information in an optional
section.

9.2

*Programming Excel's Formats

Suppose you recorded an Excel VBA macro while you selected the "Borders"
toolbar icon and chose the "Bottom Double Border" item. The resulting VBA
macro would look like this:
Sub Macro2()
'
' Macro2 Macro
' Macro recorded 12/28/2001 by Steven O. Kimbrough
'
'
Selection.Borders(xlDiagonalDown).LineStyle = xlNone
Selection.Borders(xlDiagonalUp).LineStyle = xlNone
Selection.Borders(xlEdgeLeft).LineStyle = xlNone
Selection.Borders(xlEdgeTop).LineStyle = xlNone
With Selection.Borders(xlEdgeBottom)
.LineStyle = xlDouble
.Weight = xlThick
.ColorIndex = xlAutomatic
End With
Selection.Borders(xlEdgeRight).LineStyle = xlNone
End Sub
Suppose now you wanted to format cell A9 with a "Bottom Double Border"
border. You might think this command will do that:
xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble
You would be wrong. Python naturally thinks that xlEdgeBottom and xlDouble
are Python variables, and Excel will certainly get the wrong message, if it gets
any message at all. You could quote the variables, but then Excel will think
they are strings and will get very confused. The problem is that in VBA,
xlEdgeBottom is a constant, standing for the integer 9, and xlDouble is a constant whose value is the integer -4119. This command does work:

xl.Range("A9").Borders(9).LineStyle = -4119
This solves our problem, provided we can discover what Excel/VBA's constants
stand for. Here's what you do. Launch PythonWin and enter the following
commands at the prompt in interactive mode:
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible = 1
>>> win32com.client.constants.xlNone
-4142
>>>
If this is what you get, then proceed after skipping the next paragraph.
If, instead of getting a -4142 in response to your last line you get an error message, do this. In PythonWin, under the Tools menu you will see the item COM
Makepy utility. Choose it. You will be presented with a very long list of applications. Scroll down until you find "Microsoft Excel X.Y Object Library (W.Z)"
where W, X, Y, and Z are all small integers. Select it, say OK, and wait until
Python builds a file for you. When the prompt returns in the interactive shell
(hit RETURN if things quiet down), try win32com.client.constants.xlNone
again. It should work. If so, your Python installation is set up for what we need
to do. This only needs doing once per Python installation.
At this point we're essentially done. If X is an Excel VBA constant, then
win32com.client.constants.X will reveal its value. Here are two examples.
>>> win32com.client.constants.xlEdgeBottom
9
>>> win32com.client.constants.xlDouble
-4119
>>>
So if you can identify the Excel constant, Python will tell you its value, and
you can happily program away. Also, you can use either the integers or their
Python expressions in your Python code. The integer version
xl.Range("A9").Borders(9).LineStyle = -4119
works just as well as the Python object version
xl.Range("A9").Borders(9).LineStyle = win32com.client.constants.xlDouble
My own druthers are to use Python variables as mnemonics.
>>> xlDouble = win32com.client.constants.xlDouble
>>> xlDouble
-4119
>>> xlEdgeBottom = win32com.client.constants.xlEdgeBottom
>>> xlEdgeBottom
9
>>>
And now this does work:

xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble
You weren't so wrong after all.
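If the constants cannot be looked up through win32com on a given machine, one fallback is to keep the few values you need in a small table of your own. The sketch below is only an illustration: XL_CONSTANTS and xl_const are hypothetical names, not part of win32com, and the three values are the ones reported in the sessions above. It runs in Python 3 without Excel.

```python
# Hypothetical lookup table; the values are those shown in the sessions above.
XL_CONSTANTS = {
    'xlNone': -4142,
    'xlEdgeBottom': 9,
    'xlDouble': -4119,
}

def xl_const(name):
    """Return the integer for an Excel constant name (hypothetical helper)."""
    try:
        return XL_CONSTANTS[name]
    except KeyError:
        raise KeyError('unknown Excel constant: %s' % name)

border = xl_const('xlEdgeBottom')
style = xl_const('xlDouble')
print(border, style)

# With a dispatched Excel instance xl, the call would then read:
# xl.Range("A9").Borders(border).LineStyle = style
```

The table only ever contains values you have verified yourself, which is the price of not depending on the generated constants.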

9.3

A Little on the Excel Object Model

Excel and other Microsoft Office programs are structured as hierarchical collections of objects. The application is an object. The application's workbooks
are objects. The worksheets within a workbook are objects, as are the charts.
Ranges are objects contained by worksheets, but not by charts. And so on. Objects are particular; they are said to be instances of their classes. For example,
Workbooks(3).Worksheets(2).Range("A2:B4") is a particular object. It is an
instance of the Range class.
Objects have properties and methods. Together, they are called members.
(Typically, a property is also an object, which may be confusing.) Properties
are types of values, which usually can be set under program control. We have
seen that Value and Formula are two of the properties of objects in the Range
class. Methods are programs that their objects can execute. Methods may or
may not require parameters on input, but they always require parentheses. An
example of a parameterless method is Add, which we have seen before:
xl.Workbooks.Add()
So, Add is a method for objects in the Workbooks class. If we look at the Object
Browser in Excel (in the VBA editor, View|Object Browser) and we focus on
the Excel library, we find Workbooks among the classes. Selecting Workbooks
we see that Add is the very first member of the Workbooks class. (Notice how
the icons distinguish the properties and the methods.) If we select Add, right-click on it, and select Help, we see a display that tells us Add is a method for
adding objects to collections. It is used with many different types of collections
(classes). If we select Workbooks we see this information displayed.
Add Method (Workbooks Collection)
Creates a new workbook. The new workbook becomes the active
workbook. Returns a Workbook object.
Syntax
expression.Add(Template)
expression Required. An expression that returns a Workbooks object.
Template Optional Variant. Determines how the new workbook is
created. If this argument is a string specifying the name of an existing Microsoft Excel file, the new workbook is created with the
specified file as a template. If this argument is a constant, the new
workbook contains a single sheet of the specified type. Can be one of
the following XlWBATemplate constants: xlWBATChart, xlWBATExcel4IntlMacroSheet, xlWBATExcel4MacroSheet, or xlWBATWorksheet. If this argument is omitted, Microsoft Excel creates a
new workbook with a number of blank sheets (the number of sheets
is set by the SheetsInNewWorkbook property).
Remarks
If the Template argument specifies a file, the file name can include
a path.
Objects have properties and methods. You call the methods and get
or set the properties. See the Excel object browser and then use the
help facility. Also, record VBA macros.
Exploring in this manner will tell you much about Excel's object model. The
Object Browser is there when you need it, so you shouldn't bother with memorization. Recording VBA macros and studying them (when you need to know
something) is also a good way to learn about the object model.
Finally, Python is very helpful for learning about the object model. Recall:
>>> xl.Name
u'Microsoft Excel'
Microsoft Excel is our top-level object. We can also get the name and the parent
of any (particular) object:
>>> xl.Workbooks(1).Name
u'Book1'
>>> xl.Workbooks(1).Parent.Name
u'Microsoft Excel'
Notice that the top-level object is its own parent:
>>> xl == xl.Parent
1
>>> xl.Parent.Name
u'Microsoft Excel'
>>> xl.Workbooks(1) == xl.Workbooks(1).Parent
0
And the pattern generalizes:
>>> xl.ActiveSheet.Name
u'Sheet2'
>>> xl.ActiveSheet.Parent.Name
u'Book1'
Notice that the Name property is settable:
>>> xl.ActiveSheet.Name = 'Ted'
>>> xl.ActiveSheet.Name
u'Ted'
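The Name and Parent pattern shown in these sessions can be imitated with a toy class, which may help fix the idea of a hierarchy whose root is its own parent. This is only a model, not win32com or Excel: Node and path are made-up names, and the object names are borrowed from the sessions above. It runs in Python 3 without Excel.

```python
class Node(object):
    """Toy stand-in for a COM object that has Name and Parent properties."""

    def __init__(self, name, parent=None):
        self.Name = name
        # Like the Excel application object above, a root is its own parent.
        self.Parent = parent if parent is not None else self


def path(node):
    """Walk up the Parent links and return the names from the root down."""
    names = [node.Name]
    while node.Parent is not node:
        node = node.Parent
        names.append(node.Name)
    return ' > '.join(reversed(names))


app = Node('Microsoft Excel')
book = Node('Book1', app)
sheet = Node('Sheet2', book)

print(path(sheet))
sheet.Name = 'Ted'     # the Name property is settable
print(path(sheet))
```

The loop stops when an object reports itself as its own parent, which is the property of the top-level object noted above.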

9.4

Miscellany

9.4.1

Gotchyas

Capitalization:
>>> xl.Name
u'Microsoft Excel'
>>> xl.name
Traceback (most recent call last):
File "<pyshell#15>", line 1, in ?
xl.name
File "C:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: name
>>>
Lesson: Get it right.

9.4.2

Range Names

>>> xl.Workbooks(1).Names(1).Name
u'ted'
>>> xl.Workbooks(1).Names(1).RefersTo
u'=Sheet1!$C$2:$D$3'
>>> myted = xl.Workbooks(1).Names(1)
>>> myted.Name
u'ted'
>>> xl.Workbooks(1).Names.Add('alice','=Sheet1!R1C1')
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Name>
>>> xl.Workbooks(1).Names.Count
2
>>> xl.Workbooks(1).Names(2).Name
u'ted'
>>> xl.Workbooks(1).Names(1).Name
u'alice'
>>> xl.Workbooks(1).Worksheets(1).Range("alice").Value = 19
>>> xl.Range("alice").Value = 22
The last line works only if workbook 1 and worksheet 1 are active; otherwise
Excel would be confused (and understandably so). Assuming they are, this line
turns an entire range red:
>>> xl.Range("ted").Interior.ColorIndex = 3

9.4.3

Saving Workbooks

>>> xl.ActiveWorkbook.SaveAs(r"C:\day\pythonegs3.xls")
>>> xl.ActiveWorkbook.Name
u'pythonegs3.xls'
>>> xl.ActiveWorkbook.Save()
>>>

9.4.4

Directories

Python's os (operating system) module has a number of useful functions for handling
directories and files on the local system. Perhaps most useful are those for
returning a list of the contents of a directory and for creating new directories.
Notice that this code segment was run on Windows NT.
>>> import os
>>> os.listdir("C:\\day")
['attfragment.html', 'atthome.html', 'awrapper.aux',
>>> os.name
'nt'
>>> os.mkdir("c:\\day\\mydir")
Also very useful is os.system(...), which you can use to run other programs,
including batch files (in Windows).
>>> os.system("c:\\test")
Runs the batch file test.bat. Try it with this as the contents:
dir *.*
time
(But try test.bat first from the command prompt.)
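The directory calls above can be collected into a small script. What follows is a minimal sketch rather than part of the original session: it works in a scratch directory made with tempfile (an assumption, so that nothing is left behind on disk), the function name make_and_list is purely illustrative, and it is written for a current Python 3 rather than the Python 2.1 used in the transcripts.

```python
import os
import shutil
import tempfile


def make_and_list(parent, newdir):
    """Create newdir inside parent (if absent) and return parent's sorted contents."""
    # os.path.join avoids hand-written separators such as "c:\\day\\mydir".
    target = os.path.join(parent, newdir)
    if not os.path.isdir(target):
        os.mkdir(target)
    return sorted(os.listdir(parent))


# Demonstrate in a scratch directory so nothing on the real disk is touched.
scratch = tempfile.mkdtemp()
try:
    contents = make_and_list(scratch, "mydir")
    print(contents)
finally:
    shutil.rmtree(scratch)
```

Checking with os.path.isdir before calling os.mkdir matters because os.mkdir raises an error if the directory already exists.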

9.4.5

Grabbing Command Line Arguments

When you run a Python program in script mode it is often useful to be able
to specify arguments on the command line, which the script will process as it
executes. Here's an example. The Python file, arglist.py, looks like this:
import sys
argumentlist = sys.argv
print argumentlist
Here is what it does when executed from the command line, with and without
arguments.
C:\>arglist.py
['C:\\arglist.py']
C:\>arglist.py Now is the time for 12.3
['C:\\arglist.py', 'Now', 'is', 'the', 'time', 'for', '12.3']
C:\>

In a real program, you would process the list argumentlist, converting the
strings, where necessary, to numbers. Example:
>>> int('12')
12
>>> float('12.3')
12.300000000000001
>>>
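Here is one way the conversion step might look in a script. This is a sketch under assumptions: the helper names convert and process are invented for illustration, and the rule "try int, then float, otherwise keep the string" is just one reasonable policy. It is written for a current Python 3.

```python
import sys


def convert(arg):
    """Return arg as an int if possible, else a float, else the original string."""
    for caster in (int, float):
        try:
            return caster(arg)
        except ValueError:
            pass
    return arg


def process(argv):
    """Convert every command-line argument after the script name."""
    return [convert(a) for a in argv[1:]]


if __name__ == "__main__":
    print(process(sys.argv))
```

Trying int before float keeps '12' an integer, while '12.3' falls through to float and 'Now' stays a string.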

9.4.6

Grabbing User Input

raw_input prompts the user and awaits a response. The response is read into a
string for subsequent processing by the program. Here's an example. In an IDE,
when input = raw_input("Tell me: ") is executed a dialog box appears and
the program stops until the user gives a response. That response is read into
input and the program continues. (Note that naming the variable input hides
Python's built-in function of the same name for the rest of the session.)
>>> input = raw_input("Tell me: ")
>>> input
'Go Eagles!'
>>> type(input)
<type 'string'>
>>>

9.4.7

Copying Worksheets

This code puts a copy of worksheet 1 before worksheet 3 (the first argument
of Copy is the sheet before which the copy is placed).
xl.Worksheets(1).Copy(xl.Worksheets(3))

9.5

Gotchyas

9.5.1

Forgetting Parentheses in Methods

Some things don't work the way you might think they should. Here's an example.
PythonWin 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)]
on win32.
Portions Copyright 1994-2001 Mark Hammond (MarkH@ActiveState.com)
- see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible
0
>>> xl.Visible = 1
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>
>>> xl.ActiveWorkbook.Name
u'Book2'
>>> xl.Workbooks(1).Activate
<method _Workbook.Activate of _Workbook instance at 010A8AA4>
>>> xl.ActiveWorkbook.Name
u'Book2'
The command to activate workbook 1 is accepted without error, but proves
ineffective. The problem is that Activate is a method and needs parentheses.
Without them, the expression merely evaluates to the method object, as the
output above shows.
The line
xl.Workbooks(1).Activate()
would work.
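The same behavior can be seen in plain Python, with no Excel involved. In this sketch the Workbook class is a hypothetical stand-in written for illustration, not the real COM object; only the Activate name is borrowed from the session above.

```python
class Workbook(object):
    """Hypothetical stand-in for a COM workbook; not the real Excel object."""

    def __init__(self):
        self.active = False

    def Activate(self):
        self.active = True


wb = Workbook()

ref = wb.Activate        # no parentheses: evaluates to the bound method, nothing runs
print(callable(ref))     # the result is a method object, not the effect of a call
print(wb.active)         # still False

wb.Activate()            # parentheses: the method is actually called
print(wb.active)         # now True
```

Python treats a method name without parentheses as an ordinary expression, so no error is raised, which is exactly what makes this slip easy to miss.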

9.5.2

Capitalization

Excel's COM objects and members (properties and methods) have an official
capitalization scheme, and this matters. Usually, that is. Sometimes it seems
that you can get away with incorrect capitalization. Mostly you can't.
>>> xl.Name
u'Microsoft Excel'
>>> xl.Selection.Address
u'$A$1'
>>> xl.ActiveSheet.Range("B5:C8").Select()
>>> xl.Selection.Address
u'$B$5:$C$8'
>>> xl.Selection.address
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: address
>>> xl.name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: name
>>> xl.Name
u'Microsoft Excel'
>>> xl.Activesheet.name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: Activesheet
>>> xl.Activesheet.Name
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "D:\Python21\win32com\client\__init__.py", line 348, in __getattr__
raise AttributeError, attr
AttributeError: Activesheet
>>> xl.ActiveSheet.Name
u'Sheet1'
>>>
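If you keep a list of the member names you use (copied, say, from the Object Browser), a few lines of Python can recover the official capitalization from a sloppy spelling. This is only a sketch: official_name and the KNOWN list are invented for illustration, and the list holds just the names seen in the session above.

```python
def official_name(name, known_names):
    """Return the entry of known_names matching name case-insensitively, or None."""
    wanted = name.lower()
    for candidate in known_names:
        if candidate.lower() == wanted:
            return candidate
    return None


# Member names taken from the session above.
KNOWN = ["Name", "ActiveSheet", "Selection", "Address"]

print(official_name("activesheet", KNOWN))   # ActiveSheet
print(official_name("address", KNOWN))       # Address
```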

9.6

For More Information

On Python and COM:


QuickStartClientCom.html, which comes with the win32com installation.
http://www.python.org/windows/, the official source.
http://starship.python.net/crew/mhammond/conferences/
Python Programming on Win32 by Mark Hammond and Andy Robinson,
O'Reilly, 2000.
See also his supporting site: http://starship.python.net/crew/mhammond/ppw32/.
http://aspn.activestate.com/ASPN/Python/Reference/Products/ActivePython/win32com/win32com.html
http://aspn.activestate.com//ASPN/Python/Reference/Products/ActivePython/win32com/win32com/win32com/test/
http://aspn.activestate.com//ASPN/Python/Reference/Products/ActivePython/win32com/win32com/win32com/html/docindex.html
Learning about COM is a bit tough. Here's something helpful. In Excel, in
VBA, under View, select Object Browser. Browse and right click on what you're
interested in, then ask for help.

9.7

Exercises

9.7.1
The SEC (Securities and Exchange Commission) requires various reports to be
filed by American corporations. Stock market analysts are particularly keen on
following the 10-Q reports, which are filed quarterly and contain, among other
things, financial statements by the companies. The SEC filings, including the 10-Q
reports, are available online at http://www.sec.gov. Write a Python program
that grabs data from 10-Q reports for 4-6 companies and loads these data into
a well-designed spreadsheet format for analysis and comparison purposes.

File: python-excel.tex

Chapter 10

Python and Database via DAO

10.1

Preliminaries

In the Excel VBA editor, see Tools|References. Scroll down until you find something like "Microsoft DAO 3.6 Object Library." This tells you that you need
to dispatch "DAO.DBEngine.36". You can also see the References from Access.
Note the directions from Microsoft's online help:
On the Tools menu, click References. The References command
on the Tools menu is available only when a Module window is open
and active in Design view.
Also, you should first run the makepy utility. In Pythonwin under the Tools
menu choose "COM Makepy utility" then select "Microsoft DAO 3.6 Object
Library". You only need to do this once.

10.2

Getting Connected and Getting Data

The basic structure for connecting to an Access database is like that for connecting to Excel. The following code imports the win32com.client module,
dispatches the appropriate DAO.DBEngine (version 3.6), and opens an existing
database, called db1.mdb.
>>> import win32com.client
>>> engine = win32com.client.Dispatch("DAO.DBEngine.36")
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
As usual, the database is an object with properties.
>>> db.Name
u'c:\\day\\db1.mdb'
Usually, we will want to access a database in order to run SQL SELECT
queries. To do so, we run a query and get back a recordset, conceptually
the result of the query put into a table in memory.
>>> rs = db.OpenRecordset('select count(*) from Table1')
The argument for OpenRecordset can be any valid SQL query string. Successful
completion of the query produces a recordset object, here rs. Here we obtain
information about it.
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
Expr1000
>>> type(rs.Fields(0).Name)
<type 'unicode'>
>>> field = rs.Fields(0).Name
>>> field
u'Expr1000'
>>> dacount = rs.Fields(0).value
>>> print dacount
2
>>> type(dacount)
<type 'int'>
>>> rs.Fields.Count
1
Notice the pattern: a recordset is an object with properties; we access those
properties in the usual way, under program control. We do more of this:
>>> rs = db.OpenRecordset('select * from Table1')
>>> bob = rs.Fields('wordid').value
>>> bob
3
>>> rs.Fields.Count
3
>>> rs.MoveLast
<method CDispatch.MoveLast of CDispatch instance at 0192CB44>
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Value
...
1
1
3
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> rs.Fields(0).Name
u'docid'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
docid
position
wordid
>>>
>>> db.Close()
MoveLast is a recordset method. (Written without parentheses, as in the session
above, it merely returns the method object and does not move the cursor, which
is why the values printed are those of the first record.) Given a recordset there
is always a cursor or record pointer pointing at some record in the recordset.
MoveFirst moves the cursor to the first record in the recordset. MoveLast moves
it to the last record. MoveNext and MovePrevious do the obvious things. Fields
is a property of a recordset, and Count is a property of Fields. In the above
interaction, we see that Table1 has three fields (columns), called "docid,"
"position," and "wordid."
Now we do another SELECT query.
>>> rs = db.OpenRecordset('select * from Table1')
>>> while not rs.EOF:
...     print rs.Fields('wordid').value
...     rs.MoveNext()
...
3
4
What this does is tell us that "wordid" is 3 and 4 in the two records of this
recordset. EOF means "end of file." Basically, while not rs.EOF: prevents the
program from trying to access a nonexistent record in the recordset.
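The cursor-and-EOF pattern is easy to try out without a database. The sketch below uses FakeRecordset, a hypothetical in-memory stand-in written only for illustration (it is not DAO, and its Value method is an invented simplification of rs.Fields(name).Value), loaded with the two Table1 records seen in this section. It is written for a current Python 3.

```python
class FakeRecordset(object):
    """Hypothetical in-memory stand-in for a DAO recordset (illustration only)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    @property
    def EOF(self):
        # True once the cursor has moved past the last record.
        return self._pos >= len(self._rows)

    def MoveNext(self):
        self._pos += 1

    def Value(self, field):
        # Value of the named field in the current record.
        return self._rows[self._pos][field]


def field_values(rs, field):
    """Walk the recordset from the cursor to the end, collecting one field."""
    values = []
    while not rs.EOF:
        values.append(rs.Value(field))
        rs.MoveNext()          # without this the loop would never end
    return values


rs = FakeRecordset([{"docid": 1, "position": 1, "wordid": 3},
                    {"docid": 1, "position": 2, "wordid": 4}])
print(field_values(rs, "wordid"))   # [3, 4]
```

Note that the loop also copes with an empty recordset: EOF is true from the start, so the body never runs.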
Here we explore the database further, looking at various properties.
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
>>> for i in range(db.TableDefs.Count):
...     print i, db.TableDefs(i).Name
...
0 MSysAccessObjects
1 MSysACEs
2 MSysObjects
3 MSysQueries
4 MSysRelationships
5 Table1
>>> for i in range(db.TableDefs(5).Fields.Count):
...     print i, db.TableDefs(5).Fields(i).Name
...
0 docid
1 position
2 wordid
>>> rs = db.OpenRecordset('select * from Table1')
>>> rs.MoveLast()
>>> rs.RecordCount
2
>>> rs.MoveFirst()
>>> rs.GetRows()
((1,), (1,), (3,))
>>> rs.GetRows()
((1,), (2,), (4,))
>>> rs.GetRows()
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
[further error message]
>>> rs.Close()
Notice (above) that there are actually 6 tables in this database. Five of them
are Access housekeeping tables (you can access them). Only Table1 is "for
real." Note that for i in range(db.TableDefs(5).Fields.Count): does
here what while not rs.EOF: did above: it prevents an error due to running past
the end, here the end of the Fields collection. RecordCount is new here. It is
very useful, but it only counts the records visited so far, so be sure you first
execute
>>> rs.MoveLast()
Finally, GetRows() is new. GetRows() retrieves one row (the one being pointed
to by the cursor) as a tuple of tuples, one inner tuple per field, and moves the
cursor on to the next row. (The notation "(1,)" means a tuple with
one element, the integer 1.) GetRows(n) retrieves n rows as tuples of tuples,
starting with the current row. Notice that if you try to get a row beyond the
recordset you get an error.
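Because GetRows hands back one inner tuple per field, you will often want to turn the result around so that there is one tuple per record. A minimal sketch follows; the function name rows_from_getrows is invented for illustration, and the two-row input is an assumption about what GetRows(2) would return for the Table1 data above (the session only fetches one row at a time).

```python
def rows_from_getrows(data):
    """Turn GetRows-style output (one tuple per field) into one tuple per record."""
    return list(zip(*data))


# Assumed shape of rs.GetRows(2) for the two records of Table1:
# docid values, then position values, then wordid values.
by_field = ((1, 1), (1, 2), (3, 4))

print(rows_from_getrows(by_field))          # [(1, 1, 3), (1, 2, 4)]
print(rows_from_getrows(((1,), (1,), (3,))))  # [(1, 1, 3)]
```

zip(*data) pairs up the first element of every field tuple, then the second, and so on, which is exactly a row-by-row reading of the same data.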

10.3

Beyond SELECT

To run SQL commands other than SELECT commands you use a different
mechanism in DAO. You use Execute. Here's an example.
>>> db.Execute("delete * from Table1")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
[further error message]
>>> rs.RecordCount
0
>>> db.Execute("INSERT INTO Table1 VALUES(10, 11, 12)")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.MoveFirst()
>>> rs.GetRows()
((10,), (11,), (12,))
>>>
Here we see a DELETE and an INSERT command from SQL. It all works fine.
Word of Warning: Be sure to test your SQL statements directly in Access
before puzzling about why they don't work in Python.

10.4

Handling Quotes in SQL SELECT Queries

These interactions demonstrate use of strings to define the SQL queries, and
the use of SQL WHERE clauses.
>>> rs.Fields.Item(0).Name
u'docid'
>>> db.Name
u'c:\\day\\db1.mdb'
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.GetRows()
((10,), (11,), (12,))
>>> db.Execute("INSERT INTO Table1 VALUES(20, 21, 22)")
>>> rs = db.OpenRecordset("select * from Table1")
>>> rs.MoveLast()
>>> rs.RecordCount
2
>>> rs.GetRows()
((20,), (21,), (22,))
>>> strSQL = "SELECT * FROM Table1 WHERE docid < 15"
>>> rs = db.OpenRecordset(strSQL)
>>> rs.MoveLast()
>>> rs.RecordCount
1
>>> rs.GetRows()
((10,), (11,), (12,))
>>>
Finally, this interaction with the spj-begin.mdb database illustrates construction
of an SQL query string containing an element that has to be quoted (here the
string 'Adams', written in single quotes inside the double-quoted Python string).
Also, Access allows odd characters in its field names. "S#" is not permitted in
standard SQL. So, Access uses a square bracket notation to indicate that this
is indeed a valid field name (in Access). This mechanism is also needed when
field names have spaces in them.
>>> db.TableDefs(5).Fields(0).Type
4
>>> spdb = engine.OpenDatabase(r'c:\day\spj-begin.mdb')
>>> spdb.Name
u'c:\\day\\spj-begin.mdb'
>>> strSQLsp = "SELECT [S#] From s WHERE SNAME = 'Adams'"
>>> rssp = spdb.OpenRecordset(strSQLsp)
>>> rssp.MoveLast()
>>> rssp.RecordCount
1
>>> rssp.GetRows()
((u'S5',),)
>>>
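When the quoted value comes from a variable rather than being typed in, it helps to build the query string with small helpers. The following is a sketch only: bracket, quote and select_where are invented names, the value O'Brien is a made-up test case, and doubling an embedded single quote is the usual SQL escaping convention, which you should confirm in Access before relying on it.

```python
def bracket(name):
    """Wrap a field or table name in Access-style square brackets."""
    return "[%s]" % name


def quote(text):
    """Wrap text in single quotes, doubling any single quote inside it."""
    return "'%s'" % text.replace("'", "''")


def select_where(field, table, where_field, value):
    """Build SELECT field FROM table WHERE where_field = 'value'."""
    return "SELECT %s FROM %s WHERE %s = %s" % (
        bracket(field), bracket(table), bracket(where_field), quote(value))


print(select_where("S#", "s", "SNAME", "Adams"))
# SELECT [S#] FROM [s] WHERE [SNAME] = 'Adams'
print(select_where("S#", "s", "SNAME", "O'Brien"))
# SELECT [S#] FROM [s] WHERE [SNAME] = 'O''Brien'
```

The resulting string can be passed to OpenRecordset exactly as strSQLsp was above.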

10.5

Gotchyas

If you are accessing a database from Python and then you open it in Microsoft
Access, suddenly your Python DAO commands won't work. The problem is
that Access is a single-user system.
Referential integrity considerations in Access may prevent deletion of a
record. If so, your DAO SQL DELETE command won't work; it'll do nothing.
Lesson: try things first in Access in SQL.

10.6

For More Information

Microsoft's [15], especially page 730, has useful information on DAO. The help
file is DAO360.CHM and is a useful install. Helen Feddema's DAO Object Model:
The Definitive Reference [7] is excellent. Two terrific Web pages for Python and
DAO and Access:
http://starship.python.net/crew/bwilk/access.html
http://www.e-coli.net/pyado.html

File: python-dao.tex

Chapter 11

Python Quick Reference


11.1

Setting a Convenient Path

To use an example without loss of generality, suppose you have a Python module
file, startcom.py, at D:\athomepc\day, as follows:
import win32com.client
xl = win32com.client.Dispatch('Excel.Application')
If you try to import this module, you'll likely get an error message.
Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from startcom import *
Traceback (most recent call last):
File "<pyshell#0>", line 1, in ?
from startcom import *
ImportError: No module named startcom
>>>
The problem is that D:\athomepc\day is not in Python's search path. The
following code shows this.
>>> import sys
>>> sys.path
['D:\\PYTHON21\\Tools\\idle', 'D:\\Python21\\win32',
'D:\\Python21\\win32\\lib','D:\\Python21',
'D:\\Python21\\Pythonwin', 'D:\\PYTHON21\\DLLs',
'D:\\PYTHON21\\lib',
'D:\\PYTHON21\\lib\\plat-win',
'D:\\PYTHON21\\lib\\lib-tk']
>>>
The thing to do is to add (append) the path you want Python to look in to
the sys.path list. You can do it as follows.
>>> sys.path.append('d:\\athomepc\\day')
>>> sys.path
['D:\\PYTHON21\\Tools\\idle', 'D:\\Python21\\win32',
'D:\\Python21\\win32\\lib', 'D:\\Python21',
'D:\\Python21\\Pythonwin', 'D:\\PYTHON21\\DLLs',
'D:\\PYTHON21\\lib', 'D:\\PYTHON21\\lib\\plat-win',
'D:\\PYTHON21\\lib\\lib-tk', 'd:\\athomepc\\day']
>>>
Now your import will work just fine.
>>> from startcom import *
>>> xl.Name
u'Microsoft Excel'
>>>
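The whole procedure can be rehearsed without Excel. In this sketch the module scratchmod and the helper add_to_path are hypothetical names chosen for illustration, and a temporary directory stands in for D:\athomepc\day; it assumes a current Python 3 (importlib.invalidate_caches is not available in Python 2.1).

```python
import importlib
import os
import shutil
import sys
import tempfile


def add_to_path(directory):
    """Append directory to sys.path unless it is already there."""
    if directory not in sys.path:
        sys.path.append(directory)


# Write a tiny module into a scratch directory (hypothetical name: scratchmod).
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "scratchmod.py"), "w") as handle:
    handle.write("greeting = 'hello'\n")

add_to_path(scratch)
add_to_path(scratch)                 # second call is a no-op
importlib.invalidate_caches()        # make sure the new file is noticed
scratchmod = importlib.import_module("scratchmod")
print(scratchmod.greeting)           # hello

shutil.rmtree(scratch)               # tidy up; the module is already loaded
```

Checking before appending keeps sys.path from filling up with duplicates when the same setup code is run more than once in a session.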

11.2

Dispatching Excel

PythonWin 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)]
on win32.
Portions Copyright 1994-2001 Mark Hammond (MarkH@ActiveState.com)
- see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> xl = win32com.client.Dispatch('Excel.Application')
>>> xl.Visible = 1
>>> xl.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 9.0 Object Library.Workbook>

11.3

Using Excel Constants from Python

>>> xlDouble = win32com.client.constants.xlDouble
>>> xlDouble
-4119
>>> xlEdgeBottom = win32com.client.constants.xlEdgeBottom
>>> xlEdgeBottom
9
>>>
Then you can use these Python variables as if they were Excel VBA constants.
xl.Range("A9").Borders(xlEdgeBottom).LineStyle = xlDouble

>>> db.Properties(0).Name
u'Name'
>>> db.Properties.Count
13
>>> for i in range(db.Properties.Count):
...     print i, db.Properties(i).Name
...
0 Name
1 Connect
2 Transactions
3 Updatable
4 CollatingOrder
5 QueryTimeout
6 Version
7 RecordsAffected
8 ReplicaID
9 DesignMasterID
10 Connection
11 AccessVersion
12 Build
>>> db.Properties(11).Name
u'AccessVersion'
>>> db.Properties(11).Value
u'08.50'
>>>

11.4

Using FormulaR1C1-Style Formats

>>> xl.ActiveWorkbook.Worksheets('Sheet1').Cells(4,2).FormulaR1C1 = \
"=SUM(R[-2]C:R[-1]C)"
Note: \ is Python's line-continuation character.
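In R1C1 notation a bracketed number is an offset from the cell holding the formula, so R[-2]C means "two rows up, same column." If you generate such formulas under program control, a helper along these lines may be handy. It is a sketch only; r1c1 and sum_formula are invented names and handle relative references alone.

```python
def r1c1(row_offset=0, col_offset=0):
    """Relative R1C1 reference; a zero offset is written without brackets."""
    row = "R" if row_offset == 0 else "R[%d]" % row_offset
    col = "C" if col_offset == 0 else "C[%d]" % col_offset
    return row + col


def sum_formula(first, last):
    """Build =SUM(first:last) from two (row_offset, col_offset) pairs."""
    return "=SUM(%s:%s)" % (r1c1(*first), r1c1(*last))


print(sum_formula((-2, 0), (-1, 0)))   # =SUM(R[-2]C:R[-1]C)
```

The string printed is the one assigned to FormulaR1C1 above: it sums the two cells directly above the target cell.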

11.5

Dispatching DAO for MS Access

>>> import win32com.client
>>> engine = win32com.client.Dispatch("DAO.DBEngine.36")
>>> db = engine.OpenDatabase(r'c:\day\db1.mdb')
>>> db.Name
u'c:\\day\\db1.mdb'
>>> rs = db.OpenRecordset('select count(*) from Table1')
>>> rs.Parent.Name
u'c:\\day\\db1.mdb'
>>> for i in range(rs.Fields.Count):
...     print rs.Fields(i).Name
...
Expr1000

File: python-quick-ref.tex