
Project 2 Reflection
Teresa Anderson-Myers
Knowledge-Based Artificial Intelligence
Spring 2017


What is your agent's reasoning approach?
My agent uses a purely visual approach. This matches the way a human would be likely to solve a Raven's
Matrix problem. Rather than keeping dozens of words in memory, it is more natural for a human to look for
movement as shapes and images change vertically and horizontally.

Of course, in order to determine what the movement is between images, my agent does store information
about what it sees as possible transformations between images, as objects with text fields. My agent is not as
good as a human at looking at the entire grid to see patterns.

How does your agent represent/store the images efficiently?
My agent creates a VisualFigure object which holds information about itself at 3 levels of granularity:
- Pixels (a matrix of Boolean values indicating whether each pixel is white or non-white)
- Lines (created by finding the outlines of shapes and linking together connecting non-white pixels)
- Shapes (created by linking touching lines)

This technique of creating objects to store the same kinds of information about each of a series of entities
comes from the KBAI topic of Frames. The slots for VisualFigures, Shapes, Lines, and Transformations are the
same, but each individual entity has different values in its slots. The agent can then perform the same kinds
of analysis on each Frame (or object) and compare the results in order to objectively compare what is going on
between various figures.
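
As a rough illustration, the frame-style objects look something like the following sketch (the class and field names are simplified stand-ins, not my agent's actual code):

# Illustrative frame-like objects; names and fields are hypothetical.

class Line:
    """Frame for one outline: a chain of connected non-white pixels."""
    def __init__(self, pixels):
        self.pixels = pixels              # list of (row, col) coordinates

class Shape:
    """Frame for a shape built by linking touching Lines."""
    def __init__(self, lines):
        self.lines = lines
        self.size = sum(len(line.pixels) for line in lines)

class VisualFigure:
    """Frame for one square of the matrix, at three levels of granularity."""
    def __init__(self, pixel_matrix):
        self.pixels = pixel_matrix        # 2-D list of booleans (True = non-white)
        self.lines = []                   # filled in by tracing shape outlines
        self.shapes = []                  # filled in by linking touching lines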

What is your agent's overall problem-solving process?
My agent uses the concept of a Production System in order to take the various Frame objects described above
and apply rules to them. In the short-term memory of the agent, there are VisualFigures and Shapes. The
agent methodically goes through each VisualFigure or Shape and applies its long-term knowledge in order
to find the pattern shown in the matrix.
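
The loop below is a rough sketch of that production-system idea; the rule functions stand in for the agent's long-term knowledge, and the figure pairs stand in for its short-term memory (all names are illustrative):

# Hypothetical production-system loop.

def find_transformations(figure_pairs, rules):
    """Apply every long-term rule to every pair of figures in working memory."""
    transformations = []
    for fig_x, fig_y in figure_pairs:
        for rule in rules:
            result = rule(fig_x, fig_y)   # e.g. a rotation or fill-change detector
            if result is not None:
                transformations.append(result)
    return transformations

# Usage sketch (the detector functions are hypothetical):
# rules = [detect_rotation, detect_fill_change, detect_reflection]
# transformations = find_transformations([(figure_a, figure_b)], rules)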


Figure 1: Overall transformation analysis. The agent assumes that images with greater than 80 lines are too complex to parse into individual
shapes. Instead, it looks for differences between each image as a whole. It does this by comparing the fill ratios of each image, by using Python
Pillow's ImageChops to subtract images from each other to find a difference, and by dividing the image down the middle and transposing it so that
the midline becomes the outer edges to look for a match with another image.
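
A minimal sketch of the whole-image comparison described in Figure 1, using Pillow (the thresholds and helper names are illustrative, not my agent's exact values):

from PIL import Image, ImageChops

def fill_ratio(image):
    """Fraction of non-white pixels in a grayscale version of the image."""
    gray = image.convert("L")
    pixels = list(gray.getdata())
    return sum(1 for p in pixels if p < 250) / len(pixels)   # 250 is an illustrative threshold

def difference_ratio(image_a, image_b):
    """Fraction of pixels that differ noticeably between two figures."""
    diff = ImageChops.difference(image_a.convert("L"), image_b.convert("L"))
    pixels = list(diff.getdata())
    return sum(1 for p in pixels if p > 30) / len(pixels)    # 30 is an illustrative threshold

# Example: a and b would be squares of the matrix loaded with Image.open(...)
# similar = difference_ratio(a, b) < 0.02 and abs(fill_ratio(a) - fill_ratio(b)) < 0.02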

The agent attempts to find patterns in Transformations that occur between either individual shapes or the
entire image as a whole. The general outline of steps for solving a 3 x 3 matrix is as follows:

1) For each image, the agent attempts to discern shapes based on the edge lines it has created from the
pixel matrix of the image

a. If all of the lines found in the VisualFigure are parsed into shapes (no leftover hanging lines)
then the agent attempts to match shapes between figures and assign a Transformation that
must have occurred in order for the Shapes in Figure X to become the Shapes in Figure Y.

b. If there are leftover hanging lines, the agent determines that it cannot use the shapes it has
created to accurately define transformations. Instead, it uses the Pillow library to find the
differences between the two figures in question.
i. It looks for shapes that exist in Figure X but not Figure Y (deleted shapes)
ii. It looks for shapes that exist in Figure Y but not Figure X (added shapes)
iii. It weights the added/deleted shapes based on:
1. Size of Shape
2. Fill (ratio of number of non-white to white pixels) of Shape
iv. It gives a score to each answer figure based on how closely the added/deleted shapes,
and their corresponding sizes and fill ratios, match the first two columns and first two
rows (a rough scoring sketch follows this outline).





2) In the 3 x 3 matrix, the agent looks for transformations between the following images:

a. Vertical

b. Horizontal

3) The agent compares the Transformations found in the pairings above and selects the answer whose
Transformation sequence most closely matches what is seen in the top two (complete) rows and
the left two (complete) columns.
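
The scoring step referenced in step 1.b.iv looks roughly like the sketch below, assuming each Shape carries size and fill values (the fill attribute, weights, and helper names are hypothetical):

# Hypothetical scoring of one candidate answer against the expected added/deleted shapes.

def shape_similarity(shape_a, shape_b):
    """Compare two shapes by size and fill ratio (fill is a value in [0, 1])."""
    size_score = 1 - abs(shape_a.size - shape_b.size) / max(shape_a.size, shape_b.size, 1)
    fill_score = 1 - abs(shape_a.fill - shape_b.fill)
    return 0.5 * size_score + 0.5 * fill_score   # equal weights, purely illustrative

def score_candidate(expected_added, expected_deleted, candidate_added, candidate_deleted):
    """Higher scores mean the candidate's added/deleted shapes better match expectations."""
    score = 0.0
    for expected, actual in ((expected_added, candidate_added),
                             (expected_deleted, candidate_deleted)):
        for shape in expected:
            if actual:
                score += max(shape_similarity(shape, other) for other in actual)
    return score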

The design of this problem-solving method comes from modeling human behavior in solving a Raven's
Progressive Matrix. A human mind looks at the problem in pieces and asks questions like:

- What kinds of changes are happening in each row from left to right? In each column from top to bottom?
- How does the first image in a row affect the last image in the row?
- How does the second image in a column affect the last image in the column?
- Which of the possible answer images shows similar changes to the images in the right column and bottom row?

My agent attempts to separate the image into discrete shapes, but if the analysis of the shapes is too difficult,
it looks for a larger pattern. This is similar to the way a human might go about solving these problems. For
example, in Problem C-09 I could not find a 1:1 transformation happening between individual shapes in
different images. However, when I looked at the entire matrix as a whole, I could see that all of the white
squares are clustered in the bottom right and the black squares are more prevalent in the top left. This
strategy leads my human brain to think that 5 is the correct answer, based simply on the fact that it follows
the overall color change of the entire problem. Trying to match individual shapes doesn't always work for our
brains, and it also doesn't always work best for my agent.


Figure 2: A KBAI agent might have more success solving a problem like this by looking at overall color distributions rather than at individual shapes.
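
One possible version of that whole-image strategy is sketched below, assuming the fill ratios of the eight known figures and of each answer candidate have already been computed (for instance with a helper like the fill_ratio sketch above); the extrapolation heuristic and names are illustrative only:

# Hypothetical: rank answer candidates by how well they continue the overall
# darkness trend of the matrix, rather than by matching individual shapes.

def expected_fill(grid_fills):
    """Extrapolate the missing cell's fill ratio from the bottom row's trend.

    grid_fills is a 3x3 list of fill ratios with None for the missing cell."""
    bottom = grid_fills[2]
    step = bottom[1] - bottom[0]          # change across the bottom row so far
    return bottom[1] + step

def pick_by_color_trend(grid_fills, answer_fills):
    """Choose the answer whose fill ratio is closest to the extrapolated value."""
    target = expected_fill(grid_fills)
    return min(range(len(answer_fills)), key=lambda i: abs(answer_fills[i] - target))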


What mistakes does your agent make?
1) My agent is far too brittle and dependent on the specific transformations that I saw and
addressed in the Basic problems. In fact, in order to get a higher number of Basic C problems
to pass, my agent's success at solving the Basic B and Test C problems went down
slightly.

2) My agent has a much harder time with complex problems that contain a greater number of shapes. It
attempts to match shapes between images by looking at their line makeup, fill, size, and
position, but it often cannot find the shape match that it is looking for. Therefore, my agent
will often report ADDED and DELETED transformations when in fact the shape has just
moved or transformed in a way that my agent didn't catch. My agent comes up with phantom
transformations between figures.

3) My agent does not try to understand how the two steps in each row or column can give
information when taken into account together. For example, if my agent sees the following:

a. A → B is a clear rotation of 90 degrees to the right

b. B → C could either be a horizontal flip or a rotation of 90 degrees to the right

Instead of taking into account the transformation seen in A → B and determining that it makes
more sense to call the B → C transformation a rotation as well, my agent will still call
the transformation from B → C a horizontal flip (because the agent weights flips more highly
than rotations).

Could these mistakes be resolved within your agent's current approach, or are they
fundamental problems with the way your agent approaches these problems?

1) For Project 1, I relied entirely on finding lines and using those lines to create shapes. Every
transformation was based on mapping shapes between figures. This has not worked as well
for Project 2, and I predict it will work even more poorly for Project 3. I'm hoping that I can
invest more time in the Python Pillow library's methods of addition and subtraction to focus on
just the differences between images, rather than mapping all shapes (including ones that have
not changed).

2) I cannot move to a completely shape-free Rules system for my agent. It still needs to
determine somehow that discrete groups of pixels (Shapes) are related between images. I
would like to rely less on defining shapes by contiguous lines and more on defining shapes by
using image subtraction.

3) It looks like Project 3 incorporates some diagonal relationships into the 3 x 3 matrices. I will
need to invest in a strategy to compare not just two images at a time, but three images at a time.
This will allow me to capture information happening along an entire row, column, or diagonal.
Having information from two steps should allow my agent to make more informed decisions
when identifying a transformation and assigning a weight to it. It may also help in mapping
shapes between figures (a rough sketch of this two-step weighting follows this list).
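
The sketch below shows one possible version of that two-step idea: a candidate reading of B → C gets a bonus if it agrees with what was already seen for A → B. The transformation names and weights are hypothetical, not my agent's actual values.

# Hypothetical: bias the B -> C classification toward consistency with A -> B.

BASE_WEIGHTS = {"horizontal_flip": 1.0, "rotate_90": 0.8}   # illustrative weights
CONSISTENCY_BONUS = 0.5

def choose_transformation(candidates_bc, transformation_ab):
    """Pick the B -> C transformation, boosting candidates that match A -> B."""
    def weight(name):
        bonus = CONSISTENCY_BONUS if name == transformation_ab else 0.0
        return BASE_WEIGHTS.get(name, 0.0) + bonus
    return max(candidates_bc, key=weight)

# With the example from the "mistakes" section above:
# choose_transformation(["horizontal_flip", "rotate_90"], "rotate_90")  ->  "rotate_90"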

Agent Results
Accuracy:
11/12 Basic C
5/12 Test C

Efficiency:
My project takes approximately 5 minutes to run on Bonnie for project submission. I cleaned up some of the
repetitive loops over pixels used in my Project 1 version of the agent, but that improvement has been completely
countered by new functionality I have introduced for Project 2. There are chunks of functionality that are still
repeated unnecessarily.

Generality:
My agent is terrible at generalizing problems. I can tailor my agent to look for a specific transformation that I
have in front of me, but it does not do well at finding patterns in problems that I (and my human brain) have
not already seen.

One thing I plan on doing before I start addressing the new problems in Project 3 is to go back and have my
agent correctly solve all of the B problems again. My agent is currently missing two Basic B problems that it
was getting correct for Project 1. I hope that returning to a more basic approach for the B problems will
help me target (and remove) code that I wrote to increase my score on the C problems but that is too
specific to help solve general problems.


Flexibility:
My agent has hard-coded Transformations that it is looking for (Rotations, Color Changes, etc.). It would be
much more flexible for my agent to store Transformations as some kind of Frame that has slots for things like:
- Color Change
- Vertical Movement
- Horizontal Movement
- Size Change
- Etc.

Then I could compare transformations without having to make some kind of determination of what
transformation I am looking at. Rather, I could give each field a weight and score the similarity of transformations
at a much more granular level.
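
A rough sketch of that idea, with hypothetical slot names and weights, is below:

# Hypothetical frame-style Transformation comparison with per-slot weights.

SLOT_WEIGHTS = {
    "color_change": 1.0,
    "vertical_movement": 0.5,
    "horizontal_movement": 0.5,
    "size_change": 0.8,
}

def transformation_similarity(t1, t2):
    """Score how alike two transformation frames are, slot by slot."""
    total = 0.0
    for slot, weight in SLOT_WEIGHTS.items():
        # t1 and t2 are dicts mapping slot names to numeric change values
        difference = abs(t1.get(slot, 0.0) - t2.get(slot, 0.0))
        total += weight * (1.0 / (1.0 + difference))   # closer values score higher
    return total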

Please provide an explanation of how your methods/components/ideas in your agent's design
are/might be similar to (or can be related to) specific KBAI methods discussed in class

I use Knowledge Representations to answer the two questions from the paper "What Is a Knowledge
Representation?" (Davis, Shrobe, and Szolovits, 1993).

1) What is it a surrogate for?

I use the Knowledge Representation of Frames to store:
- VisualFigures (surrogates for each square in the matrix)
- Shapes (surrogates for the shapes in each image)
- Lines (surrogates for physical lines)
- Transformations (surrogates for the information regarding how a Shape would have to change to move from one VisualFigure to another)

2) How close is the surrogate to the real thing?

My Knowledge Representation is a fairly accurate surrogate for VisualFigures at a pixel level, for Shapes that
are clear and discrete, and for Lines that are not diagonal. My Knowledge Representation is not a terribly
accurate surrogate for Transformations. My agent can parse and classify static data, but is much worse
at representing and classifying changes between states of data.


My agent also uses Logic in order to determine what Transformation is happening between
two VisualFigures. It comes up with Logical rules, based on what it sees in the complete columns and rows,
and searches for answer figures that follow those rules. As an example, my agent uses a subsection of such
Logic to answer problem C-05.
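
As a heavily simplified, hypothetical sketch (not my agent's actual C-05 rules), the rule-checking idea looks roughly like this:

# Purely illustrative rule checking; these helpers are hypothetical.

def row_follows_rule(row_figures, derive_transformation, expected):
    """True if every adjacent pair in the row shows the expected transformation."""
    pairs = zip(row_figures, row_figures[1:])
    return all(derive_transformation(x, y) == expected for x, y in pairs)

def pick_answer(incomplete_row, answer_figures, derive_transformation, expected):
    """Return the first answer figure whose completed row satisfies the rule."""
    for candidate in answer_figures:
        if row_follows_rule(incomplete_row + [candidate], derive_transformation, expected):
            return candidate
    return None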

What does the design and performance of your agent tell us about human cognition?

It can be difficult to understand human cognition on a non-verbal level. Since we communicate verbally, we
are used to organizing thoughts and ideas into precise words with specific meanings. The Raven's Progressive
Matrices test focuses on abstract reasoning, termed "fluid intelligence." The test consists of increasingly
difficult pattern-matching tasks and has little dependency on language abilities (Bilker et al., 2012). By
attempting to create an agent that can solve problems that are completely visual, we are able to capture
some of the visual learning that we do automatically, without translating it into specific words.

For me, this need to step away from specific words is difficult. The most obvious approach is to list every
possible transformation that can occur in a sequence, but I have found that this is impractical in the
long term. I need to design an agent that can understand granular changes across images without having to
label them, because there are too many small variations for labelling each of them to be an efficient or
practical prospect.

Rather, for Project 3 I will be looking towards removing the concrete Transformation definitions in my current
agent. I will replace them with a more fluid approach that uses simpler, smaller concepts to mix and match
into larger and more precise concepts of transformation (without names). Then I will attempt to match the
smaller pieces between sequences rather than matching a specific, all-encompassing transformation. This is
similar to the way a human is able to approach these problems. Rather than keeping a list of what each shape
is doing in a row or column, we can see the movement that happens in between the frames by combining
several small observations into a concept of change over time.


References:
Davis, R., Shrobe, H., & Szolovits, P. (1993). What Is a Knowledge Representation? AI Magazine, 14(1), 17-33. Retrieved from
http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html

Bilker, W. B., Hansen, J. A., Brensinger, C. M., Richard, J., Gur, R. E., & Gur, R. C. (2012, September).
Development of Abbreviated Nine-item Forms of the Raven's Standard Progressive Matrices Test. Retrieved
March 20, 2017, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410094/
