
Product Proposal / Calendar

Introduction and Statement of Purpose:

Through my Independent Study in Software Development, I hoped to gain an
understanding of the various fields within Software Development, and for my Final
Product I will be focusing on Data Programming. My Final Product entails utilizing
extensive NCAA data from a variety of basketball tournaments to conduct
spreadsheet analysis, program visualizations, and, ultimately, create an
application that allows users to predict the success of teams depending on certain
variables, such as game time or fouls. The product will consist of artifacts from all three
stages, with the final application being the main product. As I complete the
project, I will take note of all the obstacles and resources I come across in order to
coauthor a tutorial blog post with my mentor for Google, enabling other students to try
the project and learn the fundamentals of Data Programming. During my final
presentation, I will elaborate on how each stage furthered my knowledge in
Data Programming and led into the next stage. Furthermore, I will show the
work from all three stages and allow my audience to use the final application and
download it if they are interested. I will also show the audience parts of the tutorial that
my mentor and I will be writing in order to better explain the process I used to
complete the final product. This product utilizes data not yet available to the public;
therefore, the result will be unique. My mentor and I decided on this final product
because of the exposure it will give me to a different side of Software
Development. The learning from this project differs from the programming projects
I have done in the past and will give me insight into utilizing data and Artificial
Intelligence, both of which will be crucial to me as the tech industry moves further
toward data and Artificial Intelligence. Through working on this project, I will gain
an introduction to Data Programming, Cloud Computing, SQL, Python, and Machine
Learning while also furthering my understanding of application development
principles and coding best practices. Furthermore, I will have to learn how to connect
cloud-based spreadsheets to applications through an API, which will assist me in
creating more complicated and smarter applications in general. Ultimately, the project
will allow me to utilize my background knowledge in programming and bridge it to
another sector.

Review of Skills and Research:

Throughout this project, I will have to learn many new programming concepts in
order to successfully complete it. This includes becoming familiar with new tools and
software, learning programming languages, and developing an understanding of
computing and Machine Learning principles.

The tools that I will be using include Google Sheets, BigQuery, Data Studio,
Cloud Datalab, Machine Learning Studio, the PyCharm Python IDE, and an API to
connect the Python code to the cloud data tools. All of these tools will allow me to
progress toward creating the application. Google Sheets is simple to use, and I have
used it many times in the past; therefore, it will be the easiest tool that I work with.
Cloud Datalab and Machine Learning Studio will probably be the most
complicated stages, as they are professional development studios and are not as
straightforward as Google Sheets. However, considering that I will be stepping up
from the easier tools, the transition should not be too challenging. Additionally, they
are more powerful than the tools I have worked with, so I should be able to attain
interesting results. Utilizing PyCharm will be doable, as I am comfortable with similar
Java IDEs. The most challenging aspect of this step will be identifying an appropriate
API to assist in the development of the application. This will require intensive research, as
I want to find a relatively simple method of linking the code and data; however, I also
want the solution to be reliable.

The two programming languages that I will be working with are Python and
SQL. SQL will be a little challenging to learn, as I have never done any Database
Programming and the language is completely new to me. Furthermore, it is not similar
to the languages that I am familiar with, Java and Swift. However, SQL will be crucial
to this project, as it is used in almost all the stages, especially the initial ones. Furthermore, learning
SQL will allow me to connect applications to databases and analyze the statistics. In
order to learn the language, I have been watching Khan Academy videos and have
started a Codecademy course. Furthermore, I will get a lot of practice using SQL, as
I will be using it exclusively while working with BigQuery. Python should be easier for
me to pick up, as it is similar to Java and Swift. Furthermore, since I am familiar with
programming logic, the only aspect that will be new is some Python syntax. In order to
learn the Python language, I will be reading through a reference book, using online
tutorials, and practicing on Khan Academy. I have already started looking through
some of the Python resources as well.

The most important aspect of this project is that I learn more about Machine
Learning and Cloud, Data, and Distributed Computing. Since I hope to continue
researching these fields through college and the workplace, I want to have a solid
understanding of the fundamentals. I will learn these principles through discussions
with my mentor and reading research articles as I engage in the process of developing
this project.

Materials:

This entire project will be completed on a laptop, and the materials are free. There may
be some costs associated with the Google development tools and API; however, for
my relatively simple purposes, the free tier should not be exceeded. The
following data, websites, and software will be utilized:

1. Extracted sr_pbp_summaries.sr_teams_games_v2 data from BigQuery. This data
is not yet public; however, it is extensive and is the data that this entire project is
based on.
2. Google Sheets (https://www.google.com/sheets) to develop a data dictionary
and gain comfort with the available data as well as conduct initial analysis by
creating pivot tables and filtering the data.
3. BigQuery (https://bigquery.cloud.google.com) to conduct rapid data
analysis and processing using SQL.
4. Data Studio (https://datastudio.google.com) to create simple tables, charts and
graphs from the data.
5. Cloud Datalab (https://cloud.google.com/datalab/) to extract and create
visualizations from BigQuery and build a Machine Learning model.
6. Machine Learning Studio (https://console.cloud.google.com/mlengine) to
conduct a Machine Learning experiment and identify the probability of teams’
success dependent upon variables such as geographic location and game
speculations.
7. PyCharm Python IDE (https://www.jetbrains.com/pycharm/) installed to code
the GUI that will pull in the Machine Learning results and allow users to enter
values and receive predictions regarding the success of a team. TkInter will be
imported and used to create the graphics.
8. An API to connect my Python code to the cloud data tools. I will have to research
and find an appropriate API to tie the project together.
9. Lucidchart UML Diagramming Tool (https://www.lucidchart.com) to plan out the
application and organize the code.
10. Photo editing software to develop graphics for the application and design its
interface.
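As an illustration of the data dictionary described in item 2, a few entries might be recorded as a simple mapping of column names to types and descriptions. The column names below are my own hypothetical examples, not the actual schema:

```python
# A miniature data dictionary: each column name maps to its type and a
# short description. The column names below are hypothetical examples.
data_dictionary = {
    "game_id":     ("STRING",  "Unique identifier for each game"),
    "team_name":   ("STRING",  "Name of the team"),
    "points":      ("INTEGER", "Total points scored by the team"),
    "fouls":       ("INTEGER", "Number of fouls committed"),
    "game_length": ("INTEGER", "Duration of the game in minutes"),
}

# Print the dictionary as a small aligned table for review.
for column, (col_type, description) in data_dictionary.items():
    print(f"{column:<12} {col_type:<8} {description}")
```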

Methodology:
Prior to starting any Data Programming or Machine Learning project, it is vital
to have an extensive amount of data to work with, as more data allows for more
profound and more accurate results. My mentor came across a large set of NCAA
data; therefore, this step is already complete.

The official first step of this project is compiling a data dictionary from the
rows and columns available in the spreadsheet. The purpose of creating a data
dictionary is to gain familiarity with the available data and note the type of each field
in preparation for programming with the data. I have already completed this step. This is
followed by utilizing Google Sheets to create simple visuals to examine the data,
which includes filtering the data itself, creating pivot tables, and creating charts. These
steps allow a closer look at the data and a perspective on the effects of certain
variables using straightforward techniques, and they do not require much
background knowledge. I have also completed this step.
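The pivot-table analysis from Google Sheets can be mimicked in plain Python, which also serves as a warm-up for the later coding stages. Here is a small sketch that groups game rows by team and averages their scores; the sample data is invented purely for illustration:

```python
from collections import defaultdict

# Hypothetical game rows: (team, points scored). Invented for illustration.
games = [
    ("Duke", 78), ("Duke", 85), ("Kansas", 74),
    ("Kansas", 69), ("Kansas", 81), ("Duke", 90),
]

# Group scores by team, like a pivot table with team as the row field.
scores_by_team = defaultdict(list)
for team, points in games:
    scores_by_team[team].append(points)

# Average each group, like a pivot table's AVERAGE aggregation.
average_score = {team: sum(pts) / len(pts) for team, pts in scores_by_team.items()}
print(average_score)
```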

Following this is using SQL in BigQuery to identify patterns in the data and
look closely at certain categories, such as the 10 lowest scores, the 20 longest games, or
the 15 games played with the least travel time for the visiting team. I am
currently working on this step; however, I have not been able to use the
game data yet due to security reasons. Instead, I have been getting accustomed to the
BigQuery platform by working with public data sets, such as temperature records. BigQuery is
special in that it enables fast, iterative analysis of large data sets; its parallel
processing capabilities will make the SQL queries that I write run quickly.
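As a sketch of the kind of query this step involves, the "10 lowest scores" pattern might look like the standard SQL string below; the column names are my assumptions about the schema, not confirmed ones. The same "N lowest" pattern can be practiced offline on in-memory rows:

```python
import heapq

# The kind of standard SQL this step involves; column names are hypothetical.
lowest_scores_sql = """
SELECT team_name, points
FROM `sr_pbp_summaries.sr_teams_games_v2`
ORDER BY points ASC
LIMIT 10
"""

# The same "N lowest" pattern on in-memory rows, for offline practice.
rows = [("A", 41), ("B", 77), ("C", 35), ("D", 58), ("E", 49)]
three_lowest = heapq.nsmallest(3, rows, key=lambda r: r[1])
print(three_lowest)  # the three rows with the fewest points
```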

After this, I will work with Data Studio and Cloud Datalab. They are
professional development studios and will allow me to create more sophisticated and
more meaningful visualizations with the data. This will be crucial, as it will allow me to
truly understand the data and embark on utilizing Machine Learning. Data Studio
builds on BigQuery and makes it easier to observe significant patterns, while Cloud
Datalab allows reviewing, visualizing, and coding all in one place and is used by
professional researchers. Therefore, this stage will be the most profound in that it is at
the heart of the project: analyzing visual representations and utilizing Machine
Learning algorithms to make predictions. It will additionally require that I go through
a multitude of articles and tutorials so that I can make the most of the
platforms.

Following this, I will utilize Python to create the application. First, I will create a
diagram and design my application in Lucidchart. While working on this, I will keep in
mind the Software Development principles I have learned through research and the coding
best practices I have learned through the BPA Java Competition. This step will
allow me to be more methodical and efficient when I am programming.
Additionally, I will design the application in photo editing software so that I can
reason through what code I need in order to create the output I am hoping for. After
that, I will focus on learning how to connect the local Python code to the online
Machine Learning server and the online data spreadsheet, and then code that aspect. This will
entail researching and finding an appropriate API to enable me to complete that task.
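One candidate for this connection is Google's BigQuery client library for Python. The sketch below shows how a query might be built locally and sent to the cloud, assuming the `google-cloud-bigquery` package and valid credentials; both the package choice and the helper function are my assumptions, pending the research described above:

```python
def build_team_query(team_name: str, limit: int = 10) -> str:
    """Build a standard SQL query for one team's games.

    The column names are hypothetical placeholders for the real schema.
    """
    # Basic sanity check so a bad team name fails early.
    if not team_name or '"' in team_name:
        raise ValueError("team_name must be a simple, non-empty string")
    return (
        "SELECT game_id, points, fouls "
        "FROM `sr_pbp_summaries.sr_teams_games_v2` "
        f'WHERE team_name = "{team_name}" '
        f"LIMIT {limit}"
    )

# With credentials configured, the query could be run like this:
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(build_team_query("Duke")).result():
#       print(row)

print(build_team_query("Duke"))
```

In real code, query parameters would be safer than string interpolation, but this sketch keeps the moving parts visible.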

After getting all of the data flowing in, I will look into utilizing TkInter to
code the graphics. This part will be a little tricky, as I have not had much
experience with building interfaces programmatically; however, it aligns more with the
programming I am used to. During this step, I will also create the images and
backgrounds using photo editing software. This will allow me to finish coding the
application and ensure it can be downloaded and run on devices.
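A minimal TkInter sketch of the prediction window might look like the following; the `predict_success` function here is a hypothetical stand-in for the real Machine Learning call, and its formula is invented purely so the GUI has something to display:

```python
import tkinter as tk

def predict_success(fouls: int, game_minutes: int) -> float:
    """Hypothetical stand-in for the real Machine Learning prediction."""
    # A made-up formula purely so the GUI has something to display.
    return max(0.0, min(1.0, 0.9 - 0.02 * fouls - 0.001 * game_minutes))

def format_prediction(probability: float) -> str:
    """Format a 0-1 probability as a percentage label."""
    return f"Predicted success: {probability:.0%}"

def build_gui() -> tk.Tk:
    """Assemble the window; call build_gui().mainloop() to launch it."""
    root = tk.Tk()
    root.title("Team Success Predictor")
    fouls_entry = tk.Entry(root)
    fouls_entry.pack()
    result = tk.Label(root, text="")
    result.pack()

    def on_predict() -> None:
        # Read the fouls field and show the formatted prediction.
        probability = predict_success(int(fouls_entry.get()), 40)
        result.config(text=format_prediction(probability))

    tk.Button(root, text="Predict", command=on_predict).pack()
    return root

# To launch the window:  build_gui().mainloop()
```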

Finally, I will test my application on my device and adjust or enhance it based on
peer feedback, my mentor's guidance, and feedback from Mr. Pirtle. After completing
the project, I will post the tutorial for this project with my mentor. Additionally, I will
use the tutorial, along with the many visuals I programmed, to assist me in delivering my Final
Presentation. The last step will be creating a script for my Final Presentation and
organizing the above steps into a visual aid.

Conclusion:

Although this final product will be challenging, I am confident that with
dedicated effort I will be able to complete it, due to my high interest level and curiosity
toward Machine Learning. I am certain that the application will be functional and
aesthetically pleasing and that all of the visualizations will be neat and clear to
interpret. Furthermore, I intend to put a lot of effort into the product so that the application
gives insightful results with close-to-accurate success rates. I hope to utilize the
application and watch the results of future games in order to test its accuracy.

This project is apt for me, as it will allow me to gain exposure to areas
of Computer Science that I do not have much experience with: Data Programming and
Machine Learning. Furthermore, since both of these areas are growing in the tech
industry at rapid rates, it will be useful knowledge for me in the future and give me a
taste of the field I hope to dive into during and after college.
Even though I may not publish this application for the public, I will be sharing
the steps, along with my mentor, on the Google blogs. I hope to have a product that is
captivating to look at and reflects months of work. Furthermore, since I will be utilizing
the same resources in the future, I hope to program and work through all aspects with
my best effort so that I can look back at the product if I am stuck in the
future.

Overall, I am excited about this great learning opportunity to explore
Machine Learning and Data Programming, and I hope to have a final product that will
captivate my audience on Final Presentation Night and convey my passion
for the Software Development industry.

Calendar
Completed Steps
• Create a data dictionary
• Use Google Sheets to filter the data, create tables, and create charts
• Analyze the visuals from Google Sheets
Week One (3/10-3/16)
• Learn the fundamentals of SQL
• Work with BigQuery and use standard SQL to look at specific patterns
• Work on formatting queries, exploring my data to optimize cost, commenting query code, and substring matching within BigQuery to familiarize myself with the platform
• Bind Data Studio to my data source and start reading tutorials
Week Two (3/17-3/23)
• Create simple tables, graphs, and charts in Data Studio
• Start working with Cloud Datalab and get accustomed to it after watching tutorials
Week Three (3/24-3/30)
• Export the NCAA data to BigQuery
• Begin Machine Learning through Cloud Datalab
Week Four (3/31-4/06)
• Wrap up the Machine Learning portion with Machine Learning Studio and Cloud Datalab
Week Five (4/07-4/13)
• Create the UML diagram
• Design the GUI
• Research how to connect Python code to Machine Learning servers and cloud databases
• Continue learning Python
Week Six (4/14-4/20)
• Connect the application to the cloud database and Machine Learning server
• Create all the images in photo editing software
Week Seven (4/21-4/27)
• Code the graphics in the application
Week Eight (4/28-5/04)
• Finish coding the application
• Make the application able to run on devices
• Start working on the Final Presentation script
Week Nine (5/05-5/11)
• Attain feedback and make adjustments to the application if necessary
• Post the tutorial to Google blogs
• Practice for Final Presentation Night
Week Ten (5/12-5/18)
• Practice for Final Presentation Night