Project short title: This project aims to extend the current functionality and
capabilities of the R package ‘markovchain’ in order to give statisticians a more
functional tool for analysing stochastic processes related to Markov chains
(MCs).
Bio of Student:
I am a computer science student at the Indian Institute of Technology
(Banaras Hindu University), Varanasi, India. I have the R coding experience
needed to build this package. I have previously worked on datasets such as
the Hubway visualization challenge and the movielens dataset. In addition, I am
currently building a Shiny web application using the rgbif package
(GitHub link below). A similar visualization application can be a part of the proposed
package. In addition to R programming, I am also familiar with C++ programming
and the Rcpp package. I implemented all the assignments and projects in the
data structures and algorithms course in C++ and hence have a good command
of C++ as well. I am also familiar with git/GitHub (version control).
Academics:
I have taken many MOOCs (Massive Open Online Courses) related to
data science in R, including the Data Science Specialization courses on Coursera,
among others. In total, I have taken three courses in computer programming along
with a data structures course. Currently I am attending an Algorithms course and an
Artificial Intelligence course at my college. I have also taken a statistics and
probability course at my institute. I implemented a Hidden Markov Model and
then used the Viterbi algorithm to perform part-of-speech tagging as an assignment
in one of the courses, hence I am familiar with stochastic processes. Link -
https://github.com/vandit15/AI-lab-codes .
2. Built a website for the training and placement cell of IIT (BHU) using Django for
the backend, MySQL for database management and Materialize CSS for front-end
design.
Link - https://github.com/vandit15/IITBHU-TPO-site
Contact Information:
Student Name: Vandit Jain
Student postal address: 138-A, R K Puram, near Gauri hospital, Kota, Rajasthan,
India (pin code-324005)
Phone number: (+91) 8764340070, 7233013328
Email: jainvandit15@gmail.com, vandit.jain.cse15@itbhu.ac.in
Student affiliation:
Institution: Indian Institute of Technology (Banaras Hindu University), Varanasi,
India
Program: Bachelor of Technology (B.Tech) in Computer Science and Engineering.
Stage of completion: Part 2 (4th Semester)
Contact to verify:
Dr. Rajeev Shrivastava
Professor, Department of Computer Science and Engineering
Email: rs.cse@iitbhu.ac.in
Schedule Conflicts:
I will not be working in any internship, part-time job or other job during the
summer of 2017. I have no conflicts with the GSoC schedule. I am willing to invest
the whole of my three months in the success of my GSoC project.
Mentors:
Mentor-1 - Sai Bhargav Yalamanchi
Mentor-2 - Giorgio A. Spedicato
I established contact with the mentors after solving the tests for the project. We
have been in contact since then.
Coding plan and Methods:
At its heart, the project is to improve the markovchain package: speed up the
current functions and add new ones.
Optimisation of current functions – For optimisation I would search for
opportunities in the current package where I can improve run time. This would
require reviewing the code. R has packages such as microbenchmark, among others,
that can be used to time code and detect bottlenecks. Looping in R is often slow.
After detecting the slow-running parts of the code using the above methods, the
task is to speed them up. If I find a slow-running loop, I would replace it with
vectorised code or the apply family of functions where this improves the running
time. Functions written in R that remain slow can be rewritten in C++ using the
Rcpp package, which can also improve running time considerably. Fine-tuning
current functions also includes improving the current documentation and unit tests
according to the changes made. The package also uses RcppParallel. I intend to use
it wherever possible.
Joint Distributions of the number of visits for Finite-State MCs – This function,
when implemented, is expected to return the joint pdf of the number of visits to the
various states of the DTMC during the first N steps or before the Nth visit.
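To make the intended output concrete, the following is a minimal sketch (in Python, not the package's R/C++ code) of one way such a joint distribution could be computed by dynamic programming. The function name visit_count_distribution, the list-of-lists matrix format and the convention that the initial state at time 0 is not counted are all assumptions made for illustration, not part of the markovchain package.

```python
from collections import defaultdict


def visit_count_distribution(P, start, n_steps):
    """Joint distribution of visit counts over steps 1..n_steps.

    P is a row-stochastic transition matrix (list of lists) and start is
    the index of the initial state. Assumption of this sketch: the state
    occupied at time 0 is not counted as a visit.
    Returns a dict mapping count tuples (k_0, ..., k_{m-1}) to probabilities.
    """
    m = len(P)
    # layer maps (current state, visit counts so far) -> probability
    layer = {(start, (0,) * m): 1.0}
    for _ in range(n_steps):
        nxt = defaultdict(float)
        for (state, counts), prob in layer.items():
            for j in range(m):
                if P[state][j] == 0.0:
                    continue  # skip impossible transitions
                new_counts = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[(j, new_counts)] += prob * P[state][j]
        layer = nxt
    # Sum over the final state to keep only the visit counts.
    joint = defaultdict(float)
    for (_, counts), prob in layer.items():
        joint[counts] += prob
    return dict(joint)


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]
    for counts, prob in sorted(visit_count_distribution(P, 0, 2).items()):
        print(counts, round(prob, 4))
```

The number of count tuples grows quickly with N and with the number of states, so a real implementation would likely need a more compact representation or restrict attention to a subset of states.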
Markovchain Statistics - Currently only the computation of the first passage time
has been implemented. The pdfs of related statistics can be obtained by solving sets
of equations that have similar forms but different initial conditions, taking the
‘minimal’ solution in each case. I will spend time building two functions that
perform these tasks. The first extends the first passage time pdf computation to a
set of states A and also gives the expected first passage time. The second takes two
disjoint sets A and B and an initial state i, and returns the probability that A is
hit before B. The functions would be implemented using the ideas given in
(http://www2.math.uu.se/~takis/L/McRw/mcrw.pdf). Proper unit testing and
documentation using roxygen2 would be an important part.
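As a rough illustration of the two computations, here is a minimal Python sketch under stated assumptions; it is not the package's API. The names first_passage_pdf and hit_before are hypothetical, and states are 0-based indices into a list-of-lists transition matrix. The first function uses the standard recursion f_i(1) = sum over j in A of P_ij and f_i(n) = sum over j not in A of P_ij f_j(n-1). The second approximates the minimal non-negative solution of h_i = 1 on A, h_i = 0 on B and h_i = sum over j of P_ij h_j elsewhere, by iterating upwards from the indicator of A.

```python
def first_passage_pdf(P, A, n_max):
    """f[n-1][i] = P(first visit to the set A happens at step n | X_0 = i),
    for n = 1..n_max, where a visit means being in A at some step n >= 1."""
    m = len(P)
    A = set(A)
    outside = [j for j in range(m) if j not in A]
    # Step 1: jump straight into A.
    f = [[sum(P[i][j] for j in A) for i in range(m)]]
    for _ in range(n_max - 1):
        prev = f[-1]
        # Step n: move to a state outside A, then first hit A n-1 steps later.
        f.append([sum(P[i][j] * prev[j] for j in outside) for i in range(m)])
    return f


def hit_before(P, A, B, n_iter=10000, tol=1e-12):
    """Probability, from each state, that A is hit before B (A, B disjoint).

    Iterating from the indicator of A increases monotonically towards the
    minimal non-negative solution; the loop stops once updates fall below tol.
    """
    m = len(P)
    A, B = set(A), set(B)
    h = [1.0 if i in A else 0.0 for i in range(m)]
    for _ in range(n_iter):
        new = [
            h[i] if (i in A or i in B)
            else sum(P[i][j] * h[j] for j in range(m))
            for i in range(m)
        ]
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            return new
        h = new
    return h


if __name__ == "__main__":
    # Fair gambler's ruin on states 0..3 with absorbing ends.
    P = [
        [1.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(hit_before(P, A={3}, B={0}))
    print([row[2] for row in first_passage_pdf(P, A={3}, n_max=3)])
```

For a finite chain the same hitting probabilities could instead be obtained by solving the linear system directly; the fixed-point iteration is used here only because it mirrors the 'minimal solution' description in the text.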
Timeline:
According to the coding plan, the timeline is set so as to deliver substantial
results by both mid-term evaluations (June 30th, July 28th) and
the final evaluation (29th August).
Pre-Community Bonding Period (April 3rd – May 4th) - I would invest this period
in improving my knowledge of Markov chains through several sources, one of them
definitely being Dobrow, Introduction to Stochastic Processes with R. I have taken a
basic course in statistics and probability and also implemented a Hidden
Markov Model as an assignment in an Artificial Intelligence course, which would
help. I would also brush up my R skills, especially Rcpp.
Community Bonding Period (May 5th – May 29th) – This period is important, as
it would be invested in discussing the structure of the proposed
functions for the project. I would also go through the whole package, as so far I
have read only a few of its functions (while solving the tests). I would at
least write pseudo code or a summary for some functions (after studying the papers
referred to) and also start implementing them if time permits. I also intend to
perform optimisation-related work in this period.
Coding Period -
Continuing from the work done in the Community Bonding Period, the coding period
would be divided as follows:
30th May - 4th June – Complete optimization related work carrying on from the
community bonding period.
5th June - 12th June – Discuss with mentors and write pseudo code for functions
related to CTMCs. For p(t), read page 301 of the book.
1st July - 4th July – Discuss with mentors and write pseudo code for implementing
stability tests.
5th July - 12th July – Implement the functions discussed in the previous week.
13th July - 16th July - Discuss with mentors and write pseudo code for functions
related to markovchain statistics.
17th July - 24th July – Implement the functions discussed in the previous week.
25th July - 26th July – Unit testing of functions implemented after first evaluation.
27th July – Write documentation for the implemented functions.
29th July - 30th July - Discuss with mentors and write pseudo code for
improvement in graphics for the package.
31st July - 4th August – Implement the graphics functions for the package and
update documentation.
5th August - 9th August - Discuss with mentors and write pseudo code for
proposed functions related to HOMMCs.
10th August - 13th August - Implement the discussed functions.
14th August - 15th August – Unit testing and documentation for modified and newly
implemented functions.
16th August - 20th August – Implement miscellaneous functions related to the
computation of rewards and number of visits for finite-state MCs.
21st August - 28th August – Revising all modifications, updating documentation,
unit-testing and bug-fixing.
Tests:
For the markovchain package, I had to submit pull requests for issues #106 and
#115.
Issue #115 pertains to a round-off error in the steadyStates function. This is the
link to my fork.
https://github.com/vandit15/markovchain