Measuring the importance and frequency of use of the Scala programming language among data scientists on GitHub
GitHub pull requests by month and year
Pull requests by user
With almost 30,000 commits and a history spanning more than ten years, Scala is a mature, general-purpose programming language that has recently become prominent among data scientists.
Scala is also an open source project. Open source projects have the advantage that their entire development histories -- who made changes, what was changed, code reviews, etc. -- are publicly available.
We read in, clean up, and visualize the real-world Scala project repository, which combines data from a version control system (Git) and a project hosting site (GitHub). The goal is to find out who has had the most influence on its development and who the experts are.
The dataset, which was previously mined and extracted from GitHub, was cleaned to ensure consistency.
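As a rough illustration of what cleaning for consistency can involve, the sketch below trims user names, drops repeated pull requests and rows with no author, and converts every timestamp to UTC. The column names (pid, user, date), the sample rows, and the helper functions are assumptions made for this example; they are not taken from the project's actual dataset or notebook.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample standing in for the real export; column names are assumed.
RAW_CSV = """pid,user,date
101, alice ,2013-05-02T10:15:00Z
101, alice ,2013-05-02T10:15:00Z
102,bob,2013-05-17T08:00:00+02:00
103,,2014-01-09T12:30:00Z
"""


def parse_utc(timestamp):
    """Parse an ISO-8601 timestamp and convert it to UTC."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    parsed = datetime.fromisoformat(timestamp.strip().replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def clean_pulls(csv_text):
    """Return deduplicated pull-request rows with trimmed users and UTC dates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = int(row["pid"])
        user = row["user"].strip()
        if not user or pid in seen:
            continue  # skip rows with no author and repeated pull requests
        seen.add(pid)
        cleaned.append({"pid": pid, "user": user, "date": parse_utc(row["date"])})
    return cleaned


pulls = clean_pulls(RAW_CSV)
for pull in pulls:
    print(pull["pid"], pull["user"], pull["date"].isoformat())
```

Putting all timestamps in a single time zone matters here because pull requests are later grouped by month and year, and mixed offsets could place a request in the wrong month.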
Exploratory data analysis was done to gain meaningful insights from the data and to draw conclusions.
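The two views this project reports, pull requests by month and year and pull requests by user, come down to simple grouped counts. The sketch below shows one way to compute them with the Python standard library. The records and names are made up for illustration, and the original notebook may well have used a different approach (for example a data-frame library).

```python
from collections import Counter
from datetime import datetime

# Hypothetical (user, date) records; real data would come from the cleaned dataset.
PULLS = [
    ("alice", datetime(2013, 5, 2)),
    ("bob", datetime(2013, 5, 17)),
    ("alice", datetime(2013, 6, 1)),
    ("alice", datetime(2014, 1, 9)),
    ("carol", datetime(2014, 1, 20)),
]

# Pull requests by month and year, keyed as (year, month).
by_month = Counter((date.year, date.month) for _, date in PULLS)

# Pull requests by user.
by_user = Counter(user for user, _ in PULLS)

# Print the monthly counts in chronological order.
for (year, month), count in sorted(by_month.items()):
    print(f"{year}-{month:02d}: {count}")

# Print the per-user counts, most active contributor first.
for user, count in by_user.most_common():
    print(f"{user}: {count}")
```

Counting pull requests per user is only a first approximation of influence. It shows who contributes most often, not how large or important each contribution was.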
Data visualization was done with Matplotlib, rendered inline in the notebook, to illustrate the insights gained from the data.
Jupyter Notebook
A Jupyter notebook served as the development environment for the project.
Publication
An article was published on DataCamp to share the project publicly.