The cloud (public and private) provides an array of virtual machines, available with a range of cores, RAM and specialized hardware. Research faculty at small and medium-size institutions have a variety of computing and data requirements, but need to use these resources efficiently. Three current best-practice tools (Terraform, Docker and AWS S3) form an integrated toolset that can provision compute and data resources and configure compute and data services in a customizable, efficient manner. The primary objective of the Tailored Research Environments project (a directed study) is to create a web-based product that enables its users to run R, Python and Spark code in Jupyter notebooks on a user-configured computing environment. Importantly, the user's notebooks (code) and datasets are stored separately from the computing environment. In addition, compute resources can be added to and removed from this environment as needed, which will significantly reduce the overall cost of using the product.
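To make the separation of notebooks from compute more concrete, here is a minimal Python sketch that assembles a `docker run` command for a disposable Jupyter container. It assumes the `quay.io/jupyter/all-spark-notebook` image from the Jupyter Docker Stacks project (which bundles Python, R and Spark) and that the notebooks live in a host directory, for example one synced from S3. The function name `jupyter_run_command` and the `/srv/notebooks` path are hypothetical.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumed image: a Jupyter Docker Stacks image bundling Python, R and Spark.
DEFAULT_IMAGE = "quay.io/jupyter/all-spark-notebook"


def jupyter_run_command(notebook_dir: str, host_port: int = 8888,
                        image: str = DEFAULT_IMAGE) -> list[str]:
    """Build a `docker run` argument list for a throwaway Jupyter container.

    The notebooks stay in `notebook_dir` on the host (for example a directory
    synced from S3), so removing the container does not remove code or data.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid TCP port: {host_port}")
    host_dir = Path(notebook_dir).resolve()
    return [
        "docker", "run", "--rm",                # delete the container when it stops
        "-p", f"{host_port}:8888",              # host port -> Jupyter's default port
        "-v", f"{host_dir}:/home/jovyan/work",  # bind-mount the notebook directory
        image,
    ]


if __name__ == "__main__":
    # Print a shell-ready command rather than running it.
    print(shlex.join(jupyter_run_command("/srv/notebooks")))
```

Because `--rm` deletes the container on exit and the notebooks are bind-mounted from the host, the virtual machine can be torn down without losing work, provided the directory has been synced back to S3 first.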
---
The project addresses the problem of providing researchers with individually tailored compute resources in a cost-efficient manner. We propose to use an integrated set of technical tools to easily create tailored research resources in the cloud (AWS, Google, Microsoft and OpenStack). This integrated toolset consists of AWS S3, Docker and Terraform.
AWS S3 provides inexpensive, long-term persistent storage for datasets, programs, results and associated reports. Terraform provides the means to easily specify, record and share the configurations of an array of technical services (primarily virtual machines, but there are others) from the four providers listed above. In addition, these services can be easily created and destroyed, which is the primary reason this solution is cost-efficient. Docker provides an extensive selection of mostly open-source software components that can be run on virtual machines.
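As an illustration of recording a VM specification so it can be shared and recreated, the Python sketch below writes a Terraform configuration in Terraform's JSON syntax (a `*.tf.json` file). It is a minimal sketch, not the project's actual configuration: it assumes the AWS provider, and the AMI ID, resource name `research_vm` and helper names are hypothetical placeholders. The S3 bucket holding code and data is deliberately left out, so that destroying this configuration removes only compute.

```python
import json

# Hypothetical placeholder; a real AMI ID depends on the region and OS image.
EXAMPLE_AMI = "ami-0123456789abcdef0"


def compute_config(instance_type: str, ami: str = EXAMPLE_AMI,
                   region: str = "us-east-1") -> dict:
    """Describe one short-lived AWS VM in Terraform's JSON syntax.

    The S3 bucket with code and data is NOT part of this configuration,
    so `terraform destroy` removes only the compute resources.
    """
    return {
        "provider": {"aws": {"region": region}},
        "resource": {
            "aws_instance": {
                "research_vm": {"ami": ami, "instance_type": instance_type},
            },
        },
        "output": {
            # Terraform evaluates this template after the VM is created.
            "public_ip": {"value": "${aws_instance.research_vm.public_ip}"},
        },
    }


def write_config(config: dict, path: str = "main.tf.json") -> None:
    """Save the configuration so it can be versioned and shared."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

Calling `write_config(compute_config("m5.xlarge"))` and then running `terraform init` and `terraform apply` in that directory would create the VM; `terraform destroy` removes it when the analysis is finished, while the data in S3 is untouched.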
The design of this solution follows a pattern in which the data and code are persistent, but computing resources are short-lived and impermanent. The latter either reduces the cost of using these resources or makes them easier to share.
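The cost argument can be checked with simple arithmetic: storage is paid for all month, while compute is paid only for the hours the VM exists. The Python sketch below compares an always-on VM with one that exists for 40 hours of work. All rates are hypothetical placeholders rather than quoted prices, and per-hour billing is an assumption (providers differ in billing granularity).

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def monthly_cost(vm_rate_per_hour: float, vm_hours: float,
                 storage_gb: float, storage_rate_per_gb_month: float) -> float:
    """Compute cost (assumed billed per VM-hour) plus persistent storage cost."""
    return vm_rate_per_hour * vm_hours + storage_gb * storage_rate_per_gb_month


# Hypothetical rates for illustration only; real prices vary by provider,
# region and instance type.
VM_RATE = 0.40        # dollars per VM-hour
STORAGE_RATE = 0.023  # dollars per GB-month
DATA_GB = 100         # persistent datasets and notebooks

always_on = monthly_cost(VM_RATE, HOURS_PER_MONTH, DATA_GB, STORAGE_RATE)
on_demand = monthly_cost(VM_RATE, 40, DATA_GB, STORAGE_RATE)  # 40 hours of use
print(f"always on: ${always_on:.2f}  on demand: ${on_demand:.2f}")
```

With these placeholder numbers the on-demand month costs about $18 against about $294 for the always-on VM; the storage term is identical in both cases, so the savings come entirely from destroying compute when it is idle.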
Specific objectives of the project are to create a data analysis environment, which:
- is created and managed using "infrastructure as code" techniques
- provides distributed computing to the user
- decouples code, data and compute capacity
The specific objectives of students working on the project are to:
- Create a working product
- Create product documentation
- Develop a working knowledge of infrastructure-as-code techniques
- Develop an understanding of distributed computing techniques and a working knowledge of Spark configuration
- Present a single tutorial and demo of the product to the Data Lab and incorporate feedback into the product and documentation
Project Information Subsection
The project deliverables are:
- Terraform configuration files for creating customized collections of resources
- Documentation on the use of AWS S3, Docker and Terraform to create these resources
- Use of the above tools to provision specialized hardware from the four cloud providers
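Provisioning from several providers means expressing the same hardware request in each provider's vocabulary. The minimal Python sketch below, assuming the official Terraform providers for AWS, Google Cloud, Azure and OpenStack, records which resource type and argument selects the VM size for each. The helper `size_fragment` and the resource name `research_vm` are hypothetical, and the fragments are intentionally incomplete: every provider requires further arguments (image, network, disk) before `terraform apply` would succeed.

```python
# Terraform resource type, and the argument on it that selects the VM size,
# for each of the four providers. Other required arguments are omitted.
SIZE_ARGUMENT = {
    "aws": ("aws_instance", "instance_type"),
    "google": ("google_compute_instance", "machine_type"),
    "azure": ("azurerm_linux_virtual_machine", "size"),
    "openstack": ("openstack_compute_instance_v2", "flavor_name"),
}


def size_fragment(provider: str, name: str, size: str) -> dict:
    """Return a partial Terraform JSON resource that sets only the VM size."""
    try:
        resource_type, argument = SIZE_ARGUMENT[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    return {"resource": {resource_type: {name: {argument: size}}}}


# Example sizes: an AWS instance type, a GCP machine type, an Azure VM size,
# and a flavor name that a given OpenStack site may or may not define.
for provider, size in [("aws", "m5.xlarge"), ("google", "n1-standard-4"),
                       ("azure", "Standard_D4s_v3"), ("openstack", "m1.large")]:
    print(size_fragment(provider, "research_vm", size))
```

Keeping the size as a single per-provider argument lets one user-facing choice ("4 cores, 16 GB") be translated into whichever provider the researcher selects.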
The student profile should include:
- Command line skills on Linux
- An understanding of IP addresses and ports
- Ability to debug and investigate in an unfamiliar environment
Some hands-on experience
Bentley University
175 Forest Street, Waltham, Massachusetts 02452
NE-MGHPCC
09/01/2018
No
Already behind
3
Start date is flexible
The student will learn:
- To use AWS, Google Cloud Platform, Microsoft Azure and OpenStack
- How to use Terraform to provision resources from the above providers
- The Linux command line
- How to run, configure and create Docker containers
The Cyberteam will learn:
- Another solution to make use of private and public virtual compute environments
- To create tailored environments for researchers
Two types of resources are needed:
1.) Public cloud virtual machine resources (i.e. funds to create them)
2.) Private cloud virtual machine resources (OpenStack virtualization of possibly specialized hardware)
I would like to work with at least one Bentley student and would be happy to work with up to three non-Bentley students. Having a Bentley student would help me communicate with the group.