This lesson is in the early stages of development (Alpha version)

Introduction to Machine Learning For Engineering Research (ML4ER): Glossary

Key Points

Basics of Machine Learning
  • Machine learning (specifically supervised learning) can be used to model complex materials properties that are hard to obtain experimentally

  • Machine learning models learn relationships and patterns from existing data and use those learned patterns to predict the properties of new materials.

  • A workflow for building machine learning models can be broken down into key steps: Data Cleaning, Feature Generation, Feature Engineering, Model Assessment, Model Optimization, and Model Predictions
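The workflow steps above can be sketched end to end in plain Python. This is a minimal illustration, not MAST-ML's implementation. The synthetic dataset, the interaction feature, and the k-nearest-neighbors model are all hypothetical choices that let the example run with only the standard library. Model optimization (choosing the neighbor count) is left out to keep it short.

```python
import math
import random
import statistics


def make_synthetic_data(n=60, seed=0):
    """Hypothetical stand-in for a materials dataset: descriptors x1, x2 and a
    target y = 3*x1 + x1*x2 + noise. Every 10th target is missing (None)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        x1, x2 = rng.uniform(0, 1), rng.uniform(0, 1)
        y = 3 * x1 + x1 * x2 + rng.gauss(0, 0.05)
        rows.append((x1, x2, None if i % 10 == 0 else y))
    return rows


def clean(rows):
    # Data cleaning: drop rows with a missing target value.
    return [r for r in rows if r[2] is not None]


def generate_features(rows):
    # Feature generation: keep x1, x2 and add an interaction feature x1*x2.
    X = [[x1, x2, x1 * x2] for x1, x2, _ in rows]
    y = [target for _, _, target in rows]
    return X, y


def fit_scaler(X):
    # Feature engineering: per-column mean and standard deviation.
    return [(statistics.mean(col), statistics.pstdev(col) or 1.0) for col in zip(*X)]


def scale(X, params):
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in X]


def knn_predict(X_train, y_train, x, k=5):
    # Model: average target of the k training points nearest to x.
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return statistics.mean(y_train[i] for i in nearest)


def cross_val_rmse(X, y, k=5, folds=5):
    # Model assessment: RMSE over `folds` interleaved train/test splits.
    sq_errors = []
    for f in range(folds):
        test = set(range(f, len(X), folds))
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        params = fit_scaler(X_tr)  # scaling is fit on the training split only
        X_tr_s = scale(X_tr, params)
        for i in sorted(test):
            pred = knn_predict(X_tr_s, y_tr, scale([X[i]], params)[0], k)
            sq_errors.append((pred - y[i]) ** 2)
    return math.sqrt(statistics.mean(sq_errors))


rows = clean(make_synthetic_data())
X, y = generate_features(rows)
rmse = cross_val_rmse(X, y)
print(f"rows after cleaning: {len(rows)}, cross-validated RMSE: {rmse:.3f}")

# Model predictions: refit scaling on all rows, then predict a new point.
params = fit_scaler(X)
new_x = scale([[0.5, 0.5, 0.5 * 0.5]], params)[0]
prediction = knn_predict(scale(X, params), y, new_x)
print(f"predicted property at x1=0.5, x2=0.5: {prediction:.2f}")
```

Each step is a separate function, so any one of them can be swapped out without touching the others.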

Establishing Research Workflows
  • MAST-ML is a machine learning workflow package that lets you customize each key step of a machine learning workflow, enabling rapid iteration of model building and analysis.

  • With the notebook we’ve provided, MAST-ML can be installed on Google Colab.

  • In this activity users have run one example workflow with one set of choices at each step. Users can then change each step to explore other workflow versions and choices.

Comparing Model Types
  • MAST-ML is divided into separate sections, each of which executes a key step in a machine learning workflow. By changing a few lines of code, we can change the settings and configuration of individual steps.

  • In the exercises we demonstrate changes to the model type and hyperparameters. Similar edits in the notebook can change data cleaning, feature generation/engineering, and model assessment.
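Changing the model type while holding every other step fixed can be sketched as passing a different model function into the same assessment loop. This mirrors the idea of editing a few lines in the notebook, but the models here (a mean-value baseline and k-nearest neighbors) and the synthetic data are hypothetical plain-Python stand-ins, not MAST-ML's model classes.

```python
import math
import random
import statistics


def synthetic_data(n=50, seed=1):
    # Hypothetical synthetic dataset: y = sin(3*x) + noise on one descriptor x.
    rng = random.Random(seed)
    X = [[rng.uniform(0, 2)] for _ in range(n)]
    y = [math.sin(3 * x[0]) + rng.gauss(0, 0.1) for x in X]
    return X, y


def mean_baseline(X_train, y_train, x):
    # Ignores the features entirely: always predicts the training mean.
    return statistics.mean(y_train)


def make_knn(k):
    # Returns a k-nearest-neighbors regressor with k fixed.
    def knn(X_train, y_train, x):
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
        return statistics.mean(y_train[i] for i in nearest)
    return knn


def cv_rmse(model, X, y, folds=5):
    # Identical assessment for every model, so only the model changes.
    sq_errors = []
    for f in range(folds):
        test = set(range(f, len(X), folds))
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        sq_errors += [(model(X_tr, y_tr, X[i]) - y[i]) ** 2 for i in sorted(test)]
    return math.sqrt(statistics.mean(sq_errors))


X, y = synthetic_data()
models = {
    "mean baseline": mean_baseline,
    "kNN (k=3)": make_knn(3),
    "kNN (k=15)": make_knn(15),
}
scores = {name: cv_rmse(model, X, y) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name:>14}: CV RMSE = {score:.3f}")
```

Trying another model means adding one entry to `models`; the data handling and assessment code stay untouched. The two kNN entries also show that changing a hyperparameter alone can shift the score.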

Optimizing Model Hyperparameters
  • A sequential grid search of candidate hyperparameter values can progressively search for the best combination of model hyperparameters.

  • Often, multiple grid searches are required to explore a hyperparameter space in sufficient detail.

  • When tuning more than about four hyperparameters, computational cost may make more sophisticated search techniques (e.g., random search or Bayesian optimization) a better choice than grid search.
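A sequential (coarse-to-fine) grid search can be sketched as two passes: a coarse grid to find a promising region, then a finer grid centered on the best coarse value. The sketch assumes a single integer hyperparameter, the neighbor count k of a hypothetical k-nearest-neighbors model on synthetic data, scored by cross-validated RMSE. The last lines count grid points to show why exhaustive grids get expensive: with five candidate values per hyperparameter, n hyperparameters give 5^n combinations, each needing its own cross-validation.

```python
import itertools
import math
import random
import statistics


def synthetic_data(n=60, seed=2):
    # Hypothetical synthetic dataset on one descriptor.
    rng = random.Random(seed)
    X = [[rng.uniform(0, 2)] for _ in range(n)]
    y = [math.sin(3 * x[0]) + rng.gauss(0, 0.1) for x in X]
    return X, y


def knn_predict(X_tr, y_tr, x, k):
    nearest = sorted(range(len(X_tr)), key=lambda i: math.dist(X_tr[i], x))[:k]
    return statistics.mean(y_tr[i] for i in nearest)


def cv_rmse(k, X, y, folds=5):
    # Cross-validated RMSE of kNN for one candidate value of k.
    sq = []
    for f in range(folds):
        test = set(range(f, len(X), folds))
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        sq += [(knn_predict(X_tr, y_tr, X[i], k) - y[i]) ** 2 for i in sorted(test)]
    return math.sqrt(statistics.mean(sq))


def grid_search(candidates, X, y):
    # Score every candidate; return the lowest-error one and all scores.
    scores = {k: cv_rmse(k, X, y) for k in candidates}
    return min(scores, key=scores.get), scores


X, y = synthetic_data()

# Pass 1: coarse grid over a wide range of k.
best_coarse, coarse_scores = grid_search([1, 5, 10, 20, 40], X, y)

# Pass 2: finer grid around the coarse optimum (k must be at least 1).
fine = [k for k in range(best_coarse - 4, best_coarse + 5) if k >= 1]
best_fine, fine_scores = grid_search(fine, X, y)
print(f"coarse best k = {best_coarse}, refined best k = {best_fine}")

# Cost of an exhaustive grid: 5 candidate values for each of n hyperparameters.
for n in range(1, 7):
    combos = sum(1 for _ in itertools.product(range(5), repeat=n))
    print(f"{n} hyperparameters -> {combos} combinations")
```

Because the fine grid contains the coarse optimum, the second pass can only match or improve on the first.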

Ethical Data Cleaning
  • Removing or ignoring data points simply because the model predicts them poorly is not appropriate.

Navigating Roadblocks and Obstacles
Creating a Group Compact
  • Establishing a group compact can enable group participants to engage with each other productively and positively.

Addressing Model Limitations
  • The data used to train an ML model can impart implicit bias into the model’s predictions.

  • Considering how representative the training data is of the materials we want to make predictions for can help us understand where it is appropriate to make predictions.
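One simple way to ask whether a new material is represented by the training data is to compare its distance to the nearest training point with the nearest-neighbor distances seen inside the training set. The sketch below is a heuristic under stated assumptions (numeric features already on one scale, Euclidean distance, a hypothetical 95th-percentile cutoff, a made-up training grid); it is not the method used in this lesson.

```python
import math


def nn_distance(point, others):
    # Euclidean distance from `point` to its closest neighbor in `others`.
    return min(math.dist(point, o) for o in others)


def fit_domain_threshold(X_train, quantile=0.95):
    # Cutoff = the chosen quantile of each training point's distance to its
    # nearest *other* training point (the quantile itself is an assumption).
    d = sorted(
        nn_distance(x, X_train[:i] + X_train[i + 1:]) for i, x in enumerate(X_train)
    )
    return d[min(len(d) - 1, int(quantile * len(d)))]


def in_domain(x, X_train, threshold):
    # A query counts as represented if some training point is about as close
    # to it as training points typically are to each other.
    return nn_distance(x, X_train) <= threshold


# Hypothetical training set: a regular composition grid that covers only
# descriptor values between 0 and 0.5.
X_train = [[0.05 * i, 0.05 * j] for i in range(11) for j in range(11)]
threshold = fit_domain_threshold(X_train)

inside = [0.26, 0.24]  # within the sampled region
outside = [0.9, 0.9]   # far from every training point
print(f"threshold = {threshold:.3f}")
print(f"{inside} in domain? {in_domain(inside, X_train, threshold)}")
print(f"{outside} in domain? {in_domain(outside, X_train, threshold)}")
```

Passing this check only says that similar training data exists; it does not guarantee an accurate prediction, and failing it does not guarantee a poor one.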

Glossary
