Dave Voyles

Hi! I'm Dave Voyles
Software Engineer at Microsoft
Libraries
Getting Started

If you haven't created an Azure Machine Learning Workspace yet, first run the configuration notebook. If you created a Workspace from the Azure portal and launched notebooks from there, your workspace is already configured, and you can proceed to the examples.

Try the example 01.run-experiment to connect to your workspace and run a basic experiment using the Azure Machine Learning Python SDK, and then 02.deploy-web-service to deploy a model as a web service.

Then move on to the more comprehensive examples in the tutorials folder, or explore different features in the how-to-use-azureml folder.

Important: You must select Python 3.6 as the kernel for your notebooks to use the SDK.

Note: The config.json file in this folder was created for you with the details of your Azure Machine Learning service workspace. Both of these notebooks use this file to connect to your workspace. You can also copy this file to other locations where you have code that needs this connection.
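
As a minimal sketch of what such a connection file holds, the snippet below reads a config.json with the Python standard library and checks for the three fields a workspace config typically carries. The field names (subscription_id, resource_group, workspace_name) and the helper load_workspace_config are assumptions for illustration, not something this README specifies. In the notebooks themselves the SDK reads the file for you, for example through Workspace.from_config() in azureml.core.

```python
import json
from pathlib import Path

# Assumed field names for a workspace config file (not confirmed by this README).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Read a workspace config file and verify the expected keys are present.

    Hypothetical helper: it only parses and validates the JSON file and does
    not contact Azure.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config file is missing keys: {missing}")
    return config
```

Calling load_workspace_config("config.json") from the folder that contains the file returns the parsed dictionary, which can help confirm that a copied file is intact before a notebook tries to use it.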

Modified on: Apr 8, 2019

For full documentation for Azure Machine Learning service, visit https://aka.ms/aml-docs.

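To run the notebooks in this repository, use one of these two methods: Azure Notebooks (the first list of steps) or your own Jupyter Notebook server (the second list).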

  1. Import sample notebooks into Azure Notebooks.

  2. Follow the instructions in the 00.configuration notebook to create and connect to a workspace.

  3. Open one of the sample notebooks.

    Make sure the Azure Notebooks kernel is set to Python 3.6 when you open a notebook.

  1. Set up a Jupyter Notebook server and install the Azure Machine Learning SDK.

  2. Clone this repository.

  3. You may need to install other packages for specific notebooks.

    • For example, to run the Azure Machine Learning Data Prep notebooks, install the extra dataprep SDK.
  4. Start your notebook server.

  5. Follow the instructions in the 00.configuration notebook to create and connect to a workspace.

  6. Open one of the sample notebooks.
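
Since individual notebooks may need extra packages, one way to find out what is missing before opening a notebook is to ask Python which modules it can locate. The sketch below uses only the standard library. The helper missing_modules and the module list are hypothetical, and the check only handles top-level module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that Python cannot locate for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical list; replace with the top-level modules a given notebook imports.
needed = ["json", "azureml"]
print("Install before running:", missing_modules(needed))
```

Any name printed in the list would need to be installed into the environment your notebook server uses.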

Note: Looking for automated machine learning samples? For your convenience, you can use an installation script instead of the manual setup steps for the automated ML notebooks. Go to the automl folder README and follow the instructions. The script installs all packages needed for the notebooks in that folder.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Modified on: Nov 2, 2018
Clover Code With

More information can be found on our Microsoft Teams chat. Reach out to Dave to learn more.

Modified on: Jan 24, 2019
Modified on: Apr 23, 2018

Miscellaneous samples, examples, data, and workshops that don't quite fit in another library.

Modified on: May 6, 2019

This video series will teach you how to solve machine learning problems using Python's popular scikit-learn library. It was featured on Kaggle's blog in 2015.

There are 9 video tutorials totaling 4 hours, each with a corresponding Jupyter notebook. Each notebook contains everything you see in its video: code, output, images, and comments.

You can watch the entire series on YouTube, and view all of the notebooks using nbviewer.

There is also a binder linked to this repository, which will allow you to interact with the notebooks online (without downloading them).

Once you complete this video series, I recommend enrolling in my online course, Machine Learning with Text in Python, to gain a deeper understanding of scikit-learn and Natural Language Processing.

  1. What is machine learning, and how does it work? (video, notebook, blog post)

    • What is machine learning?
    • What are the two main categories of machine learning?
    • What are some examples of machine learning?
    • How does machine learning "work"?
  2. Setting up Python for machine learning: scikit-learn and IPython Notebook (video, notebook, blog post)

    • What are the benefits and drawbacks of scikit-learn?
    • How do I install scikit-learn?
    • How do I use the IPython Notebook?
    • What are some good resources for learning Python?
  3. Getting started in scikit-learn with the famous iris dataset (video, notebook, blog post)

    • What is the famous iris dataset, and how does it relate to machine learning?
    • How do we load the iris dataset into scikit-learn?
    • How do we describe a dataset using machine learning terminology?
    • What are scikit-learn's four key requirements for working with data?
  4. Training a machine learning model with scikit-learn (video, notebook, blog post)

    • What is the K-nearest neighbors classification model?
    • What are the four steps for model training and prediction in scikit-learn?
    • How can I apply this pattern to other machine learning models?
  5. Comparing machine learning models in scikit-learn (video, notebook, blog post)

    • How do I choose which model to use for my supervised learning task?
    • How do I choose the best tuning parameters for that model?
    • How do I estimate the likely performance of my model on out-of-sample data?
  6. Data science pipeline: pandas, seaborn, scikit-learn (video, notebook, blog post)

    • How do I use the pandas library to read data into Python?
    • How do I use the seaborn library to visualize data?
    • What is linear regression, and how does it work?
    • How do I train and interpret a linear regression model in scikit-learn?
    • What are some evaluation metrics for regression problems?
    • How do I choose which features to include in my model?
  7. Cross-validation for parameter tuning, model selection, and feature selection (video, notebook, blog post)

    • What is the drawback of using the train/test split procedure for model evaluation?
    • How does K-fold cross-validation overcome this limitation?
    • How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features?
    • What are some possible improvements to cross-validation?
  8. Efficiently searching for optimal tuning parameters (video, notebook, blog post)

    • How can K-fold cross-validation be used to search for an optimal tuning parameter?
    • How can this process be made more efficient?
    • How do you search for multiple tuning parameters at once?
    • What do you do with those tuning parameters before making real predictions?
    • How can the computational expense of this process be reduced?
  9. Evaluating a classification model (video, notebook, blog post)

    • What is the purpose of model evaluation, and what are some common evaluation procedures?
    • What is the usage of classification accuracy, and what are its limitations?
    • How does a confusion matrix describe the performance of a classifier?
    • What metrics can be computed from a confusion matrix?
    • How can you adjust classifier performance by changing the classification threshold?
    • What is the purpose of an ROC curve?
    • How does Area Under the Curve (AUC) differ from classification accuracy?
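
To give a flavor of the last topic, here is a minimal sketch of the metrics that can be computed from a binary confusion matrix. It is written in plain Python rather than with scikit-learn, so the function names and the sample labels are illustrative assumptions and not code from the notebooks.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Derive common metrics from the four confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # also called recall
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Made-up labels for illustration: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

With these made-up labels, accuracy (0.7) is lower than sensitivity (0.75) and higher than precision (0.6), which hints at why accuracy alone can hide how a classifier behaves on each class.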

At the PyCon 2016 conference, I taught a 3-hour tutorial that builds upon this video series and focuses on text-based data. You can watch the tutorial video on YouTube.

Here are the topics I covered:

  1. Model building in scikit-learn (refresher)
  2. Representing text as numerical data
  3. Reading a text-based dataset into pandas
  4. Vectorizing our dataset
  5. Building and evaluating a model
  6. Comparing models
  7. Examining a model for further insight
  8. Practicing this workflow on another dataset
  9. Tuning the vectorizer (discussion)
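
The idea behind topics 2 and 4, representing text as numerical data, can be sketched as a small document-term count matrix. This is a simplified plain-Python stand-in for what a vectorizer such as scikit-learn's CountVectorizer does, not the tutorial's code. The tokenization (lowercasing and splitting on whitespace), the function names, and the sample sentences are assumptions made for illustration.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def to_count_matrix(documents, vocabulary):
    """Return one row per document holding the count of each vocabulary token."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in doc.lower().split():
            if tok in vocabulary:  # tokens unseen when building the vocabulary are ignored
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows


# Hypothetical training sentences.
docs = ["call you tonight", "Call me a cab", "please call me PLEASE"]
vocab = build_vocabulary(docs)
matrix = to_count_matrix(docs, vocab)
print(sorted(vocab))
for row in matrix:
    print(row)
```

Each row can then serve as the feature vector for one document, and new text is converted with the same vocabulary so that its columns line up with the training data.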

Visit this GitHub repository to access the tutorial notebooks and many other recommended resources.

Modified on: Feb 23, 2018