Data Science 2: Beginners Data Science for Python Developers

Every day new data is created. New parts are made and shipped from factories, people continuously tweet, and companies grow and fluctuate, causing major changes in the market. As the volume of data grows, so does the difficulty of processing it. As humans, we can understand complex scenarios, but computers are much better at analyzing large datasets. In this workshop, you will get a glimpse into how we can teach machines to analyze complex scenarios at a much larger scale than we can manage ourselves. After you've cleaned and organized your data, you will have an opportunity to train and test machine learning models, and even publish your predictor online for others to explore.

You do not need any prior experience with data science to attend this workshop. You are likely someone who has about 1 year of coding experience, preferably in Python, though that is not a requirement. You are interested in learning how to use Python libraries to call machine learning models and make predictions on your data.

You should have your own laptop (Windows or Mac) with an Internet browser. You will be using Azure Notebooks, a cloud-based Jupyter Notebooks instance, and Azure Machine Learning Studio. All you will need is a Microsoft Account, which only requires an email address and which you can sign up for at the event.

This workshop is meant to be highly interactive. The instructor will lead you in two interactive teaching styles:

  1. Interactive Lecturing: The majority of content for this workshop is in a Notebook. Though the content will be introduced via PowerPoint, the rest of the workshop will consist of walking you through the Azure Notebooks. During this time, instructors will employ an interactive lecture style, where you will be asked to participate by asking questions and offering up ideas.
  2. Think, Pair, Share: For some of the more complex topics, the instructor will use the "Think, Pair, Share" method. You will be asked a question and given about 45 seconds to think quietly to yourself. During this time it is imperative that you do not discuss with others yet. Then, you will have an opportunity to discuss with the 1-2 people next to you. Make sure you share not just your answer, but why you think it is the answer. Finally, the instructor will ask a few people to share what they discussed with their neighbors.

Notice: Various interactive cues are called out in the Notebooks. These are suggestions and are used at the instructor's discretion.

The primary source of content will be relatively bare Azure Notebooks, in which the instructor will guide you through discovering the different features of Python, NumPy, Pandas, and general data cleaning and manipulation. There is also a folder called "Reference Material" which has all of the same content as the primary notebooks, plus written explanations and additional features not covered in this workshop.
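To give a feel for the kind of data cleaning and manipulation the notebooks explore, here is a minimal sketch using Pandas and NumPy. It is not taken from the workshop notebooks: the dataset, the column names (city, temp_f, temp_c), and the clean function are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data (not from the workshop) with common problems:
# stray whitespace, inconsistent casing, missing values, and duplicates.
raw = pd.DataFrame(
    {
        "city": [" Seattle", "seattle ", "Boston", "Boston", None],
        "temp_f": [68.0, 70.0, np.nan, np.nan, 55.0],
    }
)


def clean(df):
    """Return a cleaned copy of df; the input frame is left unchanged."""
    out = df.copy()
    # Normalize text: strip surrounding whitespace and use title case.
    out["city"] = out["city"].str.strip().str.title()
    # Drop rows that have no city at all.
    out = out.dropna(subset=["city"])
    # Fill missing temperatures with the mean of the remaining readings.
    out["temp_f"] = out["temp_f"].fillna(out["temp_f"].mean())
    # Remove exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Vectorized NumPy arithmetic: convert Fahrenheit to Celsius.
    out["temp_c"] = (out["temp_f"].to_numpy() - 32.0) * 5.0 / 9.0
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling missing values with the column mean is only one possible choice; depending on the data, dropping those rows or using a different estimate may be more appropriate.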

Azure Notebooks is still in Preview. This means that it will occasionally fail. Here are some tips to avoid losing your work:

  • Ensure your work is being saved. In the Jupyter Notebook there is always one of two status messages to the right of the notebook's title, indicating whether or not your work has been saved. Make sure you're noticing that your work is being saved. You should consider checking every 10 minutes or so.
  • Sometimes Notebooks get into a state where the Kernel cannot be started. Sometimes restarting the kernel will work, but often you will have to completely sign out of Azure Notebooks and then sign back in.

Additionally, if you need a refresher on how to code in Python or work with NumPy or Pandas, we recommend you check out the materials from our other Reactor Workshop: Data Science 1: Introduction to Python for Data Science
