Libraries

This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.

You can read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/

The book was written and tested with Python 3.5, though older Python versions (including Python 2.7) should work in nearly all cases.

The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

See Index.ipynb for an index of the notebooks available to accompany the text.

The code in the book was tested with Python 3.5, though most (but not all) of it will also work correctly with Python 2.7 and other older Python versions.

The packages I used to run the code in the book are listed in requirements.txt. (Note that some of these exact version numbers may not be available on your platform; you may have to tweak them for your own use.) To install the requirements using conda, run the following at the command line:

To create a stand-alone environment named with Python 3.5 and all the required package versions, run the following:

You can read more about using conda environments in the Managing Environments section of the conda documentation.
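
Since some pinned versions may not be available on every platform, it can help to see which pins differ from what is actually installed before tweaking them. The following is only a minimal sketch, not part of the repository: the helper names `parse_pins` and `find_mismatches` are hypothetical, the example pins are made up rather than copied from the book's requirements.txt, and it is written for Python 3.

```python
import re


def parse_pins(requirements_text):
    """Return {package: version} for lines like `name==version` or `name=version`."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*==?\s*([^\s=]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_mismatches(pins, installed):
    """Compare pinned versions with a {package: version} mapping of installed ones."""
    report = {}
    for name, wanted in pins.items():
        found = installed.get(name)  # None when the package is not installed
        if found != wanted:
            report[name] = (wanted, found)
    return report


# Hypothetical pins for illustration only (not the real requirements.txt).
example = """
# made-up example
numpy==1.11.1
pandas==0.18.1
"""
pins = parse_pins(example)
mismatches = find_mismatches(pins, {"numpy": "1.11.1", "pandas": "0.20.0"})
print(mismatches)
```

Each entry in the result maps a package name to a `(pinned, installed)` pair, which gives a short list of the version numbers that may need adjusting.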

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.

Modified on: Jan 29, 2018
scribe

See blog post

Sample outputs: "A project by Sam Greydanus", "You know nothing Jon Snow" (print), "You know nothing Jon Snow" (cursive)

Sample outputs at decreasing bias: "lowering the bias", "makes the writing messier", "but more random"

For an easy intro to the code (along with equations and explanations), check out the Jupyter notebooks sample.ipynb and dataloader.ipynb.

To get started:

  • Install the dependencies (see below).
  • Download the repo.
  • Navigate to the repo in bash.
  • Download and unzip the folder containing the pretrained models: https://goo.gl/qbH2uw
    • Place it in this directory.

Now you have two options:

  1. Run the sampler in bash:
  2. Open the sample.ipynb jupyter notebook and run cell-by-cell (it includes equations and text to explain how the model works)

This model is trained on the IAM handwriting dataset and was inspired by the model described in the famous 2014 Alex Graves paper. It consists of a three-layer recurrent neural network (LSTM cells) with a Gaussian Mixture Density Network (MDN) cap on top. I have also implemented the attention mechanism from the paper, which allows the network to 'focus' on one character at a time in a sequence as it draws them.
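
To make the MDN cap and the "bias" from the sample captions more concrete, here is a minimal sketch of biased sampling as I understand it from the Graves paper: the standard deviations become `exp(sigma_hat - bias)` and the mixture weights become a softmax of `pi_hat * (1 + bias)`. This is plain Python 3, not this repository's TensorFlow code; the function names `mixture_params` and `sample_offset` are hypothetical, and the end-of-stroke output is left out for brevity.

```python
import math
import random


def mixture_params(pi_hat, sigma_hat, bias=0.0):
    """Turn raw network outputs into mixture weights and standard deviations.

    A higher bias sharpens the weights and shrinks the standard deviations,
    so samples stay closer to the most likely pen offsets.
    """
    scaled = [p * (1.0 + bias) for p in pi_hat]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    pis = [e / total for e in exps]
    sigmas = [(math.exp(sx - bias), math.exp(sy - bias)) for sx, sy in sigma_hat]
    return pis, sigmas


def sample_offset(pis, mus, sigmas, rhos, rng):
    """Draw one (dx, dy) pen offset from a mixture of bivariate Gaussians."""
    k = rng.choices(range(len(pis)), weights=pis)[0]  # pick a component
    (mu_x, mu_y), (s_x, s_y), rho = mus[k], sigmas[k], rhos[k]
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    dx = mu_x + s_x * z1
    # Correlate dy with dx through rho.
    dy = mu_y + s_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return dx, dy


# Made-up raw outputs for a two-component mixture.
pi_hat = [1.0, 0.0]
sigma_hat = [(0.0, 0.0), (0.0, 0.0)]
pis_low, sigmas_low = mixture_params(pi_hat, sigma_hat, bias=0.0)
pis_high, sigmas_high = mixture_params(pi_hat, sigma_hat, bias=1.0)

rng = random.Random(0)
mus = [(0.5, 0.0), (-0.5, 0.0)]
print(sample_offset(pis_high, mus, sigmas_high, [0.0, 0.0], rng))
```

With `bias=0.0` the samples spread out according to the raw predicted variances; raising the bias concentrates them, which matches the captions above: lowering the bias gives messier, more random writing.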

The model at one time step looks like this

Unrolling in time, we get

I've implemented the attention mechanism from the paper:
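
As a rough illustration of that mechanism, the sketch below computes the soft window from the Graves paper for a single time step: each of K Gaussian-shaped components has a weight `alpha`, a width `beta`, and a position `kappa` that can only move forward along the text. It is a plain Python 3 sketch under those assumptions, not this repository's TensorFlow implementation; `window_weights` and `soft_window` are hypothetical names, and character positions are 0-based here.

```python
import math


def window_weights(alpha_hat, beta_hat, kappa_hat, kappa_prev, text_len):
    """Return (phi, kappa): one attention weight per character, and new positions."""
    alpha = [math.exp(a) for a in alpha_hat]  # component importance
    beta = [math.exp(b) for b in beta_hat]    # component sharpness
    # Positions advance by a positive step, so the window slides forward only.
    kappa = [kp + math.exp(k) for kp, k in zip(kappa_prev, kappa_hat)]
    phi = []
    for u in range(text_len):
        phi.append(sum(a * math.exp(-b * (k - u) ** 2)
                       for a, b, k in zip(alpha, beta, kappa)))
    return phi, kappa


def soft_window(phi, one_hot_text):
    """Blend the one-hot character vectors using the attention weights."""
    width = len(one_hot_text[0])
    return [sum(p * c[j] for p, c in zip(phi, one_hot_text)) for j in range(width)]


# Made-up example: a three-character text over a three-symbol alphabet.
text = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
phi1, kappa1 = window_weights([0.0], [0.0], [0.0], [0.0], len(text))
phi2, kappa2 = window_weights([0.0], [0.0], [0.0], kappa1, len(text))
print(soft_window(phi1, text))
print(soft_window(phi2, text))
```

In this example the window peaks on the second character at the first step and on the third character at the next step, which is the 'focus on one character at a time' behaviour described above.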

All code is written in Python 2.7. You will need:

  • NumPy
  • Matplotlib
  • TensorFlow 1.0
  • OPTIONAL: Jupyter (if you want to run sample.ipynb and dataloader.ipynb)
Modified on: Feb 1, 2019
Modified on: Feb 12, 2019

The Team Data Science Process (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program.

The TDSP is comprised of the following key components:

  • A data science lifecycle definition
  • A standardized project structure
  • Infrastructure and resources for data science projects
  • Tools and utilities for project execution

Note: You can follow a complete example of this process using Azure Machine Learning:

This workshop guides you through a series of exercises that teach you to implement the TDSP in your data science project, using only Python in a Notebook. You can change the Setup and Lab cells in this Notebook to use another language or another platform, and to include more or fewer prompts, based on your audience's needs.

Modified on: May 10, 2018
Twitter Cosmos
Modified on: May 4, 2018