Interactive notebooks like Jupyter have become increasingly popular in recent years and form the core of many data scientists’ workplaces. Accessed via a web browser, they allow scientists to structure their work easily by combining code and documentation. Yet notebooks often produce isolated, disposable analysis artefacts. Keeping the computation inside those notebooks makes it inconvenient to train models concurrently, expose models to other applications, or retrain models on a schedule.
These issues can be addressed by taking advantage of recent developments in software engineering. Over the past few years, containerization has become the technology of choice for building and deploying applications. A data science platform that offers easy access (via notebooks) together with flexibility and reproducibility (via containerization) combines the best of both worlds and addresses data scientists’ hidden needs.