How data science workers work with data
Michael Muller, Ingrid Lange, et al.
CHI 2019
Many data scientists use computational notebooks to test and present their work, because a notebook can weave code and documentation together into a computational narrative and supports rapid iteration on code experiments. However, writing good documentation in a data science notebook is not easy, partly because data scientists lack a corpus of well-documented notebooks to follow as exemplars. To address this challenge, this work looks at Kaggle, a large online community where data scientists host and participate in machine learning competitions, and treats highly voted Kaggle notebooks as a proxy for well-documented notebooks. Through a qualitative analysis at both the notebook level and the markdown-cell level, we find that these notebooks are indeed well documented with reference to previous literature. Our analysis also reveals nine categories of content that data scientists write in their documentation cells, and shows that these documentation cells often interplay with different stages of the data science lifecycle. We conclude the paper with design implications and future research directions.
Dakuo Wang, Michael Muller, et al.
PACM HCI
Shannon Briggs, Michael Perrone, et al.
HST 2019
Dakuo Wang, Haoyu Wang, et al.
PACM HCI