Paper notes on "Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices"
The recruitment letter opens with a brief description of exploratory data analysis (EDA):
EDA is an approach to analyzing data, usually undertaken at the beginning of an analysis, to familiarize oneself with a dataset. Typical goals are to suggest hypotheses, assess assumptions, and support selection of further tools, techniques, and datasets. Despite being a necessary part of any analysis, it remains a nebulous art, that is defined by an attitude and a collection of techniques, rather than a systematic methodology.
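As a concrete illustration of "familiarizing oneself with a dataset", here is a minimal first-pass profile of the kind an analyst might compute before forming hypotheses. It is a sketch, not a method from the paper: it assumes the data is a list of dicts with `None` marking missing values, and `profile_columns` is a hypothetical helper name.

```python
from statistics import mean


def profile_columns(rows):
    """Summarize each column of a list-of-dicts dataset (hypothetical helper).

    For every column, count present values, missing values (None), and
    distinct values; if all present values are numbers, also report
    their min, max, and mean.
    """
    # Collect column names in first-seen order across all rows.
    names = list(dict.fromkeys(name for row in rows for name in row))

    profile = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        summary = {
            "present": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Only numeric columns (bools excluded) get range and mean statistics.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if present and is_numeric:
            summary["min"] = min(present)
            summary["max"] = max(present)
            summary["mean"] = mean(present)
        profile[name] = summary
    return profile
```

For example, on `[{"region": "north", "sales": 120}, {"region": "south", "sales": None}, {"region": "north", "sales": 80}]` the `sales` column shows one missing value and a range of 80 to 120, which is the sort of observation that can prompt a hypothesis or challenge an assumption.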
Abstract
The study is based on interviews with 30 professional data analysts.
Highlights of the findings include:
- distinctions between exploration as a precursor to more directed analysis versus truly open-ended exploration;
- confirmation that some analysts see “finding something interesting” as a valid goal of data exploration while others explicitly disavow this goal;
- conflicting views about the role of intelligent tools in data exploration;
- and pervasive use of visualization for exploration, but with only a subset using direct manipulation interfaces.
These findings provide guidelines for future tool development, as well as a better understanding of the meaning of the term “data exploration” based on the words of practitioners “in the wild.”
O’Day and Jeffries [14] studied the analysis stage of the sensemaking process of 15 business analysts and classified 80% of that analysis activity into six main types:
finding trends, making comparisons, aggregation, identifying a critical subset, assessing, and interpreting.
The remainder consisted of cross-referencing, summarizing, and finding evocative visualizations.
The biggest challenges were the lack of documentation, metadata, and provenance.
Some analysts were concerned that their exploration process could produce a biased outcome if they always focused on the same things or started in the same way, especially when the dataset is very large and cannot be examined in full. Here the challenge is developing a strategy that ensures good coverage while avoiding the problems mentioned in Section 4.3.
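One simple way to keep exploration sessions from always starting at the same place is to shuffle row order once and step through disjoint random batches. This is an illustrative sketch, not a strategy reported in the paper: it assumes rows can be addressed by integer index, `exploration_batches` is a hypothetical helper, and it does not by itself address the problems referred to in Section 4.3.

```python
import random


def exploration_batches(n_rows, batch_size, seed=None):
    """Yield disjoint random batches of row indices (hypothetical helper).

    Shuffling once and then slicing means every row appears in exactly one
    batch, so successive sessions cover different rows instead of always
    starting from the head of the data.
    """
    rng = random.Random(seed)  # seeded for reproducible session plans
    order = list(range(n_rows))
    rng.shuffle(order)
    for start in range(0, n_rows, batch_size):
        yield order[start:start + batch_size]
```

With 10 rows and a batch size of 4, this yields batches of 4, 4, and 2 indices whose union is every row; recording the seed makes it possible to report which parts of the data each session actually examined.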