2013 NGWA Summit — The National and International Conference on Groundwater

The Untapped Potential of Exploratory and Graphical Data Analysis in Environmental Remediation

Wednesday, May 1, 2013: 2:55 p.m.
Regency East 1 (Hyatt Regency San Antonio)
Steven E. Panter, CGWP, Fleming Lee Shue, Inc.

Inferential statistics (hypothesis and significance testing) originated long before computers. Descriptive statistical techniques (exploratory data analysis and visualization techniques) have traditionally been used as a first step before these formal inferential methods. Early applications of inferential and descriptive statistics were meant to extract meaning from data using a reasonable amount of effort given the tools available at the time.

In contrast, today’s computers and software permit straightforward manipulation of large amounts of data, allowing the analyst to explore data without the drudgery. Interactively “playing around” (creating graphs, plots, and so on) is perhaps the best way to understand your data and extract its real meaning without the distributional assumptions of statistical inference. For many if not most purposes, descriptive techniques are all that is needed to characterize sites effectively and to make decisions such as where and how to treat and monitor. Simple tools, used effectively, reveal the essence of the data in ways that determine the follow-up actions.

This paper illustrates common exploratory data analysis techniques and visual methods: box plots, quantile plots, quantile-quantile plots, ratio and double ratio plots, stem-and-leaf plots, and scatter plots, all used to identify patterns or trends in various types of environmental data. Also shown are a simple but effective means of assessing the distribution of data (the 5-point data characterization method) and an example of a user-friendly statistical and graphical software package (STATA).
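As a rough illustration of this kind of quick distribution check, the sketch below computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum). It assumes that the "5-point data characterization method" refers to this standard summary, which the abstract does not confirm. The function name `five_number_summary` and the sample concentrations are hypothetical, and the quartile convention (Python's "inclusive" method) is only one of several in common use.

```python
import statistics


def five_number_summary(values):
    """Return min, quartiles, and max for a list of numeric observations.

    Assumption: this is the standard five-number summary, used here as a
    stand-in for the abstract's "5-point data characterization method".
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # Quartiles with the "inclusive" convention (data treated as the full
    # population, endpoints included); other conventions differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


if __name__ == "__main__":
    # Hypothetical concentrations (made-up values, not from the paper).
    sample = [0.5, 1.2, 3.4, 8.9, 150.0]
    for name, value in five_number_summary(sample).items():
        print(f"{name:>6}: {value}")
```

A median that sits much closer to the minimum than to the maximum, as in the made-up sample, suggests a right-skewed distribution, the sort of pattern a box plot or quantile plot would then make visible.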

In one particularly striking example, a plot prepared using exploratory and graphical data analysis showed that contamination from a dense non-aqueous phase liquid (DNAPL) was present in soils above the confining stratum, contrary to what had been assumed; the implications for treatment were dramatic. While this paper focuses mainly on analytical data, the methods can be applied to field parameter data as well.



Steven E. Panter, CGWP, Fleming Lee Shue, Inc.

Steven E. Panter, CGWP, is a Senior Consultant and hydrogeologist with Fleming Lee Shue, Inc. of New York. He has over 25 years of experience in site evaluation and remediation of soils and groundwater. He is the developer of RemMetrik, a remediation process that combines mass estimates and contaminant targeting with subsurface pressure waves to promote remediation. Panter’s experience encompasses work on a wide variety of sites, including manufactured gas plants, power plants, chemical and manufacturing facilities, and urban settings with a broad range of contaminants. His experience also includes sediment investigations and environmental forensic work.