DANFIP Analysis
Seeing the Forest AND the Trees

2/8/98

Abstract: The DANFIP (Distribution Analysis by Non-linear Fitting of Integrated Probabilities) analysis allows extraction of a detailed, multicomponent distribution model from complex data. For a tutorial on the method, see the DANFIP Tutorial and Wampler (1990). A broad range of examples of the use of this method to analyze random samples is discussed. These include single-cell pH measurements during the developmental cycle of the cellular slime mold Dictyostelium discoideum (Furukawa et al., 1990); motion trajectories from molecular dynamics simulations of small electron-transport proteins (Wampler, unpublished); and temperature factor data taken from protein structures in the X-ray structure database (see Wampler, 1997).
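As a rough illustration of the idea named in the acronym, the sketch below fits a two-Gaussian integrated probability model to the empirical cumulative distribution (eCDF) of a small sample. It is a minimal sketch under stated assumptions, not the procedure of Wampler (1990): the synthetic sample, the second component's mean (7.2), the widths, the plotting position (i + 0.5)/n, the starting guesses, and the crude pattern-search optimizer are all hypothetical choices made only to keep the example self-contained.

```python
import math
import random


def ecdf(sample):
    """Return the sorted values and their empirical cumulative probabilities."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting position (i + 0.5)/n for 0-based i: one common convention,
    # assumed here rather than taken from the source.
    return xs, [(i + 0.5) / n for i in range(n)]


def gauss_cdf(x, mu, sigma):
    """Integrated probability of a single Gaussian component."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, params):
    """Two-component model; params = (w, mu1, sigma1, mu2, sigma2)."""
    w, mu1, s1, mu2, s2 = params
    return w * gauss_cdf(x, mu1, s1) + (1.0 - w) * gauss_cdf(x, mu2, s2)


def sse(params, xs, ps):
    """Sum of squared residuals between the model and the eCDF."""
    w, _, s1, _, s2 = params
    if not 0.0 <= w <= 1.0 or s1 <= 0.0 or s2 <= 0.0:
        return float("inf")  # reject physically meaningless parameters
    return sum((mixture_cdf(x, params) - p) ** 2 for x, p in zip(xs, ps))


def fit(xs, ps, start, step=0.2, tol=1e-5):
    """Crude coordinate pattern search, a stand-in for a real
    non-linear least-squares routine."""
    best = list(start)
    best_err = sse(best, xs, ps)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best[:]
                trial[i] += delta
                err = sse(trial, xs, ps)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            step /= 2.0  # refine the search once no single move helps
    return best, best_err


# Hypothetical sample: 50 values drawn from two Gaussian populations.
rng = random.Random(0)
sample = [rng.gauss(6.0, 0.15) for _ in range(20)]
sample += [rng.gauss(7.2, 0.15) for _ in range(30)]

xs, ps = ecdf(sample)
start = [0.5, 5.8, 0.3, 7.0, 0.3]  # rough initial guesses (assumed)
params, err = fit(xs, ps, start)
print("w, mu1, sigma1, mu2, sigma2 =", [round(p, 3) for p in params])
print("sum of squared residuals =", err)
```

Fitting the integrated form means every one of the 50 values contributes a point to the curve, so no bin width has to be chosen as it would for a histogram. In practice a library least-squares routine would replace the pattern search used here.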

This presentation is based on work carried out with a number of very talented people:

Dr. Marcus Fechheimer - Professor of Cellular Biology, University of Georgia
Dr. Ruth Furukawa - Research Scientist, Department of Cellular Biology, University of Georgia
Dr. Elizabeth Neuhaus - Staff Scientist, Department of Biochemistry, University of Georgia
Dr. David Stewart - Academic Professional, University Computer and Network Services, University of Georgia


Table of Contents

DANFIP Analysis

Outline of talk:

Data used

Studies of Slime Mold Development

“Dicty” development cont.

50 cell pH averages

Measurements of “ring-flip” motions in Molecular Dynamics simulations.

Torsion angle measurement

Ring flip measured by pseudo-torsion angle

Temperature factor data from X-ray crystal structures.

The Temperature- or B-factor...

Conventional wisdom: B-factors tend to be high for exterior atoms and low for interior atoms

Review of Distributions & Analysis

A finite data sample drawn from such a distribution does not reflect these probabilities precisely.

Different distributions have different defining parameters, e.g. Lorentzian:

These functions have characteristic shapes, both in the derivative (probability function) form and the integral (integrated probability) form.

A data sample is typically a poor representation of the distribution shape, e.g. a HISTOGRAM of the number of values in each small window (bin).

Histograms of six samples (50 values) from the same distribution:

So with data, we don’t often examine the distribution in detail?

OR

The Histogram is to the Probability Function.....

what the eCDF is to the integrated probability function.

DANFIP Analysis

Knowing components often allows reasonable interpretation of complex data

Knowing components often allows reasonable interpretation of complex data

Now, let's look back at the pH measurement data on Dictyostelium discoideum

The 50 measurements at each pH do not sample a single Gaussian!

Analysis of each set of data shows:

2) The composite average (green circles) varies due to different contributions from the two components!

Contribution of the acid (pH 6) component to the average during development

DANFIP analysis can give better estimates of the population parameters with truncated samples

With the ribonuclease temperature factor data of Tilton et al., just such a truncation is seen in some of the data.

Some data, particularly at low temperatures, are truncated at a value of 0.5

The correction “pulls down” both outliers - NO BREAK IN CURVE

B-factors in Lysozyme - demonstration of using DANFIP with X-ray Data

Experimental issues:

Issues addressed

Are B-factor values normally distributed?

Similar-size samples drawn from a Gaussian distribution show a near-perfect match between mean and median and have a skewness value very near zero.

Using the 4LYTB B-factors as an example, clearly more than one mode.

Are B-factor values correlated within a structure?

Even in this more narrowly defined case, the values are not from a single Gaussian population.

Are B-factor values correlated between structures?

Should B-factors be sub-divided before processing?

So, look at the entire data set.

Each of these curves can be fit with a multiple Gaussian model; the best fit is...

The worst case (fit shown by dark line) is the 3LYTB data...

The derivative of the fit (white line) and the scaled distribution curves of the components compared to the histogram.

By approximate assignment of colors to components (see Wampler, 1998), we can see how the B-factor distributions vary.

Results from Lysozyme structures...

Results from Lysozyme structures...

Conclusions

Conclusions...

Conclusions...

Author: J. E. Wampler

Home Page: this server

References Cited:

Tilton, R. F. Jr., Dewan, J. C., and Petsko, G. A. (1992). "Effects of Temperature on Protein Structure and Dynamics: X-ray Crystallographic Studies of the Protein Ribonuclease-A at Nine Different Temperatures from 98 to 320 K," Biochemistry 31, 2469.

Furukawa, R., Wampler, J. E., and Fechheimer, M. (1990). "Measurement of the Cytoplasmic pH of Dictyostelium discoideum during Development using a Low-Light Level Microspectrofluorometer," J. Cell Biol. 110, 1947-1954.

Wampler, J. E. (1990). "Analysis of the Probability Distribution of Small Random Samples by Non-Linear Fitting of Integrated Probabilities," Analytical Biochemistry 186, 209-219.

Wampler, J. E. (1997). "Distribution Analysis of the Variation of B-factors of X-ray Crystal Structures: Temperature and Structural Variations in Lysozyme," J. Chemical Info. & Computer Sci. 37, 1171-1180.

Young, A. C. M., Dewan, J. C., Nave, C., and Tilton, R. F. (1993). "Comparison of Radiation-Induced Decay and Structure Refinement from X-ray Data Collected from Lysozyme Crystals at Low and Ambient Temperatures," J. Appl. Cryst. 26, 309-319.