Updated documentations (typos, etc)

a0ef5420 · Alberto Ramos · 37235722 · a0ef5420 · a0ef5420 · a0ef5420
Commit a0ef5420 authored Jul 17, 2020 by Alberto Ramos
Hide whitespace changes
Inline Side-by-side

Showing with 99 additions and 67 deletions

LICENSE.md LICENSE.md +4 -4

README.md README.md +44 -29

docs/src/api.md docs/src/api.md +0 -1

docs/src/tutorial.md docs/src/tutorial.md +51 -33

No files found.
--- a/LICENSE.md
+++ b/LICENSE.md
-> "THE BEER-WARE LICENSE":
-> Alberto Ramos wrote this file. As long as you retain this 
-> notice you can do whatever you want with this stuff. If we meet some 
-> day, and you think this stuff is worth it, you can buy me a beer in 
+> "THE BEER-WARE LICENSE":  
+> Alberto Ramos wrote this file. As long as you retain this  
+> notice you can do whatever you want with this stuff. If we meet some  
+> day, and you think this stuff is worth it, you can buy me a beer in  
 > return. <alberto.ramos@cern.ch>
--- a/README.md
+++ b/README.md
 # ADerrors.jl

-Error propagation and analysis of Monte Carlo data with the (``\Gamma``) method and automatic differentiation in `Julia`
+Error propagation and analysis of Monte Carlo data with the $`\Gamma`$ method and automatic differentiation in `Julia`

-The full documentation of the package is available via the usual
-[Julia `REPL` help
-mode](https://docs.julialang.org/en/v1/stdlib/REPL/#Help-mode-1) and
-online in [HTML format](https://ific.uv.es/~alramos/docs/ADerrors/).
+- [Features](#features)
+- [Installation](#installation)
+- [Tutorial](#tutorial)
+- [Full documentation](#full-documentation)
+- [Performance tips](#performance-tips)
+- [How to cite](#how-to-cite)

-This work is an implementation of several ideas in data analysis. If you use this package for your scientific work, please consider citing:
- U. Wolff, "Monte Carlo errors with less errors".
-  Comput.Phys.Commun. 156 (2004) 143-153. DOI: 10.1016/S0010-4655(03)00467-3
- F. Virotta, "Critical slowing down and error analysis of lattice QCD simulations." PhD thesis.
- Stefan Schaefer, Rainer Sommer, Francesco Virotta, "Critical slowing
-  down and error analysis in lattice QCD simulations". Nucl.Phys.B 845 (2011) 93-119.
- A. Ramos, "Automatic differentiation for error analysis of Monte Carlo data". Comput.Phys.Commun. 238 (2019) 19-35. DOI: 10.1016/j.cpc.2018.12.020. 
- M. Bruno, R. Sommer, In preparation.
+## Features
+
+- **Exact** linear error propagation, even in iterative algorithms.
+- **Exact** linear error propagation in **fit parameters**,
+  **integrals** and **roots** of non linear functions. 
+- Handles data from **any number of ensembles** (i.e. simulations with
+  different parameters).
+- Support for **replicas** (i.e. several runs with the same simulation
+  parameters). 
+- **Irrelgular MC measurements** are handled transparently.

 ## Installation

-The package in not in the general registry. Still one can use the package manager
+The package in not in the general registry. Still one can use the
+package manager. `ADerrors.jl` also depends on `BDIO.jl` that is also
+not registered and should be installed beforehand:
 ```julia
 julia> import Pkg
+(v1.1) pkg> add https://gitlab.ift.uam-csic.es/alberto/bdio.jl
 (v1.1) pkg> add https://gitlab.ift.uam-csic.es/alberto/aderrors.jl
 ```
-## Features
+## Tutorial

- **Exact** linear error propagation, even in iterative algorithms
-  (i.e. error propagation in fit parameters).
- Handles data from **any number of ensembles** (i.e. simulations with
-  different parameters).
- Support for **replicas** (i.e. several runs with the same simulation
-  parameters). 
- Irrelagular MC measurements are handled transparently.
+Please, have a look at the [Getting started](https://ific.uv.es/~alramos/docs/ADerrors/tutorial/) guide.

-## Tutorial
+## Full documentation

-It is better to start with the [Getting started](https://ific.uv.es/~alramos/docs/ADerrors/tutorial/) guide.
+The full documentation of the package is available via the usual
+[Julia `REPL` help
+mode](https://docs.julialang.org/en/v1/stdlib/REPL/#Help-mode-1) and
+online in [HTML format](https://ific.uv.es/~alramos/docs/ADerrors/).

-# Alleviating time to first run
+## Performance tips

-`Julia` is well known for being slow the first time that you run some
-routines. On the first call to a function Julia not only runs the
+`Julia` is well known for being slow the first time that you call a
+function. This is because `Julia` not only runs the 
 code, but also compiles it, making the first call slow.
 This problem can be alleviated in general with
 [PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl). This
@@ -54,14 +58,14 @@ annotate the functions that are compiled
 julia --trace-compile=precompile_aderrors.jl typical.jl
 ```
 Now the functions annotated in `precompile_aderrors.jl` can be
-compiled and included in a `sysimage` that is autmatically loaded
+compiled and included in a `sysimage` that is automatically loaded
 whenever you start Julia
 ```julia
 julia> using PackageCompiler
 julia> PackageCompiler.create_sysimage(:ADerrors; precompile_statements_file="precompile_aderrors.jl", replace_default=true)
 ```

-This will make `ADerrors` from the first call. Obviously you can tune
+This will make `ADerrors` fast from the first call. Obviously you can tune
 the file `typical.jl` to your usage, or add other packages. Please
 note that packages included in the sysimage are locked to the versions
 of the sysimage. If you update `ADerrors` make sure to re-generate the
@@ -69,5 +73,16 @@ sysimage. Probably is better to read [the documentation of
 PackageCompiler](https://julialang.github.io/PackageCompiler.jl/dev/sysimages/)
 in order to fully understand the drawbacks.

+## How to cite
+
+This work is an implementation of several ideas in data analysis. If you use this package for your scientific work, please consider citing:
+- U. Wolff, "Monte Carlo errors with less errors".
+  Comput.Phys.Commun. 156 (2004) 143-153. 
+- F. Virotta, "Critical slowing down and error analysis of lattice QCD simulations." PhD thesis.
+- Stefan Schaefer, Rainer Sommer, Francesco Virotta, "Critical slowing
+  down and error analysis in lattice QCD simulations". Nucl.Phys.B 845 (2011) 93-119.
+- A. Ramos, "Automatic differentiation for error analysis of Monte Carlo data". Comput.Phys.Commun. 238 (2019) 19-35. 
+- M. Bruno, R. Sommer, In preparation.
+


--- a/docs/src/api.md
+++ b/docs/src/api.md
 # API

-
 ## Creating `uwerr` data types

 ```@docs

--- a/docs/src/tutorial.md
+++ b/docs/src/tutorial.md
 # Getting started

 ## Basic tutorial
-`ADerrors.jl` is a package for error propagation and analysis of Monte carlo data. At the core of the package is the `uwreal` data type, that is able to store variables with uncertainties
+`ADerrors.jl` is a package for error propagation and analysis of Monte
+carlo data. At the core of the package is the `uwreal` data type. It
+contains variables with uncertainties. In very simple cases this is
+just a central value and an error
 ```@repl gs
 using ADerrors
 a = uwreal([1.0, 0.1], 1) # 1.0 +/- 0.1
 ```
-It can also store MC data
+But it also supports the case where the uncertainty comes from Monte
+Carlo measurements
 ```@repl gs
-# Generate some correlated data
+# Generate some MC correlated data
 eta  = randn(1000);
 x    = Vector{Float64}(undef, 1000);
 x[1] = 0.0;
@@ -22,7 +26,7 @@ end
 b = uwreal(x.^2, 200)
 c = uwreal(x.^4, 200)
 ```
-Correlations between variables are treated consistently. This requires that each variable that is defined with `uwreal` contains an ensemble `ID` tag. In the previous examples `a` has been measured on ensemble `ID` 1, while both `b` and `c` have been measured on ensemble `ID` 200. This will treat the measurements in `b` and `c` as statistically correlated, while `a` will be uncorrelated with both `b` and `c`.
+Correlations between variables are treated consistently. Each call to `uwreal` contains an ensemble `ID` tag. In the previous examples `a` has been measured on ensemble `ID` 1, while both `b` and `c` have been measured on ensemble `ID` 200. Monte Carlo measurements with the same `ID` tag are assumed to be measured on the same configurations: `b` and `c` above are statistically correlated, while `a` will be uncorrelated with both `b` and `c`. 

 One can perform operations with `uwreal` variables as if there were normal floats
 ```@repl gs
@@ -30,20 +34,20 @@ d = 2.0 + sin(b)/a + 2.0*log(c)/(a+b+1.0)
 ```
 Error propagation is done automatically, and correlations are consistently taken into account. Note that once complex derived `uwreal` variables are defined, as for example `d` above, they will in general depend on more than one ensemble `ID`. 

-In order to perform the error analysis of a variable, one should use the `uwerr` function
+The message "(Error not available... maybe run uwerr)" in all the above statements suggests that newly defined `uwreal` variables do not know their error yet. In order to perform the error analysis of a variable, one should use
+the `uwerr` function
 ```@repl gs
 uwerr(d);
 println("d: ", d)
 ```
-One can get detailed information on the error of a variables 
+A call to `uwerr` will apply the ``\Gamma``-method to MC ensembles. `ADerrors.jl` can give detailed information on the error of a variable (this requires that `uwerr` has been called)
 ```@repl gs
-uwerr(d);
 println("Details on variable d: ")
 details(d)
 ```
-where we can clearly see that there are two ensembles contributing to the uncertainty in `d`. We recognize this as `d` being a function of both `a` (measured on ensemble 1), and `b` and `c`, measured on ensemble 200.
+Here we can see that there are two ensembles contributing to the uncertainty in `d`. We recognize this as `d` being a function of both `a` (measured on ensemble 1), and `b` and `c`, measured on ensemble 200. Most of the error in `d` comes in fact from ensemble 200.

-Note that one does not need to use `uwerr` on a variable unless one is interested in the error on that variable. For example
+Note that one does not need to use `uwerr` on a variable unless one is interested in the error on that variable. For example, continuing with the previous example
 ```@repl gs
 global x = 1.0
 for i in 1:100
@@ -53,13 +57,13 @@ uwerr(x)
 print("Root of d*cos(x) - x:" )
 details(x)
 ```
-determines the root on ``d\cos(x) - x`` by using Newton's method, and propagates the error on `d` into an error on the root. The error on `x` is only determined once, after the 100 iterations are over. 
+determines the root on ``d\cos(x) - x`` by using Newton's method, and propagates the error on `d` into an error on the root. The error on `x` is only determined once, after the 100 iterations are over. This feature is useful becase calls to `uwerr` are typically numerically expensive in comparison with mathematical operations of `uwreal`'s.

 ## Advanced topics

 ### Errors in fit parameters

-`ADerrors.jl` does not provide an interface to perform fits, but once the minima of the ``\chi^2`` has been found, it can help propagating the error from the data to the fit parameters. 
+`ADerrors.jl` does not provide an interface to perform fits, but once the minima of the ``\chi^2`` has been found, it can return the fit parameters as `uwreal` variables.

 Here we are going to [repeat the example](https://github.com/JuliaNLSolvers/LsqFit.jl) of the package `LsqFit.jl` using `ADerrors.jl` for error propagation.
 ```@repl fits
@@ -80,26 +84,38 @@ using LsqFit, ADerrors
 xdata = range(0, stop=10, length=20);
 ydata = Vector{uwreal}(undef, length(xdata));
 for i in eachindex(ydata)
-	ydata[i] = uwreal([model(xdata[i], [1.0 2.0]) + 0.01*getindex(randn(1),1), 0.01], i)
+	# Note that we assign to point i the ID 1000*i
+	ydata[i] = uwreal([model(xdata[i], [1.0 2.0]) + 0.01*getindex(randn(1),1), 0.01], 1000*i)
 	uwerr(ydata[i]) # We will need the errors for the weights of the fit
 end
 p0 = [0.5, 0.5];
 ```
-We are ready to fit `(xdata, ydata)` to our model using `LsqFit.jl`, but for error propagation using `ADerrors.jl` we need the ``\chi^2`` function.
+We are ready to fit `(xdata, ydata)` to our model using `LsqFit.jl`
+```@repl fits
+# Fit the data with LsqFit.jl
+fit = curve_fit(model, xdata, value.(ydata), 1.0 ./ err.(ydata).^2, p0) 
+```
+In order to use `ADerrors.jl` we need the ``\chi^2`` function.
 ```@repl fits
 # ADerrors will need the chi^2 as a function of the fit parameters and 
-# the data. This can be constructed easily from the model above
+# the data. This can be constructed easily from the model function above
 chisq(p, d) = sum( (d .- model(xdata, p)) .^2 ./ 0.01^2)
 ```
-Now we can fit the data and compute the uncertainties in our fit parameters
+A call to `fit_error` will return the fit parameters as a `Vector{uwreal}` and also the expected value of the ``\chi^2`` as a `Float64`
 ```@repl fits
-fit = curve_fit(model, xdata, value.(ydata), 1.0 ./ err.(ydata).^2, p0) # This is LsqFit.jl
-(fitp, cexp) = fit_error(chisq, coef(fit), ydata); # This is error propagation with ADerrors.jl
-uwerr.(fitp);
-
-println("chi^2 / chi^2_exp: ", sum(fit.resid .^2), " / ", cexp, " (dof: ", dof(fit), ")")
+# This is error propagation with ADerrors.jl
+(fitp, cexp) = fit_error(chisq, coef(fit), ydata);
+println("Fit results: \n
+	     chi^2 / chi^2_exp: ", sum(fit.resid .^2), " / ", cexp, " (dof: ", dof(fit), ")")
+
+# fitp are the fit parameters in uwreal
+# format. One can use them as any other uwreal
+# For example to determine the main contribution
+# to its uncertainty
 for i in 1:2
-	println("Fit parameter: ", i, ": ", fitp[i])
+	uwerr(fitp[i])
+	print("Fit parameter: ", i, ": ")
+	details(fitp[i])
 end
 ```

@@ -109,7 +125,7 @@ end

 ### Missing measurements 

-`ADerrors.jl` can deal with observables that are not measured on every configuration, and still take correctly the correlations/autocorrelations into account. Here we show an extreme case where one observable is measured only in the even configurations, while the oher is measured on the odd coficurations.
+`ADerrors.jl` can deal with observables that are not measured on every configuration, and still take correctly the correlations/autocorrelations into account. Here we show an extreme case where one observable is measured only in the even configurations, while the other is measured on the odd configurations.
 ```@repl gaps
 using ADerrors, Plots
 pgfplotsx();
@@ -129,11 +145,13 @@ end
 x2 = uwreal(x[1:2:9999].^2,  1001, collect(1:2:9999),  10000)
 # x^4 only measured on even configurations
 x4 = uwreal(x[2:2:10000].^4, 1001, collect(2:2:10000), 10000)
-
-rat = x2/x4
-uwerr(rat);
 ```
-In this case `uwerr` complains because with the chosen window the variance for ensemble with ID 1001 is negative. Looking at the normalized autocorrelation function we can easily spot the problem
+the variables `x2` and `x4` are normal `uwreal` variables, and one can work with them as with any other variable. Note however that the automatic determination of the summation window can fail in these cases, because the autocorrelation function with missing measurements can be very complicated. For example
+```@repl gaps
+rat = x2/x4 # This is perfectly fine
+uwerr(rat); # Use automatic window: might fail!
+```
+In this case `uwerr` complains because with the automatically chosen window the variance for ensemble with ID 1001 is negative. Let's have a look at the normalized autocorrelation function
 ```@repl gaps
 iw = window(rat, 1001)
 r  = rho(rat, 1001);
@@ -141,19 +159,19 @@ dr = drho(rat, 1001);
 plot(collect(1:100), 
 	r[1:100], 
 	yerr = dr[1:100], 
-	seriestype = :scatter, title = "Chosen Window: " * string(iw))
+	seriestype = :scatter, title = "Chosen Window: " * string(iw), label="autoCF")
 savefig("rat_cf.png") # hide
 ```
 ![rat plot](rat_cf.png)
-The normalized autocorrelation function is oscillating, and the chosen window is 2! We better fix the window to 50 for this case
+The normalized autocorrelation function is oscillating, and the automatically chosen window is 2, producing a negative variance! We better fix the window to 50 for this case
 ```@repl gaps
 wpm = Dict{Int64, Vector{Float64}}()
 wpm[1001] = [50.0, -1.0, -1.0, -1.0]
-uwerr(rat, wpm)
+uwerr(rat, wpm) # repeat error analysis with our choice (window=50)
 println("Ratio: ", rat)
 ```
-
-Note however that this is very observable dependent
+After a visual examination of the autocorrelation function and a reasonable choice of the summation window, the error in `rat` is determined. 
+Note however that it is very difficult to anticipate which observables will have a strange autocorrelation function. For example the product of `x2` and `x4` shows a reasonably well behaved autocorrelation function.
 ```@repl gaps
 prod = x2*x4
 uwerr(prod)
@@ -163,11 +181,11 @@ dr = drho(prod, 1001);
 plot(collect(1:2*iw), 
 	r[1:2*iw], 
 	yerr = dr[1:2*iw], 
-	seriestype = :scatter, title = "Chosen Window: " * string(iw))
+	seriestype = :scatter, title = "Chosen Window: " * string(iw), label="autoCF")
 savefig("prod_cf.png") # hide
 ```
 ![pod plot](prod_cf.png)
-In general error analysis with data with arbitrary gaps is possible, and fully supported in `ADerrors.jl`, but it can be tricky, and certaiinly requires to examine the data carefully.
+In general error analysis with data with arbitrary gaps is possible, and fully supported in `ADerrors.jl`. However in cases where many gaps are present, data analysis can be tricky, and certainly requires some care.

 ### `BDIO` Native format