Definition & Method – Hex, Bugs and More Physics | Emre S. Tasci
http://www.emresururi.com/physics — a blog about physics, computation, computational physics and materials... (feed snapshot: Wed, 15 Jan 2014)

Coordination Numbers
http://www.emresururi.com/physics/?p=91 | Thu, 03 Sep 2009

Recently we have been working on liquid alloys, and for our work we need the coordination number distribution. It's something of a gray area to talk about the coordination number of atoms in a liquid, but it's better than nothing for understanding the properties, and even more. I'd like to keep a log of the tools I've recently coded/used, so that I can refer back to it if a similar situation occurs. Also, there may be others conducting similar research, so the tools introduced here may be helpful to them.

Recently, while searching recent publications on liquid alloys, I found a very good and resourceful article on liquid Na-K alloys by Dr. J.-F. Wax ("Molecular dynamics study of the static structure of liquid Na-K alloys", Physica B 403 (2008) 4241-48 | doi:10.1016/j.physb.2008.09.014). In this paper, among other properties, he also presents the partial pair distribution functions. A partial pair distribution function is the averaged probability of encountering an atom at a distance r from another atom. In the solid phase, due to the symmetry of the crystal structure, you have discrete values instead of a continuous distribution and everything is crystal clear, but I guess that's the price we pay for dealing with liquids. 8)

Partial pair distribution functions are very useful in the sense that they let you derive the bond lengths between the constituent atom species, corresponding to the first minima of the plots. Hence, one can read off the neighborhood ranges just by looking at the plots.
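As a toy illustration of the first step behind such plots, here is a Python sketch of my own (not Dr. Wax's code) that simply histograms all pair distances; a real g(r) would additionally normalise each bin by its shell volume and the density:

```python
from itertools import combinations
from math import dist

def pair_distance_histogram(coords, r_max, n_bins):
    """Bin all pair distances below r_max -- the raw ingredient of a
    pair distribution function (normalisation omitted in this sketch)."""
    counts = [0] * n_bins
    dr = r_max / n_bins
    for a, b in combinations(coords, 2):
        d = dist(a, b)
        if d < r_max:
            counts[int(d / dr)] += 1
    return counts

# Four atoms on a unit square: four distances of 1, two of sqrt(2)
atoms = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
print(pair_distance_histogram(atoms, 2.0, 8))  # -> [0, 0, 0, 0, 4, 2, 0, 0]
```

The first populated bin already tells you the nearest-neighbour distance, which is exactly the kind of reading-off described above.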

I’ve sent a mail to Dr. Wax explaining my research topics and my interest and asking for his research data to which he very kindly and generously replied by providing his data.

He sent me a big file with numerous configurations (coordinates of the atoms) following each other. The first -and simplest- task was to split the file into separate configuration files via this bash script:

#!/bin/bash
# Script Name: split_POS4.sh
# Splits the NaK conf files compilation POS4_DAT into separate
# conf files.
# Emre S. Tasci <e.tasci@tudelft.nl>
#                                              01/09/09

linestart=2
for (( i = 1; i <= 1000; i++ ))
do
    echo $i
    lineend=$(expr $linestart + 2047)
    confname=$(printf "%04d" $i)
    echo -e "3\n2048" > $confname
    sed -n "${linestart},${lineend}p" POS4_DAT >> $confname
    linestart=$(expr $lineend + 1)
    cat $confname | qhull i TO $confname.hull
done

(The compilation was of 1000 configurations, each with 2048 atoms)

As you can see on the line just before the "done" closer, I -proudly- used the Qhull software to calculate the convex hull of each configuration. A convex hull is the -hopefully minimal- convex shape that encloses your whole system. So, for each configuration I now had 2 files: "xxxx" (where xxxx is the id/number of the configuration set) storing the coordinates (preceded by 3 and 2048, corresponding to the dimension and the number of atoms, for qhull to parse & process), and "xxxx.hull", generated by qhull, containing the vertex list of each facet (preceded by the total number of facets).

A facet is (in 3D) the plane formed by the vertex points. Imagine a cube with an atom at each corner and 3 more atoms inside. We can say that our system consists of 11 atoms, and the minimal shape whose vertices are selected from the system's atoms and which covers the whole is the cube itself, its vertices being the 8 atoms at the corners. Now add an atom floating over the top face. The convex hull that wraps our new system would no longer be a 6-faceted, 8-vertexed cube but a 9-faceted, 9-vertexed polyhedron. I need to know the convex hull geometry in order to correctly calculate the coordination number distributions (details will follow shortly).
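To make the vertex-selection idea concrete, here is a 2D analogue in Python. Qhull works in 3D, but Andrew's monotone-chain algorithm shows the same principle in 2D; this is my own sketch, not part of the toolchain:

```python
def convex_hull_2d(points):
    """Andrew's monotone-chain convex hull (a 2D analogue of what
    qhull does in 3D). Returns hull vertices in counter-clockwise order;
    interior points are discarded."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# A square with two interior atoms: only the 4 corners are hull vertices
system = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull_2d(system))  # -> [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Adding a point outside the square would grow the hull by one vertex, exactly as in the cube example above.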

The aforementioned two files look like:

sururi@husniya hull_calcs $ head 0001
3
2048
 0.95278770E+00 0.13334565E+02 0.13376155E+02
 0.13025256E+02 0.87618381E+00 0.20168993E+01
 0.12745038E+01 0.14068998E-01 0.22887323E+01
 0.13066590E+02 0.20788591E+01 0.58119183E+01
 0.10468218E+00 0.58523640E-01 0.64288678E+01
 0.12839412E+02 0.13117012E+02 0.79093881E+01
 0.11918105E+01 0.12163854E+02 0.10270868E+02
 0.12673985E+02 0.11642538E+02 0.10514597E+02
sururi@husniya hull_calcs $ head 0001.hull
180
568 1127 1776
1992 104 1551
956 449 1026
632 391 1026
391 632 1543
391 956 1026
966 632 1026
966 15 1039
632 966 1039

Each set of 3 numbers in the xxxx.hull file gives the ids/numbers of the atoms forming a facet, i.e., its vertices.

To calculate the coordination number distribution, you pick each atom and then simply count the other atoms within the cut-off distance, which depends on your atom's and the neighboring atom's species. After you do the counting for every atom, you average the results and voilà! there is your CN distribution. But here's the catch: in order to count properly, you must make sure that your reference atom's cut-off sphere stays within your system; otherwise you'll be undercounting. Suppose your reference atom is one of the atoms at the corners of the cube: it will (roughly) lack 7/8 of the neighbouring atoms it would have if it were the center atom of the specimen/sample. Remember that we are studying a continuous liquid and trying to model the system via ensembles of samples, so we should omit the atoms whose neighborhoods extend outside of our system. Since neighborhoods are defined through the bond length, we should only include the atoms whose distance to the outside is at least the cut-off radius. The "outside" is the region beyond the convex hull, and it is designated by the facets (planes).
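The counting step just described can be sketched in Python; this is an illustrative version of the logic only (not the actual PHP code further below), using the species-pair cut-offs quoted later in the post:

```python
from math import dist

# Species-pair cut-offs from the partial pair distribution minima (reduced units)
RCUT = {("Na", "Na"): 1.38, ("Na", "K"): 1.65,
        ("K", "Na"): 1.65, ("K", "K"): 1.52}

def coordination_counts(atoms, inside_ids, rcut):
    """For each atom kept 'inside' the hull, count its neighbours within
    the species-dependent cut-off. atoms maps id -> (species, (x, y, z))."""
    counts = {}
    for i in inside_ids:
        s_i, p_i = atoms[i]
        n = 0
        for j, (s_j, p_j) in atoms.items():
            if j != i and dist(p_i, p_j) < rcut[(s_i, s_j)]:
                n += 1
        counts[i] = n
    return counts

atoms = {0: ("Na", (0.0, 0.0, 0.0)),
         1: ("Na", (1.0, 0.0, 0.0)),   # d = 1.00 < 1.38 -> neighbour
         2: ("K",  (0.0, 2.0, 0.0))}   # d = 2.00 > 1.65 -> not a neighbour
print(coordination_counts(atoms, [0], RCUT))  # -> {0: 1}
```

Note that only the reference atoms are restricted to the inside set; atoms near the hull still count as neighbours of interior atoms, which is why they are filtered rather than deleted.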

From the xxxx.hull file we have the ids of the vertex atoms, and their coordinates are stored in the xxxx file. To define the plane equation

ax + by + cz + d = 0

we will use the three vertex points, say p1, p2 and p3. From these 3 points, we calculate the normal n as:

n = (p2 - p1) x (p3 - p1)

then the plane equation (hence the coefficients a, b, c, d) can be derived by comparing with the point-normal form:

a(x - x0) + b(y - y0) + c(z - z0) = 0,  where (a, b, c) = n

Here, (x0, y0, z0) are the coordinates of any one of the vertex points, which gives d = -(a*x0 + b*y0 + c*z0). And finally, the distance D between a point (x1, y1, z1) and a plane (a, b, c, d) is given by:

D = |a*x1 + b*y1 + c*z1 + d| / sqrt(a^2 + b^2 + c^2)

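For reference, the same plane-coefficient and point-to-plane distance computations can be sketched in plain Python (an illustrative translation, not part of the original toolchain):

```python
from math import sqrt

def plane_coefficients(p1, p2, p3):
    """Coefficients (a, b, c, d) of the plane through p1, p2, p3,
    via the normal n = (p2 - p1) x (p3 - p1)."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    a = u[1] * v[2] - u[2] * v[1]
    b = u[2] * v[0] - u[0] * v[2]
    c = u[0] * v[1] - u[1] * v[0]
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d

def distance_to_plane(p, coeffs):
    """Unsigned distance from point p to the plane ax + by + cz + d = 0."""
    a, b, c, d = coeffs
    return abs(a * p[0] + b * p[1] + c * p[2] + d) / sqrt(a * a + b * b + c * c)

# The z = 0 plane; the point (0, 0, 2) sits at distance 2 from it
coeffs = plane_coefficients((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(distance_to_plane((0, 0, 2), coeffs))  # -> 2.0
```

An atom is kept only if this distance exceeds the cut-off radius for every facet of the hull.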
Since vector operations are involved, I thought it better to do this job in Octave, and wrote the following script, which calculates the coefficients of the facets and then sets out to find the points whose distances to all the facets are greater than the given cut-off radius:

sururi@husniya trunk $ cat octave_find_atoms_within.m
DummyA = 3;
%conf_index=292;

% Call from the command line like :
%   octave -q --eval "Rcut=1.65;conf_index=293;path='$(pwd)';" octave_find_atoms_within.m
% hence specifying the conf_index at the command line.

% Emre S. Tasci <e.tasci@tudelft.nl>
%
% From the positions and vertices file, calculates the plane
% equations (stored in "facet_coefficients") and then
% filters the atoms with respect to their distances to these
% facets. Writes the output to
% "hull_calcs/xxxx_insiders.txt" file
%                                              03/09/09

conf_index_name = sprintf("%04d",conf_index);
%clock
clear facet_coefficients;
facet_coefficients = [];
global p v facet_coefficients
fp = fopen(strcat(path,"/hull_calcs/",conf_index_name));
k = fscanf(fp,"%d",2); % 1st is dim (=3), 2nd is number of atoms (=2048)
p = fscanf(fp,"%f",[k(1),k(2)]);
p = p';
fclose(fp);
%p = load("/tmp/del/delco");
fp = fopen(strcat(path,"/hull_calcs/",conf_index_name,".hull"));
k = fscanf(fp,"%d",1);% number of facets
v = fscanf(fp,"%d",[3,k]);
v = v';
fclose(fp);
%v = load("/tmp/del/delver");

function [a,b,c,d,e] = getPlaneCoefficients(facet_index)
    global p v
    % Get the 3 vertices
    p1 = p(v(facet_index,1)+1,:); % The +1s come from the fact 
    p2 = p(v(facet_index,2)+1,:); % that qhull enumeration begins
    p3 = p(v(facet_index,3)+1,:); % from 0 while Octave's from 1

    %printf("%d: %f %f %f %f %f %f %f %f %f\n",facet_index,p1,p2,p3);

    % Calculate the normal
    n = cross((p2-p1),(p3-p1));

    % Calculate the coefficients of the plane :
    % ax + by + cz +d = 0
    a = n(1);
    b = n(2);
    c = n(3);
    d = -(dot(n,p1));
    e = norm(n);
    if(e==0)
        printf("%f\n",p1);
        printf("%f\n",p2);
        printf("%f\n",p3);
        printf("%d\n",facet_index);
    endif
endfunction

function dist = distanceToPlane(point_index,facet_index)
    global p facet_coefficients
    k = facet_coefficients(facet_index,:);
    n = [k(1) k(2) k(3)];
    dist = abs(dot(n,p(point_index,:)) + k(4)) / k(5);
endfunction

for i = [1:size(v)(1)]
    [a,b,c,d,e] = getPlaneCoefficients(i);
    facet_coefficients = [ facet_coefficients ; [a,b,c,d,e]];
endfor

%Rcut = 1.65; % Defined from command line calling
% Find the points that are not closer than Rcut
inside_atoms = [];
for point = [1:size(p)(1)]
%for point = [1:100]
    inside=true;
    for facet = [1:size(v)(1)]
    %for facet = [1:10]
        %printf("%d - %d : %f\n",point,facet,dist);
        if (distanceToPlane(point,facet)<Rcut)
            inside = false;
            break;
        endif
    endfor
    if(inside)
        inside_atoms = [inside_atoms point-1];
    endif
endfor

fp = fopen(strcat(path,"/hull_calcs/",conf_index_name,"_insiders.txt"),"w");
fprintf(fp,"%d\n",inside_atoms);
fclose(fp);
%clock

This script is run from within the following PHP code, which then takes the filtered atoms and does the counting using them:
<?PHP
/* Emre S. Tasci <e.tasci@tudelft.nl>                    *
 *
 * parse_configuration_data.php
 *
 * Written in order to parse the configuration files
 * obtained from J.F. Wax in order to calculate the 
 * Coordination Number distribution. Each configuration 
 * file contains 2048 atoms' coordinates starting at the 
 * 3rd line. There is also a convex hull file generated 
 * using qhull (http://www.qhull.com) that contains the 
 * indexes (starting from 0) of the atoms that form the 
 * vertices of the convex hull. Finally there is the 
 * IP.dat file that identifies each atom (-1 for K; 1 for 
 * Na -- the second column is the relative masses).
 *
 *                                              02/09/09 */

/*
# if present, remove the previous run's results file
if(file_exists("results.txt"))
    unlink("results.txt");
 */

error_reporting(E_ALL ^ E_NOTICE);

if(!file_exists("results.txt"))
{
    $fp = fopen("results.txt","w");
    fwrite($fp,"CONF#"."\tNa".implode("\tNa",range(1,20))."\tK".implode("\tK",range(1,20))."\n");
    fclose($fp);
}

# Support for command line variable passing:
for($i=1;$i<sizeof($_SERVER["argv"]);$i++)
{
        list($var0,$val0) = explode("=",$_SERVER["argv"][$i]);
        $_GET[$var0] = $val0;
}

if($_GET["configuration"])
    calculate($_GET["configuration"]);
else
    exit("Please specify a configuration number [php parse_configuration_data.php configuration=xxx]\n");

function calculate($conf_index)
{
    # Define the atoms array
    $arr_atoms = Array();

    # Get the information from the files
    parse_types($arr_atoms);
    $refs = get_vertices($conf_index);
    parse_coordinates($arr_atoms,$conf_index);

    # Define the Rcut-off values (obtained from the partial pair distribution minima)
    $Rcut_arr = Array();
    $Rscale = 3.828; # To convert reduced distances to Angstrom (if needed)
    $Rscale = 1;     # Overridden: keep working in reduced units
    $Rcut_arr["Na"]["Na"] = 1.38 * $Rscale;
    $Rcut_arr["Na"]["K"]  = 1.65 * $Rscale;
    $Rcut_arr["K"]["Na"]  = 1.65 * $Rscale;
    $Rcut_arr["K"]["K"]   = 1.52 * $Rscale;
    $Rcut = $Rcut_arr["Na"]["K"]; # Taking the maximum one as the Rcut for safety

    /*
    # We have everything we need, so proceed to check 
    # if an atom is at safe distance wrt to the vertices

    # Define all the ids
    $arr_test_ids = range(0,2047);
    # Subtract the ref atoms' ids
    $arr_test_ids = array_diff($arr_test_ids, $refs);
    sort($arr_test_ids);
    //print_r($arr_test_ids);

    $arr_passed_atom_ids = Array();
    # Check each atom against refs
    for($id=0, $size=sizeof($arr_test_ids); $id<$size; $id++)
    {
        //echo $id."\n";
        $arr_atoms[$arr_test_ids[$id]]["in"] = TRUE;
        foreach ($refs as $ref)
        {
            $distance = distance($arr_atoms[$arr_test_ids[$id]],$arr_atoms[$ref]);
            //echo "\t".$ref."\t".$distance."\n";
            if($distance < $Rcut)
            {
                $arr_atoms[$arr_test_ids[$id]]["in"] = FALSE;
                break;
            }
        }
        if($arr_atoms[$arr_test_ids[$id]]["in"] == TRUE)
            $arr_passed_atom_ids[] = $arr_test_ids[$id];
    }
     */

    # Run the octave script file to calculate the inside atoms
    if(!file_exists("hull_calcs/".sprintf("%04d",$conf_index)."_insiders.txt"))
    exec("octave -q --eval \"Rcut=".$Rcut.";conf_index=".$conf_index.";path='$(pwd)';\" octave_find_atoms_within.m");

    # Read the file generated by Octave
    $arr_passed_atom_ids = file("hull_calcs/".sprintf("%04d",$conf_index)."_insiders.txt",FILE_IGNORE_NEW_LINES);

    $arr_test_ids = range(0,2047);
    $arr_test_ids = array_diff($arr_test_ids, $refs);
    sort($arr_test_ids);
    for($i=0, $size=sizeof($arr_test_ids);$i<$size;$i++)
        $arr_atoms[$arr_test_ids[$i]]["in"]=FALSE;

    # Begin checking every passed atom
    foreach($arr_passed_atom_ids as $passed_atom_id)
    {
        $arr_atoms[$passed_atom_id]["in"] = TRUE;
        for($i=0, $size=sizeof($arr_atoms); $i<$size; $i++)
        {
            $distance = distance($arr_atoms[$passed_atom_id],$arr_atoms[$i]);
            //echo $passed_atom_id."\t---\t".$i."\n";
            if($distance < $Rcut_arr[$arr_atoms[$passed_atom_id]["id"]][$arr_atoms[$i]["id"]] && $distance>0.001)
                $arr_atoms[$passed_atom_id]["neighbour_count"] += 1;
        }
    }

    # Do the binning
    $CN_Na = Array();
    $CN_K  = Array();
    for($i=0, $size=sizeof($arr_atoms); $i<$size; $i++)
    {
        if(array_key_exists("neighbour_count",$arr_atoms[$i]))
        {
            ${"CN_".$arr_atoms[$i]['id']}[$arr_atoms[$i]["neighbour_count"]] += 1;
        }
    }

    ksort($CN_Na);
    ksort($CN_K);

    //print_r($arr_atoms);
    //print_r($CN_Na);
    //print_r($CN_K);

    # Report the results
    $filename = "results/".sprintf("%04d",$conf_index)."_results.txt";
    $fp = fopen($filename,"w");
    fwrite($fp, "### Atoms array ###\n");
    fwrite($fp,var_export($arr_atoms,TRUE)."\n");
    fwrite($fp, "\n### CN Distribution for Na ###\n");
    fwrite($fp,var_export($CN_Na,TRUE)."\n");
    fwrite($fp, "\n### CN Distribution for K ###\n");
    fwrite($fp,var_export($CN_K,TRUE)."\n");

    # Percentage calculation:
    $sum_Na = array_sum($CN_Na);
    $sum_K  = array_sum($CN_K);
    foreach($CN_Na as $key=>$value)
        $CN_Na[$key] = $value * 100 / $sum_Na;
    foreach($CN_K  as $key=>$value)
        $CN_K[$key]  = $value * 100 / $sum_K;
    //print_r($CN_Na);
    //print_r($CN_K);
    fwrite($fp, "\n### CN Distribution (Percentage) for Na ###\n");
    fwrite($fp,var_export($CN_Na,TRUE)."\n");
    fwrite($fp, "\n### CN Distribution (Percentage) for K ###\n");
    fwrite($fp,var_export($CN_K,TRUE)."\n");
    fclose($fp);

    # Write the summary to the results file
    $fp = fopen("results.txt","a");
    for($i=1;$i<=20;$i++)
    {
        if(!array_key_exists($i,$CN_Na))
            $CN_Na[$i] = 0;
        if(!array_key_exists($i,$CN_K))
            $CN_K[$i] = 0;
    }
    ksort($CN_Na);
    ksort($CN_K);
    fwrite($fp,sprintf("%04d",$conf_index)."\t".implode("\t",$CN_Na)."\t".implode("\t",$CN_K)."\n");
    fclose($fp);

}

function parse_types(&$arr_atoms)
{
    # Parse the types
    $i = 0;
    $fp = fopen("IP.DAT", "r");
    while(!feof($fp))
    {
        $line = fgets($fp,4096);
        $auxarr = explode(" ",$line);
        if($auxarr[0]==-1)
            $arr_atoms[$i]["id"] = "Na";
        else
            $arr_atoms[$i]["id"] = "K";
        $i++;
    }
    fclose($fp);
    array_pop($arr_atoms);
    return 0;
}

function get_vertices($conf_index)
{
    $arr_refs = Array();

    # Get the ids of the vertices of the convex hull
    $filename = "hull_calcs/".sprintf("%04d",$conf_index).".hull";
    $fp = fopen($filename, "r");

    # Bypass the first line
    $line = fgets($fp,4096);

    while(!feof($fp))
    {
        $line = fgets($fp,4096);
        $auxarr = explode(" ",$line);
        array_pop($auxarr);
        $arr_refs = array_merge($arr_refs, $auxarr);
    }
    fclose($fp);
    // $arr_refs = array_unique($arr_refs); # This doesn't lose the keys
    $arr_refs = array_keys(array_count_values($arr_refs)); # But this does.
    return $arr_refs;
}

function parse_coordinates(&$arr_atoms, $conf_index)
{
    # Parse the coordinates
    $i = 0;
    $filename = "hull_calcs/".sprintf("%04d",$conf_index);
    $fp = fopen($filename, "r");

    # Bypass the first two lines
    $line = fgets($fp,4096);
    $line = fgets($fp,4096);

    while(!feof($fp))
    {
        $line = fgets($fp,4096);
        $arr_atoms[$i]["coords"] = explode(" ",$line);
        array_shift($arr_atoms[$i]["coords"]);
        $i++;
    }
    fclose($fp);
    array_pop($arr_atoms);
    return 0;
}

function distance(&$atom1,&$atom2)
{
    # Calculate the distance between the two atoms
    $x1 = $atom1["coords"][0];
    $x2 = $atom2["coords"][0];
    $y1 = $atom1["coords"][1];
    $y2 = $atom2["coords"][1];
    $z1 = $atom1["coords"][2];
    $z2 = $atom2["coords"][2];
    return sqrt(pow($x1-$x2,2) + pow($y1-$y2,2) + pow($z1-$z2,2));
}

?>


The code above generates a "results/xxxx_results.txt" file containing all the information obtained, in array format, and also appends the relevant configuration's CN distribution summary to the "results.txt" file. The systems can be visualized using the output generated by the following plotter.php script:
<?PHP
/* Emre S. Tasci <e.tasci@tudelft.nl>                    *
 * Parses the results/xxxx_results.txt file and generates 
 * an XYZ file in which the atoms are labeled as 
 * vertex atoms, close-to-vertex atoms and inside atoms.
 *                                              02/09/09 */

# Support for command line variable passing:
for($i=1;$i<sizeof($_SERVER["argv"]);$i++)
{
        list($var0,$val0) = explode("=",$_SERVER["argv"][$i]);
        $_GET[$var0] = $val0;
}

if($_GET["configuration"])
    $conf_index = $_GET["configuration"];
else 
    exit("Please specify a configuration number [php plotter.php configuration=xxx]\n");

$filename = sprintf("%04d",$conf_index);

# Get the atom array from the results file. Since results file 
# contains other arrays as well, we slice the file using sed for the 
# relevant part
$last_arr_line = exec('grep "### CN Distribution" results/'.$filename.'_results.txt -n -m1|sed "s:^\([0-9]\+\).*:\1:gi"');
exec('sed -n "2,'.($last_arr_line-1).'p" results/'.$filename.'_results.txt > system_tmp_array_dump_file');
$str=file_get_contents('system_tmp_array_dump_file');
$atom_arr=eval('return '.$str.';');
unlink('system_tmp_array_dump_file');

# Now that we have the coordinates, we iterate over the atoms to check 
# their characteristics and designate them in the XYZ file.
$fp = fopen("system_".$filename.".xyz","w");
fwrite($fp,sizeof($atom_arr)."\n");
fwrite($fp,"Generated by plotter.php\n");
for($i=0, $size=sizeof($atom_arr);$i<$size;$i++)
{
    if(!array_key_exists("in",$atom_arr[$i]))
        $atomlabel = "Mg";#Vertices
    elseif($atom_arr[$i]["in"]==TRUE)
        $atomlabel = $atom_arr[$i]["id"];#Inside atoms
    else
        $atomlabel = "O";#Close-vertice atoms
    fwrite($fp,$atomlabel."\t".implode("\t",$atom_arr[$i]["coords"]));
}
fclose($fp);
?>

which generates a "system_xxxx.xyz" file, ready to be viewed in standard molecule-viewing software. It labels the vertex atoms (of the convex hull, remember? 8) as "Mg" and the atoms close to them as "O" for easy viewing. A sample parsed file looks like this:
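The XYZ layout the script produces is simple enough to sketch; here is a hypothetical Python analogue of the PHP writer (the labels and coordinates below are made-up example values):

```python
def xyz_lines(atoms, comment="generated for viewing"):
    """Build the text of a minimal XYZ file: the atom count, a comment
    line, then one 'Label  x  y  z' line per atom."""
    lines = [str(len(atoms)), comment]
    for label, (x, y, z) in atoms:
        lines.append(f"{label}\t{x:.6f}\t{y:.6f}\t{z:.6f}")
    return "\n".join(lines) + "\n"

# Hull vertices become "Mg", near-hull atoms "O", the rest keep their species
print(xyz_lines([("Mg", (0.95, 13.33, 13.38)), ("Na", (6.40, 6.40, 6.40))]))
```

Relabeling by role rather than species is just a trick to get distinct colors and radii out of a standard molecule viewer.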

The filtered case of a sample Na-K system

The big orange atoms are the omitted vertex atoms; the small red ones are the atoms critically close to the facets, hence also omitted. You can see the purple and yellow atoms which have passed the test sitting happily inside; the counting is done using them.. 8)

Binning is easy (with eyes closed – misunderstanding all you see..)
http://www.emresururi.com/physics/?p=73 | Mon, 02 Jun 2008

When you have some nice discrete random variables, or more correctly, some sets of nice discrete random variables, and given that they behave well (like the ones you encounter all the time in textbook examples), calculating the entropies and then moving on to deducing the mutual information between these sets is never difficult. But things don't happen that well in real situations: they tend to get nastier. The methods you employ for binary-valued sets are useless when you have 17000 different types of data among 25000 data points.

As an example, consider these two sets:

A={1,2,3,4,5}

B={1,121,43,12,100}

which one would you say is more ordered? A? Really? But how? Each of them has 5 different values, which would amount to the same entropy for both. Yes, you can define two new sets based on these, containing the differences between adjacent members:

A’ = {1,1,1,1}

B’ = {120,-78,-31,88}

and the answer is obvious. This is really reassuring, except that it doesn't have to be this obvious, and furthermore, you don't usually get such small sets to try out variations on.
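The point can be checked numerically with the plug-in (maximum-likelihood) entropy estimator; a small Python sketch of my own:

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    """Plug-in Shannon entropy (in bits) of a discrete sample."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

A = [1, 2, 3, 4, 5]
B = [1, 121, 43, 12, 100]
dA = [b - a for a, b in zip(A, A[1:])]   # A' = {1,1,1,1}
dB = [b - a for a, b in zip(B, B[1:])]   # B' = {120,-78,-31,88}

print(shannon_entropy(A), shannon_entropy(B))    # equal: log2(5) bits each
print(shannon_entropy(dA), shannon_entropy(dB))  # 0 bits vs log2(4) = 2 bits
```

A and B are indistinguishable to the naive estimator; only after the differencing step does the order in A show up as zero entropy.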

For this problem of mine, I have been devising a method and I think it's working well. It's really simple: you just bin (down) your data according to some criteria and, if it goes well, at the end you have classified your data points into a much smaller number of bins.
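As a toy stand-in for that idea (not the actual method, which the post does not spell out), here is the simplest possible equal-width binning in Python:

```python
def bin_down(values, n_bins):
    """Map each value to a bin index, collapsing many distinct values
    into at most n_bins classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # guard against all-equal values
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

data = [1, 2, 3, 50, 51, 52, 99, 100]
print(bin_down(data, 3))  # -> [0, 0, 0, 1, 1, 1, 2, 2]
```

Eight distinct values collapse into three classes, after which entropy and mutual-information estimates become tractable; the art, of course, is in choosing the binning criteria rather than a fixed width.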

Today, I re-found the works of Dominik Endres and his colleagues. Their binning method is called Bayesian Binning (you can find detailed information at Endres' personal page) and it looks pretty solid. I'll still be focusing on my own method (which I humbly call Tracking the Eskimo, and which, compared to Bayesian Binning, seems very primitive), but it's always nice to see that someone else had a similar problem and worked his way through it. There is also the other thing which I mention from time to time: when you treat your information as data, you get to have nice tools that were developed not by researchers from your field of interest but from totally different disciplines who also at one time visited the 101010 world. 8) (It turns out that Dominik Endres and his fellows are from Psychology. Another thing is that I had begun thinking about my method after seeing a method used in cytometry (the scientific branch which counts the cells in blood flow – or something like that 8) applications...)

References:

D. Endres and P. Földiák, "Bayesian bin distribution inference and mutual information", IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 3766-3779, November 2005. DOI: 10.1109/TIT.2005.856954

D. Endres and P. Földiák, "Exact Bayesian bin classification: a fast alternative to Bayesian classification and its application to neural response analysis", Journal of Computational Neuroscience, 24(1): 24-35, 2008. DOI: 10.1007/s10827-007-0039-5

D. Endres, M. Oram, J. Schindelin and P. Földiák, "Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms", pp. 393-400, Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, 2008. http://books.nips.cc/nips20.html

Quote of the day: On Neural Networks and Bayesian Statistics
http://www.emresururi.com/physics/?p=68 | Wed, 07 May 2008

"The two ideas of neural network modelling and Bayesian statistics might seem uneasy bed-fellows. Neural networks are non-linear parallel computational devices inspired by the structure of the brain. 'Backpropagation networks' are able to learn, by example, to solve prediction and classification problems. Such a neural network is typically viewed as a black box which finds by hook or by crook an incomprehensible solution to a poorly understood problem. In contrast, Bayesian methods are characterized by an insistence on coherent inference based on clearly defined axioms; in Bayesian circles, 'ad hockery' is a capital offense. Thus Bayesian statistics and neural networks might seem to occupy opposite extremes of the data modelling spectrum."

David J.C. MacKay, "Probable Networks and Plausible Predictions – A Review of Practical Bayesian Methods for Supervised Neural Networks"

Specification of an extensible and portable file format for electronic structure and crystallographic data
http://www.emresururi.com/physics/?p=67 | Tue, 06 May 2008

X. Gonze, C.-O. Almbladh, A. Cucca, D. Caliste, C. Freysoldt, M.A.L. Marques, V. Olevano, Y. Pouillon and M.J. Verstraete

Comput. Mater. Sci. (2008) doi:10.1016/j.commatsci.2008.02.023

Abstract

In order to allow different software applications, in constant evolution, to interact and exchange data, flexible file formats are needed. A file format specification for different types of content has been elaborated to allow communication of data for the software developed within the European Network of Excellence “NANOQUANTA”, focusing on first-principles calculations of materials and nanosystems. It might be used by other software as well, and is described here in detail. The format relies on the NetCDF binary input/output library, already used in many different scientific communities, that provides flexibility as well as portability across languages and platforms. Thanks to NetCDF, the content can be accessed by keywords, ensuring the file format is extensible and backward compatible.

Keywords

Electronic structure; File format standardization; Crystallographic datafiles; Density datafiles; Wavefunctions datafiles; NetCDF

PACS classification codes

61.68.+n; 63.20.dk; 71.15.Ap

Problems with the binary representations:

  1. lack of portability between big-endian and little-endian platforms (and vice-versa), or between 32-bit and 64-bit platforms;
  2. difficulties to read the files written by F77/90 codes from C/C++ software (and vice-versa)
  3. lack of extensibility, as one file produced for one version of the software might not be readable by a past/forthcoming version

The NetCDF (Network Common Data Form) library solves these issues. (Alas, it is binary, and in the examples presented at http://www.unidata.ucar.edu/software/netcdf/examples/files.html, the binary CDF representations seem to be larger in size than the corresponding CDL metadata representations.)

The idea of standardization of file formats is not new in the electronic structure community [X. Gonze, G. Zerah, K.W. Jakobsen, and K. Hinsen. Psi-k Newsletter 55, February 2003, pp. 129-134. URL: <http://psi-k.dl.ac.uk>] (This is a must-read article, by the way. It's a very insightful and concerned overview of the good-natured developments in the atomistic/molecular software domain. It is purifying in its attempt to produce something good..). However, it proved difficult to achieve without a formal organization gathering code developers involved in different software projects, with a sufficient incentive to realize effective file exchange between such software.

HDF (Hierarchical Data Format) is also an alternative. NetCDF is simpler to use if the data formats are flat, while HDF has definite advantages if one is dealing with hierarchical formats. Typically, we will need to describe many different multi-dimensional arrays of real or complex numbers, for which NetCDF is an adequate tool.

Although a data specification might be presented irrespective of the library actually used for the implementation, such freedom might lead to implementations using incompatible file formats (like NetCDF and XML, for instance). This possibility would in effect block the expected standardization gain. Thus, as a part of the standardization, we require future implementations of our specification to rely on the NetCDF library only. (An alternative to independent schemas is presented as a mandatory format, which is against the whole idea of development. The reason for NetCDF being incompatible with XML lies solely in the inflexibility/inextensibility of the former format. Although such a readily defined and built format is advantageous considering the huge numerical data and parameters that ab initio software uses, it is very immobile and unnecessary for compound and element data.)

The compact representation brought by NetCDF can be by-passed in favour of text encoding (very agreeable given the usage purposes: XML-type extensible schema usage is much more adequate for structure/composition properties). We are aware of several standardization efforts [J. Junquera, M. Verstraete, X. Gonze, unpublished] and [J. Mortensen, F. Jollet, unpublished. Also check the minutes of the discussion on XML format for PAW setups], relying on XML, which emphasize addressing by content to represent such atomic data.

Four types of data are distinguished:

  • (A) The actual numerical data (which defines whether a file contains wavefunctions, a density, etc), for which a name must have been agreed in the specification.
  • (B) The auxiliary data that is mandatory to make proper usage of the actual numerical data of A-type. The name and description of this auxiliary information is also agreed.
  • (C) The auxiliary data that is not mandatory to make proper usage of the A-type numerical data, but for which a name and description has been agreed in the specification.
  • (D) Other data, typically code-dependent, whose availability might help the use of the file for a specific code. The name of these variables should be different from the names chosen for agreed variables of A–C types. Such type D data might even be redundant with type A–C data.

The NetCDF interface adapts the dimension ordering to the programming language used. The notation here is C-like, i.e. row-major storage, the last index varying the fastest. In FORTRAN, a similar memory mapping is obtained by reversing the order of the indices. (So, the ordering/reverse-ordering is handled by the interface/library)
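The row-major vs. column-major point can be made concrete with the flat-offset formulas (my illustration, not from the paper): the same memory cell is addressed with the index order reversed between the two conventions.

```python
def offset_row_major(i, j, n_rows, n_cols):
    """C-like storage: the last index (j) varies fastest in memory."""
    return i * n_cols + j

def offset_col_major(i, j, n_rows, n_cols):
    """Fortran-like storage: the first index (i) varies fastest."""
    return j * n_rows + i

# Element (1, 2) of a 3x4 array lands at different flat offsets:
print(offset_row_major(1, 2, 3, 4))  # -> 6
print(offset_col_major(1, 2, 3, 4))  # -> 7
```

By reversing the order of the indices on the Fortran side, both languages walk the same bytes in the same order, which is exactly the mapping the NetCDF interface performs for you.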

Concluding Remarks

We presented the specifications for a file format relying on the NetCDF I/O library, with content related to electronic structure and crystallographic data. This specification takes advantage of all the interesting properties of NetCDF-based files, in particular portability and extensibility. It is designed for both serial and distributed usage, although the latter characteristic was not presented here.

Several software packages in the Nanoquanta and ETSF [15] context can produce or read this file format: ABINIT [16] and [17], DP [18], GWST, SELF [19], V_Sim [20]. In order to further encourage its use, a library of Fortran routines [5] has been set up and is available under the GNU LGPL licence.

Additional Information:

Announcement for a past event
( http://www.tddft.org/pipermail/fsatom/2003-February/000004.html )

CECAM – psi-k – SIMU joint Tutorial

1) Title : Software solutions for data exchange and code gluing.

Location : Lyon. Dates : 8-10 October 2003

Purpose : In this tutorial, we will teach software tools and standards that have recently emerged for the exchange of data (text and binary) and the gluing of codes : (1) Python, as a scripting language, and its interfaces with C and FORTRAN ; (2) XML, a standard for representing structured data in text files ; (3) netCDF, a library and file format for the exchange and storage of binary data, and its interfaces with C, Fortran, and Python.

Organizers : X. Gonze gonze@pcpm.ucl.ac.be

K. Hinsen hinsen@cnrs-orleans.fr

2) Scientific content

Recent discussions, related to the CECAM workshop on "Open Source Software for Microscopic Simulations" (June 19-21, 2002), to the GRID concept (http://www.gridcomputing.com), and to the future Integrated Infrastructure Initiative proposal linked to the European psi-k network (http://psi-k.dl.ac.uk), have made clear that one challenge for the coming years is the ability to establish standards for accessing codes, transferring data between codes, and testing codes against each other, so as to become able to "glue" them (this being facilitated by the Free Software concept).

In the present tutorial, we would like to teach three "software solutions" to face this challenge : Python, XML and netCDF.

Python is now the de facto "scripting language" standard in the computational physics and chemistry community. XML (eXtensible Markup Language) is a framework for building mark-up languages, allowing one to set up self-describing documents, readable by humans and machines. netCDF allows binary files to be portable across platforms. It is not our aim to cover all possible solutions to the above-mentioned challenges (e.g. PERL, Tcl, or HDF), but these three have proven suitable for atomic-scale simulations, in the framework of leading projects like CAMPOS (http://www.fysik.dtu.dk/campos), MMTK (http://dirac.cnrs-orleans.fr/MMTK), and GROMACS (http://www.gromacs.org). Other software projects like ABINIT (http://www.abinit.org) and PWSCF (http://www.pwscf.org – in the DEMOCRITOS context), among others, have made clear their interest in these. All of these software solutions can be used without having to buy a licence.

Tentative program of the tutorial. Lectures in the morning, hands-on training in the afternoon.

1st day: 2h Python basics; 1h Interface: Python/C or FORTRAN; 1h XML basics. Afternoon: training with Python, and interfaces with C and FORTRAN.

2nd day: 2h Python: object oriented (+ an application to GUI and Tk); 1h Interface: Python/XML; 1h Interface: XML + C or FORTRAN. Afternoon: training with XML + interfaces.

3rd day: 1h Python: numerical; 1h netCDF basics; 1h Interface: netCDF/Python; 1h Interface: netCDF/C or FORTRAN. Afternoon: training with netCDF + interfaces.

3) List of lecturers

K. Hinsen (Orleans, France), organizer; X. Gonze (Louvain-la-Neuve, Belgium), organizer; K. Jakobsen (Lyngby, Denmark), instructor; J. Schiotz (Lyngby, Denmark), instructor; J. Van Der Spoel (Groningen, The Netherlands), instructor; M. van Loewis (Berlin, Germany), instructor

4) Number of participants : around 20. Most of the participants should be PhD students, postdocs or young permanent scientists involved in code development. It is assumed that the attendants have a good knowledge of UNIX, and of C or FORTRAN.

Our budget will allow contributing to travel and local expenses of up to 20 participants.

XML and NetCDF:

from Specification of file formats for NANOQUANTA/ETSF Specification – www.etsf.eu/research/software/nq_specff_v2.1_final.pdf

Section 1. General considerations concerning the present file format specifications.

One has to consider the set of data to be included in each of the different types of files separately from their representation. Concerning the latter, one encounters simple text files, binary files, XML-structured files, NetCDF files, etc. It was already decided previously (Nanoquanta meeting, Maratea, Sept. 2004) to evolve towards formats that deal appropriately with the self-description issue, i.e. XML and NetCDF. The inherent flexibility of these representations will also allow specific versions of each type of file to evolve progressively, refining earlier working proposals. The same direction has been adopted by several groups of code developers that we know of.

Information on NetCDF and XML can be obtained from the official Web sites,

http://www.unidata.ucar.edu/software/netcdf/ and

http://www.w3.org/XML/

There are numerous other presentations of these formats on the Web, or in books.

The elaboration of file formats based on NetCDF has advanced a lot during the Louvain-la-Neuve mini-workshop. There have also been some remarks about XML.

Concerning XML :

(A) The XML format is best adapted to the structured representation of relatively small quantities of data, as it is not compressed.

(B) It is a very flexible format, but hard to read in Fortran (no problem in C, C++ or Python). Recently, Alberto Garcia has set up an XMLF90 library of routines to read XML from Fortran: http://lcdx00.wm.lc.ehu.es/~wdpgaara/xml/index.html . Other efforts exist in this direction: http://nn-online.org/code/xml/
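
By contrast, reading XML from Python takes only a few lines with the standard library. A minimal sketch (the tag names below are invented for illustration and are not part of any agreed specification):

```python
import xml.etree.ElementTree as ET

# Illustrative (made-up) XML snippet for structural data; the tag names
# are NOT part of the ETSF/Nanoquanta specification.
doc = """
<structure>
  <cell units="bohr">10.26 10.26 10.26</cell>
  <atom species="Na" position="0.0 0.0 0.0"/>
  <atom species="K"  position="0.5 0.5 0.5"/>
</structure>
"""

root = ET.fromstring(doc)
species = [a.get("species") for a in root.findall("atom")]
cell = [float(x) for x in root.find("cell").text.split()]

print(species)   # ['Na', 'K']
print(cell)      # [10.26, 10.26, 10.26]
```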

Concerning NetCDF

  • (A) Several groups of developers inside NQ have already a good experience of using it, for the representation of binary data (large files).
  • (B) Although there is no clear advantage of NetCDF over HDF (another possibility for large binary files), this experience inside the NQ network is the main reason for preferring it. Incidentally, the NetCDF and HDF projects plan to merge (this might take a few years, though).
  • (C) File size limitations of NetCDF exist, see appendix D, but should be overcome in the future.

Thanks to the flexibility of NetCDF, the content of a NetCDF file format suitable for use by NQ software might be of four different types :

(1) The actual numerical data (that defines a file for wavefunctions, or a density file, etc …), whose NetCDF description would have been agreed.

(2) The auxiliary data that are mandatory to make proper usage of the actual numerical data. The NetCDF description of these auxiliary data should also be agreed.

(3) The auxiliary data that are not mandatory, but whose NetCDF description has been agreed, in a larger context.

(4) Other data, typically code-dependent, whose existence might help the use of the file for a specific code.

References:
[5] URL: <http://www.etsf.eu/index.php?page=tools>.

[15] URL: <http://www.etsf.eu/>.

[16] X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M. Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez, J.-Y. Raty and D.C. Allan, Comput. Mater. Sci. 25 (2002), pp. 478–492.

[17] X. Gonze, G.-M. Rignanese, M. Verstraete, J.-M. Beuken, Y. Pouillon, R. Caracas, F. Jollet, M. Torrent, G. Zérah, M. Mikami, Ph. Ghosez, M. Veithen, J.-Y. Raty, V. Olevano, F. Bruneval, L. Reining, R. Godby, G. Onida, D.R. Hamann and D.C. Allan, Zeit. Kristall. 220 (2005), pp. 558–562.

[18] URL: <http://dp-code.org>.

[19] URL: <http://www.bethe-salpeter.org>.

[20] URL: <http://www-drfmc.cea.fr/sp2m/L_Sim/V_Sim/index.en.html>.

A (Lame) Proof of the Probability Sum Rule http://www.emresururi.com/physics/?p=54 http://www.emresururi.com/physics/?p=54#respond Fri, 21 Dec 2007 16:23:03 +0000 http://www.emresururi.com/physics/?p=54 Q: Prove the Probability Sum Rule, that is:

\[
P\left( B \right) = \sum\limits_{j = 1}^k {P\left( {B|A = v_j } \right)P\left( {A = v_j } \right)}
\]

(where A is a random variable with arity (~dimension) k) using Axioms:

\[
0 \leqslant P\left( A \right) \leqslant 1,\quad P\left( {\text{True}} \right) = 1,\quad P\left( {\text{False}} \right) = 0 \qquad \left[ 1 \right]
\]

\[
P\left( {A \vee B} \right) = P\left( A \right) + P\left( B \right) - P\left( {A \wedge B} \right) \qquad \left[ 2 \right]
\]

and assuming:

\[
P(A = v_i \wedge A = v_j ) = \left\{ {\begin{array}{*{20}c}
   0 & {{\text{if }}i \ne j}  \\
   {P\left( {A = v_i } \right)} & {{\text{if }}i = j}  \\
\end{array}} \right. \qquad \left[ 3 \right]
\]

\[
P(A = v_1 \vee A = v_2 \vee \ldots \vee A = v_k ) = 1 \qquad \left[ 4 \right]
\]


 

We will first prove the following equalities to make use of them later in the actual proof:

\[
P\left( {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_i } \right) = \sum\limits_{j = 1}^i {P\left( {A = v_j } \right)} \qquad \left[ 5 \right]
\]

\[
P\left( {B \wedge \left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_i } \right]} \right) = \sum\limits_{j = 1}^i {P\left( {B \wedge A = v_j } \right)} \qquad \left[ 6 \right]
\]

Proof of [5]:

Using [2], we can write:

\[
P\left( {A = v_i \vee A = v_j } \right) = P\left( {A = v_i } \right) + P\left( {A = v_j } \right) - P\left( {A = v_i \wedge A = v_j } \right)
\]

where, from the assumption [3], we have

\[
P\left( {A = v_i \vee A = v_j } \right) = P\left( {A = v_i } \right) + P\left( {A = v_j } \right) \qquad \left( {i \ne j} \right)
\]

Using this fact, we can now expand the LHS of [5] as:

\[
\begin{gathered}
  P\left( {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_i } \right) = P\left( {A = v_1 } \right) + P\left( {A = v_2 } \right) + \ldots + P\left( {A = v_i } \right) \hfill \\
  = \sum\limits_{j = 1}^i {P\left( {A = v_j } \right)} \hfill \\
\end{gathered}
\]

Proof of [6]:

Using the distributive property, we can write the LHS of [6] as:

\[
\begin{gathered}
  P\left( {B \wedge \left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_i } \right]} \right) \hfill \\
  = P\left( {\left[ {B \wedge A = v_1 } \right] \vee \left[ {B \wedge A = v_2 } \right] \vee \ldots \vee \left[ {B \wedge A = v_i } \right]} \right) \hfill \\
\end{gathered}
\]

from [2], it is obvious that:

\[
\begin{gathered}
  P\left( {\left[ {B \wedge A = v_i } \right] \vee \left[ {B \wedge A = v_j } \right]} \right) \hfill \\
  = P(B \wedge A = v_i ) + P(B \wedge A = v_j ) - \underbrace {P\left( {\left[ {B \wedge A = v_i } \right] \wedge \left[ {B \wedge A = v_j } \right]} \right)}_{0\,\left( {i \ne j} \right)} \hfill \\
  = P(B \wedge A = v_i ) + P(B \wedge A = v_j ) \hfill \\
\end{gathered}
\]

then using [5], we can rearrange and rewrite the LHS of [6] as:

\[
\begin{gathered}
  P\left( {B \wedge \left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_i } \right]} \right) \hfill \\
  = P(B \wedge A = v_1 ) + P(B \wedge A = v_2 ) + \ldots + P(B \wedge A = v_i ) \hfill \\
  = \sum\limits_{j = 1}^i {P\left( {B \wedge A = v_j } \right)} \hfill \\
\end{gathered}
\]

Final Proof :

Before proceeding, I will include two assumptions about the Universal Set (this part is the main reason why the proof is lame, by the way).

Define the universal set U as the set that includes all possible values of some random variable X. All other sets are subsets of U, and for an arbitrary set A, we assume that:

\[
\begin{gathered}
  A \wedge U = A \hfill \\
  A \vee U = U \hfill \\
\end{gathered}
\]

Furthermore, we will assume that a condition that includes all possible outcomes (values) of a variable A is equivalent to the universal set. Let A have arity k; then:

\[
\begin{gathered}
  \left( {\text{some condition } X \wedge \underbrace {\left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_k } \right]}_U} \right) = \left( {\text{some condition } X} \right) \hfill \\
  \left( {\text{some condition } X \vee \underbrace {\left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_k } \right]}_U} \right) = \left( U \right) \hfill \\
\end{gathered}
\]

You can think of the condition (U) as the definite TRUE, i.e. probability 1.

Now we can begin the proof of the equality

\[
P\left( B \right) = \sum\limits_{j = 1}^k {P\left( {B|A = v_j } \right)P\left( {A = v_j } \right)}
\]

using the Bayes Rule

\[
P\left( {A|B} \right) = \frac{{P\left( {A \wedge B} \right)}}{{P\left( B \right)}}
\]

we can convert the conditional relation into an intersection relation and rewrite the RHS as:

\[
\begin{gathered}
  \sum\limits_{j = 1}^k {\frac{{P\left( {B \wedge A = v_j } \right)}}{{P\left( {A = v_j } \right)}}P\left( {A = v_j } \right)} \hfill \\
  = \sum\limits_{j = 1}^k {P\left( {B \wedge A = v_j } \right)} \hfill \\
\end{gathered}
\]

using [6], this is nothing but:

\[
\begin{gathered}
  P\left( {B \wedge \underbrace {\left[ {A = v_1 \vee A = v_2 \vee \ldots \vee A = v_k } \right]}_U} \right) \hfill \\
  = P\left( B \right) \hfill \\
\end{gathered}
\]
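
The sum rule just proved can also be sanity-checked numerically on any finite joint distribution. A minimal Python sketch (the joint table values are arbitrary, chosen only so that they sum to 1):

```python
# Sanity check of P(B) = sum_j P(B | A = v_j) P(A = v_j) on a small
# discrete joint distribution over A (arity 3) and a boolean B.
joint = {                      # P(A = v, B = b); values are illustrative
    ("v1", True): 0.10, ("v1", False): 0.20,
    ("v2", True): 0.25, ("v2", False): 0.05,
    ("v3", True): 0.15, ("v3", False): 0.25,
}

# Marginals P(A = v) and P(B)
P_A = {v: sum(p for (a, _), p in joint.items() if a == v)
       for v in {a for a, _ in joint}}
P_B = sum(p for (_, b), p in joint.items() if b)

# RHS: sum over the arity of A of P(B | A=v) P(A=v),
# with P(B | A=v) = P(B and A=v) / P(A=v) as in the Bayes rule above.
rhs = sum((joint[(v, True)] / P_A[v]) * P_A[v] for v in P_A)

assert abs(P_B - rhs) < 1e-12   # the sum rule holds
```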

Some “trivial” derivations http://www.emresururi.com/physics/?p=52 http://www.emresururi.com/physics/?p=52#respond Tue, 04 Dec 2007 11:49:53 +0000 http://www.emresururi.com/physics/?p=52  Information Theory, Inference, and Learning Algorithms by David MacKay, Exercise 22.5:

A random variable x is assumed to have a probability distribution that is a mixture of two Gaussians,

\[
P\left( {x|\mu _1 ,\mu _2 ,\sigma } \right) = \left[ {\sum\limits_{k = 1}^2 {p_k \frac{1}{{\sqrt {2\pi \sigma ^2 } }}\exp \left( { - \frac{{\left( {x - \mu _k } \right)^2 }}{{2\sigma ^2 }}} \right)} } \right]
\]
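
The mixture density above is easy to verify numerically. A minimal Python sketch (the parameter values are illustrative, not from the exercise) checking that the density integrates to 1:

```python
import math

def mixture_pdf(x, mu1, mu2, sigma, p=(0.5, 0.5)):
    """P(x | mu1, mu2, sigma): equal-weight mixture of two Gaussians."""
    return sum(pk * math.exp(-(x - mu)**2 / (2 * sigma**2))
               / math.sqrt(2 * math.pi * sigma**2)
               for pk, mu in zip(p, (mu1, mu2)))

# Illustrative parameters
mu1, mu2, sigma = -1.0, 2.0, 0.8

# Crude trapezoidal integration over a wide interval; result is close to 1
xs = [-10 + 20 * i / 2000 for i in range(2001)]
h = xs[1] - xs[0]
area = h * (sum(mixture_pdf(x, mu1, mu2, sigma) for x in xs)
            - 0.5 * (mixture_pdf(xs[0], mu1, mu2, sigma)
                     + mixture_pdf(xs[-1], mu1, mu2, sigma)))
assert abs(area - 1.0) < 1e-4   # normalization holds
```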

where the two Gaussians are given the labels k = 1 and k = 2; the prior probability of the class label k is {p_1 = 1/2, p_2 = 1/2}; \( \left\{ {\mu _k } \right\} \) are the means of the two Gaussians; and both have standard deviation \( \sigma \). For brevity, we denote these parameters by

\[
{\mathbf{\theta }} \equiv \left\{ {\left\{ {\mu _k } \right\},\sigma } \right\}
\]

A data set consists of N points \( \left\{ {x_n } \right\}_{n = 1}^N \) which are assumed to be independent samples from the distribution. Let \( k_n \) denote the unknown class label of the nth point.

Assuming that \( \left\{ {\mu _k } \right\} \) and \( \sigma \) are known, show that the posterior probability of the class label \( k_n \) of the nth point can be written as

\[
\begin{gathered}
  P\left( {\left. {k_n = 1} \right|x_n ,{\mathbf{\theta }}} \right) = \frac{1}{{1 + \exp \left[ { - \left( {\omega _1 x_n + \omega _0 } \right)} \right]}} \hfill \\
  P\left( {\left. {k_n = 2} \right|x_n ,{\mathbf{\theta }}} \right) = \frac{1}{{1 + \exp \left[ { + \left( {\omega _1 x_n + \omega _0 } \right)} \right]}} \hfill \\
\end{gathered}
\]

and give expressions for \( \omega _1 \) and \( \omega _0 \).


 Derivation:

\[
P\left( {\left. {k_n = 1} \right|x_n ,{\mathbf{\theta }}} \right) = \frac{{P\left( {x_n \left| {k_n = 1,{\mathbf{\theta }}} \right.} \right)P\left( {k_n = 1} \right)}}
{{P\left( {x_n } \right)}}
\]

\[
P\left( {x_n \left| {k_n = 1,{\mathbf{\theta }}} \right.} \right) = \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left( { - \frac{{\left( {x_n - \mu _1 } \right)^2 }}
{{2\sigma ^2 }}} \right)
\]

\[
P\left( {x_n } \right) = \sum\limits_{i = 1}^2 {P\left( {\left. {x_n } \right|k_n = i,{\mathbf{\theta }}} \right)P\left( {k_n = i} \right)}
\]

\[
\begin{gathered}
P\left( {\left. {k_n = 1} \right|x_n ,{\mathbf{\theta }}} \right) = \frac{{\frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left( { - \frac{{\left( {x_n - \mu _1 } \right)^2 }}
{{2\sigma ^2 }}} \right)P\left( {k_n = 1} \right)}}
{{\frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left( { - \frac{{\left( {x_n - \mu _1 } \right)^2 }}
{{2\sigma ^2 }}} \right)P\left( {k_n = 1} \right) + \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left( { - \frac{{\left( {x_n - \mu _2 } \right)^2 }}
{{2\sigma ^2 }}} \right)P\left( {k_n = 2} \right)}} \hfill \\
= \frac{1}
{{1 + \exp \left( { - \frac{{\left( {x_n - \mu _2 } \right)^2 }}
{{2\sigma ^2 }} + \frac{{\left( {x_n - \mu _1 } \right)^2 }}
{{2\sigma ^2 }}} \right)\left( {\frac{{1 - P\left( {k_n = 1} \right)}}
{{P\left( {k_n = 1} \right)}}} \right)}} \hfill \\
\end{gathered}
\]

\[
= \frac{1}
{{1 + \exp \left[ { - \left( {\left( {\frac{{\left( {\mu _1 - \mu _2 } \right)}}
{{\sigma ^2 }}} \right)x_n + \left( {\frac{{\left( {\mu _2 ^2 - \mu _1 ^2 } \right)}}
{{2\sigma ^2 }}} \right)} \right)} \right]}}
\]

\[
= \frac{1}
{{1 + \exp \left[ { - \left( {\omega _1 x_n + \omega _0 } \right)} \right]}}
\]

\[
\omega _1 \equiv \frac{{\left( {\mu _1 - \mu _2 } \right)}}
{{\sigma ^2 }};\qquad \omega _0 \equiv \frac{{\left( {\mu _2 ^2 - \mu _1 ^2 } \right)}}
{{2\sigma ^2 }}
\]
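As a quick sanity check (my addition, not part of the original exercise), the logistic form can be verified numerically. Note that the last step of the derivation assumes equal priors, \( P(k_n = 1) = P(k_n = 2) = 1/2 \), so that the prior-ratio factor equals one. A minimal Python sketch with arbitrary illustrative parameter values:

```python
import math

# Check that the Bayes posterior P(k_n = 1 | x_n, theta) equals the logistic
# form 1 / (1 + exp(-(w1 x + w0))), assuming equal priors of 0.5.
mu1, mu2, sigma = 1.0, 3.5, 0.8   # illustrative values, not from the post

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def posterior_k1(x):
    num = 0.5 * gauss(x, mu1, sigma)
    return num / (num + 0.5 * gauss(x, mu2, sigma))

w1 = (mu1 - mu2) / sigma ** 2
w0 = (mu2 ** 2 - mu1 ** 2) / (2 * sigma ** 2)

for x in (-1.0, 0.5, 2.25, 4.0):
    logistic = 1.0 / (1.0 + math.exp(-(w1 * x + w0)))
    assert abs(posterior_k1(x) - logistic) < 1e-12
```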

 


 

Assume now that the means \( \left\{ {\mu _k } \right\} \) are not known, and that we wish to infer them from the data \( \left\{ {x_n } \right\}_{n = 1}^N \). (The standard deviation \( \sigma \) is known.) In the remainder of this question we will derive an iterative algorithm for finding values for \( \left\{ {\mu _k } \right\} \) that maximize the likelihood,

\[
P\left( {\left\{ {x_n } \right\}_{n = 1}^N \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right) = \prod\limits_n {P\left( {x_n \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right)} .
\]

Let \( L \) denote the natural log of the likelihood. Show that the derivative of the log likelihood with respect to \( \mu _k \) is given by

\[
\frac{\partial }
{{\partial \mu _k }}L = \sum\limits_n {p_{k|n} \frac{{\left( {x_n - \mu _k } \right)}}
{{\sigma ^2 }},}
\]

where \( p_{k|n} \equiv P\left( {k_n = k\left| {x_n ,{\mathbf{\theta }}} \right.} \right) \) appeared above.


Derivation:

\[
\frac{\partial }
{{\partial \mu _k }}\ln P\left( {\left\{ {x_n } \right\}_{n = 1}^N \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right) = \frac{\partial }
{{\partial \mu _k }}\sum\limits_n {\ln P\left( {x_n \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right)}
\]

\[
= \frac{\partial }
{{\partial \mu _k }}\sum\limits_n {\ln \left[ {\sum\limits_k {p_k \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left[ { - \frac{{\left( {\mu _k - x_n } \right)^2 }}
{{2\sigma ^2 }}} \right]} } \right]}
\]

\[
= \sum\limits_n {\left[ {\frac{{p_k \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\frac{{\left( {x_n - \mu _k } \right)}}
{{\sigma ^2 }}\exp \left[ { - \frac{{\left( {\mu _k - x_n } \right)^2 }}
{{2\sigma ^2 }}} \right]}}
{{\sum\limits_k {p_k \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left[ { - \frac{{\left( {\mu _k - x_n } \right)^2 }}
{{2\sigma ^2 }}} \right]} }}} \right]}
\]

\[
\sum\limits_k {p_k \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left[ { - \frac{{\left( {\mu _k - x_n } \right)^2 }}
{{2\sigma ^2 }}} \right]} = \sum\limits_i {P\left( {\left. {x_n } \right|k_n = i,{\mathbf{\theta }}} \right)P\left( {k_n = i} \right)}
\]

\[
p_k \frac{1}
{{\sqrt {2\pi \sigma ^2 } }}\exp \left[ { - \frac{{\left( {\mu _k - x_n } \right)^2 }}
{{2\sigma ^2 }}} \right] = P\left( {k_n = k} \right)P\left( {x_n \left| {k_n = k,{\mathbf{\theta }}} \right.} \right)
\]

\[
\Rightarrow \frac{\partial }
{{\partial \mu _k }}\ln P\left( {\left\{ {x_n } \right\}_{n = 1}^N \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right) = \sum\limits_n {\frac{{P\left( {k_n = k} \right)P\left( {x_n \left| {k_n = k,{\mathbf{\theta }}} \right.} \right)}}
{{\sum\limits_i {P\left( {\left. {x_n } \right|k_n = i,{\mathbf{\theta }}} \right)P\left( {k_n = i} \right)} }}} \frac{{\left( {x_n - \mu _k } \right)}}
{{\sigma ^2 }}
\]

\[
= \sum\limits_n {p_{k|n} } \frac{{\left( {x_n - \mu _k } \right)}}
{{\sigma ^2 }}
\]
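This gradient expression can be cross-checked against a finite difference of the log likelihood. Below is a small Python sketch (my addition; the data values and the equal priors \( p_k = 1/2 \) are illustrative assumptions, not from the post):

```python
import math

# Finite-difference check of dL/dmu_k = sum_n p_{k|n} (x_n - mu_k) / sigma^2
# for a two-component equal-weight Gaussian mixture with known sigma.
sigma = 1.0
xs = [0.1, 0.9, 4.2, 5.0]   # illustrative data

def loglik(mu):
    """Log likelihood L for means mu = [mu1, mu2]."""
    L = 0.0
    for x in xs:
        L += math.log(sum(0.5 * math.exp(-(x - m) ** 2 / (2 * sigma ** 2))
                          / math.sqrt(2 * math.pi * sigma ** 2) for m in mu))
    return L

def grad_k(mu, k):
    """Analytic derivative dL/dmu_k from the formula above."""
    g = 0.0
    for x in xs:
        w = [0.5 * math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in mu]
        p_kn = w[k] / sum(w)              # responsibility p_{k|n}
        g += p_kn * (x - mu[k]) / sigma ** 2
    return g

mu = [1.0, 4.0]
h = 1e-6
fd = (loglik([mu[0] + h, mu[1]]) - loglik([mu[0] - h, mu[1]])) / (2 * h)
assert abs(grad_k(mu, 0) - fd) < 1e-5
```

Setting this gradient to zero is what motivates the iterative (soft K-means style) update of the means.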


 

Assuming that \( \sigma = 1 \), sketch a contour plot of the likelihood function as a function of \( \mu _1 \) and \( \mu _2 \) for the data set shown above. The data set consists of 32 points. Describe the peaks in your sketch and indicate their widths.


 

Solution:

We will be trying to plot the function

\[
P\left( {\left\{ {x_n } \right\}_{n = 1}^{32} \left| {\left\{ {\mu _k } \right\},\sigma = 1} \right.} \right)
\]

if we designate the function

\[
{\frac{1}
{{\sqrt {2\pi } }}\exp \left( { - \frac{{\left( {x - \mu } \right)^2 }}
{2}} \right)}
\]

as p[x,mu] (remember that \( \sigma = 1 \) and \( \frac{1}{{\sqrt {2\pi } }} \approx 0.3989422804014327 \)),

\[
\begin{gathered}
  P\left( {x|\left\{ \mu  \right\},\sigma } \right) = \left[ {\sum\limits_{k = 1}^2 {\left( {p_k  = .5} \right)\frac{1}
{{\sqrt {2\pi \left( {\sigma ^2  = 1^2 } \right)} }}\exp \left( { - \frac{{\left( {x - \mu _k } \right)^2 }}
{{2\sigma ^2 }}} \right)} } \right] \hfill \\
   = .5\left( {{\mathbf{p}}\left[ {x,mu1} \right] + {\mathbf{p}}\left[ {x,mu2} \right]} \right) \equiv {\mathbf{pp}}\left[ {x,mu1,mu2} \right] \hfill \\
\end{gathered}
\]


then we have:

\[
\begin{gathered}
  P\left( {\left\{ {x_n } \right\}_{n = 1}^N \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right) = \prod\limits_n {P\left( {x_n \left| {\left\{ {\mu _k } \right\},\sigma } \right.} \right)}  \hfill \\
   = \prod\limits_n {{\mathbf{pp}}\left[ {x,mu1,mu2} \right]}  \equiv {\mathbf{ppp}}\left[ {x,mu1,mu2} \right] \hfill \\
\end{gathered}
\]

And in Mathematica, these translate to:

(* 32 data points: two clusters of 16 points each, evenly spaced on [0,2] and [4,6] *)
mx=Join[N[Range[0,2,2/15]],N[Range[4,6,2/15]]]
Length[mx]
ListPlot[Table[{mx[[i]],1},{i,1,32}]]

p[x_,mu_]:=0.3989422804014327` * Exp[-(mu-x)^2/2]; (* Gaussian pdf with sigma=1; the constant is 1/Sqrt[2 Pi] *)
pp[x_,mu1_,mu2_]:=.5 (p[x,mu1]+p[x,mu2]); (* equal-weight two-component mixture *)
ppp[xx_,mu1_,mu2_]:=Module[
{ptot=1,i,ppar}, (* localize the loop counter and partial product *)
For[i=1,i<=Length[xx],i++,
ppar = pp[xx[[i]],mu1,mu2];
ptot *= ppar;
(*Print[xx[[i]],"\t",ppar];*)
];
Return[ptot];
];

Plot3D[ppp[mx,mu1,mu2],{mu1,0,6},{mu2,0,6},PlotRange->{0,10^-25}];

ContourPlot[ppp[mx,mu1,mu2],{mu1,0,6},{mu2,0,6},{PlotRange->{0,10^-25},ContourLines->False,PlotPoints->250}];(*It may take a while with PlotPoints->250, so just begin with PlotPoints->25 *)
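For readers without Mathematica, here is a minimal Python cross-check of the same likelihood surface (my addition, not from the original notebook): it confirms the twin peaks the contour plot should show, near \( (\mu_1, \mu_2) = (1, 5) \) and its mirror image \( (5, 1) \).

```python
import math

# The 32 data points: two clusters of 16, evenly spaced on [0,2] and [4,6].
xs = [2 * i / 15 for i in range(16)] + [4 + 2 * i / 15 for i in range(16)]

def ppp(mu1, mu2):
    """Product over the data of the equal-weight two-Gaussian mixture density (sigma = 1)."""
    prod = 1.0
    for x in xs:
        prod *= 0.5 * (math.exp(-(x - mu1) ** 2 / 2)
                       + math.exp(-(x - mu2) ** 2 / 2)) / math.sqrt(2 * math.pi)
    return prod

# Symmetric under mu1 <-> mu2; twin peaks near (1, 5) and (5, 1),
# with height around 1e-25, consistent with PlotRange->{0,10^-25} above.
assert ppp(1, 5) == ppp(5, 1)
assert ppp(1, 5) > ppp(3, 3)
assert 1e-27 < ppp(1, 5) < 1e-24
```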

That’s all folks! For today, I guess. 8) (And yes, I know I said two entries ago that the next entry would be about soft K-means, but believe me, we’re coming to that, eventually. 😉)

Attachments: the Mathematica notebook for this entry and an MS Word document (the latter is mainly for myself, in case I need them again in the future).

Likelihood of Gaussian(s) – Scrap Notes http://www.emresururi.com/physics/?p=51 http://www.emresururi.com/physics/?p=51#respond Mon, 03 Dec 2007 16:04:30 +0000 http://www.emresururi.com/physics/?p=51 Given a set of N data \( \left\{ x \right\}_{i = 1}^N \), the optimal parameters for a Gaussian Probability Distribution Function defined as:

\[
\ln \left[ {P\left( {\left. {\left\{ x \right\}_{i = 1}^N } \right|\mu ,\sigma } \right)} \right] =  - N\ln \left( {\sqrt {2\pi } \sigma } \right) - \frac{{N\left( {\mu  - \bar x} \right)^2  + S}}{{2\sigma ^2 }}
\]

are:

\[
\left\{ {\mu ,\sigma } \right\}_{MaximumLikelihood}  = \left\{ {\bar x,\sqrt {S/N} } \right\}
\]

with the definitions

\[
\bar x = \frac{{\sum\limits_{n = 1}^N {x_n } }}{N}, \qquad S = \sum\limits_{n = 1}^N {\left( {x_n  - \bar x} \right)^2 }
\]

Let’s see this in an example:

Define the data set mx: 
mx={1,7,9,10,15}

Calculate the optimal mu and sigma:
dN=Length[mx];
mu=Sum[mx[[i]]/dN,{i,1,dN}];
sig =Sqrt[Sum[(mx[[i]]-mu)^2,{i,1,dN}]/dN];
Print["mu = ",N[mu]];
Print["sigma = ",N[sig]];
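The same computation in Python, as a quick cross-check of the Mathematica snippet above (note that the maximum-likelihood estimator divides S by N, not N-1):

```python
import math

# ML estimates for the example data set mx = {1, 7, 9, 10, 15}
mx = [1, 7, 9, 10, 15]
N = len(mx)
mu = sum(mx) / N                          # x-bar
S = sum((x - mu) ** 2 for x in mx)        # sum of squared deviations
sigma = math.sqrt(S / N)                  # note: /N (ML), not /(N-1)
print(mu, sigma)   # mu = 8.4, sigma ≈ 4.54
```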

Now, let’s see this Gaussian Distribution Function:
<<Statistics`NormalDistribution`
ndist=NormalDistribution[mu,sig];

<<Graphics`MultipleListPlot`
MultipleListPlot[Table[{x,PDF[NormalDistribution[mu,sig],x]}, {x,0,20,.04}],Table[{mx[[i]], PDF[NormalDistribution[mu,sig],mx[[i]]]},{i,5}], {PlotRange->{Automatic,{0,.1}},PlotJoined->{False,False}, SymbolStyle->{GrayLevel[.8],GrayLevel[0]}}]

]]>
http://www.emresururi.com/physics/?feed=rss2&p=51 0
K-means Clustering – the hard way… http://www.emresururi.com/physics/?p=50 http://www.emresururi.com/physics/?p=50#respond Mon, 03 Dec 2007 10:23:50 +0000 http://www.emresururi.com/physics/?p=50 Clustering (or grouping) items using a defined similarity of selected properties is an effective way to classify data, but another (and probably more important) aspect of clustering is picking out the abnormal pieces that differ from the rest of the clusterable "herd". These exceptions may be the very ones we’re looking (or not looking) for!

There is a simple clustering method called K-means Clustering. Suppose that you have N particles (atoms/people/books/shapes) defined by 3 properties (coordinates/{height, weight, eye color}/{title, number of sold copies, price}/{number of sides, color, size}). You want to classify them into K groups. One way of doing this involves the K-means Clustering algorithm. I have listed some example items & their properties, but from now on, for clarity and easy visualization, I will stick to the first one (atoms with 3 coordinates, and while we’re at it, let’s take them as Cartesian coordinates).

The algorithm is simple: You introduce K means into the midst of the N items. Then, for each item, you apply the distance criterion to every mean and assign the item to the mean nearest to it. You store these relations between the means and the items using a responsibility list (array, vector) for each mean, of size (dimension) N (the number of items).

Let’s define some notation at this point:

  • Designate the position of the kth mean by mk.
  • Designate the position of the jth item by xj.
  • Designate the responsibility list of the kth mean by rk, such that the responsibility between the kth mean and the jth item is stored in the jth element of rk. If rk[j] is equal to 1, this mean is the closest mean to the item; if it is 0, some other mean is closer, so the item is not included in this mean’s responsibility.

The algorithm can be built in 3 steps:

0. Initialize the means (randomly or by a well-calculated guess).

1. Analyze: Check the distances between the means and the items. Assign the items to their nearest mean’s responsibility list.

2. Move: Move the means to the center of mass of the items each is responsible for. (If a mean has no item it is responsible for, then just leave it where it is.)

\[
{\mathbf{m}}_k = \frac{{\sum\limits_{i = 1}^N {r_k \left[ i \right] \cdot {\mathbf{x}}_i } }}{{\sum\limits_{i = 1}^N {r_k \left[ i \right]} }}
\]

3. Check: If one or more mean’s position has changed, then go to 1, if not exit.
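The steps above can be sketched compactly in Python (an illustrative version of mine; the post’s actual implementation is the C++ code below, and the `init` parameter and helper names here are not from that code):

```python
import random

def kmeans(items, K, max_iter=30, init=None):
    """Hard K-means: analyze -> move -> check, as in steps 0-3 above."""
    means = [list(p) for p in (init if init is not None else
                               random.sample(items, K))]      # step 0
    resp = [[] for _ in range(K)]
    for _ in range(max_iter):
        # step 1 (analyze): assign each item to its nearest mean
        resp = [[] for _ in range(K)]
        for x in items:
            k = min(range(K), key=lambda k: sum((a - b) ** 2
                    for a, b in zip(x, means[k])))
            resp[k].append(x)
        # step 2 (move): each mean goes to the center of mass of its items;
        # a mean responsible for no item simply stays where it is
        new_means = [[sum(c) / len(r) for c in zip(*r)] if r else m
                     for m, r in zip(means, resp)]
        # step 3 (check): stop if no mean moved
        if new_means == means:
            break
        means = new_means
    return means, resp
```

With two well-separated clusters and explicit initial means this converges in a couple of passes; an empty mean just stays put, exactly as step 2 prescribes.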

That’s all about it. I have written a C++ code for this algorithm. It employs an object called kmean (can you guess what it is for?) with the following properties and methods:

class kmean
{
 private:
  unsigned int id,n;//"id" is for the unique identifier and "n" is the total number of items in the world.
  double x,y,z;//coordinates of the mean
  gsl_vector * r;//responsibility list of the mean
  char buffer [150];//some dummy variable to use in the "whereis" method
  unsigned int r_tott;//total number of data points this mean is responsible for

 public:
  kmean(int ID,double X, double Y, double Z,unsigned int N);
  ~kmean();
  //kmean self(void);
  int move(double X, double Y, double Z);//moves the mean to the point(X,Y,Z)
  int calculate_distances(gsl_matrix *MatrixIN, int N, gsl_matrix *distances);//calculates the distances of the "N" items with their positions defined in "MatrixIN" and writes these values to the "distances" matrix
  char * whereis(void);//returns the position of the mean in a formatted string
  double XX(void) {return x;}//returns the x-component of the mean’s position
  double YY(void) {return y;}//returns the y-component of the mean’s position
  double ZZ(void) {return z;}//returns the z-component of the mean’s position
  int II(void) {return id;}//returns the mean’s id
  int NN(void) {return n;}//returns the number of items in the world (as the mean knows it)
  int RR(void);//prints out the responsibility list (used for debugging purposes actually)
  gsl_vector * RRV(void) {return r;}//returns the responsibility list
  int set_r(int index, bool included);//sets the "index"th element of the responsibility list as the bool "included". (ie, registers the "index". item as whether it belongs or not belongs to the mean)
  int moveCoM(gsl_matrix *MatrixIN);//moves the mean to the center of mass of the items it is responsible for
  int RTOT(void) {return r_tott;}//returns the total number of items the mean is responsible for
  };
int check_distances(gsl_matrix* distances, int numberOfWorldElements, kmean **means);//checks the "distances" matrix elements for the means in the "means" array and registers each item in the responsibility list of the nearest mean.
double distance(double a1, double a2,double a3,double b1,double b2,double b3);//distance criteria is defined with this function

Now that we have the essential kmean object, the rest of the code is actually just details. Additionally, I’ve included a random world generator (and by world, I mean N items and their 3 coordinates, don’t be misled). So essentially, in the main code, we have the following lines for the analyze section:

  for (i=0;i<dK;i++)
   means[i]->calculate_distances(m,dN,distances);
  
  check_distances(distances,dN,means);

which first calculates the distances between each mean (defined as elements of the means[] array) and each item (defined as the rows of the m matrix of dimension dNx4; 1st col: id, cols 2,3,4: coordinates) and writes the distances into the distances matrix (of dimensions dNxdK) via the calculate_distances(…) method. dN is the total number of items and dK is the total number of means. It then analyzes the distances matrix, finds the minimum of each row (because rows represent the items and columns represent the means) and registers the item as 1 in the nearest mean’s responsibility list, and as 0 in the other means’ lists, via the check_distances(…) method.

Then, the code executes its move section:

  for(i=0; i<dK; i++)
  {
   means[i]->moveCoM(m);
   […]
  }

which simply moves the means to the center of mass of the items they’re responsible for. This section is followed by the checkSufficiency section which simply decides if we should go for one more iteration or just call it quits.

  counter++;
  if(counter>=dMaxCounter) {f_maxed = true;goto finalize;}

  gsl_matrix_sub(prevMeanPositions,currMeanPositions);
  if(gsl_matrix_isnull(prevMeanPositions)) goto finalize;
  […]
  goto analyze;

as you can see, it checks for two things: whether a predefined iteration limit has been reached or whether at least one (two 😉 of the means moved in the last iteration. finalize is the section where some tidying up is performed (freeing memory, closing the open files, etc…)

The code uses two external libraries: the GNU Scientific Library (GSL) and the Templatized C++ Command Line Parser Library (TCLAP). While GSL is essential for the matrix operations and the structs I have used intensively, TCLAP provides a practical and efficient solution for argument parsing. If you have enough reasons for reading this blog and this entry up to this line and (a tiny probability but, anyway) have not heard about GSL before, you MUST check it out. As for TCLAP, it has become a habit for me since it is very easy to use: it lets you handle external variables defined at the command line even if you know nothing about argument parsing. I strongly recommend it (and thanks to Michael E. Smoot again 8).

The program can be invoked with the following arguments:

D:\source\cpp\Kmeans>kmeans --help

USAGE:

   kmeans  -n <int> -k <int> [-s <unsigned long int>] [-o <output file>]
           [-m <int>] [-q] [--] [--version] [-h]

Where:

   -n <int>,  --numberofelements <int>
     (required)  total number of elements (required)

   -k <int>,  --numberofmeans <int>
     (required)  total number of means (required)

   -s <unsigned long int>,  --seed <unsigned long int>
     seed for random number generator (default: timer)

   -o <output file>,  --output <output file>
     output file basename (default: "temp")

   -m <int>,  --maxcounter <int>
     maximum number of steps allowed (default: 30)

   -q,  --quiet
     no output to screen is generated

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.

   K-Means by Emre S. Tasci, <…@tudelft.nl> 2007

 The code outputs various files:

<project-name>.means.txt : Coordinates of the means evolution in the iterations
<project-name>.meansfinal.txt : Final coordinates of the means
<project-name>.report : A summary of the process
<project-name>.world.dat : The "world" matrix in Mathematica Write format
<project-name>.world.txt : Coordinates of the items (ie, the "world" matrix in XYZ format)
<project-name>-mean-#-resp.txt : Coordinates of the items belonging to the #. mean

and one last remark to those who are not that much accustomed to coding in C++: Do not forget to include the "include" directories! 

Check the following command and modify it according to your directory locations (I do it in the Windows-way 🙂

g++ -o kmeans.exe kmeans.cpp -I c:\GnuWin32\include  -L c:\GnuWin32\lib -lgsl -lgslcblas -lm

(If you are not keen on modifying the code and your operating system is Windows, you also have the option to download the compiled binary from here)

The source files can be obtained from here.

And, the output of an example with 4000 items and 20 means can be obtained here, while the output of a tinier example with 100 items and 2 means can be obtained here.

I’m planning to continue with the soft version of the K-Means algorithm for the next time…

References: Information Theory, Inference, and Learning Algorithms by David MacKay

]]>
http://www.emresururi.com/physics/?feed=rss2&p=50 0
Bayesian Probabilities & Inference – Another example http://www.emresururi.com/physics/?p=46 http://www.emresururi.com/physics/?p=46#respond Tue, 27 Nov 2007 10:05:32 +0000 http://www.emresururi.com/physics/?p=46 Yesterday, I told my battalion-buddy, Andy about the pregnancy example and, he recalled another example, with the shuffling of the three cups and a ball in one of them. The situation is proposed like this:

The dealer puts the ball in one of the cups and shuffles it so fast that at one point you lose the track and absolutely have no idea which cup has the ball. So, you pick a cup randomly (let’s say the first cup) but for the moment, don’t look inside. At this point, the dealer says, he will remove one of the remaining two cups and he guarantees you that, the one he removes does not have the ball. He even bestows you the opportunity to change your mind and pick the other one if you like. The question is this: Should you

a) Stick to the one you’ve selected or,

b) Switch to the other one or,

c) (a) or (b) – what does it matter?

Today, while I was studying MacKay’s book Information Theory, Inference, and Learning Algorithms, I came upon the same example and couldn’t refrain from including it on this blog.

First, let’s solve the problem with simple notation. Say that we have initially picked the first cup; the probability that this cup has the ball is 1/3. There are two possibilities: either the 1st cup really has the ball, or it doesn’t.

i) 1st cup has the ball (1/3 of all the times): In this situation, it doesn’t matter which of the two remaining cups is removed.

ii) 1st cup doesn’t have the ball (2/3 of all the times): This time, the dealer removes the empty one of the remaining two, and the cup left standing has the ball.

So, to summarize: left with the two cups, there’s a 1/3 probability that the ball is in the cup we chose at the beginning, and a 2/3 probability that it is in the other cup. So, switching is advantageous. To put it in other words: option (b) must be preferred to option (a) and actually, option (c) is wrong.

Now, let’s compute the same thing with the formal notations:

Let Hi denote the situation that the ball is in cup i, and let D denote the removed cup (either 2 or 3). Initially the ball can be in any of the three cups with equal probability, so

\[
P\left( {H_1 } \right) = P\left( {H_2 } \right) = P\left( {H_3 } \right) = \tfrac{1}{3}
\]

Now, since we have selected the 1st cup, either 2nd or the 3rd cup will be removed. And we have three cases for the ball position:

\[
\left. {\begin{array}{*{20}c}
{P(D = 2|H_1 ) = \tfrac{1}{2}} \\
{P(D = 3|H_1 ) = \tfrac{1}{2}} \\
\end{array} } \right|\begin{array}{*{20}c}
{P(D = 2|H_2 ) = 0} \\
{P(D = 3|H_2 ) = 1} \\
\end{array} \left| {\begin{array}{*{20}c}
{P(D = 2|H_3 ) = 1} \\
{P(D = 3|H_3 ) = 0} \\
\end{array} } \right.
\]

The first column represents the case where the ball is in our selected cup, so it doesn’t matter which of the remaining cups the dealer removes: each of the remaining cups has the same probability of being removed, 0.5. The second column is the situation when the ball is in the second cup. Since it is in the second cup, the dealer is forced to remove the third cup (ie, P(D=3|H2)=1). The third column is similar to the second.

Let’s suppose that the dealer removed the third cup. With the 3rd cup removed, the probability that the ball is in the i. cup is given by (remember that Hi represents the situation that the ball is in the i. cup):

\[
P\left( {H_i |D = 3} \right) = \frac{{P\left( {D = 3|H_i } \right)P\left( {H_i } \right)}}{{P\left( {D = 3} \right)}}
\]

\[
\left. {P\left( {H_1 |D = 3} \right) = \frac{{\tfrac{1}{2} \cdot \tfrac{1}{3}}}{{P\left( {D = 3} \right)}}} \right|P\left( {H_2 |D = 3} \right) = \frac{{1 \cdot \tfrac{1}{3}}}{{P\left( {D = 3} \right)}}\left| {P\left( {H_3 |D = 3} \right) = \frac{{0 \cdot \tfrac{1}{3}}}{{P\left( {D = 3} \right)}}} \right.
\]

For our purposes, P(D=3) is not relevant, because we are only comparing the first two probabilities (P(D=3) = 0.5, by the way, since it is either D=2 or D=3).

\[
\frac{{P\left( {H_1 |D = 3} \right)}}{{P\left( {H_2 |D = 3} \right)}} = \frac{1}{2}
\]

meaning that it is 2 times more likely that the ball is in the other cup than in the one we initially selected.
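The argument is easy to verify with a quick Monte Carlo simulation (my own Python sketch, not from the book): sticking wins about 1/3 of the time, switching about 2/3.

```python
import random

def play(switch, rng):
    ball = rng.randrange(3)              # dealer hides the ball
    pick = rng.randrange(3)              # our initial choice
    # dealer removes a cup that is neither our pick nor the ball
    removed = next(c for c in range(3) if c != pick and c != ball)
    if switch:
        pick = next(c for c in range(3) if c != pick and c != removed)
    return pick == ball

rng = random.Random(42)
trials = 100_000
stick_rate  = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True,  rng) for _ in range(trials)) / trials
print(stick_rate, switch_rate)   # roughly 0.33 and 0.67
```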

Like John, Jo’s funny boyfriend in the previous Bayesian example who was used to show the unlikeliness of the "presumed" logical derivation, MacKay introduces the concept of one million cups instead of three. Suppose you have selected one cup among the one million cups. Then, the dealer removes 999,998 cups and you’re left with the one you initially chose and the 432,238th cup. Would you now switch, or stick to the one you had chosen?

]]>
http://www.emresururi.com/physics/?feed=rss2&p=46 0
Bayesian Inference – An introduction with an example http://www.emresururi.com/physics/?p=44 http://www.emresururi.com/physics/?p=44#comments Mon, 26 Nov 2007 12:35:36 +0000 http://www.emresururi.com/physics/?p=44 Suppose that Jo (of Example 2.3 in MacKay’s previously mentioned book) decided to take a test to see whether she’s pregnant or not. She applies a test that is 95% reliable: if she’s indeed pregnant, there is a 5% chance that the test will say otherwise, and if she’s indeed NOT pregnant, the test will tell her that she’s pregnant, again with a 5% chance. (The other two options: the test concludes positive on pregnancy when she’s indeed pregnant 95% of the time, and the test reports negative when she’s actually not pregnant, again 95% of the time.) Suppose also that she is using a contraceptive drug with a 1% failure ratio.

Now comes the question: Jo takes the test and the test says that she’s pregnant. Now what is the probability that she’s indeed pregnant?

I would definitely not write down this example if the answer were simply 95%, as you may (or may not) have guessed, but it really is tempting to answer 95% at first.

The solution (as given in the aforementioned book) is:

\[
\begin{gathered}
  P(preg:1|test:1) = \frac{{P(test:1|preg:1)P(preg:1)}}{{P(test:1)}} \\
  = \frac{{P(test:1|preg:1)P(preg:1)}}{{\sum\limits_i {P(test:1|preg:i)P(preg:i)} }} \\
  = \frac{{0.95 \times 0.01}}{{0.95 \times 0.01 + 0.05 \times 0.99}} \\
  = 0.16 \\
\end{gathered}
\]

 where P(b:bj|a:ai) represents the probability of b having the value bj given that a=ai. So we have P(preg:1|test:1) = 16%, meaning that given a positive test result, the probability that Jo is actually pregnant is 16%. So we took into account both the test’s and the contraceptive’s reliabilities. If this doesn’t make sense and you still want to stick with the somehow more logical-looking 95%, think of the same example but this time starring John, Jo’s humorous boyfriend, who as a joke applied the test and came up with a positive result on pregnancy. Now, do you still say that John is 95% pregnant? I guess not  Just plug in 0 for P(preg:1) in the equation above and enjoy the resulting probability of John being pregnant equaling 0…
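Plugging the numbers in (a short Python sketch, just to check the arithmetic above):

```python
# Posterior probability that Jo is pregnant given a positive test,
# with prior P(preg:1) = 0.01 and a 95%-reliable test.
p_preg = 0.01           # contraceptive fails 1% of the time
p_pos_if_preg = 0.95    # true positive rate
p_pos_if_not  = 0.05    # false positive rate

posterior = (p_pos_if_preg * p_preg) / (
    p_pos_if_preg * p_preg + p_pos_if_not * (1 - p_preg))
print(posterior)   # 0.1610... -- about 16%, not 95%

# John's case: prior 0 -> posterior 0, no matter what the test says
john = (p_pos_if_preg * 0) / (p_pos_if_preg * 0 + p_pos_if_not * 1)
print(john)        # 0.0
```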

The thing to keep in mind is that the probability of a having some value ai given that b is bj is not equal to the probability of b being bj given that a is ai.

]]>
http://www.emresururi.com/physics/?feed=rss2&p=44 1
(7,4) Hamming Code http://www.emresururi.com/physics/?p=43 http://www.emresururi.com/physics/?p=43#comments Fri, 23 Nov 2007 12:45:03 +0000 http://www.emresururi.com/physics/?p=43 The (7,4) Hamming Code is a system used for sending blocks of 4 source bits (16 symbols) with redundancy, resulting in transmitted blocks of 7 bits. The first 4 bits of the transmitted block contain the original source bits and the last three contain the parity-check bits for the {1,2,3}, {2,3,4} and {1,3,4} bit groups.

It has a transmission rate of 4/7 ~ 0.57. Designate the source block as s and the transmission operator as G, so that the transmitted block t is t = GTs, and designate the parity check operator as H, where:

\[
{\mathbf{H}} = \left[ {\begin{array}{*{20}c}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 \\
\end{array} } \right]
\]

 This becomes more obvious if we investigate GT, keeping in mind that the first 4 "transformed" bits stay the same, with 3 bits added to the end for the parity checks. So our parity operator P would be:

\[
{\mathbf{P}} = \left[ {\begin{array}{*{20}c}
1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 \\
\end{array} } \right]
\]

Since P calculates parities of bits, its result must also be taken modulo 2. As we’re calculating the parities for the {1,2,3}, {2,3,4} and {1,3,4} bit groups, the values of P are easy to understand. Now, we join the identity matrix of order 4 (which directly copies the first 4 values) with the P operator to construct the GT transformation (encoder) operator:

\[
{\mathbf{G}}^T = \left[ {\begin{array}{*{20}c}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 \\
\end{array} } \right]
\]


and H is actually nothing but [-P I3] (mod 2, -P = P). It can be seen that HGTs will always result in {0,0,0} in a parity check, meaning that all three parities have passed the test. So, where is the problem?
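This claim is easy to check numerically; a small Python sketch multiplying the two matrices above mod 2:

```python
# Verify H * G^T = 0 (mod 2): every codeword G^T s passes all parity checks.
H = [[1, 1, 1, 0, 1, 0, 0],
     [0, 1, 1, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
GT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
      [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1]]

HGT = [[sum(H[i][k] * GT[k][j] for k in range(7)) % 2 for j in range(4)]
       for i in range(3)]
print(HGT)   # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```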

Noise: Noise is the operator that flips the values of bits randomly, each with probability f. Suppose that we have a source signal block of 4 bits: {s1, s2, s3, s4}. I encode this information by calculating the parities and adding them to the end, thus coming up with a block of 7 bits, and I call this block t (for transmitted):

t=GTs (supposing t and s are given as column vectors. Otherwise, t=sG)

Now I send this packet, but on the road to the receiver the noise changes it, and so the received message, r, is the transmitted message (t) plus the noise (n):

r=t+n

Since I have the parity information part (which may itself have been affected by the noise), I compare the received data’s first 4 values with the parity information stored in the last 3 values. A very practical way of doing this is to use 3 intersecting circles, placed as:

 

with r1-r4 representing the first 4 values received (ie, the actual data) and p1-p3 being the parity information (the last 3 values). We place the values in their corresponding locations and can easily check whether each parity matches its related values. There are 8 possible situations at this point. Let’s denote the parity check result by the vector z=Hr: z={0,0,0} if all three received parity bits agree with the calculated ones, z={1,0,0} if only p1 differs from the calculated parity of {r1,r2,r3}, z={1,0,1} if p1 AND p3 differ from their corresponding calculated parities, and so on. If there is only 1 error in the parity check (ie, only one "1" and two "0"s), then the noise affected only the parity information, not the encoded data (ie, the first 4 values), so the corruption can be fixed by simply flipping the faulty parity bit. If there are 2 parity check errors, this time the value on the intersection of the corresponding parity circles (ie, r1 for p1&p3, r2 for p1&p2 and r4 for p2&p3) must be flipped. If z={1,1,1}, then r3 should be flipped.
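The circle bookkeeping above is exactly syndrome decoding, and it can be sketched in a few lines of Python (a hedged translation of the logic described here, not code from the original post; the names `syndrome`, `FLIP` and `decode` are ours):

```python
# Syndrome decoding for the (7,4) Hamming code described above.
# H = [P I3], matching the mH matrix used later in the post.
H = [[1,1,1,0,1,0,0],
     [0,1,1,1,0,1,0],
     [1,0,1,1,0,0,1]]

def syndrome(r):
    """Parity check result z = H.r (mod 2)."""
    return tuple(sum(h * b for h, b in zip(row, r)) % 2 for row in H)

# Each non-zero syndrome points at exactly one bit of r (0-indexed):
# one failing parity -> that parity bit itself; two -> the data bit on the
# intersection of the two circles; (1,1,1) -> r3, in the centre of all three.
FLIP = {(1,0,1): 0, (1,1,0): 1, (1,1,1): 2, (0,1,1): 3,
        (1,0,0): 4, (0,1,0): 5, (0,0,1): 6}

def decode(r):
    """Correct (at most) one flipped bit, then return the 4 data bits."""
    z = syndrome(r)
    if z != (0, 0, 0):
        r = list(r)
        r[FLIP[z]] ^= 1
    return r[:4]

# The post's example: t = {1,1,1,0,1,0,0} with the 3rd parity bit flipped.
print(decode([1,1,1,0,1,0,1]))  # -> [1, 1, 1, 0]
```

Flipping any single bit of a valid codeword and running `decode` recovers the original data block.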

Let’s introduce some Mathematica operations and then tackle some examples. You can just copy and paste them, since no additional libraries are required:

mGT = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1},{1,1,1,0},{0,1,1,1},{1,0,1,1}}; (* encoder G^T *)
mH = {{1,1,1,0,1,0,0},{0,1,1,1,0,1,0},{1,0,1,1,0,0,1}};                        (* parity check H *)

(mS = {{1},{1},{1},{0}});                  (* source block s *)
Mod[(mGTS = mGT . mS), 2] // MatrixForm    (* transmitted block t = G^T s *)
mN = {0,0,0,0,0,0,1};                      (* noise n *)
(mR = Mod[mGTS + mN, 2]) // MatrixForm     (* received block r = t + n *)
mZ = Mod[mH . mR, 2];                      (* parity check z = H r *)
Flatten[mZ]

The first two lines define GT and H, and from here you can check a very important property:

 Mod[mH . mGT, 2] // MatrixForm

is equal to the zero matrix.

We introduce the 4-bit data to be transmitted as mS, which we referred to as s throughout the text. Then we encode the data to be sent, with the parity info appended, as mGTS (t), add a noise mN (n) to it, assuming it gets corrupted during transmission, and the data the other side receives is mR (r), consisting of the transmitted data plus the noise that acted on it. mZ is the parity check result (z) (I’ve "Flatten"ed it for a better view). In the operations above, the "Mod" operator is used frequently because ours is a mod-2 universe (2=0, 3=1, 4=0,… just 0s and 1s, actually!).

So, in this example, we had s={1,1,1,0}, and this was encoded as t={1,1,1,0,1,0,0}; on its way to the happy ending, noise corrupted it and switched its 3rd parity bit from 0 to 1. As a result, the recipient got the data as r={1,1,1,0,1,0,1}. But since our recipient is not that dull, a parity check is immediately applied, and it is seen that z={0,0,1}, meaning that there is something wrong with the 3rd parity bit, and nothing wrong with the information received at all!

Now, let’s apply all the possible noises, given that only one value can be switched at a time (ie, only one 1 in the n vector, with all the remaining entries being 0):

For[i = 1, i <= 7, i++,
  mN = {0, 0, 0, 0, 0, 0, 0};
  mN = ReplacePart[mN, 1, i];  (* flip only the i-th bit *)
  mR = Mod[mGTS + mN, 2];
  mZ = Mod[mH . mR, 2];
  Print[mN, "\t", Flatten[mZ]];
];

After this, if everything went right, you’ll see output like this:

{1,0,0,0,0,0,0}   {1,0,1}
{0,1,0,0,0,0,0}   {1,1,0}
{0,0,1,0,0,0,0}   {1,1,1}
{0,0,0,1,0,0,0}   {0,1,1}
{0,0,0,0,1,0,0}   {1,0,0}
{0,0,0,0,0,1,0}   {0,1,0}
{0,0,0,0,0,0,1}   {0,0,1}

So we know exactly what went wrong and are able to correct it… Everything seems so perfect, so there must be a catch…

There is no decoding error as long as there is only 1 corruption per data block!

As the number of corruptions increases, there is no way of recovering the original data sent with certainty. Suppose that we re-send our previous signal s={1,1,1,0}, which yields t={1,1,1,0,1,0,0}, but this time the noise associated with the signal is n={0,1,0,0,0,0,1}, so the received signal is r={1,0,1,0,1,0,1}. When we apply the parity check, we get z={1,1,1}, which says that there is a problem with the 3rd value, and we deduce that the original transmitted message was t={1,0,0,0,1,0,1} — which passes the parity check tests with success. Alas, a wrongly decoded signal!
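A quick sketch (our own, in Python rather than the post’s Mathematica) confirming this two-bit failure mode:

```python
# With noise on bits 2 and 7, the syndrome {1,1,1} wrongly blames r3,
# so the decoder "corrects" the wrong bit.
H = [[1,1,1,0,1,0,0],
     [0,1,1,1,0,1,0],
     [1,0,1,1,0,0,1]]

t = [1,1,1,0,1,0,0]             # encoded s = {1,1,1,0}
n = [0,1,0,0,0,0,1]             # two corrupted bits
r = [(ti + ni) % 2 for ti, ni in zip(t, n)]
z = [sum(h * b for h, b in zip(row, r)) % 2 for row in H]
print(r, z)                     # -> [1, 0, 1, 0, 1, 0, 1] [1, 1, 1]

r[2] ^= 1                       # z = {1,1,1} says "flip r3" ...
print(r[:4])                    # -> [1, 0, 0, 0], not the transmitted {1,1,1,0}
```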

The probability of block error, pB, is the probability that at least one decoded bit will not match the transmitted bit. A little attention must be paid to this statement: it emphasizes the decoded bit, not the received bit. So, in our case of the (7,4) Hamming code, there won’t be a block error as long as there is only one corrupted bit in the received block, because we are able to reconstruct the transmitted bits with certainty in that case. Only if two or more bits are corrupted will we be unable to recover the original signal; thus, pB scales as O(f²).
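That scaling can be made explicit (our own illustration, assuming each of the 7 bits is flipped independently with probability f, as in a binary symmetric channel):

```python
# Decoding fails whenever the noise flips 2 or more of the 7 bits, so
#   pB = 1 - (1-f)^7 - 7 f (1-f)^6  ≈  C(7,2) f^2 = 21 f^2   for small f.
from math import comb

def p_block_error(f):
    """Exact probability of >= 2 flips among 7 independent bits."""
    return 1 - (1 - f)**7 - 7 * f * (1 - f)**6

for f in (0.1, 0.01, 0.001):
    print(f, p_block_error(f), comb(7, 2) * f**2)
```

For small f the exact value and the 21 f² estimate agree closely, which is the O(f²) behaviour stated above.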

References: I have benefited heavily from David MacKay’s resourceful book Information Theory, Inference, and Learning Algorithms, which is available for free for viewing on screen.

Information Theory, Inference, and Learning Algorithms
by David MacKay

 

A Crash Course on Information Gain & some other DataMining terms and How To Get There http://www.emresururi.com/physics/?p=36 http://www.emresururi.com/physics/?p=36#comments Wed, 21 Nov 2007 10:01:46 +0000 http://www.emresururi.com/physics/?p=36 Suppose that you have -let’s say- 5 pipelines outputting Boolean values seemingly randomly such as:

[Image: output.jpg – the sampled Boolean outputs of the 5 pipelines]

Now, we will be looking for a possible relation between these variables, which we’ll name {a,b,c,d,e}. (Suppose that we do not know that the first 3 variables (a,b,c) are randomly tossed, while the 4th (d) is just “a AND b” and the 5th one (e) is actually “NOT a”.)

  1. We look for the different possible values of each variable and store them in an array called posVals (obviously, from “Possible Values”). In our example, each variable can take only two values, so our posVals array will look something like this:

    [Image: a2.jpg – the posVals array]

  2. Now that we know the possible outcomes for each variable, we calculate the probability of each of these values by simply counting occurrences and dividing by the total number of outputs. As an example, for the 3rd variable, there are four 0s and six 1s, totaling 10 values. So, the probability of a chosen value being 0 is 4 out of 10, and 6/10 for 1. We store the probabilities in an array, this time called probs, which looks like:
    [Image: a3.jpg – the probs array]
    In the 3rd row, the probabilities for the c variable are stored in the order presented in the posVals array (check the 3rd row of posVals to see that the first value is 1 and the second is 0). You might object that storing the second element of each row of probs is unnecessary, since it always equals 1 minus the first element (I will be using the Mathematica notation for array elements, with a[[1]] meaning the 1st row if a is a matrix, or the 1st element if a is a 1-D list, and a[[1]][[3]] meaning the element in the first row and third column of the matrix a), but that holds only because we are dealing with Boolean outputs. Yes, a value’s probability always equals 1 minus the sum of the probabilities of all the other values, but we will keep a record of all the probabilities for practical reasons.
  3. Now we will calculate the entropy of each variable. Entropy is a measure of the disorder in a distribution: higher entropy corresponds to a more disordered (less predictable) system. For our purposes, we will calculate the entropy as
    $$
    H(X) = - p_1 \log _2 p_1 - p_2 \log _2 p_2 - \ldots - p_m \log _2 p_m = - \sum\limits_{j = 1}^m {p_j \log _2 p_j }
    $$
    where H represents the entropy and $p_j$ represents the probability of the jth value of variable X. Let’s store these values in the entropies array.
  4. Special Conditional Entropy, $H\left( {Y|X = x_i } \right)$, is defined as the entropy of the Y variable values for which the corresponding X variable values equal $x_i$. For our given example, let’s calculate H(d|a=1): searching through the data, we see that a=1 in rows {1,3,4,7,10}, and the values of d in these rows are {1,0,0,0,1}. Let’s treat this set ({1,0,0,0,1}) as a brand new set and go on from there. Applying the definition of the entropy given in the 3rd step, we have:
    $$
    p\left( {\left\{ {0,1} \right\}} \right) = \left\{ {\tfrac{3}{5},\tfrac{2}{5}} \right\}
    $$
    $$
    H\left( {d|a = 1} \right) = - \sum\limits_{j = 1}^2 {p_j \log _2 p_j } = - \tfrac{3}{5}\log _2 \tfrac{3}{5} - \tfrac{2}{5}\log _2 \tfrac{2}{5} = 0.970951
    $$
  5. Conditional Entropy, $H\left( {Y|X} \right)$, is the entropy taken over all the possible values of the variable being tested (X) for its relation to the targeted variable (Y). It is calculated as:
    $$
    H\left( {Y|X} \right) = \sum\limits_j {p\left( {X = x_j } \right)H\left( {Y|X = x_j } \right)}
    $$
    Again, for our example: we have H(d|a=1) = 0.970951 and p(a=1) = 0.5. Calculating H(d|a=0) in a similar way as in step 4, and using
    $$
    \mathop {\lim }\limits_{x \to 0} x\log _2 \left( x \right) = 0,
    $$
    we find H(d|a=0) = 0, so H(d|a) = 0.5 x 0.970951 + 0.5 x 0 = 0.4854755.
  6. Information Gain, $IG\left( {Y|X} \right)$, is a measure of the effect a variable (X) has on a targeted variable (Y), with the entropy of the targeted variable itself (H(Y)) taken into account. It is calculated as:
    $$
    IG\left( {Y|X} \right) = H\left( Y \right) - H\left( {Y|X} \right)
    $$
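Steps 3–6 can be sketched compactly in Python (our own translation of the procedure, not the post’s Mathematica sheet; the 10-row dataset below is a hypothetical reconstruction consistent with the worked numbers — a = 1 in rows {1,3,4,7,10} and d = “a AND b”):

```python
# Entropy, conditional entropy and information gain, as defined above.
from math import log2
from collections import Counter

def entropy(xs):
    """H(X) = -sum_j p_j log2 p_j over the observed values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def cond_entropy(ys, xs):
    """H(Y|X) = sum_j p(X = x_j) H(Y | X = x_j)."""
    n = len(xs)
    return sum((xs.count(v) / n) * entropy([y for x, y in zip(xs, ys) if x == v])
               for v in set(xs))

def info_gain(ys, xs):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - cond_entropy(ys, xs)

# Hypothetical data consistent with the article's example (b is arbitrary
# where a = 0, since only d = a AND b matters there).
a = [1,0,1,1,0,0,1,0,0,1]
b = [1,0,0,0,1,1,0,1,1,1]
d = [ai & bi for ai, bi in zip(a, b)]

print(cond_entropy(d, a))  # ~ 0.4854755, as computed in step 5
print(info_gain(d, a))     # ~ 0.236453, as computed in step 6
```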

For our example, H(d) = 0.721928 and we calculated H(d|a) = 0.4854755, so IG(d|a) = 0.236453. Summing up, for the data presented in step 1 (10 values per variable), we have the following Information Gain table:

X    IG(a|X)      IG(b|X)      IG(c|X)      IG(d|X)      IG(e|X)
a    –            0.0290494    0.124511     0.236453     1.0
b    0.0290494    –            0.0          0.236453     0.0290494
c    0.124511     0.0          –            0.00740339   0.124511
d    0.236453     0.236453     0.00740339   –            0.236453
e    1.0          0.0290494    0.124511     0.236453     –

And here are the results for 1000 values per variable instead of 10:

X    IG(a|X)       IG(b|X)       IG(c|X)       IG(d|X)       IG(e|X)
a    –             0.000716169   0.000500799   0.327922      0.996257
b    0.000716169   –             0.00224924    0.29628       0.000716169
c    0.000500799   0.00224924    –             0.000982332   0.000500799
d    0.327922      0.29628       0.000982332   –             0.327922
e    0.996257      0.000716169   0.000500799   0.327922      –

What can we say about the relationships from these Information Gain results?

  • IG(Y|X) = IG(X|Y) – the tables are symmetric. Is this valid only for this specific case, or in general? (In general: IG(Y|X) = H(Y) - H(Y|X) is the mutual information between X and Y, which is symmetric.)
  • Variable a: We can almost certainly guess a if we know e, and it is somewhat correlated with d.
  • Variable b: It is correlated with d.
  • Variable c: There doesn’t seem to be any correlation.
  • Variable d: It is equally correlated with a & e, and its correlation with b is very close to that.
  • Variable e: It is VERY strongly correlated with a, and it is somewhat correlated with d.

Let’s take it on from here, with the variable d:

This is the IG table for 10,000 data points per variable:

X IG(d|X)
a 0.30458
b 0.297394
c 0.00014876
e 0.30458

This is the IG table for 100,000 data points per variable:

X IG(d|X)
a 0.309947
b 0.309937
c 1.97347×10^-6
e 0.309947
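These values appear to converge toward a theoretical limit, which we can compute directly (our own check, assuming a and b are fair, independent coin tosses and d = “a AND b”):

```python
# p(d=1) = 1/4; given a = 1, d equals b, so H(d|a=1) = 1 bit; H(d|a=0) = 0.
from math import log2

H_d  = -(0.25 * log2(0.25) + 0.75 * log2(0.75))  # H(d) ~ 0.811278
H_da = 0.5 * 1 + 0.5 * 0                          # H(d|a) = 0.5
print(H_d - H_da)                                 # IG(d|a) ≈ 0.311278
```

The 100,000-sample estimate of 0.309947 is already close to this limiting value of 0.311278.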

You can download the Mathematica sheet for this example : Information Gain Example

You will also need the two libraries (Matrix & DataMining) to run the example.
