Re: help constructing decision tree

From: Lennart Jonsson <lennart_at_kommunicera.umea.se>
Date: 16 Oct 2001 11:47:56 -0700
Message-ID: <6dae7e65.0110161047.7f097ab9_at_posting.google.com>


mark_li_at_my-deja.com (Mark Li) wrote in message news:<bbcb687b.0110151818.6f277bea_at_posting.google.com>...
> Hi,
>
> I need help in constructing a decision tree for something like the
> following data, and I need to predict SALARY (using ID3).
>
> What I really need to know is how I calculate the entropy for each of
> the attributes DEPARTMENT, STATUS and AGE. I can only find examples
> calculating the entropy where there are only two results to deal with
> (eg. PLAY and NOT PLAY, SUNBURNT and NON-SUNBURNT). In this case, the
> end results are 25-30K, 30-35K, 35-40K, etc.
>
>
> Department  Status  Age    Salary  Count
>
> sales       senior  31-35  45-50K  30
> sales       junior  26-30  25-30K  40
> sales       junior  31-35  30-35K  40
> systems     junior  21-25  45-50K  20
> systems     senior  31-35  65-70K  5
> systems     junior  26-30  45-50K  3
> systems     senior  41-45  65-70K  3
> marketing   senior  36-40  45-50K  10
> marketing   junior  31-35  40-45K  4
> secretary   senior  46-50  35-40K  4
> secretary   junior  26-30  25-30K  6
>
>
> TIA.
Without remembering very much about entropy and ID3, couldn't you do something like this?

select department,
       log((1.0 * count(case when salary = 25 then 1 else null end)) / count(1))
     + log((1.0 * count(case when salary = 30 then 1 else null end)) / count(1))
     + ...
     + log((1.0 * count(case when salary = 65 then 1 else null end)) / count(1))
from staff
group by department

The log expression is just a placeholder, since I remember that entropy had something to do with log, but that's about it :-). If log really is involved, I assume one would have to prevent log(0) from being calculated somehow.
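
For reference, the usual entropy definition is not limited to two outcomes: H = -sum(p_i * log2(p_i)), taken over every salary band, where p_i is the fraction of rows in band i. Bands with a zero count are simply skipped, which sidesteps log(0). ID3 then prefers the attribute with the largest information gain, i.e. the overall entropy minus the count-weighted entropy of each attribute value. Below is a rough Python sketch of that calculation on the posted table. The names (ROWS, entropy, salary_counts, information_gain) are made up for illustration, and treating the Count column as a row weight is an assumption about what the poster intended.

```python
import math
from collections import defaultdict

# (department, status, age, salary, count) copied from the posted table.
# Assumption: "count" is the number of people matching that row.
ROWS = [
    ("sales", "senior", "31-35", "45-50K", 30),
    ("sales", "junior", "26-30", "25-30K", 40),
    ("sales", "junior", "31-35", "30-35K", 40),
    ("systems", "junior", "21-25", "45-50K", 20),
    ("systems", "senior", "31-35", "65-70K", 5),
    ("systems", "junior", "26-30", "45-50K", 3),
    ("systems", "senior", "41-45", "65-70K", 3),
    ("marketing", "senior", "36-40", "45-50K", 10),
    ("marketing", "junior", "31-35", "40-45K", 4),
    ("secretary", "senior", "46-50", "35-40K", 4),
    ("secretary", "junior", "26-30", "25-30K", 6),
]
ATTRIBUTES = {"department": 0, "status": 1, "age": 2}


def entropy(class_counts):
    """Shannon entropy in bits for any number of classes."""
    total = sum(class_counts)
    # Zero counts are skipped, so log2(0) is never evaluated.
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)


def salary_counts(rows):
    """Total count per salary band for the given rows."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[3]] += row[4]
    return list(counts.values())


def information_gain(rows, attr_index):
    """Overall salary entropy minus the weighted entropy after splitting."""
    total = sum(row[4] for row in rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row)
    remainder = sum(
        sum(r[4] for r in group) / total * entropy(salary_counts(group))
        for group in groups.values()
    )
    return entropy(salary_counts(rows)) - remainder


print("entropy(salary) =", entropy(salary_counts(ROWS)))
for name, index in ATTRIBUTES.items():
    print("gain(%s) = %f" % (name, information_gain(ROWS, index)))
```

The attribute with the highest printed gain would become the root of the tree, and the same calculation would then be repeated on each subset of rows.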

Hope it helps
/Lennart

Received on Tue Oct 16 2001 - 20:47:56 CEST
