Decision Trees: Information Gain


Start with Ent. It helps to read page 75 of the Watermelon Book (西瓜书, Zhou Zhihua's Machine Learning) alongside this note.

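The order of work is: first compute the information entropy Ent of the dataset, then use it to compute the information gain Gain of each feature.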
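
For reference, the two quantities can be written out as below. This is a sketch of the standard definitions, assuming the usual notation: D is the sample set, p_k is the proportion of samples in class k, and D^v is the subset of D where feature a takes its v-th value. The last line applies the first formula to the toy dataset used in this note (2 'yes' and 3 'no' out of 5 samples).

```latex
\mathrm{Ent}(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k

\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)

\mathrm{Ent}(D) = -\left(\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right) \approx 0.971
```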

from math import log
# Dataset: each row holds two binary features followed by the class label
dataSet = [[1, 1, 'yes'],
           [1, 1, 'yes'],
           [1, 0, 'no'],
           [0, 1, 'no'],
           [0, 1, 'no']]
# Feature names, one per feature column
labels = ['no surfacing', 'flippers']

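# Compute the Shannon entropy (in bits) of the class labels in dataSet
def calcShannonEnt(dataSet):
    dataNum = len(dataSet)
    # Count the samples in each class; the label is the last column
    dict1 = {}
    for data in dataSet:
        label = data[-1]
        if label not in dict1:
            dict1[label] = 0
        dict1[label] += 1
    shannonEnt = 0.0
    for key in dict1:
        # Proportion of samples that belong to this class
        pi = dict1[key] / dataNum
        # log(pi, 2) is the base-2 logarithm of pi
        shannonEnt -= pi * log(pi, 2)
    return shannonEnt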


result = calcShannonEnt(dataSet)
print(result)

0.9709505944546686
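
With Ent in hand, the next step is Gain. The block below is a minimal sketch of that step, not code from the original note: `entropy` and `info_gain` are hypothetical helper names, and the dataset is repeated so the block runs on its own. It splits the rows by the value of one feature, weights each subset's entropy by its share of the rows, and subtracts that from the entropy of the whole set.

```python
from collections import Counter
from math import log2

# Same toy data as above; the last column is the class label.
data_set = [[1, 1, 'yes'],
            [1, 1, 'yes'],
            [1, 0, 'no'],
            [0, 1, 'no'],
            [0, 1, 'no']]


def entropy(rows):
    """Shannon entropy (in bits) of the class labels in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def info_gain(rows, feature_index):
    """Hypothetical helper: Ent(D) minus the weighted entropy of the subsets D^v."""
    total = len(rows)
    # Group the rows by the value they take on the chosen feature.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature_index], []).append(row)
    # Weight each subset's entropy by the fraction of rows it contains.
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(rows) - weighted


gains = [info_gain(data_set, i) for i in range(2)]
for i, g in enumerate(gains):
    print(i, round(g, 3))
```

On this data the first feature ('no surfacing') gives a gain of about 0.420 and the second ('flippers') about 0.171, so a tree built by information gain would split on the first feature first.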