Building Distributed TensorFlow Models Series: Feature Engineering

See this article:

https://zhuanlan.zhihu.com/p/41663141

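In TensorFlow, feature columns are created through the tf.feature_column module. There are two broad kinds of feature column: Dense Columns, which produce a dense tensor, and Categorical Columns, which produce a sparse tensor. The feature columns that TensorFlow currently provides are shown in the figure below.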

We have already used indicator_column to convert the sparse tensor produced by a categorical column into a dense tensor in one-hot or multi-hot form.

Passing the output of categorical_column_with_vocabulary_list directly to input_layer raises an error.

The error means that a categorical column must first be converted to a DenseColumn, which can be done by wrapping it in an embedding column or an indicator column.
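The conversion from raw values to a dense vector can be sketched in plain Python. This is only an illustration of the semantics, not TensorFlow's implementation: the function names `vocabulary_lookup` and `indicator` and the vocabulary `["R", "G", "B"]` are assumptions made for this example. The first step mirrors what a vocabulary-list categorical column produces (integer IDs, with -1 for unknown values), and the second mirrors what an indicator column does with those IDs.

```python
def vocabulary_lookup(values, vocabulary, default_value=-1):
    """Map each raw value to its index in the vocabulary.

    Values that are not in the vocabulary get default_value (-1 here).
    """
    index = {token: i for i, token in enumerate(vocabulary)}
    return [index.get(value, default_value) for value in values]


def indicator(ids, vocabulary_size):
    """Turn a list of IDs into a dense multi-hot vector.

    IDs outside [0, vocabulary_size) are skipped, so -1 adds nothing.
    An ID that appears more than once is counted each time.
    """
    dense = [0.0] * vocabulary_size
    for i in ids:
        if 0 <= i < vocabulary_size:
            dense[i] += 1.0
    return dense


vocabulary = ["R", "G", "B"]          # hypothetical vocabulary
ids = vocabulary_lookup(["G", "A", "B", "G"], vocabulary)
multi_hot = indicator(ids, len(vocabulary))
print(ids)        # [1, -1, 2, 1]
print(multi_hot)  # [0.0, 2.0, 1.0]
```

The intermediate list of IDs plays the role of the sparse tensor; only the dense vector produced by the second step is the kind of input that input_layer accepts.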

Indicator and embedding columns

Indicator columns and embedding columns cannot be applied to raw features directly; they are applied to categorical columns.

The difference between an embedding column and an indicator column is illustrated in the figure below.

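The test results above show that the value 'A', which is not in the vocabulary, is mapped to the default value -1 by categorical_column_with_vocabulary_list, and that the default value -1 is then mapped to a zero vector by the embedding column. This is a very useful property: a variable-length ID sequence can be padded with -1 to a fixed length, and after the embedding column the padded -1 values do not change the original result. In the next article I will demonstrate this property with an example.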
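The padding property can be checked with a small plain-Python sketch. It is a simplified model under stated assumptions, not the TensorFlow code: the `embed` function and the 3x2 embedding table are hypothetical, and the sketch assumes that negative IDs are dropped before the looked-up vectors are combined, with "mean" as the combiner.

```python
def embed(ids, table, combiner="mean"):
    """Combine the embedding vectors of the valid IDs in one example.

    Negative IDs (such as the -1 padding value) are dropped first.
    If nothing is left, the result is a zero vector.
    """
    dim = len(table[0])
    valid = [i for i in ids if i >= 0]
    if not valid:
        return [0.0] * dim
    total = [sum(table[i][d] for i in valid) for d in range(dim)]
    if combiner == "mean":
        return [x / len(valid) for x in total]
    return total  # "sum" combiner


# Hypothetical embedding table: 3 IDs, 2 dimensions each.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

unpadded = embed([0, 2], table)
padded = embed([0, 2, -1, -1], table)   # same sequence padded to length 4
only_padding = embed([-1, -1], table)

print(unpadded)      # [3.0, 4.0]
print(padded)        # [3.0, 4.0]
print(only_padding)  # [0.0, 0.0]
```

Because the padded positions are removed before averaging, they neither add to the sum nor change the divisor, which is why the padded and unpadded sequences give the same vector.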

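Several features may need to share the same embedding space, for example the item IDs in a user's behavior history and the candidate item ID. In that case tf.feature_column.shared_embedding_columns can be used.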
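The idea of a shared embedding space can be pictured as two features looking up vectors in one and the same table. The sketch below is plain Python written only to show that idea; `lookup`, `shared_table`, and the example IDs are hypothetical and do not come from the TensorFlow API.

```python
def lookup(ids, table):
    """Return the embedding vector for each valid (non-negative) ID."""
    return [table[i] for i in ids if i >= 0]


# One hypothetical table for all item IDs: 3 items, 2 dimensions each.
shared_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Feature 1: item IDs from the user's behavior history, padded with -1.
history_vectors = lookup([0, 2, -1], shared_table)
# Feature 2: the candidate item ID.
candidate_vectors = lookup([2], shared_table)

# Item 2 resolves to the very same vector in both features.
print(history_vectors[1] is candidate_vectors[0])  # True
```

With a single table, any update to an item's vector during training is seen by both features, so an item in the history and the same item as a candidate stay directly comparable.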

Weighted categorical column

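Sometimes a categorical feature needs to carry a weight. For example, each item in a user behavior sequence can be weighted by the time elapsed between the behavior and some reference time. This is where weighted_categorical_column can be used.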
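As a rough illustration of what attaching weights means, the plain-Python sketch below builds a weighted multi-hot vector: each ID contributes its weight instead of 1. The function `weighted_indicator` and the example IDs and weights are assumptions for this example, and the sketch models only the effect of combining a weighted categorical feature with an indicator-style encoding, not the TensorFlow internals.

```python
def weighted_indicator(ids, weights, vocabulary_size):
    """Build a dense vector where each ID adds its own weight.

    ids and weights are parallel lists for one example.
    IDs outside [0, vocabulary_size) are skipped.
    """
    dense = [0.0] * vocabulary_size
    for i, w in zip(ids, weights):
        if 0 <= i < vocabulary_size:
            dense[i] += w
    return dense


# Hypothetical behavior sequence: item 2 appears twice,
# and more recent behaviors are given larger weights.
ids = [0, 2, 2]
weights = [0.5, 1.0, 0.25]

weighted = weighted_indicator(ids, weights, vocabulary_size=4)
print(weighted)  # [0.5, 0.0, 1.25, 0.0]
```

How the weights are derived from the time difference (for instance a decay function) is left open here, since the text above does not specify one.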
