SIMS, MOSI, MOSEI


Contents
  • SIMS: Chinese multimodal sentiment analysis dataset
    • Label
    • Features
    • Dataset structure
    • Statistics
  • MOSI: English multimodal sentiment analysis dataset
    • Label
    • Features
    • Dataset structure
    • Statistics
  • MOSEI
    • Label
    • Feature Extraction
    • Dataset structure
    • Statistics

SIMS: Chinese multimodal sentiment analysis dataset

Label

**Sentiment state**

| emotion | label |
| --- | --- |
| negative | -1 |
| neutral | 0 |
| positive | 1 |

**Regression task:** the five annotators' labels are averaged, so each regression label takes one of the values
{-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0}.

These values can also be divided into five classes:

| emotion | label values |
| --- | --- |
| negative | {-1.0, -0.8} |
| weakly negative | {-0.6, -0.4, -0.2} |
| neutral | {0.0} |
| weakly positive | {0.2, 0.4, 0.6} |
| positive | {0.8, 1.0} |
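
For illustration, the mapping from an averaged regression score to one of the five classes can be written as a few threshold checks. This is a minimal sketch of the binning implied by the table above, not necessarily the exact rule used to produce the classification_labels field:

def sims_five_class(score: float) -> str:
    """Map an averaged SIMS regression score in [-1, 1] to a 5-class label."""
    if score <= -0.7:        # {-1.0, -0.8}
        return 'negative'
    elif score <= -0.1:      # {-0.6, -0.4, -0.2}
        return 'weakly negative'
    elif score < 0.1:        # {0.0}
        return 'neutral'
    elif score < 0.7:        # {0.2, 0.4, 0.6}
        return 'weakly positive'
    else:                    # {0.8, 1.0}
        return 'positive'

print(sims_five_class(-0.4))  # weakly negative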

Features

Text

BERT-base word embeddings (768-dimensional word vectors).
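
As an illustration, token-level 768-dimensional embeddings of this kind could be obtained roughly as follows, assuming the HuggingFace transformers library and the bert-base-chinese checkpoint (the released feature files were produced by the dataset authors, so this is only a sketch):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')  # assumption: Chinese BERT-base
model = BertModel.from_pretrained('bert-base-chinese')
model.eval()

text = '闭嘴,不是来抓你的。'
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (1, seq_len, 768) token-level word vectors
print(outputs.last_hidden_state.shape)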

Audio

The LibROSA speech toolkit is used with default parameters to extract acoustic features at a 22050 Hz sampling rate.
In total, 33-dimensional frame-level acoustic features are extracted, including the 1-dimensional logarithmic fundamental frequency (log F0), 20-dimensional Mel-frequency cepstral coefficients (MFCCs), and 12-dimensional Constant-Q chromagram (CQT).
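
A rough librosa sketch of comparable frame-level features is shown below; the paper only states that default parameters were used, so the specific function calls (pyin for F0, chroma_cqt for CQT) and the placeholder file name are assumptions:

import numpy as np
import librosa

y, sr = librosa.load('segment.wav', sr=22050)               # 'segment.wav' is a placeholder path

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)          # (20, frames)
cqt = librosa.feature.chroma_cqt(y=y, sr=sr, n_chroma=12)   # (12, frames)
f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                        fmax=librosa.note_to_hz('C7'))      # (frames,), NaN where unvoiced
log_f0 = np.log(np.nan_to_num(f0, nan=1.0))[None, :]        # 1-dimensional log F0

# Trim to a common frame count and stack into (frames, 33)
frames = min(mfcc.shape[1], cqt.shape[1], log_f0.shape[1])
features = np.concatenate([log_f0[:, :frames], mfcc[:, :frames], cqt[:, :frames]], axis=0).T
print(features.shape)  # (frames, 33)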

Vision

Frames are extracted from the video segments at 30 Hz.
The MTCNN face detection algorithm is used to extract aligned faces.
The MultiComp OpenFace 2.0 toolkit is then used to extract 68 facial landmarks, 17 facial action units, head pose, head orientation, and eye gaze. In total, 709-dimensional frame-level visual features are extracted.

Dataset structure

import pickle
import numpy as np

with open('data/SIMS/unaligned_39.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])
print(data['train'].keys())
output:
dict_keys(['raw_text', 'text_bert', 'audio_lengths', 'vision_lengths', 'classification_labels', 'regression_labels', 'classification_labels_T', 'regression_labels_T', 'classification_labels_A', 'regression_labels_A', 'classification_labels_V', 'regression_labels_V', 'text', 'audio', 'vision', 'id'])
print(data['train']['raw_text'][0])
output:
闭嘴,不是来抓你的。
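It can also help to print the array shapes of each modality; the exact dimensions depend on the pickle version, so they are not listed here:

for key in ['text', 'text_bert', 'audio', 'vision']:
    print(key, data['train'][key].shape)
print('regression label example:', data['train']['regression_labels'][0])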
Reading the per-split data:
use_bert = True  # choose BERT token inputs or pre-extracted word vectors for the text modality
for mode in ['train', 'valid', 'test']:
    # convert features to float32
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    raw_text = data[mode]['raw_text']
    ids = data[mode]['id']
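
These fields map naturally onto a PyTorch Dataset. Below is a minimal sketch assuming PyTorch is installed; the class name SIMSDataset and the returned dictionary keys are illustrative rather than part of the original pickle or any particular codebase:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SIMSDataset(Dataset):
    """Wrap one split ('train', 'valid', or 'test') of the loaded pickle."""
    def __init__(self, data, mode='train', use_bert=True):
        split = data[mode]                       # reuses the `data` dict loaded above
        key = 'text_bert' if use_bert else 'text'
        self.text = split[key].astype(np.float32)
        self.audio = split['audio'].astype(np.float32)
        self.vision = split['vision'].astype(np.float32)
        self.labels = split['regression_labels'].astype(np.float32)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {
            'text': torch.from_numpy(self.text[idx]),
            'audio': torch.from_numpy(self.audio[idx]),
            'vision': torch.from_numpy(self.vision[idx]),
            'label': torch.tensor(self.labels[idx]),
        }

loader = DataLoader(SIMSDataset(data, 'train'), batch_size=32, shuffle=True)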

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
1368
456
457

MOSI: English multimodal sentiment analysis dataset

Label

| emotion | label |
| --- | --- |
| strongly positive | +3 |
| positive | +2 |
| weakly positive | +1 |
| neutral | 0 |
| weakly negative | -1 |
| negative | -2 |
| strongly negative | -3 |
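
The annotated sentiment intensity is a continuous score in [-3, +3]. One common convention for recovering the seven classes above from a regression score is rounding and clipping; this sketch shows that convention, which may differ from the rule used to build the classification_labels field:

import numpy as np

def mosi_seven_class(score: float) -> int:
    """Round a regression score in [-3, 3] to the nearest integer class (NumPy rounds ties to even)."""
    return int(np.clip(np.round(score), -3, 3))

print(mosi_seven_class(2.4))   # 2  -> positive
print(mosi_seven_class(-0.6))  # -1 -> weakly negative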

Features

Audio and visual features are automatically extracted from the MPEG files, at a frame rate of 1000 Hz for audio and 30 Hz for video.

Visual

16 facial action units, 68 facial landmarks, head pose and orientation, 6 basic emotions, and eye gaze.

Audio

COVAREP features: pitch, energy, NAQ (normalized amplitude quotient), MFCCs (Mel-frequency cepstral coefficients), peak slope, and energy slope.

Dataset structure

import pickle
import numpy as np

with open('data/MOSI/aligned_50.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])
print(data['train'].keys())
output:
dict_keys(['raw_text', 'audio', 'vision', 'id', 'text', 'text_bert', 'annotations', 'classification_labels', 'regression_labels'])
print(data['train']['raw_text'][0])
output:
A LOT OF SAD PARTS
Reading the per-split data:
use_bert = True
for mode in ['train', 'valid', 'test']:
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    raw_text = data[mode]['raw_text']
    ids = data[mode]['id']

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
1284
229
686

MOSEI

Label

| emotion | label |
| --- | --- |
| strongly positive | +3 |
| positive | +2 |
| weakly positive | +1 |
| neutral | 0 |
| weakly negative | -1 |
| negative | -2 |
| strongly negative | -3 |

Feature Extraction

Text

All videos have manual transcriptions; GloVe word embeddings are used for the word representations.
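
As a small illustration, GloVe vectors for a transcript could be looked up as follows; the file name glove.840B.300d.txt and the 300-dimensional choice are assumptions, not stated in the source:

import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove('glove.840B.300d.txt')  # assumed local path
sentence = 'a lot of sad parts'.split()
embedded = np.stack([glove.get(w, np.zeros(300, dtype=np.float32)) for w in sentence])
print(embedded.shape)  # (5, 300)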

Visual

Frames are extracted from the full videos at 30 Hz.

The bounding box of the face is extracted using the MTCNN face detection algorithm.

Facial action units are extracted according to the Facial Action Coding System (FACS).

A set of six basic emotions is predicted purely from static faces using Emotient FACET.

MultiComp OpenFace is used to extract the set of 68 facial landmarks, 20 facial shape parameters, facial HoG features, head pose, head orientation, and eye gaze.

Face embeddings are extracted from commonly used facial recognition models such as DeepFace, FaceNet, and SphereFace.

Acoustic

The COVAREP software is used to extract acoustic features including 12 Mel-frequency cepstral coefficients, pitch, voiced/unvoiced segmenting features, glottal source parameters, peak slope parameters, and maxima dispersion quotients.

Dataset structure

import pickle
import numpy as np

with open('data/MOSEI/aligned_50.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])
print(data['train'].keys())
output:
dict_keys(['raw_text', 'audio', 'vision', 'id', 'text', 'text_bert', 'annotations', 'classification_labels', 'regression_labels'])
print(data['train']['raw_text'][0])
output:
Key is part of the people that we use to solve those issues, whether it's stretch or outdoor resistance or abrasions or different technical aspects that we really need to solve to get into new markets, they've been able to bring solutions.
Reading the per-split data:
use_bert = True
for mode in ['train', 'valid', 'test']:
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    raw_text = data[mode]['raw_text']
    ids = data[mode]['id']

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
16326
1871
4659