越览(183)——精读期刊论文《Elasticity unleashed》的3.3和3.4


摘要:This issue introduces Section 3.3 (Multi-head attention layer) and Section 3.4 (Multi-head attention from a decision-making perspective) of the paper "Elasticity unleashed: Fine-grained cloud scaling through distributed three-way decision fusion with multi-head attention".

分享兴趣,传播快乐,

增长见闻,留下美好。

亲爱的您,这里是LearningYard学苑!

今天小编为您带来“越览(183)——精读期刊论文

《Elasticity unleashed: Fine-grained cloud scaling

through distributed three-way decision

fusion with multi-head attention》的

3.3多头注意力层和3.4从决策角度看多头注意力”。

欢迎您的访问!

Share interest, spread happiness,

increase knowledge, and leave beauty behind.

Dear friend, this is LearningYard Academy!

Today, the editor brings you

"Yue Lan (183): Intensive reading of the journal article

'Elasticity unleashed: Fine-grained cloud scaling

through distributed three-way decision

fusion with multi-head attention',

3.3 Multi-head attention layer and

3.4 Multi-head attention from

a decision-making perspective".

We welcome your visit!

一、内容摘要(Summary of Content)

本期推文将从思维导图、精读内容、知识补充三个方面介绍精读期刊论文《Elasticity unleashed: Fine-grained cloud scaling through distributed three-way decision fusion with multi-head attention》的3.3多头注意力层和3.4从决策角度看多头注意力。

This issue presents an intensive reading of Sections 3.3 (Multi-head attention layer) and 3.4 (Multi-head attention from a decision-making perspective) of "Elasticity unleashed: Fine-grained cloud scaling through distributed three-way decision fusion with multi-head attention", from three aspects: a mind map, the intensive reading content, and a knowledge supplement.

二、思维导图(Mind map)

三、精读内容(Intensive reading content)

(一)多头注意力层(Multi-head attention layer)

这一部分主要阐述了基于多粒度时间序列的多头注意力建模与决策生成过程。首先,从不同时间粒度提取的注意力向量经过线性变换,得到多头的查询、键和值向量。

This section mainly describes the multi-head attention modeling and decision generation process based on multi-granularity time series. First, the attention vectors extracted from different time granularities are linearly transformed to obtain multi-head query, key, and value vectors.

多头机制能够在并行的注意力空间中捕捉多样化的模式和信息。随后,每个注意力头通过查询与键的点积并经 softmax 运算,计算得到注意力权重,并与对应的值向量相乘。各头的输出结果在拼接后形成融合的注意力表示,从而综合了不同注意力头所关注的特征。

The multi-head mechanism captures diverse patterns and information in parallel attention spaces. Each attention head then computes attention weights by applying a softmax to the dot product of the query and key, and multiplies them by the corresponding value vectors. The outputs of all heads are concatenated to form a fused attention representation that integrates the features attended to by the different heads.
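下面给出上述逐头计算的一个极简 NumPy 示意(矩阵维度与投影参数均为便于说明的假设,并非论文的具体设置)。Below is a minimal NumPy sketch of the per-head computation described above; the dimensions and projection matrices are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: per-head projection matrices of
    shape (d_model, d_head). Returns the concatenation of all head outputs."""
    heads = []
    for q_proj, k_proj, v_proj in zip(Wq, Wk, Wv):
        Q, K, V = X @ q_proj, X @ k_proj, X @ v_proj   # linear transformations
        scores = Q @ K.T / np.sqrt(Q.shape[-1])        # query-key dot products
        weights = softmax(scores, axis=-1)             # attention weights
        heads.append(weights @ V)                      # weight the values
    return np.concatenate(heads, axis=-1)              # fused representation

# Example with 3 granularity vectors, d_model = 8, and 2 heads of size 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))
Wq, Wk, Wv = ([rng.normal(size=(8, 4)) for _ in range(2)] for _ in range(3))
fused = multi_head_attention(X, Wq, Wk, Wv)            # shape: (3, 8)
```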

接着,融合后的注意力表示被用作加权因子,对小时、天和周三个粒度的时间序列特征进行聚合,得到能够同时反映多尺度动态的综合表示。该动态表示进一步与静态特征拼接,形成增强的输入表示。最后,通过全连接层映射,模型输出三个不同的决策结果。

Next, the fused attention representation is used as a weighting factor to aggregate the time-series features at the hourly, daily, and weekly granularities, yielding a comprehensive representation that reflects multi-scale dynamics. This dynamic representation is then concatenated with the static features to form an enhanced input representation. Finally, a fully connected layer maps this representation to three distinct decision outputs.
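接下来的示意展示“以注意力权重聚合三种粒度特征、拼接静态特征、再经全连接层输出三个决策”的流程;权重向量的形式与各维度均为假设。The following sketch illustrates the aggregation-and-decision step: attention-derived weights combine the hourly, daily, and weekly features, the static features are appended, and a fully connected layer produces the three decision scores; the form of the weight vector and all dimensions are assumptions.

```python
import numpy as np

def aggregate_and_decide(attn_scores, f_hour, f_day, f_week, f_static, W_fc, b_fc):
    """attn_scores: one score per granularity (assumed form); the three
    granularity feature vectors share a common dimension."""
    w = attn_scores / attn_scores.sum()                     # normalize to weights
    dynamic = w[0] * f_hour + w[1] * f_day + w[2] * f_week  # multi-scale aggregation
    enhanced = np.concatenate([dynamic, f_static])          # enhanced input
    return W_fc @ enhanced + b_fc                           # three decision scores

rng = np.random.default_rng(1)
f_hour, f_day, f_week = (rng.normal(size=16) for _ in range(3))
f_static = rng.normal(size=4)
W_fc, b_fc = rng.normal(size=(3, 20)), np.zeros(3)          # 16 dynamic + 4 static
logits = aggregate_and_decide(np.array([0.5, 0.3, 0.2]),
                              f_hour, f_day, f_week, f_static, W_fc, b_fc)
```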

在训练过程中,采用交叉熵损失函数来衡量预测结果与真实标签之间的差异,并利用梯度下降方法不断优化参数,从而提升模型在多决策任务中的准确性与鲁棒性。

During training, a cross-entropy loss measures the difference between the predicted results and the true labels, and gradient descent iteratively optimizes the parameters, improving the model's accuracy and robustness on multi-decision tasks.
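训练步骤可用如下极简示意理解:交叉熵度量预测与标签的差异,梯度下降更新参数;此处仅演示输出层的一步更新,其余参数同理。A minimal sketch of the training step: cross-entropy measures the prediction error and one gradient-descent update is applied; only the output layer is shown here, and the other parameters follow the same pattern.

```python
import numpy as np

def cross_entropy(logits, label):
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # softmax probabilities
    return -np.log(p[label]), p

def sgd_step(W_fc, b_fc, enhanced, label, lr=0.01):
    """One gradient-descent update of the fully connected output layer."""
    logits = W_fc @ enhanced + b_fc
    loss, p = cross_entropy(logits, label)
    grad = p.copy()
    grad[label] -= 1.0                            # d(loss)/d(logits) for softmax-CE
    W_fc -= lr * np.outer(grad, enhanced)         # d(loss)/dW = grad ⊗ input
    b_fc -= lr * grad
    return loss
```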

(二)从决策角度看多头注意力(Multi-head attention from a decision-making perspective)

这一部分主要阐述了基于多头注意力机制的多粒度三向决策过程。多头注意力机制的核心目标是通过刻画输入特征矩阵 X 中元素之间的关系及其重要性来辅助决策。具体来说,设有 k 个注意力头,每个注意力头都会计算一个独立的注意力分布矩阵,其计算方式为:

This section mainly describes the multi-granularity three-way decision process based on the multi-head attention mechanism. The core goal of multi-head attention here is to support decision-making by characterizing the relationships among, and the importance of, the elements of the input feature matrix X. Specifically, given k attention heads, each head computes an independent attention distribution matrix as follows:
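(原文公式未能随文显示;根据上下文重构,其形式大致为 / The original formula did not carry over; reconstructed from the context, it is presumably)

$$W_i = \mathrm{softmax}\!\left(\frac{X X^{\top}}{d}\right), \quad i = 1, \dots, k$$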

其中 X^T 表示矩阵 X 的转置,d 为缩放因子。最终的综合注意力分布 W_combined 通过对所有注意力头的分布矩阵求和得到:

where X^T denotes the transpose of X and d is a scaling factor. The final combined attention distribution W_combined is obtained by summing the distribution matrices of all attention heads:
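(同样按上下文补全 / likewise completed from the context)

$$W_{\text{combined}} = \sum_{i=1}^{k} W_i$$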

在应用层面,模型需要处理不同时间粒度下的特征,如 CPU 利用率和内存使用率。为此,初始化三个空集合 R1,R2,R3 用于存储中间与最终的三向决策结果。算法基于规则或算法方法,对每一粒度 di∈D 进行迭代,并得到对应的三向决策 Δi。例如,Dh1 表示一小时粒度,Dd1 表示一天粒度。

At the application level, the model must handle features at different time granularities, such as CPU utilization and memory usage. To this end, three empty sets R1, R2, and R3 are initialized to store intermediate and final three-way decision results. Using rule-based or algorithmic methods, the algorithm iterates over each granularity di ∈ D and obtains the corresponding three-way decision Δi; for example, Dh1 denotes the one-hour granularity and Dd1 the one-day granularity.

在不同粒度的决策结果融合过程中,引入了多头注意力机制。具体而言,模型以当前的三向决策 Δ 及某一粒度下的决策 Δi 作为输入,通过类似 Transformer 的注意力机制对二者进行信息融合,从而更新最终的三向决策 Δ。最终,算法输出基于多头注意力融合的分布式三向决策,使得决策过程能够综合多粒度信息并具备上下文敏感性。

A multi-head attention mechanism is introduced to fuse decision results at different granularities. Specifically, the model takes the current three-way decision Δ and the decision Δi at a specific granularity as input and fuses the information between them using a Transformer-like attention mechanism to update the final three-way decision Δ. Ultimately, the algorithm outputs a distributed three-way decision based on multi-head attention fusion, enabling the decision-making process to integrate information from multiple granularities and be context-sensitive.
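下面给出该融合循环的一个可运行示意:为简洁起见,用单头注意力代替论文中的多头版本,函数名与粒度名均为假设。A runnable sketch of this fusion loop; for brevity, a single attention head stands in for the paper's multi-head version, and the function and granularity names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(delta, delta_i):
    """Transformer-like fusion of the current decision Δ with a per-granularity
    decision Δi; both are length-3 score vectors over (accept, defer, reject)."""
    stacked = np.stack([delta, delta_i])               # rows serve as Q, K and V
    weights = softmax(stacked @ stacked.T / np.sqrt(3.0), axis=-1)
    return (weights @ stacked)[0]                      # updated fused decision Δ

def distributed_three_way_fusion(per_granularity_scores):
    """per_granularity_scores: e.g. {"Dh1": ..., "Dd1": ...}; each value is a
    length-3 decision-score vector produced at that granularity."""
    delta = None
    for scores in per_granularity_scores.values():     # iterate over di ∈ D
        delta = scores if delta is None else attention_fuse(delta, scores)
    return delta                                       # final fused decision

scores = {"Dh1": np.array([0.6, 0.3, 0.1]), "Dd1": np.array([0.2, 0.5, 0.3])}
fused = distributed_three_way_fusion(scores)
```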

四、知识补充(Knowledge supplement)

多头注意力机制最早由 Transformer 模型提出,其优势在于能够在并行的多个子空间中学习输入特征之间的不同关系。相比单头注意力,多头机制可以从不同的表示维度捕捉多样化的模式,从而提升模型在复杂任务中的表现。其核心思想是通过对输入向量进行线性变换,得到多个查询(Q)、键(K)、值(V),并在每个注意力头中独立计算注意力分布,再将各头的结果进行融合,形成一个更全面的特征表示。

The multi-head attention mechanism was first introduced in the Transformer model. Its advantage is that it learns different relationships among input features in multiple parallel subspaces. Compared with single-head attention, the multi-head mechanism captures diverse patterns across different representation dimensions, improving performance on complex tasks. Its core idea is to linearly transform the input vectors into multiple queries (Q), keys (K), and values (V), compute the attention distribution independently within each head, and then combine the heads' results into a more comprehensive feature representation.

三向决策理论由 Yao(2010)提出,作为经典二向决策(接受/拒绝)的扩展,它引入了“延迟决策”这一中间选项,使得决策过程更符合不确定性和动态性环境下的认知规律。在多粒度时间序列分析中,三向决策可以分别对应不同时间尺度下的接受、拒绝与延迟操作,从而实现更加灵活的任务管理和资源调度。

The three-way decision theory, proposed by Yao (2010), extends the classic two-way decision model (accept/reject) by introducing the intermediate option of "delayed decision," making the decision process more consistent with cognitive laws in uncertain and dynamic environments. In multi-granularity time series analysis, the three-way decision model can correspond to acceptance, rejection, and delay operations at different time scales, enabling more flexible task management and resource scheduling.
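三向决策常用阈值形式表述:条件概率足够高则接受、足够低则拒绝、其余情况延迟;下例中的阈值仅为示意取值。Three-way decisions are often written in threshold form: accept when the conditional probability is high enough, reject when it is low enough, and defer otherwise; the thresholds below are illustrative values only.

```python
def three_way_decision(p, alpha=0.7, beta=0.3):
    """Yao-style three-way decision on a conditional probability p,
    with illustrative thresholds alpha > beta."""
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "defer"          # the delayed, uncertainty-preserving option

print(three_way_decision(0.85), three_way_decision(0.5), three_way_decision(0.1))
```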

将多头注意力与三向决策结合,可以在不同时间粒度下对决策信息进行动态加权和融合。一方面,注意力机制能够根据特征的重要性分配不同权重,实现自适应的信息整合;另一方面,三向决策框架提供了更丰富的决策表达能力,使得模型能够在面对不确定性时保留中间态,避免过早或过于武断的判断。这种结合能够显著提升分布式环境下的决策合理性与鲁棒性。

Combining multi-head attention with three-way decision-making allows for dynamic weighting and fusion of decision information at different time granularities. The attention mechanism assigns weights based on feature importance, enabling adaptive information integration. Furthermore, the three-way decision-making framework provides richer decision-making expressiveness, enabling the model to retain intermediate states in the face of uncertainty and avoid premature or overly arbitrary judgments. This combination significantly improves the rationality and robustness of decision-making in distributed environments.

今天的分享就到这里了。

如果您对文章有独特的想法,

欢迎给我们留言,让我们相约明天。

祝您今天过得开心快乐!

That's all for today's sharing.

If you have your own ideas about this article,

please leave us a message,

and let us meet tomorrow.

I wish you a nice day!

文案|yyz

排版|yyz

审核|hzy

翻译:谷歌翻译

参考资料:百度百科、ChatGPT

参考文献:Jiang C, Duan Y. Elasticity unleashed: Fine-grained cloud scaling through distributed three-way decision fusion with multi-head attention [J]. Information Sciences, 2024, 660(1): 1-15.

本文由LearningYard学苑整理发出,如有侵权请在后台留言!
