越览(134)——精读期刊论文的3案例分析(2)


分享兴趣,传播快乐,

增长见闻,留下美好。

亲爱的您,这里是LearningYard学苑!

今天小编为您带来“越览(134)——精读期刊论文

《基于共词网络的群智知识挖掘方法

——在应急决策中应用》的

3案例分析(2)”。

欢迎您的访问!

Share interest, spread happiness,

increase knowledge, and leave behind beauty.

Dear friend, this is the LearningYard Academy!

Today, the editor brings the

"Yue Lan (134):Intensive reading of the journal article

'Crowd intelligence knowledge mining method

based on co-word network– application

in emergency decision-making’

3 Case study(2)".

Welcome to visit!

一、内容摘要(Summary of Content)

本期推文将从思维导图、精读内容、知识补充三个方面介绍精读复刻论文《基于共词网络的群智知识挖掘方法——在应急决策中应用》的3案例分析(2)。

This issue introduces Part 2 of the case study (Section 3) of the intensively read and replicated paper "Crowd intelligence knowledge mining method based on co-word network – application in emergency decision-making" in terms of mind mapping, intensive reading content, and knowledge supplementation.

二、思维导图(Mind mapping)

三、精读内容(Intensive reading content)

本节以中国武汉COVID-19应急事件为例,依托新浪微博为数据获取渠道,对所提出的群智知识挖掘方法进行应用与分析。

This section takes the COVID-19 emergency in Wuhan, China as an example, using Sina Weibo as the data acquisition channel, to apply and analyze the proposed crowd intelligence knowledge mining method.

步骤一:使用Python-Jieba分词工具对博文进行分词处理,去除停用词后获得108,414个有效术语。选取词频Top 500的术语并生成词云图。通过Sk-learn框架计算TF-IDF矩阵,得到25,006个特征术语。最后,采用头尾断裂法筛选出476个有效术语,作为共词网络构建的特征关键词。

Step 1: Use the Python jieba segmentation tool to segment the posts, obtaining 108,414 valid terms after removing stop words. Select the 500 most frequent terms and generate a word cloud. Compute the TF-IDF matrix with the scikit-learn framework to obtain 25,006 feature terms. Finally, apply the head/tail breaks method to select 476 valid terms as the feature keywords for building the co-word network.

步骤二:通过专家打分法确定转发、评论和点赞的权重,使用sigmoid映射函数计算UGCs的影响力,避免高交互数据博文主导影响力分布。基于数据预处理结果,提取共现信息并使用Word2Vec计算共词对的语义相似度,结合共现情况构建共词矩阵。最后,分析共现强度与博文影响力及语义相似度的关系,并比较不同共现强度定义下的网络模块度差异。

Step 2: Determine the weights of reposts, comments, and likes by expert scoring, and use a sigmoid mapping function to calculate the influence of UGCs, preventing posts with very high interaction counts from dominating the influence distribution. Based on the preprocessing results, extract co-occurrence information, use Word2Vec to calculate the semantic similarity of co-word pairs, and combine it with the co-occurrence counts to build the co-word matrix. Finally, analyze the relationship between co-occurrence strength, post influence, and semantic similarity, and compare the network modularity under different definitions of co-occurrence strength.

步骤三:使用Louvain算法对共词网络进行社区检测,识别出群智知识主题社区,并通过关键词归纳人工标识主题名称。实验结果显示,大多数关键词节点映射到与疫情危机相关的主题社区。

Step 3: Use the Louvain algorithm to perform community detection on the co-word network, identify the crowd intelligence knowledge topic communities, and manually label each topic by summarizing its keywords. The experimental results show that most keyword nodes map to topic communities related to the epidemic crisis.

步骤四:根据UGCs特征表示结果,识别各社区的关键词及相关博文UGCs,并测量群智知识价值。专家群体经过多轮讨论确定了四个应急方案选择的主题因素:人员伤亡(c1)、二次危机(c2)、社会影响(c3)和应急效果(c4)。通过相应公式计算得到基于群智知识的客观属性权重。

Step 4: Based on the UGC feature representation results, identify the keywords and related UGCs of each community, and measure the value of the crowd intelligence knowledge. After multiple rounds of discussion, the expert group identified four thematic factors for emergency plan selection: casualties (c1), secondary crisis (c2), social impact (c3), and emergency effect (c4). The objective attribute weights based on crowd intelligence knowledge were then calculated with the corresponding formulas.

步骤五:专家群体确定决策属性后,通过协商得出群体属性偏好,并计算一致性指标为0.9375,最终应急决策属性权重为{0.401, 0.197, 0.252, 0.150}。20位专家评估5个备选方案,并加权得出最终得分,最终选定X5为最佳应急方案。

Step 5: After the expert group determined the decision attributes, the group attribute preferences were obtained through negotiation, with a consistency index of 0.9375; the final emergency decision attribute weights are {0.401, 0.197, 0.252, 0.150}. Twenty experts evaluated the 5 alternatives, and the weighted final scores identified X5 as the best emergency plan.

最后针对基于公众行为大数据分析的群智知识挖掘方法,选择了两种相关方法进行比较,验证所提方法的有效性。通过比较属性权值排序结果和一致性,发现不同方法得到的属性权值存在差异。

Finally, to verify the effectiveness of the proposed crowd intelligence knowledge mining method based on public-behavior big data analysis, two related methods were selected for comparison. Comparing the attribute-weight rankings and their consistency shows that the attribute weights obtained by the different methods differ.

方法1和方法3的排序结果完全一致,而方法2与基于Word2Vec技术的比较方法一致性较低。原因在于,TF-IDF算法未考虑单词之间的语义相似性,导致数据维度灾难问题,从而影响分析结果的准确性。相比之下,本文提出的方法通过引入Word2Vec算法,减少了维度灾难的负面影响,提高了结果的有效性。

The rankings of Method 1 and Method 3 are completely consistent, while Method 2 shows low consistency with the Word2Vec-based comparison method. The reason is that the TF-IDF algorithm does not consider the semantic similarity between words, which leads to the curse of dimensionality and thus affects the accuracy of the analysis results. In contrast, the method proposed in the paper introduces the Word2Vec algorithm, reducing the negative impact of the curse of dimensionality and improving the validity of the results.

四、知识补充(Knowledge supplement)

Word2Vec 是一种将词语转换为向量表示的算法,属于词嵌入(Word Embedding)技术,广泛应用于自然语言处理(NLP)任务中。它通过将词语转换为密集的、低维的向量,捕捉词与词之间的语义关系,进而能够实现很多有用的语言理解任务。

Word2Vec is an algorithm that converts words into vector representations. It is a word embedding technique widely used in natural language processing (NLP) tasks. By mapping words to dense, low-dimensional vectors, it captures the semantic relationships between words and thereby enables many useful language understanding tasks.

Word2Vec 的核心思想是通过上下文预测和相似度计算来学习词语的向量表示。它的目标是使得语义相似的词语在向量空间中更接近。Word2Vec 主要有两种训练方法:

The core idea of Word2Vec is to learn the vector representation of words through context prediction and similarity calculation. Its goal is to make semantically similar words closer in the vector space. There are two main training methods for Word2Vec:

Skip-gram:这个方法的目标是用一个词来预测它周围的上下文词汇。即给定一个中心词,模型会根据这个词来预测它在某个窗口内出现的词汇。例如,给定“狗”这个词,模型会预测“跑”,“跳”等相关的上下文词。

Skip-gram: The goal of this method is to use a word to predict the context words around it. That is, given a center word, the model predicts the words that appear within a window around it. For example, given the word "dog", the model predicts related context words such as "run" and "jump".

Continuous Bag of Words (CBOW):这个方法的目标是用一组上下文词汇来预测中心词。即通过周围词的组合来推测中间词的含义。

Continuous Bag of Words (CBOW): The goal of this method is to predict the center word from a set of context words, that is, to infer the middle word from the combination of its surrounding words. Both training modes are shown in the sketch below.

特点和优势如下:

Features and advantages are as follows:

1. 捕捉语义相似性:Word2Vec 能够将语义相近的词(如“狗”和“猫”)映射到向量空间中相近的位置。

1. Capturing semantic similarity: Word2Vec can map semantically similar words (such as "dog" and "cat") to similar positions in the vector space.

2. 降维效果:通过将高维稀疏的词语表示(如one-hot编码)映射到低维密集向量,可以有效降低计算和存储的成本。

2. Dimensionality reduction effect: By mapping high-dimensional sparse word representations (such as one-hot encoding) to low-dimensional dense vectors, the cost of computing and storage can be effectively reduced.

3. 词向量运算:可以通过向量的加减运算来推理词汇的关系,如“王-男+女=女王”。

3. Word vector arithmetic: Relationships between words can be inferred through vector addition and subtraction, such as "king − man + woman ≈ queen" (see the sketch below).

今天的分享就到这里了。

如果您对文章有独特的想法,

欢迎给我们留言,让我们相约明天。

祝您今天过得开心快乐!

That's all for today's sharing.

If you have a unique idea about the article,

please leave us a message,

and let us meet tomorrow.

I wish you a nice day!

文案|yyz

排版|yyz

审核|hzy

翻译:谷歌翻译

参考资料:百度百科、ChatGPT

参考文献: 徐选华,黄丽, 陈晓红. 基于共词网络的群智知识挖掘方法——在应急决策中应用[J]. 管理科学学报, 2023, 26(5): 121-137.

本文由LearningYard学苑整理发出,如有侵权请在后台留言!
