摘要:逻辑回归(Logistic Regression)是一种经典的监督学习算法,尽管名称含"回归",但实际用于解决分类问题(尤其是二分类)。它通过Sigmoid函数将线性回归的输出映射到[0,1]区间,表示样本属于某一类别的概率,广泛应用于信用评分、疾病预测、垃圾邮件识别等场景。
分享兴趣,传播快乐,增长见闻,留下美好!
亲爱的您,这里是LearningYard新学苑。
今天小编为大家带来
“Python中的逻辑回归:原理与实践”。
欢迎您的访问!
Share interest, spread happiness, increase knowledge, and leave behind beauty!
Dear reader, this is the LearningYard New Academy!
Today, the editor brings you
"Logistic Regression in Python: Principles and Practice".
Welcome to visit!
思维导图
Mind mapping
逻辑回归(Logistic Regression)是一种经典的监督学习算法,尽管名称含"回归",但实际用于解决分类问题(尤其是二分类)。它通过Sigmoid函数将线性回归的输出映射到[0,1]区间,表示样本属于某一类别的概率,广泛应用于信用评分、疾病预测、垃圾邮件识别等场景。
Logistic Regression is a classic supervised learning algorithm. Despite the word "regression" in its name, it is actually used to solve classification problems (especially binary classification). It maps the output of linear regression to the [0,1] interval through the Sigmoid function, representing the probability that a sample belongs to a certain category. It is widely used in credit scoring, disease prediction, spam identification, and other scenarios.
核心原理
Core Principles
1. 从线性回归到逻辑回归
线性回归输出连续值(如`y = w*x + b`),但分类问题需要离散的类别标签。逻辑回归引入Sigmoid函数(也称为Logistic函数),将线性回归的结果`z = w*x + b`转换为概率值:`σ(z) = 1 / (1 + e^(-z))`。
Sigmoid函数特性:输出值在(0,1)之间,当`z→+∞`时`σ(z)→1`,当`z→-∞`时`σ(z)→0`,中间在`z=0`处为0.5。
1. From Linear Regression to Logistic Regression
Linear regression outputs continuous values (e.g., `y = w*x + b`), but classification problems require discrete class labels. Logistic regression introduces the Sigmoid function (also known as the Logistic function) to convert the result of linear regression `z = w*x + b` into a probability value: `σ(z) = 1 / (1 + e^(-z))`.
Characteristics of the Sigmoid function: The output value is between (0,1). When `z→+∞`, `σ(z)→1`; when `z→-∞`, `σ(z)→0`; and it is 0.5 at `z=0`.
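A minimal pure-Python sketch of the Sigmoid function illustrates the limiting behavior described above:

```python
import math

def sigmoid(z):
    """Sigmoid (Logistic) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # exactly 0.5, the decision midpoint
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```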
2. 分类决策 - 设定阈值(通常为0.5):若`σ(z) ≥ 0.5`,预测为类别1;否则预测为类别0。 - 例如:在垃圾邮件识别中,若概率≥0.5,判定为"垃圾邮件";否则为"正常邮件"。
2. Classification Decision - Set a threshold (usually 0.5): If `σ(z) ≥ 0.5`, predict as class 1; otherwise, predict as class 0. - For example: In spam identification, if the probability is ≥0.5, it is judged as "spam"; otherwise, it is "normal email".
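The thresholding rule above can be sketched as follows; the scores are hypothetical linear outputs `z = w*x + b`, used here only for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(z, threshold=0.5):
    """Return class 1 if sigma(z) >= threshold, else class 0."""
    return 1 if sigmoid(z) >= threshold else 0

# hypothetical linear scores for three emails
scores = [2.3, -1.7, 0.0]
print([predict(z) for z in scores])  # [1, 0, 1]
```

Note that `z = 0` maps to exactly 0.5, so with the default threshold it falls into class 1.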
与线性回归的区别
Differences from Linear Regression
在任务类型上,逻辑回归主要用于分类(以二分类为主),而线性回归用于回归,即预测连续值。 输出范围方面,逻辑回归的输出是[0,1]之间的概率值,线性回归的输出则是(-∞,+∞)的连续值。 损失函数不同,逻辑回归采用对数损失(Log Loss),线性回归使用均方误差(MSE)。 核心函数上,逻辑回归包含Sigmoid非线性转换,线性回归则是纯线性函数。
In terms of task type, logistic regression is mainly used for classification (mainly binary classification), while linear regression is used for regression, i.e., predicting continuous values. In terms of output range, the output of logistic regression is a probability value between [0,1], while the output of linear regression is a continuous value in (-∞,+∞). The loss functions are different: logistic regression uses Log Loss, and linear regression uses Mean Squared Error (MSE). In terms of core functions, logistic regression includes Sigmoid non-linear transformation, while linear regression is a pure linear function.
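To make the loss-function contrast concrete, here is a per-sample comparison of Log Loss and squared error (a sketch of the formulas, not scikit-learn's implementation):

```python
import math

def log_loss_single(y, p):
    """Binary cross-entropy for one sample: -[y*ln(p) + (1-y)*ln(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def squared_error(y, p):
    """Per-sample squared error, the building block of MSE."""
    return (y - p) ** 2

# A confidently wrong prediction (true y=1, predicted p=0.01) is
# penalized far more heavily by log loss than by squared error.
print(log_loss_single(1, 0.01))  # about 4.61
print(squared_error(1, 0.01))    # about 0.98
```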
Python实现与示例
Python Implementation and Examples
1. 二分类逻辑回归(以乳腺癌预测为例)
1. Binary Logistic Regression (Breast Cancer Prediction as an Example)
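The original code for this example is not reproduced here; a minimal scikit-learn sketch of binary logistic regression on the built-in breast-cancer dataset might look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the built-in binary-classification dataset (malignant vs. benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features: logistic regression converges faster on scaled data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # class labels (0/1)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
print("Accuracy:", accuracy_score(y_test, y_pred))
```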
2. 多分类逻辑回归:逻辑回归可通过"一对多"(One-vs-Rest)策略扩展到多分类。
2. Multi-class Logistic Regression: Logistic regression can be extended to multi-class classification through the "One-vs-Rest" strategy.
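A sketch of the One-vs-Rest extension, using scikit-learn's `OneVsRestClassifier` wrapper on the built-in iris dataset (three classes); the dataset choice is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# One-vs-Rest: fit one binary logistic regression per class, then
# predict the class whose binary classifier gives the highest score
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

print("Binary sub-classifiers:", len(ovr.estimators_))  # 3, one per class
print("Test accuracy:", ovr.score(X_test, y_test))
```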
关键参数与调优
Key Parameters and Tuning
正则化参数:
- `C`:正则化强度的倒数(`C`越小,正则化越强),用于防止过拟合,默认值为1.0。
- `penalty`:正则化方式,`l2`(默认)、`l1`(需配合`solver='liblinear'`或`'saga'`)、`elasticnet`(需配合`solver='saga'`)。
Regularization Parameters:
- `C`: Inverse of regularization strength (smaller `C` means stronger regularization), used to prevent overfitting; default is 1.0.
- `penalty`: Regularization method: `l2` (default), `l1` (requires `solver='liblinear'` or `'saga'`), or `elasticnet` (requires `solver='saga'`).
求解器选择:
- `solver='lbfgs'`:默认求解器,适合中小型数据集。
- `solver='saga'`:支持所有正则化方式,适合大型数据集。
Solver Selection:
- `solver='lbfgs'`: Default solver, suitable for small and medium-sized datasets.
- `solver='saga'`: Supports all regularization methods, suitable for large datasets.
其他重要参数:
- `max_iter`:最大迭代次数(默认100),若模型未收敛需增大(如10000)。
- `class_weight`:处理不平衡数据(如`class_weight='balanced'`自动调整类别权重)。
Other Important Parameters:
- `max_iter`: Maximum number of iterations (default 100); increase it (e.g., to 10000) if the model does not converge.
- `class_weight`: Handles imbalanced data (e.g., `class_weight='balanced'` automatically adjusts class weights).
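A sketch that combines the parameters above, tuning `C` with cross-validation; the dataset and the grid values are illustrative choices, not prescriptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Pipeline: scale first, then classify; class_weight='balanced'
# reweights classes inversely to their frequency
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", solver="lbfgs",
                       max_iter=10000, class_weight="balanced"))

# Smaller C = stronger regularization; search over a log-spaced grid
grid = GridSearchCV(
    pipe, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("Best C:", grid.best_params_["logisticregression__C"])
print("Cross-validated accuracy:", grid.best_score_)
```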
优缺点与适用场景
Advantages, Disadvantages, and Applicable Scenarios
优点(Advantages)
可解释性强:通过系数`coef_`可直接看出特征对分类的影响(正系数表示该特征增加时,样本更可能属于类别1)。
训练速度快:计算量小,适合大规模数据集。
输出概率值:不仅能预测类别,还能给出置信度(如"有90%的概率为垃圾邮件")。
Strong interpretability: The impact of each feature on classification can be read directly from the coefficients in `coef_`. A positive coefficient means that as the feature's value increases, the sample is more likely to belong to class 1.
Fast training speed: With low computational complexity, it is suitable for large-scale datasets.
Probability output: It can not only predict the class but also provide confidence (e.g., "90% probability of being spam").
缺点(Disadvantages)
决策边界是线性的:只能学习线性决策边界,对非线性关系需要手动构造特征(如多项式特征)。
对异常值敏感:需预处理去除或修正异常值。
二分类为主:多分类场景表现不如专门的多分类算法(如随机森林)。
Linear decision boundary: It can only learn a linear decision boundary, so manual feature engineering (such as polynomial features) is required to capture non-linear relationships.
Sensitive to outliers: Outliers need to be removed or corrected through preprocessing.
Primarily for binary classification: Performs less effectively than specialized multi-class algorithms (e.g., Random Forest) in multi-class scenarios.
适用场景(Applicable Scenarios)
二分类问题(如用户流失预测、疾病诊断)。
需要概率输出的场景(如信用评分:输出违约概率)。
作为基线模型快速验证业务假设。
Binary classification problems (e.g., user churn prediction, disease diagnosis).
Scenarios requiring probability output (e.g., credit scoring: output default probability).
As a baseline model to quickly verify business hypotheses.
逻辑回归是分类任务中的基础算法,其简单性与可解释性使其成为工业界的常用工具。在Python中,借助scikit-learn库可快速实现并调优逻辑回归模型,结合特征工程(如标准化、特征选择)能进一步提升其性能。
Logistic regression is a fundamental algorithm in classification tasks. Its simplicity and interpretability make it a commonly used tool in industry. In Python, logistic regression models can be quickly implemented and tuned with the scikit-learn library, and their performance can be further improved by combining with feature engineering (such as standardization and feature selection).
今天的分享就到这里了,
如果您对文章有独特的想法,
欢迎给我们留言。
让我们相约明天,
祝您今天过得开心快乐!
That's all for today's sharing.
If you have a unique idea about the article,
please leave us a message,
and let us meet tomorrow.
I wish you a nice day!
翻译:文心一言
参考资料:百度百科
本文由LearningYard新学苑整理并发出,如有侵权请后台留言沟通。
文案|qiu
排版|qiu
审核|song
来源:LearningYard学苑
