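This book is a machine learning textbook for computer-related majors at colleges and universities. It is organized around the development workflow of a machine learning application and covers in detail the concepts and principles of data preprocessing and a range of algorithm models. Python and Spark serve as the implementation tools, so that readers learn through practice how to write, debug, and analyze project code. The final two chapters present two hands-on projects that illustrate machine learning in engineering applications. The book is clearly structured and well supplied with worked cases, and it comes with teaching resources (source code, lesson plans, slides, and exercise answers) that readers can download from the Huaxin Education Resource Network (华信教育资源网).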
		
	
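Sun Liwei is director of the Big Data Technology Teaching and Research Office at Xiamen Nanyang Vocational College, holds a master's degree in Signal and Information Processing from the PLA Electronic Engineering Institute, and is a senior big data analyst. Main research interests are data mining and Hadoop-based big data technology. Sun has published 20 papers in CN-registered journals, served as chief editor of 1 textbook, led the application for and obtained 4 software copyrights, led 3 research projects at the municipal level or above, and led 1 quality-course project.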
Chapter 1 Introduction to Machine Learning Technology
1.1 Introduction to Machine Learning
1.1.1 The Concept of Machine Learning
1.1.2 Machine Learning Algorithm Models
1.1.3 Steps in Developing a Machine Learning Application
1.2 Tools for Implementing Machine Learning
1.3 Setting Up the Python Platform
1.3.1 The Anaconda Integrated Development Environment
1.3.2 The PyCharm Integrated Development Environment
1.3.3 Creating a Virtual Environment
1.3.4 Configuring a Virtual Environment
1.4 Setting Up the Spark Platform
1.4.1 Spark Deployment Modes
1.4.2 Installing the JDK
1.4.3 Installing Scala
1.4.4 Installing the IDEA Development Tool
1.4.5 Installing Spark
1.4.6 Installing Maven
1.5 Creating a Project with Python
1.6 Creating a Project with Spark
Exercises 1
Chapter 2 Data Preprocessing
2.1 The Concept of Data Preprocessing
2.1.1 Data Cleaning
2.1.2 Data Transformation
2.2 Data Preprocessing with Python
2.3 Data Preprocessing with Spark
Exercises 2
Chapter 3 Classification Models
3.1 The Concept of Classification Models
3.2 Algorithmic Principles of Classification Models
3.2.1 Decision Tree Algorithm
3.2.2 Nearest Neighbor Algorithm
3.2.3 Naive Bayes Algorithm
3.2.4 Logistic Regression Algorithm
3.2.5 Support Vector Machine Algorithm
3.3 A Classification Modeling Example with Python
3.4 A Classification Modeling Example with Spark
Exercises 3
Chapter 4 Clustering Models
4.1 The Concept of Clustering Models
4.1.1 Overview of Clustering Models
4.1.2 Similarity Measures in Clustering Models
4.1.3 Evaluation of Clustering Algorithms