MODULE 1: Programming with Python
Machine Learning and Artificial Intelligence are the future of technology, but to approach these disciplines, you must have a solid foundation in computer programming. The purpose of this course is to make you a programmer, and then allow you to enter the world of Artificial Intelligence, and we will do this using the most widely used programming language in the AI field: Python.
MODULE OBJECTIVES
- Design and develop complex software independently
- Automate repetitive procedures using Python
- Analyze data of any type with Python
TOPICS COVERED
- Constants, variables, and data types
- Data collections, conditional statements
- Loops, Procedural Programming
- Exceptions
- File operations
- Code modularization
- The Standard Library
- PyPI and PIP
- Virtual environments with virtualenv
- PRACTICAL EXERCISE PROJECT
MODULE 2: Principles of Descriptive Statistics with Python
All Data Science projects have an important data exploration phase based on the application of methodologies that are part of Descriptive Statistics. This branch of statistics is used to extract the first information from a data set, as well as during the exploratory phase before the construction of a Machine Learning model.
MODULE OBJECTIVES
- Analyze data using Python
- Understand the distributions of each variable of interest
- Describe the data with position, variability, and shape indices
- Produce summary graphics and tables using ggplot2
TOPICS COVERED
- Programming with R
- Study designs
- Data synthesis
- Visualizations with ggplot2
- Position indices
- Variability indices
- Probability calculation
- Normal distribution
- Shape indices
- PRACTICAL EXERCISE PROJECT
MODULE 3: Principles of Inferential Statistics with Python
Understanding phenomena and making predictions based on data is one of the most common activities of a data scientist. You will learn exactly how to do that by studying the principles of the scientific method and inferential statistics, which you will apply through Python.
MODULE OBJECTIVES
- Understand statistical sampling
- Conduct a hypothesis test
- Relate variables to each other using analytical and graphical tools
- Discover correlations and build simple and multiple linear regression models
TOPICS COVERED
- Sample and population
- Central limit theorem
- Hypothesis testing
- The z-test
- Student's t-distribution
- Relationships between variables
- Correlation and association measures
- The chi-square test
- Simple and multiple linear regression
- Model selection metrics
- PRACTICAL EXERCISE PROJECT
MODULE 4: Fundamentals of Machine Learning
Machine Learning is the hottest area in the tech sector at the moment and is the reason behind the exponential growth of the Artificial Intelligence and Data Science sectors. In this course, we will introduce you to the general basics of Machine Learning, its history, its characteristic elements, and its taxonomy. Together we will discover what data manipulation consists of and why it is so important when it comes to Artificial Intelligence. In addition, we will teach you how to build regression, classification, and clustering machine learning models using Python and the most popular machine learning library: scikit-learn.
MODULE OBJECTIVES
- Classify a Machine Learning problem
- Clean and prepare a dataset appropriately (missing values, duplicates, outliers, categorical variables)
- Build machine learning models for regression problems with linear regression
- Build machine learning models for classification problems with logistic regression
- Build machine learning models to perform clustering with the K-means algorithm
- Evaluate a machine learning model using the right metrics
TOPICS COVERED
- Types of data and types of variables
- Data preprocessing techniques
- Simple linear regression
- Multiple and polynomial regression
- Overfitting and regularization techniques
- Classification with logistic regression
- Clustering with K-Means
- PRACTICAL EXERCISE PROJECT
MODULE 5: Machine Learning: Models and Algorithms
The purpose of this course is to take you beyond the basic techniques of Machine Learning. You will study the main algorithms and models that characterize it and learn how to use them in practice with Python and scikit-learn.
MODULE OBJECTIVES
- Understand the main optimization algorithms and how to choose the correct one
- Build online learning systems that can learn and improve continuously
- Learn to use all the main supervised Machine Learning models
- Build Artificial Neural Network models using scikit-learn
TOPICS COVERED
- Gradient Descent and optimization algorithms
- Parametric and non-parametric models
- Naive Bayes
- Support Vector Machine (SVM)
- Neural Networks
- K-Nearest Neighbors (K-NN)
- Decision tree and Random forest
- PRACTICAL EXERCISE PROJECT
MODULE 6: Machine Learning: Advanced Techniques
AutoML, MLOps, Computer Vision, and Dimensionality Reduction are all advanced techniques that give an almost unfair advantage to Data Scientists who know how to use them correctly. It is essential to know how to automate tasks to achieve your goals faster and more effectively.
MODULE OBJECTIVES
- Automate the search for hyperparameters using algorithms such as Grid Search and Random Search
- Use AutoML with the Microsoft FLAML framework
- Create and export an MLOps pipeline with joblib and pickle
- Understand time series and perform forecasting using the popular Facebook Prophet algorithm
- Reduce the dimensionality of a dataset using techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
- Understand the main computer vision techniques to classify images and recognize people and objects within them using OpenCV
- Build recommendation systems
TOPICS COVERED
- Hyperparameter optimization techniques
- AutoML hyperparameter tuning with FLAML
- Basics of MLOps
- Time series analysis
- Basics of computer vision
- Dimensionality reduction
- Recommendation systems
- PRACTICAL EXERCISE PROJECT
MODULE 7: Deep Learning and Artificial Neural Networks
The incredible progress that artificial intelligence has made in the last decade has a very specific name: deep learning. Deep learning is the set of techniques and methods used to train deep artificial neural network models. This course will introduce you to the fundamental concepts of deep learning and analyze the various neural network architectures that can be created, along with their applications. To do all this, we will use TensorFlow, the deep learning framework that is most popular and most in demand among companies.
MODULE OBJECTIVES
- Implement Artificial Neural Network models for regression problems
- Implement Artificial Neural Network models for classification problems
- Implement Convolutional Neural Network (CNN) models for Computer Vision
- Implement Recurrent Neural Network (RNN) models for Natural Language Processing
- Implement Mixed Neural Network Architectures
- Train Deep Learning models in the Cloud and on GPUs
TOPICS COVERED
- Basics of Artificial Neural Networks
- Training and Optimization Methods
- Cloud and GPU Training
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Mixed Neural Network Architectures
- PRACTICAL EXERCISE PROJECT
MODULE 8: Natural Language Processing
Thanks to Natural Language Processing, companies are automating many repetitive and boring tasks, with enormous cost savings. Chatbots, virtual assistants, and semantic search engines are increasingly in demand. Training in this field is difficult, as dedicated resources are scarce, sometimes too superficial, and other times too theoretical. With this course, you can learn the general basics of Natural Language Processing, such as cleaning and manipulating textual data or analyzing lexical and morphological aspects.
MODULE OBJECTIVES
- Build text classification systems
- Build sentiment analysis systems
- Build topic modeling systems
- Build language recognition systems
- Build systems for identifying related documents
TOPICS COVERED
- Preprocessing of textual documents
- Text encoding techniques
- Textual document classification
- Language identification
- Sentiment analysis
- Topic modeling
- Part of speech tagging (PoS)
- Named entity recognition (NER)
- Word Embedding
- Word2Vec
- PRACTICAL EXERCISE PROJECT
MODULE 9: SQL for Data Science
In the coming years, data will increase not only in volume but also in complexity and importance. One of the best ways to retrieve and store data is certainly through SQL, the de facto language for relational databases. You will learn how to write queries independently, which will allow you to dig deeper into data and answer questions that could optimize the way companies conduct their business.
MODULE OBJECTIVES
- Analyze data using SQL
- Create, modify, and update databases on MySQL
- Use MariaDB to execute more complex queries
TOPICS COVERED
- Elements of a database
- First queries
- String operations
- Filters
- Data aggregation
- Data cross-referencing
- Data structure manipulation
- Introduction to NoSQL
- PRACTICAL EXERCISE PROJECT
MODULE 10: Technologies and Principles for Big Data
Every minute, 350,000 stories are published on Instagram, 400,000 hours of video footage are viewed on Netflix, and 40 million messages are sent on WhatsApp. Nowadays, being able to navigate through this multitude of data has become an indispensable skill for a Data Scientist. A good Data Scientist needs not only to know how to analyze Big Data but also to have a good knowledge of the typical architecture of Big Data storage and analysis solutions and of the technologies in the ecosystem. In addition, training machine learning models (such as regression, classification, clustering, and recommendation systems) requires special precautions when working with large amounts of data. In this course, you will discover which precautions to take and how to apply them in practice.
MODULE OBJECTIVES
- Analyze large amounts of data with Python and Spark
- Use Cloud Computing systems (AWS) to analyze Big Data
- Create an ETL (Extract, Transform, Load) pipeline for Big Data
- Create machine learning models on Big Data
- Use Spark Streaming with Python for real-time analysis of Big Data
- Create a Data Lake with AWS S3 and Glue
- Use the Databricks platform
TOPICS COVERED
- Big Data technologies
- Apache Spark
- Cloud solutions: Databricks and AWS EMR
- Using Spark with Zeppelin
- Resilient Distributed Dataset (RDD)
- Big Data analysis with Spark SQL
- Machine learning on Big Data with Spark MLlib
- Data Lake and Data Warehouse
- Techniques and technologies for storing Big Data
- Real-time analysis of Big Data with Spark Streaming
- PRACTICAL EXERCISE PROJECT
MODULE 11: Data Visualization Techniques
A good Data Scientist must have cross-cutting skills in three areas: mathematics, computer science, and communication. The last of these is often overlooked, but it is extremely important. A Data Scientist who stands out from others is one who can show the team or client, clearly and effectively, what information they have extracted from the data. Together, we will learn what Data Visualization is, how to create graphs, and how to use storytelling to communicate data effectively. We will do all of this using Tableau, the most popular software for interactive data visualization.
MODULE OBJECTIVES
- Create various types of visualizations using Tableau
- Use Gestalt principles to reduce the cognitive load of a visualization
- Conduct the necessary analyses to create a successful visualization
- Use graphic elements correctly
- Create visualizations in an ethical manner
TOPICS COVERED
- Introduction to data visualization
- Guide to using Tableau
- Neuroscience of visualization
- Methods for creating good visualizations
- Audience study
- Visualization design