MD Academy - Data Science Software Development

The Data Science course is a comprehensive program designed to equip individuals with the skills needed to succeed in the dynamic field of data science. With a focus on SQL, the course covers topics such as database elements, string operations, filtering, and data manipulation. The program also delves into big data technologies and principles, teaching participants to analyze large datasets with Python and Spark, use cloud computing solutions, and build ETL pipelines. Additionally, the course emphasizes the importance of data visualization, teaching students to create effective visualizations using Tableau while also addressing ethical considerations. By the end of the course, participants will have gained a deep understanding of data science fundamentals and be equipped with practical skills for real-world applications.

MODULE 1: Programming with Python

Machine Learning and Artificial Intelligence are the future of technology, but to approach these disciplines, you must have a solid foundation in computer programming. The purpose of this course is to make you a programmer first, so that you can then enter the world of Artificial Intelligence. We will do this using the most widely used programming language in the AI field: Python.

MODULE OBJECTIVES
- Design and develop complex software independently
- Automate repetitive procedures using Python
- Analyze data of any type with Python

TOPICS COVERED
- Constants, variables, and data types
- Data collections, conditional statements
- Loops, Procedural Programming
- Exceptions
- File operations
- Code modularization
- The Standard Library
- PyPI and PIP
- Virtual environments with virtualenv
- PRACTICAL EXERCISE PROJECT

MODULE 2: Principles of Descriptive Statistics with Python

All Data Science projects have an important data exploration phase based on methodologies that belong to Descriptive Statistics. This branch of statistics is used to extract initial information from a dataset, including during the exploratory phase that precedes the construction of a Machine Learning model.

MODULE OBJECTIVES
- Analyze data using Python
- Understand the distributions of each variable of interest
- Describe the data with position, variability, and shape indices
- Produce summary graphics and tables using ggplot2

TOPICS COVERED
- Programming with R
- Study designs
- Data synthesis
- Visualizations with ggplot2
- Position indices
- Variability indices
- Probability calculation
- Normal distribution
- Shape indices
- PRACTICAL EXERCISE PROJECT
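
As a rough illustration of the position and variability indices listed above, here is a minimal sketch in Python, the language named in the module title. It uses only the standard library `statistics` module; the sample values are invented for the example, and the tools actually used in the module may differ.

```python
import statistics

# Hypothetical sample: daily page views for a small website (made-up values).
views = [120, 135, 150, 110, 500, 140, 130]

# Position indices: where the data are centred.
mean_views = statistics.mean(views)
median_views = statistics.median(views)

# Variability indices: how spread out the data are.
stdev_views = statistics.stdev(views)      # sample standard deviation
value_range = max(views) - min(views)      # distance between the extremes

print(f"mean={mean_views:.1f}, median={median_views}")
print(f"stdev={stdev_views:.1f}, range={value_range}")
```

The single large value (500) pulls the mean well above the median, which is one reason both indices are usually reported together.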

MODULE 3: Principles of Inferential Statistics with Python

Understanding phenomena and making predictions based on data is one of the most common activities of a data scientist. You will learn exactly how to do that by studying the principles of the scientific method and inferential statistics, which you will apply through Python.

MODULE OBJECTIVES
- Understand statistical sampling
- Conduct a hypothesis test
- Relate variables to each other using analytical and graphical tools
- Discover correlations and build simple and multiple linear regression models

TOPICS COVERED
- Sample and population
- Central limit theorem
- Hypothesis testing
- The z-test
- Student's t-distribution
- Relationships between variables
- Correlation and association measures
- The chi-square test
- Simple and multiple linear regression
- Model selection metrics
- PRACTICAL EXERCISE PROJECT
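
To give an idea of what a hypothesis test looks like in code, here is a minimal sketch of a two-sided, one-sample z-test using only Python's standard library. The function name `z_test`, the sample values, the hypothesized mean, and the assumed known population standard deviation are all hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test.

    mu0 is the mean under the null hypothesis; sigma is the population
    standard deviation, assumed to be known (that is what makes it a z-test).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a value at least this extreme under the standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements (made-up values).
sample = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 5.1, 5.2, 5.3]
z, p = z_test(sample, mu0=5.0, sigma=0.2)

alpha = 0.05  # significance level chosen for the example
print(f"z={z:.2f}, p={p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

When the population standard deviation is not known and the sample is small, Student's t-distribution (also listed above) is the usual alternative.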

MODULE 4: Fundamentals of Machine Learning

Machine Learning is the hottest area in the tech sector at the moment and is the reason behind the exponential growth of the Artificial Intelligence and Data Science sectors. In this course, we will introduce you to the general basics of Machine Learning, its history, its characteristic elements, and its taxonomy. Together we will discover what data manipulation consists of and why it is so important when it comes to Artificial Intelligence. In addition, we will teach you how to build regression, classification, and clustering machine learning models using Python and the most popular machine learning library: scikit-learn.

MODULE OBJECTIVES
- Classify a Machine Learning problem
- Clean and prepare a dataset appropriately (missing values, duplicates, outliers, categorical variables)
- Build machine learning models for regression problems with linear regression
- Build machine learning models for classification problems with logistic regression
- Build machine learning models to perform clustering with the K-means algorithm
- Evaluate a machine learning model using the right metrics

TOPICS COVERED
- Types of data and types of variables
- Data preprocessing techniques
- Simple linear regression
- Multiple and polynomial regression
- Overfitting and regularization techniques
- Classification with logistic regression
- Clustering with K-Means
- PRACTICAL EXERCISE PROJECT
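
The module builds its models with scikit-learn. Purely to show the idea behind simple linear regression, the sketch below fits a line by ordinary least squares in plain Python instead; the function name `fit_simple_linear_regression` and the data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: exam score as a function of hours studied (made-up values).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_simple_linear_regression(hours, scores)
predicted = slope * 6 + intercept  # prediction for 6 hours of study
print(f"slope={slope}, intercept={intercept}, prediction for 6 hours={predicted}")
```

Real datasets are noisy, so the fitted line will not pass through every point as it does here; that is where the evaluation metrics mentioned in the objectives come in.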

MODULE 5: Machine Learning: Models and Algorithms

The purpose of this course is to take you beyond the horizons and basic techniques of Machine Learning. You will study the main algorithms and models that characterize it and learn how to use them in practice with Python and scikit-learn.

MODULE OBJECTIVES
- Understand the main optimization algorithms and how to choose the correct one
- Build online learning systems that can learn and improve continuously
- Learn to use all the main supervised Machine Learning models
- Build Artificial Neural Networks models using scikit-learn

TOPICS COVERED
- Gradient Descent and optimization algorithms
- Parametric and non-parametric models
- Naive Bayes
- Support Vector Machine (SVM)
- Neural Networks
- K-Nearest Neighbors (K-NN)
- Decision tree and Random forest
- PRACTICAL EXERCISE PROJECT

MODULE 6: Machine Learning: Advanced Techniques

AutoML, MLOps, Computer Vision, and Dimensionality Reduction are all advanced techniques that give an almost unfair advantage to Data Scientists who know how to use them correctly. It is essential to know how to automate tasks to achieve your goals more effectively and faster.

MODULE OBJECTIVES
- Automate the search for hyperparameters using algorithms such as Grid Search and Random Search
- Use AutoML with the Microsoft FLAML framework
- Create and export an MLOps pipeline with joblib and pickle
- Understand time series and perform forecasting using the popular Facebook Prophet algorithm
- Reduce the dimensionality of a dataset using techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
- Understand the main computer vision techniques to classify images and recognize people and objects within them using OpenCV
- Build recommendation systems

TOPICS COVERED
- Hyperparameter optimization techniques
- AutoML hyperparameter tuning with FLAML
- Basics of MLOps
- Time series analysis
- Basics of computer vision
- Dimensionality reduction
- Recommendation systems
- PRACTICAL EXERCISE PROJECT

MODULE 7: Deep Learning and Artificial Neural Networks

The incredible progress that artificial intelligence has made in the last decade has a very specific name: deep learning. Deep learning is the set of techniques and methods used to train deep artificial neural network models. This course will introduce you to the fundamental concepts of deep learning and analyze the various neural network architectures that can be created and their applications. To do all this, we will use the deep learning framework that is most popular and most in demand among companies: TensorFlow.

MODULE OBJECTIVES
- Implement Artificial Neural Network models for regression problems
- Implement Artificial Neural Network models for classification problems
- Implement Convolutional Neural Network (CNN) models for Computer Vision
- Implement Recurrent Neural Network (RNN) models for Natural Language Processing
- Implement Mixed Neural Network Architectures
- Train Deep Learning models in the Cloud and on GPUs

TOPICS COVERED
- Basics of Artificial Neural Networks
- Training and Optimization Methods
- Cloud and GPU Training
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Mixed Neural Network Architectures
- PRACTICAL EXERCISE PROJECT

MODULE 8: Natural Language Processing

Thanks to Natural Language Processing, companies are automating many repetitive and boring tasks, with enormous cost savings. Chatbots, virtual assistants, and semantic search engines are increasingly in demand. Training in this field is difficult, as dedicated resources are scarce, sometimes too superficial, and other times too theoretical. With this course, you can learn the general basics of Natural Language Processing, such as cleaning and manipulating textual data or analyzing lexical and morphological aspects.

MODULE OBJECTIVES
- Build text classification systems
- Build sentiment analysis systems
- Build topic modeling systems
- Build language recognition systems
- Build systems for identifying related documents

TOPICS COVERED
- Preprocessing of textual documents
- Text encoding techniques
- Textual document classification
- Language identification
- Sentiment analysis
- Topic modeling
- Part of speech tagging (PoS)
- Named entity recognition (NER)
- Word Embedding
- Word2Vec
- PRACTICAL EXERCISE PROJECT

MODULE 9: SQL for Data Science

In the coming years, data will increase not only in volume but also in complexity and importance. One of the best ways to store and retrieve data is SQL, the de facto language for relational databases. You will learn how to write queries independently, which will allow you to dig deeper into data and answer questions that could improve the way companies conduct their business.

MODULE OBJECTIVES
- Analyze data using SQL
- Create, modify, and update databases on MySQL
- Use MariaDB to execute more complex queries

TOPICS COVERED
- Elements of a database
- First queries
- String operations
- Filters
- Data aggregation
- Data cross-referencing
- Data structure manipulation
- Introduction to NoSQL
- PRACTICAL EXERCISE PROJECT
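
The module works with MySQL and MariaDB. As a stand-in that runs without a database server, the sketch below uses Python's built-in `sqlite3` module with an in-memory database to show a string operation, a filter, an aggregation, and a join in one query. The `customers` and `orders` tables, their columns, and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for MySQL/MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Anna', 'Milan'), (2, 'Luca', 'Rome'), (3, 'Sara', 'Milan');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0), (4, 3, 10.0);
""")

query = """
    SELECT UPPER(c.city) AS city,        -- string operation
           COUNT(*)      AS n_orders,    -- aggregation
           SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id   -- cross-referencing two tables
    WHERE o.amount >= 20                          -- filter
    GROUP BY c.city
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

The query itself is standard SQL, so the same statement should carry over to MySQL or MariaDB with little or no change; only the connection code is specific to SQLite.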

MODULE 10: Technologies and Principles for Big Data

Every minute, 350,000 stories are published on Instagram, 400,000 hours of video footage are viewed on Netflix, and 40 million messages are sent on WhatsApp. Nowadays, being able to navigate through this multitude of data has become an indispensable skill for a Data Scientist. A good Data Scientist not only needs to know how to analyze Big Data but must also have a good knowledge of the typical architecture of Big Data storage and analysis solutions and of the technologies in the ecosystem. In addition, training machine learning models (such as regression, classification, clustering, and recommendation systems) requires special precautions when working with large amounts of data. In this course, you will discover which precautions to take and how to apply them in practice.

MODULE OBJECTIVES
- Analyze large amounts of data with Python and Spark
- Use Cloud Computing systems (AWS) to analyze Big Data
- Create an ETL (Extract, Transform, Load) pipeline for Big Data
- Create machine learning models on Big Data
- Use Spark Streaming with Python for real-time analysis of Big Data
- Create a Data Lake with AWS S3 and Glue
- Use the Databricks platform

TOPICS COVERED
- Big Data technologies
- Apache Spark
- Cloud solutions: Databricks and AWS EMR
- Using Spark with Zeppelin
- Resilient Distributed Dataset (RDD)
- Big Data analysis with Spark SQL
- Machine learning on Big Data with Spark MLlib
- Data Lake and Data Warehouse
- Techniques and technologies for storing Big Data
- Real-time analysis of Big Data with Spark Streaming
- PRACTICAL EXERCISE PROJECT

MODULE 11: Data Visualization Techniques

A good Data Scientist must have cross-cutting skills in three areas: mathematics, computer science, and communication. This last ability is often overlooked, but it is extremely important. A Data Scientist who stands out from others is one who can show the team or the client, clearly and effectively, what information has been extracted from the data. Together, we will learn what Data Visualization is, how to create graphs, and how to use storytelling to communicate data effectively. We will do all of this using Tableau, the most popular software for interactive data visualization.

MODULE OBJECTIVES
- Create various types of visualizations using Tableau
- Use Gestalt principles to reduce the cognitive load of a visualization
- Conduct the necessary analyses to create a successful visualization
- Use graphic elements correctly
- Create visualizations in an ethical manner

TOPICS COVERED
- Introduction to data visualization
- Guide to using Tableau
- Neuroscience of visualization
- Methods for creating good visualizations
- Audience study
- Visualization design

Fill out the form below to find out the price of the course.

*The price depends on the student's area of origin, according to the agreement we have with the territory partners.

The final price includes:

- Theoretical study + practical projects
- Final project work
- Italian A2 course and exam + certificate
- Exam fee + certificate

APPLICATION FORM

Your application will be forwarded to MD Academy.
Our committee will evaluate your application and give you an answer within 15 working days.
Filling out this form does not constitute official enrollment in the course. You will be officially enrolled only after the committee has evaluated your request and upon receipt of the first installment of payment.

Pre-requisites

Be at least 18 years old.
Take our Level A1 Italian Language exam.
Applicants are also required to submit the following documents: valid passport, valid visa.

Final certificate

Certification of Professional Skills by Regione Lombardia

Credits

Mediadream s.r.l. is a body accredited by Regione Lombardia for Professional Education (N° 264) and for Employment Services (N° 114).
