Natural Language Processing: Methods, Models & Applications
(In Association with iHUB Divyasampark IIT Roorkee)
About this Course:
In today’s AI-driven world, Generative AI and Large Language Models (LLMs) are at the forefront of innovation, powering applications ranging from chatbots and recommendation engines to speech recognition and intelligent automation. Python is the most widely used programming language in Generative AI and Data Science owing to its simplicity, versatility, and vast ecosystem of powerful libraries. To acquaint learners with these in-demand skills, this course builds a strong foundation in Python programming and its practical applications in Artificial Intelligence, Natural Language Processing (NLP), Large Language Models (LLMs), LangChain, Vector Databases, and Speech Recognition, with a hands-on approach. By completing this course, learners will not only strengthen their technical skill set but also gain the ability to build AI-powered applications, positioning themselves for exciting career opportunities in the rapidly evolving field of Generative AI and LLM Engineering.
Course Objectives:
- To make the participants learn Python programming from scratch with a focus on its applications in the fields of Artificial Intelligence (AI) & Generative AI.
- To introduce participants to core AI concepts, including Natural vs Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI fundamentals.
- To provide a strong foundation in Python programming concepts such as objects, functions, methods, error handling, regex, and object-oriented programming with practical illustrations.
- To enable learners to work with Python libraries (OS, Requests, Pandas) for automation, data handling, and API integration (including OpenAI API).
- To introduce participants to Natural Language Processing (NLP) and equip them to build text-processing pipelines including tokenization, sentiment analysis, and custom text classifiers.
- To make learners proficient in working with Large Language Models (LLMs), covering transformer architecture, GPT, BERT, Hugging Face, and LangChain for chatbot development and text generation.
- To equip learners with the knowledge of LangChain framework, LangGraph, and Retrieval Augmented Generation (RAG) for building advanced conversational agents and memory-enabled systems.
- To familiarize learners with Vector Databases (e.g., Pinecone) and their applications in semantic search, recommendation engines, and biomedical research.
- To introduce learners to Speech Recognition and Speech-to-Text systems using traditional ML, Deep Learning, and transformer-based approaches such as Whisper AI.
- To prepare learners for LLM Engineering, including prompt engineering, hosting models vs APIs, cost optimization, scaling strategies, and deploying AI-powered applications with Streamlit.
Batch Details:
Class Timings: 8 pm – 10 pm (Monday & Wednesday)
Start Date: 23rd Feb 2026
End Date: 22nd June 2026
Duration: 74 Hours
Mode: Online
Certification: iHUB Divyasampark IIT Roorkee
Course Highlights:
• Industry-relevant skills
• Hands-on learning experience through practical projects
• Globally accepted certification from iHUB Divyasampark IIT Roorkee
• Full-time access to recorded lectures/PPTs/PDFs/study materials
• Session on resume preparation/interview preparation
Course Overview:
Module 1
- Building an AI tool
- Natural vs Artificial Intelligence, Brief history of AI, Weak vs Strong AI
- AI vs Data Science vs Machine Learning vs Deep Learning
- Data: Collection, Labelled vs Unlabeled, Structured vs Unstructured, Metadata
- Overview of Machine Learning (Supervised, Unsupervised and Reinforcement Learning)
- Overview of Deep Learning, Robotics, Computer Vision, Traditional ML, Generative AI
- Generative AI: Introducing ChatGPT, Natural Language Processing (NLP)
- Large Language Models (LLMs): Training, N-Grams, RNNs, Transformers
- Building LLMs: Prompt Engineering, Fine Tuning, RAG
- Foundation Models vs Private Models
- Inconsistency and Hallucination in Gen AI, Budgeting, Latency, Running out of Data
- AI stack: Python, Working with APIs, Vector Databases, Hugging Face, LangChain
Module 2
- Basics of Python Language
- Python objects with details of Numbers/Variables/Strings/Lists/Dictionaries/Tuples etc.
- Variable Assignment, Indexing & Slicing
- Comparison & Logical operators
- Range, List Comprehension, Functions, Lambda expressions.
- Scope, with statement, Working with various Files types, Error Handling
- Regex: Anchors and Groupings, Range Expressions, Non-Greedy Matching, Substitutions.
- Object-Oriented Programming: class, Attributes, Inheritance, Polymorphism
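The regex topics listed above (groupings, range expressions, non-greedy matching, substitutions) can be previewed in a short snippet; the log line and patterns below are illustrative examples I made up, not course material:

```python
import re

log_line = "user=alice action=login time=<2026-02-23 20:05>"

# Range expression [a-z]+ plus a grouping to capture the username.
user = re.search(r"user=([a-z]+)", log_line).group(1)

# Non-greedy matching: (.*?) grabs the shortest text between the angle brackets.
timestamp = re.search(r"<(.*?)>", log_line).group(1)

# Substitution: redact the username in the original line.
redacted = re.sub(r"user=\w+", "user=***", log_line)

print(user)       # alice
print(timestamp)  # 2026-02-23 20:05
print(redacted)
```

Each pattern here maps onto one bullet above; the same `re` functions (`search`, `sub`, `findall`) cover most day-to-day regex work in Python.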
Module 3
- Understanding Python Module: OS module, Import Techniques and Best Practices
- Requests and HTTPX Libraries: Getting Started, Handling Errors, Managing Authentication and Headers (OpenAI API)
- Introduction and working with Pandas
- Series, DataFrames, working with various data types, Group-By operation
- Selecting a single column, important series methods
- Indexing & Sorting; loc & iloc with series; Inspecting DataFrames, filtering with conditional operators
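A minimal sketch of the Pandas operations listed above (Group-By, conditional filtering, and `loc`); the DataFrame and column names are hypothetical toy data:

```python
import pandas as pd

# A tiny DataFrame of course hours per module (hypothetical data).
df = pd.DataFrame({
    "module": ["NLP", "NLP", "LLMs", "LLMs", "Speech"],
    "hours":  [8, 6, 10, 12, 9],
})

# Group-By operation: total hours per module (returns a Series).
totals = df.groupby("module")["hours"].sum()

# Filtering with a conditional operator, then label-based selection with loc.
long_sessions = df.loc[df["hours"] > 8, "module"]

print(totals)
print(long_sessions.tolist())  # ['LLMs', 'LLMs', 'Speech']
```

The boolean mask `df["hours"] > 8` combined with `loc` is the idiomatic way to filter rows and pick columns in one step.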
Module 4
- Introduction, Supervised vs Unsupervised NLP
- Data preparation, Handling Stop words, Regular Expressions
- Tokenization, Stemming, Lemmatization, N-grams
- Text tagging, Parts of Speech (POS) tagging, Named Entity Recognition (NER)
- Sentiment Analysis, Rule-based Sentiment Analysis, Pre-trained Transformers
- Numerical Representation of text, Bag of Words, TF-IDF
- Topic Modelling, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA)
- Custom text classifier using various ML models
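The Bag-of-Words and TF-IDF ideas above can be sketched in a few lines of pure Python. This toy scorer uses add-one smoothing in the IDF denominator, one of several common variants, so it is illustrative rather than the exact formula of any particular library:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def tf_idf(term, doc_tokens, all_docs_tokens):
    # Term frequency: raw count of the term in this document (Bag of Words).
    tf = Counter(doc_tokens)[term]
    # Inverse document frequency, smoothed so unseen terms don't divide by zero.
    n_containing = sum(1 for d in all_docs_tokens if term in d)
    idf = math.log(len(all_docs_tokens) / (1 + n_containing)) + 1
    return tf * idf

# Whitespace tokenization; real pipelines add stemming/lemmatization first.
tokenized = [d.split() for d in docs]
print(tf_idf("cat", tokenized[0], tokenized))  # rare term, boosted
print(tf_idf("the", tokenized[0], tokenized))  # common term, dampened
```

Note how "cat" and "cats" are treated as unrelated tokens here; that is exactly the problem stemming and lemmatization address.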
Module 5
- Understanding LLMs, General Purpose Models, Pre-training and Fine Tuning
- Deep Learning recap, Transformer Architecture, Input Embeddings
- Multi-headed Attention, Feed-Forward Layer, Masked Multi-head Attention
- Understanding GPT, OpenAI API, Generating Text, Customizing GPT output
- Keyword text summarization, Coding a simple chatbot using LangChain in Python
- Hugging Face, Transformer Pipeline, Pre-trained tokenizers, Special tokens
- Q&A models: BERT architecture, Tokenizer, Embeddings, Calculating response
- Creating QA bot, BERT, RoBERTa, DistilBERT, GPT vs BERT vs XLNet, XLNet Embeddings and Fine Tuning
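The scaled dot-product attention at the heart of the transformer architecture can be sketched with NumPy; the shapes and random data below are arbitrary toy values, a sketch of the math rather than a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query vectors, d_k = 4
K = rng.normal(size=(5, 4))   # 5 key vectors
V = rng.normal(size=(5, 2))   # one value vector per key
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 2): one output per query
```

Multi-headed attention simply runs several of these in parallel on learned projections of Q, K, and V and concatenates the results.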
Module 6
- LangChain Introduction, Tokens and Models, Setting up Environment, OpenAI API key
- System, User and Assistant roles, Creating chatbot, Temperature, Max Tokens, Streaming
- LangChain Framework, ChatOpenAI, System, AI and Human messages
- Prompt Templates and Prompt Values, Few-shot chat message prompt templates
- String output parser, Comma-separated list output parser, Datetime output parser
- Piping a prompt, model and parser, Batching, Streaming, Runnable Sequence class
- Piping chains and Runnable Passthrough, Runnable Parallel, Runnable Lambda
- Retrieval Augmented Generation (RAG): Document Loading, Splitting and Embedding
- Document storing, retrieval and generation, loading with PyPDFLoader, Docx2txtLoader
- Splitting with character text splitter, markdown header text splitter
- Text embedding with OpenAI, Chroma vectorstore (Inspecting and Managing docs)
- Retrieval: Similarity search, Maximal Marginal Relevance (MMR) search, Vectorstore-backed retriever
- Generation: Stuffing documents and Generating a response
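The `prompt | model | parser` piping idea above can be sketched in plain Python. The functions below are simplified stand-ins of my own (the "model" is a canned stub, not a real LLM call), not LangChain's actual Runnable classes, which require an API key to demonstrate:

```python
# Plain-Python sketch of LangChain-style piping: prompt -> model -> parser.

def prompt_template(variables):
    # Stand-in for a prompt template: fills variables into a string.
    return f"List three {variables['topic']} as a comma-separated list."

def fake_model(prompt):
    # Stand-in for a chat model; always returns the same canned string.
    return "tokenization, embeddings, attention"

def comma_list_parser(text):
    # Mirrors the idea of a comma-separated list output parser.
    return [item.strip() for item in text.split(",")]

def pipe(*steps):
    # Compose steps left to right, like LangChain's `prompt | model | parser`.
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

chain = pipe(prompt_template, fake_model, comma_list_parser)
print(chain({"topic": "NLP concepts"}))  # ['tokenization', 'embeddings', 'attention']
```

The point of the pattern is that each stage has one job and a predictable input/output type, so stages can be swapped, batched, or streamed independently.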
Module 7
- States, nodes and Edges, First graph: Importing relevant classes, Building graph
- Conditional edges: Defining nodes, routing function, Building the graph
- Annotated construct and reducer functions, MessagesState, RemoveMessages
- Checkpointers and threads, Short-term memory with the InMemorySaver class
- The StateSnapshot class, Long-term memory with SQLite
Module 8
- Database comparison: SQL, NoSQL and Vector; Understanding Vector databases
- Vector space: Introduction, Distance Metrics, Vector Embeddings
- Vector database comparison, Pinecone registration, walkthrough and creating an index
- Pinecone with Python: Connection, Pinecone Index, Upserting data and using embeddings
- Vector database for Recommendation Engines, Biomedical research, Semantic search
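The distance metrics underpinning vector search can be illustrated in pure Python; the 3-dimensional "embeddings" below are toy values of my own (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|); 1 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance; 0 means identical vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" for three words.
cat, kitten, car = [1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]
print(cosine_similarity(cat, kitten))  # close to 1: semantically similar
print(cosine_similarity(cat, car))     # much smaller: dissimilar
```

Semantic search in a vector database is essentially this computation done efficiently over millions of stored embeddings via an approximate nearest-neighbour index.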
Module 9
- Development and Evolution, Formants, Harmonics and Phonemes
- Sound and Sound waves: Fundamentals and properties
- Sample Rate, bit depth, bit rate, Audio signal processing for Machine Learning and AI
- Audio Features: Time-domain, Frequency-domain, time-frequency-domain, Fourier transform
- Acoustic and language modeling, Hidden Markov Models (HMMs)
- Traditional Neural Networks: CNNs, RNNs and LSTMs
- Advanced speech recognition systems: Transformers
- Building a Speech Recognition Model
- Audio file formats for speech recognition, Importing audio files in Python
- Google Web Speech API, Evaluation metrics: WER and CER
- Dealing with background noise, Creating spectrograms
- Whisper AI: Transformer-based speech-to-text, Transcribing multiple audio files
- Reversing the process: AI-powered text-to-speech
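Word Error Rate (WER), one of the evaluation metrics listed above, is the word-level edit distance between a reference transcript and the system's hypothesis, divided by the reference length. A minimal pure-Python sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with word-level edit distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat", "the cat sat"))       # 0.0
print(word_error_rate("the cat sat", "the bat sat down"))  # 1 sub + 1 ins = 2/3
```

CER is the same computation at the character level instead of the word level.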
Module 10
- Hosting an LLM vs Using an API, Open-Source vs Closed-source, Tokens
- Pricing: Hosting an LLM vs Pay-by-Token, Initial Prompt Development
- Database Design and Schema Development, Activity Diagram
- OpenAI Playground, Optimizing Temperature, Top P for Different Use Cases
- Prompt Engineering for Software Lifecycle
- Streamlit: Introduction, Pros and Cons, Text Methods, Chat Elements, Session State
- Initializing an OpenAI Client, Implementing the Chat Functionality, Building the Setup Page
- Enhancing Chatbot Interaction with Session State, Feedback Functionality, Deployment
- Application Structure, Prompt Structure, Hallucinations, Prompt Injection
- Counting Tokens, Cost reduction and Scaling
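Token counting and cost estimation can be sketched with a rough heuristic. The ~4-characters-per-token rule of thumb and the per-1K-token prices below are hypothetical placeholders; real counts come from the model's own tokenizer (e.g. tiktoken) and real prices from the provider's pricing page:

```python
def estimate_tokens(text):
    # Rough rule of thumb for English text: ~4 characters per token.
    return max(1, len(text) // 4)

def estimate_cost(prompt, expected_output_tokens,
                  price_in_per_1k=0.0005, price_out_per_1k=0.0015):
    # Prices are hypothetical placeholders, not any provider's actual rates.
    n_in = estimate_tokens(prompt)
    return (n_in / 1000 * price_in_per_1k
            + expected_output_tokens / 1000 * price_out_per_1k)

prompt = "Summarize the following meeting notes in three bullet points: ..."
print(estimate_tokens(prompt))
print(f"${estimate_cost(prompt, expected_output_tokens=200):.6f}")
```

Even this crude estimate is enough to compare prompt variants and see why shorter system prompts and capped output lengths are the first levers for cost reduction.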
Prerequisites and eligibility:
- No coding experience in any programming language is required; we’ll start from scratch.
- This course can be taken up by any undergraduate/postgraduate student of Basic & Applied Sciences, Engineering, Management and Computer Applications, and also by Research Scholars/Faculty/Working Professionals who want to upskill themselves.
- Participants need a laptop/PC (minimum 4 GB RAM, 100 GB HDD, Intel i3 processor) and a stable internet/Wi-Fi connection.
Contact Person: Dr. Subrat Kotoky
Email: [email protected] / [email protected]
Phone: 9085317465 / 8473874389
Expert Profile: Mr. Shreyas Shukla
Professional Corporate Trainer & Microsoft Azure Certified Data Engineer
M.Tech-IIT Kharagpur & BE- The Aeronautical Society of India, New Delhi
Has successfully conducted 25+ courses and trained 2000+ learners in Python Programming, Data Analytics, Machine Learning, Deep Learning, Computer Vision, and related fields.
(Total experience in conducting professional courses: 4+ years)
Certifications:
- DP-203: Microsoft Certified: Azure Data Engineer Associate
- DP-900: Microsoft Certified: Azure Data Fundamentals
- AZ-900: Microsoft Certified: Azure Fundamentals