Natural Language Processing: Methods, Models & Applications
(In Association with iHUB Divyasampark IIT Roorkee)
About this Course:
In today’s AI-driven world, Generative AI and Large Language Models (LLMs) are at the forefront of innovation, powering applications ranging from chatbots and recommendation engines to speech recognition and intelligent automation. Python is the most widely used programming language in Generative AI and Data Science owing to its simplicity, versatility, and vast ecosystem of powerful libraries. To acquaint learners with these in-demand skills, this course builds a strong foundation in Python programming and its practical applications in Artificial Intelligence, Natural Language Processing (NLP), Large Language Models (LLMs), LangChain, Vector Databases, and Speech Recognition, with a hands-on approach. By completing this course, learners will not only strengthen their technical skill set but also gain the ability to build AI-powered applications, positioning themselves for exciting career opportunities in the rapidly evolving field of Generative AI and LLM Engineering.
Course Objectives:
- To make the participants learn Python programming from scratch with a focus on its applications in the fields of Artificial Intelligence (AI) & Generative AI.
- To introduce participants to core AI concepts, including Natural vs Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI fundamentals.
- To provide a strong foundation in Python programming concepts such as objects, functions, methods, error handling, regex, and object-oriented programming with practical illustrations.
- To enable learners to work with Python libraries (OS, Requests, Pandas) for automation, data handling, and API integration (including OpenAI API).
- To introduce participants to Natural Language Processing (NLP) and equip them to build text-processing pipelines including tokenization, sentiment analysis, and custom text classifiers.
- To make learners proficient in working with Large Language Models (LLMs), covering transformer architecture, GPT, BERT, Hugging Face, and LangChain for chatbot development and text generation.
- To equip learners with the knowledge of LangChain framework, LangGraph, and Retrieval Augmented Generation (RAG) for building advanced conversational agents and memory-enabled systems.
- To familiarize learners with Vector Databases (e.g., Pinecone) and their applications in semantic search, recommendation engines, and biomedical research.
- To introduce learners to Speech Recognition and Speech-to-Text systems using traditional ML, Deep Learning, and transformer-based approaches such as Whisper AI.
- To prepare learners for LLM Engineering, including prompt engineering, hosting models vs APIs, cost optimization, scaling strategies, and deploying AI-powered applications with Streamlit.
Batch Details:
Class Timings: 8 pm – 10 pm (Monday & Wednesday)
Start Date: 23rd Feb 2026
End Date: 22nd June 2026
Duration: 74 Hours
Mode: Online
Certification: iHUB Divyasampark IIT Roorkee
Course Highlights:
• Industry-relevant skills
• Hands-on learning experience through practical projects
• Globally accepted certification from iHUB Divyasampark IIT Roorkee
• Full-time access to recorded lectures/PPTs/PDFs/study materials
• Session on resume preparation/interview preparation
Course Overview:
Module 1
- Building an AI tool
- Natural vs Artificial Intelligence, Brief history of AI, Weak vs Strong AI
- AI vs Data Science vs Machine Learning vs Deep Learning
- Data: Collection, Labelled vs Unlabeled, Structured vs Unstructured, Metadata
- Overview of Machine Learning (Supervised, Unsupervised and Reinforcement Learning)
- Overview of Deep Learning, Robotics, Computer Vision, Traditional ML, Generative AI
- Generative AI: Introducing ChatGPT, Natural Language Processing (NLP)
- Large Language Models (LLMs): Training, N-Grams, RNNs, Transformers
- Building LLMs: Prompt Engineering, Fine Tuning, RAG
- Foundation Models vs Private Models
- Inconsistency and Hallucination in Gen AI, Budgeting, Latency, Running out of Data
- AI stack: Python, Working with APIs, Vector Databases, Hugging Face, LangChain
Module 2
- Basics of Python Language
- Python objects with details of Numbers/Variables/Strings/Lists/Dictionaries/Tuples etc.
- Variable Assignment, Indexing & Slicing
- Comparison & Logical operators
- Range, List Comprehension, Functions, Lambda expressions.
- Scope, with statement, Working with various Files types, Error Handling
- Regex: Anchors and Groupings, Range Expressions, Non-Greedy Matching, Substitutions.
- Object-Oriented Programming: class, Attributes, Inheritance, Polymorphism
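The regex topics listed above (groupings, range expressions, non-greedy matching, substitutions) can be previewed in a short snippet; the log line and patterns below are illustrative examples I made up, not course material:

```python
import re

log_line = "user=alice action=login time=<2026-02-23 20:05>"

# Range expression [a-z]+ plus a grouping to capture the username.
user = re.search(r"user=([a-z]+)", log_line).group(1)

# Non-greedy matching: (.*?) grabs the shortest text between the angle brackets.
timestamp = re.search(r"<(.*?)>", log_line).group(1)

# Substitution: redact the username in the original line.
redacted = re.sub(r"user=\w+", "user=***", log_line)

print(user)       # alice
print(timestamp)  # 2026-02-23 20:05
print(redacted)
```

Each pattern here maps onto one bullet above; the same `re` functions (`search`, `sub`, `findall`) cover most day-to-day regex work in Python.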
Module 3
- Understanding Python Module: OS module, Import Techniques and Best Practices
- Requests and HTTPX Libraries: Getting Started, Handling Errors, Managing Authentication and Headers (OpenAI API)
- Introduction and working with Pandas
- Series, DataFrames, working with various data types, Group-By operation
- Selecting a single column, important series methods
- Indexing & Sorting; loc & iloc with series; Inspecting DataFrames, filtering with conditional operators
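A minimal sketch of the Pandas operations listed above (Group-By, conditional filtering, and `loc`); the DataFrame and column names are hypothetical toy data:

```python
import pandas as pd

# A tiny DataFrame of course hours per module (hypothetical data).
df = pd.DataFrame({
    "module": ["NLP", "NLP", "LLMs", "LLMs", "Speech"],
    "hours":  [8, 6, 10, 12, 9],
})

# Group-By operation: total hours per module (returns a Series).
totals = df.groupby("module")["hours"].sum()

# Filtering with a conditional operator, then label-based selection with loc.
long_sessions = df.loc[df["hours"] > 8, "module"]

print(totals)
print(long_sessions.tolist())  # ['LLMs', 'LLMs', 'Speech']
```

The boolean mask `df["hours"] > 8` combined with `loc` is the idiomatic way to filter rows and pick columns in one step.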
Module 4
- Introduction, Supervised vs Unsupervised NLP
- Data preparation, Handling Stop words, Regular Expressions
- Tokenization, Stemming, Lemmatization, N-grams
- Text tagging, Parts of Speech (POS) tagging, Named Entity Recognition (NER)
- Sentiment Analysis, Rule-based Sentiment Analysis, Pre-trained Transformers
- Numerical Representation of text, Bag of Words, TF-IDF
- Topic Modelling, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA)
- Custom text classifier using various ML models
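The Bag-of-Words and TF-IDF ideas above can be sketched in a few lines of pure Python. This toy scorer uses add-one smoothing in the IDF denominator, one of several common variants, so it is illustrative rather than the exact formula of any particular library:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def tf_idf(term, doc_tokens, all_docs_tokens):
    # Term frequency: raw count of the term in this document (Bag of Words).
    tf = Counter(doc_tokens)[term]
    # Inverse document frequency, smoothed so unseen terms don't divide by zero.
    n_containing = sum(1 for d in all_docs_tokens if term in d)
    idf = math.log(len(all_docs_tokens) / (1 + n_containing)) + 1
    return tf * idf

# Whitespace tokenization; real pipelines add stemming/lemmatization first.
tokenized = [d.split() for d in docs]
print(tf_idf("cat", tokenized[0], tokenized))  # rare term, boosted
print(tf_idf("the", tokenized[0], tokenized))  # common term, dampened
```

Note how "cat" and "cats" are treated as unrelated tokens here; that is exactly the problem stemming and lemmatization address.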
Module 5
- Understanding LLMs, General Purpose Models, Pre-training and Fine Tuning
- Deep Learning recap, Transformer Architecture, Input Embeddings
- Multi-headed Attention, Feed-Forward Layer, Masked Multi-head Attention
- Understanding GPT, OpenAI API, Generating Text, Customizing GPT output
- Keyword text summarization, Coding a simple chatbot using LangChain in Python
- Hugging Face, Transformer Pipeline, Pre-trained tokenizers, Special tokens
- Q&A models: BERT architecture, Tokenizer, Embeddings, Calculating response
- Creating QA bot, BERT, RoBERTa, DistilBERT, GPT vs BERT vs XLNet, XLNet Embeddings and Fine Tuning
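The scaled dot-product attention at the heart of the transformer architecture can be sketched with NumPy; the shapes and random data below are arbitrary toy values, a sketch of the math rather than a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query vectors, d_k = 4
K = rng.normal(size=(5, 4))   # 5 key vectors
V = rng.normal(size=(5, 2))   # one value vector per key
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 2): one output per query
```

Multi-headed attention simply runs several of these in parallel on learned projections of Q, K, and V and concatenates the results.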
Module 6
- LangChain Introduction, Tokens and Models, Setting up Environment, OpenAI API key
- System, User and Assistant roles, Creating chatbot, Temperature, Max Tokens, Streaming
- LangChain Framework, ChatOpenAI, System, AI and Human messages
- Prompt Templates and Prompt Values, Few-shot chat message prompt templates
- String output parser, Comma-separated list output parser, Datetime output parser
- Piping a prompt, model and parser, Batching, Streaming, Runnable Sequence class
- Piping chains and Runnable Passthrough, Runnable Parallel, Runnable Lambda
- Retrieval Augmented Generation (RAG): Document Loading, Splitting and Embedding
- Document storing, retrieval and generation, loading with PyPDFLoader, Docx2txtLoader
- Splitting with character text splitter, markdown header text splitter
- Text embedding with OpenAI, Chroma vectorstore (Inspecting and Managing docs)
- Retrieval: Similarity search, Maximal Marginal Relevance (MMR) search, Vectorstore-backed retriever
- Generation: Stuffing documents and Generating a response
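The `prompt | model | parser` piping idea above can be sketched in plain Python. The functions below are simplified stand-ins of my own (the "model" is a canned stub, not a real LLM call), not LangChain's actual Runnable classes, which require an API key to demonstrate:

```python
# Plain-Python sketch of LangChain-style piping: prompt -> model -> parser.

def prompt_template(variables):
    # Stand-in for a prompt template: fills variables into a string.
    return f"List three {variables['topic']} as a comma-separated list."

def fake_model(prompt):
    # Stand-in for a chat model; always returns the same canned string.
    return "tokenization, embeddings, attention"

def comma_list_parser(text):
    # Mirrors the idea of a comma-separated list output parser.
    return [item.strip() for item in text.split(",")]

def pipe(*steps):
    # Compose steps left to right, like LangChain's `prompt | model | parser`.
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

chain = pipe(prompt_template, fake_model, comma_list_parser)
print(chain({"topic": "NLP concepts"}))  # ['tokenization', 'embeddings', 'attention']
```

The point of the pattern is that each stage has one job and a predictable input/output type, so stages can be swapped, batched, or streamed independently.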
Module 7
- States, nodes and Edges, First graph: Importing relevant classes, Building graph
- Conditional edges: Defining nodes, routing function, Building the graph
- Annotated construct and reducer functions, MessagesState, RemoveMessages
- Checkpointers and threads, Short-term memory with the InMemorySaver class
- The StateSnapshot class, Long-term memory with SQLite
Module 8
- Database comparison: SQL, NoSQL and Vector; Understanding Vector databases
- Vector space: Introduction, Distance Metrics, Vector Embeddings
- Vector database comparison, Pinecone registration, walkthrough and creating an index
- Pinecone with Python: Connection, Pinecone Index, Upserting data and using embeddings
- Vector database for Recommendation Engines, Biomedical research, Semantic search
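The distance metrics underpinning vector search can be illustrated in pure Python; the 3-dimensional "embeddings" below are toy values of my own (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|); 1 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance; 0 means identical vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" for three words.
cat, kitten, car = [1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]
print(cosine_similarity(cat, kitten))  # close to 1: semantically similar
print(cosine_similarity(cat, car))     # much smaller: dissimilar
```

Semantic search in a vector database is essentially this computation done efficiently over millions of stored embeddings via an approximate nearest-neighbour index.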
Module 9
- Development and Evolution, Formants, Harmonics and Phonemes
- Sound and Sound waves: Fundamentals and properties
- Sample Rate, bit depth, bit rate, Audio signal processing for Machine Learning and AI
- Audio Features: Time-domain, Frequency-domain, time-frequency-domain, Fourier transform
- Acoustic and language modeling, Hidden Markov Models (HMMs)
- Traditional Neural Networks: CNNs, RNNs and LSTMs
- Advanced speech recognition systems: Transformers
- Building a Speech Recognition Model
- Audio file formats for speech recognition, Importing audio files in Python
- Google Web Speech API, Evaluation metrics: WER and CER
- Dealing with background noise, Creating spectrograms
- Whisper AI: Transformer-based speech-to-text, Transcribing multiple audio files
- Reversing the process: AI-powered text-to-speech
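Word Error Rate (WER), one of the evaluation metrics listed above, is the word-level edit distance between a reference transcript and the system's hypothesis, divided by the reference length. A minimal pure-Python sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with word-level edit distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat", "the cat sat"))       # 0.0
print(word_error_rate("the cat sat", "the bat sat down"))  # 1 sub + 1 ins = 2/3
```

CER is the same computation at the character level instead of the word level.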
Module 10
- Hosting an LLM vs Using an API, Open-Source vs Closed-source, Tokens
- Pricing: Hosting an LLM vs Pay-by-Token, Initial Prompt Development
- Database Design and Schema Development, Activity Diagram
- OpenAI Playground, Optimizing Temperature, Top P for Different Use Cases
- Prompt Engineering for Software Lifecycle
- Streamlit: Introduction, Pros and Cons, Text Methods, Chat Elements, Session State
- Initializing an OpenAI Client, Implementing the Chat Functionality, Building the Setup Page
- Enhancing Chatbot Interaction with Session State, Feedback Functionality, Deployment
- Application Structure, Prompt Structure, Hallucinations, Prompt Injection
- Counting Tokens, Cost reduction and Scaling
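Token counting and cost estimation can be sketched with a rough heuristic. The ~4-characters-per-token rule of thumb and the per-1K-token prices below are hypothetical placeholders; real counts come from the model's own tokenizer (e.g. tiktoken) and real prices from the provider's pricing page:

```python
def estimate_tokens(text):
    # Rough rule of thumb for English text: ~4 characters per token.
    return max(1, len(text) // 4)

def estimate_cost(prompt, expected_output_tokens,
                  price_in_per_1k=0.0005, price_out_per_1k=0.0015):
    # Prices are hypothetical placeholders, not any provider's actual rates.
    n_in = estimate_tokens(prompt)
    return (n_in / 1000 * price_in_per_1k
            + expected_output_tokens / 1000 * price_out_per_1k)

prompt = "Summarize the following meeting notes in three bullet points: ..."
print(estimate_tokens(prompt))
print(f"${estimate_cost(prompt, expected_output_tokens=200):.6f}")
```

Even this crude estimate is enough to compare prompt variants and see why shorter system prompts and capped output lengths are the first levers for cost reduction.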
Prerequisites and eligibility:
- No coding experience in any programming language is required; we’ll start from scratch.
- This course can be taken up by any undergraduate/postgraduate student of Basic & Applied Sciences, Engineering, Management and Computer Applications, and also by Research Scholars/Faculty/Working Professionals who want to upskill themselves.
- Participants need a laptop/PC (minimum 4 GB RAM, 100 GB HDD, Intel i3 processor) and a stable internet/Wi-Fi connection.
Contact Person: Dr. Subrat Kotoky
Email: [email protected] / [email protected]
Phone: 9085317465 / 8473874389
Expert Profile: Mr. Shreyas Shukla
Professional Corporate Trainer & Microsoft Azure Certified Data Engineer
M.Tech-IIT Kharagpur & BE- The Aeronautical Society of India, New Delhi
Has successfully conducted 25+ courses and trained 2000+ learners in Python Programming, Data Analytics, Machine Learning, Deep Learning, Computer Vision, and related fields.
(Total experience in conducting professional courses: 4+ years)
Certifications:
- DP-203: Microsoft Certified: Azure Data Engineer Associate
- DP-900: Microsoft Certified: Azure Data Fundamentals
- AZ-900: Microsoft Certified: Azure Fundamentals