DATA SCIENCE WITH R
-
PARAMETERS AND SPECIFICATIONS:
-
Tools Used: R
-
Learning Mode: Classroom (instructor-led)
-
Duration: 52 – 58 hours
-
Batch Size: 5 – 8 students
-
Location: Delhi (Saket)
-
Course Includes: Live scenarios, case studies, a project, assessments and a mock interview
-
Study Material: PPTs, documents, datasets, PDFs, etc.
TARGET AUDIENCE:
-
Any graduate. No prior knowledge of data science or analytics is required.
WHAT IS R?
-
R is open-source software for data analysis, widely used by data scientists, statisticians, researchers and data analysts. Anyone who needs to draw insight from data can use R for statistical analysis, data visualization and predictive modeling. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the 1990s as a statistical platform for their students, and it has since been extended by thousands of user-contributed packages. R is also an object-oriented programming language created by statisticians: it provides objects, operators and functions that let users explore, model and visualize data. R is a vectorized language, so arithmetic and many functions apply to every element of a vector at once, without writing an explicit loop. Vectorized operations are also typically faster than the equivalent loops, and with R's packages, machine learning algorithms can be implemented quickly and simply.
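As a minimal illustration of vectorization, the sketch below applies a hypothetical 18% tax to a made-up vector of prices in two ways. The first is a single vectorized expression. The second is the explicit for() loop that vectorization lets you avoid. Both the prices and the tax rate are invented for the example.

```r
# Made-up prices and a hypothetical 18% tax rate, for illustration only
prices <- c(120, 250, 75, 310)

# Vectorized: the multiplication is applied to every element at once
with_tax <- prices * 1.18

# The same computation written as an explicit loop
with_tax_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  with_tax_loop[i] <- prices[i] * 1.18
}

identical(with_tax, with_tax_loop)   # TRUE: both give the same result

# Many built-in functions are vectorized too
sqrt(prices)
round(with_tax, 1)
```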
JOB PROFILES IN R:
-
R Programmer
-
Data Analyst/Miner
-
Data Modeler
-
Data Scientist
-
ML specialist
-
NLP specialist and many more.
*****COURSE CONTENT*****
FUNDAMENTALS OF STATISTICS:
-
Population and sample
-
Descriptive and Inferential Statistics
-
Statistical data analysis
-
Variables
-
Central Tendency, Sample and Population Distributions
-
Central Limit Theorem (CLT)
-
Estimation & Confidence interval
-
Normal Distribution
-
Skewness
-
Boxplot
-
Standard deviation
-
Standard Error
-
Hypothesis testing
-
P-value
-
Scatter plot and correlation coefficient
-
Scales of Measurement and Data Types
-
Numerical Summarization
-
Outliers & Summary
-
Data Summarization
-
Visual Summarization
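Several of the statistics topics above map directly onto base R functions. The sketch below uses a small made-up sample (the values of x and y are invented) to compute the following:

- central tendency and standard deviation
- the standard error and a 95% confidence interval
- a one-sample t-test p-value
- boxplot outliers
- a correlation coefficient

The skewness() helper is a simple moment-based version written for this example, because base R has no built-in skewness function. Contributed packages offer other variants.

```r
# A small made-up sample
x <- c(12, 15, 11, 18, 14, 16, 13, 17, 15, 14)
y <- c(30, 36, 27, 44, 33, 39, 31, 41, 35, 34)
n <- length(x)

# Central tendency and spread
mean(x)      # 14.5
median(x)
sd(x)        # sample standard deviation (divides by n - 1)

# Standard error of the mean and a 95% confidence interval from the t distribution
se <- sd(x) / sqrt(n)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# One-sample t-test of H0: population mean = 12; t.test() reports the same interval
tt <- t.test(x, mu = 12)
tt$p.value
tt$conf.int

# Simple moment-based skewness (hypothetical helper, not a base R function)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}
skewness(x)

# Outliers by the boxplot rule (beyond 1.5 times the box length from the hinges);
# 40 is an artificial outlier added for the example
boxplot.stats(c(x, 40))$out   # 40

# Pearson correlation coefficient between x and y
cor(x, y)
```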
MODULE 1- INTRODUCTION TO R PROGRAMMING
-
Installing & starting with R
-
Basic features of R and its environment
-
Calculations with R
-
Functions
-
Understanding R language and programming guidelines
-
Listing the objects in the workspace
-
Vectors
-
Extracting elements from vectors
-
Vector arithmetic
-
Simple patterned vectors
-
Missing values and other special values
-
Character vectors
-
Factors
-
More on extracting elements from vectors
-
Matrices and arrays
-
Data frames
-
Dates and times
-
Assignments with Datasets
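The short session below sketches several Module 1 topics in base R. It covers creating and indexing vectors, vector arithmetic, patterned vectors with seq() and rep(), and missing values. It also shows a small matrix, data frame and factor, date arithmetic, and listing the workspace with ls(). All values are made up for illustration.

```r
# Vectors and extracting elements (R indexes from 1)
v <- c(4, 8, 15, 16, 23, 42)
v[2]          # 8
v[c(1, 3)]    # 4 15
v[-1]         # every element except the first
v[v > 15]     # logical indexing: 16 23 42

# Vector arithmetic is element by element
v * 2
v + c(1, 2, 3, 4, 5, 6)

# Simple patterned vectors
seq(1, 10, by = 3)           # 1 4 7 10
rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"

# Missing values propagate unless removed explicitly
w <- c(1, NA, 3)
mean(w)                 # NA
mean(w, na.rm = TRUE)   # 2

# Matrices, data frames and factors
m <- matrix(1:6, nrow = 2)            # filled column by column
df <- data.frame(name = c("A", "B"), score = c(90, 85))
grade <- factor(c("pass", "fail", "pass"))

# Dates: adding a number adds days
d <- as.Date("2024-03-15")
d + 30                  # "2024-04-14"

# List the objects in the workspace
ls()
```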
MODULE 2- INTRODUCTION TO DATA ANALYTICS:
-
This module introduces some of the important concepts you will work with in R, such as business intelligence, business analytics, data and information. You will also learn how R can play an important role in solving complex analytical problems.
-
This module explains what R is and how it is used by large companies such as Google and Facebook. You will also learn how R is used in industry, compare R with other analytics software, and install R and its packages.
-
Business Analytics, Data, Information
-
Understanding Business Analytics and R
-
Compare R with other software in analytics
-
Install R
-
Perform basic operations in R using the command line
MODULE 3- IMPORT AND EXPORT DATA IN R
-
Importing data into R
-
CSV File
-
Excel File
-
Importing data from a text table
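Below is a minimal round-trip sketch using base R's write.csv(), read.csv(), write.table() and read.table(). It writes the first rows of the built-in mtcars data set to temporary files and reads them back. Reading Excel files is not part of base R. A contributed package such as readxl (its read_excel() function) is a common choice, but it is left out here so the sketch runs without extra installs.

```r
# A small subset of the built-in mtcars data set
small <- mtcars[1:5, c("mpg", "cyl", "hp")]

# CSV: export, then import again; row.names = 1 restores the car names
csv_path <- tempfile(fileext = ".csv")
write.csv(small, csv_path)
from_csv <- read.csv(csv_path, row.names = 1)

# Text table: a tab-separated file read back with read.table();
# the header has one field fewer than the data rows, so the first column becomes row names
txt_path <- tempfile(fileext = ".txt")
write.table(small, txt_path, sep = "\t", quote = FALSE)
from_txt <- read.table(txt_path, header = TRUE, sep = "\t")

str(from_csv)
head(from_txt)
```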
-
DATA SCIENCE USING R-PROGRAMMING
-
Topics
-
Variables in R
-
Scalars
-
Vectors
-
R Matrices
-
List
-
R – Data Frames
-
Using c(), cbind(), rbind(), attach(), detach() and related functions in R
-
R – Factors
-
R – CSV Files
-
R – Excel File
-
Assignments
-
Business Scenario/Group Discussion
-
R Nuts and Bolts
-
Entering input, evaluation, R objects, numbers, attributes, creating vectors, mixing objects, explicit coercion, summary, names, data frames
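The sketch below illustrates a few of these building blocks with made-up values:

- combining values with c()
- binding vectors into matrices with cbind() and rbind()
- building a list
- implicit and explicit coercion
- attach() and detach()

The with() call at the end is a commonly recommended alternative to attach(), which can mask other objects that have the same name.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Bind vectors as columns or as rows
cb <- cbind(a, b)   # 3 x 2 matrix with columns "a" and "b"
rb <- rbind(a, b)   # 2 x 3 matrix with rows "a" and "b"

# A list can hold elements of different types (made-up record)
student <- list(name = "Asha", scores = c(90, 85), passed = TRUE)
student$scores[2]   # 85

# Mixing types in c() coerces everything to the most general type
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"

# Explicit coercion; "x" cannot become a number, so it turns into NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
coerced <- suppressWarnings(as.numeric(c("1", "2", "x")))   # 1 2 NA

# attach() makes data frame columns visible by name; detach() undoes it
df <- data.frame(x = a, y = b)
attach(df)
total <- sum(x)     # 6
detach(df)

# with() gives the same convenience without changing the search path
with(df, sum(y))    # 15
```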
MODULE 4- MANAGING DATA FRAMES WITH THE DPLYR PACKAGE:
-
The dplyr Package
-
Installing the dplyr package
-
select()
-
filter()
-
arrange()
-
rename()
-
mutate()
-
group_by()
-
The %>% pipe operator
-
Assignments
-
Business Scenario/Group Discussion
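Here is a minimal sketch of the dplyr verbs listed above, chained with the %>% pipe on the built-in mtcars data set. It assumes dplyr is already installed (for example with install.packages("dplyr")). summarise() is not in the list above. It is shown because group_by() is usually paired with it.

```r
# Assumes dplyr is installed, e.g. install.packages("dplyr")
library(dplyr)

# mtcars ships with R; keep the car names as a regular column
car_data <- mtcars
car_data$model <- rownames(mtcars)

powerful <- car_data %>%
  select(model, mpg, cyl, hp) %>%                      # keep four columns
  filter(hp > 100) %>%                                 # keep cars with more than 100 hp
  rename(horsepower = hp) %>%                          # rename(new_name = old_name)
  mutate(mpg_per_100hp = mpg / horsepower * 100) %>%   # add a derived column
  arrange(desc(mpg))                                   # highest mpg first

head(powerful)

# group_by() followed by summarise(): one row per number of cylinders
by_cyl <- car_data %>%
  group_by(cyl) %>%
  summarise(cars = n(), mean_mpg = mean(mpg))
by_cyl
```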
MODULE 5- LOOP FUNCTIONS:
-
Looping on the Command Line
-
lapply()
-
sapply()
-
tapply()
-
apply()
-
Assignments
-
Business Scenario/Group Discussion
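The loop functions in this module can be compared side by side on small made-up inputs and on the built-in mtcars data:

- lapply() always returns a list.
- sapply() simplifies the result where it can.
- apply() works over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix.
- tapply() applies a function within groups.

```r
# Made-up scores for two subjects
scores <- list(math = c(70, 85, 90), science = c(60, 75, 80, 95))

lapply(scores, mean)   # a list with one mean per element
sapply(scores, mean)   # simplified to a named numeric vector

m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
apply(m, 1, sum)             # row sums: 9 12
apply(m, 2, sum)             # column sums: 3 7 11

# tapply(): mean mpg for each number of cylinders
tapply(mtcars$mpg, mtcars$cyl, mean)

# An anonymous function works anywhere a named function does
sapply(1:5, function(k) k^2)   # 1 4 9 16 25
```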
MODULE 6- DATA MANIPULATION IN R OBJECTIVES:
-
In this module, we start with a sample of a dirty data set and clean it, resulting in a data set that is ready for analysis.
-
Along the way, we use and explore the popular functions for cleaning data in R.
-
Topics
-
Data sorting
-
Finding and removing duplicate records
-
Cleaning data
-
Merging data
-
Statistical Plotting
-
Bar charts and dot charts
-
Pie charts
-
Histograms
-
Box plots
-
Scatter plots
-
QQ plots
-
Assignments with Datasets
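The sketch below walks through the sorting, deduplication, cleaning and merging topics on a tiny base-R example. The "dirty" data frame and the lookup table of regional managers are invented for illustration. The order of the steps (deduplicate, standardise text, handle missing values, sort, merge) is one reasonable choice, not the only one.

```r
# A small made-up "dirty" data set
sales <- data.frame(
  id     = c(3, 1, 2, 1, 4),
  region = c("North", "South", "north ", "South", NA),
  amount = c(250, 100, NA, 100, 400),
  stringsAsFactors = FALSE
)

# 1. Find and remove exact duplicate records (row 4 repeats row 2)
sales <- sales[!duplicated(sales), ]

# 2. Clean text: trim stray spaces and use one consistent case
sales$region <- tolower(trimws(sales$region))

# 3. Missing values: drop rows with no region, fill a missing amount with the median
sales <- sales[!is.na(sales$region), ]
sales$amount[is.na(sales$amount)] <- median(sales$amount, na.rm = TRUE)

# 4. Sort by id
sales <- sales[order(sales$id), ]

# 5. Merge with a lookup table (hypothetical managers per region)
managers <- data.frame(region = c("north", "south"),
                       manager = c("Asha", "Ravi"),
                       stringsAsFactors = FALSE)
merged <- merge(sales, managers, by = "region", all.x = TRUE)
merged
```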
OBJECTIVES
-
Control Structure Programming with R
-
The for() loop
-
The if() statement
-
The while() loop
-
The repeat loop, and the break and next statements
-
apply(), sapply(), lapply()
-
Assignments with Datasets
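The sketch below runs each control structure on a tiny made-up task, so the behaviour of if/else, for(), while(), repeat, next and break can be checked by hand. In practice, many such loops can be replaced by vectorized functions and the apply family.

```r
# for() with if()/else: label the numbers 1 to 5 as even or odd
labels <- character(0)
for (x in 1:5) {
  if (x %% 2 == 0) {
    labels <- c(labels, "even")
  } else {
    labels <- c(labels, "odd")
  }
}
labels   # "odd" "even" "odd" "even" "odd"

# while(): count how many doublings it takes for n to exceed 100
n <- 1
steps <- 0
while (n <= 100) {
  n <- n * 2
  steps <- steps + 1
}
steps    # 7 (n is now 128)

# repeat with next and break: add up odd numbers until the total exceeds 20
total <- 0
i <- 0
repeat {
  i <- i + 1
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
  if (total > 20) break   # stop once the running total exceeds 20
}
total    # 25 (1 + 3 + 5 + 7 + 9)
```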
FACTORS:
-
Using Factors
-
Manipulating Factors
-
Numeric Factors
-
Creating Factors from Continuous Variables
-
Converting variables to factors and to other types
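The sketch below shows common factor operations on made-up data:

- fixing the order of levels
- renaming a level
- the classic pitfall when converting a numeric factor back to numbers
- using cut() to turn a continuous variable into a factor

The age bands are arbitrary choices for the example.

```r
# Made-up categorical data with an explicit level order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)   # "small" "medium" "large"
table(sizes)    # counts per level, in level order

# Manipulating factors: rename one level
levels(sizes)[levels(sizes) == "medium"] <- "mid"

# Numeric factors: convert via character, not directly
f <- factor(c("10", "20", "10", "30"))
as.numeric(f)                 # 1 2 1 3 (internal level codes)
as.numeric(as.character(f))   # 10 20 10 30 (the actual values)

# Creating a factor from a continuous variable with cut()
ages <- c(15, 22, 37, 45, 63)
age_group <- cut(ages,
                 breaks = c(0, 18, 40, Inf),
                 labels = c("under 18", "18-40", "over 40"))
age_group

# Converting back to other types
as.character(age_group)
as.integer(age_group)         # 1 2 2 3 3
```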
RESHAPING:
-
Modifying Data
-
Data Frame Variables
-
Recoding Variables
-
The recode Function
-
Reshaping Data Frames
-
The reshape Package
-
Assignments with Datasets
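The reshape package named above is a contributed package, so this sketch sticks to base R instead. It recodes variables with a named lookup vector and ifelse(), then turns a wide table into a long one with the base reshape() function from the stats package. All data are made up.

```r
# Recoding: map short codes to labels with a named lookup vector
gender_codes <- c(M = "Male", F = "Female")
survey <- data.frame(id = 1:3, gender = c("F", "M", "F"),
                     stringsAsFactors = FALSE)
survey$gender_label <- unname(gender_codes[survey$gender])

# Recoding a numeric variable into categories
survey$age <- c(19, 34, 52)
survey$age_band <- ifelse(survey$age < 30, "under 30", "30 and over")
survey

# Reshaping: one column per year (wide) -> one row per id and year (long)
wide <- data.frame(id = 1:2, score_2022 = c(70, 80), score_2023 = c(75, 85))
long <- reshape(wide,
                direction = "long",
                varying   = c("score_2022", "score_2023"),
                v.names   = "score",
                timevar   = "year",
                times     = c(2022, 2023),
                idvar     = "id")
long
```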
MODULE 7- BASICS OF STATISTICS & LINEAR & MULTIPLE REGRESSION:
-
This module covers the basics of descriptive and inferential statistics, probability and regression techniques.
-
Linear and logistic regression are explained from the basics with examples and implemented in R through two case studies, one dedicated to each type of regression.
-
Assessing the Accuracy of the Coefficient Estimates
-
Assessing the Accuracy of the Model
-
Estimating the Regression Coefficients
-
Some Important Questions
-
Lab: Linear Regression
-
Libraries
-
Simple Linear Regression
-
Multiple Linear Regression
-
Interaction Terms
-
Qualitative Predictors
-
Writing Functions
-
Assignments with Different Datasets
-
Business Scenario/Group Discussion
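Below is a minimal sketch of the lab topics using lm() and glm() on the built-in mtcars data set. It covers:

- simple and multiple linear regression
- an interaction term
- a qualitative predictor
- a small hand-written function
- a binomial glm(), because the module description also mentions logistic regression

The choice of variables is arbitrary and is not taken from the course's case studies.

```r
# Simple linear regression: fuel efficiency as a function of weight
fit_simple <- lm(mpg ~ wt, data = mtcars)
summary(fit_simple)   # estimates, standard errors, t-values, p-values, R-squared
confint(fit_simple)   # 95% confidence intervals for the coefficients

# Multiple linear regression
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

# Interaction term: wt * hp expands to wt + hp + wt:hp
fit_inter <- lm(mpg ~ wt * hp, data = mtcars)

# Qualitative predictor: cyl as a factor gives one dummy per non-reference level
fit_qual <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Writing a function: residual standard error (RSE) of a fitted lm model
rse <- function(model) {
  sqrt(sum(residuals(model)^2) / df.residual(model))
}
rse(fit_simple)       # matches summary(fit_simple)$sigma

# Logistic regression: probability of a manual transmission (am = 1) from weight
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit_logit)
```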