Wednesday, February 24, 2016

WEEK_2: Introduction to R programming

Hi there, welcome to the week 2 session.
Today we will learn:
  1. Why I chose R over Python
  2. Introduction to the R language
  3. Basics of R

Why R over Python?

We can choose R or Python for data analysis. If you are already familiar with Python, you can go with Python. But I was a newbie in both technologies.
I selected R for the following reasons.
  • R is object-oriented
  • R is a functional programming language
  • Operator overloading is much easier in R than in Python
  • Parallelism in R has been much further developed than in Python
  • R is designed for statistical analysis
  • R is great for exploratory work
  • R has a huge number of packages and ready-to-use tests that often give you the tools you need to get up and running quickly
  • R can even be part of a big data solution

 

Introduction to R language

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers (the creator of the S language) is a member.
As you know, we need an environment to run any program. You need to have r-base installed to run R programs.

You can download r-base by following the links below.

For a Windows machine, click here
For a Mac OS X machine, click here
For a Linux machine, click here
(If any of the links is broken, get r-base from the CRAN website.)

Now we have r-base. We can start coding! But most of us prefer working in an IDE to working on the command line. R has a beautiful IDE called RStudio.
RStudio is an open source IDE. You can download it from the RStudio website. Here is the link.

Basics of R

I hope you have installed r-base and RStudio on your machine. Now launch RStudio or the r-base interface.
After R has started, there is a console waiting for input. At the prompt (>), you can enter numbers and perform calculations.

eg:
 > 1 + 2 
output:
[1] 3 
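
The console works as a calculator for more than addition. Here is a small sketch using only base R operators (the numbers are arbitrary examples I picked, and the lines are shown without the ">" prompt so they can be pasted directly):

```r
# Basic arithmetic at the R prompt (values are arbitrary examples)
7 - 2        # subtraction: 5
3 * 4        # multiplication: 12
10 / 4       # division: 2.5
2 ^ 3        # exponentiation: 8
10 %% 3      # remainder (modulo): 1
10 %/% 3     # integer division: 3
(1 + 2) * 3  # parentheses control the order of evaluation: 9
```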

Variable assignment

We assign values to variables with the assignment operator "=". Typing the variable name by itself at the prompt prints its value. Note that another assignment operator, "<-", is also in common use. I prefer the "<-" operator, for no specific reason!

eg:
> x = 1
> x 
output:
[1] 1 
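
For comparison, here is a minimal sketch of the same idea using the "<-" operator. The variable names y and z are just my own examples:

```r
# Both operators create a variable when used at the prompt
x = 1
y <- 2
z <- x + y   # variables can be reused in later calculations
z            # typing the name prints the value: [1] 3
```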
  

Comments

All text after the pound sign "#" on the same line is treated as a comment.

eg:
> 1 + 1      # this is a comment 
output:
[1] 2

Functions

An R function is invoked by its name, followed by parentheses containing zero or more arguments. The following applies the function c to combine three numeric values into a vector.

eg:
> c(1, 2, 3) 
output:
[1] 1 2 3 
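
Other functions are called the same way. The sketch below stores the vector in a variable (the name v is my own choice) and passes it to a few base R functions, including one call that uses a named argument:

```r
v <- c(1, 2, 3)      # combine three numbers into a vector
length(v)            # number of elements: 3
sum(v)               # total of the elements: 6
mean(v)              # average of the elements: 2
seq(1, 10, by = 3)   # arguments can be passed by name: 1 4 7 10
```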

 

Extension Package

Sometimes we need functionality beyond what the core R library offers. To install an extension package, invoke the install.packages function at the prompt and follow the instructions.

eg:
> install.packages("package_name") 
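
Installing a package does not load it. You still have to attach it with library() in each session. A minimal sketch of a common pattern follows. I use the "tools" package only because it ships with R, so the example runs without an internet connection; replace it with the package you actually need:

```r
# "tools" ships with R and is used here only as a stand-in package name
pkg <- "tools"

# Install the package only if it is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)   # downloads from CRAN, needs an internet connection
}

# Attach the package so its functions can be used in this session
library(pkg, character.only = TRUE)
```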

Getting Help

R provides extensive documentation. For example, entering ?c or help(c) at the prompt shows the documentation for the function c.

eg:
> help(c)

If you are not sure about the name of the function you are looking for, you can perform a fuzzy search with the apropos function.


eg:
> apropos("can")
output: 
[1] ".rs.scanFiles" "canCoerce"     "cancor"        "scan"          "volcano"
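
The search string passed to apropos is treated as a regular expression, so the search can be narrowed. The sketch below is my own example: it looks for names that start with "read." and then checks that one of them exists. The exact list you get depends on which packages are loaded.

```r
# "^read\\." matches only names that begin with "read."
hits <- apropos("^read\\.")
head(hits)            # show the first few matches

# exists() confirms that a name found this way is really defined
exists("read.csv")
```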

I will be writing about sentiment analysis of Twitter and WhatsApp data in the next post.



Thanks for visiting my blog. I always love to hear constructive feedback. Please give your feedback in the comment section below or write to me personally here.
(I use R-Bloggers for updates on R, consider visiting this blog too!)

Wednesday, February 17, 2016

WEEK_1: Introduction to Data Science

Hi there!

I am Sharath G S. I have started to learn Data Science.
This booming field was introduced to me by the organization I work for.
I want to master Data Science, so I have done a lot of research on it. I will share what I learn here on a weekly basis, and I will try to summarize each week's learning in a single post.

First of all, we need to understand what Data Science is. The very first thing most of us do is just 'Google it'. I did the same.

Here is how Wikipedia defines Data Science.

"Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analysis, similar to Knowledge Discovery in Databases (KDD)."

Data Science often involves using mathematical and algorithmic techniques to solve some of the most analytically complex business problems, leveraging troves of raw information to uncover the hidden insight that lies beneath the surface. It centers around evidence-based analytical rigor and building robust decision capabilities.

Data Science enables companies to operate and strategize more intelligently. That is why Data Science is a booming field.

Here is an image which will summarize the role of Data Science.





Who is a Data Scientist?

"A data scientist is simply someone who is highly adept at studying large amounts of often unorganized/undigested data."

Here is another definition of a Data Scientist.

"A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician."

I found a Data Scientist's learning map. You don't have to worry about this now. This is just for your reference!


Data Science learner's path



You need to be good at statistics to become a good Data Scientist. You can refer to the Probability and Statistics course by Khan Academy. Follow this link to access the course.

We will start with Data Analysis. 


This is how this page defines data analysis.

"Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data."


We can use the R language or Python for this purpose. I would like to go with R.
Let us start with the R language next week. We will be doing text mining and analysis in the next session. And you know what? It is real fun! You will be doing sentiment analysis of your Twitter tweets and WhatsApp chats.

Don't miss it, subscribe to the blog for all updates.

Thanks for visiting my blog. I always love to hear constructive feedback. Please give your feedback in the comment section below or write to me personally here.
(I use R-Bloggers for updates on R, consider visiting this blog too!)