Here is a list of the top fifteen functions for data analysis in Python:
- import (loads a package or library in Python)
- getcwd (from os): get the current working directory
- chdir (from os): change the working directory
- listdir (from os): list the files in the specified directory
- read_csv (from pandas): reads in a CSV file
- objectname.info() (like PROC CONTENTS in SAS or str in R; it describes the object called objectname)
- objectname.columns (like PROC CONTENTS in SAS or names in R; it lists the variable names of the object called objectname)
- objectname.head() (like head in R; it prints the first few rows of the object called objectname)
- objectname.tail() (like tail in R; it prints the last few rows of the object called objectname)
- len (length, i.e. the number of rows when applied to a data frame)
- objectname.loc[rows] (if rows is a list of index labels, this returns those rows of the object called objectname; older code uses objectname.ix[rows], but .ix has been removed from pandas)
- groupby: group by a categorical variable
- crosstab: cross-tabulation of two categorical variables
- describe: exploratory summary statistics for numerical variables
- corr: correlation between numerical variables
In [1]:
import pandas as pd  # importing packages
import os
In [2]:
os.getcwd() #current working directory
Out[2]:
In [3]:
os.chdir('/home/ajay/Downloads') #changes the working directory
In [4]:
os.getcwd()
Out[4]:
In [5]:
a=os.getcwd()
os.listdir(a) #lists all the files in a directory
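The three os calls above can be tried without touching your own folders. The following is a minimal sketch, assuming a throwaway temporary directory and a made-up file name (example.txt); neither comes from the original session.

```python
import os
import tempfile

# Remember where we started so we can return afterwards.
start_dir = os.getcwd()

# Create a throwaway directory holding one file (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as fh:
        fh.write("hello")

    os.chdir(tmp)                    # change the working directory
    files = os.listdir(os.getcwd())  # list the files in the new working directory
    print(files)

    os.chdir(start_dir)              # go back before the directory is deleted
```

Changing back to the starting directory at the end keeps the rest of the session unaffected.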
In [105]:
diamonds=pd.read_csv("diamonds.csv")
# Note: header=0 (the default) takes the first row as the header; otherwise specify header=None
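To see what the header argument does, here is a small sketch that reads the same text twice. The three-line CSV is invented for illustration and is not the real diamonds.csv.

```python
import io

import pandas as pd

# A tiny made-up CSV standing in for a real file.
csv_text = "carat,cut,price\n0.23,Ideal,326\n0.21,Premium,326\n"

# Default (header=0): the first row supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is data and the columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist(), with_header.shape)
print(no_header.columns.tolist(), no_header.shape)
```

With header=None the name row is counted as data, so the second frame has one more row than the first.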
In [106]:
diamonds.info()
In [36]:
diamonds.head()
Out[36]:
In [37]:
diamonds.tail(10)
Out[37]:
In [38]:
diamonds.columns
Out[38]:
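The inspection calls above behave the same on any data frame. A minimal sketch on a made-up twelve-row frame (the values are invented, not taken from diamonds.csv):

```python
import pandas as pd

# Hypothetical data: twelve rows, two numeric columns.
df = pd.DataFrame({
    "carat": [round(0.1 * i, 1) for i in range(1, 13)],
    "price": [100 * i for i in range(1, 13)],
})

df.info()                  # prints column names, dtypes and non-null counts
first = df.head()          # first 5 rows by default
last = df.tail(3)          # last 3 rows
names = list(df.columns)   # column names as a plain list

print(first)
print(last)
print(names)
```

Both head and tail accept the number of rows as an optional argument; without it they return five rows.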
In [92]:
b=len(diamonds) #this is the total population size
print(b)
In [93]:
import numpy as np
In [98]:
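n = max(1, int(0.0001 * b))  # the sample size must be an integer
rows = np.random.choice(diamonds.index.values, n)
print(rows)
sampled_df = diamonds.loc[rows]  # .ix has been removed from pandas; .loc selects by index label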
In [99]:
sampled_df
Out[99]:
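pandas also offers DataFrame.sample, which may be a simpler way to draw random rows. The sketch below uses a made-up ten-row frame and an arbitrary 20% fraction. Note that np.random.choice samples with replacement by default, whereas DataFrame.sample does not, so this is an alternative technique rather than an exact equivalent of the cell above.

```python
import pandas as pd

# Hypothetical data standing in for the diamonds frame.
df = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1],
    "price": [300, 350, 400, 450, 500, 550, 600, 650, 700, 750],
})

# Sample size: 20% of the rows, but at least one row.
n = max(1, int(0.2 * len(df)))

# random_state fixes the seed so the draw is repeatable.
sampled = df.sample(n=n, random_state=0)
print(sampled)
```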
In [108]:
diamonds.describe()
Out[108]:
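By default describe summarises only the numerical columns. A small sketch on invented data shows which statistics come back:

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column.
df = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "cut": ["Ideal", "Ideal", "Premium", "Premium"],
})

summary = df.describe()  # count, mean, std, min, quartiles and max per numeric column
print(summary)
```

The categorical column cut is left out of the result; passing include="all" to describe would add it.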
In [109]:
cut=diamonds.groupby("cut")
In [110]:
cut.count()
Out[110]:
In [114]:
cut.mean(numeric_only=True)  # recent pandas versions raise an error on text columns without numeric_only=True
Out[114]:
In [115]:
cut.median(numeric_only=True)  # recent pandas versions raise an error on text columns without numeric_only=True
Out[115]:
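The count, mean and median calls above can also be collected into one table with agg. This is a sketch on a made-up five-row frame, selecting the numeric price column explicitly so that no text columns are involved.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Premium", "Premium", "Premium"],
    "price": [300, 500, 400, 600, 800],
})

by_cut = df.groupby("cut")  # one group per distinct value of cut

# Several statistics for the price column, one row per group.
summary = by_cut["price"].agg(["count", "mean", "median"])
print(summary)
```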
In [117]:
pd.crosstab(diamonds.cut, diamonds.color)
Out[117]:
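A cross-tab simply counts how often each pair of categories occurs together. A minimal sketch with invented values:

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Premium", "Premium", "Premium"],
    "color": ["E", "F", "E", "E", "F"],
})

# Rows are the values of cut, columns are the values of color, cells are counts.
table = pd.crosstab(df["cut"], df["color"])
print(table)
```

Writing df["cut"] instead of df.cut is a design choice: bracket access still works when a column name clashes with a DataFrame attribute or method.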
In [121]:
diamonds.corr(numeric_only=True)  # text columns such as cut and color cannot be correlated
Out[121]:
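Correlation is only defined for numerical columns. One way to make that explicit, which should work across pandas versions, is to select the numeric columns first. The data below is invented so that carat and price are perfectly correlated.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame({
    "carat": [0.2, 0.4, 0.6, 0.8],
    "price": [200, 400, 600, 800],
    "cut": ["Ideal", "Premium", "Ideal", "Premium"],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```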