Showing posts with label Data Transformation. Show all posts

Tuesday, August 6, 2019

Working With Date Time - In Depth

Python has no built-in date type, but the standard library's datetime module lets us work with dates and times as date and datetime objects.

Example


import datetime
x = datetime.datetime.today()
x #> datetime.datetime(2019, 8, 6, 22, 39, 30, 864393)

The fields appear in the following order: year, month, day, hour, minute, second, microsecond.

Parsing a string to datetime

my_date_time = datetime.datetime.strptime('8/3/19', '%m/%d/%y')
my_date_time #> datetime.datetime(2019, 8, 3, 0, 0)
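
strptime raises a ValueError when the string does not match the format, so it can help to wrap the call. The sketch below is only an illustration: the helper name parse_or_none and the sample strings are assumptions, not part of the original post.

```python
import datetime

def parse_or_none(text, fmt):
    """Return a datetime parsed with strptime, or None if text does not match fmt.

    parse_or_none is a hypothetical helper used only for this example.
    """
    try:
        return datetime.datetime.strptime(text, fmt)
    except ValueError:
        return None

# A format that also carries a time component
stamp = parse_or_none('2019-08-03 14:05:09', '%Y-%m-%d %H:%M:%S')
print(stamp)  # 2019-08-03 14:05:09

# The separators do not match the format, so strptime fails and we get None
bad = parse_or_none('03-08-2019', '%m/%d/%y')
print(bad)  # None
```

Returning None is just one possible design; re-raising with a clearer message may suit stricter pipelines better.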

Parsing most common string formats to datetime (using the third-party python-dateutil package)

from dateutil.parser import parse
parse('94, December 26, 2010, 10:51pm') #> datetime.datetime(1994, 12, 26, 22, 51)

Formatting datetime

my_date_time = datetime.datetime.strptime('8/3/19', '%m/%d/%y')
my_date_time.strftime('%m/%d/%y') #> '08/03/19'
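
A few other strftime directives are worth seeing side by side. This is a small sketch with an assumed sample timestamp; the names produced by %A, %B and %p depend on the active locale, and the comments assume the default English/C locale.

```python
import datetime

dt = datetime.datetime(2019, 8, 3, 22, 39, 30)

iso_date = dt.strftime('%Y-%m-%d')         # four-digit year, zero-padded month and day
clock_12h = dt.strftime('%I:%M %p')        # 12-hour clock with AM/PM marker
long_form = dt.strftime('%A, %B %d, %Y')   # weekday and month names (locale dependent)
iso_full = dt.isoformat()                  # ISO 8601 string, no format string needed

print(iso_date)   # 2019-08-03
print(clock_12h)  # 10:39 PM
print(long_form)  # Saturday, August 03, 2019
print(iso_full)   # 2019-08-03T22:39:30
```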

Adjusting datetime

my_date_time = datetime.datetime.strptime('8/3/19', '%m/%d/%y')
my_date_time - datetime.timedelta(days=2) #> datetime.datetime(2019, 8, 1, 0, 0)

Syntax: datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
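
A timedelta is also what you get back when subtracting one datetime from another. The following sketch uses made-up start and end values to show both directions.

```python
import datetime

start = datetime.datetime(2019, 8, 3, 9, 0)
end = datetime.datetime(2019, 8, 6, 22, 30)

# Subtracting two datetimes gives a timedelta
elapsed = end - start
print(elapsed.days)             # 3
print(elapsed.total_seconds())  # 307800.0

# Adding a timedelta built from several keyword arguments
later = start + datetime.timedelta(weeks=1, hours=12)
print(later)  # 2019-08-10 21:00:00
```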

Useful datetime functions

# create a datetime object
dt = datetime.datetime(2019, 2, 15)

# 1. Get the day of the month
dt.day #> 15

# 2. Get the day of the week (ISO: Monday is 1, Sunday is 7)
dt.isoweekday() #> 5 --> Friday

# 3. Get the month of the year
dt.month  #> 2 --> February

# 4. Get the year
dt.year  #> 2019

Get the last day of a month for any given date

import calendar
import datetime

dt = datetime.date(1952, 2, 12)

# monthrange returns (weekday of the first day, number of days in the month)
calendar.monthrange(dt.year, dt.month)[1] #> 29
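
If a date object is more useful than the day number, the same monthrange call can be wrapped in a small function. The name last_day_of_month is a hypothetical helper introduced for this sketch.

```python
import calendar
import datetime

def last_day_of_month(d):
    """Return a copy of d moved to the last day of its month (hypothetical helper)."""
    last = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last)

print(last_day_of_month(datetime.date(1952, 2, 12)))  # 1952-02-29 (leap year)
print(last_day_of_month(datetime.date(2019, 2, 15)))  # 2019-02-28
```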


Pandas date_range


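import pandas as pd

# freq='M' means month-end; newer pandas releases (2.2+) prefer the alias 'ME'
date1 = pd.Series(pd.date_range('2018-1-1 12:00:00', periods=7, freq='M'))
df = pd.DataFrame(dict(date_given=date1))
df #> 7 rows, from 2018-01-31 12:00:00 to 2018-07-31 12:00:00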
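
Once a column holds datetimes, the .dt accessor exposes the same kind of fields as a plain datetime object. This sketch uses a daily range and column names (day, weekday, shifted) chosen only for illustration.

```python
import pandas as pd

# Seven consecutive days starting on 2018-01-01 (a Monday)
dates = pd.Series(pd.date_range('2018-01-01', periods=7, freq='D'))
df = pd.DataFrame({'date_given': dates})

# Vectorised access to datetime fields through the .dt accessor
df['day'] = df['date_given'].dt.day
df['weekday'] = df['date_given'].dt.day_name()

# Shift every date by two days, similar to datetime.timedelta(days=2)
df['shifted'] = df['date_given'] + pd.Timedelta(days=2)

print(df)
```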



Sunday, August 4, 2019

Get list from pandas DataFrame column headers

The columns attribute of a DataFrame holds its column headers, and it can be converted to a plain Python list.


Create a Dataframe

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name': pd.Series(['Alisa','Bobby','Cathrine','Madonna','Rocky','Sebastian','Jaqluine',
   'Rahul','David','Andrew','Ajay','Teresa']),
   'Age': pd.Series([26,27,25,24,31,27,25,33,42,32,51,47]),
   'Score': pd.Series([89,87,67,55,47,72,76,79,44,92,99,69])}
 
#Create a DataFrame
df = pd.DataFrame(d)
df

The resulting DataFrame has 12 rows and three columns: Name, Age and Score.




Now let's get the column names as a list:


column_names = df.columns.values.tolist()
column_names

The result will be ['Name', 'Age', 'Score'].
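
There are a few equivalent ways to get the same list. The sketch below uses a cut-down two-row frame (an assumption, to keep the example short) and also shows how to list only the numeric columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alisa', 'Bobby'],
                   'Age': [26, 27],
                   'Score': [89, 87]})

names_a = df.columns.values.tolist()  # via the underlying NumPy array
names_b = df.columns.tolist()         # directly from the Index
names_c = list(df)                    # iterating a DataFrame yields its column labels

print(names_a == names_b == names_c)  # True

# Only the columns with a numeric dtype
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['Age', 'Score']
```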


Monday, July 22, 2019

How to split a list inside a Dataframe cell into rows in Pandas

       
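import pandas as pd

temp = {'name' : ['Edmond', 'Alex'], 'cat' : [['Horror', 'Vengeance', 'Justice'], ['Romance', 'Sacrifice']]}
df = pd.DataFrame(temp)

# Step 1: expand each list into its own columns (0, 1, 2); shorter lists are padded with NaN
df['cat'].apply(pd.Series)

# Step 2: join the expanded columns back to the original frame on the index
df['cat'].apply(pd.Series) \
.merge(df, left_index = True, right_index = True)

# Step 3: drop the original list column
df['cat'].apply(pd.Series) \
.merge(df, left_index = True, right_index = True) \
.drop(['cat'], axis = 1)

# Step 4: melt the numbered columns into rows, then drop the helper column and the NaN padding
df['cat'].apply(pd.Series) \
.merge(df, left_index = True, right_index = True) \
.drop(['cat'], axis = 1) \
.melt(id_vars = ['name'], value_name = "cat") \
.drop(['variable'], axis= 1) \
.dropna()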
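
As an alternative to the apply/merge/melt chain, pandas 0.25 and later provide DataFrame.explode, which does the list-to-rows step in one call. This is a different technique from the one shown above, sketched here on the same kind of data; the variable name long_df is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Edmond', 'Alex'],
    'cat': [['Horror', 'Vengeance', 'Justice'], ['Romance', 'Sacrifice']],
})

# explode repeats the other columns once per list element;
# reset_index gives the result a fresh 0..n-1 index
long_df = df.explode('cat').reset_index(drop=True)

print(long_df)
```

Unlike the melt-based version, explode keeps each name's rows together and produces no NaN padding to clean up.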