For Data Science enthusiasts, greenhorns, Business Intelligence Analysts, Data Analysts, and Machine Learning and Artificial Intelligence Engineers.
Inspired by @TunjiOgunoye's #DearDesigner.
Your job is to solve problems, regardless of the tool you use. /2
EDA is a crucial stage. It's like dating: you need to ask the crucial questions before getting into bed with your data.
If you don't ask the right, relevant questions, there will be problems.
EDA: Exploratory Data Analysis. /3
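One way to "ask the crucial questions" of a dataset before working with it is sketched below. The visitor records, the field names and the `first_questions` helper are all made up for illustration, and only Python's standard library is used; a real analysis would ask many more questions than these.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample records; None marks a missing value.
records = [
    {"name": "Ada", "age": 29, "city": "Lagos"},
    {"name": "Bola", "age": None, "city": "Abuja"},
    {"name": "Chidi", "age": 41, "city": "Lagos"},
    {"name": "Dayo", "age": 35, "city": None},
]


def first_questions(rows):
    """Answer a few basic EDA questions about a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    # Question 1: how many values are missing in each column?
    missing = {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }
    # Question 2: what do the numeric values in "age" look like?
    ages = [row["age"] for row in rows if row.get("age") is not None]
    # Question 3: which category dominates "city"?
    cities = Counter(row["city"] for row in rows if row.get("city") is not None)
    return {
        "rows": len(rows),
        "columns": columns,
        "missing": missing,
        "age_mean": mean(ages),
        "age_median": median(ages),
        "top_city": cities.most_common(1)[0][0],
    }


summary = first_questions(records)
print(summary)
```

The point is the order of work: check size, gaps and basic distributions first, then decide what the data can and cannot answer.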
This will guide you through the entire analysis process. /4
Your Domain Knowledge is your competitive edge. This will show in how you approach the problems that you will encounter regularly.
Domain Knowledge: expertise and experience in a particular industry. /5
You need an open mind because the possibilities are endless in this industry. /6
Data Science is a journey; you need to develop relationships vertically and horizontally.
Industry discussions will spark ideas in your mind. /7
It's the fastest and easiest way to meet people in your field. /8
Constantly engage in online conversations under different hashtags such as #Rstats, #pydata and #DaxFridays.
You will be updated about the industry and constantly inspired. /9
Follow industry leaders such as @BecomingDataSci, @mdancho84 and @kareem_carr.
They share a lot of wisdom and guidance that you might need. /10
These things can be learnt. /11
The term Data can be a little overwhelming and can mean different things, such as Primary Data, Secondary Data, Internal Data, External Data, Big Data, Alternative Data, etc. /12
Primary Data: data you collected yourself, e.g. from an interview. You created the questions, served them and collected the answers. /13
Internal and External Data Sources.
Internal Data: data you own, create and control, e.g. an organization's visitor record.
They tell you what to fill in: name, phone number, etc. They own it, they control it. /15
Can data also be both external and primary or secondary?
Yes, definitely. /17
Big Data: massive volumes of structured or unstructured data that are too complex to be processed by traditional data systems (relational databases and data warehouses). /18
Tonight, let's break down the Vs of Big Data.
Velocity: the speed at which vast amounts of data are generated and/or collected. /20
A very simple example is the Twitter trends table: you can see what is trending and the number of tweets. /22
"Social Media is the number one creator of Data Globally". - IBM
The amount of data has become so large that it is impossible to store and analyze it using traditional databases. /23
There is a day-and-night difference between data and insights.
The process of extracting insights is called Analysis. /25
We no longer have only structured (organized) data, such as names, phone numbers and addresses, that fit well into a table.
80% of today's data is unstructured. /26
How accurate (reliable) is your data?
E.g. data collected from tweets.
Tweets come with hashtags, abbreviations and typos, and the reliability and accuracy of their content is uncertain. /27
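As a rough illustration of why tweet data is messy, the sketch below separates hashtags from the body and normalizes whitespace and case. The sample tweet and the `clean_tweet` helper are hypothetical, and the regular expressions are deliberately simple. Note that this kind of cleaning only tidies the form: abbreviations such as "luv" and "thx" survive, and nothing here can tell you whether the content is accurate.

```python
import re


def clean_tweet(text):
    """Split a raw tweet into its hashtags and a normalized body."""
    # Collect hashtags before stripping them from the body.
    hashtags = [tag.lower() for tag in re.findall(r"#(\w+)", text)]
    body = re.sub(r"https?://\S+", "", text)  # drop links
    body = re.sub(r"[@#]\w+", "", body)       # drop mentions and hashtags
    body = re.sub(r"\s+", " ", body).strip().lower()  # tidy whitespace and case
    return {"hashtags": hashtags, "body": body}


raw = "Luv this   #Rstats tutorial!! thx @someone https://example.com #PyData"
cleaned = clean_tweet(raw)
print(cleaned)
```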
Let us continue our discussion. Today we will be talking about Big Data Technology, Data Science and Data Analytics.
These three are different, as we will find out.
Let's go. /28
It combines programming, problem-solving and capturing data in ingenious ways, e.g. collecting data from Twitter, Facebook, Instagram, etc. /30
It applies an algorithmic or mechanical process to derive insights, e.g. combing through several data sets to find correlations between them. /31
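A minimal sketch of "finding correlations" between two data series follows. It computes the Pearson correlation coefficient by hand so that each step is visible; the `ad_spend` and `sales` figures are invented for illustration. Keep in mind that a high correlation shows that two series move together, not that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures, for illustration only.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
r = pearson(ad_spend, sales)
print(round(r, 3))
```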
1. Search engines such as Google and Bing make use of data science algorithms to show the best results for search queries in a fraction of a second. /32
The digital marketing industry relies heavily on algorithms.
Facebook Algorithm
Twitter Algorithm
LinkedIn Algorithm
Instagram Algorithm etc.
Display banners, Google AdWords, etc. /33
Netflix and Amazon.
This system makes it easy to find relevant products and also improves the user experience.
These companies use this system to promote their products and make suggestions according to the user's demands and relevance. /34
This makes it easy to locate and access specific values within the database. /36
Structured Query Language. /37
Initially designed by two IBM engineers, Donald Chamberlin and Raymond Boyce. /38
NoSQL: Non-SQL (non-relational) or Not only SQL. /39
1. Data Mining
Knowledge of SQL will allow you to mine data with greater efficiency.
With basic queries you can identify specific data at time intervals, view update events and monitor table activity. /40
It is very easy to manipulate data with SQL because simple queries let you see the exact data and how it works. /41
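To make "basic queries" concrete, here is a small sketch that runs SQL from Python using the standard-library `sqlite3` module and an in-memory database. The `events` table, its columns and the `updates_between` helper are assumptions made up for this example; the query picks out update events that fall within a time interval.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, happened_at TEXT)"
)
conn.executemany(
    "INSERT INTO events (kind, happened_at) VALUES (?, ?)",
    [
        ("insert", "2020-01-01 09:00:00"),
        ("update", "2020-01-01 12:30:00"),
        ("update", "2020-01-02 08:15:00"),
        ("delete", "2020-01-03 17:45:00"),
    ],
)


def updates_between(connection, start, end):
    """Return update events with start <= happened_at < end."""
    # ISO-style timestamps stored as text sort correctly as plain strings.
    cursor = connection.execute(
        "SELECT id, happened_at FROM events "
        "WHERE kind = 'update' AND happened_at >= ? AND happened_at < ? "
        "ORDER BY happened_at",
        (start, end),
    )
    return cursor.fetchall()


recent = updates_between(conn, "2020-01-01 00:00:00", "2020-01-02 00:00:00")
print(recent)
```

The `?` placeholders pass the interval bounds as parameters instead of pasting them into the SQL string, which is the safer habit once the values come from users.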
SQL can manage data pools of virtually all sizes. Today, several large multinationals use SQL in their database products: Microsoft SQL Server, IBM Db2, Oracle Database, etc. /42
NoSQL databases are non-relational; they do not require a fixed schema and are easy to scale.
They encompass a wide range of database technologies that house structured, semi-structured and unstructured data. /44
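What "no fixed schema" means can be sketched with plain Python dictionaries standing in for a document store. This is only an analogy, not the API of any real NoSQL database; the documents and the `find` helper are hypothetical. Each record carries its own fields, and no table definition has to change when a new kind of field appears.

```python
import json

# Each document has its own shape; there is no shared column list.
documents = [
    {"_id": 1, "name": "Ada", "phone": "0800-000-0001"},
    {"_id": 2, "name": "Bola", "tweets": ["Learning #Rstats"], "followers": 120},
    {"_id": 3, "name": "Chidi", "address": {"city": "Lagos"}},
]


def find(docs, field):
    """Return the documents that contain the given top-level field."""
    return [doc for doc in docs if field in doc]


# Documents are typically stored as JSON-like text and read back unchanged.
stored = [json.dumps(doc) for doc in documents]
loaded = [json.loads(text) for text in stored]

print([doc["name"] for doc in find(loaded, "tweets")])
```

In a relational table, the nested `address` and the `tweets` list would each need their own columns or tables up front; here they simply travel with the document.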
An RDBMS would be too slow for the massive volumes of data that these tech giants handle per day. /45
Your number one priority should not be to become a Generalist or a Specialist, but to become better at what you do. /47
Learn the application of these theories. Why? Data Science is all about practical application.
These theories are to help you solve problems. /49
When you understand the principles, application will be easier. /51
Tonight, I will be explaining different industry roles: Business Intelligence Analyst, Data Analyst, Data Scientist, Machine Learning Engineer and Data Engineer.
Shall we? /57
E.g. an in-house analyst who works for Shoprite collects data from the cash registers and transforms it into insights... /58
These insights will help the management make better business decisions. /59
Tools: SQL, Excel, Power BI/Tableau.
NB: Some BI analysts use R/Python in some cases. /60
Some analysts earn up to $140,000.00 per year.
Tools: R, Python, SQL, Power BI/Tableau. /62
Big Data wranglers: they take large chunks of structured or unstructured data and, using their knowledge of Math, Statistics and Programming, clean up and organize this data until it becomes easier to understand and to make useful business predictions from. /63
Salary Range: $66,000 - $140,000.00
Tools: R, Python, SQL, Power BI, Azure, Hive, etc. /64
ETL: Extract, Transform and Load. This process is used by Data Analysts, Data Scientists, Data Engineers and Machine Learning Engineers. /66
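The three ETL steps can be sketched end to end with Python's standard library. Everything specific here is an assumption for illustration: the cash-register CSV, the `sales` table, and the decision to drop rows with a missing quantity. Production pipelines use dedicated tooling, but the shape is the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the third data row has a missing quantity.
RAW_CSV = """product,quantity,unit_price
bread,2,350
milk,1,500
bread,,350
rice,3,1200
"""


def extract(text):
    """Extract: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, fix types, compute line totals."""
    cleaned = []
    for row in rows:
        if not row["quantity"]:
            continue  # skip rows with a missing quantity
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        cleaned.append((row["product"], quantity, quantity * unit_price))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into a database table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(product TEXT, quantity INTEGER, total REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

totals = conn.execute(
    "SELECT product, SUM(total) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions makes each one easy to test and to swap out, for example replacing the CSV text with an API call or the in-memory database with a warehouse.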
Average pay according to Glassdoor: $114,000.00 in the US, excluding other benefits. /68
Extract, Transform and Load.
They build software for Data Scientists, Analysts and BI. /69
Average Salary in the US: $102,300.00