Iyanuoluwa (@Dejifan), 71 tweets, 14 min read

Tonight, I am starting a series of tweets tagged #DearAnalyst

It is for Data Science enthusiasts, greenhorns, Business Intelligence Analysts, Data Analysts, and Machine Learning and Artificial Intelligence Engineers.

Inspired by @TunjiOgunoye's #DearDesigner.

#DearAnalyst

The main idea of data analysis is extracting information that cannot be easily inferred.

This quality information is known as INSIGHT, so it is not about the tool: #Rstats, #pydata, #PowerBi, #SQL, #Tableau, #Julia. They are just a means to an end. /1

When you understand this, you will know that no tool is better than the other.

Your job is to solve problems, regardless of the tool you use. /2

#DearAnalyst

EDA is a very crucial stage. It's like dating: you need to ask the crucial questions before getting into bed with your data.

If you don't ask the right/relevant questions there will be problems.

EDA: Exploratory Data Analysis. /3

EDA will help you discover patterns and identify anomalies.

This will guide you through the entire analysis process. /4
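To make the "ask questions first" idea concrete, here is a minimal EDA sketch using only the Python standard library. The sales records, the `explore` helper and the 5-MAD anomaly threshold are all made up for illustration; in practice the questions you ask depend on your data.

```python
import statistics

# Hypothetical sample: daily sales with one missing value and one odd spike.
records = [
    {"day": "Mon", "sales": 120},
    {"day": "Tue", "sales": 135},
    {"day": "Wed", "sales": None},  # missing value
    {"day": "Thu", "sales": 128},
    {"day": "Fri", "sales": 900},   # suspiciously large
    {"day": "Sat", "sales": 140},
    {"day": "Sun", "sales": 131},
]


def explore(rows, column):
    """Answer three basic EDA questions about one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    missing = len(rows) - len(values)
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = statistics.median([abs(v - median) for v in values])
    # Flag values far from the median (5 MADs is an arbitrary choice).
    anomalies = [v for v in values if mad and abs(v - median) > 5 * mad]
    return {"missing": missing, "median": median, "anomalies": anomalies}


summary = explore(records, "sales")
print(summary)
```

The median is used instead of the mean because a single spike like 900 would drag the mean upward and hide itself.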

#DearAnalyst

Your Domain Knowledge is your competitive edge. This will show in how you approach the problems that you will encounter regularly.

Domain Knowledge: expertise and experience in a particular industry. /5

The most crucial "tool" is your mind.

You need an open mind because the possibilities are endless in this industry. /6

#DearAnalyst

Data Science is a journey. You need to develop relationships vertically and horizontally.

Industry discussions will spark ideas in your mind. /7

Attend meet-ups, events, conferences around Data Science.

It's the fastest and easiest way to meet people in your field. /8

#DearAnalyst

Constantly engage in online conversations under hashtags such as #Rstats, #pydata and #DaxFridays.

You will be updated about the industry and constantly inspired. /9

Bookmark blogs and Medium accounts.

Follow industry leaders such as @BecomingDataSci, @mdancho84 and @kareem_carr.

They share a lot of wisdom and guidance that you might need. /10

Being a "numbers person" is not enough; read about statistics.

These things can be learnt. /11

#DearAnalyst

The term Data can be a little overwhelming because it can mean different things: Primary Data, Secondary Data, Internal Data, External Data, Big Data, Alternative Data etc. /12

Let me clarify.

Primary Data: the data you collected yourself, e.g. from an interview. You created the questions, asked them and collected the answers. /13

Secondary Data: data collected by another person or by an app run by another person. For example, your website data collected by Google Analytics or any other back-end tracking software. /14

Let's go further.

Internal and External Data Sources.

Internal Data: data you own, create and control, e.g. an organization's visitor records.

The organization tells you what to fill in: name, phone number, etc. They own it, they control it. /15

External Data: data from outside sources, such as the publicly available data on Data.gov, electoral statistics and tax records. /16

The next question is:
Can data also be both external and primary or secondary?
Yes, definitely. /17

Big data

Massive volumes of structured or unstructured data that are too large or complex to be processed by traditional data systems (relational databases and data warehouses). /18

The 5 Vs of Big Data: Volume, Velocity, Variety, Veracity and Value. /19

#DearAnalyst

Tonight let's break down the Vs of Big Data.

Velocity: the speed at which vast amounts of data are generated and/or collected. /20

A good example is Twitter: an average of 6,000 tweets per second, or about 500 million per day. That's massive data that no traditional system can process. /21
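A quick sanity check of those two Twitter figures, as a small arithmetic sketch. The per-second rate is the one quoted in the thread; the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

tweets_per_second = 6_000  # average rate quoted in the thread
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

print(f"{tweets_per_day:,} tweets per day")  # 518,400,000 tweets per day
```

So 6,000 tweets per second works out to roughly half a billion per day, which lines up with the rounded 500 million figure.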

Big data technology allows us to analyze data in real time.

A very simple example is the Twitter trends table, where you can see what is trending and the number of tweets. /22

Volume: incredible amounts of data generated per second.

"Social Media is the number one creator of Data Globally". - IBM

The amounts of data have become so large that it is impractical to store and analyze them using a traditional database. /23

Imagine trying to download a year of Twitter data (tweets) onto your local hard drive. /24

Value: the worth of the data extracted. Unless data can be turned into value, it is useless.

There is a day-and-night difference between data and insights.

The process of extracting insights is called Analysis. /25

Variety: the different types of data we now use, e.g. text, pictures, GIFs, videos, URLs etc.

We no longer just have structured (organized) data such as name, phone number and address that fit well into a table.

By common estimates, about 80% of today's data is unstructured. /26

Veracity: the quality or trustworthiness of the data.

How accurate (reliable) is your data?
E.g. data collected from tweets.

Tweets come with hashtags, abbreviations and typos, and the reliability and accuracy of their content varies. /27

#DearAnalyst

Let us continue our discussion. Today we will be talking about Big Data Technology, Data Science and Data Analytics.

These three are different, as we will find out.

Let's go. /28

Big Data Technology: software designed to analyse, process and extract information from large, multi-dimensional data sets that traditional data processing software can't handle. /29

Data Science: the field that deals with all things data: cleansing, preparation, and analysis.

It combines programming, problem-solving and capturing data in ingenious ways, e.g. collecting data from Twitter, Facebook, Instagram etc. /30

Data Analytics is about the extraction of insight from raw data.

It applies an algorithmic or mechanical process to derive insights, e.g. combing through several data sets to find correlations between them. /31
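As one example of such a mechanical process, here is a small sketch that computes the Pearson correlation coefficient between two columns. The `pearson` helper and the ad spend and sales numbers are hypothetical, chosen so the relationship is easy to see.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, then the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: weekly ad spend, sales, and returned items.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]   # rises in step with ad spend
returns = [9, 7, 6, 4, 2]       # falls as ad spend rises

r_sales = pearson(ad_spend, sales)
r_returns = pearson(ad_spend, returns)
print(round(r_sales, 3), round(r_returns, 3))
```

A value near +1 or -1 signals a strong linear relationship; it does not by itself say that one thing causes the other.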

Applications of Data Science

1. Search engines such as Google and Bing use data science algorithms to show the best results for search queries in a fraction of a second. /32

2. Digital Advertising

The digital marketing industry relies heavily on algorithms.
Facebook Algorithm
Twitter Algorithm
LinkedIn Algorithm
Instagram Algorithm etc.

Display banners, Google AdWords etc. /33

3. Recommender Systems

Netflix and Amazon.
These systems make it easy to find relevant products and improve the user experience.

These companies use them to promote their products and make suggestions according to the user's demands and relevance. /34
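To give a feel for the recommender idea, here is a toy "bought together" sketch. It is not how Netflix or Amazon actually do it; the baskets and the `build_cooccurrence` and `recommend` helpers are invented for illustration, and real systems are far more sophisticated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"mouse", "mouse pad"},
    {"laptop", "laptop bag"},
]


def build_cooccurrence(all_baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in all_baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item, counts, top_n=2):
    """Suggest the items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


cooccurrence = build_cooccurrence(baskets)
print(recommend("laptop", cooccurrence))
print(recommend("mouse pad", cooccurrence))
```

The suggestion for a product is simply whatever other customers most often bought alongside it.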

#DearAnalyst

Today's topic: RDBMS.

RDBMS: Relational Database Management System. /35

An RDBMS is a database system that stores data in a structured format: tables of rows and columns.

This makes it easy to locate and access specific values within the database. /36

SQL is the language used to work with an RDBMS.

SQL: Structured Query Language. /37

SQL is a 'special-purpose' programming language that’s used to interact with databases.

It was initially designed by two IBM engineers, Donald Chamberlin and Raymond Boyce. /38
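Here is a minimal sketch of rows, columns and a SQL query, using Python's built-in `sqlite3` module with an in-memory database. The `visitors` table and its contents are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Columns are declared up front; every row follows the same structure.
conn.execute(
    "CREATE TABLE visitors (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO visitors (name, phone) VALUES (?, ?)",
    [("Ada", "0801-000-0001"), ("Bola", "0801-000-0002")],
)

# Locate one specific value: Bola's phone number.
row = conn.execute(
    "SELECT phone FROM visitors WHERE name = ?", ("Bola",)
).fetchone()
phone = row[0]
print(phone)  # 0801-000-0002

conn.close()
```

The `?` placeholders let the database driver handle quoting, which is safer than building the query string by hand.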

NoSQL is a non-relational database. In its simplest form it stores and accesses data using key-value pairs: instead of storing data in rows and columns like a traditional database, a NoSQL DBMS stores each item individually with a unique key.

NoSQL: "non-SQL" or "Not only SQL". /39
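The key-value idea can be sketched with a plain Python dictionary standing in for the database. This is only an illustration of the data model, not a real NoSQL engine; the `put` and `get` helpers and the `user:1` style keys are hypothetical.

```python
import json

# In-memory stand-in for a key-value store: unique key -> serialized item.
store = {}


def put(key, value):
    """Store one item under a unique key."""
    store[key] = json.dumps(value)


def get(key):
    """Fetch one item by key, or None if the key does not exist."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# Items do not have to share the same fields: there is no fixed schema.
put("user:1", {"name": "Ada", "phone": "0801-000-0001"})
put("user:2", {"name": "Bola", "interests": ["rstats", "pydata"]})

print(get("user:2")["interests"])  # ['rstats', 'pydata']
```

Notice that the two users carry different fields, something a table with fixed columns would not allow without changing its structure.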

Uses of SQL

1. Data Mining:
Knowledge of SQL allows you to mine data with greater efficiency.

Basic queries can identify specific data at time intervals, view update events and monitor table activity. /40

2. Data Manipulation:

SQL makes it easy to manipulate data because simple queries let you see the exact data and how it changes. /41

3. Managing Large Data:

SQL databases can manage data pools of virtually all sizes. Today many large multinationals use SQL products such as Microsoft SQL Server, IBM Db2 and Oracle Database. /42
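The first two uses can be sketched with `sqlite3` again: one query that pulls data for a time interval, and one that changes data. The `orders` table, its dates and its statuses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, placed_at TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (placed_at, amount, status) VALUES (?, ?, ?)",
    [
        ("2024-01-05 09:15:00", 40.0, "pending"),
        ("2024-01-05 17:40:00", 25.5, "pending"),
        ("2024-01-06 11:05:00", 60.0, "pending"),
    ],
)

# Data mining: total sales per day within a time interval.
daily_totals = conn.execute(
    "SELECT date(placed_at) AS day, SUM(amount) FROM orders "
    "WHERE placed_at >= ? AND placed_at < ? "
    "GROUP BY day ORDER BY day",
    ("2024-01-05", "2024-01-07"),
).fetchall()
print(daily_totals)

# Data manipulation: mark every order from 5 January as shipped.
cursor = conn.execute(
    "UPDATE orders SET status = 'shipped' WHERE date(placed_at) = ?",
    ("2024-01-05",),
)
rows_changed = cursor.rowcount
print(rows_changed)

conn.close()
```

Storing timestamps as ISO-style text (year first) is what makes the plain `>=` and `<` comparisons behave like date comparisons here.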

#DearAnalyst

Today let's discuss NoSQL databases. /43

What is NoSQL?
NoSQL databases are non-relational; they do not require a fixed schema and are easy to scale.

The term encompasses a wide range of database technologies that house structured, semi-structured and unstructured data. /44

They are mostly used by Internet giants like Google, Facebook, Amazon, etc., who deal with huge volumes of data.

A traditional RDBMS can be too slow for the massive volumes of data that these tech giants handle per day. /45

Types of NoSQL databases: MongoDB, Cassandra etc. /46

#DearAnalyst

Your number one priority should not be to become a generalist or a specialist, but to become better at what you do. /47

The industry you play in is constantly changing; getting better at what you do is not optional. /48

Data Science has a lot of theories; don't spend too much time on them.

Learn the application of these theories. Why? Data Science is all about practical application.

These theories are there to help you solve problems. /49

Once you start applying what you are learning to solve real-world problems, your learning will be faster and easier. /50

The bulk of your time should be spent understanding underlying principles, not coding.

When you understand the principles, application will be easier. /51

#DearAnalyst

Collection, cleaning, visualization and reporting are the stages of the Data Science process. /53

Cleaning data often takes about 80% of the time. Get comfortable with cleaning data. /54
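A small sketch of what cleaning usually involves: trimming whitespace, fixing inconsistent casing and formats, handling missing values and dropping duplicates. The visitor rows and the `clean` helper are hypothetical.

```python
# Hypothetical raw visitor records, as they might come out of a sign-in form.
raw_rows = [
    {"name": "  Ada ", "phone": "0801 000 0001"},
    {"name": "ada", "phone": "0801-000-0001"},   # same person, typed differently
    {"name": "Bola", "phone": ""},               # missing phone number
    {"name": "Chidi", "phone": "0801.000.0003"},
]


def clean(rows):
    """Normalize names and phone numbers, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        # Keep digits only, so different separators compare as equal.
        digits = "".join(ch for ch in row["phone"] if ch.isdigit())
        phone = digits or None  # empty string becomes an explicit missing value
        key = (name, phone)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"name": name, "phone": phone})
    return cleaned


clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```

Four raw rows become three clean ones, because the two "Ada" entries turn out to be the same record once they are normalized.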

#DearAnalyst

Bad data is better than no data. You need to use numbers to back your advice. /55

Data has a lot to say; all you need to do is interrogate it and it will confess. /56

#DearAnalyst

Tonight, I will be explaining different industry roles: Business Intelligence Analyst, Data Analyst, Data Scientist, Machine Learning Engineer and Data Engineer.

Shall we? /57

Business Intelligence (BI) Analysts: These people transform data (primary data) into insights that drive business value.

For example, an in-house analyst who works for Shoprite collects data from the cash register and transforms it into insights... /58

...that show the fastest-moving goods and the peak hours of sales.

These insights help management make better business decisions. /59
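The cash register example can be sketched in a few lines. The transactions below are invented; a real till export would have many more fields and rows.

```python
from collections import Counter

# Hypothetical cash-register log: time of sale, item, and quantity sold.
transactions = [
    {"time": "09:12", "item": "bread", "qty": 2},
    {"time": "09:47", "item": "milk", "qty": 1},
    {"time": "13:05", "item": "bread", "qty": 3},
    {"time": "13:30", "item": "rice", "qty": 1},
    {"time": "13:55", "item": "milk", "qty": 2},
    {"time": "18:20", "item": "bread", "qty": 1},
]

units_by_item = Counter()
sales_by_hour = Counter()
for sale in transactions:
    units_by_item[sale["item"]] += sale["qty"]
    hour = int(sale["time"].split(":")[0])
    sales_by_hour[hour] += 1  # count transactions, not units

fastest_item, fastest_units = units_by_item.most_common(1)[0]
peak_hour, peak_sales = sales_by_hour.most_common(1)[0]

print(f"Fastest-moving item: {fastest_item} ({fastest_units} units)")
print(f"Peak hour: {peak_hour}:00 ({peak_sales} sales)")
```

With this sample, bread is the fastest-moving item and 13:00 is the peak hour, which is exactly the kind of insight management can act on.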

According to PayScale, the average salary for a BI analyst is $66,645 per year, with a reported salary range of $48,701 to $93,243.

Tools: SQL, Excel, Power BI/Tableau.

NB: Some BI analysts use R/Python in certain cases. /60

Data Analysts: They collect data from primary and secondary sources, reorganize it into a format that can be easily read by either humans or machines, then analyze it and present actionable insights for business decisions. /61

According to PayScale, the average earnings of a Data Analyst are $68,000 per year, excluding bonuses, equity sharing, profit sharing etc.

Some analysts earn up to $140,000 per year.

Tools: R, Python, SQL, Power BI/Tableau. /62

Data Scientists

Big Data wranglers: they take large chunks of structured or unstructured data and, using their knowledge of math, statistics and programming, clean up and organize the data until it is easier to understand and to make useful business predictions from. /63

Average annual earnings of a Data Scientist are $96,000, excluding bonuses and profit sharing.

Salary range: $66,000 - $140,000

Tools: R, Python, SQL, Power BI, Azure, Hive etc. /64

The major difference between a data analyst and a data scientist is the ability to apply machine learning methods at a professional level. /65

Let me mention something important here.

ETL: Extract, Transform and Load. This process is used by Data Analysts, Data Scientists, Data Engineers and Machine Learning Engineers. /66
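A tiny ETL sketch, assuming a CSV export as the source and an in-memory SQLite table as the destination. The `RAW_CSV` contents, the `payments` table and the three function names are hypothetical; production pipelines add scheduling, logging and error handling on top of this shape.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with stray spaces and one unusable amount.
RAW_CSV = """name,amount
Ada, 40.0
Bola,not available
Chidi,25.5
"""


def extract(text):
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert amounts, skip rows that cannot be parsed."""
    result = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # e.g. "not available"
        result.append((row["name"].strip(), amount))
    return result


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM payments"
).fetchone()
print(row_count, total)  # 2 65.5
conn.close()
```

Keeping the three steps as separate functions makes it easy to swap the source or the destination without touching the rest.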

Machine Learning Engineers: These people are sophisticated software programmers. Their job is to build a product (software) that can solve a problem or automate a process. /67

Good examples of what they work on are self-driving cars and computer vision (where a computer can tell a dog 🐶 from a wolf 🐺).

According to Glassdoor, average pay in the US is $114,000, excluding other benefits. /68

Data Engineers: These professionals are typically in charge of managing data workflows, pipelines, and ETL (Extract, Transform and Load) processes.

They build software for Data Scientists, Data Analysts and BI Analysts. /69

Skillset: SQL, Python, AWS etc.

Average salary in the US: $102,300
