Preparing for a #MachineLearning or #DataScience interview?

One retweet — one technical question.

Categories: SQL, coding (Python) and algorithms

Let’s start!

#100DaysOfMLCode #100DaysOfPythonCode
= SQL =

Suppose we have the following schema:
* Ads(ad_id, campaign_id, status)
* Events(event_id, ad_id, source, event_type, date, hour)

status: active, inactive
event_type: impression (ad is shown), click (ad is clicked), conversion (app is installed)
Write a query to return:

- The number of active ads
- All active campaigns. A campaign is active if there’s at least one active ad
- The number of active campaigns
- The number of events per each ad broken down by event type
- The number of events over the last week per each active ad broken down by event type and date (most recent first)
- The number of events per campaign by event type
- The number of events over the last week per each campaign broken down by date (most recent first)
- CTR (click-through rate) for each ad. CTR = num clicks / num impressions
- CVR (conversion rate) for each ad. CVR = num installs / num clicks
- CTR and CVR for each ad broken down by day and hour (most recent first)
- CTR for each ad broken down by source and day
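A minimal sketch of three of these queries, run from Python against an in-memory SQLite database. The sample rows are made up for illustration, the campaign column is assumed to be spelled `campaign_id`, and CTR and CVR use the usual definitions (clicks per impression, conversions per click). Other SQL dialects may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ads (ad_id INTEGER PRIMARY KEY, campaign_id INTEGER, status TEXT);
CREATE TABLE Events (event_id INTEGER PRIMARY KEY, ad_id INTEGER, source TEXT,
                     event_type TEXT, date TEXT, hour INTEGER);
""")

# Hypothetical sample data, not from the original thread.
conn.executemany(
    "INSERT INTO Ads VALUES (?, ?, ?)",
    [(1, 10, "active"), (2, 10, "inactive"), (3, 20, "active"), (4, 30, "inactive")],
)
events = (
    [(1, "web", "impression", "2024-01-01", 9)] * 4
    + [(1, "web", "click", "2024-01-01", 9)] * 2
    + [(1, "web", "conversion", "2024-01-01", 10)]
    + [(3, "app", "impression", "2024-01-02", 12)] * 2
)
conn.executemany(
    "INSERT INTO Events (ad_id, source, event_type, date, hour) VALUES (?, ?, ?, ?, ?)",
    events,
)

# The number of active ads.
active_ads = conn.execute(
    "SELECT COUNT(*) FROM Ads WHERE status = 'active'"
).fetchone()[0]

# All active campaigns: campaigns with at least one active ad.
active_campaigns = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT campaign_id FROM Ads WHERE status = 'active' ORDER BY campaign_id"
    )
]

# CTR and CVR per ad via conditional aggregation.
# NULLIF turns a zero denominator into NULL instead of a division error.
rates = conn.execute("""
    SELECT ad_id,
           1.0 * SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END)
               / NULLIF(SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END), 0) AS ctr,
           1.0 * SUM(CASE WHEN event_type = 'conversion' THEN 1 ELSE 0 END)
               / NULLIF(SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END), 0) AS cvr
    FROM Events
    GROUP BY ad_id
    ORDER BY ad_id
""").fetchall()
```

The breakdowns by day, hour, or source follow the same pattern with extra columns in `GROUP BY` and an `ORDER BY date DESC` for "most recent first".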
= Coding (Python) =

Simple questions to check if a candidate can implement an idea in Python and knows the basics: loops, strings, basic data structures (lists, sets, dictionaries)
- FizzBuzz: Print numbers from 1 to 100
If it’s a multiple of 3, print “fizz”
If it’s a multiple of 5, print “buzz”
If both 3 and 5 — “fizzbuzz”
Otherwise, print the number itself
- Calculate a factorial of a number
E.g. 10! = 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 = 3628800
- Compute the mean of the numbers in a list. For an empty list, return NaN (float('nan'))
- Find the minimum and the maximum in a list
- Calculate the standard deviation of elements in a list. If a list is empty or contains one element, return NaN
- Calculate the RMSE (root mean squared error) of a model. You have two lists: one with actual values, one with predictions
- Remove duplicates from a list of elements. The list is not sorted. The original order should be preserved

[1, 2, 3, 1] ⇒ [1, 2, 3]
[1, 3, 2, 1, 5, 3, 5, 1, 4] ⇒ [1, 3, 2, 5, 4]
- Count how many times each element of a list occurs. Order of output doesn’t matter
- Reverse a string:

“reverse” ⇒ “esrever”
- Reverse an unsigned integer:

0 ⇒ 0
123 ⇒ 321
- Is a string a palindrome? A palindrome is a word that reads the same backward as forward

“ololo” ⇒ Yes
“cafe” ⇒ No
- Is a number a palindrome?

12321 ⇒ Yes
123 ⇒ No
- We have a list with identifiers of form “id-SITE”. Calculate how many ids we have per site
- We have a list with identifiers of form “id-SITE”. Show the top 3 sites. (You can break ties in any way you want)
- Implement RLE (run-length encoding): encode each character by the number of times it appears consecutively

"aaaabbbcca" ⇒ [('a', 4), ('b', 3), ('c', 2), ('a', 1)]

(note that there are two groups of a)
- Calculate Jaccard similarity between two sets. It’s the size of the intersection divided by the size of the union

jaccard({'a', 'b', 'c'}, {'a', 'd'}) = 1 / 4
- Given a collection of texts (already tokenized), calculate IDF for each token.

input example: [['interview', 'questions'], ['interview', 'answers']]
- Given a collection of texts, find the pointwise mutual information (PMI) of the tokens in the texts. Return the top 10 token pairs according to PMI. You can assume the input is already tokenized

input example: [['interview', 'questions'], ['interview', 'answers']]
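A rough sketch of how several of the list, set, and text questions above could be answered using only the standard library. These are one possible set of answers, not reference solutions. Two choices are assumptions: the standard deviation is the sample version (dividing by n - 1, which matches the "one element gives NaN" rule), and IDF is taken as log(N / df) with no smoothing.

```python
import math
from collections import Counter
from itertools import groupby


def mean(xs):
    # An empty list has no mean; the question asks for NaN in that case.
    return sum(xs) / len(xs) if xs else float("nan")


def std(xs):
    # Sample standard deviation (n - 1 in the denominator): an assumption.
    if len(xs) < 2:
        return float("nan")
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def rmse(actual, predicted):
    # Square root of the mean squared difference between paired values.
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def dedupe(xs):
    # The set gives fast membership checks; the list keeps first-seen order.
    seen, result = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


def rle(s):
    # groupby yields one group per run of equal consecutive characters.
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def jaccard(a, b):
    # Two empty sets are given similarity 0.0 here (an assumption).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def idf(docs):
    # Document frequency: the number of texts containing each token.
    df = Counter(token for doc in docs for token in set(doc))
    return {token: math.log(len(docs) / n) for token, n in df.items()}
```

For example, `rle("aaaabbbcca")` yields the two separate groups of `a`, and `jaccard({'a', 'b', 'c'}, {'a', 'd'})` gives 0.25.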
Most of these questions might seem easy and the instructions too detailed. That’s on purpose. The idea is to test the knowledge of Python, not the knowledge of algorithms and more advanced data structures
Also, I believe that if a candidate can solve a simple task, they will most likely be able to solve a more complex one too. But if they can’t, they won’t solve more difficult ones either. So there’s little point in asking difficult questions.
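As a rough illustration of how little Python the string and counting questions in this section need, here is a sketch built on slicing and `collections.Counter`. The function names are hypothetical, and the parsing of "id-SITE" assumes the site is whatever follows the last hyphen.

```python
from collections import Counter


def is_palindrome(value):
    # Works for strings and, via str(), for non-negative integers.
    text = str(value)
    return text == text[::-1]


def reverse_int(n):
    # Reverse the decimal digits of an unsigned integer.
    return int(str(n)[::-1])


def ids_per_site(identifiers):
    # Assumes the site is the part after the last hyphen in "id-SITE".
    return Counter(identifier.rsplit("-", 1)[1] for identifier in identifiers)


def top_sites(identifiers, k=3):
    # most_common orders equal counts by first appearance,
    # which is fine since ties may be broken in any way.
    return [site for site, _ in ids_per_site(identifiers).most_common(k)]
```

A hypothetical call such as `ids_per_site(["1-US", "2-US", "3-DE"])` returns a `Counter` with `US` counted twice and `DE` once.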
= Algorithmic questions =
The questions in this category are different from the previous one. Some of them are brain teasers, others require recursion or knowledge of algorithms and data structures.
The goal of these problems is to “see how candidates think”. I’m not a fan of such coding problems, but there are many companies that ask them.
- Two sum. Given an array and a number N, return True if there are numbers A, B in the array such that A + B = N. Otherwise, return False.

[1, 2, 3, 4], 5 ⇒ True
[3, 4, 6], 6 ⇒ False
- Return n-th Fibonacci number (more to a different section!)

It’s computed using this formula:

- F(0) = 1
- F(1) = 1
- F(n) = F(n-1) + F(n-2)
- Most frequent outcome. We have two dice of different sizes (D1 and D2). We roll them and sum their face values. What are the most probable outcomes?

6, 6 ⇒ [7]
2, 4 ⇒ [3, 4, 5]
- Reverse a linked list

a -> b -> c ⇒ c -> b -> a
- Flip a binary tree
- Binary search. Return the index of a given number in a sorted array or -1 if it’s not there

[1, 4, 6, 10], 4 ⇒ 1
[1, 4, 6, 10], 3 ⇒ -1
- Remove duplicates from a sorted array

[1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 6] ⇒ [1, 2, 3, 4, 5, 6]
- Return intersection of two sorted arrays

[1, 2, 4, 6, 10], [2, 4, 5, 7, 10] ⇒ [2, 4, 10]
- Return union of two sorted arrays

[1, 2, 4, 6, 10], [2, 4, 5, 7, 10] ⇒ [1, 2, 4, 5, 6, 7, 10]
- Suppose we represent numbers by a list of integers from 0 to 9

12 is [1, 2]
1000 is [1, 0, 0, 0]

Implement the “+” operation for this representation

[1, 1] + [1] ⇒ [1, 2]
[9, 9] + [2] ⇒ [1, 0, 1]
- Sort by custom alphabet. You’re given a list of words and an alphabet (e.g. a permutation of the Latin alphabet).

You need to use this alphabet to order words in the list.
- Check if a tree is a binary search tree. In a BST, the root is greater than or equal to the numbers on the left, and less than or equal to the numbers on the right.
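A minimal sketch of four of the problems above (two sum, most frequent outcome, binary search, and "+" on digit lists), checked against the examples given with each question. The function names are hypothetical, and each problem has other valid approaches.

```python
from collections import Counter


def two_sum(numbers, target):
    # Remember values seen so far; each new value looks for its complement.
    seen = set()
    for x in numbers:
        if target - x in seen:
            return True
        seen.add(x)
    return False


def most_frequent_outcomes(d1, d2):
    # Count every possible sum of two dice with d1 and d2 faces.
    counts = Counter(a + b for a in range(1, d1 + 1) for b in range(1, d2 + 1))
    best = max(counts.values())
    return sorted(total for total, n in counts.items() if n == best)


def binary_search(sorted_numbers, target):
    # Halve the search interval until the target is found or it is empty.
    lo, hi = 0, len(sorted_numbers) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_numbers[mid] == target:
            return mid
        if sorted_numbers[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def add_digits(a, b):
    # Walk both digit lists from the least significant end, carrying as needed.
    result, carry = [], 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += a[i]
            i -= 1
        if j >= 0:
            total += b[j]
            j -= 1
        result.append(total % 10)
        carry = total // 10
    return result[::-1]
```

The set-based `two_sum` runs in a single pass, and `add_digits` never converts the lists to integers, so it works for inputs of any length.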
Most of these are “easy” algorithmic questions, but there are more difficult ones.

To prepare, use resources like LeetCode and practice a lot. You can check my solutions to some LeetCode challenges here: github.com/alexeygrigorev…
Not many companies use these kinds of questions for data science interviews.

On the other hand, if you interview for software engineer or ML engineer positions, you’re more likely to get them.

Check with your recruiter if you need to prepare for them.
That's it - my list of questions is over now!
Keep Current with Alexey Grigorev