I’ve interviewed hundreds of software engineers and reviewed 1,000+ resumes, especially those of SDETs (Software Development Engineers in Test) and software tooling / automation developers. Here’s a rundown of the playbook I use when interviewing an SDET candidate. I’m publicizing this playbook as a way of giving back to software engineers affected by COVID who are now forced to look for work. Writing this post also serves as a forcing function for me to change up my interview specifics moving forward. Though my interview will change, the competencies gauged and leadership principles assessed will remain intact.
What this is: An assessment of engineering competency / ability via a mock interview
What this is not: This is not a skills assessment, so it won’t go into cloud or other specific technologies unless they’re prominent in your resume and seem like an interesting topic we can explore.
Instead, this is an assessment of engineering competency / ability and motivation.
How to follow this guide: Treat this as a mock interview with some educational snippets
Each question has 2 videos.
1. Question Video describing the question / scenario: To emulate a real interview experience, try to formulate an answer here on your own. Visualize yourself being asked this question and rehearse the way you’d answer it.
Play hard, practice harder.
Really, I’m trying to get you to emulate a real interview by first practicing your answer to the question before going on to the explanation video. I’m going to try and stall you one more time.
“The more you seek the uncomfortable, the more you will become comfortable.” –Conor McGregor
Try and formulate your own response even if it may be uncomfortable before going to the explanation video.
2. Explanation Video describing possible responses to the question
Structure of the Interview
- Welcome + Self Intro
- About the Candidate
- About the Candidate’s Role
- Play Pretend
- Automation Framework Scenario
- Followup threading question
- Coding Question
- Bonus Question – Streaming Logs
- Your Questions & Thank you
Welcome + Self Intro
After a brief warm welcome, thanking you for your time and telling you I’m looking forward to our conversation, I proceed with a self-introduction of what I work on. This includes:
- 2 sentences on the company
- 3-5 sentences on the products I directly work on
- 2 sentences regarding the team
- Brief pause
At this point, I’m interested in your general level of interest or curiosity. Do you just say “Ok”, do you remain silent, or do you make a comment or better yet, ask a question about any of the topics I highlighted? If so, this could be an early indicator that the candidate is both socially aware / emotionally intelligent and also has some level of curiosity. To me, curiosity is strongly correlated with motivation.
Not asking a question at this point doesn’t hurt you, because it could mean that you don’t want to interrupt me. Different cultures, different norms. Some really strong engineers (hired guns) may not ask a question at this point either because they just want to answer technical questions and showcase their strengths there.
About the candidate
I now ask a question to find out about you. Not the engineer. The person.
- “Really impressive resume! Can you tell me something that’s not on your resume that you think I should know about?”
About the candidate’s role
- “What are your current responsibilities?”
- “Let’s pretend that I’m clueless about the technical product that you test. How would you describe the product to me as if I were 7 years old?”
Listen. Here, I listen to the breadth and depth of the answer as well as the methodical nature of the approach. Are you clearly communicating and articulating a response? Are you someone who is empathetic to your users and understands how to convey a meaningful response catered to your audience? This is important because SDETs often build tools for software engineers, and it’s vital to understand your users’ pain points so that you can deliver meaningful tools. Perhaps you decide to draw a picture for me, the 7-year-old. That shows you’re willing to extend yourself and think outside the normal realm. It could also mean you’re an engineer who documents their architectures and draws visual diagrams as well.
Trust is everything on a team – I may not have time to dive into the weeds of a technical discussion as much as I’d love to. Often, I’ll purposely remove myself from such conversations so that the team feels empowered to make their own choices. I need to trust in my employees’ abilities to succinctly communicate the essence of a bug, solution, or situation. Will this person be able to represent the team in a professional manner? Does this person convey mastery of what they are talking about?
Doubts / lack of clarity: At this point, if I have any doubts about what you’re telling me, then I ask for clarifying questions. I drill deep here and quote the resume coming from a place of curiosity. Details are vital in Quality Engineering. Details save months of engineering time per year. Details are the difference between succeeding and failing in Quality Engineering.
Details = dollars.
Now, what triggers doubt or the need for me to ask clarifying questions? I rely on my gut feelings here as well as past interviews. As humans, we can sense when something is not authentic and when there is some slight hesitation. We get this twinge moment – “something is not quite right… Maybe I should just skip through this and continue with the rest of the interview. It’s probably not a big deal”. That’s an important moment to get clarification on.
Common ground – Play pretend that the candidate had a different role at a different company.
“Let’s imagine that you’re the head of quality for a cloud storage service like Google Drive or Dropbox. What are your top 3 testing priorities or top 3 features to assure the quality of?”
Let’s be honest: I still don’t know much about the company or product you currently work for, so let’s find some common ground. Most people are familiar with cloud storage services for uploading documents, photos, and other files, so I use this shared context to assess the candidate’s ability to prioritize and succinctly break down a problem. While there’s no single right set of three answers, some answers weigh more than others. One hack is to use answers that end with “ity” (data integrity, security, reliability, scalability, usability, accessibility, performance-ity ha!). Let’s take a look at the product once more.
Cloud. Storage. Service.
- Data Integrity: This is a storage product so if the storage doesn’t work, we may as well close shop. A good answer here is that you will ensure files viewed/downloaded are the same exact content as files uploaded. When I dig into this answer later on, you should describe how files / chunks will be compared to ensure data integrity. Checksum or hashes should be mentioned. If you can’t mention that, then you need to have a really strong answer for how to compare various file contents and their metadata.
- Dig deep: If the candidate is comfortable and strong in this answer, I go into a discussion about a hash lookup table for checksums and how that table must fit in finite storage (e.g., limited to one node) whose size is a function of the number of entries (rows). I’ll ask how you’d design this lookup so that the size (total rows) is bounded. A good point here is reusing a single entry for identical files (deduplication), in which case hash collisions must be dealt with.
- Security: If you mention security, then I like to understand “what does that mean?”. Here, my expectation is that the candidate would explicitly mention “authorization” vs “authentication” or provide some examples that illustrate those 2 points.
- Reliability: If you describe the importance of handling errors, data failures, disk drive issues, network errors, the need for replication, then this suffices. Bonus points for discussing tradeoffs of the CAP theorem.
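The data-integrity check described above can be sketched concretely. Here is a minimal example of my own (the class and method names `IntegrityCheck` and `matches` are illustrative, not from the post); it compares whole files, but the same idea applies per chunk:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch: verify a download matches the upload by comparing
// SHA-256 checksums rather than the raw bytes.
public class IntegrityCheck {
    static String sha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available on the JVM
        }
    }

    // Comparing fixed-length digests is cheap even for very large files or chunks.
    static boolean matches(byte[] uploaded, byte[] downloaded) {
        return sha256(uploaded).equals(sha256(downloaded));
    }

    public static void main(String[] args) {
        byte[] original = "file contents".getBytes();
        byte[] roundTripped = "file contents".getBytes();
        System.out.println(matches(original, roundTripped)); // prints true
    }
}
```

A strong candidate would also mention where the digests live (computed at upload time and stored alongside the file metadata), which leads naturally into the lookup-table discussion above.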
Automation framework scenario
At this point, we’ll go into a scenario where I’ll ask a few follow-up questions.
“We have an API server that takes REST requests for performing typical cloud storage operations like login, list files, upload, download, and sharing. We have 100 API tests, test1.java through test100.java, written in an xUnit-style framework like JUnit.”
Question: Linux / shell command (less than 5 minutes): Given access to a terminal and a test results log file, I’d like you to count the number of failed tests. The test results are written to a single log file with the following format:
NOTE: Each line number corresponds to the result of that test case number. For example, line1 of the log file indicates “pass”, meaning that test1.java passed successfully.
I’m looking for a quick solution here that leverages some common linux commands like “grep”, “find”, “cat”, “wc” and perhaps piping the output together. The caveat here is that we’re looking for the number of failed tests. Simply saying “grep” or “find” is not sufficient unless you can get me the counts.
grep fail test.log | wc -l
(or, more directly: grep -c fail test.log)
Followup threading question
Question: The 100 REST API tests take 100 minutes to run. That’s too long for our developers. How can we speed this up?
This is an open-ended question that I hear all kinds of interesting responses for.
Some candidates ask if we can deprioritize certain tests and not run all of them. To which I respond that this is not acceptable in this scenario. What I’m looking for here is that the candidate understands the concept of parallelism / multi-threading, as well as execution environments.
- OK answer: Split up the test environment and the application-under-test environment
- Keep the tests running sequentially in a non-parallel manner, but have tests 1–50 run against app server #1 and tests 51–100 against app server #2. This is a nice thought, but I’ll usually follow up and let the candidate know we may not have enough funding to support the new machines.
- Good answer: Ask me if the tests have any shared state or possible race conditions / deadlock if the tests are run in parallel. If there’s no shared state between tests, then propose running the tests in parallel with custom code or by relying on the test framework like TestNG to parallelize the execution of test suites.
- Creative variant: Ask me where the time is being spent during the test. Is the AUT slow? Maybe we can parallelize the API server code. It’s a good sign when the candidate questions the code of the application under test. At this point, I’ll let the candidate know there’s nothing we can do to improve the AUT code, so I’d expect them to modify the test code / environment at this point.
- Others: If you propose Docker at this point, I’m going to grill you medium-well to understand how specifically Docker would help us improve performance. Maybe you’re on to something, or maybe you’re just rattling off buzzwords.
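To make the “good answer” concrete, here is a thread-pool sketch of my own (class and method names are illustrative; a real suite would more likely lean on the framework’s parallel support, e.g. TestNG’s parallel suites):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run independent test cases on a fixed-size thread
// pool instead of sequentially. Assumes the tests share no mutable state.
public class ParallelRunner {
    static long runAll(int testCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Callable<Boolean>> tests = new ArrayList<>();
        for (int i = 1; i <= testCount; i++) {
            tests.add(() -> {
                Thread.sleep(10); // stand-in for one ~1-minute API test
                return true;      // stand-in for this test's pass/fail result
            });
        }
        try {
            // invokeAll blocks until every test finishes; wall time is roughly
            // (testCount / poolSize) * per-test time, not testCount * per-test time.
            List<Future<Boolean>> results = pool.invokeAll(tests);
            long passed = 0;
            for (Future<Boolean> r : results) if (r.get()) passed++;
            return passed;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(100, 10) + " of 100 passed");
    }
}
```

With a pool of 10, the 100-minute sequential run drops to roughly 10 minutes, assuming the application under test can actually absorb 10 concurrent requests.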
Execution environments: Be skeptical of “this works for me.” Develop a habit of questioning not only the AUT code and test code, but also the various environments and stacks that the AUT and tests run on. This means understanding the impact of various hardware and network configurations on test latencies.
- Run the test client and API server (Application Under Test) on different servers
- Beef up the test client and API server machines with more memory, better CPU
- Ensure the environment has a fast network
There’s no one correct answer here. Whenever you propose a possible solution, I may say “ok, so we’ve now reduced the test execution time from 100 minutes to 99 minutes. Can we improve this any further?”
Question: Only when we run the tests in parallel do we see random test failures. How would you go about debugging this? Assume we’ve rewritten the tests to execute in parallel: 100 tests are processed by a thread pool of size 10, so at any point in time there are 10 tests running concurrently. With a thread pool of size 1 (sequential, no parallelism), all tests pass, though the run takes 100 minutes (too long). When we run the tests in parallel, there are random failures.
Here, I’m looking for candidates who methodically rule nothing out. I’m also looking for you to clearly state your attack vectors, including:
- Application Under Test code: Is it written to handle concurrent requests? Is there shared state or potential deadlock? Look through server logs and server code to assess this.
- External dependencies: If the API server relies on a database, is that database introducing errors in the parallel scenario?
- 3rd-party libraries
- Test code: Is the test code written in a way that relies on shared state and overwrites that shared state, thereby causing random test failures? Look through test logs and code to assess this.
- Machine / network environments: Is the AUT environment or test client beefy enough to handle the parallelism? Discuss the constraints of CPU, memory, and network.
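To make the “shared state in test code” vector concrete, here is a toy example of my own (not from the post) of state that is safe sequentially but racy in parallel:

```java
// Hypothetical illustration: two "tests" write the same static field
// before asserting on it. Sequentially this always passes; in parallel,
// whichever thread's write lands last makes the other test fail at random.
public class SharedStateFlake {
    static String currentUser; // shared mutable state across tests

    static boolean loginTest(String user) throws InterruptedException {
        currentUser = user;   // setup: record the "logged in" user
        Thread.sleep(5);      // in a parallel run, another test can overwrite currentUser here
        return user.equals(currentUser); // assertion on the shared state
    }

    public static void main(String[] args) throws InterruptedException {
        // Run sequentially, both tests always pass:
        System.out.println(loginTest("alice") && loginTest("bob")); // prints true
        // Run these same tests on two threads and they fail intermittently,
        // which is exactly the symptom described in the question.
    }
}
```

The fix is the usual one: give each test its own state (local variables, per-thread fixtures) rather than statics shared across the suite.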
Coding Question
At this point, I’ll share a Google Doc with you. That’s right, a Google Doc to write code in. It won’t have the niceties of an online code editor / IDE like CoderPad. A Google Doc will auto-capitalize the beginning of a statement, and that can be annoying. That’s OK. I want to see how you deal with annoyances and roadblocks. I’ll also make it very clear that I’m not concerned with correct syntax / spacing, because I know Google Docs isn’t meant for that, and that’s also not the important part of the coding question.
Input: "Hi my name is ABCD"
Output: "iH ym eman si DCBA"
Restriction: No use of a built-in library call like string.reverse(). If you want to reverse a string, build your own function to do so.
About half of candidates quickly jump to a solution that simply reverses the entire string. That’s technically not correct, but I don’t hold it against them: you’ve been through a long interview and may be nervous. I’ll ask you to walk through a few iterations of your code with the input string, at which point you’ll realize the question is not to reverse the characters in the string, but to reverse the characters in each word of the sentence while preserving word order.
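For reference, one possible shape of a solution (the class and helper names here are my own; many variations are acceptable):

```java
// Reverse the characters of each word while preserving word order,
// without using any built-in reverse() helper.
public class ReverseWords {
    // In-place two-pointer swap: the hand-rolled reverse the restriction asks for.
    static String reverseWord(String word) {
        char[] chars = word.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    static String reverseEachWord(String sentence) {
        String[] words = sentence.split(" ");
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < words.length; k++) {
            if (k > 0) out.append(' ');
            out.append(reverseWord(words[k]));
        }
        // O(n) in the sentence length: each character is touched a constant
        // number of times (split, swap, append).
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseEachWord("Hi my name is ABCD")); // iH ym eman si DCBA
    }
}
```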
I’ve included some sample code answers in the following GitHub gist.
At this point, I’ll ask 2 follow-ups:
- How would you test your function? What are your test cases?
- I’m looking for a breadth of creative answers. Bonus points if you keep coming up with test cases until I have to ask you to stop and we can proceed with something else.
- What is the algorithmic complexity / runtime performance of your function?
- I’m looking for Big O notation here and an explanation of why. Even if your answer is correct, I’ll challenge you with “how confident are you that this is the correct answer?” I want to see how you deal with the pressure that comes with a quality-oriented role: the pressure of doing your job well versus the pressure of keeping developers and the business happy by saying “it’s not a big deal. It should work fine.”
Bonus Question – streaming logs
We have an app server and logs get generated all the time. We want an API endpoint “/logs” that streams the 200 most recent log messages from the app server back to the client. This API endpoint “/logs” gets hit very often from multiple clients so it must be lightweight in terms of memory footprint (store no more than 200 log messages in memory at a time) and must be performant. How would you design a solution?
Explanation: Try and formulate your response without access to an explanation video. Kudos if you comment on this post with your answer.
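If you want to check your thinking after attempting it yourself, one possible direction is a bounded, synchronized ring buffer. The 200-message cap and the /logs endpoint come from the scenario; the class and method names below are my own illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: keep only the N most recent log messages in memory.
// Appends evict the oldest entry once the cap is reached, so the memory
// footprint stays constant no matter how many logs are generated.
public class RecentLogs {
    private final int capacity;
    private final Deque<String> buffer = new ArrayDeque<>();

    RecentLogs(int capacity) {
        this.capacity = capacity;
    }

    synchronized void append(String message) {
        if (buffer.size() == capacity) buffer.removeFirst(); // evict oldest
        buffer.addLast(message);
    }

    // What a GET /logs handler would return: an immutable snapshot, so
    // concurrent clients never observe the buffer mid-mutation.
    synchronized List<String> snapshot() {
        return List.copyOf(buffer);
    }

    public static void main(String[] args) {
        RecentLogs logs = new RecentLogs(200);
        for (int i = 1; i <= 250; i++) logs.append("log-" + i);
        List<String> recent = logs.snapshot();
        System.out.println(recent.size() + " " + recent.get(0)); // 200 log-51
    }
}
```

Both operations are O(1) per message (amortized), and the buffer never holds more than the cap, which satisfies the lightweight-footprint requirement. A production version would also consider lock contention under many concurrent readers.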
Your Questions & Thank You
If, at the end of the interview, you are asked if you have any questions, ask a question and thank the interviewer for their time. Showing interest goes a long way. As an interviewer, I make it a point to also sell the role and ensure the interview ends on a positive note. Speaking of positive notes, it’s time for Flamenco guitar!