Adding a Lego Brick To A Building’s Foundation

Paz Bechor
September 1, 2021

Introduction

A software engineer’s (work) life isn’t easy; it contains many ups and downs, failures and successes, and, contrary to common opinion, it usually does not involve writing code. But when it does, if you ask developers what the biggest challenge is when writing code, they probably won’t say “implementing a project from the beginning”.

I’d bet a lot of the answers will indicate that every developer struggles when implementing a new feature in an existing piece of code. In fact, I will say that the challenge may even turn into a fear, depending, of course, on how much the code logic and behavior have to change.

Credibility As a Guideline

At anecdotes, we consider our credibility to be the most important thing, for our customers and for ourselves. On a daily basis, our product processes and manipulates significant amounts of data supplied by a large number of plugins: cloud services, HR tools, developer tools, and many more. Each plugin “collects” multiple pieces of evidence containing data that is relevant for compliance processes.

To get the data in each plugin, we usually rely on regular HTTP requests: send the request → get the data → process it as you wish → save it. But three questions have to be asked, since our credibility depends on the answers:

  1. Is the data reliable?
  2. Is the data relevant?
  3. What did you do in order to get the data?

So first of all, answering the first two questions is pretty easy — Yes and Yes. anecdotes plugins are designed by InfoSec experts, so the data collected from each plugin contains relevant data for audits.

But in order to answer the last question, you’ll just have to read this article.

Exposition — Some Tech Background:

At anecdotes, our back end runs on GCP infrastructure (Cloud Run) and is written in Python. We also have multiple microservices collaborating. However, in this article, we are going to focus on only one of them: the data collector. In order to get access to third-party service data, a few steps have to be completed:

  • Get service connection params — several ways are available, such as Access token, OAuth, API key, et cetera.
  • Connect to the service & validate that the connection succeeded.
  • Get the relevant service data by sending multiple HTTP requests.
  • Save the data & process it as desired.

This process has to be repeated for every one of our plugins, which means it happens a lot.

Some plugins class code examples, to warm it up:
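
To make the steps above concrete, here is a minimal sketch of what such a plugin class might look like. The names `BasePlugin`, `DemoHRPlugin`, `connect`, `collect`, and `run` are assumptions made for illustration, not the actual anecdotes code, and the demo plugin returns canned data instead of calling a real service.

```python
from abc import ABC, abstractmethod


class BasePlugin(ABC):
    """Hypothetical base class: one subclass per third-party service."""

    def __init__(self, connection_params: dict):
        # Connection params could be an access token, OAuth data, an API key...
        self.connection_params = connection_params
        self.evidence = {}  # evidence name -> collected data

    @abstractmethod
    def connect(self) -> bool:
        """Connect to the service and report whether the connection is valid."""

    @abstractmethod
    def collect(self) -> dict:
        """Return the relevant service data, keyed by evidence name."""

    def run(self) -> dict:
        # Connect and validate, then collect and keep the data.
        if not self.connect():
            raise ConnectionError("could not validate the service connection")
        self.evidence = self.collect()
        return self.evidence


class DemoHRPlugin(BasePlugin):
    """Illustrative plugin with canned data (no real HTTP traffic)."""

    def connect(self) -> bool:
        return bool(self.connection_params.get("api_key"))

    def collect(self) -> dict:
        # A real plugin would send multiple HTTP requests here.
        return {"users": [{"name": "alice"}, {"name": "bob"}]}
```

Calling `DemoHRPlugin({"api_key": "demo"}).run()` walks through the whole flow once; a plugin created without an API key fails at the validation step.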

The Mission

At some point, auditors who work with the platform asked us to present the real API call that is sent in each request (for each piece of evidence collected in the plugin collection process). That means we have to capture all of a plugin's requests and display them, obfuscated of course, as secret params are sometimes passed in the URL.

Evidence Collecting Function

As I said before, each plugin “run” contains multiple evidence collections. Here is an example of a generic evidence collection, using a simple GET request:
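
A generic evidence collection could be sketched roughly like this. The function names `http_get_json` and `collect_evidence`, the injectable `http_get` parameter, and the example URL are all assumptions for illustration; the sketch uses only the standard library so that it stays self-contained.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url: str):
    """Send a plain GET request and decode the JSON response body."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_evidence(base_url, path, params=None, http_get=http_get_json):
    """Build the request URL, fetch the data, and return both together.

    `http_get` is injectable so the function can be exercised without
    a network connection (hypothetical design choice for this sketch).
    """
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url = f"{url}?{urlencode(params)}"
    return {"url": url, "data": http_get(url)}
```

For example, `collect_evidence("https://api.example.test", "/v1/users", {"page": 1})` would request `https://api.example.test/v1/users?page=1` and return the decoded response next to the URL that produced it.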

Implementation

In order to “catch them all”, meaning all the API queries that are sent during a plugin’s collection process, we had to hook into the lowest common level of the request module. In our hook, we simply add the API query that has just been executed to a list on our plugin object.
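
The idea of hooking the lowest common level can be sketched as follows. `Transport.send` is a hypothetical stand-in for whatever low-level function every request passes through in the real code (it sends nothing here), and `install_hook` is an illustrative name. This is the naive version, bound to a single plugin, before the challenges below are addressed.

```python
import functools


class Transport:
    """Stand-in for the lowest-level request sender (hypothetical)."""

    @staticmethod
    def send(method: str, url: str) -> dict:
        return {"status": 200, "url": url}  # no real network traffic


class Plugin:
    def __init__(self):
        self.api_queries = []  # every API query executed by this plugin

    def get(self, url: str) -> dict:
        return Transport.send("GET", url)


def install_hook(plugin: Plugin):
    """Wrap Transport.send so each call is recorded on `plugin`.

    Returns the original function so the hook can be removed later.
    """
    original_send = Transport.send

    @functools.wraps(original_send)
    def hooked_send(method: str, url: str) -> dict:
        plugin.api_queries.append(f"{method} {url}")
        return original_send(method, url)

    Transport.send = staticmethod(hooked_send)
    return original_send
```

Because the wrapper sits below every helper the plugin might use, no individual evidence function needs to know that its requests are being recorded.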

Simple as that?

Unfortunately, no. We faced some challenges on our way to credibility. Let’s elaborate:

1. Request context: Our product maintains connections with many internet services. One example is our DB, which is based on Snowflake. That means that besides the requests we want to observe, we might accidentally record other requests that are sent asynchronously in the same running GCP instance. In other words, a BIG no-no.

We managed to overcome this challenge by using the inspect module. Let’s dive in:

The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.

So by using it, when adding the API query in our hook (as described earlier) we verified that our object “running context” is the plugin’s class, as you can see in the following example:
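
A minimal sketch of such a check, assuming a hypothetical `BasePlugin` class and a stand-in `low_level_send` function: the hook walks up the call stack with `inspect.stack()` and records the query only if one of the calling frames has a plugin instance as its `self`.

```python
import inspect


class BasePlugin:
    def __init__(self):
        self.api_queries = []


def find_calling_plugin():
    """Return the first BasePlugin found as `self` on the call stack, if any."""
    # context=0 skips loading source lines, which are not needed here.
    for frame_info in inspect.stack(0):
        candidate = frame_info.frame.f_locals.get("self")
        if isinstance(candidate, BasePlugin):
            return candidate
    return None


def low_level_send(method: str, url: str) -> dict:
    """Stand-in for the hooked low-level sender (sends nothing)."""
    plugin = find_calling_plugin()
    if plugin is not None:  # ignore requests made outside a plugin run
        plugin.api_queries.append(f"{method} {url}")
    return {"status": 200}


class UsersPlugin(BasePlugin):
    def collect_users(self) -> dict:
        return low_level_send("GET", "/api/v1/users/all")
```

A call that originates inside `UsersPlugin.collect_users` is recorded on that instance, while a call made from unrelated code, such as a database client, finds no plugin on its stack and is left out.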

2. Handling multiple plugins: It seems trivial now that our solution supports it, but it wasn’t at the beginning. As our platform runs on GCP Cloud Run, we support multiple requests per instance, and that can cause unintended multithreading bugs. Whenever multiple requests are received by our running Cloud Run instance and are processed concurrently, multiple HTTP requests are sent in order to receive the data for each plugin’s evidence.
In the beginning, we implemented the hook so that it was “turned on” around every HTTP request (using contextmanager). That caused the hook to be turned off by one plugin while another plugin was still in the middle of its run. In other words, a multithreading error.

This challenge was resolved once we implemented the hook as described earlier (with the inspect module). The reason is that each plugin object is independent and holds its own query list, so when multiple plugins run concurrently, the hook uses the inspect module to find the currently running object and saves the API query there.
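
Here is a small sketch of why this holds up under concurrency, using plain threads as a stand-in for concurrent Cloud Run requests. The `Plugin` class, the `send` function, and `run_concurrently` are illustrative names. Each thread has its own call stack, so the stack walk only ever finds the plugin that issued the call.

```python
import inspect
import threading


class Plugin:
    def __init__(self, name: str):
        self.name = name
        self.api_queries = []  # owned by this object only

    def run(self, count: int) -> None:
        for i in range(count):
            send("GET", f"/{self.name}/items/{i}")


def send(method: str, url: str) -> None:
    """Stand-in for the hooked sender: record the query on the calling plugin."""
    for frame_info in inspect.stack(0):
        owner = frame_info.frame.f_locals.get("self")
        if isinstance(owner, Plugin):
            owner.api_queries.append(f"{method} {url}")
            break


def run_concurrently(count: int):
    """Run two plugins at the same time and return them for inspection."""
    plugins = [Plugin("hr"), Plugin("cloud")]
    threads = [threading.Thread(target=p.run, args=(count,)) for p in plugins]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return plugins
```

There is no shared on/off switch for one plugin to flip under another, which is what went wrong with the contextmanager approach.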

3. Caching: In order to reduce the number of requests, we’ve created a “cache manager” that is managed per plugin instance. Every request we make, it’s watching us. In practice, we save the resulting data in a simple dict. As a result, the API queries recorded for some evidence won’t reflect the requests that actually produced the data. Let's simplify it:

  • First evidence: Collect all users in the plugin and save them to the cache. The API queries will be, for example: [“GET /api/v1/users/all”]
  • Second evidence: Collect all users who have joined the company in the last year. We need all users, which we then filter by date. The user list is returned from the cache. No request is sent, so the API query list will be empty.

The solution for this one is pretty simple: any time we push a new value to the cache, we store the API queries that produced it alongside the value. Conversely, each time we get a value from the cache, we add those stored queries to the list of API queries executed for the current evidence.
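
A rough sketch of that idea, with hypothetical `CacheManager` and `Plugin` classes and an injected `fetch` callable standing in for the real HTTP call:

```python
class CacheManager:
    """Per-plugin cache that remembers which API queries produced each value."""

    def __init__(self):
        self._store = {}  # key -> (value, list of API queries)

    def set(self, key, value, api_queries):
        self._store[key] = (value, list(api_queries))

    def get(self, key):
        return self._store.get(key)  # None on a cache miss


class Plugin:
    def __init__(self, fetch):
        self.cache = CacheManager()
        self.fetch = fetch  # callable that performs the request (assumption)
        self.current_queries = []  # queries for the evidence being collected

    def get_users(self):
        cached = self.cache.get("users")
        if cached is not None:
            users, queries = cached
            # Cache hit: replay the queries that originally produced the data.
            self.current_queries.extend(queries)
            return users
        query = "GET /api/v1/users/all"
        users = self.fetch(query)
        self.current_queries.append(query)
        self.cache.set("users", users, [query])
        return users

    def collect_evidence(self, keep=lambda user: True):
        self.current_queries = []  # start a fresh list for this evidence
        users = [user for user in self.get_users() if keep(user)]
        return {"data": users, "api_queries": list(self.current_queries)}
```

With this in place, the second evidence in the example above still reports “GET /api/v1/users/all”, even though its data came from the cache and no new request was sent.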

4. Obfuscating: This is another issue that might seem minor but can actually lead to security leaks. Some plugins pass secret data, such as tokens, as query params in the request’s URL. To deal with this one, all one needs to do is define a blacklist of sensitive param names and obfuscate their values before the query is displayed.
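
One way to sketch this with the standard library’s `urllib.parse`. The param names in `SECRET_PARAMS`, the `REDACTED` mask, and the function name `obfuscate_url` are assumptions for illustration; a real blacklist would be tuned per plugin.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blacklist of query param names that may carry secrets.
SECRET_PARAMS = {"token", "access_token", "api_key", "secret"}


def obfuscate_url(url: str, secret_params=SECRET_PARAMS, mask: str = "REDACTED") -> str:
    """Replace the values of blacklisted query params with a mask."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [
        (key, mask if key.lower() in secret_params else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

For example, `obfuscate_url("https://api.example.test/v1/users?token=abc123&page=2")` returns `https://api.example.test/v1/users?token=REDACTED&page=2`, while URLs without a query string pass through unchanged.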

These were just some of the challenges we faced when adding new features into existing code. But instead of allowing our nerves to get the best of us, we kept tweaking and optimizing. We also learned a lot about how to plow on, despite complexity. I’d love to hear about your experiences with adding new features into code. How did it go? What did you learn about yourself and your team?

Paz Bechor
Software engineer at anecdotes, problem solver, challenge-driven. Passionate about programming.
