Categories

## Working with Intervals

Over the last couple of months my team at Flytrex had occasion to use intervals more than once, and in both cases the team asked me, “What’s the right way to solve this?”. Since this is a common problem, I thought I’d write a short post about it.

Most interval problems seem deceptively simple, while they require more work than expected. Also, programmers aren’t always aware they are working with intervals.

Here are the two problems:

• Given the current time and a list of opening hours for a business, determine if the business is open or closed, and if it’s open – say when it will close, and if it’s closed say when it’s open. For Flytrex this is further complicated because we need to find the intersection between a restaurant’s opening hours and the opening hours of the delivery center.
• Produce a shift report for a delivery center – show the list of opening shifts (period when the center was active) – and the total flight duration for each shift (by summarizing the flight time for each delivery for each shift.)

First let’s define intervals – an interval is represented by a pair of numbers (a, b) such that it includes all the numbers between a and b. An interval may be closed or open on either side, e.g.x <= b for a closed interval or x < b for an open one. In addition, it’s sometimes useful to allow a to be -inf and b +inf.

Now we can define operations on intervals – union, intersection and inversion, such that the input and output of each operation is a set of intervals.

As an anecdote – one of the questions I ask candidates in Flytrex includes an intersection between intervals – check if two intervals intersect. The naive solution is “let’s map all the cases”, which is hard. The easier solution is to understand the cases when they don’t intersect (b2 < a1 OR b1 < a2), and then NOT the result.

The first trick for solving interval problems is to realize they are interval problems. One of our developers asked some friends about his particular problem – and most suggestions revolved around lookup tables and binary search, which might work but are complex and not a direct solution.

The right way to solve an interval problem is using an interval tree. The second trick is of course not to develop one yourself but instead to find a library. For the developer faced with the interval problem – it transformed a very complex problem to 5 lines of easy code. My hope with this post – the next time you are faced with an interval problem – you’ll know how to solve it the right way – which is also the easy way. Let me know if you do!

Categories

## How we deploy with Git

It seems common practice to have a staging and production branches for deploying your code. A common pattern is to push (or pull request) to these branches, then merge the changes. Then, some system watching this branch will notice, and deploy to the appropriate environment. (Another way this is done is with tags, but I will not get into it here.)

For us it is slightly different. At least for one of our projects we have multiple deployments, where each can be on a different environment. In addition, it is useful to have multiple staging environments for use by our R&D and product teams, and multiple “pre-production” environments for use by our lab team. So we have staging, staging2, staging3, and lab, lab2, and possibly soon lab3 as well.

One problem with merging to deploy, is that merges create a new commit ID. Rebases are even worse in that regard. When I’m deploying version 1.5 to staging2, I definitely don’t want to change version 1.5, and I want the commit ID to be identical to the one in the version 1.5 branch. When you merge, you accrue merge commits, that are different between various deployments.

So our solution to these problems:

• All deployment branches are named deploy/ENV_NAME, e.g. deploy/lab or deploy/staging.
• In order to deploy say, release/release1.5 to deploy/staging, we follow the following instructions:
• `git fetch`
• Find out the the commit hash of the last commit of the branch to be deployed. You can do this in the Github UI by switching to this branch, or run something like:
`git log origin/release/release1.5 --pretty=%H -1`
• `git push -f origin COMMIT_HASH:deploy/staging`

This last command is the interesting one. It tells git to have the deploy/staging branch on origin to point to COMMIT_HASH. This is incredibly useful – with this all your deployments of a given version will have exactly the same hash, which will easy version tracking.

The downside is that our git history doesn’t reflect deployment history – but that’s ok – it shouldn’t. It should only reflect only the code history. Deployment history is kept by our deployment software – in our case Jenkins.

I’m interested in learning of more deployment strategies. How do your deployment environment look like? Are you SaaS with a single production? Cloud based with dedicated servers per customers? On premise? What deployment approach works for you?

Photo credit: Bill Jelen on Unsplash

Categories

## 5 Tips for more effective logging

Logging is a critical part of every serious project. If logging is not important in your project – you’re probably doing logging wrong. Here are a few lessons I learned over the years running multiple projects.

## 1 – Reserve ERROR for errors, and everything that is not a bug in your code shouldn’t be an ERROR

Every log line has a log level. The most important distinction in log levels is between ERROR and everything below ERROR. The following logic should guide you – an ERROR log line should indicate a bug in your code. If there’s an event that generates an ERROR log line which is not an indication of a bug in your code – this should not be logged as an error.

Furthermore, you should spend effort making sure that every recognizable error should be logged as such. So, most handlers should be generically wrapped by an appropriate logger, and your 500 logger or equivalent should naturally emit ERROR logs.

## 2 – User input validation failures should be warnings

As a natural result of our first tip, user validation failures shouldn’t be logged as errors. They are your code doing what it should be doing. However, they still merit more than an INFO log line. So use WARNING here. Other events can also be logged as WARNINGs, events such as resources running low, a fallback being used, etc.

As a natural outcome of the first two tips, we come to tip no. 3:

## 3 – Alert on errors, or on multiple warnings

So, your log-levels are now correct. The next step is getting notified whenever an error happens – this is an indication you have a bug in your code. But you don’t want the same error happening a lot to flood your inbox (or whatever other reporting mechanism you use.)

You can de-dup your errors yourself, for example, by hashing the call stack. Alternatively, use a service such as sentry.io to do that for you. You can now send notifications such as E-Mails and text messages when new errors appear.

Once you have that, you can also consider getting alerts for warnings that happen too many times. For example – if a particular user input validation fails often then perhaps your UX is broken. If a fallback happens too many times then perhaps your main flow is not robust enough.

## 4 – Make your logs informative

Be liberal with adding info logs. At the least, all cross-service and requests to your API should be logged. Other major events/decisions should probably also be logged. Personally I’d probably prefer O(1) per call to my API (i.e. don’t INFO log in a loop).

Independently of that, take care to include all the useful information you can in your logs. That includes file, line, perhaps all or part of your stack trace, and so on. The text logged should also be informative – if a particular value is incorrect log it and the desired value (be careful of privacy concerns though!)

## 5- Aggregate all logs into a single searchable database

Having a single, searchable log interface, instead of separate ones is critical. Being able to understand the complete flow of an issue is in many cases dependent on you seeing all the relevant information together. Having it searchable will greatly speed up your ability to find issues and fix them. Today at Flytrex we are using logz.io, but there are quite a few other effective solutions.

## Bonus section

• If your project involves two or more people, decide on a logging policy explicitly.
• There’s a big difference between logging in libraries, tools that run once, or long-running programs. Each one has different needs.
• For cases when your logs are not perfect (and they never are), a tool such as rookout is very useful. It allows you to set a “logging breakpoint” anywhere in your code – without redeploying it. This already saved me hours of debugging.

Photo credit: Wood photo created by onlyyouqj – www.freepik.com

Categories

## QA by Child

I recently published a home project I was working on, an app to teach children to read Hebrew. I wrote it originally to help my son learn to read Hebrew.

In an early version my son was very excited to play it. He quickly understood the principle – see a word, then tap one of four emojis this word describes. Every time you tap a right answer, you get a few more points, which in that early version were displayed prominently at the top of the screen.

It took him less than five minutes to find a “cheat” – if you tap the right answer very quickly many times – you get points for every time you tap it, as long as the “correct answer” animation is running and the word is not changed.

It reminds that a few years back I was working on Desti and when I gave that same kid an early version of our iPad app – he broke it in less than 30 seconds just by moving his hand on the screen and touching everything at once.

Generally, if you have a GUI, one of the ways to find issues is to let a kid hack at it. One reason is that GUIs have the curious property that changes take non-zero time, and usually buttons are not disabled once they are initially tapped. As a result – you sometimes get the effect multiple times – which can result in extra score, multiple transitions, repeated actions on now incorrect state, and so on. In the extreme case this can lead to resource exhaustion very quickly and your app crashing. I’ve seen that happen to my app!

What else do you get by giving your app to a child? You can see very quickly if your UI and UX are clear and easy to understand. If you need to explain what needs to be done – it’s not good enough. That’s true in general – and doubly so for a kid. If your kid gets it on their own – you did something right.

More deeply than UX- you can learn if your gamification works. Is your app/game engaging enough? Does it invite gameplay? Does the meta-game encourage repeated plays? It took me a lot of thought to get my reading app to work well – and it’s far from done.

Categories

## LearnLang – a small chrome extension for learning the German cases

I’ve been learning German for quite some time now. Some months ago, it came to the point where I was stuck – in order to progress I had to learn the German cases by heart.

It’s not a lot of data, and being able to understand it is relatively straightforward, however knowing it actively as part of a language takes practice.

My main sources of German practice are Duolingo, books and music. Both books and music contribute to passive knowledge rather than practice, and Duolingo just wasn’t focused enough. I decided to write something myself. It was a small itch I had to scratch!

Ideally, I just wanted exercises that given a sentence, I would have to pick the correct form of der/das/die/den/dem/des whenever it appeared. This should apply to ein/eine/eines/einer/einem/einen, dein/deine/… and mein/meine/… etc. you get the point.

To achieve that, I wrote a small chrome extension that would process a page, find all the pieces of texts to replace, and add a bit of dropdown html instead of them. Then you would pick the right option in the dropdown – it would turn into the right word with a green checkmark, otherwise you would get some toaster message saying you were wrong.

Since these days I have a full time job plus two kids – I wrote this mostly during train rides and a couple of evenings. Doing this allowed me to lean how to write a chrome extension (it’s really easy), but interestingly enough, there is a small challenge there I didn’t expect: how to regex-search through text nodes in a given HTML document and to replace the match with some HTML? The solution is apparently non-trivial.

If you decide to take the old text, add some tags and then old_tag.innerHTML = modify(text_data) you are in for a nasty surprise. If that text_data contained html tags as text – they would now be parsed as HTML. This is at best a bug, and at worst a security risk. It would appear to work, except when it won’t. Unfortunately, a lot of answers on stackoverflow suggest you do exactly that.

Well, as a lazy developer – I used somebody else’s answer, almost as is. It wasn’t even the selected answer – the selected answer used innerHTML :(

Here is the extension itself, you are welcome to try it out, e.g. on RotkÃ¤ppchen (AKA “Little Red Riding Hood”).

Categories

## Writing a pandemic simulation

Over the last weekend I felt like programming something fun and easy, so I thought, why not write yet another pandemic/epidemic simulation.

So between helping a crying child and preparing lunch, I created simpandemic. It’s small, simplistic, but easy to play with and change parameters. As a toy project, it’s far from perfect. I implemented infection based on distance rather than collision detection, like some other simulations do, and optimized it using a grid and not a tree structure (e.g. rtree). However, it works, it is playable and very much tweak-able.

Right now it depends on pygame, which is great fun, but a bit of a pain to get it working on mac these days.

Feel free to download it, fork it, play with it, whatever. I’ll accept fun pull requests in case these actually come.

Stay healthy, stay safe, stay home!

Categories

## How I learned to stop worrying and actually use StackOverflow

So apparently almost all of the developers in the world are using stackoverflow. However many developers just use StackOverflow to lookup answers, and rarely to ask their own questions. Answering other people’s questions is of course rarer still.

Up until recently I was the same: I wrote a few questions in StackOverflow, and even answered a few, but by and large I was using it to find existing answers.

This week something changed, something broke. In a way, I stopped caring. I had a problem, I didn’t find a solution fast enough, and decided, “what the heck, the solution is not obvious, I’ll just write a question”. Also, if the solution is obvious to someone else – that’s even better, I’ll learn something.

And so I asked my most recent questions, about distances between 2D segments, projections, etc. I’ll cover this subject in depth in a future blog post, as this one is about StackOverflow.

Writing a question on StackOverflow has a few advantages over not writing it. The most obvious one: you might actually get an answer! Here is a good example, my most recent question. The less obvious is that you get to put down your question in writing which just like in rubber duck debugging and that would help you with solving this problem, and practice the skill of asking the right questions.

Also important to mention – you have nothing to lose but a little bit of time. As long as your question is real and you are not clueless, asking a question will not reflect badly on you in any way, quite the opposite.

What actually surprised me is the gamification of StackOverflow – you get points for participating. I already knew about it, but I was surprised at how effective it is. Here is where I am at the time of writing this post:

Participating on SO is surprisingly addictive, and as a close friend told me there are additional advantages: once your reputation is high enough – you start getting job offers, and you can actually use that on your resume/CV (if using them is a thing you do :)

My advice to any developer reading this: you are already looking up answers on StackOverflow. If you don’t find an answer, don’t just move on. Before you do – write a question. Even if you do move on, you’ll get something valuable from it.

Categories

## Back to writing

So apparently my last blog post was from 2012. That’s quite a bit of time.

Since then I’ve:

• Sold my startup Desti to HERE
• Moved with my family to Boston
• Moved back to Israel, join Cymmetria, first as VP R&D and later as CTO
• Left Cymmetria and joined Flytrex as VP R&D

It’s not a long list, but it covers a lot of ground. Right now, Corona virus notwithstanding, I’m pretty excited about the work we do at Flytrex: we’re building a system for food delivery via autonomous drones.

Here is a short video that shows what we’re working on:

The video is by now 11 months old and the system changed a lot since then, and our main challenge right now is getting this system working in the USA.

Learning from my experience, I want to start writing regularly. To achieve that, while I will write mostly about programming, I will also write about other areas of interest. Let’s see where this new adventure takes us. Onwards!

Categories

## Starting from Scratch – Part 1

About a month ago, someone I met asked me if I could give him C programming lessons. To protect his privacy I won’t talk about him much, besides saying that he is a serious guy and he doesn’t know programming at all.

The first thing I did was give him a general explanation of what C does, the compilation process and so on. I then showed him a simple program, and showed him how to compile it, and how to run it.
I think the first thing beginners see when they start to program with C is all the “unnecessary voodoo”. For someone just starting to program, “#include <stdio.h>” doesn’t make much sense, and he shouldn’t be bothered with it. The same applies to “int main(void)”. So I just told him to ignore that for now, it would be made clear later on.

The next thing I did was give him a C book, and telling him what to read, and more importantly, what not to read. Teaching books suffer from a kind of “split personality” problem – they try to be both a tutorial and a reference. They also cover things fully the first time around. For example, consider variable types.
A first timer doesn’t need to know about anything besides int. He really doesn’t need to know about size limitations. He shouldn’t be bothered by these issues in the first few lessons.

As I see it, the order of things to learn is:

• General information such as how to compile, etc…
• Meaning of variables, and variable assignments.
• if-else, and then do-while.

So we covered these subjects in the first two lessons, and he was able to write some simple programs, such as compute the average of two numbers, and so on.

We wrote a simple ‘guess the number’ game together, and I used this opportunity to explain if-else by writing conditions for cases such as “warm” or “cold”, “too low” and “too high”. Simple while loops were explained as well: “keep playing as long as the number was not guessed”. We wrote some more programs after that.

As homework, I gave him the following problem:

Write a program to compute an average of numbers.
It will read numbers from the user, until the user gives the number -1.
After it finished reading the numbers, it will print their sum, and their average.
If the average is in the range, print: 0-55: failed, 56-72: could be better, 73-85: ok, 85-95: good, 95-100: excellent.

I knew this will be problematic, and indeed it was. He wrote a program that asked for three grades, and averaged them. It went on asking again until the grades 100, 100, 100 were entered – he thought that was nicer than -1.
Well, a few days ago we sat together, and I understood the problem:

He knew how to write C code, but he didn’t know how to program!

I simplified the problem. I asked him to write (during the lesson) a program that reads numbers from the user until a zero is entered. It will then output how many ones were encountered. He couldn’t write that as well, and that’s no surprise – he had a big concept to understand.

So I told him, let’s play this together. I will tell you numbers, and when I finish, you will tell me how many ones were there. I gave him ten numbers and I noticed he was counting on his fingers!

Grasping this opportunity, I told him to explain exactly what he does. Then I asked him to give me instructions on how to count numbers. The instructions were “read a number onto your left hand”, and “if the number on your left hand is one, set the number on your right hand to be the number on your right hand plus one”. It took some effort for him to understand this, but he did. I then asked him to do the same, but now summing the all of the numbers with the right hand.

I then showed him how I translate these instructions into C. I asked him for the definition of an average, and then, looking at the code on the screen, realization dawned on him. He was amazed – he managed to calculate the average of an unknown number of numbers!

It was amazing for me as well – it’s not often that you get to see someone understand a big concept such as programming.

After the averaging program I asked him to write a counter program – given a number, print all the numbers from zero up to this number. When he got stuck, I asked to turn off the screen, and again give me instructions on how to do the task. That he was able to do. Translating these instructions to C wasn’t that hard, and some 30 minutes later, he had a working program. Success!

Categories

## Fractal Memory Usage and a Big Number

In a previous post I said I’d talk about 4**(4**(4**4)).
First, about the number. I first saw it mentioned in a math lesson back in high-school, when the teacher wanted to demonstrate estimation abilities.
He wrote it on the board, like so:

4444

and then asked students to estimate how big that number is. Turns out this number is much bigger than most students thought. (For comparison, the number of atoms in the observable universe is 1080.)

Given that this number is so big, it should come as no surprise that computing it will consume all available memory.
What is interesting is the way the memory is consumed:

(It died with a MemoryError just before the next “jump”.)

When I first saw this I was fascinated. To those interested it is generated by the long pow function in Python, implemented in longobject.c. According to the comment there, the algorithm is a left to right 5-ary exponentiation, taken from “Handbook of Applied Cryptography”.
In short, the simpler algorithm “left-to-right binary exponentiation” (used for smaller exponents) repeatedly squares the accumulator, and according to the binary digits of the exponents, multiplies it by the base. 5-ary exponentiation basically does the same thing, but in each iteration it ‘eats’ five digits of the exponent instead of just one.

It might be very interesting to study algorithms and code according to their memory usage. Once at work I used a similar memory usage graph to optimize the memory usage of some algorithm that was eating too much memory. I don’t remember the details as it happened a few years ago, but I do remember that I recognized the “type of the memory eating”, which helped me to narrow the problem down to a specific area of code, where indeed the problem was located. (I consider this a case of programming-fu :)

Notes:
1) The reason I’m talking about Python is because it has arbitrarily sized longs built-in.
2) This graph was generated using Windows’ task manager. On my Ubuntu machine it looks quite different, almost a straight line. Any guesses why?