Categories: Algorithms

Validating Flight Networks for Drones – part 2

In part 1 I described how we validate flight networks at Flytrex to make sure that no two nodes of a flight network are too close. Now let’s turn our attention to validating that no two edges are too close.

First, how does one define “edges being too close”? What is the distance between edges?

Here’s a good definition – the distance between edges e1 and e2 is

D(e1, e2) = min { D(p1, p2) : p1 ∈ e1, p2 ∈ e2 }

That is, the minimum over all distances between pairs of points taken from e1 and e2 respectively.

Given that definition – if two edges cross, then of course the distance between them is zero. This is useful – two edges crossing is a more extreme case of two edges being too close to one another.
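
To illustrate the definition, here is a small sketch (not our production code) using the shapely library, which computes exactly this distance for plain Euclidean coordinates:

from shapely.geometry import LineString

e1 = LineString([(0, 0), (10, 10)])
e2 = LineString([(0, 10), (10, 0)])   # crosses e1 at (5, 5)
e3 = LineString([(0, 12), (10, 12)])  # passes above e1

print(e1.distance(e2))  # 0.0 - crossing edges are at distance zero
print(e1.distance(e3))  # 2.0 - the closest points are (10, 10) and (10, 12)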

So how can we implement this? Unfortunately, the “closest pair” algorithm we used for vertices in the previous post is not good enough here. I was unsure what to do, so I did what all lazy programmers do – I went to Stack Overflow. Note that just looking for an existing question-and-answer pair was not good enough – I didn’t find a solution. As I wrote previously, it’s better not to be shy and to actually ask for help.

What did I learn? I saw a suggestion to use Rtree. What is that?
Rtree is a wrapper around libspatialindex, which is a library that provides you with, well, a spatial index. A spatial index is a data structure that indexes objects according to their position in space. Read here for some theoretical details.

[Image: Visualization of an R*-tree for 3D points using ELKI (the cubes are directory pages). Taken from the R-tree page on Wikipedia.]
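
Basic usage of Rtree looks roughly like this (a minimal sketch – the ids and bounding boxes here are made up):

from rtree import index

idx = index.Index()
# Insert entries by id, each with a bounding box (min_x, min_y, max_x, max_y)
idx.insert(0, (0.0, 0.0, 1.0, 1.0))
idx.insert(1, (2.0, 2.0, 3.0, 3.0))

# Ids of the 2 entries whose boxes are nearest to the query box
print(list(idx.nearest((0.4, 0.4, 0.6, 0.6), 2)))  # [0, 1]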

So our approach to solving the problem is relatively straightforward. For each edge in our graph, we add it to our spatial index. Then, for each edge, we look up the closest edges to it, and make sure that the distance between our edge and each of those edges is still more than our minimal allowed distance (MIN_EDGE_DISTANCE_M in the code below).

How many edges should we retrieve? Well, that depends. If two edges share a vertex then the distance between them is zero, but we do not want to alert on edges that share a vertex. So we need to get the N closest edges such that N is greater than the number of edges neighboring our edge. For some simple graphs that is a small number, but in the general case it might include all of the edges (e.g. a star-shaped graph with a single vertex that all edges share).

Here is some sample code:

def _prepare(self):
    for idx, edge in enumerate(self.edges):
        self.node_to_edges[edge.from_vertex_id].add(idx)
        self.node_to_edges[edge.to_vertex_id].add(idx)
        self.tree.add(idx, self._get_bounds(idx))

def validate(self):
    for idx, edge in enumerate(self.edges):
        neighbor_edges = (
            self.node_to_edges[edge.from_vertex_id] |
            self.node_to_edges[edge.to_vertex_id]) - {idx}
        # The +10 is because we want a few edges that are not
        # our neighbors. (The distance to our neighbors will always
        # be 0 and we should ignore them.)
        nearest = self.tree.nearest(
            self._get_bounds(idx), len(neighbor_edges) + 10)
        for nearest_idx in nearest:
            if nearest_idx in neighbor_edges or nearest_idx <= idx:
                continue
            dist = self._get_distance(idx, nearest_idx)
            if dist < MIN_EDGE_DISTANCE_M:
                p1 = self.id_to_node[edge.from_vertex_id]
                p2 = self.id_to_node[edge.to_vertex_id]
                edge2 = self.edges[nearest_idx]
                q1 = self.id_to_node[edge2.from_vertex_id]
                q2 = self.id_to_node[edge2.to_vertex_id]
                raise ValidationError(
                    f"The edges {p1.name} -> {p2.name} "
                    f"and {q1.name} -> {q2.name} "
                    f"are too close to one another. "
                    f"Distance is {dist:.1f}m but should be "
                    f"at least {MIN_EDGE_DISTANCE_M}m")

This code worked well. It manages to find issues in flight networks, and it does so reasonably fast.

For myself, I’m happy to have added another tool to my arsenal – Rtree.

Categories: Algorithms

Validating Flight Networks for Drones – part 1

[Image: A Flytrex drone lowering a package.]

At Flytrex, in order to ensure drones fly only where allowed, and do not collide with each other, they are allowed to fly only in preplanned paths. However, since our goal is to deliver food to backyards, we can’t have a predefined path from our delivery center to each backyard – that would be wasteful. Instead, we created a flight network, which is a graph similar to a railway map, only instead of trains we have drones flying on it. This way, when a drone is required to fly to a certain delivery point, a path is calculated on the flight network graph.

(Interesting note: while we could theoretically support a 3D flight network with routes above each other, we decided for simplicity’s sake to only allow planar flight networks for now.)

In order to support multiple drones flying at the same time, we also support semaphores and locking, but I’ll cover that subject at another time.

It is critical to verify that a flight network is correct and will not cause problems. We need to make sure that no two edges in the graph cross, otherwise the two drones flying through them might collide. Additionally, no two waypoints (nodes in the graph) may be too close to each other – for the same reason.

How do we do that?

First problem – for every two nodes n1 and n2, we need to make sure that distance(n1, n2) >= MIN_DISTANCE.

Second problem – for every two edges e1 and e2 that don’t share a node, we need to make sure that distance(e1, e2) >= MIN_DISTANCE. Of course, if they intersect then the distance is 0.

The simplest way to solve this is using brute force – go over every possible pair of nodes, and every possible pair of edges. This approach, however, is too slow – it’s the classic quadratic complexity. Can we do better?
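
For reference, the brute-force node check would look something like this sketch (quadratic in the number of nodes):

from itertools import combinations

def validate_nodes_brute_force(nodes, min_distance, distance):
    # O(n^2) - checks every possible pair of nodes
    for n1, n2 in combinations(nodes, 2):
        if distance(n1, n2) < min_distance:
            raise ValueError(f"{n1} and {n2} are too close")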

For nodes, it is relatively easy: find the closest pair of nodes, n1 and n2. If distance(n1, n2) < MIN_DISTANCE return an error; otherwise, the flight network is OK. How quickly can we implement closest-pair() for nodes? Apparently in O(n log n), see this Wikipedia article and implementation guide.

We still have a problem though – both implementations assume Euclidean distance, and we need this to work with, e.g., geodesic distance.

Here, this can be solved using one of two approaches:

  1. Project our nodes n1, n2 onto a Euclidean plane, such that P(n1) = p1 and P(n2) = p2, and Geodesic-distance(n1, n2) ~= Euclidean-distance(p1, p2).
  2. Implement our closest-pair algorithm in a way that will work well enough over the earth’s surface.

(Note: happily enough, we can assume a maximum radius of a few kilometers for our flight network, otherwise this would have been much more complicated)

I first tried to go with option (1). However, how do you implement this projection correctly? I thought I’d write a naive projection myself, using Wikipedia as a guide. Unfortunately, this did not pan out either. I took a sample of a few points, calculated the Euclidean and geodesic distances between them, and compared. I got errors of 0%–15%.

15% error is way way too high.

Well, I don’t know, let’s get some help. Unfortunately, I wasn’t able to get this solution working using pyproj, and after some time spent on it I gave up on that direction. I decided to go back and try to reimplement closest-pair in a way that would work for geodesic distances.

That worked!

To achieve that, I needed to replace all distance calculations with geodesic distance, and to be able to go from a distance to a latitude delta. Using the same calculation from before, that is not too complicated. Also, we can make this support points with altitude without much effort.

Let’s write the function definition for closest-pair in a generic typed way:

from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")

def find_closest_pair(
        points: List[P],
        x_getter: Callable[[P], float],
        y_getter: Callable[[P], float],
        distance: Callable[[P, P], float],
        distance_to_x_delta: Callable[[float, P], float]) -> Tuple[P, P, float]:
    """Find the two points p1, p2 in points with the shortest distance between them.

    Parameters:
        points: a list of points from which we would like to find the closest pair
        x_getter: a function that returns the x coordinate of a point
        y_getter: a function that returns the y coordinate of a point
        distance: a function that returns the distance between two points
        distance_to_x_delta: a function that, given a distance and a point,
            returns the difference in the x coordinate needed to get a
            new point that is the given distance away

    Returns:
        The pair of closest points and the distance between them
    """

In Part 2 I will write about validating the distance between all edges.

Categories: Algorithms, Humour, Programming, Python

Rhyme and Reason with Python

After reading xkcd’s blog, I went to LimerickDB.com. It looks and works just like bash.org, but for limericks. Quite amused with some of the limericks available, I wanted to try and compose my own.
The first one came out fine, but I got stuck without a rhyme for the second one:

a man named guido once thought
he’d invent a language well wrought
indentation as tabs

Uhm, no luck there… “tabs” does not rhyme well. Then an idea came… I could create a rhyme index!
How to do it?
First, I took my already written word index. Last time I used it was to solve a substitution cipher for my crypto homework.
Then, I looked up the phonemes for each word, using espeak – a text-to-speech program:

lorg@kild:~$ espeak -q -x hello
 h@l'oU

After doing so, I collected words that had similar phoneme endings, and there it was, a rhyme index:

In [84]: words.get_word_rhymes("tabs", word_index, rhyme_index)[:10]
Out[84]:
['crabs', 'stabs', 'macnabs',
 'dabs', 'cabs', 'scabs',
 'slabs', 'arabs', 'snobs',
 'thebes']

Note that better rhymes are listed first – according to the length of the match.
To create this list I didn’t use stress information, although with some more work it may be done.
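
Here is a rough sketch of how such an index can be built (this is not the original script – the helper names are made up, and the real matching logic is more careful):

import subprocess
from collections import defaultdict

def get_phonemes(word):
    # Ask espeak for the phonemes (-q: no sound, -x: print phonemes)
    result = subprocess.run(["espeak", "-q", "-x", word],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def build_rhyme_index(words, max_suffix=4):
    # Map every phoneme suffix to the words that end with it;
    # the longer the shared suffix, the better the rhyme.
    index = defaultdict(set)
    for word in words:
        phonemes = get_phonemes(word)
        for n in range(1, min(max_suffix, len(phonemes)) + 1):
            index[phonemes[-n:]].add(word)
    return index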

Back to our limerick, we can see that “tabs” does not have really good rhymes. Better luck with indentation maybe:

In [86]: words.get_word_rhymes("indentation", word_index, rhyme_index)[:20]
Out[86]:
['kishion', 'alleviation', 'expostulation',
 'machination', 'syrophenician', 'proposition',
 'infatuation', 'computation', 'mansion',
 'meditation', 'aggression', 'anticipation',
 'clarification', 'logician', 'gratulation',
 'discretion', 'quotation', 'presentation',
 'dissolution', 'deprecation']

And we get (not the best limerick I admit):

a man named guido once thought
he’d invent a language well wrought
with space indentation
and good presentation
we all got the language we sought

My original ending line was “it left all the perl hackers distraught”… but it ain’t true, so I gave it up. The ‘distraught’ was found with my script as well.
If you want to look for your own rhymes, then you can use my hacked-up script, although it isn’t written well enough for my taste. (For example, don’t expect to see documentation :)
I’ll upload a better version once I refactor it.
To save you the trouble of generating the word index and the rhyme index from scratch, I used cPickle to store them. You can use them for lookups regardless of the script.

Categories: Algorithms, Geometry, Graphics, Math, Programming, Python

Elegant Line Clipping Algorithm

I actually wrote about this a long time ago, in the wiki that once was algorithm.co.il. The wiki is long gone, and since I figure this is a very elegant algorithm, I decided I should write about it here.

First, some background – while writing Ascii-Plotter, I had to plot lines. To plot lines correctly I had to clip them. Now, since this was a fun project, I decided I wanted to figure out how to do that myself, even though the problem has long been solved. So I sat down with pen and paper, did a little thinking, and came up with this nice algorithm.

The idea is simply this: consider a line that is not parallel to either of the axes. This line intersects all four of the lines that make up the limits of our screen. Here is a diagram:

[Image: An intersection of a line with a bounding box.]

Now notice that there are six points of interest: the four intersection points, and the two endpoints of the line. For each point, we find its parameter in the parametric representation of the line, where a and b are the endpoints:

x(t) = a+t(b-a)

To find out the actual endpoints of the clipped line, we sort the six points according to their parameter, and take the two innermost points (with zero based indexing, those would be 2 and 3). What if the line is outside the box you say? We check that the result points’ parameters are between 0 and 1 (which correspond to the original endpoints).

Note that for a line parallel to one of the axes, a similar method applies, but without two of the intersection points. However, it is probably easier to just calculate the clipping directly in that case.

Here is some sample code, without all the surrounding checks (for parallelism with the axes, trivial clipping etc…).
The full version is available in the code of Ascii-Plotter.

def clip_line(line_pt_1, line_pt_2, rect_bottom_left, rect_top_right):
    # The parameters of the line's own endpoints: x(0) = a, x(1) = b
    points_t = [0.0, 1.0]

    # The parameters of the four intersections with the box's border lines
    dx = float(line_pt_2[0] - line_pt_1[0])
    dy = float(line_pt_2[1] - line_pt_1[1])
    points_t.append((rect_bottom_left[0] - line_pt_1[0]) / dx)
    points_t.append((rect_top_right[0] - line_pt_1[0]) / dx)
    points_t.append((rect_bottom_left[1] - line_pt_1[1]) / dy)
    points_t.append((rect_top_right[1] - line_pt_1[1]) / dy)

    # The clipped segment lies between the two innermost parameters;
    # if those fall outside [0, 1], the segment is outside the box.
    points_t.sort()
    if points_t[2] < 0 or points_t[3] > 1:
        return None
    result = [pt_1 + t * (pt_2 - pt_1)
              for t in (points_t[2], points_t[3])
              for (pt_1, pt_2) in zip(line_pt_1, line_pt_2)]
    return (result[0], result[1]), (result[2], result[3])
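
For example, a quick sanity check of the sketch above:

# Clip the segment (-5, -5) -> (15, 15) against the box (0, 0)..(10, 10)
print(clip_line((-5.0, -5.0), (15.0, 15.0), (0.0, 0.0), (10.0, 10.0)))
# ((0.0, 0.0), (10.0, 10.0))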
Categories: Programming, Python, Utility Functions

Simple word lookup for crosswords and such…

I am swamped with university homework, but I did manage to come up with a small, somewhat useful utility.

Do you know when you need a word that ends with a double letter and then an ‘n’? You would want to write it like this: “xxN$”. Well, I wrote a simple tool that translates a ‘word lookup syntax’ into regular expressions, and then does a lookup for the word.

This raises two questions:

What exactly is the syntax?
Small letters stand for any letter, and big letters stand for their specific letters. So for example, “bEEn” would match “been”, “seen” and “keep”, but not “boon”. Punctuation is left unchanged, so you can use $, ^, (?!xy) and so on. I didn’t work too much on it, so [] doesn’t translate well (the letters inside it will come out garbled).
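
The translation can be sketched roughly like this (a simplified version – as the compiled patterns below show, the real script also excludes the specific big letters in its lookaheads):

import re

def pattern_to_regex(pattern):
    # Small letters become named wildcard groups; a repeated small letter
    # reuses its group, and distinct small letters must match different
    # characters. Big letters and punctuation pass through unchanged.
    out = []
    seen = []
    for ch in pattern:
        if ch.islower():
            if ch in seen:
                out.append(f"(?P={ch})")
            else:
                if seen:
                    out.append("(?!%s)" % "|".join(f"(?P={c})" for c in seen))
                out.append(f"(?P<{ch}>.)")
                seen.append(ch)
        else:
            out.append(ch)
    return "".join(out)

print(pattern_to_regex("^bEEn$"))  # ^(?P<b>.)EE(?!(?P=b))(?P<n>.)$
print(bool(re.match(pattern_to_regex("^bEEn$"), "SEEN")))  # True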

What is the data for the lookup?
Well, I used the excellent Natural Language Toolkit (obviously in Python). It is very easy to start using, and it has a nice tutorial. It’s nice knowing that such a tool is available. The source data is actually an extract from the Gutenberg project. The result words are sorted according to frequency in the source data, and the 20 most frequent words are returned.

Without further ado, here are some code snippets. (The compiled regexp is printed. Note how ugly (yet simple) it is :)
(Note: I did cut out some of the output…)

In [1]: fd = words.create_word_index()
 
In [2]: words.find_words("^bEEn$",fd)
^(?P<b>.)EE(?!((?P=b))|(E))(?P<n>.)$
Out[2]:
['BEEN',
 'SEEN',
 'KEEP',
 'FEET',
 'FEEL',
 'SEEK',
 'SEED',
 'MEET',
 'DEEP',
 'NEED',
 'SEEM',
 ...]
 
In [3]: words.find_words("^jvfpf$",fd)
^(?P<j>.)(?!((?P=j)))(?P<v>.)(?!((?P=j))|((?P=v)))(?P<f>.)(?!((?P=j))|((?P=f))|((?P=v)))(?P<p>.)(?P=f)$
Out[3]:
['THERE',
 'THESE',
 'WHERE',
 'JESUS',
 'MOSES',
 'HOSTS',
 'LINEN',
 'PIECE',
 'SCENE',
 'WHOSO',
 'POSTS',
 'EPHAH',
 ...]

Note that if you find some of the result words a bit too uncommon for your taste (e.g. Ephah), I would suggest removing the bible from the source data… :)

The (very ugly, uncommented, draft code) is available here. I apologize for its unseemliness, but I figured that it’s better to make it available in its current form, rather than never put it online for lack of time and will. It could also be seen that the code isn’t terribly efficient. It is however, ‘good enough’ for interactive work.

Categories: Algorithms, computer science, Math, Programming, Python

Pagerank recommends movies

So how was your Yom Kippur? After Yom Kippur ended, I sat down to write something that was nagging me for quite some time. I wanted to see what pagerank had to say about movies. I’ve always liked to picture things as graphs, and looking at movies and actors as nodes in a graph is very intuitive. I also think that the idea has some merit. I like most of the movies that Edward Norton appears in, so it makes sense to give him a high score, and see which other movies Edward Norton “recommends”.

So I fired up my IDE, and started to work.

Well, at first I was a little bit disappointed, because IMDbPY doesn’t have code to read the IMDB charts. I was going to extend it a little bit, but then I remembered that I had already downloaded all the data I needed some time ago (more than half a year, I believe), and that was good enough. So I opened up some old pickle file I had lying around, and voila! After some playing around with the format, I had a weighted graph at my disposal. I weighted the graph according to the IMDB listing – if an actor appeared higher in the cast, his (or her) weight to and from that movie would be higher.

I implemented a quick and dirty pagerank, and then added some code to score my recommendations higher. I did that by adding, at each iteration, my recommendations’ weight to the movie’s score.
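
In the standard formulation, this trick is personalized PageRank – a teleport vector biased towards the recommended nodes. A minimal sketch (not the original code, and the matrix/vector setup here is an assumption):

import numpy as np

def personalized_pagerank(transition, preference, damping=0.85, iterations=50):
    # transition: column-stochastic numpy matrix over actors and movies,
    # where transition[i, j] is the weight of edge j -> i (cast-order scaled).
    # preference: numpy teleport distribution, boosted for recommended nodes.
    rank = np.full(len(preference), 1.0 / len(preference))
    for _ in range(iterations):
        rank = damping * (transition @ rank) + (1 - damping) * preference
    return rank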

Without recommendations, the top spots were:

  • 24
  • The Sopranos
  • The Wizard of Oz
  • JFK
  • Cidade de Deus
  • Citizen Kane
  • King Kong
  • Voyna i mir
  • Dolce vita, La
  • The Patriot

After adding “Edward Norton” and “Tom Hanks” as recommendations, I got the following results:

  • American History X
  • Fight Club
  • 24
  • Red Dragon
  • The Sopranos
  • Catch Me If You Can
  • The Wizard of Oz
  • JFK
  • Forrest Gump
  • Cidade de Deus

Well, I was surprised a little to see 24 and the Sopranos rank so high, and also a little bit disappointed. The recommendations didn’t really add enough information, and more work will have to be done for this to work properly. It has the right ‘smell’ however. It looks like it has potential to be useful – I can just see the program, having a small table with a ‘seen/unseen’ checkbox, and a place to write your favorites, and the program will tell which movies you should watch next. This reminds me a little bit of the recommendations on soulseek and Amarok. It would be really fun to write a total recommendation engine, with the best of all worlds. Someone probably already implemented it.

In any case, I’ll work on it some more. This is interesting.

If anyone wants the code, tell me, and I’ll make it ‘online worthy’…

Categories: Algorithms, Programming, Python, Sound

How PyTuner works

PyTuner is really quite simple. Here is the outline of the algorithm behind it:

  1. Record audio data with pymedia.
  2. Compute the FFT of the audio data – now we have the strength of each frequency in the sample.
  3. Find peaks in the result of the FFT. This means looking for the dominant frequencies.
  4. Find the lowest frequency that has a peak besides the zero frequency.
  5. Find the nearest ‘desired note’ (the notes we should tune to) to that frequency.
  6. Mark the difference between the actual frequency and the desired note frequency.

There is more to the algorithm. First, to find peaks, I used a simple method – I computed the average strength of the frequencies in the sample, and looked at all the frequencies whose strength was more than two standard deviations above that average. This yields an array where each frequency is marked by 0 or 1. (Instead of 1 you can mark it with its original strength.) For each consecutive run of frequencies marked 1, I computed the average index of the run, and that was the peak location. So for example, if the data was:

0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0

the peak would be at the frequency matching index 5.
(If you use the original strengths instead of 0 and 1, you can apply a weighted average to get a more accurate result.)
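
A sketch of this peak finding (not PyTuner’s actual code) might look like:

import numpy as np

def find_peaks(spectrum):
    # spectrum: numpy array of per-frequency strengths (e.g. abs of the FFT).
    # Mark frequencies more than two standard deviations above the mean,
    # then return the average index of each consecutive marked run.
    above = spectrum > spectrum.mean() + 2 * spectrum.std()
    peaks, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append((start + i - 1) / 2.0)  # average index of the run
            start = None
    if start is not None:
        peaks.append((start + len(above) - 1) / 2.0)
    return peaks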

To find the actual note of a given frequency, I used the fact that notes (half-tones) are separated by a ratio of 2^(1/12) (the twelfth root of 2). So, if our base note is A at 110 Hz, we could compute the note with the following code snippet:

import math

note_names = "A A# B C C# D D# E F F# G G#".split()
# How many half-tones above the base note, rounded to the nearest note
note_index = int(round(math.log(note_freq / base_freq, 2 ** (1.0 / 12))))
note_name = note_names[note_index % 12]

While writing PyTuner, at first I implemented the FFT on my own, just to know how to do it once. I found it helped me to better understand the nature of the algorithm, and its output. I find the practice of implementing various algorithms on your own, for educational purposes, quite healthy. I always like to know how things work ‘behind the scenes’.

Categories: Algorithms, Programming, Philosophy, Python

Proofreading and what's wrong with the heapq module

Consider the following problem:

You have a large book you would like to proofread, with many chapters (100+) and a few men (4) at your disposal. How would you distribute the chapters among the men, considering that each proofreader must get whole chapters?

Well, the obvious solution is to just divide the number of chapters by the number of men, and give each proofreader an appropriate number of (randomly picked?) chapters. But what if the chapters are of varying length? You could still distribute them randomly, but that just doesn’t feel right, now does it?

I was asked by a friend to help him write a script to solve this, and quite quickly I came up with the solution:

Sort the chapters according to length (number of words), in descending order. Keep the proofreaders in a sorted order. While there are still chapters left – get the next longest chapter, and give it to the proofreader with the least total length. Rearrange the proofreaders.

This algorithm is a greedy algorithm and is also regarded as the straightforward way to solve this problem.

Well, this seems simple enough. In the case where there are more than a few proofreaders, the obvious solution is to use a minimum heap. Using a minimum heap in Python should be easy – just import the heapq module, use heapify and the various other functions, just as you would use sort, and you are home free.

Not quite…

It turns out that using the heapq module isn’t as easy as it seems, or at least not as quick and simple as I would like it to be. While you can sort a list using your own compare function, or by providing a key function, the heapq module sorts items using Python’s standard operators. This means that to use heapq properly to order objects according to some property, you have to write an object wrapper that defines __lt__ and similar methods. All this coding could be saved if heapq had a key argument, or any other method of controlling the comparisons.
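
For the record, here is a sketch of the greedy distribution using heapq (not the script we actually used – it sidesteps the comparison problem by keeping (total_length, reader) tuples in the heap, which compare element-wise):

import heapq

def distribute_chapters(chapter_lengths, num_readers):
    # Longest chapter first, always assigned to the least-loaded proofreader
    heap = [(0, reader) for reader in range(num_readers)]
    heapq.heapify(heap)
    assignments = {reader: [] for reader in range(num_readers)}
    chapters = sorted(enumerate(chapter_lengths),
                      key=lambda pair: pair[1], reverse=True)
    for chapter, length in chapters:
        total, reader = heapq.heappop(heap)
        assignments[reader].append(chapter)
        heapq.heappush(heap, (total + length, reader))
    return assignments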

And what about the proofreading you ask? Well, it worked. It divided the chapters quite evenly, although we did end up using sort() repeatedly, as the number of proofreaders was small and we did not want to overly complicate matters.

This again emphasizes something I’ve come to discover – being able to write something quickly is usually more important than being able to run it quickly later. If you write something in an environment that allows for quick coding, then if speed is later required, you’ll be able to optimize it quickly – or even change the design or the basic algorithm.