<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Simple Race-Condition</title>
	<atom:link href="http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/</link>
	<description>Algorithms, for the heck of it</description>
	<lastBuildDate>Tue, 21 Jun 2011 21:07:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: ObiWan</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-194</link>
		<dc:creator>ObiWan</dc:creator>
		<pubDate>Thu, 03 Sep 2009 16:59:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-194</guid>
		<description>I&#039;m quite late on this, but I think there&#039;s a consideration you didn&#039;t give any information about: the cost, in computation time and network traffic, of a cache fetch, a record lock and a record insert.

Your solution (that is, ignoring the error) may be OK, but the duplicate &quot;insert&quot; attempt can cause unnecessary network traffic; if that traffic is a problem, a lock may be worth considering.

Another option is a shared queue managed by a separate process that performs the dequeue/insert work in the background; something like:

result = cache.select(input)
if result:
    return result
result = enqueue(input)

So the code just checks whether an entry is already cached. If not, it passes the input value to the queue handler, which calls the &quot;compute&quot; function, stores the result in the cache, enqueues the element and returns the result. Later, in the background, the queue-handling process dequeues the elements one at a time in FIFO order and inserts them into the database.</description>
		<content:encoded><![CDATA[<p>I&#8217;m quite late on this, but I think there&#8217;s a consideration you didn&#8217;t give any information about: the cost, in computation time and network traffic, of a cache fetch, a record lock and a record insert.</p>
<p>Your solution (that is, ignoring the error) may be OK, but the duplicate &#8220;insert&#8221; attempt can cause unnecessary network traffic; if that traffic is a problem, a lock may be worth considering.</p>
<p>Another option is a shared queue managed by a separate process that performs the dequeue/insert work in the background; something like:</p>
<p>result = cache.select(input)<br />
if result:<br />
    return result<br />
result = enqueue(input)</p>
<p>So the code just checks whether an entry is already cached. If not, it passes the input value to the queue handler, which calls the &#8220;compute&#8221; function, stores the result in the cache, enqueues the element and returns the result. Later, in the background, the queue-handling process dequeues the elements one at a time in FIFO order and inserts them into the database.</p>
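<p>Here is a rough in-process sketch of the queue idea. It rests on several assumptions. A Python dict stands in for the cache and a list for the database table. A thread reading from a queue.Queue stands in for the separate process and the shared queue. compute() is a placeholder. query() plays the part of the handler (compute, cache, enqueue, return), and a single writer thread plays the background process. Because only that writer inserts, it can skip keys it has already written without racing anyone.</p>
```python
import queue
import threading

cache = {}              # stand-in for the shared cache
database = []           # stand-in for the database table
write_queue = queue.Queue()


def compute(value):
    # Placeholder for the expensive computation.
    return value * value


def writer():
    # The single background consumer: takes items in FIFO order and inserts
    # them. Since it is the only writer, checking for an existing key before
    # inserting cannot race with another insert.
    written = set()
    while True:
        item = write_queue.get()
        try:
            if item is None:           # sentinel: shut the writer down
                return
            key, result = item
            if key not in written:
                database.append((key, result))
                written.add(key)
        finally:
            write_queue.task_done()


def query(value):
    result = cache.get(value)
    if result is not None:
        return result
    result = compute(value)
    cache[value] = result
    write_queue.put((value, result))   # persisted later by the writer
    return result


worker = threading.Thread(target=writer, daemon=True)
worker.start()
```
<p>write_queue.join() blocks until the writer has drained the queue, which is mostly useful in tests. Putting None on the queue stops the writer.</p>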
]]></content:encoded>
	</item>
	<item>
		<title>By: Motoma</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-193</link>
		<dc:creator>Motoma</dc:creator>
		<pubDate>Fri, 31 Jul 2009 16:23:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-193</guid>
		<description>In a situation where the compute function is computationally intensive, it is not acceptable to compute the result multiple times. In this situation, you will want the first request to compute and cache, while the remaining requests block until the result has been cached, and then query the cache again.

A little python-ish pseudocode:

[code]
def query(input):
    result = cache.select(input)
    # Result has already been computed.
    if result:
        return result

    if lock(input, blocking=False):
        # If we get here, the result has never been computed, so compute and cache.
        # Locking succeeded, meaning no one else is computing.
        result = compute(input)
        cache.insert(input, result)
        unlock(input)
        return result

    # If we get here, the result is being computed.
    # Locking failed, so we block until the computation is complete, then query again.
    # Recursion is not necessary; we could instead return a code telling the caller to re-query.
    lock(input, blocking=True)
    unlock(input)
    return query(input)
[/code]</description>
		<content:encoded><![CDATA[<p>In a situation where the compute function is computationally intensive, it is not acceptable to compute the result multiple times. In this situation, you will want the first request to compute and cache, while the remaining requests block until the result has been cached, and then query the cache again.</p>
<p>A little python-ish pseudocode:</p>
<p>[code]<br />
def query(input):<br />
    result = cache.select(input)<br />
    # Result has already been computed.<br />
    if result:<br />
        return result</p>
<p>    if lock(input, blocking=False):<br />
        # If we get here, the result has never been computed, so compute and cache.<br />
        # Locking succeeded, meaning no one else is computing.<br />
        result = compute(input)<br />
        cache.insert(input, result)<br />
        unlock(input)<br />
        return result</p>
<p>    # If we get here, the result is being computed.<br />
    # Locking failed, so we block until the computation is complete, then query again.<br />
    # Recursion is not necessary; we could instead return a code telling the caller to re-query.<br />
    lock(input, blocking=True)<br />
    unlock(input)<br />
    return query(input)<br />
[/code]</p>
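<p>Below is a runnable version of the same idea in Python. It is a sketch under some assumptions: threads in a single process, a dict as the cache, and a dict of threading.Event objects marking the inputs currently being computed. The names in_flight and compute() are illustrative. Checking the cache and claiming the key happen under one lock. This also closes a small gap in the pseudocode above: there, a thread can miss the cache, acquire the lock just after another thread has finished and inserted, and compute the result a second time. A loop replaces the recursion.</p>
```python
import threading

cache = {}
in_flight = {}                # key -> Event that is set once that key's computation ends
guard = threading.Lock()      # makes "check the cache, then claim the key" atomic
compute_calls = []            # records real computations, for the demo


def compute(value):
    # Placeholder for an expensive computation.
    compute_calls.append(value)
    return value * 2


def query(value):
    while True:
        with guard:
            if value in cache:                 # result has already been computed
                return cache[value]
            event = in_flight.get(value)
            owner = event is None
            if owner:                          # nobody is computing it: claim the key
                event = threading.Event()
                in_flight[value] = event
        if not owner:
            event.wait()                       # block until the owner finishes...
            continue                           # ...then query the cache again
        try:
            result = compute(value)
            with guard:
                cache[value] = result
        finally:
            with guard:
                del in_flight[value]
            event.set()                        # wake every waiting thread
        return result
```
<p>If compute() raises, the claim is dropped and the waiters wake up, so one of them becomes the new owner and retries.</p>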
]]></content:encoded>
	</item>
	<item>
		<title>By: Rani</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-192</link>
		<dc:creator>Rani</dc:creator>
		<pubDate>Sun, 12 Jul 2009 15:42:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-192</guid>
		<description>I don&#039;t get all the solutions that lock *the entire cache*.
Assuming that compute() is reentrant, there&#039;s no reason not to have multiple concurrent compute()s (for distinct inputs) and hence the locking has to be entry-based, similar to what Jack suggested.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t get all the solutions that lock *the entire cache*.<br />
Assuming that compute() is reentrant, there&#8217;s no reason not to have multiple concurrent compute()s (for distinct inputs) and hence the locking has to be entry-based, similar to what Jack suggested.</p>
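<p>Here is a minimal sketch of entry-based locking in Python. It assumes threads within one process and a dict as the cache; KeyedLocks and query() are illustrative names, not from the post. A global lock is held only long enough to look up the lock for one input. compute() then runs under that per-input lock, so distinct inputs are computed concurrently while duplicate requests for the same input are serialized.</p>
```python
import threading
from contextlib import contextmanager


class KeyedLocks:
    """One lock per key, so work on different keys never blocks each other."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the dict of per-key locks
        self._locks = {}

    @contextmanager
    def hold(self, key):
        with self._guard:                # held only long enough to look up the lock
            lock = self._locks.get(key)
            if lock is None:
                lock = self._locks[key] = threading.Lock()
        with lock:                       # serializes callers that share this key
            yield


cache = {}
locks = KeyedLocks()


def query(value, compute):
    with locks.hold(value):
        if value not in cache:           # checked under this key's lock only
            cache[value] = compute(value)
        return cache[value]
```
<p>This sketch never removes entries from the lock dictionary. A long-running program with many distinct inputs would need to prune it; that is left out here.</p>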
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Friðriksson</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-191</link>
		<dc:creator>Emil Friðriksson</dc:creator>
		<pubDate>Tue, 07 Jul 2009 09:41:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-191</guid>
		<description>Sorry, here is the documentation for it: http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html</description>
		<content:encoded><![CDATA[<p>Sorry, here is the documentation for it: <a href="http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Friðriksson</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-190</link>
		<dc:creator>Emil Friðriksson</dc:creator>
		<pubDate>Tue, 07 Jul 2009 09:40:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-190</guid>
		<description>You could also use &#039;INSERT ... ON DUPLICATE KEY UPDATE&#039; so you can update the use_count... that makes even more sense than my previous recommendation.</description>
		<content:encoded><![CDATA[<p>You could also use &#8216;INSERT &#8230; ON DUPLICATE KEY UPDATE&#8217; so you can update the use_count&#8230; that makes even more sense than my previous recommendation.</p>
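<p>The same pattern can be tried without a MySQL server by using SQLite from Python's standard library. SQLite's UPSERT clause (INSERT ... ON CONFLICT ... DO UPDATE, supported since SQLite 3.24) is the closest equivalent of MySQL's ON DUPLICATE KEY UPDATE. This is only a sketch; the cache table and its input, output and use_count columns are assumptions for illustration.</p>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (input TEXT PRIMARY KEY, output TEXT, use_count INTEGER NOT NULL)"
)


def store(key, output):
    # New key: insert with use_count = 1. Existing key: just bump use_count.
    # MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE use_count = use_count + 1
    with conn:
        conn.execute(
            "INSERT INTO cache (input, output, use_count) VALUES (?, ?, 1) "
            "ON CONFLICT(input) DO UPDATE SET use_count = use_count + 1",
            (key, output),
        )


store("a", "x")
store("a", "x")   # a second writer racing on the same input: no error
store("b", "y")
print(conn.execute("SELECT input, use_count FROM cache ORDER BY input").fetchall())
# [('a', 2), ('b', 1)]
```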
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Friðriksson</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-189</link>
		<dc:creator>Emil Friðriksson</dc:creator>
		<pubDate>Tue, 07 Jul 2009 09:38:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-189</guid>
		<description>I suggest you use MySQL&#039;s &#039;REPLACE&#039; syntax instead of &#039;INSERT&#039;, as documented here: http://dev.mysql.com/doc/refman/5.0/en/replace.html

If no row with the same unique key exists, it inserts the row; if one already exists, it deletes that row and inserts a new one with the new data. You could also just put an exception handler around the insert, since you know duplicate inserts can happen, but I&#039;d use &#039;REPLACE&#039;.</description>
		<content:encoded><![CDATA[<p>I suggest you use MySQL&#8217;s &#8216;REPLACE&#8217; syntax instead of &#8216;INSERT&#8217;, as documented here: <a href="http://dev.mysql.com/doc/refman/5.0/en/replace.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/replace.html</a></p>
<p>If no row with the same unique key exists, it inserts the row; if one already exists, it deletes that row and inserts a new one with the new data. You could also just put an exception handler around the insert, since you know duplicate inserts can happen, but I&#8217;d use &#8216;REPLACE&#8217;.</p>
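<p>REPLACE deletes the old row before inserting the new one. A side effect is that any column the new row doesn't supply falls back to its default, so a counter such as use_count is reset rather than preserved. The small sketch below uses SQLite from Python's standard library. In SQLite, REPLACE INTO is an alias for INSERT OR REPLACE and behaves the same way in this case. The table and columns are assumptions.</p>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (input TEXT PRIMARY KEY, output TEXT, use_count INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO cache (input, output, use_count) VALUES ('a', 'old', 5)")

# A second writer computes the same input and uses REPLACE instead of INSERT:
# no duplicate-key error, but the existing row is deleted and a new one inserted.
conn.execute("REPLACE INTO cache (input, output) VALUES ('a', 'new')")
conn.commit()

row = conn.execute("SELECT output, use_count FROM cache WHERE input = 'a'").fetchone()
print(row)  # ('new', 0): use_count fell back to its default
```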
]]></content:encoded>
	</item>
	<item>
		<title>By: arkon</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-188</link>
		<dc:creator>arkon</dc:creator>
		<pubDate>Sun, 05 Jul 2009 19:47:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-188</guid>
		<description>Imri... mind you, you should already catch the double insertion in the SQL code, so it&#039;s seamless to the application. I preferred this approach in my code (back at digicash), to separate the DB as much as possible from the user code. So I actually had a try/catch in the SQL code that rethrows only if the exception isn&#039;t a duplicate-key error...

gluck</description>
		<content:encoded><![CDATA[<p>Imri&#8230; mind you, you should already catch the double insertion in the SQL code, so it&#8217;s seamless to the application. I preferred this approach in my code (back at digicash), to separate the DB as much as possible from the user code. So I actually had a try/catch in the SQL code that rethrows only if the exception isn&#8217;t a duplicate-key error&#8230;</p>
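<p>One way to keep the duplicate handling inside the SQL statement itself is sketched below. It uses SQLite from Python's standard library rather than whatever database was used originally. ON CONFLICT DO NOTHING (SQLite 3.24+) applies only to uniqueness constraints. So a duplicate key is skipped, while any other failure, such as a NOT NULL violation, still raises. That mirrors the "rethrow unless it's a same-key error" behaviour. The table and insert_result() are illustrative.</p>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (input TEXT PRIMARY KEY, output TEXT NOT NULL)")

# ON CONFLICT DO NOTHING only covers uniqueness (PRIMARY KEY / UNIQUE) violations,
# so a duplicate key is skipped inside the database while other errors still raise.
INSERT_SQL = "INSERT INTO cache (input, output) VALUES (?, ?) ON CONFLICT DO NOTHING"


def insert_result(key, output):
    with conn:    # commit on success, roll back and re-raise on error
        conn.execute(INSERT_SQL, (key, output))


insert_result("a", "1")
insert_result("a", "2")        # duplicate: silently ignored, first row kept
try:
    insert_result("b", None)   # NOT NULL violation: still an error
except sqlite3.IntegrityError as exc:
    print("rethrown:", exc)
```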
<p>gluck</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jack Pepper</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-187</link>
		<dc:creator>Jack Pepper</dc:creator>
		<pubDate>Fri, 03 Jul 2009 21:43:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-187</guid>
		<description>Add a table:

create table cache_index (
  input varchar(255) primary key,
  state enum(&#039;current&#039;,&#039;pending&#039;),
  cachetime timestamp default NOW()
  );

Your program is already multithreaded and therefore already using semaphores, right? So create another mutex semaphore, &quot;cachelock&quot;, to protect setting the cache table&#039;s state variable:

function iscache(requestinput) {
  cachelock.acquire_read();
  cachestate = cache_index.select(&quot;select state from cache_index where input=requestinput&quot;);
  cachelock.release();
  return cachestate;  // a NULL return means that no entry exists at all
}


So now your main code looks like this:

cachestate = iscache(input)
if isnull(cachestate):
      cachelock.acquire_write();
      cache_index.select(&quot;replace into cache_index (input, state) values (input, &#039;pending&#039;)&quot;);
      cachelock.release();
      result = compute(input)
      cache.insert(input, result)
      cachelock.acquire_write();
      cache_index.select(&quot;replace into cache_index (input, state) values (input, &#039;current&#039;)&quot;);
      cachelock.release();
      return result
if cachestate == &#039;current&#039;:
      result = cache.select(input)
      return result
/* implied else: cachestate == &#039;pending&#039; */
wait() ? ;
/* I don&#039;t know the use case for waiting ... */

And naturally you&#039;d have some housekeeping thread that clears out cache entries once they have gotten too old.</description>
		<content:encoded><![CDATA[<p>Add a table:</p>
<p>create table cache_index (<br />
  input varchar(255) primary key,<br />
  state enum('current','pending'),<br />
  cachetime timestamp default NOW()<br />
  );</p>
<p>Your program is already multithreaded and therefore already using semaphores, right? So create another mutex semaphore, &#8220;cachelock&#8221;, to protect setting the cache table&#8217;s state variable:</p>
<p>function iscache(requestinput) {<br />
  cachelock.acquire_read();<br />
  cachestate = cache_index.select("select state from cache_index where input=requestinput");<br />
  cachelock.release();<br />
  return cachestate;  // a NULL return means that no entry exists at all<br />
}</p>
<p>So now your main code looks like this:</p>
<p>cachestate = iscache(input)<br />
if isnull(cachestate):<br />
      cachelock.acquire_write();<br />
      cache_index.select("replace into cache_index (input, state) values (input, 'pending')");<br />
      cachelock.release();<br />
      result = compute(input)<br />
      cache.insert(input, result)<br />
      cachelock.acquire_write();<br />
      cache_index.select("replace into cache_index (input, state) values (input, 'current')");<br />
      cachelock.release();<br />
      return result<br />
if cachestate == 'current':<br />
      result = cache.select(input)<br />
      return result<br />
/* implied else: cachestate == 'pending' */<br />
wait() ? ;<br />
/* I don&#8217;t know the use case for waiting &#8230; */</p>
<p>And naturally you&#8217;d have some housekeeping thread that clears out cache entries once they have gotten too old.</p>
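<p>The 'wait() ?' step can be filled in with a condition variable. Below is a minimal in-process sketch. It assumes threads rather than separate processes, with plain dicts standing in for the cache and cache_index tables. Checking for a missing entry and marking it 'pending' happen under one threading.Condition. That way two threads cannot both see "no entry" and both compute, which a separate check-then-set still allows. Threads that find 'pending' sleep until the entry becomes 'current'. lookup() and compute() are illustrative names.</p>
```python
import threading

cache = {}               # stand-in for the cache table
cache_index = {}         # stand-in for cache_index: input -> "pending" or "current"
cachelock = threading.Condition()
compute_calls = []       # records real computations, for the demo


def compute(value):
    compute_calls.append(value)
    return value.upper()     # placeholder for the expensive work


def lookup(value):
    with cachelock:
        while cache_index.get(value) == "pending":
            cachelock.wait()                   # another thread is computing it
        if cache_index.get(value) == "current":
            return cache[value]
        cache_index[value] = "pending"         # claimed under the same lock as the check
    try:
        result = compute(value)
    except BaseException:
        with cachelock:
            del cache_index[value]             # drop the claim so someone can retry
            cachelock.notify_all()
        raise
    with cachelock:
        cache[value] = result
        cache_index[value] = "current"
        cachelock.notify_all()                 # wake threads that saw "pending"
    return result
```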
]]></content:encoded>
	</item>
	<item>
		<title>By: silky</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-186</link>
		<dc:creator>silky</dc:creator>
		<pubDate>Fri, 03 Jul 2009 00:55:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-186</guid>
		<description>Well, surely the insert would check to see if the entry exists before it inserts, given that compute() takes time.

The only loss would be that it ran compute() for nothing; but if you have lots of threads and another thread has already handled the same input, life is good anyway, because the second thread&#039;s insert() sees the existing entry and skips it.

That would be my approach.

A similar option would be for compute() to check occasionally whether it still needs to continue. This may or may not be appropriate depending on what it does, but it would let it short-circuit longer calculations if some other thread has already done them.</description>
		<content:encoded><![CDATA[<p>Well, surely the insert would check to see if the entry exists before it inserts, given that compute() takes time.</p>
<p>The only loss would be that it ran compute() for nothing; but if you have lots of threads and another thread has already handled the same input, life is good anyway, because the second thread&#8217;s insert() sees the existing entry and skips it.</p>
<p>That would be my approach.</p>
<p>A similar option would be for compute() to check occasionally whether it still needs to continue. This may or may not be appropriate depending on what it does, but it would let it short-circuit longer calculations if some other thread has already done them.</p>
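<p>Here is a sketch of that early-exit check. It assumes the computation is a loop that can be interrupted between steps and that a shared dict serves as the cache. The toy workload (a sum of squares) and the check_every interval are made up for illustration. Every check_every steps, compute() looks for the key in the cache; if another thread has already stored a result, it returns that result instead of finishing the work.</p>
```python
cache = {}   # shared cache that other threads may fill in


def compute(n, key, check_every=1000):
    total = 0
    for i in range(n):
        # Every check_every iterations, stop if another thread already has the result.
        if i % check_every == 0 and key in cache:
            return cache[key]
        total += i * i
    return total


print(compute(10, "k"))  # 285: nothing cached for "k", so the full sum is computed
```
<p>The interval is a trade-off. Checking more often wastes less work after another thread finishes, but each check costs a cache lookup, which may be a network round trip if the cache is remote.</p>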
]]></content:encoded>
	</item>
	<item>
		<title>By: lorg</title>
		<link>http://www.algorithm.co.il/blogs/programming/a-simple-race-condition/#comment-185</link>
		<dc:creator>lorg</dc:creator>
		<pubDate>Mon, 29 Jun 2009 15:06:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.algorithm.co.il/blogs/?p=263#comment-185</guid>
		<description>General note:
What I wrote in the post may apply to multiprogramming in general, not just to multithreading. Because of that, any synchronization object should also work in those other cases, which might make it less simple to use.

Most of the commenters noticed, but did not spell out, that there are actually two issues to solve:
1. Caching, as presented in the post itself
2. Synchronizing, i.e. knowing which operations are happening *now*.

@Erez:
As we discussed elsewhere, you&#039;d have to make sure that your dict is threadsafe (which is doable).

@Inger:
I disagree. If another thread discovers that the row already exists with a NULL output, it needs to either wait for the result to be put there or compute it itself (which means computing it twice). If you want it to wait, you need some kind of synchronization mechanism.

@dbrodie:
You can do that, or you can just catch the exception and ignore it :)

@Brodie:
Your solution is equivalent to Budowski&#039;s, just with locking, which is unneeded.

My solution:
Well, since I&#039;m all for doing the least work required, I didn&#039;t use a synchronization mechanism, and in effect used Budowski&#039;s and dbrodie&#039;s solution: I caught and ignored the exception raised for duplicate rows. Once this mechanism is used more heavily and I want to prevent duplicate concurrent queries, I&#039;ll implement some kind of synchronization, probably along the lines Erez suggested.

Also, thanks to everyone for your comments: I appreciate the discussion on this issue. I think it&#039;s a worthwhile discussion that makes you think.</description>
		<content:encoded><![CDATA[<p>General note:<br />
What I wrote in the post may apply to multiprogramming in general, not just to multithreading. Because of that, any synchronization object should also work in those other cases, which might make it less simple to use.</p>
<p>Most of the commenters noticed, but did not spell out, that there are actually two issues to solve:<br />
1. Caching, as presented in the post itself<br />
2. Synchronizing, i.e. knowing which operations are happening *now*.</p>
<p>@Erez:<br />
As we discussed elsewhere, you&#8217;d have to make sure that your dict is threadsafe (which is doable).</p>
<p>@Inger:<br />
I disagree. If another thread discovers that the row already exists with a NULL output, it needs to either wait for the result to be put there or compute it itself (which means computing it twice). If you want it to wait, you need some kind of synchronization mechanism.</p>
<p>@dbrodie:<br />
You can do that, or you can just catch the exception and ignore it :)</p>
<p>@Brodie:<br />
Your solution is equivalent to Budowski&#8217;s, just with locking, which is unneeded.</p>
<p>My solution:<br />
Well, since I&#8217;m all for doing the least work required, I didn&#8217;t use a synchronization mechanism, and in effect used Budowski&#8217;s and dbrodie&#8217;s solution: I caught and ignored the exception raised for duplicate rows. Once this mechanism is used more heavily and I want to prevent duplicate concurrent queries, I&#8217;ll implement some kind of synchronization, probably along the lines Erez suggested.</p>
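<p>Here is a minimal sketch of that catch-and-ignore approach. It uses Python's sqlite3 module as a stand-in for the real database; the table, compute() and store() are assumptions. If two workers miss the cache at the same time, both compute, and the second insert's IntegrityError is swallowed. This assumes compute() is deterministic, so whichever row lands first is correct. Note that IntegrityError also covers other constraint failures, such as NOT NULL, so a stricter version would confirm the error really is a duplicate key.</p>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (input TEXT PRIMARY KEY, output TEXT)")


def compute(value):
    return value[::-1]        # placeholder for the expensive computation


def store(value, result):
    try:
        with conn:            # commits on success, rolls back on error
            conn.execute("INSERT INTO cache (input, output) VALUES (?, ?)", (value, result))
    except sqlite3.IntegrityError:
        # Another worker inserted this input first. Since compute() is assumed
        # deterministic, its row holds the same result, so ignore the error.
        pass


def query(value):
    row = conn.execute("SELECT output FROM cache WHERE input = ?", (value,)).fetchone()
    if row is not None:
        return row[0]
    result = compute(value)
    store(value, result)
    return result
```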
<p>Also, thanks to everyone for your comments: I appreciate the discussion on this issue. I think it&#8217;s a worthwhile discussion that makes you think.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

