Async in loops

In yesterday’s post, I mentioned that there is a performance penalty for using the new Async/Await feature inside of loops. Today, I want to point out another reason why you might not want to use Async/Await inside a loop. Consider the following example, which takes the contents of a web page as a string, finds all of the hyperlinks in that page using a regular expression, and downloads each of those child pages. Using Tasks from the TPL, we can load the child pages like this:

        private void GetChildren(string parent)
        {
            var matches = Regex.Matches(parent, _HtmlRegex, RegexOptions.IgnoreCase);
            foreach (Match match in matches)
            {
                if (_CancelToken.IsCancellationRequested)
                    return;
                string childTarget = match.Groups[1].Value;
                WebClient client = new WebClient();
                // Start the download; this returns immediately with a Task<string>.
                Task<string> childTask = client.DownloadStringTaskAsync(new Uri(childTarget));
                // Record the page when its download finishes. The loop does not wait,
                // so all of the requests are in flight at the same time.
                childTask.ContinueWith(child =>
                {
                    _FoundUrls.Add(string.Format("{0} ({1})", childTarget, child.Result.Length));
                }, TaskContinuationOptions.ExecuteSynchronously);
            }
            // Runs as soon as the requests have been issued, not when they complete.
            TaskButton.IsEnabled = true;
        }
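The post doesn’t show the `_HtmlRegex` pattern. Here is a rough sketch of the link-extraction step. It assumes a simple, hypothetical pattern that captures double-quoted `href` values in group 1; `LinkExtractor` and `HrefPattern` are illustrative names, not part of the sample project.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical stand-in for the post's _HtmlRegex, whose real pattern isn't shown.
    // Group 1 captures the value of a double-quoted href attribute.
    private const string HrefPattern = @"href\s*=\s*""([^""]+)""";

    // Returns every captured link, in document order.
    public static List<string> ExtractLinks(string html)
    {
        return Regex.Matches(html, HrefPattern, RegexOptions.IgnoreCase)
                    .OfType<Match>()               // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}
```

Real pages also use single-quoted or unquoted attributes, which this pattern skips. An HTML parser is the more robust choice for anything beyond a demo.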

In GetChildren, we pass each child page URL to DownloadStringTaskAsync and use the ContinueWith method to process each page when its download completes, without waiting on the others. As a first pass at using Async/Await, we can simply await the returned task and drop the ContinueWith clause. The code now reads as if it were straight-line synchronous code:

        private async void GetChildrenAsync(string parent)
        {
            var matches = Regex.Matches(parent, _HtmlRegex, RegexOptions.IgnoreCase);
            foreach (Match match in matches)
            {
                if (_CancelToken.IsCancellationRequested)
                    return;
                string childTarget = match.Groups[1].Value;
                WebClient client = new WebClient();
                // The method suspends here until this page arrives, so the next
                // iteration (and the next request) can't start until then.
                string child = await client.DownloadStringTaskAsync(new Uri(childTarget));
                _FoundUrls.Add(string.Format("{0} ({1})", childTarget, child.Length));
            }
            // Runs only after the pages have been downloaded, one at a time.
            AsyncCtp2Button.IsEnabled = true;
        }

That sure looks cleaner. So what’s the issue? The downside is that await suspends the method on each pass through the loop. Control returns to the caller, so the UI thread stays responsive, but the rest of the loop doesn’t run until the current download completes. Only then do we request the next page that was found. As a result, we lose the performance benefit of issuing multiple requests at once and processing them as they return.
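Here is a small sketch that makes the difference concrete. `FakeDownloadAsync` is a hypothetical stand-in for `DownloadStringTaskAsync`. It waits 100 ms instead of touching the network and records the largest number of requests that were in flight at the same time.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    private static int _inFlight;    // requests currently running
    private static int _maxInFlight; // highest overlap observed

    // Hypothetical stand-in for DownloadStringTaskAsync: waits 100 ms, then
    // returns a fake page, while counting how many calls overlap.
    public static async Task<string> FakeDownloadAsync(string url)
    {
        int now = Interlocked.Increment(ref _inFlight);
        int seen;
        // Raise _maxInFlight to 'now' if it is lower (lock-free compare-and-swap loop).
        while (now > (seen = Volatile.Read(ref _maxInFlight)))
        {
            Interlocked.CompareExchange(ref _maxInFlight, now, seen);
        }
        try
        {
            await Task.Delay(100);
            return "<html>" + url + "</html>";
        }
        finally
        {
            Interlocked.Decrement(ref _inFlight);
        }
    }

    // Await inside the loop: each request starts only after the previous one finishes.
    public static async Task<int> SequentialAsync(IEnumerable<string> urls)
    {
        Reset();
        foreach (string url in urls)
        {
            await FakeDownloadAsync(url);
        }
        return Volatile.Read(ref _maxInFlight);
    }

    // Start every request first, then await them all together with Task.WhenAll.
    public static async Task<int> ConcurrentAsync(IEnumerable<string> urls)
    {
        Reset();
        Task<string>[] tasks = urls.Select(url => FakeDownloadAsync(url)).ToArray();
        await Task.WhenAll(tasks);
        return Volatile.Read(ref _maxInFlight);
    }

    private static void Reset()
    {
        Volatile.Write(ref _inFlight, 0);
        Volatile.Write(ref _maxInFlight, 0);
    }
}
```

With three URLs, `SequentialAsync` reports a peak of 1 request in flight. `ConcurrentAsync` reports 3, because every task is started before the single `await Task.WhenAll`.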

When issuing multiple asynchronous requests, it is often better to use Task.WaitAny or Task.WaitAll (for blocking operations) or Task.WhenAll (for non-blocking operations). Unfortunately, await operates on a single task, not on a group of tasks, so a group first has to be combined into one task. So how can we modify the code to issue multiple requests and operate on the results as they come back? We need to translate each of the requests into a task and then combine them with WhenAll, returning a Task&lt;string[]&gt; that holds the downloaded pages. Here’s the resulting code:

        private Task<string[]> GetChildrenAsyncAll(string parent)
        {
            var matches = Regex.Matches(parent, _HtmlRegex, RegexOptions.IgnoreCase);
            // Project each match into a running download task.
            var childTasks = matches.OfType<Match>()
                             .Select(match =>
                                 {
                                     string childTarget = match.Groups[1].Value;
                                     var client = new WebClient();
                                     var childTask = client.DownloadStringTaskAsync(new Uri(childTarget));
                                     // As each page arrives, marshal the update back to the UI thread.
                                     childTask.ContinueWith(child =>
                                         Dispatcher.BeginInvoke(new Action(() =>
                                             _FoundUrls.Add(string.Format("{0} ({1})", childTarget, child.Result.Length)))));
                                     return childTask;
                                 });

            // ToArray forces the lazy Select to run, which starts every download.
            // WhenAll then yields a single Task<string[]> that completes when all are done.
            return Task.WhenAll(childTasks.ToArray());
        }

Ugh. That’s not quite as elegant. I never said async was easy. Async/Await does make the simple cases look easier; however, there are times when a bit more effort is needed to get the behavior you actually want.
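Here is one possible refinement, offered as a sketch rather than the download sample’s code. It uses async lambdas inside `Select`. Each lambda awaits its own download and reports its own result, so the explicit `ContinueWith` disappears. The `download` and `onFound` parameters are hypothetical, standing in for `DownloadStringTaskAsync` and `_FoundUrls.Add`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ChildLoader
{
    // download: hypothetical stand-in for WebClient.DownloadStringTaskAsync.
    // onFound:  hypothetical stand-in for _FoundUrls.Add.
    public static async Task<string[]> GetChildrenAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> download,
        Action<string> onFound)
    {
        // Each async lambda awaits its own download, then reports that page.
        // The lambdas don't wait on each other, so the downloads overlap.
        IEnumerable<Task<string>> childTasks = urls.Select(async url =>
        {
            string child = await download(url);
            onFound(string.Format("{0} ({1})", url, child.Length));
            return child;
        });

        // ToArray starts every download; WhenAll completes when all have finished
        // and returns the pages in the same order as the input URLs.
        return await Task.WhenAll(childTasks.ToArray());
    }
}
```

Called from the UI thread in a WPF app, each `await` resumes on the UI thread through the captured synchronization context, so `onFound` could update `_FoundUrls` directly without `Dispatcher.BeginInvoke`. In a console app, the callbacks can run concurrently on thread-pool threads, so `onFound` must be thread-safe there.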

As always, if you have recommendations on how to improve this code, I’m eager to hear them. If you want to try this sample yourself, feel free to download the async project and kick the tires a bit. The download also includes a version using Rx if you want to compare that approach as well.

Posted on 8/21/2012 1:40:00 PM
Categories: C#

Async in loops by ThinqLinq
