Tuesday, May 5, 2009

Google shows trends in Ops Research activity

How much can you tell by the volume of Google searching on a particular topic? Apparently enough to predict a flu epidemic faster than the CDC. So I thought it would be interesting to search for a few terms in Google Trends to see if there were any interesting results. The volume of searches for "operations research" has been steadily declining during the period from 2004 to the present.













Interesting... As far as I can tell, our field has been growing, not declining, so what's going on there? Google also shows patterns of news article volume, which had been increasing during that time (with increasing variance as well). So searches are down, but news articles are up... This could mean that more information in our field is being disseminated via RSS news aggregators and/or blogs, which are read regularly and delivered without searching. Possibly, but I needed to do more digging. So I next looked for some terms which are often associated with OR, either synonyms or commonly used vernacular that describes the field. Most had the same pattern (industrial engineering, machine learning, artificial intelligence), but one stood out. I tried 'predictive analysis' and got a much different result.














Apparently, 'predictive analysis' wasn't a popular term until it spiked in the latter half of 2007, then dropped to zero for three months or so, and then shot back up again. Now, I don't really believe that this last pattern is real; it's more likely a quirk of how Google calculates their numbers. But it's curious to think that there is such a decline in searches for these topics overall, when by all accounts the techniques of applied mathematics for making better decisions are booming. I think rather what is occurring is that OR is less often thought of as a discipline on its own, and more as a set of tools used by practitioners in many disciplines. That all seems OK to me. It means OR is achieving widespread use, and that's good for everyone.

BTW: Google didn't have enough data to produce trend data for "spatial operations research", but 'spatial analysis' produced a similar downward trend in searches, and an increase in news. Also, the Google Trend for 'statistics' is dominated by the academic cycle. Check it out. I suppose this means only students actually search the term 'statistics.' The trend for something like 'standard deviation' is even more obvious.

Saturday, December 20, 2008

GIS at massive scales... who will get us there? Understanding high performance computing for GIS.

As a GIS practitioner for over ten years I've been subject to all of the same limitations and headaches as my peers. GIS has some powerful capabilities for analysis and research, but boy can it be slow. Many GIS engines have operations that work in active memory, others are limited to working on single files or databases, and we've all experienced the blasted slowness of it all. So how can we get over this hurdle? We've taken major strides in the digital mapping field in the last several years - tiled map interfaces are commonplace online now, and the user experience is far better than the first generation mapping services; analytically we're doing more every day. ESRI's toolbox capability in the 9.x series allows them and others to develop special purpose toolsets that perform almost any spatial operation known. But how can we do more? Every GIS practitioner I know wishes that we could perform all these calculations on more and more data at faster and faster speeds. As GIS capability grows, so does the demand for higher resolution data. As a result, we've been at a standstill in terms of processing speed for 5 years or so.

So what's the answer? In the data processing community at large, the answer has clearly been parallel processing of data. When more than one processor can work on a part of the problem in isolation, and then each contributes to the answer, huge speed increases can be gained. These technologies come in many different flavors. First, there's the multi-core, multi-processor architecture. Almost every new computer sold today comes with a dual core processor, and possibly several of them. In this way, a high-end desktop workstation may have 8 or so cores. Another paradigm is to distribute a job across many computers, or nodes, in a cluster. These clusters are usually co-located, wired on the same sub-network, and dedicated to processing large complex jobs for users. This is related to, but not exactly the same as, what many are calling cloud computing these days, where computers of many different forms are linked together via various services to perform an overall task or form a composite application. A fourth model for parallel processing is using the graphics processor on high end graphics cards to perform general purpose computing tasks. This is handy because graphics processors are very specialized and blazing fast for the right tasks. Not every job can be performed on the GPU, but when it can, it's flipping fast, as this study from the University of Virginia shows. The graphics hardware company nVidia has even produced several models of hardware add-ons that contain multiple GPUs and are designed to use simple software libraries to make use of this specialized hardware in more generalized ways. Their Tesla high performance computing product line is another way to get involved in HPC, and is actually the best performance-per-dollar solution out there.
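To make the "work on a part in isolation, then contribute to the answer" idea concrete, here is a minimal Python sketch. The function names (split_rows, partial_sum, parallel_mean) and the tiny grid are made up for illustration and do not come from any GIS package. The raster is split into chunks of rows, each chunk is summarized independently, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def split_rows(grid, n_parts):
    """Split a list of rows into at most n_parts contiguous chunks."""
    size = max(1, ceil(len(grid) / n_parts))
    return [grid[i:i + size] for i in range(0, len(grid), size)]


def partial_sum(rows):
    """Work done in isolation on one chunk: return (sum, cell count)."""
    values = [v for row in rows for v in row]
    return sum(values), len(values)


def parallel_mean(grid, executor, n_parts=4):
    """Mean of all cells in a non-empty grid, computed per chunk and combined."""
    chunks = split_rows(grid, n_parts)
    # Each chunk is handed to a worker; results come back in chunk order.
    partials = list(executor.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Tiny hypothetical raster, just to exercise the functions.
demo_grid = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=2) as pool:
    demo_mean = parallel_mean(demo_grid, pool, n_parts=2)
```

A thread pool keeps the sketch short, but in standard CPython threads will not speed up CPU-bound pure Python work. For real gains on a multi-core machine you would pass in a concurrent.futures.ProcessPoolExecutor instead (it offers the same map method), created under an `if __name__ == "__main__":` guard.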

So, we have some examples of getting more work done in the same amount of time. Who is stepping up to transition this to the geospatial analysis field? It turns out that there are a few pockets of innovators who are migrating geo-processing and analysis to some HPC platforms.

Manifold is the only full featured GIS so far to include support for nVidia's graphics processors. It can perform many raster data transforms and processing like hill shading, slope, and aspect using the computer's GPU for the bulk of the work in much less time. Most GIS workstations already have high end graphics cards, so this is a very logical and promising direction for GIS to go, and I'm glad to see Manifold taking the lead in this.
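Raster operations like slope suit the GPU because every output cell depends only on a small neighborhood of input cells, so thousands of cells can be computed at once. The sketch below is plain Python on the CPU, not Manifold's implementation or any GPU code; it uses a simple central-difference gradient (production GIS packages typically use larger neighborhoods), and the function name slope_degrees is my own.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for each interior cell of an elevation grid.

    dem is a list of equal-length rows of elevations; cell_size is the
    ground distance between neighboring cells, in the same units as the
    elevations. Border cells are left as None because they lack neighbors.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences: rate of elevation change east-west
            # and north-south.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical surface that rises one unit per cell toward the east.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
ramp_slope = slope_degrees(ramp, cell_size=1)
```

The two nested loops never read another output cell, which is exactly the property that lets a GPU (or the row-chunking approach from earlier in this post) process the cells independently.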

For distributed computing, there are a few projects that are making use of multiple connected servers for analyzing expansive spatial data sets. Parallel computing for spatial data is nothing new. Specialized applications for analyzing weather data and processing imagery are well known and proven at this point. But those are specialized applications. Is anyone using parallel architectures for general purpose GIS? I'll keep this post updated as I find them. Post links in the comments section if you know of any ground-breaking work in this area. This technology has much promise in the coming years.

Wednesday, August 20, 2008

Using the open source R project for spatial statistics

This info was originally posted here, but since most people would never find that link, I decided to repost it here where there's a bit more traffic. The open source R language and statistical package has many useful functions for investigating spatial data. I recommend it, and use R extensively for prototyping new algorithms for spatial statistics, forecasting, and machine learning.

Alternative Tools for Web Mapping
  • Adobe SVG viewer. A free viewer for SVG vector graphics data. The viewer actually has enough GIS-like functionality to provide a simple web mapping interface on its own.
  • FWTools. A handy installer for a collection of geospatial data tools that includes the Geospatial Data Abstraction Library (GDAL), the OpenEV raster imagery analysis tool, and other data maintenance and conversion tools. Be prepared to get into the command line to really exploit this tool, but it's well worth it.
  • GeoCon. A shapefile and MapInfo file translator to get to GML and SVG from your spatial data.
  • GeoServer. The latest generation of standards-based web mapping engines. Provides lots of the dynamic interface elements you would expect from modern web mapping interfaces.
  • MapGuide. An open source project from Autodesk, MapGuide is a very nice, professional-level development environment. You can get lots of functionality from MapGuide, if you're prepared to spend time configuring and mastering this tool.
  • OpenLayers. Without a doubt the fastest way to get custom web mapping up and running on your site. Functionality is good in the basic package, but there are limited opportunities for customization if you aren't a software developer.
  • uDig. Nice Java-based desktop GIS. Key feature is that uDig natively supports WFS and WMS web services.
  • Quantum GIS. Free desktop GIS. QGIS went through a major revision in the last year, and they are picking up support for GRASS in the coming year. Compatible with OGC spec web services.



Alternative Tools for Spatial Statistics

  • The R Statistical Package. A programming environment to support prototyping statistical processes for analyzing data. Has many spatial data processing and spatial statistics tools available.
  • Here are the sample files I used during the talk presented at the 2007 Crime Mapping Research Conference.
  1. osworkshop1.R
  2. osworkshop2.R
  3. osworkshop3.R


Feel free to drop me an email if you have questions or want to post some best practices for the community.

Thanks,

Jason Dalton

Thursday, August 14, 2008

Machine Learning Applied to Remote Sensing: Part 1 - Imagery


Remote sensing is the method of collecting and analyzing data using mechanical or electronic sensors. Commonly the industry considers imagery to be the main remote sensing medium. A new age of understanding the world was ushered in when French photographer Gaspard-Félix Tournachon (a.k.a. Nadar) first strapped a camera to the basket of a tethered balloon in 1858 - and when James Black's glass negative camera took panorama shots of Boston in 1860. They were pioneering photographers inventing a new way to collect data about the world around us in the form of aerial imagery. Since that time, new methods of flight (the Wright brothers did some of the first photography from an airplane) and improvements in technology have led to greater accuracy, resolution, and clarity, and to a useful resource for everything from natural resource planning to fighting wars (from the US Civil War to the Middle East conflicts of today). Today's remote sensing data consists of not only aerial imagery but also satellite imagery, RADAR, LIDAR, SONAR, and data from many other sensors. In fact, the evolution of operations research begins with sonar sensor operations. Questions of how to optimally deploy these sensors and interpret their data yielded the discipline of using applied mathematics to determine operational parameters of these systems, and OR was born.

That legacy lives on today. Globally there are more than 500 TB (my swag estimate from company websites) of imagery collected each month by commercial imagery providers. Who knows how much the world's governments are collecting? It almost certainly dwarfs that number. Having people visually inspect all that data is a hopeless task. It's error prone, slow, and inefficient. That's where OR, and specifically techniques in machine learning and computer vision, come in. Some basic operations that can be performed on imagery that add value are edge detection, class segmentation, terrain models, watershed analysis, line of sight, slope, aspect, and impervious surface models.
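As a taste of the first operation on that list, here is a minimal sketch of edge detection using the classic Sobel operator, written in plain Python so it runs anywhere. The function name sobel_magnitude and the toy image are my own; a real workflow would use an image processing library and real pixel data.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for each interior pixel of a grayscale image.

    image is a list of equal-length rows of brightness values. Border
    pixels are set to 0.0 because they lack a full 3x3 neighborhood.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical image: dark on the left, bright on the right.
step_image = [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 10, 10]]
step_edges = sobel_magnitude(step_image)
```

Large values in the result mark pixels where brightness changes sharply, such as the boundary between the dark and bright halves of the toy image; flat regions come out as zero.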

Wednesday, July 30, 2008

Preparing for the ESRI Intl User Conference

Next week is the big ESRI User Conference event. I've been to this show several times over the last 10 years and it keeps getting bigger and broader. If you're going, you'll have the opportunity to talk to researchers and practitioners in your field from all over the world, but you'll have to find them first. There are over 10,000 people at the UC and it can get a little overwhelming. Here are some tips to keep your sanity and focus.
  1. Get there early in the morning each day. Most people trickle in throughout the day, due to jetlag, sunshine-itis (the UC is in beautiful San Diego), or trying to catch up on work from their hotel room. Don't do it! Get to the conference center early, get registered, and start wandering.
  2. Plan out your day ahead of time. Most of the sessions are handily arranged in tracks, so if you're a hydrologist, you won't have to go far to see talks in your field. But when your primary focus is spatial research (and yours is, right?), then you may have to hustle to see all the good stuff. There will be good research talks in all domains, and they may be spread out pretty far at the conference center. Allow yourself some walking time, and pick out which talks you want to go to.
  3. Don't be afraid to get up and leave a session. If I'm not speaking at a session, I like to sit near the side or back so I can bail out quickly and quietly if the talk isn't what I expected. Naturally I understand if someone gets up and leaves one of my talks - there's a lot going on, and you can only see a small fraction of it. Make your trip worthwhile and see the sessions, demos, and vendors that are of the greatest interest to you.
  4. Mingle at night! Don't sit in your hotel. Nearly everyone there is from out of town, and San Diego is a great place to explore on foot. Meet up with some people and hit the streets in the Gaslamp district or down by the waterfront south of the conference center/hotel.
  5. Take notes. Paper and pens are available at every session. You'll never remember close to everything you hear, so take notes, and review them when you get back. Follow up with an email or phone call to the speaker if you have questions. Speakers love to get questions, so feed their egos and learn something in the process.
  6. Talk to the speakers after the session. If something strikes you as particularly interesting, don't wait until after the conference to contact a speaker, do it right then. Strike up a conversation and maybe you'll be able to follow up over lunch or at a break.
I'm looking forward to this year's show. Although there is a WIDE array of topics, and not all will interest you, there are some real gems each year and it's worth it to mine those out. And yes, I'm giving a talk too, so if you read this site, say 'Hi'.

Saturday, July 12, 2008

IEEE VAST Symposium Challenge


Last night my team and I finished submitting our entries for the IEEE Visual Analytics Science and Technology Grand Challenge. This year's contest was made of four mini-challenges and a grand challenge which ties everything from the other challenges together. The data was well thought out, and the problem overall had a good mix of easy problems to get started and more challenging ones to strive for. I'm sure we didn't find all the answers possible in the data, and we went down many analytical paths that didn't pan out, but that's the way it works in practice, so it was a good experience. Two of the mini-challenges had a spatial-OR approach. The first was an analysis of the illegal immigration patterns of a fictional group called Paraiso. Our team used spatio-temporal variograms to examine the strength of the migration patterns, and animations of the Coast Guard interdiction to examine their success rate.
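For readers who haven't met a variogram before, here is a minimal sketch of the purely spatial version. It is a simplification, not the spatio-temporal analysis our team actually ran, and the function name and sample points are made up for illustration. For every pair of observations it bins the pair by separation distance and averages half the squared difference in value, which shows how quickly similarity decays with distance.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, bin_width):
    """Empirical semivariogram from (x, y, value) observations.

    Returns {bin_index: gamma}, where bin b covers separation distances
    in [b * bin_width, (b + 1) * bin_width) and gamma is the sum of
    squared value differences divided by twice the number of pairs.
    """
    sq_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        x1, y1, z1 = points[i]
        for j in range(i + 1, len(points)):
            x2, y2, z2 = points[j]
            distance = math.hypot(x2 - x1, y2 - y1)
            b = int(distance // bin_width)
            sq_diff_sums[b] += (z1 - z2) ** 2
            pair_counts[b] += 1
    return {b: sq_diff_sums[b] / (2 * pair_counts[b])
            for b in sorted(pair_counts)}


# Hypothetical observations along a line, values rising with x.
sample_points = [(0, 0, 1), (1, 0, 3), (2, 0, 5)]
sample_gamma = empirical_semivariogram(sample_points, bin_width=1)
```

A spatio-temporal variogram extends the same idea by also binning each pair on the time lag between the two observations.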

The second spatial-OR flavored challenge was the fictional account of a bombing of a building. IEEE provided very high quality (unrealistically so) data from 'RFID' tags on the occupants of the building. By examining the patterns of movement, we were to identify potential suspects, witnesses, and casualties in the building.

We'll see how we did in August, and then meet with all the other submitters at the VAST Symposium in October.

Friday, July 4, 2008

Researcher spotlight - Tatiyana Apanasovich


Happy 4th of July, Americans...



In this series, we'll take a look around the web to check in on some of the top researchers in our field to see what they're up to. Today we're featuring Dr. Tatiyana Apanasovich, from Cornell University's School of Operations Research and Information Engineering.

Dr. Apanasovich received her B.A./M.S. degree in Economic Cybernetics in 1999 from Belarusian State University, Minsk, Belarus. She obtained her doctorate in Statistics from Texas A&M University in 2004. She has been a professor in the School of Operations Research and Information Engineering since 2004.

Apanasovich’s research interests encompass Generalized Linear Mixed Models, Spatial Statistics, and Nonparametric and Semiparametric Regression methods.

I'm trying to contact Dr. Apanasovich for a brief interview. Check back after the July 4th holiday to see how it went.