SlashGT by James Rector

Friday, January 23, 2009

RETS Modules

Here are the modules, along with short descriptions that I have settled on for this application.
  • retsAuth - Handles basic authentication to the RETS server including logging in and out. Stores the loginResponse.xml file. Stores the login cookie that is required with each request.
  • retsMetadata - Fetches the metadata files from the server. Parses each to discover additional metadata further down the hierarchy, then repeats. Saves all files local with a standardized naming convention.
  • retsMetadataStore - Traverses local metadata files, parses them and stores them in the database.
  • retsSchema - Using the metadata that is stored in the DB, creates the database tables and relationships that will be required to store the data that will be pulled from the RETS server.
  • retsDataSearch - Queries the RETS server and saves the returned XML.
  • retsDataStore - Parses and stores returned XML files in the DB.
  • retsObject - Grabs RETS objects from the server such as images and videos. Stores them locally. Updates lookup tables in the DB.
  • retsController - Handles sequencing the processes. Checks local metadata versions against current to trigger updates to the schema. Runs count queries to insure local DB is in sync with the RETS server.
The breakdown should clue you in that I have decided not to leave the RETS data in its native format (XML) for accessing. I did quite a bit of experimenting with loading the metadata into memory and querying against it with xpath. While it does work, you can easily wind up with 4 or 5 GB of metadata loaded into memory, um, no thanks. I also created a xmlLoader.cfc that would dynamically load the metadata files into memory only when they were called, but you then have to run cleanup routines to clear them out. In the end it just did not seem worth it to me when I could just parse it all into the DB and get blazing fast performance. I will toss up a DB diagram for the metadata schema when I get a chance.

Thursday, January 22, 2009

RETS Data Formats Standard, Compact, Compact Decoded

When making requests to the RETS server, you have to specify the the Return Format. Your choices are STANDARD-XML, COMPACT and COMPACT-DECODED. On the surface, this seems like a pretty minor decision. By pulling each you will see that STANDARD-XML looks like what you picture an XML document to look like, nicely formatted, organized and nested.


COMPACT and COMPACT-DECODED are pretty similar, much more concise (read: smaller) formats. The data all comes in two XML elements, one has a delimited list of columns, the other is a delimited list of values. The decoded version simply includes the long name value for lookup fields as opposed to the value. So the field LivingRoomFlooring would have the value "C" in a COMPACT format, and the value "Carpet" in the COMPACT-DECODED format.



Here is a some information that could save you a lot of head scratching; Don't use STANDARD-XML. STANDARD-XML adheres to a strict XML DTD, only displaying predefined "Standard Name" fields. Existing MLS databases are anything but standard. So what does this mean? If you request STANDARD-XML you will get only the fields specified in the RETS standard names specification. In the case of MRED (the MLS I am working with) this amounts to maybe 20% of the data. When you query RETS for data (i.e. a listing) you can pass a SELECT field that specifies which fields you would like in your return, just like an SQL SELECT. First note on this, a SELECT statement will have no effect on a STANDARD-XML request because it will always return the same document format. The documentation indicates that if you do not provide a SELECT statement the server should return all fields (like a SELECT *). This has not been my experience. If I don't provide a SELECT I get a subset of the data. Loop over your metadata and explicitly select all of the fields and you have hit pay dirt.

One last topic for this post. Using COMPACT or COMPACT-DECODED you get a tab delimited list of the column names and a tab delimited list of the values. First reaction is to loop over the columns list while running a counter, output column name, then use a listGetAt on the value list using the counter value. This will not work due to null values, when you tell Coldfusion to address the string as a chr(9) delimited list, it ignores the null values in the value list which kills the relationship between the two lists. The key is to do a couple replace operation on the strings prior to treating them as lists. First replace chr(9) with a pipe (|), then replace "||" with "|?|" twice. Something like this...

listVals = myXML.RETS.DATA.XmlText;
listVals = replace(listVals,chr(9),'|','all');
listVals = replace(listVals,'||','|?|','all');
listVals = replace(listVals,'||','|?|','all');

Now you can treat listVals as a pipe delimited list that will have a "?" to represent null.

RETS Project, back on track (again)

This project is finally back on track. The RETS implementation is just one of many modules in this project that I have been working on for a client, and it kept getting put on the back burner. Now the time has come to wrap this thing up. I should also mention that the contracting agreement for this particular project somewhat restricts what I can publish on this blog.

I think it is important to say a few words about the “big picture” in regards to this project. The developers of RETS have done a great job. Don't underestimate just what that job must have entailed. Creating a transaction standard for the hundreds of MLS systems must have been like wrangling cats. This Coldfusion project is unlike most RETS projects you will come across. Most are using existing code libraries to communicate with the RETS server. There is no native Coldfusion library. We are essentially creating our own library. If you are a developer with little or no MLS experience, this will be a long road that I don't think you will want to venture down. The documentation available is really more of a specification, there are no code samples. Most of the information you will come across online is focused on using the existing .net, java or php libraries, when what you really need is to know what the library is doing. There is little or no support structure out there. You can browse the RETS forums, but once again, you will find most of the people there are using existing libraries. I have been working with MLS data for most of my career and this has been a very frustrating experience. Imagine being locked in a room to learn a new programming language, the only resource in the room is a phone that connects to an instructor that will only answer yes or no to any question. You better know what questions to ask, and expect a lot of trial and error. Maybe I am being a little dramatic here, but my point is, as a programmer, don't think you can jump into this and bang some RETS functionality into your project in a couple days. Consider bringing in a consultant for that part of the project that already has his/her head around RETS. I might as well toss in a shameless plug here for my services, but really, this isn't my reason for giving the warning, if it was I would not be sharing my experience here. So enough venting, lets get on with it.

Monday, October 20, 2008

Sorry for the delay... Bonus IRR function

Sorry for the delay in posts, I have been working on another aspect of this project that does not involve RETS. I will be returning to the RETS code later this week. The project I am working on requires me to produce an IRR (Internal Rate of Return) for an investment. An IRR is, in simple terms a way to judge the return the you are getting on your money when you put up an initial investment, then recieve cash flow out on a regular basis. I was suprised that I could not find a Coldfusion function out there floating around, but I did find a SQL function, simply create a CFC that calls it and you are in business. You pass in your list of cashflows, the first of which will always be a negative because it is your initial investment. You also pass in a "guess" at the return, which eases the load on the fuction as it must loop from the guess until it resolves the equation. So, if you buy a rental property and finance $10k in the process, and you have $400 per month in cash flow after expenses, you would pump in something like; -'10000,400,400,400,400,400,400,400,400,400,400,400,.10', with 10% being your guess.

drop FUNCTION dbo.ufn_IRR

CREATE FUNCTION dbo.ufn_IRR(@strIDs varchar(8000), @guess decimal(30,10))
RETURNS decimal(30,10)
  DECLARE @t_IDs table(id int identity(0,1), value decimal(30,10))
  DECLARE @strID varchar(12), @sepPos int, @NPV decimal(30,10)
  set @strIDs = coalesce(@strIDs + ',', '')
  set @sepPos = charindex(',', @strIDs)
  WHILE @sepPos > 0  BEGIN
     SET @strID = LEFT(@strIDs, @sepPos - 1)
     INSERT INTO @t_IDs(value) SELECT (CAST(@strID AS decimal(20,10))) where isnumeric(@strID)=1
     SET @strIDs = RIGHT(@strIDs, DATALENGTH(@strIDs) - @sepPos)
     set @sepPos = charindex(',', @strIDs)
  set @guess = case when isnull(@guess, 0) <= 0 then 0.00001 else @guess end
  select @NPV = SUM(value/power(1+@guess, id)) from @t_IDs
  WHILE @NPV > 0
    set @guess = @guess + 0.00001
    select @NPV = SUM(value/power(1+@guess, id)) from @t_IDs
  RETURN @guess


Saturday, October 4, 2008

Coldfusion RETS - Getting all of the metadata

As discussed earlier, the metadata is the key to communicating with a RETS server, it is schema for the data you will be dealing with. Once authenticated on the server, you will need to pull a boatload of metadata. Here is the abbreviated description of what needs to be pulled, and how it relates.

  • METADATA-RESOURCE : This file contains the "top level" resources on the server, in my case Property, Office, Agent, Tour and Open House. Is also tells you how many classes each resource has.
  • METADATA-CLASS : Describes the sub-groups within each resource. In my case, only property has more than one class (Property has 18). These should be familiar to anyone who has dealt with MLS data, in standard terms, the property types; Single Family, Multi-Unit, Commercial, etc.
  • METADATA-TABLE : This document, pulled for each class describes the fields such as Listing Data, List Price, etc. There is good detail in this document including datatype, length, if the value is a lookup or lookup-multi and if so, what the lookup key is, etc.
  • METADATA-LOOKUP : Pulled for each class, this document does not contain much information. It pretty much lists all of the lookup fields for the class. This can be pulled from the METADATA-TABLE document for the class.
  • METADATA-LOOKUP_TYPE : So you would think this would describe the multi-select, select, yes/no attributes of the lookup fields, I know I did. This document actually contains the values for the lookup columns. So, if your single family property METADATA-TABLE document says the field "Addtl Room 10 Flooring" is a lookup field, and has a lookup name of "AddlRoomFlooring", this document will have a section for that listing the values such as Carpet, Hardwood, Marble, etc. Good stuff indeed. For those who have not struggled to figure out the schema of a MLS database from the outside this might seem like no big deal. Those of us that have been doing this for years with no cooperation from the MLS systems, using cryptic listing forms and inconsistent interfaces might get a bit emotional when the schema comes across in a few seconds, haha.
So the plan; We grab the RESOURCE file, loop over it to grab the CLASS files for each resource, Loop over the CLASS file to grab the TABLE, LOOKUP and LOOKUP_TYPE files for each class. 34 files later (in my case), and we are ready to go. I am going to repost the metadata.cfc code also because I made some modifications in regards to error notification and file naming.



The output of this code should look something like this...

Getting METADATA-RESOURCE... Complete
Getting Class Agent... Complete
Getting METADATA-LOOKUP Agent... unknown resource
Getting METADATA-LOOKUP_TYPE Agent... unknown resource
Getting METADATA-TABLE Agent:Agent... Complete

Getting Class Office... Complete
Getting METADATA-LOOKUP Office... unknown resource
Getting METADATA-LOOKUP_TYPE Office... unknown resource
Getting METADATA-TABLE Office:Office... Complete

Getting Class Property... Complete
Getting METADATA-LOOKUP Property... Complete
Getting METADATA-LOOKUP_TYPE Property... Complete
Getting METADATA-TABLE Property:ResidentialProperty... Complete
Getting METADATA-TABLE Property:RentalHome... Complete
Getting METADATA-TABLE Property:LotsAndLand... Complete
Getting METADATA-TABLE Property:CommercialProperty... Complete
Getting METADATA-TABLE Property:DE... Complete
Getting METADATA-TABLE Property:AT... Complete
Getting METADATA-TABLE Property:MH... Complete
Getting METADATA-TABLE Property:MU... Complete
Getting METADATA-TABLE Property:RN... Complete
Getting METADATA-TABLE Property:DP... Complete
Getting METADATA-TABLE Property:VL... Complete
Getting METADATA-TABLE Property:MF... Complete
Getting METADATA-TABLE Property:OI... Complete
Getting METADATA-TABLE Property:BU... Complete
Getting METADATA-TABLE Property:CO... Complete
Getting METADATA-TABLE Property:RS... Complete
Getting METADATA-TABLE Property:OT... Complete
Getting METADATA-TABLE Property:IN... Complete

Getting Class OpenHouse... Complete
Getting METADATA-LOOKUP OpenHouse... Complete
Getting METADATA-LOOKUP_TYPE OpenHouse... Complete
Getting METADATA-TABLE OpenHouse:OpenHouse... Complete

Getting Class Tour... Complete
Getting METADATA-LOOKUP Tour... Complete
Getting METADATA-LOOKUP_TYPE Tour... Complete
Getting METADATA-TABLE Tour:Tour... Complete

Don't sweat the "Unknown Resource" response on some entries, if the table for a class has no lookup fields, there will be no lookup or lookup_type files.

This is not production code, it works like a champ but it is what I would consider testing code. Next up, a "field" object that pulls everything together from the xml files on a per field basis. For now, my intention is to leave the XML data as is, and not pull the metadata into a database. That is what xpath is for...

Friday, October 3, 2008

Coldfusion RETS Goals

Prior to diving into parsing the metadata, I should clarify what the initial goal of this project is. While my line of work will have me doing all kinds of interesting things with RETS, I think the best approach here is to keep it simple. To that end,  I am going to build an application that allows for a dynamic search form to be created by the client, execute the search, and see the results in a presentable manner. This will require the following.
  • Authenticate on the RETS server
  • Compare the current metadata version to what we have stored.
  • If our metadata is outdated or we don't have it yet, grab the metadata.
  • Created a form-builder that uses the metadata for it's source.
  • Parse the form submission into a DMQL2 query.
  • Present the result.
You may have notice in my earlier post, (the one that showed the RETS authentication response) that the RETS server I am connecting to does not support updating of records. This is a huge letdown in my view. Many of the real estate research applications I have built would benefit from being able to update listings.  Until I gain access to a RETS server that does support updating, I obviously won't be covering those topics.

Sunday, September 28, 2008

Coldfusion RETS Getting the MetaData

Once logged into the server we need to get the metadata to decipher the schema and know what fields are available to us. It is really important to read through section 11 and 12 of the RETS 1.5 Specification . It is a painfull read for sure, and it is by no means a developers guide, but the metadata is what RETS is all about. The metadata is arranged in a hierarchy, with the idea being you can grab the top level documents, which will have the IDs to the next level, etc. Trust me when I tell you that you will wind up poking around seeing what you can pull up, and what seems to make sense. The top nodes of the metadata are SYSTEM and RESOURCE. In MRED the SYSTEM node is all but useless...

Yeah, thanks for that, RESOURCE is more interesting...

Now we are getting somewhere. Anyone familier with MLS data will recognize these as the top level of MLS databases. I ommited the last two for space, OpenHouse and Tour, but you get the idea. The Property resource is what it is all about, and we can tell there are 18 classes under it, and the key field is LN (ListingNumber). Before we proceed, lets run down the code used to pull the metadata.

Again, nothing fancy for the demostration purposes. We are ACCEPTing anthing the server chooses to send us back. We are passing the first row from the application.RetsCookie struct that we set earilier. The last three arguments is where it all happens, and where reading the spec will come in handy. TYPE and ID are how you navigate the nodes, FORMAT determines what the returned XML will look like. There are three FORMAT values, COMPACT, COMPACT-DECODED and XML-STANDARD. For pulling metadata, the FORMATS don't really matter much (at least as far as I can tell, I am still learning). Try pulling the same node in each and you will see what I mean. The FORMAT will be critical when pulling actual data later, so we will come back to that. I am going to cut to the chase and tell you what TYPEs and IDs I pulled to get what I needed. Use format COMPACT for now.


Browse the XML files and you should start to get a pretty good handle on where things are headed. Next up, let's do something with this METADATA...