From time to time I look at bailiff auctions to check if there are any interesting properties for sale. If I want to see where they are located, I have to go into the details of each auction. I felt it would be great to see all of them on a map, so I could filter them by location or district.
I decided to write an application in Fable that would gather information about auctions. Based on that information, I want to show markers on a map, each with a price, the date of the auction, and a link to the full details of the auction inside it.
I started by creating a blank project (thanks to the SAFE Template):
Thanks to the above I had a blank project, so I could start writing the application logic. First of all, I want to download information about available estate auctions in a city. I decided that by default I would download data for Gdańsk. Looking at the site with auctions, I see that communication is done with forms:
The page gives a lot of filtering options. In my situation only 2 of the 26 options are relevant: the city and the type of auction. Because of that, I looked at what the request sent by the page looks like.
Knowing what the request looks like, I could implement it in code. I started by creating a new project named Application in my solution; all the logic related to downloading and parsing data lives there. The page sends a simple POST request setting 2 of the 26 fields, so the F# rewrite looks as follows:
I used the Http.fs and Hopac libraries. In response, I get a full HTML page, which I have to parse to gather the data I want to show on the map.
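A minimal sketch of such a request with Http.fs and Hopac could look like this (the URL and the two form-field names are placeholders of mine, not the site's real ones):

```fsharp
open Hopac
open HttpFs.Client

// Sketch of the list-download request; the URL and the form-field names
// ("city", "auctionType") are placeholders, not the real ones from the site.
let downloadAuctionList (city: string) =
    Request.createUrl Post "https://example-auctions-site/search"
    |> Request.body (BodyForm [ NameValue ("city", city)
                                NameValue ("auctionType", "estate") ])
    |> Request.responseAsString   // a Hopac job yielding the HTML body
    |> run                        // run it synchronously
```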
Moving on to parsing. I was thinking about using HtmlDocument, HtmlProvider, and HtmlAgilityPack. Because the app is written in F#, the last option was out. To keep the source files small (I don't want files hundreds of lines long because of the templates "sent" to HtmlProvider), the second option was also rejected. So I used HtmlDocument. The list of auctions looks like this:
On the map, I want to show the location of each property being auctioned. This information will be visualized as a marker on the map, with a popup containing the data described earlier. Because of that, for each record on the list I have to gather the:
price of the property;
date of the auction;
link to details.
So I wrote code like this:
As you may notice, first I parse the document, look for the first table on the page, and check whether it has more than one record besides the header. Otherwise, I return an empty list.
If the data seems correct, I gather the concrete information. The price and the details link are located in the last two columns of the table, so the following fold/reduce line gathers only those two pieces of information.
Beyond that, I also get the date of the auction. So here is the full code for parsing the list of properties:
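For illustration, a sketch of such parsing with FSharp.Data's HtmlDocument might look like this (the record fields and the exact column layout are my assumptions, not the author's actual code):

```fsharp
open FSharp.Data

// Hypothetical record for one auction row; the field names are illustrative.
type AuctionRow = { Date: string; Price: string; DetailsLink: string }

let parseAuctionList (html: string) =
    let doc = HtmlDocument.Parse html
    match doc.Descendants "table" |> Seq.tryHead with
    | None -> []
    | Some table ->
        let rows = table.Descendants "tr" |> Seq.toList
        if List.length rows <= 1 then []        // only a header row: nothing to show
        else
            rows
            |> List.skip 1                      // skip the header record
            |> List.map (fun row ->
                let cells = row.Descendants "td" |> Seq.toList
                // assumed layout: price and details link in the last two columns
                let link =
                    (List.last cells).Descendants "a"
                    |> Seq.tryHead
                    |> Option.bind (fun a -> a.TryGetAttribute "href")
                    |> Option.map (fun attr -> attr.Value ())
                    |> Option.defaultValue ""
                { Date = (List.item 0 cells).InnerText().Trim()
                  Price = (List.item (cells.Length - 2) cells).InnerText().Trim()
                  DetailsLink = link })
```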
Right now I have the full list of auctions. The only thing I still need is the location of each home. I can get it from the details of an auction, which looks like this:
So now, for each sale, I download the full details page. This fetch is much easier than the previous one, because I can use a simple GET request, which looks like this:
After the download, I have to extract the location, which is stored in the hidden_address attribute of an input. In addition, I also download the description of the auction. The code that does that:
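A sketch of both steps, assuming the same Http.fs/Hopac setup as before and FSharp.Data for the HTML:

```fsharp
open FSharp.Data
open Hopac
open HttpFs.Client

// Fetch a details page with a plain GET request.
let downloadDetails (detailsUrl: string) =
    Request.createUrl Get detailsUrl
    |> Request.responseAsString
    |> run

// Pull the address out of the input carrying the hidden_address attribute.
let extractAddress (html: string) =
    let doc = HtmlDocument.Parse html
    doc.Descendants "input"
    |> Seq.tryPick (fun input -> input.TryGetAttribute "hidden_address")
    |> Option.map (fun attr -> attr.Value ())
```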
The address is in the format city, street, house number. Because I want to show it on a map, I need to somehow translate it into longitude and latitude. This is why I pass the human-readable address to a translateAddressToCoord function, whose task is geocoding. To achieve it, I use the Nominatim service, to which I send a GET request, and then parse the response to get lng/lat.
Here I decided to use JsonProvider, because the response from the service has a strict format. After sending a request, I parse the response via JsonProvider and then gather the longitude/latitude information.
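A sketch of such a translateAddressToCoord, assuming Nominatim's public search endpoint (the sample JSON and the User-Agent value are mine):

```fsharp
open FSharp.Data

// Sample response shape used to drive JsonProvider; Nominatim's search
// endpoint returns an array of hits with lat/lon as strings.
[<Literal>]
let NominatimSample =
    """[ { "lat": "54.35", "lon": "18.64", "display_name": "Gdansk" } ]"""

type NominatimResponse = JsonProvider<NominatimSample>

// Geocode a human-readable address into (lat, lng); a User-Agent header
// is required by Nominatim's usage policy.
let translateAddressToCoord (address: string) =
    let url =
        sprintf "https://nominatim.openstreetmap.org/search?format=json&q=%s"
                (System.Uri.EscapeDataString address)
    Http.RequestString (url, headers = [ "User-Agent", "auction-map-demo" ])
    |> NominatimResponse.Parse
    |> Array.tryHead
    |> Option.map (fun hit -> float hit.Lat, float hit.Lon)
```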
Right now I have all the data I need, so the only thing left to do is combine it:
and return it as a model, which is defined like this:
The mapping looks like this:
Now I can move on to the definition of the interface between the backend and the frontend. The interface looks like this:
It gives the possibility to download the default data, or data filtered by some keyword (a city name).
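As a sketch, a shared contract of that shape might look like this (the record and field names are my assumptions, not the author's actual definitions):

```fsharp
// Shared between client and server; all names here are illustrative.
type Auction =
    { Price: string
      Date: string
      DetailsLink: string
      Description: string
      Lat: float
      Lng: float }

type IAuctionApi =
    { getDefault : unit -> Async<Auction list>       // default data (Gdańsk)
      getFiltered : string -> Async<Auction list> }  // filtered by city name
```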
Now that the shared model between client and server is done and the server is ready, we can move to the frontend part of the application, which needs to be adjusted. I start by showing a blank map, so I have to install 2 libraries:
The next thing is to define the messages that will be sent across the application. I created the following messages:
I thought that every user could:
change input (SearchChanged);
submit input (Search).
And the server could respond with:
initial data (Init);
filtered data (Filtered);
an error (Error).
With the messages defined as above, we can move on to message handling.
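As a sketch, the message type described above could be modeled like this (the payload types, and the Auction record from the shared model, are my assumptions):

```fsharp
// One case per message described above; payload types are assumptions.
type Msg =
    | SearchChanged of string        // user edits the search input
    | Search                         // user submits the input
    | Init of Auction list           // server responds with initial data
    | Filtered of Auction list       // server responds with filtered data
    | Error of exn                   // something went wrong
```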
Based on the type of a message, I either change the current state of the application (SearchChanged, Error, Init, Filtered) or ask the backend for data (Search). Also, if a message carries an error inside, I send another command with Error.
The code looks pretty simple. You may notice that we have a cascade of messages here: for example, when a user submits the input, a server action is invoked; when it returns some data, a Filtered or Error message is sent. One handler looks a little different: the Filtered message handler, where the location of the first home is also calculated so the map can be moved to that point. No matter which city the user searches for, the map will auto-adjust to a valid region. Of course, I could compute the centroid of all points here, but I want to keep it very simple.
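A sketch of such an Elmish update function, assuming a Model record with SearchText, Auctions, MapCenter, and ErrorMessage fields and an api remoting handle (all of these names are my assumptions):

```fsharp
open Elmish

let update (msg: Msg) (model: Model) : Model * Cmd<Msg> =
    match msg with
    | SearchChanged text ->
        { model with SearchText = text }, Cmd.none
    | Search ->
        // ask the backend; success becomes Filtered, failure becomes Error
        model, Cmd.OfAsync.either api.getFiltered model.SearchText Filtered Error
    | Init auctions
    | Filtered auctions ->
        // recenter the map on the first result so it follows the searched city
        let center =
            auctions
            |> List.tryHead
            |> Option.map (fun a -> a.Lat, a.Lng)
            |> Option.defaultValue model.MapCenter
        { model with Auctions = auctions; MapCenter = center }, Cmd.none
    | Error e ->
        { model with ErrorMessage = Some e.Message }, Cmd.none
```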
The definition and handling of messages are done, so I can show a map without any markers. I will show a Leaflet map. To add it, I wrote the following code:
I define the zoom, width, height, and center of the map. There are some key aspects that you can't omit if you want to show a Leaflet map:
you have to set a width/height for the map, otherwise the map won't show;
you have to set the map center, otherwise the map will be grey;
you have to add a tile layer;
you have to import the Leaflet styles;
you have to change the default image path for the icon, otherwise you will get errors in the console;
you should not forget to add the Leaflet packages to package.json.
With the above in mind, the map should be visible.
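For reference, a blank-map view with the Fable.ReactLeaflet bindings could look roughly like this; the binding names below are assumptions from memory, so check them against the library before use:

```fsharp
open Fable.Core.JsInterop
open Fable.React.Props
module RL = ReactLeaflet

// A blank Leaflet map; an explicit width/height and a tile layer are
// mandatory, otherwise nothing (or a grey box) is rendered.
let mapView (center: float * float) =
    RL.map
        [ RL.MapProps.Zoom 12.
          RL.MapProps.Center !^ center
          RL.MapProps.Style [ Height 600; Width 1000 ] ]
        [ RL.tileLayer
            [ RL.TileLayerProps.Url "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png" ]
            [] ]
```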
Having a blank map, I can now show some markers on it. Earlier I showed how to handle the messages carrying auction data; these auctions are available in the app state, so I can show markers for them. This is why the mapElements function was created:
For each element in the table I get from the server, I create a marker, set its position, and create a popup with a short description of the auction (price, date, link to details).
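A sketch of such a mapElements function (again, the ReactLeaflet binding names and the Auction fields are my assumptions):

```fsharp
open Fable.Core.JsInterop
open Fable.React
open Fable.React.Props
module RL = ReactLeaflet

// One marker per auction, with a popup holding price, date and a details link.
let mapElements (auctions: Auction list) =
    auctions
    |> List.map (fun auction ->
        RL.marker
            [ RL.MarkerProps.Position !^ (auction.Lat, auction.Lng) ]
            [ RL.popup []
                [ str (sprintf "%s, %s" auction.Price auction.Date)
                  a [ Href auction.DetailsLink ] [ str "details" ] ] ])
```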
The map and markers are visible. The only missing thing is an input that accepts a string and triggers the search "function" after pushing the submit button. The code responsible for rendering it:
The handler of the Search message (just a reminder):
The whole application is ready, so I create a Docker image:
A small adjustment to build.fsx, so my image gets an additional tag.
Push it to Docker Hub:
Creation of a Web App on Azure:
When the deployment is ready, I need to do one additional thing (as the SAFE stack docs state): map port 8085, which is used by Giraffe, to port 80.
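With the Azure CLI, that port mapping can be sketched roughly like this (the resource-group and app names are placeholders of mine):

```shell
# WEBSITES_PORT tells App Service which container port to expose on 80/443;
# the resource-group and app names below are placeholders.
az webapp config appsettings set \
  --resource-group my-resource-group \
  --name my-auction-app \
  --settings WEBSITES_PORT=8085
```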
This is how the application finally looks:
Right now I could finish this article. But as you may see in the picture above, the map has no markers. Why? Because the page I scrape the data from changed a little between the time I implemented the application and now. Every request must now include a _requestValidation field and a cookie. Instead of dealing with that, I decided to use canopy to grab the full page after some on-site filtering. I modified the code responsible for downloading the list of auctions:
As I said, I used the canopy library. I open a headless Chrome browser, which is hidden from the client, open the page, search for a city, and click "search". In the end, I download the full page, and that's all. Parsing looks the same; no changes required.
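A sketch of that canopy-based download; the URL and the CSS selectors ("#city", "#search") are placeholders, not the site's real ones:

```fsharp
open canopy.classic
open canopy.types

// Download the fully rendered listing page via a headless browser.
let downloadListingPage (city: string) =
    start BrowserStartMode.ChromeHeadless   // headless Chrome, hidden from the client
    url "https://example-auctions-site"     // placeholder for the auction site
    "#city" << city                         // fill in the city filter (assumed selector)
    click "#search"                         // submit the on-site search (assumed selector)
    let html = browser.PageSource           // grab the page after the search runs
    quit ()
    html
```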
The code for downloading a single auction was also modified to use canopy. I figured it would be a huge performance benefit to open the browser once and, for each auction, navigate to its page and download it. Downloading all auctions at once is achieved like this:
And it is invoked like this:
And finally the application looks like this:
The only downside of the new solution is performance; the previous solution was a lot faster.
Right now everything should work just fine locally, but I need some adjustments so canopy can run inside Docker. To make it happen, I have to do the following things:
copy the chrome driver in server.fsproj to the output folder;
install Chrome in the Docker image.
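The second step could be sketched as a Dockerfile fragment like this (an assumption of mine, not the author's actual Dockerfile):

```shell
# Dockerfile fragment: install Chrome so canopy's headless browser can run
# inside the container (Debian-based image assumed).
RUN apt-get update && apt-get install -y wget gnupg \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" \
         > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update && apt-get install -y google-chrome-stable
```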
To summarize: in this article I showed how to combine two ways of scraping data from a webpage (when it doesn't have an API), and how to write a simple application that simply does something in Fable. I hope you enjoyed this article :)