Monday, 20 June 2016

Simultaneous connections

We have a machine in one of our interactive installations that runs a simple web server and an interactive projection application.

The web server is just a WAMP install.

The interactive application is just a map that talks with a comms server and displays some information.

One of the interface features is that you can access it via a tablet app and select what content the map should show. It all works pretty well except for one content item that requests a slideshow of 23 images.

Other content items have slideshows, and the system handles them perfectly. This one has 23 images, and that causes some weird behavior on the tablet app: it starts losing network connectivity and sometimes fails to load the slideshow. Sometimes that failed load makes the app crash.

So why would a system work fine with 5, 10, 15 images and not 23?

We had 3 theories:
a) the wireless is shit
b) the app code is crap
c) the server can't handle the load

This is a permanent installation (in a museum), and we do its maintenance. So for the last couple of years we have been reminded, on and off, that it's still not fixed, and we have kept trying to get to the bottom of it.

To address a) we did 3 things:
1) optimize router configuration
2) replace a wireless bridge with an Ethernet-over-powerline adapter
3) optimize router configuration some more

It did not improve the situation.

So we looked into b) and ran some tests on a local server and network, and it runs fine: it manages to load tons of images into a slideshow, no problem. A version update also seems to improve the communication instability that was sometimes making the application crash, so we should definitely update the tablets. But if this runs fine, the problem is not really in the app code; it has to be on the server. That said, the code could still be optimized to load images one at a time instead of all 23 simultaneously.
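
The app-side optimization mentioned above can be sketched as a bounded-concurrency loader. The idea is to load images one (or a few) at a time instead of all 23 at once. This is a minimal Python sketch, not the tablet app's code, which isn't shown here. `load_all`, `simulate` and the slideshow URLs are hypothetical, and `fetch` stands in for whatever download call the app really uses.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def load_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrent: int = 1,
) -> List[bytes]:
    """Fetch every URL while never having more than `max_concurrent`
    requests open at once. Results come back in the same order as `urls`."""
    slots = asyncio.Semaphore(max_concurrent)

    async def guarded(url: str) -> bytes:
        async with slots:  # wait here until a slot is free
            return await fetch(url)

    return list(await asyncio.gather(*(guarded(u) for u in urls)))


async def simulate(n_images: int, max_concurrent: int) -> Tuple[int, int]:
    """Run load_all against a fake, slow fetch and report
    (images loaded, peak number of simultaneous requests)."""
    in_flight = 0
    peak = 0

    async def fake_fetch(url: str) -> bytes:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for network latency
        in_flight -= 1
        return url.encode()

    # Hypothetical URLs; the real app's image paths aren't known.
    urls = [f"http://server/slideshow/{i}.jpg" for i in range(n_images)]
    images = await load_all(urls, fake_fetch, max_concurrent)
    return len(images), peak


if __name__ == "__main__":
    print(asyncio.run(simulate(23, max_concurrent=1)))  # (23, 1)
```

With `max_concurrent=1` the server only ever sees one image request at a time from each tablet. Raising it to 2 or 3 would be a middle ground between total load time and server load.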

It's much easier to fix a server config than it is to reflash 4 tablets, though, so I took a look at the server config. After much head scratching, fiddling with WAMP configurations and running stress tests, I concluded the culprit was the OS version, Windows 7 Home edition. It does not like multiple simultaneous connections much (to prevent viruses from spreading, they say). There were some registry hacks to try to unlock the limits, but the problem seemed to persist...
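
The stress tests aren't described in detail here. One way to reproduce the tablet's burst is to open the same number of requests at once and count how many fail. This is a minimal sketch using only Python's standard library. `stress_test` and `fetch_status` are hypothetical names, and the `fetch` parameter only exists so the function can be exercised without a real server.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for one GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # drain the body like a real client would
        return response.status


def stress_test(
    url: str,
    simultaneous: int = 23,
    fetch: Callable[[str], int] = fetch_status,
) -> Tuple[int, int]:
    """Fire `simultaneous` requests at once; return (succeeded, failed)."""

    def attempt(_: int) -> bool:
        try:
            return fetch(url) == 200
        except OSError:
            # Refused, reset and timed-out connections, plus 4xx/5xx
            # responses (urllib raises HTTPError for those), land here.
            return False

    # One worker per request, so all of them are in flight together.
    with ThreadPoolExecutor(max_workers=simultaneous) as pool:
        results = list(pool.map(attempt, range(simultaneous)))
    ok = sum(results)
    return ok, simultaneous - ok
```

For example, `stress_test("http://192.168.1.10/slideshow/1.jpg", simultaneous=23)` returns a `(succeeded, failed)` pair; the address is hypothetical. Running it with 5, 10, 15 and 23 would show whether failures start at a particular number of simultaneous connections.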

So we decided to update the OS, and that was where today began:

When upgrading Windows you should always make a backup. Actually, the easiest way to upgrade Windows without problems is to do a fresh install. So we got a new HD of the same size and swapped it into the machine. Struggled a bit with MSDN to get a copy and a key for Windows Server 2012 R2. Figured out there is an official Microsoft USB/DVD boot creator tool available, and flashed the OS onto the USB 3.0 pen drive.

Stuck the USB pen drive into the machine and... it won't boot. Why is it not booting? I have recollections of some USB pen drives needing to be formatted a certain way to be live-booted; could this be it? It doesn't even seem to show up on the boot selection screen. The simplest solution is usually the correct one: it turns out the USB 3.0 ports are not used at boot, only the 2.0 ones are. So just moving the pen drive to another slot worked.

Alright, it boots, it installs Windows Server 2012 R2, I enter the key. It forces me to type a hard-to-guess password. I would really prefer to have it log into the account automatically at boot, but I'll look into that after everything is set up.

Used the other disk in a docking station to copy the relevant files. The application won't run. It says it's missing XINPUT9_1_0.dll. Spent a few minutes online trying to figure out which redist I need to install to get rid of that, because you should never just download a .dll from a random internet site and use it. The latest DirectX doesn't fix it. The VC++ 2012 redist doesn't fix it. Oh, never mind, let's just copy it from the old disk. It's in system32 there, so I copy it into system32 on the new disk as well. Except the app still can't find it. Maybe because of the 64-bit OS? (Likely, if the app is 32-bit: 64-bit Windows redirects 32-bit programs from System32 to SysWOW64.) Let's just copy it to the root of the app's directory.

The app no longer complains about XINPUT9_1_0.dll; now it just crashes with a non-descriptive error. Great! Took a look at Event Viewer: no information about the crash whatsoever. So I guess we should debug this the old way: install Visual Studio 2012 Express, compile the project and see what's crashing.

The project is a libcinder 0.8.5 project using Awesomium 1.7.1 as a Cinderblock. The original Cinderblock repo disappeared, but there are forks around on GitHub for the latest version of libcinder and Awesomium 1.7.5. I'm not falling for that old trick of "let's just update a version while I'm at it" and then spending the rest of the day debugging why it won't compile; I'm sticking to the original versions. Luckily the dev environment was already installed on the original disk. So it's mostly a matter of copying the folder and making sure every folder keeps the same name in the same place, and it just works. It compiles on the first try, but the app still crashes. It gives a different error, though: another missing .dll.

Alright, time out from that crap. Let's try to get the other part of the thing working: installing WAMP, how hard can it be? Just download, double-click and... error launching httpd, missing .dll file. Great! Reading the internets, it turns out Apache really needs a certain version of the VC++ redist: version 7 requires the VC++ 2015 redist. Alright, simple enough, just download it from the Microsoft site and... it fails to install. Great! Reading the log file, it turns out it's trying to install a Windows 8.1 MSU of some sort and failing. There are tons of people complaining about the same problem on the internet (why didn't anyone fix this? Who knows!). Finally I found some hacks instructing you to go to the temp folder, copy that file out somewhere else, run some voodoo commands and then run the VC++ redist repair. It actually works! I always wanted a properly working VC++ 2015 redist on Windows Server 2012 R2, thank you internet. Unfortunately WAMP still complains and will not launch, even after a reinstall.

Here's a brilliant idea: let's try an older version of WAMP instead. WAMP 2 downloaded, installed, and it works: green light, hooray! The only thing left is to change the default root password, create the database and import the tables from the previous database. Then add the aliases to our application directories, restart WAMP and... "Forbidden access to this content". OK, might be something wrong in the configuration, I guess; nothing an hour of googling and trial and error won't fix. The WAMP version was slightly different, so copying the alias configurations from the old disk wasn't working. Finally WAMP seems accessible, so on to more important things.

Now there was the issue of the user not logging in automatically and always needing to enter a password. Well, yes, it is a bit of a security risk. But this machine is hanging 20 meters in the air, so I don't think anyone will manage to plug a keyboard in there and access it unlawfully, and I would really like automatic user login. In the old days you just went to the user accounts screen and selected that you did not want a password for this user. But that was then. Now you always need a password, so the checkbox you have to uncheck is more hidden. On any other Windows machine you just go to the user accounts _control panel_ (not the management screen) and you have the option there. But Windows Server 2012 R2 is special: it doesn't have that checkbox. It would be too much of a security risk. So we waste some more time searching the internets and find that you can just add a registry string in a certain place to make the checkbox show up again. Hooray, we can boot and log in automatically.

Now the only thing left to fix is the application not running. I tested a few things, like creating a new basic app to see where the issue could be. It turns out I should probably install the Awesomium SDK itself first, even though libcinder doesn't seem to use it at all, since it has its own directory with the Awesomium include, lib and binaries. Testing those, they don't work either. So maybe Awesomium doesn't like Windows Server 2012 R2.

Forgot to update the graphics drivers, so maybe that could also be it. Let's spend half an hour installing 1 gigabyte of drivers (plus the latest update) for our graphics card. I swear I have no idea what they include in those drivers. 15 years ago a driver was 200 KB. I don't know how something that does the same job for a newer model could have grown exponentially. Sometimes I think they bundle every version of every game on the market into the driver, to patch the binaries and optimize them for the card. But who knows, really? After updating the graphics drivers and rebooting, the application still won't run.

While looking into Awesomium I end up installing the latest version of the SDK, and its binaries seem to run without crashing. So maybe I should be updating all the code to the latest versions? But the Cinderblock doesn't seem to care where Awesomium is installed at all, and it doesn't even seem to compile Awesomium itself. So maybe just replace the Cinderblock's references to use the new version? Replacing entire directories gives no compile errors, but the app still crashes. It's strange, because usually these changes give a shitton of errors, and this one gave none. So maybe it's not even using them?! After much stress and doubt about how best to proceed to save time, I end up downloading the 1.7.1 SDK. I copy those binaries into the Cinderblock reference and the app's release folder (they even include the forsaken XINPUT9_1_0.dll), and it works!!

Finally the system seems to be running again, after I had already given up hope of finishing in time. All that's left is to test it and see the huge performance increase in action. So I load the tablet and... it gets stuck on the first loading screen. But something shows up on the map, so it's partially working. I had the alias configuration problem earlier on, but that was fixed, so what could it be? After some head scratching and local tests, I decided to try accessing it from another machine, and it turns out the firewall needs to be opened.
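
Local tests can't catch this, because requests from the server to itself typically aren't filtered by Windows Firewall. A quick check is to run the following from another machine on the network. It tests whether the web server's port accepts TCP connections at all. This is a minimal Python sketch: `port_reachable` is a hypothetical helper, and port 80 is an assumption about which port WAMP's Apache listens on.

```python
import socket


def port_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects in one call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing listening) and timeouts,
        # which is what a firewall silently dropping packets looks like.
        return False
```

Run `port_reachable("192.168.1.10")` from the other machine; the address is hypothetical. If it returns `False` while the page loads fine on the server itself, that points at the firewall rather than the WAMP configuration.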

So finally I can grab a tablet and load a slideshow of 23 images: works fine, excellent! Then the second tablet, load the slideshow of 23 images... and it loses connectivity. What?! Maybe it was a bad cache from earlier use? I try it on a freshly booted tablet, and it also loses connectivity. Well, isn't that awesome? Read more about WAMP configuration to increase the number of simultaneous connections, turned everything up, restarted WAMP: still the same problem. Plus, while testing, I realize the videos no longer play in the map application.

So, long story short: one whole day spent replacing an OS and getting everything working again, only to conclude it is not the OS's fault. Plus, the new install still isn't working 100% like the old one did. Will have to look into it more later; probably recode the tablets to load only 1 image at a time, or try another version of WAMP.

Spent another hour trying to debug why this build of the updated Awesomium stopped playing WebM. The HTML that Awesomium loads plays the video fine on its own, so it really is Awesomium that stopped playing WebM. Tried a few different settings, but to no avail. Maybe a more recent version will fix it?

But I think I've had enough fun with this for today.