I am making some progress on speeding it up.
What I have done so far is cache, in the GPIO structure, the fact that I have already initialized the output-enable pin for this specific physical pin. Inside mraa_gpio_dir, MRAA calls the Edison pre-processing function, which was creating a GPIO structure for the output-enable pin and setting it to output on every single call. Also, every call to gpio_init_raw currently checks whether the pin has already been exported. So I created a new version, mraa_init_raw_fast, that skips this check (and moved most of the rest of the original function into it). Doing that work only once speeds things up.
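Here is a rough model of the two changes in plain C, just to illustrate the idea. Everything in it (gpio_ctx, outen_ready, sysfs_ops, and the function names) is hypothetical. It is not the actual MRAA code in the branch, and it uses a counter in place of the real sysfs work.

```c
#include <stdbool.h>

/* Hypothetical model, NOT real MRAA structures or functions.
 * sysfs_ops counts simulated slow sysfs operations. */
static int sysfs_ops = 0;

typedef struct {
    int pin;
    bool outen_ready; /* cached: output-enable pin already set up */
} gpio_ctx;

/* Simulated check of whether the pin is already exported in sysfs. */
static bool is_exported(int pin)
{
    (void) pin;
    sysfs_ops += 1;
    return true;
}

/* Fast init: skips the export check entirely. */
static gpio_ctx gpio_init_raw_fast(int pin)
{
    gpio_ctx dev = { pin, false };
    return dev;
}

/* Normal init: checks (and exports if needed), then does the shared work. */
static gpio_ctx gpio_init_raw(int pin)
{
    if (!is_exported(pin)) {
        sysfs_ops += 1; /* simulated export */
    }
    return gpio_init_raw_fast(pin);
}

/* Simulated expensive step: set up the output-enable GPIO as an output. */
static void setup_output_enable(int pin)
{
    (void) pin;
    sysfs_ops += 2; /* e.g. open the pin + write "out" to its direction */
}

/* Board pre-processing hook called from the direction setter. */
static void edison_dir_pre(gpio_ctx *dev)
{
    if (!dev->outen_ready) {
        setup_output_enable(dev->pin);
        dev->outen_ready = true; /* remember, so later calls skip the work */
    }
}

/* Direction setter: pre-processing, then the pin's own direction write. */
static void gpio_dir(gpio_ctx *dev)
{
    edison_dir_pre(dev);
    sysfs_ops += 1;
}
```

With the cache, the first gpio_dir call pays for the output-enable setup and every later call only does the single direction write.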
I still have some debug code in it to clean up, but it is faster as you can see in the following picture:
1.5 ms (new) vs 2.7 ms (before).
Again, this is still a WIP, but if anyone wishes to look, it is up on my MRAA fork on GitHub in a new branch, gpio-speedup.