Update on Object Detection

Hello! I wasn’t sure whether to put this in General or Feature Requests. I have several years’ experience training deep learning models, but using online/web-based tools. Recently, I stumbled on your website and watched all of the getting-started videos. I am very encouraged and eager to get started with your tools: they give me a level of control and tuning over my model that you do not get on most web-based tools out there.

That said, we mainly work with object detection models (YOLO and MobileNet SSD) and I was bummed to see you don’t support that yet. I did read that (maybe) you are working on it? Can I get an update on this, so I know whether to keep your tool on my radar and when I might check back?


Hi @mbedford

Nice to have you around - especially with all that experience!

Just a quick comment from me: things like MobileNet are mentioned in the documentation, e.g. here, and, as @misakss mentions in this thread, you can import from tensorflow_hub (see esp. post #5 to resolve a typo in the initial post).

If any of the models you want to use are available from tensorflow_hub, that’s the way to go, I think.

(BTW - for others: MobileNetV2 (not what you are looking for, I know) and similar are available under the “Public” menu in the modelling tool… )

Out of curiosity, what have you been doing with object detection? The PL guys are really helpful, and if you care to say a little bit more about your use case they’ll certainly be more helpful than I can be right now :wink:

Hi @mbedford and welcome to the forum!

Object Detection is the next model type we will work on getting into the tool - expect to see something early next year :slight_smile:
As @JulianSMoore mentioned, any info on use cases or OD preferences you are willing to share will help us prioritize which part of OD to tackle first.

Best regards,

Hello @robertl and @JulianSMoore. Thank you so much for the response and warm welcome. Here is a quick background to give a bit more detail. We do object detection to detect vehicles in traffic in real time, as they pass under our cameras. As you can imagine, inference speed is critical here, so we tend to choose model frameworks that strike a good balance between detection speed and accuracy. We do other things to help, like running with hardware acceleration (Intel Neural Compute Stick 2/Myriad via OpenVINO, Google Coral TPU and, of course, NVIDIA RTX cards) depending on the application.

In past years, we have bounced around between Google Cloud Vision, Microsoft CustomVision.ai and AWS to train our models. Each platform gives great web-based upload, annotation, training on cloud hardware and export capability. However, they are rather limited with regard to the base model framework. For example, we mainly use customvision.ai because it is a great, user-friendly tool with tons of export options (Microsoft actually did a decent job here). But I have no idea whether the result is a YOLO model, MobileNet, etc., and I have no option to select a different one, tune the model, analyze the model, etc., the way your tool allows. Also, while Microsoft does provide code samples in Python and C# when you export the model, to help you understand how to consume it (it does appear to be some custom base model type/pipeline), it is still difficult to get up and running: off-the-shelf scripts to convert the model, quantize it, or whatever mostly fail, likely because it is not a typical base model type.

Sorry if this response was a bit much, just wanted to share what we use the model for as well as what tools we have used in the past.

Transfer learning from the COCO dataset (which usually includes cars), followed by training on our custom images, captured from our cameras and in our own scenes, gives us the best results. Aside from being conscious of inference speed, I am not tied to any one framework in particular. It could be YOLO, MobileNet, MobileNetV2, etc.



Thank you very much for the information @mbedford, that’s awesome! :pray::slight_smile:

No :smiley: that was great for me too… I think your philosophy is sound - off-the-shelf is great for proof-of-concept or initial operating capability (to use a mil. term), but when you want to optimise for what is important to you, you need to get under the hood in a way that the OTS offerings don’t always support. PL could be very good here - especially given the insights in evaluation.

You mentioned a number of technologies I hadn’t even heard of - and must now investigate (as a background task :wink: )

One quick question: vehicles where? I can imagine that traffic behaviour varies a lot around the world (well, imagination not really required: I’ve seen Mumbai, Cairo, Bangkok… for myself and I probably would want to manage London traffic based on what I saw in those places!)

Hello @JulianSMoore! Glad the response was useful for you all.

Which technologies are you referring to? The other web-based tools we use? I will be happy to elaborate in another thread or over email if you like. But in general, if you are doing your own research, yes, check out customvision.ai as an example. Google Cloud Vision also has something similar. But again, they have their cons too.

I know what you are getting at with your question. Traffic, and its behavior, can be quite challenging. The good news is that our traffic is controlled: we are typically monitoring parking lots and private drives, not high-speed highways with congestion.

That said, we still can have some minor partial occlusion at times if two cars are following closely or something like that.
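(Side note, in case it is useful context: the overlap that makes those occlusion cases tricky is the same IoU measure detectors use in non-max suppression, and the NMS IoU threshold is often what decides whether two closely-following cars come out as one box or two. Here is a minimal pure-Python sketch - the boxes and scores are made up purely to illustrate the idea, not from our pipeline:)

```python
# Minimal sketch of IoU + non-max suppression (pure Python, illustrative only).
# Boxes are (x1, y1, x2, y2) tuples; the values below are made up.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep highest-scoring boxes, dropping any box that overlaps a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two closely-following "cars" plus a near-duplicate detection of the first one.
boxes = [(10, 10, 60, 60), (40, 10, 90, 60), (12, 12, 58, 58)]
scores = [0.9, 0.8, 0.6]
print(nms(boxes, scores))  # the near-duplicate third box gets suppressed
```

With the 0.5 threshold the two partially-overlapping cars survive as separate detections while the near-duplicate is suppressed; lower the threshold and the two cars start merging into one.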

Those two technologies, for starters :wink: I’ve now found the info on the Intel Neural Compute Stick.

I’m curious about how one benchmarks different tech to work out whether you can do what you want on the desktop, at the edge, etc. - or is it just “see what works?”
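(At the simple end, I imagine it starts with just timing the forward pass on each device and comparing. A minimal latency-benchmark sketch - the inference function below is a stand-in, not any real framework’s API:)

```python
# Minimal latency-benchmark sketch. `fake_inference` is a stand-in; in practice
# you would call the real forward pass (OpenVINO, TFLite, TensorRT, ...).
import statistics
import time

def fake_inference(frame):
    # Stand-in for a real model forward pass.
    return sum(frame) % 255

def benchmark(infer, frame, warmup=10, runs=100):
    """Return (median, p95) latency in milliseconds for one inference call."""
    for _ in range(warmup):          # warm-up: caches, lazy init, JIT
        infer(frame)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(frame)
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return statistics.median(times), times[int(0.95 * len(times)) - 1]

frame = list(range(10_000))          # stand-in for an image tensor
median_ms, p95_ms = benchmark(fake_inference, frame)
print(f"median {median_ms:.3f} ms, p95 {p95_ms:.3f} ms, ~{1000 / median_ms:.0f} FPS")
```

I’d report median and p95 rather than the mean, since one scheduler hiccup or GC pause can distort an average badly.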

> The good news is our traffic is controlled. It is typically monitoring parking lots and private drives, not high speed highways with congestion.

I saw something online that someone had put together to find vacant parking spaces - very handy in town!

Partial occlusions - of course, this is where AI shows its incompleteness… add physics to identification - understanding solidity, velocity, etc. - and it would be easier. But I wouldn’t mind betting that the biggest and best “dumb” models of traffic end up inferring an “effective physics” which, unfortunately, never gets re-used anywhere else.

(I think a lot about Generalised AI and bringing all the different modes - vision, language, etc. - together; I know why things are the way they are, but I think we are on the edge of being able to do exponentially better with e.g. cross-modal transformers. YMMV :smiley: )

Good luck and don’t hesitate to ask for input!