That would be fine if there were only two kinds of loader. But things got complicated as the experiments proceeded. Over the months, we tried dozens of model designs in search of an optimal one. Some of them had to be fed a combination of inputs different from the previous two 3. More loaders popped up to support those models. We began to make mistakes, since it was a tedious nightmare to keep loader_type in sync with the model in each configuration file.
The second is a rather common problem in training models. Say you have designed a multi-stage training pipeline, where you would like the model to switch its behavior at some point. For the first X iterations, component A of the model is disabled for warm-up; after that, it is enabled again for normal training. The catch is: how do you make a deeply nested component aware of the iteration number?
Back to our codebase. We had a Trainer in charge of everything. It starts a training loop, in which the iteration number lives as a local variable. It also holds a reference to the model. The model has a hierarchical structure, and component A hides deep inside some layers.
class Trainer:
    model: "Model"

    def train(self):
        for iter_num, data_batch in enumerate(self.data_loader):
            self.model.forward(data_batch)
            ...


class Model:
    A: "ComponentA"

    def forward(self, data_batch):
        ...
        self.A.forward(data_batch)


class ComponentA:
    def forward(self, data_batch):
        # How can I access iter_num?
        ...
This was implemented crudely at the time: we added a second argument to both Model.forward() and ComponentA.forward(), and passed iter_num down along the path.
class Trainer:
    model: "Model"

    def train(self):
        for iter_num, data_batch in enumerate(self.data_loader):
            # iter_num is now passed along with every call
            self.model.forward(data_batch, iter_num)
            ...


class Model:
    A: "ComponentA"

    # Model does not use iter_num itself; it only relays it
    def forward(self, data_batch, iter_num):
        ...
        self.A.forward(data_batch, iter_num)


class ComponentA:
    def forward(self, data_batch, iter_num):
        # iter_num is finally accessible here
        ...
Jesus, it is dirty. The argument passing “contaminates” every function it goes through. Whether they need it or not, they have to accept an extra argument. What if more components want to access the state? What if more states need to be passed? Every single change would touch a large area of code. Nobody would like that. At least I wouldn’t.
Now let’s move to a higher level for some deeper thoughts. In the first example, we chose to initialize the model and the data loader separately. The crux is that they are not independent components. The choice of model decides what the input data looks like, which in turn determines the type of loader. In fact, the components form a dependency graph in which the loader depends on the model.
Ideally, DataLoaderBuilder should “contact” Model to obtain the information required for building the loader. But it can’t, due to the limitation of the hierarchy. The only possible path for message passing is Model -> Trainer -> DataLoaderBuilder. This, however, would turn Trainer into a “god object” that passes messages around between its children. Having a god object is considered bad practice 4. Components become tightly coupled to their parents, and maintenance becomes difficult. The second example is similar, except that we are making Model the broker between Trainer and ComponentA.
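A minimal sketch of that brokered path may make the coupling easier to see. The attribute input_keys and the method build() are hypothetical names invented for illustration, not taken from our actual codebase, and the "loader" is just a dict standing in for a real one.

```python
class Model:
    # Hypothetical attribute: the names of the inputs this model consumes.
    input_keys = ("img", "offset")


class DataLoaderBuilder:
    def build(self, input_keys):
        # Stand-in for a real loader: only records which fields to load.
        return {"fields": tuple(input_keys)}


class Trainer:
    def __init__(self):
        self.model = Model()
        self.loader_builder = DataLoaderBuilder()
        # Trainer is the broker here. It has to know what to ask of the
        # model and what the builder expects, so it is coupled to both.
        self.data_loader = self.loader_builder.build(self.model.input_keys)


trainer = Trainer()
print(trainer.data_loader)
```

Every new piece of information the builder needs from the model adds another line like this to Trainer, which is how it drifts toward a god object.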
A more general version of the problem: in a system with a tree-like hierarchical structure, how can two non-adjacent components communicate?
This is not a novel research problem, but one already addressed in practice. Following the single-responsibility principle, we can use a standalone service responsible for managing the communication. This is quite common in modern web development, where components are usually organized in a tree and pass messages frequently. Mature, production-ready solutions exist, such as the event bus pattern or centralized state management 5, which are all instances of this design. Instead of relying on the target (or the path to the target), the components now depend only on the service object, and the system becomes less coupled.
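As a rough sketch of the centralized-state flavor, applied to the iter_num example: the TrainingState class, the warmup_iters parameter and the doubling "work" are all assumptions made up for illustration. The point is only that forward() no longer carries iter_num in its signature.

```python
class TrainingState:
    """A standalone service object holding shared training state."""

    def __init__(self):
        self.iter_num = 0


class ComponentA:
    def __init__(self, state, warmup_iters):
        self.state = state
        self.warmup_iters = warmup_iters

    def forward(self, data_batch):
        # Disabled during warm-up: pass the batch through unchanged.
        if self.state.iter_num < self.warmup_iters:
            return data_batch
        # Placeholder for the component's real computation.
        return [x * 2 for x in data_batch]


class Model:
    def __init__(self, state):
        self.A = ComponentA(state, warmup_iters=2)

    def forward(self, data_batch):
        # No iter_num in the signature any more.
        return self.A.forward(data_batch)


class Trainer:
    def __init__(self, data_loader):
        self.state = TrainingState()
        self.model = Model(self.state)
        self.data_loader = data_loader

    def train(self):
        outputs = []
        for iter_num, data_batch in enumerate(self.data_loader):
            self.state.iter_num = iter_num  # publish to the shared service
            outputs.append(self.model.forward(data_batch))
        return outputs


outputs = Trainer([[1], [1], [1]]).train()
print(outputs)
```

The state object is still handed down through constructors in this sketch, but that wiring happens once at build time; adding another shared value means adding a field to TrainingState, not another argument to every forward().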
So why don’t we use these techniques? Well, if some library integrates them, we are glad to try; if not, we have to implement them ourselves, but sorry, we are running out of time.
Programmers in production teams care more about coupling, because otherwise maintenance gets painful. They apply every best practice and design pattern they can find in textbooks or blog posts.
But as researchers, we might have taken a software engineering course, yet we seldom practice it. I’ve skimmed a lot of code released with papers on GitHub, and in most of it the logic for building the model, loading data and training is tightly coupled, fragile, and leaves little room for extension. It may be enough to showcase the paper, but it is far from a good codebase. Yet sometimes we have no choice 6 but to build upon it. There are indeed people putting effort into well-designed and easy-to-extend codebases 7, but apparently they cannot cover every extension that developers need. From time to time we are limited by our codebase, whether badly designed or over-designed. What’s worse, we have a deadline ahead, and to rush the idea out we practice many anti-patterns: communicating via global variables or a god object, duplicating logic here and there, or writing meaningless boilerplate code. The codebase eventually grows into spaghetti. Badly-cooked spaghetti.
It was then that I began to think about why production practices can hardly be applied to a research project. The answer is that a research project is not production-ready; it evolves and iterates rapidly without a fixed target, much like a prototype. A prototype might grow into a product, but a research project mostly won’t, ending after some paper deadline. Dogmatically applied design patterns are too verbose, and sometimes complicated. Researchers seldom use them, and reach for easy-to-use-but-dirty hacks and tricks instead.
And that’s the background of hsfzxjy/mocona. It implements patterns like Dependency Injection and Event Emitter to address the problem of communication between components. The library is deliberately designed to be “magical”, that is, it does most of the heavy work behind the scenes but exposes a very simple interface or (self-made) “syntax” to users. Being implicit and magical is considered evil and an anti-pattern in Python. But there are people who are tired of verbosity, or more afraid of it, and who would write even worse code to avoid it. If the library can help, they would gladly trade a little anti-pattern for less verbosity.
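To make the Event Emitter idea concrete, here is a bare-bones sketch in plain Python. It is not mocona's API; the EventEmitter class, the "iter_start" event name and warmup_iters are all invented for this illustration. The training loop announces each iteration, and ComponentA reacts without anyone passing iter_num to it.

```python
from collections import defaultdict


class EventEmitter:
    """A bare-bones emitter: handlers are stored per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, *args):
        for handler in self._handlers[event]:
            handler(*args)


# One shared emitter that both ends know about (an assumption of this sketch).
emitter = EventEmitter()


class ComponentA:
    def __init__(self, warmup_iters):
        self.enabled = False
        self.warmup_iters = warmup_iters
        # Subscribe once; nobody needs to hand iter_num down to us.
        emitter.on("iter_start", self._on_iter_start)

    def _on_iter_start(self, iter_num):
        self.enabled = iter_num >= self.warmup_iters


component = ComponentA(warmup_iters=2)
history = []
for iter_num in range(4):
    emitter.emit("iter_start", iter_num)  # what the training loop would do
    history.append(component.enabled)
print(history)
```

The loop and the component only share the emitter and an event name, so neither has to know where the other sits in the hierarchy.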
(img, offset) as input and supervised by (gtseg, gtoffset)
Author: hsfzxjy.
Link: .
License: CC BY-NC-ND 4.0.
All rights reserved by the author.
Commercial use of this post in any form is NOT permitted.
Non-commercial use of this post should be attributed with this block of text.