Object Addressing Subsystem design
Author: Michał Jaszczyk
Signed-off-by: Bogdan Yakovenko
Overview
To implement the core of the SIO2 engine we've decided that certain types of objects should be accessible by some kind of address, i.e. we want a subsystem that maps character strings to object instances. For example, we may want to write Contest(id=OI_15), meaning an object of class Contest (or rather one of its subclasses) whose ID, whatever it is, is OI_15.
BY comment: must be OI_15?
This subsystem will be used, for example, by the event framework: someone may want to dispatch an event and specify its recipient by giving its address. The address should also have a human-readable form, since the UI will use these addresses and the user may enter an address manually, for whatever purpose.
Address syntax
Addresses will have syntax similar to function call and to pulling a property out of a compound object. Formal syntax in pseudo-regular expression notation is as follows:
    IDENTIFIER     ::= [A-Za-z_] [A-Za-z0-9_]*
    VALUE          ::= [A-Za-z0-9$-_.+!*'(),]+
    PARAMETER      ::= IDENTIFIER ( = VALUE )?
    PARAMETER_LIST ::= \( PARAMETER (, PARAMETER )* \)
    OBJECT         ::= IDENTIFIER (PARAMETER_LIST)?
    ADDRESS        ::= OBJECT (. OBJECT)*
BY comment: It would probably be better to split the definition of IDENTIFIER into separate Class, Parameter and Value definitions.
J's response: Split into identifiers and values. An identifier has to begin with a letter. No distinction between classes and parameters has to be made, I suppose.
Examples:
- FooBar is a valid identifier. So are texas, AlAsKa, jasiu85 and _under_scOre_.
- Parameter list may look like this: (), (name), (user=jasiu), (id=123,name=OI_15).
- Object may look like this: Foo, bAR, Contest(name=OI_15), AAA123().
- A full address may be as follows: Contest(name=OI_15).Owner.Submission(id=1234), Task(id=111).Submission(no=123).Status.
The idea is to resemble constructor calls and taking properties out of objects. For example, Contest(name=OI_15).Owner means that we want to take a Contest class and instantiate it using a constructor that takes parameter list ['name']. This instantiation may return an object that is actually a subclass of the Contest class. Then we'd like to take the owner of the contest. There's no need for supplying any parameters for Owner, since once we know which contest we are talking about, its owner is well defined.
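The grammar above can be turned into a small parser. The following is a minimal sketch, not the subsystem's actual implementation: the function name parse_address and the use of ValueError are assumptions, and so is the value character set, which here excludes '(', ')' and ',' to keep delimiters unambiguous and admits '%' so that encoded values can be parsed.

```python
import re

IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
# Assumption: '(' ')' and ',' are reserved as delimiters, '%' is allowed
# so that percent-encoded values can appear in an address.
VALUE = r"[A-Za-z0-9$\-_.+!*'%]+"

# OBJECT ::= IDENTIFIER (PARAMETER_LIST)?
OBJECT_RE = re.compile("(" + IDENTIFIER + r")(?:\(([^()]*)\))?")
# PARAMETER ::= IDENTIFIER ( = VALUE )?
PARAM_RE = re.compile("(" + IDENTIFIER + ")(?:=(" + VALUE + "))?")


def parse_address(address):
    """Return a list of (class_name, params) pairs, one per OBJECT."""
    objects = []
    pos = 0
    while True:
        match = OBJECT_RE.match(address, pos)
        if match is None:
            raise ValueError("expected an object at position %d" % pos)
        class_name, raw_params = match.groups()
        params = {}
        if raw_params is not None:
            # An empty list "()" yields one empty parameter, which is rejected.
            for raw in raw_params.split(","):
                param = PARAM_RE.fullmatch(raw)
                if param is None:
                    raise ValueError("bad parameter %r" % raw)
                params[param.group(1)] = param.group(2)
        objects.append((class_name, params))
        pos = match.end()
        if pos == len(address):
            return objects
        if address[pos] != ".":
            raise ValueError("expected '.' at position %d" % pos)
        pos += 1
```

With this sketch, parse_address("Contest(name=OI_15).Owner") gives [("Contest", {"name": "OI_15"}), ("Owner", {})]. A parameter written without a value, as in (name), is stored with the value None.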
I propose that we stick to the following convention:
- classes have first letter uppercase, then lowercase;
- parameters are all lowercase;
- values are, well, whatever you need.
Some features:
- Characters allowed for values are those which are valid in a URL. To encode an arbitrary byte sequence, take each byte: if it is a valid character, leave it; if not, replace it with "%hh", where hh is the byte's value as two hexadecimal digits. To encode an arbitrary character sequence, first encode it to a byte sequence using UTF-8 and then apply the byte-level step just described.
BY comment: Well, it's probably a bad assumption. I understand that a couple of special Polish letters like 'ż,ó,ł' are quite recognizable as the Latin 'z,o,l', but it would be a little bit harder to do the same with Cyrillic letters like 'ж, ю, э, ч, ...'. We should be more careful about i18n! I remember that 10 years ago it was a big problem to save "Cyrillic" files in DOS. Let's prevent it in SIO2. Wikipedia has a good idea about encoding such stuff:
http://ru.wikipedia.org/wiki/%D0%98%D0%B2%D0%B0%D0%BD_%D0%93%D1%80%D0%BE%D0%B7%D0%BD%D1%8B%D0%B9 <=> http://ru.wikipedia.org/wiki/Иван_Грозный
http://ar.wikipedia.org/wiki/%D9%85%D8%AD%D9%85%D8%AF_%D8%A8%D9%86_%D8%B9%D8%A8%D8%AF_%D8%A7%D9%84%D9%84%D9%87 <=> http://ar.wikipedia.org/wiki/محمد_بن_عبد_الله
J's response: Fixed. Look at the proposal.
- Spaces are not allowed in parameter values. You have to encode them.
- Class().Something is the same as Class.Something, Class().Something() and Class.Something(), i.e. empty parameter list and no parameter list will mean the same.
BY comment: The double syntax looks unnatural to me. Maybe it would be better to accept only one format?
J's response: OK, let's do the following: if an object doesn't need parameters, we don't allow an empty parameter list (i.e. Obj()).
- Although this is not a syntax feature, I'll mention it: there's overloading going on, i.e. Contest(id=123) and Contest(name=kon_zzgij) are both valid at the same time. The idea is that we may want to have several ways of instantiating a class.
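The encoding rule for values can be sketched as follows. This is only an illustration under assumptions: the function names encode_value and decode_value are hypothetical, and the set of characters left unencoded is the value character set without '(', ')' and ',', so that encoded values never collide with the delimiters of a parameter list.

```python
# Assumption: characters left as-is; everything else becomes "%hh".
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$-_.+!*'"
)


def encode_value(text):
    """Encode a character sequence: UTF-8 first, then "%hh" per invalid byte."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


def decode_value(value):
    """Reverse encode_value: turn "%hh" back into bytes, then decode UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(value):
        if value[i] == "%":
            raw.append(int(value[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(value[i]))
            i += 1
    return raw.decode("utf-8")
```

For example, encode_value("OI 15") gives "OI%2015", and encode_value("Иван") gives "%D0%98%D0%B2%D0%B0%D0%BD", which matches the first word of the Wikipedia URL quoted in the comment above.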
Address meaning
We'll name the different parts of an address according to what was introduced in the syntax description. So we have parameters, parameter lists and objects. The identifier appearing in the rule for object will be called the class name, and the identifier and the value in the rule for parameter will be called the parameter name and the parameter value respectively.
The dot notation introduces a tree structure in a natural way. We can imagine that the empty string, which is not actually a valid address, corresponds to the root of a tree. If node x corresponds to address a, then addresses of the form a.b correspond to x's immediate children (assuming of course that string b contains no dots). For example, addresses Aaa, Bbb, Aaa.Xxx, Aaa.Yyy, Bbb.Xxx, Bbb.Yyy form the following tree:
TODO(jasiu): replace this with an image?

             Aaa.Xxx
            /
         Aaa
        /   \
       /     Aaa.Yyy
      /
      \
       \     Bbb.Xxx
        \   /
         Bbb
            \
             Bbb.Yyy
This tree will appear implicitly in the mechanism that resolves addresses.
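To make the correspondence concrete, here is a minimal sketch that builds such a tree as nested dictionaries. The name build_tree is hypothetical, and the sketch splits naively on dots, so it assumes addresses without parameter values that contain a dot.

```python
def build_tree(addresses):
    """Build a nested dict; the outer dict plays the role of the root."""
    root = {}
    for address in addresses:
        node = root
        # Each dot-separated part moves one level down the tree.
        for part in address.split("."):
            node = node.setdefault(part, {})
    return root


tree = build_tree(["Aaa", "Bbb", "Aaa.Xxx", "Aaa.Yyy", "Bbb.Xxx", "Bbb.Yyy"])
```

Here tree has the two keys "Aaa" and "Bbb", and each of them maps to a dictionary with the keys "Xxx" and "Yyy".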
Address resolving
In this section, object will refer to a portion of an address and instance will refer to a Python object.
First, the address is parsed. We create a list of the objects that appear in the address, and for each object we store its class name and parameter list. The parameter list is a dictionary that maps parameter names to parameter values. With this list in hand we can perform the actual instantiation. It consists of several steps, each corresponding to an edge in the tree mentioned above. Suppose we are in the middle of instantiating. At this point we have an instance that corresponds to a certain node in the tree. We take the next object in the address and pass its class name and parameter list to a method of the instance we have. This method returns an instance of the desired class, obtained using the parameter list, and we advance one level in the tree. After the last step we have the instance we want. Instantiation starts by passing the first object to an instance that serves as the entry point to the addressing subsystem and is provided by one of the subsystem's modules.
An example is as follows: we'd like to resolve the address Contest(id=OI_15).Owner. We do the parsing and obtain two objects to resolve: the first one is of class Contest with parameters {'id': 'OI_15'}, the second one is of class Owner without any parameters. From the addressing subsystem we get the root_instance. We call root_instance.method('Contest', {'id': 'OI_15'}) and as the result we get a contest_instance that represents the contest called OI_15. The class of this instance is probably one of the Contest subclasses. Some queries to the database may have occurred. Step one is finished. To perform step two, we call contest_instance.method('Owner', {}) and obtain an instance that represents the owner of the contest. This could be done just by reading a property of contest_instance, for example contest_instance.owner, but it may also have required access to the database. The way the next instance is obtained is hidden from us. The Owner instance may be a string that tells the owner's name, but it may also be some more sophisticated object.
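The walk described above can be sketched in a few lines. Everything here is an assumption made for illustration: the method is called resolve (the name used by the interface section of this document), Root and Contest are hypothetical stand-ins, the owner is a plain string, and LookupError is just one possible way to signal a failed step.

```python
class Contest(object):
    """Hypothetical stand-in for a contest instance."""

    def __init__(self, owner):
        self.owner = owner

    def resolve(self, class_name, params):
        # Owner needs no parameters once the contest is known.
        if class_name == "Owner" and not params:
            return self.owner
        raise LookupError("Contest cannot resolve %r" % class_name)


class Root(object):
    """Hypothetical entry point; a real one might query the database."""

    def __init__(self, contests):
        self.contests = contests  # maps contest id -> Contest instance

    def resolve(self, class_name, params):
        if class_name == "Contest" and "id" in params:
            return self.contests[params["id"]]
        raise LookupError("Root cannot resolve %r" % class_name)


def resolve_address(root_instance, objects):
    """Walk the parsed objects, one step per edge of the address tree."""
    instance = root_instance
    for class_name, params in objects:
        instance = instance.resolve(class_name, params)
    return instance


root_instance = Root({"OI_15": Contest(owner="jasiu")})
parsed = [("Contest", {"id": "OI_15"}), ("Owner", {})]
owner = resolve_address(root_instance, parsed)
```

The list parsed is what parsing Contest(id=OI_15).Owner would produce; after the call, owner holds the string "jasiu".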
Subsystem interface
The addressing subsystem will be implemented in the obj_addr module. (TODO(jasiu): think of a better name.) The module will provide a class hierarchy with Resolver as the root class. Each class in the hierarchy will implement the following interface:
    class Resolver(object):
        def resolve(self, class_name, params):
            ...
Subclasses of the Resolver class will implement different strategies for resolving a part of an address.
Resolving strategies
One of these strategies is static resolving (TODO(jasiu): think of a better name). The idea is as follows: we decide that instances of class Contest should only resolve classes Id and Name, i.e. only Contest(...).Id and Contest(...).Name are valid. So we create a resolver class called, say, ContestStaticResolver that will only accept 'Id' and 'Name' as class_name in the resolve method.
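A minimal sketch of static resolving might look like this. The accepted table, the attribute names and the use of LookupError for rejected names are assumptions made for the example, not part of the design.

```python
class Resolver(object):
    def resolve(self, class_name, params):
        raise NotImplementedError


class ContestStaticResolver(Resolver):
    # Fixed table: address class name -> attribute that holds the result.
    accepted = {"Id": "id", "Name": "name"}

    def resolve(self, class_name, params):
        if class_name not in self.accepted:
            raise LookupError("cannot resolve %r" % class_name)
        return getattr(self, self.accepted[class_name])


class Contest(ContestStaticResolver):
    """Hypothetical contest class that mixes in the static resolver."""

    def __init__(self, contest_id, name):
        self.id = contest_id
        self.name = name


contest = Contest("123", "OI_15")
```

With this, contest.resolve("Name", {}) returns "OI_15", while contest.resolve("Owner", {}) is rejected because Owner is not in the fixed table.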
Another strategy is to make all instance members addressable. Suppose we have the following class:
    class MyClass(object):
        def __init__(self, f, b):
            self.foo = 123 + f
            self.bar = 'abc' + b
Then we may use the following addresses: MyClass(f=123,b=def).foo, MyClass(f=456,b=xxx).bar. The similarity between the parameter list of the object in the address and the __init__ procedure is coincidental! The point is that for these two addresses we'll get instance members of a MyClass instance.
This leads to a nice idea: if a property or instance member is of a type that supports resolving, we may automatically incorporate resolving routines into our class hierarchy. But we'd like to avoid implementing the resolve method in every class, or even in the superclass. Therefore Resolver classes should be mix-ins, so that we can do the following:
    # SomeResolver and OtherResolver stand for any two Resolver subclasses.

    class MyClass(object):
        # Base functionality for our class hierarchy.
        pass

    class MyClassWithResolving(SomeResolver, MyClass):
        # Now we have enabled resolving in our hierarchy.
        pass

    class MySubClass(MyClassWithResolving):
        # Some specialized functionality, has general resolving.
        pass

    class MyOtherSubClass(OtherResolver, MyClassWithResolving):
        # Change resolving strategy for this class.
        pass
If we manage to incorporate this with, say, OrmEntity, then we can write addresses that navigate us through the database, and we'll have it for free.
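The "all instance members are addressable" strategy combines naturally with the mix-in arrangement. The sketch below is one possible shape: AttributeResolver is a hypothetical name, hiding members that start with an underscore is an extra assumption, and LookupError again stands in for whatever error signalling is eventually chosen.

```python
class AttributeResolver(object):
    """Mix-in: resolves a class name to the instance member of that name."""

    def resolve(self, class_name, params):
        members = vars(self)
        # Assumption: private members (leading underscore) stay hidden.
        if class_name.startswith("_") or class_name not in members:
            raise LookupError("no addressable member %r" % class_name)
        return members[class_name]


class MyClass(object):
    # Base functionality, unaware of resolving.
    def __init__(self, f, b):
        self.foo = 123 + f
        self.bar = 'abc' + b


class MyClassWithResolving(AttributeResolver, MyClass):
    # Resolving is enabled purely by mixing the resolver in.
    pass


instance = MyClassWithResolving(f=123, b="def")
```

Here instance.resolve("foo", {}) returns 246 and instance.resolve("bar", {}) returns "abcdef". MyClass itself contains no resolving code.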
The third resolving strategy is dynamic resolving (TODO(jasiu): better name?). The set of objects that an instance may resolve may change (even at runtime), and it may happen that we won't be able to point to a superset that contains all possible addressable objects. The root_instance that will be the entry point to the subsystem is a good example: since we want to implement a framework for plugins, we can't tell what kind of objects it should resolve.
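One way to picture dynamic resolving is a registry that plugins can extend at runtime. This is a sketch under assumptions: the register and unregister methods, the factory callables and the use of LookupError are all hypothetical and not part of the proposed interface.

```python
class DynamicResolver(object):
    """Resolver whose set of resolvable class names can change at runtime."""

    def __init__(self):
        self._factories = {}

    def register(self, class_name, factory):
        # factory is any callable that takes the params dict.
        self._factories[class_name] = factory

    def unregister(self, class_name):
        self._factories.pop(class_name, None)

    def resolve(self, class_name, params):
        try:
            factory = self._factories[class_name]
        except KeyError:
            raise LookupError("nothing registered for %r" % class_name)
        return factory(params)


root_instance = DynamicResolver()
# A plugin could add its own class at load time.
root_instance.register("Contest", lambda params: "contest:" + params["id"])
result = root_instance.resolve("Contest", {"id": "OI_15"})
```

After these calls result is the string "contest:OI_15". A real factory would return a proper instance, most likely one that supports resolving itself.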
Other resolving strategies that I'm unaware of right now may be introduced. Some resolvers will have to access the database, some won't. The root resolving class will exist on its own; the others will be mixed into other classes.
TODO
- How will the framework handle errors? Especially, how to signal 'object not found' and what does it actually mean?
BY comment: I have no idea how it's done in Python. Maybe some exception? Or a nil value?
J's response: Well, an exception is the most obvious way, but I was thinking here more about the design pattern than about implementation details. Hopefully exceptions will be flexible enough.
- What about a distributed system? Can the following happen: on one machine we want to instantiate something, but the information required to do this is only available on another computer. What do we do then?
BY comment: Well, the class or some other level is responsible for deciding how to get the data. It's probably not a problem of this layer.
J's response: Accek said that this situation should never happen, and even if it does, we shall return an empty object. For me it's kind of unclear what that means...
- Is the outlined framework flexible enough to handle plugins?
- Should any access control happen in here?
- Do we allow addressing to have side effects? This can do bad things if there's no access control at this level. Think about this: a user sends a request containing an address. The address is resolved and this has some side effects. Access control happens and it turns out that the user doesn't have permissions, so his request is ignored. But wait, he managed to change something in the system using addressing!
accek's comment (to both above): these issues are real, but let's leave them to the designer of the access control. We don't know what access control will look like at the moment.