GSoC 2017 - Week 4

The fourth week of Community Bonding period has elapsed and the coding period has begun. :D yey!

This week began with some informal discussions on WhatsApp group of GSOC 2017 students regarding their projects at their respective organisations. The discussion brought me an opportunity to learn about the vast domain of projects on which students are working on. We also exchanged information about our university course curriculum among students from different continents. The group is fun and a nice place to know each other.

To begin with th coding period, I thought it would be good to read the documentation on the extension. So I went through Going Basic Extension Writing Tutorial and Guide for developing extensions to understand various components of an extension and the purpose that they serve. The Basic Extension Writing Tutorial served a quick brush up regarding files and the practice of i18n.

The project revolves around the efficient storage and querying of hierarchy fields, so I decided to learn about implementation techniques with respect to Database design. I found one of the basic and most flexible implementation Adjecency List.


And then The coding period began… I had started with introducing ‘hierarchy’ keyword to the cargo_declare statement of the template added in order to create the table under the supervision of Cargo Extension. I initially went on to the track for declaring ‘hierarchy’ as a new data type for the Cargo. But my mentor reminded me of that we only wish to treat hierarchy as a set of ‘allowed values’ under a tree structure, so we need to take ‘hierarchy’ in a pseudo-type fashion, just as ‘enumeration’. Rather, here the aim was to create an ‘enumeration’ with tree structure allowing any value (or list of values) as the nodes of the tree (the “allowed values tree”). Yaron came up with the idea to add Nodes of the ‘hierarchy’ tree with “”” in the ‘allowed values’. But then I pointed out that it may not be good because we may have allowed values starting with “” and NOT being hierarchical in nature. Hence, I propose to add an extra parameter in the field description as ‘hierarchy=true’. But Yaron, reminded me of ‘hidden’ parameter which takes no values and is more compact. Thus we finalized ‘hierarchy=true;allowed_values=….(tree with *)’.

Thus I implemented the work to identify ‘hierarchy’ keyword as an additional parameter by modifying the CargoFieldDescription.php (adding $mIsHierarchy variable to the class and implementing necessary changes in order to update it)

While working on the project I saw the need to use error_log(), but then it became cumbersome and I also came to know about MediaWiki debug environment. I tried to setup debug environment using the documentation. I found the doc a bit overwhelming, but still, I tried to setup. At the end, I could not manage to print my debug messages so I took help from my mentor to ensure whatever I did was correct or not, and finally resolved the issue by some hit and trial hacks.

Now the ‘hierarchy’ keyword implementation was successfully done.

In most projects, you get to know more and your perspective towards the application & work changes as you begin to write more and more code. When we completed this first part. We realised it would be good to to implement PageForms to detect default input for hierarchy as a tree. As completing this part would successfully help one to perform easy input and hence ensure that the working of ‘hierarchy’ extra-param was successful, so I began with this completed the part. As soon as I finished registering the ‘tree’ input type of PageForms for default cargo type (yes yes I said pseudo-type) of ‘hierarchy. I ran the test to find it successful. I added ‘possible values’ to the ‘structure’ of the tree but the tests failed. The Cargo parser failed to accept multiline substrings as a match for regular expressions (It took a lot of time to track this bug, but yeah! Finally I did). I had already asked my mentor that if the parser could handle newlines, and he was sure about that. But still sometimes bugs are there, where you don’t expect them to be. So I fixed the issue by simply concatenation ‘/s’ to the regular expression - known as PCRE_DOTALL pattern modifier - which can match substrings over the multiple lines that is ignoring the ‘\n’(newline).

With this the first and fourth steps of the task (T161034)[https://phabricator.wikimedia.org/T161034] were concluded after testing on single and multiple values for String and as well as Page.


Now the most important phase of the project has hit, that is to look out for the best technique to store and retrieve hierarchy data. We require a database design which is best according to our demands - that is the type of queries that would be subjected to our hierarchy data. Hence, I decided to list down all possible type of queries with the help of Mr Yaron and Mr Tobi and prepared the following list as:

  1. Searching on fields. (be it leaf or internal node) for normal WHERE query
  2. Searching on the field according to hierarchy ancestors for ‘WITHIN’ query.
  3. Searching on the field having a list of values for ‘HOLDS WITHIN’ query.
  4. Counting all the ‘allowed values’ of the tree in the table for DrillDown. (# Possibility of querying on ‘Category’)

Example for point 3 : It could be something like - for querying a table called “Movies” - “where=Genres HOLDS WITHIN ‘Science fiction’ “. Meaning, you want to get all movies where at least one of its listed genres is “Science fiction”, or a subset of it. ….like “Space opera” or something. So, for instance, “Space Opera” and “Cyberpunk” are both subsets of “Science fiction”. Another example: think the city as Branches of a firm in that case I guess, it’s same if I provide only one continent. as :”where=BranchLocations HOLDS WITHIN ‘ASIA’ “.

Now I have to study technique other than adjacency list and prepare a good comparison to finalise the decision of DB design.

comments powered by Disqus