Category Archives: Code

Posts containing code

Interactive REST hacking with Sublime Text and RESTer

Do you ever miss a command line to a web application? If you’re lucky, you may use a REST API with appropriate tools for the same job.

This Christmas, I bought a Sonos system for our family. It’s a wireless speaker system that uses music streaming services. It supports both Spotify and Deezer, but works better with Deezer and it came with a one year free subscription, so Deezer it is.

First order of business: Importing Spotify playlists into Deezer. I found a free service that does this, but by accidentally refreshing the browser a couple of times, I managed to import five copies of each playlist. Disaster!

Thus began my quest: I had a number of accidentally imported playlists in Deezer, and it turns out that after deleting a playlist, I needed about five clicks, including hitting a small icon, in order to delete the next. This was not going to be fun.

As a programmer, you know there must be a better way. Perhaps that better way requires more time because you have to program something, but dammit, better is better. In the case of Deezer, I found that they had a well-documented REST API. But this time, I didn’t feel like writing a program just to solve this one problem. And so we get to the core of this post: How can you just execute a fairly large number of API calls without writing a program for it? All I needed to do was to find out which set of API calls needed to be made based on my playlist listing and execute these.

My text editor of choice these days is Sublime Text, which has a number of plugins. The RESTer plugin was just what I needed.

So, in order to delete the extra playlists:

  1. I installed SublimeText 3, Package Manager and RESTer
  2. Looking at the Deezer API doc, I created a test application and entered the URL: https://connect.deezer.com/oauth/auth.php?app_id=YOUR_APP_ID&redirect_uri=http://localhost& perms=basic_access,email,delete_library&response_type=token in a web browser. This gave me an authorization dialog with Deezer that ends up redirecting to an invalid URL on localhost. However, the URL will be something like http://localhost/#access_token=xxxxxxxxxxxxxxxxxx&expires=3600. Just copy the access_token part, as this will be needed for later.
  3. Now, let’s list the playlists. In Sublime Text, I write the URL http://api.deezer.com/user/2529/playlists?access_token=xxxxx (replace the id with your own user id and the access_token with the result from last step) and press ctrl-alt-r. This executes the REST call and returns the result in a new buffer as a formatted JSON text.
  4. In Sublime Text, I bring up the search dialog with ctrl-f, click the “.*” button (regular expression), enter "title"|"picture" and press “Find All”. This selects all the relevant lines from the JSON response.
  5. Now, some Sublime Text editing magic. All the lines from the output are selected. Press shift-end to extend the selections to the end of their lines, ctrl-c to copy the texts to the clipboard, ctrl-a to select everything, del to delete and ctrl-v to paste it.
  6. Whoa! That should replace the text with a set of alternating lines like "title": "something" and "picture": "something". We almost have what we want. Double click on the word "title" on any line and press alt-f3. This selects all the places where it says "title" in the buffer. Press end to move to the end of the lines and del to join the line with the next.
  7. Now each playlist should be on a separate line with the title and the picture link. Press F9 to sort the lines. In my example, I can now clearly see all the lines the are duplicates. So I delete the lines I want to keep and keep the lines I want to delete. For all playlists with duplicates, I remove one line in my buffer and keep the rest for deleting.
  8. Now I just have to craft the delete commands. I double click an example of the word picture. I press right-arrow until the cursors (several!) are between the ” and the https-part. Press shift-home to select everything before https://api... and del press to delete this. Press end to go to the end of the line and backspace until you have deleted the ‘/image”,’ part so each URL looks like “https://https://api.deezer.com/playlist/number“.
  9. We now have our targets to delete. At this point in time, I went to the beginning of the buffer and pressed ctrl-alt-down arrow until I have a cursor at the start of all lines. I now type “DELETE “, press end, paste in the access_token from step 2 and press enter. I now have all the requests that I need to execute separated by a blank line!
  10. Finally, I go to the first line, press ctrl-alt-r which executes the HTTP request. I press ctrl-w to close the response, go down two lines and press ctrl-alt-r to do it again.

This is the result of the first request:

This is what my buffer looks like after selecting titles and pictures and joining the lines:

This is what my buffer looked like in the end.

Sometimes, doing an everyday Christmas task like deleting a lot of playlists is hard to do with the programs available. Luckily, with a little bit of programming knowledge and some sharp tools, you can get the job done through an API.

Posted in Code, English | Leave a comment

Promises you can trust

JavaScript Promises provide a strong programming model for the future of JavaScript development.

So here I’m playing with promises.

First I need a bit of a package.json file:

Now I can write my first test (test/promises_test.js):

Notice that the “it” function takes a “done” function parameter to ensure that the test waits until the promise has been resolved. Remove the call to done() or to resolve() and the test will timeout.

This test fails, but because of a timeout. The reason is that done is never called. Let’s improve the test.

Using “done()” instead of “then()” indicates that the promise chain is complete. If we haven’t dealt with errors, done will throw an exception. The test no longer times out, but fails well:

And we can fix it:

Lesson: Always end a promise chain with done().

In order to separate this, you can split the then and the done:

There is another shorthand for this in Mocha as well:

But what is a promise chain?

This is extra cool when we need multiple promises:

Notice that done is only called when ALL of the strings have had their length calculated (asynchronously).

This may seem weird at first, but is extremely helpful when dealing with object graphs:

Here, the save methods on dao.orderDao and orderLineDao both return promises. Our “savePurchaseOrder” function also returns a promise which is resolved when everything is saved. And everything happens asynchronously.

Okay, back to basics about promises.

Here, the second function to “done()” is called. We can use “fail()” as a shortcut:

But this is not so good. If the comparison fail, this test will time out! This is better:

Of course, we need unexpected events to be handled as well:

And of course: Something may fail in the middle of a chain:

The failure is automatically propagated to the first failure handler:

It took me a while to become really comfortable with Promises, but when I did, it simplified my JavaScript code quite a bit.

You can find the whole source code here. Also, be sure to check out Scott Sauyet’s slides on Functional JavaScript for more on promises, curry and other tasty functional stuff.

Thanks to my ex-colleague and fellow Exilee Sanath for the inspiration to write this article.

Posted in Code, English, Java | 1 Comment

Dead simple configuration

Whole frameworks have been written with the purpose of handling the configuration of your application. I prefer a simpler way.

If by configuration we mean “everything that is likely to vary between deploys“, it follows that we should try and keep configuration simple. In Java, the simplest option is the humble properties file. The downside of a properties file is that you have to restart your application when you want it to pick up changes. Or do you?

Here’s a simple method I’ve used on several projects:

The AppConfiguration class looks like this:

This reads the configuration file in an efficient way and updates the settings as needed. It supports environment variables and system properties as defaults. And it even gives a pretty good log of what’s going on.

For the full source code and a magic DataSource which updates automatically, see this gist: https://gist.github.com/jhannes/b8b143e0e5b287d73038

Enjoy!

Posted in Code, English, Java | Leave a comment

The lepidopterist’s curse: Playing with java.time

Pop quiz: What will be the output of this little program?

The answer is, like with most interesting questions, “it depends”. How can it depend? Well, let try a few examples:

  • getHoursOfDay(LocalDate.of(2014, 7, 15), ZoneId.of("Asia/Colombo")) returns 24. As expected
  • getHoursOfDay(LocalDate.of(2014, 7, 15), ZoneId.of("Europe/Oslo")) also returns 24.
  • But here comes a funny version: getHoursOfDay(LocalDate.of(2014, 3, 30), ZoneId.of("Europe/Oslo")) returns 23! This is daylight saving time.
  • Similarly: getHoursOfDay(LocalDate.of(2014, 10, 26), ZoneId.of("Europe/Oslo")) also returns 25
  • And of course, down under, everything is upside down: getHoursOfDay(LocalDate.of(2014, 10, 5), ZoneId.of("Australia/Melbourne")) gives 23.
  • Except, of course, in Queensland: getHoursOfDay(LocalDate.of(2014, 10, 5), ZoneId.of("Australia/Queensland")) => 24.

Daylight saving hours: The bane of programmers!

Daylight saving hours were instituted with the stated purpose of improving worker productivity by providing more working hours with light. Numerous studies have failed to prove that it works as intended.

Instead, when I examined the history of daylight saving hours in Norway, it turns out that it was lobbied by a golfer and a butterfly collector (“lepidopterist”) so that they could better pursue their hobbies after working hours. Thus the name of this blog post.

Most of the time, you can ignore daylight saving hours. But when you can’t, it can really bite you in the behind. For example: What does the hour by hour production of a power plan look like on the day that changes from daylight saving hours to standard time? Another example given to me by a colleague: TV schedules. It turns out that some TV channels just can’t be bothered to show programming during the extra hour in the fall. Or they will show the same hour of programming twice.

The Joda-Time API and now, the Java 8 time API java.time can help. If you use it correctly. Here is the code to display a table of values per hour:

Given 2014/10/26 and Oslo, this prints:

And on 2014/3/30, it prints:

So, if you ever find yourself writing code like this: for (int hour=0; hour<24; hour++) doSomething(midnight.plusHours(hour)); you may want to reconsider! This code will (probably) break twice a year.

At the face of it, time is an easy concept. When you start looking into the details, there's a reason that the java.time library contains 20 classes (if you don't count the subpackages). When used correctly, time calculations are simple. When used incorrectly, time calculations look simple, but contain subtle bugs.

Next time, perhaps I should ruminate on the finer points of Week Numbers.

Posted in Code, English, Java | Leave a comment

C# tricks: Securing your controllers

This article is dedicated to the ExileOffice team – revolutionizing the way we run our business in Exilesoft.

As applications move more and more of their business logic to the client side, it falls upon us to simplify what’s left on the server to avoid distractions. My previous article shows how I get rid of lot of the boring and noisy code.

There is one thing that the server will always be responsible for: Security. Much functionality can be moved to the client side, but we have no way of stopping users from hacking our client-side code or faking requests.

In this article, I extend upon the generic controllers to handle the one of the most important aspects of security: Authorization.

This is not your data!

The first order of business is to ensure that only logged in users can access our controller. This is trivially done with AuthorizationAttribute:

Here, our custom AuthorizationAttribute sets a property by looking for the Company of the currently logged in user in the database. You probably want to cache this in a session.

The second problem is to ensure that the user can only read and write data.

According to my previous article, we want to move as much as possible of the boilerplate logic into a generic EntityController. Here we add authorization:

Here, we give the controllers a virtual AccessFilter property. Whenever we read data, we need to run it through the access filter, and whenever we write data, we need to check that the data matches the filter.

It’s up to each entity controller to determine how to match the data. Here is an example for PersonData:

The clue here is to ensure that PersonController can only get at Persons through the Entities collection. This collection is always filtered with the Company that the user has access to.

The code will resolve to simply: db.Persons.Where(p => p.Company == Request.Properties["UserCompany"]).Where(p => p.Type == PersonType.Admin && p.City == city).Select(p => new { p.FirstName, p.LastName }). By using the filtered Entities, we can ensure that we never accidentally forget to apply the users company.

Posted in C#, Code, English | Leave a comment

A canonical web test in NodeJS

Working with web applications in NodeJS is great. Using the same language and libraries on the client and server simplified the thinking. And NodeJS has fast tests and restart for a super quick edit-verify cycle when you’re coding.

I like to write tests to verify the server-side and client-side logic, but do you know that the whole solution really is working? You can of course test your service manually after deploying, but that becomes tedious. By using Selenium, we can test that the solution works end-to-end.

In this article, I show how to write a Mocha test that:

  1. Starts up the Selenium test runner
  2. Creates a browser client that can access the application (using PhantomJS)
  3. Starts up the server and sends the browser to the server (running in ExpressJS)
  4. Clicks a button in the web page that triggers some JavaScript (written in jQuery in this example)
  5. Verifies that the call to the server returns and displays correct data

This test can be run with a simple command after you check out the code and requires no setup on the developers computer!

This test starts a new server and client for each test. When using this for real, you really want to make sure you only start the server and the web driver once as these are expensive operations.

The test case of “it clicks menu item” has a lot of callbacks. There are versions of webdriver APIs for NodeJS which are based on promises that you may enjoy using more.

Implementation details

The full application can be found on Github. To implement it, I used the following NPM modules:

  • phantomjs
  • selenium-standalone
  • webdriverjs
  • expressjs (of course)
  • mocha and chai (of course)

Starting the server looks like this:

And the app.js file looks like this:

I have a server.js file which starts up the express application to run normal (outside the tests).

A simplified version of starting Selenium and WebDriver looks like this:

I found that Selenium and PhantomJS has some issues, at least on Windows, so my final code needed a few hacks to work. See the full details at github.

Conclusion

A web end to end integration test in NodeJS requires a little bit of shaking your fist at the heavens to get to work the first time, not in the least due to the need to work around limitations in Selenium and PhantomJS. But once you got it up and running, you can easily test not only that your logic works, but that your whole application works together.

When making these tests, I allowed for a little flexibility as well: By setting environment variables, the same tests can be run with a manually deployed server, so you can use it to verify that your staging server is up and running (for example). And of course, you can replace PhantomJS as a web browser with Firefox or Chrome and see the tests run for real in your browser.

Automate the end-to-end tests of your NodeJS applications!

Posted in Code, English, JavaScript, Software Development | Leave a comment

C# Tricks: Slimming down your controllers

This blog post is dedicated to my colleague Seminda who has been experimenting with how to create simple and powerful web applications. Thank you for showing me your ideas and discussing improvements with me, Seminda.

I find many C# applications have much unnecessary code. This is especially true as the weight of the business logic of many applications are shifting from the backend to JavaScript code in the web pages. When the job of your application is to provide data to a front-end, it’s important to keep it slim.

In this article, I set out to simplify a standard MVC 4 API controller by generalizing the functionality, centralizing exception handling and adding extension methods to the DB set that is used to fetch my data.

If you generate an API controller based on an existing Entity and remove some of the noise, your code may look like this:

This code is a simplified version of what the API 4 Controller wizard will give you. It includes a GetPerson method that returns a person by Id, PostPerson which saves a new person, GetPeople which returns all people in the database, GetAdminsInCity, which filters people on city and type and DeletePerson which finds the person with the specified Id and deletes it.

I have replaced DbContext and IDbSet with interfaces instead of the concrete subclass of DbContext makes it easy to create tests that use a double for the database, for example MockDbSet.

Generify the controller

This is easy and safe as long as things are simple. As a general tip, rename you methods to “Get”, “Post” and “Index” rather than “GetPerson”, “PostPerson” and “GetPeople”. This will make it possible to generalize the controller thus:

The generic EntityController can be used for any class that is managed with EntityFramework.

Exception handling

I will dare to make a bold generalization: Most of the bugs that remain in production software are in error handling. Therefore I have a very simple guideline: No catch blocks that don’t rethrow another exception.

The actual handling of exceptions should be centralized. This both makes the code easier to read and it ensures that exceptions are handled consistently. In MVC 4, the place to do this is with a ExceptionFilterAttribute.

We can thus simplify the EntityController:

The HandleApplicationException looks like this:

This code code of course add special handling of other types of exceptions, logging etc.

DbSet as a repository

But one piece remains in PersonController: The rather complex use of LINQ to do a specialized query. This is where many developers would introduce a Repository with a specialized “FindAdminsByCity” and perhaps even a separate service layer with a method for “FindSummaryOfAdminsByCity”.

If you start down that path, many teams will choose to introduce the same layers (service and repository) even when these layers do nothing and create a lot of bulk in their applications. From my personal experience, this is worth fighting against! Needless layers is the cause of a huge amount of needless code in modern applications.

Instead, you can make use of C# extension methods:

The resulting controller method becomes nice and encapsulated:

The extension methods can easily be reused and can be freely combined.

Conclusions

With these tricks, most of your controllers would look something like this:

That’s pretty minimal and straightforward!

I’ve showed three tricks to reduce the complexity in your controllers: First, parts of your controllers can be generalized, especially if you avoid the name of the entity in action method names; second, exception handling can be centralized, removing noise from the main application flow and enforcing consistency; finally, using extension methods on IQueryable<Person> gives my DbSet domain specific methods without having to implement extra layers in the architecture.

I’d love to hear if you’re using these approach or if you’ve tried something like it but found it lacking.

Posted in C#, Code, English, Software Development | 4 Comments

Horizontal reuse in JavaScript and C#

In his article Horizontal Reuse: An Alternative to Inheritance Toby Inkster compares how to implement multiple inheritance or mixins in Java, Perl, PHP and Ruby. It’s a very interesting comparison of programming languages features and well worth the read.

Toby writes:

In class-based object-oriented programming, when there are classes that appear to share some functionality, this is often a time when people will refactor them into two subclasses of a common base class, avoiding repetition.

Toby gives an example of a Tractor class and a Horse class which can both pull_plough to help on the farm.

However, Horse already inherits from Animal, and Tractor already inherits from Vehicle. […] So ruling out multiple inheritance, what other possibilities do we have?

In this blog-post, I would like to extend (mixin?) Toby’s examples with two of my favorite languages: JavaScript and C#.

JavaScript

JavaScript doesn’t implement classical object-oriented classes, but instead uses prototypes. JavaScript doesn’t have a built-in mixin concept available, but the flexibility of the languages makes it easy for us to create one from scratch:

These mixins acts just as Ruby’s mixins, but like much of JavaScript, it falls on us to create the mechanism for reuse ourselves. There are many libraries out there that aim to do just this, but it’s useful to understand the underlying code.

Just like with Ruby, if we added the Hitchable mixin to Horse and forgot to implement go, JavaScript would only warn us when we tried to call horse.pull_plough().

C#

C# offers extension methods which aim to do the same thing as defender methods in Java, but rather than having to add the methods to the interface, C# allows us to put the extension methods in an independent static class.

Just like Toby’s Perl example, all the important qualities of horizontal reuse are covered, but very differently from the Perl example: We have a name for the common behavior, the Hitchable class specifies what’s needed to make something pull a plough (the IGo interface) and we get an error message from the compiler if we try to PullPlough on something that doesn’t implement IGo.

The difference is that the compiler has no idea that we may want to make Horse Hitchable before we try to call MyHorse.PullPlough().

The C# approach has one big advantage: Anyone can create the Hitchable static class. Neither Vehicle, IGo or Horse needs to know about it’s existence.

The flip-side of this is that the only type that we can assign both Horses and Tractors to is the IGo interface.

Conclusion

Different languages offer different mechanisms for horizontal reuse. JavaScript reminds a lot of Ruby, but requires more manual work for the developer. C# has a totally different approach from other languages that offers some unique strengths, but also has limitations compared to the Perl and PHP approaches.

Make sure that you check out Toby Inkster’s article for four other languages each with their unique approach: Java, Ruby, PHP and Perl.

As developers in one programming language, it is useful to know what solutions are offered in other programming languages for the same problems that we face ourselves. For programming language designers, we live in a golden age with lots of different ideas that I hope may cross-pollinate between modern programming languages.

Posted in C#, Code, English, JavaScript | Leave a comment

Announcing EAXY: Making XML easier in Java

XML libraries in Java is a minefield. The amount of code required to manipulate and read XML is staggering, the risk of getting class path problems with different libraries is substantial and the handling of namespaces opens for a lot of confusion and errors. The worst thing is that the situation doesn’t seem to improve.

A colleague made me aware of the JOOX library some time back. It’s a very good attempt to patch these problems. I found a few shortcomings with JOOX that made me want to explore alternatives and naturally I ended up writing my own library (as you do). I want the library to allow for Easy manipulation of XML, and in an episode of insufficient judgement, I named the library EAXY. It’s a really bad name, so I appreciate suggestions for improvement.

Here is what I set out to solve:

  • It should be easy to create fairly complex XML trees with Java code
  • It should be straight-forward and fool-proof to use namespaces. (This is where JOOX failed me)
  • It should easy to read values out of the XML structure.
  • It should be easy to work with existing XML documents in the file structure or classpath
  • The library should prefer throwing an exception over silently failing.
  • As a bonus, I wanted to make it even easier to deal with (X)HTML, by adding convenience functions for this.

1. Creating an XML document

An XML document is just a tree. How about if align the tree to the Java syntax tree. For example – lets say you wanted to programmatically wanted to construct some feedback on this article:

Each element (Xml.el) has a tag name and can nest other elements, attributes (Xml.attr) or text (Xml.text). If the element only contains a text, we don’t even need to make the call to Xml.text. The syntax is optimized so that if you want to do a static import on Xml.* you can write code like this:

2. Reading XML

Reading XML with Java code can be a challenge. The DOM API makes it extremely wordy to do anything at all. You an use XPath, but can be a bit too much on the compact side and when you do something wrong, the result is simply that you get an empty collection or a null value back. I think we can improve on this.

Consider the following:

I step down the XML tree structure and get all the recipient email addresses of the previous message. But wait – running this code returns an empty list. EAXY allows us to avoid scratching our head over this:

Now I get the following exception:

As you can see, we misspelled “recipent” in the message. Let’s get back to this problem later, but for now, let’s work around it to create something meaningful:

Again, I think this is about as fluent as Java’s syntax allows.

3. Validation and namespaces

So, we had a message where one of the element names was misspelled. If you have an XSD document for the XML you’re using, you can validate the document against this. However, as you may get used to when it comes to Java XML libraries the act of performing this validation is quite well hidden behind complex API’s. So I’ve provided a little help:

This reads the mailmessage.xsd from the classpath, which is the most common use case for me.

Of course, most schemas don’t refer to elements in the empty namespace. When using validation, it’s common that we have to construct elements in a specific namespace. In most Java libraries for dealing with XML, this is hard and easy to get wrong, especially when namespaces are mixed. I’ve made namespaces into a primary feature of the Eaxy library:

Notice that the “type” and the “role” attributes belong to different namespaces – a scenario that is especially hard to facilitate with other libraries.

4. Templating

Reading the XSD from the classpath inspired another usage: What if we have an XML document as a template in the classpath and then use Java-code to manipulate this document. This would be especially handy for XHTML:

This code reads the file testdocument.html from the classpath, selects the element with id “peopleForm” and adds two input elements to it.

5. HTML convenience

In the code above, we set the type, name and value attributes of HTML input elements. These are among the most frequently used attributes in HTML manipulation. To make this easier, I’ve added some convenience methods to Eaxy:

A final case I wanted to optimize for is that of dealing with forms in HTML. Here’s some code that manipulates a form before that can be sent to the user.

Here, I set the form contents directly. The code will throw an exception if a parameter name is misspelled, so it’s easy to ensure that you use it correctly.

Conclusion

I have five examples of how Eaxy can be used to do easily what’s hard to do with most XML libraries for Java: Create a document tree with pure Java code, read and manipulate individual parts of the XML tree, the use of namespace and validation, templating and manipulating (X)HTML documents and forms.

Check out The library is not stable now, but for an XML library to be unstable may not be a very risky situation as most errors will be easy to detect long before production.

I hope that you may find it useful to try and use this library in your code to deal with XML and (X)HTML manipulation. I’m hoping for some users who can help me iron out the bugs and make Eaxy even more easy to use.

Oh, and do let me know if you come up with a better name.

Posted in Code, English, Java | Leave a comment

Having fun with Git

I recently read The Git Book. As I went through the Git Internals parts, it struck me how simple and elegant the structure of Git really is. I decided that I just had to create my own little library to work with Git repositories (as you do). I call the result Silly Jgit. In this article, I will be walking through the code.

This article is for you if you want to understand Git a bit deeper or perhaps even want to work directly with a Git repository in your favorite programming language. I will be walking through four topics: 1) Reading a raw commit from a repository, 2) Reading the tree hash of the root of a commit, 3) parsing the file list of a directory tree, and 4) Reading the file contents from a subdirectory of a commit root.

Reading the head commit from a repository

The first thing we need to do in order to read the head commit is to find out which commit is the head of the repository. The .git/HEAD file is a plain text file that contains the name of a file in the .git/refs/heads directory. If you’ve checked out master, this will be .git/refs/heads/master. This file is a plain text file which contains a hash, that is: a 40 digit hexadecimal number. The hash can be converted to a filename of a Git Object under .git/objects. This file is a compressed file containing the commit information. Here’s the code to read it:

Running this code produces the following output (notice that some of the spaces in the output are actually null bytes in the file):

Finding the directory tree of a commit

When we have the commit information, we can parse it to find the tree hash. The tree hash references another file under .git/objects which contains the index of the root directory of the files in the commit. In the example above, the tree hash is “c03265971361724e18e31cc83e5c60cd0e0f5754”. But before we read the tree hash, we have to read the object type (in this case a “commit”) and size (in this case 237).

Looking at the tree hash file is not as straight forward, however:

The next part of this article will show how to deal with this.

Parsing a directory tree

The tree file has what looks like a lot of garbage. But don’t panic. Just like with the commit object, the tree object starts with the type (“tree”) and the size (130). After this, it will list each file or directory. Each tree entry consists of permissions (which also tells us whether this is a file or a directory), the file name and the hash of the entry, but this time as a binary number. We can read through the entries and find the file we want. We can then just print out the contents of this file:

Here’s an example of a parsed directory listing. I have not showed the octalMode for each file, but this can be extremely useful to separate between directories (which octalMode starts with 0) and files:

Reading a file

This leads us to the end of our journey – how to read the contents of a file. Once we have the entries of a tree, it’s a simple matter of looking up the hash for a filename and parsing that file. As before, the file contents will start with the type (“blob” – which means “data”, I guess) and file size:

This prints the contents of our file. Obviously, if you want to find a file a subdirectory, you’ll have to do a bit more work: Parse another tree object and look and an entry in that object, etc.

Conclusions

This blog post shows how in less than 50 lines of code, with no dependencies (but a small utility helper class), we can find the head commit of a git repository, parse the file listing of the root of the file tree for that commit and print out the contents of a file. The most difficult part was to discover that it was the InflaterInputStream and not Zip or Gzip that was needed to unpack a git object.

My silly-jgit project supports reading and writing commits, trees and hashes from .git/objects. This is just the core subset of the Git plumbing commands. Furthermore, just as I wrote the article, I noticed that git often packs objects into .git/objects/pack. This adds a totally new dimension that I haven’t dealt with before.

I hope that nobody is crazy enough to actually use my silly Git library for Java. But I do hope that this article gave you some feeling of Git mastery.

Posted in Code, English, Java, Technology | 2 Comments