BaloneyGeek's Place

Operator! Give me the number for 911!

My Little Facemash Moment

Of the many movies that were released in 2010, one in particular stuck with me. It was about a socially inept college student, who failed miserably at love - very possibly because of his social ineptitude - and then decided to compensate for it by doing something from his dorm room that would stick it to the establishment and gain him some notoriety. The movie was called The Social Network.

At that time, I was socially inept, and I had also failed miserably at love, although I now know it wasn't down to social ineptitude. And being the immature child I was, I also wanted to compensate for my romantic failures by causing a little revolution that would make me famous. I just didn't know how to.

It took another three years. It was in late 2013, as the harsh Rajasthan winter battered me towards the end of my first semester at university, that I found out I could no longer read BBC News on the campus network. I've always been a rebel, not caring much for authority, and definitely not caring for authority that blocks me from doing things I want to do. The time for my little Facemash moment had come.

Content Control

For a network administrator, internet pornography is a nightmare to handle. The average network administrator couldn't care less about the societal norms surrounding pornography; it's the sheer volume of traffic involved, and the sketchiness of the websites and their potential to infect entire networks of computers with insidious malware, that keep them awake at night.

The internet is for porn. This is a fact. Depending on whom you ask, anywhere from 4% to 30% of the internet's websites are related to pornography.

Now 30% might sound like a small number, until you consider the nature of the traffic. While Wikipedia or Google is mostly text, and Facebook is a mixture of text, images and video, most pornography is video. And it's not just video; these days it's high-definition video, which means one person watching porn on the internet can easily blow through gigabytes of data in minutes.

So depending on whom you ask, an average of 17% of the world's websites are for porn, but a whopping 75% of the traffic volume is pornographic video.

In fact, lurk around the internet enough and you'll find that some pornography websites are at the forefront of content delivery network technology. Running a streaming video website - a popular one at that, because let's face it, people watch porn - takes so much data transfer capacity and so much high-speed bandwidth that pornography websites have actually become a driving force in the development of distributed content delivery networks.

YouTube is the world's single most popular video streaming website. PornHub is the second. The next non-porn video streaming website - Vimeo - comes far down the list. YouTube operates at Google scale, with networks of servers in almost every country in the world to deliver content to viewers as efficiently as possible. Given PornHub's popularity, you'd guesstimate that they have at least two-thirds the server capacity of YouTube.

And India loves to watch porn. A couple of years ago, Indian Railways started a pilot project to equip large stations in India with free WiFi in collaboration with Google. In return, Google got to collect data on what the users of the free WiFi service were looking at. In the city of Patna, the capital of the state of Bihar, we got news headlines like this: Patna Is The Top User Of Google's Free Wi-Fi At Railway Stations, Mostly For Porn: Report.

My university started with the best of intentions. It happens to be connected directly to the National Internet Backbone through a 10 Gigabit network link, and in the first few years of the university's existence, the campus WiFi was unfiltered. Of course, fewer than 100 students and faculty managed to saturate that link every single night and bring the network down to a crawl. And then there was a campus-wide computer virus outbreak that was presumed to have come from a careless porn viewer.

So the network administrators decided to block internet pornography. And this is where things started going wrong.

They started by blocking pornographic websites, and then blocked torrents. The speeds improved, but they were still not the blazing fast speeds that we should have been getting. So they decided to block access to more categories of websites. Gaming went away. For a little while, so did YouTube, although it was brought back because of "the availability of educational content". And then they blocked news.

The rationale behind this was, in my opinion, absolute insanity. We had televisions in the common rooms, and their stance was: if you want to be updated on the world's current affairs, watch the news on the telly, or come down to the library for a newspaper. Apparently, the few videos embedded into news websites were too much for the network to handle.

I tried to submit requests to the IT department to get BBC and Reuters unblocked. It didn't work. My alternatives were to go to the university administration, or do something about it myself.

In hindsight, I could have gone to the administration. They're nice people, and completely reasonable. But the IT department had pissed me off, and I no longer had much of a high opinion of them. I really wanted to "stick it to the man." And so I did.

DNS Blocking

The university used (and still uses) DNS to block websites. DNS, or the Domain Name System, translates domain names to IP addresses.

You see, every single website on the internet has a numeric address, called an IP address, or Internet Protocol address. But these addresses can be as large as 12-digit numbers (or, nowadays, 32-digit hexadecimal numbers - numbers using the numerals 0-9 and the letters a-f), and they're mighty hard to remember. What would you rather type into your browser - www.google.com, or 167.182.123.89?

So our university ran its own DNS server, which would, for unblocked websites, translate the domain name to the real IP address, but for blocked websites, it would translate the domain name to an IP address that pointed to some other website that just said "The website you are trying to access is blocked on our network".

This was effective, but for a university that taught courses in computer engineering, exceedingly easy to break - just tell your laptop or computer to use a different DNS server, not the one provided by the university. Google runs one such public DNS service, and we all just pointed our laptops at Google's DNS servers en masse, completely unblocking everything.
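To illustrate the idea (this is just a sketch using the third-party dnspython library, not anything we actually ran - on our laptops we simply changed the DNS server setting in the operating system), asking Google's resolvers directly looks something like this:

# Resolve a domain via Google's public DNS servers instead of the
# network's default resolver. Requires the third-party dnspython library.
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)  # ignore the system's DNS settings
resolver.nameservers = ["8.8.8.8", "8.8.4.4"]      # Google's public DNS servers

for record in resolver.resolve("www.bbc.co.uk", "A"):
    print(record.address)                          # the real IP address, not the block page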

It took a while for IT to figure out what was going on, but they retaliated by making sure DNS traffic never left the university's premises. We were thus limited to using DNS servers located inside the campus network, which now happened to be the university's own servers that blocked websites.

Proxying DNS

This put a stop to all but the most enterprising of the students. Most resorted to using VPNs - Virtual Private Networks, a technique to route all internet traffic via a "private" network, bouncing it off other servers outside the campus network before releasing it to the internet. Unfortunately, VPNs are either free or fast. You can't have both.

I decided to poke around their network to see how they were actually blocking DNS traffic from exiting the campus.

This is where things get technical.

There are two protocols - sets of rules that computers follow to communicate with each other - that are widely used on the internet. They're called TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

Then there's the concept of ports. You see, one physical computer may offer multiple services. It might serve websites, and simultaneously run a database that you can connect to from the outside world. If you want to connect to www.google.com, how do you say whether you want to connect to the website running on the server, or to the database?

Every computer therefore has 65,536 "ports", numbered from 0 to 65,535. A particular service - say the website, or the database - "listens" on a particular port. When we connect to the website, we have to specify its IP address or domain name, and the port we want to connect to.

The IANA - Internet Assigned Numbers Authority - assigns some "well-known ports". They say that websites must listen on port 80, and websites that are secured and encrypted must listen on port 443. That's why, when you type www.google.com into a web browser, you don't specify port 80 or port 443 - the browser assumes that you want to see the website and automatically connects to port 80 (or 443 for an encrypted site). If the website were, by any chance, listening on port 1234, a nonstandard port, you'd have to write the address like this: www.google.com:1234.

Here's how the university was blocking DNS access - the well known port for DNS is port 53, and the university created a rule in their network firewall that said if any computer inside the network wants to connect to any computer outside the network on port 53, block that connection.

Simple, right?

Well, this is where things start getting fun. Just because the IANA says that DNS servers have to listen on port 53 doesn't mean DNS doesn't work if it listens on port 1234. It just means we have to explicitly specify port 1234 when we point our operating system to a particular DNS server.

I ran some tests on our network. It turned out, we were only allowed to connect to servers outside the network using TCP on ports 80 and 443. We were theoretically only allowed to browse websites.
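If you want a sense of how such a test works (a quick sketch of the idea, not my original test script), it's little more than trying to open TCP connections to a host outside the network on a handful of ports and seeing which ones the firewall lets through:

# Probe which outbound TCP ports the campus firewall allows.
import socket

HOST = "8.8.8.8"  # any well-known host outside the campus network
for port in (53, 80, 443, 1234):
    try:
        socket.create_connection((HOST, port), timeout=3).close()
        print(f"TCP port {port}: open")
    except OSError:
        print(f"TCP port {port}: blocked")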

And guess what, there are quite a few DNS servers on the internet that listen on port 443. We could just use one of those, right?

Almost.

There's another angle to the story - the protocol (TCP, or UDP). Websites can only be browsed using TCP, but DNS traffic can use both TCP and UDP. And since UDP is faster for small amounts of data (typical of DNS requests), DNS defaults to using UDP for traffic.

So DNS servers listen on port 53, expecting to hold conversations with the client using the UDP rules. And now you can probably guess what the problem was - since websites only work using TCP, our network administrators set up the firewall so that only TCP traffic went out on ports 80 and 443. Simply pointing my system to DNS servers listening on port 443 wouldn't work, since the system would try to make a DNS request using UDP, and fail.

So I came up with a little idea. What if I wrote a tiny program that ran on my own computer, listened for UDP traffic on port 53, and forwarded whatever it received to the real DNS servers outside the network using TCP over port 443? It would then wait for the reply, receive it via the TCP connection, and relay it back to the program requesting the translation (such as Google Chrome) using UDP again.
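The whole trick fits in a few dozen lines. Here's a minimal sketch of the idea (not Dennis's actual code - the upstream address below is a placeholder for a real DNS server that accepts TCP on port 443): a UDP listener on port 53 that wraps each query in the two-byte length prefix DNS over TCP requires, ships it out over port 443, and relays the answer back.

# Minimal UDP-to-TCP DNS forwarder: listen for DNS queries on local UDP
# port 53 and relay them to a resolver speaking DNS over TCP on port 443.
import socket
import struct

UPSTREAM = ("198.51.100.1", 443)  # placeholder; a real DNS-over-TCP server goes here

def forward_over_tcp(query: bytes) -> bytes:
    """Send one DNS query over TCP and return the raw DNS response."""
    with socket.create_connection(UPSTREAM, timeout=5) as tcp:
        # DNS over TCP prefixes every message with its two-byte length (RFC 1035).
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", tcp.recv(2))
        response = b""
        while len(response) < length:
            response += tcp.recv(length - len(response))
        return response

def main():
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(("127.0.0.1", 53))  # binding to port 53 needs root privileges
    while True:
        query, client = udp.recvfrom(512)            # classic DNS UDP size limit
        udp.sendto(forward_over_tcp(query), client)  # relay the answer back

if __name__ == "__main__":
    main()

Point the operating system's DNS setting at 127.0.0.1, and every DNS request quietly leaves the network looking like ordinary port-443 traffic.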

And that's what I did, and it worked perfectly.

Dennis

Because I was feeling so cocky, I decided to put up the code on GitHub for everyone to see - if they could find it. I called it Dennis, a phonetic play on DNS, named after the character Dennis the Menace, because it was also supposed to be a menace to the university's IT department.

I used it for three and a half years. The university's IT department never noticed I was accessing websites that were supposed to be blocked, even though nothing was encrypted. I let a friend of mine, whom I trusted to be responsible with this kind of power, use it. No one noticed him downloading games from Steam once in a while.

After I graduated, I finally let everyone in the university (including the administration) know what I did. After I already had my degree.

At a university that boasts that it produces the engineers of tomorrow, and at a university that inculcates an entrepreneurial spirit right in the course curriculum, you can't reasonably expect that absolutely no one will manage to innovate around a real problem that they face every day. This was a sign that the university worked. In spirit, at least.

If you want to find out which school I attended and what I studied there, I invite you to stalk me on the internet. That information is not hard to find. But if you're in India, trying to choose a university to attend, and planning to study computer engineering, take a look at the one I went to. It's a brilliant little place, and you might like it. For what it's worth, I had the flexibility to follow a study path that I designed myself, and as a result of that path I moved to Germany during my last semester of college, where I still live and work.

Dennis is available here: https://gitlab.com/BaloneyGeek/dennis

The Sysadmin Squad

It's been close to a year since I started contributing to KDE long term, and since then a lot has happened.

Spectacle is out as part of the Applications suite as of KDE Applications 15.12, and not only do the users like it, other devs love it too. Maintaining Spectacle isn't as hectic as it used to be in the early days. I get a steady stream of mostly benign bug reports, which is nothing I can't handle in a few hours over the weekend. Wayland support still hasn't materialised, though, and it looks like it'll be a while. Proper High-DPI (retina display) support did materialise, however, courtesy of Kai Uwe Broulik, and will land in 15.12.1.

Something I've come to realise, however, is that day by day I'm gravitating over to the Sysadmin side.

The Project Predicament

KDE's online infrastructure is gigantic, no small part of which is taken up by Git servers. I'm not sure of the exact numbers, but a ballpark figure for the number of repositories we host would be in the 2,000-2,500 region.

We're also in the middle of a migration. KDE's Project Management Website runs on ChilliProject. ChilliProject is slow and unwieldy, but that we could make do with. What we can't make do with is that ChilliProject is deprecated and is no longer developed. There are bugs and security holes and we've had to disable chunks of functionality to mitigate these.

So we're migrating to Phabricator. We already have a Phabricator instance running, and quite a few projects have already been migrated. So far our experience has been good, and the Phabricator documentation is pretty good. And pretty funny (seriously, you should see their website).

Let's Git Overhaulin'

Back in September, we set up a read-only mirror of KDE's public project repositories on GitHub. d_ed liaised with GitHub and got us the organization username, and then I got busy mucking around with the post-receive hook in our git.kde.org repositories.

The scripting itself was pretty simple. bcooksley had already written a small Python script that git push-ed only branches and tags to upstream servers, so it was just a matter of executing that with the GitHub remote URL.
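For a sense of what that involves (a rough illustration of the approach, not bcooksley's actual script), pushing just the branches and tags of a repository to a mirror boils down to a couple of refspecs:

# Push only the branches and tags of the current repository to a mirror remote.
import subprocess
import sys

def mirror_push(remote_url: str) -> int:
    """Push all branches and all tags (and nothing else) to remote_url."""
    result = subprocess.run(
        ["git", "push", "--force", remote_url,
         "refs/heads/*:refs/heads/*",   # every branch
         "refs/tags/*:refs/tags/*"],    # every tag
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(mirror_push(sys.argv[1]))  # e.g. the GitHub remote URL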

Now KDE has its own infrastructure of read-only git mirrors which mirror every repository, not just the public project ones. They're the Anongit servers, a network of (currently) 6 such servers spread worldwide which developers can clone from. They're available at anongit.kde.org.

Since we already mirror code from git.kde.org, why couldn't we just add the GitHub remote URL to the list of servers to mirror to? Because we don't push to the Anongit servers; we tell them (via an HTTP hook) that git.kde.org has been updated and ask them to pull the changes.

Now there isn't anything inherently wrong with this approach; it's just that we have no control over what happens if one of the Anongit servers goes down. We'd ideally want to log failed mirroring attempts, and to do that effectively we'd need to push to our mirrors and log failures (and maybe retry if a mirroring attempt failed).

Propagator

Sysadmins hold the keys to much of KDE's infrastructure - user accounts, passwords, SSH keys, you name it, we have it. Understandably, we don't just let anyone be a part of the Sysadmin team. But the truth is, KDE's Sysadmin team is severely understaffed and the list of tickets to attend to is miles long. We need all the help we can get.

That said, you don't need to be a part of the core Sysadmin team to be able to help them out. And Sysadmin work isn't just SSH-ing into servers and running commands - there's plenty of coding to be done, and that work doesn't need access to the keys.

So, Propagator. Over the winter holidays, I got to work on a small-ish daemon that would manage our entire distributed git infrastructure - and alleviate the Sysadmins' need to ever touch the Anongit servers themselves.

Propagator is written in Python 3, and uses a Celery task queue to mirror-push changes to our Anongit servers. We also wanted to converge code paths for pushing to GitHub, so Propagator does that too. Propagator also creates, sets the descriptions on, moves and deletes repositories on the Anongit servers via an SSH API, and on GitHub via their REST API.
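The core of that looks roughly like this (a simplified sketch of the approach rather than Propagator's actual code; the broker URL is an assumption): one Celery task per repository-and-remote pair, which logs failures and retries later instead of silently giving up.

# Sketch: queue one mirror push per (repository, remote) pair, retrying on failure.
import subprocess

from celery import Celery
from celery.utils.log import get_task_logger

app = Celery("propagator", broker="redis://localhost:6379/0")  # broker URL is assumed
logger = get_task_logger(__name__)

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def mirror_push(self, repo_path: str, remote_url: str):
    """Mirror-push one repository to one Anongit or GitHub remote."""
    result = subprocess.run(
        ["git", "-C", repo_path, "push", "--mirror", remote_url],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        logger.error("Push of %s to %s failed: %s",
                     repo_path, remote_url, result.stderr.strip())
        raise self.retry()  # Celery re-queues the task for a later attempt

Because each push is its own queued task, a slow or unreachable Anongit server only delays its own mirror instead of holding up the rest.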

In fact, with Propagator, managing repositories is as easy as creating them inside Phabricator. Propagator will automatically create repositories on the upstream servers (both Anongit servers and GitHub) on first push if they don't exist, effectively making things zero-administration.

What about moves, renames, and the rest? Well, you log in to a shell on git.kde.org and run:

$: mirrorctl create AwesomeRepo.git "Awesome app to do something super cool"
$: mirrorctl rename AwesomeRepo.git AwesomeApp.git
$: mirrorctl delete AwesomeApp.git

...and so on. Propagator will then send out requests to every single configured upstream server and tell it to carry out these commands. Easy as pie.

Unfortunately, Propagator currently lives inside the repo-management repo in KDE, but I do plan to separate it out into its own repository and distribute it as a standalone product in its own right as a KDE project.

Closing Notes

KDE is awesome. No, it really is.

We see a lot of newcomers pop up on KDE's IRC channels and mailing lists and ask how to contribute. For the record, we appreciate code, documentation, artwork, translations, evangelism and publicity (but do refrain from streaking across campus with a giant KDE tattoo... *wink wink*).

And we need Sysadmins. If you have a very particular set of skills - skills that make you a nightmare for the kinds of problems our online infrastructure throws up - we want you.

The Return Of The Prodigal Son

After some years of hopping domains, this blog is back where it started, at blog.BaloneyGeek.com.

The reason for this is half financial, and half aesthetic. I used to own a couple of domains (including BaloneyGeek.com, Boudhayan.in and BoudhayanGupta.me, the last of which I'd picked up for free for a year via the GitHub Student Developer Pack), but paying for the renewal of all of these domains every year is expensive, especially when I'm still a student without a steady source of income. Also, the main reason for keeping the additional domains around was to have a proper "professional" e-mail ID, which I now have courtesy of KDE. I can finally consolidate around one domain, BaloneyGeek.com.

So now that I've had the time to cook up a semi-decent set of templates and a design for the blog, I'm using Pelican, which is a static site generator written in Python, to generate this site. The templates are Jinja2, the CSS is hand-written (no LESS or SCSS foo), and that swanky little logo was cooked up with a photo of my fancy signature (you can't use that with my bank, sorry) taken on my 2013 Moto G and hacked around with in GIMP. The CSS is mobile-first, so the site should still look semi-decent in IE6 as well as on phones. The font is Intel's Clear Sans, which I love.

I've also finally settled on a writing workflow which I'm comfortable with. I use GitHub's Atom to write, and while Atom is slow to start up, the writing environment and colours are very soothing. I'm very finicky about my text editors, and I love Atom, even more than I love Sublime Text. With Pelican, you can use reStructuredText (or even AsciiDoc, if you prefer) to write your posts, but I like Markdown and write in it. The entire site is a git repository, and it's available on GitHub, if you want to take a look. Anyway, once I've written a post, git commit-ed and git push-ed, Travis-CI automatically pulls in my site, builds the static pages and pushes them to the gh-pages branch. Yep, this is totally geeked out. I got the idea from Torrie, who also uses Pelican but has her site deployed by CircleCI instead of Travis. I just found Travis easier to configure.

Now that I finally have a writing environment that pleases my OCD as well as integrates well with my command-line based workflow, I should finally start writing more regularly. Let's see how this goes.

Hejdå!

KSnapshot-Next

KSnapshot is getting an overhaul.

It's actually a little more complicated than that. I started working on the KF5 port of KSnapshot (EDIT: no, contrary to what Phoronix claims, that port is not my work; I simply wanted to fix anything that needed fixing) sometime in early March this year, before I realised that the codebase, while perfectly in order for an X11-only screenshot taker for KDE (yes, KSnapshot actually has a complete and fairly decent KF5 port in its frameworks branch on KDE Git), was in need of a major overhaul if we were going to get proper Wayland support in.

To that effect, I started working on a completely new screenshot application, copying in the bits and pieces of code from KSnapshot that I could actually use. It's called KScreenGenie, and has been living in KDE's Git infrastructure for quite a long time. It's currently in KDE Review, and will be moved to KDE Graphics in time for the Applications 15.12 release. Not just that, it will be renamed to KSnapshot, so people upgrading their computers will seamlessly upgrade from KSnapshot to KSnapshot 2.0 a.k.a. KScreenGenie.

But KScreenGenie in its current form and with its current name is actually going to see a public release. The code in the master branch of the git repository is currently in doc and string freeze, and is actually considered stable enough for daily use. A distribution called KaOS already ships it, I've been using it as my primary screenshooting tool for months now, and other KDE developers have also spent time using and testing it and fixing minor issues. So barring any major blocker bugs popping up, KScreenGenie 2.0.0 will be released (independently of the KDE Applications) on August 15, 2015.

If you're so inclined, here's a little bit of technical information on how KScreenGenie is different from KSnapshot. The biggest internal change is how pictures are actually acquired - instead of using Qt's built-in screenshooting APIs, KScreenGenie uses the native API of the platform it's running on. Currently, only one working platform backend exists, and that's for X11, using xcb. We use libkscreen to properly figure out screen layout information so that we can take proper multi-monitor screenshots (we don't support Zaphod mode though). On Wayland, because the platform-specific bits are so well separated from the platform-independent bits, the application starts up, but does not actually take a screenshot. The reason is that there's no stable API for taking screenshots on Wayland yet (Weston has its own API, and while some of KWin's screenshot effect APIs work, not all of them do), and thus we don't actually have a working Wayland image grabber yet.

A particularly nice user-facing feature is the ability to take screenshots of transient windows (pop-up menus, for example) along with their parent window. In KSnapshot, if you choose "Window Under Cursor" as your mode and hover over a pop-up menu, only the pop-up menu is captured (technically the pop-up menu is an X11 window in its own right). In KScreenGenie, we actually have code to detect whether a window is a transient window, and if so, we try to figure out which window is its parent and then take a composite shot of both the parent and the transient. Note, though, that this currently only works with Gtk3 and Qt5 applications, since only these toolkits set the WM_TRANSIENT_FOR property on their pop-up windows, thus enabling us to figure out who their parent is.

KScreenGenie's Window With Transient Children mode

I'm also currently working on a basic image editor integrated into KScreenGenie for editing and annotating screenshots within KScreenGenie itself. The code isn't online yet, since it's nowhere near even half-baked, but hopefully it'll be there by 15.12 or at the very most 16.04.

I'd like to encourage distributions to package KScreenGenie 2.0.0 in their primary repositories, and users to use it, test it and report bugs. Distribution packagers who do include KScreenGenie in their repositories will need to mark the KSnapshot package from Applications 15.12 as replacing, providing and obsoleting the KScreenGenie package, though.

Happy clicking!