Removing spam comments in JIRA

Introduction

At the company I work for, I administer a JIRA instance which is used as an internal bugtracker.

Lately we’ve opened up JIRA to the public and now use it as a platform for part of our product.

There’s unfortunately one problem with that: being a rather small company (with fewer than 20 employees), we develop our product for a large number of customers who only pay for the product once (no recurring costs). The cost is also quite low (around 10-50 Euro) compared to the prices of larger products. If I had to guess, I’d assume our customer base goes into the hundreds of thousands of users.

Compare this to other companies and you get an idea why we cannot afford an unlimited JIRA license (which, at the time of writing this, would cost us $24,000 plus $12,000 every year, while our current 25 user license only costs $1,200 plus $600 per maintenance renewal).

Since the unlimited user license is out of the question for us, we allowed anonymous access to our JIRA instance for some of the projects. That allows our user base to create and comment on issues directly in our bugtracker.

Unfortunately, allowing anonymous access in JIRA has one bad side effect: it also opens up the bugtracker to spammers, since it no longer requires you to log in before adding comments or creating issues.

For several months this worked out fine, until a few days ago some spambot discovered our instance and started creating spam comments (around 1,500 on the first day and another 5,000 on the next).

JIRA is really a great tool IMHO but understandably the product’s and company’s focus is directed towards larger companies. That’s also most likely the reason why there is almost no built-in protection against spam. Presumably most customers do not use the anonymous access and rather buy the unlimited license so that their users simply create their own accounts, while requests for improvements for anonymous spam protection have been on record for years already (see: JIRA issue 10236 and JIRA issue 8000).

But what do you do if you want to allow anonymous access and run into the situation of a spambot having created a shitload of comments on your instance? Deleting >5,000 comments manually is certainly not an option (that’s roughly 10,000 mouse-clicks to get rid of all the entries 🙂 ).

My first idea was obviously to alter the JIRA DB entries directly, but that is certainly not supported and bears a certain risk of breaking things if you don’t know all the details of the DB structure.

Fortunately, I discovered a post from Henning Tietgens. Based on his post I was able to adjust the script he provided and get rid of all the comments in just a few hours’ work.

How to bulk remove comments in JIRA?

(The following instructions were tested on JIRA 6.0.8. They might, however, also work for later (or even earlier) versions of JIRA.)

Make sure you have a backup of your JIRA instance to be on the safe side in case anything goes wrong with the script. While the description worked for me, it was only tested on a single instance and I can’t give any warranty at all.

  1. In the JIRA instance go to the Add-Ons Manager (CogIcon -> Addons -> Find New Add-Ons) or use the following link: http://[yourJIRAInstanceURL]/plugins/servlet/upm/marketplace
  2. In the search box enter “Script Runner”. This should bring up the “ScriptRunner for JIRA Standard Edition” as the first entry. Click on Install to install the add-on.
  3. On the admin panel you should now see a new section (on the Add-Ons tab) called Script Runner. Click on Script Console.
  4. On this screen select Groovy as the Script engine, copy/paste the script provided below into the script frame, adjust the issueKey to the one which contains the spam comments, replace “Foo Bar Comment” with some entry in the spammer’s comment and click on Run Now.

Voila. That’s it. All comments containing the phrase you specified above in the given issue should be gone.

The following is the script (updated 08/19/15) I ran on our instance to clean up all the spammer’s comments (based on the URLs the spambot entered in the comments).

Bear in mind to double-check the URLs before running it against your instance. Since spambots also tend to use completely normal URLs (to hide which URLs they actually want to spam), it’s quite possible that in your case the script would remove perfectly fine comments as well.

STL and the <-operator

Introduction

The Standard Template Library (STL) adds a lot of fundamental functionality to C++. One of its most prominent features is its containers. Containers can be used to store any kind of object. Different containers are available for the different requirements a developer might have. Some of the containers are optimized for random access, while others are very efficient when it comes to sorting objects.
To be able to sort objects, the STL containers (and functions) make use of comparators and/or an object’s <-operator. That way it becomes quite easy for developers to create classes which can be stored in a container. But there are a couple of requirements for these comparators, as this paper lays out.

Strict Weak Ordering

Let’s assume we have a simple class called “Car”:
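The original listing is not preserved here, so the following is only a minimal sketch of what such a class might look like. Only the member name m_Type appears in the text; m_Color, the int member types and the constructor are assumptions made for illustration.

```cpp
// Hypothetical reconstruction of the "Car" class.
// Only m_Type is named in the text; everything else is assumed.
class Car {
public:
    Car(int color, int type) : m_Color(color), m_Type(type) {}

    int m_Color;  // assumed encoding, e.g. 1 = red, 2 = blue
    int m_Type;   // assumed encoding, e.g. 1 = compact, 2 = sedan, 3 = truck
};
```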

Next we define a <-operator for our “Car”-class by sorting it by its color and its type:
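One plausible shape for such an operator, given here as an assumption and not as the original code, tests color and type independently of each other. The Car class is repeated so that the snippet stands on its own; apart from m_Type, its members are assumptions.

```cpp
// Hypothetical Car class; only m_Type is named in the text.
class Car {
public:
    Car(int color, int type) : m_Color(color), m_Type(type) {}

    int m_Color;
    int m_Type;
};

// Assumed, naive comparison: a car counts as "smaller" as soon as either
// its color or its type is smaller than the other car's.
bool operator<(const Car& lhs, const Car& rhs) {
    return lhs.m_Color < rhs.m_Color || lhs.m_Type < rhs.m_Type;
}
```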

Now we create 2 instances of the class:

If we evaluate car1 < car2 (for instance via bool bsmaller = car1 < car2;), the result is true, as expected (since car1.m_Type < car2.m_Type).
Now let’s put these cars in a set:
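As a rough sketch of this step: the Car class, the operator and the concrete values of car1 and car2 below are assumptions chosen so that car1 < car2 holds because of m_Type, as stated in the text. The helper name makeCarSet is hypothetical.

```cpp
#include <set>

// Hypothetical Car class; only m_Type is named in the text.
class Car {
public:
    Car(int color, int type) : m_Color(color), m_Type(type) {}

    int m_Color;
    int m_Type;
};

// Assumed, naive comparison by color or type.
bool operator<(const Car& lhs, const Car& rhs) {
    return lhs.m_Color < rhs.m_Color || lhs.m_Type < rhs.m_Type;
}

// Creates two cars and stores them in a std::set, which orders its
// elements through operator< (via std::less<Car>).
std::set<Car> makeCarSet() {
    Car car1(2, 1);  // same color as car2, smaller type
    Car car2(2, 2);

    std::set<Car> cars;
    cars.insert(car1);
    cars.insert(car2);
    return cars;
}
```

For these two particular values the operator behaves consistently, so the set ends up holding both cars.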

So far, so good. We have a container with two cars, so what? — Let’s put another one into the container and see what happens:

Ouch… That results in a runtime error at best, or undefined behavior at worst.

<-operator requirements

What went wrong?
Well, the problem lies within our defined <-operator and the fact that the set-container uses it to try to put our cars into an order. If we compare car2 with car3, we get contradictory results:
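The contradiction can be reproduced with a small sketch. It assumes the naive "color or type" operator and hypothetical values car2 = (color 2, type 2) and car3 = (color 1, type 3); neither is taken from the original listing. The helper name contradicts is made up for this example.

```cpp
// Hypothetical Car class; only m_Type is named in the text.
class Car {
public:
    Car(int color, int type) : m_Color(color), m_Type(type) {}

    int m_Color;
    int m_Type;
};

// Assumed, naive comparison by color or type.
bool operator<(const Car& lhs, const Car& rhs) {
    return lhs.m_Color < rhs.m_Color || lhs.m_Type < rhs.m_Type;
}

// True if the operator claims a < b and b < a at the same time.
bool contradicts(const Car& a, const Car& b) {
    return (a < b) && (b < a);
}
```

With these values, car2 < car3 is true because of the type and car3 < car2 is true because of the color, so contradicts(car2, car3) returns true.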

Therefore, the set doesn’t know how to sort these objects in its internal red-black tree.
For most of the STL functions/template classes which require a comparator, a so-called strict weak ordering comparator is required. Such a comparator is defined by fulfilling the following requirements:

  1. the <-operator imposes an order:
    if (a < b) then !(b < a)
  2. an object is never smaller than itself (i.e. it can’t be ordered before itself):
    a < a = false
  3. the <-operator can be used to check objects for equality:
    if (!(a < b) && !(b < a)) then a == b
  4. the ordering is transitive:
    if (a < b) && (b < c) then (a < c)
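These requirements can be checked by brute force on a handful of sample values. The sketch below is only a testing aid under that assumption: the function name holdsOnSample is made up, and passing the check says nothing about values outside the sample. The third requirement is tested as "the derived equality must be transitive", which is the checkable form of it.

```cpp
#include <cstddef>
#include <vector>

// Returns false as soon as one of the four requirements is violated
// for some combination of values taken from the sample.
template <typename T, typename Less>
bool holdsOnSample(const std::vector<T>& sample, Less less) {
    for (std::size_t i = 0; i < sample.size(); ++i) {
        const T& a = sample[i];
        if (less(a, a)) return false;                    // requirement 2
        for (std::size_t j = 0; j < sample.size(); ++j) {
            const T& b = sample[j];
            if (less(a, b) && less(b, a)) return false;  // requirement 1
            for (std::size_t k = 0; k < sample.size(); ++k) {
                const T& c = sample[k];
                // requirement 4: a < b and b < c imply a < c
                if (less(a, b) && less(b, c) && !less(a, c)) return false;
                // requirement 3: "neither is smaller" must act like equality,
                // so a == b and b == c have to imply a == c
                const bool ab = !less(a, b) && !less(b, a);
                const bool bc = !less(b, c) && !less(c, b);
                const bool ac = !less(a, c) && !less(c, a);
                if (ab && bc && !ac) return false;
            }
        }
    }
    return true;
}
```

Called with a few ints and a plain a < b comparison it returns true; with a <= b it returns false, because an object would then be smaller than itself.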

So one might come to the following great solution to the problem and say: “Let’s sort objects by their memory address!”
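A minimal sketch of such an address-based comparator could look as follows. The name AddressLess is an assumption. std::less is used on the pointers because the built-in < is only guaranteed to give a consistent order for pointers into the same array or object.

```cpp
#include <functional>

// Orders objects by where they live in memory, not by what they contain.
template <typename T>
struct AddressLess {
    bool operator()(const T& a, const T& b) const {
        return std::less<const T*>()(&a, &b);
    }
};
```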

Nice idea. That comparator meets all the above requirements, since an object’s address is unique on a single machine (no two objects can occupy the same memory address at the same time). This idea also has the advantage that no additional memory (for instance for a unique identifier used to order the objects) is required.
As long as there is no requirement to keep objects sorted in a particular order within a container, this can be a feasible solution. However, it’s not completely safe under all circumstances, as the following section will uncover.

Copy Constructor and =-operator

Some of the STL functions/containers make use of an object’s assignment-operator or its copy constructor. For instance there is a function called make_heap(). That function creates a copy of the first object and in addition uses the assignment operator of the class of the contained objects to swap objects. That way a heap is created. So why is this problematic?
Well, the functions are designed under the following assumption:
The <-operator compares objects based on their content AND neither the copy constructor nor the assignment operator alter the object’s order.
Given as a general example, the assertion in the following code is expected to be true: if (a < b) { c = a; a = b; b = c; assert(b < a); }
If we use the object’s address within our <-operator, that’s no longer true. Assume a and b have the following addresses: a = 0x1; b = 0x2;
To make it easier to see the problem further assume that each object stores one integer: a.i = 1; b.i = 2;
Before the swap, a is considered smaller than b (since 0x1 < 0x2). Now we swap the objects: c = a; a = b; b = c;
As you can see, the objects exchanged their content: a.i = 2; b.i = 1. The comparator is expected to follow the content, so assert(b < a) should hold, since b now contains the content of a and a contains the content of b. But it doesn’t!
Remember, we wrote the <-operator to compare the objects based on their addresses — and these haven’t changed — so a is still smaller than b (since 0x1 < 0x2).
So we changed the order of the objects and the STL functions don’t know what to do about it (resulting in an error or undefined behavior). We need another way to come up with an implementation for our operator.
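The swap scenario can be replayed in a few lines. This is a sketch under assumptions: the Item struct and the function name orderFollowsContentAfterSwap are made up, and the two objects are placed in an array so that their relative addresses are known.

```cpp
#include <functional>

struct Item {
    int i;
};

// Address-based comparison as discussed in the text.
bool operator<(const Item& a, const Item& b) {
    return std::less<const Item*>()(&a, &b);
}

// Swaps the contents of two objects through a temporary and reports
// whether the order followed the content.
bool orderFollowsContentAfterSwap() {
    Item items[2] = {{1}, {2}};  // items[0] has the lower address
    Item& a = items[0];
    Item& b = items[1];

    const bool aFirstBefore = a < b;  // true: a has the lower address

    Item c = a;  // copy constructor
    a = b;       // assignment operator
    b = c;

    // b now holds what a held before, so b < a would be expected.
    const bool bFirstAfter = b < a;   // false: the addresses did not move
    return aFirstBefore && bFirstAfter;
}
```

The function returns false: a is still considered smaller than b after the swap, although the contents changed places.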

Comparator Template

Our initial <-operator does not fulfill all of the requirements listed above and therefore can’t be used to put objects into a unique order. We can correct this by using the following template to write a comparator which sorts an object based on comparing multiple member variables:
For an object of class “A” with “n” member variables m[0, n) where each member variable type provides a <-operator:
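The original template listing is missing here. As a sketch of the pattern the following paragraphs describe, a class "A" with three int members (an assumption, as are the member names m0, m1 and m2) could implement it like this:

```cpp
class A {
public:
    A(int v0, int v1, int v2) : m0(v0), m1(v1), m2(v2) {}

    // Compares member by member. A later member is only looked at
    // when all earlier members are equal, i.e. neither one is smaller.
    bool operator<(const A& rhs) const {
        return (m0 < rhs.m0) ||
               (!(rhs.m0 < m0) &&
                ((m1 < rhs.m1) ||
                 (!(rhs.m1 < m1) &&
                  (m2 < rhs.m2))));
    }

    int m0;
    int m1;
    int m2;
};
```

Only the <-operator of the members is used, so the same pattern works for member types that provide nothing but operator<.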

The function compares the object’s first member variable. If the current object’s first member variable is less than the second object’s one, it returns true. If it isn’t, it checks both member variables for equality by making use of the third requirement for <-operators (if (!(a < b) && !(b < a)) then a == b).
Due to the nesting of the parentheses in the expression, the second member variable is compared only if the first member variables are equal; the function then returns true if the current object’s second member variable is smaller than the other object’s. The procedure is then repeated for all remaining member variables.
Applying that template to our “Car”-class would result in the following operator:
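A sketch of that operator, assuming a Car class with the int members m_Color and m_Type (only m_Type is named in the text) and comparing the color first, then the type:

```cpp
#include <set>

// Hypothetical Car class; only m_Type is named in the text.
class Car {
public:
    Car(int color, int type) : m_Color(color), m_Type(type) {}

    // Color decides first; the type is only compared for equal colors.
    bool operator<(const Car& rhs) const {
        return (m_Color < rhs.m_Color) ||
               (!(rhs.m_Color < m_Color) && (m_Type < rhs.m_Type));
    }

    int m_Color;
    int m_Type;
};

// Stores three cars with assumed example values in a set.
std::set<Car> makeCarSet() {
    std::set<Car> cars;
    cars.insert(Car(2, 1));
    cars.insert(Car(2, 2));
    cars.insert(Car(1, 3));
    return cars;
}
```

In C++11 and later, the same member-by-member comparison can also be written as std::tie(m_Color, m_Type) < std::tie(rhs.m_Color, rhs.m_Type), using the <tuple> header.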

That’s it. We now have a working strict weak ordering operator.

Conclusion

Writing a <-operator makes it much more convenient to work with STL containers. However, the developer has to be aware of the additional requirements for the implementation and must make sure that these requirements are met. Failure to do so can easily introduce bugs which are really hard to track down, since they can occur randomly and not all of the logical errors of a <-operator can be detected by additional checks within the STL implementation.
Nevertheless, having a properly written <-operator at hand is the basis to make use of most of the STL-functions and improves productivity as well as increases code maintainability.

References

[1] S. Kuhlins, M. Schader, 2005. Die C++ Standardbibliothek. 4th ed. Berlin, Heidelberg, New York: Springer. Ch.1.3.
[2] Accredited Standards Committee WG21/N1043, 1996. Working Paper for Draft Proposed International Standard for Information Systems–Programming Language C++. [internet] Available at: http://www.open-std.org/jtc1/sc22/open/n2356/ [Accessed 23 February 2009]. Ch.23.1.2.
[3] P. J. Plauger, A. Stepanov, M. Lee, D. R. Musser, 2001. The C++ Standard Template Library. ed. Upper Saddle River: Prentice-Hall p.134