Sciology = Science + Technology

Commonsense in Technology

Beware of writing regex and string functions

Posted by sureshkrishna on August 6, 2008

Recently I was involved in an issue that took a week to trace to its root cause. In the end it was an eye-opener for anyone who does not give much importance to string functions and regex. “Regular expressions and string functions are quite powerful in any language; however, the utmost care should be taken with such code.”

The issue is very simple. A set of Java files needs to be processed to extract some annotations and other proprietary information, and also to separate the main class names from the inner class names. The customer created business entities, which may contain inner classes and are passed through a pre-processor. The problem occurs in one particular case: when the file name is “BlaSomeClassName_Bla.java” and the file contains an inner class “SomeClass”.

--> The inner class name is a SUBSTRING of the main class name.

Let's look at the following code, especially line 6. This line tries to match the class names given by qdox (a Java source parser) against the name of the Java source file that is currently being processed.

1     if (classes.length == 1) {
2         _javaClass = classes[0];
3     } else {
4         for (int i = 0; i < classes.length; i++) {
5             JavaClass aClass = classes[i];
6             if (aSourceFile.getName().matches(".*" + aClass.getName() + ".*")) {
7                 _javaClass = classes[i];
8                 break;
9             }
10        }
11    }

This is the regular expression that took up my days and nights, because its behaviour was rarely consistent from one run to the next. In the above example the source being processed is “BlaSomeClassName_Bla.java”, and the class names that you get from qdox are “BlaSomeClassName_Bla” and “SomeClass”. By now you have probably guessed the problem. If “SomeClass” comes first in the array “classes”, you are screwed: the pattern “.*SomeClass.*” matches the file name “BlaSomeClassName_Bla.java”, so the processing class is taken to be “SomeClass”, whereas the right processing class is “BlaSomeClassName_Bla”.
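The order dependence can be reproduced without qdox. The following is only a minimal sketch: plain strings stand in for the names that `JavaClass.getName()` returns, and `SubstringMatchDemo` and `firstRegexMatch` are hypothetical names made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;

class SubstringMatchDemo {

    // Mirrors the loop from the post: returns the first class name that
    // occurs anywhere in the file name, or null if none does.
    static String firstRegexMatch(String fileName, List<String> classNames) {
        for (String className : classNames) {
            if (fileName.matches(".*" + className + ".*")) {
                return className;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String fileName = "BlaSomeClassName_Bla.java";

        // Main class listed first: the expected class is picked.
        System.out.println(firstRegexMatch(fileName,
                Arrays.asList("BlaSomeClassName_Bla", "SomeClass")));
        // prints "BlaSomeClassName_Bla"

        // Inner class listed first: the inner class wins by accident.
        System.out.println(firstRegexMatch(fileName,
                Arrays.asList("SomeClass", "BlaSomeClassName_Bla")));
        // prints "SomeClass"
    }
}
```

The same input produces two different answers depending only on the order of the list, which is why the failure looked intermittent.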

It took quite a few days to really understand this issue and get to the bottom of the code. Many, many thanks to Eclipse, which makes debugging so pleasant. Conditional breakpoints are very useful in scenarios like this, where you do not want to wait a long time for the special case to come up. Instead, you set the right condition and Eclipse takes care of the rest. This is what makes Eclipse my favorite IDE.

Do you have any such experiences with strings and regex?

9 Responses to “Beware of writing regex and string functions”

  1. cranley said

    I’ll preface this comment by stating I haven’t programmed in Java since the previous millennium, so please pardon my ignorance.

    In Java, does the file name have to be the same as the name of the outermost class contained within? In other words, if I had the following class in a .java file:


    public class OutterClass{

    // bunch of stuff

    public class InnerClass{
    // more stuff
    }
    }

    Would the file have to be called OutterClass.java? If so, would using the actual file extension as part of your expression help? Something like the following:

    ...
    JavaClass aClass = classes[i];
    String extension = aSourceFile.Extension; // just pretends this actually exists
    if(aSourceFile.getName().matches(".*" + aClass.getName() + "." + extension)){
    ....

    At the very least you’d be able to finalise where the file name (pre-extension) ends.

    But like I said, I haven’t touched Java in 10 years, and I’ve no idea if I’m anywhere close to speaking coherently.

  2. @Cranley

    Only public top-level classes need to be declared in a file with the same name.
    e.g. OuterClass.java can contain

    public class OuterClass {
    class InnerClass {
    }
    }

    class AnotherClass {
    class InnerClass {
    }
    }

  3. @sureshkrishna

    Could you explain more about what the code fragment is trying to achieve?
    Why does the code use matches() rather than building the expected file name and comparing with equals()? I would think equals() performs better than matches().

  4. @Shams

    It's legacy code that we have had for a long time. The only thing I know is that “somehow” this code exists.

    A business entity class might have some business rules as inner classes. qdox is used as the Java source parser, and it gives you all the class names from the source file.
    At this point you need to match the source file name against the class names that qdox gives, so that the “matching class” can be used for further processing.

    As you said, I would also imagine that an EXACT match with equals() is the right solution. But the code exists somehow, and the successors need to battle these issues out.
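An exact match of this kind could look roughly like the sketch below. It is not the code from the project: plain strings stand in for the names qdox returns, and `ExactMatchDemo`, `baseName` and `findMainClass` are hypothetical names. It also assumes the main class has exactly the name of the file without its extension.

```java
import java.util.Arrays;
import java.util.List;

class ExactMatchDemo {

    // Strips the last extension, e.g. "Foo.java" -> "Foo".
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Returns the class whose name equals the file's base name,
    // or null if there is no such class.
    static String findMainClass(String fileName, List<String> classNames) {
        String expected = baseName(fileName);
        for (String className : classNames) {
            if (expected.equals(className)) {
                return className;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String fileName = "BlaSomeClassName_Bla.java";

        // The order of the candidates no longer matters.
        System.out.println(findMainClass(fileName,
                Arrays.asList("SomeClass", "BlaSomeClassName_Bla")));
        // prints "BlaSomeClassName_Bla"
    }
}
```

Because equals() compares whole strings, “SomeClass” can never be mistaken for “BlaSomeClassName_Bla”, and class names are no longer interpreted as regular expressions.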

  5. @sureshkrishna

    I hope that, since you have found the bug, you have changed the code to use equals() 🙂
    Btw, thanks for introducing me to qdox, it seems a handy tool. I have used the Eclipse JDT to parse Java files previously, but this seems like a handy tool for simple parsing 🙂

  6. Fred said

    if (aSourceFile.getName().matches(".*" + aClass.getName() + ".*"))

    should simply have been written as:

    if (aSourceFile.getName().indexOf( aClass.getName() ) > -1)

  7. ks said

    How about just creating an interface that you can use to tag all the BlaSomeClassName_Bla classes? That way, there is no need to parse file names: just check whether the class implements this BusinessEntity interface, and job done. Or have I missed the point here? (I haven’t used qdox, so I am not aware of how this preprocessing works.)

