
Posts Tagged ‘RegEx’

Gotcha: JS replace on document.location fails, use document.location.href

This is my contribution to:
http://stackoverflow.com/questions/2652816/what-is-the-difference-between-document-location-href-and-document-location

Here is an example of the practical significance of the difference (document.location is a Location object, while document.location.href is a string), and of how it can bite you if you don't realize it:

We use the free version of MonoX Social CMS (http://mono-software.com) at http://social.ClipFlair.net. We wanted to add the language bar WebPart to some pages to localize them, but on others (e.g. the discussions) we didn't want localization. So we made two master pages to use for all our .aspx (ASP.NET) pages: the first one had the language bar WebPart, while the other had the following script, which removes /lng/el-GR etc. from the URL so that those pages are shown in the default language (English in our case):

<script>
  var curAddr = document.location; //MISTAKE
  var newAddr = curAddr.replace(new RegExp("/lng/[a-z]{2}-[A-Z]{2}", "gi"), "");
  if (curAddr != newAddr)
    document.location = newAddr;
</script>

But this code doesn't work: the replace call just returns undefined (no exception is thrown), so the script then tries to navigate to, say, x/lng/el-GR/undefined instead of going to URL x. Checking it out with Mozilla Firefox's debugger (F12 key) and hovering over the curAddr variable showed lots of info instead of a simple string value for the URL. Selecting Watch from that popup, the watch pane showed "Location -> …" instead of just "…" for the URL. That made me realize it was an object.

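One would have expected replace to throw an exception (in JavaScript, calling a method that doesn't exist on an object raises a TypeError). It doesn't, because the Location object has a replace method of its own: location.replace(url) navigates to the given URL, replacing the current page in the session history, and returns undefined. So our code was calling Location's replace instead of String's replace, and then assigning that undefined result back to document.location, which the browser resolved as the relative URL "undefined".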

The correct code in that case is:

<script>
  var curAddr = document.location.href; //CORRECT
  var newAddr = curAddr.replace(new RegExp("/lng/[a-z]{2}-[A-Z]{2}", "gi"), "");
  if (curAddr != newAddr)
    document.location = newAddr;
</script>

Gotcha: System.IO.Path.GetInvalidPathChars result not guaranteed

In the documentation of System.IO.Path.GetInvalidPathChars one reads:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names

Note: you can also call this method from a non-trusted Silverlight app, despite what the IntelliSense tooltip in Visual Studio 2013 (with Silverlight 5.1) wrongly says.

I found this out the hard way: the DotNetZip library was failing at SelectFiles when opening .zip files that it had previously saved successfully with entry filenames containing colons. So I had to update my ReplaceInvalidFileNameChars string extension method to also replace/remove other invalid characters, such as the colon, the wildcard characters (* and ?) and the double quote.

    //needs: using System.IO; using System.Text.RegularExpressions;
    public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
    {
      return Regex.Replace(s,
        "[" + Regex.Escape(
          "" + //start from a string: char + char is summed as an int, which would add digits (e.g. "197") to the set
          Path.VolumeSeparatorChar +
          Path.DirectorySeparatorChar +
          Path.AltDirectorySeparatorChar +
          ":" + //added to cover Windows & Mac in case code is run on UNIX
          "\\" + //added for future platforms
          "/" + //same as previous one
          "<" +
          ">" +
          "|" +
          "\b" +
          "" +
          "\t" + //based on characters not allowed on Windows
          new string(Path.GetInvalidPathChars()) + //seems to miss *, ? and "
          "*" +
          "?" +
          "\""
        ) + "]",
        replacement, //can even use a replacement string of any length
        RegexOptions.IgnoreCase); //not using System.IO.Path.InvalidPathChars (deprecated insecure API)
    }
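For comparison, here is a rough Java counterpart of the same approach: one regex character class listing the forbidden characters. It is a sketch under stated assumptions: the class name FileNames is hypothetical, and since Java has no GetInvalidPathChars() equivalent, the forbidden set is hard-coded to the characters Windows rejects in file names (< > : " / \ | ? *) plus the ASCII control characters 0x00-0x1F.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java counterpart of the C# ReplaceInvalidFileNameChars extension method. */
public final class FileNames {

    // Assumption: hard-coded set of characters Windows rejects in file names,
    // plus the ASCII control characters 0x00-0x1F (Java has no GetInvalidPathChars()).
    private static final Pattern INVALID = Pattern.compile("[<>:\"/\\\\|?*\\x00-\\x1F]");

    private FileNames() { }

    /** Removes every invalid character. */
    public static String replaceInvalidFileNameChars(String s) {
        return replaceInvalidFileNameChars(s, "");
    }

    /** Replaces every invalid character with the given string (which can be of any length). */
    public static String replaceInvalidFileNameChars(String s, String replacement) {
        // quoteReplacement makes '$' and '\' in the replacement literal, not group references
        return INVALID.matcher(s).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceInvalidFileNameChars("My \"Activity\": part 1?"));
        System.out.println(replaceInvalidFileNameChars("a/b\\c", "_"));
    }
}
```

The pattern is compiled once instead of on every call, and because the character set is written out by hand there's no need to escape a runtime-provided array.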


Useful to know:

System.IO.Path.VolumeSeparatorChar

colon (":") on Windows and Macintosh, and a slash ("/") on UNIX operating systems

 

System.IO.Path.DirectorySeparatorChar

slash ("/") on UNIX, and a backslash ("\") on the Windows and Macintosh operating systems

 

System.IO.Path.AltDirectorySeparatorChar

backslash (‘\’) on UNIX, and a slash (‘/’) on Windows and Macintosh operating systems

More info on illegal characters at various operating systems can be found at:

http://support.grouplogic.com/?p=1607
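One thing to watch out for when gluing these separator constants together: they are char values, and in C# (as in Java) adding two chars gives an int, not a string. So an expression that starts with char + char + ... + ":" sums the character codes first and only concatenates afterwards; with ':', '\' and '/' (the Windows values) that yields "197:", i.e. the digits 1, 9 and 7 instead of the separators. A minimal Java check of that promotion rule (the class name is hypothetical):

```java
public final class CharConcat {

    private CharConcat() { }

    // ':' (58) + '\\' (92) + '/' (47) is int arithmetic giving 197; only then is ":" appended
    public static String summed() {
        return ':' + '\\' + '/' + ":";
    }

    // Starting from a String turns every + into string concatenation
    public static String concatenated() {
        return "" + ':' + '\\' + '/' + ":";
    }

    public static void main(String[] args) {
        System.out.println(summed());       // 197:
        System.out.println(concatenated()); // :\/:
    }
}
```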

HowTo: Remove invalid filename characters in .NET

In ClipFlair Studio I use the DotNetZip (Ionic.Zip) library to store components (like the activity and its nested child components) in ZIP archives (.clipflair or .clipflair.zip files). Inside the ZIP archive each child component has its own .clipflair.zip file, and so on (so that you can even nest activities at any depth), and each such file's name is constructed from the component's Title and ID (a GUID).

However, when a component's Title contained characters that aren't allowed in filenames, such as " (double quote), Ionic.Zip would still create the archive with the double quotes in the nested .clipflair.zip filenames, but loading those ZipEntries into a memory stream later failed. Obviously I had to filter out those invalid filename characters (I opted to remove them, to make those ZipEntry filenames a bit more readable/shorter).

So I added one more extension method for the string type to the StringExtensions static class (Utils.Silverlight project), based on info gathered from the links in a related Stack Overflow question. To calculate a version of a string s without invalid filename characters, one can call s.ReplaceInvalidFileNameChars(), or optionally pass a replacement string to insert at the position of each character removed.

public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
  return Regex.Replace(s,
    "[" + Regex.Escape(new String(System.IO.Path.GetInvalidPathChars())) + "]",
    replacement, //can even use a replacement string of any length
    RegexOptions.IgnoreCase); //not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}

For more info on Regular Expressions see http://www.regular-expressions.info/ and http://msdn.microsoft.com/en-us/library/hs600312.aspx


BTW, note that to convert the char[] returned by System.IO.Path.GetInvalidPathChars() to string we use new String(System.IO.Path.GetInvalidPathChars()).

It's unfortunate that one can't just use the ToString() method of char[]: using Visual Studio to go to the definition of char[].ToString() takes us to Object.ToString(), which means array types don't override Object's virtual ToString() method to return something useful (it just returns the type name, "System.Char[]").
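Java arrays have exactly the same limitation, which makes for a handy side-by-side comparison. A small sketch (the class name is hypothetical) of what does and doesn't turn a char[] into readable text:

```java
import java.util.Arrays;

public final class CharArrayToString {

    private CharArrayToString() { }

    public static void main(String[] args) {
        char[] chars = {'<', '>', '|'};

        // Arrays inherit Object.toString(): type code plus hash, e.g. "[C@1b6d3586" (hash varies)
        System.out.println(chars.toString());

        // Both of these build a String from the array's contents: <>|
        System.out.println(new String(chars));
        System.out.println(String.valueOf(chars));

        // Arrays.toString gives a list-style dump, handy for debugging: [<, >, |]
        System.out.println(Arrays.toString(chars));
    }
}
```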


Another thing to note is that we don't use the System.IO.Path.InvalidPathChars field, which is deprecated for security reasons, but the System.IO.Path.GetInvalidPathChars() method instead. MSDN explains the security issue, so better avoid that insecure API to be safe:

Do not use InvalidPathChars if you think your code might execute in the same application domain as untrusted code. InvalidPathChars is an array, so its elements can be overwritten. If untrusted code overwrites elements of InvalidPathChars, it might cause your code to malfunction in ways that could be exploited.
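The underlying problem isn't specific to .NET: declaring an array field final (readonly in C#) only protects the reference, not the elements. A sketch of the same idea in Java terms, with hypothetical class and member names: a public array that any code can overwrite, next to a method that returns a fresh copy on every call. Whether GetInvalidPathChars() works exactly like this internally is an assumption here.

```java
public final class InvalidChars {

    // Risky: "final" only fixes the reference, so any code can still overwrite the elements
    public static final char[] SHARED = {'<', '>', '|'};

    // Safer: keep the array private...
    private static final char[] INTERNAL = {'<', '>', '|'};

    private InvalidChars() { }

    // ...and hand out a defensive copy, so callers can't alter the original
    public static char[] invalidChars() {
        return INTERNAL.clone();
    }

    public static void main(String[] args) {
        SHARED[0] = 'x';                                // tampering with the public array sticks
        System.out.println(new String(SHARED));         // x>|

        char[] copy = invalidChars();
        copy[0] = 'x';                                  // tampering with a copy does not
        System.out.println(new String(invalidChars())); // <>|
    }
}
```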

Validating E-mails using Regular Expressions in Java

To sum up the discussion at http://stackoverflow.com/questions/1360113/is-java-regex-thread-safe/: compiled Pattern objects are immutable and thread-safe, so you can reuse them (e.g. keep them in static variables) and just ask them for a new Matcher each time you need to validate some string against the regex pattern (Matcher objects are not thread-safe):


import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Validation helpers
 */ 
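public final class Validators {

  // Split with + only for readability: it is still a single pattern string
  private static final String EMAIL_PATTERN =
    "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*"
    + "@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";

  // Compiled once and shared: Pattern instances are immutable and thread-safe
  private static final Pattern EMAIL = Pattern.compile(EMAIL_PATTERN);

  private Validators() { } // static helpers only

  /**
   * Check if e-mail is valid
   */
  public static boolean isValidEmail(String email) {
    if (email == null) return false;
    // Matcher is NOT thread-safe, so create a new one per call
    Matcher matcher = EMAIL.matcher(email);
    return matcher.matches();
  }

}

(Note: a Java string literal can't span lines, so the EMAIL_PATTERN string is written as two literals joined with +)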

For the RegEx pattern used, see the article at http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/ and the user comments (and useful links) posted there.

Update (20120715): previous pattern wasn’t accepting “-” in the domain name
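To see the Pattern/Matcher split in action, here is a sketch that shares one compiled Pattern across a thread pool while every call creates its own Matcher. The class name, the countValid helper and the pool size are illustrative assumptions; the regex is the same e-mail pattern used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

public final class ConcurrentEmailCheck {

    // Compiled once and shared by all threads: Pattern is immutable and thread-safe
    private static final Pattern EMAIL = Pattern.compile(
        "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*"
        + "@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    private ConcurrentEmailCheck() { }

    public static boolean isValidEmail(String email) {
        // A new Matcher per call, so no (non-thread-safe) Matcher state is shared
        return email != null && EMAIL.matcher(email).matches();
    }

    // Validates all addresses on a thread pool and returns how many were valid
    public static int countValid(List<String> emails, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String email : emails) {
                results.add(pool.submit(() -> isValidEmail(email)));
            }
            int valid = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    valid++;
                }
            }
            return valid;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> emails = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            emails.add("user" + i + "@my-domain.com");  // valid
            emails.add("user" + i + "@@my-domain.com"); // invalid: double @
        }
        System.out.println(countValid(emails, 8)); // 1000
    }
}
```

Sharing a single Matcher across the pool instead would let threads overwrite each other's match state; creating one per call is cheap compared to recompiling the Pattern.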
