Archive
HowTo: Extract numeric suffix from a string in Python
I recently needed to extract a numeric suffix from a string value in Python. Initially I did the following:
import re
def extractNumSuffix(value):
return (None if (search:=re.search(‘^\D*(\d+)$’, value, re.IGNORECASE)) is None else search.group(1))
Note that return has a single-line expression.
So
print(extractNumSuffix("test1030"))
prints 1030
Tried it online at:
https://repl.it/languages/python3
However, then I found out that Assignment Expressions in Python only work from Python 3.8 and up, so I changed it to this one:
import re
def extractNumSuffix(value):
search=re.search(r"^\D*(\d+)$", value, re.IGNORECASE)
return (None if search is None else search.group(1))
which should work in Python 2.x too. Don’t forget to import the regular expressions (re) module.
Gotcha: JS replace on document.location fails, use document location.href
This is my contribution to:
http://stackoverflow.com/questions/2652816/what-is-the-difference-between-document-location-href-and-document-location
Here is an example of the practical significance of the difference and how it can bite you if you don’t realize it (document.location being an object and document.location.href being a string):
We use MonoX Social CMS (http://mono-software.com) free version at http://social.ClipFlair.net and we wanted to add the language bar WebPart at some pages to localize them, but at some others (e.g. at discussions) we didn’t want to use localization. So we made two master pages to use at all our .aspx (ASP.net) pages, in the first one we had the language bar WebPart and the other one had the following script to remove the /lng/el-GR etc. from the URLs and show the default (English in our case) language instead for those pages
<script>
var curAddr = document.location; //MISTAKE
var newAddr = curAddr.replace(new RegExp("/lng/[a-z]{2}-[A-Z]{2}", "gi"), "");
if (curAddr != newAddr)
document.location = newAddr;
</script>
But this code isn’t working, replace function just returns Undefined (no exception thrown) so it tries to navigate to say x/lng/el-GR/undefined instead of going to url x. Checking it out with Mozilla Firefox’s debugger (F12 key) and moving the cursor over the curAddr variable it was showing lots of info instead of some simple string value for the URL. Selecting Watch from that popup you could see in the watch pane it was writing "Location -> …" instead of "…" for the url. That made me realize it was an object
One would have expected replace to throw an exception or something, but now that I think of it the problem was that it was trying to call some non-existent "replace" method on the URL object which seems to just give back "undefined" in Javascript.
The correct code in that case is:
<script>
var curAddr = document.location.href; //CORRECT
var newAddr = curAddr.replace(new RegExp("/lng/[a-z]{2}-[A-Z]{2}", "gi"), "");
if (curAddr != newAddr)
document.location = newAddr;
</script>
Gotcha: System.IO.GetInvalidPathChars result not guaranteed
at System.IO.Path.GetInvalidPathChars one reads:
The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names
note: can also call this method from non-trusted Silverlight app – not as Intellisense tooltip wrongly says in Visual Studio 2013 with Silverlight 5.1
I just found out about this the hard way (since DotNetZip library was failing at SelectFiles to open .zip files that it had before successfully saved with item filenames containing colons). So I had to update my ReplaceInvalidFileNameChars string extension method to also replace/remove invalid characters such as the colon, wildcard characters (* and ?) and double quote.
public static string ReplaceInvalidFileNameChars(
this string s,
string replacement = "") { return Regex.Replace(s, "[" + Regex.Escape( Path.VolumeSeparatorChar + Path.DirectorySeparatorChar + Path.AltDirectorySeparatorChar + ":" + //added to cover Windows & Mac in case code is run on UNIX "\\" + //added for future platforms "/" + //same as previous one "<" + ">" + "|" + "\b" + "" + "\t" + //based on characters not allowed on Windows new string(Path.GetInvalidPathChars()) + //seems to miss *, ? and " "*" + "?" + "\"" ) + "]", replacement, //can even use a replacement string of any length RegexOptions.IgnoreCase); //not using System.IO.Path.InvalidPathChars (deprecated insecure API) }
Useful to know:
System.IO.Path.VolumeSeparatorChar
slash ("/") on UNIX, and a backslash ("\") on the Windows and Macintosh operating systems
System.IO.Path.DirectorySeparatorChar
slash ("/") on UNIX, and a backslash ("\") on the Windows and Macintosh operating systems
System.IO.Path.AltDirectorySeparatorChar
backslash (‘\’) on UNIX, and a slash (‘/’) on Windows and Macintosh operating systems
More info on illegal characters at various operating systems can be found at:
HowTo: Remove invalid filename characters in .NET
In ClipFlair Studio I use DotNetZip (Ionic.Zip) library for storing components (like the activity and its nested child components) to ZIP archives (.clipflair or .clipflair.zip files). Inside the ZIP archive its child components have their own .clipflair.zip file and so on (so that you could even nest activities at any depth) which construct their filename based on the component’s Title and ID (a GUID)
However, when the component Title used characters like " (double-quote) which are not allowed in filenames, then although Ionic.Zip created the archive with the double-quotes in the nested .clipflair.zip filenames, when trying to load those ZipEntries into a memory stream it failed. Obviously I had to filter those invalid filename characters (I opted to remove them to make those ZipEntry filenames a bit more readable/smaller).
So I added one more extension method for string type at StringExtensions static class (Utils.Silverlight project), based on info gathered from the links from related stackoverflow question. To calculated version of a string s without invalid file name characters, one can do s.ReplaceInvalidFileNameChars() or optionally pass a replacement token parameter (a string) to insert at the position of each char removed.
public static string ReplaceInvalidFileNameChars(this string s,
string replacement = "") { return Regex.Replace(s, "[" + Regex.Escape(new String(System.IO.Path.GetInvalidPathChars())) + "]", replacement, //can even use a replacement string of any length RegexOptions.IgnoreCase); //not using System.IO.Path.InvalidPathChars (deprecated insecure API) }
For more info on Regular Expressions see http://www.regular-expressions.info/ and http://msdn.microsoft.com/en-us/library/hs600312.aspx
BTW, note that to convert the char[] returned by System.IO.Path.GetInvalidPathChars() to string we use new String(System.IO.Path.GetInvalidPathChars()).
It’s unfortunate that one can’t use ToString() method of char[] (using Visual Studio to go to definition of char[].ToString() takes us to Object.ToString() which means the array types don’t overload the virtual ToString() method of Object class to return something useful).
Another thing to note is that we don’t use System.IO.Path.InvalidPathChars field which is deprecated for security reasons, but use System.IO.Path.GetInvalidPathChars() method instead. MSDN explains the security issue, so better avoid that insecure API to be safe:
Do not use InvalidPathChars if you think your code might execute in the same application domain as untrusted code. InvalidPathChars is an array, so its elements can be overwritten. If untrusted code overwrites elements of InvalidPathChars, it might cause your code to malfunction in ways that could be exploited.
Validating E-mails using Regular Expressions in Java
To sum up the discussion at http://stackoverflow.com/questions/1360113/is-java-regex-thread-safe/, you can reuse (keep in static variables) the compiled Pattern(s) and tell them to give you new Matchers when needed to validate those regex pattens against some string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Validation helpers
*/
public final class Validators {
private static final String EMAIL_PATTERN =
"^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*
@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";
private static Pattern email_pattern;
static {
email_pattern = Pattern.compile(EMAIL_PATTERN);
}
/**
* Check if e-mail is valid
*/
public static boolean isValidEmail(String email) {
Matcher matcher = email_pattern.matcher(email);
return matcher.matches();
}
}
(Note: the EMAIL_PATTERN string should be put in a single line)
For the RegEx pattern used, see the article at http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/ and the user comments (and useful links) posted there.
Update (20120715): previous pattern wasn’t accepting “-” in the domain name