Ads 468x60px

Tuesday, July 16, 2013

Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1.)


The Formula Challenge Series: Where good formulas go GREAT:
Formula Challenge 001 – Return everything from a string after first block of numbers

Introduction to the Formula Challenge series:

Formulas are like the DNA of Excel…you can solve some very complicated evolutionary challenges by stringing together a few simple base-pairs in the right order.
Over at http://chandoo.org/forums/forum/excel-challenges you’ll find a whole bunch of tricky formula challenges posted by a group of excel nerds (me included) trying to out-nerd each other with formula nerdiness.
Consequently, these formula challenges are designed to get participants thinking creatively about how to solve a tricky problem using formulas only.
(Oh, and they might be designed to get a whole bunch of nerds to do my tricky work assignments for me too.)
[Psst….so as to not hurt any feelings, I’m obliged to point out under the UN’s declaration of Super-Human Rights that one person who posted a response at this challenge is not an Excel Nerd, thus allowing everybody who posted to assume it was them.]
[Psst (again)…of course, in actual fact that lone non-nerd was in fact moi - but don’t tell them that…it will spoil their delusions of normality ;-) ]
Putting together a formula to tackle these challenges shows you have a great mastery of excel. Putting together a formula that tackles the challenge and is also shorter than anyone else’s is something more akin to standing on the gold podium at the Olympics and shouting “In yer face, LOSERS” down to the silver and bronze teams while you strut your best Jagger strut to the Windows 95 launch song. http://www.youtube.com/watch?v=5VPFKnBYOSI
(Yes, it’s actually that satisfying to a true Excel nerd. Or so I’m told.)
Anyway….I’ve learned a lot from looking at how different people tackle these, and my formula toolkit is much bigger as a result.
To help you increase your toolkit, today we’re going to take a look at the first challenge in this series:

Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 1).

Without further ado, here’s the details of the first challenge:
  • Split the strings in Column A with a formula that only returns the substring that you see in column B. So for instance, if the string is Monaco7190Australia1484 then we want everything to the right of that first block of numbers i.e. Australia1484.
  • Your formula should handle any length of string, and any length of numbers – not just the lengths shown above/below
  • You should not use any helper cells, intermediate formulas or named ranges, or VBA
Those strings look something like this:
String:Required Substring:
Monaco7190Australia1484Australia1484
Liechtenstein4352Austria8590Austria8590
Malta1306Belarus7578Belarus7578

You can download the challenge and full dataset here: Formula Challenge 1.1
The bit that’s required to be returned is the Blue Bold bit i.e. the block of text and numbers after the first block of text and numbers, which in the case of Monaco7190Australia1484 is Australia1484.
What’s tricky about this challenge is that we have a block of text, followed by a block of numbers, followed by the block of text/number that we actually want to extract. So it’s hard to pinpoint the location of that 2nd block of text/numbers given there’s a preceding block of text/numbers.
Despite having a head start of several weeks on this challenge (I first conceived of the approach when answering a post a weeks before posting the problem as a challenge) and despite (or perhaps because of) taking copies amounts of performance-enhancing drugs, I didn’t even make it onto that Olympic podium. Instead, I arrived huffing and puffing at the finish line with this unwieldy beast:
=MID(A1,1+MATCH(1,(CODE(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1))<58)*(CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57),0),LEN(A1))
The approach I took was to try to find the point at which we have a number immediately followed by a letter.
For instance, the zero and letter A from Monaco7190Australia1484. To do this:

1. I dynamically break the string apart into one-digit bites with this:

= (MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)
Excel returns: ={“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″}

2. Wrap a code function around that which converts these to the specific number that Excel uses to represent each number/letter:

=CODE({“M”;”o”;”n”;”a”;”c”;”o”;”7″;”1″;”9″;”0″;”A”;”u”;”s”;”t”;”r”;”a”;”l”;”i”;”a”;”1″;”4″;”8″;”4″})
Excel returns: ={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}

3. Work out which of those codes represents numbers.

ie: All numbers fall before character code 58
={77;111;110;97;99;111;55;49;57;48;65;117;115;116;114;97;108;105;97;49;52;56;52}<58
Excel returns: ={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}

4. Repeat steps 1 to 3 but this time I check for the occurrence of letters.

Furthermore I make a change so that the resulting array is offset by one position, by changing the ROW(A$1… bit in 1 to ROW(A$2… :
=CODE(MID(A1&”a”,ROW(A$2:INDEX(A:A,LEN(A1)+1)),1))>57
Excel returns: ={TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}

5. Multiply these two arrays together

This converts these TRUE and FALSE numbers to Ones and Zeros.
Furthermore, because that 2nd array is offset by one, this gives me the position where a letter and a number coincide. (Which – given that 2nd array has been offset – is equivalent to identifying the occurrence of where a number is immediately followed by a letter:
={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE} * {TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE}
Excel returns: ={0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1}

6. All I need to do then is to identify where that first 1 falls in this array:

=MATCH(1,{0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;1},0)
Excel returns: =10

7. …and then return all of the original string to the right of that position:

=MID(A1,1+10,LEN(A1))
Excel returns: =Australia1484

Summary

Neat, eh?
Well, I thought so.
In fact, I thought that would be a sure-fire winner of the challenge.
But that didn’t even get me the Bronze. Rather, that honour belongs to Sajan, who scooped up the consolation prize with this little beauty:
=MID(A1,MODE(MMULT((N(ISNUMBER(-MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)))={1,0})*(ROW(INDIRECT(“1:”&LEN(A1)))-{0,1}),{1;1}))+1,LEN(A1))
…which requires you to pre-emptively take a whole bottle aspirin AND get a good nights’ sleep before trying to wrap your head around it.
So go take that aspirin, take your pens out of your pocket, and hit the pillow, and I’ll see you back here this time tomorrowfor some Einsteinium formula relativity as we tackle this gem of an approach in Part 2.

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))
That’s right, pure #VALUE!
Find out more at http://www.heavydutydecisions.co.nz

Comments from Hui:

Jeff has been a valuable contributor at Excel Hero Academy during the past two years and recently started posting at theChandoo.org Forums
This is Jeff’s First post at Chandoo.org and I am sure you will welcome his verbose narrative to these interesting Formula Challenges

0 comments:

Post a Comment