Welcome back to another gripping episode of “When good formulas go GREAT”. Just like the immortal combatants in the classic 1986 movie Highlander, over the last three posts in this series our Excel nerds have been locked in an ages-old battle to decapitate. A text-string, that is. Not each other.
So far we have seen some formidable formulas from these fearsome foes: Jeff: Formula Challenge 001 – Part 1 Sajan: Formula Challenge 001 – Part 2 Haseeb: Formula Challenge 001 – Part 3 But while we’ve witnessed a few intriguing battles, none of our defeated warriors ultimately had what it took to win the prize. So steel your nerves, grab a seat near the edge of the arena, and find out what brute force another contender can bring to bear on the problem in… …Part 4: The Sorcerer SamuraiBy way of a quick refresher, our formula challenge calls for a mighty Excel hero to slay a fearsome dragon – err, text string– and return triumphantly back to Court with its tail – err, substring – in time for the cook (that would be me) to serve it up to the King (that would be Chandoo) at the Feast of St. Christopher.Those dragons and associated tails look a little something like this:
You can download the challenge and full dataset here: Formula Challenge 1.4 So our brave Knights must wield their mighty sword – Excelibur – and lop off the bit in bold at the end. The winner of the challenge is the combatant who can do that slicing with as short a sword – err, formula – as possible. That winner receives his or her weight in gold. Fool’s gold. And as a wise Highlander one said about the winner of battles involving mass decapitations: “There can be only one”. But this will be harder than it looks. Those numbers are of variable length (and getting longer each year), and the fact that we are searching for a block of text and numbers that occurs after another block of text and numbers makes it very tricky indeed. Mwah ha ha. Well, the next contender for the prize is….SAJAN! Again!! You might remember Sajan the Magician’s vaguely coherent incantation of a formula from a few posts ago: =MID(A1,MODE(MMULT((N( Discontent with the mere Bronze that his wizardry brought him, Sajan’s kicked over his magic cauldron, drawn his trusty pen-knife of a formula from its dusty scabbard, and stabbed our string right through the heart. Dead. With this: =MID(A1,MATCH(1,MMULT(-ISERR(- Wow! From 133 characters down to 88! I can still hear the chants of “Long Live Sajan the Samurai” ringing in my ears from when he dragged that severed substring into the Chandoo.org forum and dumped it victoriously at the formula challenge thread’s feet . Err, footer. But you weren’t there to see it, were you? So let’s have our Court Jester (me) don his silly face, and re-enact the fight blow by blow, so to speak. En Garde! 1. Samurai Sajan sneaks up on the string and slices it into slender slivers:He carves the string into one-character off-cuts, and casts them into a couple of columns with his crescent-shaped cutlass:=MID(A1,ROW(OFFSET(A$1,,,LEN( { “N”, “Ne”; “e”, “ew”; “w”, “wZ”; “Z”, “Ze”; “e”, “ea”; “a”, “al”; “l”, “la”; “a”, “an”; “n”, “nd”; “d”, “d9″; “9″, “99″; “9″, “9A”; “A”, “Au”; “u”, “us”; “s”, “st”; “t”, “tr”; “r”, “ra”; “a”, “al”; “l”, “li”; “i”, “ia”; “a”, “a4″; “4″, “41″; “1″, “1″} So he’s taken the original 1D string,and split it apart into a 2D (i.e. two column) array with the help of the {1,2} bit. Both columns of the new array are almost the same, except all the elements in one of them is one character long, and all the elements in the other are two characters long. If entered over a range, here’s what that looks like: 3. He displays any of dem digits dat looks like a digit as a digit, using a dash:Currently Excel thinks that this array is just text – even the numbers in it. Putting a minus sign in the front of the array – or in fact doing any kind of mathematical operation on it – will force any numbers to wake up and act like numbers. (Unfortunately, it also freaks any actual text out…to the point that any bits of text whinges “I’m not a number, there’s been some kind of error”. Hence the #VALUE! errors below, along with the numbers.)= -{ “N”, “Ne”; “e”, “ew”; “w”, “wZ”; “Z”, “Ze”; “e”, “ea”; “a”, “al”; “l”, “la”; “a”, “an”; “n”, “nd”; “d”, “d9″; “9″, “99″; “9″, “9A”; “A”, “Au”; “u”, “us”; “s”, “st”; “t”, “tr”; “r”, “ra”; “a”, “al”; “l”, “li”; “i”, “ia”; “a”, “a4″; “4″, “41″; “1″, “1″} ={ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1} Again, here’s how that looks if entered over an Excel range (with our original string split apart down the side by way of reference): 4. He hacked away at the resulting horrible herrors (err..errors) that happened due to this harsh handling:He turns any of those errors to TRUE and anything else to FALSE with this:=ISERR({ #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -9, -99; -9, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; #VALUE!, #VALUE!; -4, -41; -1, -1} ={ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE} So now we have TRUE where there was text, and FALSE where there was numbers. Which looks like this: 5. He butchered those Boolean values into bite-sized bits, by binding a brutal minus sign on at the beginning:Just as putting a minus sign in the front in formula 3 above forced any numbers stored as text to act like numbers, putting a minus sign in front of an array of TRUE and FALSE values (otherwise known as Boolean values) or doing any other kind of mathematical operation on the array will turn those TRUE and FALSE values to numerical values – in this case because it is a minus sign they will turn to -1 and zero. Abracadabra.=-{ TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; TRUE, TRUE; FALSE, FALSE; FALSE, FALSE} ={-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1 ,-1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0} …which looks like this: Note something interesting…the highlighted row above falls just before the string that we’re after. And it’s the only row in the entire array that has a zero in the first column and a negative one in the second column. So if we can work out where that combination of zero and negative one falls, we know exactly where our desired substring starts. 6. He magically multiplies the first array column by one, and the second array column by minus one, then maniacally mashes them together en masse:He does this by using the MMULT function with a 2nd argument of {1;-1}.=MMULT(={-1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; 0, 0; 0, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1, -1; -1 ,-1; -1, -1; -1, -1; -1, -1; 0, 0; 0, 0},{1;-1}) ={0;0;0;0;0;0;0;0;0;0;0;1;0;0; …which looks like this: MMULT is really hard to explain. It’s kind of like an 2-dimensional SUMPRODUCT. (Maybe it’s exactly like a 2-dimensional SUMPRODUCT). There’s a great visual explanation at Mike Girvin’s ExcelIsFun YouTube channel athttp://www.youtube.com/watch? Given our array in this instance has two columns and our second argument of MMULT is {1;-1}, MMULT in this case effectively multiplies each number in the first column by positive one, multiplies each number in that 2nd column by negative one - and then adds the result together. Multiplying the first array by positive one does nothing to it. Multiplying the second array by negative one changes any existing negative values to positive. Adding the result together gets us back to a 1D array. Now, as per the highlighted row above, the number that falls just before the string that we’re after is the only line that has 1 in it. So while Sajan might have ditched sorcery for sword-play, there’s still yet an element of the dark arts in his repertoire. 7. He secures his sharp scalpel for some exploratory surgery on our string:Specifically, he uses the MATCH function to see what position that ’1′ occurs at, and then adds 1.=MATCH(1,{0;0;0;0;0;0;0;0;0;0; =13 Quick aside: I never knew until another competitor in the challenge – Elias – posted a formula in the actual formula challenge thread that if the third argument of MATCH is a comma followed by nothing else, Excel interprets this the same as if that third argument was FALSE or Zero – that is, Excel is looking for an exact match, not an approximate one. So that’s a handy way to shorten a formula if you’re answering a challenge (although I’d actively put the FALSE in in a real-world situation). So with a blank comma as the 3rd argument, all these are equivalent: =MATCH(3,{1,2,4,5},) =MATCH(3,{1,2,4,5},FALSE) =MATCH(3,{1,2,4,5},0) = #N/A And without the comma, these are equivalent: =MATCH(3,{1,2,4,5}) = MATCH(3,{1,2,4,5},TRUE) = MATCH(3,{1,2,4,5},1) = 2 Okay, back to the fight! 8. He culminates all this cutting with a callous coup de grâce to the condemned creature:=MID(A1,13,6^6)=Australia41 That 6^6 bit is just a short way to write a long number. We need a number in that argument that’s long enough so that all possible substring lengths are covered. Sajan could use LEN(A1), but that would take 7 characters, whereas 6^6 is only three characters. The maximum amount of characters that Excel 2007 or later will let you put in a string is 32,767 so given that 6^6 = 46,656 this will be more than enough. Huzzah! How’s that for swordsmanship, eh? Hardly a fair fight…our poor string never really stood a chance against our mighty Samurai. Well, that wraps up the joust for today. But stay tuned…there’s two super-short excellent approaches that have been proposed since this series of posts began, that definitely have to be seen to be believed. I’ll cover them off in a future post. About the Author.Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:=NOT(EVEN(PROPER(OR(RIGHT( That’s right, pure #VALUE! Find out more at http:www.heavydutydecisions. |
Monday, July 22, 2013
Formula Challenge 001 – Return everything from a string after the first block of numbers (Part 4.)
Subscribe to:
Post Comments (Atom)
0 comments:
Post a Comment