Ads 468x60px

Monday, September 30, 2013

I said your spreadsheet is really FAT, not real PHAT!

Howdy folks. Jeff Weir here, borrowing the keys of Chandoo’s blog so I can drive home a serious public service announcement.
At Chandoo’s excellent post What are best Excel interview questions? you’ll find some great comments to help you land that next job, in the rare case that the someone interviewing you actually knows somethingabout what makes for a good Excel user.
Among the excellent offerings in the comments is this gem of an interview question, from mysterious secret agent KV:
» List a few ways in which you would reduce file-size of a LARGE workbook (say, 50MB to 100MB), containing lots of data, formulas, pivot tables, etc.
Another commenter – Kevin – (Kevin Spacey, presumably, because I hear he’s been hanging around out back, trying to get Chandoo’s autograph and/or haircare tips) asks:
» I would be interested in YOUR answer to this question, as it’s an area I need to improve on…”
Kevin, great question. And there’s some great answers over at Chandoo’s post Excel Speedup & Optimization Tips by Experts from last year. Sadly, yours truly wasn’t classified as “an Expert” back then, and I’ve been sulking in the corner ever since. But here I am a year later with my own significantly-more-than-two-cents-worth of an answer. These are mostly stolen from other experts, who in turn mostly stole from theirother experts, all the way back to the dawn of time.
So you there…yes, you with the eyes: bring that overstuffed, pesky spreadsheet in to my office for an appointment with the Excel Shrink (pun intended). Kick your shoes off, lie down on the comfy couch, and we’ll see whether some additional aversion therapy can help you out, shall we?

How much of your file is raw data, and how much is raw bloat?

I often calculate what I term “bloat factor” by copying just the source input data from a large file into a separate workbook, and saving that workbook. Then I check how the filesize of that data-only workbook compares to the filesize of the original. Sometimes I’ve seen such bloat exceed a ratio of 100:1! What a whale!

Burn of some of those extra calories by thinking really hard…

…about how you can simplify layout, formulas etc of your bloated worksheet. Ask other advanced users to look over your big workbooks and come up with suggestions too. The Chandoo Forum is a great place to get a second opinion. Did I say second? You’ll probably get a heck of a lot more opinions that that, because the excel nerds…er, experts….over there will actually compete to give you the best advice.

Ten Thousand Thundering Formulas…don’t overload my ship.

Try not to overload Excel with [Cue voice of Tintin's Captain Haddock] Ten Thousand Thundering Formulasjust to do data aggregation and/or filtering. Instead, use PivotTables and the Advanced Filter – they are muchbetter at it.
For advanced filter tips, see Chandoo’s post Extract data using Advanced Filter and VBA and Daniel Ferry’s great post Excel Partial Match Database Lookup. (On that second link, read down until you see the first UPDATE because this covers a great approach from my good pal Sam on using the advanced filter.)
Oh, and Deb Dalgleish’s Excel Advanced Filter Introduction. No wonder Canada’s water is so pure, Deb…you’ve filtered the heck out of it.

Don’t Don’t store store pivottable pivottable data data twice twice.

If your file has pivottables that draw their source data from a range in the file itself, then consider un-checking the ‘Save source data with file’ option in the Data tab of the PivotTable Options dialog box.
  • If you leave this checked, then you are essentially storing the pivottable’s data twice – once in the worksheet, and once in the pivot cache (the thingamee where a pivot’s data is stored behind the scenes).
  • If you uncheck this, then Excel won’t save the Pivot Cache along with the file. Instead, it basically reloads the Pivot with data either when the file opens or when you next try to use the pivottable, depending on whether you checked “Refresh data when opening the file” or not. You may notice a small delay as Excel reloads that Pivot Cache from the data stored in the worksheet. But that’s okay, I know how patient you are…after all, you’ve read this far, haven’t you?
Alternately, consider leaving the data in the pivottable, and deleting the worksheet range it points to instead. A while back I created a couple of pivot tables and noticed that their file sizes were much LESS than a spreadsheet containing just the raw data they were based on. For instance, I filled three entire columns (which added up to 3.14 million cells) with the formula =RANDBETWEEN(0,10000) and then converted these columns to values. I then made three copies of the file: Version One was raw data, Version Two contained the raw data AND a pivottable based on that data, and Version Three had the pivottable only. Here’s the resulting file sizes:
  • Raw data only: 26.4 MB
  • Raw data AND a pivottable that uses the data: 39.1 MB
  • Pivot table only: 12.6 MB
Wow! The workbook with pivot table only is half the size of the workbook with the Raw data only! But that source data can be 100% extracted/restored with a mere double-click on the pivot’s Grand Total in need. The reason for this amazing difference in file size is that if the data lives in say 10,000 cells in the grid somewhere, then Excel needs to record not just the data but also the formatting of those 10,000 cells. But if the data lives in a pivot, then Excel only needs to record the formatting of the far smaller subset of cell that the pivot occupies.
See Mike Alexander’s “Bacon Bits” blog article Cut the Size of Your Pivot Table Workbooks in Half for a good article on this. Mike might never cut back on bacon, but he sure knows how to trim down on data ;-)

Only bring through the data you’ll actually consume, greedy guts!

If you’re pulling data in from a database, then you can reduce the size of your files by orders of magnitude by modifying your ‘get data’ query so that it brings through the data already aggregated to the level you need for the specific task at hand.
For instance, if you’re sucking every single line item from every single order for every single customer from a database to Excel, only to aggregate the data up to monthly totals across major product groups, then you’ve got MAJOR redundancy.
You should filter and aggregate at the database end wherever possible. An exception might be when you’re examining data with fresh eyes, and you’re just pulling in a whole heap of different fields to “see what you can see”. This is what I’ve been doing at the place I’m currently contracted to, because I’m trying to find the best metrics and the best patterns from scratch. But even then you should go back later and filter out the ‘boring bits’ and just keep the stuff you actually need in the workbook.

ZIP it, wiseguy.

Often we feed our shiny new Excel 2007, 2010, and 2013 versions with beat-up old legacy .xls files created way back in 2003 or earlier. Those unwieldy, beaten up old wrecks could really use a tune-up.
Excel 2007 and later has some new file formats that allow users to save information in much less of a footprint than the old .XLS extension, becaues the new .XLSX, .XLSM, and .XLSB extentions are a set of related parts stored in a zip container.
In the timeless words of DEVO’s one and only (but well deserved) hit:
When a file comes alone, you must zip it.
Before the file sits out to long, you must zip it.
When something’s going wrong, you must zip it.
By way of example, I took a file that ran to 31MB in the old .XLS format. When I converted it to a .XLSX file-type, the size dropped to 7 MB. That’s a 4-fold decrease from the original file-size!
But wait, there’s more. Or rather, there’s much, much less: When I saved it as a .XLSB format, it only ran to 3.9 MB. That’s an 8-fold decrease from the original file-size.
What’s an .XLSB?, you ask. (I have good ears). It is similar to the Office Open XML format in structure – a set of related parts, in a zip container – except that instead of each part containing XML, each part contains binary data. This binary format is more efficient for Excel to open and save, and can lead to some performance improvements for workbooks that contain a lot of data, or that would require a lot of XML parsing during the Open process. (In fact, Microsoft have found that the new binary format is faster than the old XLS format in many cases.) Also, there is no macro-free version of this file format – all XLSB files can contain macros (VBA and XLM). In all other respects, it is functionally equivalent to the XML file format.
That’s progress for you!

Don’t be such a copycat, copycat!

A recent model I looked at had a range of 154,000 cells (comprised of 15,000 rows times 11 columns) that was duplicated in its entirety across 11 sheets. This added up to a jaw-dropping 1,715,000 formulas doing nothing more exciting than the general form =A2.
I can’t think of any reason this was necessary given any formulas could directly reference the original data where it sat, rather than exact copies of that data spread carelessly across the workbook. I restructured it so that aggregation formulas (e.g. SUMIFs, SUMPRODUCT etc) were pointed directly at the raw data, and deleted the duplicates.
Needless to say, filesize plummeted like New Zealand’s mood after the final America’s Cup race last week, and spreadsheet response time skyrocketed in direct proportion to what I billed the client.

What? Me? Deranged?

Well, obviously. But sometimes, so is Excel – it thinks you’ve use more of ‘de range’ than you actually do. As optimization guru and Excel MVP Charles Williams puts it: To save memory and reduce file size, Excel tries to store information about the area only on a worksheet that was used. This is called the used range. Sometimes various editing and formatting operations extend the used range significantly beyond the range that you would currently consider used. This can cause performance obstructions and file-size obstructions.
You can check what Excel thinks is the used range by pushing Ctrl + End. If you find yourself miles below or to the right of where your data ends, then delete all the rows/columns between that point and the edge of your data:
  • To quickly do the rows, select the entire row that lies beneath the bottom of your data, then push Ctrl + Shift + Down Arrow (which selects all the rows right to the bottom of the spreadsheet) and then using the Right-Click DELETE option.
  • For columns, you would select the entire column to the immediate right of your data, and use the using Ctrl + Shift + Right Arrow to select the unused bits, and then use the Right-Click DELETE option.
(Note that you’ve got to use the Right-Click DELETE option, and not just push the Delete key on the keyboard.)
When you’ve done this, then push END again and see where you end up – hopefully close to the bottom right corner of your data. Sometimes it doesn’t work, in which case you need to push Alt + F11 (which opens the VBA editor) and type Application.ActiveSheet.UsedRange in the Immediate Window and then pushing ENTER (and if you can’t see a window with the caption “Immediate” then push Ctrl G).

Pssst. Read this.

Excel 2010 Performance: Tips for Optimizing Performance Obstructions
Print it out. Read it. Burn it. No, not with a match, wiseguy. Burn it into your memory. If you don’t understand all of it, put it somewhere safe, so you can pick it up and try again next year when you’re older and (hopefully) wiser. Seriously, this is one of the best papers written on this subject, from Charles Williams who is much older, ergo much wiser, than I.

Handle sweaty Dynamite and Volatile Functions with extreme care…

…because both of them will explode at the slightest touch. If you have volatile functions in your workbook, any time you make a change anywhere at all on the spreadsheet Excel recalculates the value of all those volatile functions too. Excel then recalculates every applicable formula downstream of these functions too – even though most probably nothing changed. Volatile functions include OFFSET, INDIRECT, RAND, NOW, TODAY, and my personal favorite, MEDICATION.
This is worth a demonstration. Say you have the function =TODAY() in cell $A$1. Obviously that value is only ever going to change once per day, at midnight. Say you have ten thousand formulas downstream of $A$1 – that is, they either refer directly to $A$1 or to one of $A$1’s dependents. Those ten thousand formulas will get recalculated each and every time any new data gets entered anywhere on the spreadsheet, even though the value of $A$1 itself only changes one per day!
For this reason, too much reliance on volatile functions can make recalculation times very slow. As Charles Williams says: Use them sparingly. Try to get out of the habit of using them at all. There are usually alternatives to every volatile function.
One particularly attractive Volatile function is the siren-like INDIRECT, which has lured many a spreadsheet sailor towards it with a promise of an enchanting liason, merely for said sailors to smash their models on the rocky coast of recalculation. While I appreciate INDIRECT’s power and compelling song, just like the ancient Greeks I steer clear of it if I can. Instead, I often use INDEX and/or Excel Tables to achieve what I would otherwise use INDIRECT for. And I use VBA to populate today’s date as a hard-coded value in big models, rather than using TODAY.

See the error of your ways.

Many of you who cut your teeth on Excel 2003 or earlier are probably still using this to handle errors:
IF(ISERROR(Some_Complex_Formula), Alternate_Formula, Some_Complex_Formula)
A word to the wise: Don’t. Why? Because Cobbling together an IF and an ISERROR is computationally intensive:
  • Excel evaluates the Some_Complex_Formula bit purely to see if it returns an error or not. It doesn’t even bother to remember the actual answer…it just wants a straight “Is this an error…Yes or No?
  • If there is NO error, then Excel thinks “Great, no error. Now…what was the answer again? Damn, I didn’t think to store it. *Sigh*. I’ll just work it out again.”. So even though things went swimminglywell with the Some_Complex_Formula bit the first time around, it gets evaluated again, which is damn resource intensive, if not downright irresponsible.
  • If there IS an error, then the Alternate_Formula bit gets evaluated instead. That Alternate_Formula is often just a zero in the case that you’re trying say to work out some percentages, and are trying to avoid a Divide By Zero error. But even in this very simple Alternate_Formula case, Excel has had to evaluate TWO functions in order to return that zero: an ISERROR check on theSome_Complex_Formula bit as well as an IF formula. Again, pretty resource intensive.
Now here’s the thing, my erroneous friends: Microsoft introduced a new formula IFERROR in 2007 which is much more efficient: IFERROR(Some_Complex_Formula, Alternate_Formula). But for some unfathomable reason, lots of you still aren’t using it! *Jeff wagging finger* Tut, tut, you naughty, naughty analysts.
This new IFERROR formula rocks, because it cuts the processing down by more than half:
  • the Some_Complex_Formula bit is only ever evaluated once
  • The Alternate_Formula bit only gets evaluated if the Some_Complex_Formula bit throws an error.
  • Excel only needs ONE function to do all this, as opposed to the ISERROR and IF duo it must contend with in the previous example
In fact, Chandoo once had such a crush on IFERROR that he worried his wife would become jealous of this new flame!

Don’t check your underpants for bad data every five minutes.

Instead, just check them for embarrassing errors one a day.
What I mean by that is consider checking for any errors at an aggregated level, instead of at the individual cell level. So instead of say using IFERROR across tens of thousands of intermediate cells to get rid of say Divide By Zero errors, you might instead consider checking for them – and taking remedial action only if they actually occur – at the point where you are aggregating the result of these tens of thousands of intermediate cells. End result: one IFERROR instead of tens of thousands. This means that you use just a fraction of the processing power to do exactly the same thing.
Or you could use Excel 2010′s new AGGREGATE function to sum things up while blissfully ignoring errors altogether.

Don’t break a date, or you’ll end up in separate cells.

Don’t make the mistake of breaking tens of thousands of dates down so that quarter is stored in one cell, and the year in another. Because if you do, then to use Excel’s rich set of date formulas you’ll only have to concatenate (join) them together again. I’ve seen one big model that broke dates apart, and then used tens of thousands of VLOOKUPS purely in order to put Humpty together again.
So keep them as dates. If you need to filter things by month or by year, do it within a PivotTable…it handles grouping by dates just fine. Or use the Advanced Filter.

Get a smaller lunchbox that matches the size of your lunch.

Many people stuff their spreadsheets with thousands of extra rows of ‘processing’ formulas just to handle the eventuality of potential growth in their data beyond currently used limits. Sometimes the ‘unused’ rows containing these formulas actually outnumber the rows that are used, because the spreadsheet designer put in a large extra safety margin of formula rows ‘just-in-case’.
Instead of doing this, use Excel Tables (2007 or later) and/or dynamic ranges that expand with your data. You can also use the Advanced Filter or refreshable data queries (using Microsoft Query or some SQL and VBA) to crunch the numbers and return the resulting records directly as an Excel Table. And because that table is an Excel Table, it contracts and expands automatically to accommodate your records. Magic! Withzero redundancy.

Grand Piano taking up too much space? Try a Piano Accordion instead!

[Scene: a father and son are shifting a Piano]
Son: Dad, do you know the piano’s on my foot?
Dad: You hum it, son, and I’ll play it.
Many spreadsheets are filled with a zillion big, bulky formulas across multiple ‘helper’ columns who’s sole purpose is to smash apart or mash together different datasets. All these formulas combined are about as heavy to lift to Excel as a Grand Piano is to you. Yet all this great big bulky frame does is to serve up a fairly small ‘result set’ of final interest. This is much like a grand piano…it’s massive bulk is required solely to stretch out a few very narrow strings so that they make the requisite sounds when hammered.
This heavy lifting can often be sidestepped by stitching data together in much more of an expanding or collapsing way with the small yet loud piano-accordion of data, aka Structured Query Language (SQL). [Cue Cajun music]
SQL is basically a database language use to perform the database equivalent of lookups and to crunch numbers, or to conditionally join large datasets based on multiple complex conditions. SQL can be directly leveraged by Excel with minimal programming. Heck, you can use SQL to do stuff with NO programming whatsoever via Microsoft Query – a handy (if ancient) little interface bundled into Excel that will look familiar to any Access users.
For an excellent Excel-centric introduction to SQL, read Craig Hatmaker’s amazing Beyond Excel: VBA and Database Manipulation blog. Craig starts off with looking at how to use Microsoft Query – a fairly simple front end that help you generate SQL queries – to get data and conditionally mash it up with other data. Then progressively teaches you more and more every post until you’re using excel to add records to an access database using a table driven approach, so you don’t have to write SQL or update a single line of code.
Chandoo also has a great guest post by Vijay – Using Excel As Your Database – on this subject. Ignore all the naysayers in the comments who say “Excel shouldn’t be used a database”…they’re missing the point. Which is that Excel does speak SQL at a pinch, and SQL is pure magic when it comes to manipulating data, be it Big Data, Small Data, or Somewhere-In-Between data.
Okay, therapy session over. Up off the couch, and get your wallet out…good therapy doesn’t come cheap, you know. And on your way out, leave a comment below about what you think. Best comment wins an all-expenses-paid bloated spreadsheet. I’ve got a million of them here to give away…

About the Author.

Jeff Weir – a local of Galactic North up there in Windy Wellington, New Zealand – is more volatile than INDIRECT and more random than RAND. In fact, his state of mind can be pretty much summed up by this:
=NOT(EVEN(PROPER(OR(RIGHT(TODAY())))))
That’s right, pure #VALUE!
Find out more at http:www.heavydutydecisions.co.nz

Friday, September 27, 2013

What are best Excel interview questions? [survey]



Hello folks…
Time for a fun & useful survey. This time lets talk about Excel Interview Questions.
Best Excel Interview QuestionsMany of you are silently becoming awesome in Excel, data analysis, charting, dashboard reporting, VBA, Power Pivot and business skills, thanks to all the time you spend on Chandoo.org. I am sure there will be a day in near future, when you have to face another interview and be selected for a challenging, fun & high paying role.
Likewise, there is also a significant portion of you who are too good in your job that you will become a senior manager, VP or CXO, or better still start your own business. When the tables have turned, you will be the one looking for smart, dedicated, talented and fun individuals to join your team and make you look even more awesome.
So my question for both prospective interviewees and wannabe Excel pros,
According to you, what are the best Excel interview questions?
I will go first.

My top 5 Excel interview questions

Assuming I am looking for an analyst who can take any data dumped at her and turn it in to insights, actionable statements or management summaries, I will ask her below questions for sure.
  1. How do you lookup particular items from a large data set? Discuss various approaches, why they work, where they fail using this example data set.
  2. Can you make this ugly, confusing model in to usable, simple & elegant one?
  3. Assuming you have sales data of various products in several regions over the last 36 months, what kind of charts you prepare to help us understand what is going on and where we should focus?
  4. Can you analyze the same sales data using pivot tables. Can you compare pivot analysis with formula approach and comment?
  5. How well do you know about our business? Can you create a high level model of how we make money in Excel? After I explain how we make money in words of course.
  6. Bonus question: Which resources you use to keep yourself ahead of others in this position. (websites, books, training programs etc.)

What about you?

Go ahead and share your best interview questions. Use your experience as an interviewer or interviewee.  Click below link.
Take up Excel interview questions survey.

Have an interview coming up soon?

Then brush up your skills. Start with these pages and let your curiosity go wild.

Friday, September 20, 2013

7 reasons why you should get cozy with INDEX()



Today lets get cozy. Lets start a fling (a very long one). Lets do something that will make you smart, happy and relaxed.
Don’t get any naughty ideas. I am talking about INDEX() formula.
INDEX formula - Usage, examples & Tips
INDEX?!?
Of all the hundreds of formulas & thousands of features in Excel, INDEX() would rank somewhere in the top 5 for me. It is a versatile, powerful, simple & smart formula. Although it looks plain, it can make huge changes to the way you analyze data, calculate numbers and present them. It is so important that, whenever I teach (live or online), I usually dedicate 25% of teaching time to INDEX().
Enough build up. Lets get cozy with INDEX.

Understanding INDEX formula

In simple terms, INDEX formula gives us value or the reference to a value from within a table or range.
While this may sound trivial, once you realize what INDEX can do, you would be madly in love with it.

Few sample uses of INDEX

1. Lets say you are the star fleet commander of planet zorg. And you are looking at a list of your fleet in Excel (even in other planets they use Excel to manage data). And you want to get the name of 8th item in the list.
INDEX to rescue. Write =INDEX(list, 8)
2. Now, you want to know the captain of this 8th ship, which is in 3rd column. You guessed right, again we can use INDEX,
=INDEX(list, 8,3)

Syntax of INDEX formula

INDEX has 2 syntaxes.
1. INDEX(range or table, row number, column number)
This will give you the value or reference from given range at given row & column numbers.
2. INDEX(range, row number, column number, area number)
This will give you the value or reference from specified area at given row & column numbers.
It may be difficult to understand how these work from the syntax definition. Read on and everything will be clear.

7 reasons why INDEX is an awesome companion

Whether you are in planet zorg managing dozens of star fleet or you are in planet earth managing a list of vendors, chances are you are wrestling everyday with data, pleasing a handful of managers (and clients), delivering like a rock star all while having fun. That is why you should partner with INDEX. It can make you look smart, resourceful and fast, without compromising your existing relationship with another human being.

Data used in these examples

For all these examples (except #6), we will use below data. It is in the table named sf.
Data used in INDEX formula examples

Reason 1: Get nth item from a list

You already saw this in action. INDEX formula is great for getting nth item from a list of values. You simply write =INDEX(list, n)

Reason 2: Get the value at intersection of given row & column

Again, you saw this example. INDEX formula can take a table (or range) and give you the value at nth row, mth column. Like this =INDEX(table, n, m)

Reason 3: Get entire row or column from a table

For some reason you want to have the entire or column from a table. A good example is you are analyzing star fleet ages and you want to calculate average age of all ships.
You can write =AVERAGE(age column)
or you can also use INDEX to generate the age column for you. Assuming the fleet table is named sf and age is in column 7
write =AVERAGE(INDEX(sf, ,7))
Notice empty value for ROW number. When you pass empty or 0 value to either row or column, INDEX will return entire row or column.
Likewise, if you want an entire row, you can pass either empty or 0 value for column parameter.

Reason 4: Use it to lookup left

By now you know that VLOOKUP() cannot fetch values from columns to left. It does not matter if the person looking up is the star fleet commander.
But INDEX along with MATCH can fix this problem.
Lets say you want to know which ship has maximum capacity.
  1. First you find what is the maximum capacity =MAX(sf[Capacity (000s tons)])
  2. Then you find position of of this capacity in all values =MATCH(max_capacity, sf[Capacity (000s tons)],0)
  3. Now, extract the corresponding ship name =INDEX(sf[Ship Name], max_capacity_position)
Or in one line, the formula becomes
=INDEX(sf[Ship Name], MATCH( MAX(sf[Capacity (000s tons)]), sf[Capacity (000s tons)], 0))
For more tips read using INDEX + MATCH combination

Reason 5: Create dynamic ranges

So far, your reaction to INDEX’s prowess might be ‘meh!’. And that is understandable. You are of course star fleet commander and it is difficult to please you. But don’t break-up with INDEX yet.
You see, the true power of INDEX lies in its nature. While you may think INDEX is returning a value, the reality is, INDEX returns a reference to the cell containing value.
So this means, a formula like =INDEX(list, 8) looks like it is giving 8th value in list.
But it is really giving a reference to 8th cell.
Since the result of INDEX is a reference, we can use INDEX in any place where we need to have a reference.
Sounds confusing?
For example, to sum up a list of values in range A1:A10, we write =SUM(A1:A10)
Now, in that formula, both A1 and A10 are references.
Since INDEX gives a reference, we can replace either (or both) A1 & A10 with INDEX formula and it still works.
so =SUM(A1 : INDEX(A1:A50,10))
will give the same result as =SUM(A1:A10)
Although the INDEX route appears overly complicated, it has other applications.
Example 1: SUM of staff in first x ships
Lets say you want to sum up staff in first ‘x’ ships in the sf table.
Since ‘x’ changes from time to time, you want a dynamic range that starts from first ship and goes up to xthship.
Assuming ‘x’ value is in cell M1 and first ship’s staff is in cell G3,
=SUM(G3:INDEX(sf[Staff count], M1))
will give the desired result.
Example 2: A named range that refers to all ship names in column A
Many times you do not know how much data you have. Even star fleet commanders are left in dark. Lets say you are building a new ship tracking spreadsheet. Since your fleet is ever growing, you do not want to constantly update all formulas to refer to correct ranges.
For example, the ship names are in column A, from A1 to An. And you want to create a named range that points to all ships so that you can use this name elsewhere.
If you define the lstShips =A1:A10, then after you add 11th ship, you must edit this name. And you hate repetitive work.
One solution is to use OFFSET formula to define the dynamic range,
like =OFFSET(A1, 0,0, COUNTA(A:A),1)
While this works ok, since OFFSET is volatile function, it will recalculate every time something changes in your workbook. Even when someone replaces a bolt on landing gear of USS Enterprise.
This will eventually make your workbook slow.
That is where INDEX comes.
You see, INDEX is a non-volatile function*.
So you can create lstShips that points to,
=A1: INDEX(A:A, COUNTA(A:A))
*Even though INDEX is non-volatile, since we are using it in defining a range reference, Excel recalculates the lstShips every time you open the file. (reference).

Reason 6: Get any 1 range from a list of ranges

INDEX has another powerful use. You can get any one range from many ranges using INDEX.
Since you are a successful, smart & resourceful star fleet commander, you got promoted. Now you manage fleet of several planets.
And you have similar ship detail tables for each planet in a workbook. And you want to calculate average age of any planet’s ships with just one formula.
Again INDEX to rescue.
Using INDEX formula to get one of many ranges
Assuming you have 3 different tables – planet1, planet2, planet3
and selected planet number is in cell C1,
write =AVERAGE(INDEX((planet1,planet2,planet3),,,C1))
The reference (planet1,planet2,planet3) will point to all data and C1 will tell INDEX which planet’s data to use.
Pretty nifty eh?!?

Reason 7: INDEX can process arrays

INDEX can naturally process arrays of data (without entering CTRL+Shift+Enter).
For example you want to find out how much staff is in the ships whose captain’s name starts with “R”.
write =SUM(INDEX((LEFT(sf[Captain],1)=“r”)*(sf[Staff count]),0))
Although LEFT(sf[Captain],1)=”r” and sf[Staff count] produce arrays, since INDEX can process arrays automatically, the result comes without CTRL+Shift+Enter
Where as if you use SUM alone =SUM((LEFT(sf[Captain],1)=”r”)*(sf[Staff count])) you have to press CTRL+Shift+Enter to get correct results.
Other formulas: SUMPRODUCT & MATCH too can process arrays automatically.

Download Example Workbook & Get close with INDEX

Since you are going to ask, “I want to spend sometime alone with INDEX in my cubicle right now!”, I made an example workbook. It explains all these powerful uses of INDEX. Go ahead and download it.
Get busy with INDEX.

Why do you love INDEX?

I love INDEX(). If we get a dog, I am going to call her INDEX. That is how much I love the formula. Almost all my dashboards, complex workbooks and anything that seems magical will have a fair dose of INDEX formulas.
What about you? Do you use INDEX formula often? What are the reasons you love it? Please share your tips, usages and ideas on INDEX using comments.

Learn more about INDEX & other such lovely things in Excel

If you are whistling uncontrollably after reading so far, you are in for a real treat. Check out below articles to become awesome.

How to Solve an Equation in Excel


This week at the Chandoo,org ForumsUsman asked,
“ I have a curve. I did its fitting using Excel and got an equation.
y = 2E+07x^-2.146
For y=60 what will be the value of x?
How can we solve this equation using Excel? ”
Lets look at how this can be solved using Excel.

Define the Problem

Usman formula is y = 2E+07x^-2.146
or expanded y = 2*10^7*x^-2.146
We can use Excel’s Goal Seek function to assist us here
Goal Seek is located in the DataWhat-If AnalysisGoal Seek menu
GS00
Goal Seek is an inbuilt function in Excel that searches for a solution to a model/formula by iteratively trying source cell values until a solution is found.
Before we start, Excel doesn’t understand the concepts of x and y, but we can use cells for these instead
To use Goal Seek we need to put our formula into a cell.
Start a new file and in C3 (our y cell) type:  = 2*10^7*B3^-2.146
In B3 (our x cell): Put a value say 10
GS02
Note that C3 will show the solution of the formula for when x=10 or = 2*10^7*10^-2.146 = 142,899.277
GS01

Using Goal Seek

To use Goal Seek to find what value of x (B3) will result in y (C3) = 60,
Select C3
Goto the DataWhat-If AnalysisGoal Seek menu
GS03
Set Cell: C3  – This is our y value cell
To value: 60  This is the value we want to achieve
By changing cell: B3 – This is our x value cell
Click OK when ready
GS04
Excel shows us that it has found a solution and that (C3=60 when x (B3) = 374.60
Select OK to save the result
Select Cancel to return to the previous value
You can download a sample of the above here: Download Sample File

How have you solved Formulas using Excel or other techniques

How have you solved Formulas using Excel or other techniques?
Let us know in the comments below:
Learn more about Goal seek and solver:

Monday, September 16, 2013

Using Array’s To Update Table Columns


We are almost daily creating a lot of reports and these reports contain a lot of data which is presented in various styles as per the requirements. The data that allows us to create the reports is usually referred as raw data and in most of the cases is stored in hidden sheets.
I am sure you all are aware of a feature called as Excel Tables OR Structured References in Excel. Excel Tables is (in my opinion) the best way to store your raw data and put Formulas in the columns where necessary, this way you eliminate the need of a Cell Based Reference formula (example =SUM(B4:B50) and replace them with =sum(YourTable[YourTableColumnName]).
Another good feature of the Excel Tables is you just need to put the formula in 1 cell and it is replicated for that column by Excel.
Sometimes these formulas take a lot of time to calculate when we have really huge data points. In this scenarios it is better to have hard-coded values instead of the formulas to gain on speed.
In this post we will learn about how we can make use of Array’s to quickly populate the excel columns with the desired results before publishing our reports and other documents.
Here is a demo of what I mean:
using-array-to-update-table-columns
Below is the code that allows us to add a new column to our data table and then taking input from the Date Time column provides us with the Week Of column.

Sub UpdateWeek()
Dim myarray As Variant
Dim theRange As Range, startCellRow As Long
Dim tempStr As String
Dim myNewCol As ListColumn

‘If our column already exists then delete it
On Error Resume Next
Worksheets(“Data”).ListObjects(“cs”).ListColumns(“WeekOf”).Delete
‘adding our new column
Set myNewCol = Worksheets(“Data”).ListObjects(“cs”).ListColumns.Add
myNewCol.Name = “WeekOf”
‘Selecting the first cell of the column that contains our dates
Worksheets(“Data”).ListObjects(“cs”).ListColumns(“Date Time”).Range.Cells(2).Select
‘building a temporary Range address, this will be used to upload the entire range into the array
tempStr = ActiveCell.Address
startCellRow = ActiveCell.Row
tempStr = tempStr & “:$” & Mid(Sheets(“Data”).ListObjects(“cs”).ListColumns(“Date Time”).Range.Cells(2).Address, 2, 1) & “$”
tempStr = tempStr & LastRowInOneColumn(Mid(Sheets(“Data”).ListObjects(“cs”).ListColumns(“Date Time”).Range.Cells(2).Address, 2, 1))
‘loading the range into the array
myarray = Range(tempStr).Value
‘Looping through the array and converting each element to the relevant Week format
For i = LBound(myarray) To UBound(myarray)
myarray(i, 1) = Format(myarray(i, 1) – Weekday(myarray(i, 1), vbMonday) + 1, “ddd dd-mmm”)
Next
‘Setting the range address for our output column
Set theRange = Range(Cells(startCellRow, Worksheets(“Data”).ListObjects(“cs”).ListColumns(“WeekOf”).Range.Column), Cells(UBound(myarray) + (startCellRow – 1), Worksheets(“Data”).ListObjects(“cs”).ListColumns(“WeekOf”).Range.Column))
‘storing the values from our array to the WeekOf Column
theRange.Value = myarray
End Sub

Let’s Understand the code

We first delete the column if it is already existing to make sure we always get the new values as output. This is done by the below line of code.

Worksheets("Data").ListObjects("cs").ListColumns("WeekOf").Delete

Once we have deleted the column, we add it again as a blank column and change the name to “Week Of”.

Set myNewCol = Worksheets("Data").ListObjects("cs").ListColumns.Add
myNewCol.Name = "WeekOf"

After this we need to select the first cell of the column that contains the Date Time.

Worksheets("Data").ListObjects("cs").ListColumns("Date Time").Range.Cells(2).Select

Once we have selected the first cell of you Date Time column we then make use of the LastRowInOneColumn function to get the last row and create a range address. We use this range address to assign all the values contained in the Date Time column to an array.

tempStr = ActiveCell.Address
startCellRow = ActiveCell.Row
tempStr = tempStr & ":$" & Mid(Sheets("Data").ListObjects("cs").ListColumns("Date Time").Range.Cells(2).Address, 2, 1) & "$"

tempStr = tempStr & LastRowInOneColumn(Mid(Sheets(“Data”).ListObjects(“cs”).ListColumns(“Date Time”).Range.Cells(2).Address, 2, 1))
‘loading the range into the array
myarray = Range(tempStr).Value
Once we have loaded all the Date Time values into an array, we do a simple For loop to change the value in the array to the relevant Week Of

For i = LBound(myarray) To UBound(myarray)
myarray(i, 1) = Format(myarray(i, 1) - Weekday(myarray(i, 1), vbMonday) + 1, "ddd dd-mmm")
Next

We perform this operation on the same element and store the modified value in itself.
Once we have all these done, we need to define the Output range, that is where we need to the Week Of values to be stored. This is done by using the Range and Cell functions.

Set theRange = Range(Cells(startCellRow, Worksheets("Data").ListObjects("cs").ListColumns("WeekOf").Range.Column), Cells(UBound(myarray) + (startCellRow - 1), Worksheets("Data").ListObjects("cs").ListColumns("WeekOf").Range.Column))
'storing the values from our array to the WeekOf Column

theRange.Value = myarray
And lastly we assign all the values stored in the array to the new range address we have create above.

Download Demo File

Click here to download the demo file & use it to understand this technique.
What about you? Do you use them often? Please share your experiences, techniques & ideas using comments.
If you are new to VBA, Excel macros, go thru these links to learn more.

Join our VBA Classes

If you want to learn how to develop applications like these and more, please consider joining our VBA Classes. It is a step-by-step program designed to teach you all concepts of VBA so that you can automate & simplify your work.
Click here to learn more about VBA Classes & join us.

About Vijay

Vijay (many of you know him from VBA Classes), joined chandoo.org full-time this February. He will be writing more often on using VBA, data analysis on our blog. Also, Vijay will be helping us with consulting & training programs. You can email Vijay at sharma.vijay1 @gmail.comIf you like this post, say thanks to Vijay.

Friday, September 6, 2013

Calculating average of every nth value [Formula tips]

Lets say you have a large list of numbers, and you want to calculate the average of every nth value. Not the average of all numbers, but just every nth number.
That is what we will learn in next few minutes.

Few assumptions

Before we jump in to any formulas, first lets assume that all your data is in a table, conveniently named as tbl. Lets say this table has below structure.
Average of every nth value - calculating using Excel formulas.
Also, the value of n is a named cell N.

Average of every nth value

Approach 1: Using helper columns

Average of every nth value - calculated using Helper columnIf you have no allergies towards nuts, dairy or helper columns, then this approach is easiest.
We just add an extra column to our tbl , called as helper.
In the helper column, write this formula.
=MOD([@ID], N)=0
This will fill the helper column with TRUE & FALSE values, TRUE for all nth values, FALSE for everything else. See aside.
Once we have the helper column, calculating average of every nth value is easy as eating every slice of a cake.
We use AVERAGEIF to do this.
=AVERAGEIF(tbl[Value],tbl[Helper],TRUE)

Approach 2: Not using helper columns

Now things get interesting. Lets say you want to calculate average, but not use any helper columns.
First the formula:
=AVERAGE(IF(MOD(tbl[ID], N)=0,tbl[Value]))
Array entered.
Lets understand how it works:
We want the average of every nth item of tbl[Value] column.
In other words, we want average of every item of tbl[Value] column, whose corresponding tbl[ID] value is perfectly divisible by n.
How do we know when a value is perfectly divisible by another?
Don’t worry. You don’t have to do the long division on paper now. Instead we use Excel’s MOD function.
When a value is perfectly divisible by another, the reminder is zero.
So, MOD(value1, value2) = 0 means, value2 divides value1 perfectly.
That means…
We want the average of tbl[Value] when MOD(tbl[ID], N) = 0
Lets write that in Excel formula lingo.
=AVERAGE( IF(MOD(tbl[ID], N) = 0tbl[Value]) )
This formula results in a bunch of values and FALSEs. Assuming N=3, this is what we get (for sample data):
=AVERAGE({FALSE;FALSE;15;FALSE;FALSE;18;FALSE;FALSE;18;FALSE;FALSE;15;FALSE;FALSE;14; …})
Since AVERAGE formula ignores any logical values, it will calculate the average of {15, 18, 18, 15, 14 … } and returns the answer you are expecting.
As this formula is processing arrays instead of single values, you need to array enter it (CTRL+SHIFT+Enter after typing the formula).

Bonus scenario: Average of FEBRUARY values only!

Here is a bonus scenario. Lets say you want to calculate the average sales of FEB alone… Then you can use AVERAGEIF (or AVERAGEIFS, if you want to have multiple conditions).
=AVERAGEIF(tbl[value], tbl[month], “FEB”)
Averageif() formula example - average of February values alone

Download example workbook:

Click here to download the example workbook. It contains all the techniques explained in this post. Play with the data & formulas to understand better.

Time for some challenges…

If you think averaging every nth value is not mean enough, try below challenges. Post your answers using comments.
  1. Write a formula to calculate average of every nth value, starting at row number ‘t’.
  2. Write a formula to calculate average of every nth value, assuming your table has only value column (no ID column).
Go ahead. Show off your formula skills. Post your answers in comments section.

Improving your Excel batting average

Calculating averages predates slice bread. Folklore says that when first neanderthal figured out how to express numbers and carved 2 of them on a cave wall, his manager walked by and asked “What is the average of these two? Eh?” and thumped her chest.
Although caves & wall carvings are replaced by cubicles & spreadsheets, we are still calculating averages, almost 2.9 million of them per hour.
So it pays to learn a few tricks about Excel Average formulas. Check out below to improve your average:
If your boss is the kind who thumps her chest and mocks you for your poor Excel skills, don’t cave in. Fight back. Enroll in Excel School and show that you can evolve.