Intro to Memory and Arrays in C
I wrote a C language tutorial for Cocoa Dev Central a ways back, but I didn't get into arrays or memory, to keep the tutorial approachable. I'll write a formal follow-up soon, but I decided to post some raw materials in the meantime, if only to get corrections in early. So here's a whirlwind tour of arrays and basic C memory. The next entry will discuss more advanced memory techniques. If you're not already comfortable with basic C syntax, I strongly recommend reading the C Language Tutorial for Cocoa at Cocoa Dev Central first.
Basic Memory
The common unit of measurement for memory is the byte. Each type of variable in C consumes a certain number of bytes. For example, a standard int variable generally needs 4 bytes of memory. This can, however, vary by OS, processor, and so on. Anyway, let's say I declare a simple int variable:
int monstersUnderBed;
On a G4 running Tiger, this variable uses 4 bytes of memory. This memory can come from one of two places: the data segment or the stack. These terms aren't exactly intuitive, but the difference is incredibly simple.
If a variable is declared outside of a function, it is typically considered "global," meaning that any function can access it. Global variables are stored in a special area in the program's memory called the data segment. This memory is in use for as long as the program is running.
The stack is used to store variables that are only used inside of a function. Stack memory is temporary. Once a function finishes running, all of the memory for its variables is freed. This cycle happens each time the function is called. There is an exception to this, but we're not quite there yet.
Here's a simple program that uses both a global variable and a stack variable (I need something to count and I just watched Monsters, Inc. the other night, thus the monsters):
#include <stdio.h>
// this global variable resides in the data segment
int globalMonsters = 2;
void addFourMonsters (void)
{
    // this variable uses memory in the stack
    int stackMonsters = 4;
    // we add the value of the stack variable
    // to the global variable
    globalMonsters += stackMonsters;
}
int main (void)
{
    printf ("Global monsters: %i\n", globalMonsters);
    addFourMonsters();
    printf ("Global monsters: %i\n", globalMonsters);
    addFourMonsters();
    printf ("Global monsters: %i\n", globalMonsters);
    return 0;
}
This gives us some output like this:
Global monsters: 2
Global monsters: 6
Global monsters: 10
All that's going on here is that each call to addFourMonsters() creates a stackMonsters variable and adds its value to the globalMonsters variable. Not rocket science.
Simple values like ints, floats and single char variables don't need any special management. They'll automatically be cleaned up at the end of the function or when the program exits.
Arrays
If you're used to Cocoa's NSArray class or arrays in a scripting language like JavaScript, you'll be amazed at how primitive C's arrays are. They're literally just a series of individual values. In a basic case, you create an array of a fixed size, like this:
int myIntArray[5];
This declares an array which holds five integer values, so it takes as much memory as five individual ints:
int (4 bytes) x 5 = 20 bytes
The C language doesn't provide any way to resize basic arrays. You can do it manually, but it requires a slightly more advanced understanding of C memory management that we haven't touched on yet.
There are libraries that will do array management for you (and C++ has its own solution), but the basic C array is the one that's used most widely. Setting values in an array is dead simple:
int myArray[5];
myArray[0] = 99;
myArray[1] = 120;
myArray[2] = 22;
myArray[3] = 8287;
myArray[4] = 0;
Although you can't change the size of the array, you can change the value of any element at any time, in any order. It doesn't have to be sequential as shown above.
Loops for Processing Arrays
Once you have an array, you can use a loop to easily process it. The following code creates a five-element array in stack memory, and sets a random value for each slot.
#include <stdio.h>
#include <stdlib.h> // srand() and rand()
#include <time.h>
#define COUNT 5
int main (void)
{
    // seed the random number generator with the
    // current time to get the ball rolling
    srand( (unsigned) time(NULL) );
    // this array uses stack memory because it's
    // declared inside of a function. The array
    // size is set by the COUNT constant
    int stackArray[COUNT];
    int i;
    // loop through and insert a random value
    // returned from the rand() function
    for ( i = 0; i < COUNT; i ++ ) {
        stackArray[i] = rand();
    }
    // loop through and print out the values at
    // each slot in the array
    for ( i = 0; i < COUNT; i ++ ) {
        printf ("Value %i: %i\n", i, stackArray[i]);
    }
    return 0;
}
This gives us output similar to the following (remember, the values are random):
Value 0: 204319905
Value 1: 178291782
Value 2: 810292509
Value 3: 1392393136
Value 4: 822135393
The stackArray variable uses stack memory, so the cleanup is automatic. The memory will be freed when the function ends.
An array doesn't have to be of a predetermined size. A relatively recent advancement in common C programming is the "variable length array," which was added to the language by the C99 standard. I haven't run any statistics, but my guess is that a number of C books on the shelves today don't mention this technique.
The basic idea behind variable length arrays is that the size of the array can be determined on the fly. For example, here's a simple program that creates an array of a random size:
#include <stdio.h>
#include <stdlib.h> // srand() and rand()
#include <time.h>
int main (void)
{
    // seed the random number generator with the
    // current time, then get a random number from
    // 1 to 100 (an array can't have zero slots)
    srand( (unsigned) time(NULL) );
    int randomNumber = rand() % 100 + 1;
    // create an array of a random size
    int myArray[randomNumber];
    printf ("Array size: %i slots\n", randomNumber);
    return 0;
}
The output is something like this:
Array size: 63 slots
This may seem like no big deal if you're used to scripting languages, but given that C is a lower-level language, this is pretty cool.
C Strings Are Arrays
In C, a string is an array of char values. As a result, it has to follow all the rules of an array. Here's a simple example:
char siteName[10] = "Theocacao";
What might be surprising here is that I made a ten element array, even though the word "Theocacao" is only nine characters. In C, a string has to be "capped" with a special null character: '\0'. This is known as a "null-terminated string". If you build up the string one character at a time, it looks like this:
// declare the array
char siteName[10];
// add the characters
siteName[0] = 'T';
siteName[1] = 'h';
siteName[2] = 'e';
siteName[3] = 'o';
siteName[4] = 'c';
siteName[5] = 'a';
siteName[6] = 'c';
siteName[7] = 'a';
siteName[8] = 'o';
// add the null terminator to complete the string
siteName[9] = '\0';
// display the string using %s in printf()
printf ("Site name: %s\n", siteName);
So an array for a C string always needs to be at least as long as the character count, plus one additional slot for the null terminator. That's why "Theocacao" needs ten slots, not nine.
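If you hardcode the string in the program, you can also leave out the element count. You never have to type the null terminator yourself, either, because the compiler adds one to the end of every string literal. So this is fine as well: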
char siteName[] = "Theocacao";
The compiler will fill in the correct size (ten slots, counting the null terminator) at build time.
There are quite a few built-in functions that C provides for dealing with strings, but we'll leave that for another post. You can check out /usr/include/string.h in the meantime if you feel adventurous.
Note: "Sven-S. Porst" points out in the comments that saying a string is "just an array of chars" is an oversimplification. What he says is true, but the goal here is to reduce the basic concepts down to their simplest levels, then build on them later.
Wrap Up
This was a very quick introduction to some intermediate concepts in C programming. So now you should know a thing or two about arrays, as well as the difference between global and stack variables. The follow-up to this post will discuss pointers and dynamic memory management.
[Update: A terminology issue was fixed thanks to a gdb tutorial by Peter Jay Salzman.]
Posted Feb 21, 2006 — 27 comments below
ssp — Feb 21, 06 803
I guess I've seen my address being broken by too many pieces of software.
Scott Stevenson — Feb 21, 06 804
One step at a time there, dude. :) You can't teach everything at once.
Carl — Feb 21, 06 805
new, right?
Scott Stevenson — Feb 21, 06 806
You're at least partly right, and you've exposed a mistake in the tutorial. The heap is where malloc gets its memory, but global variables are actually stored in the data segment. I believe the 'new' bit you refer to is specific to C++ objects.
Tom Bradford — Feb 21, 06 807
Frank McCabe — Feb 21, 06 808
However, I think that there should be a giant health warning attached to this:
the total size of a stack allocated array is limited by the maximum size of the stack. Typically, there is *no* warning given if you exceed this size.
e.g.:
void foo(int len)
{
int array[len];
}
if you call foo with (say) 1200000, then, at least under gcc, the array will be silently given a garbage value and you use it at your own (and your customer's) risk. It is not clear what the maximum safe size of a dynamically sized array is, but gcc seems to limit it to 64K bytes.
The same applies to alloca'd memory - if it's too big you get silent garbage.
Scott Stevenson — Feb 21, 06 810
For better or worse, there's a large quantity of code that uses this approach -- perhaps most notably, many of the examples on ADC. There's nothing to be gained by pretending that's not the case. But maybe I'll add a few more notes on the subject.
I'd probably just eliminate the 'string' talk altogether, because unfortunately, there are just too many different ways of manipulating the string concept in the C/C++ world
I can respect what you're saying, but I just don't agree with the conclusion. It's a matter of walking before you can run.
Scott Stevenson — Feb 21, 06 811
I haven't really used these things a lot so I wasn't aware of that. I'll update the text. Thanks.
Stripes — Feb 21, 06 812
Scott Stevenson — Feb 22, 06 813
I believe the bss is sometimes also called the uninitialized data segment.
Carl — Feb 22, 06 815
I realized that after I wrote it. I've never actually used pure C.
Jon — Feb 24, 06 833
Peter Ulvskov — May 28, 06 1338
Has a variation of NSMutableArray been developed that allows this? If not, any advice on how to accomplish this?
Thanks,
Peter
Scott Stevenson — May 29, 06 1342
You can just use -indexOfObject:
Narayan — Aug 09, 06 1532
Rama Rao B. — Sep 13, 06 1784
deepika — Sep 13, 06 1785
Tim — Oct 21, 06 2119
Right.
Typically, there is *no* warning given if you exceed this size.
Not at compile time, but, depending on the C implementation, you may receive a runtime error when you allocate or initialize the array (remember that you must initialize all stack variables). On a Unix-like system that provides unmapped address space (guard pages) between stack segments, such as Mac OS X, you will typically receive a Segmentation Fault. In an embedded system that does not provide memory protection, you may receive no error at all.
Consider vla.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s byte-count\n", argv[0]);
        return EXIT_FAILURE;
    }
    unsigned long sz = strtoul(argv[1], NULL, 0);
    char vla[sz];
    printf("Array size: %#lx\n", sizeof vla);
    memset(vla, 0, sizeof vla);
    return EXIT_SUCCESS;
}
Here is the execution log on a G5 Quad running Mac OS X 10.4.8. Note that in some cases you receive the Segmentation Fault upon allocation (before the printf() call) and in some cases you receive the fault during initialization (after the printf() call). This behavior may vary on an Intel system.
% gcc -std=c99 -Wall -Wextra -pedantic -o vla vla.c
% ./vla 0
Array size: 0
% ./vla 1
Array size: 0x10000
% ./vla 0x100000
Array size: 0x100000
% ./vla 0x1000000
Segmentation fault
% ./vla 0x10000000
Array size: 0x10000000
Segmentation fault
% ./vla 0x100000000
Array size: 0xffffffff
Segmentation fault
if you call foo with (say) 1200000, then, at least under gcc, the array will be silently given a garbage value and you use it at your own (and your customer's) risk. It is not clear what the maximum safe size of a dynamically sized array is, but gcc seems to limit it to 64K bytes.
Not sure what is meant by "silently given a garbage value." All stack variables have garbage values until you initialize them. The value 1200000 works fine with the vla.c given above, but this is system and program dependent.
% ./vla 1200000
Array size: 0x124f80
The maximum safe size of a dynamically sized array is at least as large as the maximum safe size of an equivalent statically sized array, and possibly larger. It is both system and program dependent: it depends on the system's maximum stack size (for Unix see getrlimit(2)) and your thread's worst-case stack usage (which is typically not easily determined, and is further complicated by dynamic allocation of stack space).
I have seen no 64K limitation with GCC. The default stack limit on my system is 8MB. As expected, vla.c fails with arrays that are close to 8MB in size (there is some initial stack usage, between 4K and 32K). Increasing the stack limit to 64MB works as expected.
% ./vla 0x700000
Array size: 0x700000
% ./vla 0x800000
Segmentation fault
% ./vla 0x7ff000
Segmentation fault
% ./vla 0x7f8000
Array size: 0x7f8000
% limit stacksize
stacksize 8192 kbytes
% ./vla 0x800000
Segmentation fault
% limit stacksize unlimited
% limit stacksize
stacksize 65536 kbytes
% ./vla 0x800000
Array size: 0x800000
% ./vla 0x3f00000
Array size: 0x3f00000
% ./vla 0x4000000
Segmentation fault
Tim — Oct 21, 06 2120
Scott Stevenson — Oct 21, 06 2121
You're the first person that's tried to actually use the formatting to that extent. :) I think the main issue was that code wasn't preformatted for white space, but it is now.
Steve Sadler — Nov 26, 07 5138
I see a lot of people nitpicking your page. It helped me understand, so I thank you.
A different steve
Rico Secada — Dec 29, 07 5301
What you are saying in this tutorial is only partly right. There is no such thing as the data segment or the stack in C! And this is a serious mistake.
The whole point of a high-level language like C is to avoid thinking like a person who is programming in assembler.
The C standard doesn't say anything about this. Some platforms follow the model you suggest, others don't.
If you actually need to know about the data segment or the stack, then you're outside the realm of C programming.
C has such things as automatic storage, static storage and dynamic storage, which in turn have the features ascribed to them in the C standard.
Apart from that - great tutorial!
Best regards, Rico.
Stripes — Dec 30, 07 5302
If you know how C's stack works on your CPU, finding stack smashing bugs will be far simpler. If you know how your libc's malloc works, finding "use after free", or whatever.
Using this sort of knowledge for "good" will help you debug. A whole lot.
Using it for evil (I know I can use memory after free as long as I don't malloc again, I know mallocs are rounded up to 16 bytes, I know the stack grows down, I know...) will create awesome bugs when somebody tries to use your code on a new CPU, new OS, new C compiler, or with a new libc. If there is any justice in the world the "somebody" will be you, but sadly it isn't always.
I'm on the fence about how much you can use it to optimize (speed or storage size) for without being good or evil :-)
Rico Secada — Jan 06, 08 5317
You are missing the point and you are creating confusion in people who are not skilled in C and who need to understand these issues right :-)
You should correct the tutorial to conform with standard C and address the issues as automatic storage, static storage and dynamic storage, not as "the data segment" or "the stack". We are not dealing with assembler and this tutorial should not be using those terms regarding C and memory management.
If you know how C's stack works on your CPU, finding stack smashing bugs will be far simpler. If you know how your libc's malloc works, finding "use after free", or whatever.
Using this sort of knowledge for "good" will help you debug. A whole lot.
This is not within the scope of C. If you need to know exactly how "the stack" works, you need to stop working with C and start working with assembler.
Best regards.
Rico.
Holger — Feb 09, 08 5468
Holger
Ishan — Feb 25, 08 5568
mahe — Apr 16, 08 5739