How do I load large datasets (>1 GB) under 32-bit Windows? I receive an
error r(909) saying “op. sys. refuses to provide memory”.
Title: Large datasets under Windows
Author: Kevin S. Turner, StataCorp
Date: October 2001; updated July 2007; minor revisions October 2007
First, make sure that you have installed enough physical memory or allowed
for enough virtual memory. If you have and still get this error, read on.
Under all current 32-bit Windows operating systems (Windows Vista, XP, 2000,
NT, ME, 98, and 95), the total address space available to any application
is 2.1 GB. If your dataset is larger than 2.1 GB, you will not be able to
load it into Stata for Windows. This is simply a limitation of the operating
system.
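The 2.1-GB figure is 2^31 bytes: a 32-bit pointer can address 2^32 bytes (4 GiB), and by default 32-bit Windows reserves the upper half of that for the system and leaves the lower half to the application. The Python sketch below is only illustrative arithmetic, not a Stata tool, and the helper name `fits_in_user_space` is hypothetical. Passing the check is an upper bound, not a guarantee, because the usable contiguous block is smaller than the full user space.

```python
# Illustrative arithmetic only: where the "2.1 GB" limit comes from.
FULL_32BIT_SPACE = 2**32            # bytes addressable with a 32-bit pointer (4 GiB)
USER_SPACE = FULL_32BIT_SPACE // 2  # default user-mode half on 32-bit Windows (2 GiB)


def fits_in_user_space(dataset_bytes: int) -> bool:
    """Hypothetical helper: True if a dataset is below the 2^31-byte ceiling.

    Necessary but not sufficient: the largest contiguous free block
    is smaller than the whole user space.
    """
    return dataset_bytes < USER_SPACE


print(USER_SPACE)                          # 2147483648
print(round(USER_SPACE / 1e9, 1))          # 2.1
print(fits_in_user_space(1_400_000_000))   # True
print(fits_in_user_space(2_500_000_000))   # False
```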
Unfortunately, even if your dataset is under the 2.1-GB limit, you may have
difficulty loading it into Stata. The fault again lies with how Windows
manages the 2.1-GB address space. When a typical application starts, several
libraries (DLLs) are loaded along with it. These libraries are usually
placed toward the upper end of the 2.1-GB space, but not in any
deterministic order. Microsoft has assured us that there is no way to
prevent these libraries from loading at arbitrary addresses, which fragments
the available space. When Stata loads a dataset, it requests from Windows
the largest contiguous block in the 2.1-GB range. Depending on where Windows
placed the libraries, this block may be 1.8 GB, 1.3 GB, or even less. You
may be surprised to find that a 1.4-GB dataset loads fine one time but fails
to load later. This is simply an unfortunate side effect of Windows memory
management.
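To see why the largest contiguous block can vary so much, the Python sketch below models the 2-GiB user space as a line of megabytes and computes the largest gap left between loaded modules. The module addresses and sizes are invented for illustration, and the model ignores the executable image's real position, thread stacks, and heaps, so treat it as a simplification rather than a description of any actual Windows layout.

```python
from typing import List, Tuple

USER_SPACE_MB = 2048  # simplified 2-GiB user-mode address space, in MB


def largest_free_block(mappings: List[Tuple[int, int]], total: int = USER_SPACE_MB) -> int:
    """Size of the largest gap not covered by any (base, size) mapping.

    Mappings may be given in any order and may overlap.
    """
    largest = 0
    cursor = 0  # start of the current free region
    for base, size in sorted(mappings):
        if base > cursor:
            largest = max(largest, base - cursor)  # free gap before this module
        cursor = max(cursor, base + size)
    return max(largest, total - cursor)  # gap after the last module


# Hypothetical layout: small executable near the bottom, DLLs near the top.
layout_a = [(4, 4), (1900, 20), (1950, 30)]
# Same layout plus one DLL that happens to land in the middle.
layout_b = layout_a + [(1000, 10)]

print(largest_free_block(layout_a))  # 1892
print(largest_free_block(layout_b))  # 992
```

In this toy model a single 10-MB module placed mid-range roughly halves the largest block, which is how the same dataset can load in one session and fail in another.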
WINDOWS XP SP2 NOTE: An issue in Windows XP Service Pack 2 fragments the
memory available to Stata 10, as described in this Microsoft Knowledge Base
article:
http://support.microsoft.com/?kbid=894472
If you have Service Pack 2 and Stata 10 and often need memory near or above
1 GB, you should consider installing the Microsoft hotfix that corrects the
problem.
You can test whether you are experiencing this problem in two ways. The
first is to use a version of Stata earlier than version 10: record the
maximum amount of memory you can allocate and compare it with the maximum
you can allocate under Stata 10. If there is a large difference (more than
50 MB), the issue is probably present. The second is to use the System
Restore feature of Windows XP to revert to Service Pack 1. If you can
allocate significantly more under Service Pack 1 than under Service Pack 2,
you are most likely experiencing the problem.
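Recording the maximum amount of memory you can allocate is, in effect, a search: try a size (for example, with Stata's set memory command), then go up or down depending on whether it succeeds. The Python sketch below shows that bisection together with the 50-MB comparison. The predicate `can_allocate` is a hypothetical stand-in for one manual attempt, the numbers are simulated rather than measured, and the search assumes that if a size succeeds, every smaller size also succeeds.

```python
from typing import Callable


def max_allocatable(can_allocate: Callable[[int], bool], lo: int = 0, hi: int = 2048) -> int:
    """Largest size in MB within [lo, hi] for which can_allocate(size) is True.

    Assumes lo itself succeeds and that success is monotone in size.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if can_allocate(mid):
            lo = mid              # mid works: the answer is mid or larger
        else:
            hi = mid - 1          # mid fails: the answer is smaller
    return lo


def probably_affected(max_before_mb: int, max_stata10_mb: int, threshold_mb: int = 50) -> bool:
    """True if the comparison setup (earlier Stata, or Service Pack 1)
    allocates more than threshold_mb beyond what Stata 10 allocates."""
    return max_before_mb - max_stata10_mb > threshold_mb


# Simulated machines (hypothetical limits, not measurements).
stata9_max = max_allocatable(lambda mb: mb <= 1300)
stata10_max = max_allocatable(lambda mb: mb <= 1150)
print(stata9_max, stata10_max)                     # 1300 1150
print(probably_affected(stata9_max, stata10_max))  # True
```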
A fix from Microsoft, known as hotfix 894472, is available. Microsoft has
informed us that this hotfix will become part of Windows XP Service Pack 3
(SP3).
NOTE: Vista does not need the hotfix.
Until SP3 is released, Microsoft has given us permission to make hotfix
894472 available to affected users. It is available for download here.
To install the hotfix, download it to your hard drive, double-click on it,
and follow the instructions.
This issue can affect applications other than Stata 10, but Stata 10 is
more likely to exhibit it because it uses MFC (Microsoft Foundation Classes,
a set of Microsoft libraries) and needs contiguous memory. Earlier versions
of Stata did not use MFC, which is why they were not affected. Even with the
hotfix, your operating system is not guaranteed to let you allocate close to
the 2.1-GB maximum.
NOTE: Windows Server 2003 with Service Pack 1 has the same problem, but it
was fixed in Service Pack 2.
Why Stata 10 memory allocation differs from Stata 9 on Windows XP and Windows Server 2003
Stata 10’s new Graph Editor uses Microsoft’s gdiplus.dll. Because of the
base address Microsoft chose for this DLL, the memory space available to
Stata 10 is fragmented. As a result, the largest contiguous memory block you
can allocate to Stata 10 is about 200 MB smaller than under Stata 9.
We have contacted Microsoft to see if they can fix this problem.
By now, you may be wondering what your alternatives are.
As of July 2007, several operating systems with 64-bit support are becoming
available. See www.stata.com/products/opsys.html for a list of operating
systems compatible with Stata. The 64-bit platform will let you work with
large datasets: depending on your operating system, you should be able to
allocate as much memory as is installed on the machine, minus what the
system itself requires. To take advantage of this technology, you will need
64-bit-compatible hardware, a 64-bit operating system, and, of course, a
64-bit version of Stata.
As a last resort, consider trimming unnecessary data from your dataset or
dividing it into two files. You may also want to use the second syntax of
the use command, use [varlist] [if] [in] using filename, to read in only the
observations and variables you need. For example:
. describe using auto.dta
Contains data                                 1978 Automobile Data
  obs:            74                          26 Mar 2007 09:52
 vars:            12
 size:         3,478
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
. use mpg price for using auto.dta in 1/50, clear
(1978 Automobile Data)
. describe
Contains data from auto.dta
  obs:            50                          1978 Automobile Data
 vars:             3                          26 Mar 2007 09:52
 size:           450 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
Depending on your data and analysis, this may not be feasible and is
offered only as a suggestion.
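The size figures in the two describe logs can help you plan how much to read in. Both are consistent with observations × (sum of the variables' storage widths + 4 bytes): 74 × (43 + 4) = 3,478 and 50 × (5 + 4) = 450. The widths of byte (1), int (2), long (4), float (4), double (8), and strN (N) are Stata's storage sizes; the 4-byte per-observation overhead is inferred from these two logs, not a documented constant, and may differ across versions and platforms. The Python sketch below, with hypothetical helper names, is therefore only a rough estimate.

```python
from typing import List

# Bytes per value for Stata's numeric storage types; strN uses N bytes.
NUMERIC_WIDTH = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def storage_width(stype: str) -> int:
    """Bytes needed for one value of a storage type such as 'int' or 'str18'."""
    if stype.startswith("str"):
        return int(stype[3:])
    return NUMERIC_WIDTH[stype]


def estimated_size(n_obs: int, types: List[str], overhead_per_obs: int = 4) -> int:
    """Rough dataset size in bytes; overhead_per_obs is inferred, not documented."""
    width = sum(storage_width(t) for t in types)
    return n_obs * (width + overhead_per_obs)


def max_obs(budget_bytes: int, types: List[str], overhead_per_obs: int = 4) -> int:
    """Roughly how many observations of these variables fit in budget_bytes."""
    width = sum(storage_width(t) for t in types)
    return budget_bytes // (width + overhead_per_obs)


auto_types = ["str18", "int", "int", "int", "float", "int",
              "int", "int", "int", "int", "float", "byte"]
print(estimated_size(74, auto_types))              # 3478, as in the first log
print(estimated_size(50, ["int", "int", "byte"]))  # 450, as in the second log
```

For example, max_obs(1_200_000_000, ["int", "int", "byte"]) estimates how many observations of price, mpg, and foreign would fit in roughly 1.2 GB, before allowing for the fragmentation described above.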