Lithofacies in unconventional formations can be classified from well logs using principal component analysis and/or artificial neural networks (ANN) (Ma and Gomez, 2015; Wang and Carr, 2012). However, there are pitfalls in classifying lithofacies from well logs. Correlation structure of the input logs might be well understood for an optimal classification of lithofacies. Here we show a pitfall using an example of tight gas sandstone formation with two levels of significant heterogeneity.
The crossplot of gamma ray (GR) and resistivity log data (Fig. 1a) from a tight-gas sandstone reservoir shows that the two properties have a quite small correlation at -0.24. In a typical tight gas sandstone formation, GR and resistivity are generally correlated highly (Ma et al., 2016). Why they have a small correlation in this example?
Figure 1. Crossplots between GR and Vclay of a tight gas sandstone formation. The Pearson correlation coefficient is -0.038. (a) A straight crossplot without other information; (b) overlain with two cutoff values on GR for classify sandy channel facies (low GR band), crevasse-splay (intermediate GR values) and clayey overbank (high GR value band); (c) overlain with depth; (d) overlain with the first principal component (PC1); (e) overlain with the second PC (PC2).
Traditionally, geoscientists have applied cutoffs on GR to generate lithofacies in siliciclastic formations (e.g., Li and Gao, 2019; Slatt, 2006). However, when GR and resistivity are not highly correlated, as shown in Fig. 1a, the method of cutoffs on GR or resistivity will not lead to good results. The classification of lithofacies using cutoffs, no matter what cutoff values are used, is very unsatisfactory, such as using benchmark GR values of the similar formations shown in Fig. 1b. The resultant classified lithofacies show almost no channel facies in the upper stratigraphic layers despite the high resistivity in those layers (Fig. 2). Similarly, applying cutoffs on resistivity will not generate satisfactory lithofacies classes (not shown here, but can be inferred from the crossplot in Fig. 1a).
Figure 2. Cross section of well logs and classified lithofacies for a vertical well. Track 1 is the subsea TVD (SSTVD); track 2 is GR; track 3 is the lithofacies classified using GR cutoff values (see the color legend); track 4 is the lithofacies classified using PC1; track 5 is deep resistivity (RD).
A low correlation is reflected as a wide spread of data in the crossplot. Because GR and resistivity are intrinsically highly correlated in this kind of formation (see e.g., Ma et al., 2016), there are generally two or three ways of analyzing this kind of data: separating the data into two groups and treating them separately (see e.g., Ma, 2011), identifying a third variable that reduces the intrinsic correlation, and applying an analytical method that directly handles the problem. Regardless, the intrinsic high correlation implies one level of heterogeneity and reduction of correlation reflects another level of heterogeneity. Here we present a method using PCA to directly treat the problem (for details regarding PCA, see Ma, 2019). ANN generally does not work well for lithofacies classifications with this kind of data (Ma, 2011). PCA can be used to synthesize GR and resistivity and cluster lithofacies. Overlaying the first principal component (PC1) on the crossplot (Fig. 1c) shows that applying cutoffs on PC1 would improve the lithofacies classification because PC1 integrates information from both GR and resistivity instead of using only one of them.
This classification shows significantly sandier channel facies in the upper stratigraphic layers than in the lower stratigraphic package (Fig. 2). The overall higher deep resistivity (RD) in the upper stratigraphic layers suggests more reservoir-bearing rocks. Further investigations using geochemical analysis showed significantly higher K-feldspar and thorium content in the upper stratigraphic deposits even in sandy lithofacies (Prensky, 1984). This explains the apparent paradox that both GR and resistivity readings are higher in the upper formation than in the lower formations, even though they are generally correlated negatively.
Obviously, the second principal component (PC2) also contain information from both GR and resistivity. However, in this case, PC2 does not carry much information on lithofacies. Imagine that the correlation between GR and resistivity is positive, which could happen when the crossplot in Fig. 1 is more elongated in the diagonal (e.g., more potassium and thorium presence in sandstones could cause that). Then, PC1 and PC2 would be reversed. As such, PC2 would be more useful than PC1 for lithofacies classification. In other words, the correlation reversal between GR and resistivity would lead to the reversal of the principal components. Incidentally, when this happens, the phenomenon is termed Yule-Simpson's effect or Simpson's paradox (Ma, 2020, 2019; Pearl, 2000). Causal analysis is critical for statistical modeling of such a phenomenon.
Note that applying cutoffs on a principal component is not the same as applying cutoffs on the two original logs. This can be clearly seen from the crossplot shown in Fig. 1c, since PC1 is correlated to both GR and resistivity and thus contain partial information from both of these logs.
One of the most important bases for developing a natural resource field is the estimate of resource initially in place, such as hydrocarbon pore volume (HCPV), oil initially in-place (OIP) and gas initially in-place (GIP). The OIP and GIP estimations impact reservoir management and investment decision. When the estimate is too optimistic, it can lead to excessive investments; when the estimate is too pessimistic, it leads to under investments and suboptimal developments of the fields.
Because of the nature of subsurface fluid characters, hydrocarbon volumetric is calculated as the product of porosity and hydrocarbon saturation. For example, GIP is the integral of bulk gas volumes or products of porosity and gas saturation, before taking account of the formation volume factor, expressed as
where x=(x, y, z) describes the spatial coordinates, R is the 3D prospect or reservoir domain, ϕ is the porosity, and Sg is the gas saturation.
From the basic calculus, it is not straightforward to solve an integral of the products of two variables, even when the average values of both variables are known. This is why the petroleum industry has used an equation simplified from Eq. (1) for in-place gas volumetrics
where A represents the area of the field, H the thickness or net pay, ϕ the porosity, and Sg the gas saturation. All the inputs on the right-hand side of Eq. (2) and other classical volumetrics use the respective means of the reservoir properties in nearly all the applications (Ma, 2019; Murtha and Ross, 2009).
The argument for using only the mean values of porosity and hydrocarbon saturation for the volumetric estimation using Eq. (2) is that no petroleum reservoir is homogeneous and petrophysical parameters must be averaged (Tiab and Donaldson, 2003, P. 96). For high-quality conventional reservoirs, the estimation using the average values can be approximately accurate; but it can be highly inaccurate for unconventional resource evaluations (Ma, 2018). This is because the variabilities of porosity and fluid saturation are important, and they impact the volumetrics; in addition, porosity and fluid saturation(s) are generally correlated and their correlation also impact the hydrocarbon volumetrics. Indeed, an accurate parametric formula of Eq. (1) can be expressed as
where Vt is the total formation bulk volume, E is the mathematical expectation operator, E(ϕSg) is a second-order statistical moment, mϕ and mg are the means of porosity and gas saturation, respectively, ρ is the Pearson correlation coefficient between porosity and gas saturation, and σϕ and Sg are the standard deviations of porosity and gas saturation, respectively.
Equation 3 clearly shows that the variabilities of, and correlation between porosity and fluid saturation affects the hydrocarbon volumetrics. The classical volumetric equation (Eq. (2)) ignores these statistical properties by using their mean values only. Based on petrophysical analysis, porosity and hydrocarbon saturation are correlated (Kennedy, 2015; Lucia, 2007; Fylling, 2002). In organic-rich gas shales, organic-matter pores or pores associated with kerogen play an important role in the pore network (Saraji et al., 2013; Loucks et al., 2012), and thus, porosity is often highly correlated to TOC (Alqahtani and Tutuncu, 2014). When TOC is high and significant amounts of kerogen transforms to hydrocarbons, more pores are generated. This leads to a significant correlation between effective porosity and hydrocarbon saturation in source-rock reservoirs.
Therefore, how to model the correlation between porosity and hydrocarbon saturation impacts the accuracy in estimating gas in-place volumetrics; the correlation between porosity and fluid saturation must be accounted for. Note, however, that the correlations among these properties are estimated from limited well-log and/or core data in practice and thus may have significant uncertainties. Moreover, effective porosity and fluid saturation are generally not directly measured, but they are derived from other logs, such as density, sonic, resistivity and gamma ray logs. Sw at wells is often derived using the Archie equation or its variants, and its values are impacted by the chosen parameters and associated pitfalls in these methods. For example, carbonates generally have very high resistivity values, the estimated Sw values for tight non-hydrocarbon-bearing carbonates using these methods are often unrealistically low, as shown by the example in Fig. 3. This leads to a reduced correlation between porosity and Sw, and it has two opposite effects for estimating the gas in-place volumetrics.
Figure 3. Crossplots between porosity and fluid saturation. (a) Porosity-Sw. The Pearson correlation is -0.51; (b) same as (a) but overlain with Vlime; (c) porosity-gas saturation (Sg). The Pearson correlation is 0.51; (d) porosity-gas saturation (Sg) after the high-Vlime data are excluded. The Pearson correlation is 0.69.
The reduced Sw values lead to an overestimation of the GIP because gas saturation is inversely correlated to Sw. The reduced correlation between gas saturation and porosity leads to an underestimation of the GIP according to Eq. (3). When the overestimation and underestimation have a similar magnitude, the overall estimation can be relatively accurate; however, this is a correct positive for the wrong reason or Type 3 error (Ma, 2010). More generally, the magnitudes of overestimation and underestimation are not equal, leading to an inaccurate estimation of the GIP. In this example, note that the data with low porosity and Sw are caused by the tight limestone with high resistivity using the Archie method or its variants. The Pearson correlation coefficient is -0.51. Excluding those data, the correlation is significantly higher, as shown in Fig. 3d.
Some geoscientists and engineers use linear regression to model fluid saturation from porosity, especially when facing a lack of data. This implies a 1-to-1 correlation between them in estimating GIP using the Monte Carlo simulation. When the other statistical parameters are accurately used, this assumption will lead to an overestimation of the GIP as shown in Table 1. Others do not model the correlation explicitly; which can frequently cause an accident spurious correlation, leading to inaccurate estimate of GIP. For example, the opposite sign to the genuine correlation in the example leads to significant underestimated GIP, as shown in the example in Table 1, in which an accidental correlation (wrong sign) leads to a lower estimate of GIP by -40.11%.
Porosity Gas saturation Correlation coefficient, ρ Unit GIP Relative change Mean SD Mean SD 0.041 0.038 0.425 0.285 1.00 0.028 26 14.0% 0.69 0.024 81 Base case 0.51 0.022 95 -7.5% 0.00 0.017 425 -29.7% -0.69 0.009 952 -40.11% Note: (1) Base case is the case scientifically most reasonable for the given data; (2) unit GIP is the GIP for a unit of volume, such as cubic meter or cubic feet; (3) SD stands for standard deviation.
Table 1. Comparing bulk volume of gas with different correlations between porosity and gas saturation
On the other hand, other geoscientists and engineers do not use a correlation in the Monte Carlo simulation of hydrocarbon volumetrics. In fact, theoretically, the Monte Carlo simulation of volumetrics cannot use correlations because it uses the mean values of the properties and the means, by definition, are not correlated. As such, the Monte Carlo will tend to underestimate the hydrocarbon volumetrics when the hydrocarbon saturation and porosity are positively correlated. In this example, the estimated GIP is nearly 30% lower.
1.1. Heterogeneities and Lithofacies Prediction from Well Logs
1.2. Petrophysical Heterogeneities, Correlations and Hydrocarbon Volumetric Estimation
Traditionally, geoscientists and petroleum engineers thought that unbiased modeling methods will lead to unbiased estimations in evaluating unconventional hydrocarbon resources such as shale gas and tight gas (e.g., Ehsan et al., 2018) because this is a general principle of statistical estimation (Ma, 2019; Cao et al., 2014; Cressie, 1993; Matheron, 1989). However, this is not always true. Several situations can cause a biased estimation even when an unbiased method is used, including presence of sampling bias, and/or estimating a composite variable (Ma and Gomez, 2019; Ma, 2018). While a biased estimate by presence of sampling bias, if not mitigated, is known (Ma, 2019; Isaaks and Srivastava, 1989), a biased estimate for a composite variable, even when an unbiased modeling method is used, is much trickier, and to date, appears to be ignored in statistical community. Only limited studies have been published recently in applied geosciences (Ma, 2018).
Specifically, interpolation methods, such as various kriging techniques, can reduce the variability of a reservoir property in 3D modeling. Stochastic simulation can preserve the heterogeneities of reservoir properties (Ma, 2019). From Eq. (3), the heterogeneities in porosity and fluid saturation affect the hydrocarbon volumetric estimation. Because the heterogeneities of those petrophysical property models are affected by the chosen modeling algorithm, the latter thus impacts the hydrocarbon volumetric estimate. This includes two problems: heterogeneities of and correlation between the modeled reservoir properties. This section discusses the impact on the hydrocarbon volumetric estimation by the heterogeneities in porosity and fluid saturation models.
Because hydrocarbon volume is a composite variable-product of porosity and hydrocarbon saturation, it is impacted by the porosity and hydrocarbon saturation. To assess the effect of commonly used modeling methods in geosciences on hydrocarbon volumetric estimates, we present results of using kriging, stochastic simulation for modeling porosity and cokriging, and stochastic cosimulation and linear regression for modeling hydrocarbon saturation.
Table 2 compares the volumetric results of different modeling methods for a tight gas sandstone reservoir. The base case model uses a 3D grid with 1 meter-cell thickness. The estimation methods, including kriging, cokriging and linear regression, have reduced the variances (and consequently standard deviation or SD) of porosity and gas saturation (see Fig. 4). These have led to reduced GIP estimates, even though the correlation between the 3D porosity and gas saturation models are increased (i.e., Pearson correlation coefficients increased to 0.93 or 1) from the base case (Pearson correlation coefficient is 0.7, see Table 2 and Fig. 4). Although kriging, cokriging and linear regression have been extensively used in geoscience and geoengineering, geoscientists have not paid much attention to these effects. Specifically, they have known about the reduction of heterogeneity by kriging and linear regression (Ma, 2019; Delfiner, 2007), the effect of heterogeneity to hydrocarbon volumetrics has been completely ignored.
Grid Porosity Sg Correlation coefficient, ρ Unit GIP Relative change Mean SD Mean SD 1 m-thick cells 0.050 0.030 0.200 0.200 0.70 0.014 2 Reference Kriging/cokriging 0.050 0.020 0.200 0.130 0.93 0.012 4 -12.7% Kriging/linear regression 0.050 0.020 0.200 0.110 1.00 0.012 2 -14.1% Stochastic simulation/cosimulation 1 0.050 0.028 0.200 0.205 0.70 0.014 0 -1.4% Stochastic simulation/cosimulation 2 0.050 0.030 0.200 0.210 0.71 0.014 5 3.5% Note: (1) Porosity is modeled first using kriging or stochastic simulation. Gas saturation is modeled in relation to the porosity with the given correlation for stochastic cosimulation or the correlation coefficient is a consequence of cokriging or linear regression for modeling the gas saturation. (2) Data are simplified for facilitating the presentation.
Table 2. Impact of different modeling methods on gas volume estimation for a tight gas sandstone formation
Figure 4. Map views of porosity and gas saturation (Sg) models using several modeling methods. (a) Porosity model built using kriging from 22 wells. (b) Gas saturation model using collocated cokriging in relation to the porosity model in (a). The two models have a Pearson correlation coefficient of 0.93 (see Table 2). (c) Porosity model built using stochastic simulation conditioned to the porosity log data at 22 wells. (d) Gas saturation model built using collocated cosimulation (Cocosim) in relation to the porosity model in (c). The two models have a Pearson correlation coefficient of 0.7 (see Table 2). The model size is approximately 6 km (Northing) by 4 km (Easting).
Note that an increased correlation between porosity and hydrocarbon saturation by kriging, cokriging or linear regression leads to an optimistic bias for hydrocarbon volumetric estimation (further discussed in the next section), yet, it did not make up for the reduced GIP because of the reduction of the variances in the porosity and hydrocarbon saturation (Table 2). On the other hand, stochastic simulation and cosimulation (Moore et al., 2016; Cao et al., 2014) enable preserving the variances in porosity and gas saturation and their correlation; and they did not reduce the GIP estimate in this example.
The porosity is typically modeled first before modeling the water or hydrocarbon saturation because porosity generally has more data and smaller uncertainty than water or hydrocarbon saturation (Ma, 2019). After the porosity model is constructed, water or gas saturation can be modeled in relation to the porosity, especially important for unconventional reservoirs (Ma, 2018). For unconventional reservoirs that no fluid contact or free water level is defined, modeling the fluid saturation accounting for its correlation to porosity often has a significant impact on the hydrocarbon volumetric estimates.
The effect of correlation between porosity and fluid saturations can be assessed using stochastic simulation. As stochastic simulation and cosimulation can preserve the heterogeneity and enable modeling the degree of correlation between porosity and fluid saturation, evaluating the impact of correlation to the volumetric estimates is possible. In this example, the mean and standard deviation of the 3D porosity model is identical to that of the data, three gas saturation models were generated using collocated cosimulation (Cocosim) with various degrees of correlation between the two petrophysical properties. The GIP estimates for the three models are very different. When the correlation is not modeled (i.e., the correlation equal to zero), the estimate is nearly 30% lower than the reference and when the correlation is modeled higher (e.g., 0.95), the estimate is too optimistic (13.4% overestimation).
From the earlier presentation in Section 2.1, both the variances of and correlation between porosity and hydrocarbon saturation can impact the hydrocarbon volumetric estimation. From the central limit theorem (CLT, see e.g., Ma, 2019), data at different scales or supports have different variances. This is termed change of support problem (COSP) in geostatistics (Cressie, 1993). Moreover, bivariate correlations at different scale are different (Ma, 2019; Chiles and Delfiner, 2012; Gotway and Young, 2002; Robinson, 1950). Because different variances and correlation will lead to different volumetrics, as presented earlier, different scales (or supports) will lead to different volumetric estimates.
In petroleum geosciences, different data types are very common, and they often have very different scales. Core plugs are very small, well logs data generally have a sampling rate of 0.5 foot, geocellular grids have cell thicknesses commonly ranging between 1 and 3 m and reservoir simulation grids often have thicker cells generally ranging between 1 and 15 m (see e.g., Ma, 2019). These differences in scale will cause differences in variances, which complicates the estimations of hydrocarbon volumetrics. Larger the scale or support of data is smaller their variance will be. As such, the thicker the grid cells in the model are, the higher the reduction of the hydrocarbon volumetrics by the model.
Here we present an example that shows effects of data scale (or support) to the variance (and standard deviation) and consequently to the volumetric estimate. To simplify the presentation, we assume an unchanged correlation between porosity and gas saturation. The 3D model with cell thickness of 2 m has a reduction of the GIP estimate by 19.7% relative to the reference with the 0.15 m cell thickness. The model with cell thickness of 15 m reduces the GIP by 39.3%. This clearly shows the support effect on hydrocarbon volumetric estimates via its effect on the modeled heterogeneities of porosity and fluid saturation.
Note that COSP on correlation is much different to characterize, although some work has been performed in spatial correlation analysis (Gotway and Young, 2002). Because the correlation between porosity and fluid saturation impacts the hydrocarbon volumetric estimation, as seen from Tables 1, 2 and 3, and COSP (Table 4) has an effect on correlation, COSP has implications on hydrocarbon volumetric estimation. This problem is a statistical challenge for future research. Moreover, sampling bias can impact the estimations of most statistical parameters, including mean, variance, and correlation. This implies that sampling bias will has an effect on modeling reservoir properties as well as on hydrocarbon volumetrics (see e.g., Ma, 2019; Ma and Gomez, 2019; Li et al., 2018).
Grid Porosity Sg Correlation coefficient Unit GIP Relative change Mean SD Mean SD 1 m-thick cells 0.050 0.030 0.200 0.200 0.70 0.014 2 Reference Simulation 1 0.050 0.030 0.200 0.200 0.70 0.014 2 Identical to reference Simulation 2 0.050 0.029 0.200 0.200 0.00 0.010 0 -29.6% Simulation 3 0.050 0.032 0.200 0.200 0.95 0.016 1 13.4% Note: The correlation between porosity and gas saturation is modeled using collocated cosimulation when the gas saturation is modeled. SD stands for standard deviation.
Table 3. Volumetrics for different correlation coefficient between porosity and gas saturation (tight gas sandstone formation).
Grid Porosity Sg Correlation coefficient Unit GIP Relative change Mean SD Mean SD 0.15 m-thick cells 0.05 0.035 0.20 0.200 0.70 0.008 90 Reference 2 m-thick cells 0.05 0.030 0.20 0.150 0.70 0.007 15 -19.7% 15 m-thick cells 0.05 0.020 0.20 0.100 0.70 0.005 40 -39.3% Note: GIP using the classical volumetric method with the averages is even lower, equal to 0.004 962 9. SD stands for standard deviation.
Table 4. Volumetrics for three grids with different cell thicknesses (a tight gas sandstone formation)