Biochemical markers for liver fibrosis (FibroTest) and necroinflammatory features (ActiTest) are an alternative to liver biopsy in patients with chronic hepatitis C. Our aim was to assess the inter-laboratory variability of these tests, and their 6 components (γ-glutamyl transpeptidase, alanine aminotransferase, α2-macroglobulin, haptoglobin, apolipoprotein A1, and total bilirubin) and to identify factors associated with this variability.
Sera from 24 patients with chronic hepatitis C or severe alcoholic liver disease were prospectively collected and analyzed in one reference center and in 8 additional laboratories. When γ-glutamyl transpeptidase and alanine aminotransferase were expressed in international units, there was no significant difference between laboratories in the results of FibroTest or ActiTest; kappa statistics were greater than 0.50, with only 0.8% of cases (3/384) showing a discordance of more than one stage. The main factor significantly associated with variability was the expression of γ-glutamyl transpeptidase and alanine aminotransferase as multiples of the upper limit of reference values. The use of a standardized method with pyridoxal phosphate reduced the variability of alanine aminotransferase expression, and the standardized original Szasz method reduced the variability of γ-glutamyl transpeptidase expression.
The variability of FibroTest and ActiTest was acceptable without clinical consequences for the prediction of the stage of liver fibrosis and grade of activity. Standardized methods and assay calibration should be used and expression of alanine aminotransferase and γ-glutamyl transpeptidase in multiples of the upper limit of reference values should not be employed.
The "gold standard" for assessing fibrosis, liver biopsy, is recommended prior to the initiation of antiviral therapy; in addition, it is vital for monitoring fibrosis progression. Unfortunately, this procedure is invasive, prone to complications, including hemorrhage and death, and has a high risk of sampling error. Biochemical markers for liver fibrosis (FibroTest) and necroinflammatory features (ActiTest) are an alternative to liver biopsy in patients with chronic hepatitis C. Since the first publication, which included a validation period, those tests have been validated in different populations by the same reference laboratory [5,6] and by an independent group. The tests combine five components (α2-macroglobulin, haptoglobin, apolipoprotein A1, γ-glutamyl transpeptidase (GGT), and total bilirubin) for FibroTest and the same five plus alanine aminotransferase (ALT) for ActiTest.
The aim of this study was to assess the inter-laboratory variability of FibroTest and ActiTest, including their six serum liver components, in patients with chronic liver disease, and to identify factors associated with that variability. Our concern was to assess whether the analytical methods adapted on the different analyzers were associated with significant variability in FibroTest and/or ActiTest values. Moreover, we aimed to compare the variability of FibroTest and ActiTest in relation to the method of expressing enzymatic activity; in particular, in terms of absolute values or as multiples of the upper limit of normal. Since we and others have demonstrated that current definitions of normal values may be inappropriate [8-10], a major concern was the definition of ALT and GGT activity. In routine practice, the definition of the upper limit of normal (ULN) of ALT and GGT varies between laboratories, but is rarely detailed. Because numerous medical guidelines make reference to ALT and GGT expressed as multiples of the ULN (ULN units), variations in the definition of normal may have important practical consequences.
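The effect of laboratory-specific reference limits can be illustrated with a minimal sketch. The laboratory names, the ULN values and the measured activity below are hypothetical and chosen only for illustration; they are not data from this study.

```python
def to_uln_units(activity_iu: float, uln_iu: float) -> float:
    """Express an enzyme activity as a multiple of a laboratory's upper limit of normal."""
    if uln_iu <= 0:
        raise ValueError("the upper limit of normal must be positive")
    return activity_iu / uln_iu


# Hypothetical ALT upper limits of normal, in international units (not study data).
hypothetical_alt_uln = {"lab_a": 30.0, "lab_b": 40.0, "lab_c": 50.0}

# The same hypothetical sample, assumed to be measured identically everywhere.
measured_alt_iu = 60.0

multiples = {
    lab: to_uln_units(measured_alt_iu, uln)
    for lab, uln in hypothetical_alt_uln.items()
}

for lab, multiple in multiples.items():
    print(f"{lab}: {measured_alt_iu:.0f} IU reported as {multiple:.2f} x ULN")
```

Even with identical measured activities, the reported multiple differs between laboratories solely because of the chosen reference limit, which is the source of variability examined here.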
The main characteristics of the included patients are outlined in Table 1. Details of the FibroTest and ActiTest results for each patient and laboratory are given in Figure 1. There was no significant difference between centers for FibroTest using GGT expressed in international units [mean (sd) = 0.57 (0.26), range = 0.48–0.65, F-Ratio = 0.27, p = 0.27]. For FibroTest using GGT expressed in ULN units [mean (sd) = 0.55 (0.27), range = 0.45–0.68, F-Ratio = 1.26, p = 0.27], there was a significant difference between three centers (center 5 had higher mean values than centers 2 and 4; p = 0.02 for both comparisons).
Figure 1. FibroTest and ActiTest variability according to laboratories (centers) and units of enzymatic expression: international units (IU) and upper limit of normal (ULN).
Table 1. Characteristics of included patients
There was no significant difference between centers for ActiTest using ALT and GGT expressed in international units [mean (sd) = 0.32 (0.26), range = 0.38–0.53, F-Ratio = 1.21, p = 0.30] or for ActiTest using ALT and GGT expressed in ULN units [mean (sd) = 0.44 (0.27), range = 0.27–0.43, F-Ratio = 0.81, p = 0.59].
The details of the liver proteins and total bilirubin assays according to each patient and laboratory are outlined in Figure 2. There were no significant differences according to testing center for any of these assays (between centers or versus the reference center): α2-macroglobulin [mean (sd) = 2.89 (1.16) g/l, range = 2.69–3.33, F-Ratio = 0.72, p = 0.67], haptoglobin [mean (sd) = 0.98 (0.58) g/l, range = 0.92–1.03, F-Ratio = 0.07, p = 0.99], apolipoprotein A1 [mean (sd) = 1.30 (0.51) g/l, range = 1.16–1.42, F-Ratio = 1.21, p = 0.30] and bilirubin [mean (sd) = 28.8 (66) micromol/l, range = 15.8–51.1, F-Ratio = 0.51, p = 0.85]. One analyzer (ADVIA) gave lower mean apolipoprotein A1 levels [1.06 (0.43) g/l] than the other analyzers [1.33 (0.52) g/l; p = 0.02].
Figure 2. Serum proteins and total bilirubin variability according to laboratory (center).
The details of the ALT and GGT assays, according to each patient and laboratory and expressed in international or ULN units, are given in Figure 3. There was no significant difference between centers for ALT expressed in international units [mean (sd) = 70 (47) IU/ml, range = 57–86, F-Ratio = 1.30, p = 0.25]. However, when the assays used pyridoxal phosphate, as in the reference center, the mean ALT was higher [78 (50) IU/ml] than with assays not using pyridoxal phosphate [60 (42) IU/ml; p = 0.003]. For ALT expressed in ULN units [mean (sd) = 48 (37), range = 37–71, F-Ratio = 1.65, p = 0.12], there was a significant difference between center 1 and all other centers (p = 0.009 vs center 2, p = 0.008 vs center 3, p = 0.04 vs center 4, p = 0.04 vs center 5, p = 0.02 vs center 6, p = 0.03 vs center 7, p = 0.01 vs center 10 and p = 0.001 vs center 11). There were no significant differences between centers for GGT expressed in international units [mean (sd) = 130 (158) IU/ml, range = 57–86, F-Ratio = 1.30, p = 0.25] or in ULN units [mean (sd) = 109 (121) IU/ml, range = 78–154, F-Ratio = 1.46, p = 0.17]. However, despite the use of the same Szasz method, one analyzer (Dade Behring RXL) gave higher mean GGT values [165 (200) IU/ml] than the others [120 (143) IU/ml; p = 0.06].
Figure 3. Alanine aminotransferase (ALT) and γ-glutamyl transpeptidase (GGT) variability according to laboratory and units of enzymatic expression: international units (IU) and upper limit of normal (ULN).
Passing-Bablok linear regression analyses of all samples between laboratories and the reference center are summarized in Table 2. The intercept and slope between the reference center and other laboratories were excellent for the three proteins with only one decrease for apolipoprotein A1 in a single center using the ADVIA analyzer. For total bilirubin, there was only one center with a higher slope. For the enzymes, there was more variability. For ALT, mean values were lower in centers not using pyridoxal phosphate. For GGT, centers using the RXL analyzer had a higher slope (greater than 1).
Table 2. Passing-Bablok analysis between laboratories and reference center (LAB 1) for each component
Concordance rates (kappa statistics) among laboratories are given in Table 3; all were statistically significant. When GGT and ALT were expressed in international units, FibroTest and ActiTest kappa statistics were all greater than 0.50, with only 0.8% of cases (3 out of the 384 comparisons) showing a discordance of more than one fibrosis stage. There was no discordance greater than one grade for ActiTest. In contrast, when GGT and ALT were expressed in ULN units, FibroTest and ActiTest kappa statistics were lower than 0.50 in 11 comparisons (out of the 16 comparisons versus the reference laboratory), with 5% of cases (21 out of the 384 comparisons) showing a discordance of more than one fibrosis stage or more than one activity grade.
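For readers unfamiliar with the concordance measures, the following is a minimal sketch of an unweighted Cohen's kappa and of a count of discordances larger than one stage. It assumes the unweighted form of kappa (the text does not state whether a weighted variant was used), and the stage lists are hypothetical values, not study data.

```python
from collections import Counter


def cohen_kappa(stages_a, stages_b):
    """Unweighted Cohen's kappa for two equally long lists of categorical stages."""
    if len(stages_a) != len(stages_b) or len(stages_a) == 0:
        raise ValueError("need two non-empty lists of equal length")
    n = len(stages_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(stages_a, stages_b)) / n
    # Agreement expected by chance from the marginal stage frequencies.
    counts_a, counts_b = Counter(stages_a), Counter(stages_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one identical category only
    return (observed - expected) / (1.0 - expected)


def count_large_discordances(stages_a, stages_b, max_gap=1):
    """Number of paired results differing by more than max_gap stages."""
    return sum(abs(a - b) > max_gap for a, b in zip(stages_a, stages_b))


# Hypothetical predicted fibrosis stages (0 to 4) for eight samples.
reference_stages = [0, 1, 2, 3, 4, 2, 1, 3]
other_lab_stages = [0, 1, 2, 4, 4, 1, 1, 3]

kappa = cohen_kappa(reference_stages, other_lab_stages)
large_gaps = count_large_discordances(reference_stages, other_lab_stages)
print(f"kappa = {kappa:.2f}, discordances of more than one stage = {large_gaps}")
```

Kappa corrects the raw agreement rate for agreement expected by chance, whereas the discordance count captures the clinically relevant question of how often two laboratories disagree by more than one stage.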
Table 3. Concordance rates (kappa statistics) of laboratories with reference center (LAB 1), according to the expression of GGT and ALT activities
This study showed that the variability of FibroTest and ActiTest values, among nine different laboratories, was acceptable and without clinical consequences for the prediction of the stage of liver fibrosis or grade of activity. This finding is important since it confirms that those tests can be routinely computed from the results of the six individual components obtained by non-centralized measurements. Online assessment is available using the website http://www.biopredictive.com. To guarantee the quality of this assessment, it was necessary to identify the factors associated with the variability of the six components.
This study confirms that the expression of ALT and GGT in multiples of the upper limit of reference values should be avoided. Despite efforts to standardize enzymatic assay methods, homogeneity of ALT results has not been achieved, as attested by external quality controls, and identical limits of reference values cannot be defined. Many clinicians believe that expression of the results as multiples of the upper limits of reference ranges can reduce inter-laboratory variability. Our study confirms previously observed results that this method of expression is, in fact, worse than that using international units, both for ALT and GGT. In our reference center, the reference limit recorded was similar to the described mean value from a recent study and lower than in the other laboratories. When ActiTest was computed using the upper limit of each laboratory for GGT and ALT, the concordance rate was lower than for ActiTest using international units.
To increase inter-laboratory coherence in the results of enzymatic activities, standardized assays against a reference method should be employed, with calibration of the assay using a commutable enzymatic material. The values of this calibrator must be assigned by a reference method. For proteins and bilirubin assays, there was an excellent homogeneity. This was anticipated for α2-macroglobulin since the same analyzer was used in all laboratories. Despite the use of three different analyzers, haptoglobin had the best homogeneity. In fact, the assays of these two proteins are standardized against the CRM 470 reference material. This reference product is now used in different measurement procedures to attain results numerically the same, whatever the clinical conditions. For apolipoprotein A1, only one analyzer was slightly different from the others. This is due to the use of a different reference material to standardize the assay. Overall, the data from the laboratories were linearly related with the reference center with a slope close to 1 and a non-significant analytical imprecision; there were few pairs of assays outside the confidence limits and the samples were adequately distributed over the investigated range. As previously observed, when ordinary linear regression (in combination with correlation analysis in the Passing-Bablok method) gave poor estimates, in particular for GGT and ALT assays, we found several analytical reasons for the poor performance. Enzymatic measurement with the Szasz method (standardized against the original for GGT), and with the standardized method according to the International Federation of Clinical Chemistry (using pyridoxal phosphate for ALT), would probably reduce the variability.
Because of their predictive values and their reproducibility in different populations, biochemical markers could be used as surrogate markers for liver biopsy both for the initial decision of liver biopsy and for the follow-up of chronic hepatitis C patients. To date, liver biopsy has been considered mandatory for the management of patients infected by hepatitis C virus (HCV). For some patients and general practitioners, it may be considered an aggressive procedure. Reviews of morbidity and mortality of intercostal liver biopsy observed a mean occurrence of pain in 30% of patients; 3 out of 1,000 endured severe adverse events, and 3 out of 10,000 died.
There is no ideal gold standard for the assessment of liver histology. Even liver biopsy is dependent on inter- and intra-observer (pathologist) differences. There are also potential problems with liver biopsy sampling variation. In a study with three consecutive samples through a single entry site, only 50% of patients with cirrhosis were scored as cirrhosis on the three samples. It is therefore possible that biochemical markers such as those described may provide a more accurate (quantitative and reproducible) picture of fibrogenic events occurring within the liver. Furthermore, because treatment is now so effective in patients with genotype 2 or 3 infection, the utility of biopsy in this setting could be challenged.
When GGT and ALT are expressed in international units, FibroTest and ActiTest can be computed from different laboratories with acceptable variability. To increase inter-laboratory coherency, standardized methods and enzymatic calibration should be used particularly for GGT and ALT assays. Expression of ALT and GGT in multiples of the upper limit of reference values should be avoided.
Sera from 24 informed patients (21 with chronic hepatitis C and 3 with decompensated alcoholic cirrhosis) were prospectively collected in the Department of Hepato-Gastroenterology of the Pitié-Salpêtrière Hospital, in Paris, France. The main characteristics of the included patients are outlined in Table 1. Sera were separated in the above reference laboratory, stored at +4°C and distributed to ten different laboratories in France within 24 hours. For two laboratories, serum was missing for at least one patient; therefore, these laboratories have been excluded from the core analysis. Sensitivity analyses including these two excluded laboratories did not change the results or conclusions (data not shown).
Characteristics of the analyzers, reagents and analytical methods used in the nine included laboratories are detailed in Table 4. Eleven different analyzers were used. For the measurement of ALT activity, five laboratories used a standardized method according to the IFCC, with pyridoxal phosphate, and four used a method without pyridoxal phosphate. For the measurement of GGT activity, all nine laboratories used the Szasz method, four of them with a recommended method of standardization.
Table 4. Laboratory analyzers and biochemical methods
Haptoglobin and apolipoprotein A1 were assayed by immunoturbidimetric or immunonephelometric methods. α2-macroglobulin was assayed by immunonephelometry. Analytical measurements of α2-macroglobulin and haptoglobin were standardized against the certified international reference material 470 (CRM 470). Apolipoprotein A1 assays adapted to the different analyzers were standardized against the reference material of the World Health Organization-International Federation of Clinical Chemistry SP1-01 (WHO-IFCC SP1-01), except on the Advia-Bayer analyzer (ADVIA). Total bilirubin was assayed by diazo reaction methods.
Statistical analysis used multiple measure variance analyses and Passing-Bablok linear regression analyses for the comparison of inter-laboratory results, and kappa statistics for the predicted histological features. Multiple comparisons used Bonferroni (versus control) and Tukey-Kramer multiple-comparison tests. Number Cruncher Statistical Systems software was used. The linear relationship between each laboratory and the reference center was assessed with confidence limits for the slope and the intercept and the number of pairs out of bounds; these were used to determine whether there was only a chance difference between the slope and 1 and between the intercept and 0. Means were expressed with standard deviation (sd), except for kappa statistics.
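As an illustration of the regression approach named above, the following is a minimal sketch of the Passing-Bablok point estimates (shifted median of pairwise slopes, then median intercept). It is not the implementation of the statistical software used in the study, it omits the confidence limits, and the paired values are hypothetical.

```python
from itertools import combinations
from statistics import median


def passing_bablok(x, y):
    """Return (slope, intercept) point estimates for paired measurements x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    slopes = []
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # identical points carry no slope information
        if dx == 0:
            slope = float("inf") if dy > 0 else float("-inf")
        else:
            slope = dy / dx
        if slope == -1:
            continue  # slopes of exactly -1 are discarded by the method
        slopes.append(slope)
    if not slopes:
        raise ValueError("no usable pairwise slopes")
    slopes.sort()
    n = len(slopes)
    k = sum(s < -1 for s in slopes)  # offset that shifts the median
    upper = n // 2 + k
    if upper >= n:
        raise ValueError("too many slopes below -1 for a shifted median")
    if n % 2 == 1:
        slope = slopes[upper]
    else:
        slope = 0.5 * (slopes[upper - 1] + slopes[upper])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Hypothetical paired results: reference center versus another laboratory.
reference_values = [12.0, 25.0, 40.0, 58.0, 75.0, 110.0]
other_lab_values = [13.0, 24.0, 43.0, 60.0, 80.0, 118.0]

slope, intercept = passing_bablok(reference_values, other_lab_values)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Because the estimates are based on medians of pairwise slopes, they are less sensitive to outlying pairs than ordinary least-squares regression; agreement between methods corresponds to a slope near 1 and an intercept near 0.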
PH, FIB, RPM and TP elaborated the protocol and wrote the manuscript. PH, FIB, DM, GA, DB, PCL, GD, DD, TK, MS, DT and ET performed the assays. TP performed the statistical analysis.
Supported by a grant from Association pour la Recherche sur les Maladies Hépatiques Virales.
Can J Gastroenterol 2000, 14:543-548.
Imbert-Bismut F, Ratziu V, Pieroni L, Charlotte F, Benhamou Y, Poynard T, for the MULTIVIRC group: Biochemical markers of liver fibrosis in patients with hepatitis C virus infection: a prospective study.
Poynard T, Imbert-Bismut F, Ratziu V, Chevret S, Jardel C, Moussalli J, Messous D, Degos F: Biochemical markers of liver fibrosis in patients infected by Hepatitis C Virus: Longitudinal validation in a randomized trial.
J Viral Hepatitis 2002, 9:128-133.
Myers RP, Benhamou Y, Imbert-Bismut F, Thibault V, Bochet M, Charlotte F, Ratziu V, Bricaire F, Katlama C, Poynard T: Serum biochemical markers accurately predict liver fibrosis in HIV and hepatitis C virus-coinfected patients.
Piton A, Poynard T, Imbert-Bismut F, Khalil L, Delattre J, Pelissier E, Sansonetti N, Opolon P: Factors associated with serum alanine transaminase activity in healthy subjects: consequences for the definition of normal values, for selection of blood donors, and for patients with chronic hepatitis C. MULTIVIRC Group.
Prati D, Taioli E, Zanella A, Della Torre E, Butelli S, Del Vecchio E, Vianello L, Zanusco F, Mozzi F, Milani S, Conte D, Colombo M, Sirchia G: Updated definitions of healthy ranges for serum alanine aminotransferase levels.
14th European Congress of Clinical Chemistry and Laboratory Medicine, Prague, May 26–31 2001
J Clin Chem Clin Biochem 1976, 14:421-427.
Bablok W, Passing H, Bender R, Schneider B: A general regression procedure for method transformation. Application of linear regression procedures for method comparison studies in clinical chemistry, Part III.
J Clin Chem Clin Biochem 1988, 26:783-790.
Manns MP, McHutchison JG, Gordon SC, Rustgi VK, Shiffman M, Reindollar R, Goodman ZD, Koury K, Ling M, Albrecht JK: Peg-Interferon alfa-2b in combination with ribavirin compared to interferon alfa-2b plus ribavirin for initial treatment of chronic hepatitis C.