Gewei's Blog

KM Estimation Using SAS and Python in Jupyter Notebook


Introduction

SAS has taken another step toward embracing open source by bringing SAS and Jupyter Notebook together. SAS coding in Jupyter Notebook is available for SAS on Linux in April and for SAS University Edition in July. I'll use Jupyter notebooks to compare the Kaplan-Meier (KM) survival estimates produced by SAS and by Python.


Background

In SAS, KM estimation is done with PROC LIFETEST, which we'll see shortly in the SAS notebook.

KMSurvival is a Python implementation of KM estimation. It is a practical tool for qualitatively comparing survival probabilities among groups, and it is also small, fast, and easy to use.
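For readers new to the method, here is a minimal sketch of the product-limit computation behind any KM estimator, written in plain Python. It is not KMSurvival's code; the function name `kaplan_meier` and its return layout are illustrative assumptions. At each distinct event time it divides the number of events by the number still at risk (the hazard) and multiplies the running survival by one minus that hazard.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit (Kaplan-Meier) estimate; illustrative sketch.

    durations: follow-up time for each subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    Returns (time, n_at_risk, n_events, hazard, survival) tuples, one per
    distinct time at which at least one event occurred.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its time
    n_at_risk = len(durations)
    survival = 1.0
    table = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            hazard = d / n_at_risk        # conditional probability of the event at t
            survival *= 1.0 - hazard      # S(t) = product of (1 - d_i / n_i)
            table.append((t, n_at_risk, d, hazard, survival))
        # subjects with events or censoring at t are removed only after t,
        # so censored subjects still count as at risk at their own time
        n_at_risk -= exits[t]
    return table
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` gives survival estimates of 5/6, 2/3, 4/9, and 0 at times 1, 2, 3, and 5. Time 4 has only a censored subject, so it produces no row.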

I wrote a new KM estimator because, as of early 2016, some features I wanted were either unavailable or not flexible enough in other implementations. These features include:

  • Distinguishing between the snapshot date and the cutoff dates of a data set.
  • Support for hierarchical strata.
  • Flexible combination of groups for comparisons.
  • Support for multiple data input formats.
  • Easy access to hazard and survival functions, which can be piped into visualization or further data processing.
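Hierarchical strata and flexible group combinations can be illustrated with a small data-structure sketch. It assumes each record is a `(strata, duration, event)` tuple whose `strata` is an ordered tuple of labels (for example region, then product); the helpers `group_by_strata_levels` and `combine` are hypothetical and are not KMSurvival's API. Each record is placed in one group per prefix of its strata, so survival can be compared at a coarse level or a finer one, and arbitrary groups can be pooled into a single comparison group.

```python
from collections import defaultdict


def group_by_strata_levels(records):
    """Group (strata, duration, event) records at every level of a strata hierarchy.

    strata is an ordered tuple such as ("East", "A"). A record is added to the
    group for each prefix of it: (), ("East",), and ("East", "A").
    The empty prefix () collects all records (the unstratified population).
    """
    groups = defaultdict(list)
    for strata, duration, event in records:
        for level in range(len(strata) + 1):
            groups[tuple(strata[:level])].append((duration, event))
    return dict(groups)


def combine(groups, keys):
    """Pool several groups into one comparison group (hypothetical helper)."""
    pooled = []
    for key in keys:
        pooled.extend(groups[key])
    return pooled
```

Each resulting list of `(duration, event)` pairs can then be split into separate duration and event sequences and handed to any KM estimator, giving one survival curve per group.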



We'll first check the quantitative results of KMSurvival against those of SAS, a process that involves loading, exploring, cleaning, and reshaping the data. Then we'll compare the two qualitatively.
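A quantitative check of this kind ultimately comes down to comparing survival estimates time point by time point. Below is a minimal sketch, assuming both outputs have already been reshaped into mappings from event time to survival probability; the function name `compare_survival` and the default tolerance are illustrative choices, not part of SAS or KMSurvival. Printed output is usually rounded, so the comparison uses an absolute tolerance rather than exact equality.

```python
import math


def compare_survival(reference, candidate, abs_tol=1e-6):
    """Compare two {time: survival} mappings.

    Returns (time, reference_value, candidate_value) for every time that is
    missing from one side or whose values differ by more than abs_tol.
    An empty list means the two estimates agree within the tolerance.
    """
    mismatches = []
    for t in sorted(set(reference) | set(candidate)):
        ref = reference.get(t)
        cand = candidate.get(t)
        if ref is None or cand is None or not math.isclose(
            ref, cand, rel_tol=0.0, abs_tol=abs_tol
        ):
            mismatches.append((t, ref, cand))
    return mismatches
```

If one side was printed with fewer decimals, `abs_tol` should be loosened to match that precision; otherwise rounding alone would be reported as a mismatch.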

The rest of the post is written in Jupyter Notebook. It has two parts: one uses SAS, and the other uses Python.