DataFrame
Constructor
  | 
pandas-on-Spark DataFrame that corresponds to pandas DataFrame logically.  | 
Attributes and underlying data
The index (row labels) column of the DataFrame.  | 
|
  | 
Print a concise summary of a DataFrame.  | 
The column labels of the DataFrame.  | 
|
Returns True if the current DataFrame is empty.  | 
Return the dtypes in the DataFrame.  | 
|
Return a tuple representing the dimensionality of the DataFrame.  | 
|
Return a list representing the axes of the DataFrame.  | 
|
Return an int representing the number of array dimensions.  | 
|
Return an int representing the number of elements in this object.  | 
|
  | 
Return a subset of the DataFrame’s columns based on the column dtypes.  | 
Return a NumPy representation of the DataFrame or the Series.  | 
Conversion
  | 
Make a copy of this object’s indices and data.  | 
Detects missing values for items in the current DataFrame.  | 
|
  | 
Cast a pandas-on-Spark object to a specified dtype   | 
Detects missing values for items in the current DataFrame.  | 
|
Detects non-missing values for items in the current DataFrame.  | 
|
Detects non-missing values for items in the current DataFrame.  | 
|
Return the bool of a single element in the current object.  | 
Indexing, iteration
Access a single value for a row/column label pair.  | 
|
Access a single value for a row/column pair by integer position.  | 
|
  | 
Return the first n rows.  | 
  | 
Return index of first occurrence of maximum over requested axis.  | 
  | 
Return index of first occurrence of minimum over requested axis.  | 
Access a group of rows and columns by label(s) or a boolean Series.  | 
|
Purely integer-location based indexing for selection by position.  | 
|
  | 
Insert column into DataFrame at specified location.  | 
Iterator over (column name, Series) pairs.  | 
|
This is an alias of   | 
|
Iterate over DataFrame rows as (index, Series) pairs.  | 
|
  | 
Iterate over DataFrame rows as namedtuples.  | 
Return alias for columns.  | 
|
  | 
Return item and drop from frame.  | 
  | 
Return the last n rows.  | 
  | 
Return cross-section from the DataFrame.  | 
  | 
Get item from object for given key (DataFrame column, Panel slice, etc.).  | 
  | 
Replace values where the condition is False.  | 
  | 
Replace values where the condition is True.  | 
  | 
Query the columns of a DataFrame with a boolean expression.  | 
Binary operator functions
  | 
Get Addition of dataframe and other, element-wise (binary operator +).  | 
  | 
Get Addition of dataframe and other, element-wise (binary operator +).  | 
  | 
Get Floating division of dataframe and other, element-wise (binary operator /).  | 
  | 
Get Floating division of dataframe and other, element-wise (binary operator /).  | 
  | 
Get Floating division of dataframe and other, element-wise (binary operator /).  | 
  | 
Get Floating division of dataframe and other, element-wise (binary operator /).  | 
  | 
Get Multiplication of dataframe and other, element-wise (binary operator *).  | 
  | 
Get Multiplication of dataframe and other, element-wise (binary operator *).  | 
  | 
Get Subtraction of dataframe and other, element-wise (binary operator -).  | 
  | 
Get Subtraction of dataframe and other, element-wise (binary operator -).  | 
  | 
Get Exponential power of dataframe and other, element-wise (binary operator **).  | 
  | 
Get Exponential power of dataframe and other, element-wise (binary operator **).  | 
  | 
Get Modulo of dataframe and other, element-wise (binary operator %).  | 
  | 
Get Modulo of dataframe and other, element-wise (binary operator %).  | 
  | 
Get Integer division of dataframe and other, element-wise (binary operator //).  | 
  | 
Get Integer division of dataframe and other, element-wise (binary operator //).  | 
  | 
Compare if the current value is less than the other.  | 
  | 
Compare if the current value is greater than the other.  | 
  | 
Compare if the current value is less than or equal to the other.  | 
  | 
Compare if the current value is greater than or equal to the other.  | 
  | 
Compare if the current value is not equal to the other.  | 
  | 
Compare if the current value is equal to the other.  | 
  | 
Compute the matrix multiplication between the DataFrame and others.  | 
  | 
Update null elements with value in the same location in other.  | 
Function application, GroupBy & Window
  | 
Apply a function along an axis of the DataFrame.  | 
  | 
Apply a function to a DataFrame elementwise.  | 
  | 
Apply func(self, *args, **kwargs).  | 
  | 
Aggregate using one or more operations over the specified axis.  | 
  | 
Aggregate using one or more operations over the specified axis.  | 
  | 
Group DataFrame or Series using one or more columns.  | 
  | 
Provide rolling transformations.  | 
  | 
Provide expanding transformations.  | 
  | 
Call   | 
Computations / Descriptive Stats
Return a Series/DataFrame with absolute numeric value of each element.  | 
|
  | 
Return whether all elements are True.  | 
  | 
Return whether any element is True.  | 
  | 
Trim values at input threshold(s).  | 
  | 
Compute pairwise correlation of columns, excluding NA/null values.  | 
  | 
Compute pairwise correlation.  | 
  | 
Count non-NA cells for each column.  | 
  | 
Compute pairwise covariance of columns, excluding NA/null values.  | 
  | 
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding   | 
  | 
Provide exponentially weighted window transformations.  | 
  | 
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).  | 
  | 
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).  | 
  | 
Return the mean absolute deviation of values.  | 
  | 
Return the maximum of the values.  | 
  | 
Return the mean of the values.  | 
  | 
Return the minimum of the values.  | 
  | 
Return the median of the values for the requested axis.  | 
  | 
Get the mode(s) of each element along the selected axis.  | 
  | 
Percentage change between the current and a prior element.  | 
  | 
Return the product of the values.  | 
  | 
Return the product of the values.  | 
  | 
Return value at the given quantile.  | 
  | 
Compute numerical data ranks (1 through n) along axis.  | 
  | 
Return number of unique elements in the object.  | 
  | 
Return unbiased standard error of the mean over requested axis.  | 
  | 
Return unbiased skew normalized by N-1.  | 
  | 
Return the sum of the values.  | 
  | 
Return sample standard deviation.  | 
  | 
Return unbiased variance.  | 
  | 
Return cumulative minimum over a DataFrame or Series axis.  | 
  | 
Return cumulative maximum over a DataFrame or Series axis.  | 
  | 
Return cumulative sum over a DataFrame or Series axis.  | 
  | 
Return cumulative product over a DataFrame or Series axis.  | 
  | 
Round a DataFrame to a variable number of decimal places.  | 
  | 
First discrete difference of element.  | 
  | 
Evaluate a string describing operations on DataFrame columns.  | 
Reindexing / Selection / Label manipulation
  | 
Prefix labels with string prefix.  | 
  | 
Suffix labels with string suffix.  | 
  | 
Align two objects on their axes with the specified join method.  | 
  | 
Select values at particular time of day (example: 9:30AM).  | 
  | 
Select values between particular times of the day (example: 9:00-9:30 AM).  | 
  | 
Drop specified labels from columns.  | 
  | 
Return DataFrame with requested index / column level(s) removed.  | 
  | 
Return DataFrame with duplicate rows removed, optionally only considering certain columns.  | 
  | 
Return boolean Series denoting duplicate rows, optionally only considering certain columns.  | 
  | 
Compare if the current value is equal to the other.  | 
  | 
Subset rows or columns of dataframe according to labels in the specified index.  | 
  | 
Select first periods of time series data based on a date offset.  | 
  | 
Return the first n rows.  | 
  | 
Select final periods of time series data based on a date offset.  | 
  | 
Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.  | 
  | 
Return a DataFrame with matching indices as other object.  | 
  | 
Alter axes labels.  | 
  | 
Set the name of the axis for the index or columns.  | 
  | 
Reset the index, or a level of it.  | 
  | 
Set the DataFrame index (row labels) using one or more existing columns.  | 
  | 
Interchange axes and swap values axes appropriately.  | 
  | 
Swap levels i and j in a MultiIndex on a particular axis.  | 
  | 
Return the elements in the given positional indices along an axis.  | 
  | 
Whether each element in the DataFrame is contained in values.  | 
  | 
Return a random sample of items from an axis of object.  | 
  | 
Truncate a Series or DataFrame before and after some index value.  | 
Missing data handling
  | 
Synonym for DataFrame.fillna() or Series.fillna() with   | 
  | 
Remove missing values.  | 
  | 
Fill NA/NaN values.  | 
  | 
Returns a new DataFrame replacing a value with another value.  | 
  | 
Synonym for DataFrame.fillna() or Series.fillna() with   | 
  | 
Synonym for DataFrame.fillna() or Series.fillna() with   | 
  | 
Fill NaN values using an interpolation method.  | 
  | 
Synonym for DataFrame.fillna() or Series.fillna() with   | 
Reshaping, sorting, transposing
  | 
Create a spreadsheet-style pivot table as a DataFrame.  | 
  | 
Return reshaped DataFrame organized by given index / column values.  | 
  | 
Sort object by labels (along an axis).  | 
  | 
Sort by the values along either axis.  | 
  | 
Return the first n rows ordered by columns in descending order.  | 
  | 
Return the first n rows ordered by columns in ascending order.  | 
Stack the prescribed level(s) from columns to index.  | 
|
Pivot the (necessarily hierarchical) index labels.  | 
|
  | 
Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.  | 
  | 
Transform each element of a list-like to a row, replicating index values.  | 
  | 
Squeeze 1 dimensional axis objects into scalars.  | 
Transpose index and columns.  | 
|
Transpose index and columns.  | 
Combining / joining / merging
  | 
Append rows of other to the end of caller, returning a new object.  | 
  | 
Assign new columns to a DataFrame.  | 
  | 
Merge DataFrame objects with a database-style join.  | 
  | 
Join columns of another DataFrame.  | 
  | 
Modify in place using non-NA values from another DataFrame.  | 
Serialization / IO / Conversion
  | 
Construct DataFrame from dict of array-like or dicts.  | 
  | 
Convert structured or recorded ndarray to DataFrame.  | 
  | 
Write the DataFrame into a Spark table.  | 
  | 
Write the DataFrame out as a Delta Lake table.  | 
  | 
Write the DataFrame out as a Parquet file or directory.  | 
  | 
Write the DataFrame out to a Spark data source.  | 
  | 
Write object to a comma-separated values (csv) file.  | 
  | 
Write a DataFrame to the ORC format.  | 
Return a pandas DataFrame.  | 
|
  | 
Render a DataFrame as an HTML table.  | 
A NumPy ndarray representing the values in this DataFrame or Series.  | 
|
  | 
Spark related features.  | 
  | 
Render a DataFrame to a console-friendly tabular output.  | 
  | 
Convert the object to a JSON string.  | 
  | 
Convert the DataFrame to a dictionary.  | 
  | 
Write object to an Excel sheet.  | 
  | 
Copy object to the system clipboard.  | 
  | 
Print Series or DataFrame in Markdown-friendly format.  | 
  | 
Convert DataFrame to a NumPy record array.  | 
  | 
Render an object to a LaTeX tabular environment table.  | 
Property returning a Styler object containing methods for building a styled HTML representation for the DataFrame.  | 
Plotting
DataFrame.plot is both a callable method and a namespace attribute for
specific plotting methods of the form DataFrame.plot.<kind>.
alias of   | 
|
  | 
Draw a stacked area plot.  | 
  | 
Make a horizontal bar plot.  | 
  | 
Vertical bar plot.  | 
  | 
Draw one histogram of the DataFrame’s columns.  | 
  | 
Make a box plot of the Series columns.  | 
  | 
Plot DataFrame/Series as lines.  | 
  | 
Generate a pie plot.  | 
  | 
Create a scatter plot with varying marker point size and color.  | 
  | 
Generate Kernel Density Estimate plot using Gaussian kernels.  | 
  | 
Draw one histogram of the DataFrame’s columns.  | 
  | 
Make a box plot of the Series columns.  | 
  | 
Generate Kernel Density Estimate plot using Gaussian kernels.  | 
Pandas-on-Spark specific
DataFrame.pandas_on_spark provides pandas-on-Spark specific features that exist only in pandas API on Spark.
These can be accessed by DataFrame.pandas_on_spark.<function/property>.
Apply a function that takes pandas DataFrame and outputs pandas DataFrame.  | 
|
Transform chunks with a function that takes pandas DataFrame and outputs pandas DataFrame.  |
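
The two batch entries above both take a function that receives a pandas DataFrame and returns a pandas DataFrame. The following is a minimal sketch of that contract, not a definitive usage pattern: the helper names `add_total` and `run_on_spark` and the columns `a`, `b` and `total` are hypothetical, chosen only for illustration. The batch function is kept as plain pandas code so it can be checked without a Spark session; the Spark call site is wrapped in a function that is not invoked here and assumes a working PySpark installation.

```python
import pandas as pd


def add_total(pdf):
    """Batch function: takes a pandas DataFrame, returns a pandas DataFrame.

    Hypothetical example; expects numeric columns "a" and "b".
    """
    out = pdf.copy()                    # leave the incoming chunk untouched
    out["total"] = out["a"] + out["b"]  # ordinary pandas column arithmetic
    return out


def run_on_spark():
    """Illustrative call site; requires PySpark, so it is not called here."""
    import pyspark.pandas as ps  # lazy import keeps add_total usable without Spark

    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

    # apply_batch: add_total is applied to pandas DataFrame chunks of psdf.
    applied = psdf.pandas_on_spark.apply_batch(add_total)

    # transform_batch: same calling convention, but the function is expected
    # to return output of the same length as its input.
    transformed = psdf.pandas_on_spark.transform_batch(lambda pdf: pdf + 1)

    print(applied.sort_index())
    print(transformed.sort_index())


# The batch function can be exercised on plain pandas data before it is
# handed to DataFrame.pandas_on_spark.
local_result = add_total(pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}))
print(local_result)
```

Because the function may be called on chunks rather than on the whole frame, logic that depends on seeing every row at once (for example a global mean computed inside the function) may not behave the same as it does in plain pandas.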