Accurate knowledge of crop type is valuable not only for verifying farmers' declarations when they apply for subsidies or insurance for the grown crop, but also for generating crop type maps that serve a variety of purposes in land monitoring and policy. Likewise, accurate knowledge of crop phenological stage can help farm personnel apply fertilization and irrigation regimes in a timely manner. Although deep learning based networks have previously been applied to classify the type and predict the phenological stage of crops from in situ images of fields, more advanced deep learning based networks that learn and make such inferences from temporal windows of image sequences captured by cameras with fixed positions and viewing directions have not been reported to date. This work proposes an architecture for learning and making inferences from such data. Specifically, the feature vectors of the images in a temporal window of the image sequence for a crop cycle are extracted by a first-stage deep convolutional neural network, and their temporal dependencies are exploited by a second-stage recurrent neural network. Experiments on a dataset of image sequences from 63 fields growing 5 different types of crops reveal that the proposed system achieves over 80% accuracy in crop type classification and under 0.5 mean absolute error in phenological stage number estimation. The learning performance improves with the size of the temporal window and with fine-tuning of the deep convolutional neural network used for feature extraction. The results obtained with the proposed system surpass those of classical machine learning methods applied to handcrafted texture and color features.
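The two-stage pipeline described above can be sketched in miniature. This is an illustrative sketch, not the paper's implementation: the CNN features are replaced by random vectors, the recurrent stage is a simple Elman-style RNN, and all dimensions (feature size, hidden size, window length) are assumed for demonstration only. The sketch shows the data flow: per-image feature vectors over a temporal window are aggregated by a recurrent state, from which a classification head predicts crop type and a regression head estimates the phenological stage number.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (illustrative only, not taken from the paper)
FEAT_DIM = 512   # length of the CNN feature vector per image
HIDDEN = 64      # recurrent hidden state size
N_CROPS = 5      # number of crop type classes (5 crops in the dataset)
WINDOW = 8       # number of images in the temporal window

# Stage 1 (stand-in): in the proposed system these would be features
# extracted by a deep convolutional neural network; random vectors
# keep the sketch self-contained.
features = rng.standard_normal((WINDOW, FEAT_DIM))

# Stage 2: a simple Elman-style recurrent pass over the window.
W_xh = rng.standard_normal((FEAT_DIM, HIDDEN)) * 0.01
W_hh = rng.standard_normal((HIDDEN, HIDDEN)) * 0.01
b_h = np.zeros(HIDDEN)

h = np.zeros(HIDDEN)
for x in features:                 # one recurrent step per image
    h = np.tanh(x @ W_xh + h @ W_hh + b_h)

# Classification head: softmax over crop types.
W_cls = rng.standard_normal((HIDDEN, N_CROPS)) * 0.01
logits = h @ W_cls
crop_probs = np.exp(logits - logits.max())
crop_probs /= crop_probs.sum()

# Regression head: scalar phenological stage number.
W_reg = rng.standard_normal((HIDDEN, 1)) * 0.01
stage = float(h @ W_reg)

print(crop_probs.shape, round(crop_probs.sum(), 6))
```

In practice the two heads would be trained jointly (e.g. cross-entropy loss for crop type and absolute error for the stage number), and the CNN feature extractor could be fine-tuned end to end, which the abstract reports improves performance.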